My recommendation tho, of course, is to find an existing project you’re passionate about, find their tracker, and dig in! Very little software is ever truly done. Make one you love even better.
That is how you end up never contributing anything or having your first pull request be rejected.
The random drive-by contributor strategy has never worked for me. In most of my projects I was the "long term" contributor, or the other way around: I let one of my contributors become one.
Organization is critical to the success of a project, and you're basically telling someone to forget about it and waste the maintainers' time.
As a maintainer of a lot of large/popular projects, I can say this couldn't be further from the truth. I stick that label onto things that are typically trivial and that I don't want to allot much time to, aside from approving/merging a PR and letting it get stuffed into the next release cycle.
> That is how you end up never contributing anything or having your first pull request be rejected.
The best way to avoid having your pull request rejected is to make sure the change is actually wanted, which you can figure out by communicating.
So if some random webapp has a button in the wrong place, don't open a PR moving the button to the right place. Open an issue, confirm it's wrong, and confirm the maintainer(s) would be fine with the suggested change. If something is unclear, clear it up before spending time on the PR.
Once consensus is reached, it'll be trivial to get the PR merged.
By just using common sense, one can still contribute to a project without making a huge commitment. One of the most important things to do is to check how the maintainers respond to pull requests. The vast majority of projects are positive and responsive towards them. In some projects PRs seem to sit in limbo for various reasons, but even those can later be merged into an unofficial fork and still be useful.
I've had dozens of "drive-by" contributions approved and merged. Just check the CONTRIBUTING.md if there is one, and make sure your pull request includes everything the maintainer may want to know.
I'd like to make a pitch for Openlibrary.org, the free online library from the Internet Archive, which includes full-text search of millions of books.
I've been volunteering with them on and off for several years and it's always a lovely experience. Their backend is Python, and the frontend is mostly Python templates plus some Vue for the librarian tools.
Every Tuesday they have a call on Zoom that everyone is welcome to join to share what they're working on, ask for help, and generally chat a bit. It's a great time.
Depending on what you're interested in, there's a lot to do: helping build import pipelines for more book entries, writing bots to clean up data, performance improvements, better documentation of the public APIs, etc.
I'm currently slowly working on a Wikidata integration for their authors page. We could also use help upgrading to Vue 3, mentors for Google Summer of Code, help with a few ML projects, moving away from old jQuery libraries, etc.
Cool project. Where is the data and how was it acquired / what's the provenance? I found this:
> Data: We have a bunch of catalog data and fulltext acquired from various sources, either sitting in the Archive or to be uploaded to there. I think the acquisition processes (including web crawling scripts for some of the data) is outside the scope of an Open Library software install. There are a bunch of additional scripts to make the stuff usable in openlibrary and these need to be documented. These include TDB Conversion Scripts written by dbg, and (for OCA fulltext) Archive Spidering and Solr Importing scripts written by phr.
edit: I should add the question: does openlibrary support fulltext search of actual books or fulltext search of the metadata?
If you would like to see the data that's currently in Open Library, dumps are published every month or so. The data comes from partners, from one-off projects that scrape public sources, or sometimes just from people adding what they like.
Fulltext search of actual books! And also metadata.
If you're serious about getting involved, I'd say the best thing to do is just come to the meeting if you can. The GitHub repo has a lot of things that can be hard to pick up without context. Most of the easy stuff is on the JS side.
Small Python library, slowly and steadily growing in use.
New contributors and junior devs welcome. Mainly just maintenance work, occasional bug fixes, and small feature development. Fine for someone looking for casual involvement.
Sure, I write small Python CLI utils that help me with media organization, media consumption, and sometimes data analysis. I use them every day on Linux and Android, but I haven't tested them on other platforms. There are a lot of different subcommands and, although the CLI package will always be opinionated to some extent, there is a lot of niche functionality which might not need to exist. So I'm open to things being refactored or new subcommands being added. [1]
I have a lot of ideas for new ones, for example, I want a CLI that can take an artist name like "Theodor Kittelsen" and fetch highest quality public domain images--but I realize any implementation that does this well will be somewhat fragile so I haven't really attempted that yet. Other ideas that I have are often solved by piping output from one of my existing commands to another or adding some optional args to an existing command.
This is such a major effort: getting up to speed on existing code bases to the point you can add to them.
I feel for the author of this comment. You want to help out and work on stuff while you have energy at the beginning, but I find it easier to write my own code than to get up to speed with someone else's. You lose steam, because the activation energy to get the project built, run, played with, and customised/changed/"hacked on" is a major effort.
I have been thinking a lot on an idea inspired by compiler design: intermediate representations and term rewriting.
If software features were an arbitrary stack of crisscrossing intermediate representations that are rewritten and mutually recursive/referential or parsed or transpiled into actual code, then we could inspect the intermediate representations to work out how things work.
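To make the term-rewriting part a bit more concrete, here's a minimal Python sketch (my own illustration, not an existing tool): a tiny expression IR, two hypothetical rewrite rules, and a loop that applies them until nothing changes, recording every intermediate step so you can inspect how the final form was reached.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

# A tiny, hypothetical expression IR: numbers, variables, binary ops.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Term"
    right: "Term"

Term = Union[Num, Var, BinOp]
Rule = Callable[[Term], Optional[Term]]
Step = tuple[str, Term, Term]

def fold_constants(t: Term) -> Optional[Term]:
    # 2 + 3 -> 5, 2 * 3 -> 6
    if isinstance(t, BinOp) and isinstance(t.left, Num) and isinstance(t.right, Num):
        if t.op == "+":
            return Num(t.left.value + t.right.value)
        if t.op == "*":
            return Num(t.left.value * t.right.value)
    return None

def simplify_identities(t: Term) -> Optional[Term]:
    # x + 0 -> x, x * 1 -> x, x * 0 -> 0
    if isinstance(t, BinOp) and isinstance(t.right, Num):
        if t.op == "+" and t.right.value == 0:
            return t.left
        if t.op == "*" and t.right.value == 1:
            return t.left
        if t.op == "*" and t.right.value == 0:
            return Num(0)
    return None

RULES: list[Rule] = [fold_constants, simplify_identities]

def rewrite_once(t: Term, rules: list[Rule], trace: list[Step]) -> Term:
    # Rewrite children first (bottom-up), then try each rule at this node.
    if isinstance(t, BinOp):
        t = BinOp(t.op, rewrite_once(t.left, rules, trace), rewrite_once(t.right, rules, trace))
    for rule in rules:
        new = rule(t)
        if new is not None:
            trace.append((rule.__name__, t, new))  # keep the intermediate form
            return new
    return t

def rewrite(t: Term, rules: list[Rule] = RULES) -> tuple[Term, list[Step]]:
    # Apply passes until a fixed point; return the result plus the full trace.
    trace: list[Step] = []
    while True:
        new = rewrite_once(t, rules, trace)
        if new == t:
            return t, trace
        t = new
```

For example, `rewrite(BinOp("+", BinOp("*", Var("x"), BinOp("+", Num(0), Num(1))), Num(0)))` reduces to `Var("x")`, and the trace shows each rule that fired along the way. That trace is the kind of "inspect the intermediate representations" view I mean, just at toy scale.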
It would be nice to narrow down on a piece of behaviour and see how it works from end-to-end. But in practice, you have an opaque wholeness of a codebase to understand.
A modern system or codebase at a company or mature open source project is like one of those games of wooden sticks or Jenga bricks: they're arranged in a pile, you keep piling things on top of things, and if you disturb it slightly, it falls over or stops working.
I used a piece of software called OpenGrok, which renders a large code base as a clickable, browsable wiki in the browser, so you can explore codebases.
I have a primitive python SQL database on my github that can execute simple graph cypher queries, simple joins of multiple tables, "dynamodb" style queries and document database queries.
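For anyone curious what a "simple join of multiple tables" involves in a toy engine like that, here's a hypothetical Python sketch of an inner hash join over lists of dict rows. It's not taken from that repository; the function name and row format are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]

def hash_join(left: list[Row], right: list[Row], left_key: str, right_key: str) -> list[Row]:
    """Inner equi-join of two tables (lists of dict rows) on left_key == right_key."""
    # Build phase: index the right table by its join key.
    index: dict[Any, list[Row]] = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    # Probe phase: for each left row, emit one merged row per matching right row.
    # .get() avoids inserting empty lists into the defaultdict for misses.
    joined: list[Row] = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            joined.append({**lrow, **rrow})  # right-side columns win on name clashes
    return joined
```

Joining more than two tables is just chaining: feed the result of one call into the next. A real engine would add a planner that picks the join order and algorithm.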
> This is such a major effort: getting up to speed on existing code bases to the point you can add to them.
It's a great thing that different projects have different scopes! If you're just looking to contribute to some project to gain experience, it might be a good idea to start with a smaller-scoped project before jumping into larger ones.
For example, if you're interested in contributing to the core of Kubernetes, maybe start by contributing to a plugin that interacts with Kubernetes. While contributing to the plugin, you get indirectly exposed to some of the internals of the parent project, and eventually you'll be able to jump into contributing to Kubernetes itself faster.
I saw you mention Python, but if you also happen to be interested in Golang and data streaming, https://benthos.dev is a good project to contribute to. There are quite a few issues open on the GitHub project which anyone can pick up. Writing new connectors and adding tests / docs is always a good place to start. The maintainer is super-friendly and he's always active on the https://benthos.dev/community channels. I'm also there most of the time, since I've been contributing to the project for several years now.
This is awesome! Not OP, but I think I found something to work on. I've worked as a backend engineer at a company like Fivetran, so I feel I might have something worth offering.
You're welcome! Please feel free to join the community channels I linked above. There are blobtalk meetings on Discord every 2 weeks where we can discuss ideas in detail and, otherwise, we're always active in the chat.
I know about this project https://www.codetriage.com/?language=Python that might be of interest to you. Basically you pick what repos you want to work on and receive issue suggestions in your inbox.
Why not create one of your own? There are so many ideas a person can come up with these days, and so much free information online on how to do things, even before LLMs, which I personally would avoid for a learning project like this.
Plus, if you create your own project, you can make it into a product, and try to market and sell it.