Unfortunately, outside of the most popular tasks it is much less populated. But it's community driven, so I encourage researchers in areas whose leaderboards are sparse to fill them in and try to keep them up to date. It's not hard to pick a single task and update the tables, and it can be very valuable for the community.
I greatly enjoy publications that provide their implementation (or a simplified version). Playing around with the problems yourself can give much greater insight and understanding, on top of the written work itself.
However, I have encountered situations where providing the direct implementation was seen as a bad thing: it was thought of as 'giving away your advantage', and squeezing multiple papers out of the work before even thinking about publishing the code was the preferred approach.
It is great to see more and more research publications go together with their implementations.
Amazing! It saves so much time not having to read a paper only to find there is really no code, or very limited code, to run. Not having delved too deep into this, a suggestion would be to add a level-of-completeness indicator for each paper's code. Sometimes a paper comes with great code and 80% of it is there, but it is missing a crucial piece, sometimes the secret sauce, which renders it difficult if not impossible to use. This is the case with a lot of OpenAI papers.
Excluding some papers (e.g. from OpenAI) that have limited what they release for ethical reasons, why would authors generally not include their code?
I remember taking a digital image processing course at uni where the final project was to implement a paper and check the results. Our results, once coded up, were different from the authors' (although I can't remember whether that was because we didn't implement it exactly as they did or because their results were not to be trusted).
It’s just so frustrating and borderline disingenuous to publish results, mention bits of code, but not include the whole code.
> why would the authors generally not include their code?
Decades of "printing out the code listings would cost too much paper" made many researchers get comfortable with the idea that nobody would ever see the code behind their work. That means:
- your code can be shit and potentially buggy and nobody will point and laugh at you
- replication is harder, so you can bluff some of the details
- if key findings are actually bugs, nobody will find out
These are all terrible reasons, but they're comfortable reasons as well. Open access was the first barrier to fall; code included is simply the next.
If including code had been commonplace, fiascos such as Imperial College London's possibly-totally-bogus covid simulation [0][1] would never have made it to the cabinet.
This is great! It always bugs me when I can't find an implementation of a paper to reference. I'm much more likely to read a paper if there is code associated with it.
One thing that stands out on it is that the top article is OpenAI's GPT-3 paper. While I've only read popular summaries here, one point that's highlighted is that the model involves 175 billion parameters. Which seems like an indication that while you can download the code and the paper, you're not going to be able to run it - at least not until/unless GPUs get much bigger (or you have OpenAI-, Google-, or similar-level resources).
I wonder if you could have a site which not only allows the uploading of code but actually runs the code on the site.
Check out https://mybinder.org -- if your code is in Jupyter notebooks in a GitHub repo, it'll build a Docker container with whatever dependencies you specify, spin up an instance on some cloud somewhere, and run the notebooks in your browser.
Edit: they do have heavy limits for the version they host, but the platform is open-source, so institutions can set up mybinder instances for their own users.
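For concreteness, a launch link is just a URL built from the repo coordinates; Binder reads the dependency files (requirements.txt, environment.yml, Dockerfile, ...) from the repo itself. The repo and notebook names below are placeholders, not a real project:

    from urllib.parse import quote

    def binder_launch_url(owner, repo, ref="HEAD", notebook_path=""):
        """Build a mybinder.org launch link for a GitHub repo."""
        url = f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"
        if notebook_path:
            # ?filepath=... opens a specific notebook once the server is up
            url += "?filepath=" + quote(notebook_path, safe="")
        return url

    # Placeholder repo/notebook, just to show the shape of the link:
    print(binder_launch_url("someuser", "paper-code", "main",
                            "notebooks/reproduce_figure_3.ipynb"))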
There's no code for GPT-3. Also, you don't need GPUs to run it if it's ever released. You just need a server with 350GB of RAM. It's going to be slow (probably a few words per minute), but you would be able to run it at a reasonable cost (e.g. r5.16xlarge spot pricing is ~$0.6/hr).
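Back-of-envelope with the numbers above (175B parameters, ~$0.6/hr spot; the 16-bit-weights assumption and the 512 GiB figure for an r5.16xlarge are mine, so double-check them):

    # Rough sketch, not a benchmark.
    params = 175e9
    bytes_per_param = 2                          # assuming fp16 weights
    weights_gb = params * bytes_per_param / 1e9
    print(f"weights alone: ~{weights_gb:.0f} GB")     # ~350 GB, fits in 512 GiB

    spot_usd_per_hr = 0.60                       # figure quoted above
    print(f"keeping it loaded: ~${spot_usd_per_hr * 24:.2f}/day")  # ~$14.40/day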
Great website, but too bad it's owned by Facebook. Personally, I don't feel comfortable with FB owning/managing/controlling a repository of mostly academic research papers.
In the end it's just an index linking papers (usually hosted on arXiv) with code (usually on GitHub). It's more of a nice starting point for literature research than a content-owning platform.
I am impressed by its content. I use it both for research (to track progress) and for teaching. Earlier references:
- Measuring the Progress of AI Research by Electronic Frontier Foundation, https://www.eff.org/ai/metrics
- Natural Language Processing Progress https://nlpprogress.com/
are great, but nowhere near Papers with Code when it comes to completeness and UI.