Current agentic systems don't just cram the entire codebase into the context window. They use a CST parser (e.g., https://tree-sitter.github.io/tree-sitter/) to build a high-level overview of function signatures and the names/locations of the files that contain them. They then autonomously open and read the files they think are relevant before implementing a particular feature. Once the feature is done, you start the next task with a fresh context window.
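As a rough illustration of what that overview step can look like, here's a minimal sketch using tree-sitter's Python bindings. It assumes the `tree-sitter` and `tree-sitter-python` PyPI packages (the setup API differs slightly across versions; this follows the ~0.22+ style), and the file path is just a hypothetical example.

```python
# Sketch: list function names and locations in a file with tree-sitter.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)


def list_functions(path: str) -> list[str]:
    """Return 'name (path:line)' entries for every function definition in the file."""
    with open(path, "rb") as f:
        source = f.read()
    tree = parser.parse(source)

    signatures = []

    def walk(node):
        if node.type == "function_definition":
            name_node = node.child_by_field_name("name")
            line = node.start_point[0] + 1  # start_point is (row, column), zero-based
            signatures.append(f"{name_node.text.decode()} ({path}:{line})")
        for child in node.children:
            walk(child)

    walk(tree.root_node)
    return signatures


# An agent can run this over every file in a repo to build a compact "map" of the
# codebase, then decide which files are worth reading in full.
print("\n".join(list_functions("example.py")))  # hypothetical path
```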
I question the conceit of the article, which seems to be based on the flawed assumption that the entire codebase (or even large chunks of it) has to fit in the context window. There is an argument to be made for token efficiency, but I'm not sure the article makes it convincingly.
I thought about the same thing the other day, but most LLMs are unfortunately not trained on large Ruby on Rails codebases. If you ask Claude which languages it prefers to code in, it will recommend Python or JavaScript.
Maybe DHH can sponsor fine-tuning an LLM with more Ruby on Rails data :P
RoR is just a web framework, so do they mean just Ruby? But I understand the takeaway from the article: because Ruby is so good at code golfing, it is perfectly suitable for cramming a lot of information into fewer tokens.
There is a talk about code golfing with Ruby where they took the language to a bizarre level I could barely believe was possible.
Python is already similar and already has a huge amount of training data. In fact, Python backends written with FastAPI do much better than the absolute mess TypeScript and Next.js create when used with an LLM.