Strongly typed languages have a fundamentally superior iteration strategy for coding agents.
The Rust compiler, particularly, will often give extremely specific "how to fix" advice… but in general I see this as a future trend with Rust and, increasingly, other languages.
Fundamentally, being able to assert "this code compiles" (and iterate until it does) before returning "completed task" is superior for agents, compared to dynamic languages where the only possible verification happens at runtime.
(And at best the agent can assert "I guess it looks OK".)
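Concretely, the loop I have in mind is something like this minimal sketch (cargo check is real; ask_llm is a stand-in I'm assuming for whatever model call you use):

    import subprocess

    def ask_llm(prompt: str) -> str:
        """Stand-in for a real model call (assumed, not a real API)."""
        raise NotImplementedError

    def agent_iterate(task: str, max_rounds: int = 5) -> str:
        code = ask_llm(f"Write Rust code for: {task}")
        for _ in range(max_rounds):
            with open("src/lib.rs", "w") as f:
                f.write(code)
            check = subprocess.run(["cargo", "check"],
                                   capture_output=True, text=True)
            if check.returncode == 0:
                return code  # only now claim "completed task"
            # feed the compiler's "how to fix" advice back to the model
            code = ask_llm(f"Fix this Rust code:\n{code}\n"
                           f"Compiler output:\n{check.stderr}")
        raise RuntimeError("could not get the code to compile")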
I actually don't think it's that cut and dried. I expect Rust in particular (due to lifetimes) to stump LLMs: fixing something locally triggers a need for a refactor elsewhere.
I actually think a language like Clojure (very functional, very compositional, a focus on local, stand-alone functions, manipulating base data structures (list, set, map) rather than specialist types (~classes)) would do well.
That said, at the moment I get WAY more issues in OCaml suggestions from Claude than in Python ones. Training is king: the LLM cannot reason, so types are not as big a help as one might think.
> fixing something locally triggers a need for a refactor elsewhere.
Yes, but most of the time such refactors are very mechanical, and there's no reason to believe the agent won't be able to do them.
> the LLM cannot reason, so types are not as big a help as one might think.
You are missing the point: the person you are responding to expects it to be superior in an agentic scenario, where the LLM can try its code and see the compiler output, rather than in a pure text-generation situation where the LLM can only assess the code from a bird's-eye view.
No, I think others are missing the point. An "agentic scenario" is not dissimilar from passing code manually to an AI; it just does it by itself. And if you've tried to use AI for Rust, you'll understand why this is not reliable.
An LLM can read compiler output, but how it corrects the code is, ultimately, a semantic guess. It can look at the names of types, it can use its training to guess where new code should go based on types, but it's not able to actually use those types while making changes. It would use them in the same way it would use comments, to inform what code it should output. It makes a guess, checks the compiler output, makes another guess, etc. This may lead to code that compiles, but not code that should be committed by any means. And Rust is not what I'd call a "flexible language," where lots of different coding styles are acceptable in a codebase. You can easily end up with brittle code.
So you don't get much benefit from types, but you do have the overhead of semantic complexity. This is a huge problem for a language like Rust, which is one of the most semantically complicated languages. The best languages for this are going to be ones that are semantically simple but popular, like Golang. Although I do think Clojure's support is impressive given how little code there is compared to other languages.
> so types are not as big a help as one might think.
Yes, they are.
An agent can lean on the compiler's type system and iterate.
That is impossible using Clojure.
The reason you have problems with OCaml is that the tooling you're using is too shit to support iterating until the compiler passes before returning the results to you.
…not because tooling doesn't exist. Not because the tooling doesn't work.
-> because you are not using it.
Sure, Rust ownership makes it hard for LLMs. Faaair point; but ultimately, why would a coding agent ever suggest code to you that doesn't compile?
Either: a) the agent tooling is poor or b) it is impossible to verify if the code compiles.
One of those is a solvable problem.
One is not.
(Yes, what many current agents do is run test suites; but dynamically generating valid tests is tricky; checking if code compiles is not tricky.)
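And "not tricky" means mechanically extractable, too. A sketch, assuming cargo's JSON message format (the field names are as I remember them and worth verifying):

    import json
    import subprocess

    def compiler_errors() -> list[str]:
        """Collect rendered rustc diagnostics to hand back to the agent."""
        check = subprocess.run(
            ["cargo", "check", "--message-format=json"],
            capture_output=True, text=True,
        )
        errors = []
        for line in check.stdout.splitlines():
            msg = json.loads(line)
            if msg.get("reason") == "compiler-message":
                rendered = msg["message"].get("rendered")
                if rendered:
                    errors.append(rendered)
        return errors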
> An agent can lean on the compiler's type system and iterate.
> That is impossible using Clojure.
It might be impossible to use the compiler type system, but in Clojure you have much more powerful tools for actually working with your program as it runs; one would think this would be a much better setup for an LLM that aims to implement something.
Instead of just relying on the static types based on text, the LLM could actually inspect the live data as the program runs.
Besides, the LLM could also replace individual functions/variables in a running program, without having to restart.
The more I think about it, the more obvious it becomes how well suited Clojure would be for an LLM to iteratively build an actual working program, compared to static approaches like using Rust.
I understand the point; however, I think explicit types are still superior, due to the abundance of data in the training phase. It seems to me too computationally hard to incorporate a REPL-like interactive interface in the GPU training loop. Since it's processing large amounts of data, you want to keep it simple, without back-and-forth with CPUs that would kill performance.
And if you can’t do it at training time, it’s hard to expect for the LLM to do well at inference time.
Well, if you could run Clojure purely on a GPU/inside the neural net, that might be interesting!
Why would it be more expensive to include a REPL-like experience than to run the whole Rust compiler in the GPU training loop?
Not that I argued you should do that (I don't think either makes much sense; the point was about inference time, not training), but if you apply that to one side of the argument (for Clojure, a REPL), don't you think you should also apply it to the other side (for Rust, a compiler) for a fair comparison?
I agree. I am under the impression that, unlike Rust, Clojure doesn't require explicit types.
(I don't know Clojure.)
So there are examples online with Rust code, types, and compiler errors, and how to fix them. But for Clojure, the type information is missing and you need to get it from the REPL.
> So there are examples online with Rust code, types, and compiler errors, and how to fix them. But for Clojure, the type information is missing and you need to get it from the REPL.
Right, my point is that instead of the LLM relying on static types and text, with Clojure the LLM could actually inspect the live application. So instead of trying to "understand" that variable A contains 123, it'll do "<execute>(println A)</execute>" and whatever, and then see the result for itself.
Haven't thought deeply about it, but my intuition tells me the more (accurate and fresh) relevant data you can give the LLM for solving problems, the better. So having the actual live data available is better than trying to figure out what the data would be based on static types and manually following the flow.
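To make that concrete without writing Clojure here, the tool loop I'm imagining looks roughly like this in Python terms (a sketch: one persistent namespace standing in for the live REPL session):

    import contextlib
    import io

    session: dict = {}  # persistent namespace, i.e. the "live application"

    def execute(code: str) -> str:
        """Run model-supplied code in the live session; return what it printed."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, session)  # the "<execute>...</execute>" tool call
        return buf.getvalue()

    execute("A = 123")
    print(execute("print(A)"))  # -> 123: the model sees live data, not a guess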
If you want to build an LLM specific to Clojure, it could probably be engineered: add the types as traces for training via a synthetic dataset, and provide them from the REPL at inference time. Sounds like an awfully large amount of work for a non-mainstream language.
I'm waiting for someone to figure out that coding is essentially a sequence of refactoring steps, where each step is a code transformation from one valid state to another. Equipping refactoring IDEs with an MCP facade would give direct access to that, as well as feedback on compilation state and lots of other information. That makes it a lot easier to do structured transformations of entire code bases without having to feed the entire code base as context and then hope the LLM hallucinates together the right tokens and uses reasoning to figure out if it might be correct. They are actually pretty good at doing that, but it doesn't scale very well currently and gets expensive quickly (in time and tokens).
This stuff is indeed inherently harder for dynamic languages. But it's been standard for (some) statically compiled languages like Java, Kotlin, C#, Scala, etc. for most of this century. I was using refactoring IDEs for Java as early as 2002.
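As a toy illustration of one such transformation (not any particular IDE's API; just Python's ast module renaming a function and its direct call sites, where a real engine would also handle imports, methods, strings, etc.):

    import ast

    class Rename(ast.NodeTransformer):
        """One rewrite rule: rename a function and its direct call sites."""
        def __init__(self, old: str, new: str):
            self.old, self.new = old, new

        def visit_FunctionDef(self, node):
            if node.name == self.old:
                node.name = self.new
            self.generic_visit(node)
            return node

        def visit_Name(self, node):
            if node.id == self.old:
                node.id = self.new
            return node

    src = "def fetch(x):\n    return x\n\nprint(fetch(1))"
    tree = Rename("fetch", "fetch_user").visit(ast.parse(src))
    print(ast.unparse(tree))  # one valid state in, another valid state out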
Smalltalk Refactoring Browser! (Where do you think Java IDEs got the idea from?)
"A very large Smalltalk application was developed at Cargill to support the operation of grain elevators and the associated commodity trading activities. The Smalltalk client application has 385 windows and over 5,000 classes. About 2,000 classes in this application interacted with an early (circa 1993) data access framework. The framework dynamically performed a mapping of object attributes to data table columns.
Analysis showed that although dynamic look up consumed 40% of the client execution time, it was unnecessary.
A new data layer interface was developed that required the business class to provide the object attribute to column mapping in an explicitly coded method. Testing showed that this interface was orders of magnitude faster. The issue was how to change the 2,100 business class users of the data layer.
A large application under development cannot freeze code while a transformation of an interface is constructed and tested. We had to construct and test the transformations in a parallel branch of the code repository from the main development stream. When the transformation was fully tested, then it was applied to the main code stream in a single operation.
Less than 35 bugs were found in the 17,100 changes. All of the bugs were quickly resolved in a three-week period.
If the changes were done manually we estimate that it would have taken 8,500 hours, compared with 235 hours to develop the transformation rules.
The task was completed in 3% of the expected time by using Rewrite Rules. This is an improvement by a factor of 36."
from “Transformation of an application data layer”, Will Loew-Blosser, OOPSLA 2002
It's not really that much harder, if at all, for dynamic languages, because you can use type hints in some cases (e.g. Python), or a different language (TypeScript) in the case of JavaScript; there are plenty of tools that'll tell you if you're not respecting those type hints, and you can feed the output to the LLM.
But yeah, if we get better & faster models, then hopefully we might get to a point where we can let the LLM manage its own context itself, and then we can see what it can do with large codebases.
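The feedback plumbing is only a few lines (a sketch; app.py and the hinted code in it are made-up examples):

    import subprocess

    # app.py is assumed to contain hinted code like:
    #   def total(prices: list[float]) -> float: ...
    #   total(["1.99"])   # a hint violation mypy will flag

    def type_errors(path: str = "app.py") -> str:
        """Feed mypy's complaints back to the LLM, same as compiler output."""
        result = subprocess.run(["mypy", "--strict", path],
                                capture_output=True, text=True)
        return result.stdout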
Which based many of their tools on what Xerox PARC had done with their Smalltalk, Mesa (XDE), Mesa/Cedar, and Interlisp-D environments.
This kind of processing is possible in dynamic languages when using an image-based system, as it also contains metadata that in some ways takes the role of static types.
From the previous list only Mesa and Cedar are statically typed.
Feels like this would be possible to achieve using group theory and a lot of work on representing ASTs of program segments in such a way as to be able to invert them.
On the other hand, using "it compiles" as a heuristic for "it does what I want" seems to be missing the goal of why you're coding what you're coding in the first place. I'd much rather set up one E2E test with how I want the thing to work, then let the magical robot figure out how to get there while also being able to run the test and see if it's there yet or not.
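Something like this, say (render_invoice and its behaviour are made up for illustration): I write the one test by hand, and the agent's oracle becomes "does pytest pass", not "does it compile".

    # test_invoice.py: the one E2E test written by hand;
    # render_invoice is a hypothetical function the agent must implement
    from invoice import render_invoice

    def test_invoice_end_to_end():
        out = render_invoice(customer="Ada", items=[("widget", 2, 9.99)])
        assert "Ada" in out
        assert "19.98" in out  # 2 * 9.99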
Not really. Even humans regularly get lifetimes wrong.
As someone not super experienced in Rust, my workflow was often very, very compiler-error-driven. I would type a bit, see what the compiler says, change it, and so on. Maybe someone more experienced can write whole chunks single-pass that compile on the first try, but that would far exceed anything generative AI will be able to do in the next few years.
The problem here is that iteration with AI is slow and expensive at the moment.
If anything, you want to use a language with automatic garbage collection, as it removes mental overhead for generative AI as well as humans. You also want a more boilerplate-heavy language, because those are easier to reason about, and the boilerplate doesn't matter when the AI does the work.
I haven't tried it, but I suspect Golang should work very well. The language is very stable, so older training data still works fine. Projects are very uniform and there isn't much variation in coding style, so it's easy for AI to grok.
Also probably Java but I suspect it might get confused with the different versions and all the magic certain frameworks use.
I've been saying this for years on X. I think static languages are winning in general now, having gained much of the ergonomics of dynamic languages without sacrificing anything.
But AI thrives with a tight feedback loop, and that works best with static languages. A Python linter (or even mypy) just isn't as good as the Rust compiler.
The future will be dominated by static languages.
I say this as a long-time dynamic-languages and Python proponent who started seeing the light back when Go was first released.
I think this is a great point! For humans it's easier to write loosely typed, Python-like code, as you skip a lot of boilerplate; but for AI the boilerplate is probably useful, because it reinforces which variable is of which type, and it's obviously easier to detect errors early on, at compilation time.
I actually wonder if that will push languages like Python to add a more strictly enforced type mode, as boilerplate is much less of an issue now.
Hot take: this is a transition step, like the -S switch back when assembly developers didn't believe compilers could output code as good as their own.
Eventually, a few decades later, optimising backends made hand-written assembly a niche use case.
Eventually, AI-based programming tools will be able to generate executables. And as happened with -S, we might require generation into a classical programming language to validate what the AI compiler backend is doing, until it gets good enough that only those arguing on an AI Compiler Explorer will care.
It's probably pointless writing run-of-the-mill assembly these days, but SIMD has seen a resurgence in low-level coding, at least until compilers get better at generating it. I don't think I'd fully trust LLM-generated SIMD code, as if it were flawed it'd be a nightmare to debug.
This won't be a thing and for very obvious reasons.
Programming languages solve the specification problem (which happens to be equivalent to "The Control Problem"). If you want the computer to behave in a certain way, you have to provide a complete specification of the behavior. The more loose and informal that specification is, the more blanks have to be filled in, and the more you are letting the AI make decisions for you.
You tell your robotic chef to make a pizza, and it does, but it turns out it decided to make a vegan pizza. You yell at the robot for making a mistake, and it sure gets that you don't want a vegan pizza, so it decides to add canned tuna. Except it turns out you don't like tuna either. You yell at the robot again and again until it gets it. Every single time you tell the AI that it made a mistake, you're actually providing a negative specification of what not to do. In the extreme case you will have to give the AI an exhaustive list of your preferences and dislikes, in other words, a complete specification.
By directly producing executables, you have reduced the number of knobs and levers that can be used to steer the AI and made it that much harder to provide a specification of what the application is supposed to do. In other words, you're assuming that the model in itself is already a complete specification and your prompt is just retrieving the already existing specification.
> Programming languages solve the specification problem (which happens to be equivalent to "The Control Problem").
Not equivalent, for several reasons. An obvious one is simply that even providing a perfect specification does nothing to solve the control problem, because you still have to enforce adherence to the specification.
It's similar to the human version of the control problem: we can write comprehensive and copious laws, but that doesn't automatically prevent humans from breaking them.
Besides that, the control problem is inherently subjective. The difficulty of breaking it down into an exhaustive list of deterministic rules is a big part of the problem, which programming languages do nothing to solve.
I've found this to be very true. I don't think this is a hot take. It's the big take.
Now I write almost all tools that aren't shell scripts in Rust. I'm only using dynamic languages when forced to by platform or dependencies. I'm looking at you, PyTorch.