This applies to coding agents. If the agent can't run the code, it's unlikely to produce working code. Add to "running": linting, running tests, compiling, code review, and any other tool/process humans use to check whether software is "good" or working.
If the agent can apply these processes to its output, then we're on our way to getting a good chunk of our work done for us. Even from a product POV, if the agent is allowed to experiment by making deployments and checking user-facing metrics, it could eventually build a software product - but we should still solve the coding part first, as it seems easier to verify objectively and quickly.
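Roughly what I mean, as a minimal sketch: run the same checks a human would (lint, tests) and feed the raw failures back into the next generation step. The `generate_patch`/`apply_patch` calls are hypothetical stand-ins for whatever agent you're driving, and ruff/pytest are just example checks.

```python
import subprocess

CHECKS = [
    ["ruff", "check", "."],            # lint
    ["python", "-m", "pytest", "-q"],  # tests
]

def run_checks() -> list[str]:
    """Run each check and collect failure output to feed back to the agent."""
    failures = []
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return failures

def agent_loop(task: str, generate_patch, apply_patch, max_iters: int = 5) -> bool:
    """Hypothetical loop: generate a patch, apply it, verify, feed failures back."""
    feedback = ""
    for _ in range(max_iters):
        patch = generate_patch(task, feedback)  # hypothetical LLM call
        apply_patch(patch)                      # hypothetical: write the patch into the repo
        failures = run_checks()
        if not failures:
            return True                         # all checks green
        feedback = "\n\n".join(failures)        # give the agent the raw failure output
    return False
```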
You're right, but actually running the code can be destructive (even when run as intended). You really need to be careful about dev environments: even when a destructive operation does no lasting damage, it still costs you time (and money) to reset the dev environment.
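What I try to do is at least run anything agent-generated against a throwaway copy of the checkout, with a timeout. A minimal sketch (paths and the timeout are placeholder values):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_in_sandbox(repo: Path, cmd: list[str], timeout_s: int = 60) -> subprocess.CompletedProcess:
    """Copy the repo into a temp dir and run the command there, so destructive
    file operations hit the throwaway copy instead of the real checkout."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / repo.name
        shutil.copytree(repo, workdir)  # throwaway copy of the checkout
        return subprocess.run(
            cmd,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,          # bound runaway processes
        )
```

This obviously doesn't stop network calls or writes to absolute paths outside the temp dir; for that you'd want a container or VM rather than a copied directory.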
Agreed, and I think this highlights the importance of interactivity/snappiness as well as idempotency. Humans need that to play around with the system too.
If the agent has a fast+safe feedback loop to experiment in, it can go through more cycles, faster, and improve its output.
Nice. LLMs can barely prove anything beyond citing some sources or reproducing pure math that's already in circulation. AFAICT, so far, no novel ideas have been proven, i.e. the "these systems never invented anything" paradox, going on three years now.
Symbolic AI seems to prove everything it states, but never novel ideas, either.
Let's see if we get neurosymbolic AI that can do something both could not do on their own — I doubt it, AI might just be a doom cult after all.