I have pushed back in a similar way many times (w.r.t. LLMs); the response I typically get is some combination of:
1. A custom/heuristic-driven approach would perform better but will take much longer to build so would be lower ROI.
2. There is a strategic angle to using AI here (building competency). We aren't sure what new use cases will open up in the medium term and we need to be fluent with building AI products.
3. There is a perceptual/marketing angle to using AI here. We need to convince the market/investors we are on the bleeding edge (hype).
Point 3 is widely mocked, but it's a completely rational allocation of resources when you need to compete in a market for funding.
Prolog is AI, so whenever I see such a problem, I use miniKanren, implementing relational/logic programming in a lightweight way (sketch below). Bleeding edge AI it is.
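For a flavor of how lightweight this can be, here's a minimal sketch using the Python kanren library (my choice of implementation; any miniKanren works the same way): state a relation as facts, then query it.

```python
from kanren import Relation, facts, run, var  # pip install kanren

# A tiny knowledge base, stated as facts rather than procedures.
parent = Relation()
facts(parent, ("Abe", "Homer"),
              ("Homer", "Bart"),
              ("Homer", "Lisa"))

x, y = var(), var()

# Who is Bart's parent?
print(run(1, x, parent(x, "Bart")))                # ('Homer',)

# Who is Bart's grandparent? Conjoin two goals over a shared variable.
print(run(1, x, parent(x, y), parent(y, "Bart")))  # ('Abe',)
```

The same machinery runs "backwards" too (e.g. asking for all of Homer's children), which is the relational part.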
> will take much longer to build so would be lower ROI
This one is funny, because my experience has been that eking out the issues in this sort of thing is enormously complicated, unreliable, and takes an inordinate amount of time. Often the 'bugs' aren't trivially fixable. One we had was the LLM formatting URIs given in the prompt incorrectly, so they were no longer valid. Most of the time it works fine, but sometimes it doesn't, and it's not easily reproducible.
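One cheap mitigation for that particular failure (a sketch, not what we actually shipped; the regex and helper name are illustrative): treat URIs in the output as untrusted and check them verbatim against the set given in the prompt, retrying or falling back when they don't match.

```python
import re

# Naive URI pattern; real URI grammar is messier (illustrative only).
URI_RE = re.compile(r"https?://[^\s\"'<>)\]]+")

def invalid_uris(allowed: set[str], model_output: str) -> list[str]:
    """Return URIs in the model output that don't appear verbatim
    in the set we supplied in the prompt."""
    return [u for u in URI_RE.findall(model_output) if u not in allowed]

allowed = {"https://example.com/docs/v1/widgets"}
output = "See https://example.com/docs/v1/Widgets for details."  # model changed the case

bad = invalid_uris(allowed, output)
if bad:
    # Don't ship it: retry the call, repair via string replace, or fall back to a template.
    print("rejecting output, invalid URIs:", bad)
```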
It's true, it can be maddening (impossible?) to chase down all the edge-case failures LLMs produce. But outside of life-or-death applications with extreme accuracy requirements (e.g. medical diagnostics), the attitude I've seen is: who cares? A lot of users "get" AI now and don't really expect it to be 100% reliable. They're satisfied with a 95% solution, especially if it was deployed quickly and produces something they can iterate on for the last 5%.
For 1 and 2, I'm surprised to hear this, because debugging models is usually a total black box and practically impossible. For 2 specifically, it's a similar problem: getting consistent performance and accuracy out of the same model across different problem sets can be challenging. Not an AI expert or anything; this has just been my experience on the product side.
Responded to the same sentiment elsewhere, but my general sense is that for many use cases, users simply do not care about high-9s accuracy/consistency. A 95% solution using AI is "good enough" if you can ship it quickly and give them the tools to iterate on that last 5%.
A 95% solution might work for small startup X or small biz Y, but at large-company scale, 5% is a huge deviation to correct for. Maybe it just depends on the client and how touchy they are. At my company we measure metrics in bps (basis points; 1 bp = 0.01%, so 5% is 500 bps), and moving something 50 bps is a huge win. 500 bps would be unheard of.
IMO it's less about the size of the company and more about the nature of the integration. Users are more forgiving of 95% accuracy when it's used to enhance/complement an existing (manual?) workflow than when it's used to wholesale replace it. The comparison would be an AI tool that makes data entry easier/faster for a human employee (making them, say, 2x as productive even at 95% accuracy) versus an AI tool that bills itself as a full replacement for hiring a data entry function at all (requiring human or superhuman accuracy, edge-case handling, maddening LLM debugging, etc.).
In the long run the latter is of course more valuable and has a larger market, so it's understandable large corps would try to "shoot for the moon" and unlock that value, but for now the former is far, far more practical. It's just a more natural way for the tech to get integrated and come to market; in most large-corp settings, per-head productivity is already a measurable and well-understood metric. "Hands off" LLM workflows are totally new and a much less certain value proposition, so there will be some hesitation at adoption until solutions are proven and mature.
- Assume LLMs will be more intelligent and cheaper, and that the cost of switching to a new LLM model is non-existent. How does improving the custom/heuristic approach compare in that future?
That's kind of what I was getting at in point 2, about "new use cases" opening up, but yeah, you stated it more directly. It's hard to argue with. With a heuristic-driven approach we know we will need expertise, dev hours, etc. to improve the feature. With LLMs, well, some lab out there is basically doing all the hard work for us; all we need to do is sit back, wait a year or two, and then change one line of code, `model="gpt-4o"` to `model="gpt-5o"` or whatever (sketch below).
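To make that concrete, a minimal sketch assuming the OpenAI v1 Python SDK (the env-var name is made up), with the model id living in config so the swap is one line of config rather than code:

```python
import os
from openai import OpenAI  # pip install openai

# Model id comes from config/env, so upgrading is a deploy-time change,
# not a code change ("LLM_MODEL" is a hypothetical variable name).
MODEL = os.environ.get("LLM_MODEL", "gpt-4o")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```

The catch, per the rest of this thread, is that the new model may break subtle behaviors (URI formatting, say), so "change one line" still implies rerunning whatever eval suite you have.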