I have pushed back in a similar way many times (w.r.t. LLMs); the response I typically get is some combination of:
1. A custom/heuristic-driven approach would perform better but will take much longer to build so would be lower ROI.
2. There is a strategic angle to using AI here (building competency). We aren't sure what new use cases will open up in the medium term and we need to be fluent with building AI products.
3. There is a perceptual/marketing angle to using AI here. We need to convince the market/investors we are on the bleeding edge (hype).
Point 3 is widely mocked, but it's a completely rational allocation of resources when you need to compete in a market for funding.
Prolog is AI, so whenever I see such a problem, I use miniKanren, implementing relational/logic programming in a lightweight way (sketch below). Bleeding edge AI it is.
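For a flavor of how lightweight this can be, here's a minimal sketch using the Python kanren library (my choice of implementation; any miniKanren works the same way): state a relation as facts, then query it.

```python
from kanren import Relation, facts, run, var  # pip install kanren

# A tiny knowledge base, stated as facts rather than procedures.
parent = Relation()
facts(parent, ("Abe", "Homer"),
              ("Homer", "Bart"),
              ("Homer", "Lisa"))

x, y = var(), var()

# Who is Bart's parent?
print(run(1, x, parent(x, "Bart")))                # ('Homer',)

# Who is Bart's grandparent? Conjoin two goals over a shared variable.
print(run(1, x, parent(x, y), parent(y, "Bart")))  # ('Abe',)
```

The same machinery runs "backwards" too (e.g. asking for all of Homer's children), which is the relational part.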
> will take much longer to build so would be lower ROI
This one is funny, because my experience has been that eking out the issues in this sort of thing is enormously complicated, unreliable, and takes an inordinate amount of time. Often the 'bugs' aren't trivially fixable. One we had was the LLM formatting URIs given in the prompt incorrectly, so they were no longer valid. Most of the time it works fine, but sometimes it doesn't, and it's not easily reproducible.
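One cheap mitigation for that particular failure (a sketch, not what we actually shipped; the regex and helper name are illustrative): treat URIs in the output as untrusted and check them verbatim against the set given in the prompt, retrying or falling back when they don't match.

```python
import re

# Naive URI pattern; real URI grammar is messier (illustrative only).
URI_RE = re.compile(r"https?://[^\s\"'<>)\]]+")

def invalid_uris(allowed: set[str], model_output: str) -> list[str]:
    """Return URIs in the model output that don't appear verbatim
    in the set we supplied in the prompt."""
    return [u for u in URI_RE.findall(model_output) if u not in allowed]

allowed = {"https://example.com/docs/v1/widgets"}
output = "See https://example.com/docs/v1/Widgets for details."  # model changed the case

bad = invalid_uris(allowed, output)
if bad:
    # Don't ship it: retry the call, repair via string replace, or fall back to a template.
    print("rejecting output, invalid URIs:", bad)
```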
It's true, it can be maddening (impossible?) to chase down all the edge-case failures LLMs produce. But outside of life-or-death applications with extreme accuracy requirements (e.g. medical diagnostics), the attitude I've seen is: who cares? A lot of users "get" AI now and don't really expect it to be 100% reliable. They're satisfied with a 95% solution, especially if it was deployed quickly and produces something they can iterate on for the last 5%.
For 1 and 2, I'm surprised to hear this, because debugging models is usually a total black box and practically impossible. For 2 specifically, it's a similar problem: getting consistent performance and accuracy out of the same model across different problem sets can be challenging. Not an AI expert or anything; this has just been my experience on the product side.
Responded to the same sentiment elsewhere, but my general sense is that for many use cases, users simply do not care about high-9s accuracy/consistency. A 95% solution using AI is "good enough" if you can ship it quickly and give them the tools to iterate on that last 5%.
A 95% solution might work for small startup X or small biz Y, but at large-company scale, 5% is a huge deviation to correct for. Maybe it just depends on the client and how touchy they are. At my company we measure metrics in bps (basis points; 1 bp = 0.01%, so 5% is 500 bps), and moving something 50 bps is a huge win. 500 bps would be unheard of.
IMO it's less about the size of the company and more about the nature of the integration. Users are more forgiving of 95% accuracy when it's used to enhance/complement an existing (manual?) workflow than when it's used to wholesale replace it. The comparison would be an AI tool that makes data entry easier/faster for a human employee (making them, say, 2x as productive even at 95% accuracy) versus an AI tool that bills itself as a full replacement for hiring a data entry function at all (requiring human or superhuman accuracy, edge-case handling, maddening LLM debugging, etc.).
In the long run the latter is of course more valuable and has a larger market, so it's understandable large corps would try to "shoot for the moon" and unlock that value, but for now the former is far, far more practical. It's just a more natural way for the tech to get integrated and come to market; in most large-corp settings, per-head productivity is already a measurable and well-understood metric. "Hands off" LLM workflows are totally new and a much less certain value proposition, so there will be some hesitation at adoption until solutions are proven and mature.
- Assume LLMs will be more intelligent and cheaper, and that the cost of switching to a new LLM model is non-existent. How does improving the custom/heuristic approach compare in that future?
That's kind of what I was getting at in point 2, about "new use cases" opening up, but yeah, you stated it more directly. It's hard to argue with. With a heuristic-driven approach we know we will need expertise, dev hours, etc. to improve the feature. With LLMs, well, some lab out there is basically doing all the hard work for us; all we need to do is sit back, wait a year or two, and then change one line of code, `model="gpt-4o"` to `model="gpt-5o"` or whatever (sketch below).
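To make that concrete, a minimal sketch assuming the OpenAI v1 Python SDK (the env-var name is made up), with the model id living in config so the swap is one line of config rather than code:

```python
import os
from openai import OpenAI  # pip install openai

# Model id comes from config/env, so upgrading is a deploy-time change,
# not a code change ("LLM_MODEL" is a hypothetical variable name).
MODEL = os.environ.get("LLM_MODEL", "gpt-4o")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```

The catch, per the rest of this thread, is that the new model may break subtle behaviors (URI formatting, say), so "change one line" still implies rerunning whatever eval suite you have.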