Crazy, I'd have thought that modern multi-modal LLMs could do this, but when I tried Gemini, ChatGPT-4o, and Claude they all pooped out:
- Gemini at first only diff'd the text; when pushed, it identified the items in the images but then hallucinated the differences between the versions. It could not produce an image output.
- Claude only diff'd the text and refused to believe that there were images in the PDFs.
- ChatGPT attempted to write and execute Python code for this, which errored out.
Visually comparing two PDFs is something a PC can do deterministically, without any resource- (and energy-) intensive LLMs. People will soon use LLMs for things they are not especially good or efficient at, like computing the sum of numbers in an Excel table... (or are they doing it already?).
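For what it's worth, here is a minimal sketch of that deterministic approach, assuming PyMuPDF and Pillow are installed and that both files have the same page count and page dimensions: rasterize each page and take a pixel-wise difference.

```python
# A minimal sketch of a deterministic visual PDF diff.
# Assumes: pip install pymupdf pillow, and that corresponding
# pages in the two PDFs have identical dimensions.
import fitz  # PyMuPDF
from PIL import Image, ImageChops

def render_page(doc, page_number, dpi=150):
    """Rasterize one PDF page to a PIL image."""
    pix = doc[page_number].get_pixmap(dpi=dpi)
    return Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

def diff_pdfs(path_a, path_b, out_prefix="diff_page"):
    """Write a per-page pixel-diff image for two PDFs."""
    doc_a, doc_b = fitz.open(path_a), fitz.open(path_b)
    for i in range(min(len(doc_a), len(doc_b))):
        img_a, img_b = render_page(doc_a, i), render_page(doc_b, i)
        diff = ImageChops.difference(img_a, img_b)
        if diff.getbbox():  # None means the pages are pixel-identical
            diff.save(f"{out_prefix}_{i + 1}.png")

diff_pdfs("v1.pdf", "v2.pdf")
```

ImageChops.difference returns a black image wherever the pages match, so getbbox() is None only when a page pair is pixel-identical; only pages that actually differ get written out.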
This is definitely not a strength of multi-modal LLMs. Multi-modal capabilities are still too flaky, especially when looking at a page of a PDF, which can have multiple areas of focus.