
> I’m always a bit puzzled why people stress the importance of these “needle in a haystack” tests where the model has to find one specific thing in a huge document. That seems far less relevant to me in terms of usefulness in the real world.

How do you mean?

Half of writing code within a codebase is knowing what functions already exist in that codebase for you to call from your own code, and/or what code you'll have to change upstream and downstream of the code you're modifying (within the same codebase, or even by forking your dependencies and changing them) to get what you want to happen.

And half of, say, writing a long-form novel is knowing all the promises you've made to the reader, the active Chekhov's guns, and all the other constraints you placed on yourself hundreds of pages or even several books ago that just became relevant again as of this very sentence. Or, moreover, knowing which of those details it's the proper time to make relevant again for maximum impact and proper first-in-last-out narrative bridging structure.

In both cases, these aren't really literal "needle in a haystack" stress-tests; they should properly be tests of the model's ability to perform some kind of "associational priority indexing" on the context, allowing it to build concepts into associational sub-networks and then make long-distance associations where the nodes are entire subnetworks. (Which isn't something we really see yet, in any model.)




Yes agreed, I wasn’t trying to say it’s totally useless, but it’s not as helpful as synthesizing all that context intelligently. It’s more of a parlor trick. But that trick can be handy if you need something like that. Really, the main issue with Gemini is that it’s simply not very smart compared to the competition, and the big context doesn’t make up for that in the slightest.



