
Hey! Thanks so much! I fixed the link, thanks for flagging. Yes, the same approach could be used for internet search. The fact that we now have an "absolute score" is very interesting, since we can also use a threshold value to determine when an answer simply doesn't exist in a corpus. The only issue is that if all scores fall below the cutoff, you discard everything and end up with a lot of "I don't know"s. The best approach might just be to flag the "trust" the model has in each retrieved source and pass that along as such.
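
To make that last idea concrete, here's a minimal sketch (the 0.5 cutoff and the result-dict shape are just illustrative, not our actual API):

    CUTOFF = 0.5  # illustrative absolute-score threshold

    def annotate_results(results):
        """results: list of {"text": str, "score": float}, absolute scores in [0, 1]."""
        return [
            {**r, "trust": "high" if r["score"] >= CUTOFF else "low"}
            for r in results
        ]

    def answer_exists(results):
        # If every score is below the cutoff, the corpus likely
        # doesn't contain the answer at all.
        return any(r["score"] >= CUTOFF for r in results)

That way the downstream model still sees the low-scoring sources, it just knows not to lean on them.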


would love to check out the code if you have it!


https://github.com/Neywiny/merge-sort

It was actually done to counter Elo-based approaches, so there are some references in the readme on how to prove who's better. I haven't run this code in 5 years and haven't developed on it in maybe 6, but I can probably fix any issues that come up. My co-author's fork looks to have diverged a bit; I haven't checked out his code: https://github.com/FrankWSamuelson/merge-sort . There may also be a fork by the FDA itself, not sure. This work was done for the FDA's medical imaging device evaluation division.
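
For anyone curious, the core trick is roughly this (a from-memory sketch, not the repo's actual code): run an ordinary merge sort where each comparison is answered by a human rater, so a full ranking costs O(n log n) pairwise judgments instead of the many repeated matchups an Elo-style rating needs to converge:

    def merge_sort(items, prefer):
        # prefer(a, b) -> True if a should rank above b;
        # in practice this is where you'd ask the human rater.
        if len(items) <= 1:
            return list(items)
        mid = len(items) // 2
        left = merge_sort(items[:mid], prefer)
        right = merge_sort(items[mid:], prefer)
        merged = []
        while left and right:
            merged.append(left.pop(0) if prefer(left[0], right[0]) else right.pop(0))
        return merged + left + right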


thank you, will check out the paper, the hf space is very cool!


Thanks! We trained on most European languages (English, French, Spanish, Russian...), Arabic, and Chinese, so it does well on those! We haven't tested much on other languages, but we're happy to do so if there's a use case.


oh interesting, had no idea, thanks for sharing


It actually runs pretty fast: our benchmarks show ~149ms for 12665 bytes. It's faster than many other models.
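
If you want to sanity-check a number like that yourself, a rough harness is enough (score_fn here is a placeholder for whatever model call you're timing, not a real API of ours):

    import time

    def mean_latency_ms(score_fn, payload, runs=20):
        score_fn(payload)  # warm-up call so setup cost isn't counted
        start = time.perf_counter()
        for _ in range(runs):
            score_fn(payload)
        return (time.perf_counter() - start) / runs * 1000

    # e.g. payload = "x" * 12665 to match the size quoted above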


I would prominently display your benchmarks (against your competitors, of course). That's your selling point, right?


Yes! We did this here: https://www.zeroentropy.dev/blog/announcing-zeroentropys-fir... We wanted to share the approach with the community in this post. It does do better than competitors though!


oh wow, thanks for flagging, just fixed!


yes we found it hard to find a good title for this, thanks for the feedback


ZeroEntropy (W25) | GTM Engineer | SF in person preferred, remote ok | Full-time | www.zeroentropy.dev

We are building a high-accuracy search engine for RAG and AI agents. Our API is live and we're processing billions of tokens monthly. We're looking for a highly skilled full-stack developer to help us with our GTM efforts.

This role is for you if:

- You love shipping full-stack products and demos fast
- You love creating automations for everything
- You love talking to customers and have great communication skills

You'd be joining a team of extremely cracked engineers (IOI, ICPC, IMO, and Putnam finalists and medalists) at a cool YC-backed startup where you'd be an incredibly valuable team member.

Send your resume and your proudest achievement to: ghita@zeroentropy.dev


ZeroEntropy | Founding Engineer | In person (San Francisco) | Full time | ghita@zeroentropy.dev

Company: ZeroEntropy is building the next-generation retrieval engine for AI systems. We’re rethinking search from the ground up: faster, more accurate, and built to serve as infrastructure for the next decade of AI. We're working on both the infrastructure and AI model layers.

Role: You'll be on the founding team, working closely with the founders on deeply challenging technical problems in search. You'll work across research and engineering to design, train, and optimize machine learning systems that push the limits of what's possible in performance-critical environments. You'll also contribute to scalable, low-latency infrastructure for a state-of-the-art search engine, written in Rust.

Reqs: either exceptional low-level programming skills (down to the metal) or extensive experience in AI model engineering and training. Preference for an IOI, IMO, quant, or AI research background.

Apply: email me at ghita@zeroentropy.dev

