soumendrak's comments

No projects. Using GitHub Sponsors for individual OSS contributors.


Are you blindly running expensive LLM evaluations on EVERY response your AI generates?

This widespread practice is costing companies thousands while delivering questionable value.

Here's why your LLM evaluation strategy might be broken:

1. Generic evals are practically USELESS
• Hallucination and toxicity scores mean nothing without context
• Your use case is unique - generic metrics rarely capture what matters

2. More evaluation ≠ better results
• Evaluating entire conversations drastically reduces judge accuracy
• Specific, targeted inputs yield more reliable scores

3. Your judges need guidance too
• Binary outputs with justification > arbitrary 1-5 scales
• Few-shot examples from YOUR domain are critical (see the sketch after this list)

4. The reliability problem is real
• Position bias: judges favor responses based on presentation order
• Verbosity bias: longer responses get better scores regardless of quality
• Self-enhancement bias: models favor their own outputs
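
To make point 3 concrete, here is a minimal sketch of a binary judge that returns a verdict plus a justification and is primed with few-shot examples from your own domain. This is an illustration, not a reference implementation: call_llm is a placeholder for whatever model client you use, and the refund-policy examples stand in for your real domain data.

    import json

    # Hypothetical few-shot examples; replace with graded samples from YOUR domain.
    FEW_SHOT = [
        {
            "question": "What is our refund window?",
            "answer": "Refunds are available within 30 days of purchase.",
            "verdict": "pass",
            "justification": "Matches the documented 30-day policy.",
        },
        {
            "question": "What is our refund window?",
            "answer": "You can return items anytime within a year.",
            "verdict": "fail",
            "justification": "Contradicts the documented 30-day policy.",
        },
    ]

    def judge(question: str, answer: str, call_llm) -> dict:
        """Ask the judge for a pass/fail verdict with a short justification."""
        examples = "\n\n".join(
            f"Question: {ex['question']}\nAnswer: {ex['answer']}\n"
            f"Verdict: {ex['verdict']}\nJustification: {ex['justification']}"
            for ex in FEW_SHOT
        )
        prompt = (
            "You are grading answers against our support policy.\n"
            'Respond with JSON only: {"verdict": "pass" | "fail", "justification": "..."}.\n\n'
            f"{examples}\n\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        # call_llm is assumed to return the model's raw text completion.
        return json.loads(call_llm(prompt))

A forced pass/fail verdict with a justification is easier to calibrate and audit than a bare 1-5 score, which is the point of item 3 above.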

Smart evaluation strategies that won't break the bank:

• Sample strategically instead of evaluating everything (rough sketch below)
• Combine automated evals with periodic human validation
• Provide context-specific examples to your judge
• Always request justification, not just scores
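
And a rough sketch of the "sample strategically" point above, assuming responses are logged as dicts with a "text" field and an optional "user_flagged" flag; the 5% base rate and the length threshold are arbitrary placeholders to tune for your traffic:

    import random

    def select_for_eval(responses, base_rate=0.05, length_threshold=2000):
        """Pick a subset of responses to send to the (expensive) LLM judge."""
        sampled = []
        for r in responses:
            # Always evaluate responses that look risky: unusually long output
            # or anything a user explicitly flagged.
            suspicious = len(r["text"]) > length_threshold or r.get("user_flagged", False)
            if suspicious or random.random() < base_rate:
                sampled.append(r)
        return sampled

This keeps judge spend roughly proportional to the base rate while still catching the responses most likely to need scrutiny.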

Remember: The best benchmark isn't some generic leaderboard - it's how well the model performs in YOUR specific application.


There are many errors in the timelines of the Indian section.


This is good for teaching programming. I have used IBM Cognos before, so I may be biased, but for complex logic, text-based programming works better than visual programming.


How does Kagi compare to You.com?


Asking the right question here.


Thanks a lot.


Cloudflare and Akamai: competition


But do we know the speed of the natural thing?


