Thanks for the “bitter lesson” news from the front lines. Curious: did you experiment with 4o as the sole pipeline? And, as I think you mention, it would be interesting to know whether, say, Llama 8B could do a similar job as well.
They don't self-host the models, neither the embedding model nor the final-step LLM. Given the low load, self-hosting would likely be more expensive. If so, why not use the best models?
Congrats on shipping.