It's difficult to assess how typical your experience is; I tried your initial pr...

__loam · 2025-05-17T09:48:14 1747475294

Shocked you got a different output from the stochastic token generator.

karencarits · 2025-05-17T15:35:28 1747496128

That's not the point. While there is a temperature setting and randomness involved, you can still benchmark models and see significant differences in output between models and generations. I therefore provided more details and the full output to make it easier for people to assess the context of the comment I replied to.

When someone uses the same tools as I do but seems to experience problems I do not have - these kinds of posts often describe how bad LLMs are or how bad Google search is - I get a bit confused. Is there A/B testing going on? Am I just lucky? Am I inattentive to these weaknesses? Is it about prompting? Or about what areas we work in? Do we actually use the same tools (i.e., the same models)?