
But I feel we're coming full circle. These small models aren't generalists, so they aren't really LLMs, at least in terms of their objective. There has recently been a rise of "specialized" models that provide a lot of value, but that's not what we were sold on LLMs for.



But that's the thing: I don't need my ML model to be able to write me a sonnet about the history of beets, especially if I want to run it at home for specific tasks, such as a programming assistant.

I'm fine with specialist models, and in most cases I prefer them.


I would love a model that knows SQL really well so I don't need to remember all the small details of the language. Beyond that, I don't see why the transformer architecture can't be applied to any problem that needs to predict sequences.
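On the second point, here's a minimal sketch of that idea (PyTorch; the vocabulary size, model dimensions, and the toy repeating-pattern data are all made up): a small transformer trained with the usual next-token objective, except the "tokens" are arbitrary integers rather than natural language.

    # Toy next-element prediction over an arbitrary discrete sequence.
    # Nothing here is language-specific; any tokenizable sequence fits.
    import torch
    import torch.nn as nn

    VOCAB, DIM, CTX = 16, 32, 8

    class TinySeqModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(DIM, VOCAB)

        def forward(self, x):
            # Causal mask: each position attends only to earlier ones.
            mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
            return self.head(self.encoder(self.embed(x), mask=mask))

    # Train on a trivial repeating pattern: (i + 1) % VOCAB follows i.
    model = TinySeqModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    data = torch.arange(CTX + 1).remainder(VOCAB).unsqueeze(0)
    for _ in range(200):
        logits = model(data[:, :-1])  # predict each next token
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB), data[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

The architecture doesn't care what the sequence means; the hard part is the data and the objective, not the model.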


The trick is to find such problems with enough training data and some market potential. I am terrible at it.


Specialized models still work much better for most things. What we really need is an LLM that understands the input and then hands it off to a specialized model that actually produces good results. A rough sketch of that hand-off is below.
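For what it's worth, here's a toy version of that router pattern in Python. classify_with_llm and the two specialist functions are hypothetical placeholders for illustration, not a real API:

    # Sketch of the hand-off pattern: a general model only classifies
    # the request; a specialized model actually handles it.
    from typing import Callable

    def classify_with_llm(prompt: str) -> str:
        # Placeholder: in practice this would ask a general LLM to pick
        # a label from a fixed set (e.g. via a constrained response).
        return "sql" if "select" in prompt.lower() else "code"

    def sql_specialist(prompt: str) -> str:
        return f"[SQL model] handling: {prompt}"

    def code_specialist(prompt: str) -> str:
        return f"[code model] handling: {prompt}"

    ROUTES: dict[str, Callable[[str], str]] = {
        "sql": sql_specialist,
        "code": code_specialist,
    }

    def answer(prompt: str) -> str:
        # Fall back to the code specialist on an unrecognized label.
        return ROUTES.get(classify_with_llm(prompt), code_specialist)(prompt)

    print(answer("SELECT the top 10 users by signups per week"))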


I think playing word games about what really counts as an LLM is a losing battle; it has mostly become a marketing term. It's better to take a functionalist view: "what can this thing do?"





