
> So yes, LLMs will make mistakes, but humans do too

Are you using LLMs, though? Because pretty much all of these systems are fairly conventional classifiers, the kind of thing that would've simply been called Machine Learning 2-3 years ago.

The "AI hype is real because medical AI is already in use" argument (and it's siblings) perform a rhetorical trick by using two definitions of AI. "AI (Generative AI) hype is real because medical AI (ML classifiers) is already in use" is a non-sequitur.

Image classifiers are very narrow intelligences, which makes them easy to understand and use as tools. We know exactly what their failure modes are and can put hard measurements on them. We can even dissect these models to learn why they are making certain classifications and either improve our understanding of medicine or improve the model.
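
To be concrete about "hard measurements": with a narrow classifier you can hold out a labelled test set and count the failure modes directly. A minimal illustrative sketch with scikit-learn (the labels here are made up):

    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    # Made-up ground-truth labels and predictions for a held-out test set.
    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

    # The confusion matrix counts the failure modes exactly:
    # false positives and false negatives, not vibes.
    print(confusion_matrix(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))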

...

Basically none of this applies to Generative AI. The big problem with LLMs is that they're simply not general-intelligence systems capable of accurately and robustly modelling their inputs. For example, where an anti-fraud classifier operates directly on the financial transaction data, an LLM summarizing a business report doesn't "understand" finance: it doesn't know which details are important or which are unusual in the specific context. It just stochastically throws away information.




Yes, I am. These LLMs/VLMs are much more robust at NLP/CV tasks than any of the application-specific models we used to train 2-3 years ago.

I also wasted a lot of time building complex OCR pipelines that required dewarping/image normalization, detection, bounding-box alignment, text recognition, layout analysis, etc., and now open models like Qwen VL obliterate them with an end-to-end transformer model that can be defined in like 300 lines of PyTorch code.
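
For a sense of how small the application code gets, here's a minimal sketch of using Qwen2-VL for OCR through Hugging Face transformers. The model name, prompt, and image path are placeholders and the pattern follows the public model card, not my production pipeline:

    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model_id = "Qwen/Qwen2-VL-7B-Instruct"  # placeholder checkpoint
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto")
    processor = AutoProcessor.from_pretrained(model_id)

    # One prompt replaces the old detect -> align -> recognize -> layout pipeline.
    messages = [{"role": "user", "content": [
        {"type": "image", "image": "file:///path/to/scanned_page.png"},
        {"type": "text", "text": "Transcribe all text in this document, preserving reading order."},
    ]}]

    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    images, videos = process_vision_info(messages)
    inputs = processor(text=[text], images=images, videos=videos,
                       padding=True, return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=1024)
    print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                                 skip_special_tokens=True)[0])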


Different tasks, then? If you are using VLMs in the context of medical imaging, I have concerns. That is not a place for hallucination-prone AI.

But yes, the transformer model itself isn't useless; what matters is where you apply it. OCR, image description, etc. are all the kind of narrow-intelligence tasks that lend themselves well to the fuzzy nature of AI/ML.


The world is a fuzzy place; most things are not binary.

I haven't worked in medical imaging in a while, but VLMs make for much better diagnostic tools than task-specific classifiers or segmentation models, which tend to find hacks in the data to cheat on the objective they're optimized for.

The next-token objective turns out to give much better vision supervision than things like CLIP or classification losses (e.g. https://arxiv.org/abs/2411.14402).

I spent the last few years working on large-scale food recognition models, and my multi-label classification models had no chance of competing with GPT-4 Vision, which was trained on all of the internet and has an amazing prior thanks to its vast knowledge of facts about food (recipes, menus, ingredients, etc.).
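
Purely as an illustrative sketch of the comparison: where the classifier had a fixed label set, the VLM can just be queried. This uses the OpenAI Python SDK; the model name, prompt, and image URL are placeholders, not the actual evaluation setup:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Placeholder URL; any reachable photo of a dish works.
    image_url = "https://example.com/photo_of_dish.jpg"

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the dish and its likely ingredients as short labels."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    print(response.choices[0].message.content)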

Same goes for other areas like robotics: we'd seen very little progress outside of simulation up until about a year ago, when people took pretrained VLMs and tuned them to predict robot actions, beating all previous methods by a large margin (google Vision-Language-Action models). It turns out you need a good foundation model with a core understanding of the world before you can train a robot to do general tasks.



