From what I remember, Glaze uses a small CLIP model and LPIPS (based on VGG) for its adversarial loss, which is why it's so ineffective against larger, better-trained models.
It uses SD image-to-image to do a style transfer on the image, then runs gradient descent on the image itself to reduce the difference between the CLIP embeddings of the original and the style-transferred image while keeping LPIPS low, and at every step the perturbation is clipped so it doesn't exceed a certain threshold from the original image.
So essentially it's an adversarial attack against a small CLIP model, even though today's models are much more robust than that.
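This isn't Glaze's actual code, just a rough sketch of the loop described above; the model choices (open_clip ViT-B-32, the `lpips` package), hyperparameters, and the L-inf clipping are my own assumptions, and images are assumed already resized to CLIP's input resolution (CLIP channel normalization omitted for brevity):

```python
import torch
import open_clip
import lpips

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
clip_model = clip_model.to(device).eval()
lpips_fn = lpips.LPIPS(net="vgg").to(device)

def glaze_like(original, style_target, eps=8/255, steps=200, lr=1e-2, lpips_weight=0.1):
    """original, style_target: [1, 3, 224, 224] tensors in [0, 1]."""
    delta = torch.zeros_like(original, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    with torch.no_grad():
        target_emb = clip_model.encode_image(style_target)
        target_emb = target_emb / target_emb.norm(dim=-1, keepdim=True)
    for _ in range(steps):
        adv = (original + delta).clamp(0, 1)
        emb = clip_model.encode_image(adv)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        # pull the CLIP embedding toward the style-transferred target...
        clip_loss = 1 - (emb * target_emb).sum(dim=-1).mean()
        # ...while staying perceptually close to the original (LPIPS expects [-1, 1])
        percep_loss = lpips_fn(adv * 2 - 1, original * 2 - 1).mean()
        loss = clip_loss + lpips_weight * percep_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        # clip the perturbation back into the allowed budget each step
        delta.data.clamp_(-eps, eps)
    return (original + delta).detach().clamp(0, 1)
```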
The people talking about semantics in the comment section seem to completely ignore the positive correlation in LLMs between accuracy and stated confidence. This is called calibration, and this "old" blog post from a year ago already showed that LLMs can know what they know: https://openai.com/index/introducing-simpleqa/
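To be concrete about what "calibration" means here: bucket answers by the model's stated confidence and check how often each bucket is actually correct. The data below is made up just to show the computation:

```python
import numpy as np

stated_conf = np.array([0.95, 0.90, 0.60, 0.55, 0.30, 0.85, 0.40, 0.99])  # model's stated confidence
correct     = np.array([1,    1,    1,    0,    0,    1,    0,    1   ])  # whether it was right

bins = np.linspace(0, 1, 6)                 # five confidence buckets
idx = np.digitize(stated_conf, bins) - 1
for b in range(5):
    mask = idx == b
    if mask.any():
        print(f"conf {bins[b]:.1f}-{bins[b+1]:.1f}: "
              f"stated {stated_conf[mask].mean():.2f}, actual accuracy {correct[mask].mean():.2f}")
# A well-calibrated model has stated confidence close to actual accuracy in each bucket.
```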
> In general encoder+decoder models are much more efficient at inference than decoder-only models because they run over the entire input all at once (which leverages parallel compute more effectively).
Decoder-only models also do this; the only difference is that they use masked attention.
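A minimal PyTorch sketch of what masked attention means here (names and shapes are illustrative): during training or prompt prefill a decoder-only model still runs over the whole sequence in one pass, and the causal mask just stops position i from attending to positions after it.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """x: [batch, seq_len, d_model]; w_*: [d_model, d_model] projection weights."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5      # [batch, seq, seq]
    seq_len = x.shape[1]
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))           # block attention to future tokens
    return F.softmax(scores, dim=-1) @ v                       # all positions computed in parallel

x = torch.randn(1, 5, 64)
w = [torch.randn(64, 64) for _ in range(3)]
out = causal_self_attention(x, *w)   # one pass over the entire input
```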
They went too far; now the Flash model is competing with their Pro version: better SWE-bench and better ARC-AGI 2 than 3.0 Pro. I imagine they are going to improve 3.0 Pro before it's out of Preview.
Also, I don't see it written in the blog post, but Flash supports more granular reasoning settings: minimal, low, medium, and high (like OpenAI models), while Pro only has low and high (rough SDK sketch below).
> Matches the “no thinking” setting for most queries. The model may think very minimally for complex coding tasks. Minimizes latency for chat or high throughput applications.
I'd prefer a hard "no thinking" rule over what this is.
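For reference, selecting the level would look roughly like this with the google-genai Python SDK; I'm going from memory on the field and model names (thinking_level inside thinking_config, the preview model id), so treat this as a sketch and check the docs:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment
resp = client.models.generate_content(
    model="gemini-3-flash-preview",   # model id is a guess, verify against the docs
    contents="Summarize this in one sentence: ...",
    config=types.GenerateContentConfig(
        # "minimal" per the quoted docs; field name is my assumption
        thinking_config=types.ThinkingConfig(thinking_level="minimal"),
    ),
)
print(resp.text)
```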