As is now traditional for new LLM releases, I used Qwen 3 (32B, run via Ollama o...

manmal · 2025-04-29T09:25:13 1745918713

One person on Reddit claimed the first unsloth release was buggy - if you used that, maybe you can retry with the fixed version?

daemonologist · 2025-04-29T13:56:27 1745934987

It was - Unsloth put up a message on their HF for a while to only use the Q6 and larger. I'm not sure to what extent this affected prediction accuracy though.

hobofan · 2025-04-29T10:36:26 1745922986

I think this was only regarding the chat template that was provided in the metadata (this was also broken in the official release). However, I doubt that this would impact this test, as most inference frameworks will just error if provided with a broken template.

anentropic · 2025-04-29T12:08:40 1745928520

This sounds like a task where you wouldn't want to use the 'thinking' mode

hbbio · 2025-04-29T07:26:55 1745911615

I also have a benchmark that I'm using for my nanoagent[1] controllers.

Qwen3 is impressive in some aspects but it thinks too much!

Qwen3-0.6b is showing even better performance than Llama 3.2 3b... but it is 6x slower.

The results are similar to Gemma3 4b, but the latter is 5x faster on Apple M3 hardware. So maybe, the utility is to run better models in cases where memory is the limiting factor, such as Nvidia GPUs?

[1] github.com/hbbio/nanoagent

phh · 2025-04-29T08:14:27 1745914467

What's cool with those models is that you can tweak the thinking process, all the way down to "no thinking". It's maybe not available in your inference engine though

hbbio · 2025-04-30T06:12:20 1745993540

Now it is, thanks for suggesting. Qwen3 4b seems to be the best default model for usual steps.

https://github.com/hbbio/nanoagent/pull/1

hbbio · 2025-04-29T08:20:49 1745914849

Feel free to add a PR :)

What is the parameter?

ammo1662 · 2025-04-29T09:02:17 1745917337

Just add "/no_think" in your prompt.

https://qwenlm.github.io/blog/qwen3/#advanced-usages

simonw · 2025-04-29T13:24:54 1745933094

Hah, and now we can't summarize this thread any more because your comment will turn thinking off!

Casteil · 2025-04-29T13:49:29 1745934569

FWIW, their readme states /nothink - and that's what works for me.

>/think and /nothink instructions: Use those words in the system or user message to signify whether Qwen3 should think. In multi-turn conversations, the latest instruction is followed.

https://github.com/QwenLM/Qwen3/blob/main/README.md

hbbio · 2025-04-30T05:13:04 1745989984

Thanks, /nothink works!

So, Qwen3 1.7b is about the same speed just slightly worse than Gemma3 4b which is pretty impressive.

Qwen3 4b passes all 200 tests and is much faster than Mistral Small 3.1 24b or Gemma3 27b.

hbbio · 2025-04-29T12:26:31 1745929591

Thanks!

Turns out just is not the word here. My benchmark is made using conversations, where there is a SystemMessage and some structured content in a UserMessage.

But Qwen3 seems to ignore /no_think when appended to the SystemMessage. I can try to add it to the structured content but that will be a bit weird. Would have been better to have a "think" parameter like temperature.

claiir · 2025-04-29T12:38:15 1745930295

o1-preview had this same issue too! You’d give it a long conversation to summarize, and if the conversation ended with a question, o1-preview would answer that, completely ignoring your instructions.

Generally unimpressed with Qwen3 from my own personal set of problems.

littlestymaar · 2025-04-29T05:17:04 1745903824

Aren't all Qwen models known to perform poorly with system prompt though?

simonw · 2025-04-29T05:32:25 1745904745

I hadn't heard that, but it would certainly explain why the model made a mess of this task.

Tried it again like this, using a regular prompt rather than a system prompt (with the https://github.com/simonw/llm-hacker-news plugin for the hn: prefix):

  llm -f hn:43825900 \
  'Summarize the themes of the opinions expressed here.
  For each theme, output a markdown header.
  Include direct "quotations" (with author attribution) where appropriate.
  You MUST quote directly from users when crediting them, with double quotes.
  Fix HTML entities. Output markdown. Go long. Include a section of quotes that illustrate opinions uncommon in the rest of the piece' \
  -m qwen3:32b

This worked much better! https://gist.github.com/simonw/3b7dbb2432814ebc8615304756395...

littlestymaar · 2025-04-29T08:02:01 1745913721

Wow, it hallucinates quotes a lot!

croemer · 2025-04-29T12:12:07 1745928727

Seems to truncate the input to only 2048 input tokens

simonw · 2025-04-29T13:21:04 1745932864

Oops! That's an Ollama default setting. You can fix that by increasing the num_ctx setting - I'll try running this again.

The num_predict setting controls output size.

notfromhere · 2025-04-29T05:27:05 1745904425

Qwen does decently, DeepSeek doesn't like system prompts. For Qwen you really have to play with parameters