
The purchase criterion for anybody trying to use it is "runs slowly but acceptably" vs. "runs so slowly as to be unusable".

My memory was wrong; it was the 32b. I'm now running the 70b against a similar prompt, and the 5950X will probably take over an hour for what the M4 managed in about 7 minutes.

edit: an hour later and the 5950X still isn't done thinking. Token generation is, generously, around 1 token/s.

edit edit: final statistics. The M4 Pro managed 4 tokens/s prompt eval and 4.8 tokens/s token generation; the 5950X managed 150 tokens/s prompt eval and 1 token/s generation.
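For anyone wondering why 150 tokens/s prompt eval still loses badly: end-to-end latency is dominated by the generation rate once the output is long. A back-of-envelope sketch using the rates above (the prompt and output token counts are hypothetical assumptions, just for illustration):

```python
# Rough end-to-end latency estimate from the reported tokens/s figures.
# prompt_tokens and output_tokens are made-up assumptions for illustration.

def total_seconds(prompt_tokens, output_tokens, prompt_eval_tps, gen_tps):
    """Total time = time to process the prompt + time to generate the output."""
    return prompt_tokens / prompt_eval_tps + output_tokens / gen_tps

# M4 Pro: 4 tok/s prompt eval, 4.8 tok/s generation
m4 = total_seconds(100, 1500, 4.0, 4.8)
# 5950X: 150 tok/s prompt eval, 1 tok/s generation
amd = total_seconds(100, 1500, 150.0, 1.0)

print(f"M4 Pro: {m4 / 60:.1f} min")  # ~5.6 min
print(f"5950X:  {amd / 60:.1f} min")  # ~25.0 min
```

With a long "thinking" output, the 5950X's fast prompt eval buys it less than a second, while its 1 token/s generation rate costs it the whole race.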

Perceptually, I can live with the M4's performance. It's a set-the-prompt, do-something-else, come-back sort of thing. The 5950X/RTX 3080's performance is too slow to be even remotely usable with the 70b parameter model.



I don't disagree. I'm just taking OP at the literal statement they made.



