The 9b at 4-bit quantization runs at around 60 tok/s on my RTX 4070 with 12 GB VRAM, and the 35b-A3B runs at around 14 tok/s with partial offloading. For roleplaying I prefer the faster 9b version, but for coding tasks neither is really usable, and Claude is still way better, especially if you manage to persuade your employer to give you unlimited access.