I would love to throw the 3060 out and replace it with a 3090... once money permits. (It's only about $800 nowadays.)
But yes, I'm aware of how laughably insane it is to run a 70b model that way, and that's why I was pointing it out to the commenter who suggested I just run a 70b model instead.
To a comment that suggested I try the 70b model, I replied, "my card can't run that model." Someone replied with, "you may as well throw the card out if you're going to be trying to run that model." My point exactly.
More seriously, running entirely on the CPU isn't much faster, since my computer only has 16GB of actual memory, which I know is also hugely underspecced for a 70b model, even with memory mapping.
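For a sense of scale, here's a rough back-of-envelope sketch of how big a 70b model's weights are compared to 16GB of RAM. It assumes memory is roughly parameter count × bits per weight / 8 and ignores the KV cache, activations, and runtime overhead, so real usage is higher. The quantization levels listed are just illustrative, not necessarily what I was running.

```python
# Back-of-envelope weight-only memory estimate for a dense LLM.
# Assumption: size ~= parameter count * bits per weight / 8.
# Ignores KV cache, activations, and runtime overhead (real usage is higher).

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

N_PARAMS = 70e9       # a "70b" model
SYSTEM_RAM_GIB = 16   # 16GB of actual memory

# Illustrative precisions only; actual quantization formats vary.
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_gib(N_PARAMS, bits)
    print(f"{label:>5}: ~{size:.1f} GiB of weights, "
          f"{size / SYSTEM_RAM_GIB:.1f}x system RAM")
```

Even at 4 bits per weight, the weights alone come out to roughly twice my system RAM, which is why so much of it ends up being paged out to disk.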
I have a nice NVMe SSD, so there's not much else for me to do here except upgrade my memory or graphics card.
I got something like 1-2 tokens per second the last time I tried, with CPU offloading and an absolutely offensive page file (32GB).