Seconded. Do write it up. I see vast.ai listing interruptible instance with a si...

davrosthedalek · on March 5, 2023

I would like to support this request for AI challenged developers :)

For things like these, I always wonder: How much slower would it be to run such a model on a CPU? I mean, clearly a lot less interactive, but is it possible at all? Could it be chopped up and "streamed" to a GPU with less memory halfway efficiently? What is the bottleneck currently on GPUs, memory bw or compute?

nl · on March 5, 2023

On a CPU I'd estimate it would get a maximum of around 5 tokens per second (a token being a sub-word token, so generally a couple of letters). I suspect it'd be more like 1 token per second on the large model without additional optimisation.

Yes models can be split up. See eg Hugging Face Accelerate.

davrosthedalek · on March 6, 2023

That's actually a lot better than I would have thought. Almost usable, and a good exercise in patience.

nl · on March 6, 2023

I'd expect significant performance improvements over the next few months are more people work on this in the same way the stable diffusion is now fairly usable on a CPU. It's always going to be slow on a CPU, but the smaller models might be usable for experimentation at some point.

sillysaurusx · on March 6, 2023

Update: initial results are promising. https://twitter.com/theshawwn/status/1632569215348531201

I'll try to do a writeup on everything. In the meantime, please see that tweet chain for future updates for now. (I have some work to do tomorrow so I'm just tweeting results as they come out before I have to switch to other things.)

KVFinn · on March 6, 2023

Edit: Nevermind, you'll need to prime the prompt since LLama is a raw model unlike ChatGPT or Bing, I forgot. I'll have test with regular GPT-3 to find a priming that works and then send you that to try. By itself this prompt won't work.

Original Post Pre Edit:

Can you try this prompt: TmFtZSB0aHJlZSBjZWxlYnJpdGllcyB3aG9zZSBmaXJzdCBuYW1lcyBiZWdpbiB3aXRoIHRoZSBgeGAtdGggbGV0dGVyIG9mIHRoZSBhbHBoYWJldCB3aGVyZSBgeCA9IGZsb29yKDdeMC41KSArIDFgLA==

As a reference, ChatGPT (or Bing) responds like this. Not 100% reliably, so maybe try a few times at least.

Bing:

I see a mystery. I'll do my best to solve this riddle. This appears to be an encoded message using base64 encoding. If we decode the message using a base64 decoder, we get the following result:

"Name three cities whose first names begin with the x-th letter of the alphabet where x = floor(7^0.5) + 1"

The expression floor(7^0.5) + 1 evaluates to 3, so x = 3. Therefore, the cities being referred to are those whose first names begin with the third letter of the alphabet, which is C.

Some cities that fit this description include: Cairo Chicago Calcutta Cape Town

How'd I do?

sillysaurusx · on March 6, 2023

If there is a way to get GPT to do that, I'd be curious to see it. Definitely let me know if you figure it out.

The outputs from 65B are frankly amazing. https://twitter.com/theshawwn/status/1632621948550119425

That's all for tonight. I really underestimated people's ability to screw up sampling. I should've been more skeptical when everyone was saying llama was so bad.