More

jejones3141 · 2024-06-20T13:33:50 1718890430

Growing up in Moore, Oklahoma in the 1960s and 1970s, my mother frequented the C.R. Anthony's store in the City of Moore Shopping Center. They used pneumatic tubes; the person at the check-out counter would put a receipt and money tendered into a tube and off it would go to somewhere, I figured some office in back, where they made change and sent the tube back. You could hear it rattle around a bit as it went on its way, and the way it would come back, zip out onto a open curved "landing strip" and slam into the stop was impressive.

jejones3141 · 2024-06-13T13:57:24 1718287044

CanCon 2.0?

jejones3141 · 2024-03-18T01:12:01 1710724321

What kind of hardware would one need to use the model with reasonable performance?

nirav72 · 2024-03-18T01:32:33 1710725553

I wouldn’t bother. Probably should wait until someone releases an optimized version.

superkuh · 2024-03-18T01:34:00 1710725640

Depends on how many experts are active in any given pass. If it's a 10 expert mix of 33B experts (grok-0 is 33B, grok-1 is ~314B which is ~10x) and only runs two of them (like Mixtral's 2/8) then it'd have about the same inference requirements as a 70B model (2*33=66B).

So if this was quantized using ~4 bits per parameter you'd need ~40GB of vram. So you could spread it across 2x 3090 24GB using llama.cpp.

dragonwriter · 2024-03-18T01:54:51 1710726891

MoE has the same “loading” RAM requirements as any other model with the same total parameters (not just for the fixed portion plus whatever experts are activated at any one time) because it has to load all the parameters. The additional needed because of context may be lower (not sure), but the big difference is that it has much better inference speed (and, as a result, can be tolerable with layers split between VRAM and system RAM where a similarly-sized non-MoE model would not.)

> So if this was quantized using ~4 bits per parameter you’d need ~40GB of vram.

No, Mixtral 8x7B (which is a total of 45 billion parameters, because there is a shared portion of the 7B, so its not 56 billion) at 4-bit quantization takes ~29GB [0]. A 314B model is ~7 times as large; with a similar architecture its not going to take only another 1/3 as much RAM.

[0] https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF

superkuh · 2024-03-18T04:07:48 1710734868

You're absolutely right. I don't know what I was thinking.

zaptrem · 2024-03-18T01:34:21 1710725661

I think rule of thumb is 1GB VRAM per 1 billion params quantized to FP8.

dragonwriter · 2024-03-18T01:44:18 1710726258

Just to load the model without actually running it requires 1GB of whatever RAM it is loading and running in (could be VRAM, system RAM, or a combination, with different performance characteristics for each option) per billion parameters at 8-bit quantization. Though models often are usefully run at 4-5 bit quantization, which saves half (or nearly so) of that.

You also need additional RAM that increases as some function of context size (not sure what function, and ISTR there are big-O differences between architectures in how it varies) to actually do inference.

trillic · 2024-03-18T01:32:09 1710725529

8xH100s to keep it all in RAM from my uneducated view.

jejones3141 · 2023-12-20T15:26:18 1703085978

Darn. Now nobody can attach it to a car, like in the urban legend.

jejones3141 · on Nov 15, 2023

How does one debug with GnuCOBOL?

ReleaseCandidat · on Nov 15, 2023

Using gdb. On the GnuCobol site you see a Screenshot of a debugging session in VS Code https://gnucobol.sourceforge.io/ https://gnucobol.sourceforge.io/images/VSC_ScreenShot.png

jejones3141 · on Aug 30, 2023

I can only speak for myself, but Python struck me as much simpler to learn and understand than Perl. Also, my hatred for sigils burns with the heat of a million suns, going back to the $ suffix for string variables and functions in BASIC.

jejones3141 · on Aug 27, 2023

12:30, the Mamas and the Papas

jejones3141 · on Aug 20, 2023

Libraries bring author $$$/reader down considerably.

jejones3141 · on Aug 17, 2023

Nice. Just ordered the 64 GB of RAM to max out my computer w/5600G at 128.

FloatArtifact · on Aug 17, 2023

There might be a max that you can assign to the APU

jejones3141 · on Aug 17, 2023

True. We'll see. I run VMs on the computer, so it's nice to have the RAM anyway and I might as well max it out.

FloatArtifact · on Aug 17, 2023

Do let me know how it goes.

jejones3141 · on Aug 13, 2023

I'm curious. What is "accolade-free code"? It's referenced and highlighted as if it had a link but it doesn't, and Google appears to think it's a hapax legomenon (or whatever the equivalent is for phrases instead of words).

alin23 · on Aug 13, 2023

I think I confused the translation here. I'm Romanian where the {curly brackets} are called "acolade". I read the word "accolade" in English a few times and wrongly assumed it means the same thing as in Romanian.

jejones3141 · on Aug 14, 2023

Ah. Thanks. Yes, "accolade" in English is an award or privilege, or the tap you get with a sword when you become a knight.

nicwolff · on Aug 14, 2023

I wish English had a separate word for those too!