
Distributing very large matrix multiplications across a network doesn't scale well: shipping the activations around costs far more than the compute you save.
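
A back-of-envelope sketch of why (all hardware numbers here are assumptions for illustration, not measurements): even one transformer layer's matrix-vector product is cheap next to the cost of moving its activations over a consumer link.

    # Splitting one transformer-scale matrix-vector product across a network.
    d = 12288                      # hidden dimension, roughly GPT-3 scale
    flops = 2 * d * d              # multiply-adds for one d x d matvec
    bytes_moved = 2 * d * 2        # ship fp16 activations out and results back

    gpu_flops_per_s = 100e12       # assumed ~100 TFLOP/s of usable compute
    network_bytes_per_s = 125e6    # assumed 1 Gbit/s link

    compute_time = flops / gpu_flops_per_s            # ~3 microseconds
    network_time = bytes_moved / network_bytes_per_s  # ~400 microseconds

    print(f"compute: {compute_time*1e6:.0f} us, network: {network_time*1e6:.0f} us")

The transfer takes ~100x longer than the compute it was meant to parallelize.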

While this is only an MNIST classifier (and can be run in the browser), it gives you an idea of the math behind it: https://www.3blue1brown.com/lessons/gradient-descent and https://www.3blue1brown.com/lessons/neural-network-analysis

When dealing with an LLM, it's run again and again, token by token: pick the best next token, append it to the input, run it again.

If you want to generate 100 tokens (a rather small amount compared to typical GPT-4 conversations), that means running the entire model 100 times.
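
Here's a minimal sketch of that loop in Python (greedy decoding; `model` is a hypothetical stand-in for a full LLM forward pass that returns next-token logits):

    def generate(model, prompt_tokens, n_new_tokens=100):
        tokens = list(prompt_tokens)
        for _ in range(n_new_tokens):
            logits = model(tokens)  # one full forward pass over everything so far
            best = max(range(len(logits)), key=logits.__getitem__)  # best next token
            tokens.append(best)     # append it to the input, run it again
        return tokens

    # Toy stand-in model that always favors token 0, just to show the loop runs:
    print(generate(lambda toks: [1.0, 0.5, 0.2], [2, 1], n_new_tokens=5))

Every appended token costs another complete pass through the model's weights.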

The per-token network round trips make the entire system much slower than running it all in one place.
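
To put rough numbers on it (assumed, for illustration: ~50 ms round trips between peers, one network hop per layer, ~30 ms/token on a single local machine):

    n_tokens = 100
    n_layers = 96            # GPT-3-scale layer count
    rtt = 0.05               # assumed 50 ms round trip between peers
    local_per_token = 0.03   # assumed 30 ms/token on one machine

    local = n_tokens * local_per_token       # ~3 seconds
    distributed = n_tokens * n_layers * rtt  # ~480 seconds

    print(f"local: {local:.0f} s, distributed: {distributed:.0f} s")

The round trips dominate everything else by roughly two orders of magnitude.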

Consider also the question "is your input visible to every node it passes through?" and the privacy implications of that.



