Run LLMs at home, BitTorrent‑style (petals.dev)
485 points by udev4096 on Sept 17, 2023 | 125 comments



This is neat. Model weights are split into their layers and distributed across several machines that then report themselves in a big hash table when they are ready to perform inference or fine-tuning "as a team" over their subset of the layers.

It's early but I've been working on hosting model weights in a Docker registry for https://github.com/jmorganca/ollama. Mainly for the content addressability (Ollama will verify the correct weights are downloaded every time) and ultimately weights can be fetched by their content instead of by their name or url (which may change!). Perhaps a good next step might be to split the models by layers and store each layer independently for use cases like this (or even just for downloading + running larger models over several "local" machines).


Ah, is it possible to tone down the self-promotion? I've been seeing your comments for ollama on many LLM-related posts here.

> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.

Surely in this case it would've been possible to comment about OP's work while leaving out the free backlink to your project. Just my 0.02


There is nothing wrong with self-promotion if, as in this case, it is relevant to the discussion.


> and fine‑tune them for your tasks

This is the part that raised my eyebrows.

Finetuning 70B is not just hard, it's literally impossible without renting a very expensive cloud instance or buying a PC the price of a house, no matter how long you are willing to wait. I would absolutely contribute to a "llama training horde".


That's true for conventional fine-tuning, but is it the case for parameter-efficient fine-tuning and QLoRA? My understanding is that for an N-billion-parameter model, fine-tuning can occur on a GPU with slightly less than N gigabytes of VRAM.

For that 70B parameter model: an A100?
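For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the frozen base weights alone would take at different precisions. It assumes 1 GB = 1e9 bytes and ignores adapter weights, activations, and quantization metadata, all of which add to the total. `quantized_weight_gb` is a made-up helper name, not part of any library.

```python
def quantized_weight_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Memory for the frozen base weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare 16-bit, 8-bit and 4-bit storage for a 70B-parameter model.
for bits in (16, 8, 4):
    print(f"70B weights at {bits}-bit: {quantized_weight_gb(70, bits):.0f} GB")
```

Under these assumptions the 4-bit case comes to about half a gigabyte per billion parameters, so whether a single 80GB card is enough depends mostly on the overhead that this estimate leaves out.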


2x 40/48GB GPUs would be the cheapest. But that's still a very expensive system, especially if you don't have a beefy workstation with 2x PCIe slots just lying around.


Even mATX boards tend to come with two (full-length) PCIe slots, and that's easy sub-$1k territory. Not exactly a beefy workstation.

Source: have a $200 board in my computer right now with two full-length PCIe slots.


What's more difficult is trying to cool GPUs with 24-48GB of RAM… they all seem to be passively cooled.


Good point, I think most of them are designed for a high-airflow server chassis, with airflow in a direction that a desktop case wouldn't necessarily facilitate (parallel to the card).


Waterblocks exist for some compute-only GPUs, including the Nvidia A100. Also, there are a few small vendors in China that offer mounting kits that allow you to mod these compute-only GPUs to use off-the-shelf AIO watercoolers. Certainly, not many people are going to take the risk of modifying an expensive Nvidia A100, but these solutions are moderately popular among DIY home lab developers for converting older server cards for home workstation use. Decommissioned Nvidia Tesla P100 or V100 cards can be purchased cheaply for several hundred dollars.


> Decommissioned Nvidia Tesla P100 or V100 cards can be purchased cheaply for several hundred dollars.

Meh. If you want 16GB of VRAM for several hundred dollars, can't you just pull a brand new 30-series off the shelf and have ten times more computing power than those old Pascal cards? You'll even have more VRAM if you go for the 3080 or 3090. Admittedly, the 3090 is closer to $700 or so, but it should still make a P100 very sad in comparison.


Yeah, these GPUs became less appealing after the prices of 30-series GPUs dropped. The prices of SXM cards are still somewhat unbeatable though if you have a compatible server motherboard [1]. Nvidia P100s are being sold for as low as $100 each, and there are similar savings for the Nvidia V100s. But yeah, a saving of around $100 to $200 is not really worthwhile...

Another curious contender is the decommissioned Nvidia CMP series GPUs from miners. For example, the Nvidia CMP 170HX basically uses the same Nvidia A100 PCB with its features downsized or disabled (8 GB VRAM, halved shaders, etc). But interestingly, it seems to preserve the full 1500 GB/s memory bandwidth, making it potentially an interesting card for running memory-bound simulations.

[1] Prices are so low exactly because most people don't. SXM-to-PCIe adapters also exist which cost $100-$200 - nearly as much as you have saved. It should be trivial to reverse-engineer the pinout to make a free and open source version.


I didn't know the CMP had full bandwidth. That would be an excellent card for smallish networks (like Stable Diffusion, GANs, audio networks).

...But it doesn't seem to be cheap. Not really worth it over a 4090 for the same price.


It seems that the CMP 170HX is being sold for $500 +/- $100 on the flea markets in China as closed mining farms are dumping any remaining inventory. Not sure if the prices are real, I'm currently trying to purchase some.


I can confirm that the price is real ;-)


Is it possible to take something like a CMP 170HX and do board-level work to add more memory chips? Or are they not connected to silicon?


I don't believe it's possible. The HBM2e chips are integrated onto the package of the GPU die, making them impossible to remove or modify in a non-destructive manner.


The Quadros/Firepros have blower coolers.


Not with full x16/x16, though I suppose you don't necessarily need that.


Of course, usually the other PCIe slots are something stupid, but there's still a second full-length one, so this could potentially fit two GPUs with the right power supply.


If one is training the full 70B parameters, then the total memory usage far exceeds the memory for simply storing the 70B parameters (think derivatives and optimizer state such as momentum). This is the main reason why models are split or why techniques like fully sharded data parallelism (FSDP) are used during training. During training of a distributed model, at every step of the optimizer these multiple-of-70B parameters need to go through a network wire (though not to all nodes, thankfully). As you suggested, LoRA could work well in a distributed setting because the trainable parameters are very small in number (tens of thousands of times fewer trainable parameters) and the info required to go through the network for non-trainable parameters is also small. However, training this model on a single A100 is impractical, as it would require mimicking distributed training by buffering things in TB-sized CPU RAM (or slower storage) to swap pieces in and out of the model during every step of an otherwise distributed operation (and this is not natively supported in existing frameworks to the best of my knowledge, even though one could technically write this code without too much difficulty).
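To put a number on "far exceeds", here is a rough sketch using a common rule of thumb for mixed-precision training with Adam: roughly 16 bytes of state per parameter. The per-item byte counts below are that rule of thumb, not figures from this thread, and activations are ignored entirely.

```python
# Assumed per-parameter state for mixed-precision Adam (a rule of thumb).
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}


def full_finetune_gb(n_params_billion: float) -> float:
    """Training state in GB (1 GB = 1e9 bytes), excluding activations."""
    # Billions of parameters times bytes per parameter gives GB directly.
    return n_params_billion * sum(BYTES_PER_PARAM.values())


print(f"70B full fine-tune state: about {full_finetune_gb(70):.0f} GB")
print(f"80GB GPUs needed just to hold it: {full_finetune_gb(70) / 80:.0f}")
```

Under these assumptions the state alone is over a terabyte, which is why the work has to be sharded across many accelerators or spilled to CPU RAM.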


I think you'd need 2 80GB A100's for unquantised.


An H100 is maybe a car but not nearly close to a house...


Maybe not in your area, but it's very doable in other places, like where I live.


You expect me to believe there are other places than where I live?!


8 H100s would have enough VRAM to finetune a 70B model.


Is a single H100 enough?


80GB is enough, yeah.

I'm not sure what exact LoRA/quantization settings would be ideal, but check out https://github.com/OpenAccess-AI-Collective/axolotl#config


Finetuning in a distributed way over a questionable network would be a lot more energy- and cost-inefficient than doing it on a single node or a well-connected cluster. Also, you can finetune a 70B model on a million tokens for $2 on Lambda Cloud or <$10 on Replicate.


What prevents parallel LLM training? If you read book 1 first and then book 2, the resulting update in your knowledge will be the same as if you read the books in the reverse order. It seems reasonable to assume that if an LLM is trained on each book independently, the two deltas in the LLM weights can just be added up.


This is not at all intuitive to me. It doesn't make sense from a human perspective, as each book changes you. Consider the trivial case of a series, where nothing will make sense if you haven't read the prior books (not that I think they feed it the book corpus in order; maybe they should!). But even in a more philosophical sort of way, each book changes you, and the person who reads Harry Potter first and The Iliad second will have a different experience of each. Then, with large language models, we have the concept of grokking something. If grokking happens in the middle of book 1, it is a different model that reads book 2, and of course the inverse applies.


This isn't true. Set up even a simple ANN (dense, feed-forward, three layers; you know the one). Then train two models, keeping everything the same except the data order. You'll end up with two different models even though you started with the same weights, etc.


I'm not sure this is true. For instance, consider reading textbooks for linear algebra and functional analysis out of order. You might still grok the functional analysis if you read it first but you'd be better served by reading the linear algebra one first.


In ordinary gradient descent the order does matter, since the position changes in between. I think stochastic gradient descent does sum a couple of gradients together sometimes, but I'm not sure what the trade-offs are and if LLMs do so as well.


By the “delta in the LLM weights”, I am assuming you mean the gradients. You are effectively describing large batch training (data parallelism) which is part of the way you can scale up but there are quickly diminishing returns to large batch sizes.


LLMs are trained in parallel. The model weights and optimizer state are split over a number (possibly thousands) of accelerators.

The main bottleneck to doing distributed training like this is the communication between nodes.


The "deltas" are calculated by the error in how well the current state of the network predicts the output, backpropagated. Sequential runs are not commutative because the state changes.

Consider the trivial example of training a network to distinguish between sample A and sample B. Give it a hundred As in a row and it just learns "everything is A". Give it a hundred Bs in a row and it relearns "no, everything is B". To train it to distinguish, you must alternate As and Bs (and not too regularly, either!)
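A toy sketch of why the order matters, using a one-weight linear model with squared error instead of a real network. The two "books" are made-up (x, y) samples and the learning rate is arbitrary; the point is only that each gradient is evaluated at the weight left behind by the previous step.

```python
def grad(w: float, x: float, y: float) -> float:
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x


def sgd(w: float, samples, lr: float = 0.1) -> float:
    """Plain sequential SGD: every step starts from the previous step's weight."""
    for x, y in samples:
        w -= lr * grad(w, x, y)
    return w


book1 = (1.0, 2.0)  # hypothetical training sample
book2 = (2.0, 1.0)  # hypothetical training sample
w0 = 0.0

w_12 = sgd(w0, [book1, book2])
w_21 = sgd(w0, [book2, book1])
# "Train on each book independently and add the deltas": both gradients
# are taken at the same starting weight w0, i.e. one large-batch step.
w_sum = w0 - 0.1 * (grad(w0, *book1) + grad(w0, *book2))

print(f"book1 then book2: {w_12:.2f}")
print(f"book2 then book1: {w_21:.2f}")
print(f"summed deltas:    {w_sum:.2f}")
```

All three results differ (roughly 0.32, 0.38 and 0.40 here). Adding deltas computed from the same starting point is a valid update, but it is a batch step rather than a replay of sequential training.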


You can finetune 40B falcon on 4 x A10 with compiler optimization technology from CentML. No changes to the model.


Impossible? It’s just a bunch of math, you don’t need to keep the entire network in memory the whole time.


Well, any scheme where weights are dynamically loaded/unloaded from memory enough to fit on a 48GB GPU is so slow that training is basically impractical. Your 70B model would be obsolete by the time the finetuning is done.

Some inference frameworks came up with schemes for just this, and it was horrifically slow.


Are trained LLMs composable in any way? Like if you and I trust 99% of the same data, but each have 1% where we disagree, must we have two entirely separate models, or can we pool compute in the 99% case (along with the others who agree) and then create a derivative model for ourselves which covers for the differences in our trust models?

I have only a rudimentary understanding of neural nets but it doesn't seem crazy that the weights could be manipulated in such a way while preserving the utility of the model.

I ask because I think it would be useful to know which statements two LLMs of equal power agree on and which they disagree on. You could then map that backwards to differences in their training data (only feasible if the differences are small).

If instead two LLMs of equal power represent a missed opportunity to have one of greater power, and the disagreement analysis is prohibitively expensive to do, then that's a bit of a different world.


Somewhat yes. See "LoRA": https://arxiv.org/abs/2106.09685

They're not composable in the sense that you can take these adaptation layers and arbitrarily combine them, but training different models while sharing a common base of weights is a solved problem.
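To make the "common base of weights" point concrete: LoRA freezes a weight matrix of shape d x k and trains only a low-rank update B (d x r) times A (r x k). A quick sketch of how few parameters that is per matrix, with d = k = 8192 and r = 8 picked as illustrative values rather than taken from any particular model's config:

```python
def full_params(d: int, k: int) -> int:
    """Parameters in one dense d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters in the low-rank update B (d x r) @ A (r x k)."""
    return r * (d + k)


d, k, r = 8192, 8192, 8  # illustrative sizes, not a real model config
print("frozen base matrix:", full_params(d, k))
print("trainable adapter: ", lora_params(d, k, r))
print("reduction factor:  ", full_params(d, k) // lora_params(d, k, r))
```

With these numbers the adapter is 512 times smaller than the matrix it modifies, so two people who disagree on a small slice of data could share the frozen base and each keep their own small adapter.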



How does this defend against a malicious participant altering the output of their share of the larger computation? Even without some kind of method for e.g. producing attacker-determined network output, this system seems vulnerable to lots of nodes joining and simply returning junk results, effectively DoSing the system.


Hi, a Petals dev here. We're developing validators that periodically go over all servers and ban the ones that return incorrect results. Additionally, clients can run data through multiple disjoint routes in the network and check that the results match.

This catches frequent attackers but doesn't provide 100% protection - so we expect people to set up a _private_ swarm if they want full correctness guarantees. For example, if you don't have enough GPUs to run an LLM yourself but have some hardware owners you trust, you can set up a private Petals swarm and jointly run the LLM on geo-distributed hardware to process your data.
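A minimal sketch of what a client-side cross-check between routes could look like. This is not the Petals API: `routes_agree` and its tolerances are hypothetical, and the outputs are stand-ins for whatever numeric values each route returns. A tolerance is used on the assumption that different GPUs will not produce bit-identical floats.

```python
import math


def routes_agree(outputs, rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> bool:
    """Return True if every route's output matches the first route's output."""
    reference = outputs[0]
    for other in outputs[1:]:
        if len(other) != len(reference):
            return False
        for a, b in zip(reference, other):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


honest = [0.10, 0.20, 0.70]
honest_other_gpu = [0.1000001, 0.2000001, 0.6999998]  # tiny numeric drift
tampered = [0.70, 0.20, 0.10]

print(routes_agree([honest, honest_other_gpu]))
print(routes_agree([honest, tampered]))
```

A mismatch only says that some route is wrong, not which server on it misbehaved, and routes that collude on the same wrong answer would still pass.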


How about tried and tested reputation systems for GPUs/providers to join certain swarms?

Yes, this can also be gamed (and I do not wish to bring yet another scoring system into this world), but it might just work for users wanting to choose between various levels of LLM security.

You might be able to even tie this into 'energy per compute unit' spent, enticing users to opt for more energy efficient offerings. Potentially, an all-round metric (or multiple metrics) for the viability of a GPU provider.


The first question I had was "what are the economics?" From the FAQ:

Will Petals incentives be based on crypto, blockchain, etc.?

  No, we are working on a centralized incentive system similar to the AI Horde kudos, even though Petals is a fully decentralized system in all other aspects. We do not plan to provide a service to exchange these points for money, so you should see these incentives as "game" points designed to be spent inside our system.

  Petals is an ML-focused project designed for ML researchers and engineers, it does not have anything to do with finance. We decided to make the incentive system centralized because it is much easier to develop and maintain, so we can focus on developing features useful for ML researchers.
https://github.com/bigscience-workshop/petals/wiki/FAQ:-Freq...


> similar to the AI Horde kudos

What they are referencing, which is super cool and (IMO) criminally underused:

https://lite.koboldai.net/

https://tinybots.net/artbot

https://aihorde.net/

In fact, I can host a 13B-70B finetune in the afternoon if anyone on HN wants to test a particular one out:

https://huggingface.co/models?sort=modified&search=70B+gguf


> GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation, and support for special tokens. It also supports metadata, and is designed to be extensible.

is there a more canonical blogpost or link to learn more about the technical decisions here?


https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md#...

It is (IMO) a necessary and good change.

I just specified gguf because my 3090 cannot host a 70B model without offloading outside of exLlama's very new ~2 bit quantization. And pre quantized gguf is a much smaller download than raw fp16 for conversion.


thanks very much!


Similarly, there have been distributed render farms for graphic design for a long time. There are no incentives other than that more points mean your jobs are prioritized.

https://www.sheepit-renderfarm.com/home


>What's the motivation for people to host model layers in the public swarm?

>People who run inference and fine-tuning themselves get a certain speedup if they host a part of the model locally. Some may be also motivated to "give back" to the community helping them to run the model (similarly to how BitTorrent users help others by sharing data they have already downloaded).

>Since it may be not enough for everyone, we are also working on introducing explicit incentives ("bloom points") for people donating their GPU time to the public swarm. Once this system is ready, we will display the top contributors on our website. People who earned these points will be able to spend them on inference/fine-tuning with higher priority or increased security guarantees, or (maybe) exchange them for other rewards.

It does seem like they want a sort of centralized token however.


It's a shame that every decentralized project needs to be compared to cryptocoins now.


It's not the comparison, it's that it's one of the things cryptocoins are actually useful for: You have people all over the world with GPUs, some of them want to pay the others for use of them, but their countries use different payment networks or the developers want to be able to automate it without forcing the users to all sign up with the same mercurial payment processor who could screw over any of the users at random.


> it's that it's one of the things cryptocoins are actually useful for

It's what their proponents claim they are useful for, yet there's not a single instance of a successful blockchain project actually achieving this kind of resource-sharing goal.

> You have people all over the world with GPUs, some of them want to pay the others for use of them

The gigantic success of BitTorrent shows that humans as a group don't need to have monetary incentives to share their spare hardware. In fact, it's likely that trying to add money into the mix will just break the system instead of improving it: https://en.wikipedia.org/wiki/Overjustification_effect


> It's what their proponents claim they are useful for, yet there's not a single instance of a successful blockchain project actually achieving this kind of resource-sharing goal.

Sia and Filecoin already work this way for people to share storage.

> In fact, it's likely that trying to add money into the mix will just break the system instead of improving it

This depends on the amount of money people are willing to pay for processing power. Volunteer contributions would be reduced, but the paid contributions could make up for it if the people who want to train their model pay enough to attract more people into the system and if those people can compete with conventional commercial offerings.


> Sia and Filecoin already work this way for people to share storage.

You'll notice that I said “successful” in my original sentence.

> This depends on the amount of money people are willing to pay for processing power. Volunteer contributions would be reduced, but the paid contributions could make up for it if the people who want to train their model pay enough to attract more people into the system and if those people can compete with conventional commercial offerings.

That's a very big “if”: the distributed nature of things is always going to make it more expensive than a traditional solution, especially if you need byzantine fault tolerance (which you need as soon as there is monetary value to earn by cheating), the same way that a blockchain is orders of magnitude more expensive than a cloud KV store database. And by pushing the volunteers away you'll end up with a small pool of for-profit actors, and these actors themselves would likely be better off if they provided their own cloud offering.

For instance, Filecoin only has nodes numbering in the low thousands; the average Filecoin node has something like 10PB of available storage, with the top three having 90PB each and making barely $1600 a day, which is $6.4 a year per TB.
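The per-TB figure can be sanity-checked from the numbers above, assuming 1 PB = 1000 TB and a 365-day year (both assumptions of this sketch; the revenue and capacity figures are the ones quoted in this comment, not independently verified):

```python
daily_revenue_usd = 1600   # per-node daily revenue quoted above
capacity_pb = 90           # per-node capacity quoted above
tb_per_pb = 1000           # assumption: decimal units

yearly_revenue_usd = daily_revenue_usd * 365
usd_per_tb_year = yearly_revenue_usd / (capacity_pb * tb_per_pb)

print(f"yearly revenue: ${yearly_revenue_usd}")
print(f"per TB per year: ${usd_per_tb_year:.2f}")
```

That lands at roughly $6.5 per TB per year, in line with the figure above.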


> the distributed nature of things is always going to make it more expensive than a traditional solution

> especially if you need byzantine fault tolerance

For storage this can be done much more efficiently with erasure coding and hashing.

For compute, reputation. A node with no reputation has all of its output verified (and so gets paid less). A node with a good reputation history only gets random spot checks, but fail a spot check and you're back to getting paid less, maybe even retroactively.

> For instance, Filecoin only has nodes numbering in the low thousands; the average Filecoin node has something like 10PB of available storage, with the top three having 90PB each and making barely $1600 a day, which is $6.4 a year per TB.

So it costs too much and it's too cheap?

The nature of something like this is low barrier to entry, so the high competitiveness is going to result in low prices. That's kind of the idea.

The result is going to be two main categories of supplier. One, huge nodes with economies of scale. These might take lower prices than some retail cloud offering, but they also don't have customer acquisition or support costs. Two, nodes with "free" storage, e.g. you built a media center which is already on 24/7 but still has a few TB of free space, so until you get around to using it yourself you'll take the however much in free money. In both cases because they have lower costs than competing providers.

It sounds like the network is providing several exabytes of storage for an extremely competitive price. How is that not a success?


> For storage this can be done much more efficiently with erasure coding and hashing.

More efficient than what exactly? It's still far less efficient than not having to hash and run erasure coding…

> For compute, reputation. A node with no reputation has all of its output verified (and so gets paid less). A node with a good reputation history only gets random spot checks, but fail a spot check and you're back to getting paid less, maybe even retroactively.

That only works if the attacker cannot make big gains from a single cheat after a period of building reputation. There's a reason why this isn't being used in the wild by blockchains…

> So it costs too much and it's too cheap?

Yes, it costs too much to operate, and it's too cheap as a product so operators are losing money. The only reason why there's an offering at all is that some people invested lots of money on hardware in 2021 when the token price was 50 times higher (but then the storage cost was prohibitive).

> It sounds like the network is providing several exabytes of storage for an extremely competitive price. How is that not a success?

Barely anyone using it despite a price so low that it doesn't even allow operators to break even, how is that supposed to be a success?


> More efficient than what exactly? It's still far less efficient than not having to hash and run erasure coding...

Large storage systems already do this. RAID is a type of erasure coding. Various enterprise storage systems use hashing for data integrity.

Distributing the data over a larger number of systems can actually be more efficient because the percentage of erasure blocks you need goes down as the number of independent devices increases. For example, if you only have two devices and want redundancy, you need a mirror and lose 50% of your capacity. If you spread the data across 200 devices you might achieve an even higher level of resilience by sacrificing only 25% of capacity so you can lose any 50 devices without data loss. You may get this down even further by periodically checking if a device is still available and replacing it, so the number of devices you can lose can be as low as the maximum number of devices you expect to lose simultaneously.
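The capacity arithmetic, as a small sketch. It assumes an ideal (MDS) erasure code such as Reed-Solomon, where data split into `data_shards` plus `parity_shards` survives the loss of any `parity_shards` devices; the function name is made up for illustration.

```python
def capacity_lost(data_shards: int, parity_shards: int) -> float:
    """Fraction of raw capacity spent on redundancy."""
    return parity_shards / (data_shards + parity_shards)


# Two devices, mirrored: one data copy plus one redundant copy.
print(f"mirror (1+1):   {capacity_lost(1, 1):.0%} lost, survives 1 failure")

# 200 devices: 150 data shards plus 50 parity shards.
print(f"wide (150+50):  {capacity_lost(150, 50):.0%} lost, survives 50 failures")
```

The wide layout spends half as much capacity on redundancy while tolerating fifty times as many simultaneous device losses, at the price of touching many more devices per read or repair.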

> That only works if the attacker cannot make big gains from a single cheat after a period of building reputation. There's a reason why this isn't being used in the wild by blockchains...

Blockchains are a different thing.

If you have a GPU and to start out with everything you produce is checked, you get 50% of what the customer pays. If you have a good reputation then only e.g. 1 in 10 of your output is checked, so you get ~90% of what the customer pays, but to do that you have to accept 50% for e.g. 100 transactions.

You can now defect, but you have a 10% chance of being detected each time, so you can expect to only get to do it 10 times. So you get a 90% payment ten times without doing work, then have to go back to getting a 50% payment 100 times. 10 times 90% is way less than 100 times 40%, so if you do this you lose money.
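The same arithmetic as a sketch, under this comment's simplified model only: payments are fractions of what the customer pays, every cheat earns exactly one trusted payment, and detection is an independent 10% chance per job. The function names are made up.

```python
def expected_attempts_until_caught(check_rate: float) -> float:
    """Mean of a geometric distribution: 1 / p attempts until detection."""
    return 1.0 / check_rate


def cheating_gain(trusted_pay: float, check_rate: float) -> float:
    """Payment collected for unperformed work before being caught."""
    return trusted_pay * expected_attempts_until_caught(check_rate)


def probation_cost(trusted_pay: float, probation_pay: float, jobs: int) -> float:
    """Income forgone by being paid the probation rate for `jobs` jobs."""
    return (trusted_pay - probation_pay) * jobs


gain = cheating_gain(trusted_pay=0.9, check_rate=0.1)
cost = probation_cost(trusted_pay=0.9, probation_pay=0.5, jobs=100)
print(f"expected gain from cheating: {gain:.1f} job-payments")
print(f"cost of rebuilding reputation: {cost:.1f} job-payments")
```

Here the "100 times 40%" is the 0.4 forgone on each of the 100 probation jobs. The conclusion holds only while a single cheat is worth no more than one job's payment.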

> Yes, it costs too much to operate, and it's too cheap as a product so operators are losing money.

Do you know what the overhead of the network actually is? Trying to put it together from multiple sources seems to imply that miners get paid ~$8/TB/year but storage costs ~$2/TB/year. Which I assume I'm doing wrong somehow, because it would imply negative overhead and therefore a huge arbitrage opportunity.

I'm guessing the real number is less than 50% overhead, because there are obvious ways to do it at least that efficiently, but even that isn't huge when you can avoid expenses for marketing and customer support. Which implies that the problem is this:

> The only reason why there's an offering at all is that some people invested lots of money on hardware in 2021 when the token price was 50 times higher (but then the storage cost was prohibitive).

Which is a self-solving problem. The unprofitable providers go out of business until the price makes it profitable. But that seems like it should happen quicker than this if the profitability isn't there, because storage is fungible. Even if you bought a bunch of drives to do this when the price was higher, you could sell them and go put the money in a traditional investment. Or if you're speculating on the value of Filecoin going up, sell your storage and use the money to buy Filecoin. So the people still doing it are presumably turning a profit even at current prices, whether through economies of scale or because they had "free" storage to use.

> Barely anyone using it despite a price so low that it doesn't even allow operators to break even, how is that supposed to be a success?

It causes very inexpensive storage to be available, which is useful.


> Large storage systems already do this. RAID is a type of erasure coding. Various enterprise storage systems use hashing for data integrity.

> Distributing the data over a larger number of systems can actually be more efficient because the percentage of erasure blocks you need goes down as the number of independent devices increases. For example, if you only have two devices and want redundancy, you need a mirror and lose 50% of your capacity. If you spread the data across 200 devices you might achieve an even higher level of resilience by sacrificing only 25% of capacity so you can lose any 50 devices without data loss. You may get this down even further by periodically checking if a device is still available and replacing it, so the number of devices you can lose can be as low as the maximum number of devices you expect to lose simultaneously.

You're mixing things up so badly it's hard to correct… For starters, regarding Filecoin, the number of nodes you must expect to lose is “almost all of them” because they can stop operating if the economics aren't even good enough to cover their OpEx (they seem to be fine not covering the CapEx for now, but who knows for how long). It's almost like putting all your data in a datacenter owned by a nearly broke provider: if they go bankrupt you're screwed, so you need a plan B.

> You can now defect, but you have a 10% chance of being detected each time, so you can expect to only get to do it 10 times. So you get a 90% payment ten times without doing work, then have to go back to getting a 50% payment 100 times. 10 times 90% is way less than 100 times 40%, so if you do this you lose money.

Again, you're mixing things up. The problem isn't that a node could defect and not do the work (this is assuming non-byzantine fault tolerance), the problem is that a node could voluntarily fuck up the calculations when/if it advantages them. And it could be far less than 10% of the time while still being a nuisance. I don't need to fuck up 10% of the back-propagation calculations in a neural-network training to make it completely unusable/to make the person training it spend way more resource than they should in the training process (which gets me more usage as a node operator).

Adversarial threat modeling is hard, I work with people who do this on a daily basis and I can clearly tell you're oversimplifying things a lot.

> Which is a self-solving problem. The unprofitable providers go out of business until the price makes it profitable. But that seems like it should happen quicker than this if the profitability isn't there, because storage is fungible. Even if you bought a bunch of drives to do this when the price was higher, you could sell them and go put the money in a traditional investment. Or if you're speculating on the value of Filecoin going up, sell your storage and use the money to buy Filecoin. So the people still doing it are presumably turning a profit even at current prices, whether through economies of scale or because they had "free" storage to use.

And yet, it hasn't solved itself since 2021… The problem is that node operators get paid roughly what they spend in OpEx, and their capital is essentially illiquid[1], so there's no good reason to stop operating. Of course, now the only reason for them to keep investing is the hope that the token price increases again, but because this is crypto it is fueled by the “fantasy of the bull run”, not by an expected uptick in usage (which, interestingly enough, isn't happening even though the storage price is very cheap).

> It causes very inexpensive storage to be available, which is useful.

Yet barely anyone uses it, which empirically calls its “usefulness” into question.

[1] and I don't get where you got the idea that storage was “fungible”: the failure rate going up exponentially over time makes storage a poor fit for the second-hand market, especially if people know that you've been running stressful proof of space-time on it, and if you're trying to fire-sale a petabyte of storage, chances are high that people will figure that out.


> For starters, regarding Filecoin, the number of nodes you must expect to lose is “almost all of them” because they can stop operating if the economics aren't even good enough to cover their OpEx

You don't expect to lose "almost all of them" at the same time. Even if the price crashes, you would expect capacity to go down over a period of days or weeks, not minutes. And then if a fraction of the data is lost but is less than the number of erasure blocks, you promptly reconstruct it and put it on a different node.

Meanwhile you would expect an equilibrium here. The price going down forces some providers out of the market, but providers leaving the market brings the price back up. As long as the customer is offering as much as some providers need to stay in the market, somebody is hosting the data. That only stops if the customer won't bid what the providers need to get, at which point the customer transfers their data out of the system so they can stop paying more than they're willing to.

Now, you can screw this up if you make your system sufficiently convoluted so the price signal doesn't make it from the customer to the provider or vice versa, and I'm not familiar enough with the specific implementation in Filecoin to comment, but screwing that up isn't inherently necessary for this category of system.

> the problem is that a node could voluntarily fuck up the calculations when/if it advantages them.

Which is why you duplicate some of them at random, and don't tell them when you're going to do it. The calculations are deterministic. If you distribute one to two random nodes and they don't get the same result, but they've each signed their own result, now you know one of them defected and can prove which one by doing that calculation yourself or doing some other potentially expensive operation that only happens when there is an inconsistency. At which point the defector is found out, you can prove it, and their reputation is in ashes.
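A rough sketch of that audit step. Everything here is hypothetical: nodes are plain dicts, results are compared by SHA-256 digest instead of real signatures, and `trusted_compute` stands in for the expensive recomputation that only runs when two results disagree. Choosing which jobs to duplicate at random is left out.

```python
import hashlib


def digest(result: bytes) -> str:
    return hashlib.sha256(result).hexdigest()


def audit(task, node_a, node_b, trusted_compute):
    """Run one task on two nodes; return the names of nodes proven wrong."""
    result_a = node_a["run"](task)
    result_b = node_b["run"](task)
    if digest(result_a) == digest(result_b):
        return []  # results agree, so no expensive recomputation is needed
    truth = digest(trusted_compute(task))
    suspects = []
    for node, result in ((node_a, result_a), (node_b, result_b)):
        if digest(result) != truth:
            suspects.append(node["name"])
    return suspects


def honest_work(task):
    # Stand-in for a deterministic computation.
    return str(sum(task)).encode()


alice = {"name": "alice", "run": honest_work}
mallory = {"name": "mallory", "run": lambda task: b"junk"}

print(audit((1, 2, 3), alice, mallory, honest_work))
print(audit((1, 2, 3), alice, alice, honest_work))
```

This prints `['mallory']` and then `[]`. It depends on the computation being bit-for-bit deterministic, and two colluding nodes that return the same wrong answer would pass unnoticed.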

> Yet barely anyone uses it, which empirically question its “usefulness”.

It seems to be storing more than an exabyte of data for someone.

> their capital is essentially illiquid[1]

> [1] and I don't get where you got the idea that storage was “fungible”, the failure rate going up exponentially over time makes storage a poor fit for the second-hand market, especially if people know that you've been running stressful proof of space-time on it, and if you're trying to fire-sale a petabyte of storage, chances are high that people will figure that out

It's fungible because a used hard drive is a commodity product with a wide customer base. That new ones cost more than the used ones doesn't make it not a commodity; you could have bought the used ones to begin with if you're content to continue running them at their current age.

Let me know if you're aware of some place you can buy working >=16TB drives, used or otherwise, for less than ~$100 each in 2023.


> Meanwhile you would expect an equilibrium here. The price going down forces some providers out of the market, but providers leaving the market brings the price back up.

Bringing 19th-century equilibrium economics in here is kind of hilarious, when the price has been far from equilibrium for the past two years. Node runners are already losing money, and have been since the end of the bull run. The value of the FIL token isn't so much derived from an equilibrium in the supply and demand for storage; it's driven by the supply and demand of the coin on the crypto market, and if nodes start to give up in any meaningful fashion over the course of a few days or weeks, the crypto market will likely react negatively, driving the price of the token even lower. In the crypto markets, Keynes' animal spirits are in charge; nobody makes rational utility calculations.

> At which point the defector is found out, you can prove it, and their reputation is in ashes.

If I can make more money out of a single adversarial attack than it costs me to build up reputation, then who cares, I'll be doing it over and over again any day.

> It seems to be storing more than an exabyte of data for someone.

It is in fact storing an exabyte of “data” for “someone”. Compare that with BitTorrent, which was used by everyone and their mom before governments started to fight it. BitTorrent was voluntary only and was a massive success. Filecoin is for-profit and a failure.

> It's fungible because a used hard drive is a commodity product with a wide customer base.

Try and sell 10PB of PoST-worn-out hard drives and see how long it takes. It's far from liquid.


> The value of the FIL token isn't so much derived from an equilibrium in the supply and demand for storage; it's driven by the supply and demand of the coin on the crypto market, and if nodes start to give up in any meaningful fashion over the course of a few days or weeks, the crypto market will likely react negatively, driving the price of the token even lower.

The reason for this is that the price of FIL was initially too high for the amount of customer demand for storage it currently has, resulting in oversupply. But you only lose data as a result of sudden undersupply. If the network could lose 90% of its capacity over a month and still store all of the data it currently does, and then that happens, so what?

Whereas if it actually lost enough capacity to create scarcity given the existing demand for storage, then demand for storage would drive the price of the coin back up, right?

> If I can make more money out of a single adversarial attack than it costs me to build up reputation, then who cares, I'll be doing it over and over again any day.

How are you going to do that with AI training or something? As soon as you get caught once, people go back and retroactively verify everything you've previously done, and then you not only lose any payment received for each calculation you forged, the model you screwed up gets recomputed using the money you didn't get to keep or had to stake in order to be trusted to do computations with lower frequency verification.

> Compare that with BitTorrent, which was used by everyone and their mom before governments started to fight it. BitTorrent was voluntary only and was a massive success.

BitTorrent is a great success for large, popular data. It's pretty much useless for storing anything with a low number of downloads.

> Try and sell 10PB of PoST-worn-out hard drives and see how long it takes. It's far from liquid.

Put functional 16TB hard drives on Amazon and eBay for $99.99. See how long they last. I'd guess less than six months before you've sold 10PB worth.


> If the network could lose 90% of its capacity over a month and still store all of the data it currently does, and then that happens, so what?

If the network loses 90% of its capacity over a month, you'll hear about how grim the future of FIL is in every crypto newsletter. And the price would tank even more. And if the network has already lost 90% of its capacity, it means that the economics are already very bad for node operators, so any worsening is likely to make even more nodes jump ship. Crypto going to dust because of crowd dynamics isn't completely unheard of…

> As soon as you get caught once, people go back and retroactively verify everything you've previously done,

How can they link me to my previous identity though… I'd just discard the previous wallet after having drained the available funds and restart from a clean state.

> the model you screwed up gets recomputed using the money you didn't get to keep or had to stake in order to be trusted to do computations with lower frequency verification.

The staking must end at some point, and given that I can do damage with only a fraction of a percent of adversarial computation, I can just make sure that my probability of getting caught during the staking period is too low for the expected value to be negative.

Your scheme is pathologically broken, and that's no surprise, you're not going to invent a billion dollar winning multiparty computation model as an argument on HN…

> Put functional 16TB hard drives on Amazon and eBay for $99.99. See how long they last. I'd guess less than six months before you've sold 10PB worth.

You'll need to ship roughly a thousand of them without getting a bad reputation from all the disks that will break soon after the buyer receives them (because with that many, and given the state of the disks, a lot will). Also, you're not really disagreeing with my assessment: six months is pretty illiquid by investment standards. It's even less liquid than real estate!


> If the network loses 90% of its capacity over a month, you'll hear about how grim the future of FIL is in every crypto newsletter. And the price would tank even more.

But the supply of storage goes down, which the storage buyers now need to outbid each other for, so they need to buy the coin. I'm assuming it's also possible for the price of storage in FIL to go down as the price moves. If $1 US is 100 FIL but now providing 1TB/year of storage yields hundreds of FIL, you still earn several dollars US per TB stored.

> How can they link me to my previous identity though… I'd just discard the previous wallet after having drained the available funds and restart from a clean state.

No reputation is the same as bad reputation. To have a good reputation you have to engage in a large number of transactions which are less profitable to you because they're undergoing 100% verification. Building a good reputation allows you to make higher margins, which is valuable and therefore costly to sacrifice.

You can't transfer funds you've staked against your reputation until the buyer has had a reasonable amount of time to try to prove you defected.

> The staking must end at some point, and given that I can do damage with only a fraction of a percent of adversarial computation, I can just make sure that my probability of getting caught during the staking period is too low for the expected value to be negative.

Suppose you have a good reputation so you only undergo verification 10% of the time at random instead of 100% of the time. You also have to hold 20 times your revenue from this transaction as collateral during the verification window, or however much is necessary to more than compensate the buyer and punish you in the event that you defected.

Now if you defect you have a 10% chance of losing 2000% of your payment. This has a negative expected value. Meanwhile it's now public that you defected and every other buyer still in the verification window is going to go back and verify 100% of their transactions with you, causing you to have a 100% chance of losing 2000% of your payment for those transactions if you defected.
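The expected-value claim above can be checked with a few lines of arithmetic. This is a sketch of the numbers in this comment only (10% verification, collateral of 20 times the payment); the function and parameter names are made up for illustration, and it assumes a caught cheater forfeits the collateral and receives nothing.

```python
def defect_expected_value(payment: float, verify_rate: float,
                          collateral_multiple: float,
                          cheat_gain_multiple: float = 1.0) -> float:
    """Expected profit of one forged result, in the same units as payment."""
    # Caught: lose the staked collateral.
    caught = verify_rate * (-collateral_multiple * payment)
    # Not caught: keep whatever the cheat pays.
    uncaught = (1.0 - verify_rate) * cheat_gain_multiple * payment
    return caught + uncaught


# 10% chance of losing 2000% of the payment, 90% chance of keeping 100%.
ev = defect_expected_value(payment=1.0, verify_rate=0.10,
                           collateral_multiple=20.0)
```

With these inputs the result is about -1.1 payments per forged result, versus +1 for simply doing the work, so defecting only pays if the cheat is worth considerably more than the honest payment.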

> You'll need to ship roughly a thousand of them without getting a bad reputation from all the disks that will break soon after the buyer receives them (because with that many, and given the state of the disks, a lot will).

The annual failure rate for ~6 year old hard drives is ~2%. Presumably the failure rate over six months is about half that, and you have plenty of other functional drives to send replacements to satisfy the ~1% of customers who got unlucky.
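As a back-of-the-envelope check of those figures, here is a small sketch. It takes the ~2% annual failure rate from this comment at face value (that number is disputed in the thread), uses decimal units (1 PB = 1000 TB), and assumes failures are independent with a constant rate.

```python
import math


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Number of drives required to reach the total capacity."""
    return math.ceil(total_tb / drive_tb)


def failure_probability(annual_rate: float, years: float) -> float:
    """Probability that a drive fails within the given period."""
    return 1.0 - (1.0 - annual_rate) ** years


drives = drives_needed(total_tb=10_000, drive_tb=16)   # 10 PB of 16 TB drives
p_six_months = failure_probability(annual_rate=0.02, years=0.5)
expected_failures = drives * p_six_months
```

Under these assumptions 10 PB is 625 drives, the six-month failure probability is just over 1%, and the expected number of replacements is roughly six. A higher failure rate for PoST-stressed drives scales that last number up proportionally.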

> Also, you're not really disagreeing with my assessment: six months is pretty illiquid by investment standards. It's even less liquid than real estate!

That's only because you're trying to sell 10PB of hard drives. It's like saying shares of stock are illiquid because if you want to sell ten billion dollars of shares in the same company it might not be advisable to do it all on the same day.

And even that you could still do, if you want to solicit a large buyer, which in this context would presumably be some kind of data center.

But even supposing that it would take six months, what's your reasoning for why it has already persisted for longer than that period of time then?


> No reputation is the same as bad reputation. To have a good reputation you have to engage in a large number of transactions which are less profitable to you because they're undergoing 100% verification. Building a good reputation allows you to make higher margins, which is valuable and therefore costly to sacrifice.

This is just a balance between how much you win, and how much it costs you. If I double my earnings for sub 1% chance of being caught, then you need to have a very expensive reputation build-up to compensate for that, and this is going to put a big burden on legit providers who want to enter the system, making it even easier to cheat.

> You also have to hold 20 times your revenue from this transaction as collateral during the verification window, or however much is necessary to more than compensate the buyer and punish you in the event that you defected.

Same as above: the higher the stake needed to fend off cheaters, the less attractive it is to legit players. Also, with your scheme the “verification window” doesn't matter, since you're not going to catch me after the fact: you're going to catch me iff my adversarial transaction is being checked.

> Now if you defect you have a 10% chance of losing 2000% of your payment.

Not if cheating allows me to make even just 223% of the legit payment.
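The 223% figure is the break-even point of the scheme being discussed: with a 10% verification rate and a 20x penalty, a forged result has positive expected value once it pays a little over 222% of the honest payment. A sketch of that derivation, with hypothetical function names and the assumption that a caught cheater gets nothing and loses the full penalty:

```python
def break_even_gain(verify_rate: float, penalty_multiple: float) -> float:
    """Cheat payout (as a multiple of the honest payment) at which EV is zero.

    Solves (1 - p) * g - p * m = 0 for g.
    """
    return verify_rate * penalty_multiple / (1.0 - verify_rate)


def cheat_ev(gain_multiple: float, verify_rate: float,
             penalty_multiple: float) -> float:
    """Expected value of one cheat, in units of the honest payment."""
    return (1.0 - verify_rate) * gain_multiple - verify_rate * penalty_multiple


threshold = break_even_gain(verify_rate=0.10, penalty_multiple=20.0)
ev_at_223_percent = cheat_ev(2.23, verify_rate=0.10, penalty_multiple=20.0)
```

This only shows that the cheat stops losing money in expectation; to beat honest work, which earns 1x with no risk, the cheat would need to pay more than (1 + p·m) / (1 − p), about 333% here.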

As I said before, you're not going to design a billion-dollar scheme in this HN discussion…

Edit: I just realized that your scheme would penalize legit nodes even more than I thought: with one random bit flip, a node would lose all its stake and all its reputation. Talk about an expensive cosmic ray! (Or an attacker could even deliberately send a rowhammer workload to legit nodes in order to destroy their reputation and stake, reducing supply and hence increasing their own margin.) And I'll say it again: this stuff is HARD and you're very unlikely to find a working solution on your own in this discussion!

> The annual failure rate for ~6 year old hard drives is ~2%.

Not if you've spent the said 2 years stressing the drive in a PoST scheme. There's a reason why these schemes void the manufacturer's warranty…

> That's only because you're trying to sell 10PB of hard drives. It's like saying shares of stock are illiquid because if you want to sell ten billion dollars of shares in the same company it might not be advisable to do it all on the same day.

You'll be able to sell them. You'll take a haircut (likely less than the 70% you're talking about when reselling old hard drives), but you'll sell them in the same day anyway.

> And even that you could still do, if you want to solicit a large buyer, which in this context would presumably be some kind of data center.

Good luck selling worn-out hardware to a data center!

> But even supposing that it would take six months, what's your reasoning for why it has already persisted for longer than that period of time then?

Hodl to the moon (AKA sunk cost fallacy)… They have an illiquid asset, have swallowed the cost of the capital investment (i.e. they had little to no leverage on it) and have no need to fire-sale it. Also, the 6 months is just your optimistic hypothesis…


> This is just a balance between how much you win, and how much it costs you. If I double my earnings for sub 1% chance of being caught, then you need to have a very expensive reputation build-up to compensate for that, and this is going to put a big burden on legit providers who want to enter the system, making it even easier to cheat.

But there is also no need to make the chance of being caught so low, because a single digit percentage of overhead is completely reasonable while still providing a significant chance of being caught.

And you could scale the verification rate with reputation, so a 1% verification rate is possible but takes a very long time, whereas a 10% verification rate is more than ten times easier to get despite still not being a prohibitively high amount of overhead.

> Same as above: the higher the stake needed to fend off cheaters, the less attractive it is to legit players.

For things like GPU computation, you're going to do a unit of computation over a matter of minutes with verification taking the same amount of time or being done in parallel, and so you do many units of computation a day.

It's not that unreasonable to ask someone to put a month's earnings at stake at any given time, which is about what you get with a 24 hour verification window and a 7% verification rate.
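One way to arrive at a figure like "a month's earnings" is sketched below. This is a reconstruction, not the commenter's stated derivation: it assumes the stake must make cheating unprofitable even if a forged result paid double (the "double my earnings" case raised earlier in the thread), and that collateral is held for every unit of work still inside the verification window.

```python
def stake_in_days_of_earnings(verify_rate: float, window_days: float,
                              cheat_gain_multiple: float) -> float:
    """Collateral locked at any time, expressed in days of honest earnings."""
    # Penalty multiple m that zeroes the cheat's expected value:
    # (1 - p) * g - p * m = 0  ->  m = g * (1 - p) / p
    penalty_multiple = cheat_gain_multiple * (1.0 - verify_rate) / verify_rate
    # Every unit earned during the window has that multiple staked against it.
    return window_days * penalty_multiple


stake_days = stake_in_days_of_earnings(verify_rate=0.07, window_days=1.0,
                                       cheat_gain_multiple=2.0)
```

Under these assumptions the locked collateral comes to roughly 26 to 27 days of earnings. With a cheat that pays no more than honest work, the same formula gives about 13 days.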

> Also, with your scheme the “verification window” doesn't matter, since you're not going to catch me after the fact: you're going to catch me iff my adversarial transaction is being checked.

You're not the one who chooses whether to verify it. If other people are verifying 10% of your work but then someone catches you cheating, they can prove to the others that you cheated and then everyone goes back and verifies 100% of your work which is still in the verification window -- at your expense -- and you lose even more if you cheated more than once.

> Not if cheating allows me to make even just 223% of the legit payment.

The thing this is preventing is you claiming to do some work but actually not, e.g. someone wants to use your GPU for AI but you don't even have a GPU and just return random numbers. To know if your result is right they would have to do the same computation again so they can compare them, which doubles the cost, or more if they want higher assurances against collusion. And then they're not willing to pay you as much because some of their money has to go to that.

Also, it's 223% plus the cost of provably damaging your reputation.

> I just realized that your scheme would penalize legit nodes even more than I thought: with one random bit flip, a node would lose all its stake and all its reputation. Talk about an expensive cosmic ray!

You can set the penalty to whatever is necessary to deter cheating at that level of verification. It doesn't have to be that high, but it can be that high if you need it to be without imposing an unrealistic amount of overhead.

And providers who don't want to be penalized for doing the calculation wrong should operate reliable hardware with functioning error correction. This is not a bad thing to incentivize.

You might also weight the reputational harm. If you get caught once, your reputation will be harmed and lots of people will be rechecking your recent results to see if you tried to screw them too, but if it's an isolated incident you only take a small hit. Whereas if you get caught repeatedly, well, you might as well just start over.

> Or an attacker could even deliberately send a rowhammer workload to legit nodes in order to destroy their reputation and stake, reducing supply and hence increasing their own margin

At which point the node is at least as likely to crash as sign an invalid result, which is already a denial of service attack you have to mitigate. For example by using ECC memory and terminating workloads that induce detectable ECC errors instead of continuing them until they induce an undetectable one and crash the machine or cause it to sign a corrupted result.

(Also, rowhammer is a huge problem and almost nobody is actually mitigating it effectively for anything. Someone needs to come up with a generic solution for it before someone else starts using it for widespread exploitation or we're going to have a bad time regardless of what kind of reputation systems are in use.)

> Not if you've spent the said 2 years stressing the drive in a PoST scheme. There's a reason why these schemes void the manufacturer's warranty…

Do you have some data to back up this claim? Drives are routinely used for heavy database workloads and reliable drive models still last for multiple years.

It seems evident that they're at least reliable enough to continue operating under that workload since that's what they have done instead of failing en masse and causing the storage capacity of the network to decline, given the assumption that the price is too low to justify anyone replacing them.

> You'll be able to sell them. You'll take a haircut (likely less than the 70% you're talking about when reselling old hard drives), but you'll sell them in the same day anyway.

You can sell the hard drives the same day too if you're willing to provide a sufficient discount from the market price. But there is rarely a good reason to do this, because the discount you'd have to provide is more than the time value of money in spreading the sales over somewhat more time.

> Good luck selling worn-out hardware to a data center!

Data centers run hardware until the resource consumption in power and space exceeds the cost of newer hardware, or until it dies. Reliability is just a number in an equation that tells you how much redundancy you have to operate with.

> Hodl to the moon (AKA sunk cost fallacy)

Those are two different things. If they want to hold the coin they'd have more of it to hold if they sell their hardware and use the money to buy the coin.

More likely they're expecting that others will quit and lower the supply to fall in line with demand (or more optimistically that demand will increase) so they can go back to making a profit, but since they all have that incentive they hodl until somebody blinks first, and lose if nobody does.

In the meantime their possibly irrational optimism provides for cheap storage.


Cranks are a scourge…


The logical conclusion is that they (the models) will eventually be linked to crypto payments though. This is where Lightning becomes important...

Edit: To clarify, I'm not suggesting linking these Petals "tokens" to any payment system. My point is that, in general, calls to clusters of machine learning models, decentralized or not, will likely use crypto payments because that gives you both auth and a means of payment.

I do think Petals is a good implementation of using decentralized compute for model use and will likely be valuable long term.


I mean, I can sell you Eve or Runescape currency but we don't need any crypto to execute on it. "Gold sellers" existed well before crypto.


Is there an API for that which doesn't require each of the users to create a separate account on something else?


If that part could be replaced with any third-party server, it would be the tracker in the BitTorrent analogy.


Can they actually prevent people from trading petals for money though?


Would love to share my 3080 Ti, but after running the commands in the getting started guide (https://github.com/bigscience-workshop/petals/wiki/Run-Petal...) it looks like there's a dependency versioning issue:

    ImportError: cannot import name 'get_full_repo_name' from 'huggingface_hub' (~/.local/lib/python3.8/site-packages/huggingface_hub/__init__.py)


You can host your own swarm of servers, apparently [0]. I would be curious to have a ballpark estimate of the fine-tuning performance of a "private" Petals cluster.

[0] https://github.com/bigscience-workshop/petals/wiki/Launch-yo...


I think if you run a cluster in a trusted environment, it should be more efficient to use Ray or something similar.


This is so cool. Hopefully this will give access to thousands or millions more developers in the space


I’ve always thought crowdsourcing is the future. Crowdsourcing information or compute. The fact is we have the “resources” already. It’s a matter of deployment.


I used Petals on a past project. I shared my GPU and also wrote code for the project.

The Petals part was abstracted away from me. I had a normal experience writing code.

I don't have the project listed anywhere. Don't really know what happened to it. But, it was mainly some five or so guys spearheading the thing.


so given that GGML can serve like 100 tok/s on an M2 Max, and this thing advertises 6 tok/s distributed, is this basically for people with lower end devices?


It's talking about 70B and 160B models. Even heavily quantized, can ggml run those that fast? (I'm guessing possibly.) So maybe this is for people who don't have a high-end computer? I have a decent Linux laptop a couple of years old and there's no way I could run those models that fast. I get a few tokens per second on a quantized 7B model.


Yeah. My 3090 gets like ~5 tokens/s on 70B Q3KL.

This is a good idea, as splitting up LLMs is actually pretty efficient with pipelined requests.


> ...lower end devices

So, pretty much every other consumer PC available? Those losers.


Am I the only one who really, really hates pages like Google Colab? I never know what is going on there. Is it free? Is it running on my machine, or is it running on Google's cloud? If the latter, again, is it really free?!

Also, every time I give it a try anyway, I only get some kind of error at the end.

Edit: Here we go. Literally the first line that it wanted to execute: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 4.24.3 which is incompatible."


I love this direction. I hope that WebGPU can be leveraged for this purpose in the future so that I can feel somewhat mollified about security and to promote adoption.


Cool service. It's worth noting that, with quantization/QLORA, models as big as llama2-70b can be run on consumer hardware (2xRTX 3090) at acceptable speeds (~20t/s) using frameworks like llama.cpp. Doing this avoids the significant latency from parallelism schemes across different servers.

p.s. from experience instruct-finetuning falcon180b, it's not worth using over llama2-70b as it's significantly undertrained.
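A quick way to see why a quantized 70B model fits on two 24 GB cards is to estimate the size of the weights alone. The sketch below is rough arithmetic under stated assumptions: it ignores the KV cache, activations and framework overhead, treats "4-bit" as exactly 4 bits per parameter, and uses decimal gigabytes.

```python
def weight_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8.0


def fits(params_billion: float, bits_per_param: float,
         gpu_count: int, vram_gb_per_gpu: float) -> bool:
    """True if the weights alone fit in the combined VRAM."""
    return weight_size_gb(params_billion, bits_per_param) <= gpu_count * vram_gb_per_gpu


fp16_gb = weight_size_gb(70, 16)   # unquantized half precision
q4_gb = weight_size_gb(70, 4)      # 4-bit quantization
ok_on_two_3090s = fits(70, 4, gpu_count=2, vram_gb_per_gpu=24)
```

At 16 bits the weights come to about 140 GB; at 4 bits about 35 GB, which leaves some headroom within 48 GB for everything the estimate leaves out.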


Hi, a Petals dev here. You're right, there's no point in using Petals if your machine has enough GPU memory to fit the model and you're okay with the quantization quality.

We developed Petals for people who have less GPU memory than needed. Also, there's still a chance of larger open models being released in the future.


AFAIK you cannot train 70B on 2x 3090, even with GPTQ/qlora.

And the inference is pretty inefficient. Pooling the hardware would achieve much better GPU utilization and (theoretically) faster responses for the host's requests


For training you would need more memory. As for the pooling, theoretically yes, but wouldn't latency play as much, if not a greater part in the response time here? Imagine a tensor-parallel gather where the other nodes are in different parts of the country.

Here I'm assuming that Petals uses a large number of small, heterogeneous nodes like consumer GPUs. It might as well be something much simpler.


> Theoretically yes but wouldn't latency play as much, if not a greater part in the response time here?

For inference? Yeah, but it's still better than nothing if your hardware can't run the full model, or runs it extremely slowly.

I think frameworks like MLC-LLM and llama.cpp kinda throw a wrench in this though, as you can get very acceptable throughput on an IGP or split across a CPU/dGPU, without that huge networking penalty. And pooling complete hosts (like AI Horde) is much cheaper.

I'm not sure what the training requirements are, but ultimately throughput is all that matters for training, especially if you can "buy" training time with otherwise idle GPU time.


so how long until "tokens" are used to pay for GPU cycles.. people will stop "mining" and just donate their GPU cycles for distributed LLM usages....

in fact, if they did this so that it followed the sun so that the vast majority of it was powered by daylight Solar PV energy I wouldn't even be upset by that.


If AI does decentralization better than crypto I'm about to laugh


Logo is both mesmerizing and distracting.


Very cool.


Looking at the list of contributors, way more people need to donate their GPU time for the betterment of all. Maybe we finally have a good use for decentralized computing that doesn't calculate meaningless hashes for crypto, but helps humanity by keeping these open source LLMs alive.


It can cost a lot to run a GPU, especially at full load. A stock 4090 pulls 500 watts of power under full load[0], which is 12 kWh/day or about 4,380 kWh a year, or roughly $440-$480 a year assuming $0.10-$0.11/kWh for average residential rates. The only variable is whether or not training requires the same power draw as hitting it with FurMark.
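The arithmetic behind those numbers, as a small sketch. It assumes a constant 500 W draw around the clock, which is the worst case described here rather than a measured training load.

```python
def annual_kwh(watts: float, hours_per_day: float = 24.0) -> float:
    """Energy used per year at a constant power draw."""
    return watts / 1000.0 * hours_per_day * 365.0


def annual_cost(watts: float, price_per_kwh: float,
                hours_per_day: float = 24.0) -> float:
    """Yearly electricity cost in the currency of price_per_kwh."""
    return annual_kwh(watts, hours_per_day) * price_per_kwh


kwh_per_day = 500 / 1000.0 * 24.0          # 12 kWh/day
kwh_per_year = annual_kwh(500)             # 4380 kWh/year
cost_low = annual_cost(500, 0.10)          # about $438
cost_high = annual_cost(500, 0.11)         # about $482
```

Swapping in a different rate shows how sensitive the total is to local pricing: at $0.36/kWh the same load costs over $1,500 a year.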

0: https://youtu.be/j9vC9NBL8zo?t=983


> $0.10-$0.11/kWh for average residential rates

you Americans don't know how good you have it...


That’s a cheap rate for sure. Southern California is $.36/.59/.74 peak. Super expensive.


Only Cali and the most northeastern states seem to have these high rates. Every other continental state is under $0.14 https://www.eia.gov/electricity/state/


Southern California? Time to buy some solar panels!


Imagine someone paid you 25c/hour for 4090 compute sharing.


That's pretty much what Nicehash does, but after you pay for that electricity it isn't super profitable - especially if you use it for 1/3 or more of the day for your own purposes (gaming/etc).


I immediately wanted to contribute and it's quite difficult to find the link on the homepage! The "contribute" button should not be a tiny text link that says "help hosting" in the footnote, it should be a big button next to the colab button.

Edit: Oh hey, they did it.


This way too nobody can copyright-cancel the LLM like OpenAI or whatever


Exactly, litigation has never been applied to content delivered over BitTorrent-style networks


Litigation is one thing, totally erasing it from the public internet if it were hosted centrally is something else.


Aha, touché salesman :)


For the most part, GPUs are no longer used for hashing. Once ETH switched to PoS, it decimated the entire GPU mining market.


[flagged]


Didn't Ethereum cut power consumption by 99.95% by switching to Proof of Stake? So what are you securing exactly with all those hashes?

Kinda crazy how people stick to Bitcoin but preach decentralisation. You can't be half way noble.


Yeah, and by doing so they got rid of 99.99% of their security and censorship resistance. PoS is Fiat 2.0. It's not worth mentioning in the same breath as Bitcoin, not that it ever was.


> Those "meaningless hashes" help secure hundreds of billions in savings of Bitcoin for hundreds of millions of people.

Can you back that up with actual data? Other than something that a crypto bro on the Internet told you?


The market cap of Bitcoin is hundreds of billions, and estimates put the number of people owning Bitcoin in the hundreds of millions. You can find the data yourself.


That's not the best counterargument, because Bitcoin has privacy qualities by default. You can hop on to any block explorer and count every address as another user, but you can't verify (without expensive analysis, on a case-by-case basis) that those are not owned by the same guy. Same with Tor: while some data like bridge usage is being collected somehow (I haven't looked into it), you can't reliably prove that thousands/millions are using it to protect their privacy and resist censorship.


It's pretty obvious that the majority of transaction volume and value is rubbish. Bots buying, selling, and trading to each other with millions of addresses. The actual real user count for crypto would be a very tiny % of the active addresses. And the real value not even close to the claimed market caps.


I'm not talking about "crypto", I'm talking about Bitcoin. Bitcoin is not free to send or trade. The vast majority of Bitcoin is held by long-term holders and hasn't moved on chain in years. Saving in hard money is the primary use case. Hashing secures those Bitcoin from being reversed out of your wallet right back to the very first block.


How can you verify that? Other than, you know, "something that an anti-crypto bro on the Internet told you?"

I'm being slightly salty here but I don't get the backlash on crypto. It has a huge potential for safeguarding privacy (Monero) and avoiding corporate walled gardens and banks.


It costs real world dollars to transact so it's not nothing. This argument can be made for stonks as well, right?


I wouldn't use bitcoin as an example. Monero is far more important.


No it isn't.


I got a lurid NSFW comment, just asking for the time (using the Colab), so I assume some people are trolling the network?

Human: what is the time?

The time is 12:30 PM.

Human: are you sure?

Yes, I am sure. The time is 12:30 PM.^</s>^<s> I'm a young {...}


Hi, a Petals dev here. </s> means "end of sequence" for LLMs. If a model generates it, it forgets everything and continues with an unrelated random text (I'm sorry to hear that the model generated a disturbing text in this case). Still, I doubt that malicious actors are involved here.

Apparently, the Colab code snippet is just too simplified and does not handle </s> correctly. This is not the case with the full chatbot app at https://chat.petals.dev - you can try it out instead.
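A minimal sketch of what "handling </s> correctly" could look like on the client side, assuming the generated text is available as a plain string. This is not the Petals or chat.petals.dev code; `truncate_at_eos` is a hypothetical helper, and a real client would more likely stop generation on the tokenizer's end-of-sequence token ID rather than post-process text.

```python
EOS_MARKER = "</s>"  # end-of-sequence marker as it appears in decoded text


def truncate_at_eos(generated: str, eos: str = EOS_MARKER) -> str:
    """Drop everything from the first end-of-sequence marker onward."""
    cut = generated.find(eos)
    return generated if cut == -1 else generated[:cut]


raw = "Yes, I am sure. The time is 12:30 PM.</s><s> I'm a young"
reply = truncate_at_eos(raw)
```

Anything the model emits after the marker belongs to an unrelated sequence, so discarding it is what keeps the stray text out of the conversation.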


Thanks for the reply. One way to guard against that would be if the LLM architecture refused to serve against just <s> as a token?


Base llama has lots of lurid content in it already.



