Cloud Storage for $2 per TB per month

kmod · on March 6, 2020

I worked on the design of Dropbox's exabyte-scale storage system, and from that experience I can say that these numbers are all extremely optimistic, even with their "you can do it cheaper if you only target 95% uptime" caveat. Networking is much more expensive, labor is much more expensive, space is much more expensive, depreciation is faster than they say, etc etc. I don't think the authors have ever done any actual hardware provisioning before.

I didn't read all their math but I expect their final result to be off by a factor of 2-5x. Hard drives are a surprisingly low percentage of the cost of a storage system.

Taek · on March 6, 2020

Author here. A lot of these numbers are drawn from experience in the mining world, where people realized that when cost is the ultimate bottom line, a lot of corners can be cut.

Sia systems don't need a ton of networking. I ran the networking buildout costs by some networking people, and again it comes down to cutting corners. If you only need 10 gbps per rack, if you don't mind having extra milliseconds added, etc, you can get away with very scrappy setups. The whole point is that it's not a highly reliable facility.

kmod · on March 6, 2020

Sure, let's dig into networking. Who pays for rereplication traffic? If you do 64-of-96 RS encoding, that means for every failure you need to transfer 64x the lost storage capacity. If you're targeting a "low individual uptime but high aggregate uptime" model this means you need to be storing data in multiple sites -- and dedicated cross-geo bandwidth is expensive. I agree that in the happy case you can use low-bandwidth cheap equipment, but to get good reliability you need to provision for larger clustered failures such as rack- and row-level outages.

Taek · on March 6, 2020

Sure! First up, we don't do repairs every time one host goes down. Standard practice on the network is to wait to do a repair until a full 25% of the redundancy is missing (in 64-of-96, that would be 8 hosts offline). Then you repair all 8 at once, significantly reducing the total amount of repair traffic.

But secondly, offline doesn't usually mean dead and gone, with unstable datacenters like this they are usually back online before the user has lost a full 25% of their redundancy.

Row level and rack level outages are handled by data randomization. The entire Sia system heavily depends on probabilistic techniques, both on the renting and hosting side. Row level failures will take out some of your data, but nobody should be disproportionately impacted by a cluster failure.

On Sia, each piece is at a different site. So 64-of-96 implies that each chunk of data (96 pieces to a chunk) is located in 96 different places. This doesn't help with the geo-bandwidth, but as discussed above there are other techniques to handle that.

Surprisingly, bandwidth pricing on the Sia network is even cheaper than storage pricing relative to centralized competition. That's a lot harder to model at scale though, so we aren't as confident the Sia bandwidth pricing will hold up at $1 / TB in the long term.

And technically, most of this stuff is customizable per-customer. If your particular use case has a different optimal parameterization, it's fairly easy to tune your client to suit your particular needs.

jstummbillig · on March 7, 2020

So, is this it? Dropbox designer challenges the project broadly, gets top comment, Author refutes – and we leave it at that?

I mean this is basically the moment where I would expect every systems designer on HN coming out of the woods and crushing Sia into the ground, if there were, in fact, any ground at all to crush Sia into.

Is this actually legit? If so, where is the rejoicing? What am I missing?

zzzcpan · on March 7, 2020

No, the author's idea is ok and he's mostly right on the cost part. Configuration and the numbers are a bit off and unrealistic, you won't get such low 95% availability per site due to other economical and technological constraints, you'll get at least 99%, but probably closer to three nines per site and 64 out of 96 won't be necessary at all (something like 8 out of 12 could be enough). Dropbox designer is just ignorant, biased and conditioned to US market and environment, but appeals to authority, so people upvote his bad comment. I do storage too, on smaller scale than Dropbox of course, not in the US, but it is distributed and the cost is already lower than what you see in the title.

Ajedi32 · on March 8, 2020

The fact that nobody is able to prove it's a bad idea doesn't necessarily mean it's a good one. There might still be other downsides that haven't been considered, some of which could be solveable with more development work and some not.

At this point the cautious skeptic will be thinking "hmm, maybe there's something to this", not necessarily full on rejoicing.

That said, I agree it does seem promising. If you ever find yourself in need of cheap cloud storage it wouldn't hurt to look into Sia as a possible option.

fragmede · on March 7, 2020

Dropbox designer name dropped "64-of-96 RS encoding" as if they're the only person that's heard of, or dealt with Reed-solomon encoding before, and expected the author to get scared off. There is, in the case of drop box, plenty of ground to crush Sia into. That is the ground between the 95% and multiple-nines of availability.

Engineering is about tradeoffs. I could build a network as good as Google's with infinite money, infinite time, and infinite help. I could design a product as beautify as Apple's with the same lack of limitations. Unfortunately for me, I have limited money, limited time, and limited help. Every systems designer understands that, innately, so isn't rushing out of the woodword because Sia and Dropbox have merely chosen different tradeoffs. That one has IPO'd is uninteresting in the abstract. It's just money after all.

tuananh · on March 9, 2020

no. the author mentions "64-of-96" things in the post as well. I don't think kmod means to do what you said.

Legogris · on March 6, 2020

That sounds incredibly energy-inefficient. On average you have 12.5% of servers running but not contributing and possibly incurring load on other nodes.

H8crilA · on March 7, 2020

12.5% overhead isn't that much. It's what just the networking gear can easily eat in a data center (12% out of all the non-cooling-related power supply).

Reed-Solomon encoding adds 50%, of you want 3 block per 2 data blocks. Replicated encoding (not relevant here since this is allow throughput usecase, but necessary if you want to sustain high read throughput) is adding at least 200% (if you want a 3x replication, which I think should be the minimum).

Legogris · on March 9, 2020

12.5% is far from the total overhead, just an additional compounding factor.

monocasa · on March 6, 2020

Turn 'em off if they're not contributing.

rasz · on March 7, 2020

Spinning iron doesnt like start/stop cycles. Server drives go thru very few in their entire life for that very reason.

ec109685 · on March 7, 2020

These are ssd’s.

majdnemkocka · on March 7, 2020

They are not, see the link in the article: https://www.amazon.com/HGST-7-2K-SATA-Drive-Model/dp/B07XQRB...

ec109685 · on March 8, 2020

Oops. Thanks.

dahfizz · on March 7, 2020

87.5% efficient is not incredibly inefficient.

manigandham · on March 7, 2020

Compared to what though? How efficient is a typical data center? Probably way less than this.

DuskStar · on March 6, 2020

I think his point is that he's targeting low aggregate uptime, too.

Taek · on March 6, 2020

Definitely not, aggregate uptime is extremely high. We've never seen downtime do to network outages, only software bugs. And even then, only some users were impacted by the bugs, we've never in 5 years had a broad outage.

DuskStar · on March 6, 2020

Gotcha, and your reply to the GP clarified a lot for me!

arcticbull · on March 6, 2020

Here's the issue. We know that due to economy of scale and domain experience, AWS will always have the lowest cost (to Amazon) for storage -- whether that's totally-reliable storage, or sorta-reliable. If there was a demand for sorta-reliable, they'd build a sorta-reliable S3 and undercut you. Then, blockchain adds inefficiency. Therefore, it's basically impossible for any blockchain solution to have a lower total cost to provide storage.

Dylan16807 · on March 7, 2020

> If there was a demand for sorta-reliable, they'd build a sorta-reliable S3 and undercut you.

Amazon's goal is to get people to pay a premium to get access to their entire ecosystem of services. They don't optimize the price of each individual service. There's a lot of take it or leave it for their specific offerings. Look at how they used to have reduced redundancy as a cost saver, but don't anymore. If you want corner-cutting storage from Amazon, you get funneled into Glacier now.

So let's look at how glacier deep archive is $.99 per TB per month, with a waiting time. And Google's weird offering where it costs $1.23 per TB per month, with instant access, but you pay a bunch of money to access. That means they can't store it on tapes, but they expect to profit anyway. And they're probably not running that as a loss leader that depends on the archival data being accessed.

manigandham · on March 7, 2020

S3 Reduced Redundancy is replaced with S3 Infrequent Access (and another lower tier One-Zone Infrequent Access) so there's still some pricing flexibility available.

Dylan16807 · on March 7, 2020

Infrequent Access is closer. But wow, one-zone sure is only 20% cheaper than three-zone. And the retrieval cost is significantly more expensive than glacier's bulk price.

Actually, looking closer, is one-zone even storing fewer copies? It's offering the same durability, except in the case of "availability zone destruction". So if they're selling three copies for $10 a TB, then the equivalent price for 64-of-96 would be $5 a TB. Cutting that down with cheaper worse hardware would go a long way to get you toward the $2 goal.

manigandham · on March 7, 2020

S3 Standard and S3 IA store multiple copies spread across 3 zones. S3 One-Zone IA is stored as multiple copies but all within a single zone. They give you the same technical durability but less availability if that zone goes down, and if it's destroyed then you lose all your data.

hinkley · on March 8, 2020

That has the potential to cut both ways. If Amazon decides that S3 is their loss leader, then they would absolutely bury boutique players who only cover a few functional areas.

If instead they decide it's the cash cow, then it wouldn't be a stretch to predict that they price it a bit above the competition. The customer has the impression that they can amortize the cost across the other value-adds in the ecosystem. Whether the customer is right or not barely matters to Amazon from a purely financial aspect. All that matters is that they believe it to be true.

There's also the short-term versus the long term strategy. Short term they could do either and I wouldn't be surprised. Long term, I think I would expect the latter.

NohatCoder · on March 7, 2020

Where did you get those prices? I can't find anything cheaper than $4 per TB month at both Google and Amazon.

Dylan16807 · on March 7, 2020

https://aws.amazon.com/s3/pricing/

Glacier Deep Archive is .99/TB/month in US East and EU Ireland and several others, plus $2.50/TB bulk retrieval.

https://cloud.google.com/storage/pricing

Archive Storage is $1.20/TB/month in us-east1 and europe-west1 and several others. But (instant) retrieval is $50/TB, with no options...

throwaway9878 · on March 6, 2020

Everybody with more money than you can always undercut you in anything you ever do so why bother ever trying to do anything

enaaem · on March 6, 2020

Very good point! Let's say you move into a new market, trying to undercut the status quo. The status quo, with their power of scale and experience can just undercut you back. Who is benefitting from this? The customers! So if the customers want more competition they have to pay you to play. Which means they have to co-invest with you and promise to buy your service later.

A good is example are the Apple iPhone screens. If Apple wants a new competing supplier, they have to invest in new competitors.

nojvek · on March 7, 2020

Not really. Suppose giant A undercuts startup B where A is making a temporary loss but B runs out of money trying to complete, only A remains in the market and A starts charging a premium since it doesn’t have completion.

This is bad for customers in the long term. Like how Amazon is slowly destroying brick and mortar stores, or having Amazon Basics at the front undercutting other sellers. Monopolies mean you don’t have an open fair market anymore.

enaaem · on March 8, 2020

It's true that benefits for customers would likely be only temporary. But that is not my main point.

I'm explaining a common business move called pay to play, in industries that require very high investments for new competitors. If customers want B to compete, they have to pay B first to ensure B does not make a loss, so everyone wins a little in the end.

So in the case of OP. He needs to finds paying customers first, who are willing to pre-order his service.

nemonemo · on March 6, 2020

I think the point is that the cost should not be the main motivator. I would agree that there needs to be other differentiators in addition to cost, which could provide sufficient moat against other competitors, big or small.

jessriedel · on March 6, 2020

It's possible that there could be non-obvious innovations in how to save money with a low reliability threshold, which Amazon might not be able to effortlessly copy.

res0nat0r · on March 6, 2020

It should also be noted that you can get S3 storage for $1/TB/month already if you use the Glacier Deep Archive storage class.

paulryanrogers · on March 6, 2020

Did that include network IO to retrieve your data?

notyourday · on March 7, 2020

Nope. That's what actually kills you on Amazon - network costs.

Dylan16807 · on March 7, 2020

It's... interesting... that Amazon offers reasonably priced bandwidth on Lightsail, but it's against the TOS to use it in connection with other services.

notyourday · on March 7, 2020

Yeah, I have been proxy accessing my personal S3 that way, but I'm not really willing to bet a business on it.

zadokshi · on March 7, 2020

Not really. Have you ever done the maths on what it costs to recover 1TB in a recovery situation.

Definitely not worth it.

Dylan16807 · on March 8, 2020

Putting it into normal S3 only costs $2.50 a TB. That's very affordable.

To get out of Amazon entirely, they definitely want to gouge you, but it's not the end of the world. If $24 a year is an acceptable storage price, then Snowball export costing somewhere around $36/TB in bulk isn't too awful. (And if you don't have enough data to fill up a snowball, you can probably smuggle it out through lightsail.)

arcticbull · on March 6, 2020

My point is taking something, adding inefficiency, then pronouncing it's "cheapest in class" isn't logically possible.

badestrand · on March 7, 2020

You are confusing "cheapest option possible" with "cheapest option available". Maybe Amazon _could_ be cheapest and most efficient in everything but they aren't, that's why they make a lot of profit with their cloud services. Thus there can easily be a competitor with a cheaper offering, especially if they are cutting some corners as per the article, even if they have some inefficiencies in other areas.

paxys · on March 6, 2020

The point is not to do it cheaper but better.

wju3h3uwh · on March 6, 2020

Quantity is a type of quality. Lots of people mainly use cloud storage as a cold storage locker for backing up photos and documents. I'm sure there's ways to innovate in the space that would make certain clouds more appealing, but most solutions are already effective as simple storage lockers. Doing it better instead of cheaper at this point is more likely to be either a megacorp affiliation perk, like a discount on a YouTube Music subscription, or phone company style family plan discounts "Get discounted storage for the whole family if you pay slightly less than the price of four memberships!"

arcticbull · on March 6, 2020

... better than AWS with its SLAs, that every major company relies on to some extent?

paxys · on March 6, 2020

Yes. There's a reason companies like Digital Ocean, Heroku, OVH, ZEIT, Joyent, Rackspace, Linode, Cloudflare and many more have been able to survive and grow rapidly in an AWS-dominated space. None of them are competing by undercutting Amazon in price.

folmar · on March 19, 2020

OVH is certainly undercutting Amazon in price. The free bandwidth included in each instance is already a dealbreaker if you actually use the instance for anything other than very heavy cpu-bound loads with small output to send back.

DarthGhandi · on March 7, 2020

Ahh yes, the mythical four nines.

https://medium.com/@guisebule/the-sla-is-a-lie-babd16d629d2

Dylan16807 · on March 7, 2020

"Better" in a niche-filling way, not a "this is very well engineered, but for my use case it's very overengineered" way.

vincentlee · on March 7, 2020

market your product better, steal better companies people, lie, play political games, leverage relationships, kill

so many options

yourapostasy · on March 6, 2020

> ...they'd build a sorta-reliable S3...

From operational experience reports of S3 I've seen out there and discussions [1], including on HN [2], once you reach a large enough object count, S3 is "sorta-reliable". Objects will disappear. The durability claim is not SLA-enforced [3], which is uptime-oriented. The only remedy is a service credit.

[1] https://www.theregister.co.uk/2018/07/19/data_durability_sta...

[2] https://news.ycombinator.com/item?id=14880617

[3] https://aws.amazon.com/s3/sla/

dannyw · on March 7, 2020

Had about 20k objects and lost one (405s no matter what) about six month in.

I didn’t think much about it, but I guess I was really unlucky.

kmod · on March 6, 2020

I agree that AWS has economies of scale that makes it hard to build a better/cheaper S3, but one way you can get around this problem is by building to lesser requirements. S3 has to work for all use cases, but if you know you need less of something costly (say, less IOPs) you can build a system that's cheaper than S3 even if the individual components are more expensive than S3 is paying.

notyourday · on March 6, 2020

This is actually not correct.

Because of AWS scale it cannot be the cheapest. Amazon cannot use odd lots or small lots, which means that it has very few possible suppliers and those suppliers will never be the cheapest.

Same goes for the network infrastructure. While it is possible buy 100Gbit/sec at a throw away price to because some sales person needs to make his numbers and neither he nor his director of sales thinks Kmod Hosting would be able to fill the pipe it is buying, Amazon would need 900Gbit/sec for their exit and no one at the vendor is going to blink at making Amazon pay higher price than Kmod Hosting.

late2part · on March 6, 2020

you are wrong that AWS pays more than small vendors.

jjeaff · on March 7, 2020

That's not what they said. They pay more than small vendors, sometimes.

If you are able to take advantage of sales and clearance deals here and there, you are very likely to get a lower price than even the biggest buyers.

dannyw · on March 7, 2020

1 month of sales hunting has got me a PC rig for literally 2/3rd of the cost of buying without waiting. And this was an optimised build in the first place.

vidarh · on March 7, 2020

They probably could,but they're showing no sign of wanting to compete on cost.

E.g bandwidth costs on AWS are high enough that if you actually serve up lots of data from S3 you can typically afford to rent servers to cache all of it 'in front' of AWS and still save a ton of money.

S3 only gets close to competitive if you never access the data from outside of AWS.

Which gets to the point: If you use an AWS service like S3, you pretty much has to use other AWS services if you want the cost to be even somewhat reasonable in aggregate.

They don't need to compete on cost, because once they get you to buy into one set of services, moving any one set of services off AWS gets more painful and/or costly, and a full migration looks too scary for most people.

S3 will never be priced to undercut anything but big players for that reason. They need to be competitive with Google and Azure, because those guys can offer to offset transitioning costs and generally aggressively target AWS customers.

A small player isn't the same threat even if substantially cheaper.

Taek · on March 6, 2020

Sia is a lot more than a low cost storage platform. It's a full reimagining of how the cloud should work with an emphasis on user ownership and control, open access for developers, and ultimately decentralization that allows users to be certain their applications will always work (no more "RSS Reader is shutting down").

This post is intended to address people who do not feel that decentralization can be cost effective at scale.

jrochkind1 · on March 6, 2020

Where do I find the introduction what Sia is? I tried clicking around in the headers from OP, but couldn't find it within my very lazy tolerance.

Taek · on March 6, 2020

https://blog.sia.tech/skynet-bdf0209d6d34 - this is probably the best thing we have at the moment.

https://support.sia.tech/article/dk91b0eibc-welcome-to-sia - this support article is also a good introduction to the network and why it was built

beedrillzzzzz · on March 6, 2020

Website: https://sia.tech/

Also take a look at Skynet, a file sharing protocol built on top of Sia: https://siasky.net/

easygenes · on March 7, 2020

I cringe every time a project uses the name "Skynet". First, it's overused. Second, it's literally the name of the AI project that went rogue and tried to kill all the humans in the Terminator movie universe. Not exactly the best association. Just find another name.

ethbro · on March 7, 2020

On the other hand, I am filled with positive associations at the name.

Reliability, redundancy, disaster recovery, cost cutting, and absolute focus on execution!

Evidlo · on March 7, 2020

See Soylent.

ineedasername · on March 7, 2020

What aspect of this infrastructure is user-owned?

KarlKemp · on March 7, 2020

You re certainly reimagining the marketing slang of a few years ago (a decade for the gratuitous dig at Google Reader)

risyachka · on March 7, 2020

Amazon has huge margins. They could sell you the same services for waaaaaay less money. They just won't.

jknoepfler · on March 6, 2020

having worked at aws, all I'll say is I think you're wildly over-estimating their agility

Hamuko · on March 6, 2020

Care to share any funny stories?

scarface74 · on March 6, 2020

Isn’t that what AWS S3 one zone infrequent access is?

animalnewbie · on March 7, 2020

> (paraphrased) domain experience means they ae best

As someone who has worked with a couple of market leaders in their respective fields i want to disssaude anyone who would listen of this notion. NO, the market leaders aren't doing things optimally. Some things are downright stupid.

kllrnohj · on March 7, 2020

The buildout in the article doesn't work. You can't plug in that 4-lane SAS SFF-8087 splitter cable into that motherboard. You're only getting 8 hard drives per motherboard with that setup, not 32.

That puts the cost of 192 TB at more like $6240‬, not $4945. Could be less if you find a good deal on mini-SAS PCIE cards, but still going to be substantially higher than $4945.

ADefenestrator · on March 14, 2020

Even if it had slots for the splitter cable, Intel and AMD onboard SATA explicitly doesn't support port multipliers as far as I know. You can buy PCIE SAS cards that do for relatively cheap, but then you have to find board with enough PCIe slots. Easy enough on the "gamer" boards but if you want ECC (and you probably do, for storage) and IPMI (you probably do, if you have more than a few dozen servers) your options get much more limited. Other than 1 or 2 Asrock Rack boards, you pretty much have to move into Epyc 7000-series or Xeon Silver or above. Often dual-socket on the Xeons to get a board with lots of PCIe.

In theory something like an Epyc 3000-series with lots of PCIe or onboard SATA that supports port multipliers would work great, but I don't think anyone actually makes that.

notyourday · on March 7, 2020

You are going to get a 4U server that takes 60 disks and populate it with SAS cards driving them using SAS-2-SATA cables.

anonymousiam · on March 6, 2020

The third sentence of your Medium article says; "Despite this, the Sia network is able to achieve 99.9999% uptime for files." How do you achieve this in a "not a highly reliable facility"?

prepend · on March 6, 2020

Redundancy, I assume. Lots of unreliable facilities and nodes can be very durable in aggregate.

jakewalker · on March 6, 2020

Sounds kinda like the people who thought that bundling a bunch of bad mortgage debt together in slices could get it rated AAA.

rpdillon · on March 7, 2020

It does sound like that! The mistake there was to assume the failure of one mortgage was statistically independent of the failure of another, which is obviously incorrect for many failure scenarios. In this case, it it would be similar if all nodes were in, say, the same datacenter. That doesn't appear to be the case, but there may be other dimensions on which the network lacks the required diversity to support the reliability claims (disk vendor and age...? There must be others.)

feross · on March 6, 2020

Your mind is going to be blown when you learn how TCP/IP works.

allendoerfer · on March 8, 2020

Or computers for that matter. Deep down it is not really about 1s and 0s, more like thresholds in between 1 and 0.

To me a big part of computer science is abstracting away unreliable details to make them seem reliable.

_revy · on March 7, 2020

Not really. Just probability. If you have fully redundant services then ALL of them have to go down to have an outage. Suppose you 5 copies with 75% uptime each. The probability that all of them are down is 0.25^5 ~ 0.0009

Now of course that assumes they are uncorrelated, but since Sia nodes are distributed across the internet, that's likely as opposed to multiple servers at a few data centers like AWS.

Turns out mortgage bonds tend to be correlated.

iampims · on March 7, 2020

They all run the same software stack though, so despite being deployed on diverse hardware, so they can’t claim to only have independent failure modes.

_revy · on March 7, 2020

True. It would be interesting to analyze what others aspects are correlated.

__jal · on March 6, 2020

Reminds me of a saying from the first dot.com crash. (Or at least I heard it then first.)

Tying two bricks together doesn't make them float.

kybernetikos · on March 7, 2020

It's a great saying, but aren't all large ships these days made out of components that don't float individually?

mcdevilkiller · on March 7, 2020

No, they are mostly made of air, which floats (Partial sarcasm, as that is what makes them float)

zchrykng · on March 7, 2020

I mean, the concept actually works, but you have to understand what is actually going into the bundled product. There is no reason you couldn't bundle 100 million in mortgages that had a 10% default risk and sell 10 million of that bundle as AAA.

It is when you just start bundling everything and then saying the entire thing is AAA that the trouble starts.

statictype · on March 7, 2020

Except the people who put those sub prime mortgages in probably didn’t apply rigorous statistical analysis to their situation.

Aunche · on March 6, 2020

Redundancy is kind of meaningless unless if you have sufficient diversity. If all your replicas are located on the same rack, you're screwed if there is a minor disaster like a sprinkler going off.

teknopurge · on March 7, 2020

This is why the post is a red-herring. You cannot use storage if you cannot get to it. Talking about the price of disk space/mth is absolutely useless without talking about the bandwidth required to use it, which means, if you want to go big-boys-math, 95% bandwidth and peering agreement pricing. That $1.50 with all the network chatter between hosts for replication/repair/etc., not to mention actually _downloading_ of the data from the Sia network will take that $1.50/TB/mth to $70/TB/mth.

Talking about the disk-space pricing is, IMO, disingenuous. Talk about the math for an all-in use-case of the service. I appreciate what the OP is going for, however, hand-waving around uncomfortable critical items like bandwidth cost for the Sia network is not a good look.

Dylan16807 · on March 7, 2020

> not to mention actually _downloading_ of the data from the Sia network will take that $1.50/TB/mth to $70/TB/mth

You really need to back that math up.

Here's my math: 10mbps is over 3TB per month at max utilization, so let's say it's good enough for either 2TB throughput with semi-even use, or 1TB throughput if the use is really focused on part of the day.

When buying transit in bulk in the US, Google tells me that 10mbps was very roughly $8.50 a month in 2017, $4.50 a month in 2018, and $2.50 a month in 2019.

Are you expecting the data to be rebuilt every single day or something?

im3w1l · on March 6, 2020

You will be able to boast of 99.9999% uptime up until that sprinkler goes of. And with luck, it could be decades until it does.

toolslive · on March 7, 2020

OTOH, your object storage can be even cheaper if you don't do Reed-Solomon erasure coding, but use rateless erasure codes. For example, online codes[1] have been used by Amplidata to have more reliable storage with lower overhead. There are downsides however (no partial reads, no mutability, ....)

[1] https://en.wikipedia.org/wiki/Online_codes

samstave · on March 7, 2020

[flagged]

_revy · on March 7, 2020

You're taking "cutting corners" out of context. He is describing how individual nodes don't need to run at the standards of regular data servers and are hence cheaper, but in aggregate provide a reliable service.

> So, I assume you’re quite young.

So you do you have a technical criticism?

_ugfj · on March 6, 2020

> I didn't read all their math but I expect their final result to be off by a factor of 2-5x.

Can't be more than 2.5 because Backblaze B2 already gives you $5/TB/Mo.

TrueDuality · on March 6, 2020

Backblaze is operating at their own economy of scale with dedicated deals with suppliers, custom bare bones hardware, optimized processes, etc. It's always more expensive starting out as a little company for raw hardware and processes until you're big enough and mature enough to get the deals and processes in place.

That is also not the only service that Backblaze offers and wasn't their first. It could be that B2 is simply a way for them to offset their cost for extra capacity and are running it effectively near-cost for them.

Aeolun · on March 6, 2020

If all their other customers on the $5/month unlimited backup plan store less than 1TB, but since it’s really aggressive about backing up everything (no, please don’t back up my steam games) on your computer I think they go over that.

Dylan16807 · on March 7, 2020

Here's the histogram from their AMA: https://i.imgur.com/iVEuwUT.jpg

About a third of the users are under 100GB.

Another third is under 500GB.

13% from 500GB to 1TB.

9% from 1TB to 2TB.

8% from 2TB to 5TB.

5% above that.

And they cited raw costs of over $3.50 per TB. But of course that's with real datacenters and non-shoddy equipment.

_ugfj · on March 7, 2020

Don't forget the vast amount of notebooks with SSDs that have less than 1TB storage to begin with.

dannyw · on March 7, 2020

Backblaze deduplicates data right? So it doesn’t really cost them anything to back up your steam games.

lucb1e · on March 7, 2020

They can't deduplicate properly encrypted data, that will have to happen client-side. So while they can deduplicate your data (if you have a copy of a picture in multiple places or maybe even a game installed on multiple machines), it won't work across users.

Aeolun · on March 9, 2020

In that case I’d also expect them to skip uploading the data, but everything is uploaded.

jan6 · on March 7, 2020

they'd probably charge you the same anyway, even if THEY get it at basically free ;)

throwaway3157 · on March 6, 2020

> Can't be more than 2.5 because Backblaze B2 already gives you $5/TB/Mo.

Well it can be, if they have a lot of inefficiencies. Backblaze could have more experienced engineers who overcame these. I assure you, I can accidentally design a very expensive storage system as I’m not that smart ;)

pepemon · on March 7, 2020

Who are you?

tgsovlerkhgsel · on March 7, 2020

Scaleway C14 [1] gives the $2/TB/month that the title promises. Now, providing Sia storage space may be more expensive because you have to calculate costs for proof-of-storage (and may not be able to pull off the "cold storage" model, which may also only work in conjunction with a higher-priced low-latency offering), plus potentially more replication than a centralized service uses, but it does indicate that it's probably not completely unrealistic.

[1] https://www.scaleway.com/en/c14-cold-storage/

ci5er · on March 7, 2020

Backblaze doesn't give you a random read/write like DropBox, and wanting your disk drive back takes some time...

vikiomega9 · on March 6, 2020

> exabyte-scale storage system

Somewhat of a random question, can you point me to some state of the art research?

wmf · on March 7, 2020

https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pock...

http://static.googleusercontent.com/media/research.google.co...

https://queue.acm.org/detail.cfm?id=1594206

http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Ke...

sydney6 · on March 6, 2020

Storage Systems of this scale are thesedays almost exclusively built around object based storage rather than "legacy" block or file backed solutions. I guess today, min.io is would be the way to go. (to "go".. little pun on the end:)

statictype · on March 7, 2020

What does object-based mean? How is an object different from a file (which I presumed was a collection of blocks)?

tw04 · on March 7, 2020

It means you access your object through a GUID. Think about it like parking your own car vs. a valet. When you park your own car you need to know the address of the garage you parked in, the floor you were on, and the spot you were in. When you valet park, you hand the attendant a ticket and he brings your car back.

With a standard fileshare, you need to walk the filesystem to retrieve your file - this incurs a ton of metadata overhead. It also means when you've got potentially billions of files in a directory, it can be slooooowww. All the metadata requests also make it very chatty - so doing it over a WAN link tends to be extremely painful if it works at all. Newer versions of SMB and NFS have done a lot to batch the metadata requests but they are still protocols meant to happen at extremely low latency inside a datacenter.

toolslive · on March 7, 2020

Some object stores do this, but aws S3 for example does not. You can list the contents of buckets, nicely sorted by name. You can mimic directory structures if you want.

However, you touched a key point: object stores are all about throughput, not latency. You can store at a GB/s (if you have the pipes), but even checking if an object exists will cost you a few milliseconds.

statictype · on March 7, 2020

Got it. Thanks for explaining.

tgsovlerkhgsel · on March 7, 2020

My guess: No random write access to objects, you can (at best) append-only but often you can only append until the object is finalized, and cannot read it until it is finalized.

toolslive · on March 7, 2020

it's like a key-value store, (or a dictionary). However, the values are objects (big blobs of data). This means you can't update parts of objects without rewriting the blob. However, most of the object stores offer metadata operations (move, tag, ...), concat of n objects into 1 and partial reads.

londons_explore · on March 6, 2020

Google's clusters are all block and file based (depending on what layer you want to use them at)

ghostpepper · on March 6, 2020

I am not the op nor an expert on this, but I think https://ceph.io/ targets clusters of this scale.

Mave83 · on March 7, 2020

Yes, with croit.io you can manage Ceph clusters with ease, reducing Laber costs and increase reliability.

We build clusters of low PB scale that have a TCO with everything from labor to hardware, from financing to electric, from routers to cables and can be run below 3€/TB. For that you can store data in Block(rbd, iscsi), Objekt(S3, swift) or Filestorage(CephFS, NFS, SMB), high available on Supermicro hardware in any datacenter worldwide.

Feel free to contact us or use our free community edition to start your own cluster.

z3t4 · on March 6, 2020

Storage can be far cheaper when decentralized. Sending data over the Atlantic is super expensive compared to LAN networking. Almost all content providers peer with ISP's with onsite hardware. But why stop there, put the "racks" in ppl's basements. Data storage is very compact now a days, you can probably fit 100 TB is a shoe-box.

jan6 · on March 7, 2020

that'd need trust that they won't run off with your drives, their basement won't get flooded, there aren't power outages often, they have good protections against surges and whatnot... which at ANY datacenter, is automatically included ;)

Dylan16807 · on March 8, 2020

You need to trust that not too many of them will have that happen at the same time.

Which is not that hard to do.

And all those protections in a datacenter are far from free.

winfred · on March 6, 2020

>I didn't read all their math but I expect their final result to be off by a factor of 2-5x.

I looked at their parts list and it's obvious they aren't serious. CPU is missing, memory is missing, SAS to SATA cables, but no SAS controller, no mounting for the system board. Low effort at best.

prostheticvamp · on March 7, 2020

It seems you’re getting heavily downvoted. I think it’s because CPU and memory are not missing.

winfred · on March 10, 2020

That wasn't on that page when I opened it, you think I'd miss that? They were reading the comments here and updated the page after I wrote the comment.

The price was $4700 something, now it's $4945.

They simply smooth over it with this:

"So we will be using a rig cost of $4500 in our spreadsheet."

That way their overall math doesn't change. Should get banned for doing things like that. These guys are up to no good.

notyourday · on March 6, 2020

We have done this calculation and even if you put your gear into Equinix/Digital Realty in the most expensive places and use Backblaze-type setup ( which is not optimized and buying retail) bringing 10Gbit to every 4U the price for double-writes at 5TB disks are $10/year per TB.

FalconSensei · on March 7, 2020

> Hard drives are a surprisingly low percentage of the cost of a storage system

THIS! There's no use of a 2TB storage if you can't upload/download this amount each month

tlb · on March 7, 2020

Many businesses have TB of data they upload and keep for several years, looking at it once or twice over that time if ever. For these you could provision monthly bandwidth as low as 1/30th of the total storage.

Dylan16807 · on March 8, 2020

It says right there in the article that bandwidth is charged for separately. So you just buy a line of appropriate size, based on actual usage measurements. And each one of these rigs only needs a gigabit connection to upload/download its full capacity each month, which means the network equipment costs can be minimal.

jan6 · on March 7, 2020

have you ever heard of backups and archiving? ;) there are lots of scenarios when you'd want to write some amount, but not read it back for a while, if ever...

late2part · on March 6, 2020

I agree with you. Not 2-5x but they are rounding down on costs and optimistic on risks.

walrus01 · on March 6, 2020

I work in telecom/datacenter infrastructure and this is fanciful. The whole way they take the wattage load of one machine and then hand wave away all of the rest of the costs of either building and running a datacenter, or paying ongoing monthly colocation costs... Is just scary. I truly don't mean to offend anyone but this looks like a bunch of enthusiastic dilettantes.

Generators?

UPS?

Cooling costs?

Square footage costs for the real estate itself?

Security and staffing?

At the scale they intend to accomplish they will need at minimum several hundred kilowatts of datacenter space. Even assuming somewhere with a very low kWh cost of electricity, that much space for bare metal things isn't cheap. Go price a lot of square footage and 300kW of equipment load in Quincy, WA or anywhere else comparable, the monthly recurring dollar figure will be quite high.

And all of that is before you even start to look into network costs to build a serious IP network and interconnect with transits and peers.

Ajedi32 · on March 6, 2020

They're not talking about a datacenter. Datacenters need to be reliable. Sia storage pools don't, because security and reliability is achieved at the global network level, not at the level of individual systems or storage pools. 95% reliability means you can be down for two whole weeks out of every year and still be well within acceptable uptime requirements.

Generators? Who needs those? Just wait for the power to come back on. UPS? Why bother? Square footage? Stick some wooden shelves in the cheapest building possible. Cooling? Locate in a cold climate and buy some window fans.

This isn't anything like the sort of infrastructure you're used to dealing with. Think Bitcoin mining farm, not Backblaze datacenter. Any corners that can be cut will be.

throwaway3157 · on March 6, 2020

Who are the customers for such unreliable systems?

Ajedi32 · on March 6, 2020

Sia is very reliable from the customer's perspective. Its only individual systems and storage pools that have lax uptime requirements. Thanks to some clever network-level redundancy mechanisms (10-of-30 redundancy), 95% uptime at the storage pool level translates to 99.9999% uptime from the customer's perspective. See the "Uptime Math" section in the OP for details.

ac29 · on March 7, 2020

When you says Sia is reliable, do you mean it could be reliable if hypothetical X, Y, and Z things happen?

Because according to the homepage (sia.tech) there are only 895 hosts storing a total of 206TB right now, which is a very, very small amount. Backblaze, a relatively small player as compared to the big cloud providers has 1.1 million TB of raw capacity as of last year (redundancy reduces the available capacity, but still) [0].

[0] https://www.backblaze.com/blog/hard-drive-stats-for-2019/

Ajedi32 · on March 7, 2020

By reliable in this context I mean robust against hardware failure. (Including the failure of entire storage sites.) The OP explains the math and associated assumptions for how they derived that "99.9999%" figure, and acknowledges that since the calculated chance of data loss due to hardware failure is so infinitesimally small, other failure modes outside of what they modeled are likely to dominate.

As for the relatively small number of hosts at present, 895 is more than enough for 10-of-30 redundancy to work "as advertised". You really only need 30 hosts technically. The bigger issue I think is the relative immaturity of the software. Sia is still pretty new compared to most other data storage systems; and although I've never heard of any software bugs in Sia resulting in data loss, that doesn't mean such a bug will never be discovered. Be cautious, keep backups, and never rely on any single storage medium to store your data.

system2 · on March 7, 2020

It is not the answer to the question above. What is the type of customer uses this company?

ericflo · on March 7, 2020

Users of Sia are the customers. So your question reduces to: who uses Sia? It used to mostly be interesting for customers interested in reliable large-file data backup on the cheap, but now it has expanded to customers looking to do content distribution on the web, etc.

qes · on March 8, 2020

> Think Bitcoin mining farm, not Backblaze datacenter. Any corners that can be cut will be.

And yet Sia is about half the cost of Backblaze (i.e. not much savings).

Hard to imagine situations where this is a good trade-off.

Taek · on March 6, 2020

Its super interesting to dive into the world of cryptocurrency mining, where some DCs are getting PUE of 1.1 or better with a buildout that's basically amounts to shelves and box fans.

No generators, just eat the downtime. No batteries. No 24/7 staff. No racks, just shelves (folded sheet metal is cheap). Security varies from farm to farm.

These servers don't need to run cool, as long as you are in a climate that doesn't get over 100 degrees you can get away with fans and no AC.

walrus01 · on March 6, 2020

If all you intend to do is generate hashes and run for the exits when your mining equipment is no longer economical, due to difficulty increase, sure.

Reliable data storage for paying customers is a very different thing.

ajross · on March 6, 2020

Per-site reliability (beyond a stated/assumed 95%, which is right around the "I put a computer on my desk" level) isn't a design goal, though. You can argue with their math or their assumptions, but you can't say they're wrong for not designing in the reliability features you list above.

walrus01 · on March 6, 2020

Things on my desk are meeting three nines or better of uptime at present, 95% would be 18 days hard down per year.

ajross · on March 6, 2020

I think it very much depends on the desk. I mean, power alone at my home just barely reaches three nines. Internet connectivity glitches out occasionally. I'll fat finger configurations and reinstall stuff for some more hours a year. It all adds up.

You're right that you can take the same hardware, add a $70 UPS and some thought and care, and do much, much better. But my point was only that "95%" is a trivially achievable goal even in the most naive setups, which makes me treat their analysis with a little more credence than most folks here.

ubercow13 · on March 7, 2020

Which should give you even more confidence in Sia's reliability, no?

thawaway1837 · on March 6, 2020

Crypto currency servers didn’t need to last though. Their primary components were being replaced regularly with the latest generations.

Here they’re depreciating components over 7 years, so will those components last 7 years under those conditions?

jrockway · on March 6, 2020

I think it's interesting to dive into the economics of 95% uptime. Maybe you don't need full datacenter cooling; you can just locate in a cool climate and have a fan in the window blowing cold air in. If there's a blizzard, you lose your drives because snow blows in and melts on your drives. If the chance of the blizzard * price of drives is less than needing 750W of cooling, then you win. Yeah, sometimes everything shorts out.

Power is similar. Maybe you just use solar and turn the drives off when it's cloudy. With enough distribution throughout the world, it will probably be sunny somewhere.

I haven't done the math and I'm not saying it will work out favorably. I also don't have a use case for 95% availability. (That's two weeks a year where your data is gone!) But it's something that someone with the right needs could consider, and maybe come out ahead of someone shooting for 5 nines and drives that aren't covered in snow.

walrus01 · on March 6, 2020

OVH, the massive hosting company, is located in a former aluminum smelter next to a hydroelectric dam in Quebec. They keep costs as low as absolutely possible, but their operating costs as a whole are still considerably higher than something designed with one nine of uptime.

I get that you're mostly joking by saying "yeah sometimes everything shorts out", but we have electrical codes in the US/Canada for a reason.

londons_explore · on March 6, 2020

In most places you don't need to build to code, as long as you take other steps to mitigate risk.

Saying "this entire building is designed to catch fire, and if it does, it won't do harm to neighbouring buildings, people, or the environment" is probably a good start.

ineedasername · on March 7, 2020

If you're in the developed world, that isn't going to fly. "other buildings won't burn down" isn't going to get you a pass on these things.

generatorguy · on March 7, 2020

Insurance?

mkl · on March 6, 2020

> I also don't have a use case for 95% availability. (That's two weeks a year where your data is gone!)

Gone from only one of the places it's stored. Your data is still available even if just 10 out of 30 servers are online.

wmf · on March 6, 2020

They're not talking about a real datacenter; they're talking about a deathtrap crypto mine with hard disks instead of GPUs/ASICs. Can't you run a million hard disks off a consumer cable modem? ;-)

At the (slow) rate Sia is growing, I don't think there will ever be enough demand to justify this design anyway.

breakyerself · on March 6, 2020

How much is a terabyte of dropbox storage?

caymanjim · on March 6, 2020

Dropbox is $10/TB/mo, which is the "old" industry standard (and still what most big guys charge). Backblaze is $5, but they charge egress fees, so usage matters. Dropbox doesn't charge for egress, but there are limits on "Public" folder bandwidth (and you can't pay for more if you max out). I don't think there are limits on private content, but I bet there's a soft throttle or some limit that will get you contacted.

There aren't a lot of options for <$5/TB and they all charge egress. I've tried Backblaze, pCloud, and DO Spaces. My specific use case is storing full HD movies for streaming via Plex, which requires a fairly reliable 6Gbit/second, and most of them can't keep up. DO Spaces did the best of all the ones I tried, so I'm using that for now, but it's just skating the edge of usability.

comex · on March 6, 2020

Surely you mean Mbit, unless you’re streaming uncompressed video for some reason! :)

bscphil · on March 7, 2020

Although it's certainly a typo, ~6 Mbits a second isn't enough for more than poor quality 1080p video, roughly equivalent to streaming from Netflix or Hulu. If you were streaming a backup of a UHD Bluray, for example, the bitrate could be over 100 Mbps.

caymanjim · on March 7, 2020

Indeed. HD streaming seems to need about 5.4Mbps reliably. Spaces is working well enough, but sometimes I have to pause for ten minutes to buffer enough for the rest of the film. It usually works, but with variable bitrate it must spike at times.

judge2020 · on March 6, 2020

I'd love to see someone try running a TV network off of DO spaces :)

harias · on March 6, 2020

Did you try GSuite Business account with 5 accounts? Google offers unlimited cloud storage if you get more than 5 accounts (5*12 = 60 USD/month and no egress)

londons_explore · on March 6, 2020

It's 'unlimited' until you use it too much, and then all your google accounts will get perma-banned...

caymanjim · on March 7, 2020

I've read this as well, so I didn't try. I also don't want to spend that much. I have 120 movies in Spaces and it only costs me about $8/mo. My egress is under the free tier so far, and overages amount to about four cents per movie stream.

ubercow13 · on March 7, 2020

Any evidence of that? Isn't it likely that Google has plenty of commercial customers using this service with large amounts of data?

animalnewbie · on March 7, 2020

For $6 per month (paid yearly) you get 1tb of OneDrive + office (desktop applications). My math maybe off but if you buy office every 3 years for $200 then you're already even.... And 1TB of storage is just free really.

mekster · on March 7, 2020

You could also say, MS Office is virtually free for the 1 TB these days.

caymanjim · on March 6, 2020

I should add that places like Backblaze and pCloud and DO Spaces and others charge per GB stored, so if you're using 1.3TB it's prorated accordingly.

john_minsk · on March 6, 2020

0. You can only get 2 TB+

snazz · on March 6, 2020

It sounds like they're intentionally forgoing backup power and UPSs, as well as 24/7 staffing. However, you're right that this is probably pretty optimistic.

late2part · on March 7, 2020

If you’re paying more than $300k/mo for 300kw of power in Quincy the DC sales guy probably bought your purchasing people a boat.

late2part · on March 6, 2020

I don’t really agree. Retail Datacenter pricing all in should be well under $500/kW; half of what they suggested.

mtlynch · on March 6, 2020

In 2018, I spent about six weeks running a series of tests to measure Sia's real world costs. At that time, storage cost ~$4.50/TB on Sia to back up large real world files (backups of DVDs and Blu-Rays).[0] Community members have re-run my tests every few months, most recently in October 2019, when the cost was measured at $1.31/TB, though it's worth noting that recent tests use synthetic data optimized to minimize Sia's cost.[1] It's also unclear how much the market value of Sia's utility token affects these costs, as the price of Siacoin has fallen by ~80% since I conducted my original set of tests.

The calculations in today's blog post account for the labor cost of assembling hardware, but leave out major other labor costs:

1. You need an SRE to keep the servers online. Sia pushes out updates every few months, and the network penalizes you if you don't upgrade to the latest version. In addition, to optimize costs, you need to adjust your node's pricing in response to changes in the market.

2. You need a compliance officer to handle takedown requests. Since Sia allows anyone to upload data to your server without proving their identity, there's nothing stopping anyone from uploading illegal data to the network. If Sia reached the point where people are building $4k hosting rigs, then it's safe to assume clients would also be using Sia to store illegal data. When law enforcement identifies illegal data, they would send takedown notices to all hosts who are storing copies of it, and those hosts would need someone available to process those takedowns quickly.

[0] https://blog.spaceduck.io/load-test-wrapup/

[1] https://siastats.info/benchmarking

marcinjachymiak · on March 6, 2020

Another cost associated with your 2nd point is the collateral a host would have to burn to comply with take-down notices.

reggieband · on March 6, 2020

I'm going through Sia's website now. It seems this article is meant to bolster the claim on their website which states "When the Sia network is fully optimized, pricing will fall somewhere around $2/TB/month." [1]

Call me skeptical but it seems that they aren't committing to building out this infrastructure themselves or providing a specific amount of storage at this pricing. They seem to be outlining a potential infrastructure that some enterprising individual (or corporation) could use to provide storage at that price to "renters" within their marketplace.

I guess I'll just wait until someone puts their money where their mouth is. Given that this is a marketplace, the fact that a theoretical setup could be built to provide some service doesn't necessarily guarantee it will be built.

1. https://support.sia.tech/article/thvymhf1ff-about-renting

tastroder · on March 6, 2020

Since you're bringing up the website... I don't get this marketing strategy. The cryptocurrency angle is just as off putting as telling me as a potential customer that my data will be stored on janky servers in unreliable places, no matter if the uptime is the same. OP even claims they have reliability and experience I'd consider but those aspects sure send signals that make me not want to deal with that stack.

Just looking at the website for Sia I see a bunch of fluffy marketing stuff, fair enough, that's normal these days. But where is the selling point? https://sia.tech/technology tells me my data is stored securely and in a redundant manner, great, just like any storage provider. That is followed by "Renters And Hosts Pay With Siacoin" and talks about payment channels, which links to a wikipedia article and not something that tells me how I would even pay them, not to even talk about how much (I saw the calculator thingy on my way to that site, the messaging is still weird).

The "Getting started" call to action is a similar experience, a bunch of downloads, cool - I don't even know if you're right for me yet. I'm five levels deep into the "Getting started guide" linked there and so far found that I'd apparently have to deal with weird crypto exchanges to pay somebody for this, plus I couldn't use most of my pretty standard tooling anymore (at least not without involving one of those proxy things on the getting started page that cover a few use cases, some of which seem to be operated by others?).

ericflo · on March 7, 2020

It's live right now though, the community has deployed the infrastructure [1] and the pricing is approximately what they claim [2].

1. https://siastats.info/hosts_network

2. https://siastats.info/storage_pricing

ADefenestrator · on March 14, 2020

That's the price being paid for the storage, but is it actually covering the cost of providing that storage? Or is it just 100-300 people who thought "Huh, neat, I'll toss a host online and see how it goes" ? I'd lean towards the latter and assume those storage costs are heavily subsidized by a few people satisfying curiosity.

jakear · on March 6, 2020

> That means about 2 hours of labor per rig. We’ll call that $50

Does that seem low to anyone else? I don’t really have any background in the area, but 25/hr cost to the company would be less than 20/hr pay for the skilled labor. Other countries are different of course, but in US I could make that much flipping burgers in the right area.

eyegor · on March 6, 2020

It's outrageously low. They're also fancifully assuming cpu tdp = electrical power, cooling = 0w, and another 0w to the motherboard/network cards. And each box has 1 non-redundant, schmuck-grade $80 psu, as well as a consumer grade mobo. This would never be anywhere near their uptime.

Taek · on March 6, 2020

CPU TDP is accounted for, the CPU linked in the blog post draws 65w and that is used in the electricity calculations.

I did realize that I completely forgot about RAM, when I get back to a computer I'll have to make some updates, but it won't materially move the numbers, there's 33% margin of error between the number in the spreadsheet and $2 / TB / Mo.

The $80 PSU is what I could link to from Newegg, I do have experience in industrial electronics, and I know from firsthand experience that you can buy a 10+year PSU at 93% efficiency for well under $80 at 300 watts. At that level, you're going to be able to request all the required cabling as well, which means you're getting a much better price than the $7 per cable linked in the post.

95% uptime means 18 days of downtime per year. Consumer grade PSUs and mobos do much better than that.

eyegor · on March 6, 2020

You included tdp but tdp /= actual electrical power. Intel approximates it from base clock (practically all consumer motherboards ignore the tdp/boost spec). AMD uses voodoo to calculate tdp, it's a pure marketing number with no basis in power usage. To make matters worse, motherboards typically go nuts with voltage on consumer boards, and power draw can vary wildly depending on what instructions you're using. There are a crap load of x86-64 extensions.

Yes, you forgot ram. And network chips, motherboard power delivery losses, motherboard power usage, cabling losses, etc. I'd guess it would total 50-100w, but feel free to current clamp the psu rails to get a realistic number.

As for the 95% uptime, I agree with you. I wasn't considering how much breathing room that actually provides, I was just going with my instincts.

Taek · on March 6, 2020

Assembling computers and replacing drives is generally speaking unskilled labor, right there next to factory workers.

ac29 · on March 7, 2020

Right, the people putting together servers for Dell or HP certainly are not getting paid $20 or $25/hr. If your standard of quality is "crypto shitbox", even $10/hr seems more than enough. This is assuming that many of these storage farms are going to be in places like Vietnam or Romania, not the US or Germany.

lucb1e · on March 7, 2020

If you're from the USA, perhaps just ×2 any price you read if that helps you get through the article. Work that needs doing doesn't always need to be done in high-income/high CoL places.

I earn about €26/h before taxes in western Europe, an income which lets me live in relative luxury (not "private jet" luxury, but I literally do anything I want and still save more than a third of my income with a 36-hour work week), and that's for security consultancy which is way more specialised than the job you're talking about. I think it's also above the national average, but I don't have the statistics on hand. Not sure what the cost to the company is, I think they put in another hundred a month for health insurance or pension or something (they pay 50% and I pay 50%, though I don't see why keeping 50% off my payslip helps anyone, an employer will just deduct that from the salary they can offer) plus some overhead for accounting and whatever, but it's probably not that far off.

chrisseaton · on March 6, 2020

I don't think they mean engineer time here - it's assembling the servers, so technicians, and yeah I guess yeah like most people they aren't paid like in-demand software engineers. This is how most people have to get to by!

benhurmarcel · on March 6, 2020

In some developed countries you can get qualified labor that cheap. For example in Southern or Eastern Europe. Not to mention developing countries of course.

xmzx · on March 6, 2020

Not to mention, this work isn't technical in nature. It's very quick to pick up, probably more so than landscaping tbh.

ComputerGuru · on March 6, 2020

There is way too much hand-waving and assuming going on this article. It is a load of BS that does not take into account real-world inefficiencies. e.g. sometimes buying in bulk is more expensive than buying at retail, esp when you need consistent supply. Sure, you may need only an hour of sysadmin time a day, but what sysadmin will let you employ them an hour a day? The buildout did not list a CPU. The assumptions about uptime are over-amortized, an outage given the resources they quote may average out to 95% uptime but their latency for getting systems back up is going to be absolutely terrible and I’d be surprised if outages were shorter than a day or two on average. They aren’t factoring in cooling. They aren’t factoring in the drastically reduced lifetime of drives in their ridiculously cramped and under-ventilated cubbies. They are completely ignoring diagnostic time, presuming they can only quote actual repair times, which is an absolute joke given the lack of smart hardware and enterprise DC management. They think they can average out throughout over the number of drives not taking into account per-channel limitations. They are not taking into account the extra time to build and dismantle systems in their hacked-together IKEA shelves. They are underestimating the costs of electricity at commercial rates. I could go on and on, but suffice to say that I would never, ever use their network for any purpose without another backup (which they don’t finger into their costs, of course ;). I thought B2 was risky; this is taking it to an entirely different level.

kllrnohj · on March 7, 2020

> The buildout did not list a CPU.

It did, or does now anyway, the RYZEN 3 1200 for $95.

EDIT: Although the better option is the 3200G so that you can actually get a display output from the thing. Same price, so it doesn't really change anything, but it does cut the CPU core count down a bit if that matters at all.

That said the buildout still doesn't work because you can't actually plug the "sata splitter" cable they linked into the motherboard. Because the splitter was actually a 4-lane SAS SFF-8087 breakout cable, and there's no consumer motherboard with 8x of those connectors on it. Good luck finding even 1 or 2 of those connectors on a consumer board, and it sure as hell won't be at dirt cheap prices.

So you either need 4x the computers they calculated, or you need to budget for add-in SATA/SAS controller cards. Which, because they aren't used in consumer land, are not cheap. You could go used, but that's still going to increase the bottom line (and won't be a reliable source of parts)

They also aren't factoring in assembly time nor budgeting for that. Building these isn't going to go very quickly.

hddherman · on March 7, 2020

I have gotten away with cheap PCIex1 2xSATA2 adapter cards, for roughly 10-15USD at the time of purchase. They did work, but this assumes a motherboard with room for lots of PCIe cards.

Edit: to clarify on the CPU usage, could a potential build also get away with a cheap AMD Athlon 3000G?

ComputerGuru · on March 7, 2020

Yeah, that cpu wasn’t listed at the time. It’s clear that this is a thought experiment and nowhere near worth the attention it is getting.

growt · on March 6, 2020

I feel like backblaze has already done most of this and has it in production [1]. Whereas this is just done back of the napkin calculation.

[1] https://www.backblaze.com/b2/storage-pod.html

TheDong · on March 6, 2020

Backblaze has tried to make their datacenter as efficient as possible, and still only ends up hitting $5/tb/mo for their b2 service, as a point of reference.

ksec · on March 6, 2020

>and still only ends up hitting $5/tb/mo for their b2 service

$5/TB/Mo was B2 Price with Profits, better depreciation ( Blackblaze replaces drives more often ) and faster connection.

$2/TB/Mo was napkins maths with ~10% Gross Profits.

notyourday · on March 6, 2020

Backblaze is very good but they are definitely not efficient in $$ utilization.

Efficient $$ utilization is bread racks, built out data centers abandoned by the likes of PepBoys that landlords will part for $3/sq foot per year and Google using servers without cases and velcros to keep hard drives attached.