They got rid of the PCI bottleneck by switching to PCIe, a bottleneck which surprised me when they designed version 1.0 of their pod. They could have gone PCIe at the time: I maintain http://blog.zorinaq.com/?e=10, and there were SATA controllers at the time that met their technical requirements (nr. of ports, Linux support, etc).
I know. I meant to say they could have managed without using that PCI slot at the time (i.e. using only 3 PCIe SATA HBAs, or using a mobo with 4 PCIe slots).
FYI: Protocase (http://www.protocase.com/), the company that will build you the Backblaze case, sent me an email yesterday announcing that they are now selling completed Backblaze storage pods without the hard drives for $5395.00. I'm not sure if this is the new design or not, but I've set up 3 of these and finding some of the parts (SATA backplanes) took weeks of searching and shady dealings.
The 5 port SATA backplanes were the hardest part to find when I built these. You can get them through http://www.chyangfun.com/, which is where Backblaze gets them, and we did too. They are sometimes hard to get ahold of, which is why I was looking around and found some shady companies selling them. Protocase, mentioned above, sells these, which would probably make them the easiest company to get them through.
Other than this, the other parts are pretty standard and are easy to find with some simple searching.
Running LVM over RAID 6 volumes is the 'standard' approach of many enterprise storage deployments. The 'magic' in a good RAID 6 implementation is what it does when things go wrong, and lots of things do go wrong. The checksumming is great too.
At some point you can dump the sheet metal on the 'pods' and just build your own rack unit. If you look at the 'big iron' systems from NetApp, EMC, and others, you will see they make one big enclosure and then install simpler systems within that enclosure. What this buys you is that you can distribute enclosure services into the big box and take them off the individual boxes, which gives you better cost efficiencies. And as you point out, your system can run with fewer fans, so you can put a couple of largish three-phase fans in the big enclosure (dead simple and very reliable) and use them to push the air through all the other boxes (or pull it). Then add an Android tablet for the 'door' of the enclosure that tells you drive status, etc., and you're practically a player in the storage game :-)
Does anyone know if Backblaze will ever support Linux? I've wanted to use their service for a while now, but their lack of Linux support has been a big turn-off, and I don't think they have made any change in their statements regarding this 'issue'.
We would like to, we just haven't had time to get it done yet. It runs internally, but is lacking an installer and a GUI, and we would need to prioritize and choose one or more Linux distributions to launch with. Ubuntu is an obvious choice (we focus more on desktop backup than on servers), but some people also ask for CentOS and a few others. It bums me out that the Linux community has not solved binary compatibility anywhere NEAR the level that Microsoft or Apple has, and few in the Linux community seem interested in solving this issue, which massively, MASSIVELY hinders development and deployment, but that is a side tangent...
Explanation about the "GUI" comment above -> the Backblaze backup client was written from the ground up to compile simultaneously on Mac OS, Windows, and Linux. The same tree and the same source compiles on all three on EVERY SVN CHECKIN. There is one exception: the GUI, which is an extremely simple stand-alone process written entirely natively to match the host OS. On Mac OS it is in Objective C in the beautiful Apple GUI layout editor; on Windows we use Visual Studio and C++ and Win32. The firm rule is these GUIs are ONLY allowed to edit one or two simple XML files, and all the real encryption, compression, and transmission is done by other cross-platform processes. On Linux we configure the XML file with "vi". :-) The X-Windows GUI has not even been started.
As a Linux user I wouldn't mind having to edit files by hand.
XML is a pain, but holding up the release due to lack of a GUI is sorta silly. Plus I imagine, like me, most multi-server users would be rolling out a standard XML config anyway, without more than a simple string substitution on a per-server basis.
In fact, the only GUI I have running is on my desktop.
I don't know how many other folks feel this way, but I would kill for a GUI-less version of your client. I'd love to configure XML files, or some (any!) analog.
I've been looking for an off-site backup solution for a (nerdy and technically competent) home user for years, and yours is the only one that I could afford (poor recent BA here).
Please?! I would pay $10 a month (probably more, really) for that. Even if you don't want to support it, could you do it just for me? It'll be our little secret!
Honestly, the XML files are pretty simple. The main one the GUI writes out is called bzinfo.xml and is found on any Mac system at /Library/Backblaze/bzdata/bzinfo.xml and on any Windows Vista or later system at C:\ProgramData\Backblaze\bzdata\bzinfo.xml
Backblaze is designed to be used with absolutely no configuration (many users have no idea where their Outlook.pst file is, and we don't think they should have to know), and the only way we could figure out how to make this work was to back up EVERYTHING on your system unless you explicitly exclude it. So bzinfo.xml is basically a flat list of excluded folders you do not want backed up. There is also a throttle in there if you don't want Backblaze to utterly destroy your network uplink, and a few other small settings. It's pretty straightforward.
With that said, we really pride ourselves on easy-to-use software, so it goes against everything in our DNA to release software with NO GUI at all, but maybe we'll give that some serious thought. If you are using Linux, you probably aren't the average Mom & Pop user. :-)
To a Linux user, editing an XML file is easy. And since it's a pretty reasonable default to back up /home, you're likely to be pretty safe with the defaults anyway.
Call it alpha, see how much demand there is, and the extra money you bring in might be the motivation to finish up the pretty. :)
I just want to join others asking here for a Linux version. Please do release whatever you have. And actually I would prefer a command line tool, so I can use it in my cron.
But even for editing text files you still need, maybe not a GUI, but a layer that checks the file format, gives meaningful error messages on where a syntax error is and why, etc. It's not quite the same amount of work as writing a GUI, maybe, and I would also argue that you need this anyway, even when you have a GUI. I used to think the same, until I wrote a few tools that I was going to 'slap a GUI on later' and found that writing/maintaining a command line application also takes time; it's not just a few lines of code that you put in front of the library behind it.
If you provided a good solution for taking backups on Linux systems, my guess is that you would get a lot of customers like me. If I could easily apt-get some binaries, set up which folders should be backed up, and run backups every X hours, it would be a killer feature. Right now I manage a ton of rdiff scripts to back up all my files to remote locations.
I like the way CrashPlan works. I use it to back up both Windows and Solaris. The GUI is Java, but I only run it on Windows; the GUI communicates over TCP to the Solaris machine. This seems to me the best compromise.
Similarly, the CrashPlan service itself is written in Java and, for binary compatibility, relies only on a little binary library loaded via JNI to accommodate the differences between Linux and Solaris, not to mention Windows and Mac. It doesn't even support Nexenta (OpenSolaris kernel + Debian userland) directly out of the box, but I was able to hand-combine the bits from the Linux and Solaris installs to get something working. The moral being: for OSes targeted towards more technically adept users, you may not need as much work as you think, if your app is self-contained enough.
(Backblaze founder speaking here) - I have always liked the CrashPlan company and they seem like a smart focused team with a solid product. And I've heard other reports that CrashPlan works well on Linux, so people should definitely check them out and give them a fair try.
See my other post on this. It is extremely simple XML anybody would understand instantly. In fact, some people have already scripted changes to the XML files running on Mac to increase the "throttle" at night and decrease the "throttle" during the day. We'll be giving the Linux client some more thought over the next few days.
Hey Brian, thank you for explaining the reasons why the Linux version hasn't seen the light yet. I have been keeping an eye on Backblaze since this blog entry you wrote http://blog.backblaze.com/2008/12/15/10-rules-for-how-to-wri..., quite a while now, and it sucks that there's still no Linux version available, so as others have mentioned, please consider releasing something.
PS: I'm already one of your customers and I'd love to be able to use your service on my Linux machines.
The diagram of the Cost of a Petabyte is very interesting if it is true. It demonstrates the profitability of some SaaS models for selling what is a straight commodity.
However, it seems to contradict the AWS/Rackspace model, which is a race to the bottom in that there are many competitors selling a commodity (independent of the other high-value services they are selling). There is some threshold of volume that is key in order to make money in that space.
Backblaze should really consider selling their pods - it would basically be free money for them, because whatever inventory they don't sell they can use for their service, and I'm sure there are lots of businesses that would pay good money for a cheaper alternative to storage servers from Dell or HP.
It'd be a major distraction, though. Setting up a separate support/services organization for that is not trivial.
Also, most businesses buy from the Dells or the HPs because they don't have the in-house expertise to manage a more bare-bones box (or, more likely, just don't want to). The companies that have the need/capability to manage more raw storage could just take the plans and build them out anyway, I would imagine.
The blog post provides some fascinating data, thanks Backblaze!
$2100 per month for an entire rack worth of Pods (space, power, connectivity)
$74,000 is the cost to build 10 Pods to fill that rack.
If the Pods are assumed to have a lifetime of 3 years (most will last longer, but let's depreciate at this rate), and if the cost of capital is 20%/year, this equates to a monthly "payment/amortization" cost of $2750, leading to a total cost per rack of $4850/month ~ let's say $5000.
1350TB of raw storage is provided by this rack, which can be scaled by 13/15 to account for RAID 6 (as revealed by brianwski here), leaving 1170TB available for use. With FS overhead etc., let's take this to provide 1PB of storage.
So, essentially, storage costs Backblaze about 0.5 cents ($0.005)/GB-month. There are other costs of course; Sean (amongst others) needs to get paid, etc. To go by a common rule of thumb for a minimum sale price, one third of the sale price should be profit, another third should be org expenses/marketing/everything else, and the remaining third should be the actual cash cost of building/providing the product/service.
So very roughly, Backblaze could provide storage at about 1.5 cents/GB-month. Factor in 3-way software-level redundancy of data, and you are now up to 5 cents/GB-month for a very high quality storage service.
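(Sanity-checking the arithmetic above with a minimal Python sketch; the dollar figures and the 13/15 RAID 6 factor are the ones quoted in this thread, and the payment formula is just standard loan amortization.)

    # Rough cost-per-GB model for a rack of pods, using this thread's figures.
    def monthly_payment(principal, annual_rate, months):
        # Standard loan amortization: fixed monthly payment.
        r = annual_rate / 12
        return principal * r / (1 - (1 + r) ** -months)

    build_cost = 74000        # 10 pods to fill a rack
    rack_cost = 2100          # $/month for space, power, connectivity
    amortized = monthly_payment(build_cost, 0.20, 36)  # ~$2750/month
    total_monthly = rack_cost + amortized              # ~$4850/month

    usable_gb = 1000 * 1000   # 1350 TB raw * 13/15, rounded down to ~1 PB
    print("cents per GB-month: %.2f" % (total_monthly / usable_gb * 100))  # ~0.49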
Contrasting this with 15c/GB-month that Amazon charges (in addition to transfer charges), I do have to wonder why Backblaze wants to stick to the "unlimited desktop backups" business. Even Google storage charges 17c/GB-month, in addition to per-request charges.
It's quite possible I have some factors wrong here, and if anyone can spot anything wrong, I'd like to know. Nevertheless, it seems from the numbers provided that Backblaze could make a killing in this market. I know I would be interested in using a pure storage backend - equivalent to S3. I use tarsnap, and if tarsnap could reduce its backend costs by two-thirds, I know I'd be very happy.
What am I missing?
edit: wrote PB instead of TB. Numbers remain correct, though
True. Though that's not a generic storage service in the sense of S3 or Google Storage. Some also think it's a "trojan" to detect pirated media :-)
On that note: I could not find any real detail about this - do they encrypt your files? What kind of privacy/security do they promise? I am assuming the media files do get de-duped across all users, but what about other, personal docs, spreadsheets, PDFs, etc.?
I switched from JungleDisk to Backblaze a year ago and haven't looked back. BB is amazingly fast and painless to use. On top of that the cost is insanely cheap in comparison to using AWS (which Jungle Disk uses).
I too am a very happy switcher, from Carbonite to Backblaze. It really is fast, as Valien says. The cost? Well, it has gotten cheaper since the dollah is dying.
It's 135TB worth of drives, but with RAID don't you end up with a far smaller usable amount?
Also, considering saving money on hardware costs is a key factor in Backblaze staying competitive, they must be saving money elsewhere and/or have other competitive advantages. Otherwise releasing this information would be akin to publishing a restaurant's 'secret sauce'.
Anybody who builds their own pod is welcome to use RAID or not, and which RAID you choose will affect your final numbers. At Backblaze, we configure it as RAID 6 groups; each group is 15 drives, which includes 2 parity drives. So you are down to 13/15 = 86.67 percent of the raw unformatted space BEFORE the overhead of ext4, which adds a little tiny bit extra. So after formatting in our datacenter we are left with about 116 TBytes of storage space for customer files. On the other hand, we use lossless compression on the customers' files before transmitting to the pod, so it can sometimes appear like we are fitting more than 116 TBytes of customer data on a pod, if that makes sense.
I'd say any competitor worth worrying about would be able to build an equivalent system from scratch pretty easily, so I don't think they're risking that much. Convincing users that you'll be trustworthy, secure and around in 5-10 years is a much greater hurdle.
On the other hand, I read this article and I read their last one, when it too shot to the top of HN. Geeks go nuts for hardware porn, and are a great audience to sell these kinds of services to. So the marketing benefits are probably quite substantial.
Yikes, what? We all run Lion here at Backblaze and it works flawlessly for us, and our internal stats show THOUSANDS of Lion customers happily backing up. If you can provide more details of your setup we would LOVE to fix that for you. Since Mac customers make up more than half our customer base, we're completely, 100 percent dedicated to Lion. Send us info, contact us at our company name on Twitter or Facebook, or use the help links off the homepage.
As a side note -> Backblaze only does one thing -> HTTPS POST. In user space. We do NOT extend the kernel, we do not have drivers (at all), we simply read files (we don't even write to your files), compress and encrypt them in RAM, and push them through the completely common HTTPS. It is unusual for Backblaze to cause any problems other than a sluggish network or a hiccup in a Skype phone call.
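(To illustrate the shape of that pipeline, here is a minimal user-space sketch: read, compress and encrypt in RAM, then a plain HTTPS POST. The URL and the Fernet cipher are placeholders of mine, not Backblaze's actual protocol or crypto.)

    # Hypothetical sketch of a read -> compress -> encrypt -> HTTPS POST client.
    import zlib
    import requests
    from cryptography.fernet import Fernet

    cipher = Fernet(Fernet.generate_key())  # real key handling would differ

    def backup_file(path, url="https://example.invalid/upload"):
        with open(path, "rb") as f:   # read-only: user files are never written
            data = f.read()
        payload = cipher.encrypt(zlib.compress(data))  # all done in RAM
        requests.post(url, data=payload, timeout=60).raise_for_status()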
I sent an email to sales support explaining my issue on OS X Lion when I was trialing your product, but did not get a response.
Please feel free to contact me at mqudsi@neosmart.net. What I was seeing seemed to be that the software was flooding the network card (wi-fi on a slow WiMax connection) with requests, causing it to time out entirely, such that I could not even ping my router. This doesn't require drivers, kexts, or anything, and can be replicated in user mode. It's pretty much akin to DDoSing your own connection.
While I ended up purchasing a Mozy home license for a year, I do not think I'll be staying with them as I am not satisfied with either their client or their backend.
At the lowest level there are three RAID groups in each pod. Each RAID group is made of 15 drives configured in software RAID 6 with 2 parity drives. This means you can lose 2 drives and the data is entirely safe and intact. If 3 or more drives completely fail simultaneously (not just pop out of the RAID group or power down, but where a drive is lost forever, like it will never power up again), you will lose at least some of the data on that RAID group. Layered on top of the 15-drive RAID group is LVM. The specifics of the LVM config: there is one PV (Physical Volume) spanning the 15 drives, then on top of that is one VG (Volume Group) spanning the same exact 15 drives. Then on top of that are as many LVs (Logical Volumes) as it takes to keep each logical volume under the ext4 limit of 16 TB. With 3 TB Hitachi drives, there are 3 separate LVs on top of the same exact 15 drives. Finally, there is one ext4 file system sitting on top of each of the LVs (one to one with the LV). Disclaimer: I work at Backblaze, but the datacenter and pods aren't my main area of focus.
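(The arithmetic behind "as many LVs as it takes" is quick to check; a minimal sketch, with 16 TB being ext4's maximum filesystem size at the default 4 KB block size:)

    import math

    drive_tb = 3            # Hitachi 3 TB drives
    group_drives = 15       # drives per RAID 6 group
    parity_drives = 2
    ext4_limit_tb = 16      # max ext4 filesystem size (4 KB blocks)

    group_usable = (group_drives - parity_drives) * drive_tb    # 39 TB
    lvs_per_group = math.ceil(group_usable / ext4_limit_tb)     # 3 LVs
    pod_usable = 3 * group_usable                               # 117 TB

    print(lvs_per_group, "LVs of", group_usable / lvs_per_group, "TB each")

Three groups of 39 TB gives 117 TB usable per pod, which lines up with the "about 116 TBytes" after ext4 formatting overhead mentioned above.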
While you are here... why only a single boot drive? It seems like an obvious failure point, and at only $40 per drive I would think soft RAID 1 would be a no-brainer for reliability.
Regardless though, I'm super impressed by the work put into rolling your own hardware; hope you guys continue to do well.
I just asked, and found out we actually HAVE had a number of boot drives fail in our fleet of 200 pods. Most decisions in the pod are around saving money, so our initial thoughts were just that no customer data is on the boot drive so it isn't all that important. But don't get me wrong, there are SO MANY GOOD opportunities to improve the pod, Backblaze just stops working on the pod when it does what is needed for us and we run off to focus on other things. Your call on the boot drive is every bit as valid as ours. :-) I'm staring at an open pod here and I see plenty of good spots to put a second boot drive, and we'll probably be going to a 2.5" form factor (laptop) boot drive sometime soon which would yield even more space.
(Just FYI, I'm sure you guys have already thought of or experimented with this...)
We've had good luck so far with using small USB flash drives for booting big file servers. We keep the drive image pretty generic and if there's a problem with one, we just replace it with a cloned USB flash drive and reboot, no problem.
It doesn't seem to hurt performance at all for these kinds of uses -- although we do set it up without swap to keep the life of the USB flash device reasonable, which might or might not work in your case.
We're considering a PXE boot solution (among other solutions) just to keep all 200 pods (and growing!) updated to the latest Debian. But we also use the boot drive for a few other things like error logging and such. But the idea of eliminating the boot drive entirely is not a bad idea, we could drop the logs in a folder at the top of the data drives. We already (selectively) mirror various excerpts of the logs to other machines in case the whole pod disappears from the network so we have some history and understanding of what was going on when it went missing.
Perhaps streaming logs would help you out? My employer dumps a preposterous amount of log data constantly via streaming log systems (we use log4j, but there are solutions for syslogs, etc.). Aside from a few early hiccups, it works pretty well.
The ECC RAM absolutely does find and correct problems (we see them in the logs). However, just to be absolutely clear, we would not need ECC RAM -> Backblaze checksums EVERYTHING on an end-to-end basis (mostly we use SHA-1). This is so important I cannot stress it highly enough: each and every file and portion of a file we store has our own checksum on the end, and we use this all over the place. For example, we pass over the data every week or so reading it, recalculating the checksums, and if a single bit has been thrown we heal it up either from our own copies of the data or ask the client to re-transmit that file or part of that file.
At the large amount of data we store, our checksums catch errors at EVERY level - RAM, hard drive, network transmission, everywhere. I suppose consumers just do not notice when a single bit in one of their JPEG photos has been flipped -> one pixel gets ever so slightly more red or something, and only one photo changes out of their collection of thousands. But at our crazy numbers of files stored we see it (and fix it) daily.
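(A minimal sketch of what such a scrub pass could look like, assuming each stored blob carries its 20-byte SHA-1 digest on the end as described; the heal function is a hypothetical stand-in for the repair path:)

    import hashlib

    def heal_from_replica(blob_id):
        # Hypothetical: re-fetch from another copy, or ask the client to re-send.
        raise NotImplementedError

    def scrub(blob_id, blob):
        # Verify a stored blob whose last 20 bytes are its SHA-1 digest.
        data, stored = blob[:-20], blob[-20:]
        if hashlib.sha1(data).digest() == stored:
            return True
        heal_from_replica(blob_id)  # a single flipped bit anywhere lands here
        return False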
Earlier we were totally interested in ZFS, as it would replace RAID & LVM as well (and ZFS gets great reviews). But (to my understanding) native ZFS is not available on Linux and we're not really looking to switch to OpenSolaris.
ANOTHER option down this line of thinking is switching to btrfs, but we haven't played with it yet.
By the way, at Backblaze I've felt like we have had to implement several things that I would have guessed would be standard "off the shelf". One example: when a customer wants to download a restore file with a web browser, are you aware there are no checksums for over-the-network transfers other than the built-in (completely unacceptable) 16-bit TCP checksum? You are virtually guaranteed to have an undetected corruption within 30 GBytes of download, which is basically what we like to call "a totally average customer restore". So Backblaze had to write our own custom reliable, restartable "downloader". It boggles my mind that the whole internet is throwing undetected errors on HTTP downloads and nobody cares to fix the protocol?!! Where the heck is Google, Apple, Facebook, Microsoft, or <insert standards body> defining a standard for web browser downloads larger than a few GBytes?
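(The real Backblaze downloader isn't public, but a bare-bones version of the idea might look like this: resume with HTTP Range requests, then verify an end-to-end SHA-1 that the server is assumed to publish alongside the file.)

    import hashlib, os, requests

    def reliable_download(url, path, expected_sha1):
        done = os.path.getsize(path) if os.path.exists(path) else 0
        headers = {"Range": "bytes=%d-" % done} if done else {}
        with requests.get(url, headers=headers, stream=True, timeout=60) as r:
            r.raise_for_status()
            with open(path, "ab") as f:   # append: restart where we left off
                for chunk in r.iter_content(1 << 20):
                    f.write(chunk)
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        if h.hexdigest() != expected_sha1:
            raise IOError("checksum mismatch: delete the partial file and retry")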
Actually, zfs-on-linux is doing pretty well. It's not production ready yet, of course.
As for HTTP, I agree with you entirely! There is actually a standard that lets you put a hash (SHA-1 or SHA-256 or something) on an http anchor link, and the browser will verify it when the download finishes. It hasn't really gained very wide adoption though. Personally, I think something akin to Bittorrent is a better solution, since it doesn't have to redownload the entire file when it detects an error. It's ironic that our videos often have better data integrity than the graphics driver installer that I just downloaded, or the browser that I downloaded yesterday.
The OS X Lion download is over 3 GB, so Apple must be taking care of TCP errors somehow (or at least Apple Support would be swamped with support requests).
Two possible ways of solving this: split a download into e.g. 16 MB chunks and append a SHA-1 checksum to each chunk (not entirely unlike BitTorrent), then re-download the chunks where the SHA-1 doesn't match. Another solution would be to use e.g. Reed-Solomon error correction.
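(The first scheme is only a few lines; a sketch of the framing, with the reader returning which chunk numbers need re-downloading:)

    import hashlib

    CHUNK = 16 * 2 ** 20  # 16 MB

    def frame(data):
        # Writer side: append a 20-byte SHA-1 digest after every chunk.
        out = bytearray()
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            out += chunk + hashlib.sha1(chunk).digest()
        return bytes(out)

    def bad_chunks(framed):
        # Reader side: return indices of chunks whose digest doesn't match.
        bad, step = [], CHUNK + 20
        for n, i in enumerate(range(0, len(framed), step)):
            record = framed[i:i + step]
            chunk, digest = record[:-20], record[-20:]
            if hashlib.sha1(chunk).digest() != digest:
                bad.append(n)
        return bad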
Yep, Apple solves this problem by using their "AppStore" application to reliably download the file in chunks (like you suggested). My complaint is that Apple should extend Safari to be able to download the OS X Lion download reliably. (Then let the rest of us use that protocol.) It seems a bit insane that the OS group at Apple had to write their own custom download application, but did not think to extend their web browser to allow for "restartable downloads that are no longer corrupted".
Microsoft does the same thing -> they have a custom application to download their OS updates reliably. They do not use Internet Explorer's regular download capability because of the limitations.
It baffles me that the big browser companies (Google with Chrome, Apple with Safari, Microsoft with IE) are not interested in building a standard reliable downloader. I mean, what else does a web browser really do?
gzip and zlib HTTP encodings (different containers for "deflate" compression) include a checksum of the uncompressed content (a CRC-32 variant and Adler-32 respectively). If compression isn't wanted, you can wrap uncompressed bytes with a deflate structure - although it'll have to be split into ~64k chunks with a few header bytes ahead of each.
I'm not sure which HTTP client implementations reliably test those checksums, though. (Or whether CRC32/Adler32 would be adequate for 30 gigs!)
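(For what it's worth, Python's zlib module does verify the trailer on decompression; a quick sketch where a single flipped bit in the Adler-32 trailer is caught. Whether a 32-bit checksum is adequate for a 30 GB stream is another question.)

    import zlib

    blob = bytearray(zlib.compress(b"some response body" * 1000))
    blob[-1] ^= 0x01  # flip one bit in the 4-byte Adler-32 trailer

    try:
        zlib.decompress(bytes(blob))
    except zlib.error as err:
        print("caught:", err)  # "Error -3 while decompressing data: incorrect data check"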
And when using HTTPS, all corruption introduced by the network stack will be detected (thanks to the MAC in each TLS record).
But brianwski's point remains that neither HTTP encodings, nor HTTPS can detect certain errors like in-memory corruption of the data after it has been read from disk, but before it has been sent out over the HTTP connection.
We've had a backup server running for a client with FreeBSD's ZFS for about a year now, also with zero hiccups. We first tried to get this kind of stuff working about three or four years ago, and went the OpenSolaris/ZFS route -- what a pain that was. OpenSolaris somehow corrupted its own boot partition one day and just completely refused to start with a totally cryptic error code.
The last time we tried Linux for this, it couldn't do it. Maybe that's gotten better though.
We also tried DragonflyBSD, but it had hardware support issues, and OpenBSD, but unfortunately OpenBSD just simply cannot do large filesystems. At all.
ZFS-on-FreeBSD is the way to go at the moment, I think.
The board also supports Xeon processors, so it might be that it only supports ECC with Xeon. However, I didn't see anything in the docs to support this, instead it says "Dual Core processors of Ci3 and Pentium: support ECC UDIMM only"
I'm curious since I ended up springing for Xeon in my system specifically for ECC. Now I'm wondering if I made a mistake...
IIRC, some vendors wanted to offer super-low-end dual-core servers but rather than create a dual-core Xeon SKU Intel decided to just not disable ECC on some of the Pentiums and i3s. (As if their branding wasn't f'd enough.) So yeah, i3s support ECC when used in a server board.
"The X8SIL/X8SIL-F/X8SIL-V supports up to 16GB of DDR3 ECC UDIMM or up to 32GB of ECC DDR3 RDIMM (1333/1066/800 MHz in 4 DIMM slots.)"
Below are what we assume are ECC errors from an i3-based Backblaze pod. When this happens it is usually bad RAM (bad enough to crash the pod; replacing it with fresh RAM repairs it):