Why Tarsnap won't use DynamoDB (daemonology.net)
156 points by cperciva on Jan 23, 2012 | 37 comments



> $44 per month. That's 14.6% of Tarsnap's gross revenues;

Hi cperciva,

Please, please, please, charge more for Tarsnap.

Unbeknownst to you, I used you as an example in a blog post a year ago:

> If cperciva tripled the cost of tarsnap tomorrow I would not bat an eyelash.


Since your quote missed some important context: That's $44/month per TB of data stored on Tarsnap. Tarsnap's gross revenues are considerably more than $300/month. :-)

> Please, please, please, charge more for Tarsnap.

Funny thing is, I get lots of people (especially large users) saying exactly the opposite.

The fact that there are lots of people who think I'm very wrong in both directions suggests to me that Tarsnap's pricing is about right.


> The fact that there are lots of people who think I'm very wrong in both directions suggests to me that Tarsnap's pricing is about right.

I have no opinion on Tarsnap's pricing but I will say this: there will always be people who think your server/app/whatever should be cheaper, even when it's free (then they'll complain about the support, uptime or whatever).

What's more, the people who aren't willing to pay anything typically make the worst customers. Instead of being grateful for what they're getting, in my experience they tend to consume far more support "bandwidth".

You might be doing yourself a favour by cutting such people off (effectively).


I don't know, SpiderOak costs $100/yr and I can use up to 100 GB. In comparison, tarsnap would cost at least $130/yr for the 36 GB I use now.

It's not about cheapskating, it's about competition.


As a large customer considering moving to tarsnap, there is one area in which they have no functional competitors I have been able to find: scriptable (i.e. command-line based) remote backup with decent deduplication. All of the cheaper options I've found that have deduplication support are GUI-based; we simply have too many VMs, and they change too often, for that to be tenable for us at all.
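
(For illustration: a minimal sketch of the kind of scripted, command-line backup being described. It assumes tarsnap is already installed and keyed; the directory list and archive-naming scheme are made up.)

    # Hypothetical example: drive tarsnap from a script/cron job for many VM
    # directories. "tarsnap -c" creates an archive and "-f" names it; data
    # already stored under the same key is deduplicated automatically.
    import datetime
    import subprocess

    VM_DIRS = ["/srv/vm-images/web01", "/srv/vm-images/db01"]  # made-up paths

    def backup(path):
        stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M")
        archive = path.strip("/").replace("/", "-") + "-" + stamp
        subprocess.run(["tarsnap", "-c", "-f", archive, path], check=True)

    for d in VM_DIRS:
        backup(d)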


I'm assuming you mean deduplication among your own data, in which case, wouldn't compression take care of it? I haven't used tarsnap, what's the benefit over duplicity?

EDIT: Oh, you mean deduplication of files between separate backup sets? That is a nice feature, true.


It is, and tarsnap does it really, REALLY well. I'm backing up what is, on disk, umm... (checks emails) about 15GiB, and I'm currently spending about 8 cents a day. Now if he'd just implement de-dupe across machines on a given account... :)
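
(To make that concrete: deduplication works at the block level, so a second, nearly identical backup only adds the blocks that changed. A toy sketch of the idea, not tarsnap's actual format or block size:)

    # Toy block-level deduplication: hash each block, store only unseen ones.
    # Block size, hashing and storage layout here are simplifying assumptions.
    import hashlib

    BLOCK_SIZE = 64 * 1024
    store = {}  # hash -> block (stands in for "already uploaded" blocks)

    def backup(data: bytes) -> int:
        """Store blocks not already present; return bytes actually added."""
        added = 0
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in store:
                store[digest] = block
                added += len(block)
        return added

    day1 = b"A" * (10 * BLOCK_SIZE)
    day2 = day1[:-1] + b"B"            # one byte changed
    print(backup(day1), backup(day2))  # 655360, then only 65536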


For the record, rsync.net claims to be able to do data deduplication on their Features page. Could send them an email.


What rsync.net actually says:

"This simple offering gives you complete control over organization, compression, deduplication, versioning and meta-data. You are NOT locked into a particular application or protocol, and there are no constraints on file sizes, retention, or access."

Which is great if you can find something that will do deduplication for you, encrypt, and handle the fact that the disk isn't actually local. I couldn't.

Also, rsync.net is significantly more expensive than tarsnap in my experience.


Thank you for providing some revenue information about Tarsnap. I am very happy that Tarsnap is doing well financially, for my entirely selfish reasons:

1) This means I can continue using this great service.

2) I hope revenue from Tarsnap helps keep you financially secure, so you can continue doing excellent work on the FreeBSD project.

Tarsnap has been, bar none, the best business service that I have ever encountered.

Here's the kind of customer care that you get with Tarsnap:

https://gist.github.com/1665597

When there was a security problem with nonces, you quickly disclosed the problem and came up with a fix, so that existing data could be re-encrypted. I GLADLY paid the usage fees to re-encrypt my data, and refused to accept a credit to my account, because I was impressed with how well you took care of the situation.

Thank you!


Interesting observation. There are always people who want a lower price, especially the large users, who are more cost-driven. However, the only real measurement is if they vote with their feet. Try raising the price a bit and see how many people drop off and what the impact on the signup rate is.


> the only real measurement is if they vote with their feet.

They do. I've talked to people who have decided against using Tarsnap because it's too expensive for them; and I've talked to people who used Tarsnap for a while and then left because they decided it was costing them too much.

I don't understand people here. I know I'm not a fantastic businessman, but really, I think I might know something about the people who use Tarsnap...


I call these shodoos, as in, "You know what you should do..."

My response is always, "Please, tell me your idea." but the response in my head is always, "Yes, I do know what I should do. I've been thinking about this for several years, not just the 10-20 seconds you have."


Not sure if it's a question you'd answer publicly, but have you considered having pricing plans to capture both pies?

I'm genuinely curious, not suggesting you should do it.


You mean to offer a lower price, but say "if you want to pay more, feel free"? I have a feeling that wouldn't work very well.


The reverse. Offer a high price, but make it easy to get a discount (e.g., by contacting sales and asking for one, or Googling for a coupon).

Or offer an Enterprise plan and charge a ton for it.

You might be surprised. Some customers may actually prefer it if you charged more, and might not consider Tarsnap if you don't. (Enterprise software is "supposed to be" expensive).

At the very least it's worth testing.


Offering a dedicated Windows Server version might be one way of creating an "Enterprise" edition and charging a lot more for it.


This is an awesome idea. Do this.


I loathe companies that do this. It's deceptive and inefficient. Don't lie about your prices, and don't give breaks to people that intentionally cost you more by taking up the time of a sales agent.


I fail to see any deception. A company prefers to get price A for their services so they advertise price A. However, the people in charge would be willing to accept price B rather than lose the sale so price B is made available when price A won't cut it. You are offended by it because you are unwilling to negotiate and so these prices aren't available to you. If you don't ask, you'll never know the answer.


Inefficient? From http://en.wikipedia.org/wiki/Price_discrimination

> The effects of price discrimination on social efficiency are unclear; typically such behavior leads to lower prices for some consumers and higher prices for others. Output can be expanded when price discrimination is very efficient, but output can also decline when discrimination is more effective at extracting surplus from high-valued users than expanding sales to low valued users. Even if output remains constant, price discrimination can reduce efficiency by misallocating output among consumers.


You'd be surprised. I found a good introduction to this idea in the book "The Undercover Economist":

http://www.amazon.com/Undercover-Economist-Exposing-Poor---D...


He means offering tiered pricing.


TL;DR version: "The custom NoSQL store (that lacks one of DynamoDB's major selling points) I wrote for my unusual use case works better for my unusual use case than DynamoDB."


Yep. Or as I put it, DynamoDB is a great hammer, but sometimes you need a screwdriver.


My TL;DR version is "DynamoDB is much more expensive than a custom NoSQL store at thousand-writes-per-second scale"


I don't mean to hammer this comment page with this point, but just to clarify this since the product is so new: It really has very little to do with writes per second. High volume writes are Dynamo's bread and butter.

His cost issues come from the large item size.


No, I have very small items. The cost issues come from the large number of items.


I thought I read 33kB somewhere in there. If so, by DynamoDB standards, that's not a small item. If I mis-attributed that 33kB number, then I kindly retract what I said :D


I have 33kB blocks which are stored on S3 (after aggregating them into objects of up to 8 MB). I have key-value pairs which are 24 bytes and 53 bytes long.


Good article.

In many ways, I think the author is echoing the same frustrations as with Google App Engine's Datastore ( http://news.ycombinator.com/item?id=3431132 App Engine charges $6,500 to update a ListProperty on 14.1 million entities ). Both Google Datastore and the new AWS DynamoDB are great, fast services, but they are a bit too expensive at the medium/high end.

Frankly, I wish App Engine/AWS/Heroku/etc would introduce a database that traded speed for cheaper costs. I'd be fine with certain writes being delayed if it meant lower costs.
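
(A sketch of the trade-off being wished for here, not an existing App Engine/AWS feature: if writes are allowed to lag, they can be buffered and flushed in batches, so write capacity can be provisioned for the average rate instead of the peak. The class and parameter names are made up.)

    # Hypothetical "delayed writes for lower cost": buffer puts and flush in
    # batches, so the backing store only sees the average write rate.
    import time

    class DelayedWriter:
        def __init__(self, put_batch, max_items=25, max_delay=5.0):
            self.put_batch = put_batch      # e.g. a batch-write API call
            self.max_items = max_items
            self.max_delay = max_delay
            self.buf = []
            self.last_flush = time.monotonic()

        def put(self, key, value):
            self.buf.append((key, value))
            if (len(self.buf) >= self.max_items or
                    time.monotonic() - self.last_flush >= self.max_delay):
                self.flush()

        def flush(self):
            if self.buf:
                self.put_batch(self.buf)    # one batched call instead of many
                self.buf = []
            self.last_flush = time.monotonic()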


If Amazon had an option to use HDD instead of SSD storage for a given DynamoDB table and adjusted the pricing accordingly, would that pretty much take care of users who want basically what DynamoDB offers, just cheaper and with slightly worse performance?


Colin, since you also wrote the Tarsnap back-end data engine yourself, how would you weigh cost vs. loss of control of that functionality in Tarsnap?

From my perspective, the flexibility and control you have over kivaloo generates some value for you as the developer of Tarsnap. At what cost point/functionality level would you actually make the switch to a SaaS back-end?

My interest in this question goes beyond this specific case to the consideration of when service/library use in a new SaaS product is contraindicated.


If Amazon had a service which gave me the cost and performance I expect to get from kivaloo I'd start using it without any hesitation. I assign a negative cost to loss-of-control since the testing provided by their large user base and their experience with scalability far outweighs the advantages of greater control.


Not to denigrate Colin's work, nor Tarsnap at all, as I love these CLI based tools, but is there anything better about Tarsnap than just using Duplicity as I do now?

All I have to do is run ' duplicity $DIR s3+http://bucket ' and I get a GPG-encrypted full/diff of my data stored on S3, which looks to be a bit cheaper. Plus, if I really want to save some cash, I could mark every object as Reduced Redundancy to save even more money.

Thoughts?


I only skimmed this, and I'll go back and read it later, but I think the issue he's mentioning is price?

It's true that on Dynamo, large item sizes get very expensive, very quickly. That stems from your data being stored on SSD, with a replication factor of 3 (IIRC).

Cloud or no cloud, Amazon or not, SSD storage is still a lil pricey.


Lots and lots of small items also cost, though, because there's a 100-byte overhead per key/value pair. So if you're storing 2 GB of 100-byte k/v pairs, you have to pay for 4 GB of storage.

If you store lots and lots, you probably also have either high read, high write, or high read/write rates, in which case you'll need to pay more per hour as well.
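
(Back-of-envelope numbers for the overhead described above, using the 100-byte-per-item figure from this comment and the 24/53-byte items mentioned earlier in the thread; actual DynamoDB billing details may differ.)

    # Billable storage with an assumed flat 100-byte overhead per item.
    OVERHEAD = 100  # bytes charged per item on top of the item itself

    def billable_bytes(item_size, n_items):
        return n_items * (item_size + OVERHEAD)

    # 2 GB of 100-byte key/value pairs gets billed as roughly 4 GB:
    n = (2 * 1024**3) // 100
    print(billable_bytes(100, n) / 1024**3)      # ~4.0

    # 24- and 53-byte items pay ~5x and ~3x their own size:
    for size in (24, 53):
        print(size, (size + OVERHEAD) / size)    # ~5.2, ~2.9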



