Rclone banned from Amazon Drive for having the encrypted secrets in code (rclone.org)
89 points by msravi on May 22, 2017 | 76 comments



For context: Amazon Cloud Drive is a Google Drive/Dropbox analog, except that it provides "unlimited" storage space and bandwidth for $5/month. The built-in user interface requires you to manually upload/download files through a web browser, but the service also exposes an API for programmatic access. So tools like rclone and acd_cli were developed to let you do bulk transfers and/or mount your storage as a network filesystem, optionally with encryption (which defeats any deduplication Amazon might attempt).

Now Amazon is suffering the obvious consequence of offering unlimited storage: people are using it to store tens of terabytes of media and/or backups at very low cost. In an attempt to kill off heavy users, they shut down registration of new API keys several months ago, and now they're systematically revoking the API keys used by popular open-source tools.


Yeah, this is a hard one. There are users on /r/DataHoarder/ that claim to have uploaded literally 100s of (encrypted) TBs to Cloud Drive, which is plainly unsustainable and abusive.

On the other hand, they've also killed the product for a lot of more-legitimate users. The Amazon web interface and apps for Cloud Drive are obnoxiously terrible, and Rclone really is just a better way to use it. I've been using it to sync 10s of GBs of photos between all my different computers, but with Rclone unavailable, I'll have to fall back to Google Drive, S3, or some other option (the unlimitedness of Cloud Drive was good peace of mind).

I'll be keeping an eye on it over the next few weeks to see whether shipping a binary with OAuth secrets was actually the reason for the ban, or just a pretext for getting the Rclone users off the service (personally, I suspect the latter).


I just use it to upload daily encrypted backups of my mail-server (< 500MB per month)... so I wouldn't mind if they set some reasonable limit to encrypted uploads (say 1-10TB).

I feel like anyone who's actually uploading personal content and who isn't uploading media files that are amenable to deduplication would be comfortable with some threshold as well.


> so I wouldn't mind if they set some reasonable limit to encrypted uploads (say 1-10TB).

How is Amazon supposed to distinguish between encrypted and non-encrypted data that you upload?


A heuristic. No common magic header and poor compressibility? Likely encrypted.
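
Such a heuristic might look like this minimal Python sketch (the magic-byte list, 64KiB sample size, and 0.98 threshold are arbitrary assumptions, not anything Amazon has documented):

    import zlib

    # A few well-known file signatures (PNG, JPEG, ZIP, PDF)
    MAGIC_HEADERS = [b"\x89PNG", b"\xff\xd8\xff", b"PK\x03\x04", b"%PDF"]

    def looks_encrypted(path, sample_size=64 * 1024, threshold=0.98):
        """No known magic header plus near-zero compressibility: likely encrypted."""
        with open(path, "rb") as f:
            sample = f.read(sample_size)
        if any(sample.startswith(m) for m in MAGIC_HEADERS):
            return False
        # Encrypted (i.e. effectively random) bytes barely compress at all.
        return len(zlib.compress(sample, 9)) >= threshold * len(sample)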


Since encrypted data is indistinguishable from random noise, I think poor compressibility is actually zero compressibility, isn't it?

There are tools to search for TrueCrypt / other encrypted partitions on disks, so it's a solved problem to detect encrypted data.

It would be unfortunate if services banned the ability to upload encrypted data, though. On the other hand, that'd be good for Tarsnap. I wonder how much it'd cost to store 10TB on it?
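
For reference, Tarsnap's advertised storage price is 250 picodollars per byte-month, i.e. $0.25/GB-month, so 10TB would run on the order of $2,500/month (charged after Tarsnap's deduplication and compression).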


The first billion digits of Pi might look pretty random, but there might be a short program which generates them, which can be considered a compressed form. In general, it is impossible to decide how well a given string can be compressed. https://en.wikipedia.org/wiki/Kolmogorov_complexity


Might be a short program that generates them? I'm going to go ahead and file that in the understatement-of-the-century folder.


> Since encrypted data is indistinguishable from random noise, I think poor compressibility is actually zero compressibility, isn't it?

I don't think so. Random noise can take any form, even that of a string composed entirely of zeros, which would compress trivially. It's just very unlikely that it'll actually be compressible.


Random noise can't really take any form when you apply the implicit restriction of your search fitting within finite time and space.


I don't get what you're saying. Searching? For what?


A number is random by virtue of the process that generated it. You have to actually generate random numbers if you want a random number that meets some criteria. And even with impossibly vast resources, you will never find a random megabyte that compresses well.


Well, sure, that's what I wrote: "it's very unlikely that it'll actually be compressible." But it's incorrect to claim that random numbers are by definition non-compressible.


It was in the context of encryption. Even a 64-byte random number is very unlikely to be compressible, and 64 bytes is about as small as an encrypted partition will ever be.


It'd take longer than the universe has left to find a string of N zeroes by generating random numbers, for sufficiently large N. And N is surprisingly small.


Sure, but not zero.


Yes, zero. As zero as zero can possibly be, measured with the most precise instruments possible. There's an infinitely better chance of both of us being struck by lightning than of you finding such a number randomly.

It wouldn't be zero in certain math worlds. It is zero in the real universe.
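
To put numbers on that claim: the probability that N uniformly random bits are all zero is 2^-N. For N = 1024 (a mere 128 bytes), that's about 10^-308, while checking a billion candidates per second for the age of the universe covers only about 2^88, roughly 10^26, attempts.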


So why not just take apart the official apps for Google Cloud Drive and use whatever API and authority it has to build your alternative client?


The nice thing about Rclone is that it already supports Google Cloud Drive (and a host of other providers) so the technology isn't the problem.

I'm lamenting Amazon Cloud Drive in particular because it was the best deal. $60 a year for unlimited storage and with no caveats (or so it seemed before last week).


I am sorry, "Google" was clearly a typo: I meant "Amazon". You do not need Amazon's permission to write a client that targets their API.


If a service claims to provide UNLIMITED storage, why is using it abuse?


HOPSFIELD No, these are entries for McDonald's Sweepstakes. No purchase necessary. Enter as often as you want. So, I am.

CHRIS Really?

HOPSFIELD This box makes it one million, six hundred thousand. I should win thirty two point six percent of the prizes, including the car.

CHRIS Kind of takes the fun out of it, doesn't it?

HOPSFIELD I suppose so. But they set up the rules, and lately, I have come to realize that I have certain materialistic needs.


For those who don't recognize it, that's from the movie "Real Genius". For those who don't quite remember it that way, it looks like the version from an early draft of the script. By the time the movie was actually made, McDonald's had been changed to Frito-Lay, Hopsfield had been changed to Hollyfeld, and the above dialog had been altered slightly.

That scene is based on a real life incident which in fact involved a McDonald's sweepstakes and Caltech students entering over a million times: [1]

Since everyone knew that the school in the movie, Pacific Tech, was meant to be a thinly disguised Caltech (it only became Pacific Tech when Caltech objected), and McDonald's had not been happy with the Caltech sweepstakes prank and probably would not want it brought up, my guess is that one or both of McDonald's and Caltech asked for the change.

Changing it to Frito-Lay is an interesting choice, because six years before the McDonald's sweepstakes, a group of Caltech students tried mass entry on a Frito-Lay sweepstakes, but apparently were not as successful.

[1] http://hoaxes.org/archive/permalink/the_caltech_sweepstakes_...


Crap. I knew something seemed off about it. Thank you!

====

Lazlo: No. These are entries into the Frito-Lay Sweepstakes. "No purchase necessary, enter as often as you want" - so I am.

Chris: That's great! How many times?

Lazlo: Well, this batch makes it one million six hundred and fifty thousand. I should win thirty-two point six percent of the prizes, including the car.

Chris: That kind of takes the fun out of it, doesn't it?

Lazlo: They set up the rules, and lately I've come to realize that I have certain materialistic needs.


abuse: use (something) to bad effect or for a bad purpose; misuse.

Abuse doesn't mean breaking the rules. It's like going to an all-you-can-eat buffet and staying for a week. Or taking a job with unlimited vacation time and coming to work once a month.


They should declare limits, then.


I bet the ToS has limits spelled out, or barring that, a clause that grants Amazon the right to shut you down if they unilaterally decide you're "abusing" the system.


I have looked at the TOS in Spain and you won the bet :)

They can shut down your account for the reason you stated


Yeah, never make a bet that a company doesn't have a vaguely worded way to kick off whoever they want at any time.

But it would be nicer for everyone if they could just honestly state a number of TB.


There are: you can only use approved third-party apps to connect to the service. This one is no longer approved.


Then we agree to disagree.


What do we disagree about?


I disagree with you that it is misuse.

What Amazon should do, as Microsoft should have done in the case of the unlimited OneDrive offer, is put in a reasonable-use clause.

When a company offers an unlimited resource, some of its customers will use it to upload/download a lot.


Technically it's not. Hoarding for hoarding's sake is a waste of an obviously limited resource, though.


Being waste does not imply being abuse.


Wait, the resource is UNLIMITED, isn't it? At least it is sold as unlimited.


$5/month for unlimited access via API was really a steal. This would buy you only 217 GB via standard S3 storage, and then you'd have to pay an exorbitant $0.090 per GB for egress data transfer; you'd burn through your $5 after downloading 55 GB.
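
The arithmetic, using the 2017-era S3 prices the comment implies (the exact figures here are assumptions):

    storage_usd_per_gb_month = 0.023  # S3 Standard storage (assumed 2017 price)
    egress_usd_per_gb = 0.09          # S3 data transfer out (assumed 2017 price)
    print(5 / storage_usd_per_gb_month)  # ~217 GB stored per month for $5
    print(5 / egress_usd_per_gb)         # ~55 GB downloaded before $5 is spent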


I'm a bit skeptical about how much deduplication saves them. I have a couple hundred gigs on ACD, unencrypted, mostly family photos/videos. I don't think they're going to find any portion of those that's remotely common to other users; at least, I hope not!


If it's file-level deduplication, sure. But if it's chunk-level, I think you may find that you actually have many chunks identical to many other users' chunks.
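
To illustrate, a minimal sketch of chunk-level dedup in Python (fixed 4MiB chunks keyed by SHA-256; real systems typically use content-defined chunk boundaries so one insertion doesn't shift every later chunk, and all names here are invented):

    import hashlib

    CHUNK_SIZE = 4 * 1024 * 1024
    store = {}  # digest -> chunk bytes; stands in for the provider's blob store

    def dedup_upload(path):
        """Split a file into chunks, storing only chunks not seen before."""
        manifest = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                digest = hashlib.sha256(chunk).hexdigest()
                store.setdefault(digest, chunk)  # duplicate chunks cost nothing
                manifest.append(digest)
        return manifest  # the ordered digests suffice to reassemble the file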


Unlikely, given any non-negligible chunk size and the randomizing effect of compression (images and video are already compressed).

I think deduplication buys them most when people upload common assets - videos downloaded from the net, ebooks, ISO images, etc.


Doesn't compression generally reduce entropy?


> In an attempt to kill off heavy users, they shut down registration of new API keys several months ago, and now they're systematically revoking the API keys used by popular open-source tools.

Actually this seems to have been more triggered by a serious issue with acd_cli's authentication server that resulted in its users seeing other people's files:

- https://web.archive.org/web/20170514020241/https://github.co...
- https://www.reddit.com/r/DataHoarder/comments/6bi5p5/amazons...

Amazon has started paying a lot more attention to open source tools as a result.


> now they're systematically revoking the API keys used by popular open-source tools.

Would this also impact Arq users? I was hoping Arq would support Backblaze...


I think Dropbox will shut down their API in July of this year. I got an email warning me about six months ago; since then I've moved our internal system to FTP...


They just updated it to a new/improved version of the API


From the discussion at

https://www.reddit.com/r/crypto/comments/61zupq/amazon_drive...

I learned that Amazon Drive is a service that provides unlimited storage for a fixed cost, and that various applications, including rclone, use it for an application that I would expect (as a developer) to use S3 for, and therefore to charge my users per amount of data stored. (Amazon Drive is not part of AWS, I believe.)

I am not extremely surprised that the terms of service of Amazon Drive make it hard to build a compliant application like what rclone is trying to build. I'd expect they can do this with S3 straightforwardly -- at the expense of users needing to spend O(n) instead of O(1) on their data.


I paid for Amazon Drive in order to do backups, some of them with rclone. I am not using Amazon Drive for commercial purposes.

They said Unlimited storage, they should provide Unlimited storage, otherwise it's false advertising.

In other words, I hate apologists.


They are providing unlimited storage, sure, but they also say they can terminate your account at any moment for any reason.

If you stop to think about this part for a second you will realize this service is completely inadequate for backups.


The difference here is they splash "Unlimited" on their website in 32 point font in the 1 sentence summary, and put "at our discretion" buried in their 2500 word terms of service. If they want to operate the service under those terms, fine, but they should have to advertise "as little as 0 bytes of storage", not "unlimited".


This is easy to solve: make font size legally binding. If two statements in an advertisement disagree, the more visible statement should override the other.


I agree, if you want to hold them to their word then you should include all of them. They agreed to unlimited storage at their discretion, and so did you.


No, they should stop providing unlimited storage. What a dumb idea. Or charge people on bandwidth. TANSTAAFL.


In a comment of mine earlier this week [1], I (along with others) speculated whether the shipping of OAuth2 secrets was the reason for the ban, and whether there are other contributing, strategic factors. I wrote:

Amazon has revoked rclone's OAuth2 API key. However, consider that rclone's default OAuth2 client id and secret are compiled into the rclone executable, and thus effectively public; i.e., anyone can extract them, pretend to be rclone, fool users into granting access, and abuse that access for unrelated purposes.

A far better option is for the cloud provider to let users generate their own OAuth2 clients, such as Google does (and supposedly Microsoft, although for me it's always errored out). Unfortunately, Amazon has a "call us" style of Developer access, which effectively translates to no new API access being granted to these types of users.

The speculation around the web is that Amazon also wanted to shut this down because they offered "unlimited" storage, and people were using it to store very large amounts of hard-to-compress, hard-to-dedup data. Breaking a popular tool used to accomplish this (e.g. it supported on-the-fly encryption, producing the exact style of difficult data) will cause some portion of less profitable users to migrate elsewhere. This may or may not be true, but it's certainly an intriguing point.

Some companies like Google allow easy registration of OAuth2 clients, so any rclone user can make their own API keys at console.developers.google.com and feed those to rclone, instead of having to use the built-in credential. Rclone's documentation refers to this ability, but positions it as a performance improvement for advanced users to sidestep a shared rclone-wide quota, rather than for any other purpose.

However, rclone, unlike many, many other applications, at least lets you plug in your own credential -- this shouldn't be an exotic flow, but rather the normal way to grant access to a third-party application, instead of distributing a hardcoded secret. Sadly, the app developer community has been slow to realize and demand this, and providers have been slow to implement it. Google lets you register as many applications as you want; GitHub even lets you make your own OAuth2 scopes. But few others come close.

[1] https://news.ycombinator.com/item?id=14380106


> Some companies like Google allow easy registration of OAuth2 clients, so any rclone user can make their own API keys at console.developers.google.com and feed those to rclone, instead of having to use the built-in credential

How does making new Google API keys help with an Amazon service?


It doesn't, but the point stands that users should be able to create their own auth keys and use them to authenticate a third party application to the target service, rather than having the app author bake some arbitrary key into the code.


Well yes, that would be nice, but it sounds like Amazon Drive doesn't operate that way, so rclone has no choice but to use a single shared API key.


Link to the infringing source, for the curious:

https://github.com/ncw/rclone/blob/a9d29c22645c9b5d2ab938ebb...


> My guess is that the software will probably just require users to get their own secrets as part of the initial configuration.

Amazon has stopped giving out any more API keys (https://developer.amazon.com/amazon-drive). So I guess that isn't an option unless Amazon opens it up again...



So is this basically sharing the same API key? Wouldn't that make using this insecure, as you'd be seeing the same data as other rclone users?


Not if the source data is encrypted using a local key. What may be possible is deleting other users' data, as I believe it'd all be authorized with the same API key.


In OAuth, you basically have two tokens: one for the "software"/"third-party service provider", one for the account itself.

So for example, if you want to use SomeMagicTweetTool to send tweets, they use their token to prove it's them, and you grant them a special token for your account.

Someone who has their token can pretend to be them, use their quota if there is per-software/per-provider quota, but cannot access your account unless you gave them a token for your account.
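
A rough sketch of those two tokens in an authorization-code exchange (the endpoint URL and credentials below are placeholders, not any real provider's values):

    import requests

    # Token #1 material: identifies the application itself. This is the pair
    # rclone compiled into its binary; "secret" in name only once it's shipped.
    CLIENT_ID = "example-app-id"          # placeholder
    CLIENT_SECRET = "example-app-secret"  # placeholder

    def exchange_code(auth_code):
        """Trade a user's one-time authorization code for token #2:
        an access token scoped to that user's account."""
        resp = requests.post("https://provider.example/oauth2/token", data={
            "grant_type": "authorization_code",
            "code": auth_code,
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        })
        return resp.json()["access_token"]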


This is a perfect example of the tragedy of the commons. No one person feels the impact of abuse, but ends up ruining it for everyone.

Clearly they didn't intend for this service to be a general backup system.


This is a perfect example of why companies should never offer unlimited services, if they don't expect users to actually take them up on the offer...

While I can understand and agree that fair play should be a part of the service, if the offer on hand is Unlimited, then it is not abuse to upload terabytes, petabytes, exabytes, or even zettabytes.

Amazon is hardly a neophyte when it comes to big data; they are very well placed to know that offering unlimited storage could attract really big chunks of data.


I guarantee that every company that has ever marketed an unlimited service has internally raised this question. And the answer always is, "we'll just boot the most demanding 0.01% of customers when necessary." This is a tried-and-true tactic; no one bats an eye.


Arbitrary and capricious.

    while True:
        for user in userlist:
            # Top .01% of users are abusive and we boot them
            if usage_distribution_top(user, 0.0001):
                disable_account(user)
                send_acctdisabled_notice_to_user(user, "Excessive usage of unlimited storage")
        sleep(1)
Hours later: "How have we lost half our customers? We only ever removed the top .01% for their 'abusive' usage of our unlimited service!"

If the concept isn't clear to some people, the problem is this: You can't repeatedly cull the outliers — the top x% of the distribution — because by culling you shift the distribution to create more outliers.
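
A toy simulation of that effect (all numbers invented):

    import random

    users = [random.lognormvariate(0, 2) for _ in range(100_000)]  # skewed usage
    culls = 0
    while len(users) > 50_000:
        cutoff = sorted(users)[int(len(users) * 0.999)]  # top 0.1% this round
        users = [u for u in users if u <= cutoff]        # "one-off" cull
        culls += 1
    print(culls)  # roughly 700 culls later, half the customer base is gone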

I'm sure businesses recognize that, intuitively if not explicitly in their planning meetings, and therefore cull rarely enough that it doesn't impact the bulk of users (in part due to influx of new users). So you cull the top x%... who decides what x is, and why, if you state in your promotional material that the service is unlimited? "If we set x to .001% and cull monthly, it'll help our bottom line. Those users are scum data hoarders anyway and we won't serve their kind with our 'unlimited' storage service." That should be lawsuit-worthy.

Companies in this situation should be forced to specify a fixed usage limit (exactly what these companies don't want to do), or live with the outliers.


They create a kind-of-fixed limit.

They don't keep a running list of the top X% by usage and ban those users. They study their customers' usage for some period and find the top X%, then translate that into a fixed upper limit. Then they punish anyone who goes over that fixed limit. If their customers or technology change, they can recalculate that number.

As far as "unlimited" goes: marketing is full of bullshit words that don't mean what you think they mean. Courts (no wait, it's all company-paid arbitrators now) unsurprisingly favor the companies.

When the Supreme Court sided with AT&T on the binding arbitration clause, no joke, I got an email from Charter Communications the very same day the ruling was released saying they had updated their terms of service to include binding arbitration.


Then WHY NOT STATE THAT SEMI-FIXED LIMIT TO BEGIN WITH AND SAVE EVERYONE THE HASSLE OF TRYING TO GUESS WHETHER THEIR USAGE WILL BE ALLOWED OR NOT?

If the limit is dynamic and periodically re-evaluated, then you run into the "there's always an outlier" problem.

If the determination isn't dynamic, and Snake Inc. advertises its service as 'unlimited' instead of stating the limit, then starts discriminating against users or tools associated with higher usage, and gets a reputation for being a snake on top of actually having an undisclosed cap, what then? They'll not only lose their high-end customers to other services (which apparently they don't want, or they could have disclosed the limits to begin with); as already mentioned, they'll also hurt their reputation, because whatever the courts say, people are not happy when an 'unlimited' service discriminates against high-usage customers.


I agree when you say that arbitrarily kicking the top x% is bad, but the discussion of shifting the distribution is a red herring. It's trivial to compensate and use only the un-shifted distribution to make these decisions. It's bad for completely different reasons.


They also aren't new to reserving the right to cut users off, and they know the wider world doesn't care much if they do that to "extreme cases".

And on the other side, many of the more extreme users of Drive probably aren't new to storing a lot of data, and knew that Amazon's offer had to be unsustainable when used too heavily and thus would disappear in some way.

I'm not particularly sympathetic to either side's messaging here. It's sad that rclone got caught in the middle of it.


100% Amazon's fault. They could have simply said something like, "We reserve the right to cap amazon cloud drive usage at <x> TB (currently, but increased occasionally)."

But they don't want to do that, apparently. Why? I don't know, but it looks like they're worried some of their customers (the same ones they don't like because of excessive usage) will go over to B2 or whatever other cheap services are available, for the peace of mind of not running into any caps even if the service costs more at 2TB or 5TB worth of storage. "Unlimited" storage companies, that nevertheless discriminate against very high usage, want to have their cake and eat it too. Either state a limit and accept the risk that some of your customers decide to go somewhere else, or don't limit and don't discriminate against them. Those are the reasonable choices. Other choices are not reasonable.


What they can do is say that you can upload and download Unlimited bytes... At a maximum transfer rate of 1Mb/s.
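
For scale: 1Mb/s sustained is 125KB/s, or roughly 320GB of transfer in a 30-day month, so that would be a cap in everything but name.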


I'm not sure if you're being sarcastic — it's very clearly a general backup system: "Amazon Drive offers secure cloud storage for your photos, files, videos, music, and more. Back up your files to the cloud and know that all of your documents are safe."


Exactly.

But it is also true that if you offer something unlimited, you need to add reasonable limits / throttling so that pathological users and cheaters do not destroy the experience for others.

Amazon Drive might be able to apply bandwidth limits to users who are abusing the system, but as far as I understand they just route things through Amazon S3, so bandwidth limiting is probably hard for them. Also, they probably do not want to impose bandwidth limits on people who use Amazon Cloud Drive for photos and movies. Not 100% sure, but I understand their position.
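
For illustration, throttling of that sort is usually a token bucket; a minimal sketch (the rates and names are invented):

    import time

    class TokenBucket:
        """Allow bursts up to `capacity` bytes, refilling at `rate` bytes/sec."""
        def __init__(self, rate, capacity):
            self.rate, self.capacity = rate, capacity
            self.tokens, self.last = capacity, time.monotonic()

        def throttle(self, nbytes):
            """Block until `nbytes` may be sent, then spend that many tokens."""
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < nbytes:
                time.sleep((nbytes - self.tokens) / self.rate)  # wait out the deficit
                self.last = time.monotonic()
                self.tokens = nbytes  # the deficit refilled while we slept
            self.tokens -= nbytes

Calling throttle(len(chunk)) before sending each chunk caps sustained throughput at `rate` while still permitting short bursts.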

And I read on reddit how people started moving data to Google Drive thinking everything would be super fast, but Google already has bandwidth limits in place.


Noticed our backups failing over the weekend. Set up a new account with Backblaze and re-ran the backups.

It sounded too good to be true; fortunately we were still on the trial.


Backblaze is also unlimited, flat-rate storage, and is therefore also too good to be true.

"paying a flat rate for unlimited storage, or transfer, pits you against your provider in an antagonistic relationship. This is not the kind of relationship you want to have with someone providing critical functions."[1]

[1] http://blog.kozubik.com/john_kozubik/2009/11/flat-rate-stora...



