Testing sync at Dropbox (2020) (dropbox.tech)
68 points by sh_tomer 11 months ago | 37 comments



I was an early adopter at dropbox, got all my friends/family/etc on it. Maxed out the free affiliate space when I added my .edu email to it for a grand total of 16.125GB, technically over the max, but, whatever I guess.

I eventually got into a 'make decisions' IT role, and, Dropbox was an easy choice, did everything we needed it to, price was fair, and we used it for years. Everyone had access to the webUI that needed it, and, everyone else had access to an SMB share where dropbox actually ran. All was good until suddenly nothing would sync. Tech support was useless for about a month until I finally got a call with some kind of higher level tech, who told me that what we were doing wouldn't work, and never should have worked. Mind you, we'd been doing this for years, and stuck with dropbox because of it.

So I dropped them like a bad habit and never looked back. Our portal system had half-ass but usable remote file sharing, so, we used that while we rolled our own system. Was a rocky 6 months, but, now, we have extremely tight, easy, automated, simple to manage file sharing. So really I should thank them for the impetus, and I would if they weren't so abundantly unhelpful.

Personally, well, my personal account continued to slowly lose features over time, and it finally got to the point where the messaging from them was "Pay or Leave", so, I left and did my own thing, which is a combo of Syncthing and Nextcloud that works extremely well, and I own 100% of my data.

So dropbox might have some glitz, but I wouldn't touch it with a 10ft pole, they're too big for their britches, and have been for a while, and worse, they seem to want you to believe that file synchronization is hard, cloud storage is hard, and file sharing is hard. None of that is true anymore, so what you're buying with them is a name and a logo with a free side of smoke blown up your ass.

No thanks, I'm allergic.


Yep, dropbox was a great company until it wasn't.

I made their first "unofficial" android client and they "allowed" it for a year or so up until they wanted to sue the pants off me. (see https://news.ycombinator.com/item?id=27036593)


suing sounds very extreme, but as someone who has worked on a similar service, unofficial clients are actually pretty annoying to deal with. They often end up using ancient APIs, dodgy secrets handling, we don't know what trackers they use, they do not follow our policies for usage that our official clients do. But they create a secondary community and some actually work surprisingly well for what they try to do.


> everyone else had access to an SMB share where dropbox actually ran

What the...? It honestly does sound like you were holding it wrong.


Why shouldn't that work? Dropbox and SMB both just need to read/write/watch files like any normal process. I do the same thing at home with Syncthing/SMB and it works fine.


From a theoretical perspective, sure, but the product Dropbox sells is either a website or an app each end user gets that adds local sync and some other useful sharing features. They certainly don’t intend it as a centralized system that people expose over the network in a bespoke way, and I can fully sympathize with why they’d tell this user that’s not something they want to support.


Similar here. I was a huge Dropbox evangelist to almost everyone I knew. Then Dropbox limited free accounts to three devices. Then I started getting the emails, texts, and phone calls asking me why the thing I made them sign up for wasn't working the way it always had, from people who have no experience in the software industry and couldn't understand a company just taking features away from them.


>and couldn't understand a company just taking features away from them

Every tech company that offers incredibly generous free stuff eventually does this, even Google. There's no free lunch. All the cloud storage, the dev salaries, and the bandwidth need to be paid for somehow.

I remember when Google was offering unlimited photo storage in their Google Photos cloud. Well that's been gone for a while now, even for the Pixel users who were supposed to get this indefinitely.

Remember Cerberus for Android? They had a "lifetime license" you could buy at one point and after a while they revoked that with a sobbing email saying their business doesn't work without users paying a subscription and they'll have to move everyone, including those who paid the lifetime license to the subscription model, to not go bust. Fuck them.

I think enough users have been burned so far by this model:

  Step 1: Start-up launches cool new app with amazing free feature to gain market share, even though it makes no sense how they're funding it all with nobody paying for it.

  Step 2: VCs see the amazing user growth and invest.

  Step 3: Business or lack thereof was unsustainable from the start so VCs start clamping down, shutting down free accounts and squeezing paying users.

  Step 4: Business slowly dies out since it was never sustainable, and users leave to greener pastures as they see the freebies being taken away from them.

Sound familiar?

I wish tech companies would be sued for backtracking on their promises they made just to lure users in to fake growth.


Yep, I was also screwed by Cerberus. These days the only software products I pay for are those I can also pirate a backup copy of.


I wouldn't really call 16.125GB "incredibly generous" to be honest.


If OP was an early adopter, remember that it's 16 GB in 2007-2008. The iPhone didn't exist yet. The top Thinkpad at the time, Lenovo W510 (https://www.notebookcheck.net/Review-Lenovo-Thinkpad-W510-43...), had 320 GB of storage; OP had 5% of its storage for free on dropbox.

Nowadays we have new laptops with routinely 1TB of storage. Imagine 50GB of free storage with Dropbox, or any equivalent, today.


>Nowadays we have new laptops with routinely 1TB of storage

Unless you buy Macs which still start with 256GB storage.


It felt endless, especially compared to my 100MB yahoo account. I remember sitting there watching the storage counter going up and up and just grinning.


Could you explain the setup? Who would access the SMB share and who’d access Dropbox’s web interface?

If it’s running on a server with a network-shared folder, I don’t see how that’s an issue for Dropbox. If you were passing around Dropbox credentials for a single user, that’s definitely a breach of ToS.


Had a business account, multiple users in charge of their little zones. Worker bees do work, drop it in a dropbox folder as they go. When they're finished, they tell the person in charge of their zone it's ready, and that person ships it off to the client.

Nothing was TOS breaking there, just that it couldn't detect changes to the folders, so, it would never sync. Triggering a manual re-sync would just hang indefinitely and never finish.


> Had a business account, multiple users

It’s right there in the ToS: “Don’t share your account credentials or give others access to your account.”

You might consider “one company” as a single entity, but it’s extremely clear from the nature of the product that Dropbox is meant to share files between accounts, not “one account for all”

If you had multiple accounts as designed, you wouldn’t need SMB.

I’m only saying this because it sounds like you’re blaming a company for not bending their ToS to your liking. I stopped using Dropbox myself a long time ago as well.


The ownership of data is super important.

Dropbox's vague EULA allows them to ban you for whatever reasons they want while keeping your data hostage. Good luck taking it back. Switched to Nextcloud and never looked back.

Dropbox memory consumption on PC a few years ago was a nightmare, taking roughly 50-70% of RAM. Dropbox mobile apps were slow and clunky, with one of the worst UIs, if not the worst, for such simple functionality.


> decoupling the global PRNG into several independent ones

Given this is 2020, I'm guessing you probably already solved this one, but this is something I had to do for a concurrent fuzzer I wrote recently.

Basically my solution is to make SipHash the PRNG. The way this works is you make a tuple of your seed and any other information you want to hash (at minimum, a "channel" number, so you get different streams of random numbers, and a sequence number). You hash the tuple and increment the sequence number and return.

Because it's literally just a hash function, it's easy to reason about its properties: same input in, same output out. As long as the hash function is high enough quality, you avoid bias in the output even when the input has low entropy (thus why I chose SipHash).
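
A minimal sketch of that idea in Python, for illustration only. Python's standard library doesn't expose SipHash as a callable keyed hash, so this assumes BLAKE2b (via hashlib) as a stand-in. The class name `HashPRNG` and its method names are made up for this example and are not taken from the linked implementation.

```python
import hashlib
import struct


class HashPRNG:
    """Counter-based PRNG: output = hash(seed, channel, sequence number)."""

    def __init__(self, seed: int, channel: int = 0) -> None:
        self.seed = seed
        self.channel = channel
        self.seq = 0

    def next_u64(self) -> int:
        # Pack the tuple into a fixed-width, unambiguous byte string.
        payload = struct.pack("<QQQ", self.seed, self.channel, self.seq)
        self.seq += 1
        # BLAKE2b stands in for SipHash here (an assumption of this sketch);
        # an 8-byte digest gives one 64-bit value per call.
        digest = hashlib.blake2b(payload, digest_size=8).digest()
        return int.from_bytes(digest, "little")
```

Two generators built with the same seed and channel produce the same stream, and changing only the channel gives an independent-looking stream, which is what makes this convenient for concurrent tests.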

There's an Apache-licensed implementation here for anyone who cares. My implementation has streams inside each channel associated with hashable objects, just to make it easier to subdivide the channels.

https://github.com/StanfordLegion/fuzzer/blob/master/src/det...

https://github.com/StanfordLegion/fuzzer/blob/master/src/det...

https://github.com/StanfordLegion/fuzzer/blob/master/src/det...


> Because it's literally just a hash function, it's easy to reason about its properties: same input in, same output out.

Don't all PRNGs have this property?


Hash-based PRNGs can be surprisingly tricky to reason about. For example, the other common way to turn hash functions into PRNGs (state = hash(state)) often leads to severe flaws from state space collapse, among other issues.


Can you explain what you mean by state space collapse?


Basically, state space collapse is when the state space of the PRNG is substantially lower than what it "should" be given the number of bits involved. I give an example in the sibling comment with blake2b where the actual period is 10% of the ideal case due to rho structures.


Thanks, this is fascinating.

Re-reading your comment above, I had missed that you were assigning the output of the hash function back into the state. I.e., something like:

    state = initial_seed
    def random():
        global state  # needed so the assignment updates the module-level state
        # `hash` stands for a real hash function here, not Python's built-in
        state = hash(state)
        return state

Whereas I was proposing something more along the lines of:

    seq_num = 0
    def random():
        global seq_num
        result = hash((initial_seed, seq_num))
        seq_num += 1
        return result

If I understand correctly, hash collisions (which are inevitable) will cause the former to loop around in a cycle shorter than the size of the theoretical state space, whereas the latter (I think?) doesn't suffer from this. But it may still have bad statistical properties, depending on the hash function.

For what it's worth, I did a (very informal) comparison of MurmurHash3 vs SipHash when I started my approach and found that MurmurHash3 (despite being advertised as passing the avalanche test) gave very statistically biased results. Something that should have generated a uniform distribution definitely did not. Whereas when I tested SipHash the output looked (at least to an untrained eye) essentially indistinguishable from a true random source.
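
For anyone who wants to repeat that kind of informal check, here is a rough sketch of one way to do it: bucket the outputs and compute a chi-square statistic against a uniform distribution. This is not the comparison described above. BLAKE2b is assumed as the hash because it ships with Python, and the function names `sample` and `chi_square` are made up for the example.

```python
import hashlib
import struct


def sample(seed: int, seq: int) -> int:
    """One 64-bit output of a counter-based hash PRNG (BLAKE2b as a stand-in)."""
    payload = struct.pack("<QQ", seed, seq)
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def chi_square(seed: int, n_samples: int = 16000, n_buckets: int = 16) -> float:
    """Chi-square statistic of the bucket counts against a uniform distribution."""
    counts = [0] * n_buckets
    for seq in range(n_samples):
        counts[sample(seed, seq) % n_buckets] += 1
    expected = n_samples / n_buckets
    return sum((c - expected) ** 2 / expected for c in counts)
```

With 16 buckets there are 15 degrees of freedom, so a statistic well above roughly 30 (the 1% critical value is about 30.6) would hint at bias. A single test like this is only a smoke test, not evidence that a hash is a good PRNG.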

Your parallel comment seems to indicate that there isn't a lot of great practical reading on this topic; I don't suppose you've seen a discussion anywhere going through anything like the approaches above, and whether there is a way to do it "properly"?


Yeah, you got it. The construction you used isn't susceptible to this particular flaw, though it still has others (e.g. predictability of the state updates, related-input attacks, etc). If the state size is reasonably close to the output size, you might still have statistical issues, for reasons similar to why the iterative construction fails. Whether any of these are relevant depends on the context.

I've been meaning to write some kind of public exploration on this stuff so people can tell me some other source explains it better, but haven't made the time to do more than write code like the blake2b demo I had handy to spit out numbers.


Do you know anywhere I could read more about this?


For the mathematical background (i.e. the theory and behavior in idealized models), standard cryptography books are excellent. I recall that both Serious Cryptography and Handbook of Applied Cryptography have excellent introductions to the topic of rho structures that underlie the specific issue I mentioned. Where I've found them lacking is getting a practical understanding of how important these issues are in the specific situations we see in the real world.

For example, if you were to run Blake2b in 8 bit mode with the above construction, what would your PRNG period actually be? In the ideal case it would be 2^8 = 256. In practice, the best you'll get is 26, or about 10% of the ideal case. There's a significant probability you won't even get that for a given seed, as much of the state space actually drops into a lower-period cycle. The issue is that a real hash isn't the correct construction here, even though a mathematically ideal hash would be fine.
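
A small sketch of how one could measure this, assuming "8 bit mode" means BLAKE2b with a 1-byte digest (hashlib's `digest_size=1`) and that the state is fed back as a single byte. The helper names `step` and `cycle_length` are made up for the example, and the exact numbers may differ from the ones quoted above if the original demo was set up differently.

```python
import hashlib


def step(state: bytes) -> bytes:
    """One iteration of state = hash(state), with BLAKE2b cut down to 8 bits."""
    return hashlib.blake2b(state, digest_size=1).digest()


def cycle_length(seed: int) -> int:
    """Length of the cycle that the given 8-bit seed eventually falls into."""
    seen = {}  # state -> step index at which it was first visited
    state = bytes([seed])
    i = 0
    while state not in seen:
        seen[state] = i
        state = step(state)
        i += 1
    # The repeated state was first visited at seen[state], so the loop
    # (the round part of the "rho") spans the steps since then.
    return i - seen[state]


lengths = [cycle_length(seed) for seed in range(256)]
print("longest cycle:", max(lengths), "of an ideal", 256)
print("distinct cycle lengths:", sorted(set(lengths)))
```

This only reports the cycle each seed ends up in and ignores the tail leading into it, so the number of distinct outputs before the first repeat can be somewhat larger than the cycle length.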

I'm not aware of any actual reference material for that kind of stuff, and the people I've met who are knowledgeable learned from seeing others' mistakes.


Seems like a great use case for https://antithesis.com


yeah! we were talking to the antithesis folks during this project, but it wasn't far enough along when we started the project in 2016.

we were heavily influenced by the original foundationdb testing talk from strange loop (wwilson, one of the antithesis founders): https://www.youtube.com/watch?v=4fFDFbi3toc


Discussed at the time:

Testing Sync at Dropbox - https://news.ycombinator.com/item?id=22928726 - April 2020 (25 comments)


Just Dropped Dropbox this month after many years. Their quest to grow and offer more and more unwanted services has finally made their core advantage of easy sync, share, and backup not worth the cost, complexity, and user-unfriendliness.

Google Drive is now Fine.


I was a customer, at some point they activated multi factor auth using an old email and I lost all my files


former area tech lead for dropbox (and frequent collaborator with isaac) here!

happy to answer any questions.


Why does Dropbox have horrendous dark patterns that should be illegal?


Not OP, and I'm not sure specifically about which dark patterns you're referring to, but I can take a stab:

It's because Dropbox doesn't have a good business model (anymore).

Microsoft, Apple, Google, etc. are happy enough to give storage away for free, or almost, and their solutions come pre-installed. For the average user, it'd be an uphill battle to get people to install Dropbox even if it were comparable price-wise to OneDrive, iCloud, Google Drive - and it's not, because Dropbox actually needs to make money off cloud storage and these companies don't. It'd need to be leagues better than those other products (think Firefox overtaking IE, for a time) - and it's not. I think its sync is quite good, but I don't know if it's so much better than the alternatives that it'd persuade significant numbers to switch. And, as others in this thread have pointed out, its desktop client kind of sucks.

Moreover, "files" themselves are becoming an antiquated concept. They're definitely not dead yet, but they're almost a power user thing at this point. Documents are stored on the cloud, embedded inside the specific web app (Google Docs, Notion, Figma...) you used to create it. It sucks, but that's the direction we're heading. File sync is becoming less-and-less essential, to fewer-and-fewer users.

Dropbox is a public company: they can't just shrug their shoulders, tend to their core user base, and iterate on their product. That's not going to produce the kind of growth they need to keep investors happy; you need a good business model to produce that kind of growth organically. Hence you need to produce it inorganically: dark patterns.


Do you have some examples? I'm curious


(2020)*


Added above. Thanks!



