Hacker News
Dropbox as a Git Server (anishathalye.com)
284 points by anishathalye on April 25, 2016 | 76 comments



Studying the design document [1], one sentence makes it clear why this project is needed and why simply using the Dropbox client does not suffice:

> We can perform a compare-and-swap operation in Dropbox by using the "update" write mode with a specific revision number.

Since a Git repository basically consists of a hash-addressed content object store alongside a mutable list of references (mapping branch names to hashes of latest commits), you need a compare-and-swap operation on the reference list to update the branch reference when you push new commits. The Dropbox client by design does not do a compare-and-swap whenever a file is updated, but the Dropbox API supports it.
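To make the compare-and-swap idea concrete, here is a minimal sketch using a toy in-memory store. The `RefStore` class and its revision counter are hypothetical stand-ins for a remote that versions each file; this is not the Dropbox API and not how git-remote-dropbox is actually written.

```python
import threading


class RefStore:
    """Toy in-memory stand-in for a remote that versions each file (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = {}  # ref name -> (revision number, commit hash)

    def read(self, name):
        """Return (revision, commit_hash), or (0, None) if the ref does not exist."""
        with self._lock:
            return self._refs.get(name, (0, None))

    def compare_and_swap(self, name, expected_rev, new_hash):
        """Write new_hash only if the ref is still at expected_rev."""
        with self._lock:
            current_rev, _ = self._refs.get(name, (0, None))
            if current_rev != expected_rev:
                return False  # someone else updated the ref first
            self._refs[name] = (current_rev + 1, new_hash)
            return True


store = RefStore()
rev, _ = store.read("refs/heads/master")

# Two clients both saw revision `rev`, then race to push different commits.
first = store.compare_and_swap("refs/heads/master", rev, "a" * 40)
second = store.compare_and_swap("refs/heads/master", rev, "b" * 40)
print(first, second)  # True False
```

The losing client has to fetch, merge or rebase, and retry, which is the same thing a regular Git server forces on a non-fast-forward push. A blind file overwrite, as a sync client would do, silently drops one of the two pushes instead.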

[1] https://github.com/anishathalye/git-remote-dropbox/blob/mast...


I'm not sure how this is "needed" just because it is possible. Setting up a small headless git server takes 2 minutes without sharing your code with every intelligence agency on the planet.


If that server is on AWS, Google Cloud or Azure, wouldn't it be totally vulnerable to national security letters?


Could be alluding to the appointment of Condoleezza Rice to the Dropbox board, and the implication that Dropbox just gives them access versus requiring pesky NSLs or warrants (given Rice's defense of those NSA programs).


see Amber Cottle.


It's worth mentioning that it's also very easy to set up git for use over ssh just about anywhere (digitalocean, linode, VM or container at home, etc)

The official docs cover it pretty well:

https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-...

If it's for your own use, you can skip the part about creating a `git` user, and host the files in eg `/home/$USER/repositories` instead. The repository setup instructions remain the same, just the path differs, and the `authorized_keys` file to add keys to will be in `/home/$USER` instead of `/home/git`.


It's so simple, I think it's worth showing just how simple. For a single user who already has SSH access to some server, and has installed Git on that server with apt/yum/whatever:

On the server:

    mkdir project.git; cd project.git
    git init --bare

In an existing Git repository on your client:

    git remote add newserver user@example.net:project.git
    git push newserver master


And then you can `pip install klaus` to get a nice web UI; SSH-based git plus klaus[0] is what I'm using[1] instead of Github now and it's great.

[0]: https://github.com/jonashaag/klaus [1]: http://git.haldean.org


> I ran out of private GitHub repositories a long time ago

Why doesn't OP use Bitbucket?


I deployed Gogs at home and am loving it for things I don't want anyone else to have access to (mostly hobby projects with some secret keys in them), and Gitlab is just so fantastic nowadays that I'm adding projects to it instead of Github.

I wish they would fix their Github syncing (they can currently do it, but don't tell you their public SSH key so you can add it to Github as a pushing/pulling key).


Gogs is awesome :)

The only first world problem I have with Gogs is that when you change branch deep inside a directory, it goes back to the top-level view.


Thanks for the 'fantastic'! We’re always trying to enable our users to integrate with more services/tools. I would love to have your input on this specific topic at https://gitlab.com/gitlab-com/support-forum/issues/695


I agree that Gogs works very well for small repositories.

Just a heads-up to anyone planning to use Gogs for large repos: Gogs is basically unusable for large repositories (such as the git repository itself and the Linux kernel). The response times through the web UI can extend into a few minutes on small servers (I tried it with a 2 CPU, 2GB RAM server on DigitalOcean), and even on extremely powerful servers the response times for large repositories are almost the same (I tried it on an 8 CPU, 16GB RAM server). The reason for this is that Gogs lacks a cache system for git repos. You can follow the issue here [1].

[1] https://github.com/gogits/gogs/issues/1518


It's not documented anywhere, so it's possible they use different keys for different users, but I was able to figure out they use this key for my account:

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8cBZzV5gblDfbi41AxXC3fmfP8w1okfdNn9b6uXaj7i03NwhJ0n4Eg8Z+zgAtyrIa2bw9tzF8fyeYsnRbK5Paj059dn+XiXp3HOgcvHO9jy9C1+nomkRCM55fkMSGyiSByh2KSOiqqAusrTV9joig/Re30gYpxm8iCs20gbD717lZTT0A3dnrv7IQ86nbn9+5h+yMcok/AWEWS6Xq1NJGG5vav+vh+Nhlteo72qj77B/DFK1noUxcTgcy27xZOp1wrr7a6PaBu6wEMgKd2Oe9wQtzycaOYTAywHqtXnOE/w7FvhJen20GqtL9k4Zr94hxUNG7LJoiO6gNzo/0q24t git@theo.gitlab.com


Hmm, Gitlab crashes when I try to add a mirror. How did you figure this key out, if I may ask?


Bitbucket, which they also support, shows me the key they added to my repo.

Using the key they placed on my Bitbucket repo worked on one of my Github repos.


Ah, thank you, I'll try that.

EDIT: Turns out that key doesn't work. How can I make them add a key to my BitBucket account? I don't see an option for that...


GitLab shouldn't crash. We would like to help you resolve this, please email support@gitlab.com and link to this comment to get help.


> mostly hobby projects with some secret keys in them

For that, you might consider using something like git-remote-gcrypt, which encrypts data client-side before sending to the server.


Yes, I use git-crypt, but I'm paranoid :)


Or GitLab? I really don't understand why anyone would misuse Dropbox as a Git server...


The OP seems to be interested in a SaaS (alternative for GitHub). For the people that think you always have to install GitLab, instead of running your own GitLab server you can also use GitLab.com with unlimited projects and collaborators, see https://about.gitlab.com/gitlab-com/


Has Gitlab done any work on the high availability front? I remember reading a while back that gitlab.com was hosted on 2 servers, or something like that.


They definitely stated that gitlab.com was running on one single server (plus a spare):

https://about.gitlab.com/2015/01/03/the-hardware-that-powers... https://about.gitlab.com/2015/03/09/moving-all-your-data/

It looks like this is no longer the case:

https://about.gitlab.com/2015/06/04/gitlab-dot-com-outage-on...

I haven't been able to determine if they're still storing the git repositories on NFS:

https://about.gitlab.com/2015/09/01/gitlab-dot-com-outage-on...

but if that's the case, I would be extremely nervous about the integrity and stability of their platform. I've had so many bad experiences with NFS mounts failing and general instability that I don't want to deploy it anymore.


We have many servers now, see https://gitlab.com/gitlab-com/blog-posts/issues/214

We're still on NFS but are looking into Ceph https://gitlab.com/gitlab-com/operations/issues/1


We're currently using 20 Azure instances to host GitLab.com. Git data is stored on an NFS server though I can't recall the exact setup from the top of my head.


Huge fan of Gogs here: https://gogs.io/ It runs easily on any low-powered machine/VPS and is super great for personal projects.


Or even GitLab

Edit: Apparently I was beat to the punch ;)


Or GitBucket hosted on your own hardware.


I've never tried it, but couldn't you theoretically make another github account using your email (assuming you use gmail) with a "+whatever" in it, and use that to create an infinite number of new github accounts? I believe gmail redirects you+something@gmail.com to you@gmail.com.

Of course, just using Gitlab is the more sensible option :)

[EDIT] - as lfowles pointed out, github doesn't offer private repos on free-tier accounts, so this is pointless.


Github has no private repositories in the free tier.


I would personally prefer gogits, since it's simpler than GitLab.

But since I am a ghetto peasant without a static IP address I prefer to keep everything in what the cool kids call "The Cloud".


I was referring to gitlab's hosted option (maybe I'm not the only one that didn't know this existed until recently) - https://gitlab.com/users/sign_in

I assume you're running your own VPS with gogits/gogs running on it?


I'm pretty much using a single private Git repository where I store all of the things I don't want to open source. When I'm the only one using that repository, I don't see the issue with doing so.


Many people don't seem aware of AWS CodeCommit -- AWS hosted git. $1/user per month, 10GB storage, no bandwidth charges. Free for the first year.


http://visualstudio.com Team Services is also pretty nice. Free unlimited hosted git repos for 5 users, also includes free build/test/load test/release/CI management, kanban boards, team chat rooms. eclipse/xcode/xyz integration, etc, etc. If you've got a small team, not bad.


Hey, thanks for that pointer! I'm an iOS developer who will start working with a bunch of C# developers, and this is just the thing I needed!


This is why I love reading HN. Thanks for the tip!


> Why shouldn't I keep a bare Git repository in a Dropbox shared folder, use it as a folder-based Git remote, and sync it with the desktop client?

> There seem to be some articles on the Internet suggesting that this is a good idea. It's not. Using the desktop client to sync a bare Git repository is not safe. Concurrent changes or delays in syncing can result in a corrupted Git repository.

How dangerous would this be in practice? Ie, what are the chances of corrupting a git repo with this method with a small team, and what exactly must happen to cause it?


I don't have any hard numbers but I tried using Dropbox as a git remote a few years ago. Everything worked great until it didn't.

A HN discussion from a few years ago suggests that Dropbox + GIT is a mixed bag: https://news.ycombinator.com/item?id=5558822


I use this for some OSS projects I help maintain (symlinks in Dropbox, actual dirs on my GOPATH, etc.). The use-case for me is syncing between my laptop & desktop without having to remember to git pull everything before a flight/trip.

There are occasional conflicted copies (once every few months), but it's rare and I don't recall encountering one in quite some time (months).

Tips:

* Don't work on things concurrently.

* It's brittle (like any sync service) if you're editing things when that folder is still updating. This is especially so when it comes to git.

* Repositories with lots of small files take a long time to sync.


I use this method by myself, and even then I had conflicts where Dropbox would fail to sync for some reason and then I would have git tree conflicts. But this is not a big problem, since Dropbox saves both versions of conflicting files and you can just delete the conflicting copies, and then merge changes normally using git.


Dangerous enough that the Dropbox client will give you a warning if you try to add a git repo to your dropbox folder.


Even if it were possible to corrupt the repository, corruption would result in obvious failures rather than subtle data loss because of git's use of cryptographic hashes for everything, and the fact that local copies are full repositories makes it easy to nuke the Dropbox repository and recreate it if you need to.
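As a rough illustration of why corruption tends to be loud rather than silent: git names a blob by the SHA-1 of a short header plus the content, so any changed byte no longer matches the name it is stored under. The helper names below are made up for this sketch; only the hashing scheme is git's.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 over 'blob <size>\\0' plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(expected_id: str, content: bytes) -> bool:
    """True if the content still hashes to the ID it was stored under."""
    return git_blob_id(content) == expected_id


original = b"hello world\n"
object_id = git_blob_id(original)

# Simulate a sync mishap that flips one character.
damaged = b"hello w0rld\n"

print(object_id)
print(is_intact(object_id, original), is_intact(object_id, damaged))
```

This only covers objects. The refs themselves are plain mutable files, so a stale or overwritten ref is not caught by hashing, even though the objects it should point to are.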


It's not an issue at all and all the articles saying otherwise are FUD. http://edinburghhacklab.com/2012/11/when-git-on-dropbox-conf...

Edit: Wow I got downvoted and I actually ran the experiments myself. Has anyone got experiments showing otherwise?


I didn't downvote you, but I don't really agree this is not an issue. The article itself shows that the git repo can get into a confused state which you have to fix, even if the fix is simple. The experiment itself doesn't emulate common causes of Dropbox pain. Firstly, Dropbox stores and syncs files in a way that copes badly with the number and layout of files in a git repo, causing thrashing and high CPU when people commit. Different clients have different ways of locking a git repo. Issues occur when two clients make changes and are online at different times. I'm not saying everyone will hit these issues, but these are all issues that someone has run into.


Important thing to note — this may not be the best solution if your goal is to always have an up to date local (dropbox synced) version of a git repo. The docs discourage that:

> If you're using the Dropbox client to sync files, it's a good idea to use selective sync and disable syncing of the folder containing the repository to avoid any unexpected conflicts, just in case.

Also, in case this wasn't obvious, this is basically a "shim" to use dropbox as a git server (as opposed to say mercurial or SVN). No actual git server is being run and if I understand correctly, some useful features from a normal git server are not implemented.

> git-remote-dropbox is a Git remote helper.

> git-remote-dropbox stores all objects as loose objects - it does not pack objects. This means that we do not perform delta compression. In addition, we do not perform garbage collection of dangling objects.
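For anyone unsure what "loose objects" means in practice, here is a small sketch of the layout a local git object directory uses: one zlib-compressed file per object, at a path derived from its hash. Whether git-remote-dropbox uses exactly this path layout on Dropbox is an assumption on my part; the function names are made up for the example.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, content: bytes) -> str:
    """Store content as a loose blob: one zlib-compressed file named by its SHA-1."""
    payload = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(payload).hexdigest()
    path = os.path.join(objects_dir, object_id[:2], object_id[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(payload))
    return object_id


def read_loose_object(objects_dir: str, object_id: str) -> bytes:
    """Read a loose blob back and strip the 'blob <size>\\0' header."""
    path = os.path.join(objects_dir, object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        payload = zlib.decompress(f.read())
    _, _, content = payload.partition(b"\0")
    return content


with tempfile.TemporaryDirectory() as objects_dir:
    v1 = b"line\n" * 1000
    v2 = v1 + b"one more line\n"  # nearly identical second version
    id1 = write_loose_object(objects_dir, v1)
    id2 = write_loose_object(objects_dir, v2)
    stored_files = sum(len(files) for _, _, files in os.walk(objects_dir))
    round_trip = read_loose_object(objects_dir, id2)

print(stored_files)  # 2
```

Each version is stored whole in its own file. A packfile would instead store the second version as a small delta against the first, which is the saving the quoted passage says is skipped.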


you shouldn't put your plain git-repo into a dropbox-client synced folder, as this corrupts the git repo eventually.

but this project exists exactly because of this and circumvents said problem.


> As far as I know, git-remote-dropbox is the only safe way to host a Git repository on Dropbox.

I've another hackish method that's good enough for a single developer, plus it's encrypted.

I've got a truecrypt file container that resides in Dropbox. I mount it as read-write when I want to push. On the rest of my machines, it's mounted as read-only. Even if the container is 100GB, it takes 10 seconds to sync daily commits - because of block encryption & syncing. This won't work with Google / MS / Amazon drive etc. because they upload the entire container on each incremental change.
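A rough sketch of why that can be fast, assuming the container is encrypted sector by sector (so a small write only changes a few sectors) and that the sync client uploads only blocks whose hashes changed. The block size and function names here are illustrative, not taken from any real client.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative only; not the block size of any real sync client


def block_hashes(data: bytes) -> list:
    """Hash each fixed-size block of the file."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(old: bytes, new: bytes) -> list:
    """Indices of blocks in `new` that are absent from or different in `old`."""
    old_hashes, new_hashes = block_hashes(old), block_hashes(new)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or h != old_hashes[i]
    ]


# A 64-block "container" where a small write lands inside block 10.
container = bytearray(64 * BLOCK_SIZE)
before = bytes(container)
offset = 10 * BLOCK_SIZE + 100
container[offset:offset + 5] = b"hello"
after = bytes(container)

to_upload = changed_blocks(before, after)
print(to_upload)  # [10]
```

Only one block out of 64 needs to go over the wire. The flip side is that anyone who can see successive versions of the file also learns which blocks changed between them.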


This feels dangerous. While I'm not sure, I don't think changes to a truecrypted filesystem are guaranteed to be atomic. I'm pretty sure there's no guarantee that the filesystem is consistent every time Dropbox starts syncing. You probably end up with a partially correct file system on the ro-side, which might end in kernel panics if you're unlucky. Also it's not two-way as the git-remote-dropbox solution offers.


Hence the word "hackish". Though as long as TC containers are read-only, Dropbox syncs them all right.


Uh, this is also dangerous to your security. The truecrypt docs contain very large warnings about this[1]: someone who can see your disk at multiple versions may be able to drastically weaken your security. If you had keys in those truecrypt files, you should rotate them; if you had secrets, consider them compromised.

Truecrypt uses XTS mode, which is Not Great -- it only exists because someone was trying to hammer ciphers into fitting nicely with fixed sized disk sectors, and it makes a number of serious compromises to do so. [2] has a good discussion of this. You do not want to combine XTS with sharing multiple versions of the ciphertext.

tl;dr you've seen the penguin pictures? As [2] cleverly says, you're pretty much doing that to yourself by sharing multiple versions of the file.

There are also significantly more powerful attacks that can be mounted by an adversary who can feed you corrupted blocks, which will allow them to permanently compromise your key, or perform other hijinks like manipulating your content without setting off warning bells. Dropbox, or someone that compromises dropbox, is in that position, by nature of the service they're providing. These issues are rooted in the fact that XTS is not an authenticated cipher -- this leads to such an endemic category of subtle problems that it's been nicknamed "The Cryptographic Doom Principle" [3].

So yeah. If your life depends on it... don't store a truecrypt volume in dropbox.

-----

[1] I found a mirror at http://andryou.com/truecrypt/docs/how-to-back-up-securely.ph... -- and I'll quote, in case that too vanishes:

  > IMPORTANT: If you store the backup volume in any location that
  > an adversary can repeatedly access (for example, on a device kept
  > in a bank’s safe deposit box), you should repeat all of the above
  > steps (including the step 1) each time you want to back up the
  > volume
... where "step 1" is "Create a new TrueCrypt volume".

[2] http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/

[3] http://www.thoughtcrime.org/blog/the-cryptographic-doom-prin...


This is good information & thanks for the links.


Reading through the code (20 second skim) it does not appear that this remote would reveal any information to the OP such as your dropbox access token or code being pushed.


I find it interesting that Dropbox isn't building these types of utilities themselves. This looks pretty cool and also seems like a really good way to up-sell Dropbox storage later.


Well, they sort of did build it. Anish was an intern at Dropbox when he wrote it. I'm not sure if he started it as a personal side project, but it was finished and released during a Dropbox Hack Week.


Ah okay, thanks for the insight; that's super interesting.


Shameless plug: I've implemented something similar in the past, but with the addition of client-side encryption. I've been using it for some time without problems.

At a first glance at the design docs here, the present project looks more sophisticated in terms of syncing. I'm just using local file operations and hoping for dropbox to do its stuff :)

https://github.com/lucas-clemente/git-cr


Syncthing (https://syncthing.net/) as a git server would be cool too.

I use Syncthing to sync around my personal git repo between my machines. All pushes and pulls are instant. I don't have conflicts because I'm only working on one machine at a time. `git-remote-syncthing` would presumably be even better, allowing multiple simultaneous users.


I'm shamelessly using GitHub Desktop and loving it. I guess I'll pass on this project unless there is a way to point GD to a Dropbox-hosted repo.


I think you misunderstood the point of the project. It is not making dropbox a git gui, but makes it possible to use dropbox as a remote git server.


What I meant to say is that I generally use git via the GitHub Desktop app, not the command line. I don't think it is possible to point GitHub Desktop to a Dropbox repo. Or is it?


Many people don't realize it but you can store git repos in S3 using jgit. Super cheap backups of git repos.


Storing massive numbers of small files in Dropbox folders is horrible! It burns my SSD & CPU on each restart.


People still use Dropbox for this? I guess I shouldn't have stopped developing this 4 years ago.


Why?


Because a hacker felt an itch.

Why not?


Why use two services when you can get the job done with one?


When all you have is a hammer....


If you are paying for a Dropbox account, you might as well make full use of it.


how about free service?


If you aren't paying for a Dropbox account, you might as well make full use of it.


What's wrong with a simple svn server?


I love svn, but appreciate some git things as well. Particularly the way you can do a couple of commits before pushing; I don't think there's any equivalent in Subversion?


If you understand how git works, you don't need a tool to do this, and especially not a python tool. Don't misunderstand me, I love no language more than Python. But putting a directory under Dropbox supervision is the easiest thing you can do. That doesn't change because its name is ".git/".


Are you sure you've correctly identified the "this" in question? This enables using Dropbox as a shared remote repository, which is more than putting the directory under Dropbox.



