Hacker News
ClearSkies – open-source file syncing without cloud (github.com/jewel)
190 points by urza on Feb 24, 2014 | 52 comments



Works well for me. For those who want to try it, this is what I did.

Create a Dockerfile:

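  # Ubuntu 12.04 + Python + Git
  FROM nlothian/python-git

  # Refresh the package index so the installs below can resolve
  RUN apt-get update

  # Ruby 1.9 plus GnuTLS and the Ruby headers (needed to build native gem extensions)
  RUN apt-get -y install libgnutls26 ruby1.9.1 ruby1.9.1-dev

  # Filesystem-change notifications (rb-inotify, which depends on ffi)
  RUN gem install rb-inotify ffi

  # The ClearSkies Ruby proof-of-concept client
  RUN git clone https://github.com/jewel/clearskies

Build: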

  sudo docker build -t clearskies .
Run:

  sudo docker run -i -t clearskies /bin/bash
Then:

  mkdir /testdir
  echo 'testing' > /testdir/afile
  cd /clearskies
  ./clearskies start
  ./clearskies share /testdir

That will print a code of the form:

  clearskies:SYNCXXXXXXXXXXXXXXXXXXXXXXXX

Note this code, then start a second clearskies Docker container.

In that one:

  mkdir /testdir
  cd /clearskies
  ./clearskies start
  ./clearskies attach clearskies:SYNCXXXXXXXXXXXXXXXXXXXXXXXX /testdir

Wait a few seconds, and afile should appear in /testdir.


Hmm.

There do seem to be problems with some firewall scenarios (or at least I presume that's what it is...).

When I set up a node at home and one on an Azure box, I see them discover each other, but no files seem to get copied.


If you'll open an issue on github and attach the logs from both peers, I can look at it.


I like that "untrusted peers" are defined in the protocol, but unfortunately only as an optional addendum [1]. To me, untrusted peering is the most important feature, and I hope that leaving it out of the main protocol does not mean it will be neglected in the implementation.

BitTorrent Sync only supports untrusted peers via its API [2], and the only other open-source BitTorrent Sync alternative that I am aware of [3] left it out completely.

[1] https://github.com/jewel/clearskies/blob/master/protocol/unt...

[2] http://www.bittorrent.com/intl/de/sync/developers/api

[3] https://github.com/calmh/syncthing/wiki


At one point I had it as a required part of the protocol, but later decided that it added a lot of complexity, so I moved it to an extension.

I'm currently working on a minor reorganization of the protocol in the protocol_cleanup branch. I'll see if I can fold untrusted mode back into the core protocol.


What exactly is the function of untrusted peers? Speed up the sync?


To use a friend's machine as an off-site secondary backup. They can store encrypted data, but cannot view or push changes to the data. I'm guessing.


If it's untrusted, how can it be a canonical reference?


Cryptographic hashes and signatures, I presume?
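A minimal sketch of the hash half of that idea, assuming the trusted peer records a SHA-256 digest of everything it hands to an untrusted peer and accepts data fetched back only if the digest matches. The class and method names are hypothetical, encryption is left out, and this is not the scheme in the untrusted-peers extension; it only shows why tampering is detectable without trusting the storage host.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a blob."""
    return hashlib.sha256(data).hexdigest()


class TrustedManifest:
    """Hypothetical record kept by a read-write peer: path -> expected digest."""

    def __init__(self) -> None:
        self.expected: dict[str, str] = {}

    def record(self, path: str, data: bytes) -> None:
        # Remember the digest before handing the blob to an untrusted peer.
        self.expected[path] = digest(data)

    def verify(self, path: str, data: bytes) -> bool:
        # Accept data fetched from an untrusted peer only if it hashes to the
        # digest we recorded; a modified or substituted blob will not match.
        want = self.expected.get(path)
        return want is not None and digest(data) == want


m = TrustedManifest()
m.record("notes.txt", b"testing\n")
print(m.verify("notes.txt", b"testing\n"))   # True
print(m.verify("notes.txt", b"tampered\n"))  # False
```

Signatures would presumably extend this to peers that never held the original data: a read-write peer signs the digest list, and anyone holding the public key can check it.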


My original intent was that you could have it running on a Linode or other virtual machine where you don't trust the hardware.


Internet connections can be quite asymmetrical. I have a 32 Mbit/s downlink, but only 1 Mbit/s up. That makes updating data on my tablet with data from home painfully slow unless I am at home. For me, a fast peer that I don't need to trust is pretty helpful.
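To put numbers on that asymmetry: idealized transfer time is the size in bits divided by the link rate. A quick sketch with an arbitrary 1 GB example (decimal units, no protocol overhead):

```python
def transfer_seconds(size_bytes: int, link_mbit_per_s: float) -> float:
    """Idealized transfer time: bits to send divided by link rate (no overhead)."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)


GB = 1_000_000_000  # decimal gigabyte

for label, rate in [("home uplink, 1 Mbit/s", 1), ("home downlink, 32 Mbit/s", 32)]:
    secs = transfer_seconds(GB, rate)
    print(f"1 GB over {label}: {secs / 3600:.2f} h")
```

That works out to about 2.2 hours through the 1 Mbit/s uplink versus roughly 4 minutes at 32 Mbit/s, which is the gap a fast peer that already holds the data would close.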

But apart from speed and redundancy, I also hope for economy of scale. If there were a small market where several hosting providers offered peered hosts with X GB for $Y/month, it could drive costs down for everyone. Dropbox is asking for $0.10/GB/month, which is about twice as high as it could be if the market were efficient.


Likely also for data redundancy if you only have one computer (or, if all your computers/peers are in one building, the untrusted peers could be your 'offsite backup' of sorts).


How do the peers find each other? The installation instructions suggest that I just need to run 'clearskies share ...' on one computer, and 'clearskies attach ...' on the second computer. How will they find each other (on a LAN or, especially, if they're each behind different NAT routers)?

I see that there is some 'tracker' code in the repo: https://github.com/jewel/clearskies/tree/master/tracker

Must I run that somewhere that both computers can access, and tell them its address?


Looks like there are various modes of discovery, see under "Peer discovery":

https://github.com/jewel/clearskies/blob/master/protocol/cor...

I can't find any references to DHT in the code (but the protocol lists that as an extension).

For LAN UDP broadcast:

https://github.com/jewel/clearskies/blob/master/lib/broadcas...

For the tracker client (it apparently gets a list of tracker URIs from the config):

https://github.com/jewel/clearskies/blob/master/lib/tracker_...

https://github.com/jewel/clearskies/blob/master/lib/conf.rb
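For readers new to LAN discovery, here is a minimal sketch of the general technique: each peer periodically broadcasts a small UDP datagram listing its shares, and listeners connect to any sender that has a share in common. The JSON fields, the port number, and the function names are all assumptions for illustration; the real message format is in the broadcaster source linked above.

```python
import json
import socket
from typing import Callable, Optional

DISCOVERY_PORT = 50000  # hypothetical port, chosen arbitrarily for this sketch


def make_announcement(peer_id: str, share_ids: list[str], listen_port: int) -> bytes:
    """Encode a hypothetical 'I am here' datagram as JSON."""
    msg = {"peer": peer_id, "shares": sorted(share_ids), "port": listen_port}
    return json.dumps(msg).encode("utf-8")


def parse_announcement(
    datagram: bytes, sender_ip: str, my_shares: set[str]
) -> Optional[tuple[str, int, list[str]]]:
    """Return (ip, port, common shares) if the sender has a share we also have."""
    try:
        msg = json.loads(datagram.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not ours: ignore malformed traffic on the port
    if not isinstance(msg, dict):
        return None
    shares, port = msg.get("shares"), msg.get("port")
    if not isinstance(shares, list) or not isinstance(port, int):
        return None
    common = sorted(my_shares.intersection(s for s in shares if isinstance(s, str)))
    return (sender_ip, port, common) if common else None


def broadcast_once(payload: bytes) -> None:
    """Send one announcement to the local broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("<broadcast>", DISCOVERY_PORT))


def listen_forever(my_shares: set[str], on_peer: Callable[[str, int, list[str]], None]) -> None:
    """Receive announcements and report peers that share something with us."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", DISCOVERY_PORT))
        while True:
            data, (ip, _src_port) = sock.recvfrom(65535)
            found = parse_announcement(data, ip, my_shares)
            if found:
                on_peer(*found)
```

A real implementation would also skip its own broadcasts, and would presumably announce an ID derived from the share key rather than the key itself, since anyone on the LAN can read these datagrams.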


There's a common tracker, currently running at clearskies.tuxng.com.
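Conceptually, a tracker is a rendezvous table: a peer announces which share it belongs to and the address it can be reached at, and gets back the other peers for that share. A minimal in-memory sketch of that idea follows; the class, method, and expiry policy are hypothetical and not the API of the tracker at clearskies.tuxng.com.

```python
import time
from typing import Optional


class Tracker:
    """Hypothetical in-memory tracker: share_id -> {(ip, port): last_seen}."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self.peers: dict[str, dict[tuple[str, int], float]] = {}

    def announce(
        self, share_id: str, addr: tuple[str, int], now: Optional[float] = None
    ) -> list[tuple[str, int]]:
        """Record addr for share_id and return the other live peers for it."""
        now = time.time() if now is None else now
        table = self.peers.setdefault(share_id, {})
        # Drop peers that have not re-announced within the TTL.
        for stale in [a for a, seen in table.items() if now - seen > self.ttl]:
            del table[stale]
        table[addr] = now
        return sorted(a for a in table if a != addr)
```

Peers would re-announce periodically, so an address that goes quiet simply ages out of the table.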

The plan is to add DHT support similar to the DHT used by BitTorrent. We'll seed the DHT using the tracker.
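BitTorrent's mainline DHT (BEP 5) is based on Kademlia: node IDs and lookup keys live in the same 160-bit space, and the distance between two IDs is their bitwise XOR read as an unsigned integer. A minimal sketch of that metric, assuming (hypothetically) that a share would be keyed by the SHA-1 of some share identifier; how ClearSkies will actually key its DHT isn't stated here.

```python
import hashlib


def node_id(name: bytes) -> int:
    """160-bit ID from SHA-1, as in BitTorrent's DHT (the input is hypothetical)."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, compared as an unsigned integer."""
    return a ^ b


def closest_nodes(target: int, known: list[int], k: int = 8) -> list[int]:
    """Return the k known node IDs closest to target under the XOR metric."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]
```

A lookup repeatedly asks the closest nodes it knows about for nodes even closer to the key, so each round narrows the search; k = 8 matches the bucket size BEP 5 uses.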

If you want, you can also run your own tracker. You should also be able to add peers manually by IP address and port, but that ability is missing from the Ruby client.


Another way to do file syncing without a particular cloud provider is Camlistore: http://camlistore.org/

Brad Fitzpatrick (of LiveJournal, memcached, etc.) is one of the creators, and it's rapidly getting better and better. That said, it can still be a bit tricky to get everything set up.


Camlistore looks really cool, thanks for posting it. I've been thinking something like this should exist for some time now.


Finally... :) I've been waiting for a project like this to appear for ages... thank you, OSS developer angels.


This is excellent. I have been waiting for an open source btsync clone.



git-annex? :)


Have you been able to use it? It is less user-friendly than the proprietary solutions. There's tons of documentation, but nothing that says "here's how to use it like BitTorrent Sync". In fact, IIRC the docs specifically say somewhere "you cannot use this like Dropbox".

Just sharing stuff between two PCs was very difficult (or I couldn't figure it out), and the annex program sat at 100% CPU most of the time doing nothing. Being written in Haskell is a turn-off too. If I have to fix something, I want C, Python, etc., not this crazy write-only language :-)


Have you tried the Assistant and its web UI? It comes bundled with git-annex: http://git-annex.branchable.com/assistant/


I did. Maybe I should give it another shot.

I recall my problem was trying to understand how to sync files I already had in other directories. Things certainly were not sync'd automatically. In the walkthrough it says you need to git-add files and then git-commit them http://git-annex.branchable.com/walkthrough/#index3h2

The ~/annex/ directory ended up with symlinks to git objects, and the files themselves were nowhere to be found. I didn't know what had or hadn't been sync'd already. Nothing sync'd across, and the assistant just said "all done" or something similar. At one point I remember it containing nothing but a bunch of broken symlinks. Good job I was just testing it out; imagine if it had replaced my actual files with broken symlinks.

What all this boils down to is that git-annex is not as foolproof as the proprietary solutions claim to be, and something as Free as git-annex but less complicated would be great.


I have. It kind of worked, but eventually it corrupted my files. They all became text files with a long .git/ path in them.

The Jabber plugin doesn't work, so it wasn't distributed like btsync; I had a central "server". What this meant is that if computer A kicked off a sync, the other nodes wouldn't get updated right away, because the "server" doesn't automatically push to all of the clients.

It worked OK. I've tried just about every sync solution out there, and btsync is the simplest "just works" one that I've found. The only problem I've found with btsync is that the mobile apps sometimes appear to be offline, but once they are woken up they start syncing.

btsync isn't a backup solution though, so I have bakthat backing up to Amazon Glacier.


It doesn't do homedir syncing under OS X, making it all but useless for me.



The JSON packets are limited to 16 MB. If a packet is supposed to contain the manifest of a deep directory with lots of files, that might not be enough. I regularly rsync (on a LAN) trees of a million files, 8 levels deep. The manifest for such a configuration will not fit within 16 MB.
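The concern is easy to check with a back-of-the-envelope estimate. A minimal sketch, assuming a hypothetical per-file JSON entry (path, size, mtime, mode, SHA-256) for files 8 directories deep; these field names are not the actual ClearSkies manifest schema:

```python
import hashlib
import json

LIMIT = 16 * 1024 * 1024  # the 16 MB message cap, taken here as 16 MiB


def sample_entry(i: int) -> dict:
    """Hypothetical manifest entry for a file 8 directory levels deep."""
    path = "/".join(f"dir{level}" for level in range(8)) + f"/file{i:07d}.dat"
    return {
        "path": path,
        "size": 123456,
        "mtime": 1393200000,
        "mode": "0644",
        "sha256": hashlib.sha256(path.encode()).hexdigest(),
    }


def estimated_manifest_bytes(n_files: int, sample: int = 1000) -> int:
    """Extrapolate total JSON size from the average size of `sample` entries."""
    total = sum(len(json.dumps(sample_entry(i))) for i in range(sample))
    return total * n_files // sample


est = estimated_manifest_bytes(1_000_000)
print(f"~{est / 2**20:.0f} MiB for 1,000,000 files; limit {LIMIT / 2**20:.0f} MiB")
```

With this entry shape each record is roughly 200 bytes, so a million files comes to close to 190 MiB, more than ten times the cap.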

I see there's an "rdiff manifest" extension, which is cool for syncing later changes - but the initial manifest will have to be transferred some other way.
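rdiff is the command-line front end to librsync, which implements the rsync algorithm: the receiver sends block signatures of its old copy, and the sender slides a cheap rolling checksum over the new copy to find blocks the receiver already has, so only the changed regions are transferred. A minimal sketch of that rolling (weak) checksum, following the formulation in the rsync technical report; it illustrates the technique and is not the extension's code:

```python
M = 1 << 16  # modulus used by rsync's weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum (a, b) computed from scratch over a block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n by one byte in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def candidate_offsets(data: bytes, block: bytes) -> list[int]:
    """Offsets in data whose window has the same weak checksum as block."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    hits = [0] if (a, b) == target else []
    for k in range(1, len(data) - n + 1):
        # Window is now data[k:k+n]: data[k-1] leaves, data[k+n-1] enters.
        a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
        if (a, b) == target:
            hits.append(k)
    return hits
```

Weak matches are only candidates; rsync confirms each one with a strong hash before reusing the block.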


This is handled by the protocol; see https://github.com/jewel/clearskies/blob/master/protocol/cor....

As an aside, I am currently adding a more sophisticated manifest exchange in the "protocol_cleanup" branch that will remove the need to keep sending the entire manifest (other than on the first connection).
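One common way to avoid resending the whole manifest is to stamp every change with an increasing revision number and have a reconnecting peer ask only for entries changed since the last revision it saw. A minimal sketch of that general idea; the names are hypothetical, and the protocol_cleanup branch may well do it differently:

```python
from typing import Optional


class ManifestLog:
    """Hypothetical manifest where every change gets a new revision number."""

    def __init__(self) -> None:
        self.revision = 0
        # path -> (revision of last change, digest or None if deleted)
        self.entries: dict[str, tuple[int, Optional[str]]] = {}

    def update(self, path: str, digest: Optional[str]) -> None:
        """Record a new or changed file (digest) or a deletion (None)."""
        self.revision += 1
        self.entries[path] = (self.revision, digest)

    def changes_since(self, seen: int) -> dict[str, Optional[str]]:
        """Entries a peer that last saw revision `seen` still needs."""
        return {p: d for p, (rev, d) in self.entries.items() if rev > seen}
```

On the first connection the peer has seen nothing, so `changes_since(0)` is the full manifest; afterwards only the delta travels. Deletions are kept as tombstones (`None`) so they propagate too.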


I use nas4free + samba/CIFS daily, works like a charm. On each client (mobile, PC) I have an application that performs regular backups. In terms of alternatives, there's also OwnCloud that can be deployed in-house.

How does ClearSkies compare to existing private cloud solutions?


ClearSkies doesn't require a central server like OwnCloud, it's peer-to-peer.


Incredible! The OP is using SQLite, particularly Fossil (http://fossil-scm.org/). It upsets me that this open source venture does not give credit where it is due. If I am wrong, please correct me.


I'm confused. The protocol spec doesn't mention SQLite. The ruby proof-of-concept doesn't use SQLite. I've not seen fossil before that I can recollect (I am the author of the clearskies protocol spec).


Granted, I'm on mobile, but how does this differ from BTSync for two computers? There needs to be some agent proxying discovery if the machines aren't on the same local network.


I think the goal is to have a free (as in freedom) BTSync.


Peer discovery is the same as with btsync. It uses a central tracker, but you can also add peers manually. We're planning on adding DHT support.


Potentially interesting, but my interest waned considerably when I saw it's GPLv3.


We're changing to LGPLv3. See https://github.com/jewel/clearskies/blob/master/license-chan....

The C++ implementation is also LGPL.


How did you reach this decision?

IMHO such licences actually hold free software back.


You can read the entire thread here:

https://groups.google.com/forum/#!msg/clearskies-dev/sTlXzBO...

For the sync app itself GPL would have worked great, but we want to have an easy-to-integrate sync library. Hopefully this will reduce the number of apps that require a cloud service to be able to synchronize the user data between devices.


Really? Because of the license, you're not interested in this project?

I find that very, very bizarre. Can you explain your thinking?


Can _you_ explain why _you_ think someone losing interest in the project because of its license choice is bizarre?

What makes _your_ "I don't care about licensing, and neither should you" opinion not bizarre, and his "I don't care about projects with the wrong license" opinion bizarre?

I agree that the parent comment didn't add much to the discussion, but that's not a reason to call it bizarre.


> Can _you_ explain why _you_ think someone losing interest in the project because of its license choice is bizarre

If it's a project that interests or is useful to me, then personally that trumps the license. If I discover a project useful to my workflow then I'll find a way to work with it, despite the license.

> What makes _your_ "I don't care about licensing, and neither should you" opinion not bizarre

Because it's my opinion and I generally don't find my opinions bizarre, otherwise I wouldn't hold them.

And I didn't say "I don't care about licensing, and neither should you" or even insinuate that.

He's more than welcome to care more about a license than about the usefulness of a piece of software, and I'll still find it odd.


Not him, but I can see a few good reasons.

GPL is rather limiting in how you can use the code. A large company (e.g. Apple) won't let something GPL'd be used heavily internally if they can help it, since they won't be able to apply patches or modify it without releasing those changes. This obligation adds a significant legal burden, and releasing the changes could also reveal private details about the company's internals.

I don't think that the commenter's reason is a good one, but I can understand the viewpoint; GPLv3 is quite limiting for some uses.


As long as it's only used and distributed internally, I don't believe the GPL has a problem with them modifying it without disclosing their changes. That's my understanding.


Not really. They won't be able to publish a product that incorporates a GPL library without releasing that product under a compatible license.

They can use and patch a GPL product internally as much as they want.


Indeed, AGPL v3 would have been way better.


Why does the author want to port the daemon from Ruby to C++? Is it speed?


Speed isn't actually much of an issue with the Ruby client, since all the CPU time is spent in the GnuTLS and Digest::SHA256 code, which is already written in C.

The problem is portability to Android and iOS. We additionally want to make the core easy to embed in other applications.


He mentions an android client. If you want it to be fast, or not drain your battery, you need Java or C++. If you want it to run on iOS or Windows Phone, C++ it is.


To make it easier to port to Android and iOS is my guess.


I love the name for this. :)



