Announcing Firepad — Our Open Source Collaborative Text Editor

yuchi · on April 9, 2013

I'm happy to see [Substance](http://substance.io)'s work (Tim's OT) spread to other related projects!

mikelehen · on April 9, 2013

Yes! His work is great. It made a great starting point for building Firepad. :-)

mbrock · on April 9, 2013

Does this depend on proprietary server tech hosted by Firebase?

toddmorey · on April 9, 2013

Yes, it does.

"Firepad has no server dependencies and instead relies on Firebase [hosted service] for real-time data synchronization."

So the front-end editor portion of it is open source, but the data sync between connected clients happens via Firebase.

You can find more info at the bottom of this page: http://www.firepad.io/

mbrock · on April 9, 2013

Thanks. Seems cool; I just wasn't sure. Phrasing like "Firepad has no server dependencies" had me picturing some advanced WebRTC-based peer syncing or something, which is pretty far-fetched, but you never know these days...

derefr · on April 9, 2013

Technically, the whole point of OT is to allow for "some advanced [...] peer syncing or something"; any OT peer can actually synchronize state against any other peer. The original OT applications were all peer-to-peer apps. It's very similar to git, actually: everyone is on a particular commit[1], where each commit has another commit as its parent; clients can create regular commits, and thus move apart in state-space; and then they can accept others' commits to move back together (creating a merge commit in the process). Very decentralized.
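To make the peer-to-peer convergence concrete, here is a minimal sketch with the simplest possible operation type: single inserts at an index. It is a toy, not Firepad's or any library's API. The `site` field is a hypothetical peer ID used to break ties when two peers insert at the same position, which is what lets peers agree without a server.

```javascript
// Toy insert-only OT for illustration only.
// An op inserts `text` at index `pos`; `site` is a hypothetical peer ID for tie-breaking.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Transform `op` so it can be applied after `other` has already been applied.
// If `other` landed earlier in the text (or at the same spot with a lower site ID),
// shift `op` right by the length of the inserted text.
function transformInsert(op, other) {
  const otherFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
  return otherFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = "abc";
const a = { pos: 1, text: "X", site: 1 }; // peer 1 types X after "a"
const b = { pos: 1, text: "Y", site: 2 }; // peer 2 concurrently types Y at the same spot

// Each peer applies its own op first, then the other's op transformed against it.
const atPeer1 = applyInsert(applyInsert(base, a), transformInsert(b, a));
const atPeer2 = applyInsert(applyInsert(base, b), transformInsert(a, b));
console.log(atPeer1, atPeer2); // aXYbc aXYbc
```

Both peers end up with the same text even though they applied the two edits in opposite orders; the deterministic tie-break is the piece a hub-and-spoke design hands to the server instead.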

Ever since Etherpad/Google Wave/Google Docs, though, OT has been consistently mangled into a hub-and-spoke design, where there's a server that keeps a canonical state, and the OT tie-breaking algorithm is always decided in the server's favor.

Used in this fashion, it's not much different from (and actually a bit higher-overhead than) just having the server arbitrarily accept and temporally order input operations however it likes, and then send back a single overwriting transformation to all clients to move them from their last server-known commits to the new server-canonical commit. (In other words, what any multiplayer game's logic server does.)
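A rough sketch of the hub-and-spoke arrangement described above, again using toy insert-only ops (the `Sequencer` class is hypothetical). Each client tags its op with the revision it last saw; the server transforms the op against everything committed since then, and ties always go to the op the server already committed.

```javascript
// Toy insert-only ops: { pos, text }. Illustration only.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Shift `op` right if `prior` (already committed) landed at or before its position.
// Using <= means ties go to the committed op: the server's order wins.
function transformAgainstCommitted(op, prior) {
  return prior.pos <= op.pos ? { ...op, pos: op.pos + prior.text.length } : op;
}

class Sequencer {
  constructor(doc) {
    this.doc = doc; // canonical text
    this.log = [];  // committed ops, in server order
  }

  // `baseRev` is how many committed ops the client had seen when it made `op`.
  submit(op, baseRev) {
    let t = op;
    for (const prior of this.log.slice(baseRev)) {
      t = transformAgainstCommitted(t, prior);
    }
    this.doc = applyInsert(this.doc, t);
    this.log.push(t);
    return this.log.length; // new canonical revision number
  }
}

const server = new Sequencer("abc");
server.submit({ pos: 1, text: "X" }, 0);
server.submit({ pos: 1, text: "Y" }, 0); // concurrent: also based on revision 0
console.log(server.doc); // aXYbc
```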

[1] One major difference is that OT "commits" are referred to by their vector-clock, instead of by their SHA. Cryptographic hashes are nice, and would give OT even cooler properties if it could use them, but they're a bit too slow for something that gets regenerated with every character typed. CRCs might be fast enough, but in an actually-distributed peer-to-peer system you'll get real collisions fast.
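For the vector-clock identifiers mentioned in the footnote: each state records how many operations it has seen from each peer, and comparing two clocks tells you whether one state is simply behind the other or whether the two diverged and need transforming. A minimal sketch, assuming clocks are plain objects mapping a hypothetical peer ID to a counter (a missing entry counts as 0):

```javascript
// Compare two vector clocks, e.g. { alice: 2, bob: 0 }.
// Returns "before", "after", "equal", or "concurrent".
function compareClocks(a, b) {
  let aBehind = false; // some peer's counter is lower in `a`
  let bBehind = false; // some peer's counter is lower in `b`
  const sites = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const site of sites) {
    const x = a[site] || 0;
    const y = b[site] || 0;
    if (x < y) aBehind = true;
    if (x > y) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent"; // diverged: OT has to reconcile
  if (aBehind) return "before";                // `a` is an ancestor of `b`
  if (bBehind) return "after";
  return "equal";
}

console.log(compareClocks({ alice: 2, bob: 0 }, { alice: 1, bob: 1 })); // concurrent
```

Only the "concurrent" case needs transformation; "before" and "after" just mean one peer has to catch up.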

yuchi · on April 9, 2013

That's exactly the whole point of Operational Transformation. It's the "algorithm" that powers Google Docs, and stores data as a sequence of transformations that can be applied ("can operate") on a previous document state to build a new one.
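To make "a sequence of transformations applied to a previous document state" concrete, here is a minimal sketch. It assumes a simple op encoding loosely modeled on ot.js-style text operations: an array where a positive number retains that many characters, a negative number deletes that many, and a string is inserted. Treat the encoding as an assumption, not any library's exact format. The document is just the result of folding the op history over an empty string.

```javascript
// Apply one op to `doc`. An op must span exactly the text it was written against;
// otherwise it is rejected.
function applyOp(doc, op) {
  let pos = 0;
  let out = "";
  for (const c of op) {
    if (typeof c === "string") {
      out += c; // insert
    } else if (c > 0) {
      out += doc.slice(pos, pos + c); // retain c characters
      pos += c;
    } else {
      pos -= c; // delete -c characters (c is negative)
    }
  }
  if (pos !== doc.length) throw new Error("invalid operation: op does not span the document");
  return out;
}

// The stored history; replaying it from "" rebuilds the current text.
const history = [["Hello"], [5, " world"], [-1, 10, "!"]];
const doc = history.reduce(applyOp, "");
console.log(doc); // ello world!
```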

therockhead · on April 9, 2013

Do you know of any JS OT framework that supports peer to peer?

mikelehen · on April 9, 2013

I don't know the state of the project, but while researching OT stuff for Firepad, I came across https://github.com/sveith/jinfinote. It looks like a peer-to-peer OT implementation (I think it requires a server to set up the initial session, but after that, everything could be peer-to-peer [as implemented, the server just relays messages with no processing]).

Johnyma22 · on April 9, 2013

Check out the Etherpad plugins: features like showing cursor positions are available (http://beta.etherpad.org has it enabled, for example). Etherpad is extensible and does support code editing with syntax highlighting; you just have to go to /admin/plugins and click "install" on whatever plugin you want :)

glavata · on April 10, 2013

Loved etherpad before, and still do :)

gnuvince · on April 9, 2013

And it's web based, and thus I will never use it.

toddmorey · on April 9, 2013

One interesting and important bit about how Firebase works is that all write operations to the data happen locally, and then they are synced back to Firebase (and to other connected clients) on a best-effort basis.

That approach is designed to prevent network lag / connectivity issues from impacting the responsiveness of the application. Advancements like that are starting to make newer web apps feel more and more like their desktop counterparts.
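The local-first write path described above can be sketched like this. It is not Firebase's client code or API: `LocalFirstStore` and the `send` callback are hypothetical, and `send` is synchronous only to keep the example short (a real client would retry asynchronously as connectivity comes and goes).

```javascript
// Writes land in local state immediately (so the UI never waits on the network)
// and are queued until the server has accepted them.
class LocalFirstStore {
  constructor(send) {
    this.data = {};
    this.pending = [];
    this.send = send; // returns true if the server accepted the write
  }

  set(key, value) {
    this.data[key] = value;            // visible to the UI right away
    this.pending.push({ key, value }); // remembered until the server has it
  }

  get(key) {
    return this.data[key];
  }

  // Push queued writes in order; stop at the first failure and retry later.
  flush() {
    while (this.pending.length > 0) {
      if (!this.send(this.pending[0])) return false;
      this.pending.shift();
    }
    return true;
  }
}

let online = false;
const sent = [];
const store = new LocalFirstStore((write) => {
  if (!online) return false; // simulate a dropped connection
  sent.push(write);
  return true;
});

store.set("title", "Draft");
console.log(store.get("title"), store.flush(), store.pending.length); // Draft false 1
online = true;
console.log(store.flush(), sent.length, store.pending.length); // true 1 0
```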

macspoofing · on April 9, 2013

That's a strange restriction to put on yourself. Today you have to have a good reason for your product to NOT be web-based.

yuchi · on April 9, 2013

I'm interested too. Is it the browser environment that bugs you, or the web "platform" (you know, HTML and friends)?

lutusp · on April 9, 2013

> Today you have to have a good reason for your product to NOT be web-based.

For a document that the visitor cannot afford to lose, or for anything in which privacy, security or personal control are issues, a Web-based document is obviously a mistake. Why do you think the big players in cloud-based and Web-based storage and applications are having such a hard time getting people to adopt them?

The answer is obvious -- the drawbacks greatly outweigh the advantages. The risk of losing sensitive content or having it be compromised or stolen is too great.

Even games, a less serious endeavor, suffer when they adopt a Web-based approach. Look at the brouhaha that followed from making the most recent SimCity version work online-only:

http://www.gamesindustry.biz/articles/2013-03-15-ea-defends-...

A quote:

------------------------------------

"So, could we have built a subset offline mode? Yes," Bradshaw admitted. "But we rejected that idea because it didn't fit with our vision...The SimCity we delivered captures the magic of its heritage but catches up with ever-improving technology."

A number of upset fans in the comments section to Bradshaw's update were not assuaged by the explanation. As one user going by the handle klymen wrote, "With all due respect, to write an article about why the new SimCity has to be always online and not to mention DRM or the anti-piracy measure even once, is dishonest and downright disrespectful. It seems like that you don't think very highly of your audience. Yes, the DRM issue is a sensitive topic, but to avoid it, not to mention it as one of the reason why SimCity is always online, shows absolute disconnect and contempt for your fans. We're not stupid."

------------------------------------

The above is just about a game, not a business transaction or potentially sensitive communication.

As to data loss, consider incidents like this:

Title: Amazon's Cloud Crash Disaster Permanently Destroyed Many Customers' Data

Link: http://articles.businessinsider.com/2011-04-28/tech/29958976...

Quote: "In addition to taking down the sites of dozens of high-profile companies for hours (and, in some cases, days), Amazon's huge EC2 cloud services crash permanently destroyed some data.

The data loss was apparently small relative to the total data stored, but anyone who runs a web site can immediately understand how terrifying a prospect any data loss is."

In conclusion, and to second the SimCity gamer's quote above, we're not stupid.

macspoofing · on April 9, 2013

First things first, web-based doesn't imply cloud-based. You can have a web-based service that's hosted on your servers and only serves your intranet. This way you get most of the advantages of the web (no software to install and maintain, and easy access to the resources by your users), and still maintain control over your stack. There is very little reason to build desktop software anymore.

>For a document that the visitor cannot afford to lose, or for anything in which privacy, security or personal control are issues, a Web-based document is obviously a mistake

I don't see how this is obvious. In fact, if it's a document that you cannot afford to lose, a cloud service makes much more sense than something stored locally (even with backup procedures). There are cloud services that handle sensitive data (such as patient records and images) today, successfully. Yes, there may be cases in which it makes sense to have data reside on your servers, as opposed to on some cloud provider's, but those are edge cases now. We recently went through something like this at work: instead of hosting and maintaining our own SharePoint servers, we went with a cloud-based CRM. It makes too much sense. Our source code, which is by far the most valuable piece of our business, is hosted on Kiln. We do have backup strategies in case Kiln's servers get hit by an asteroid, but we have no qualms about Fog Creek maintaining our codebase.

>Why do you think the big players in cloud-based and Web-based storage and applications are having such a hard time getting people to adopt them?

That is absolutely false. In fact, the opposite is true. The trend has been to offload almost everything to the cloud. The big cloud-storage guys all had phenomenal growth.

>The data loss was apparently small relative to the total data stored, but anyone who runs a web site can immediately understand how terrifying a prospect any data loss is."

And how are you immune to this when YOU are responsible for managing your backup strategy and maintain your servers. Do you know how many horror stories there are of data loss that occurred because of things like bad RAID setup. Backup, replication, and server maintenance are hard, expensive, and time-consuming, and most of the time they have no relevance to the underlying business. If you're in the business of making plastic widgets, you want to focus on making plastic widgets, and leave server maintenance to those whose entire business is server maintenance.

lutusp · on April 9, 2013

> First things first, web-based doesn't imply cloud-based.

True, but there are few applications that reside in a browser that don't use cloud-based storage for the results. One may safely refer to Web-based and cloud-based technologies in a single breath.

>> Why do you think the big players in cloud-based and Web-based storage and applications are having such a hard time getting people to adopt them?

> That is absolutely false.

No, it's true, and you need to do impartial research before making this sort of claim. The big players are having a hard time getting people to adopt cloud-based and Web-based technologies, and I already gave the reasons.

http://www.infoworld.com/d/cloud-computing/its-cloud-resista...

A quote: "Accenture and the LSE surveyed more than 1,035 business and IT executives and conducted more than 35 interviews with cloud providers, system integrators, and cloud service users. The key finding: There's a gap between business and IT. Businesspeople see the excitement and business benefits of cloud computing, so they're pushing for it. However, IT people see cloud computing as causing issues with security and lock-in, so they're pushing back."

> And how are you immune to this when YOU are responsible for managing your backup strategy and maintain your servers.

This is a non-argument for an obvious reason -- if infrastructure data loss is an issue, Web-based data loss is a bigger issue, because in the latter case, users won't necessarily know where the data are located, and the number of possible failure modes is higher.

> Do you know how many horror stories there are of data loss that occurred because of things like bad RAID setup.

I can't believe you even posted this argument. How does an unreliable cloud RAID array constitute an improvement over an unreliable infrastructure RAID array?

I haven't even mentioned the legal issues: law enforcement has a much easier time subpoenaing evidence from the cloud than legally acquiring it from your local network.

http://www.forbes.com/2010/04/12/cloud-computing-enterprise-...

A quote: "Enterprises are moving their assets to the cloud to capture its many business benefits, including ease of deployment and reducing, if not eliminating, the need for IT infrastructure. However, cloud computing offers an array of pitfalls for the unwary. The unique legal risks and considerations presented by the cloud are especially important and often overlooked by nonlawyers."

The article goes on to list five very serious and often overlooked legal pitfalls of cloud computing.

macspoofing · on April 9, 2013

>One may safely refer to Web-based and cloud-based technologies in a single breath.

It may be one of those things that needs to be qualified. Pretty much every Fortune 1000 enterprise runs some kind of a web-based intranet, which may or may not be accessible outside the VPN, with various services, from email, to document management, to ... anything.

>if infrastructure data loss is an issue, Web-based data loss is a bigger issue

HOW?! First, there is nothing preventing you from keeping your own backups. Second, even if you completely trust the cloud provider (and who says you should?), I claim that it is still safer than managing your own data for most businesses, especially if your business cannot afford a top-notch IT support staff (or any staff). If you're GE, you can invest in server farms; if you're Plastic Widget Inc., you're better off with a reputable cloud vendor.

> Businesspeople see the excitement and business benefits of cloud computing, so they're pushing for it. However, IT people see cloud computing as causing issues with security and lock-in, so they're pushing back.

God bless sysadmins, but they do have a tendency to be against anything that comes onto their turf. They are almost never the decision makers. Having said that, you do realize that cloud services went from nothing a few years ago to a huge, multi-hundred-billion-dollar industry, and they're still growing. Clearly, SOMEBODY sees value.

> The unique legal risks and considerations presented by the cloud are especially important and often overlooked by nonlawyers.

Yes, there are "unique legal risks and considerations". What's your point? There are risks to cloud services, but there are incredible benefits as well. One always weighs risk and reward accordingly. The rewards are why the industry is growing. Here's an example of a 'unique legal consideration': Canadian hospitals cannot use cloud providers hosted on Amazon or anywhere in the US to host patient data because of things like the Patriot Act. So what do they do? They can go with a regional cloud provider that guarantees their data will not leave the province. I've seen that happen.

lutusp · on April 9, 2013

>> if infrastructure data loss is an issue, Web-based data loss is a bigger issue

> HOW?!

Because there are more factors involved. A local storage device has some number of failure modes, and probability of failure: A. The cloud has additional failure modes and vulnerabilities: B. The outcome is A + B. The failure modes are additive.

macspoofing · on April 10, 2013

>A local storage device has some number of failure modes, and probability of failure: A. The cloud has additional failure modes and vulnerabilities: B. The outcome is A + B.

That's funny =)

BHSPitMonkey · on April 9, 2013

So would you inherently trust a native GTK+ app containing some client-server functionality over a JS/HTML/CSS app running locally on an offline Chromebook?

You (and the originator of this comment thread) seem to be conflating the client-server model with the platform an application is developed for. "Being a web app" and "relying upon some server" are two entirely disjoint things.

daleharvey · on April 9, 2013

SimCity wasn't written in web technology and needed to be online; it's possible to write applications using web technology that don't need to be online.

Those arguments have nothing to do with each other; 'the web' doesn't equal 'store everything in cloud-based servers'.

lutusp · on April 9, 2013

My point was that forcing SimCity's fans to adopt a cloud-based version of the game was a big mistake, and everyone involved accepts this now.

> SimCity wasn't written in web technology and needed to be online

The present version of SimCity requires players to log on. The game will not work without a Web connection. This has created a huge outcry from fans of the game.

> it's possible to write applications using web technology that don't need to be online.

If we define "online" as "connected to a network", then yes, Web-based applications require one to be online. And most Web-based applications require an Internet connection.

> Those arguments have nothing to do with each other ...

Cloud-based storage and Web-based applications have nothing to do with each other? Most Web-based applications store their results in the cloud. If you adopt Web-based applications, you're also adopting cloud-based storage at least temporarily.

macspoofing · on April 9, 2013

>My point was that forcing SimCity's fans to adopt a cloud-based version of the game was a big mistake, and everyone involved accepts this now.

The BIG problem with SimCity wasn't that it had online features (those are great), but that it did not let you play offline for any good reason, which annoyed the heck out of people. And how does that example prove anything? A mediocre game (single or multiplayer, online or offline) gets released every other week.

>Cloud-based storage and Web-based applications have nothing to do with each other?

They don't. You can have a web-based app, that is hosted locally, or you can have an installed app, that cannot function without a cloud-service (e.g. almost every mobile app).

>Most Web-based applications store their results in the cloud.

I mentioned this in another post, but almost every business, big or small, runs some kind of intranet where they control the entire stack. The reality is that we're moving away from that setup as well. So no, web-based and cloud-based aren't the same thing.

lutusp · on April 9, 2013

> The BIG problem with SimCity wasn't that it had online features (those are great), but that it did not let you play offline for any good reason, which annoyed the heck out of people. And how does that example prove anything?

It proves exactly what I intended -- that, contrary to an earlier point, Web-based technologies aren't being embraced enthusiastically, that there are circumstances in which they're not the right approach.

>> Cloud-based storage and Web-based applications have nothing to do with each other?

> They don't.

When you post a refutation, it's customary to offer some evidence for your position. You don't -- your reply wanders off to a different topic.

> So no, web-based and cloud-based aren't the same thing.

First, I never said that. Second, Web-based and cloud-based technologies are integral to each other. The majority of Web-based applications store their data in the cloud.

comex · on April 9, 2013

> It proves exactly what I intended -- that, contrary to an earlier point, Web-based technologies aren't being embraced enthusiastically, that there are circumstances in which they're not the right approach.

The outrage about SimCity was largely based on the fact that a single player game fundamentally should not need to connect to a server (SimCity has some interactions between neighboring cities, but they're not key to the game and critics think they should be optional), but a collaborative document editor is inherently "multiplayer" - though being web based might make it harder (but not impossible) to keep a stale copy of the document while offline, the vast majority of uses for such an editor expect everyone to be online and synchronizing changes.

There are circumstances where web-based is not the right approach, but this is probably not one of them.

daleharvey · on April 9, 2013

>> So no, web-based and cloud-based aren't the same thing.

> First, I never said that

> If you adopt Web-based applications, you're also adopting cloud-based storage at least temporarily.

Some web-based applications use cloud storage and some don't; the two aren't completely independent of each other, but they are utterly not 'integral'.

It doesn't need proof beyond http://diveintohtml5.info/storage.html

> If we define "online" as "connected to a network", then yes, Web-based applications require one to be online. And most Web-based applications require an Internet connection.

Again, you are confusing history with truth: web applications do not require you to be online to work.

http://diveintohtml5.info/offline.html

lutusp · on April 9, 2013

> ... web applications do not require you to be online to work ...

What? If "online" means connected to a network, then yes, Web-based applications require you to be online, and the majority of those require you to be connected to the Internet, both to access the application and to store and retrieve data.

daleharvey · on April 9, 2013

No, they do not. I linked to a very basic overview of some of the technologies used to make web applications work offline (the storage one is also relevant). There is a relatively huge ecosystem that isn't touched upon in that article; offline web apps are very much a thing.

I have been using a mobile phone whose OS (UI/app layer) was written entirely in web technologies, and it most certainly doesn't turn into a brick when I don't have a data connection.

It is fine to not know these things, but it's not a very good idea to publicly dismiss technologies you obviously aren't very familiar with, and when people point out that you are wrong, you should probably do some research before defending your position.

derefr · on April 9, 2013

This seems like a very misaimed rant, for this particular piece of tech.

It's a collaborative text editor. What point is there to using it offline? All the disadvantages you state are already inherent in the concept of "collaborative text editor", whether or not it's web-based.

lutusp · on April 9, 2013

> It's a collaborative text editor. What point is there to using it offline?

You're using your premise as your argument.

> All the disadvantages you state are already inherent in the concept of "collaborative text editor", whether or not it's web-based.

Not at all. A locally hosted application can collaborate by sharing only its data, not the application itself, with all data stored locally after the work session ends. I'm only making the point that your premise is mistaken, not that a collaborative editor isn't an obvious application for Web technologies, all legitimate objections aside.

The good news is that a collaborative editor might itself be Web-based, and its documents might reside in the cloud for obvious practical reasons. That's also the bad news, due to problems already listed.

derefr · on April 10, 2013

I'm not sure if you've been keeping up with what "web-based" means, but given HTML5 cache manifests and storage APIs (localStorage, IndexedDB, etc.), web applications are perfectly capable of "sharing only its data, not the application itself, with all data stored locally after the work session ends."

You can basically think of a web application, these days, as a native app that just happens to "update" to the newest server-provided version whenever you start it up, if you're online at the time.
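As a rough illustration of the storage side, here is a sketch that saves and restores a draft through the Web Storage interface (`getItem`/`setItem`), which `window.localStorage` implements in browsers. The `draft:` key prefix and the in-memory stand-in are assumptions so the example runs outside a browser; larger or structured data would more likely go into IndexedDB.

```javascript
// Save and restore a document through any object with the Web Storage
// getItem/setItem methods (in a browser: window.localStorage).
function saveDraft(storage, docId, text) {
  storage.setItem("draft:" + docId, JSON.stringify({ text, savedAt: Date.now() }));
}

function loadDraft(storage, docId) {
  const raw = storage.getItem("draft:" + docId); // Web Storage returns null for missing keys
  return raw === null ? null : JSON.parse(raw).text;
}

// Minimal in-memory stand-in with the same two methods, for running outside a browser.
const memoryStorage = {
  items: new Map(),
  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  },
  setItem(key, value) {
    this.items.set(key, String(value)); // Web Storage stores strings
  },
};

saveDraft(memoryStorage, "notes", "offline edits");
console.log(loadDraft(memoryStorage, "notes")); // offline edits
console.log(loadDraft(memoryStorage, "other")); // null
```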

ams6110 · on April 9, 2013

Data loss is a risk you have to manage no matter where your data live, whether on your desktop, in your own data center, or in the cloud.

lutusp · on April 9, 2013

> Data loss is a risk you have to manage no matter where your data live, whether on your desktop, in your own data center, or in the cloud.

That cannot be used to argue that cloud-based storage is just like local storage. It isn't -- cloud-based storage has some serious legal and practical problems that local storage doesn't have.

macspoofing · on April 9, 2013

You're overstating the legal and regulatory risks. I work in health-care, which is probably one of the most regulated industries there is, and regulatory issues are not really a barrier to cloud services.

lutusp · on April 9, 2013

> You're overstating the legal and regulatory risks.

Okay, fair enough -- I will let this IT lawyer "overstate" them for me:

http://www.forbes.com/2010/04/12/cloud-computing-enterprise-...

> I work in health-care, which is probably one of the most regulated industries there is, and regulatory issues are not really a barrier to cloud services.

I recommend that you suspend judgment until someone wants to subpoena a case file, or internal memos and documents, for a high-profile malpractice lawsuit. It's much easier from the cloud than from an intranet or a file cabinet.

Most online medical records are encrypted for obvious reasons. But if a lawful subpoena comes down, the owner of the files is required to unencrypt and provide the requested materials, the existence of which is obvious from their online presence, even when encrypted. For local storage, a medical institution can say the files don't exist or are inaccessible. This option doesn't exist for online records.

The good news is that cloud storage speeds everything up and provides a measure of the degree to which patient data exists. That's also the bad news.

trifu · on April 9, 2013

> For local storage, a medical institution can say the files don't exist or are inaccessible. This option doesn't exist for online records.

I'm not sure that's a good example of why local is better than online, since stating to the courts that something doesn't exist (when in actuality it does) is lying to the courts... I'm no lawyer, but I'd imagine that doing so is punishable. I think the benefits of online medical files far outweigh those of keeping them offline, for many reasons.

macspoofing · on April 10, 2013

>But if a lawful subpoena comes down, the owner of the files is required to unencrypt and provide the requested materials, the existence of which is obvious from their online presence, even when encrypted. For local storage, a medical institution can say the files don't exist or are inaccessible. This option doesn't exist for online records.

Ok, now I know you're trolling.

yesimahuman · on April 9, 2013

While your preferences are perfectly valid, I think it's safe to say web-based editors and tools are very popular with a large number of people. For example, everyone I know loves Google Docs and few of them use MS Office. That's real and shouldn't be discounted so quickly.

joeblau · on April 9, 2013

You guys are awesome! I love your hot sauce!

prg318 · on April 9, 2013

This is a really neat idea! It would be nice to see the editor widget be re-sizable (like an HTML text area) so that you don't have to deal with scroll bars for larger portions of text.

Pretty impressive though!

mikelehen · on April 9, 2013

This should be a pretty easy thing to add, and it's all open source... :-)

karl_gluck · on April 9, 2013

Very cool technology! I really appreciate your decision to allow us to just give it a try without having to set up an account or log in.

civilian · on April 9, 2013

I think there's a bug with people's text cursors writing over each other.

http://imgur.com/1IvMccd

mikelehen · on April 9, 2013

Can you provide more detail? Here, at firepad@firebase.com, or as a GitHub issue? From the screenshot it's not immediately obvious to me what went wrong.

People are certainly allowed to write over each other (e.g., click in the middle of your sentence and write something). If that's not what happened, let me know.

saurik · on April 9, 2013

So, as with most things built with Firebase, I have to ask how the security works. Last I talked to the Firebase team, they were building expression-only rules for managing server-side validation. This allows you to express some reasonable subset of permissions, but not all possible ones.

In this case, the OT history for the document (required to synchronize clients) is stored in Firebase; each op is a separate object with a massive ID, which I imagine will become awkward with large numbers of old documents, but I digress. Additionally, snapshots are occasionally stored.
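Assuming a document is stored as an op history plus an occasional checkpoint (the text as of some revision), loading it looks roughly like the sketch below. The record shape (`rev`, `text`) and the op encoding (positive number = retain, negative = delete, string = insert) are hypothetical, not Firepad's actual schema; the point is that a checkpoint lets a client skip replaying most of the history.

```javascript
// Apply one op; it must span exactly the text it was written against.
function applyOp(doc, op) {
  let pos = 0;
  let out = "";
  for (const c of op) {
    if (typeof c === "string") {
      out += c; // insert
    } else if (c > 0) {
      out += doc.slice(pos, pos + c); // retain
      pos += c;
    } else {
      pos -= c; // delete (c is negative)
    }
  }
  if (pos !== doc.length) throw new Error("invalid operation");
  return out;
}

// history[i] takes the document from revision i to revision i + 1.
// `checkpoint` (or null) holds the text as of revision `checkpoint.rev`.
function loadDocument(checkpoint, history) {
  let doc = checkpoint ? checkpoint.text : "";
  const start = checkpoint ? checkpoint.rev : 0;
  for (let i = start; i < history.length; i++) {
    doc = applyOp(doc, history[i]);
  }
  return { doc, replayed: history.length - start };
}

const history = [["Hello"], [5, " world"], [11, "!"]];
const checkpoint = { rev: 2, text: "Hello world" };
console.log(loadDocument(checkpoint, history)); // { doc: 'Hello world!', replayed: 1 }
console.log(loadDocument(null, history));       // { doc: 'Hello world!', replayed: 3 }
```

Without a checkpoint, every op since the document was created has to be replayed, which is why a reset or missing checkpoint makes long-lived documents slow to open.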

Rather better than previous offerings I've seen using Firebase, this demonstration has been put together to solve the first few obvious problems: I am not allowed to dump the set of documents[1], nor am I allowed to arbitrarily corrupt the database by deleting random objects[2]. So far, so good.

[1]: https://news.ycombinator.com/item?id=4780495

[2]: https://news.ycombinator.com/item?id=3824775

A set of example security rules for Firepad is actually provided as part of the GitHub project, so we can do some analysis of the kinds of checks we will need to bypass in order to break this particular demo ;P. (Of course, this just makes us faster; it isn't what makes this possible.)

https://github.com/firebase/firepad/blob/master/examples/sec...

Reading these, it turns out that the only verification that is being done on the snapshots is that 1) they look reasonably valid (have the correct set of fields) and 2) they have the correct author field associated with them (as in, the same one that is used on the history revision item).

However, it doesn't do any consistency checks on the data itself. It doesn't even verify that the snapshot we are uploading is different from the one currently on the server, so corrupting the state is really easy: we just need to pull the current snapshot and modify its data.

The only pain we could run into is that it could also verify that the author of the snapshot is the current user; but that doesn't help: all we need to do is to make a quick edit to the document and then use our new revision (which we legitimately own) to build our new corrupted snapshot.

That said, while that check is present in the example rules on GitHub, it isn't actually used in the deployed copy of Firepad on this server: the server is entirely anonymous, so none of the users have any auth information at all, and we can just pretend to be other users.

For users who wish to follow along at home, you just need to have node.js installed, and then do "npm install firebase". You can then use the following script to destroy any document you want: you just need to set the "room" variable to the ID # of the document you want to modify.

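    #!/usr/bin/env node
    // Proof of concept against the 2013 Firepad demo, using the
    // Firebase Node client of that era ("npm install firebase").
    var Firebase = require('firebase');

    // Document ID, as the demo page derives it in the browser:
    // parseInt(window.location.hash.substring(1))
    var room = 44;

    // The demo appears to spread rooms across 15 Firebase instances by ID.
    var shard = room % 15;
    var db = new Firebase('https://firebase-firepad' +
        shard + '.firebaseio.com/' + room);

    // Read the current checkpoint, then write it back with the same
    // author (a) and revision (id) fields but replaced contents (o).
    var check = db.child('checkpoint');
    check.once('value', function (value) {
        value = value.val();
        check.set({
            a: value.a,
            id: value.id,
            o: [''], // random data would be better
            // but I'm both lazy and busy today ;P
        });
    });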

When the client then restores this snapshot and attempts to play back the resulting history to "catch up" it will instead end up outputting tons of errors to the console, as the operations stored in the history will be referring to document positions that no longer exist or are different.

    Firepad: Invalid operation. https://firebase-firepad14.firebaseio.com/44 C2GK

The client then has two options: it can either skip the history entry or decide the entire document is corrupt. In this case, it seems that Firepad considers the better of the two options to be simply skipping these operations: new clients then manage to resync their state.

But, existing clients now have desynchronized state, so all operations that are being synchronized live between the various clients on either side of this split (ones that started from this snapshot, and ones that pre-existed it) are going to result in this error; that didn't really help.
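The replay problem described above can be sketched as follows, using a hypothetical op encoding in which each op must span exactly the text it was written against (positive number = retain, negative = delete, string = insert). When the starting snapshot is wrong, ops stop fitting, and the client has to choose between skipping them and declaring the document corrupt. This is an illustration, not Firepad's actual client code.

```javascript
// Apply one op; throws if it does not span exactly the current text.
function applyOp(doc, op) {
  let pos = 0;
  let out = "";
  for (const c of op) {
    if (typeof c === "string") {
      out += c; // insert
    } else if (c > 0) {
      out += doc.slice(pos, pos + c); // retain
      pos += c;
    } else {
      pos -= c; // delete (c is negative)
    }
  }
  if (pos !== doc.length) throw new Error("invalid operation");
  return out;
}

// Replay `history` on top of `start`, either skipping ops that don't fit
// or treating the first misfit as fatal.
function replay(start, history, { skipInvalid }) {
  let doc = start;
  const skipped = [];
  history.forEach((op, i) => {
    try {
      doc = applyOp(doc, op);
    } catch (err) {
      if (!skipInvalid) throw new Error(`document corrupt at op ${i}: ${err.message}`);
      skipped.push(i);
    }
  });
  return { doc, skipped };
}

// Ops that were written against the real snapshot "Hello world".
const history = [[11, "!"], [12, "?"]];
console.log(replay("Hello world", history, { skipInvalid: true })); // { doc: 'Hello world!?', skipped: [] }
console.log(replay("", history, { skipInvalid: true }));            // { doc: '', skipped: [ 0, 1 ] }
```

A length check like this only catches ops that no longer fit. A forged snapshot with the same length but different text would replay without any errors into a document that silently differs from what the other clients have.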

To be very explicit for a second: this is a different scenario than just "well, it's an anonymous system, so anyone can delete the data": we didn't just go in and delete the data in the document, we actually corrupted the state of the document, rendering further attempts to edit it useless.

It is currently my belief that Firebase's security rules system is simply not powerful enough to secure an OT-based text editor, whether or not it uses snapshots (at least assuming it supports offline; there might be tricks you can play if all users are required to be online at all times).

(edit: I am finding it interesting that my previous security analyses of Firebase projects, combined with example code, had been voted up quite high, and this one has now been downvoted to 0. I wonder if people just don't care as much about security anymore? Is it because it is open source? Is the Firebase team themselves going around voting down? ;P)

mikelehen · on April 9, 2013

Hey Saurik,

Thanks for the thorough and correct analysis as usual. :-)

The key things I would point out are that:

1) The checkpointing is an optimization. You could either remove it (which will hurt initial load time) or delegate it to trusted server code (which will be very lightweight; you could run hundreds of rooms off of a tiny EC2 instance or whatever).

2) In general, the whole point of collaborative editing is that you trust your collaborators. If they're malicious, they can already cause mayhem on your editing experience with constant edits, obscene content, etc.
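
A sketch of delegating checkpointing to trusted server code, assuming history is stored as an array indexed by revision number (a simplification; Firepad's actual revision IDs are strings) and that a checkpoint has the `{ a, id, o }` shape seen in the script earlier in this thread, with `o` inserting the full text. `buildCheckpoint` and the `'server'` author value are hypothetical. If security rules made the checkpoint location writable only by this process, clients could no longer replace it with garbage; they could only write history entries that the checkpointer, like the clients, skips.

```javascript
// Compact hypothetical applier: retain (n > 0), insert (string), delete (n < 0).
function applyOperation(doc, ops) {
  let pos = 0;
  let out = '';
  for (const op of ops) {
    if (typeof op === 'string') { out += op; continue; }
    if (!Number.isInteger(op) || op === 0) throw new Error('Invalid operation');
    const n = Math.abs(op);
    if (pos + n > doc.length) throw new Error('Invalid operation');
    if (op > 0) out += doc.slice(pos, pos + n);
    pos += n;
  }
  if (pos !== doc.length) throw new Error('Invalid operation');
  return out;
}

// base: the previous trusted checkpoint, or null to replay from revision 0.
function buildCheckpoint(history, base) {
  let doc = base ? base.o[0] : '';
  for (let rev = base ? base.id + 1 : 0; rev < history.length; rev++) {
    try {
      doc = applyOperation(doc, history[rev].o);
    } catch (e) {
      // Untrusted clients may have written invalid entries; skip them.
    }
  }
  return { a: 'server', id: history.length - 1, o: [doc] };
}

const history = [
  { a: 'alice', o: ['hello'] },     // rev 0
  { a: 'bob', o: [5, ' world'] },   // rev 1
  { a: 'mallory', o: [99] },        // rev 2: invalid, skipped
];
console.log(buildCheckpoint(history, null)); // { a: 'server', id: 2, o: [ 'hello world' ] }
```

Passing the previous checkpoint as `base` means each run only replays the entries written since then, which keeps the trusted process cheap.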

saurik · on April 9, 2013

1) I do not need to modify the snapshots: I can simply inject corrupt history state. The problem is that the server in these kinds of systems is normally supposed to be running the OT algorithm in order to verify that the data being uploaded and stored as part of the permanent document record is valid.

(edit: That said, you would be hard-pressed to do this kind of OT-based text editor without the snapshots, especially with the very large number of separate objects being used to store the history state. While looking into how you were storing the data for this in Firebase, I had tried resetting the snapshot for a document to A0=[''], and attempting to open the document then bogged down so far that I wasn't certain if it would even recover; this problem will just get worse as the document ages... that only had a few hours of history behind it: a real document would just be screwed.)

2) There is a difference between trusting your collaborators with your data, and trusting your collaborators with your program state. Yes: if I am collaborating with people using Google Docs, the other people can ctrl-a+del all of the "data". However, they shouldn't be able to break the editor itself :(.

(edit:) As an example of this, if you remove the snapshots from the mechanism, then you can make the argument that "well, if I validate and ignore all history state that is invalid, this isn't a problem: I just need to keep the clients in sync and skipping things that are broken is valid" (so I'm happily willing to concede that my having added "whether or not it uses snapshots" was going too far). I personally think that this is still a problem, as the document record is still corrupt.

However, with the snapshots certainly, it isn't that I'm able to delete the data: it is that I'm able to break the synchronization system itself. I can set up situations where one party thinks they are editing the document, but their edits are being discarded. I can make it so that one person sees a document different than other people. In addition to doing all of this, I can make it nearly impossible to figure out who's doing it and to fix the situation. This is simply not the same problem as "well, you can always just ctrl-a+del the data from the document".
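
The verification described here, a server that runs the OT logic before accepting data into the permanent record, might look like the following minimal sketch. `DocumentServer` and `submit` are hypothetical names, the operation encoding is the same assumed ot.js-style format, and for brevity a write against a stale revision is simply rejected (a real OT server would transform it against the newer operations instead).

```javascript
// Compact hypothetical applier: retain (n > 0), insert (string), delete (n < 0).
function applyOperation(doc, ops) {
  let pos = 0;
  let out = '';
  for (const op of ops) {
    if (typeof op === 'string') { out += op; continue; }
    if (!Number.isInteger(op) || op === 0) throw new Error('Invalid operation');
    const n = Math.abs(op);
    if (pos + n > doc.length) throw new Error('Invalid operation');
    if (op > 0) out += doc.slice(pos, pos + n);
    pos += n;
  }
  if (pos !== doc.length) throw new Error('Invalid operation');
  return out;
}

class DocumentServer {
  constructor() {
    this.doc = '';      // current canonical text
    this.history = [];  // accepted entries; index = revision number
  }

  // Accept an entry only if it targets the next revision and applies cleanly.
  submit(revision, entry) {
    if (revision !== this.history.length) return false; // stale or out-of-order write
    let next;
    try {
      next = applyOperation(this.doc, entry.o);
    } catch (e) {
      return false;                                     // invalid: never stored
    }
    this.history.push(entry);
    this.doc = next;
    return true;
  }
}

const server = new DocumentServer();
console.log(server.submit(0, { a: 'alice', o: ['hello'] })); // true
console.log(server.submit(1, { a: 'mallory', o: [42] }));    // false
console.log(server.submit(1, { a: 'bob', o: [5, '!'] }));    // true
console.log(server.doc);                                     // hello!
```

Since nothing invalid ever enters the stored history, any checkpoint the server derives from it is safe for every client to load.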

mikelehen · on April 9, 2013

1) The client ignores invalid history items (unless there's a bug). So while you can pollute the Firebase data if you desire, it shouldn't affect the behavior of the app in any way. (i.e. other than the checkpointing thing you brought up, you can't corrupt the history.)

That said, Firebase is certainly pushing the envelope in terms of what you would normally do with client-only code. =] And with that comes some challenges. In some ways Firebase is more like a peer-to-peer system than a traditional client-server system (since the Firebase server isn't doing complete data validation / processing). This sometimes affects the way you write code (doing extra validation / sanitization on the client-side for instance), but I think the advantages that come with Firebase outweigh that by far.

saurik · on April 9, 2013

Well, the alternative is something like what many of your competitors (such as Parse) ended up deploying for handling these kinds of security situations, which allows you to write "real code" that runs in the cloud as part of the verification process: if you could run the OT verification algorithm on Firebase's servers, it would let you avoid the problem of being unable to store trusted snapshots, while still offering the advantages of having someone else manage the complexity of operating the server and synchronizing the data. In such a case, the server could automatically generate the snapshots as part of a hook that runs whenever data is stored to the history buffer.

In this particular case, yes: you can drop the snapshots entirely, and have the clients download and replay the entire history in order to synchronize as they open the document. That really isn't practical, though, and with your current implementation it is actually painfully slow to the point of being intolerable (although of course, you would then spend more time optimizing that path). I remain unconvinced that you can implement a collaborative text editor that can be deployed in the myriad circumstances that Firepad seems targeted at, and that other HN users are commenting on with interest, without it having this problem of "users can break the synchronization".

mikelehen · on April 9, 2013

[I think HN is throttling us; I had to wait a while before a reply button appeared. Feel free to email me (michael at firebase) if you want to continue the conversation.]

If the standard mitigation strategies (adding authentication, banning malicious users, etc.) aren't enough, and you're worried about people breaking the synchronization, I agree you'd need to move the checkpointing logic to node.js server code. Sounds like a good example app for me to write when I've caught up on sleep and have some free time. :-)

We're also looking to do a security v2 in the future to expand on our existing security rule capabilities and we've discussed going the "real code" route or else allowing tighter integration with your own server-side node.js/firebase code.

saurik · on April 9, 2013

(You can just click "link" and get immediate access to a "reply" button.) As soon as I'm setting up my own servers and having to make certain they are secure, available, and scaling with the number of documents I have, I'm losing a lot of the advantages of using Firebase ;P. In comparison, with a model like Parse's, I can just push the code to them and have everything be handled without me having to get my hands dirty. (Also, I'm currently 12 days behind on e-mail, but you guys can get ahold of me using other routes if you want or need to; at least Andrew should know how to get me quickly. I'm more just responding to the things you say here at this point, though: I have nothing new to add.) Great to hear that you may add "run real code on the server"!

batgaijin · on April 10, 2013

http://accumulo.apache.org/notable_features.html

cell labels

flywheel · on April 10, 2013

Security? Client-side encryption.
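
A minimal sketch of client-side encryption for history entries, using Node's built-in `crypto` module with AES-256-GCM; `encryptEntry` and `decryptEntry` are hypothetical names, and the sketch assumes collaborators already share a 32-byte key (key distribution is not shown). This hides document contents from the hosting service and anyone without the key, and GCM rejects ciphertext altered by someone without the key, but a collaborator who holds the key can still encrypt and write an invalid operation, so it does not address the synchronization-breaking problem discussed above.

```javascript
const crypto = require('crypto');

// Encrypt one history entry; a fresh random 96-bit IV is used per message.
function encryptEntry(key, entry) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const plaintext = Buffer.from(JSON.stringify(entry), 'utf8');
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'), // GCM authentication tag
    data: ciphertext.toString('base64'),
  };
}

// Decrypt and verify; final() throws if the data or tag was modified.
function decryptEntry(key, blob) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, Buffer.from(blob.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(blob.tag, 'base64'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(blob.data, 'base64')),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString('utf8'));
}

const key = crypto.randomBytes(32); // assumed to be shared among collaborators
const stored = encryptEntry(key, { a: 'alice', o: [5, '!'] });
console.log(decryptEntry(key, stored)); // { a: 'alice', o: [ 5, '!' ] }
```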

macspoofing · on April 9, 2013

Is it based on etherpad?

yuchi · on April 9, 2013

No, it's based on the [Substance](http://substance.io) OT library, but uses CodeMirror instead of [Surface](http://interior.substance.io/modules/surface.html).

(Substance Team member here)

unwind · on April 9, 2013

By the way, unfortunately HN doesn't do Markdown. Links should just be links, no separate display text.

yuchi · on April 9, 2013

Sorry, I'm just used to finding Markdown pretty much everywhere, to the point that I forget HN doesn't support it!

Well, then, _pardon my markdown_ :)

ianstormtaylor · on April 9, 2013

I actually like reading the Markdown, it's intuitive and then I get to have display text that makes the sentence easy to parse.

yuvipanda · on April 9, 2013

Sweet! This will be fun to integrate into Wikipedia for editing...

/me plots

nahimn · on April 9, 2013

The homepage turned into a giant chat room