Build your own Google Docs with GoInstant's new OT API

habosa · on April 30, 2014

An Operational Transform API has always been my "maybe when I have more time and knowledge" idea. I'm glad to see someone else doing it and I hope I have a chance to use it. OT is the magic behind Google Docs, and it's one of those really elegant protocols that most CS people instantly appreciate when it's explained to them.

http://en.wikipedia.org/wiki/Operational_transformation

gritzko · on April 30, 2014

Repeatedly rewriting operations in-flight is probably the worst trick one can use to deal with partial order.

josephg · on April 30, 2014

Care to justify that assertion? I think OT is a fantastic way to deal with partial order.

OT can be used to guarantee eventual consistency while preserving intent. And unlike transactions, it lets you edit your local model syncronously in the client without waiting for a server round-trip. When you have a stack that can do it, its a really great system.

gritzko · on April 30, 2014

First of all, I know what I am talking about; I published on the subject, I also created a working system (CRDT-based).

Second, in my understanding, transactions don't deal with partial order; they do linearization. That indeed requires a round-trip.

Third, I also sketched some OT-based solutions, so I saw both sides of the coin. According to my experience, OT is terribly complex and fragile. Basic principles are simple, but various concurrency issues make it a nightmare further down the road. Operations are rewritten in several places, all those rewrites are interdependent, partial failures create really tricky situations, etc.

Fourth, Google has a working OT stack, but they also have a working Paxos. They simply have too many brainy folks and too much of equipment/money, probably.

Fifth, all the problems OT solves by entangled op rewrite trickery can be trivially solved by brute force (not that much of force actually, see CRDT).

Sixth, the very notion of "intent" in OT was a really really big misnomer. "Intent" is OK for a legal term, but it is terrible for CS, as it has no intuitive formal meaning (as opposed to "convergence", for example).

Seventh, OT has a strong "magic factor". It is too smart kind of, so it strikes a spark of excitement in some folks. That is often a bad thing (see eg "digital fountains").

gritzko · on April 30, 2014

P.S. "implementing OT sucks" -- Joseph Gentle

amix · on April 30, 2014

I can recommend this article that explains OT really well: http://www.codecommit.com/blog/java/understanding-and-applyi...

amix · on April 30, 2014

Are there any great open-source implementations of OT? Last time I looked there were some, but the quality and popularity of them was questionable.

jimktrains2 · on April 30, 2014

https://github.com/share/ShareJS

That guy also has a C library too. IIRC he used to work on the wave team.

josephg · on April 30, 2014

(Author here)

ShareJS also supports collaboratively editing arbitrary JSON documents. Support for seeing everyone's cursors is being added at the moment - there's a sandbox kicking around here if you want to play with the prototype version: http://sharejs.org:7007/

trevordixon · on April 30, 2014

https://github.com/Operational-Transformation/ot.js/

pingpong_table · on April 30, 2014

For collaborating on general JSON documents, you ought to check out ShareJS; it is a great library.

However, for more complex tasks like rich text editing (at least richer than what markdown supports) you'll need to have an implementation tailored for the task.

bsoft16384 · on April 30, 2014

Google Drive Realtime API developer here. https://developers.google.com/drive/realtime/

If you're trying to build your own Google Docs, you might want to consider using the Drive Realtime API, which is based on the same technology as Google Docs.

The Drive Realtime API has a wide range of mutations built in for common data structures: - Maps (last writer wins) - Lists (insert, delete, overwrite) - Strings (insert, delete, overwrite)

We also support arbitrary data structures including trees and graphs.

For modeling cursors, we have "Index References" which are effectively pointers to a specific element in a string or list. You can move them around on the client side and we have all of the transformation logic worked out so that concurrent edits don't make them point at the wrong position.

Because the data is stored in Google Drive, you can create, share and organize files using the standard functions in the Google Drive API. There's also an easy-to-use REST API for importing and exporting data as JSON.

pingpong_table · on April 30, 2014

This is nice!

WebODF[1] developer here. Commenting because the project is doing this sort of thing and I have decent experience writing OT. I'll give a short explanation of how Operation Transformation works.

"Build your own Google Docs" is a slightly brave claim - "build your own Etherpad" would be more accurate.

This is because their operation set is very simple; for rich text documents the operations simply do not contain enough information to transform them against other ops. At the moment, their API includes 3 operations:

    * Insert  
    * Format  
    * Delete

What I don't see:

Something like MoveCursor, for a start. From the source code of the markdown editor, it appears that cursor position synchronization is handled outside of the OT realm, likely by some other signalling. This is bad because the cursor positions do not get transformed at all - you can see this if you try to collaboratively edit on the same line in the demo - your cursor jumps around unwieldily.

Why this happens:

Imagine two users A and B concurrently editing inside a paragraph (a | denotes a cursor, belonging to B):

  <p>abc|def</p>

Now suppose A inserts "123" at position 3 (after c), and B tries to move his/her cursor to the right by 1 step. Assume that A's operation reaches the server first. Then server's and A's state is:

  <p>abc123|def</p>

because B's cursor is pushed to the right by the preceding text.

Then B's operation "move cursor to position 4 (after the d) reaches the server. But A's operation has not reached B yet. The state on B becomes:

  <p>abcd|ef</p>

And the state on the server becomes:

  <p>abc1|23def</p>

Now B's op is relayed to A, and A's to B, and A's state becomes:

  <p>abc1|23def</p>,

and B's state becomes:

  <p>abc123d|</p>.

because there is no OT for "MoveCursor" against "InsertText" -- in fact "MoveCursor" is not even an op here. This is an inconsistent state, which is basically a broken editing session (should never happen). Fixing this up by passing around diffs is not allowed (in a strict sense); diffing is not controllable OT.

WebODF uses stateless client-side OT, so any server is just mostly a dumb operation relayer. See what the OT logic for these two operations does: https://github.com/kogmbh/WebODF/blob/master/webodf/lib/ops/...

If the "InsertText" operation's position was lesser than the concurrent "MoveCursor" position, "MoveCursor" is transformed to increment it's position by the "InsertText" operation's text length. This ensures that the cursor, while moving, will end up at the intended position within the word you were looking at. (you'll notice that "MoveCursor" has a 'length' parameter in WebODF because a cursor also represents a user selection, yes it has colored collaborative selections).

OT'd process:

  <p>abc<cursorB/>def</p>

A inserts "123", A's state is:

  <p>abc123<cursorB/>def</p>

B moves cursor to position 4, B's state is:

  <p>abcd<cursorB/>def</p>

A gets B's op, transforms it to increment position by "123".length = 3, state becomes:

  <p>abc123d<cursorB/>ef</p>

B gets A's op, applies it because moving a cursor doesn't affect text insertion, state becomes:

  <p>abc123d<cursorB/>ef</p>

Hooray, Eventual Consistency!

Any OT implementation is full of such opinionated enforcements and tradeoffs. It should make some assumptions about the editing UX and make sure that this is preserved when doing realtime concurrent editing; ignoring that and only going for eventual consistency will result in situations where the user cannot look at where they are typing.

WebODF has many such operations (~23) for each kind of edit: "SplitParagraph", "InsertText", "ApplyHyperlink", "ApplyDirectStyling", etc at last count. A large number of them are supported in collaborative mode, i.e. they have OT written. And this is no small task, because each operation needs to have a transform written against every other operation in the set of all operations, including itself. It does help that not all operations interfere with others, so the OT matrix is somewhat sparse. :-)

Fact: Any complex rich text editing framework will need to have it's own customized implementation of OT. We're dealing with the Open Document Format, we have Operations that are significantly tailored to it.

I feel that GoInstant's API is a nice thing! But it needs: 1. More operations. 2. Purely client-side OT (not strictly required, but very very useful).

Shameless plug: WebODF is a FOSS project with ~3 fulltime developers (including me) working on it. We are also working with the ODF change-tracking subcommittee to standardise things. We need more contributors and/or donations. :)

  [1] http://webodf.org/

EDIT 1: Sorry, this post became longer than I had wished. I hope it helps anyone understand how OT works. Also, 'gratulations to GoInstant for launching this. :)

EDIT 2: Reformatting post for readability.

josephg · on April 30, 2014

(Ex-wave, sharejs.org author)

You should be aware that on the google wave team we eventually dumped the idea of 'SplitParagraph' / 'JoinParagraph'. We spent a long time trying to make them work, but eventually proved that it was impossible to make our OT system work correctly on those operations. I suspect that you'll have the same problem.

I would advise you to map your 23 operations down to a much smaller set if you want to get OT working correctly. With wave, we ended up implementing 'SplitParagraph' by inserting an empty <newline /> tag. We could remove that tag collaboratively (correctly), and all the OT semantics worked much better.

jhchen · on April 30, 2014

GoInstant developer here. First thank you for your kind words and thoughtful comment!

We did focus purely on text synchronization as the first step so you are correct cursor positions may be in the wrong spot if you are just moving the cursor around while conflicting text is being inserted/deleted. We do also use a post OT heuristic to place the cursor as the end of a text edit though so the cursor while typing should still look natural. Since cursor position is not included in OT I'm not sure I follow your example where abcd|ef becomes abc123d| though.

This was a tradeoff we made to launch earlier but more core cursor sync support is definitely possible in the future.

pingpong_table · on April 30, 2014

> Since cursor position is not included in OT I'm not sure I follow your example where abcd|ef becomes abc123d| though.

In the OT-less example, I meant that for A,

  <p>abcd|ef</p>

, where | is B's cursor, would become

  <p>abc123d|</p>

when A inserts '123' at position 3, because it would push any other content to the right, including a cursor, because the cursor must remain between the two letters it was between, regardless of what changes in the surrounding text some distance away.

> This was a tradeoff we made to launch earlier but more core cursor sync support is definitely possible in the future.

Makes sense. Looking forward to see more. :)