What's been wrought using the Piece Table? (2014)

joeax · on Oct 2, 2017

I was tasked with writing a undo/redo framework for an existing client app a few years back. It's basically an HMI system (a tool typically used in manufacturing to visually model an industrial system) that used SVG+HTML. I could have used a piece table concept (didn't know it was called that), storing a diff fragment of what was changed (using a library like diff-match-patch). Unfortunately, there was more than a text-representation that needed to be maintained so it didn't work initially.

I wound up using a stack-based architecture that pushed operations stored as functions onto a main stack then its reverse operation onto an undo stack. Once I worked out the state management problems it worked out quite well.

I've always wondered what other approaches others have done if tasked with a similar problem.

jcelerier · on Oct 2, 2017

> I wound up using a stack-based architecture that pushed operations stored as functions onto a main stack then its reverse operation onto an undo stack. Once I worked out the state management problems it worked out quite well.

that's quite common: it's called the Command pattern. https://en.wikipedia.org/wiki/Command_pattern

Another common approach is Memento (basically, "save everything")

A nice thing that can be added is a pair of serialization / deserialization functions for the state needed for your commands. This way you can save them regularly and restore them if a crash or something like this happens ; the user can then continue to work while keeping its undo / redo stack

btown · on Oct 3, 2017

Musescore (open-source music notation software) uses this pattern, for those interested... but not in a particularly DRY or stable way (so much duplicated logic!). I'm sure other open-source examples exist, but this should give you an idea:

https://github.com/musescore/MuseScore/blob/master/libmscore...

https://github.com/musescore/MuseScore/blob/c4af0c22bc90a6ba...

jcelerier · on Oct 3, 2017

Here's how I do it (interestingly, on a Qt-based art software too :p )

https://github.com/OSSIA/score/blob/master/base/lib/core/com...

https://github.com/OSSIA/score/blob/master/base/plugins/scor...

Also once your commands are serialized you can send them over the network for distributed edition

aaron-lebo · on Oct 3, 2017

http://gameprogrammingpatterns.com/command.html

manmal · on Oct 2, 2017

The command pattern only describes the encapsulation of an imperative, but not a complete undo/redo stack.

jcelerier · on Oct 3, 2017

Technically, yes, but every book I read that talked about the Command pattern did so in the context of an undo-redo system

flukus · on Oct 2, 2017

> A nice thing that can be added is a pair of serialization / deserialization functions for the state needed for your commands.

I wonder if this was the origin of modal text editors. Maybe they were designed for machine/programmer convenience more than machine convenience originally.

ZenPsycho · on Oct 3, 2017

they were designed around the constraints of slow teletype machines and slow 300 baud modems. they make no sense as a UI until you take into account how long you used to need to wait for a computer to respond

DennisP · on Oct 3, 2017

A modal editor is not necessarily less efficient in keystrokes than one using control keys; modal just means you don't have to hold your control keys down while completing the command.

Using keyboard commands is often faster than pointing with a mouse, if you take advantage of structures in the text; e.g. "select everything inside the parens." And if you're doing that all day, it becomes second nature to build macros on the fly.

ZenPsycho · on Oct 3, 2017

Less efficient in keystrokes? who said anything about that?

DennisP · on Oct 4, 2017

If you use a keyboard-based editor, presumably one that takes fewer keystrokes is a better UI, other things being equal. (There's also learning curve to consider but everyone has their own tradeoff on that.)

Of course there are also mouse-based editors, which my second paragraph addresses.

ridiculous_fish · on Oct 2, 2017

This is how NSUndoManager works in Cocoa, with the additional twist that recording the reverse operation is performed by actually invoking it on a proxy.

Example:

    - (void)setTitle:(NSString *title) {
      [[NSUndoManager prepareWithInvocationTarget:self]
          setTitle:currentTitle_];
      currentTitle_ = title;
    }

This setTitle: method may be undone by calling setTitle: again with the old title. We actually send that to the undo manager, which records it.

When the user invokes undo, the undo manager replays the message it was sent. setTitle again records how to undo the replayed operation, but this time it gets pushed onto the redo stack!

What is nice about this approach is there is no need to manually reify commands (i.e. Command pattern). It's just a message send.

wila · on Oct 2, 2017

Having written an undo/redo stack for a graphical editing component. That's pretty much what I did. Maintain a redo stack that holds the operations and parameters.

An undo then became going back to the original image plus a redo of everything on the stack minus the last operation. Rather simplistic, but it worked well for our purposes and was easy to implement.

default-kramer · on Oct 2, 2017

I do this all the time when I write a game which can easily be described as a sequence of discrete moves. Chess is a good example, and games with physics engines are the opposite. As you said, the Undo operation is just "start with the initial board and apply N-1 moves to get the current state." This plays nicely with immutability too.

terminalcommand · on Oct 3, 2017

Starting with the initial board and applying n-1 moves should be significantly slower than using immutable data structures. I don't have much experience in algorithms but I think O(1) trumps O(N-1) :).

wila · on Oct 3, 2017

The problem with graphics though is that some graphical operations are destructive so you can't just undo the operation.

In order to undo without a redo list, you would have to keep a complete graphical representation of the image and each operation. With images the memory usage goes up very quickly in that case, copying large blocks of memory on each operation makes your image editing slow down too.

beached_whale · on Oct 2, 2017

I saw a video and Photoshop used to use the command pattern but moved to just saving complete images(mostly) and it was much simpler and fast

Sharlin · on Oct 3, 2017

Lightroom, on the other hand, stores only the edit history by its very design (non-destructive editing). Though I guess that during editing it may cache intermediate results to speed things up.

wila · on Oct 3, 2017

Well I wasn't creating Photoshop :) this really was the easiest solution for our needs.

rdtsc · on Oct 3, 2017

I interned at a CAD/CAE company and our "cool" feature there was a history tree. I don't know maybe all of them do it today, but I remember being thoroughly impressed then, having only seen word and simple editor undo/redo before.

Since we had all the visualization bits written, we reused them and displayed it as a tree and you'd try different approaches as branches by clicking between them to compare. I guess it was basically as a seeing the history graph of a git repo, but this was about 5 years before git was created.

westoncb · on Oct 3, 2017

Anyone know if there is a best-practice immutable data structure variation here? The way I've thought about it is that if you store the entire sequence of operations which have occurred so far, and you have some function that can re-build the current state from that list, then you don't need any inverse operations. This means that undoing an operation is just removing the last operation performed from the operation list, and then rebuilding the state.

This seems nice since you don't have to figure out inverse operations (which I've seen get tricky), but I imagine the performance penalty of having to store all operations and rebuild the entire state every time could be problematic in some cases.

Then again, I imagine there are good standard optimizations for dealing with this kind of thing. One thing that occurs to me is that operations which 'cancel' each other could be sought out and eliminated before rebuilding the state.

I'd be glad for any more information on the subject!

Edit: more specifically I'm wondering if those optimizations do in fact exist, and if so what they are.

rawnlq · on Oct 3, 2017

Do you mean https://en.wikipedia.org/wiki/Persistent_data_structure or something more specific for this case? It's known how to only incur an O(1) cost in time and space for making a data structure persistent. So then you just need to keep a reference to all past versions of the data structure you might ever want to undo back to.

EDIT: actually reading the wiki more carefully, I think they were talking about making a BST persistent specifically rather than any data structure. In that case looking at implementations such as rrb-vectors might be more interesting: https://github.com/clojure/core.rrb-vector

westoncb · on Oct 3, 2017

That wikipedia article looks interesting. I guess the only thing more specific I'm wondering about is persistent data structures for undo/redo systems. But just knowing the name 'persistent data structure' is useful.

underwater · on Oct 3, 2017

You need a reference to each state, but because the components are themselves immutable you can share objects. For example a Document may contain many Blocks which contain many Lines. Changing a Line means replacing that object, its parent Block and its parent Document, but references to everything else can be copied over.

unoti · on Oct 3, 2017

> and you have some function that can re-build the current state from that list...

You can be far mor efficient than that. The trick is to use immutable data structures to represent your state. When something changes, return a new state. Parts of the state that didn’t change are still referring to chunks from the previous state, so it’s actually sharing most of the memory from the previous state.

There are data structures in libraries in different languages that support this idea, for example system.collections.immutable in .NET. Anything you build in Erlang or Elixir will basically automatically do this. (As an aside, Erlang IOlists are a related concept to the piece tables the article talks about.)

Using this approach you don’t actually have to rebuild a previous or redo state, you just keep a list of states and move between them for undo and redo.

scentoni · on Oct 3, 2017

A list zipper? https://en.wikipedia.org/wiki/Zipper_(data_structure)

SamReidHughes · on Oct 3, 2017

An advantage of storing operations instead of using immutable data structures is that with them you can implement region-based undo.

throwanem · on Oct 3, 2017

Seems like a fair fit for something like a Merkle tree. Do it properly and you could even save and load undo history with the file, treating external changes seen on load as new history states - which might be a nice feature to have.

ndh2 · on Oct 3, 2017

I read that Blender initially didn't have undo. Saving to a file was quite fast because they used binary data. So what they did was save the scene to memory, then save again, and compute a diff for the undo queue.

kobeya · on Oct 2, 2017

At the cost of doubling the state size you can maintain an undo skip list to get log N performance for long undos. But that is useless for he common case of stepping back one at a time.

laythea · on Oct 2, 2017

This task becomes more difficult once you need to undo calculations performed on the GPU. Not sure I will encounter a nice way to reverse those operations without numerical inaccuracies.

duncan_bayne · on Oct 2, 2017

Can you not store the result of those calculations, as a means of "reversing" them?

As someone who's not directly programmed GPUs before, they look quite intimidating and opaque; certainly, that's the sense I get from HN threads and the like.

laythea · on Oct 3, 2017

I wouldn't call GPU intimidating, more a royal pain in the butt! :) The approach of simply saving state doesn't work very well due to performance reasons. I.e. If user is "editing" some of 5-10 million+ vertices, via some sort of brush tool that runs modifications to vertices through a GPU shader, then saving the state becomes a big no-no due to the huge delay in copying the entire current state of the vertices back to the CPU. This results in a short (but unusably irritating) pause, before the user can use the editing brush. Well just copy the bits you are interested in you may say, but there is no way I have came across to save only regions of the vertex buffer that the user "touches". There may be some fan-dangled way to save the "operations" not data, and then run them through a "reverse" shader, but I have a feeling that is overly complex and subject to inaccuracies. Darn, GPU's are fast though.

kobeya · on Oct 2, 2017

I believe this is what graphics tools do, for other reasons as well. Each operation creates a new layer and what you see is the dynamically rendered composition. This lets you do things like adjust the strength / intensity of the effect as well, by changing the weight of the layers.

kornish · on Oct 2, 2017

For context, this was almost certainly submitted because of a recent submission about data structures for use in text editors.

If anyone wants to check out the comments: https://news.ycombinator.com/item?id=15381886.

punnerud · on Oct 2, 2017

True, but the original post only partially talk about the algorithm at the end (when I almost was on my way to leave the article). The link I posted was mentioned as a cool read in between, but I feel it deserve more credit and readers.

kornish · on Oct 2, 2017

Absolutely. Thanks for posting!

thomastjeffery · on Oct 2, 2017

If you are interested, I wrote a quick Piece Table implementation in Haskell there.

projektfu · on Oct 2, 2017

Just mentioning that "unlimited undo/redo" meant that it was really fast to undo and redo one step in Word. This system didn't necessarily enable an undo history like we expect from word processors now. In fact, up to Word 5.1a for the Mac and Word for Windows 2.0, Word didn't have undo history. Undo would undo the last action, and then it would undo the undo. I believe undo history was added to Word 6.0.

rosstex · on Oct 3, 2017

Fun fact: UC Berkeley's intro course on Data Structures used a Text Editor for the massive project of the semester in Spring 2016. (Source: I wrote the autograder.)

Our undo/redo feature was suggested to be implemented as a stack.

http://datastructur.es/sp16/materials/proj/proj2/proj2.html

hangonhn · on Oct 2, 2017

Wow. The inventor Piece Table is none other than Moore of Boyer Moore string search algorithm:

https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_sea...

karmakaze · on Oct 2, 2017

Also related to both strings and piece tables is the rope data-structure for concatenations of strings

e98cuenc · on Oct 3, 2017

A few years ago (2003, time flies) I wrote a piece table with a twist, instead of using a linked list I used a tree. I put a description of the whole algorithm here:

http://e98cuenc.free.fr/wordprocessor/piecetable.html

A years later it was integrated in AbiWord.

There was a major mistake in the profiling. The dots in the graphs were the best of 3 measuraments. At the time I thought that was the fairest to prevent outliers due to concurrent process. It also warms the data in the cache making the best measurement super fast.

tudorw · on Oct 2, 2017

If you're interested in this kind of structure it's worth taking a look at 'transclusion' devised by Ted Nelson in the mid 60's https://en.wikipedia.org/wiki/Transclusion reply

rhinoceraptor · on Oct 2, 2017

Is fast save the (in)famous Word file format where the in-memory state is written to a file?

duncan_bayne · on Oct 2, 2017

Yes. Along with - as I assume you're alluding to - state that you might not intend to share along with the current state of the document.

kobeya · on Oct 2, 2017

Saving internal memory state to a file is such a terrible idea, who would do that!

Proceeds to go write a program to manipulate memory-mapped files.

duncan_bayne · on Oct 3, 2017

It's a great idea, seriously. As demonstrated by the details in this article.

_But_ non-technical users do not understand how it works, which means that they were _very_ surprised to find state other than the obviously visible stuff being saved along with documents, and thus sent accidentally to recipients.

If you're going to make use of this sort of clever optimization, you owe it to your users to at least document the consequences, if not the specific implementation.

Edited: I think you read more criticism into my post than was intended on my part.

rhinoceraptor · on Oct 2, 2017

That’s what Emacs (and AFAIK, some other interactive lisp environments) do to speed up boot time.

p4bl0 · on Oct 2, 2017

The famous unexec function…

kobeya · on Oct 2, 2017

I know. It was sarcasm.

yuhong · on Oct 3, 2017

Of course, this was when floppies was higher quality too.

Sammi · on Oct 3, 2017

The writer of the article wrote more details in a reddit discussion on the article three years ago: https://www.reddit.com/r/programming/comments/22fpz0/whats_b...

Found it in the wikipedia footnotes :)

frandroid · on Oct 2, 2017

How does a piece table differ from a linked list?

ithkuil · on Oct 2, 2017

More similar to a rope/cord: https://en.m.wikipedia.org/wiki/Rope_(data_structure)

thomastjeffery · on Oct 2, 2017

In a Piece Table, a Piece is a representation of an edit. You can keep these edits in any structure you want, including a linked list. If you are interested, I posted such an implementation (using Haskell) here:

https://news.ycombinator.com/item?id=15381886

ww520 · on Oct 2, 2017

Linked list is a generic data structure with the linking information specified and none of the content is specified. Piece Table spells out what kind of content data to maintain, like the offset and length of the data fragments of the original document and the change log document.

cwt137 · on Oct 2, 2017

Is this in any way related to Operational Transformation?

kristianp · on Oct 3, 2017

There are some other pages about MS Word here too: https://web.archive.org/web/20150822081514/http://1017.songt...

infogulch · on Oct 2, 2017

What's the difference between this and ropes? They seem conceptually similar.

terminalcommand · on Oct 3, 2017

Ropes are inherently a binary tree. From what I can understand you can implement piece tables both as a double linked list and a b-tree. Furthermore in ropes every leaf spans one character, in piece tables, pieces have different lengths.

Furthermore with ropes, every operation returns a new rope data structure, that means ropes are immutable. If you want to implement undo, you only keep references to the former ropes. There is no inserting or adding via looking up the changes in an undo list.

I think maybe if you implemented a piece table, where every piece spans one character, used a binary tree and made it immutable, you'd get a ropes data structure.

For more information about persistence vs immutability this link might help: https://stackoverflow.com/questions/10034537/persistent-vs-i...

deathanatos · on Oct 3, 2017

> Furthermore in ropes every leaf spans one character,

The Boehm paper on ropes[1] (which is about the only academic literature on the subject that I know of) does not do that (and explicitly suggests one should not), nor does any real-world implementation, (e.g., the Boehm implementation, SGI's impl) do that. It would be incredibly inefficient, and for no real gain.

A good rope implementation will store an array of characters/bytes in the leaves, up to some threshold.

> Furthermore with ropes, every operation returns a new rope data structure, that means ropes are immutable.

There is nothing inherently immutable about ropes, and it is certainly possible to mutate a rope. (For example, appending a single character to a leaf with space is much quicker if the rope as a whole is mutable.)

Look at the SGI implementation for an example here; their reference docs[2] contain sufficient details to see that the rope itself is mutable.

Now, a rope typically just describes a sequence of characters. It is entirely possible for a leaf node in a specialized rope to reference on-disk content, and other leaves to reference in-memory content. In that regards, it can be like a piece table. Implementing an undo/redo on top of a rope is less straight-forward; whereas a piece table's undo history can essentially share large portions of the linked list (see the lovely diagrams here[3]) I'm not sure the same can be done on a Rope's tree, due to the tree's need to balance and re-balance. That is, without copying the tree, since that kind of defeats a lot of the benefits that a piece table has — not needing to copy the entire "file", even if that's just a bunch of spans.

Now, one could make the leaves in a rope ref-counted (so they could be shared) and then copy the Concat nodes for undo/redo levels. Keeping the concat nodes as copies means the copies can balance independently, and since concat nodes are small (~2 pointers for left/right and a depth, IIRC) that isn't too much copying (the bulk of the data, the text, is refcounted in the leaves). But the simplicity of the piece table really starts to shine at this point.

[1]: http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9...

[2]: http://www.sgi.com/tech/stl/Rope.html

[3]: http://www.catch22.net/tuts/piece-chains

terminalcommand · on Oct 3, 2017

I stand corrected, thanks for the thorough answer.

My interest in ropes was arisen from the "data structures for text editors" post here on HN. Most resources were haskell implementations. IBM Developerworks had an article about ropes and a corresponding java library, the article stated ropes were immutable. https://www.ibm.com/developerworks/library/j-ropes/index.htm...

I admit being wrong on every node storing a single character, that misconception stems from a graph I saw representing ropes on yesterday's article.

Undoing with immutable ropes is very straightforward I think. You don't copy the whole file, but just the references. I admit it is heavier on the memory, but you could store an arbitrary amount of previous "states" or versions in ram. The benchmark on the Java library may prove this point.

I will try to read the original article to gain more insight. Thanks again for the resources.

mcguire · on Oct 3, 2017

Is there nothing that J Strother Moore hasn't touched?

anovikov · on Oct 3, 2017

Back in 2000, i was hired to write an app that recovered damaged Word documents, and i had to experience it first hand. Impressed me as well!

bitmapbrother · on Oct 2, 2017

The title makes it seem like it was exclusive to Word and developed by Microsoft when in reality it was developed at Xerox and used by their Bravo word processor.

cpr · on Oct 2, 2017

Right, since Charles Simonyi wrote both originally.

dang · on Oct 2, 2017

That was the rewritten title (see https://news.ycombinator.com/item?id=15388593). We've reverted it to the original now.

punnerud · on Oct 2, 2017

God point, thanks. Agree with you. I will try to remember that next time an article make the front page.

bitmapbrother · on Oct 3, 2017

Much better. Thanks.

ksk · on Oct 2, 2017

It does seem like that, but the piece table itself is not an algorithm FWICT, and as an abstract concept it can be implemented in various ways (with varying degrees of optimizations). A car is not just a car!

Given that people are still attempting to re-invent text editors, it would be interesting to know if anyone has unknowingly come up with similar ideas in their design.

yuhong · on Oct 2, 2017

The history of FullWrite is slightly wrong. Wikipedia has a better one: https://en.wikipedia.org/wiki/FullWrite_Professional

(1988 is also around the time of a famous DRAM shortage BTW)

nerdponx · on Oct 2, 2017

What was wrong with the original title? It was "What's been wrought using the Piece Table?"

castis · on Oct 2, 2017

I would hazard a guess that others, like myself, would have simply passed this article by, not knowing what a piece table was.

I know now, which I hope is the important part.

https://en.wikipedia.org/wiki/Piece_table

dang · on Oct 2, 2017

We've changed the title back to that. It was "The Algorithm that enabled unlimited undo and fast save/copy-paste in Word" before.

One secret of HN titles is that sometimes the non-original title helps a story make the front page but then reverting it to the original makes it more interesting again.

punnerud · on Oct 2, 2017

I had no idea what a Piece Table was and how important it is, so I wanted it also to be reflected in the title.

JadeNB · on Oct 2, 2017

It seems a bit click-baity ("well, what has been wrought using it?"). I suppose that the current one is arguably still a bit so (it could be "How the piece table enabled unlimited undo …"), but it seems better to me.

yodon · on Oct 2, 2017

Hmmm... my guess is you're probably not making a whole lot off those ads on your website if "What's been wrought using the Piece Table?" strikes you as click baity.

always_good · on Oct 2, 2017

They're referring to the format of "click to find out!" when it can be mentioned right there in the title.

That nobody is going "oh boy, so what has been wrought using the piece table?" is part of their point.

punnerud · on Oct 2, 2017

Click to find out without the normal read the whole article to find out? You are given a new title when you enter the site. Those two combined give you more then enough information to figure out if you already know it. And if you already knew that Word is using the algorithm, you would probably just skipped or opened the link to verify that is was this algorithm. All this combined, do you mean I should have kept the original title?

cnp · on Oct 2, 2017

The old one conveys much less information than the new, all things considered

bernadus_edwin · on Oct 3, 2017

We call it now as redux