Modernizing the DOM Tree in Microsoft Edge

c-smile · on April 19, 2017

So new IE’s dom::node essentially looks as

    struct node {
      element* parent;
      node *first_child, *next, *previous;
    }

And that was also my initial implementation of it in Sciter Engine (https://sciter.com) but after some testing I’ve found that these two structures work better:

    struct node {
      element* parent;
      uint node_index; // its index in parent->children
    }
    struct element: node {
      vector<node*> children;
    }

a) better suits DOM manipulation/traversal needs, b) more compact and faster, c) more CSS friendly (for things like :nth-child(2)), e) yet the structure ensures that there are no loops in the tree.

bzbarsky · on April 20, 2017

A problem with the vector representation is that it makes some common DOM operations (removeChild and insertBefore) O(N) in the number of children. This can be a major problem in some cases. Gecko actually used to have that sort of representation and we have been trying to move away from it, for precisely that reason.

If your DOM is more static, I agree it's a better representation.

c-smile · on April 20, 2017

removeChild and insertBefore on DOM tree are always O(N) operations or at least causing full scan of children later. No matter how children are stored.

Think about rules :nth-child(4) { border:1px solid; } and the like.

bzbarsky · on April 20, 2017

> removeChild and insertBefore on DOM tree are always O(N) operations

It depends. If you insert only one node, I agree you end up doing some O(N) work later in many cases (though not nearly always). If you remove/insert k nodes at the front, in the vector case you do O(kN) work while in the doubly-linked-list case you do O(k+N) work at worst. In the case when k == N the difference is between O(N) and O(N^2).

And yes, this is a problem in practice. We've seen popular web libraries doing this to remove all the kids of an element:

    while (foo.firstChild)
        foo.removeChild(foo.firstChild);

which will be O(N^2) in a vector implementation.

So the real question is what the amortized cost of an insert/remove is. Unless your web page is flushing out layout after every DOM modification (which is sadly too common), you will only redo selector matching every however many DOM mutations. In many cases "however many" is "thousands".

c-smile · on April 20, 2017

    while (foo.firstChild)
        foo.removeChild(foo.firstChild);

Let's assume that there are 100 children. So we will have 5000 a[i] = a[i-1] operations in total. This is really nothing for modern CPU.

But I agree that even this is not good.

That's why I have Element.clear() and Element.prepend|append|insert([array of nodes]) methods.

As in any case calling of DOM method (as any other function ins script) is comparable to 100 of a[i] = a[i-1] in native code.

I mean that such O(N) problems are better to be solved by specialized native functions.

bzbarsky · on April 20, 2017

> Let's assume that there are 100 children.

The cases where I've run into this being a problem involved thousands to tens of thousands of children. At 100 children there's really no problem, I agree.

> So we will have 5000 a[i] = a[i-1] operations in total.

Plus the index updates, unless you do those lazily, right? Again, not a problem at 100 kids, but at thousands of kids the cache misses start being pretty annoying.

> That's why I have Element.clear() and Element.prepend|append}insert([array of nodes]) methods.

Well, sure. Browsers could expose better APIs for this stuff. And web pages could use those APIs. And then browsers may face different tradeoffs in terms of internal representation.

As things stand, though, we don't have those better APIs, and if we added them right now it would take a decade or more for people to use them consistently enough that browsers could optimize based on that...

> As in any case calling of DOM method (as any other function ins script) is comparable to 100 of a[i] = a[i-1]

Last I measured, the overhead of a call to a DOM method (so not counting the work it actually does) in Firefox was on the order of 30-40 clock ticks. a[i] = a[i-1] is likely at least four instructions (two lea, two mov on x86 for example) unless you vectorize things. But probably about comparable in the vectorization case, yes.

> I mean that such O(N) problems are better to be solved by specialized native functions.

In an ideal world, I agree. I wish we lived in that world!

c-smile · on April 20, 2017

They really should use just one call element.innerHtml = "";

to clear element's content. And use doc fragments for massive insertion.

Rest of us shall not pay for someone's inefficient code.

Good wishes of course but still ...

fmap · on April 20, 2017

And just to be clear, the use case where you are doing repeated lookups/traversals of a linked list is uncommon?

If not, you could always go with a hybrid implementation, e.g., using a vector with a changelog and applying the modifications only when you traverse the structure, or using some persistent sequence data structure with high branching degree. Either one would use less memory than a linked list, and frankly, I've never encountered a situation where a performance critical data structure was best implemented as a doubly linked list... The memory overhead for large structures was always prohibitive.

bzbarsky · on April 20, 2017

No, that's common too. Think childNodes traversal.

So you do need something in addition to the linked list to make sure that at least in-order childNodes traversals are O(N), not O(N^2), in common cases.

Memory use is a concern, but not the overriding one. A very large web page (e.g. the HTML spec) has maybe 500k nodes. An extra 4 words of memory per node, on 64-bit, then translates to ~16MB of RAM. If you look at actual memory consumption on that page, you will see that 16MB is not _that_ huge a deal, unfortunately.

The big problem with a doubly linked list is loss of memory locality and the resulting cache misses on traversal. That's something browsers are still working on figuring out, honestly; there is various work around trying to arena-allocate nodes and whatnot, but it has various drawbacks as well (e.g. arenas being kept alive by a single node, which you can fix by making your nodes relocatable in memory, but then you need to use handles instead of pointers, etc, etc).

fulafel · on April 22, 2017

Sounds like it might be interesting to try JIT-style techniques for data here, to help cache footprint and memory usage. Strawman idea: a chameleon container with a opportunistic "fast path" implementation for the common case, and then a generic "deopt" implementation that would kick in on demand, per node. The "fast path" container could maybe have compressed pointers in 16-bit fields too.

wtetzner · on April 20, 2017

Would there be any value in using something like a gap-buffer to store children? I know it's usually used with text, but I can't think of any reason it couldn't be used here. I have no idea though if it would be a good idea.

emmab · on April 20, 2017

The page might not use rules like that though? And the browser could optimize the case where it doesn't.

c-smile · on April 20, 2017

That just one example.

One element removal means invalidation of container's layout. That means you will need to relayout container. That is a scan of children again. And so on.

bzbarsky · on April 20, 2017

An element removal may also mean absolutely nothing if the container is display:none, say...

taeric · on April 19, 2017

At a gut shot, only concern for this would be highly dynamic trees that are not either appending or prepending elements to children. I'd be curious to see benchmarks checking that.

With this, I'm always bemused that there are not more discussions on algorithms for doing tree traversal that don't use a queue or a stack. I think you have everything you need here to accomplish that.

jdmichal · on April 19, 2017

Even prepending... One would have to touch every sibling to update the index. So prepends are now a linear operation on the count of siblings. Prepend a few hundred items to a node, and I'm sure one would see the pain in profiling.

Now, whether that's a realistic use case is another question. But at the same time, you know someone on the Internet is doing it that way...

taeric · on April 19, 2017

I was assuming you'd use a standard deque that could hide the complexity of prepending. I didn't consider that the ordinal actually matters from the parent, though. So great point! :)

And yes, this was essentially my question. What are the realistic numbers in either scenario.

jdmichal · on April 19, 2017

Right. If you don't cache the ordinal, than "next sibling" and "previous sibling" operations have to scan the array.

But it's a relatively easy performance enhancement to have a validity flag for the ordinals, and only rewrite them when a read requires them. That way, write-heavy workloads won't be punished.

c-smile · on April 19, 2017

It was discussed quite a loot actually.

ArrayList (sciter's children list) vs LinkedList (Edge children list) : http://stackoverflow.com/questions/322715/when-to-use-linked...

taeric · on April 19, 2017

I meant specifically in modeling the Dom. Apologies if that wasn't clear.

The specific questions would be how often are the trees of a site dynamic and how often they insert instead of prepend or append.

Edit: And, of course, I'm still interested in why more people don't use stackfree algorithms for DFS.

c-smile · on April 19, 2017

Well, almost all HTML/CSS layout algorithms are O(N) complex so number of immediate children of an element is naturally limited by, say, 500-1000 sub-nodes. Otherwise the whole thing will not move. Note that GMail uses pagination with 500-800 elements on the page.

So we are speaking about vectors of pointers that contain 1000 elements. memmove() performance (used for quite rare operation of insertion of elements in the middle) on modern CPU is comparable with the need to access left/right elements in DL-lists for insertion. Consider cache misses, etc.

taeric · on April 19, 2017

This all makes sense, but I have become ridiculously empirical in my old age. Actual data wins many arguments in ways that surprise reasoning.

chuckdries · on April 19, 2017

Unrelated question: How would you compare sciter to Electron? If I'm seeing this right, your thing is just for UI whereas electron has the whole application running in a browser-like environment, yes?

How is communication handled between business and HTML?

c-smile · on April 20, 2017

Sciter is essentially UI, yes. Engine with plain C API.

You can easily add native methods to be callable from script. So to use any library/backend of your choice. boost, POCO, whatever.

Yet you can use it with C/C++, D, .Net/Mono, Delphi, Go, Rust, Python : https://github.com/sciter-sdk

socmag · on April 19, 2017

When you say pointers, do you actually mean pointers?

I've implemented something like you describe before in a game engine for a managed language.

Instead of using pointers though I used integers throughout.

The actual state was maintained in a flat vector of element blocks and were stable.

The nice thing about integers on a GC'd platform are that they are opaque to the GC so zero time was spent on pointer chasing.

The other nice thing is you could literally dump the entire state to disk or restore pretty much in one go.

Looks good though, and I agree (as you mention further down) that state can often be part of multiple list/trees simultaneously, so it makes better sense to keep them separate.

userbinator · on April 20, 2017

more compact and faster

This is more of a general comment and may not necessarily be true in your implementation, but I see this argument brought up a lot: "vectors are arrays, therefore smaller (because only one pointer is needed per element) and faster than linked lists". However, one must remember that vectors are not statically sized arrays --- they dynamically size to hold their contents, and for the resizing to be (amortised) constant-time, there will on average always be some amount of unused space in the dynamic array, and you still need the (one) pointer to this dynamic array in the node itself, plus additional overhead to keep track of size/used. That's the "smaller" argument debunked. As for faster, that's not always true either, since simple forward/backward scans will require two indirections, one to get the dynamic array's address and another to get the pointer to the node itself. That's two memory accesses to areas which might not be close together at all. Compare this to one indirection (follow the next pointer) for a linked list, that is also made to an address in the vicinity of the rest of the node.

In summary: the linked-list will be faster for forward/backward scans and inserts/deletes and slower at indexing, while the vector will be faster for indexing and slower for forward/backward scans. The linked list will be smaller too.

yorwba · on April 20, 2017

If no insertions or deletions happen between accesses, a vector scan can also be done with only a single indirection, by simply incrementing the pointer to the node. This also means better locality than a linked list, where the next pointer may be close to the current node, while the next node can be anywhere. In a vector, they are simply next to each other.

jdmichal · on April 20, 2017

That's true when you have a `vector<node>`, but the structure was described as a `vector<node*>`. So what's adjacent to your current position is the pointer to the next node, not the node itself.

Which one is optimal depends a lot on the size of a `node`, as moving elements around in the DOM will require memory copies in the first case.

c-smile · on April 20, 2017

struct Node does not need first_child pointer when it is TextNode/CommentNodes/etc. as they are just terminals.

Yet text node here

   <div>text</text>

does not need next/previous pointers as it is lone child.

In case of DOM tree children-vector case takes always less than "children-dl-list" case. With only one exception - element with single child node when size of memory needed is the same as in "children-dl-list".

emmab · on April 20, 2017

> As for faster, that's not always true either, since simple forward/backward scans will require two indirections, one to get the dynamic array's address and another to get the pointer to the node itself.

Can this be stored in a register when you're accessing multiple locations in the array in sequence?

pjako · on April 20, 2017

All nicecly ordered in linear memory for good data locality?

amelius · on April 19, 2017

Isn't this something one could abstract away?

c-smile · on April 19, 2017

It is about this part in the article:

> The new tree structure is simple and straightforward; it uses four pointers instead of the usual five to maintain connectivity: parent, first-child, next, and previous sibling

That straightforward four pointers structure is not the best for HTML DOM. Neither memory wise nor for typical algorithms applied to it.

smitherfield · on April 19, 2017

With all due respect (it looks like some seriously impressive work you've done), how do you know they're using "typical algorithms?" Maybe they've devised something new or uncommon, or significant new optimizations of the "typical algorithms." In general I'd give a company like Microsoft the benefit of the doubt that they know what they're doing, and have considered and rejected alternative approaches — at least when it comes to objectively-measurable technical goals like "create a fast data structure to represent a DOM tree," if not always subjective big-picture goals like "make a good web browser" (even in that case, IE was, for its time, a good web browser — the best — until Ballmer let it languish).

Your data structure looks to me like it's roughly optimal for the case where the DOM tree is generated once and then is never or rarely modified, and nodes have several children relatively frequently and no children relatively rarely. But in the age of single-page-apps and other Javascript-heavy jQuery/Angular/React-type sites filled with DOM manipulation and AJAX, it's more important (especially in the exponentially-increasingly-common pathological case) to make it inexpensive to insert, delete and move nodes within the DOM tree.

With your data structure, every time you create a new `Node`, even one without any children, you have to allocate a new `vector`, which is a lot more expensive than just three pointers. Every time you insert or move a Node to an arbitrary location means an extremely expensive random-access insert into a vector (especially if it triggers a realloc), and every insertion, move or deletion in an arbitrary location requires updating the `node_index` of every sibling (old and new, in the case of a move), which also defeats the advantage of vectors over linked lists (avoiding indirection).

c-smile · on April 19, 2017

"Your data structure looks to me like it's roughly optimal for the case where the DOM tree is generated once and then is never or rarely modified"

Check gmail for example. It has just one list - element that has 100 children. Other elements have from 0 to 10 children.

And that 100 list is normally populated as a whole.

"even one without any children, you have to allocate a new `vector`"

You don't need to allocate anything if vector is empty ...

Vectors's data buffer gets allocated only when you put something in the vector. At least in my case. So empty vector is just one pointer - nothing to write home about it.

"every insertion, move or deletion in an arbitrary location requires updating the `node_index` of every sibling"

On any DOM insertion you need to scan all siblings almost always anyway. For quite many reasons, e.g. to reset styles that could use `this ~ that` or something like `:nth-child(odd)`

hsivonen · on April 19, 2017

What’s better in your opinion?

Someone · on April 19, 2017

I would guess what (s)he writes in https://news.ycombinator.com/item?id=14150628, except that, maybe,

    vector<node*> children;

is a typo, and should be

    vector<node>* children;

Reason: I don't see how the former enforces "e) yet the structure ensures that there are no loops in the tree."

The latter would mean more copying of nodes, but would have excellent cache locality, so I can see it could well be faster, too.

c-smile · on April 19, 2017

It is precisely

   vector<node*> children;

yet this

    vector<node>* children;

makes little sense in DOM case.

hsivonen · on April 20, 2017

Gecko had that but migrated to neighbor pointers, so that's a data point of one real engine moving from a child array to neighbor pointers.

smitherfield · on April 19, 2017

That does seem like it makes more sense; you could then implement `node_index` as a fast, stateless inline method like so:

  constexpr vector<Node>::difference_type node_index() const noexcept
  {

          return parent ? this - parent->children->begin()
                        : -1; // is a root node
  }

c-smile · on April 19, 2017

This does not work, element/node shall have stable address so to be O(1) addressable.

Elements/nodes can participate in multiple lists at the same time. Rendering lists (a.k.a. rendering tree) and things like Document.querySelectorAll() that returns NodeList.

Someone · on April 20, 2017

Can I conclude that your earlier statement "yet the structure ensures that there are no loops in the tree." isn't correct? I could easily create a loop like this:

   this.children[0] = this;

c-smile · on April 20, 2017

That case is easily manageable: you cannot insert/appendNode that has non-null parent. So you need to remove that node from its former parent first.

Problem is that their solution by design contains loops.

So on each tree traversal iteration you will need to access parent:

    Node* get_next_node(Node* that) {
      that = that->next;
      return that == that->parent->first_child ? nullptr :  that;
    }

wtetzner · on April 20, 2017

How could that happen, given that children isn't a pointer?

    vector<node*> children;

Someone · on April 21, 2017

It is C++, and std::vector has an operator[] that makes a vector behave like a convention C 'pointer to data' for array accesses (in fact, the compiler likely will optimize the access to exactly what you would get if the declaration were of a C array of pointers to node)

yuhong · on April 19, 2017

https://news.ycombinator.com/item?id=3233935

timtadh · on April 19, 2017

I case people didn't click through it is an awesome comment by the original author of the IE5 DOM Tree explaining how it was implemented.

EdSharkey · on April 19, 2017

I can see now why DOM manipulation was so catastrophic for performance on IE5-8, all that bookkeeping of character positions must have been a killer once the DOM tree was of any significant size.

Makes me wonder. If edits to DOM nodes on legacy IE were focused on attribute values only, and the string-lengths of those edited attributes never changed, whether one could bypass the bookkeeping and get good performance gains.

aconz2 · on April 20, 2017

I see something like this and think about how much an ensemble approach to the data structure would help. This keeps popping up in places where you have a very general abstraction (like the DOM) and want to support a broad set of operations and use cases (read-only, insert heavy, traversal heavy, etc.) and so you often choose the structure which supports most of these cases pretty well. But you sacrifice a lot of perf by doing so.

What I'm wondering is how well you could do perf-wise by having the DOM be comprised of a heterogeneous set of structures, each of which is chosen based on the history of what operations have been performed on it (ie. we're appending a lot here, let's use a vector of children). This is all similar in spirit and goals of:

- JIT compiling, but for the data structures, and not code

- This work on composing allocators in D https://www.youtube.com/watch?v=LIb3L4vKZ7U

- ML ensemble methods

0x0 · on April 19, 2017

This is interesting and something I've suspected for a long while. I remember struggling big-time with older IE versions giving some really strange results when doing to perform DOM node manipulations on improperly nested HTML (which in turn often came from copy-pasting from Word), erroneous results that definitively hinted towards DOM tree operations actually happening on flat substrings.

tambourine_man · on April 19, 2017

Gotta love this new Microsoft.

A blog post openly geeking out on browser data structures. So good

adrianratnapala · on April 20, 2017

What I like about this article, even more than the technical specifics, is that it is an account of a successful, deep, in-flight refactoring effort.

That is, they avoided the two most common errors: rewriting, and piling technical ever higher.

bsimpson · on April 19, 2017

Almost makes me wish it worked on non-MS systems…

Kudos to the MS Edge team for pushing the Web forward!

angry-hacker · on April 20, 2017

Who knows, maybe one day it will in order to fight Google's monopoly?

Jyaif · on April 20, 2017

What MS is explaining in the article is that they've merely started using the same data structure as all the other engines. It was probably a huge amount of work, but that's more like not dragging the web backwards than pushing it forward.

jstimpfle · on April 19, 2017

Heh, just today I made this https://gist.github.com/jstimpfle/a4f2661f8d042d9862b9fecdd8... as a poor-man's PHP. It looks for made-up tags and calls with the tag attributes as arguments, to make text substitutions (no, not a very innovative idea).

I used a cheap regex to look for these tags instead of parsing the DOM properly to avoid unnecessary allocations and parsing overhead. So after reading the article, I guess I'm still in the 90s!

> TreeWriter::AppendChild(parent, child);

That's soo the right approach. OO must die. And the argument why method calls are (typically) wrong already on a syntactic level can be found in the article.

JustSomeNobody · on April 19, 2017

Interesting. However can we please retire the word modern already? It's as bad as game changer.

ko27 · on April 19, 2017

Not to take anything away from Microsoft's accomplishments, but the latest chrome canary has a speedometer score of 140 (70% faster than Edge).

najajomo · on April 20, 2017

> Not to take anything away from Microsoft's accomplishments, but the latest chrome canary has a speedometer score of 140 (70% faster than Edge).

I don't agree with that statement and can't refute it, but I'm going to mod it down into oblivion anyway ;)

wtetzner · on April 21, 2017

I suspect it was down-voted because it didn't add anything to the conversation.