
A problem with the vector representation is that it makes some common DOM operations (removeChild and insertBefore) O(N) in the number of children. This can be a major problem in some cases. Gecko actually used to have that sort of representation and we have been trying to move away from it, for precisely that reason.

If your DOM is more static, I agree it's a better representation.
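
Roughly the two representations in question, as a simplified sketch (not Gecko's actual data structures; it assumes nodes carry prev/next fields):

    // Children stored in a vector: removeChild has to find the node and
    // shift every later sibling down, so it's O(N) in the child count.
    class VectorChildren {
      constructor() { this.kids = []; }
      removeChild(node) {
        const i = this.kids.indexOf(node);    // O(N) search
        if (i !== -1) this.kids.splice(i, 1); // O(N) shift of later siblings
      }
      insertBefore(node, ref) {
        const i = ref ? this.kids.indexOf(ref) : this.kids.length;
        this.kids.splice(i, 0, node);         // O(N) shift again
      }
    }

    // Children as doubly-linked siblings: unlinking touches only the
    // neighbours, so removeChild/insertBefore are O(1) given the node.
    class LinkedChildren {
      constructor() { this.first = null; this.last = null; }
      removeChild(node) {
        if (node.prev) node.prev.next = node.next; else this.first = node.next;
        if (node.next) node.next.prev = node.prev; else this.last = node.prev;
        node.prev = node.next = null;
      }
      insertBefore(node, ref) {               // ref === null means append
        node.next = ref;
        node.prev = ref ? ref.prev : this.last;
        if (node.prev) node.prev.next = node; else this.first = node;
        if (ref) ref.prev = node; else this.last = node;
      }
    }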




removeChild and insertBefore on a DOM tree are always O(N) operations, or at least cause a full scan of the children later, no matter how the children are stored.

Think about rules like :nth-child(4) { border: 1px solid; }.


> removeChild and insertBefore on DOM tree are always O(N) operations

It depends. If you insert only one node, I agree you end up doing some O(N) work later in many cases (though not nearly always). If you remove/insert k nodes at the front, in the vector case you do O(kN) work while in the doubly-linked-list case you do O(k+N) work at worst. In the case when k == N the difference is between O(N) and O(N^2).

And yes, this is a problem in practice. We've seen popular web libraries doing this to remove all the kids of an element:

    while (foo.firstChild)
        foo.removeChild(foo.firstChild);
which will be O(N^2) in a vector implementation.

So the real question is what the amortized cost of an insert/remove is. Unless your web page is flushing out layout after every DOM modification (which is sadly too common), you will only redo selector matching every however many DOM mutations. In many cases "however many" is "thousands".
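
For concreteness, "flushing out layout after every DOM modification" usually looks something like this (names are made up; the point is that reading a layout-dependent property such as offsetHeight between mutations forces the pending style/layout work to run each time):

    // Anti-pattern: each offsetHeight read forces a synchronous style/layout
    // flush before the next mutation, so invalidation work can't be batched.
    let total = 0;
    for (const row of newRows) {        // newRows: some array of <tr> elements
      table.appendChild(row);
      total += row.offsetHeight;        // layout flush on every iteration
    }

    // Better: do all the mutations first, then read layout once at the end.
    for (const row of newRows) table.appendChild(row);
    const heights = newRows.map(row => row.offsetHeight);  // one flush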


    while (foo.firstChild)
        foo.removeChild(foo.firstChild);
Let's assume that there are 100 children. Removing the first of k remaining children shifts the other k-1 down, so we will have roughly 99 + 98 + ... + 1 ≈ 5000 a[i] = a[i-1] operations in total. This is really nothing for a modern CPU.

But I agree that even this is not good.

That's why I have Element.clear() and Element.prepend|append|insert([array of nodes]) methods.

In any case, calling a DOM method (like any other function in script) is comparable to about 100 a[i] = a[i-1] operations in native code.

I mean that such O(N) problems are better solved by specialized native functions.


> Let's assume that there are 100 children.

The cases where I've run into this being a problem involved thousands to tens of thousands of children. At 100 children there's really no problem, I agree.

> So we will have 5000 a[i] = a[i-1] operations in total.

Plus the index updates, unless you do those lazily, right? Again, not a problem at 100 kids, but at thousands of kids the cache misses start being pretty annoying.

> That's why I have Element.clear() and Element.prepend|append|insert([array of nodes]) methods.

Well, sure. Browsers could expose better APIs for this stuff. And web pages could use those APIs. And then browsers may face different tradeoffs in terms of internal representation.

As things stand, though, we don't have those better APIs, and if we added them right now it would take a decade or more for people to use them consistently enough that browsers could optimize based on that...

> In any case, calling a DOM method (like any other function in script) is comparable to about 100 a[i] = a[i-1] operations

Last I measured, the overhead of a call to a DOM method (so not counting the work it actually does) in Firefox was on the order of 30-40 clock ticks. a[i] = a[i-1] is likely at least four instructions (two lea, two mov on x86 for example) unless you vectorize things. But probably about comparable in the vectorization case, yes.

> I mean that such O(N) problems are better to be solved by specialized native functions.

In an ideal world, I agree. I wish we lived in that world!


They really should use just one call, element.innerHTML = "";

to clear an element's content. And use document fragments for bulk insertion.
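
E.g. (standard DOM calls; container and items here are just placeholders):

    // Clear all children in one call instead of an O(N^2) removal loop.
    container.innerHTML = "";

    // Build the new content off-document, then insert it with one call,
    // so the expensive invalidation work happens once rather than per node.
    const frag = document.createDocumentFragment();
    for (const item of items) {
      const li = document.createElement("li");
      li.textContent = item;
      frag.appendChild(li);
    }
    container.appendChild(frag);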

The rest of us shouldn't have to pay for someone else's inefficient code.

Good wishes, of course, but still...


And just to be clear, the use case where you are doing repeated lookups/traversals of a linked list is uncommon?

If not, you could always go with a hybrid implementation, e.g., using a vector with a changelog and applying the modifications only when you traverse the structure, or using some persistent sequence data structure with high branching degree. Either one would use less memory than a linked list, and frankly, I've never encountered a situation where a performance critical data structure was best implemented as a doubly linked list... The memory overhead for large structures was always prohibitive.
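
Roughly what I have in mind for the vector-with-changelog variant (names made up, corner cases like insert-then-remove of the same node glossed over):

    class LazyChildList {
      constructor() {
        this.kids = [];     // committed child order
        this.pending = [];  // buffered mutations since the last traversal
      }

      remove(node)            { this.pending.push({ op: "remove", node }); }
      insertBefore(node, ref) { this.pending.push({ op: "insert", node, ref }); }
      append(node)            { this.pending.push({ op: "insert", node, ref: null }); }

      // Apply the whole changelog in one pass the next time anyone traverses,
      // so k buffered mutations cost O(N + k) rather than O(k * N).
      commit() {
        if (this.pending.length === 0) return;
        const removed = new Set();
        const before = new Map();  // ref node -> nodes queued to go before it
        const atEnd = [];
        for (const m of this.pending) {
          if (m.op === "remove") removed.add(m.node);
          else if (m.ref !== null) {
            if (!before.has(m.ref)) before.set(m.ref, []);
            before.get(m.ref).push(m.node);
          } else atEnd.push(m.node);
        }
        const next = [];
        for (const kid of this.kids) {
          if (before.has(kid)) next.push(...before.get(kid));
          if (!removed.has(kid)) next.push(kid);
        }
        next.push(...atEnd);
        this.kids = next;
        this.pending = [];
      }

      *[Symbol.iterator]() { this.commit(); yield* this.kids; }
    }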


No, that's common too. Think childNodes traversal.

So you do need something in addition to the linked list to make sure that at least in-order childNodes traversals are O(N), not O(N^2), in common cases.

Memory use is a concern, but not the overriding one. A very large web page (e.g. the HTML spec) has maybe 500k nodes. An extra 4 words of memory per node, on 64-bit, then translates to ~16MB of RAM. If you look at actual memory consumption on that page, you will see that 16MB is not _that_ huge a deal, unfortunately.

The big problem with a doubly linked list is loss of memory locality and the resulting cache misses on traversal. That's something browsers are still working on figuring out, honestly; there is various work around trying to arena-allocate nodes and whatnot, but it has various drawbacks as well (e.g. arenas being kept alive by a single node, which you can fix by making your nodes relocatable in memory, but then you need to use handles instead of pointers, etc, etc).
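
To make the first point concrete, one possible trick (just a sketch, not claiming any engine does exactly this) is to cache the last index/node pair so that sequential childNodes access over the linked list stays linear overall:

    class LinkedChildNodes {
      constructor(parent) {
        this.parent = parent;    // parent.first / node.next form the linked list
        this.cachedIndex = 0;
        this.cachedNode = null;  // node currently at cachedIndex, if known
      }
      // A real implementation would also invalidate the cache on DOM mutation.
      item(i) {
        let index = 0, node = this.parent.first;
        if (this.cachedNode && i >= this.cachedIndex) {
          index = this.cachedIndex;            // resume from the cached position
          node = this.cachedNode;
        }
        while (node && index < i) { node = node.next; index++; }
        if (node) { this.cachedIndex = index; this.cachedNode = node; }
        return node;                           // null if i is out of range
      }
    }

    // Sequential access like `for (i = 0; i < len; i++) list.item(i)` is now
    // O(N) total instead of O(N^2), because each call resumes where the
    // previous one stopped.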


Sounds like it might be interesting to try JIT-style techniques for data here, to help with cache footprint and memory usage. Strawman idea: a chameleon container with an opportunistic "fast path" implementation for the common case, and then a generic "deopt" implementation that would kick in on demand, per node. The "fast path" container could maybe have compressed pointers in 16-bit fields too.
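
Something like this toy version, maybe (the per-document node table and the 16-bit index limit are just assumptions of the sketch):

    // Strawman "chameleon" child list: starts in a compact fast-path form
    // (16-bit indices into a per-document node table) and deopts to a plain
    // array of references if it sees something the compact form can't encode.
    class ChameleonChildren {
      constructor(nodeTable) {
        this.nodeTable = nodeTable;      // hypothetical document-wide node table
        this.fast = new Uint16Array(8);  // compressed "pointers"
        this.len = 0;
        this.slow = null;                // generic fallback, built on demand
      }
      push(node) {
        if (this.slow) { this.slow.push(node); return; }
        const id = this.nodeTable.indexOf(node);
        if (id < 0 || id > 0xffff || this.len === this.fast.length) {
          this.deopt();                  // fall back to the generic representation
          this.slow.push(node);
          return;
        }
        this.fast[this.len++] = id;
      }
      deopt() {
        this.slow = Array.from(this.fast.subarray(0, this.len),
                               id => this.nodeTable[id]);
        this.fast = null;
      }
      get(i) { return this.slow ? this.slow[i] : this.nodeTable[this.fast[i]]; }
    }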


Would there be any value in using something like a gap-buffer to store children? I know it's usually used with text, but I can't think of any reason it couldn't be used here. I have no idea though if it would be a good idea.
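
For reference, a gap buffer over children might look roughly like this (hedged sketch; whether the locality win beats the cost of sliding the gap around on scattered edits is exactly the question):

    class GapBufferChildren {
      constructor(capacity = 16) {
        this.buf = new Array(capacity);
        this.gapStart = 0;
        this.gapEnd = capacity;          // gap occupies [gapStart, gapEnd)
      }
      get length() { return this.buf.length - (this.gapEnd - this.gapStart); }

      // Slide the gap so that logical index i sits right at gapStart.
      moveGap(i) {
        while (this.gapStart > i) {      // move gap left
          this.gapStart--; this.gapEnd--;
          this.buf[this.gapEnd] = this.buf[this.gapStart];
        }
        while (this.gapStart < i) {      // move gap right
          this.buf[this.gapStart] = this.buf[this.gapEnd];
          this.gapStart++; this.gapEnd++;
        }
      }

      insertAt(i, node) {
        if (this.gapStart === this.gapEnd) this.grow();
        this.moveGap(i);
        this.buf[this.gapStart++] = node;  // O(1) when edits are clustered
      }

      removeAt(i) {
        this.moveGap(i);
        const node = this.buf[this.gapEnd];
        this.gapEnd++;                     // absorb the slot into the gap
        return node;
      }

      get(i) {                             // map logical index to physical slot
        return i < this.gapStart
          ? this.buf[i]
          : this.buf[i + (this.gapEnd - this.gapStart)];
      }

      grow() {                             // double capacity, keep the tail in place
        const extra = this.buf.length;
        this.buf.splice(this.gapEnd, 0, ...new Array(extra));
        this.gapEnd += extra;
      }
    }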


The page might not use rules like that though? And the browser could optimize the case where it doesn't.


That's just one example.

Removing one element means invalidating the container's layout. That means you will need to relayout the container, which is a scan of the children again. And so on.


An element removal may also mean absolutely nothing if the container is display:none, say...



