
A problem with the vector representation is that it makes some common DOM operations (removeChild and insertBefore) O(N) in the number of children. This can be a major problem in some cases. Gecko actually used to have that sort of representation and we have been trying to move away from it, for precisely that reason.

If your DOM is more static, I agree it's a better representation.
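
Roughly the two representations in question, as a simplified sketch (not Gecko's actual data structures; it assumes nodes carry prev/next fields):

    // Children stored in a vector: removeChild has to find the node and
    // shift every later sibling down, so it's O(N) in the child count.
    class VectorChildren {
      constructor() { this.kids = []; }
      removeChild(node) {
        const i = this.kids.indexOf(node);    // O(N) search
        if (i !== -1) this.kids.splice(i, 1); // O(N) shift of later siblings
      }
      insertBefore(node, ref) {
        const i = ref ? this.kids.indexOf(ref) : this.kids.length;
        this.kids.splice(i, 0, node);         // O(N) shift again
      }
    }

    // Children as doubly-linked siblings: unlinking touches only the
    // neighbours, so removeChild/insertBefore are O(1) given the node.
    class LinkedChildren {
      constructor() { this.first = null; this.last = null; }
      removeChild(node) {
        if (node.prev) node.prev.next = node.next; else this.first = node.next;
        if (node.next) node.next.prev = node.prev; else this.last = node.prev;
        node.prev = node.next = null;
      }
      insertBefore(node, ref) {               // ref === null means append
        node.next = ref;
        node.prev = ref ? ref.prev : this.last;
        if (node.prev) node.prev.next = node; else this.first = node;
        if (ref) ref.prev = node; else this.last = node;
      }
    }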




removeChild and insertBefore on a DOM tree are always O(N) operations, or at least cause a full scan of the children later, no matter how the children are stored.

Think about rules like :nth-child(4) { border: 1px solid; }.


> removeChild and insertBefore on DOM tree are always O(N) operations

It depends. If you insert only one node, I agree you end up doing some O(N) work later in many cases (though not nearly always). If you remove/insert k nodes at the front, in the vector case you do O(kN) work while in the doubly-linked-list case you do O(k+N) work at worst. In the case when k == N the difference is between O(N) and O(N^2).

And yes, this is a problem in practice. We've seen popular web libraries doing this to remove all the kids of an element:

    while (foo.firstChild)
        foo.removeChild(foo.firstChild);
which will be O(N^2) in a vector implementation.

So the real question is what the amortized cost of an insert/remove is. Unless your web page is flushing out layout after every DOM modification (which is sadly too common), you will only redo selector matching every however many DOM mutations. In many cases "however many" is "thousands".
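
For concreteness, "flushing out layout after every DOM modification" usually looks something like this (names are made up; the point is that reading a layout-dependent property such as offsetHeight between mutations forces the pending style/layout work to run each time):

    // Anti-pattern: each offsetHeight read forces a synchronous style/layout
    // flush before the next mutation, so invalidation work can't be batched.
    let total = 0;
    for (const row of newRows) {        // newRows: some array of <tr> elements
      table.appendChild(row);
      total += row.offsetHeight;        // layout flush on every iteration
    }

    // Better: do all the mutations first, then read layout once at the end.
    for (const row of newRows) table.appendChild(row);
    const heights = newRows.map(row => row.offsetHeight);  // one flush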


    while (foo.firstChild)
        foo.removeChild(foo.firstChild);
Let's assume that there are 100 children. Removing the first of k remaining children shifts the other k-1 down, so we will have roughly 99 + 98 + ... + 1 ≈ 5000 a[i] = a[i-1] operations in total. This is really nothing for a modern CPU.

But I agree that even this is not good.

That's why I have Element.clear() and Element.prepend|append|insert([array of nodes]) methods.

In any case, calling a DOM method (like any other function in script) is comparable to about 100 a[i] = a[i-1] operations in native code.

I mean that such O(N) problems are better solved by specialized native functions.


> Let's assume that there are 100 children.

The cases where I've run into this being a problem involved thousands to tens of thousands of children. At 100 children there's really no problem, I agree.

> So we will have 5000 a[i] = a[i-1] operations in total.

Plus the index updates, unless you do those lazily, right? Again, not a problem at 100 kids, but at thousands of kids the cache misses start being pretty annoying.

> That's why I have Element.clear() and Element.prepend|append|insert([array of nodes]) methods.

Well, sure. Browsers could expose better APIs for this stuff. And web pages could use those APIs. And then browsers may face different tradeoffs in terms of internal representation.

As things stand, though, we don't have those better APIs, and if we added them right now it would take a decade or more for people to use them consistently enough that browsers could optimize based on that...

> In any case, calling a DOM method (like any other function in script) is comparable to about 100 a[i] = a[i-1] operations

Last I measured, the overhead of a call to a DOM method (so not counting the work it actually does) in Firefox was on the order of 30-40 clock ticks. a[i] = a[i-1] is likely at least four instructions (two lea, two mov on x86 for example) unless you vectorize things. But probably about comparable in the vectorization case, yes.

> I mean that such O(N) problems are better to be solved by specialized native functions.

In an ideal world, I agree. I wish we lived in that world!


They really should use just one call, element.innerHTML = "";

to clear an element's content. And use document fragments for bulk insertion.
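
E.g. (standard DOM calls; container and items here are just placeholders):

    // Clear all children in one call instead of an O(N^2) removal loop.
    container.innerHTML = "";

    // Build the new content off-document, then insert it with one call,
    // so the expensive invalidation work happens once rather than per node.
    const frag = document.createDocumentFragment();
    for (const item of items) {
      const li = document.createElement("li");
      li.textContent = item;
      frag.appendChild(li);
    }
    container.appendChild(frag);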

The rest of us shouldn't have to pay for someone else's inefficient code.

Good wishes, of course, but still...


And just to be clear, the use case where you are doing repeated lookups/traversals of a linked list is uncommon?

If not, you could always go with a hybrid implementation, e.g., using a vector with a changelog and applying the modifications only when you traverse the structure, or using some persistent sequence data structure with high branching degree. Either one would use less memory than a linked list, and frankly, I've never encountered a situation where a performance critical data structure was best implemented as a doubly linked list... The memory overhead for large structures was always prohibitive.
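
Roughly what I have in mind for the vector-with-changelog variant (names made up, corner cases like insert-then-remove of the same node glossed over):

    class LazyChildList {
      constructor() {
        this.kids = [];     // committed child order
        this.pending = [];  // buffered mutations since the last traversal
      }

      remove(node)            { this.pending.push({ op: "remove", node }); }
      insertBefore(node, ref) { this.pending.push({ op: "insert", node, ref }); }
      append(node)            { this.pending.push({ op: "insert", node, ref: null }); }

      // Apply the whole changelog in one pass the next time anyone traverses,
      // so k buffered mutations cost O(N + k) rather than O(k * N).
      commit() {
        if (this.pending.length === 0) return;
        const removed = new Set();
        const before = new Map();  // ref node -> nodes queued to go before it
        const atEnd = [];
        for (const m of this.pending) {
          if (m.op === "remove") removed.add(m.node);
          else if (m.ref !== null) {
            if (!before.has(m.ref)) before.set(m.ref, []);
            before.get(m.ref).push(m.node);
          } else atEnd.push(m.node);
        }
        const next = [];
        for (const kid of this.kids) {
          if (before.has(kid)) next.push(...before.get(kid));
          if (!removed.has(kid)) next.push(kid);
        }
        next.push(...atEnd);
        this.kids = next;
        this.pending = [];
      }

      *[Symbol.iterator]() { this.commit(); yield* this.kids; }
    }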


No, that's common too. Think childNodes traversal.

So you do need something in addition to the linked list to make sure that at least in-order childNodes traversals are O(N), not O(N^2), in common cases.

Memory use is a concern, but not the overriding one. A very large web page (e.g. the HTML spec) has maybe 500k nodes. An extra 4 words of memory per node, on 64-bit, then translates to ~16MB of RAM. If you look at actual memory consumption on that page, you will see that 16MB is not _that_ huge a deal, unfortunately.

The big problem with a doubly linked list is loss of memory locality and the resulting cache misses on traversal. That's something browsers are still working on figuring out, honestly; there is various work around trying to arena-allocate nodes and whatnot, but it has various drawbacks as well (e.g. arenas being kept alive by a single node, which you can fix by making your nodes relocatable in memory, but then you need to use handles instead of pointers, etc, etc).
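
To make the first point concrete, one possible trick (just a sketch, not claiming any engine does exactly this) is to cache the last index/node pair so that sequential childNodes access over the linked list stays linear overall:

    class LinkedChildNodes {
      constructor(parent) {
        this.parent = parent;    // parent.first / node.next form the linked list
        this.cachedIndex = 0;
        this.cachedNode = null;  // node currently at cachedIndex, if known
      }
      // A real implementation would also invalidate the cache on DOM mutation.
      item(i) {
        let index = 0, node = this.parent.first;
        if (this.cachedNode && i >= this.cachedIndex) {
          index = this.cachedIndex;            // resume from the cached position
          node = this.cachedNode;
        }
        while (node && index < i) { node = node.next; index++; }
        if (node) { this.cachedIndex = index; this.cachedNode = node; }
        return node;                           // null if i is out of range
      }
    }

    // Sequential access like `for (i = 0; i < len; i++) list.item(i)` is now
    // O(N) total instead of O(N^2), because each call resumes where the
    // previous one stopped.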


Sounds like it might be interesting to try JIT-style techniques for data here, to help with cache footprint and memory usage. Strawman idea: a chameleon container with an opportunistic "fast path" implementation for the common case, and then a generic "deopt" implementation that would kick in on demand, per node. The "fast path" container could maybe have compressed pointers in 16-bit fields too.
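
Something like this toy version, maybe (the per-document node table and the 16-bit index limit are just assumptions of the sketch):

    // Strawman "chameleon" child list: starts in a compact fast-path form
    // (16-bit indices into a per-document node table) and deopts to a plain
    // array of references if it sees something the compact form can't encode.
    class ChameleonChildren {
      constructor(nodeTable) {
        this.nodeTable = nodeTable;      // hypothetical document-wide node table
        this.fast = new Uint16Array(8);  // compressed "pointers"
        this.len = 0;
        this.slow = null;                // generic fallback, built on demand
      }
      push(node) {
        if (this.slow) { this.slow.push(node); return; }
        const id = this.nodeTable.indexOf(node);
        if (id < 0 || id > 0xffff || this.len === this.fast.length) {
          this.deopt();                  // fall back to the generic representation
          this.slow.push(node);
          return;
        }
        this.fast[this.len++] = id;
      }
      deopt() {
        this.slow = Array.from(this.fast.subarray(0, this.len),
                               id => this.nodeTable[id]);
        this.fast = null;
      }
      get(i) { return this.slow ? this.slow[i] : this.nodeTable[this.fast[i]]; }
    }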


Would there be any value in using something like a gap-buffer to store children? I know it's usually used with text, but I can't think of any reason it couldn't be used here. I have no idea though if it would be a good idea.
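
For reference, a gap buffer over children might look roughly like this (hedged sketch; whether the locality win beats the cost of sliding the gap around on scattered edits is exactly the question):

    class GapBufferChildren {
      constructor(capacity = 16) {
        this.buf = new Array(capacity);
        this.gapStart = 0;
        this.gapEnd = capacity;          // gap occupies [gapStart, gapEnd)
      }
      get length() { return this.buf.length - (this.gapEnd - this.gapStart); }

      // Slide the gap so that logical index i sits right at gapStart.
      moveGap(i) {
        while (this.gapStart > i) {      // move gap left
          this.gapStart--; this.gapEnd--;
          this.buf[this.gapEnd] = this.buf[this.gapStart];
        }
        while (this.gapStart < i) {      // move gap right
          this.buf[this.gapStart] = this.buf[this.gapEnd];
          this.gapStart++; this.gapEnd++;
        }
      }

      insertAt(i, node) {
        if (this.gapStart === this.gapEnd) this.grow();
        this.moveGap(i);
        this.buf[this.gapStart++] = node;  // O(1) when edits are clustered
      }

      removeAt(i) {
        this.moveGap(i);
        const node = this.buf[this.gapEnd];
        this.gapEnd++;                     // absorb the slot into the gap
        return node;
      }

      get(i) {                             // map logical index to physical slot
        return i < this.gapStart
          ? this.buf[i]
          : this.buf[i + (this.gapEnd - this.gapStart)];
      }

      grow() {                             // double capacity, keep the tail in place
        const extra = this.buf.length;
        this.buf.splice(this.gapEnd, 0, ...new Array(extra));
        this.gapEnd += extra;
      }
    }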


The page might not use rules like that though? And the browser could optimize the case where it doesn't.


That's just one example.

Removing one element means invalidating the container's layout. That means you will need to relayout the container, which is a scan of the children again. And so on.


An element removal may also mean absolutely nothing if the container is display:none, say...



