
> Back in the day, it was thought that basic HTML was a format to last.

HTML was never intended to be archival. Archival assumes a long-term relationship between format and user-agent, but those two things evolve independently.

> Who is going to update these documents in order to make them conforming to future browsers?

You don't update legacy documents stored in an archive. You find a conforming user-agent (appropriately old browser version) to consume them in their intended state.

> Is it worth it?

Yes, HTML is a versioned format. Improvements to the format are welcomed and necessary.




Except, as described later in this thread, WHATWG HTML, the spec that is actually implemented, is decidedly not versioned and changes from day to day. Keeping old user-agents around is likewise discouraged. (See e.g. Chrome's and Firefox's aggressive update and support policies.)

From what I understand, the WHATWG's policy regarding archival is "well yes the format is constantly changing but we'll try REALLY hard to not make too many breaking changes."


To my knowledge the WHATWG specs are the backbone of the W3C specs, but nobody follows the WHATWG specs. Browser vendors prefer to follow the W3C specs precisely because they are versioned and go through a slower and extremely thorough review process.


That’s not right. Browsers implement the WHATWG specs.

That said, validity changes don’t matter to the browser’s ability to render old pages. Changes to remove support for an element entirely are very rare.


This is not (uncontroversially) true. See the long discussion below or on any other W3C post on HN.


I don't need to see a comment thread here to understand the process. I have been following this for 20 years, long before there was a WHATWG.

Additionally, WHATWG lost some credibility when they attempted to redefine the DOM and arbitrarily delete some node types. Granted, most of those types are legacy types that nobody has used in a long time, except for the attribute node type. Browser vendors simply ignored this foolishness.


I’m not sure what you are referring to specifically, but WebKit aims to conform to WHATWG DOM and we check that against Web Platform Tests. We don’t even look at W3C DOM. I believe it’s the same for the other browser engines.


I wouldn't mind some extra information you have on this. When I've spoken to folks at the browser vendors one-on-one, they've talked about following WHATWG, rather than the W3C standard, but usually that was in conversations that were critical of the W3C, so it was hard to tell how ubiquitous that position was.


I am having trouble finding the background information on this. Basically, the WHATWG took the W3C DOM and wildly changed some foundational concepts without a thorough understanding of what those decisions mean.

Here is a very simplified description of this problem years after the fact: https://github.com/whatwg/dom/issues/102

It is important to understand the DOM wasn't created for HTML. The DOM, starting with DOM level 2, was created in parallel with XML Schema. This is evident when reading some of the W3C mailing lists and comparing release dates of W3C publications.

Attribute nodes can be independently walked when walking the DOM. By removing attributes as a node type you break this functionality. You can use this little utility I wrote as a proof: https://github.com/prettydiff/getNodesByType/blob/master/get...
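A minimal sketch of the idea, not the linked utility itself: it uses Python's standard-library `xml.dom.minidom`, which still models attributes as nodes with their own node type. The function name `get_nodes_by_type` and the sample markup are assumptions made up for illustration.

```python
from xml.dom import minidom, Node


def get_nodes_by_type(root, node_type):
    """Walk the tree under root and collect every node of the given type.

    Attributes are visited as nodes in their own right, which only works
    because this DOM implementation exposes them with ATTRIBUTE_NODE.
    """
    found = []

    def walk(node):
        if node.nodeType == node_type:
            found.append(node)
        # Attributes are not in childNodes, so visit them explicitly.
        if node.nodeType == Node.ELEMENT_NODE and node.attributes is not None:
            for i in range(node.attributes.length):
                walk(node.attributes.item(i))
        for child in node.childNodes:
            walk(child)

    walk(root)
    return found


# Hypothetical sample document, only for demonstration.
doc = minidom.parseString('<a href="x" id="top"><b class="c">text</b></a>')
attrs = get_nodes_by_type(doc, Node.ATTRIBUTE_NODE)
attr_names = sorted(a.name for a in attrs)
elements = get_nodes_by_type(doc, Node.ELEMENT_NODE)
```

If attributes stop being a node type, a generic walker like this can no longer select them by type and has to special-case them instead.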

Browser vendors are extremely shy about adopting new technology that makes for breaking changes. They will do so, but you need to have an incredibly strong argument. WHATWG's changes to the DOM had no beneficial argument, except perhaps developer convenience for those developers who cannot figure out DOM walking.

The DOM is a pretty solid technology with regard to extensibility, predictability, and sturdiness. If you maintain a large major browser and somebody came to you with breaking changes and a bunch of weak bullshit for justifications what would you do? Also, imagine if you will, that if you ever challenge the people bringing you this pile of shit they will troll the hell out of you in a very visible and immature way.

The response from the browser vendors was to simply say nothing and ignore them like they were never there. I got into an argument about this with the WHATWG on a github issue once, and wish I hadn't. Ignorance is like a black hole that sucks everything in and it never stops to allow rational signals to escape undamaged.


This specific decision turned out to be mistaken, but W3C makes this type of mistake way more often and doesn't even always fix them. You can see in the record of this issue that the problem was eventually resolved. WHATWG Working Mode has also been updated since this change and would not allow this type of change to be made today without implementor support.

Regardless of issues like this, browsers track WHATWG DOM near exclusively. You can see devs from all of the major browser engines commenting in the issue you linked.


A big difference is that it took somebody new to the WHATWG (many years later) to admit failure and correct the problem very directly. In the past the WHATWG had a severe case of not-invented-here syndrome and would troll people to death who disagreed with them.

I know from my own conversations with the WHATWG this wasn't something that long-time WHATWG members would admit to (or even understand). It was the childishness, perhaps more than anything else, that kept anybody from taking them seriously.

> Regardless of issues like this, browsers track WHATWG DOM near exclusively.

I am going to disagree with you there. Perhaps they do now, extremely recently, but historically this is absolutely false.

> You can see devs from all of the major browser engines commenting in the issue you linked.

Yes, everybody participates in the WHATWG. This isn't new. Participation is different than adopting those recommendations back into your software.

Here is what browsers actually implement: https://www.w3.org/DOM/DOMTR and https://www.w3.org/TR/dom41/

It is important to keep in mind that the WHATWG doesn't do a lot of XML work, but the DOM is markup language agnostic. The DOM isn't something created or maintained in an HTML rich vacuum.


Do you work on a browser engine? I do (WebKit). Your claim that browsers actually implement W3C DOM 4.1 is just totally wrong. We don't even read it.

The person who ultimately fixed this problem in the DOM Living Standard is Anne van Kesteren, who was not even remotely new to WHATWG at the time. The person who filed this issue (Philip) is also a WHATWG old-timer.


"never intended to be archival" – HTML originated as an easy-to-handle, stand-alone documentation standard (as a cut-down version of SGMLguid + links/anchors). The entire point of a documentation standard is backwards compatibility. The just-ignore-what-is-not-implemented policy in particular made this very promising for future usage, as long as major structural elements were honored. (Compare the dropping of framesets, menus and menuitems as primary elements to represent structure and hierarchy, or the dropping of major phrase elements conveying meaning and emphasis. Also, going by the recommended substitutes, HTML is no longer a stand-alone language, but requires additional CSS.)

By contrast, HTML was not intended as a presentation layer for fancy web apps. (There were better options around for this in the hypertext world, even then.)


> HTML was never intended to be archival

This is about the exact opposite of archival: backwards compatibility. We don’t want to split the web into old web and new web. Having to switch browsers for decade-old pages as we encounter them raises the barrier to entry for that lore of old, effectively sepulchring it from the public.

99% of the web’s users are not going to understand when to switch browsers, how, nor why.


> This is about the exact opposite of archival: backwards compatibility. We don’t want to split the web into old web and new web.

It happens anyways regardless of what people want. The 90s era web doesn't work properly in modern browsers and 90s era browsers don't work with the modern web.

> 99% of the web’s users are not going to understand when to switch browsers, how, nor why.

This also happens naturally. Chrome is the most popular browser and it doesn't come with most operating systems. That is something users must switch to.



