Hm – I'm not too happy to see most of the original HTML elements marked "not conforming" and "must not be used", thus preparing for browsers to eventually drop support. There are still lots of websites and valuable information stored and archived in this format. Back in the day, it was thought that basic HTML was a format to last. Who is going to update these documents to keep them conforming for future browsers? Or are we just dropping a decade of documentation? Is it worth it?
(Consider: apparently, MS Word docs or PDFs prove longer-lived than basic HTML documents! Who would have thought?)
I don't really know what process the W3C fork of our work uses for removal, if any, but you can learn more about how features get removed in the (WHATWG) HTML Standard per our working mode.
In short, I think it's important to distinguish between conformance and removal from browsers. Removal from browsers is a big deal and, as per those links, is only done when it's not going to break the web, or when the benefits are very high (e.g. security issues). Removal from being conformant just reflects the evolution of best practices. See also https://github.com/whatwg/html/blob/master/FAQ.md#how-are-de...
<blink> is no longer supported by browsers, but with a few CSS declarations it works just fine.
Removing support for presentational markup does not mean a loss of information. Browsers will still render tags they don't recognize, and re-applying the styling of those tags is often trivial. (I mention <blink> because it's one of the more difficult, but not terribly so.)
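For instance, a minimal sketch of a user stylesheet that restores it (the keyframe name is mine; unknown elements like blink still render as plain inline elements, so this is all it takes):

    blink {
      animation: blink-flash 1s step-end infinite; /* toggle once per second */
    }
    @keyframes blink-flash {
      50% { visibility: hidden; }
    }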
As the web evolved, our needs changed. Do we still need the <font> tag?
It's true that you can just use some CSS to make up for the lost HTML feature, but then again you could also rewrite the HTML part.
Forgive me if I'm wrong, but I'm fairly sure that what the OP is trying to say is that there are plenty of great websites out there which were developed a long time ago, and for which there is no maintainer to do any work on them. Thus, having HTML elements like this dropped would mean the content is, in a way, lost.
Thinking about it some more, users can probably add plugins to add this CSS automatically, or some browsers might even keep those features in, but still, there will be users who don't know this, I think, resulting in a bad experience.
The "plenty of great websites" were developed long time ago. Having a degree of visual consistency of layout across different browsers was not possible then according to standards.
The content will not be lost. The tags will still parse as valid elements, but the rendering may vary. This has always been expected, since legacy (pre-HTML5) elements never had uniform rendering and contained quirks.
Should the current/new standard have support for ambiguously rendered quirky elements? Is it even a standard then?
After HTML5 the end result will definitely be the same on most (if not all) layout engines. Standardization as a process requires non-conforming legacy features to be dropped.
I may not have expressed myself clearly, but I understood what OP was saying. Never overnight and post, kids.
I was thinking about something like Stylish or the user stylesheet I've been hearing about in Firefox (for their UI, IIRC, but still). Inject some global CSS on older/missing doctypes, and it's probably less than 200 total declarations to handle every older tag. I'd imagine <font> to be the hardest and/or longest, followed by <blink> and <marquee>.
It would be a small extension; something like the sketch below.
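As a rough sketch, approximating the classic default rendering (the values are guesses, not from any spec, and assume support really was dropped):

    center { display: block; text-align: center; }
    big    { font-size: larger; }
    strike { text-decoration: line-through; }
    font[size="2"] { font-size: small; } /* one rule per legacy size value */
    /* font color= and face= have no pure-CSS mapping, since attr() only
       works in 'content'; that's why <font> needs JS or many rules. */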
My other point I think I expressed clearly enough: the loss of presentational markup is not a loss of content in most cases. If the title is in Times New Roman instead of Arial, most of the time it'll just look worse. Unless the content is meta, the presentation is there to make things more pleasant to read.
Not sure I'm getting you… An extension to view old pages? This is the worst idea I've heard in a long time. The web is awesome partly because it's backwards compatible.
The <font> tag is still necessary because of HTML for emails and Microsoft’s disappointing decision back in 2007 to regress Outlook to using the MSO renderer/editor, which is worse than IE 5.0.
There are a ton of uses for old-style <font>, <b>, etc. tags. CSS is both more verbose and more abstract. When I do not care about the "reusability" of a given code fragment, give me presentational markup tags all day long.
Just as a nit, "font-size:10pt" is a lot worse than <font size="2"> from an accessibility perspective. <font size="2"> is "one notch below the user's default size". If the user sets a 20pt font, it's going to be larger than 10pt.
You could use "font-size: small" to get the size="2" behavior.
(Also, "font-family: Whatever", not "font-face: Whatever".)
Note that your CSS example is longer and requires combining two different syntaxes. Even for this one change, the difference in length and complexity is apparent.
> <span style="font-size: 10pt">Blah</span>
vs.
> <font size="2">Blah</font>
Which is easier to remember? Which is more obvious at a glance?
The former. The problem is just that you've learned what the latter means better than the former. I'm the opposite.
Plus, the former is an absolute value where the latter is not, as far as I've been able to tell. I know that the first span will always be 10pt font. I have no idea what "2" even means in this context.
That may be true, but regardless of whether pt is a best practice, the point remains that it is still a unit. IMO that puts it ahead of the alternative example.
> Back in the day, it was thought that basic HTML was a format to last.
HTML was never intended to be archival. Archival assumes a long-term relationship between format and user-agent, but those two things evolve independently.
> Who is going to update these documents in order to make them conforming to future browsers?
You don't update legacy documents stored in an archive. You find a conforming user-agent (appropriately old browser version) to consume them in their intended state.
> Is it worth it?
Yes, HTML is a versioned format. Improvements to the format are welcomed and necessary.
Except, as described later in this thread, WHATWG HTML, the spec that is actually implemented, is decidedly not versioned and changes from day to day. Keeping old user-agents around is likewise discouraged. (See e.g. Chrome's and Firefox's aggressive update and support policies.)
From what I understand, the WHATWG's policy regarding archival is "well yes the format is constantly changing but we'll try REALLY hard to not make too many breaking changes."
To my knowledge the WHATWG specs are the backbone of the W3C specs, but nobody follows the WHATWG specs. Browser vendors prefer to follow the W3C specs precisely because they are versioned and go through a slower and extremely thorough review process.
That’s not right. Browsers implement the WHATWG specs.
That said, validity changes don’t matter to the browser’s ability to render old pages. Changes to remove support for an element entirely are very rare.
I don't need to see a comment thread here to understand the process. I have been following this for 20 years, long before there was a WHATWG.
Additionally, WHATWG lost some credibility when they attempted to redefine the DOM and arbitrarily delete some node types. Granted, most of those types are legacy types not in use by anybody in a long time, except for the attribute node type. Browser vendors simply ignored this foolishness.
I’m not sure what you are referring to specifically, but WebKit aims to conform to WHATWG DOM and we check that against Web Platform Tests. We don’t even look at W3C DOM. I believe it’s the same for the other browser engines.
I wouldn't mind some extra information you have on this. When I've spoken to folks at the browser vendors one-on-one, they've talked about following WHATWG, rather than the W3C standard, but usually that was in conversations that were critical of the W3C, so it was hard to tell how ubiquitous that position was.
I am having trouble finding the background information on this. Basically, the WHATWG took the W3C DOM and wildly changed some foundational concepts without a thorough understanding of what those decisions mean.
It is important to understand the DOM wasn't created for HTML. The DOM, starting with DOM level 2, was created in parallel with XML Schema. This is evident when reading some of the W3C mailing lists and comparing release dates of W3C publications.
Browser vendors are extremely shy about adopting new technology that makes for breaking changes. They will do so, but you need to have an incredibly strong argument. WHATWG's changes to the DOM had no beneficial argument, except perhaps developer convenience for those developers who cannot figure out DOM walking.
The DOM is a pretty solid technology with regard to extensibility, predictability, and sturdiness. If you maintain a large major browser and somebody came to you with breaking changes and a bunch of weak bullshit for justifications what would you do? Also, imagine if you will, that if you ever challenge the people bringing you this pile of shit they will troll the hell out of you in a very visible and immature way.
The response from the browser vendors was to simply say nothing and ignore them like they were never there. I got into an argument about this with the WHATWG on a github issue once, and wish I hadn't. Ignorance is like a black hole that sucks everything in and it never stops to allow rational signals to escape undamaged.
This specific decision turned out to be mistaken, but W3C makes this type of mistake way more often and doesn't even always fix them. You can see in the record of this issue that the problem was eventually resolved. WHATWG Working Mode has also been updated since this change and would not allow this type of change to be made today without implementor support.
Regardless of issues like this, browsers track WHATWG DOM near exclusively. You can see devs from all of the major browser engines commenting in the issue you linked.
A big difference is that it took somebody new to the WHATWG (many years later) to admit failure and correct the problem very directly. In the past the WHATWG had a severe case of not invented here syndrome and would troll people to death who disagreed with them.
I know from my own conversations with the WHATWG that this wasn't something long-time WHATWG members would admit to (or even understand). It was the childishness, perhaps more than anything else, that meant nobody took them seriously.
> Regardless of issues like this, browsers track WHATWG DOM near exclusively.
I am going to disagree with you there. Perhaps they do now, extremely recently, but historically this is absolutely false.
> You can see devs from all of the major browser engines commenting in the issue you linked.
Yes, everybody participates in the WHATWG. This isn't new. Participation is different than adopting those recommendations back into your software.
It is important to keep in mind that the WHATWG doesn't do a lot of XML work, but the DOM is markup language agnostic. The DOM isn't something created or maintained in an HTML rich vacuum.
Do you work on a browser engine? I do (WebKit). Your claim that browsers actually implement W3C DOM 4.1 is just totally wrong. We don't even read it.
The person who ultimately fixed this problem in the DOM Living Standard is Anne Van Kesteren, who was not even remotely new to WHATWG at the time. The person who filed this issue (Philip) is also a WHATWG old timer.
"never intended to be archival" – HTML originated as an easy to handle stand-alone documentation standard (as a cut-down version of SGMLguid + links/anchors). The entire point of a documentation standard is backwards compatibility. Especially the just-ignore-what-is-not-implemented policy made this very promising regarding future usage, as long as major structural elements were to be honored. (Compare the drop of framesets, menus and manueitems as primary elements to represent structure and hierarchy, or the drop of major phrase elements conveying meaning and emphasis. Also, referring to the recommendation for substitutes, HTML is now not a stand-alone language anymore, but requires additional CSS.)
As opposed to this, HTML was not intended as a presentation layer for fancy web apps. (There were better options for this in the hypertext world, even then.)
This is about the exact opposite of archival: backwards compatibility. We don’t want to split the web into an old web and a new web. Having to switch browsers for decade-old pages as we encounter them raises the barrier to entry for that lore of old, effectively sepulchring it from the public.
99% of the web’s users are not going to understand when to switch browsers, how, nor why.
> This is about the exact opposite of archival: backwards compatibility. We don’t want to split the web into old web and new web.
It happens anyways regardless of what people want. The 90s era web doesn't work properly in modern browsers and 90s era browsers don't work with the modern web.
> 99% of the web’s users are not going to understand when to switch browsers, how, nor why.
This also happens naturally. Chrome is the most popular browser and it doesn't come with most operating systems. That is something users must switch to.
I'm really afraid that this is preparing the final drop of browser support. (We've seen similar in HTTP, where many of the HTTP/1.1 (1997) features, like multipart-http, for-headers, etc., haven't been supported by any client for years now.)
As for MS Word and HTML: when I finished my thesis in the mid-1990s, I saved it both in the MS Word version I used to write it (MS Word 5 for Mac) and in HTML (expecting future compatibility). I can still open the Word version, but I may soon be unable to conjure a formatted display of the HTML version. And I can still display a PDF 1.x...
What do you mean by "browser support"? What functionality do you expect to go away that would prevent you from viewing an old HTML file in a modern browser?
E.g., the drop of framesets. Many old documentations use them, as do most websites from the second half of the 1990s (so-called 2nd-gen websites). Without frames, the content can't be displayed in context anymore and consistency of presentation is lost entirely.
Oh gosh, that is absolutely true. A major change in presentation when the support is dropped, to be sure. On the other hand, what was the level of standardization then? Were there not massive inconsistencies across browsers when you got into the fine details of implementing frames? Especially parent-child relations among elements/sets/contexts, which is an integral concept in the definitive DOM - a grand achievement of the standardized HTML format.
In framesets, parent-child relations were absolutely defined and stable, as were the paths between individual frames.
(Current frame is "self" or "window", parent frame or frameset is "parent", and the top most entry point into the hierarchy "top". Moreover, "self", transcending the window context, is also the only reliable reference to the global object, thus also providing a valid reference to the context of a worker. Specifically, it was for framesets that the notion of hierarchy was introduced, which eventually resulted in the concept of the DOM. Some inconsistencies to this concept of strict parent-child relations were actually introduced by early implementations of the iframe-element, which is, BTW, still a valid HTML element.)
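For the curious, a minimal frameset of the era (hypothetical file names):

    <frameset cols="200,*">
      <frame name="nav"  src="nav.html">
      <frame name="main" src="content.html">
    </frameset>

A link in the nav frame could then target its sibling by name, or reach it from script through the shared parent:

    <a href="ch2.html" target="main">Chapter 2</a>
    <script>parent.frames["main"].location = "ch2.html";</script>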
That said, there was a small inconsistency with an early subversion of Netscape 3 regarding whether the frame source would be relative to the current location of the frame or rather relative to the frameset. (But this was an issue for a rather short period of time, two months or so.) A major difference in styling was the implementation of frame borders: whether they would be entirely invisible by just specifying `border="0"` (Netscape and others) or whether they required the two attributes `frameborder="0"` and `framespacing="0"` (MS IE). In practice, nearly all sites specified both schemes. And yet another, minor, implementation-specific detail was the sizing of framesets: while Netscape Navigator supported, like all other browsers, a size specified in pixels, this was internally translated to percents of the total width. Therefore, depending on rounding to integers, the presentation in the Netscape browser could be off by a pixel or two.
(The latter was, indeed, not unusual behavior at the time, just as MS Word and RTF used to translate any measurements internally to "twips", or twentieths of a point.)
This discussion involves details about content referenced from another domain (or a source which is not trusted). That implementation issue has finally been addressed with a standard, CORS, and new implementations are being worked on, but the problems are still hard to solve in practice.
I am super glad for all the hard work put into all this.
Personal MS Word anecdote: Word 2016 can successfully open and render my final year University project report, compiled in 1996 using Word 6.0. It contains a bunch of embedded images, tables and moderately complex diagrams drawn using Visio 2.0.
The report is saved across six .doc files, due to the size limitation of the 3.5" floppy disks we were using back then.
Not really. HTML rendering by email clients has never been especially standards-conformant; the publication of a new HTML standard (especially by W3C, as mentioned elsewhere) isn't likely to affect that.
What's wrong with W3C again? Are we on the same train we were on with the ill-fated XHTML 1 Strict and XHTML 2?
Anyway, what's going on with Google+Microsoft+Apple+W3C? Why is there such a big push to HTTPS and HTTP/2, and declaring old HTTP/0.9 and HTTP/1 and HTTPS/1 and HTML 5.0 as legacy!? And why is mail still sent in plain text, completely insecure, with no adoption hype to support S/MIME etc.? It is beyond fishy. Or is it just pure greed: no one cares about the non-walled-garden open web (aka everything has to live in LinkedIn/FB/AppStore/PWA) and there is no money in mail?
The reality is that the WHATWG (a) only writes descriptive standards, describing what already exists, usually with pseudocode and prose instead of ABNF or EBNF (see the URL standard replacement), and (b) only describes something once it’s actually been implemented on a larger scale.
On the topic of what standards are supposed to do – prescriptively shape and replace what exists – the WHATWG isn’t useful. WHATWG "standards" are the equivalent of Microsoft Office Open XML, a standards body just taking an existing implementation, defining whatever it does as standard, and doing it so incomplete that the result is useless.
Yes, WHATWG and W3C are doing the best they can do in the current climate (where Google can roll out QUIC and SPDY, before any standard is even defined, across websites accounting for 6% of global traffic, 65%+ of web browsers, and 85%+ of mobile phones), but this is just misleading. It helps no one to pretend to do standardization work when you don’t actually have any power to decide anything – neither WHATWG nor W3C can actually force, or even ask, Google to change SPDY or QUIC. They’re paper tigers.
I work for Chrome, as an editor of the HTML Standard. So let me give you my perspective on this.
In Chrome we ensure that all features we ship to the web go through a public standards process. This allows them to be developed by a collaborative community, including other browser vendors and web developers who would use them. It ensures that if we happen to ship a feature sooner than other vendors, there's a specification and a shared test suite (https://github.com/w3c/web-platform-tests) that allow others to quickly follow. Note that a specification is better than requiring them to read the Chromium source, because specifications are at a higher level that doesn't depend on individual browser architecture details.
In the WHATWG we don't only write descriptive standards. But we do ensure that whatever standards we write, are ones browsers are willing to implement. And we ensure that standards accurately describe how browsers operate, even for legacy features, because that is all part of the mission of allowing browsers to compete on an even playing field and build themselves from scratch without having to go through the kind of costly reverse-engineering that Firefox 1.0 did to catch up to IE6. In practice we've found that algorithmic specs are better for this than BNFs, as it's harder to specify error-handling behavior for BNFs while still staying compatible with the web (i.e. while still producing a standard browsers are willing to ship).
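To give one concrete example of the error handling a grammar can't easily express (results as I understand the current URL Standard): the parser is specified as an algorithm that recovers from malformed input instead of rejecting it, so browsers agree on fix-ups like these:

    <script>
      new URL("http:example.com").href;        // "http://example.com/" (missing slashes supplied)
      new URL("http:\\\\example.com\\x").href; // "http://example.com/x" (backslashes as slashes)
    </script>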
And yes, we're not interested in just creating a standard out of thin air, with no vendor collaboration, calling it "standard", and then hoping some magical power will force browsers to implement it. It is indeed much more collaborative than that.
But the fact that we require standards to be developed in tandem with implementations doesn't mean that implementations (such as Chrome) just go ahead and do whatever they want, and we at the WHATWG transcribe it into the spec at some lower level of detail. Instead, the public, collaborative standards process helps to extract out all testable and observable aspects of the feature into a codebase-agnostic description others can use, and provides a forum for them to comment on ideas before any final shipping decisions are made. And, per our working mode (https://whatwg.org/working-mode#changes), changes and additions do require multi-implementer support before they're ready to graduate to a WHATWG Living Standard; proposals not yet at that point are said to be in incubation, and are often developed elsewhere (see https://whatwg.org/working-mode#new-proposals) such as the W3C's WICG.
> In Chrome we ensure that all features we ship to the web go through a public standards process.
Ehm, basically every major feature Chrome has shipped was shipped before the standard was even discussed. SPDY shipped long before HTTP/2 was even finalized, and QUIC is doing the same. NaCl shipped in the same way, without any standardization, and to this date earth.google.com depends on it.
In general, your problem is that you only consider browser developers. In the past, the WHATWG has decided to redefine the URL standard, then shame cURL for not following the standard, without ever involving anyone from the curl project in the discussion. The URL discussion affects everything from Android’s IPC system to curl, from industrial machinery to the web. The WHATWG explicitly declared that the URL spec is designed to completely, and exhaustively, obsolete and deprecate any existing URL or URI spec.
Yet, the only people ever contacted about this, and who were given the ability to take part in the discussion, were representatives from the three large browser vendors.
Those are fair counterexamples. Perhaps I should be distinguishing between the Blink team and the rest of the Chrome team. I realize that distinction isn't very important to an outsider, but at least realize that there's a large portion of the Chrome team that cares very much about the web evolving through an open standards process.
The URL Standard was designed in the open with input from many different constituencies. The cURL author has chosen not to participate, for reasons of his own, but e.g. Node.js, PHP, Google's GURL (used by Android IPC, I believe), and others are quite involved.
The URL Standard didn’t even consider contacting any industrial vendor that relies on URLs - e.g. SIEMENS. There are entire industries out there that use these standards, and rely on them to be stable.
The only participants were all either browsers, affiliated with browsers, or a handful of web serving projects.
Other projects that rely on URLs include everything from KDE to Gnome, Microsoft’s OS to the systems used in your car.
Changing a URL standard and only involving web vendors is basically like changing the A4 paper standard and only talking to the Microsoft Office team, the Google Docs team, and HP’s printer team – while entirely ignoring paper manufacturers, envelope manufacturers, the mail companies around the world that will have to ship the envelopes, fax manufacturers that have to build faxes able to fax the new format, newspapers and magazines that have to replace their paper, newspaper shelf manufacturers that build newspaper shelves for newspaper stores, etc.
Most of the time, it’s easy to only think of the web as browsers and servers, but some of the specs the WHATWG touches go through entire industries, sometimes there are millions of companies that have to be notified months or years beforehand to replace their software, update it, potentially even do a recall, and standardize. Not everything moves as fast as the web.
And this entirely disregards the people trying to parse the web with their own HTML parsers, whom everyone loves to ignore. And so many other groups of people and companies.
https://github.com/whatwg/url/issues/118 is the curl issue, I believe. As I love curl more than any browser I personally side with it. It has support for ldap urls; curl wins. :)
The W3C version is a sporadically updated, bad-faith fork of the WHATWG version, created to maintain the fiction that it "owns" HTML, which it deems necessary to maintain the organisation's standing (and funding) in the eyes of other organisations and governments.
And yet, we don't get anything like that changes page for the WHATWG version. (Unless you want to dig through the whole commit history.)
It's absolutely a fiction, but at the same time, this at least attempts to be a standard.
The WHATWG version seems more like a reflection of "oh, by the way, these are the rules our browsers are following this month. Yours truly, the browser vendors."
If you omit the "Editorial:" or "Meta:" commits, I think it's actually at a similar level of detail as the W3C fork's changes log. (Not completely; scrolling through I do see a number of commits that wouldn't be relevant.) But the W3C fork has only managed to copy-and-paste a small subset of our changes, so indeed, the changes log for the last year of work at the WHATWG will be somewhat daunting compared to the small subset they managed to copy over.
There may be room for someone to compile a higher-level "this week/month/year in the HTML Standard" or similar; before I started working in the WHATWG, that actually used to exist: https://blog.whatwg.org/category/weekly-review (also in very amusing YouTube form: https://www.youtube.com/watch?v=1Bg5BPnmj68). So far we haven't had the bandwidth to restart that, but if you or someone else wants to contribute that sort of thing to the blog or elsewhere, I'd love to help you get started.
But this is kind of my point: As of now, this doesn't exist.
I don't think the commit history works. It doesn't give you any indication about which changes are relevant or irrelevant and it doesn't tell anything about the larger efforts taking place.
Actually, I don't think it would even make sense to create an equivalent of the W3C diff, because there are no versions or other structures to organize the changes around - there is just a constant stream of changes. (Which is kind of the point of the living standard concept after all)
The W3C fork's versions are arbitrary too though (yearly). You could organize a yearly update on what's new in the HTML Standard if you thought that would be valuable to people. It wouldn't change the fact that browsers release new features based on the ever-changing standard every six weeks. But it sounds like at least some people would find it useful.
Personally I'd tend toward weekly or monthly, although I admit that yearly is more likely to generate HackerNews posts ;)
I think of it like this: WHATWG is the git master branch of the "html" standard, and W3C regularly packages a modified version of it (changing things they disagree with) and "releases" it with a version number (HTML 5.x), going through alpha, beta, etc. By the time it's finalized, it's out of date.
I hope this doesn't sound too snarky, but as far as I know, the WHATWG standard is live, consistent and always up to date (including corrections), while the W3C recommendations are outdated snapshots of the WHATWG standard, which are labeled by arbitrary version numbers instead of the snapshot timestamp, for whatever reason.
EDIT: Apparently even that description was too charitable towards the W3C (see gsnedders' comments).
They stopped really doing snapshots of the WHATWG standard a while ago when they moved their authoring toolchain away from what the WHATWG document uses, and now just occasionally selectively copy over patches (sometimes incompletely) and make their own changes.
> But is that actually an improvement over the previous situation? (Serious question.)
No, it means we have two increasingly different documents purportedly defining the same things, and when they do copy patches over they've failed to also copy over other dependent patches too on a number of occasions leaving their spec as defined unimplementable.
I know the current manglement isn't explicitly malicious, but this is an atrocious state of affairs.
Practically speaking, the Web is a consortium of corporate foghorns that also happen to collectively be the majority ad-hoc directors of new media (translation: agendas with finance). Cable and daytime TV was the old media, which of course still exists, and social media has become a juggernaut majority of its own beside that.
So, you'd think the actual grassroots on-the-ground parts of a project that is ostensibly defined to be open and free, would actually be made of extremely smart people with straightforward management and as little bureaucracy as possible. Because, you know, the part where everything hits the ground needs to be well-oiled, have no chinks in the armor, and provide a secure foundation of independence.
And yet we have... chaos, infighting, politics and wars over (literally) nothing. And while all that's happening, corporations are progressively nibbling away at the capabilities we have today (to set up websites, to communicate freely) that we take for granted. One day we'll wake up checkmated by some incredibly well-engineered chess move...
Sighs
If the net neutrality thing is repealed, I will be exactly 0% surprised. It'll just be another EME, really.
At this point both W3C and WHATWG are not where innovation on the web is (or should be) happening. It's up to the individual browser makers to innovate. W3C and WHATWG's job should be to document any consensus among browser makers.
It shouldn't be their job to decide how browsers should work, that's the browser makers' decision. (Which happens to be large corporations, for the most part.)
That just gets the browser makers castigated by the tech community. Every time, say, Google invents something new, the entirely predictable incoherent screaming starts about how it's another Microsoft IE/ActiveX.
Nevermind the fact that the landscape has changed to the point where that isn't a realistic outcome anymore.
Nevermind the fact that in the instance I'm describing (which was something like WebASM or WebSockets... it was WebSomething and I can't recall the name), they had submitted their proposals to the standardization groups, with no change in the volume of the noise.
I wish people would decide whether they want browser makers trying New Stuff or they want New Stuff coming from standards bodies only. There are upsides and downsides either way, but I really don't believe that BMing browser makers whenever they try New Stuff is even sort of constructive.
> It's up to the individual browser makers to innovate.
This seems to make the most sense because the browser is the end product by which people consume their internet.
It seems to me they have been, and always will be, years ahead of the governing bodies that make these part of their "standards" decisions. By the time something finally makes it into the spec, we're already onto a dozen new things the browsers are capable of and implementing.
At this point it just feels like the spec is an afterthought, not necessarily keeping up with how fast the industry is changing.
In my opinion, one cannot call something a "standard" that changes every few days.
EDIT: In this sense W3C's HTML 5.x can be considered a rather badly authored (cf. other comments here) standard, while what the WHATWG releases is not something that even measures up to a standard, but it is the daily version of how HTML is supposed to be today.
The whole standard is more or less stable. There are some parts of it that describe new technologies that have not yet been implemented everywhere, but at this point those additions are only added after the design itself is pretty stable. Such additions must also have the support of two or more implementers, per our working mode[1].
Why are there no stable snapshots, or versions, of the standard?
In practice, implementations all follow the latest standard anyway, not so-called "finished" snapshots. The problem with following a snapshot is that you end up following something that is known to be wrong. That's obviously not the way to get interoperability!
This has in fact been a real problem at the W3C, where mistakes are found and fixed in the editors' drafts of specifications, but implementers who aren't fully engaged in the process go and implement obsolete snapshots instead, including those bugs. This has resulted in serious differences between browsers.
It's not enough to be stable to be a standard, you also need authority that enforces it. Either because people "respect you" (whatever that means), or because there's a central authority forcing them to implement the standard, people actually implement it. If they don't, then it's not much of a standard.
So, WHATWG is in constant flux, and W3C has about as much authority as I do. _Thankfully_ in practice WHATWG is "stable enough," but just saying that's what "we" consider a good enough standard for something used in creating all sorts of UIs, from trivial to vitally important, is indicative of a bigger problem.
> Just because something changes often doesn't mean it is unstable.
When you do a project contract, you surely want to define the exact standard against which the application is to be developed, so that one can decide whether the reason for something looking wrong is a browser bug (I can work around it - but it will cost extra money) or indeed a bug in my code that the customer found (i.e. I have to work extra hours for no money because I did bad work).
To be able to decide such questions is a central purpose of existence for standards.
> When I do a project I want to be able to code against the standards from which browsers were developed; that is the WHATWG standard.
I already argued that there is no WHATWG standard, but only a document that changes every few days. Even without this nitpicking: Which of these thousands of versions is the one on which the browser implementation is based on?
This one: https://html.spec.whatwg.org/multipage/ . Contrary to some people's perception here, everything that goes into this is implemented by at least 2 browsers.
Sorry, but you can't print that page and use that forever. I understand that you wish that you could, but you can't. I live in the real world, so rather than reading a snapshot and hoping it stays that way forever, I just read the up-to-date version since that's what is implemented by browsers, not the PDF I saved 3 months ago.
Given that the browsers with respect to which you implement the code change under your feet every 6 weeks, I think it's better the standard keeps pace with them than having it give a misleading impression of what you're developing against.
I hope we can agree HTML is used for text content first and foremost. A format that changes all the time at the whim of an ad company is basically useless for long-term preservation of legal documents, or documents in education, etc. Do you think having the latest web app fad is more important? Especially when the format has been around for 25 years now. "Innovation" on the Web is only happening so that Google can keep an edge in search tech, and for similar reasons.
This just speaks of a lack of experience: that's exactly what it means in the world of software. Do you not understand what "specification" means? "Specific" is even in the word.
You can't compare it to browsing on Amazon, because functionality doesn't just go missing and literally break buying things; functionality doesn't just suddenly get added and people rely on the exact font size and copy of a particular header in the men's clothing department to be precisely 2em and "Men’s Clothing," and now that it's changed to 1.5em and "Men's Winter Fashion" a third-party app can't render the header in an appropriate width size nor find the clothes to begin with.
Roughly, when interpreting qualified names, Chrome is throwing InvalidCharacterErrors when the acid test wants it to throw NamespaceErrors, in situations where you really have both. This leads to two tests failing.
So decide for yourself whether the WHATWG "standard" did breaking changes in the past or not.
The CSSWG is a W3C working group. It's true that there are sometimes breaking changes if usage is low enough. This is true of the W3C specs as much as it is of the WHATWG specs.
The advantage to following the WHATWG specs is that it reflects how browsers work today, not how they worked a few years ago.
A standard is something like ISO EN DIN A4. It is defined in cooperation with every stakeholder involved, it is specced, it is tested, and a stable definition is created. Everyone builds against this definition, and it works fine. The standard deprecates everything that existed before, and replaces it.
That is a standard. It’s authoritative, basically immutable, and it is prescriptive.
WHATWG "standards" come after the fact, only consider whatever browsers implement, refuse to ever deprecate anything (unless browsers have already deprecated it), and almost always just are "whatever Google Chrome does". That’s a disgusting abuse of the word standard.
WHATWG "standards" are the equivalent of Microsoft Office Open XML, a standards body just taking an existing implementation, defining whatever it does as standard, and doing it so incomplete that the result is useless.
Yes, WHATWG and W3C are doing the best they can do in the current climate (where Google can roll out QUIC and SPDY before even any standard is defined across websites accounting for 6% of global traffic, 65%+ of web browsers, and 85%+ of mobile phones), but this is just misleading. It helps no one to pretend to do standardization work when you don’t actually have any power to decide anything – neither WHATWG nor W3C can actually force, or even ask, Google to change SPDY or QUIC. They’re papertigers.
The WHATWG is for browser vendors. It's in a constant state of flux as new changes get proposed and those proposals get changed through implementation.
The W3C is for web authors. It presents a more stable recommendation and provides advice (based on research) for authors.
> The W3C is for web authors. It presents a more stable recommendation and provides advice (based on research) for authors.
Web authors usually use MDN instead, because it serves that purpose in a much better way. (Note that despite the name MDN is not Mozilla specific but a cross-browser resource, and that Microsoft and Google recently joined MDN.)
> The WHATWG is for browser vendors. It's in a constant state of flux as new changes get proposed and those proposals get changed through implementation.
The W3C recommendation is no different in that regard, they also describe stuff that's not fully implemented in all browsers yet. But the WHATWG version is more up to date, so you'll notice much earlier that the new feature you want to depend on will be abandoned or changed.
> The W3C recommendation is no different in that regard, they also describe stuff that's not fully implemented in all browsers yet.
Note for a W3C document to go to Recommendation there must be two interoperable implementations. Of course, that doesn't mean any browser has implemented any of it, just that someone has implemented each part of it.
The WHATWG specs contain a lot of innovation, but change more rapidly. In many cases WHATWG specs are living documents that gradually evolve and adapt to changes in real time.
Conversely the W3C specifications are fixed to versions and are occasionally patched with updates. The W3C process is incredibly slow and conservative, which frustrates developers on the bleeding edge. Due to the slow process, thoroughness of that process, and formal versioning most software vendors prefer to implement against the W3C publications as more stable or reliable.
WHATWG needs to add a <w3c-please-stop-plagiarising-the-whatwg-html-standard /> tag then gate some useful functionality behind it to see if W3C will dare include it
You can ask someone to stop doing something legal. I'd hate to live in the sort of world where you can't. HN moderators would send the police after you to seize your laptop if you make bad comments. The only way to get your roommate to stop eating your plums would be to charge them with larceny. Every relationship would end with a restraining order, or it wouldn't be over. Failing to turn in a homework assignment on time would lead to a court date. Buying more than ten items in the express lane would get you arrested for fraud.
If that isn't the world you want to live in, let's get rid of this idea that just because I have no interest in the government stopping you from doing a thing by threatening violence (and that's all a license is - a statement that the following activities are not copyright infringement), I'm totally fine with you doing the thing.
Sure but they are using the copyright to insist on attribution, which undermines the argument that they are simply against using the law. If they really didn't care they'd use CC0.
And in the quoted statement they are not saying the W3C should improve their process for forking WHATWG's work. They are saying the W3C shouldn't fork their work at all. So despite their specifically chosen license (with easy to understand layman's summary) are WHATWG against all forking? Or are they simply against the W3C?
Most of your points are addressed by Ian in an old email:
"In the case of the WHATWG specifications, the licenses allow broad re-use,
so that implementors can copy-and-paste text into their comment blocks, so
that tutorial writers can copy-and-paste text into their documentation, so
that experiments we haven't considered can spring up without inhibition,
and so that, if the WHATWG stops being a good steward (like the W3C
stopped being a good steward in the early 2000s), the next group of spec
editors doesn't have to start from scratch." (http://lists.w3.org/Archives/Public/www-archive/2014Apr/0034...)
Yes, they do in fact want to use state violence to insist on attribution. They don't want to use state violence to insist on the W3C going away, but they still want the W3C to go away. That seems reasonable to me.
This change to requiring attribution is actually fairly recent, and was made with some reluctance on the part of us editors, despite eventually agreeing it was the best path forward. See https://blog.whatwg.org/copyright-license-change
No, they are saying anyone is allowed to fork this for any reason, but we’d really prefer the W3C didn't fork this for the reason that they are, because it's confusing and counterproductive.
Whether someone should be permitted to do something is a different issue than whether they should actually do it.
I really wish we'd "simplify" the HTML spec. The "pave the cow paths" approach of allowing non-closed tags and a mix of various syntaxes has led to an explosion of complexity. That has regressed into terrible performance and memory-hungry parsers.
This was done once - it was called XHTML. It was, effectively, just HTML in XML form. Tags had a single syntax (no implicitly self-closing tags). Documents were required to be well-formed, syntactically, or they would not display.
HTML has had tag omission and other minimization features from day one, since HTML is based on SGML, which formalizes these notions. If by "mix of various syntaxes" you mean CSS, then I have to agree with you. There never was a need to define a new syntax for item/value pairs; plain markup attributes were and are sufficient for presentation properties.
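For example, all of these end tags may legally be omitted, and always could be:

    <ul>
      <li>First item    <!-- </li> is inferred -->
      <li>Second item
    </ul>
    <p>Paragraphs close implicitly
    <p>when the next one starts.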
I wish WHATWG would properly version their work. I don't like the idea of a "living standard" because it leads to checking for individual functionality and feature detection, rather than being able to say, "This is fully HTML 5.x.x compliant."
Regardless of the state of W3C, if I built an embedded renderer based on their specs, I could at least say "this renderer is based on <http-ref>" and link to the recommended spec version. Whereas if I did that with the living standard href, I'd be out of date any time they decided to rename an attribute.
But that's the point - web authors are supposed to use feature detection instead of writing to a particular standard version. It turns out to be a better model for large interfaces. Yes, in theory, you can ask "Is this OS POSIX.1-2008-compliant or not?" In practice, it takes a while to be fully POSIX.1-2008-compliant, and so you get autoconf, with its individual feature detection of specific functions. Less clean, but way more practical.
If you're writing an embedded renderer, you can always say "This is compliant with the standard as of 14 December 2017." If you're writing an embedded renderer that is being applied to the live web and not just to a fixed set of pages that are also embedded (e.g., you're shipping HTML documentation and a viewer, or a kiosk, or something), you will in fact be out-of-date when the living standard changes. There's no point in saying "I'm compatible with HTML 5.2.0" because the live web isn't targeting 5.2 any more. So you can either acknowledge that, or figure out how to get software updates.
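To make "feature detection" concrete, a minimal sketch (the API name is real; the fallback logic is illustrative):

    <script>
      if ("IntersectionObserver" in window) {
        // use the modern API
      } else {
        // fall back to, e.g., scroll events
      }
    </script>

The markup-only equivalent is fallback content: e.g. the children of <video> are rendered only by browsers that don't recognize the element.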
How does one perform feature detection in a static HTML page?
As far as I can tell the only way to author a compatible web page these days is by checking every damn feature of HTML you use against some humongous table like Can I Use? before assuming your audiences' browsers support it.
Compare to versioned specs, where I need simply determine the minimum spec version supported by my target audience (and any exceptions to the spec) and code against that spec.
There is some utility in naming sets of well-supported features...
This is the annoying part. It's a joke that a site like Can I Use needs to exist, and that browser vendors don't really have apt versions of their own compatibility tables.
Going to a third-party website to check to see if something is supported is disgusting.
The problem with autoconf is that it detects every conceivable Unix feature going all the way back to the 1980s, not that feature detection is itself that problematic.
Right, hence my assertion that well-known names of feature sets are useful. "HTML 5.2" is useful in the same way that "C99" was, because eventually there is a day I can just assume everything in "HTML 5.2" is present in all my targets. If I don't have such a name, if I'm forever at "HTML 5", I'm forced into the "autoconf" scenario of using feature detection forever for everything not in the base specification.
(That the W3C is apparently incompetent at associating feature sets with names is a separate issue.)
It's funny that you use C99 as an analogy. Please list all the compilers you support that support C99. I'll give you a hint: MSVC, Clang, and gcc all don't support C99 fully, and possibly never intend to. It's not just an idle "oh, no one cares about those features; they support it for all intents and purposes": gcc kept its default standard at C89 in part because it didn't support C99 fully.
What you're doing when you say that you assume C99 compliance is you're thinking of the features from C99 that you want to use and relying on that. Admittedly, the generally-unsupported features are very niche. But that means that you potentially have a dozen different ideas of what "we support C99" actually means, and that's before you start asking how reliable an implementation needs to be before it meets the definition of "support." Declaring support for versioned standards is often more problematic than helpful (versioned implementations is a different story).
The real problem with autoconf is that no one removes the unnecessary feature checks and no one audits it to see what's still necessary for the platforms that people intend to support.
"C99 is substantially completely supported as of GCC 4.5 (with -std=c99 -pedantic-errors used; -fextended-identifiers also needed to enable extended identifiers before GCC 5), modulo bugs and floating-point issues (mainly but not entirely relating to optional C99 features from Annexes F and G)." [1]
Sounds fully supported to me, for all practical purposes.
autoconf doesn't detect much of anything by default. A few commonly used boilerplate macros do a series of tests (e.g. AC_PROG_CC, AC_USE_SYSTEM_EXTENSIONS, AC_SYS_LARGEFILE), but for the most part each and every feature test autoconf does was explicitly and individually requested by the author.
The real issue is that people copy+paste autoconf tests from other projects without thinking about whether they're necessary, or even confirming whether they work for their use case. And because people just copy+paste autoconf tests instead of keeping a browser tab open with the (free) POSIX spec when writing their code, most tests people add are for stuff that no longer needs to be tested for (i.e. all the major Unix platforms support most standard POSIX features by default), and they lack the tests for the non-standard interfaces they actually use.
But there's no easy way to fix such poor development practices. A good start would be if people just stopped using autoconf, as well as libtool, cmake, maven, etc, unless and until it really became necessary. Follow the KISS principle. Keep your build as simple as possible and regularly test your code on at least one platform other than Linux/glibc, such as FreeBSD or OpenBSD, rather than misplacing your faith in overly wrought tooling.
It works the same way on the web. Don't use the latest and greatest feature if you don't need to. Like with performance optimizations, don't add the burden until there's relevant, empirical evidence that it's worth your while in the particular case. Nobody ever magically achieved high performance or strong portability by adopting overwrought tooling before the problems ever presented themselves. Doing so often ends up with the opposite result.
This specification should be read like all other specifications. First, it should be read cover-to-cover, multiple times. Then, it should be read backwards at least once. Then it should be read by picking random sections from the contents list and following all the cross-references.
Ah, hyperbole. I didn't know that humor could be specified.
The relevance is that W3C HTML5 standards are supposed to already be stable everywhere, while WHATWG and the browsers are a guessing game of what actually works and behaves the same way everywhere.
I've never seen anyone reference it in that way, which doesn't mean nobody does, but was the basis for my wording of "little actual relevance". (Admittedly, going to either HTML spec is not something that's needed very often for most devs, since most changes happen in other specs (CSS, web platform APIs at W3C, ...) and/or are widely documented outside, but while I've had occasional discussions involving quotes from the WHATWG spec, W3C HTML5 spec hasn't been referenced at all)
For the question "is this supported widely enough", caniuse.com + your local traffic stats is in most cases more relevant than inclusion in some spec or not.
What's the point of removing features such as "menu" from HTML standard? If there are browsers supporting it and webpages using it, would Mozilla (or Google or Microsoft) actually remove those features just because newest standard said so? I mean: marquee was deprecated long ago, yet browsers still render it correctly.
<marquee> has never been part of any HTML standard, ever. It is listed as an obsolete feature in the HTML5 standard for the purpose of making it obsolete (a weird reason) but is not mentioned anywhere else since time began.
<isindex> got removed from browsers, which I personally find kind of sad because that's what I learned in 1995 and I've written a web page that uses it. But it's weird and does nothing that a normal form couldn't do, so the browsers seem to want to deprecate it.
The biggest weirdness about it was that it was essentially a parser macro, not an element. That is, at parse time, it expanded into a form/label/hr/input set of elements in the token stream. Super-bizarre. See the removal patch at https://github.com/whatwg/html/commit/5c44abc734eb483f9a7ec7....
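Roughly, as I recall the old parsing rules (the default prompt text was localizable), writing

    <isindex prompt="Search:">

was expanded by the parser into something like

    <form>
      <hr>
      <label>Search: <input name="isindex"></label>
      <hr>
    </form>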
Whether or not to remove a web platform feature from a browser basically comes down to how many people are using it vs. what is the maintenance cost. I suspect that browsers continue to support <marquee> so they can continue to render all those great webpages from the late 1990s properly.
I can't speak for the spec authors, but IMHO, tags should be deprecated and eventually removed when they are deemed to be useless, especially when their functions and/or semantics are covered by another tag, and especially when their use is harmful (or rather, more harmful than beneficial).
In my (very personal) opinion, an HTML tag or attribute, and more generally a feature of any design/development framework, should be considered possibly harmful if it:
- presents possible security problems; for examples, consider some of the points listed here: https://html5sec.org/
- promotes poor usability or accessibility; e.g. interactive tooltips with links or controls in them, for example, are quite difficult to make accessible, and I wouldn't want an HTML <tooltip> tag without a lot of discussion about accessibility
- promotes anti-patterns; e.g., at this point I think <marquee>-style scrolling informational text is an anti-pattern in a web context, since it can make the text much harder to read, especially on small screens
Of course, none of these concerns should lead to immediate removal of a thing as soon as they're pointed out, but they should be discussed and considered. It's a cost-benefit analysis: what does this feature actually buy us that isn't easily achievable with other features, what problems is it causing and how severe are they, and are the benefits worth the problems?
As for <menu>, my guess, though I haven't been able to find the actual discussion, is that it was removed because its semantics are somewhat in conflict with <nav>, and probably its most common use was custom context (aka "right-click") menus, which bring a lot of accessibility problems with them. I don't know that I agree with the decision to remove it altogether, since I think its use to semantically identify and group web application controls is very valuable and not covered by any other tags (though I'd love to be corrected), but I do think that context menus, which to me seems like the most common use for the <menu> tag, are a very problematic design element. Again, it's a balance; is it worth the problems it causes? I guess the authors decided it wasn't.
(Just to reiterate, I don't know why <menu> was removed, I'm just guessing. If anyone can find any of the discussions about <menu> and the problems with it, I'd love to read more.)
I haven't looked at the W3C fork of our work, but in the actual HTML Standard (maintained at the WHATWG), menu was not completely removed – just the mostly-unimplemented context menu feature. We left menu as a semantic alternative to ol/ul for menu-like lists.
There's also the case of things like marquee, which are not removed, but just marked as obsolete and something that web developers must not use. (Which in practice means that conformance checkers like https://checker.html5.org/ are required to complain about them; it doesn't mean there's some godlike web-developer-enforcement committee going around preventing you from writing code that uses marquee.) Their implementation requirements are still in the spec; see e.g. https://html.spec.whatwg.org/multipage/obsolete.html#the-mar... and https://html.spec.whatwg.org/multipage/rendering.html#the-ma.... (Same for frame/frameset, by the way.)
Agreed. We should never have been putting "apps" on the web in the first place. Giving control from the client over to the server is a terrible idea, and I'm amazed it ever took off.
Where would you draw the line between a "website" and a "web app"? Would you like to see JS die entirely? Would you like the web to be non-interactive? Genuinely interested.
"Interactivity" can mean anything including hypertext itself.
And what sort of interactivity? Does backend logic count, or only logic in the browser? If only the latter - why does that matter, but not the former? Does any site that uses JavaScript qualify, regardless of how little?
Hacker News uses JavaScript, so is it a "web app" and not a "web site"? Would it suddenly become a web app if the mods hit their heads and decided in a fever delirium to turn the whole thing into an SPA, despite it having the exact same functionality?
In this model, would YC have to publish the static pages of HN on the "static" web but the forum on the "dynamic" web? But what if they cache the threads? Now they're static as well. And having every web developer divide their attention and work between two platforms based on which part of it is "static" and which part is "dynamic" seems needlessly complex and confusing.
I sympathize with the idea - HTML and javascript are terrible for building applications, but if you want the web to only be static HTML files then your "new" platform is going to contain almost every website in existence, including most of the brochure sites, articles and "legacy stuff." Most web apps are also documents, few are strictly one or the other.
It would make more sense to bifurcate the web along WASM, because that will lead to the distinction between HTML and compiled binaries (which, I know, we've already been there with Flash and Java) both in the browser. But even then, WASM is intended to work within the context of javascript and HTML, not necessarily to stand alone.