Don't be fooled. The W3C doesn't even set the standard for HTML5: it's developed by the WHATWG, and then periodically the W3C just makes a copy, throws away the Git history, erases the acknowledgments section, and publishes it to keep itself feeling relevant.
Politics aside, I personally favor WHATWG’s overall presentation of the spec, but I go to the W3C’s rendition for one reason: they version the spec and publish a readable changelog[0] highlighting the key changes between versions.
In contrast, WHATWG’s officially recommended[1] ways of keeping up are the commit history (quite messy), the Twitter feed (which seems to mostly mirror the commit history), and—I kid you not—manual diffing of Git revisions.
The problem with the W3C version is that it isn't even a snapshot of any version of the WHATWG document. The W3C has modified it in some arbitrary ways which don't correspond to any "ground truth" in browser implementations.
A practical example: The W3C "spec" says that the <main> tag must only appear once in a document. WHATWG and browsers agree that multiple <main> tags are perfectly fine, albeit silly.
But in practice that isn't a good example. Given HTML's extreme error tolerance the practical difference is zero. It only makes a difference for authors, who really should consider it an error to use more than one <main> tag.
As an end user/engineer I occasionally refer to either spec when something isn't clear otherwise. It was by referring to W3C and WHATWG specs that I wrote a text-synced audio player a few years ago, for example.
I don’t read the rest of W3C’s spec to be honest. It’s useful to have a brief summary of changes to see where things are heading (and any convenient changelog implies some kind of versioning), but I’d likely be checking MDN or WHATWG for specifics if something catches my eye.
For example, after I saw the new <dialog> tag in W3C’s changelog, I jumped to MDN to find a friendly overview of its usage and a compatibility table (spoiler, the tag isn’t supported by most major browsers).
"Multiple main elements in the DOM, so long as only one is visible to the user." - so they do allow multiple main elements, as long as only one is visible, right?
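Right. As I read the W3C wording, the pattern it permits is the classic single-page-app setup: several views exist in the DOM at once, but only one is shown at a time. A hypothetical sketch (not taken from the spec, ids are made up):

```html
<!-- Several <main> elements may be present, but per the W3C wording
     only one should be visible; the others carry the hidden attribute. -->
<body>
  <main id="home-view">
    <h1>Home</h1>
  </main>
  <main id="settings-view" hidden>
    <h1>Settings</h1>
  </main>
</body>
```

Note that browsers will happily render any number of visible `<main>` elements either way, which is the WHATWG point upthread.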
> WHATWG’s officially recommended[1] ways of keeping up are commit history (quite messy), Twitter feed (primarily channeling commit history as it seems), and—I kid you not—manual diffing of Git revisions.
That is mostly meaningful for browser makers and Web authors that want to interact with the development of the HTML specification.
Web developers that simply want to make things are rather encouraged to search MDN and caniuse.com[0], which provide information about the availability and potential quirks of HTML features across browsers, and link to sections of the appropriate specifications.
I've created an SGML DTD grammar for W3C HTML 5.1 validation at [1], and am in the process of adding 5.2. This is of course only for checking and normalizing HTML, the markup language, rather than a reference for Web APIs and browser behaviors.
Those are both third-party offerings and could disappear at the drop of a hat. It's not good practice for a standards body to rely on outside parties to write documentation for its work.
It comes down to what you as a web author want to accept as "standard". Just because Google, err "browser vendors", have been writing spec texts that change all the time for over ten years now doesn't mean it's accepted by authors or the general public. If anything, the W3C was handed HTML spec work around 1996 by the IETF, before which HTML was specified in the form of RFCs with a proper DTD grammar for its vocabulary.
Anyway, I don't quite get the W3C hate, or the naive belief in WHATWG as the altruistic HTML authority, when the WHATWG's motives and process are opaque, the Web has clearly regressed into a monopoly during WHATWG's reign, and it has produced nothing to stop the Web's descent into a monopolistic tracking and ad shit show.
I used to hate the W3C for wasting everybody's time with XHTML2, which was pretty much known to be dead on arrival from day one because browsers were barely even supporting XHTML (no, accepting XHTML and treating it as HTML tag soup doesn't count as support). About the only browser fully supporting XHTML was Firefox and authors hated it because the spec also demanded a yellow page of death if the XHTML had any syntax errors.
WHATWG promised to end this ivory tower nonsense by putting the vendors before academic purity and authors before vendors and users before authors. This sounded like a great idea, but sadly recently it seems that Google has become the de facto ruler of WHATWG and "users" means "users of the Google Chrome browser", putting vendor interests above all.
Others elsewhere have already decried WHATWG's flippant attitude to accessibility, ignoring established screenreader best practices and advice from accessibility experts. WHATWG is not altruistic. If W3C is too academic, WHATWG is too commercial.
"About the only browser fully supporting XHTML was Firefox and authors hated it because the spec also demanded a yellow page of death if the XHTML had any syntax errors."
And this, kids, is why the Internet is, and always will be, such a mess. That was, and always will be, the correct approach to pages with syntax errors. Everywhere else, silently 'fixing' syntax errors was soundly rejected as a source of innumerable bugs, but on the WWW they are treated like line noise.
Sure, this is a case where change could have only happened if everybody else joined in at the same time. If you're the only browser to correctly implement the syntax requirements of XML for XHTML:
* all existing sites are broken because they're not XHTML but HTML
* any "working" XHTML page might break because the author didn't test in your browser
* any CMS-based website might break randomly because a content writer used broken markup or used the WYSIWYG editor in a non-compliant browser
* any CMS-based website might break randomly because the developers forgot about some edge case that results in bad markup
Not to mention that even if you somehow manage to radicalise every other browser vendor to the point where they implement XHTML with yellow pages of death (and also remove HTML support to force authors to migrate) you still only have well-formedness guarantees but the markup itself might still be nonsense or semantically invalid.
But pointing and laughing is frankly disingenuous. Though the problem is inherently unsolvable (just ask any programmer using a typed language whether they trust code just because it compiles without errors), not having certain checks at the infrastructure level doesn't prevent developers from writing correct and maintainable code if they choose to and understand how.
The reason the web was so full of shitty code for most of its existence is that "real developers" didn't take it seriously and refused to learn the technologies. So the majority of code was written by people who either didn't care to understand what they were doing or had no programming experience (because "real programmers" deemed it unworthy of their expertise).
And finally, the reason WHATWG's HTML revision was so well-received was primarily that it also defined appropriate error handling for real-world situations. The W3C spec on the other hand ignored the possibility of errors, leaving these decisions up to the vendors (who had a vested interest in making sure all sites "work").
"Sure, this is a case where change could have only happened if everybody else joined in at the same time."
If the US hadn't been taken over by free market capitalists for the last 30 years, this would be a solved problem. You can't use a non-conforming browser to do online shopping. Done.
"just ask any programmer using a typed language whether they trust code just because it compiles without errors"
This is misconstruing the benefit. This prevents a wide class of errors, and a good type system _assists_ in providing generally good designs. Other than the Haskell people not many claim it magically produces correct code.
"not having certain checks at the infrastructure level doesn't prevent developers from writing correct and maintainable code if they choose to and understand how."
Neglecting such checks is a fairly good indicator of a bad programmer, though. It's like a pilot who might manage a landing while muting the tower, disabling the instruments, and putting on a blindfold, but shouldn't expect to keep their job afterwards.
"The reason the web was so full of shitty code for most of its existence is that "real developers" didn't take it seriously and refused to learn the technologies."
And frankly, it was a pretty sensible choice: it was pretty clear that the politics of the time were stacked against them. The same sense that tells people to use safety mechanisms tells them not to fight losing fights.
"And finally, the reason WHATWG's HTML revision was so well-received was primarily that it also defined appropriate error handling for real-world situations."
No, WHATWG was well received because of a better, more approachable documentation style, significantly better marketing, and good short-term results to reinforce that marketing. The W3C's complete paralysis didn't help.
The choice wasn't unconstrained tag-soup markup vs XML. HTML was originally spec'd using SGML, and SGML is a proper superset of XML. However, reporting the exact place of an error in a large SGML document in the presence of tag inference/omission is challenging or outright impossible.
So the W3C put a significant amount of work into bringing forward XML as a simplified SGML subset, starting in 1997, with the intention that XML would eventually replace SGML as the basis for HTML. Obviously that didn't happen; but that's no reason to give up on structured markup altogether. SGML is perfectly capable of validating and canonicalizing almost all HTML5 documents (save for a few not-so-well-thought-out design decisions from the Hickson era). The same can't be said for the "official" HTML5 validator; even the specification text itself contains content model errors.
Why blame the W3C for developing XHTML 2 instead of blaming the browsers for not implementing it?
XHTML 2 was the way to the Semantic Web. The point was to get rid of JavaScript, extend HTML and CSS and eventually replace them by XSLT and semantic open machine-readable (XML) data formats. The outcome would have been for web browsers to become mere document readers and everything else to be done with native applications able to read and work with interoperable formats. This would have given authors less control, browsers less power (and market share, if web pages did break), and advertisers less revenue.
HTML5 was the way to the web as a virtual machine for web applications, which gave authors increased control, advertisers free rein over users' web browsers, and an ever more complicated and resource-hungry web platform[1] tolerant of any web page however bad its code.
In hindsight the theoretical purity would have had real benefits for real users.
From the HTML Design Principles:[2]
> In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors; which in turn should be given more weight than costs to implementors; which should be given more weight than costs to authors of the spec itself, which should be given more weight than those proposing changes for theoretical reasons alone. Of course, it is preferred to make things better for multiple constituencies at once.
Theoretical purity is listed at the end as if theoretical purity had nothing to do with the users' interests. The wording makes it seem as if users cared about nothing but "costs or difficulties" (think about the poor users who will have "difficulties" if web pages break!).
Nowadays the W3C (EME) and WHATWG are both enmeshed in commercial interests and the organizations left that publish web and document standards are OASIS (which publishes the OpenDocument and DocBook standards) and the IETF.
WHATWG is now led by a Steering Group of the four major browser engine vendors, and for some time has had a rule that every new feature in HTML needs a test suite and implementation commitments from multiple vendors before it can go in. It’s definitely not right to claim it’s controlled by Google.
They can have all the formal procedures they want, but none of them are legally committed to implementing the WHATWG spec. So it ultimately comes down to power, i.e. your share of the browsing market[1]. Everything else is ultimately a facade that will crumble if it diverges too far from the underlying reality.
[1] The power of the different vendors isn't a straightforward function of market share, of course; you have to factor in shares on the different form factors, shares that are locked in (e.g. WebKit on iOS, Blink on ChromeOS), spending habits of the different user bases, etc.
Oh man, how is this even legal? Can't the WHATWG license their work in a way that disallows this? If you want a fixed version to test against (which doesn't actually make sense, as there is no reference implementation, just various browser vendors that hopefully converge on an agreed implementation), you can just fork your own version of the WHATWG HTML spec, which will be more in line with browsers and have further bug fixes and corrections.
WHATWG doesn't care about this because, honestly, it's an organization where the browser vendors tell you what they will implement. Having the W3C do this doesn't influence any of the decisions they make.
WHATWG Living Standards are now licensed under CC BY 4.0, so if the W3C continues to copy, it needs to provide appropriate acknowledgement. The license does allow copying by design, though.
Why? The WHATWG doesn't make money from the spec itself; rather, they want it spread to a wider audience, so more people are informed of upcoming changes for smoother deployment.
Liberal spec licensing was a deliberate goal of the WHATWG: remember that most of the WHATWG specs have been written from scratch rather than derivatives of the existing W3C specs (which W3C host organisations hold the copyright of).
It ultimately gives a further constraint on relevance: be relevant, or someone will fork your spec and implementers will follow.
WHATWG doesn't make a standard for HTML either. It hosts an ongoing conversation between browser vendors.
Someone creating a standard from that conversation is useful, although I do agree w3c's approach isn't ideal. Personally I'd suggest HTML5 "levels" where level 1 would be fixed in stone (until HTML6) and higher levels are considered increasingly unstable.
They already tried that in CSS with browser vendor prefixes (-moz, -webkit, ...), but unfortunately temporary features tend to be used forever, so it did not work well.
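For readers who missed that era, the prefix pattern looked roughly like this (illustrative only, not from any spec):

```css
/* Experimental, vendor-prefixed forms shipped first; the unprefixed,
   standardized property comes last so it wins once supported. */
.box {
  -webkit-border-radius: 4px; /* old WebKit */
  -moz-border-radius: 4px;    /* old Gecko */
  border-radius: 4px;         /* standard */
}
```

The trouble was that once popular sites depended on a prefixed form, browsers couldn't remove it without breaking them, which is a big part of why vendors have since moved toward runtime flags and origin trials instead.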
Browsers need to maintain maps from vendor prefixes to the standardized way of doing things. If an upgrade path isn't possible, the page needs to send a warning back to the originating server in a GET request to a well-known endpoint.
Google doesn’t quite have that power; some of the things they have pushed have failed to gain traction, sometimes for very good reasons. For example, Object.observe was a proposal that Google themselves eventually unshipped from Chrome after the JS ecosystem roundly rejected the paradigms it enabled, most notably in favor of React’s unidirectional data flow (it always felt like a proposal meant specifically to optimize Angular). Google also backtracked on Web Components, choosing to revise the spec after some of its ideas became awkward in combination with the emergence of ES modules.
The other browser vendors play an incredibly meaningful role, as well as the frontend web developer community at large.
I think this fact demonstrates why EME was implemented in the first place. One could argue that Mozilla is there to fight against DRM in the web, but fighting against three powerful organizations is not something to be desired.
I already suggested basing the W3C spec on the web developer edition of WHATWG HTML as a compromise. In retrospect, the full spec was never a good fit for a W3C REC in the first place. (you may remember the 2022 prediction)
That’s the {amazing|horrifying} nature of this: Web Standards are all descriptive. The standard is only written after the majority of browsers already shipped it, and enough major sites rely on it.
(Of course, this means by the point that you actually get to have a discussion about standards, you can’t really change anything more, and all standards committees, from WHATWG to W3C, are just rubberstamping whatever the browser vendors want)
This is not correct. Here are some common scenarios:
1. A CSS feature is in most cases specified before any of the browsers implement it.
2. Often, an HTML feature is specified (i.e., added to the HTML Standard) around the time development on it starts in a browser. This is because most browser vendors want to make sure that what they implement is in the spec, and they have people who actively work on the spec for that reason.
3. Nowadays, when a web API is proposed by a browser vendor, they write the proposal down somewhere (usually on GitHub). Sometimes it’s a formal spec, other times it’s just an “explainer.” There are probably plenty of web platform features that only have such an informal specification, but that’s fine for smaller features, I think.
In any case, there is always a discussion and some written document. Browsers no longer implement web platform features behind closed doors (although Apple still, in a few cases, starts the discussion a few months before shipping the feature, but that’s more an exception than a rule).
I can recommend reading this comment chain as well; IMO it shows the arguments of both sides very well (and avoids us having to re-argue this every time).
https://html.spec.whatwg.org/multipage/
https://stackoverflow.com/questions/6825713/html5-w3c-vs-wha...