Certainly better than bullying cURL into accepting their idiotic URL specification.
Especially considering that what they want would be better specified as a parser for generating URLs from user input, rather than a demand that every tool interacting with URLs be able to parse malformed URLs.
The WHATWG didn't bully cURL into adopting its URL spec; people complaining that cURL didn't match browsers did that.
The WHATWG made a decision long ago that its standards would be descriptive (describe how people parse it), not prescriptive (describe how people should write it). Anyone who attempted to write a web browser would have had to reverse-engineer how other browsers treated crap, because it was the only way to get websites to work. If you think the definition of URL was stupid, you should see what they had to do to support document.all: define a new concept in JS to represent the notion of "this looks and acts like undefined but you can actually use it as an object."
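A minimal sketch of that quirk, as it can be observed in any browser console (the exact elements returned depend on the page, so treat the indexing as illustrative):

    // document.all is the web's lone "falsy object":
    typeof document.all        // "undefined"
    Boolean(document.all)      // false
    document.all == null       // true
    // ...yet it still behaves as an object when you actually use it:
    document.all.length        // number of elements in the document
    document.all[0]            // first element, e.g. the <html> element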
The entire point of a standard is that implementors agree on a design definition, then implement it, and it stays consistent, forever.
If you look at standards that work and standards that fail, you’ll quickly notice a pattern. The prescriptive ones include the metric system, the A-series of paper sizes, the entire SI system, most open standards, etc. The descriptive ones include the imperial / US customary system, Letter paper, Microsoft "Open" XML, etc.
And before you complain that prescriptive standards are useless because you can never change legacy systems: several countries have prescriptive language regulation, with legal authorities defining how the language has to be used, and they manage to deal with centuries of legacy data.
The email RFCs are prescriptive, and totally useless. I actually have evidence, for example, that RFC 2047 is more often violated than not, and I wonder if message/global will ever see usage as a Content-Type. Prescriptive attempts at tackling memory models in languages have generally failed.
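To illustrate the kind of violation meant here (the subject lines are made up, but the pattern is the common one): RFC 2047 requires non-ASCII header text to be wrapped in encoded-words, yet raw 8-bit headers are everywhere in practice.

    Subject: =?UTF-8?B?R3LDvMOfZQ==?=    (conforming encoded-word for "Grüße")
    Subject: Grüße                       (raw UTF-8 in the header: a violation, but common)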
Also, your delineation of prescriptive/descriptive is laughable. The imperial standard, Letter paper, and OOXML are all prescriptive standards (albeit OOXML is a very badly written one). Prescriptive language standards aren't necessarily well-applied--ask how many people follow the 1996 German spelling reform, or how many use «le hashtag» instead of the "official" «le mot-dièse» (hint: look at what the name of the Wikipedia page is).
OOXML did poorly not because it was descriptive but because it wasn't precise. It was an XML rendering of internal Office file formats, and its descriptions of terms were no better than internal documentation. Something like the TNEF format is much closer to a descriptive document, since it spends a lot of time discussing the differences between Outlook 2007, Outlook 2010, and Outlook 2013 at various steps.
Considering I’m German, the 1996 spelling reform was exactly what I was referring to – I’ve only found a single document this decade which wasn’t in the new spelling.
All other documents I read have been updated in the meantime.
If an entire country can update centuries of material in a few years, why is it so hard to update some simple websites, or, in case that’s not possible, ship a polyfill as an add-on?
If a new version of, say, Chrome were to drop the Mozilla/5.0 from its UA and break a website because it relied on Mozilla/* in its UA detection (there are STILL sites that do this), who would users blame? Chrome, obviously.
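As an illustration of that kind of UA detection (a made-up but representative check, not taken from any particular site):

    // Brittle UA sniffing still found in the wild; the function names are hypothetical.
    if (navigator.userAgent.indexOf("Mozilla/") === 0) {
      enableFancyFeatures();   // "must be a real browser"
    } else {
      showUnsupportedBanner(); // any browser that drops "Mozilla/5.0" lands here
    }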
If you're trying to make a new web browser, it's even worse--people won't use it if it breaks the sites they use. And the web developer would say "it works in all the major browsers, what's wrong with it and why should I spend the time to fix it for your shitty new browser?"
The problem is that the blame for broken sites is universally attributed to browsers, not website developers.
That’s why you create a new version that is deliberately not backwards-compatible with the old one, make sure appropriate linters already exist for all tools, have browsers fail on such sites in developer mode or beta/dev versions, etc.; and then let websites opt in to the new version with a special header.
That has been tried before, and that has failed. One of the things that HTML5 fixed was the DOCTYPE mess (in fact, the <!DOCTYPE html> represents the minimal string that enabled standards mode in every browser, including IE). In addition to standards/quirks mode (and the concurrent but separate HTML/XHTML issues), Mozilla tried versioning JS (that was ripped out), and IE tried the compatibility mode switches.
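For comparison, here is the kind of DOCTYPE boilerplate HTML5 replaced, next to the minimal string that still triggers standards mode everywhere:

    <!-- HTML 4.01 Strict: the pre-HTML5 way to get standards mode -->
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">

    <!-- HTML5: the shortest string that enables standards mode in every browser -->
    <!DOCTYPE html>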
It should be noted that later versions of IE eventually gave up and joined the crowd by having its UA string pretend to be Chrome, which pretends to be Safari, which pretends to be Firefox, which pretends to be Netscape.
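A present-day Chrome UA string carries most of that chain in a single line (version numbers illustrative): "Mozilla/5.0" is the Netscape legacy, "AppleWebKit" and "Safari" are the Safari disguise, and "like Gecko" is the nod to Firefox's engine.

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36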
I know that this is just a minor example, but since 2006 (after a few minor revisions) the new German orthography is pretty widely used, especially by people whose work involves writing.
> The WHATWG made a decision long ago that its standards would be descriptive (describe how people parse it), not prescriptive (describe how people should write it)
WHATWG standards are prescriptive in the usual sense; where they differ from other prescriptive approaches is in being grounded in implementation commitments. A standard no one implements is useless, after all.
A standard that is written by a committee where no one is committing to actually implement the standard is useless.
> Especially considering what they want would rather be specified as a parser for generating URLs from user input, and not to simply demand every tool interacting with URLs to be able to parse malformed URLs.
The goal of the URL standard isn't to generate URLs from user input (for example, the browser address bar is out-of-scope), but instead to define how <a href="foo"> in HTML (or, rather, more generally, {http://www.w3.org/1999/xhtml}a@href from the DOM), url("foo") in CSS, Location: foo in HTTP, and similar get parsed. There are large parts of the web that rely on the previously undefined error handling around this, which makes it worth standardising somewhere (as browsers cannot practically drop it, and most web content is targeted primarily at browsers, hence if you want to be compatible with the web you need to be compatible with browsers).
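A few examples of that error handling, as exposed through the WHATWG URL API implemented by browsers and Node.js (results follow the spec's rules for special schemes, but treat them as illustrative):

    // Backslashes are treated as slashes for http(s) URLs:
    new URL("https:\\\\example.com\\path").href   // "https://example.com/path"
    // Tabs and newlines inside the input are silently stripped:
    new URL("https://exa\tmple.com/").href        // "https://example.com/"
    // Leading and trailing spaces (and C0 controls) are trimmed:
    new URL("   https://example.com/  ").href     // "https://example.com/"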
I'll also point out that the WHATWG scarcely exists: as a venue there's almost no formal organisation or high-level plan; it largely runs on some shared values, and as a result there's a fair bit of variety between different groups of people working on different specifications, where some take those values to further extremes than others.
> The goal of the URL standard isn't to generate URLs from user input […], but instead define how <a href="foo"> in HTML
Ehm, that’s exactly user input. If it were machine input, there would be a normalized definition, and you’d use that internally.
And you wouldn’t define a URL spec this way; instead, you’d define a strict URL spec that browsers and all other tools should use, plus a legacy converter for turning malformed URLs into conforming ones.
Then any tool that doesn’t directly deal with user input can always be sure the URLs it gets will strictly follow the standard, and only the first tool in the pipeline has to deal with malformed input (which is what this is).
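A hedged sketch of that pipeline shape, using the lenient WHATWG parser only at the boundary and a deliberately strict check everywhere else (the helper names and the regex are hypothetical simplifications, not a real RFC 3986 validator):

    // Boundary: accept messy input once, re-serialize to a canonical form.
    function normalizeUserUrl(input) {
      return new URL(input).href;   // lenient parse, canonical output
    }

    // Everywhere else: accept only the canonical form and fail early otherwise.
    const STRICT = /^https?:\/\/[a-z0-9.-]+(:\d+)?\/[\x21-\x7e]*$/;  // hypothetical, oversimplified
    function strictUrl(url) {
      if (!STRICT.test(url)) throw new Error("malformed URL reached the pipeline: " + url);
      return url;
    }

    strictUrl(normalizeUserUrl("https:\\\\example.com\\path"));  // ok: "https://example.com/path"
    strictUrl("https:\\\\example.com\\path");                    // throws, early and loudly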
> Ehm, that’s exactly user input. If it were machine input, there would be a normalized definition, and you’d use that internally.
That depends on the definition of "user"; most things I'm used to refer to only the current user of the device as the user, and to everything else as untrusted input.
The problem with having a formal definition of the strict subset is that you end up with bugs (often security critical) in almost every implementation because of some case where the conversion produces something not in the strict subset. That's something that's happened with way too many formats.
Usually, in such a situation, you can fail early, and you can log a warning.
The alternative, of trying to fix the developer's error with heuristics, almost always ends with worse security-critical bugs.
There’s a reason people advocate for strict typing and proper errors, and not PHP’s "any undefined constant is of type string with its name being its content".