Being flexible about what markup it accepts has meant the web could gain new features while degrading gracefully in older browsers, and it has made the platform more fault-tolerant. It's not a failing at all.
Compare that to JavaScript, which will happily fail if you use new syntax or call a missing function, and thus web pages that rely on JS often show up as just a full screen of white when something goes wrong, which it frequently does. That's not to say JS should be as flexible as HTML is here, but it makes for an interesting contrast.
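A minimal sketch of that failure mode (renderAnalytics is a made-up name standing in for any function that doesn't exist in the user's browser):

    // One uncaught error halts the rest of the script, so a page that
    // renders itself entirely client-side stays blank.
    function render() {
      document.body.textContent = 'Hello, world';
    }

    renderAnalytics(); // ReferenceError: renderAnalytics is not defined
    render();          // never runs; the user sees a white screen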
There's nothing wrong with some tolerance, such as ignoring tags it doesn't recognize. But if the syntax is wrong, it shouldn't try to fix it or guess what the user meant, just display an error. Accepting invalid syntax means all HTML parsing becomes vastly more complicated. Which creates room for bugs, exploits, and unexpected situations like OP's post.
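You can watch that guessing in action from JS, for what it's worth: DOMParser in text/html mode quietly re-nests mismatched tags instead of reporting anything. A small sketch:

    // The HTML parser repairs mis-nested tags into a valid tree (per the
    // spec's error-recovery rules) rather than surfacing any error.
    const doc = new DOMParser()
      .parseFromString('<p><b><i>text</b></i></p>', 'text/html');
    console.log(doc.body.innerHTML); // a silently re-nested, valid tree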
> But if the syntax is wrong, it shouldn't try to fix it or guess what the user meant, just display an error. Accepting invalid syntax means all HTML parsing becomes vastly more complicated.
Why push the complexity onto the user? Someone who just wants to make a working website doesn't care about your pedantry.
Do they want their page to fail to render entirely when PHP outputs a warning? Do they want their website to be completely broken because they forgot to convert some of their text from Latin-1 to UTF-8 before pasting it into the document? Should we really expect them to have to modify their blogging software to validate custom HTML snippets, lest the entire page become unusable? And will they be pleased when their style of code falls out of favour, gets deprecated, and their page stops working entirely a few years down the line?
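That first failure mode is easy to reproduce, by the way. Here's a sketch of what strict parsing does when a stray warning line (the text here is made up) gets printed before the markup:

    // In strict XHTML/XML, any text before the root element is a
    // well-formedness error, so the whole page is rejected.
    const broken = 'Warning: something broke in index.php on line 3\n' +
      '<html xmlns="http://www.w3.org/1999/xhtml"><body></body></html>';
    const doc = new DOMParser().parseFromString(broken, 'application/xhtml+xml');
    // Browsers hand back a parsererror document (exact shape varies).
    console.log(doc.getElementsByTagName('parsererror').length > 0); // true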
Moreover, strictness can backfire when you have such a diversity of implementations.
> Which creates room for bugs, exploits, and unexpected situations like OP's post.
The OP is not so much an unexpected situation as a carefully engineered one that's completely within the constraints HTML sets.
> Do they want their page to fail to render entirely when PHP outputs a warning?
Displaying warnings is fine. Invalid XHTML, less so.
> Do they want their website to be completely broken because they forgot to convert some of their text from Latin-1 to UTF-8 before pasting it into the document?
The encoding of the content has nothing to do with the document markup.
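A quick sketch of that distinction, assuming the bytes are decoded before parsing (as browsers do): the text garbles, but the tree around it is untouched:

    // Latin-1 bytes for "café" decoded as UTF-8: 0xE9 is invalid UTF-8,
    // so the text garbles, yet the surrounding markup parses normally.
    const latin1 = new Uint8Array([0x63, 0x61, 0x66, 0xE9]); // "café"
    const garbled = new TextDecoder('utf-8').decode(latin1); // "caf\uFFFD"
    const doc = new DOMParser()
      .parseFromString('<p>' + garbled + '</p>', 'text/html');
    console.log(doc.body.textContent); // "caf�": ugly text, valid tree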
> Moreover, strictness can backfire when you have such a diversity of implementations.
On the contrary: strictness prevents subtle bugs where different implementations interpret the same invalid data differently, because invalid input is rejected outright rather than guessed at.
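The XML side of the platform shows how. A sketch using the same mis-nested markup from earlier: every conforming parser has to reject it, leaving no invalid input for implementations to disagree over:

    // Strict mode: the mis-nesting is a hard error, not a repair job.
    const doc = new DOMParser()
      .parseFromString('<b><i>text</b></i>', 'application/xml');
    console.log(doc.getElementsByTagName('parsererror').length > 0); // true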
>> Why push the complexity onto the user? Someone who just wants to make a working website doesn't care about your pedantry.
Because this approach has historically resulted in people who "just wanted to make a working website" making websites that only work in specific browsers (or worse yet, specific versions of specific browsers on specific platforms). And then those sites stuck around and infrastructure got built around them that made them hard to fix.
We've spent the best part of the 2000s fixing that mess, and there are still some pockets that haven't been properly cleaned up. If that's not a lesson to learn from, I don't know what is.
> Compare that to JavaScript, which will happily fail if you use new syntax or call a missing function, and thus web pages that rely on JS often show up as just a full screen of white when something goes wrong, which it frequently does.
Isn't that more due to failure to handle exceptions and display errors to users?
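Largely, yes. A minimal sketch of a last-resort handler (it has to be registered before the failing script runs) that at least swaps the blank screen for a message:

    // Turns an uncaught error into something visible to the user
    // instead of a silently blank page.
    window.addEventListener('error', (event) => {
      document.body.textContent =
        'Something went wrong (' + event.message + '). Try reloading.';
    });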