> Very often we have to deal with documents that only use a subset of HTML and t...

hakfoo · on May 9, 2021

One of the situations I've seen is roughly this. When the system is up, it returns XML or JSON or pretty-format-of-the-week. When the system is down, it returns an HTML IIS error page.

In the second case, all you might want is to extract the content of the first <h1> tag out of that error page. That's a predictable enough task that a regex might be able to handle it, especially if at that point you've already given up on a full success and you're just salvaging a prettier error message than "system error".
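
Roughly what I mean, as a minimal Python sketch. The function name `first_h1` and the "system error" fallback are made up for illustration, and this assumes the error page has a plain, well-formed <h1>...</h1> pair; it is a salvage attempt, not a general HTML parser.

```python
import re

# Matches the first <h1 ...>...</h1> pair, case-insensitively and across newlines.
# The non-greedy (.*?) stops at the first closing </h1>.
H1_RE = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)


def first_h1(body, fallback="system error"):
    """Return the text of the first <h1> in body, or fallback if there is none.

    Hypothetical helper: the name and the fallback string are assumptions.
    """
    match = H1_RE.search(body)
    if not match:
        return fallback
    # Strip any nested tags (e.g. <b>) and collapse runs of whitespace.
    text = re.sub(r"<[^>]+>", "", match.group(1))
    text = " ".join(text.split())
    return text or fallback


if __name__ == "__main__":
    page = "<html><body><h1>Server Error</h1><h2>Details</h2></body></html>"
    print(first_h1(page))
    print(first_h1('{"status": "ok"}'))
```

If the body turns out not to be an error page at all, or the heading is missing, you just get the generic message back, which is no worse than where you started.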

yarcob · on May 10, 2021

Exactly. In cases like this using an HTML/XML parser instead of a regular expression won't make a difference at all.