My 90's ISP homepage URL makes more sense. inreach.net/~myusername
Actually, I miss when ISPs all came with some space for simple web hosting. It was a given that a lot of people would want to make their own sites, not just consume them.
Not just Apache, but POSIX shell syntax! It's called Tilde Expansion[0], so in your dash, bash, whatever, ~USER expands to the home directory of USER. It's the general form of the standard "bare tilde" syntax that stands in for $HOME.
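Outside the shell you can get the same expansion programmatically; e.g. Python's os.path.expanduser mirrors it (a quick sketch; "alice" is just a placeholder user):

    import os

    # Bare tilde: the current user's home directory ($HOME)
    os.path.expanduser("~")        # e.g. '/home/me'

    # ~USER form: USER's home directory, looked up in the passwd database
    os.path.expanduser("~alice")   # e.g. '/home/alice'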
GEEK PERL CODE [P+++++(--)$]
My tendencies on this issue range from: "I am Larry Wall, Tom Christiansen, or Randal Schwartz.", to: "Perl users are sick, twisted programmers who are just showing off." Getting paid for it!
As a team lead on a typical SaaS app, I've banned them. I'd rather see a chain of individual string checks than a long regex: they're usually brittle and incomprehensible to anyone but the author.
How is a chain of string checks less brittle and easier to understand? If they're checking for the same pattern, the intrinsic complexity is the same; the string checks just add extra code and more room for bugs.
How many programmers do you think understand that perfectly at first glance? I've programmed with regexes for decades and can admit I don't. Is it even correct? Who knows, unless I waste time deciphering both it and the RFC side by side.
I'd much rather have a handful of single checks, preferably commented. As is usually the case, performance is not the primary concern.
I’ve just grepped my codebase for regex matches, and this is not true. The most common use case is matching a filesystem path or a URL that is known to conform to a schema (e.g. file names prefixed with dates) and extracting parts of the schema from it.
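For instance, a sketch of that schema-extraction case, with a made-up file-name schema:

    import re

    # Extract the date and base name from files like '2021-03-15_backup.tar.gz'
    # (hypothetical schema, just to illustrate):
    m = re.match(r"(\d{4})-(\d{2})-(\d{2})_(\w+)", "2021-03-15_backup.tar.gz")
    if m:
        year, month, day, name = m.groups()  # ('2021', '03', '15', 'backup')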
> Just Google the first result for 'email address regex validation.'
That is an abomination and not a good way to validate emails, because, as you say, it’s super complicated and barely understandable. Draw a finite-state automaton corresponding to this regex to see why. Equivalent code written without regex, implementing the same FSA, would easily be >100 LOC and equally incomprehensible.
In practice, it’s better to check whether the string contains an @ and maybe a dot, and that’s it. Sure, you won’t be RFC 5322 compliant, but who cares? Your users are much more likely to make a typo in the domain name anyway than misspell the characters that would render the email invalid. Just send an email and see if it arrives.
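A minimal sketch of that check, assuming nothing stricter is wanted before the confirmation mail:

    def looks_like_email(s: str) -> bool:
        # Deliberately permissive: just 'contains an @ and a dot after it'.
        # The confirmation mail does the real validation.
        return "@" in s and "." in s.split("@")[-1]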
All of the regexes in said codebase of mine are simple. The longest is 75 characters and a straightforward one to check for UUIDs; you can understand it at a glance:
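For illustration, the canonical UUID pattern is about that length:

    ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$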
Now rewrite that as a sequence of string checks and show me the code. For a fair comparison you should remove all comments and whitespace, as you have done with the above regex.
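To make the comparison concrete, one possible string-check version in Python; even stripped of comments it is several statements to the regex's one line:

    def is_uuid(s):
        parts = s.split("-")
        if len(parts) != 5:
            return False
        if [len(p) for p in parts] != [8, 4, 4, 4, 12]:
            return False
        return all(c in "0123456789abcdefABCDEF" for p in parts for c in p)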
The problem with the above is not the regex per se, the problem is that the email address grammar is really complex for historical reasons. If you insist on validating email syntactically, you can’t avoid that complexity by rewriting to multiple string checks.
The solution is to use a library or just perform a simpler validation (e.g. check for an '@'), since full syntactic validation doesn't provide much value anyway: the address might still be invalid even if well-formed.
You can add comments to regexes, explaining each part. I believe it is called verbose mode.
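Python calls it re.VERBOSE; a small sketch using a UUID pattern:

    import re

    # re.VERBOSE ignores unescaped whitespace and allows '#' comments,
    # analogous to Perl's /x flag:
    UUID_RE = re.compile(r"""
        ^[0-9a-fA-F]{8}     # time_low
        -[0-9a-fA-F]{4}     # time_mid
        -[0-9a-fA-F]{4}     # time_hi_and_version
        -[0-9a-fA-F]{4}     # clock_seq
        -[0-9a-fA-F]{12}$   # node
    """, re.VERBOSE)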
> And as you mentioned (especially in the case of email validation), they're usually incorrect.
My point was that the email address might still be invalid despite being syntactically correct, e.g. if you miss a letter. This is why I don’t understand the obsession with syntax-level email validation. You still need a confirmation mail.
But of course there can be a bug in a regex - just as there can be a bug in imperative string-matching code implementing the same pattern.
> A single "/x" tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class, nor within the characters of a multi-character metapattern like "(?i: ... )". You can use this to break up your regular expression into more readable parts. Also, the "#" character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line.
for a working link to the state diagram generator.
Even with a handful of single checks there's still the need to compare those, block by block, to the RFC.
Assuming RegEx is to be used (I'm not intimidated by RegExes, but I'm generally not a fan, preferring custom parsers for many things that are hard or impossible with a RegEx), this is a better approach:
> I'm not intimidated by RegExes, but I'm generally not a fan, preferring custom parsers for many things that are hard or impossible with a RegEx
Good call not to use Regex for things that are impossible to do in Regex! But seriously, a custom parser must have some way to recognize individual tokens. If you distinguish parsing and lexing, what tool do you use for lexing?
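In most ecosystems the answer is regexes again; a toy regex-driven lexer, just to sketch the idea:

    import re

    # The token grammar is a list of named regex alternatives:
    TOKEN_RE = re.compile(r"""
        (?P<NUMBER>\d+)
      | (?P<NAME>[A-Za-z_]\w*)
      | (?P<OP>[+\-*/=])
      | (?P<SKIP>\s+)
    """, re.VERBOSE)

    def tokens(text):
        for m in TOKEN_RE.finditer(text):
            if m.lastgroup != "SKIP":
                yield m.lastgroup, m.group()

    # list(tokens("x = 40 + 2"))
    # -> [('NAME', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]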
Regexes have a particular purpose: matching patterns of characters. I haven’t seen anyone suggest how to do that in a simpler and cleaner way.
It's less about the matching and more about the validation in most of my past applications. IIRC the best RegExp matchers for the current email specification have 99% or some-such coverage but aren't complete... there are many examples of data extraction and validation where a regular expression is an imperial tool for a metric job.
Nested data, e.g. JSON, is not a good fit in general; regexes are weak at balanced tag matching, and they suck at validating numeric ranges such as lat/long, clock times, etc.
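E.g. a latitude check: a regex version is already awkward and still misses float forms like "9e1", while parse-and-compare states the actual rule. A rough sketch:

    import re

    # Regex attempt at 'latitude between -90 and 90':
    LAT_RE = re.compile(r"^-?(90(\.0+)?|[0-8]?\d(\.\d+)?)$")

    # Non-regex version: parse, then compare numbers.
    def valid_lat(s):
        try:
            return -90.0 <= float(s) <= 90.0
        except ValueError:
            return False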
Yeah use regex for its purpose (matching character patterns) and don’t use it for things it can’t do. That is just common sense and applies to any tool.
But the argument about the email address validation confuses the tool with the problem to solve. The email address grammar is intrinsically complex, so if you want to validate an email address against this grammar (which I think is silly, but that is a separate discussion), any validator implementation would necessarily be at least as complex as the grammar. Regex is not the problem here; rather, it is the simplest possible solution to a complex problem.
Interesting... you ban anything people typically suck at? At PayPal we banned HTML and made everyone write XML... turns out we just wrote shitty XML, which led to shitty XHTML :P
> At PayPal we banned HTML and made everyone write XML
That's gross, given that XML, with its pointless verbosity, is actually just a "canonical" SGML subset without tag omission and other short forms, intended for delivery to browsers, while SGML proper has all the authoring features. Goes to show how clueless and susceptible to koolaid sellers developers were and still are (cf. crypto, LLMs).
The idea was that if you asked 10 web devs how to code a <button>Save</button>, you'd get 10 different answers, so we had a <Button>Save</Button> XML tag that generated them all the same. There was only one way to create a button now. It worked until people started adding so many options to the <Button> template that it became garbage again.
Regex is a high-level domain-specific language. So in this analogy, it's the tedious substring comparisons in imperative code that are the equivalent of low-level assembler.
Using the right level of abstraction for the problem at hand is key to readability.