is a total strawman, needlessly obfuscated. How about writing it like this:
/^[0-9a-z._%+-]+@[0-9a-z.-]+\.[a-z][a-z]+$/i
which, while "scary looking", is at least immediately readable by anyone who knows even the basics about REs. If the argument for "verbose REs" is valid, it ought to stand up against at least a typical standard RE.
Also, it's not clear that "letter" and "[a-z]" mean the same thing. Does "letter" include uppercase? Does it include non-ASCII letters like "[[:alpha:]]" does? Don't forget the weird collation behavior "[a-z]" sometimes encounters.
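For what it's worth, here is a minimal Python sketch of that pattern. It assumes the dot before the top-level domain is meant literally (so it is escaped as \.) and uses Python's re module as a stand-in for whatever engine you prefer. The helper name and sample addresses are made up for illustration.

```python
import re

# The email-like pattern from above. The dot before the TLD is escaped so
# that it only matches a literal "."; re.IGNORECASE plays the role of /i.
EMAIL = re.compile(r"^[0-9a-z._%+-]+@[0-9a-z.-]+\.[a-z][a-z]+$", re.IGNORECASE)

def looks_like_email(text):
    """Return True if text matches the email-like pattern (hypothetical helper)."""
    return EMAIL.match(text) is not None

print(looks_like_email("Jane.Doe+news@example.org"))  # True
print(looks_like_email("jane@example"))               # False: no ".tld" part
print(looks_like_email("jan\u00e9@example.org"))      # False: "é" is not in [a-z]
```

The last case is the "letter" question in practice: with this engine, [a-z] plus the case-insensitive flag still rejects an accented letter.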
> which, while "scary looking", is at least immediately readable by anyone who knows even the basics about REs
Nope, I'm mostly a DB guy, very fluent in SQL, and I use regex maybe two dozen times a year. But every time I need to write something non-trivial I have to run to a regex cheatsheet website and spend long minutes trying to figure shit out.
It's not that I'm dumb, and taking a MOOC about regex is definitely on my todo list... It's just that I haven't found the damn time yet to learn the monstrosities and exceptions of regex.
And this is especially painful coming from PostgreSQL, which has a good debugger and a clear syntax (even for non-standard functions).
FWIW I'm doing the Accessing Web Data part of the Python for Informatics course at Coursera. First part of that course is regex and after one lesson (~15 mins) it covers enough to read the above expression.
I was already conversant with regex, so I'm perhaps biased, but a simple email-like search seems easy for a novice to read.
Perhaps you have a particular block when approaching regex. The book by Charles Severance that goes with the course (above) is freely available online.
"Also, it's not clear that "letter" and "[a-z]" mean the same thing"
"number" and [0-9] are even worse. That should have been called "digit" and, as another commenter already pointed out, in the age of Unicode, it still is confusing.
As to this attempt at simplifying regex writing and reading: nice try, but I think it needs more work. Apart from the Unicode thing, there's the fact that "letter" is only equivalent to [a-z] because of the 'case insensitive' flag.
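A quick way to see that flag dependence, sketched with Python's re module (an assumption on my part; other engines may behave differently, and the sample words are made up):

```python
import re

# Without the flag, [a-z] is strictly lowercase ASCII.
print(re.fullmatch(r"[a-z]+", "Letter"))                       # None
# With the flag, the same class also accepts uppercase.
print(bool(re.fullmatch(r"[a-z]+", "Letter", re.IGNORECASE)))  # True
# Neither variant accepts an accented letter...
print(re.fullmatch(r"[a-z]+", "caf\u00e9", re.IGNORECASE))     # None
# ...whereas a Unicode-aware "letter" class does. [^\W\d_] is a common
# Python idiom: a word character that is neither a digit nor an underscore.
print(bool(re.fullmatch(r"[^\W\d_]+", "caf\u00e9")))           # True
```

So "letter" could reasonably mean any of three different things here, depending on the flag and on whether Unicode is in play.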
I think I would go for something that's less grammatical English and more programming-language-like (alignment of the colons optional):
Start of text.
1 or more : digit, lowercase or one of ._%+-
Literal : @
1 or more : digit, lowercase or one of .-
Literal : .
2 or more : lowercase
End of text.
All: case insensitive.
My default would be to have 'lowercase' mean the Unicode character class. 'ASCII lowercase' would handle [a-z]
Adding capture groups, look ahead and look behind, comments, etc. is left as an exercise to the reader (they probably would make this look very ugly)
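To make the idea concrete, here is a rough Python sketch of how such a line-oriented description could be turned into an ordinary regex. Everything in it (the CLASSES table, char_class, repeat, literal) is hypothetical and invented for this comment; it is not SRL, and it uses the ASCII meaning of 'lowercase' to stay close to the original pattern.

```python
import re

# Hypothetical vocabulary for this sketch: names map to character ranges.
CLASSES = {"digit": "0-9", "lowercase": "a-z"}

def char_class(names, extra=""):
    """'digit, lowercase or one of ._%+-' -> a bracket expression."""
    return "[" + "".join(CLASSES[n] for n in names) + re.escape(extra) + "]"

def repeat(atom, minimum):
    """'N or more' -> a quantifier on the given atom."""
    return atom + ("+" if minimum == 1 else "{%d,}" % minimum)

def literal(text):
    """'Literal : x' -> the escaped text."""
    return re.escape(text)

pattern = "".join([
    "^",                                                      # Start of text.
    repeat(char_class(["digit", "lowercase"], "._%+-"), 1),   # 1 or more
    literal("@"),                                             # Literal : @
    repeat(char_class(["digit", "lowercase"], ".-"), 1),      # 1 or more
    literal("."),                                             # Literal : .
    repeat(char_class(["lowercase"]), 2),                     # 2 or more
    "$",                                                      # End of text.
])
EMAIL = re.compile(pattern, re.IGNORECASE)                    # All: case insensitive.

print(bool(EMAIL.match("Jane.Doe@example.org")))  # True
print(bool(EMAIL.match("jane@examplecom")))       # False
```

Each line of the description maps to exactly one call, which is about as far as I'd want a serialization format like this to go.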
There's also the issue of nesting, like in this botched attempt to write a regex for URLs:
One or more of:
    Once : letter or underscore
    One or more : letter, digit or underscore
Separated by: /
Optional:
    Literal: ?
    One or more:
        One or more : letter, digit or underscore
        Literal : =
        One or more : letter, digit or underscore
    Separated by: ,
Of note here is that I think we need to depart from regexes a bit by introducing things like 'Separated by'. Without it, you often need to repeat potentially long phrases (programmatically building your regex can avoid that, but I think you still would need a serialization format, and I also think it makes sense for that not to use a full-fledged programming language).
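For comparison, this is roughly what the "programmatically building your regex" route looks like in Python. The separated_by helper and the constant names are made up for this sketch, and it deliberately inherits the quirks of the botched description (for instance, every path segment needs at least two characters).

```python
import re

def separated_by(item, separator):
    """One or more `item`s with a literal `separator` between neighbours."""
    sep = re.escape(separator)
    return f"(?:{item})(?:{sep}(?:{item}))*"

WORD = r"[a-z0-9_]+"          # One or more : letter, digit or underscore
SEGMENT = r"[a-z_]" + WORD    # Once : letter or underscore, then WORD
PAIR = WORD + "=" + WORD      # WORD, Literal : =, WORD

PATH = separated_by(SEGMENT, "/")
QUERY = r"(?:\?" + separated_by(PAIR, ",") + ")?"   # Optional: ? then pairs
URL_LIKE = re.compile(PATH + QUERY, re.IGNORECASE)

print(bool(URL_LIKE.fullmatch("docs/api_v2?page=2,sort=name")))  # True
print(bool(URL_LIKE.fullmatch("docs//api")))                      # False
```

Without the helper, SEGMENT and PAIR each have to be spelled out twice in the final pattern, which is exactly the repetition 'Separated by' is meant to hide.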
Thinking of things of that complexity, I'm starting to think it would be better to have people write a BNF grammar.
I remember learning that \d is not necessarily the same as [0-9]. In Unicode-aware engines, \d is 'digit in any script, not just 0 to 9', so it'll match digits other than the ASCII 0 to 9.
The SRL documentation doesn't make it clear if they mean 'number' to be 'only numbers 0 to 9' or 'any digit in any language'.
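Python 3 is one engine where the difference shows up; a small sketch (the sample string is just three Arabic-Indic digits, written as escapes):

```python
import re

arabic_indic = "\u0663\u0664\u0665"  # Arabic-Indic digits three, four, five

# On str patterns, \d matches any Unicode decimal digit by default.
print(bool(re.fullmatch(r"\d+", arabic_indic)))            # True
# The explicit range only covers ASCII 0 to 9.
print(bool(re.fullmatch(r"[0-9]+", arabic_indic)))         # False
# re.ASCII restricts \d to [0-9] again.
print(bool(re.fullmatch(r"\d+", arabic_indic, re.ASCII)))  # False
```

Other engines make different choices, so whichever one 'number' compiles down to matters.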