Larry Wall's Very Own Home Page (wall.org)
92 points by thunderbong 4 months ago | 58 comments



It is a sad day indeed when the Perl link on Larry Wall's home page results in a 404.


It's been "a sad day" for over a decade.

Here's the last capture on archive.org[1]; there wasn't much there in the first place.

It looks like Larry had better things to do than maintaining that page (...or the page that links to it) :)

[1] https://web.archive.org/web/20100420032605/http://www.wall.o...


It says right at the beginning that the site is under construction.


That was a fast 404 anyway.

I forgot how damn fast websites used to be.


> I hate my telephone. Please don't ask for my phone number.

I can relate...


It's... beautiful.

And it doesn't chew up CPU or make me wait.


Looks just like Perl.


Is Perl chartreuse? I always imagined it to be a light blueish colour. Maybe cornflower blue, or dodger blue, or a light steel blue.


No! Perl is surely beige, like everything else in the 90s.


%7Elarry? Looks like an old trend ( https://jkorpela.fi/tilde.html ) that is no longer common.


It wasn't a "trend". It was Apache's way of automatically mapping a username to that user's home directory. I feel old.


My '90s ISP homepage URL makes more sense. inreach.net/~myusername

Actually, I miss when ISPs all came with some space for simple web hosting. It was a given that a lot of people would want to make their own sites, not just consume them.


> It was a given that a lot of people would want to make their own sites, not just consume them.

I wouldn't say it was a given that a lot of people would want to host, but it was a given that people _could_ host.

Then Geocities came along and made the hosting easy, destroying the ISP-hosting market.


And now we have Neocities.


Not just Apache, but POSIX shell syntax! It's called Tilde Expansion[0], so in your dash, bash, whatever, ~USER expands to the home directory of USER. This is the general form of standard "bare tilde" syntax as a stand-in for $HOME.

[0]:https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...
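
The same expansion is available outside the shell as well. A minimal sketch in Python, assuming a POSIX system; the `/home/larry` value and the unknown user name are made up for illustration:

```python
import os

# Assumption: POSIX system; this HOME value is invented for the demo.
os.environ["HOME"] = "/home/larry"

# A bare tilde expands to $HOME, as it does in the shell.
home = os.path.expanduser("~")
notes = os.path.expanduser("~/notes.txt")

# ~USER is looked up in the password database; an unknown user is left as-is.
unknown = os.path.expanduser("~no-such-user-hopefully")

print(home, notes, unknown)
```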


I am referring to the practice of escaping the tilde.
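
For reference, `%7E` is just the percent-encoded form of `~`. RFC 3986 lists the tilde as an unreserved character, so the two spellings should identify the same resource. A small Python sketch; the `/%7Elarry/` path is an illustrative assumption based on the snippet quoted above:

```python
from urllib.parse import unquote

# %7E is the percent-encoding of "~" (0x7E in ASCII).
escaped = "/%7Elarry/"  # hypothetical path, for illustration only
decoded = unquote(escaped)

print(decoded)
```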


But still incredibly rad. https://tildeverse.org/


Goddamnit, it's beautiful. 10/10 click.


Is Larry Wall brat?


Perl, it's so confusing most times when written in Perl (perl perl perl)

Perl, why must it fit on one line when written in Perl (perl perl perl)


I used to visit his personal site to see updates, but it's been a while since I last looked.

I do however frequently watch this interview with him: https://www.youtube.com/watch?v=aNAtbYSxzuA


GEEK PERL CODE [P+++++(--)$] My tendencies on this issue range from: "I am Larry Wall, Tom Christiansen, or Randal Schwartz.", to: "Perl users are sick, twisted programmers who are just showing off." Getting paid for it!

Thing of beauty.


That's exactly as I remember it from 1999 when I visited it.


Perl 6: We're working on it, slowly but surely...or not-so-surely in the spots we're not so sure... (it’s called Raku now)



I didn't know Mr. Wall was competing (or teaming up) with tonsky.me on who blinds the reader faster.


Ah, I remember his journal on keratoconus; it helped me through my own struggle.


Larry Wall is Brat.


Thank you, Larry!


Is there a Larry Wall Facebook Wall?


Larry is Wall.


i wonder what he's up to these days.


He's building a bigger keyboard, because perl has finished all the available symbols £_&++()/@!?;:'"*~`••√π÷×∆∆\}{=°^¢$¥€%©™™]]


That looks more like APL.


Would be useful for Rust as well.



i became a master at regex from my perl days in the 90s and early 2000s....valuable skill imo


Regex is great (sometimes), for the writer.

As a team lead for a typical SaaS app, I've banned them. I'd rather see a chain of individual string checks than long regex strings, because those are usually brittle and often incomprehensible to anyone but the author.


How is a chain of string checks less brittle and easier to understand? If they are checking for the same pattern, the intrinsic complexity will be the same, the string checks will just add some additional complexity and risk of bugs.


Edited a bit to explain we're just a typical SaaS application. Regex mostly crops up in validations.

Just Google the first result for 'email address regex validation.'

    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

How many programmers do you think understand that perfectly at first glance? I've programmed and used regex for decades and can admit, I don't. Is it even correct? Who knows, unless I waste time deciphering both it and the RFC side by side.

I'd much rather have a handful of single checks, preferably commented. As is usually the case, performance is not the primary concern.


> Regex mostly crops up in validations.

I’ve just grepped my codebase for regex matchings, and this is not true. The most common use case is matching a filesystem path or a URL that is known to conform to a schema (e.g. file names prefixed with dates) and extracting parts of the schema from it.
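
A minimal sketch of that kind of extraction in Python. The `<YYYY-MM-DD>_<name>.<ext>` schema, the `DATED_FILE` pattern, and the `parse_dated_path` helper are all hypothetical, chosen only to illustrate the idea:

```python
import re

# Hypothetical schema: "<YYYY-MM-DD>_<name>.<ext>", e.g. "2024-03-01_report.csv".
DATED_FILE = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_(?P<name>[^/]+)\.(?P<ext>\w+)"
)

def parse_dated_path(path):
    """Return the schema parts of the final path component, or None."""
    filename = path.rsplit("/", 1)[-1]
    match = DATED_FILE.fullmatch(filename)
    return match.groupdict() if match else None

print(parse_dated_path("/var/data/2024-03-01_report.csv"))
```

Named groups keep the call site readable: the caller asks for `year` or `name` instead of counting parentheses.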

> Just Google the first result for 'email address regex validation.'

That is an abomination and not a good way to validate emails, because, as you say, it’s super complicated and barely understandable. Draw a finite-state automaton corresponding to this regex to see why. Equivalent code written without regex, implementing the same FSA, would easily be >100 LOC and equally incomprehensible.

In practice, it’s better to check whether the string contains an @ and maybe a dot, and that’s it. Sure, you won’t be RFC 5322 compliant, but who cares? Your users are much more likely to make a typo in the domain name anyway than misspell the characters that would render the email invalid. Just send an email and see if it arrives.
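
A sketch of such a loose check in Python. The `looks_like_email` name and the exact rules (split at the last `@`, require a dot in the domain) are assumptions made for illustration, not a standard:

```python
def looks_like_email(address):
    """Loose sanity check: text before the last '@' and a dot in the domain.

    Deliberately not RFC 5322 compliant; the confirmation mail is the real test.
    """
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and "." in domain)

print(looks_like_email("larry@wall.org"))
print(looks_like_email("larry"))
```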

All of the regexes in said codebase of mine are simple. The longest is 75 characters and a straightforward one to check for UUIDs; you can understand it at a glance:

    [0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}
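
For illustration, here is how that pattern might be wrapped in Python. The `UUID_RE` and `is_uuid` names are assumptions; only the pattern itself comes from the comment above:

```python
import re

UUID_RE = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)

def is_uuid(text):
    # fullmatch anchors the pattern at both ends, so extra characters are rejected.
    return UUID_RE.fullmatch(text) is not None

print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))
```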


Now rewrite the same to a sequence of string checks and show me the code. For a fair comparison you should remove all comments and whitespace as you have done with the above regex.

The problem with the above is not the regex per se, the problem is that the email address grammar is really complex for historical reasons. If you insist on validating email syntactically, you can’t avoid that complexity by rewriting to multiple string checks.

The solution is to use a library or just perform a simpler validation (e.g. check for an '@'), since a full syntactic validation does not provide much value: the address might still be invalid anyway.


The difference is, individual checks can be commented and referenced to a particular rule or even line in an RFC.

A regex blob is basically 'this is all the rules, RTFM.' And as you mentioned (especially in the case of email validation), they're usually incorrect.


You can add comments to regexes, explaining each part. I believe it is called verbose mode.

> And as you mentioned (especially in the case of email validation), they're usually incorrect.

My point was that the email address might still be invalid despite being syntactically correct, eg if you miss a letter. This is why I don’t understand the obsession with syntax-level email validation. You still need a confirmation mail.

But of course there can be a bug in a regex - just as there can be a bug in imperative string-matching code implementing the same pattern.


From `perldoc perlre`:

> A single "/x" tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class, nor within the characters of a multi-character metapattern like "(?i: ... )". You can use this to break up your regular expression into more readable parts. Also, the "#" character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line.
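
Python's `re.VERBOSE` flag is the rough analogue of Perl's `/x`. A minimal sketch; the date pattern and the `DATE_RE` name are hypothetical, picked only to show whitespace and comments inside a pattern:

```python
import re

# Hypothetical pattern for illustration: an ISO-style date such as 2024-03-01.
DATE_RE = re.compile(
    r"""
    (?P<year>  \d{4} )                    # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )          # month 01-12
    -
    (?P<day>   0[1-9] | [12]\d | 3[01] )  # day 01-31 (no per-month check)
    """,
    re.VERBOSE,
)

match = DATE_RE.fullmatch("2024-03-01")
print(match.group("year"), match.group("month"), match.group("day"))
```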


That alone is hard to document and maintain.

Coupled with auto-generated state diagrams, the current and correct RFC 5322 spec, and case notes, it's far more defensible.

There are some pretty decent RegEx tools about these days.

https://regexper.com/#(%3F%3A%5Ba-z0-9!%23%24%25%26%27*%2B%2...)

^^ Heh. Markup processing error in HN ?? the final ) wasn't captured in the link creation.

See https://stackoverflow.com/questions/201323/how-can-i-validat...

for a working link to the state diagram generator.

Even with a handful of single checks there's still the need to compare those, block by block, to the RFC.

Assuming RegEx is to be used (I'm not intimidated by RegEx's but I'm generally not a fan, preferring custom parsers for many things that are hard or impossible with a RegEx) this is a better approach:

https://regex101.com/r/gJ7pU0/1

It's a "live" example that includes a test suite and has a parser that annotates blocks.

The regex uses a DEFINE block for sub-clauses to improve clarity.


> I'm not intimidated by RegEx's but I'm generally not a fan, preferring custom parsers for many things that are hard or impossible with a RegEx

Good call not to use Regex for things that are impossible to do in Regex! But seriously, a custom parser must have some way to recognize individual tokens. If you distinguish parsing and lexing, what tool do you use for lexing?

Regexes have a particular purpose: matching patterns of characters. I haven’t seen anyone suggest how to do that in a simpler and cleaner way.


With most of my past applications it's less about the matching and more about the validation. IIRC the best RegExp matchers for the current email specification have 99% or somesuch coverage but aren't complete. There are many examples of data extraction and validation where a regular expression is an imperial tool for a metric job.

Nested data, e.g. JSON, is not a good fit in general: regexes are weak at balanced tag matching, and they suck at validating numeric ranges such as lat/long, clock time, etc.
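
One common way around the numeric-range weakness is to let the regex check only the shape and do the range check in ordinary code. A sketch in Python; the `"lat,lon"` input format and the `COORD_RE` and `parse_lat_lon` names are assumptions for illustration:

```python
import re

# Shape only: optional sign, digits, optional fraction, for "lat,lon".
COORD_RE = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s*,\s*([+-]?\d+(?:\.\d+)?)\s*")

def parse_lat_lon(text):
    """Return (lat, lon) as floats, or None if the shape or range is wrong."""
    match = COORD_RE.fullmatch(text)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # The numeric range is checked in code, not in the pattern.
    if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
        return lat, lon
    return None

print(parse_lat_lon("47.6, -122.3"))
```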


Yeah use regex for its purpose (matching character patterns) and don’t use it for things it can’t do. That is just common sense and applies to any tool.

But the argument about the email address validation confuses the tool with the problem to solve. The email address grammar is intrinsically complex, so if you want to validate an email address against this grammar (which I think is silly, but that is a separate discussion) any validator implementation would necessarily be at least as complex as the grammar. Regex is not the problem here, rather it is the simplest possible solution for a complex problem.


iirc perl from 5.x onwards allowed both whitespace (at the right places) and comments, in regexes. using those could make them a lot more readable.

can't remember but you might have to specify a flag for it.


this is the default in https://raku.org


good to know, thank you.


interesting...you ban anything people typically suck at? At PayPal we banned html and made everyone write XML....turns out we just wrote shitty XML which led to shitty xhtml :P


> At PayPal we banned html and made everyone write XML

That's gross when XML with its pointless verboseness is actually just a "canonical" SGML subset without tag omission and other short forms, intended for delivery to browsers, while SGML proper has all the authoring features. Goes to show how clueless and prone to koolaid sellers developers were and still are (cf crypto, LLMs).


the idea was you ask 10 web devs how to code a <button>Save</button> you'd get 10 different answers, so we had a <Button>Save</Button> xml tag that generated them all the same. there was only one way to create a button now. It worked until people started adding so many options to the <Button> template that it became garbage again.


Absolutely. Readability trumps all else in a productive team environment.

If everyone had the same 'because people suck at it' attitude, we'd never have evolved beyond asm, if even.


Regex is a high-level domain-specific language. So in this analogy, it is the tedious substring comparisons in imperative code that are the equivalent of low-level assembler.

Using the right level of abstraction for the problem at hand is key to readability.


well most code i've seen in corporate america sucks balls



