As someone who used to be very pro-self-taught programmers, I changed my mind af...

fivedogit · on Jan 22, 2015

I appreciate the thoughtful response and I don't really disagree with you on any of this. In fact, if I were interviewing for positions, I'm not sure I'd handle it any other way -- it's human nature to choose the "safe" option -- but that doesn't make it right or optimal. But it's also taken to an extreme: I have a BS in CompSci (from 12 years ago), but "never been a professional coder" almost always outweighed everything like a fat kid on a see-saw. It's a double-whammy: not only are they looking at self-taughts harder, self-taughts are naturally going to be less comfortable with best practices, lingo and other general experience they'd pick up in a month's time on the job.

In trying to put my thoughts into words here, I found it easier to whip up some pie graphs instead.

http://www.fivedogit.com/2015/the-problem-with-technical-hir...

Damn! I forgot to include "Do you have exp with the company's stack?". Oh well. That's more about getting the interview in the first place, so I guess it's ok.

weesals · on Jan 24, 2015

Why is it a bad idea to parse a Turing-complete language using regular expressions? I have implemented the XScript language from Age Of Mythology and AOE3 with that approach; it seems to be quite efficient and flexible. Is there something I should look out for?

JoachimSchipper · on Jan 24, 2015

(This comment is pretty informal; if you want to know more, look up terms like "context-free grammar", "Parsing Expression Grammar", "Backus-Naur Form", "regular language", or even "formal language theory".)

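The problem is not "Turing-complete" (roughly, "can express arbitrarily complex stuff"), but whether the syntax is "regular" (roughly, "you can parse without keeping track of an unbounded amount of what you've seen so far"). Regexes only handle regular languages; arbitrarily nested syntax needs at least a context-free grammar.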

For instance, Brainfuck is Turing-complete (in the "Turing tarpit" sense) but really easy to turn into tokens (in fact, an informal approach may not even distinguish between "+" and "the token 'increment'"). Even realistic languages like C can be parsed without using anything much more advanced than regexes (you need some ugly kludges to support typedefs, and you should pretty much ignore newlines; one would typically use something more like yacc, but that's still not a very sophisticated tool.)
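To make the Brainfuck point concrete, here is a minimal Python sketch of a regex-based tokenizer. The token names (increment, loop_start, and so on) are made up for illustration; Brainfuck itself only defines the eight command characters and treats everything else as a comment.

```python
import re

# Hypothetical token names, chosen for this sketch only.
TOKEN_NAMES = {
    ">": "move_right",
    "<": "move_left",
    "+": "increment",
    "-": "decrement",
    ".": "output",
    ",": "input",
    "[": "loop_start",
    "]": "loop_end",
}

# One character class covers the whole language. The brackets are escaped
# and '-' is placed last so that it is taken literally.
TOKEN_RE = re.compile(r"[><+.,\[\]-]")


def tokenize(source):
    """Return token names for every command character; skip everything else."""
    return [TOKEN_NAMES[ch] for ch in TOKEN_RE.findall(source)]


if __name__ == "__main__":
    print(tokenize("+[-] clear the cell"))
```

Note that this only produces tokens. Checking that every "[" has a matching "]" already takes a counter or a stack, which is more than a single regex gives you.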

On the other hand, XML or HTML (which are not Turing-complete, and, informally, "not that expressive") are pretty much impossible to usefully parse without extensively considering context - the <d /> in <a><b /><c /><d /></a> and the one in <e><d /></e> are "very different <d />'s".
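A rough Python sketch of what "considering context" means here: a regex is fine for spotting individual tags, but telling the two <d />'s apart needs a stack of the currently open elements. The function name element_paths and the toy tag syntax (no attributes, comments, or text handling) are assumptions made for this example; this is not a real XML parser.

```python
import re

# Matches <name>, </name> and <name />; attributes are deliberately unsupported.
TAG_RE = re.compile(r"<(/?)(\w+)\s*(/?)>")


def element_paths(markup):
    """Return the path of every element, e.g. 'a/d' for a <d /> inside <a>."""
    stack = []  # names of the elements that are currently open
    paths = []
    for closing, name, self_closing in TAG_RE.findall(markup):
        if closing:
            # A closing tag must match the most recently opened element.
            if not stack or stack[-1] != name:
                raise ValueError("unexpected closing tag: " + name)
            stack.pop()
        else:
            paths.append("/".join(stack + [name]))
            if not self_closing:
                stack.append(name)
    if stack:
        raise ValueError("unclosed element: " + stack[-1])
    return paths


if __name__ == "__main__":
    print(element_paths("<a><b /><c /><d /></a>"))
    print(element_paths("<e><d /></e>"))
```

The regex alone sees the same <d /> token both times; only the stack records that one sits under a and the other under e.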

I don't know XScript, but regexes may be a completely viable approach. In fact, if your current approach works, it's likely good enough - it requires some theoretical background to express why you can't parse HTML with regexes, but you'll run into lots of trouble if you try (leading to stuff like http://stackoverflow.com/questions/1732348/regex-match-open-...). Of course, there's value in finding out your approach won't work before you've spent weeks of effort. ;-)

weesals · on Jan 24, 2015

Thanks for the information! Looks like I've got some reading to do :)