
Yeah, any time I use a regex that isn't immediately obvious, I put it in a function called get_<something>. Unfortunately, people who write overly complicated and error-prone regexes usually don't choose to document them.
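As a rough sketch of what that habit can look like, here is a non-obvious pattern wrapped in a get_<something> function, in Python. The date pattern and the name `get_iso_date` are made up for illustration and are not from the comment above. The pattern only checks the shape of the date, not whether the month or day is in range.

```python
import re

# Hypothetical pattern: an ISO-style calendar date such as 2024-02-11.
# re.VERBOSE lets the pattern carry its own inline documentation.
_ISO_DATE = re.compile(
    r"""
    \b
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    \b
    """,
    re.VERBOSE,
)


def get_iso_date(text):
    """Return (year, month, day) for the first ISO-style date in text, or None."""
    match = _ISO_DATE.search(text)
    if match is None:
        return None
    return tuple(int(match.group(name)) for name in ("year", "month", "day"))
```

The function name says what the regex is for, and the verbose pattern documents how it does it, so the call site stays readable without a comment.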



If a regex is going to be reusable, then yeah, I'd agree. But dumping single lines of code into their own functions just for readability isn't practical for real-time systems. In those cases you really should be using comments, since the compiler strips them out.


Couldn't those functions just be inlined by the compiler if they're simple regex-wrappers anyway?

I do agree that it might be overkill to move regexes to their own functions just for readability's sake, but I don't buy the performance argument. Furthermore, regexes are most popular in scripting languages that no sane person would use for real-time, performance-critical systems anyway.


1. Ahh right. I wasn't aware that happened.

2. Websites are a classic example of scripting languages being used for real-time, performance-critical systems (though I'm not arguing that all websites are real time).

Sometimes the ability to modify code easily is as important to the choice of languages as the raw execution speed of the compiled binaries.


REGEXES aren't practical for real time systems.

If you're using a regex, and certainly if you're using a language other than C, you probably have space for the function call overhead.


I don't really agree with that.

Sometimes C is inappropriate (e.g. you'd be nuts to build a website in C, yet some sites do offer real-time services).

Often the data set and/or logic required makes C an inappropriate language (e.g. you wouldn't use C for AI, nor for some types of database operations).

And even in the cases where you're just building a standard procedural system, sometimes the interface lends itself better to other languages (e.g. C would possibly be the worst language for real-time websites).

But even in the cases where you're building a solution that's suited for C, there are still other performance languages which could be used.

"Real time" is quite a general term, and as such it sometimes makes more sense to use a scripting language that is performance tuned. That is where writing 'good' PCRE is critical, as regexes can be optimised and compiled if you understand the quirks of the language well enough to avoid easy pitfalls, e.g. s/^\s//; s/\s$//; outperforms s/(^\s|\s$)//; despite being two separate substitutions as opposed to one.
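For anyone who wants to check that kind of claim on their own setup, here is a minimal sketch in Python. Two things are assumptions on my part: Python's `re` module is a different engine from Perl/PCRE, so the timings may not match what the comment describes, and I've used `\s+` rather than `\s` so that whole runs of whitespace are trimmed. The function names are hypothetical.

```python
import re
import timeit

# Precompiled patterns: two anchored passes versus one alternation.
_LEADING = re.compile(r"^\s+")
_TRAILING = re.compile(r"\s+$")
_EITHER = re.compile(r"^\s+|\s+$")


def trim_two_passes(text):
    """Strip leading whitespace, then trailing whitespace, in separate passes."""
    return _TRAILING.sub("", _LEADING.sub("", text))


def trim_one_pass(text):
    """Strip both ends with a single alternation pattern."""
    return _EITHER.sub("", text)


def compare(text, number=10000):
    """Time both variants on the same input; returns seconds per variant."""
    return {
        "two_passes": timeit.timeit(lambda: trim_two_passes(text), number=number),
        "one_pass": timeit.timeit(lambda: trim_one_pass(text), number=number),
    }
```

Calling `compare("   some long line of text   ")` returns the two timings side by side. Both functions produce the same output, so any difference is down to how the engine handles the anchors and the alternation.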


"Real time" is commonly assumed to mean that you can't use a garbage-collected language, or need to be extremely careful doing so, because random pauses of 100ms break your constraints.

If you're in a situation where the overhead of a couple of function calls is unacceptable, regexes are totally unacceptable and you need to write custom character manipulation.

This situation is really rare and in almost all business cases, using C is inappropriate.


Shouldn't most compilers (jit-)inline it?



