RegEx101.com now offers a debugger

WestCoastJustin · on Sept 28, 2013

http://rubular.com/ is also really great! I use it often when programming ruby regular expressions. Fill in a batch of test strings into the box, then run your regular expression against it, and instantly see (visually) what is happening. This is a BIG plus coming from perl regex 10 years ago ;) This is not a dig at perl regex, but I just remember it was a trial and error loop, where I would continually be re-running the script to see if it worked, whereas with this webpage you just iterate more quickly.

They also have a great example:

  test string:
  Today's date is: 9/28/2013.

  regex:
  (?<month>\d{1,2})\/(?<day>\d{1,2})\/(?<year>\d{4})

  result:
  month	9
  day	28
  year	2013

Screenshot here: http://i.imgur.com/ixyHRde.png
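
For anyone following along in Python rather than Ruby, here is a rough sketch of the same example. The one thing to watch is that Python's re module spells named groups (?P<name>...); the variable names below are just for illustration.

```python
import re

# Same pattern as the Rubular example, with Python's (?P<name>...) group syntax.
DATE_RE = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")

match = DATE_RE.search("Today's date is: 9/28/2013.")
# groupdict() maps each group name to the text it captured.
date_parts = match.groupdict() if match else {}
print(date_parts)
```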

VeejayRampay · on Sept 29, 2013

Looking at regex101, the layout of the page looks awfully similar. It seems that the author took Rubular and made it more generic. So yeah, props to Rubular.

egor83 · on Sept 28, 2013

And a couple of alternatives:

https://www.debuggex.com/

(for Python flavor:

https://www.debuggex.com/?flavor=python )

and one more Python + regex:

http://www.pyregex.com/

helloTree · on Sept 28, 2013

What is it with the obsession with regular expressions? They are useful things, sure, but I just use them in connection with grep or if I search for strings and normally they are pretty basic, e.g.

$ grep -r -n --color "foo*bar" src

If I want to validate input data with the machine I just use a parser.

ghshephard · on Sept 29, 2013

Regexes are an elegant and very powerful way to validate data in scripts in a concise (and if they aren't abused) easy to read fashion. There is an almost infinite number of examples, but let's say I want to verify that a field is a 64-bit hexadecimal MAC address

   $mac =~ /^[A-Fa-f0-9]{16}$/

Gets the job done. How else, but a regular expression so concisely?

And, when you say, "If I want to validate input data with the machine I just use a parser." - that's pretty much what a regex engine is - a sophisticated parser, and the regular expression is the "commands" that you feed to it to parse the input text.
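
A minimal Python sketch of the same check, assuming the field arrives as a plain string. The function name is made up for the example; re.fullmatch stands in for the ^...$ anchors.

```python
import re

# 16 hex digits = 64 bits, as in the Perl one-liner above.
MAC64_RE = re.compile(r"[A-Fa-f0-9]{16}")

def is_mac64(value: str) -> bool:
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return MAC64_RE.fullmatch(value) is not None

print(is_mac64("0011223344556677"))
print(is_mac64("00112233445566"))
```

One reason to prefer fullmatch here: in Python, $ also matches just before a trailing newline, so an anchored pattern would let "0011223344556677\n" through.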

ghshephard · on Sept 29, 2013

Here is another one I just did tonight - I wanted to match IPv4 addresses, but didn't want to accept any octet with a leading 0 (that specifies octal format, which 99.9999% of the time is not what people want), though I do want to accept a 0 if it's the only digit in the octet (i.e. 3.0.2.1, 0.0.0.0, etc...)

regex_ipv4='^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])$'

Gets the job done.

How else would you do it?

You can then build up a library of these, and use them on other projects.

MichaelSalib · on Sept 29, 2013

Maybe something like this:

  def is_ipv4_addr(s):
      try:
          octets = s.split('.')
          assert len(octets) == 4
          for o in octets:
              assert 0 <= int(o.lstrip(0) or '0') < 256
      except:
          return False
      return True

It is longer; on the other hand, it is easier to read and more importantly easier to verify correctness.

ghshephard · on Sept 30, 2013

Would:

  1. 12 .13. 14
  089.23.45.67

Both match that? (Your general point is made though - RegExes look fine to the person that just crafted them, but are opaque to the casual observer)
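
To make the question concrete, here is the function above run against those two inputs. This is a sketch that assumes the lstrip('0') correction mentioned further down the thread, and it narrows the bare except to the two exceptions the body can actually raise.

```python
def is_ipv4_addr(s):
    # The split-and-int approach, with lstrip('0') instead of lstrip(0).
    try:
        octets = s.split('.')
        assert len(octets) == 4
        for o in octets:
            # int() tolerates surrounding whitespace, e.g. int(' 12 ') == 12.
            assert 0 <= int(o.lstrip('0') or '0') < 256
    except (AssertionError, ValueError):
        return False
    return True

for candidate in ["1. 12 .13. 14", "089.23.45.67"]:
    print(repr(candidate), is_ipv4_addr(candidate))
```

Both come back True: int() ignores the spaces, and the leading zero in 089 is stripped before conversion.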

clarry · on Sept 29, 2013

I think you forgot to verify that an octet doesn't have leading zeros (unless its value actually is zero).

MichaelSalib · on Sept 29, 2013

I didn't forget: the (o.lstrip(0) or '0') expression does that.

Actually, that should be o.lstrip('0')...

clarry · on Sept 30, 2013

Wrong.

  >>> is_ipv4_addr("01.0.0.0")
  True

It should reject that (i.e. return False) because the first octet contains a leading zero. But you're just stripping the zero away, ignoring its existence. For no effect, because converting with int() already ignores them for you.

Your code is also ok with bizarre inputs like "0..." :-)

Regexes really do have their strengths -- they compactly express a state machine, and you can always break the expression into parts which'll show exactly what the state machine will accept. They could also be much more readable if people bothered to break them into parts instead of typing it out all inside a long string that becomes really difficult to parse visually. There are other notations to improve readability, for example rx in emacs: http://www.emacswiki.org/emacs/rx
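
As a rough illustration of "breaking it into parts", here is the IPv4 expression from upthread built from a named piece in Python. The names OCTET, IPV4 and is_ipv4 are only for this sketch.

```python
import re

# One octet: 0-255, no leading zero unless the octet is exactly "0".
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
# Three "octet." groups followed by a final octet.
IPV4 = rf"(?:{OCTET}\.){{3}}{OCTET}"

def is_ipv4(text: str) -> bool:
    # fullmatch plays the role of the ^...$ anchors.
    return re.fullmatch(IPV4, text) is not None

for candidate in ["0.0.0.0", "3.0.2.1", "01.0.0.0", "0...", "089.23.45.67"]:
    print(repr(candidate), is_ipv4(candidate))
```

Each piece can be read and tested on its own, and the state machine it accepts is the same as the one-line version.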

A seemingly simple regex can be implemented in imperative code and it might look clean and pretty until you get the logic exactly right and amend it to handle all the corner cases that are not obvious at first sight. For comparison I did the exercise in old-fashioned C (and the indentation got messed up along the way, sigh).

https://pastebin.mozilla.org/3171656

A state machine would be more appropriate in my opinion.

MichaelSalib · on Sept 30, 2013

You're right. My mistake.

I like automata and I think regexes are good for some things, but I definitely agree about the crappy syntax. When working in CL, I loved Edi Weitz' CL-PPCRE package which allowed you to specify regexes using either the traditional broken string form or an s-exp syntax. Much cleaner.

mjhoy · on Sept 29, 2013

> How else would you do it?

Taking your question generally, I was curious to see what it might look like as a parser, since I find that regex a little hard to read. Here's an implementation with Haskell's parsec:

https://gist.github.com/mjhoy/6751909

helloTree · on Sept 30, 2013

Ok then maybe it's just a matter of taste. I like the parser approach more:

import Text.ParserCombinators.ReadP -- or the parser lib of your choice

import Data.Char

...

macP = count 16 (satisfy isHexDigit)

hnriot · on Sept 29, 2013

I don't think your example does what you think it does. It matches "fo" followed by zero or more "o"s, followed by "bar".

When you do understand regex, you'll be amazed at the myriad of things you can do with it.
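
A quick way to see the difference, sketched in Python (the sample lines are made up; the * quantifier behaves the same way in grep's basic regexes):

```python
import re

lines = ["fobar", "foobar", "fooooobar", "foo and bar", "foo_bar"]

# "foo*bar": 'f', 'o', then zero or more 'o', then "bar".
star_only = [s for s in lines if re.search(r"foo*bar", s)]
# "foo.*bar": "foo", then anything (or nothing), then "bar".
dot_star = [s for s in lines if re.search(r"foo.*bar", s)]

print(star_only)
print(dot_star)
```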

helloTree · on Sept 30, 2013

You are absolutely right, I forgot the '.'.

redox_ · on Sept 28, 2013

Would be ultra-cool if you could propose a "generate a matching sample" button.

joezo · on Sept 28, 2013

Take a look at http://fent.github.io/randexp.js/ if you want something that will do this

Lindrian · on Sept 28, 2013

I have thought about this, but in cases where it would be useful, it's impossible to generate a sample match string. For example, creating a match string for /(?:a|[bc])efg?/ is super simple, but for something like: /(ab(?1)*)/ it becomes much harder. Not to mention the performance hit you would see for these more complex expressions. (These are just dummy expressions for illustrative purposes, but I'm sure you get my gist.)

codexon · on Sept 28, 2013

If there is a modifier with a variable number like ?,+,* you could just repeat it X times.

For example if it is ? repeat 1 time, + repeat 2, * repeat 3 etc...

If it is |, choose the first or choose randomly.
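
A minimal sketch of that idea in Python, for a deliberately tiny regex subset: literals, escaped punctuation, simple character classes, (...) and (?:...) groups, |, and the ?, +, * quantifiers. All names here are hypothetical, and anything outside that subset (including recursion like (?1)) is either rejected or treated as a literal.

```python
import re

# Repeat counts as suggested: ? once, + twice, * three times.
REPEATS = {"?": 1, "+": 2, "*": 3}

def sample(pattern: str) -> str:
    text, pos = _alternation(pattern, 0)
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return text

def _alternation(p, i):
    # Parse every branch so the position advances, but keep only the first.
    first, i = _sequence(p, i)
    while i < len(p) and p[i] == "|":
        _, i = _sequence(p, i + 1)
    return first, i

def _sequence(p, i):
    out = []
    while i < len(p) and p[i] not in "|)":
        atom, i = _atom(p, i)
        if i < len(p) and p[i] in REPEATS:
            atom *= REPEATS[p[i]]
            i += 1
        out.append(atom)
    return "".join(out), i

def _atom(p, i):
    c = p[i]
    if c == "(":
        i += 1
        if p.startswith("?:", i):
            i += 2
        elif p.startswith("?", i):
            raise ValueError("only plain and (?:...) groups are supported")
        inner, i = _alternation(p, i)
        if i >= len(p) or p[i] != ")":
            raise ValueError("unbalanced parenthesis")
        return inner, i + 1
    if c == "[":
        if p.startswith("^", i + 1):
            raise ValueError("negated classes are not supported")
        end = p.index("]", i + 1)
        return p[i + 1], end + 1  # pick the first member of the class
    if c == "\\":
        if i + 1 >= len(p) or p[i + 1].isalnum():
            raise ValueError("only escaped punctuation is supported")
        return p[i + 1], i + 2
    return c, i + 1

pattern = "(?:a|[bc])efg?"
generated = sample(pattern)
print(generated, bool(re.fullmatch(pattern, generated)))
```

The recursive case is exactly where this stops being simple, which is why the sketch refuses (?1) instead of guessing.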

Jugurtha · on Sept 28, 2013

That's funny. Just yesterday I needed that and used pythex and a bunch of other similar 'testers' to make sure my regular expression was good. It was, but it somehow didn't work on the Mozilla Add-On Builder.

After I asked on the #jetpack channel, members spotted the mistake: the regex was correct, but it needed to "match the exact string", as mentioned in the docs. I had read that, but didn't understand the point. There was a missing ".*" at the beginning and the end. So /.*regex.*/
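
The same gotcha can be reproduced in Python, where re.search finds a pattern anywhere in the string while re.fullmatch requires the whole string to match. This is only an analogy for the Add-On Builder behavior; the URL and pattern are invented for the example.

```python
import re

url = "http://example.com/page?id=1"
pattern = r"example\.com"

# Most online testers behave like search: a hit anywhere counts.
found_anywhere = re.search(pattern, url) is not None
# An "exact string" match needs the pattern to cover the whole input.
whole_string = re.fullmatch(pattern, url) is not None
# Padding with .* on both sides makes the exact match succeed.
padded = re.fullmatch(r".*" + pattern + r".*", url) is not None

print(found_anywhere, whole_string, padded)
```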

Thanks for putting this up.

Lindrian · on Sept 28, 2013

Forgot to tell you how to use it. Simply insert an expression and some text and press the little red button right above the input for the regex. That's all you have to do! Enjoy :)

thousande · on Sept 29, 2013

The explanation box is nice!

Here is another regex tester: http://www.gethifi.com/tools/regex

eknkc · on Sept 28, 2013

http://www.regexper.com is also awesome to visualize regular expressions.

hclee · on Sept 28, 2013

Not bad. You don't have to jump around between your regexp reference and editor. It does not exactly tell you why your expression and string do not match, though. It just shows what the typed regexp will do.

Lindrian · on Sept 28, 2013

I think this shows quite clearly what is going on and why I get the result I get: http://regex101.com/r/kU3cJ6/#debugger (imgur link: http://imgur.com/H2IkNGy)

If you don't agree with me, could you perhaps suggest an improvement?

hclee · on Sept 29, 2013

I mean it does its job and it is good. I would use it. But look at this case: test string inout [123:0] Asdfg

Exp (made wrong on purpose) ^inout\s+\[[0-9]\: It just displays "No Matches".

It would be cool if it gave some guidance on what expression would match the test string. What you put in the test string is what you want to get.
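
One crude way to get that kind of hint, sketched in Python: chop the expression back until some prefix of it matches, which shows roughly where it stops agreeing with the test string. The helper name is hypothetical, and this is not how regex101's debugger works.

```python
import re

def longest_matching_prefix(pattern: str, text: str) -> str:
    # Try ever-shorter prefixes of the pattern until one matches the text.
    for end in range(len(pattern), 0, -1):
        prefix = pattern[:end]
        try:
            if re.search(prefix, text):
                return prefix
        except re.error:
            continue  # the cut landed mid-construct; skip this prefix
    return ""

test_string = "inout [123:0] Asdfg"
broken = r"^inout\s+\[[0-9]\:"

hint = longest_matching_prefix(broken, test_string)
print(hint)
```

Here the hint is everything up to [0-9], so the trouble starts at the colon: one digit was matched but the string has three. Changing [0-9] to [0-9]+ makes the full expression match.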

mattyod · on Sept 28, 2013

I like it. Would be nice to be able to match against multiple test strings though.

shmerl · on Sept 29, 2013

How do you replay the tutorial once it's finished?

shocks · on Sept 29, 2013

Lots of great information in this thread, thanks all!