Hacker News new | past | comments | ask | show | jobs | submit login

Oh interesting! TIL. I think that probably means that this is UB then: https://github.com/JuliaLang/julia/blob/d8ff21c69c118e8801e8... --- You can't enable NO_UTF_CHECK in PCRE if you're going to pass data that isn't valid UTF-8.

N.B. As of PCRE 10.33, you can able the PCRE2_JIT_INVALID_UTF check for JIT matching instead.

It looks like this is also coming to the standard interpreter as well: https://lists.exim.org/lurker/message/20190524.173112.0d226a...




Yeah the PCRE situation is a bit unfortunate. Avoiding crashes on invalid data would be the minimum and hopefully PCRE does that officially soon. To really make things work well, we would have to patch PCRE to handle what Julia considers invalid characters to be, which is doable, but it may be better to just reimplement regex functionality in Julia, which is non-trivial, but that way we naturally get correct treatment of invalid data, JIT, and support for other string types.


> Yeah the PCRE situation is a bit unfortunate.

Indeed. In all likelihood, it's a CVE waiting to happen.

> we would have to patch PCRE to handle what Julia considers invalid characters to be

Sorry, did you see my links in the previous comment? This is already available in the JIT engine for PCRE 10.33, and appears to be making its way into the standard interpreter as well. So long as both Julia and PCRE implement UTF-8 correctly, both should be on the same page with respect to invalid UTF-8 byte sequences.

> but it may be better to just reimplement regex functionality in Julia, which is non-trivial, but that way we naturally get correct treatment of invalid data, JIT, and support for other string types.

Yup, this is what I did for Rust, which can work on both completely valid UTF-8 and arbitrary byte sequences. But it is a ton of work. I'd get as much mileage out of PCRE2 as I could.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: