
My point was that "parsing with derivatives" is not necessarily a slow thing, and I provided an example supporting that point.

I have specifically been searching for a performant regular expression library recently. The "parsing with derivatives" approach, I believe, can share more of the state when doing several matches in parallel (think of a trie-encoded dictionary) than DFA-based libraries do, and it should have a smaller startup time.
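
To make concrete what I mean by derivative-based matching, here is a minimal sketch in Haskell (purely illustrative; the names and representation are mine, not from any of the papers mentioned):

    -- A regular expression over Chars. Names here are illustrative.
    data Re = Empty          -- matches nothing
            | Eps            -- matches the empty string
            | Chr Char       -- matches exactly one character
            | Alt Re Re      -- union
            | Seq Re Re      -- concatenation
            | Star Re        -- Kleene star

    -- nullable r is True iff r matches the empty string.
    nullable :: Re -> Bool
    nullable Empty     = False
    nullable Eps       = True
    nullable (Chr _)   = False
    nullable (Alt a b) = nullable a || nullable b
    nullable (Seq a b) = nullable a && nullable b
    nullable (Star _)  = True

    -- deriv c r denotes the language { w | c:w is in the language of r }.
    deriv :: Char -> Re -> Re
    deriv _ Empty      = Empty
    deriv _ Eps        = Empty
    deriv c (Chr d)    = if c == d then Eps else Empty
    deriv c (Alt a b)  = Alt (deriv c a) (deriv c b)
    deriv c (Seq a b)
      | nullable a     = Alt (Seq (deriv c a) b) (deriv c b)
      | otherwise      = Seq (deriv c a) b
    deriv c r@(Star a) = Seq (deriv c a) r

    -- Matching is just a left fold of deriv over the input.
    matches :: Re -> String -> Bool
    matches r = nullable . foldl (flip deriv) r

The sharing I am hoping for comes from the fact that structurally equal derivatives can be hash-consed and computed once, which is, I believe, exactly what several patterns stepped over the same input would exploit.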

I have not misinterpreted your argument. I have pointed out a place where it breaks down, because I consider that point important. I hope that, from now on, a reader of our conversation will not take the original "parsing with derivatives" paper as the state of the art and will, perhaps, come up with something better themselves.

I actually did, and I am glad you answered my points. They got me thinking.

I think that parsing with derivatives can be used as a tool to parse (in parallel! possibly sharing the computed derivatives!) the parts of the text with regular subgrammars, and then something like CYK can be applied (again in parallel, like in [1]) to the parsed regions. I sketch the parallel-matching part below, after the reference.

[1] https://jyp.github.io/pdf/PP.pdf
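
Here is the parallel-matching part I mentioned, reusing Re, deriv, and nullable from my sketch above (again purely illustrative; real sharing would need memoization or hash-consing of the intermediate expressions, which I do not show):

    -- Advance a whole set of patterns over the same character in lockstep.
    stepAll :: Char -> [Re] -> [Re]
    stepAll c = map (deriv c)

    -- One pass over the input answers all patterns at once. Hash-consing
    -- the intermediate Re values is where the state sharing would come from.
    matchesAll :: [Re] -> String -> [Bool]
    matchesAll rs s = map nullable (foldl (flip stepAll) rs s)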

PS

Please note that in [1] they show that chart parsing struggles with exactly the regular subgrammars: a typical chart parsing algorithm has O(n^2) complexity for repetitions expressed with an asterisk in regular grammars. I do not think my "solution" is necessarily better than the one in [1], but I have to think about it more.




I want to preface this by saying that most of this comment is kind of blunt and wordy, but I really don't intend it to be taken rudely. I just wanted to address your points very explicitly to explain my position more clearly. I value your contributions and hope you'll take this response in the spirit I intend it.

> My point was that "parsing with derivatives" is not necessarily a slow thing, and I provided an example supporting that point.

I actually don't think you supported this point, because I specifically said (emphasis added):

> A fun one to include might be Might's "Parsing with Derivatives", which is algorithmically novel (though *not very performant*).

I was explicitly talking about the 2011 paper titled "Parsing with Derivatives" which generalizes Brzozowski's derivatives from REs to CFGs, and which has absolutely terrible performance compared to most other parsers for CFGs.

You went on a bit of a tangent by arguing about "using derivatives to parse REs in general" (to paraphrase). This was never the scope of what I was talking about. Whether derivatives can be used to parse REs efficiently is irrelevant when I was specifically referring to the 2011 paper, which provides a novel, general parsing algorithm — not LL, not LR, not GLL, not GLR, but just "general", for all CFGs in existence.

> I have not misinterpreted your argument. I have pointed out a place where it breaks down, because I consider that point important.

I still think you misinterpreted me, because you're talking about "how can derivatives be used to parse things efficiently" while I was talking about "the contributions of this specific 2011 paper that introduced a new general parsing algorithm that can handle all CFGs and which happens to be called 'Parsing with Derivatives'".

At this point I want to say that I'm not trying to be mean here, and I really hope that's not how I'm coming across. I just think something isn't connecting in our dialogue, so I'm trying to be overly explicit and cautious in my wording to make sure I don't introduce any possibility for misinterpretation.

> I hope that, from now on, a reader of our conversation will not take the original "parsing with derivatives" paper as the state of the art and will, perhaps, come up with something better themselves.

See, this is the thing. The 2011 paper I referenced isn't about REs, but your papers were about REs. The 2011 paper introduced a brand-new general parsing algorithm to the world. It was subsequently refined by Adams, Hollenbeck, and Might in 2016's "On the Complexity and Performance of Parsing with Derivatives" [0], and then again refined by Darragh and Adams in "Parsing with Zippers" [1] (published at ICFP this year).

To the best of my knowledge, the materials you've linked so far do not even reference the 2011 paper, and so cannot be said to improve on the state of the art in that regard, where "state of the art" refers to "the state of parsing all CFGs using a derivative-based approach rooted in the Brzozowski derivative". They instead start with the 1964 Brzozowski paper that originated the derivative of regular expressions and improve on that directly, meaning they are not addressing CFGs in general, as the 2011 paper I was talking about does.

Your paper and the 2011 paper are more like siblings, in that they both descend from the same 1964 paper but solve different problems. So I think it would be misleading to suggest that your paper is an improvement on the state of the art of the 2011 paper, because they're really very different problem domains, despite both involving "parsing" and "derivatives".

> I think that parsing with derivatives can be used as a tool to parse (in parallel! possibly sharing the computed derivatives!) the parts of the text with regular subgrammars, and then something like CYK can be applied (again in parallel, like in [1]) to the parsed regions.

So, if I'm understanding you right, you're suggesting using a hybrid approach. I think that's an interesting idea! I don't know that I've seen much done like that. I do know that CYK is almost completely ignored in PL these days, so perhaps there is some other strategy to use there to augment the derivatives-of-REs subcomponents. I'll have to think about that. What a neat idea!
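
For reference, since CYK comes up so rarely in PL these days, this is the algorithm I have in mind, as an illustrative Haskell sketch over a grammar already in Chomsky normal form (the representation and names are my own):

    import qualified Data.Map as M

    type NT = String
    data Rule = Unit NT Char   -- A -> 'c'
              | Pair NT NT NT  -- A -> B C

    -- cyk rules start w is True iff the start symbol derives w.
    -- The table is built lazily, so each cell is computed at most once.
    cyk :: [Rule] -> NT -> String -> Bool
    cyk rules start w
      | null w    = False  -- a CNF grammar as written here derives no empty string
      | otherwise = start `elem` cell 0 n
      where
        n = length w
        table = M.fromList [ ((i, j), compute i j)
                           | i <- [0 .. n - 1], j <- [i + 1 .. n] ]
        -- cell i j = nonterminals deriving the substring w[i..j)
        cell i j = table M.! (i, j)
        compute i j
          | j == i + 1 = [ a | Unit a c <- rules, c == w !! i ]
          | otherwise  = [ a | k <- [i + 1 .. j - 1]
                             , Pair a b c <- rules
                             , b `elem` cell i k
                             , c `elem` cell k j ]

    -- Example: with g = [Pair "S" "A" "B", Unit "A" 'a', Unit "B" 'b'],
    -- cyk g "S" "ab" == True.

If your derivative-based pass collapses the regular regions into single tokens first, then CYK's cubic cost would only apply to the genuinely context-free residue, which I suppose is the appeal of the hybrid.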

[0] https://doi.org/10.1145/2980983.2908128

[1] https://doi.org/10.1145/3408990



