
You may want to work on specifying a formal grammar in EBNF or similar.

Great for library / parser implementers and it helps discover ambiguities.



http://vfmd.org has an exhaustive and unambiguous formal spec for (a variant of) Markdown. As a bonus, it's easy to understand, nicely extensible, and has an example implementation. The only drawback I'm aware of is that it seems routinely ignored by nearly everyone. I wonder why?


While the spec is way more formal, it's still not in a form that lends itself easily to a proof that it's unambiguous.


I have not looked into the details of the grammar, but proving that a grammar is unambiguous is typically done by proving that it is in, say, LL(1) or LR(1) (or some related family). This approach to unambiguity has the additional advantage that it is easy to derive a parsing algorithm with nice theoretical properties directly from the grammar (e.g. parsing time linear in the input length, where the hidden constant depends on the grammar).
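To make the LL(1) idea concrete, here is a minimal sketch in Python, assuming a grammar with no empty (epsilon) productions. Under that assumption the LL(1) condition reduces to: for every nonterminal, the FIRST sets of its alternatives are pairwise disjoint. The grammar encoding and the names `first_sets`, `ll1_conflicts`, `DANGLING_ELSE` and `SIMPLE` are made up for illustration and have nothing to do with vfmd or CommonMark.

```python
# Grammar encoding (an assumption of this sketch): nonterminal -> list of
# alternatives, each alternative a non-empty tuple of symbols. Any symbol
# that is not a key of the dict is treated as a terminal.

def first_sets(grammar):
    """Compute FIRST for every nonterminal by fixpoint iteration (no epsilon)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def ll1_conflicts(grammar):
    """Return (nonterminal, token, alt_a, alt_b) for each FIRST/FIRST clash."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alts in grammar.items():
        seen = {}  # token -> first alternative that can start with it
        for alt in alts:
            head = alt[0]
            starts = first[head] if head in grammar else {head}
            for tok in starts:
                if tok in seen:
                    conflicts.append((nt, tok, seen[tok], alt))
                else:
                    seen[tok] = alt
    return conflicts

# Classic dangling-else grammar: both "if" alternatives start with "if".
DANGLING_ELSE = {
    "Stmt": [
        ("if", "expr", "then", "Stmt"),
        ("if", "expr", "then", "Stmt", "else", "Stmt"),
        ("other",),
    ],
}

# A grammar where one token of lookahead always picks the alternative.
SIMPLE = {"S": [("a", "S"), ("b",)]}

print(ll1_conflicts(DANGLING_ELSE))
print(ll1_conflicts(SIMPLE))
```

An empty result means the grammar (under the no-epsilon assumption) is LL(1) and therefore unambiguous. A conflict only means it is not LL(1); the grammar may still be unambiguous and fall into LR(1) or a larger class.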


There is no grammar; it's a description of how to parse Markdown using regexes.


From John MacFarlane, the author of the CommonMark spec:

> If anyone wants to contribute a BNF, please do! But I'm very skeptical that it can be done, due to the many quirks of the syntax.

https://github.com/jgm/CommonMark/issues/113#issuecomment-60...


So it's basically as useless for general processing as Markdown, because you won't have a grammar from which you can generate a parser. Hence all the parsers will be hand-coded, with quirks that differ from one to another, and none of them will produce a parse tree.


I'm baffled. Without a formal grammar CommonMark is useless. Who implements a parser without a formal grammar?


If the syntax has so many issues that you can't specify a formal grammar, there's a problem with the spec.


If it's hard to write a BNF, it's also hard to write a parser.


BNF doesn't make something unambiguous.

I mean, Foo = "hello" / "hello" is totally acceptable, and if you have any indirection like Foo = XXX / YYY with XXX, YYY = "hello" you can then attach different semantics to XXX and YYY despite it being ambiguous. Add in more levels of indirection, and you have a lower chance of it being trivially obvious.
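As a rough illustration of the point above, the following Python sketch counts the distinct parse trees for the toy grammar Foo = XXX / YYY with XXX, YYY = "hello". The encoding and the function names `count_parses` and `count_seq` are made up for this example; the brute-force approach assumes a grammar with no left recursion and no cyclic unit rules, otherwise it would not terminate.

```python
# Toy grammar from the comment: nonterminal -> list of alternatives,
# each alternative a tuple of symbols. Symbols that are not keys are terminals.
GRAMMAR = {
    "Foo": [("XXX",), ("YYY",)],
    "XXX": [("hello",)],
    "YYY": [("hello",)],
}

def count_parses(symbol, tokens):
    """Count distinct parse trees that derive exactly `tokens` from `symbol`."""
    if symbol not in GRAMMAR:  # terminal: must match a single token
        return 1 if tokens == (symbol,) else 0
    return sum(count_seq(alt, tokens) for alt in GRAMMAR[symbol])

def count_seq(symbols, tokens):
    """Count ways to split `tokens` across the sequence `symbols`."""
    if not symbols:
        return 1 if not tokens else 0
    head, rest = symbols[0], symbols[1:]
    total = 0
    for i in range(len(tokens) + 1):
        left = count_parses(head, tokens[:i])
        if left:
            total += left * count_seq(rest, tokens[i:])
    return total

print(count_parses("Foo", ("hello",)))
```

A count greater than one for any input means the grammar is ambiguous, even though it is perfectly valid BNF. Enumerating inputs like this can only find ambiguity, never rule it out.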


But there exist tools to help you decide whether a grammar is unambiguous when you have a formal grammar: https://en.wikipedia.org/wiki/Ambiguous_grammar#Recognizing_...



