It's pretty cute. Still no static types for "Strings that end with a dot" or "Strings that match [a-z]+" or "Strings that are proper nouns". ;)
I'm not sure I understand the benefits of this compared to a traditional parser generator.
"Use a data structure that makes illegal states unrepresentable." is not effort spent well.
(And given Gödels incompleteness theorems unachievable in the general case.)
A proper parser generator using a grammar will give you the structural guarantees you need, a lexer will reject invalid terms. And a transformation step gives you the target structure.
Eventually you will have to validate though at runtime.
> Still no static types for "Strings that end with a dot" or "Strings that match [a-z]+"
Sure there are. :) Technically speaking, anyway. Here’s a type for “strings that end with a dot”:
-- | A string that ends in the character '.',
-- represented as a string without its trailing dot.
newtype StringEndingInDot = StringEndingInDot String
fromString :: String -> Maybe StringEndingInDot
fromString str = case reverse str of
'.':cs -> Just $ StringEndingInDot (reverse cs)
_ -> Nothing
toString :: StringEndingInDot -> String
toString (StringEndingInDot str) = str ++ "."
And here’s a type for “strings that match [a-z]+”:
data Letter = A | B | C | D | ... | X | Y | Z
type StringAZ = NonEmpty Letter
Now, admittedly, I would never use either of these types in real code, which is probably your actual point. :) But that’s a situation where there is a pragmatic “escape hatch” of sorts, since you can create an abstract datatype to represent these sorts of things in the type system without having to genuinely prove them:
module StringEndingInDot(StringEndingInDot, fromString, toString) where
newtype StringEndingInDot = StringEndingInDot { toString :: String }
fromString :: String -> Maybe StringEndingInDot
fromString str = case reverse str of
'.':_ -> Just $ StringEndingInDot str
_ -> Nothing
You may rightly complain that’s a lot of boilerplate just to represent this one property, and it doesn’t compose. That’s where the “Ghosts of Departed Proofs” paper (https://kataskeue.com/gdp.pdf) cited in the conclusion can come in. It provides a technique like the above that’s a little more advanced, but brings back composition.
>Technically speaking, anyway. Here’s a type for “strings that end with a dot”:
These are just two functions that wrap and unwrap a string. Your "type" doesn't generate a compile-time error when constructed with the wrong data. This doesn't even handle the simplest use case:
A properly enforced static type would not need to emit Maybe, because it would be impossible to set it to the wrong value in the first place.
Not to mention that to be truly useful a static type would need some sort of literal that would emit a compile-time error if the supplied init data was wrong.
In short, examples above do not demonstrate Haskell's ability to statically enforce rules like "string needs to end with a dot".
Now, you could make a type that always prepends a dot when asked for a string representation, which happens to enforce this specific constraint, but you cannot use a trick like this for most constraints (such as the second example of "only alpha characters").
The problem, as you hint at, is that Haskell lacks a literal
syntax for "strings that end with a dot". One could fake
this up with Template Haskell if one was so inclined. On
the other hand, the representations proposed are
representations of "strings that end with a dot" and
"alphanumeric strings", the dubious ergonomics (and utility)
I think (but cannot guarantee) there's nothing much stopping you writing template Haskell to construct valid values at compile time if you want. It's just that most of the time you're (or at least I am) happy to use "unsafe" functions because the verification is simple enough to do in your head, and the other 99% of the time I'm creating values it's from run-time data.
> It's pretty cute. Still no static types for "Strings that end with a dot" or "Strings that match [a-z]+" or "Strings that are proper nouns". ;)
That’s not true, you can wrap these in types that can only be constructed from valid data, and from that point on you can be sure everything is correct.
data EndsWith : Type where
EndsWithPeriod : (s: String) -> { auto prf : ("." `Strings.isSuffixOf` s = True) } -> EndsWith
s : EndsWith
s = EndsWithPeriod "."
-- Value of type False = True cannot be found
-- s2 : EndsWith
-- s2 = EndsWithPeriod ""
The noun one might be possible -- I'm honestly less familiar with what a proper noun is offhand. (E: prefix=>suffix)
Ignoring all the convoluted type examples here, the simplest case is that you create the type that matches the parsed value. For example, you can create an EmailAddress class that can only contain valid email addresses. Instead of String, you use EmailAddress whenever you need to ensure the valid parsed value.
I'm not sure I understand the benefits of this compared to a traditional parser generator.
"Use a data structure that makes illegal states unrepresentable." is not effort spent well. (And given Gödels incompleteness theorems unachievable in the general case.)
A proper parser generator using a grammar will give you the structural guarantees you need, a lexer will reject invalid terms. And a transformation step gives you the target structure. Eventually you will have to validate though at runtime.