Looks really interesting. I wrote an SQL parser and compiler frontend for a SQLite-style database last summer. I started off in C using Lex/Yacc, but for the last piece of the project I used Haskell. This step was to take SRA ("sugared relational algebra"), which is essentially a transition step between SQL and simple relational algebra, and desugar it into relational algebra. The code is up at https://github.com/thinkpad20/sql if anyone is interested, with the code for that portion being contained in the haskell folder. I might translate the whole thing into Haskell (maybe using Attoparsec) at some point.
I was curious that you used a different version of relational algebra than the one I had been taught. I've usually seen RA described in terms of six fundamental operators: Project (pi), Select (sigma), Rename (rho), Cross Product, Union and Difference. What gave rise to the model you chose?
Anyway, it's great to see that more people are using Haskell. I would love to be able to work in it some day.
That is really fascinating, I will have a good look at that, thanks!
I have a SQL parser in Haskell on github here (using Parsec):
https://github.com/JakeWheat/hssqlppp
The model in the article is roughly based on the version in Date and Darwen's work which is almost the same as what you mention: project, rename, extend, join, union and not matching. The summarize by is kind of syntax sugar in this system. I left out some of the operators which weren't used anywhere in the code in the article. Industrial grade conversion from SQL to relational algebra is a real challenge!
Wow, your parser is a behemoth! My parser ended up being around 500 lines of Yacc. I know I didn't fully implement SQL, but I thought I had gotten pretty close. Are you doing any extra fancy stuff inside of the parser? Clearly the documentation is pretty extensive. I like the lhs work; I might start doing some of my projects in LHS.
SQL to RA is definitely quite a task, but it was pretty fun (up until the end, when it just started getting annoying...). Then again, I think compilers and programming languages are a blast anyway. Haskell makes it even more so :)
When I wrote this code, I wasn't really comfortable using monads, so all of the threading is explicit. If I were to revisit it, I would probably use a state monad, which might also make it easier to make it tail-recursive (as it is, my desugarer is not written in a tail-recursive manner).
1. Speed and space can be difficult to predict without considerable experience.
2. The libraries are not as extensive and tested compared to conventional choices such as Java or C++.
Using strict data structures is usually the simplest way to regain predictability:
data QueryExpr
= RelVar !Text
| Restrict !ScalarExpr !QueryExpr
| Select !(HashMap Text ScalarExpr) !QueryExpr
| SummarizeBy !(Vector Text) !(HashMap Text ScalarExpr) !QueryExpr
Excessive strictness can cost you performance if you end up evaluating things you didn’t have to, but ASTs should probably be strict. Using Criterion for benchmarking helps find the sweet spot.
I have found Haskell’s library ecosystem quite good—any time I’ve needed a library, there has been a suitable one. In the rare case that changes are necessary, maintainers tend to be responsive and helpful. We probably just work on different kinds of software.
otoh, theres a few mostly haskell shops, and that number is growing! Its really exciting to see the growing haskell adoption at a number of companies doing interesting things
We have one other Haskell programmer, who previously worked with some other functional programming languages (mainly some dialect of ML I think) and was working with Erlang in his previous job, and he picked it up quite easily. I think he is unusual though, I found it much more difficult to learn (but extremely rewarding).
What about learning/discovering the haskell language and its ecosystem? Did you have a lot of refactorings, replacements of libraries wrongly chosen, etc? Were these changes facilitated by haskell? Very curious to know because haskell is high on my list as the language to use in the near future.
In my experience, the more you learn about Haskell, the more you can pick out spots in your code where you could have made something more concise, more elegant, more efficient, etc. The excellent thing about pure functions and Haskell's strong typing, though, is that refactoring is a breeze -- changing a function will never break another function as long as the inputs and outputs to that function remain the same.
The was and continue to be a lot of refactorings, but they were/are pretty much all related to the complexity and sheer number of features in SQL. Haskell really gets out of the way, and helps make the refactorings much more easy.
I was curious that you used a different version of relational algebra than the one I had been taught. I've usually seen RA described in terms of six fundamental operators: Project (pi), Select (sigma), Rename (rho), Cross Product, Union and Difference. What gave rise to the model you chose?
Anyway, it's great to see that more people are using Haskell. I would love to be able to work in it some day.