It's just syntax, it compiles to SQL and runs on today's DBMS. It has no differe...

lolinder · on Sept 16, 2023

It does make a difference if the resulting SQL is unperformant.

Someone who is good at SQL can look at a query and see where the query planner might go wrong, then make subtle tweaks to get better performance. Optimizing queries in a compile-to-SQL language basically has to be left to the compiler, which may not have the context needed to write performant SQL.

Compiling a query is different than compiling a whole program to assembly, because the query makes assumptions about schemas and indexes that cannot be encapsulated in what you're feeding to the compiler at that moment.

ttfkam · on Sept 16, 2023

Agreed.

Also, someone who ISN'T good at SQL can look at EXPLAIN output and see where the query planner HAS gone wrong.

Adding PRQL to the mix unambiguously makes that analysis and optimization step harder.

maximilianroos · on Sept 16, 2023

So I agree that unperformant SQL is unperformant, but PRQL can _reduce_ the chance of making mistakes there.

Here's an example [1] of someone reporting that a query engine was far more performant with one SQL construction, and then PRQL changing the SQL we output to use that construction.

GCC & Clang are much better at compiling to assembly than any person! PRQL isn't there yet, but each improvement scales to everyone who uses it.

[1]: https://github.com/PRQL/prql/issues/2182

[Disclaimer: PRQL dev]

ttfkam · on Sept 16, 2023

Be honest. C to object code to linker to final binary are a MUCH bigger leap than PRQL to SQL.

You make it sound like SQL is some insurmountable hurdle while PRQL is a bunny slope. You're not getting anywhere with that nonsense.

[Disclaimer: not a PRQL dev]

maximilianroos · on Sept 16, 2023

I don't know where this aggression is coming from — we're an open-source project, only volunteers, trying to do something that makes it easier for folks to work with data.

Critical & constructive feedback would be really appreciated, but that's not this — why all the snark?

ttfkam · on Sept 17, 2023

Agreed. Got off on the wrong foot. I and others have noted what appeared to be overzealous marketing, and I previously asked questions regarding DB engine-specific features like jsonpath, unnest for arrays, etc.

The responses from the PQRL appeared evasive to me, asking for PRs to address obvious syntax mismatches, and never answering (what I believed to be) substantive questions regarding extensibility and lowest common SQL denominators. "We don't support that yet," is a perfectly reasonable response. But what I'm seeing now is just youthful exuberance untempered by abundance of experience.

Almost anyone working intimately with DBs beyond just MySQL 5.x will know that each DB engine has features that can reduce the amount of SQL required, improve data integrity, substantially speed up operations, etc. Not allowing for escape hatches to native functionality is often a deal breaker. Does PRQL have "escape hatches" where you can use the vendor-specific keywords, operators, and types of the underlying engine? (Not just calling single-return-value function, but actual substantive syntactic differences.)

The issue of NULL is a tricky one, and I totally see why folks would want to == and != it away, but unfortunately in the SQL world, it's not just a matter of IS NULL vs IS NOT NULL (leaving aside IS [NOT] DISTINCT FROM). NULL has profound effects in the model stemming from its VERY different meaning from most programming languages. NULL means "unknown" in SQL, not just "no value" as it is in most general purpose languages. It's subtle but absolutely can't be hand-waved away. How does PRQL handle "IS DISTINCT FROM"? What about set-returning functions that were passed a NULL parameter?

Which brings me to why I think I was harsher than intended: I don't think any of you actually like SQL, and I think that's a dangerous place to start from. It leaves "weird" NULL behavior cascading through your query while the user has used == and != just as they always have with any other value. Using EXPLAIN becomes more difficult due to indirection, and it's not clear to me how you'd alter the query without hacking the PRQL engine in a one-off. I fear that it is indeed "lowest common denominator" (which these days typically means MySQL 5.x compatibility), which is (in my personal opinion) wholly unacceptable for a non-trivial data access conduit.

I'd likely have been much less snarky and supportive of PRQL if it was presented as an idea factory examining the shortcomings of SQL—of which there are many—and trying out solutions on an experimental basis.

But that isn't the tone of the project at all. Its messaging (intended or not) goes at SQL with both barrels while touting itself as a production-ready alternative standard to SQL. It isn't about "trying it out for analytics to see if it's right for your team," which I could totally get behind.

Instead, your site has statements like these:

• PRQL is a stable foundation to build on; we’re open-source and will never have a commercial product

• PRQL is a single secular standard which tools can target

A lack of commercial support isn't in any way synonymous with a stable foundation. Stability is like trust: it is earned over time through action, not just asserted to be true with good intentions. I'm not saying you can't get there eventually, but PRQL is just a year old! The project and team simply haven't earned the right to assert that yet!

https://xkcd.com/927/

All implementations are standards until they diverge. Calling PRQL a standard at this point is like the first baseball World Series where only the US played (and just white men from the US at that, but I digress). I'm not saying it can't ever be a standard, but you're skipping several steps in that process and already declaring, "Done!"

To my knowledge (and quick internet searches) there is no major standards organization behind it. Is there even more than one implementation of this standard? (Usable in multiple programming languages is not the same thing.)

I appreciate that you're volunteers and an open source project. I really do. I also appreciate that your hearts are in the right place with regard to making data accessible. But this isn't about snark. You're trying to bite off a huge chunk of a 49-year-old leviathan without recognizing how it got to be almost half a century old (in the computer industry!!!) and without noticing that massive leviathan is covered in a long lifetime of battle scars. Those are the waters you're trying to swim in (or at least seeming to from your public website messaging and source repository).

    "Those who don't understand UNIX are doomed to reinvent it, poorly." – Henry Spencer (1987)

This is not just an idle piece of snark from Unix folks to any upstart that crossed their paths. It was already a large, battle-scarred leviathan as well by that point, and is far more so today!

The same totally applies to SQL. Show some respect to that, keep up the good fight, but don't let hubris cloud your judgement about how far you and your team really are in this journey.

Other commenters in this discussion have noted inconsistencies in PRQL syntax. Those conversations won't get any easier by stating you'll never address them because it's "a stable foundation" or making breaking changes to make things better, thereby breaking that promise. Your project is experimental. Own it. Run with it. Improve on it. But if it's presented as more than this—as it is now—expect vehement pushback from both haters and other who honestly hold you no ill will but recognize several missing fundamentals as I have.

maximilianroos · on Sept 17, 2023

I appreciate the lengthy response.

I won't address every point here, but a couple of thoughts:

- To close the syntax point — I opened https://github.com/PRQL/prql/issues/3518. We're currently using the output of the rust's canonical SQL formatter. (so any claim we're obfuscating SQL's syntax is incredulous — do you agree?). I'm open to hand-written examples if SQL doesn't have sufficient auto-formatting tools available.

- "never answering (what I believed to be) substantive questions regarding extensibility and lowest common SQL denominators" — some of the features are implemented, some aren't, there are issues on GH for by-and-large all of them. Feel free to open other issues. I don't think there's a duty to respond to every question in every comment on HN, and I certainly don't think losing your composure is an appropriate response to others not answering every question.

- Many of the answers to your questions are in the docs — for example escape hatches. You don't have to read them to engage, but claims about a project's "youthful exuberance" are less credulous from those who haven't done so.

- Stepping back, are these are informed & constructive criticisms, or are they grasps for straw-men as part of a reactionary response? For example, the response takes two phrases from the website out-of-context to load a whole argument on the words "stable" & "standard". But stability & standards can each mean multiple things, and taking the least generous interpretation of a word doesn't make for a reasoned critique. Does a point such as:

    touting itself as a production-ready alternative standard to SQL

...have any grounding in fact? Or does it come through this aggrieved reactionary lens? For context, PRQL's Readme specifically states:

    PRQL still has some bugs and some missing features, and is probably only ready to be rolled out to non-technical teams for fairly simple queries.

ttfkam · on Sept 18, 2023

> For context, PRQL's Readme specifically states

Be honest. If a different project touted itself as a standard and a firm foundation on its public website but had this disclaimer on the source repo's README file, what would your thoughts be?

While you may categorize my responses as reactionary—and they very well may be—can you really claim unbiased objectivity on your part?

For example, JS and C# have ECMA. SQL, C, and C++ have ISO. POSIX has IEEE. All have multiple implementations by different organizations. The term "standard" has a clear, well-defined meaning in computing with a long history. Your response was to handwave it away as "meaning multiple things." Whether de facto or de jure, the appellation in PRQL's case simply does not apply.

I appreciate that you are not obligated to read and respond to every internet comment. When you do respond however, bear in mind that dismissing clearly defined definitions of industry terms like "stable" and "standard" is not an unbiased position.

I am not aggrieved. You asked for critical and constructive feedback. I laid out plainly where I believed the messaging was incongruous from the point of view of someone not intimately involved with PRQL and how I felt it should be changed to better fit its role in the data ecosystem. I didn't call for the project's elimination or even call into question the need for improvement over the existing state of SQL. Take of that what you will.

ttfkam · on Sept 16, 2023

(I couldn't help but notice you didn't comment on the difference in formatting in the examples.)

Do you have examples of PRQL working with jsonpath? Generating JSON? Unnesting arrays? Returning ids from an INSERT or UPDATE without making a separate read query?

Not trying to be argumentative. Honest question.

remram · on Sept 17, 2023

I am not affiliated in any way with the PRQL project. Those are great questions though, I hope we get an answer.

snthpy · on Sept 19, 2023

@remram I replied to the parent comment.

snthpy · on Sept 19, 2023

Thanks, that's a great question. You're right in that so far we haven't highlighted working with JSON and I hadn't actually tried until this point. IMHO the true power of PRQL comes from the fact that it allows you to define functions and with that you get the power of composability which is the true power of almost every programming language (and which is for the most part completely lacking in SQL).

So with that said, I tried the following POC (remember that PRQL is just a SQL generator so the JSON capabilities depend on your underlying RDBMS):

    ```sh
    > prqlc compile <<EOF
    let get = path obj -> s"""{obj} -> {path}"""
    let getstr = path obj -> s"""{obj} ->> {path}"""
    let extract = obj path -> s"""json_extract({obj}, {path})"""
    
    from [{data='{"duck": [1, 2, 3]}'}]
    select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]' }
    EOF
    WITH table_0 AS (
      SELECT
        '{"duck": [1, 2, 3]}' AS data
    )
    SELECT
      data -> '$.duck[0]',
      data ->> '$.duck[1]',
      json_extract(data, '$.duck[2]')
    FROM
      table_0
    
    -- Generated by PRQL compiler version:0.9.4 (https://prql-lang.org)
    ```

What's going on here is that I used [s-strings](https://prql-lang.org/book/reference/syntax/s-strings.html) to define custom PRQL functions `get`, `getstr` and `extract` which translate into the underlying `->`, `->>` and `json_extract` SQL constructs.

You could then for example pipe that query to DuckDB (the example is taken from the following DuckDB blogpost [Shredding Deeply Nested JSON, One Vector at a Time](https://duckdb.org/2023/03/03/json.html)):

    ```sh
    > prqlc compile <<EOF - | duckdb
    let get = path obj -> s"""{obj} -> {path}"""
    let getstr = path obj -> s"""{obj} ->> {path}"""
    let extract = obj path -> s"""json_extract({obj}, {path})"""
    
    from [{data='{"duck": [1, 2, 3]}'}]
    select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]'}
    EOF
    ┌───────────────────────┬──────────────────────────┬───────────────────────────────────┐
    │ "data" -> '$.duck[0]' │ ("data" ->> '$.duck[1]') │ json_extract("data", '$.duck[2]') │
    │         json          │         varchar          │               json                │
    ├───────────────────────┼──────────────────────────┼───────────────────────────────────┤
    │ 1                     │ 2                        │ 3                                 │
    └───────────────────────┴──────────────────────────┴───────────────────────────────────┘
    ```

HTH

ttfkam · on Sept 19, 2023

Excellent, thank you for the response.