
> RDBMS can solve pretty much every data storage/retrieval problem you have.

Except the most important problem: A pleasant API. Which is, no doubt, why 95% of those considering something other than an RDBMS are doing so.

RDBMS can have pleasant APIs. It is not a fundamental limitation. We have built layers upon layers upon layers of abstraction over popular RDBMSes to provide nice APIs and they work well enough. But those additional layers come with a lot of added complexity and undesirable dependencies that most would prefer to see live in the DBMS itself instead.

At least among the well-known RDBMSes, there does not seem to be much interest in improving the native APIs at the service level to make them as compelling to use as alternative offerings outside the relational space have become.




I've honestly never understood why people have such a distaste for SQL. SQL and Linux/Unix have been the biggest constants of my entire programming career to this point (20ish years). I always know I can count on them.


I love the data model of RDBMS / SQL. I hate SQL _the language_. Verbose, irregular, attempting to mimic English and thus unintuitive and next to impossible to remember. 80% of my lookups into reference documentation are about syntax for things I don't use that often.

Examples: GRANT TO, REVOKE FROM, DENY TO (oh yes, what's the difference between REVOKE and DENY? and did you know that you can REVOKE a DENY?), arbitrary requirements for punctuation (e.g., = in BACKUP DATABASE AdventureWorks2012 TO DISK = 'X:\SQLServerBackups\AdventureWorks1.bak'), dubious context-sensitive "magical identifiers" (e.g., SELECT DATEPART(year, @t)), non-composability (how do you dynamically give a part specification to DATEPART?), etc, etc, etc.

It's possible to write a novel about just how unstructured and irregular "structured" query language is.


All of your examples are vendor-specific extensions to SQL. None of that is part of the standard. So you are effectively hating a specific implementation, which is not that logical.


Imagine hating the things that you use rather than a pdf with text, fascinating.


Yes, ANSI syntax is surely composable or dynamic: EXTRACT(YEAR FROM DATE '1998-03-07')


Honestly, I think it's because a lot of folks go their entire careers avoiding set-based thinking. Many of these folks are talented programmers so I would assume they'd become quite decent at SQL if they devoted time to it. I'm speaking more on the DML side than the DDL side here.

In my experience <20% of developers are good enough to be dangerous with SQL, and maybe 5% what I'd consider adept.

The rest range from "SELECT * is as far as I'll go; where's the ORM" to "why the hell are you using cursors to aggregate"

SQL is powerful, elegant, and reliable. With modern DB support of JSON, ARRAY, statistical functions, and much more, SQL is probably the #1 most underutilized/improperly leveraged tool today. SQL-killers have been coming out for 40 years now and (for its domain!) SQL's lead is pulling farther away if anything.

*yes there are some questionable implementations, so please replace "SQL" with "PostgreSQL" if nonstandard SQL implementations have caused nightmares for you in the past.


The reason isn’t avoidance of set-based thinking. I think it is because fundamentally we want to work with smart objects, not rows of data, in most cases.


More often than not that's a good example of set-based thinking avoidance. The application graveyard is full of projects that felt their transactional data was too sophisticated for "dumb rows" and reinvented an inefficient relational model. I'd love to hear more about these "smart objects" - they're usually not that smart and typically can be represented relationally.

Even for use-cases like graph based models you still find Twitter and Facebook using MySQL to build the graph on top of. It’s simply heavily abstracted for the majority of engineers at those companies (where I'd wager the <20% proficiency in set-based thinking holds true) but it still fundamentally relies on SQL.


Where can I learn about the relationship between set-theory and SQL?


I'm not too well-versed in the academic/mathematical background of SQL, other than to say it's closer to relational algebra than set theory, and the resource to read is probably the 1970 Codd paper: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

SQL does implement the set operations of UNION, INTERSECT, and EXCEPT, but I meant "thinking in sets" more colloquially.

edit: this is more practical than theoretical but the author here actually does a nice job of discussing "set-based" thinking vs. procedural https://www.itprotoday.com/open-source-sql/fangraphs-tags-cl...
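For a concrete taste of those three operators, here's a minimal, hedged sketch against an in-memory SQLite database via Python's sqlite3 (the managers/developers tables are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE managers (name TEXT);
    CREATE TABLE developers (name TEXT);
    INSERT INTO managers VALUES ('ana'), ('bo'), ('cy');
    INSERT INTO developers VALUES ('bo'), ('cy'), ('di');
""")

# UNION: everyone in either table, deduplicated
union = conn.execute(
    "SELECT name FROM managers UNION SELECT name FROM developers ORDER BY name"
).fetchall()

# INTERSECT: people present in both tables
both = conn.execute(
    "SELECT name FROM managers INTERSECT SELECT name FROM developers ORDER BY name"
).fetchall()

# EXCEPT: managers who are not also developers
only_mgmt = conn.execute(
    "SELECT name FROM managers EXCEPT SELECT name FROM developers ORDER BY name"
).fetchall()
```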


Discrete Mathematics with Applications by Epp. Chapters 5, 7, 10.

Fundamentals of Database Systems by Elmasri & Navathe. Part 2.


I wholly agree with you, but I'll say this:

When it comes to prototyping, I'm not going to fuck with something like Java-- I'm going to reach for Python. If I don't know what I'm doing to begin with, I don't need something telling me all the ways in which I'm doing it wrong.

Same goes for SQL/NoSQL. I loosely know what I need to model and may revise it arbitrarily. SQL does not lend itself to this. NoSQL was designed for it.

NoSQL is the "fuck you dad you can't tell me what to do" of the database world. SQL is your annoying dad that tries to institute curfews and won't let you go to underage drinking parties. In the end, it's the latter you're going to be calling from a holding cell, but there's a lot more fun you can have with the former.


In the case of SQL/NoSQL I think it is the other way around. In order to use a NoSQL database to its strength, you have to know all access paths up front [1], because you are designing your table structure with all the queries in mind. In SQL, on the other hand, you can always go for a normalized structure and be pretty safe against most requirements that come up in the future.

Additionally, I think drafting a DDL schema is a great way to learn about a new application domain. It forces me to ask hard questions. And that improves my understanding of the domain. I guess that is similar to some people who like to prototype using Haskell type signatures.

[1] Unless you mean specifically a document store without schemas.


Hmm, it’s the other way around for me. The crazy friend may be fun to go drinking with, but I’d never rely on him for anything.


> If I don't know what I'm doing to begin with, I don't need something telling me all the ways in which I'm doing it wrong.

This is unrelated to this conversation, but this is my main beef with Rust. I love Rust (like a lot), but it's just not good for prototyping. End of non-pertinent rant.


SQL is great at what it is designed to do, and absolutely horrible for anything else. Sometimes I have had to use tools that only allow (a subset of) SQL for querying data (especially BI tools). Doing regex date validation on JSON derived from a string, in an SQL dialect without function definitions, is horrendous. These kinds of "why the f do i have to use sql for this" moments happened surprisingly often to me working as a data analyst.

But then I'm trying to do relatively simple things in Pandas or R, like changing column values based on multiple field value conditions, and it is a struggle requiring at least 17 brackets, while in SQL it would be a simple statement that anyone can understand after reading it once.
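As a hedged sketch of that contrast (table, columns, and thresholds all invented), here's the same conditional-update logic in pandas and in SQL side by side:

```python
import pandas as pd
import sqlite3

df = pd.DataFrame({"region": ["North", "South", "North"],
                   "revenue": [100, 50, 20],
                   "tier": ["", "", ""]})

# pandas: boolean-mask assignment, brackets and all
df.loc[(df["region"] == "North") & (df["revenue"] > 50), "tier"] = "A"
df.loc[df["tier"] == "", "tier"] = "B"

# SQL: the equivalent CASE expression reads top to bottom
conn = sqlite3.connect(":memory:")
df.drop(columns="tier").to_sql("sales", conn, index=False)
rows = conn.execute("""
    SELECT region, revenue,
           CASE WHEN region = 'North' AND revenue > 50 THEN 'A'
                ELSE 'B' END AS tier
    FROM sales
    ORDER BY revenue DESC
""").fetchall()
```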


>Doing regex date validation in a json derived from a string in an sql dialect without function-definitions is horrendous.

Doing regex is horrendous anywhere, but doing it in SQL on a modern database is no more difficult than in a full-fledged programming language. Most modern DBs have strong JSON support and built-in regex functions:

1. JSON_CAST/JSON_PARSE your data

2. REGEXP_EXTRACT() on the result, here's several date validator regex from a SO post (https://stackoverflow.com/questions/15491894/regex-to-valida...)

And that's it. In fact in many cases it's probably faster to do it natively in SQL than to export it to python or R and parse there.
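A hedged sketch of those two steps using Python's sqlite3 (SQLite ships JSON functions but no REGEXP by default, so one is registered from Python; in Postgres and most warehouses the regexp function is built in; data and names are invented):

```python
import json
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite's REGEXP operator calls a user function regexp(pattern, string).
conn.create_function("regexp", 2,
                     lambda pat, s: 1 if re.search(pat, s or "") else 0)

conn.execute("CREATE TABLE events (payload TEXT)")
conn.execute("INSERT INTO events VALUES (?), (?)",
             (json.dumps({"date": "2023-01-15"}),
              json.dumps({"date": "not-a-date"})))

# 1. pull the field out of the JSON; 2. validate it with a regex
valid = conn.execute("""
    SELECT json_extract(payload, '$.date')
    FROM events
    WHERE json_extract(payload, '$.date') REGEXP '^\\d{4}-\\d{2}-\\d{2}$'
""").fetchall()
```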


The problem you are describing is probably rooted in the specific SQL dialect you had to use. Selecting from inside JSON strings and matching regular expressions should be a no-brainer, and it is straightforward in PostgreSQL, for example.


> simple things in Pandas or R, like changing column values based on multiple field value conditions, and it is a struggle requiring at least 17 brackets

With dplyr, wouldn’t this be mutate(x = case_when(… ~ y, TRUE ~ x)) or the same but with ifelse?


I haven't used much R, but how would you do something like:

  CASE WHEN
  SUM(daily_revenue) OVER (PARTITION BY department, TRIM(SUBSTR(region, 5)) IN ('North','West','Misc')) > 
  AVG(revenue) OVER (ORDER BY sale_time ASC ROWS BETWEEN 28 PRECEDING AND CURRENT ROW)
  AND NOT COALESCE(had_prev_month_party, FALSE)

  THEN pizza_party_points + 5

  WHEN <above> AND had_prev_month_party THEN pizza_party_points + 3

  WHEN MIN(sale_time) over (PARTITION BY department) = DATE_TRUNC('month', current_date) then 5

  ELSE GREATEST(pizza_party_points - 1, 0)

  END as pizza_party_performance_points_current

this example may be a bit esoteric but it actually draws from lots of real-world cases (comparing a partitioned sum with an overall average, checking conditions on other columns, messy data parsing and manipulation, implicit waterfall nature of CASE WHEN vs. explicit elseif, etc)


SQL is also a big constant of my programming career, I know it can do everything I need, but sometimes it is frustrating how its limitations make some simple things complicated.

For example, one of the biggest gripes I have with SQL is the fact that the table is the only output format; it doesn't allow outputting data with different cardinalities in a single query. The internal relational model of an RDBMS is extremely rich and powerful, but when extracting that data using SQL you have to leave all the richness behind and flatten the data into a simple dumb table. So as a result we have to do much back and forth with separate queries based on results of previous ones (or even ugly hacks like concatenating higher-cardinality data into a single field to then re-separate it in the application). A suitable successor to SQL should have a way to output a subset of the relational model with the relational part intact.


I'm not sure a successor to SQL is even necessary; rather, there would be benefit in some higher-level interfaces provided by the RDBMS that complement SQL, to cut down on the egregious application boilerplate that becomes necessary to achieve common tasks using SQL directly.

SQL is essentially the assembly language of the database. That makes it very powerful, but sometimes you just want a language that gives you a "garbage collector" for free, while still being able to drop down to assembly when you need additional power.

There is no technical reason why an RDBMS can't support different modes for accessing data. We are just so used to putting that work into the application that it has become a hard sell to want to move it to the right place.


> For example, one of the biggest gripes I have with SQL is the fact that the table is the only output format, it doesn't allow outputting data with different cardinalities in a single query.

With MSSQL at least you can return multiple result sets. Not sure about other database vendors.


SQL has some annoying quirks but on balance I am not sure there is the distaste for SQL that you claim. It's fine, even good in many circumstances, but also too low level for some tasks which pushes common problems onto application developers to solve over and over again or build up layers of dependencies to utilize someone else's solution. That's not the fault of SQL but the fault of the RDBMS.

Our layers of abstraction atop SQL, along with approaches taken by DBMSes outside the common relational names, have shown that there is room for improvement within those tasks, able to be done natively by the DBMS, and it does not have to come at the expense of losing SQL when you need to ask lower-level questions. Different tools for different jobs, but there is little will within the RDBMS space to explore those other tools.


I don’t think that people have a distaste for SQL. I think that large companies have a vested interest in trying to sell NoSQL solutions because they more tightly integrate into a “cloud” ecosystem. A database solves that problem domain too well; it’s not marketable.


Can you write an SQL query to return arbitrary JSON? Returning queried data in a nested form is a must nowadays.


> Can you write an SQL query to return arbitrary JSON? Returning queried data in a nested form is a must nowadays.

Yes. https://www.sqlite.org/json1.html
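For instance, a hedged sketch with SQLite's JSON functions nesting one-to-many rows in a single query (schema invented; note the json() wrapper, which marks the subquery's text as JSON so it nests as an array rather than an escaped string):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO posts VALUES (1, 'hello'), (1, 'again'), (2, 'hi');
""")

# One JSON object per author, each carrying its nested array of post titles.
rows = [json.loads(r[0]) for r in conn.execute("""
    SELECT json_object(
        'name', a.name,
        'posts', json((SELECT json_group_array(p.title)
                       FROM posts p WHERE p.author_id = a.id)))
    FROM authors a ORDER BY a.id
""")]
```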



With just a splash of row_to_json and json_agg, you can JSON encode your entire query in PG.

  SELECT json_agg(row_to_json(t))
  FROM information_schema.tables as t;


The dealbreaker here is having to do this dance for every nested field on every entity, and having to write a separate backend query for each separate front-end use-case for a single entity.

It just isn’t feasible to write everything out when the schema has >30 entities, some with N:M relations, and when queries routinely go several levels deep to fetch dozens of joined fields at every level. The boilerplate overhead is too much.

A natively GraphQL database makes such queries an order of magnitude less verbose to write out, and all the queries can stay in the frontend (or can become persisted queries on a GraphQL ”middle-end” server).


Hasura, a GraphQL server, uses exactly this technique to transform final query results into JSON in PG before bringing back into local memory.


Make more than a single query and nest on the application layer.
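A minimal sketch of that approach, with an invented orders/items schema: one query per cardinality, then stitch the children onto the parents in the application:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO items VALUES (1, 'apple'), (1, 'pear'), (2, 'milk');
""")

# Query 1: the parent rows, one dict per order
orders = [{"id": i, "customer": c, "items": []}
          for i, c in conn.execute("SELECT id, customer FROM orders ORDER BY id")]

# Query 2: the child rows, grouped by parent key in the application
by_order = defaultdict(list)
for order_id, sku in conn.execute("SELECT order_id, sku FROM items ORDER BY rowid"):
    by_order[order_id].append(sku)

# Nest children under their parents
for o in orders:
    o["items"] = by_order[o["id"]]
```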


I blame ORMs. ORMs are promoted by scare mongering novice developers away from learning SQL in the first place. I'm ashamed to say I fell for it for a few years. When I eventually learned SQL it was like a fog being lifted from my mind that I hadn't even noticed before.


I tried not to use an ORM for my last project, but I ended up rolling my own for the application anyway, because it was easier to keep the structures in the web application tied to the database. How else do you keep the two consistent?
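For what it's worth, the "accidental ORM" usually ends up as a thin mapping that derives the table from the application struct, so the two can't drift. A hedged sketch (all names invented; string-built SQL like this is not safe for untrusted identifiers):

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

@dataclass
class User:
    id: int
    name: str

def create_table(conn, cls):
    # Derive column names and types from the dataclass annotations.
    types = {int: "INTEGER", str: "TEXT", float: "REAL"}
    cols = ", ".join(f"{f.name} {types[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__.lower()} ({cols})")

def insert(conn, obj):
    # One placeholder per field, values pulled straight from the instance.
    ph = ", ".join("?" for _ in fields(obj))
    conn.execute(f"INSERT INTO {type(obj).__name__.lower()} VALUES ({ph})",
                 astuple(obj))

conn = sqlite3.connect(":memory:")
create_table(conn, User)
insert(conn, User(1, "ana"))
row = conn.execute("SELECT id, name FROM user").fetchone()
```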


The irony is, to effectively use an ORM, you need to be able to debug the SQL it generates when it’s not performing correctly (eg operating on a collection, row by row). I like to use ORM for row based, transactional operations in an OLTP. But usually look for the way to write straight SQL when doing OLAP style reporting stuff. Both of which happen in almost every business.


With SQL you kind of have two options/extremes that are unpleasant in their own way.

You either model things in a very domain specific and classic fashion. Here you get the benefit of being quite declarative and ad-hoc queries are natural. Also your schema is stronger, as in it can catch more misuse by default. But this kind of schema tends to have _logical_ repetition and is so specific that change/evolution is quite painful, because every new addition or use-case needs a migration.

Or you model things very generically, more data driven than schema driven. You lose schema strength and you definitely lose sensible ad-hoc queries. But you gain flexibility and generality and can cover much more ground.

You can kind of get around this dichotomy with views, perhaps triggers and such. In an ideal world you'd want the former to be your views and the latter to be your foundational schema.

But now you get into another problem, which is that homogeneous tables are just _super_ rigid as result sets. There are plenty of very common types you cannot cover. For example tagged unions, or any kind of even shallowly nested result (extremely common use case), or multiple result groups in one query. All of these things usually mean you want multiple queries (read transaction) or you use non-SQL stuff like building JSONs (super awkward).

If you can afford to use something like SQLite, then some of the concerns go away. The DB is right there so it's fine to query it repeatedly in small chunks.

I wonder if we're generally doing it wrong though, especially in web development. Shouldn't the backend code quite literally live on the database? I wish my backend language were a database query language first and a general-purpose language second, so to speak. Clojure and its Datalog flavors come close. But I'm thinking of something even more integrated and purpose-built.


By moving access to your data-driven normalized tables into stored procedures representing a schema-driven API, you can have them both.

And you can lock down your critical production databases from arbitrary SQL.


Isn't that what the ActiveRecord pattern is supposed to be? (Something something n+1's and over fetching data.)


If only we had a generic GraphQL resolver for entity-based SQL schemas.

Oh wait, we do have Prisma. And it suffers from those same issues.


Have you looked at PostGraphile? It’s doesn’t have n + 1 or over-fetching issues.


I have, and I would recommend it, if it generated a Postgres schema from a GraphQL schema and not the other way around.

The advantages of GraphQL are its integration of backend and frontend workflows, and that won’t happen with PostGraphile: a frontend needing a schema change needs to go through the backend instead of the backend extending to meet the frontend in the middle.


> Except the most important problem: A pleasant API

A pleasant API is clearly not the most important business problem a database is there to solve.

The data in it is presumably the lifeblood of the business, whereas the API is something only developers need to deal with.

But that aside, the interface will be SQL which is quite powerful, long-lived (most important) and, fortunately, very pleasant.


EdgeDB looks promising. Postgres under the hood so you know it's stable.


What is a pleasant API? For what kind of data?


I wonder if parent post meant something other than SQL?


That's also my reading. And I agree that SQL is a nice DSL yet way behind general programming languages we enjoy today.

I'm dealing with some pretty involved ETL at work these days, and it's really hard and ugly to do it in SQL to "stay in the DB" and keep things fast. It's tempting to read, transform in a high-level language (think pandas) then write back to the DB, but then you give up so much perf.

I would love an API other than SQL, like what Sparq is doing. You could keep a query optimizer, an engine to execute queries efficiently at the physical level, etc., but control it with a nice general-purpose-language API.

Another way to put it is that SQL became the standard and alternatives haven't been developed much in comparison. Imagine the ML world being stuck with Prolog or some DSL; it would be annoying to say the least. Instead, people have a variety of APIs, and the most popular and developed ones actually use high-level languages to drive optimized ML engines. These convert function calls and chaining into efficient transformations that use memory and computing units (CPU, GPU, etc.) effectively. The same for databases would be wonderful.


Yeah, a lot of RDBMS are adding JSON support, but the support is often a bit clunky to use and may be missing important features. If you're dealing with a bunch of semistructured APIs that return JSON natively, Mongo makes it really easy to just dump all that into a collection and then just add indices as needed.


>Except the most important problem: A pleasant API.

For that, there are stored procedures.



