Hacker News new | past | comments | ask | show | jobs | submit login
How a SQL database works (calpaterson.com)
48 points by datelligence on Dec 19, 2019 | hide | past | favorite | 16 comments



I find it somewhat disappointing that these kinds of tutorials always use such simplistic demo data. I don't care about a database with 3 rows and 4 columns. I wish the authors could find a way to illustrate what a larger, more realistic database looks like. Like, a database that would actually need an index.


The main thing to learn about btrees is that they have very high branching to ensure shallowness and thus trade off more sequential accesses for fewer random accesses, a good tradeoff for disk I/O. The other thing is to recognise that sorted access is still better than random access even over an index; better for disk / block cache, better for CPU cache, and seemingly redundant sorts can e.g. have outsize constant factor performance wins for some joins over materialized subtable queries.


Is a btree with a million nodes really going to be any easier to understand? Like I understand some times its better to have more realistic data (e.g for presenting results related to benchmarking performance), but for a post explaining the data structures behind a database it wouldn't help at all.


yeah, I think it would ultimately give the reader better understanding of how a real SQL system works. The idea, after all, is to give the reader an understanding that they can use to go on and create performant applications, right? If you create an application while thinking of the database as a 2d table with single digit dimensions, you may make some pretty big design missteps when it scales to hold real data.


Hi, - thanks for the feedback - I appreciate it. Negative feedback is always helpful :)

I did originally try using a meatier example but I found that doing that a) makes writing the article really hard - for example, diagramming a meaningful database is a lot of work and makes the diagrams incomprehensive and b) it is that much harder to trasmit the principles than when the example is trivial.


Alternatively, analytics databases don't use indices by design, as for most queries they need to shuffle entire datasets anyways. For example, https://dbdb.io/db/vertica: "Indexes are not support in Vertica."


Indexes are part of the SQL standard, so they are certainly pertinent for "SQL database" discussion.


Plenty of analytical databases still use indexes; bitmap indexes, date ranges, "clustered" indexes and even ordered/btree indexes all still come in handy because they can all help whittle down queries from billions to millions of records.

They just need to be ready to do efficient full table scans.


IIRC Microsoft provided a full blown demo database ( either for SQL Server or from Access, I can't remember ).


Northwind Traders, IIRC


Introduction to Database Systems (CMU) [0] should be useful to anyone who wants to learn about how relational databases work.

[0] https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_...


Nice. Now explain query optimizers.


I'll try! :) I wanted to cover this material because I found that most of the "how SQL works" intros discuss query planning but I think knowing the underlying datastructures gives people pretty useful intuition.


Grace hash join would be another major topic to cover first.


Almost everything is down to join order, and that's mostly down to which join order ends up with the fewest rows to process.

MySQL IMHO is better for this than Postgres because it's more predictable and you have more tools to force its behaviour, with straight_join and index hints, whereas with Postgres you're forced to use CTEs to control evaluation behaviour when you have application knowledge that the DB doesn't have stats for.

If you know nothing about how databases work, and deal in mostly simple queries, Postgres is a better choice though.


Good thing we cut that "a" out of the title. Saved me so much reading time, and at the low cost of making the title ungrammatical!

(Seriously, why keep doing this?)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: