Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Can you explain more what you mean about SQLite having roughly its own VM?


See: http://www.sqlite.org/different.html

Money quote about half-way down the page: "SQL statements compile into virtual machine code"

Every SQL database engine compiles each SQL statement into some kind of internal data structure which is then used to carry out the work of the statement. But in most SQL engines that internal data structure is a complex web of interlinked structures and objects. In SQLite, the compiled form of statements is a short program in a machine-language like representation. Users of the database can view this virtual machine language by prepending the EXPLAIN keyword to a query.

The use of a virtual machine in SQLite has been a great benefit to the library's development. The virtual machine provides a crisp, well-defined junction between the front-end of SQLite (the part that parses SQL statements and generates virtual machine code) and the back-end (the part that executes the virtual machine code and computes a result.) The virtual machine allows the developers to see clearly and in an easily readable form what SQLite is trying to do with each statement it compiles, which is a tremendous help in debugging. Depending on how it is compiled, SQLite also has the capability of tracing the execution of the virtual machine - printing each virtual machine instruction and its result as it executes.


Off-topic but relevant: I've been looking for info on just how database software does what it does. Ie from parsing the SQL query to hitting the disk, and everything in between. Anyone got any links or books?


Gray and Reuter's Transaction Processing is absolutely superb, and it covers a good part of this stack, everything lower-level than query planning. However, it covers every approach to doing this. If you want to know how Postgres, say, does it, that's a lot less information, and may be more digestible.


Thanks. I am interested in the general theory of how every database does it; it bothers me that this area of software engineering is terra incognita to me.


The sqlite source is well put together and pretty small. The (now pretty much obsolete) sqlite 2.x source was a lot smaller and might be a better starting point for pure learning purposes.

Bonus: in the sqlite command line tool, putting "EXPLAIN " before a query will dump out the VM commands that the query was converted to, without executing them.


Any decent textbook on databases would cover most of this, with the possible exception of parsing SQL. It will likely have an in-depth discussion of data structures to reduce disk access.

I strongly recommend you read one such book more or less end-to-end.

HN: As with other areas of computer science, there is often a "canonical" book, like TAOCP or CLRS on algorithms. Is there any such book for databases?


Nothing speaks like the source :) I recommend: http://www.postgresql.org/ftp/source/v9.1.2/


There are also details and examples at http://www.sqlite.org/vdbe.html

Things have changed a bit since then, but not much. SQLite's API is very different than regular databases because it is a library operating in the same process. In particular it does not calculate all result rows for a query up front (that wouldn't be very 'Lite') but instead calculates the next matching row as you ask for it.

Consequently the internals have to be able to record their state, return a row, and then resume from that state to get the next matching row. There is a also a fair amount of query optimisation that goes on, which again means the need for expressing queries in a variety of different building blocks. Combine the state machine with building blocks and you have a special purpose VM.


In particular it does not calculate all result rows for a query up front (that wouldn't be very 'Lite') but instead calculates the next matching row as you ask for it.

That sounds like SQL cursors?


If you squint really hard you could make that case, but in SQLite they are really not the same. You cannot use SQL syntax and you can't do other operations on cursors other than read the columns for the matching row.

Virtually all other database engines calculate the query results up front. It is more effective in their implementations to do it that way. For example they will also calls to ask how many rows remain in the results. SQLite has no such API and the only way to find out is to actually retrieve each result row.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: