SQLite Release 3.25.0 adds support for window functions (sqlite.org)
333 points by MarkusWinand on Aug 15, 2018 | 115 comments



https://www.windowfunctions.com is a good introduction to window functions.

Besides that, the comprehensive testing and evaluation of SQLite never ceases to amaze me. I'm usually hesitant to call software development "engineering", but SQLite is definitely well-engineered.


Ditto! For anyone who isn't familiar with SQLite's testing procedures, read this[1] fascinating page.

The SQLite project has a mind boggling 711 times more test code than SQLite itself has. Put another way, only 0.1% of the project's code is SQLite itself. The other 99.9% consists of tests for that 0.1%.

[1] https://www.sqlite.org/testing.html


In a similar vein, I've rarely (if ever) seen a library that handles dynamic memory allocation more robustly than SQLite. This page is a glory to behold: https://www.sqlite.org/malloc.html


That page is inspiring. For a commodity, open-source, and old software project, everything is very clearly defined and the language is very definitive, which I hope is indicative of the actual library quality (I've never used SQLite).


Oh that is a fun read! I hadn't seen that before.


While in complete agreement with you on how amazing SQLite's engineering practices are, your math is off by an order of magnitude.

1/711 = 0.00140646976

0.00140646976 ~= 0.14%, not 0.01%.

I'll go put on my "pedant" hat now.


Thanks for catching that!

Although it was more a typo than a math error (as the 99.9% figure was correct). I'll chalk it up to posting when I should be sleeping (4:30am local time).


Good thing too, since it's in just about everything: https://www.sqlite.org/mostdeployed.html


I find it fun to look around whatever room/bus/park that I'm in and try to count the instances of Sqlite.


Is it like curl in that regard?


This looks great, but I couldn't get through the first question on aggregate functions. Are there any SQL books/tutorials that go over things like this?

A lot of material I've seen has been like the classic image of "How to draw an owl. First draw two circles, then draw the rest of the owl", where they tell you the super basic stuff, then assume you know everything.


Design queries iteratively.

Having an understanding of relational algebra (not the symbols, but the concept; math is always about the concept) generally helps a lot as well; it's the reason why compsci database lectures often start with relational algebra.

> We would like to find the total weight of cats grouped by age. But only return those groups with a total weight larger than 12.

The total weight of cats grouped by age.

    SELECT sum(weight), age
    FROM cats
    GROUP BY age
But only return those groups with a total weight larger than 12.

    SELECT sum(weight), age
    FROM cats
    GROUP BY age
    HAVING sum(weight) > 12
Ordered by age.

    SELECT sum(weight), age
    FROM cats
    GROUP BY age
    HAVING sum(weight) > 12
    ORDER BY age
The total weight column should be called total_weight.

    SELECT sum(weight) AS total_weight, age
    FROM cats
    GROUP BY age
    HAVING sum(weight) > 12
    ORDER BY age


(self plug) try pgexercises.com. It's in a very similar vein to windowfunctions - I understand it was one of the original inspirations - but it covers a much wider range of material, with a gradual growth from very basic stuff up to slightly advanced.


I'll vouch for this plug! I've spent quite a bit of time learning SQL (and Postgres specific stuff), and this was one of my favorite resources.

I can also recommend Learning SQL by Alan Beaulieu, which was recommended in some HN comment. After looking into various books, some of them Postgres specific, I found this one to be the best for learning from scratch.


My introduction to window functions (and the best write-up I’ve seen) was through the SQL Cookbook (http://shop.oreilly.com/product/9780596009762.do). I highly recommend.


This is a fantastic book - the name doesn't do it justice.


Try Joe Celko's book "SQL for Smarties" or ...

https://www.red-gate.com/simple-talk/sql/t-sql-programming/w...


Ibis uses windowing functions for aggregations if the database supports them. IDK when support for the new SQLite window functions will be implemented: http://docs.ibis-project.org/sql.html#window-functions

[EDIT]

I created an issue for this here: https://github.com/ibis-project/ibis/issues/1597


The discussion at https://news.ycombinator.com/item?id=17231349 has many more links and descriptions of how to use window functions in practice. (Including my comment at the top about how in practice I find that you usually want to combine window functions with nested queries.)


Thank you everyone for your help!


> but SQLite is definitely well-engineered.

Considering it was written for the U.S. Navy to run software on guided missile destroyers, that makes sense.


For Python folks interested in using these features, check out this post [0], which describes how to compile the latest SQLite and the Python sqlite3 driver. I've also got a fork of the standard-library sqlite3 driver [1] that includes support for user-defined window functions in Python, which may interest you.

[0] http://charlesleifer.com/blog/compiling-sqlite-for-use-with-...

[1] https://github.com/coleifer/pysqlite3


The query optimizer improvements are pretty cool too.

Though I don't really understand this one: "The IN-early-out optimization: When doing a look-up on a multi-column index and an IN operator is used on a column other than the left-most column, then if no rows match against the first IN value, check to make sure there exist rows that match the columns to the right before continuing with the next IN value." I would think there's no need to check the right column(s) if the leftmost one has no match...


Suppose your query is:

   SELECT * FROM tab WHERE key1=1 AND key2 IN (2,3,4,5);
SQLite starts by doing a single b-tree lookup on the index for the composite key (1,2), composed from the key1 value and the first possibility from the key2 IN list. If that works, then it proceeds to look up (1,3), (1,4), and (1,5). But if the (1,2) lookup fails, then it backs off and tries just (1,*) to see if that prefix matches anything at all. If (1,*) finds any record, the search proceeds with (1,3), (1,4), etc. But if (1,*) fails, the search stops immediately.

The insight here is that a multi-column key value can be resolved using a single binary search. It is not a sequence thing where we first look for the key1=1 and then do a separate lookup in a subtree for key2. Both key1 and key2 are resolved in the same binary search.
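
For concreteness, the scenario above assumes a composite index covering both columns, something like this (sketch, using the hypothetical tab table from the query above):

    CREATE INDEX tab_key1_key2 ON tab(key1, key2);

With that index, each (key1, key2) pair is a single composite key in one b-tree, which is why one binary search resolves both columns.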


This is what I expected the optimization to be, except I'm still not sure that I understand the wording of "that match the columns to the right". I'd expect that to be "that match the columns to the left"; after all, you're checking the existence of (1,*), not of (*,2) or (*,3).


That makes sense. I completely misunderstood the release note. Thanks for the clarification.

The insight is very interesting. I always thought there would be 2 separate binary searches...


A bit offtopic but has anyone tried to replicate sqlite databases? Using rqlite https://github.com/rqlite/rqlite or something else?


I am interested in this topic as well, although sqlite explicitly says that it is not meant for client-server configuration. I still want to see if it is feasible and someone is using it in production.


What they mean by that comment is that it doesn't have users in the sense that MySQL, SQL Server, etc. have: https://www.sqlite.org/whentouse.html The documentation says that it works well for websites. For client-server use, you just need to make an API that handles the interactions with the database.


I got thrown into a legacy web project that used sqlite as the database. It was a small internal-only app, I guess the original developer(s) figured it was so small that sqlite would be plenty and it would reduce the environment complexity.

Unsurprisingly they were wrong. It was small, but sqlite couldn't handle multiple users. I believe this was before sqlite had WAL support, so reading would lock the DB. The 'solution' was to split the sqlite database into many smaller DBs that would allow users to use the site at the same time as long as they were in different areas. This greatly added to the complexity of the application. Some reports would need to access multiple databases to get what they needed, so they would still lock people out. Complexity was much higher than having all the data in a single postgresql/mysql database. All the users hated the system and often ran into DB lock issues.

> sqlite explicitly says that it is not meant for client-server configuration

They are right and their advice should be heeded.


Since 2010, SQLite has had Write-Ahead-Logging. Perhaps your project was using an older version of SQLite? https://www.sqlite.org/wal.html


Expensify uses SQLite for their core database.

https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...


Interesting discussion of Bedrock:

https://news.ycombinator.com/item?id=12739771


Most applications don't need the level of concurrency that Pg or MySQL provide and they will be just fine with something simple to operate, which is why I wrote https://redisql.com/


Shameless plug since I am the author, but you could be interested in https://redisql.com/

Basically a Redis module that embeds SQLite. I offer replication in the PRO version; if you want to try it out, you can download the trial version for free.

If the trial version is not enough, send me an email and we could work something out ;)


Just a heads up, you should probably have someone proofread your copy. There is a typo in every paragraph.


Thanks, I will proceed soon!


Almost all my main apps have a sync history. Before, I was using batch processing and a version field:

customer: name, ...., version = 1

update customer: name, ...., version = 2

and sync manually. Now I use a log/event-sourcing-like setup, where every write to the DB is also stored in a log table, and I sync from there. Much better!


Make your logged events invertible and get an undo/redo function for free. That's how I kill three to four birds with one stone.


invertible? What is that?


Say you record an event like "name of customer X has been changed to FooCo". Instead, record "name of customer X has been changed to FooCo from Foo & Sons Co". If you want to undo it, just swap from/to (=invert it).
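
A minimal sketch of what such an event row could look like, with hypothetical table and column names:

    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        entity TEXT,        -- e.g. 'customer:X'
        field TEXT,         -- e.g. 'name'
        old_value TEXT,     -- 'Foo & Sons Co'
        new_value TEXT      -- 'FooCo'
    );

Applying new_value replays the event; applying old_value (the same code with the columns swapped) undoes it.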


A common pattern to implement this is an audit table: https://dba.stackexchange.com/questions/15186/what-is-an-aud...

...which is itself a special case of an audit trail or log: https://en.wikipedia.org/wiki/Audit_trail
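
A minimal trigger-based sketch in SQLite, assuming a hypothetical customer table with a name column; recording both OLD and NEW values keeps the audit rows invertible, as in the sibling comment:

    CREATE TABLE customer_audit (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        old_name TEXT,
        new_name TEXT,
        changed_at TEXT DEFAULT (datetime('now'))
    );

    CREATE TRIGGER customer_name_audit
    AFTER UPDATE OF name ON customer
    BEGIN
        INSERT INTO customer_audit (customer_id, old_name, new_name)
        VALUES (OLD.id, OLD.name, NEW.name);
    END;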


In my experience the usefulness of audit tables diminishes as the table becomes more complex (more columns; composite and document columns, like arrays, json, xml), because it becomes difficult to find what exactly changed. Another problem is related rows; audit tables aren't very good to capture the context of changes. You can work around this with more complex triggers, but it quickly becomes a lot easier to do this from the application and not in the database, capturing semantics and not just data changes.

The issue with any application-based approach is of course that it can't provide total access mediation.


Bedrock [1] by Expensify is supposed to be solid. Master/slave async replication (i.e. all slaves are read only), pluggable, provides full SQLite access via the MySQL protocol.

[1] http://bedrockdb.com


Related: Is there a ruby client for rqlite? I can't find one.


A bit off topic, but would be great to use SQLite in the browser instead of IndexedDB.

I love relational databases, but you're almost forced into a NoSQL approach when developing a SPA since the client (browser) only supports simple key -> value storage. It would be a dream to use LINQ-to-SQL, or similar type safe query DSLs like Slick or Quill (Scala), or Esqueleto (Haskell) in the browser.

Combine that with a single language driving the backend and frontend and voila, no duplication of model, validation, etc. layers on server and client.

One can dream I guess, but the reality is NoSQL fits the modern web app like a glove, for better or worse.


> A bit off topic, but would be great to use SQLite in the browser instead of IndexedDB

That almost happened. There was a thing called WebSQL [1] that the W3C was working on to add SQL to the browser. Everyone who implemented it used SQLite.

Apparently, that disqualified it from standardization. To move ahead, they wanted to see independent implementations of the standard.

No browser makers stepped up to reduce the quality of their implementation by replacing some of the best-designed, best-written, best-tested code on the planet with some other SQL backend to satisfy the committee, and so Mozilla was able to push IndexedDB as the standard browser DB interface.

[1] https://en.wikipedia.org/wiki/Web_SQL_Database


> Everyone who implemented it used SQLite

They had to, since what was standardized was specifically the SQL dialect of SQLite v3.6.19.

> No browser makers stepped up to reduce the quality of their implementation by replacing some of the best designed, best written, best tested code on the planet with some other SQL back end to satisfy the committee

There were only two implementations at all: WebKit and Opera. Mozilla and Microsoft weren't going to implement it without a spec decoupled from a particular backend.


Ironically, it’s my understanding that many browsers use SQLite under the hood for IndexedDB storage.


This almost happened. Ten years ago "WebSQL" was being considered as a standard - it would have introduced SQLite-backed databases as a built-in browser feature: https://en.wikipedia.org/wiki/Web_SQL_Database

The W3C dropped it in November 2010 because they were uncomfortable supporting a web standard which was entirely tied to a single implementation of SQL (SQLite). I adore SQLite but I think the W3C made the right call there - web standards need more than one compatible implementation.


> I adore SQLite but I think the W3C made the right call there - web standards need more than one compatible implementation.

could you elaborate on that? I get the impression that now, eight years later, we're much worse off. I don't really see the upside of not going for what is, AFAIK, a relatively vanilla implementation of SQL.


A web standard needs to have more than one implementation. If you allow one single implementation to define the standard, you get something like Flash.

I don't have a particularly strong practical argument here: for me this is more of a fundamental principle for how the web should work.


Flash was a closed source plugin that was added to the browser as a binary blob. It could be updated by the user, and because it frequently had security bugs users were frequently urged to update. Those updates also often introduced new features, which content providers started taking advantage of once a significant number of people updated, putting pressure on the rest to update too.

SQLite is open source that would have been compiled into the browser by the browser vendor. A typical user would not have the means to update it separately from the rest of the browser.

So why couldn't W3C pick a particular release of SQLite, list a specific subset of its features, and declare that this, as described in the documentation for that specific version, is the standard?


> web standards need more than one compatible implementation

Seems like a similar situation to WebGL. I think they would have looked past implementation count if the feature was really needed.


https://github.com/kripken/sql.js/

I have used this and it is slow. But it was interesting!


> Named window-defn

<3

I really prefer that syntax and I’m always a little sad when I have to copy-paste windows across average, total, count, standard deviation, max, min… I fully admit that it’s syntactic sugar but it’s the elegant kind.
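
For anyone who hasn't seen the named form, a sketch reusing the cats table from upthread: the WINDOW clause defines the window once, and every aggregate refers to it by name.

    SELECT avg(weight)   OVER w,
           sum(weight)   OVER w,
           count(*)      OVER w,
           max(weight)   OVER w,
           min(weight)   OVER w
    FROM cats
    WINDOW w AS (PARTITION BY age ORDER BY weight);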


I’d really really like if they improved “alter table” to include dropping/renaming columns/constraints, even if it required rewriting the whole table.


There is nothing stopping you from doing it yourself. The Sqlite FAQ[1] even has an entry on it.

Two things are holding this back: 1. the code being inside or outside SQLite would not be much different; 2. the amount of additional test code would be humongous for an operation of that level of complexity.

So, just do it yourself. It's not that hard.

[1]: https://www.sqlite.org/faq.html#q11


A lot of table alterations can be done without rebuilding the table, though perhaps a table scan might be needed to validate constraints (this could optionally be a separate statement). Doing this safely requires the RDBMS to support the functionality.


> There is nothing stopping you from doing it yourself.

Good database migration tools (e.g. Alembic) do this automatically.


Any idea on how to do this without blocking writes when SQLite is embedded in a server process?


Well if you don't mind rewriting tables, just create a table in the new structure, insert data from the old table to the new one, drop the old table, and rename the new table.
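
A sketch of that dance, assuming a hypothetical cats table from which we want to drop a weight column:

    BEGIN;
    CREATE TABLE cats_new (name TEXT, age INTEGER);  -- new shape, weight dropped
    INSERT INTO cats_new SELECT name, age FROM cats;
    DROP TABLE cats;
    ALTER TABLE cats_new RENAME TO cats;
    COMMIT;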


That’s not enough because it ignores all foreign key constraints.


Lock db exclusively, disable foreign key checking, rebuild and replace table, reconfigure foreign keys (they are gone after delete and rename), and everything should be fine.
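
In SQLite terms, roughly (note that PRAGMA foreign_keys is a no-op inside a transaction, so it has to be set before BEGIN):

    PRAGMA foreign_keys = OFF;
    BEGIN EXCLUSIVE;
    -- rebuild and swap the table as in the rebuild sketch above
    COMMIT;
    PRAGMA foreign_keys = ON;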


One problem is that you then might need a special case for SQLite in migrations which can lead to different db schemas in tests and production


Use the same database for test and production. Nowadays, it is really easy to install PostgreSQL or MySQL locally or in your test environment. SQLite is great, but not for replacing PostgreSQL or MySQL during tests.


Any idea on how to do this without blocking writes when SQLite is embedded in a server process?


Create new empty table, set up trigger on old table to copy across any inserted/updated rows, backfill the remaining rows incrementally, and finally use ALTER TABLE to atomically replace the old table with the new one.
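
One possible shape for that, with a hypothetical cats table; :last_copied is a bound parameter tracking backfill progress, and updates/deletes on the old table would need similar triggers in a real migration:

    CREATE TABLE cats_new (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);

    -- mirror new writes into the new table:
    CREATE TRIGGER cats_mirror AFTER INSERT ON cats
    BEGIN
        INSERT OR REPLACE INTO cats_new VALUES (NEW.id, NEW.name, NEW.age);
    END;

    -- backfill old rows in small batches, each in its own short transaction:
    INSERT OR IGNORE INTO cats_new
        SELECT id, name, age FROM cats
        WHERE id > :last_copied ORDER BY id LIMIT 1000;

    -- finally, swap atomically:
    BEGIN EXCLUSIVE;
    DROP TRIGGER cats_mirror;
    DROP TABLE cats;
    ALTER TABLE cats_new RENAME TO cats;
    COMMIT;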


You're right, it should work!

When you wrote "backfill the remaining rows incrementally", did you mean having some background process or thread that reads a batch of rows from the old table (for example a few thousands of row), then inserts them in the new table, then commits, and keep doing this until all rows are copied? This way, other writers will be blocked only for the duration of each incremental transaction and will have an opportunity to lock the database for themselves between two batches?

Do you have experience doing this on production servers?


Unrelated to window functions, but I finally took the time to start digging into SQLite's internals. People always sing its praises, so it was time to see what all the fuss was about.

Someone else already mentioned that the vast majority of SQLite's codebase is tests. Well, on top of that, of the real "working" codebase I'd say the majority of it is comments. It's incredible. The source is more book than code. If you have any when, why, or how question about SQLite, I guarantee it's answered in the code comments (or at least one of the hundreds of superb documents on their website).

Another surprise I discovered: SQLite has a virtual machine and its own bytecode. All queries you execute against a SQLite database are compiled into SQLite's own little bytecode and then executed on a VM designed for working with SQLite's database. Go ahead, start `sqlite3 yourdb.sqlite` and then run `explain select * from yourtable;`. It'll dump the bytecode for that statement; or any statement you put after `explain`. So cool!

In hindsight, it makes a lot of sense, and a well built VM can be nearly as efficient as any other alternative.

https://sqlite.org/arch.html

https://sqlite.org/opcode.html

Fun bit of history. The VM used to be stack based, but now it's register based. I guess they learned the same lessons the rest of the industry learned over that time period :P

(N.B. the bytecode is for internal use only; it's not a public facing API. You should never, ever use bytecode directly yourself.)

There are some painful parts of the codebase though. These aren't "cons" per se. More like necessary evils.

1) It is filled to the brim with backwards compatibility hacks that make the code more complex than it strictly needs to be. (Most of these are the result of various users of the library misusing the API. The SQLite devs are generous enough to grandfather in the bugs that made those applications work. That's excellent, but it definitely makes the code more "crusty".)

2) One of SQLite's big features is its flexible memory subsystem. It handles OOM, and provides an API for completely customizing the memory subsystem. But given that this is C and memory allocation and interaction is pervasive, the code ends up littered with allocation calls and error-handling clauses. Handling OOM is no small task, and often how to handle the OOM is different in different places. So you can imagine the complexity that adds to the codebase.

Again, those are necessary evils, so it's not something I'm "complaining" about. But I thought they were worth mentioning for fellow adventurers like me who decide to dive in (which I highly recommend).

So, thanks to how well designed SQLite is overall, and their great documentation, I was able to write a parser in Rust for the SQLite file format in a handful of hours (https://sqlite.org/fileformat2.html). The file format is surprisingly simple. I'm now writing a Cursor to walk the tables, which is a fun exercise of classic B-Tree algorithms.


> It is filled to the brim with backwards compatibility hacks that make the code more complex than it strictly needs to be

They finally ended sqlite4 as an architectural misadventure, but perhaps they should try again with the current codebase/features, with all of the backwards cruft jettisoned.


According to [1], windowing functions make SQL Turing complete.

Does this make SQLite Turing complete, or has it been Turing complete before?

[1] http://beza1e1.tuxen.de/articles/accidentally_turing_complet...


This is super cool.

Does anyone know how to upgrade python3's sqlite module to the latest version?


Use APSW instead of sqlite3: https://rogerbinns.github.io/apsw/


You can use https://github.com/coleifer/pysqlite3

It even supports user-defined window functions using the new SQLite APIs.


Charles provided really great documentation on how to build this here: http://charlesleifer.com/blog/compiling-sqlite-for-use-with-...

Or if you're feeling lazy (like I was), there's a fork of his library at https://github.com/karlb/pysqlite3 which compiles the 3.25.0 amalgamation by default. This worked for me:

    $ python3 -mvirtualenv venv
    $ source venv/bin/activate
    $ pip install git+git://github.com/karlb/pysqlite3
    Collecting git+git://github.com/karlb/pysqlite3
    ...
    Installing collected packages: pysqlite3
    Successfully installed pysqlite3-0.2.0
    $ python
    Python 3.6.5 (default, Mar 30 2018, 06:41:53) 
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pysqlite3
    >>> pysqlite3.connect(":memory:").execute("select sqlite_version()").fetchall()
    [('3.25.0',)]


Is this going to be an issue?

Or just a benefit?

I've had Python libraries break with updates.


This is awesome! Does anyone know if this update will affect the sqldf package in R?


Not sure if SQLite is still a 'lite' database


I've just built the snapshot. The binary is still less than 2MB.

And it works:

  sqlite> with t(x) as (values (1), (2)) select sum(x) over (order by x) from t;
  1
  3


I just checked a stripped copy of postgresql-10 built with all options on and it is 7MB. That could be reduced a little by leaving out language support and ssl support etc.


While binary size could certainly be a factor in a good deal of embedded environments, we also need to look at the resource requirements of the binary in question as well.

Sqlite doesn't need too much more memory than what its binary needs, whereas with postgresql you need all sorts of bells and whistles just to get the database system to boot.


Well, sure, SQLite is smaller than postgres, but database system performance is largely dictated by the size of the buffer cache. For something like a configuration database (a great use of SQLite) this does not matter. But for more than a very modest amount of data, more memory for buffers will benefit both postgresql and SQLite.

Anyway, not trying to make the case that postgresql is small compared to SQLite, it obviously isn't, just wanted to point out that it's not _that_ big either.


Do you have any tutorials on stripping language and SSL support?

Is it possible to dynamically link the system OpenSSL to reduce size?


If you look at the build instructions you will see that in the ./configure step you can enable and disable different features. Try ./configure --help to get a list.

It is dynamically linked to openssl, but you can configure out the internal code to support the functionality.

"strip" however meant the unix strip command to remove debug symbols.


And how feasible would it be to embed it in a mobile device? Too crazy, I imagine...


Insightful. Thanks!


To me, what makes it 'lite' is that it doesn't have a client-server architecture that requires a running daemon. It's just a file format and a library designed to interact with it.


I just had a look at the executable of my latest - extremely simple - webapp. A 64 bit elf, statically linked with Musl libc and SQLite, written in Nim. 1.4MB all told, with no optimizations whatsoever. In the year 2018 AD, we count that as light.


Interesting, I noticed how big my database file was as well.

I considered the Lite meant offline storage.

Still a huge fan, I went from noob programmer to knowing databases because how quickly I could make and play with databases in sqlite.


The "lite" refers to things like:

   - it only has b*-tree indexes
   - it only has one index per-table source


> it only has one index per-table source

I don't know for sure what this means, but it sounds like it is incorrect.


I could swear I've seen you say this. Maybe I'm mis-remembering? ISTR it was that for each table source in a query SQLite3 uses just one index (or the table itself) for indexing or scanning to find relevant rows in that table source.


SQLite can use multiple indexes if there are OR terms in the WHERE clause.

SQLite tries to only use indexes in situations where they help the query run faster. SQLite is not limited in its use of indexes. It is just that the use of multiple indexes for a single FROM-clause term is rarely helpful.


Ah, OK, thanks!


In the category of SQL databases, it certainly is. I mean it's not called LiteDB for a reason I suppose ;).


I don't even know what that means but again, because it is SQLite, I upvote automatically ^_^


There's a whole category of analysis queries that are really easy to write with window functions and very annoying to write without. They help eliminate nested subqueries and self-joins. This should make SQLite a better choice for data analysis and reporting.
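
As a sketch of the kind of query that gets easier (reusing the hypothetical cats table from upthread): comparing each row to its group's aggregate used to require a self-join or correlated subquery, but with a window function it's one pass:

    SELECT name, weight,
           weight - avg(weight) OVER (PARTITION BY age) AS diff_from_group_avg
    FROM cats;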


SQLite guys, please add FDW support ala Postgres and easy foreign function support for Python and R, and you’ll corner most of analytics and data science.


SQLite has had FDW since forever. https://sqlite.org/vtab.html

SQLite has had user-defined functions since forever. https://sqlite.org/c3ref/create_function.html

Another underappreciated feature (while we're at it) would be WAL instead of undo-journaling (https://www.sqlite.org/wal.html), which enables concurrent reading and writing of SQLite databases. Has been available for some ten years or so, but is off by default.
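
Enabling it is a single pragma, and the setting is persistent (it sticks to the database file across connections):

    PRAGMA journal_mode=WAL;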

Functions are available through the Python bindings (which are not maintained by the SQLite project); virtual tables, I think, aren't. Alternate bindings (https://rogerbinns.github.io/apsw/) claim to achieve "Everything you can do from the SQLite C API" interop, which would include virtual tables.


Having written a fairly significant amount of SQLite virtual table module code, I think it isn't quite sophisticated enough to be useful. I regret building my application atop that virtual table functionality. Many parts of queries that are vital for efficient execution are not pushed down to the virtual table provider; for instance there is no way for "SELECT COUNT(*) FROM table" to do anything but retrieve every field of every row and return them to the SQLite engine, because the interface does not push the "COUNT" down to the virtual table provider. Even if your underlying data store can trivially answer the question directly, you have to pull all the data and let SQLite compute the answer. So on and so forth.


Out of curiosity, why would you ever run a SELECT COUNT(*)? What use case demands that versus, say, SELECT COUNT(col) where col is the name of a column in the table?


That was just a simplified example. "SELECT COUNT(col) FROM ..." has the same problem: from the virtual table provider's point of view, the query you submitted is "SELECT col FROM ...". There's no way for it to avoid retrieving the data even if it has a faster way just to count it. This is just one small example -- the virtual table interface is just too simplistic to support a lot of real-life scenarios.

Another good example is joins -- they aren't pushed down, and SQLite can only do joins as nested loops. All you can do is stream raw data into the SQLite engine and allow it to perform the join, even if you can do it more efficiently yourself.

In my case I was passing queries to another database engine, so every single thing that could be pushed down to the virtual table provider, I would have been able to take advantage of. But not enough stuff is.


Eh? If you want to know the number of rows in a table, SELECT COUNT(*) is the idiomatic way.


No. That’s what one may write the first time they’re learning what a SELECT statement is, but it’s most certainly not what you’d ever do in anything near production code.

The parent made this statement:

> ... for instance there is no way for `SELECT COUNT(star) FROM table` to do anything but retrieve every field of every row and return them to the SQLite engine.

The only response here is, “Of course! That’s what you told it to do! Why would it do anything else—and why would you ever tell it to do that?!”

Edit: To be clear and not sound as though I’m just being dismissive, you can of course use `COUNT(star)`, but the parent made it sound like this was something non-trivial. There are better ways to write such a query when you care about the exact count of rows and, assuming you can’t depend on grabbing the max primary key due to deletions, you’re complaining about the database wasting cycles pulling back irrelevant info. Get to know your database, as it often has features to query such metadata in a much smarter and more performant way.

PS: I replaced the asterisk with the word “star” because it’s making the formatting of this comment atrocious. Apologies for all the edits. I give up now.


I'm sorry, you've misunderstood. I'm fully aware that you can query it more efficiently, and in fact that's the point. If you're writing a virtual table provider, SQLite does not provide enough information about the actual user's query for you to know that you can do it more efficiently. There's no way to know that the user wants a count at all; the count function is not pushed down into the virtual table provider interface. SQLite only tells you that it needs you to retrieve every field in every row. You can't know that, in actuality, SQLite only wants to count the rows.


Ah. My apologies. I did misunderstand what you were calling attention to.


Not only that, GNOME Data Access (GDA) uses this and has virtual tables for accessing disparate remote databases (including LDAP), so we know this works. What u/usgroup may be asking for is for a bunch of virtual tables to be included in SQLite3 proper.


As far as I know you already can extend SQLite with custom scalar and aggregate functions and virtual tables (FDW) at least in Python (with apsw). Am I missing something?


I might be missing something, but it sounds to me like you're describing this: https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...


SQLite competes with fopen; not SQL. [1] It's great for embedded systems and small single-user apps. It's not when your data doesn't fit in memory.

[1]: https://www.sqlite.org/whentouse.html


Many analytics use cases are single user. I’ve often thought you could do worse than SQLite as a first pass at a dataframe implementation. And SQLite is a great fit for a range of analyses on a single-user computer where you’re looking to sample from or calculate aggregations from data that fits on harddisk but not in RAM.

Now, where SQLite starts to fall down in analytics workloads is that it’s row-oriented rather than column oriented. Performance could be better. Still, even for analytic workloads SQLite can be good enough for medium-sized data!


I dunno... In my neck of the woods, datasets larger than anything a hard drive will store are commonplace, making SQLite a non-starter. I can imagine SQLite being useful for prototyping or for toy apps or for one-offs with small datasets, but not much else...

(Edit to clarify: I think it's awesome for embedded apps. But it never struck me as a good choice for data science.)


Cool, but in between data that fits in RAM and data that requires exotic storage and computing surely you can at least imagine data of this mid-sized variety.

For that, SQLite isn’t necessarily bad.


I’d argue what they would really need for that is tighter integration with Pandas (Python) and data.table (R), though that would be nice too.

Clever support for multilevel indexes would be top of my wishlist.



