SQLite Is Serverless (sqlite.org)
501 points by alexellisuk on Jan 26, 2020 | 440 comments



I think a good, under-appreciated use case for SQLite is as a build artifact of ETL processes/build processes/data pipelines. It seems like a lot of people's default, understandably, is to use JSON as the output and intermediate results, but if you use SQLite, you'd have all the benefits of SQL (indexes, joins, grouping, ordering, querying logic, and random access) and many of the benefits of JSON files (SQLite DBs are just files that are easy to copy, store, version, etc. and don't require a centralized service).

I'm not saying ALWAYS use SQLite for these cases, but in the right scenario it can simplify things significantly.

Another similar use case would be AI/ML models that require a bunch of data to operate (e.g. large random forests). If you store that data in Postgres, Mongo or Redis, it becomes hard to ship your model alongside updated data sets. If you store the data in memory (e.g. if you just serialize your model after training it), it can be too large to fit in memory. SQLite (or another embedded database, like BerkeleyDB) can give the best of both worlds-- fast random access, low memory usage, and easy shipping.


I have been using SQLite as a format to move data between steps in a complicated batch processing pipeline.

With the right pragmas it is both faster and more compact than JSON. It is also much more "human readable" than gigabytes of JSON.

I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.


> I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

The sqlite3_deserialize() interface was created for this very purpose. https://www.sqlite.org/c3ref/deserialize.html
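For what it's worth, Python's stdlib has exposed this since 3.11 as Connection.deserialize(). A rough sketch (the URL is a placeholder):

    import sqlite3
    import urllib.request

    # Fetch the database bytes over HTTP (placeholder URL).
    data = urllib.request.urlopen("https://example.com/db.sqlite").read()

    # Wraps sqlite3_deserialize(); requires Python 3.11+.
    conn = sqlite3.connect(":memory:")
    conn.deserialize(data)

    for row in conn.execute("SELECT name FROM sqlite_master"):
        print(row)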


If the language's sqlite bindings don't offer a way to load a database from a string, if you're on a modern linux kernel (3.17+) you can make use of the memfd_create syscall: it creates an anonymous memory-backed file descriptor equivalent to a tmpfs file, but no tmpfs filesystem needs to be mounted and there's no need to think about file paths.
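From Python, that trick might look like this (os.memfd_create needs Python 3.8+ and Linux; untested sketch, placeholder URL):

    import os
    import sqlite3
    import urllib.request

    data = urllib.request.urlopen("https://example.com/db.sqlite").read()

    # Anonymous memory-backed file descriptor: no tmpfs mount, no path cleanup.
    fd = os.memfd_create("db.sqlite")
    os.write(fd, data)

    # SQLite can then open the descriptor through its /proc alias.
    conn = sqlite3.connect(f"/proc/self/fd/{fd}")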


You can use the memvfs module to load in-memory databases if you're using the C API. I'm not sure how many higher-level APIs support it though.

[1] https://stackoverflow.com/a/53453338/3063 [2] https://www.sqlite.org/loadext.html#example_extensions [3] https://www.sqlite.org/src/file/ext/misc/memvfs.c


A very interesting approach is sqltorrent (https://github.com/bittorrent/sqltorrent): the sqlite file is shared in a torrent, and all queries will touch a specific part of the file, which is downloaded on-demand.

Also check https://github.com/lmatteis/torrent-net


Incredibly odd, but so awesome


  $ mount -t tmpfs none /some/path
  $ cp db.sqlite /some/path/db.sqlite
  $ sqlite3 /some/path/db.sqlite

We've been abusing tmpfs for more than 10 years to get around the IO layer's failings. It's probably still a valid pattern.


This is amazing, I think you may have just solved and headed off a huge number of odd problems for me.

Could you talk more about what pragmas you've been using and why?


Not the OP, but I find `PRAGMA synchronous = OFF` makes the creation of DBs vastly faster ...
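A fuller set I've seen used when building throwaway artifacts (unsafe if the process dies mid-write, but fine when the output can simply be rebuilt):

    import sqlite3

    conn = sqlite3.connect("artifact.sqlite")
    # Trade crash-safety for speed; acceptable for rebuildable outputs.
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA temp_store = MEMORY")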


> I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

Ramfs?


tmpfs is the better-behaved option should you run out of resources, see:

https://www.jamescoyle.net/knowledge/951-the-difference-betw...

I'm still remembering old-school ramdisks under Linux which were finite in both number and size, both to quite small extents. I think there were 8 (or 12 or 16?) total ramdisks available, of only 2-4 MB each, configurable with LILO boot options.

That's now ... mostly taking up valuable storage in my own brain for no useful effect.


It looks like a good intro, thanks. I wasn't aware of these technologies, but I knew it was possible to build an FS in RAM. So I just put these two keywords together.


FWIW, I learned a few things researching my answer.

(A prime validation for answering questions, BTW.)

My first read was that the old-school ramfs / ramdisk limitations still held. I can't actually even find documentation on them, though I'm pretty sure I'm not dreaming this.

Circa the 2.0 kernel, IIRC, possibly earlier.

OK, some traces remain, see:

https://www.tldp.org/HOWTO/Bootdisk-HOWTO/x1143.html

Note that this is OBSOLETE information.


What pragmas do you use? It sounds amazing!


Using SQLite in my ETL processes is something I have done for over a decade. It's just so convenient and, at the end, I have this file that can be examined and queried to see where something might have gone wrong. All of my "temporary" tables are right there for me to look at. It is wonderful!


Yes! Along these lines I heartily recommend `lnav` ^1, a fantastic, lightweight, scriptable CLI mini-ETL tool w embedded sqlite engine, ideally suited for working with moderately-sized data sets (ie, millions of rows not billions) ... so useful!

1. https://lnav.org


I have used it to inspect, say, the history of a user's requests on a load-balanced server. I like to permanently store the results of the logfile excerpt to a DB table for posterity and future reporting.

Figuring out how to enter "sql" mode in lnav, generate a logfile table, and then persist it from an in-memory sqlite db to a saved-to-disk sqlite db .... was frustratingly annoying.

It boils down to:

    :create-logline-table custom_log
    ;ATTACH DATABASE 'test02.db' AS bkup;
    ;create table bkup.custom_log as select * from custom_log;
    ;detach database bkup;
If I recall, you cannot call sqlite commands like ".backup" in lnav's SQL mode. So lnav's interjection into the sqlite command processing is annoying (I'm actually very familiar with sqlite).


Would you mind elaborating on your ETL process a little more? I'm a junior DE and curious about how I would implement this.


It's pretty straightforward, really.

I construct the .sqlite database from scratch each time in Python, building out table after table as I like it.

Some configuration data is loaded in from files first. This could be some default values or even test records for later injection.

The input data is loaded into the appropriate tables and then indexed as appropriate (or if appropriate). It is as "raw" as I can get it.

Each successive transformation occurs on a new table. This is so I can always go back one step for any post-mortem if I need to. Also, I can reference something that might be DELETEd in a later table.

Often (and this is task-dependent), I will have to pull in data from other server-based databases, typically the target. They get their own tables. Then I can mark certain records as not being present in the target database, so they must be INSERTed. If a record is not present in my input and is there in the target, that would suggest a DELETE. Finally, I can compare records where some ID is present in my input and my .sqlite, they might be good for an UPDATE. All of this is so I can make only the changes that need to be made. Speed is not important to me here, only understanding what changes needed to be made and having a record of what they were and why.
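In SQL terms, that comparison is just a few outer joins. A sketch (table and column names here are illustrative, not my actual schema):

    import sqlite3

    conn = sqlite3.connect("etl_run.sqlite")
    conn.executescript("""
        -- rows in my input but not the target: candidates for INSERT
        CREATE TABLE to_insert AS
            SELECT i.* FROM input i
            LEFT JOIN target t ON t.id = i.id
            WHERE t.id IS NULL;

        -- rows in the target but not my input: candidates for DELETE
        CREATE TABLE to_delete AS
            SELECT t.* FROM target t
            LEFT JOIN input i ON i.id = t.id
            WHERE i.id IS NULL;

        -- rows in both where the payload differs: candidates for UPDATE
        CREATE TABLE to_update AS
            SELECT i.* FROM input i
            JOIN target t ON t.id = i.id
            WHERE i.payload <> t.payload;
    """)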

I am happy to say that an ETL process I wrote using this general method back around 2009 is probably still running. I haven't had to touch it in years. Occasionally I will receive questions as to "why did this happen?" and I can just start running queries on the resultant .sqlite database file, kept with the logs, for answers.

Similarly, I can use these sorts of techniques when I am analyzing other datasets. The value here is that I can just refresh one table when the relevant data comes in, rather than having to run the ingest process for everything all over again. This can save me a lot of time.


Awesome - elegantly simple using very common technologies.


I am not a very talented programmer so I stick very close to what is common, standard, and easy to understand. It usually means I am on the downslope of the hype cycle and it limits some opportunities but I have become okay with that.

I have gotten some CS students who were about to shoot flies with various cannons turned on to SQLite. I kept a couple of the decent books about it nearby and would shove it into their hands at that point. Usually a week later they would be raving about it.


Do you still have the titles of those books at hand? I'd love to take a look at them.


They are The Definitive Guide to SQLite by Mike Owens and Using SQLite by Jay A. Kreibich. I am quite sure they are more book than I needed, I only plumbed a fraction of SQLite's immense capabilities.


Do you generate the file from scratch every time or do you modify the previous one as new data arrives?


Depends on what you want... if you have a separate db project, you can have the output of that project be a clean database for testing other things, or a set of migration scripts for existing deployments.

I've been working on doing similar with containerized database servers for testing, while still having versioned scripts for prod (multiple separate deployments).


It is a bit of a hybrid.

In the early stages of development of whatever the ETL process is, I keep the database and just empty it out each time. As I got more of a sense of what I needed, I started DROPing my TABLEs more often and remaking them. Eventually I would make the whole database from scratch once I was along the way and had most everything fleshed out.


Ok. So each export is a full dump, not a delta on a previous one.

Do you anticipate hitting a wall at some point where the total time becomes a problem?


Well, it depends on the process. Some were full dumps, some were deltas pushed up to the final database, sometimes both (this product in particular had a load-from-file capability that you were supposed to use, but it had some edge cases that were not well addressed).

No, the time never grew significantly.

For one of the analysis projects, just one step of the analysis was quite time consuming but it would have been that way no matter what. SQLite allowed me to let it grind away overnight (or even over a weekend) on a workstation without tormenting production servers.


We do something like this; one of the outputs of the data pipeline is an sqlite file that's deployed nightly along with code to App Engine. The sqlite stuff is all read only, read/write data for the app is stored in firestore instead.

We initially used json but ran in to memory issues; sqlite is more memory efficient and being able to use SQL instead of the wild SQL-esque is both faster and more reliable.


Yes, I have been doing same thing, only with LMDB.

I do not think LMDB could load from an in-memory-only object (as it has to have a file to memory-map), however.

But same design reasons, I wanted something that

a) I can move across host architectures

b) something that can act as key-val cache, as soon as the processes using it are restarted (so no cache hydrating delay)

c) something that I can diff/archive/restore/modify in place

We tested SQLite for the above purpose at the time, and on writing speed and ( b ), LMDB was significantly faster.

So we lost the flexibility of SQLite, but I felt it was a reasonable tradeoff, given our needs.

I also know that one of Intel's Python toolkits for image recognition/AI uses LMDB (optionally) to store images, so that processing routines do not have to incur the cost of directory lookups when touching millions of small images. (forgot the name of the toolkit though)…

Overall, this is a very valid practice/pattern in data processing pipelines, kudos to you for mentioning it.


"wild SQL-esque" should have been "wild SQL-esque thing I wrote to query the JSON"


I've wondered about this too, but have not gotten around to trying it yet.

We get a gnarly csv log file back from our sensors in the field, which is really a "flattened" relational data model. What I mean by that is a file with "sets" of records of various lengths, all stacked on top of each other. So, if you open it in Excel, (which many users do), the first set of 50 rows may be 10 columns wide, the next 100 rows will be 20 columns wide, the next 45 wide, etc. And, the columns for each of these record sets have different names and data types.

Converting to JSON is obvious, but I've thought about just creating a SQLite file with tables for each of the sets of records. Then, as others have said, you can use any number of tools to easily query/examine the file. Also can easily import into a pandas data frame.

One concern is file size. Any comments on this? I can try it, but wonder if anyone knows off the top of their heads if a large JSON file converted to a SQLite file would be a lot larger or smaller?

edit: clarity


Yes, it is great for that.

You only have to read the CSV file once, and after that you have a nice set of tables you can query any which way you want.

I use SQLite as an intermediate step between text files and static HTML, for example.
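The ingest step is only a few lines of Python (a sketch; file, table, and column names are invented):

    import csv
    import sqlite3

    conn = sqlite3.connect("data.sqlite")
    conn.execute("CREATE TABLE readings (sensor TEXT, ts TEXT, value REAL)")

    with open("readings.csv", newline="") as f:
        rows = ((r["sensor"], r["ts"], float(r["value"]))
                for r in csv.DictReader(f))
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

    # From here on, query any which way you want:
    for sensor, n in conn.execute(
            "SELECT sensor, COUNT(*) FROM readings GROUP BY sensor"):
        print(sensor, n)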


I was under the impression SQLite files were not supposed to be moved across architectures.


Thankfully that's not the case:

> The SQLite file format is cross-platform. A database file written on one machine can be copied to and used on a different machine with a different architecture. Big-endian or little-endian, 32-bit or 64-bit does not matter. All machines use the same file format. Furthermore, the developers have pledged to keep the file format stable and backwards compatible, so newer versions of SQLite can read and write older database files.

https://www.sqlite.org/different.html


How does this work with container/ephemeral services such as typical K8s deployments? Can I trust the file system mounting via resources like StatefulSets or FS mounts? For that matter, Heroku, App Engine, Cloud Functions, whatever?

Our current setup is having all our services in kubernetes but our databases in stateful VMs. I do occasionally stuff job-reports and similar data into postgres rows since it's already there, but I've been unhappy with our ETL setup and would be interested in hearing techniques to improve it.


ETL workers themselves are typically ephemeral, plumbing batches between remote storage systems like Postgres, S3, and Hive. You might use local disk as scratch space during the batch, but not as a sink.


From experience, and supported by the SQLite docs, I can tell you that trying to run sqlite on files on an NFS-mounted filesystem will not work. See section 2.1 of this document [1] and the related HN discussion [2]

[1] https://www.sqlite.org/howtocorrupt.html

[2] https://news.ycombinator.com/item?id=22098832


Use a volume container mounted against a persistent storage engine on the node and do pod mounting from those containers. Stateful VMs are often a better choice for production imo.

I'm in favor of leveraging ISP dbaas and persistence offerings over trying to home grow something. It just depends on where you are coming from and/or what you are trying to do... K8s alone avoids so much lock in, and as long as whatever storage option (container mount) or dbaas you use is portable, I don't think it's so bad in either case.


I don't know what "ETL" means here, although SQLite does include a JSON extension to read/write JSON data too, so you can use SQL and JSON together if necessary.


It's an acronym for munging some data

ETL = extract, transform, load


Yes, if that is what you are trying to do, I think SQLite is good. The SQLite command shell also has a .import command to read data from a file, and you can also import into a view and use triggers to process the data (this is something I have done). There are also functions and virtual tables for JSON, and you can load extensions (written in C) to add additional functions, virtual tables, collations, etc. So for many cases, SQLite is useful.


This is inspiring, I cannot believe I had not considered this before!


I’d love a SQLite to macOS Excel (or any macOS spreadsheet application) workflow so less technical users can do analysis. Has anybody pulled this off?


You mean like the .excel command?

"... causes them to accumulate output as Comma-Separated-Values (CSV) in a temporary file, then invoke the default system utility for viewing CSV files (usually a spreadsheet program) on the result. This is a quick way of sending the result of a query to a spreadsheet for easy viewing"


Or load it in Metabase (as a macOS app).


You can do that with Power Query.


Beware opening SQLite files you didn't create: https://research.checkpoint.com/select-code_execution-from-u...


Terrific! Data pipelines I've built have had JSON as their intermediary steps which I'm growing weary of.


> Of those that are serverless, SQLite is the only one known to this author that allows multiple applications to access the same database at the same time.

IIRC, MS Access allowed that, which explained a lot of its popularity.


This was a standard feature of flat file databases in the early 90s. There were many products. They often had an ODBC driver, which provided a SQL front end. dBase .dbf files were often used for storage. The arcane file locking in Windows is intended for exactly this kind of application.

Apart from quality (!), SQLite's main advantage over these products is broad platform support. And continued existence.


We still have a multi user desktop application that uses the Borland database engine and dbf files over a windows file server. The BDE is no longer supported but it still works


Unix-like operating systems also allow locking files for on-disk databases (like mbox files), it's just not what every application does by default.


Used to love dbm files. 1980s serverless NoSQL. They're still totally usable although we have LevelDB nowadays too.


In the old days of DOS I could do that with Clipper 5.2. Two or more PCs using NetBIOS shared directories could work on the same database, provided that they wouldn't write to the same record. That wasn't a problem because the environment (can't recall for sure if it was Clipper or an external library) allowed single-record locking, so I enclosed the lock attempt in a timed spinlock-like block which attempted once per second for 10 or 15 seconds before obtaining the lock or failing with a record-busy error. No SQL involved however, and indexes had to be rebuilt by hand like twice a day to be safe. But those were the days when a 486 could crunch a 100,000-record db with multiple indexes very easily.


This is the first time I've heard Clipper mentioned in a looong time. I was just talking the other day how useful it was in those days that there were tools that included the database and UI all in one. Everything was a bit easier because the language assumed a database, rather than just using a library to interface with one.


Pretty much all dBase derivatives had both file-based and record-based locking, that was specifically designed to work in a networked environment with file shares (although NetWare was much more common than NetBIOS back in the day). RLOCK() was the function you had to use for records.


MS Access “allowed” it but depending on the version it could be quite problematic. We had a use case where Tableau connected to an Access file read-only (but which another program used as a data store and wrote to often) on a Windows file share, and once in a while the lock files would get screwy and we would have to manually delete them to get things working again. Deleting lock files could be a huge chore because you have to figure out stuck processes and kill them. Task Manager wasn't up to the task, so we had to use Sysinternals tools to help with that.

Access is really meant for single-user scenarios I feel. Maybe the locking mechanism has gotten better but for multiuser access I tell people to use a real SQL database.


It was ok pre-Windows 2000. I had Access 97 on 9 workstations on NT with never a problem! The moment Office 2000 and Windows 2000 came out it went to hell. Moved to SQL Server then. I am not sure what changed.

It was quite frankly the most productive custom business software package I have ever used. Something custom that would literally take 10 people a week, you could do in an afternoon with 1 person in Access. I suspect the same is true now.


We had a much heavier multiuser setup on Access 97, and it worked fine. We had the same breakdown as you did, and had to do a registry edit on all users' machines to keep it limping along until we could move to SQL Server, which was the right thing to do anyway.


Interesting. Nice to know it wasn't just me. There was no one to talk to about it back then so you had to think on your toes.

SQL Server was a crap load easier to back up reliably


> Access is really meant for single-user scenarios

In a lot of cases, I've found it to be the best tool for a temporary or one-off (preferably smaller scale) data mining/massaging project. The query-building interface was the way I originally learned the basics of relational databases, and it also helped me get a better grasp of SQL-- the ability to flip back & forth from the GUI query builder to the SQL it generates is nice.

On the multiuser side, however, I have found a couple workarounds in the past. If you have everyone operate locally and space out their central database connections to intermittent, automated burst queries, you can get more concurrent users than you might expect. It helps to have fewer users per table, as well, and of course it really helps if they don't need to see the most recent adds/changes in real time.


My recollection was that the automated wizard loved to nest statements, and it was a huge chore to manually massage stuff after the fact; almost incomprehensible to parse. But I agree that it was a real godsend for people trying to branch out from the limitations of Excel.


Yeah, Access' Jet Engine and many of its issues are very well known.

[https://en.wikipedia.org/wiki/Microsoft_Jet_Database_Engine]


And all xBase based programming languages, Clipper, FoxPro, Visual Assist, and their competition Paradox.

I loved the Clipper 5 OOP capabilities, sadly Visual Objects tried to be too much like Visual Basic and some of the easiness was lost.


Since Access 97 or so it could actually use MS SQL as a backend, which largely removed those issues; the Access "database" was then merely a VBA GUI for SQL Server. This works pretty well.


I am not sure about MS Access. From what little I know, two people opening it from a network drive would mostly result in the database being corrupted.


MS Access "allowed" it. SQLite actually works.


Access is actually designed to work in a multiuser situation over a LAN for concurrent read and write. SQLite isn't afaik. As long as the LAN was cabled I never saw any issues. The only reason it was necessary to move to a server type database was because people were insisting on wifi networking.


It might have been designed for it. It was poorly designed for it. It was a clusterfuck of corruption when actually utilized.


You basically had to engineer around the corruption risks for anything actually in production. Data redundancy, old-school 'Save' buttons (your work isn't properly saved until you click it, because it sends copies of that work to multiple backends) and isolating users as much as possible to their own 'shards' was part of how I saw it kludged through in the real world.

There was still a lot of weird behavior, though, and while you could reduce data loss you couldn't eliminate it.


Penny wise, pound foolish. I had a president waste two to three hours a day instead of using a 40 dollar service to take care of the incredibly simple and repetitive task. Eventually, when I was planning to leave anyway, I asked him what his time was worth per hour. I guess it was less than minimum wage at the time ($5.25 an hour).

I'm probably going to have nightmares filled with screens of the access database corruption dialogs


What's anyone's time worth? Might have kept him from doing something foolish.


Valid point. He did defraud the federal government for 600k. If he had more time on his calendar he could have fucked over the American taxpayers ever more.


It depended. It suited a small cabled office with 10 to 15 computers fine. I saw such setups work fine for 12 years without corruption. But when wifi came along people started connecting that way, sometimes unintentionally, and corruption became an issue. So then we bit the bullet and moved the backend to a server database. Not a single database corruption since.


But this is the difference between supporting concurrent access or not. The fact that the feature relied so much on network reliability means that race conditions still existed.


I'm not arguing with you, but the fact remains that unlike SQLite, Access doesn't lock the database file so that only one user can write at a particular moment; it's more granular than that. It's also the fact that this worked very well for databases on small cabled LANs where no more than a dozen-ish computers might be interacting with the database at any one time. It was never designed for use over the internet (having said which, I have maintained a forum running an Access mdb file as its backend since forever without a hitch; the forum software process does the database edits, so it's like a database server process in a way, I suppose).


I'm not sure I understand what the physical transport has to do with this. Can you please expand a bit?


Maybe disconnects and reconnects cause a Windows network share to lose locks or something like that.


Yes, for Access (and sqlite?) it is the client computers that do the edits directly to the database file, rather than via a mediating server side process, so they are particularly vulnerable to network interruptions while in the middle of an edit. Cable is far more reliable than wifi and so far fewer interruptions. I think that's it.


OK, but you can't really say you support network access if you rely on it to be flawless. Working fast, maybe; but not working reliably when your network is not super reliable is not really support.


It was a different time.


Back in the day, which was a Wednesday


Race conditions not triggering when the transport is fast enough?


I understand the distinction now :), thank you. Funny.


Good point.

Berkeley DB also supports multiple processes accessing a database concurrently, as far as I know.

I was wondering if the authors were referring to SQL-like databases, but MS Access seems to be one?


Yes, MS Access supports both SQL and non-SQL APIs, and not mentioning it isn't very professional of the author.


Is there anyone actually choosing MS Access for new projects in 2020?


They were indeed referring to SQL servers. The paragraph starts with

> Most SQL database engines are client/server based.

I skipped it for brevity.


It did allow it, but getting it running reliably was basically getting punched in the face on a daily basis. Act! was similar, except they included getting kicked in the balls two to three times a week.

The reason access was popular was it made any middle manager that could wield an Excel spreadsheet think they could build a database.


I have a sales guy buddy with a 50K contact database in Act! that runs a Win95 VM to maintain access to this data. I've tried to figure out how to migrate it, but it is more than a few hours and who has that kind of time?


Didn't prevent it is closer to my recollection. My first job was on a system that used this idea pretty heavily - what a nightmare! We only got that code stabilized once we removed any sharing (and later removed Access completely).


Not open source, but Raima supports multiple processes on a single db file as well. https://raima.com/


FoxPro/Foxbase as well. In the Foxbase days predating FoxPro, there was even a “multi-user” version that, unsurprisingly, cost more than the standard single user.

And plenty of other file-based DBs such as Borland’s Paradox and the db engine, dBase; pretty sure FileMaker, too.


HSQL does too I believe.


JET databases are what MS Access uses underneath, and attempting multiple concurrent usage on JET databases is a good way to corrupt them.


Waaaaait wouldn’t that mean the file system is the server, with some binary API and responsible for handling concurrent access and locks for the entire file? LOL.


Serverless in this situation means that you don't really have to provision or set up an actual server to handle the database; the client itself just needs the ability to read and write a SQLite file.


Because many of the locking things are outsourced to the file system, which acts as a server for concurrent threads.

A server is something that listens for commands from various clients and executes them.


That's just a circle jerk. There is an agreed-upon definition of serverless.

If we reduce things to the absurd, we stop being able to reason about things.

https://en.m.wikipedia.org/wiki/Serverless_computing


>There is an agreed upon def of serverless.

Sorta but not really. The fact people have worked backwards from marketing names to try and constructively define inherently self-contradictory branding (rather than create a descriptive category into which we place questionable names and ignore them) is an embarrassment for everyone except the marketing departments.


I honestly dislike all this naming fad too, and feel the internet's been taken over by the management and marketing folk .. but still, I try not to hyperbolize about it - too bad, but it's sorta OK, and it's ultimately irrelevant for the job; the dbConnection is simply remote.


It's called a buzzword, nothing more nothing less, it's like "cloud computing", eventually it becomes meaningless.


We now have Kubernetes though, and OpenShift. And I actually believe in Docker swarms and compose battalions.

I'll give you that it's hard to discern when a thing is a real change and when it isn't, as the titans of industry try to peacock around.

I'd just examine what they actually mean, for good measure. They're trying to sell things they have no idea about - but that's what the hierarchy of commerce is for.


Thank you for saying this. People seem to argue semantics to appear clever and it really pollutes communication and reasoning.


That is not the definition of serverless in question.


If you examine it closely, you'll see that they have a lot in common, and that the author defined the term neo-serverless to attempt to address this - both definitions share the fact that multiple applications (in clientless form) can access the database. He even gives Amazon S3 as an example of neo-serverless.

I agree with you that it takes a bit to marry both, but the stretch isn't far.

I'm also avoiding criticizing the author for not sticking to the main def, since I'd then be red-herringing the post.


> both definitions share the fact that multiple applications (in clientless form) can access the database

That is like the least relevant thing on that entire page, to be frank. "SQLite is Serverless" is specifically referring to SQLite being an embeddable library that runs in the same process (and same thread, even) as your application vs. the client-server architecture (database in another process, with communication via a port) that DBMSes like MySQL and co have.

> Im also avoiding criticising the Author for not sticking to the main def

The "main definition" (i.e. the web dev buzzword) came into being years after this post was written.


You are right


Well, you effectively offload some of the workload normally handled by the server to the OS's filesystem layer, that's true. In particular you rely heavily on the FS locking working correctly. Calling the FS a "server" is a bit of a stretch though.


> It is important to understand these two different definitions for "serverless". When a database claims to be "serverless", be sure to discern whether they mean "classic serverless" or "neo-serverless".

It's really not important to understand that distinction, because this author seems to be the only one making it. Everyone knows what "serverless" means at this point, and it's not an embedded DB.


I agree! Also the cryptography people should stop calling their hobby "crypto", everybody knows crypto means bitcoin and stuff. Snobs.

(here's SQLite's "Serverless" page the way it was in 2007: https://web.archive.org/web/20071115173112/https://www.sqlit...)


I'm not sure if this comment missed a /s somewhere...


it wouldn't be wrong to include one, but it didn't necessarily need it. the link in parentheses serves the same purpose.


nitpick: it would be wrong to include one, because sarcasm isn't sarcasm if you say it's sarcasm.

personally i feel like even including the archive.org link was pushing it over the edge :)


This is such an air-headed comment to make. You must realize that the page describing how SQLite is “serverless” has been up probably longer than your entire adult life. It is important in its context; they are not trying to “redefine” (lol) the term.


> (This section was added on 2018-04-02)


That’s exactly the point, the section was added to note the new definition of serverless vs the one that has been there for over a decade.


It doesn’t matter. It is not the generally accepted definition of serverless. The meaning of a word is based on how it is generally used, and almost nobody means this when they use the word “serverless”.

At this point trying to use the word in this way just creates a bunch of unnecessary confusion. Call it something else so we can move on to more important (and clear) discussions.

(Also there are a lot of assumptions and snarky comments in these responses about my age, which is quite rude and pretty elucidating.)


Their usage of serverless predates all of these cloud application offerings. Just because a lot of people in this particular echo chamber like to think of serverless as meaning one thing, there are plenty of others who have been using the same word to mean something different for quite a bit longer.

Moreover, the cloud as a service definition isn’t even accurate. There still is a server — it’s just not one the developer has to worry about.


The page was written before that usage of the term serverless existed, and anyone reading it would have known exactly how it was meant. The "neo- vs classical-" note was added in 2018 because the distinction was being made.

Is your argument that they should replace "serverless" with "runs without a server?" That seems like a strange position to me.


No doubt you're a little surprised that hacker news doesn't contain that many posts about people breaking into computer information systems, amirite?


For sure. Everyone knows hackers are criminals now. So why are we all here on a public criminal forum discussing our crime?


I think hosting providers marketing their managed servers as "serverless" is what has created confusion.

Imagine if Uber called itself a "carless" taxi.


Although clearly different in your world, to me serverless meant that it can run without a central server (it was used in peering systems, e.g. games in the '90s, and co-distributed systems with workers sharing info - a bit like blockchain in the '00s). Occasionally, it was used when the app was working offline.

I think it highly ironic that the marketing hype just upended what's really going on. The new stuff's 100% server bound, as most people realise.


Everyone knows that “serverless” anything doesn't run on a server, be it AWS or Azure instances or what have you. That would be both ironic and silly, like someone used the wrong word or something.

Serverless databases and what have you have been around longer than the current batch of folks trying to redefine things (or, more charitably, who ran out of names to call things). Like it or not, there is a distinction, even if the old definition was before your time.


Nobody called them serverless DBs, and nobody does.

There is a vocal minority of reductionists here on HN who dismiss the accepted definition of serverless because the literal meaning doesn’t make sense. It’s just noise though. “Serverless” does have a specific meaning and it’s not “there are no servers anywhere”.

I was around when there was no serverless. Things change, new words arise. Time for us to get with the times.


I remember Firebird called "serverless" sometime around 2006, in the same meaning that SQLite uses. It was pretty common terminology for RDBMS.


nah


> I was around when there was no serverless.

We used to call it /cgi-bin/ ;)


>Everyone knows what "serverless" means at this point, and it's not an embedded DB.

Serverless is a marketing term at this point (as it was when it started). This post brings some welcome definitions and expands it to something that has many of the same attributes but wasn't appreciated as such.


Serverless refers to platforms with specific and well-understood properties, such as managed servers, autoscaling, usage-based pricing, etc.

What parts of SQLite have these properties?


And for others the word “serverless” literally means without a server. As in... a library. It’s been a well used definition for many years.


In the case of SQLite, it doesn't have a server process.

In the case of cloud "serverless" it doesn't have a physical ser-- Wait did you just mention "managed servers"?


SQLite has specific, well-understood pricing and scaling properties.


This so much!

But is it web-scale?! /s


The "neo-serverless" thing is strange. But then, so is "serverless". "Serverless" systems are just time-shared servers. But only if they use new, cool technology. Shared hosting with vendor managed MySQL doesn't count, apparently.


After this article has been uncovered by HN, that may just be about to change.


Exactly. If anything, it seems like the author wants to take the word "embedded", which everyone understands, and somehow redefine it to "serverless" which everyone also understands, and which this is not.


Yes, of course, the author took a time machine to 2007 to try and co-opt your buzzword.


Meh. If we allow serverless to make REST calls, is accessing a file system any different?

I had more trouble with the assertion that the embedded DB would be maintenance-free. I started a top level comment asking for someone to explain how that would work. The part of my brain that protects me from scams is screaming “something for nothing”.


> "I had more trouble with the assertion that the embedded DB would be maintenance-free. I started a top level comment asking for someone to explain how that would work."

When is the last time a certed-up DBA had to do any maintenance on your Firefox install's places.sqlite database? That's what's meant by maintenance free; you can reasonably employ sqlite databases on users' computers, without users having the foggiest idea of what a database even is.


In case SQLite is not enough and you need redundant servers or clustering, there's also database servers that use SQLite as storage engine: http://rqlite.com/ https://dqlite.io/


So you run an app with sqlite on one server and sync to sqlite on another server? What would be the benefit compared to using a separate db server, as in the 'neo-serverless' setup?


No, in this case you always have to use rqlite/dqlite because they manage the network synchronization. They use SQLite as storage engine (one SQLite database per server instance).


I understand that in those cases rqlite/dqlite is used. But that is just a technical detail. My point is that I am running two servers: one with the app and Xqlite and another one with Xqlite.

In case of a neo-serverless setup, I also have two servers: one with the app, the other one with the db server.

So what are the benefits of the Xqlite setup? I looked into that before and for one thing, Xqlite is slower (obviously) than just sqlite. So speed is not a key benefit. I also will have to manage both servers myself.

At least for a separate db server I have the benefit that I can buy that as a service, incl. management, backups and such.


Not the author, or knowing of all the technical details... simplistic replication structure and redundancy/failover without an expensive or more complex RDBMS solution while still self-hosting the service.

There are still a lot of instances where you cannot use a cloud provider for your app or database.

To be honest, I'd probably lean more towards a nosql database that has an in-the-box, relatively easy replication strategy, though that might mean 3+ db servers for good performance. (RethinkDB, MongoDB, Cassandra/ScyllaDB, Cockroach). Just depends on the budget and resources.


Well, ScyllaDB is free and open source, so that should help the budget. (Though we do have an enterprise version, base price is FREE!)


I'd probably reach for Scylla wherever I would consider Cassandra (or similar), though the similar bigtable/columnstore-like services in most cloud hosts is generally still going to be easier.

The performance over C* is surprisingly good.


A bit tangential, but when do you move from using in-application data structures (maps, trees, vectors/arrays) to using a database? Is it basically when the data doesn't fit in memory? I've been programming for almost a decade and I've never come across needing a database... (for context, it's ten years without anything web related) I'm interested in them and I'd love to learn SQL but I can't even think of a use case outside of managing users in some web-based application.


SQL queries tend to be much smaller than their equivalent data-structure traversal procedures. This can be beneficial even when you still want the data in your process's address space, hence embedded database engines like sqlite. Libraries like Linq can also provide the same expressive power over a programming language's native objects and collections.
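A toy illustration, using an in-memory SQLite database:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("alice", 30.0), ("bob", 15.0), ("alice", 20.0)])

    # One declarative statement replaces a hand-rolled group-and-sum loop:
    totals = conn.execute("""
        SELECT customer, SUM(amount) AS total
        FROM orders GROUP BY customer ORDER BY total DESC
    """).fetchall()
    print(totals)  # [('alice', 50.0), ('bob', 15.0)]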

As to why you'd want a separate DB server process: long lived mutable structures tend to drift into unexpected states, which is why the standard computer troubleshooting procedure since forever is to restart. Database management systems specialize in not having that problem. You deploy one that's widely used and hardened over many years, change it infrequently, and operate it with great care. Or pay a cloud provider to do that.

Then everything else gets to outsource the burden of persistence to it, and the vast majority of the workloads you develop and operate are stateless. These are drastically more convenient and more resilient to mistakes. They're effectively "restarted" with each API request, and you can stand them up, tear them down, migrate them to new machines, etc. as much as you want without fear of data loss.


"long lived mutable structures tend to drift into unexpected states"

Why do those "tend to drift"? As an example, I wrote a sort of game server for one of my applications. Internally it has those exact forever-lived mutable structures. I've never observed it drifting into any unpredictable state. Works like a charm and has been running for many months. I only reboot it when I need to update it to a new version.

The only catch here is that all data must fit into RAM. With the amount of RAM modern computers can be stuffed with, I do not really see my server ever running out of it.

Of course it's backed by a database, but that merely serves as a persistence layer for this particular application.


Consider why 3rd normal form exists to begin with.

In the real world, let's say you have a shipping address and a billing address for a customer, and they are usually (but not always) the same.

Eventually, a customer moves, changing both of their addresses. But the user forgets to change their billing address with their delivery address.

A proper database would have a 'billing address is same as delivery address' logic, following the principle of DRY (don't repeat yourself).

----------

There are lots of examples here of what can go wrong when you repeat yourself in a database application. The user may have an error when repeating themselves over the dataset (delivery address is correct, but zip code on billing address has a typo).

Dealing with these issues at scale, with hundreds of thousands of customers, is certainly a problem. Normal forms can formalize these issues and help the business owner avoid the problems.

Where do you verify the existence of zip codes and cities? Where do you check for typos? How do you prevent contradictions on the submitted information?

Your human customers will make many mistakes. Your logic must hold up even in the presence of faulty data.


"A proper database would have a 'billing address is same as delivery address logic"

The database does not have logic like this; it has to be implemented by a stored procedure. When I have an application server, all such logic (if applicable) is handled by code in a much more performant way. No data goes to the database directly. Everything passes through the app server, along with validation, data transformation, etc. As already said, the database in this particular case is nothing more than a persistence layer.

Again, we can make all kinds of theoretical speculations, but as I already said, my particular server does not have data drifting into faulty states.


>> "A proper database would have a 'billing address is same as delivery address logic"

> The database does not have logic like this.

Of course it wouldn't have 'logic', databases are just stores of data.

You'd have one 'address' table, with probably a int-primary/surrogate key. Then the 'delivery' and 'billing' address would be an int, pointing to the address table.

Furthermore, the billing and delivery address would be foreign keys, so the internal database logic would keep the tables in sync with no application code required.

With the data organized in this manner, the application code becomes logic free and braindead easy to write. Or at least, corner cases become easier to handle and more explicit. (Say two customers share the same address, do you allow repeats in the address table? Or do you allow customers to tie address information together? Either way, your decision rests on how you define the primary key)
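Sketched as SQLite DDL (names invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE address (
            id     INTEGER PRIMARY KEY,
            street TEXT NOT NULL,
            zip    TEXT NOT NULL
        );
        CREATE TABLE customer (
            id               INTEGER PRIMARY KEY,
            name             TEXT NOT NULL,
            delivery_address INTEGER NOT NULL REFERENCES address(id),
            billing_address  INTEGER NOT NULL REFERENCES address(id)
        );
    """)

A customer whose billing address is the same as their delivery address simply has both columns pointing at the same address row, so a single UPDATE to that row moves both.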

> When I have application server all such logic (if applicable) is handled by code in much more performant way.

The most performant way is no logic at all. A proper database removes a lot of checks, the storage format itself naturally creates logic free code.

Some application logic is necessary of course. But you can minimize the logic needed by thinking about data layout.


Hmm, it seems to map to my workflow with Clojure lately. But there I'm massaging and generating new data sets (even though it's stateless). I can see that if you have a very settled data format then this would work in a language-agnostic data-interface kind of way. I could see, say, measurement/time-series data being stored in SQL; accessing it that way would be much cleaner than opening and closing CSV files for all sorts of datasets.


I'm sure people will give you the orthodox answer, so let me give you mine: when other things become more important than performance, ease of development, and code clarity.

Many years ago I happened to meet the creator of Prevayler, an open-source persistence framework that provided ACID guarantees and was thousands of times faster than a database as long as your data fit in RAM. I tried it out for a project and we loved it. Our hundreds and hundreds of unit tests ran in a few seconds. Our pages rendered in ~5 milliseconds. It was structured around a log of actions, so if you wanted to know how the data ended up like it did, every change was logged.

What people who haven't worked this way often miss is that a database doesn't make things simpler, it just makes certain things easier. Once you add a traditional database to your project, you're importing a million lines of mysterious code into your project, and demanding to pay a serialization/deserialization penalty any time your code wants to look at data. When it works, it can be swell, but when you have a problem, suddenly things can get hairy. Database performance optimization is a murky art in a way that just isn't true if all your data is right there in RAM.

Prevayler of course didn't catch on. It was just too weird for most people, who had grown up on databases and for whom data structures were something that they hadn't really thought about since their last CS exam. But I sometimes dream of the world where it did catch on. At the time, fitting in RAM was a big limitation. But now I can get an off-the-shelf desktop with 768 GB of RAM, and Amazon's servers go up to 24 TB of RAM. If you're going past working sets at those levels, traditional databases have anyhow fallen out of favor, displaced by big data tools. But it would have been a much better fit for today's world of microservices and distributed systems.


This is exactly the style in which I build applications these days. Most of the time I just describe it as “hella caching” but it is a different paradigm from treating the database as the primary state engine. The speed and simplicity of working on in-memory structures are great. When you outgrow a single server’s RAM capacity, you can use Kafka or another durable message queue as your application’s WAL and shard your data across multiple servers.


Have you written up the systems you've built? I'd love to read more about the practical details. Feel free to email me or DM me on Twitter if that's better.


I haven’t written up any of the production work I’ve done in this vein, but here’s a demo application I built as a hiring challenge (apologies for the broken demo link, it was hosted by the now-defunct Hyper.sh): https://github.com/notduncansmith/agree/blob/master/README.m...

Given no firm deadline, I timeboxed to 12 hours so it’s not fully fleshed-out but I like to think it illustrates the concept well.


Ah, that's cool! I like your writeup; it makes the advantages of your approach clear. I hope you got the job!


Thanks! I did, though now working at another co and spending my free time generalizing this into a library that makes this paradigm easier to adopt (basically Redux-ifying your backend but with end-to-end encryption on the stored event logs). The initial version should be ready to publish in the next few weeks :)


If you remember, please email me or contact me on Twitter when it comes out! I'd love to check it out. And I'll definitely pass it along to the Prevayler community, who I'm sure will be tickled.


Prevayler sounds a lot like Redis, unless I’m missing something.

I’m a big fan of Redis, but I was bitten many years ago when I tried to use it as a replacement for an RDBMS. There were two reasons for this: 1) lack of development libraries and operational tools, and 2) lack of data integrity checks.

1 has changed these days, but 2 is still very much the case (and rightly so, IMHO). Perhaps Prevayler had this?

But yep, it was fast at a time when our competition’s software was slow and clunky. It made a difference.


Not quite. Redis is still an external database. Prevayler was much simpler, just a library.

With Prevayler, all data is kept in RAM, reachable from one root object, which Prevayler holds. All changes to the data model must be expressed as command objects. Each object is handed to Prevayler, which serializes it to a log and then executes it. Once in a while, you can snapshot the data out and start a new log. If there's a crash, you just load the latest snapshot and replay the log.
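A toy sketch of the shape of it in Python (nothing like Prevayler's actual API):

    import json

    class Prevalent:
        def __init__(self, log_path="commands.log"):
            self.state = {}            # the in-RAM root object
            self.log_path = log_path
            try:                       # recover by replaying the log
                with open(log_path) as log:
                    for line in log:
                        self._apply(json.loads(line))
            except FileNotFoundError:
                pass

        def execute(self, command):
            # Serialize the command to the log *before* applying it.
            with open(self.log_path, "a") as log:
                log.write(json.dumps(command) + "\n")
            self._apply(command)

        def _apply(self, command):
            if command["op"] == "set":
                self.state[command["key"]] = command["value"]

    db = Prevalent()
    db.execute({"op": "set", "key": "answer", "value": 42})

The real thing also snapshots the root object periodically and fsyncs the log, so recovery is the latest snapshot plus a replay of the log tail.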

You got exactly as much data integrity as you wrote into your objects and your commands. Which without much work could be quite a bit, because you get a lot of data integrity by not writing things. E.g., if a kind of object should never be deleted, you just don't write any deletion code. If, say, account balances should never be changed directly, but only as part of properly structured credits and debits, then you just write your code like that.

Most databases, Redis I think included, are made for arbitrary operations on data. Developers add integrity and security later, hopefully. And that is often duplicative of the code base, so that one ends up having integrity checks both in the code and in the database.

I should say that made a lot of sense for the era databases came out of. Databases were a huge step forward in the 1970s and 1980s. My dad was a developer in that era and it was a big relief not to have to get a bunch of programmers to all follow the same conventions for exactly which record was stored in exactly which spot on their precious and expensive disk drives. Not having to know the minutia of the hardware let a lot of people just get in there and build business reports. But if we were starting fresh today, I don't think we would do anything like a SQL database. Redis was definitely a step away from that era, and I look forward to many more.


If I didn't need the log, would I need to use Prevayler at all? It sounds to me like a plain old "keep objects in RAM", which is what most programming languages already natively do.


What Prevayler got you was the ACID guarantees: https://en.wikipedia.org/wiki/ACID

So if you didn't need persistence, you certainly didn't need it. Ditto if data integrity wasn't important.

It would also have made distribution much easier. Since each mutation was already serialized to disk before execution, you could also send it over the wire to read-only replicas and hot spares for the master.


I have found that you can go for a disturbingly-long period of time using primitive schemes like LINQ<->Objects<->JSON to persist your business data before things start to get hairy. Personally, I would say 10 megs of persisted data is about the upper limit before I am going to start reaching for SQLite. If you start to get clever with schemes like one file per serialized entity, you could potentially avoid using a 3rd party database indefinitely. I have found that the technical cost of using SQLite from the start is so low that I just start out using it by default even if the file will never exceed 1MB.


> If you start to get clever with schemes like one file per serialized entity, you could potentially avoid using a 3rd party database indefinitely

Most file systems will start to get slow at some point with too many files in a folder.


Will you store large objects like images or audio files directly in SQL?


In most cases I will. Pulling a blob out of a row is a lot faster than opening an additional file handle. There aren't really any downsides to this either unless you are running table scans. SQLite does support indexes (even full-text) and they do work miracles so do use them when necessary.
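A minimal sketch of the pattern:

    import sqlite3

    conn = sqlite3.connect("assets.sqlite")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets (name TEXT PRIMARY KEY, data BLOB)")

    with open("texture.png", "rb") as f:  # any binary asset
        conn.execute("INSERT OR REPLACE INTO assets VALUES (?, ?)",
                     ("texture.png", f.read()))
    conn.commit()

    (data,) = conn.execute("SELECT data FROM assets WHERE name = ?",
                           ("texture.png",)).fetchone()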

I would go so far as to argue that SQLite could be used to store all of the assets for any large piece of software (i.e. a AAA game). This would probably wind up faster and more reliable than most alternative solutions out there today.


Can confirm SQLite is used in AAA game engines as well as VFX production pipelines. I know because I put it there.


So - is SQLite a good file system? Should I just use it for everything?


I think hard drive manufacturers should just embed SQLite directly into their devices and optimize the SQLite VFS implementation to be flash aware at the controller-level. Who needs a file system when your disks speak SQL?


SQLite has a page measuring performance of internal blobs vs external files: https://www.sqlite.org/intern-v-extern-blob.html


Anytime you want persistence: sqlite is a replacement for fopen(). https://www.sqlite.org/whentouse.html


> when do you move form using in-application data structures (maps, trees, vector/arrays) to using a database?

The killer feature of databases, including no-sql or older ones like BerkeleyDB, is multiprocess synchronization.

If two programs (or two copies of the same program) need to coordinate their data, a database is often far easier to use than writing your own transaction layer through files, mutexes, and other primitives.

Web applications, such as forums or discussion boards, have many users and processes adding data simultaneously. That's why databases are so popular in web backends.


Persistence, portability, and durability. SQLite is an actual file on disk, can be easily read by multiple programs/languages, and has write-ahead logging to keep your data safe from crashes.


Is all your data on disk read-only and the data in RAM can be forgotten when your application finishes/closes? Then you don't need any of this.

If your state consists of a single data structure, you could write it to disk into a new file and move-replace the old file. This works as long as you have a single instance of your application.
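A sketch of that move-replace pattern, assuming a JSON-serializable state object and a hypothetical path; os.replace() is an atomic rename, so readers never see a half-written file:

    import json, os, tempfile

    def save_state(state, path="state.json"):
        # Temp file in the same directory, so the rename stays on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp, path)     # atomically replace the old file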

For anything more complex SQLite is a good way to keep your data in a consistent state. If you store it on a local drive, the performance is stellar.

Thinking about going multi-process or multi-user? SQLite can still give you an easy head-start because it performs OK in most situations with a database stored on a network drive.


Any medium-sized or larger application either uses SQL or recreates a shitty version of it. You just can't write complex code without relations.


Structured transactions between applications are one place.

You can get by with files, but they're slower. A DB is the right choice.

Structured data anytime you're hitting multi-user - web, networked games, collab programs, live maps, etc.

Anything that's massively stateful. A DB hands you a lot of guarantees. The filesystem has some of them, but is much slower.


That's a weird question, I don't really understand.

When you want to persist data beyond the lifespan of your process?


How do you persist state across a restart of the app or device? Do you dump state to a file? Do you not need to persist state ever in 10 years of software development?


> when do you move from using in-application data structures?

as soon as you need any of these:

* a relational model (because you need to model your data that way, or because someone wants to consume it with Tableau).

* concurrently read/write data between multiple instances.

* a standard way of doing backups.

Moving from in-application data structures to a database at that point would probably equal a rewrite.


It does allow multiple applications to access the same database at the same time, but doing so really hurts performance. I noticed this when I wrote a web crawler in Go and used SQLite as the backend.

As soon as I connect using the command line interface, it slows down significantly.

Just something to bear in mind if you want to use it with multiple processes!


I use it for real-time data (around 22G/month, compressed) and I can do all sorts of filthy stuff with the database while the real-time processes are running.

Increasing the retry timeout can help, and using WAL can help. I'm sure you've tried all this though.
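For reference, a minimal sketch of those two knobs using Python's stdlib bindings (the file name is invented):

    import sqlite3

    conn = sqlite3.connect("crawler.db", timeout=30)  # busy/retry timeout, in seconds
    conn.execute("PRAGMA journal_mode=WAL")    # readers stop blocking the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # a common companion setting for WAL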


Multiple processes sharing a single SQLite database is an instant anti-pattern. I would stop and reconsider your approach before trying to build this solution using it. The way I see it, there are 3 options:

1) Implement another process which will have exclusive ownership of the shared SQLite database, and then use some IPC scheme to delegate database operations from multiple processes.

2) Give each process its own copy of a SQLite db if there is no effective shared state between these processes (e.g. you are just map-reducing web crawler results). Upon completion of each process you could aggregate them into a final combined db.

3) Use a hosted database solution such as Postgres.

The bigger question for me would be what are you going to do with this data once you collect it. If you plan on having another series of processes that then use the SQLite db to provide reporting views or execute business logic, I think a hosted solution might be a better option. If scalability is a serious concern, option 2 is probably your best bet.


It's ironic how using a serverless database requires your app to be the server and do top-level access management for the database.


Yes, I thought the same - so you have to write a server. What about locking? Roll your own, I suppose. Multi-user - roll your own? What about hot backups? Rollbacks?

I always conclude these things are advocated by people who have no experience in large multi-user systems. The same as the NoSQL movement. They'll eventually build a database server. They build a system using the cool thing, which works fine when they test it on their single-user system. Go live - aagghh, what's happening, why are all these people trying to access my data simultaneously? And so on.

One I'll always remember was when XML was the next big thing - they decided to store the raw XML in a database. It was a commercial product, and we were interfacing to it from our system. Once we found this out we started asking questions - no no, it works fine, we were told, laughing at us old database guys. Went live, couldn't handle 5 TPS - what a surprise; it never worked as far as I'm aware. There is this continuous circle of: databases are bad; no no, do this; you don't need to do this; no, things have changed - what do you database guys know. It's entertaining to watch if nothing else. My advice - learn SQL, and some database tuning; it's not that hard, at least compared to writing your own database engine.


It may be ironic, but there are benefits to rolling your own. Having your persistence layer talk in terms of your business models instead of raw SQL could be seen as a benefit in some contexts. The process which is the "server" can be written to allow for relaxed consistency based on the specific business operation being performed (e.g. no need for transactions around log entries). This could be leveraged for substantial performance gains. The benefits afforded by guarantees of exclusivity between application and database are difficult to overstate.

These advantages of purpose-built functionality would also extend into the arena of handling replication and clustering, e.g. multiple distributed processes, each with an independent database, synchronized via some custom protocol that operates in terms of specific business models and processes.


You can just turn off these things if you so desire - in reality the overhead is a lot less than you'd think. You'd be way better off spending the time you'd use writing half a database on optimising your application, and just use a database. Or maybe you don't really need a database.


Nothing of the sort is required. If you want to write a simple single-threaded synchronous program, SQLite works great for that.


You don't need to write a server to use it. You need to recognize you're using the wrong tool to solve your problem.

I've used SQLite as an embedded database and as a log file; it's even possible to use it as a virtual filesystem in Tcl starkits. But a high-demand, multi-access data solution it is not. Yes, you can make it work, but you need to justify the costs of doing all that server work when you could just use a SQL server that already meets your needs.


Centralise your writes so they are only done through a single thread/process, and then you can read from as many threads/processes as you like with no noticeable difference in performance (at least for the several hobby dataset-importing projects I tried).
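A minimal sketch of that centralised-writer pattern in Python (table and file names are invented): one thread owns the connection and drains a queue, and every other thread just enqueues writes:

    import queue, sqlite3, threading

    write_q = queue.Queue()

    def writer(path="data.db"):
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT)")
        while True:
            item = write_q.get()
            if item is None:  # sentinel to shut down
                break
            sql, params = item
            with conn:  # one transaction per write
                conn.execute(sql, params)
        conn.close()

    threading.Thread(target=writer, daemon=True).start()
    # Any thread can submit a write without touching the connection:
    write_q.put(("INSERT INTO pages (url) VALUES (?)", ("https://example.com",)))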


«just write your own server on top of serverless embedded SQLite to get to acceptable performance without needing a client/server database»

(Honestly, once you are at the point where concurrency causes performance issues with SQLite, you are better off moving to databases designed to handle concurrency rather than trying to cobble together your own workaround - you have reached the point where the drawbacks of SQLite's architecture outweigh its advantages, and the advantages of other databases' architectures outweigh their drawbacks.)


Sure, I don't disagree. I've ditched SQLite for Postgres many times.

It's just that for 99% of the projects I ever worked on, writes are maybe 100x rarer than reads, so wrapping the writes in a queue of sorts has been quite okay and performant.

It can and does come to a point where it's easier to use a full-blown database server, too.

I was simply pointing out that for a lot of classic workflows wrapping/centralising writes works quite fine.


Depends on what you're up to, I suppose. Having a particular chunk of data wrapped in a microservice has other benefits. If that's the direction one is headed anyhow, then wrapping SQLite + single access can be more tractable than allowing multiple access that you'll eventually have to rip out again. I know it works for plenty of people, but for me "database as integration layer" has always been an antipattern.


Depends on what you're doing... If you're working on discrete projects that need to be run and then archived, using an RDBMS vs a service over SQLite is an actual consideration... if you have a remote API interface that talks to different SQLite db files on a per-project basis, then backup and archival become trivial matters... if you're using a classic RDBMS then it can become much more complicated.

Aside: adjusting the schema over time also becomes easier, as archived projects don't need to be updated; they just continue to exist with the older schema.


As you quoted, people in this discussion are saying that writing a dummy program that owns the SQLite database and does nothing but apply the messages it receives from all the other processes on the system that need to talk to that SQLite file results in much better performance than accessing the file "directly" from the separate processes.

So if everyone's saying this, is there such a standard dummy program?


> As you quoted, people in this discussion are saying that writing a dummy program that owns the SQLite database and does nothing but apply the messages it receives from all the other processes on the system that need to talk to that SQLite file results in much better performance than accessing the file "directly" from the separate processes.

And use what protocol for inter-process communication with the daemon managing SQLite? At the end of the day you've just created the equivalent of a server anyway...


Any protocol. People are saying this is a common solution.

So is there a standard dummy tool like that?


> So is there a standard dummy tool like that?

My point is that by that time SQLite doesn't fit "serverless" as defined by the linked article anymore.


True


There shouldn't be such a standard program. It's up to you to implement it in-process so you reap all the benefits of SQLite's speed and lack of a dedicated server process.

In the case of Erlang/Elixir, where I've worked most actively over the last few years, it's really easy to centralise write access (a dedicated actor receiving messages). With other languages it should also be fairly easy.


Given the extensions that people compile into SQLite, I don't think one can even say that... there's mention of ActorDB in another thread that seems to cover this...

Wrapping a typical API in either gRPC or even plain HTTP isn't too hard... you take in a parameterized query with parameters using library X and return stringified JSON to the caller... you may want more specific interfaces, or even GraphQL over the database for that matter... it really depends on your needs.
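As a toy sketch of the HTTP flavour (the endpoint, the file name and the pass-raw-SQL design are all assumptions for illustration; accepting arbitrary SQL from clients is obviously not production advice):

    import json, sqlite3
    from http.server import BaseHTTPRequestHandler, HTTPServer

    conn = sqlite3.connect("shared.db", check_same_thread=False)

    class QueryHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
            rows = conn.execute(body["sql"], body.get("params", [])).fetchall()
            payload = json.dumps(rows).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

    # HTTPServer handles one request at a time, so writes are serialised for free.
    HTTPServer(("127.0.0.1", 8080), QueryHandler).serve_forever()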


Presumably multiple readers are fine, with only multiple writers being an issue?

I vaguely recall trying to use multiple threads to write to an SQLite DB some years ago, and I think it actually locked the entire file for writes. I might be remembering wrongly, but I think I switched to reader/writer locks in C# instead, and saw a huge perf boost.


WAL mode means readers do not block writers and a writer does not block readers. If it locked the entire database you likely weren’t using WAL.


I was definitely using WAL - I know I mentioned using reader/writer locks, but that was so the C# locking mechanism worked the way I wanted; I realise that SQLite doesn't block readers on writes.

My point, not well made :), was that a lightweight locking mechanism worked much faster than SQLite's file-based locking mechanism. This was on Windows, mind, so things might be very different on Linux.


Even with WAL enabled?


Let's say there are concurrent writes to a single table with a few indexes. The process of updating said table and indexes eventually boils down to updating a single B+ tree from multiple threads. In general, until serialized, that is physically impossible without corrupting the tree. Sure, there might be other specially crafted table implementations more friendly to concurrent writes, but there will be trade-offs. There are no miracles in the world.

As for the particular case of WAL, all it does in this scenario is act as a queue. If your database load is spiky then it can even out the load and give an impression of faster response. Under constant load it will not speed things up, and will internally serialize all actual updates.


If you need concurrent/parallel writes, SQLite is not the right tool for you. You may as well lament that hammers are no good for driving screws.


Can you please point me to where I said anything about SQLite being the right tool for concurrent writes? My point was that true concurrent writes to a generic database table with indexes are physically impossible in the general case, regardless of what database it is.


Some concurrent writes are "a transaction can wait for others to finish, up to how long it takes for a person to grow bored" and some concurrent writes are "if a transaction cannot finish without waiting the application is already struggling with a severe performance problem" (for example, because it needs multiple concurrent writes on different disks to keep up with incoming data).


Yes, using WAL. I tuned all of the settings that I could at the time.


SQLite is amazingly prevalent as well. Your own phone probably has hundreds of SQLite databases on it. One challenge with SQLite, though, is you have to download the whole thing to make use of it.

Shameless plug, but I made a fun side project that allows Amazon Athena to read SQLite databases from S3. https://github.com/dacort/athena-sqlite


What I really want: a SQLite storage backend driver for S3/GCS. No need for disks then. I haven't been able to find such a solution though, and am not technically proficient enough in C (the language SQLite is written in) to do so myself.


As part of a fun side project to make a SQLite driver for Athena, I made a read-only storage driver for S3.

https://github.com/dacort/athena-sqlite/blob/master/lambda-f...

Implemented the VFS side in Python, thanks to the awesome apsw library.


The challenge with this is that S3 and friends are object stores, meaning you upload or download the whole file each time. As you can imagine, this will cost you tremendous bandwidth even to insert one row.

Furthermore, it doesn't solve the multiple writers problem, because (afaik) there's no way to lock a file on S3.


> As you can imagine, this will cost you tremendous bandwidth even to insert one row.

Is this the cost of network transfers charged by AWS or GCP? What if I'm hosting my app on e.g. EKS or GKE respectively?


It costs roughly $5/million PUTs and $0.40/million GETs on S3 in addition to the bandwidth and storage you use.

S3 objects are also immutable. Once they’re written, they can’t be updated.

A read-only version of this might be useful, but probably wouldn’t work in-place.

Something that might be of interest is S3 SELECT support that lets you query a single (optionally compressed) CSV, JSON or Parquet file server-side at the same cost of a regular S3 GET.

https://docs.aws.amazon.com/AmazonS3/latest/dev/selecting-co...

And if you really want relational (i.e. JOINs, aggregations and sub-queries) semantics on a bucket full of CSV, JSON, Parquet, ORC or regular-expression-describable files in a cost-effective way that has great performance on buckets containing hundreds of TBs of data, definitely look at Athena, which is only $5/TB of data scanned during a query.
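For a sense of what S3 SELECT looks like from code, here's a hedged boto3 sketch (bucket, key and column names are invented):

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="my-bucket",
        Key="data/events.csv",
        ExpressionType="SQL",
        Expression="SELECT s.user_id, s.amount FROM s3object s WHERE s.amount > '100'",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"JSON": {}},
    )
    # The response is an event stream; records arrive in chunks.
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())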


I am working on RediSQL[1] and I am about to launch a managed version.

The interface will be either HTTP or the Redis protocol; you create your database, and I will back it up daily to S3.

(If interested, you can subscribe for updates here: https://simplesql.carrd.co/)

[1]: https://redisql.com/


RediSQL looks pretty sweet! To be frank, and this is only my personal opinion, I don't think I would want to pay for an API, but rather run my own. The business model of providing everything OSS but sending telemetry seems rather intriguing; as a hobbyist user I am OK with such telemetry being collected. If I were to run it in production for a business app though, I wouldn't even bother considering the unpaid version, for the following reasons:

1. I do not want my production instance to shut down for WHATEVER reason. This is just not an acceptable risk for most businesses. The only time a DB can go down is when something goes wrong.

2. As an engineer, I understand that 3 counters that are not accurate aren't a big deal. I can even look into the source and see that they really do as you say. Justifying this to a security org will be a complete nightmare, as most security orgs in enterprises are staffed with barely technical folks masquerading as "security".

So, it seems like a pretty good way to coerce enterprises to pay up while letting hobbyists continue using it. Very smart, I wish you the very best!


Thanks! That was exactly my reasoning.

Make it available and simple for hobbyists and small companies, and ask for money from those who can afford it, to sustain the development.


This looks very interesting... can you please post some benchmarks in your docs for reference?


Benchmarks are always tricky, but sometimes useful, so yes I should post some of them.

Right now I am busy with releasing the v2, but after that I should definitely do some more marketing.

Anyhow, to give you an order of magnitude, with in-memory data storage we reach ~80k inserts per second, on a machine with 1 vCPU and 3GB of RAM - a $15/month box from DO.


That would be cool! As an interim step, if your data is small enough, perhaps you could run an in-memory SQLite db and periodically back it up to a permanent S3 file?

APSW exposes the SQLite backup API, so you could do the backups online without shutting down the database.
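Python's stdlib bindings expose the same online backup API (Connection.backup, available since Python 3.7), so a periodic snapshot might look like this sketch (paths are invented):

    import sqlite3

    src = sqlite3.connect(":memory:")
    # ... live writes happen against src ...
    dest = sqlite3.connect("snapshot.db")
    with dest:
        src.backup(dest)  # online copy; src stays fully usable throughout
    dest.close()
    # The snapshot file could then be uploaded to S3, e.g. with boto3.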


I love SQLite, but people approach it from a classic RDBMS angle which confuses them.

Here's the deal: SQLite is a file format with a nice API that uses SQL as the paradigm for reading/writing to the file.

That's it. Stop overthinking it.

Can you write a microservice that stores its data in a big JSON file that you've built some code around to read/write to? Yes. It's just a file, but you have to build all the read/write methods. SQLite is not really any different, except the read/write work is already done, and you use SQL to format the data values and encode the read/write logic instead of the language you are working in.

The file format has some cool extensions like text indexing, geospatial, etc. But it's no more an RDBMS than reading and writing a JSON file is.

"But there's indexes!" Yes, just liked you might build an index on your JSON file and read that before reading the JSON file to know where things are faster -- and then you have to write all the code to do that. SQLite is just a file format, where you can also build indexes and all the code for that is already done for you.

It's just a file format with an API that's similar to the ones you'd use for any regular old RDBMS and uses SQL as the domain language to read/write data.

It's just a file format. Anything you can or can't do with a file format, you can or can't do with SQLite.

It's just a file format. A nice convenient one that you are probably better off using than most other things for most purposes.

It's just a file format.

edit

I'm glad this comment is getting such a response. I'm not trying to be mean, just help clarify thinking.

Here's two thought experiments:

1) If SQLite didn't require you to use SQL as the read/write logic and was called "datalite", and instead just forced you to use C function calls, exactly like you would if you were working with literally any other file format on the planet, would you still be confused as to what it is?

2) Do you consider reading and writing to any other file format anywhere in the hierarchy of RDBMSs? Consider Python's csv module. Is that an RDBMS? Let's move away from tabular data, how about python-docx?


I've spent half an hour or so looking at this thread, and my takeaway is that you're largely just confusing matters even further.

> It's just a file format.

This is clearly incorrect. It does encompass a file format, but it also contains code to manage that file. The existence of sqljet does not change this, it's merely a different database management system that uses the same file format.

You also seem to mostly ignore that the data it's managing is a relational database, not some other form of data. This is why you can't compare it to Python's csv module or python-docx; neither of these does anything to restrict you to a relational data model, nor do they provide a system to query them as if they were a relational database. On the other hand, if for some unknown reason you rewrote the storage engine of PostgreSQL to use either the SQLite file format or csv/docx, I assume you agree it would still be an RDBMS.

Ultimately I think you're just adding to the confusion by saying that a project with 139,000 lines of C code is "just a file format".


You can read and write SQLite files entirely without the library. The file format is available on the SQLite site. SQLJet proves the point that you don't need 139,000 lines of C code to use SQLite files.


I'm sure you could make a tool which could read and write SQL Server's mdf files. That wouldn't change that SQL Server is an RDBMS.


It's a file format with a very mature and nice library around it. At the end of the day it's just fwrite() and fread() with SQL syntax.


And perhaps python is just glorified assembly, but it seems to me to be a "difference in degree that leads to a difference in kind."


I'm not sure I understand your comment. What's the significance of it being "just a file format"?

What are the aspects of "approaching it from a classic RDBMS angle" that are incompatible with it being "just a file format"?

I've always seen it as "just a file format" myself, and I think I've always approached it "from a classic RDBMS angle", but I've never felt confused, and I don't see what I'm overcomplicating.


> and I think I've always approached it "from a classic RDBMS angle"

Have you tried to figure out where to install the server or asked what the system requirements were for it?

Have you grown concerned that once the system moves into production the O&M team won't know how to operate "yet another database"?

Do you spend agonizing hours trying to figure out if it supports multithreaded connection pools for multi-user writes?

Have you wondered if your organization has the budget to add another DBA to the team if you add SQLite to your tech stack?

If the answer is no to all of the above, you aren't approaching it from a classic RDBMS angle. Believe it or not, there are tens of thousands of questions about SQLite from people struggling to figure out the answers to the above.

The people asking these questions are not stupid, they're just approaching the technology from the wrong direction.

This post is no different than "CSV is serverless" or "JSON is server-less" with a blog post about classic vs neo-serverless JSON technologies.


> Have you tried to figure out where to install the server or asked what the system requirements were for it?

Not exactly, but asking "where to install the server & client and what are the system requirements for each" is not that different from asking "where to install the client and what are the system requirements for it", even when there is no server.

> Have you grown concerned that once the system moves into production the O&M team won't know how to operate "yet another database"?

Yes. Because the production concerns of SQLite are not nil.

> Do you spend agonizing hours trying to figure out if it supports multithreaded connection pools for multi-user writes?

Not agonizing hours, but it is just a slightly rephrased version of a valid question about SQLite w.r.t. concurrent file access (as with locks on file access for any file). Other commenters have brought this up in terms of multi-user access slowing down applications, and setting up intermediary DB access processes using IPC to facilitate this.

> The people asking these questions are stupid

I disagree

> no different than "CSV is serverless" or "JSON is server-less"

CSV and JSON lack any protocol or queryable interface: unless you're using some ancillary tool like `jq` as a comparison, CSV and JSON as filetypes are both "serverless" and "clientless" so not particularly comparable. An article on those would be quite different.


bane said "not stupid"


That was a later Edit. The original did not have the word "not".

I make these errors sometimes too and need to edit. HN's method beats Twitter, but it would be nice if we could see a versioned history of each message...


The original document is from the SQLite documentation. I think it's fair for them to make a case for why sometimes a file database > a server database. People asking these questions are not stupid. We all have to start somewhere.


Please check it again, I wrote "not stupid".

The SQLite creators have a great deal of documentation that's targeting database administrators and users and trying to explain what this thing is, when I think they really should have just targeted people who need a nice file format and a clean API.

But hell, there's like a trillion SQLite files in use so what do I know?


If you are not confused, this might be an indication that the comment will not provide any new information for you. If the comment provided no new information to you, it might be an indication you're not confused (and it was therefore not farted in your general direction)


SQLite implements all of the features you would expect from a relational database, including indexes, transactions, write-ahead logging, and consistency, and provides a SQL wrapper on top of its file API. _This is what makes it a database and not just a file format_. To the client, whether the API happens over a socket or a locally linked library is irrelevant. How connection pooling works is also not relevant to whether it is a database or just a file format.

Calling SQLite just a file format is kind of like calling Python "just a syntax specification." That's part of it, but we're talking about the actual implementation of it (probably CPython).



Consider: it’s totally possible to strip down Postgres until all you have left is an embedded RDBMS of the style of SQLite. (I’m not sure why nobody has done this yet, actually.) Would you call the result “just a file format”?

Such an instance of “embedded Postgres” would still have a huge sprawling catalog (PG_DATA) directory attached to each use of it, so it wouldn’t be contained to a single file. But neither is SQLite contained to a single file—SQLite maintains a journal and/or WAL in a second file.

And, yes, this “embedded Postgres” would require things like vacuuming. But... so does SQLite. Have you never maintained an application that maintains a long-lived “project” as a single SQLite file, where changes are written into this project file repeatedly over a long period? (Think: the “library” databases of music/photo library management software.) SQLite database files experience performance degradation from dead tuples too, and need all the same maintenance. Often “database version migrations” of such software is written to either rewrite the SQLite file into a clean state, or—if it has the possibility of being too big for that to be a quick task—to call regular VACUUM-like SQL commands to clean the database state up.

——

Now, I get what you’re trying to say; the point that you’re trying to make—that SQLite might be a relational database, but it’s not a relational database management system in the sense of sitting around online+idle where it can do maintenance tasks like auto-vacuuming. Unlike an RDBMS, SQLite doesn’t have its own “thread of execution”: it is a library whose functioning only “happens” when something calls into it. By analogy, regular RDBMSes are like regular OS kernels, while SQLite is like a library kernel or exokernel.

But that doesn’t mean that SQLite is a file format! It can be used as one, certainly, but what SQLite is is exactly the analogy above: the kernel of an RDBMS, externalized to a library. As long as you “run” said kernel from your application, and your application is a daemon with its own thread of execution, then your application is an RDBMS.

This can be seen most directly in systems like ActorDB, that simply act as a “transaction server” routing requests to SQLite. ActorDB is, pretty obviously, an RDBMS; but it achieves that not due to its own features, but 99% due to embedding SQLite. All it does is call into SQLite, which already has the “management system” part of an RDBMS built in, just not called unless you use it—just like exokernels often already have things like a scheduler, just not called into unless you as the application layer do so.


Great comment! Respectful of GP and constructively critical.

The other thing I would mention is that SQLite can operate totally in memory which makes it useful without even using it to persist data (say you have a language with a slow dataframe API, just use SQLite in memory to process your data).
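As a sketch of that in-memory use (the data and column names are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # nothing is ever written to disk
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 4.5), ("east", 7.25)],
    )
    totals = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY 2 DESC"
    ).fetchall()  # [('east', 17.25), ('west', 4.5)]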


FirebirdSQL is pretty close to what you're talking about... its library can do either embedded use or you can run a shared server instance. It's really pretty neat, but on the one side PostgreSQL is probably better, and on the other SQLite is nicer.

I worked on a project a few years ago where I chose Firebird so I could use literally the same database on potentially offline sites that regularly sync up to a main office (shared) deployment. It worked pretty well and was still a lot of work.


SQLite is a file format with a familiar API and uses SQL as the logic for searching/adding data to the file.

Approach it exactly the same way you'd approach using a CSV file and all the confusion and overthinking about it goes away. Approach it as a stripped down RDBMS and you end up with all kinds of questions about support for this or that RDBMS familiar service.

You can write your own SQLite file reader/writer. Here's the specs (includes the specs for the Journal and WAL files and semantics as well) https://www.sqlite.org/fileformat.html

Here's an example of somebody who's done this: https://sqljet.com/ - this is not a wrapper on the SQLite C code, this is a re-implementation of that code that is binary compatible with SQLite files.
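To make the "you can read it without the library" point concrete, here's a sketch that checks two fields documented in the file format spec (the 16-byte magic string, and the page size stored big-endian at offset 16), with no SQLite code involved:

    def read_sqlite_page_size(path):
        with open(path, "rb") as f:
            header = f.read(100)  # the spec reserves the first 100 bytes
        assert header[:16] == b"SQLite format 3\x00", "not a SQLite database file"
        page_size = int.from_bytes(header[16:18], "big")
        return 65536 if page_size == 1 else page_size  # spec: 1 encodes 65536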

The Journal file only exists as a temporary file until transactions complete. The .sqlite file you make is the entire atomic file that follows the SQLite file format. The Journal has its own file format. Same goes for the Write-Ahead log.

RDBMSs also manage connection queues, account management, rights and permissions, and so on. Many overcome various OS limitations by providing their own entire file management, fopen(), virtual memory and other subsystems that are tuned to their workloads.

SQLite is a file format. SQLite uses familiar relational paradigms to make it easy to read/write data to the format without having to learn yet another API and domain language. The API code is extraordinarily well tested, and it makes simple the complex logic of transaction journaling, indexing and so on.

>Consider: it’s totally possible to strip down Postgres until all you have left is an embedded RDBMS of the style of SQLite.

No! SQLite is not an embedded RDBMS. It's a file format.

If there was a library you could import, and it provided methods to read and write directly to files that PostgreSQL could read/write to and there was nothing else to install, no runtime, no daemons, no servers, etc., then we could pass around self-contained PostgreSQL files to each other. Then PostgreSQL files would be a file format as well.

Have you ever used a library to read/write from a CSV, JSON, JPEG? It's no different than doing so for a SQLite file!

SQLite is a file format.


File formats don't have an API.

> SQLite is a file format with a familiar API and uses SQL as the logic for searching/adding data to the file.

That description is for a library, not a file format. SQLite is a library that saves to a convenient format and allows you to query the file using SQL syntax. File formats don't have "logic".

You are arguing the equivalent that Word is a file format. While there is a Word file format, Word itself is an application.


> File formats don't have an API.

So when you read/write to any other file format, you just read/write bytes directly to/from disk and re-implement the parsing and read/write logic in your own code every time?

> File formats don't have "logic".

Every file format has logic; otherwise it's just random entropy in an arbitrarily long byte stream on a disk. How to read/parse and interact with that format depends entirely on the logic and scheme for that file. For example, many file formats have an index of some kind that you must read and parse before you can figure out where the other data lives, compressed file formats often store a dictionary, and image formats often have compression/decompression logic that must be followed for reading/writing them.

> You are arguing the equivalent that Word is a file format. While there is a Word file format, Word itself is an application.

.docx is the file format for Word documents. There are many APIs and programs that can read/write to/from .docx files.


> No! SQLite is not an embedded RDBMS. It's a file format.

You keep saying it's a file format, but it's quite possible to use SQLite without persisting anything to a file at all.

    rc = sqlite3_open(":memory:", &db);


You can read/write CSV files, JSON, JPEG, WAV, MP3, MP4, etc. into memory as well. That doesn't make any of them an RDBMS.

SQLite is a file format. It has a nice API and uses SQL as the domain language for read/write logic. If it didn't use SQL for the logic, would you still be confused?


Being a file format is only one aspect of what SQLite is. SQLite describes itself as "SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine." I think that is a better, more encompassing, description than "it's a file format"

Edit:

If SQLite lacked any ability to persist data to disk, it would still be very useful as an in-process SQL engine for many sorts of problems. Certainly not as useful as it currently is, but nevertheless still useful.

I'd say the file format without the SQL engine, or the SQL engine without the file format, would be like peanut-butter without jelly. Certainly not pointless, but the real magic comes from the combination of the two.


Sure, I can get behind that. The logic that handles the read/writing and SQL parsing is all part of the library for sure.

But that library is absolutely not required, nor is SQL. One could build their own library that reads/writes SQLite files entirely without any of that if they wished. Some people have done ground-up rewrites in languages other than C, but have more or less stuck with the same internal logic and the use of SQL as the read/write logic.


All other RDBMS products also store their information inside files. I can take a mdb file from SQL Server and copy it to another server and attach it there.

SQLite is obviously a library that stores its data to a file, like millions of other libraries. SQLite is to its data file what LAME is to MP3. LAME is not the file format; it's the library.


> All other RDBMS products also store their information inside files.

No they don't. There are many RDBMSs (and other DBMSs) that do not store their information inside files.

> I can take a mdb file from SQL Server and copy it to another server and attach it there.

Yes, SQL Server is an RDBMS.

MDB files are files defined by a file format. This format defines something called a database. There are many libraries and other pieces of software than can read/write mdb files (https://jackcess.sourceforge.io/)

SQLite files are files defined by a file format (https://www.sqlite.org/fileformat.html). You can write code that reads and writes to a SQLite database file without using the library. So long as you follow the format specification, you will produce, or be able to read from, an arbitrary SQLite file that is produced by any code that implements the specification.

The SQLite library is a reference implementation of the specification as well as some sample logic for reading/writing to the files and using SQL to define the interactions with the file. The SQLite library is not required for this, nor is SQL, nor is the logic for ACID compliance.

SQLite is not an RDBMS and offers almost nothing that an RDBMS (or DBMS) might offer. If you wish to use SQLite files with an RDBMS you either have to build the RDBMS yourself, or find somebody else who's done so.

The more you consider SQLite a file format, the easier it becomes to work with and understand. The more you try to consider it an RDBMS, the less it makes sense.

Just because SQL is involved, doesn't mean it's an RDBMS.


SQLite is a library that implements an embedded database engine. The file that it stores this data in is an artifact, not the interface.


SQLite is a whole library with complicated locking protocol options etc. and works with multiple files of different formats (e.g. separate transaction log).


Since the file format specification is available, you can write your own code that directly reads/writes SQLite files. You don't even have to use the logs and journaling options. You don't even have to use SQL.


That’s the SQLite database file format. SQLite itself is a library that can deal with multiple files in different formats (transaction log and database file). It also reads .sqliterc and maybe others.


> That’s the SQLite database file format.

You got it! SQLite is a file format.

> SQLite itself is a library that can deal with multiple files in different formats (transaction log and database file). It also reads .sqliterc and maybe others.

Yup you also got it! SQLite is not an RDBMS and shouldn't be approached that way.

The more you try to ram RDBMS ideas into what SQLite is, the more it won't be that. The more you try to treat it like a file format, the more it will be what you want.


In their own words: "SQLite does not compete with client/server databases. SQLite competes with fopen()."

[1] https://www.sqlite.org/whentouse.html


I think most RDBMS systems are just a file (or set of distributed files) when you strip away all the helper functions. The fact that SQLite squirrels everything into a single archive doesn't mean many of the abilities housed within a more comprehensive database aren't there. You can implement JSON columns and full-text searching and numerous other fancy systems.

The main argument I hear for why SQLite deserves second-class status in the DB marketplace is the difficulty in handling multiple writes simultaneously.

To that point I'd say it's more of a simplicity-in-design choice. Search for 'DB race conditions' and you'll see that every database struggles with handling multiple writes nested inside complex transactions. SQLite avoids the whole mess and requires the programmer to think through I/O instead of offloading all that logic to the RDBMS software.


> The main argument I hear for why SQLite deserves second-class status in the DB marketplace is the difficulty in handling multiple writes simultaneously.

SQLite is not in the DB marketplace. It's in the file format marketplace. It handles multiple writes in exactly the same way CSV handles multiple writes. If you want to handle multiple writes with SQLite, you handle it the same way as CSV.


I disagree. CSVs don't have write-ahead logging and are not able to selectively lock portions of the file for writes.

And I'd say that not only is SQLite in the DB marketplace (albeit for a specific subset of database application types), it's one of the largest players.


Why not? Just roll your own write-ahead log when you are writing to CSVs! That's all the SQLite code is doing.

SQLite does not selectively lock portions of the file for writes. It locks the entire file using the Operating System's own file handling services.

SQLite is a file format. CSVs, XML and JSON are also huge in the DB marketplace. That doesn't make them RDBMSs.


A collection of CSVs can be used as a database.


Yes! If a database is defined as a place where I can read/write data, then any byte stream that I can read/write to is probably a database of a kind.


I'm no SQLite expert but by this logic aren't all single-host databases that tick the "Durability" ACID checkbox "just file formats" in the sense that yeah, the bytes we care about exist somewhere in the filesystem?

Moreover I'm having trouble coming up with things that I'd associate with a RDBMS and not "just a file format" that SQLite doesn't support. Transactions? SQLite has them. Relational constraints? SQLite has those too. Could you elaborate on some of the confusion that you've seen around this?


Look at the other comments just in this post.


Sorry for the previous terse reply.

Longer answer:

SQLite files do not guarantee ACID compliance. You can write code tomorrow that produces SQLite files, and so long as you follow the specification (https://www.sqlite.org/fileformat.html) it will be readable by any other code that implements the specification (e.g. https://sqljet.com/)

An RDBMS is not a database, nor is it SQL, nor is it data. It is a kind of DBMS software that manages relational databases, and access to the data (such as users and user rights). Most modern RDBMSs run as servers and offer network connectivity, connection pooling, advanced buffering options, various memory usage schemes. Many have their own memory allocation and file handling routines that are separate from the OS. Some offer clustering, partitioning and so on. SQLite does not offer any of these things. If you were to write a comprehensive list of things that Oracle, MS SQL Server, DB2, PostgreSQL, MySQL and SQLite offer, SQLite would offer almost none of the features that the rest do.

A relational database uses the relational model to store data. SQL is the most common language for describing what you want to put into or retrieve from the relational database, but it is not required.

There are many kinds of databases. Some of them store data in memory, in a file, in multiple files, and so on. Some of them follow various models, some of them are unique. If you have the file format for a database that stores its data in files, you can read/write to the file freely without any management system and without ACID compliance. SQLite files are examples of a kind of database file that stores data using a relational model. So are MDB files that Microsoft Access uses.

Conflating a file format with an RDBMS is like conflating a fork with a restaurant, or a chair with a house.

ACID compliance is not something guaranteed by the file format. SQLite files do not guarantee ACID compliance. If you write some code tomorrow that can read/write SQLite files based on the spec, you haven't created an ACID-compliant SQLite file, nor is your code ACID compliant.

The SQLite library implements the properties that make SQLite ACID compliant. It does so by various clever means like a journal file format, and a write-ahead-log file format and various other well thought out approaches. If you were to write your own code that implemented the SQLite file spec, and you wished your code to also offer ACID compliance, you would have to implement those things yourself -- and you are under no obligation to use the SQLite journal and WAL file formats nor the internal logic that the SQLite library uses. You can do it entirely your own way!


edit - removed


The R in RDBMS stands for Relational, not Remote.


MySQL, Postgres, or MongoDB still store your data in files. So they are also "just a file format". You do have one extra step - the db server process - to access the files.


Yes! You've defined the difference between an RDBMS (the server process and whatever else it does) and the file format the data is stored in.

SQLite is just a file format. If you want it served up over some kind of server, you have to build your own (and most people do), or use a server that somebody else has built for you (there's a couple out there).


SQLite is not "just a file format" anymore than say MS SQL Server is a file format. SQLite is a RDBMS in the form of a library and uses a particular file format for persistent storage. But it can also run purely in-memory: https://www.sqlite.org/inmemorydb.html without any file at all.

Being a RDBMS is not defined by whether the engine runs in-process or as a server in its own process.


Sure. Or a light RDBMS where the main not supported case is concurrent writes.


No, SQLite provides no RDBMS functionality. It is not an RDBMS. It is a file format.

Saying that a program that opens and reads/writes a file through a file format API is a light RDBMS turns almost every program in history into an RDBMS.

If SQLite didn't force you to use SQL as the read/write logic, absolutely nobody would confuse it for an RDBMS. That's because it's a file format.


> No, SQLite provides no RDBMS functionality.

I think your comment would be more comprehensible if you gave some examples of the kind of functionality you think is missing.

I don't know what you mean by "RDBMS", but JSON and XML don't do joins, don't do views, don't do efficient query plans, and so on. It's either ignorance or obstinacy to say SQLite is just a file format.


> ...but JSON and XML don't do joins, don't do views, don't do efficient query plans, and so on. It's either ignorance or obstinacy to say SQLite is just a file format.

Sure they do. If you write the logic to do so and put it behind a nice API, you can make all of this come true. In fact, millions of people do joins with JSON and XML in their code every day. You can probably just use Apache Drill as the "library" in this sense to facilitate joins and whatnot. The creators of the SQLite library simply built that stuff into their library for you.

SQLite is a file format. It has a nice library full of wonderful utility functions for reading/writing that file format and a simple to use API that is operated by sending SQL to it.

It is exactly the same as reading and writing any other file format with any other API and library. The more you understand SQLite as a file format with a nice reference API implementation, the more it makes sense.

It is not the same as using an RDBMS and offers almost none of the things an RDBMS might offer. The more you try to figure out how it's not like PostgreSQL or Oracle or MongoDB, the more confused you'll make yourself.

It's no more an RDBMS than a .docx file is.


First of all, you still haven't answered the question: What is it that an RDBMS has that SQLite doesn't have?

> It's no more an RDBMS than a .docx file is.

Thanks for the idea. Your argument is like saying this:

Microsoft Word is not a word processor -- it's a file format.

I mean, yes, Word has a file format; but it's far more than just a format specification.

> Sure they do. If you write the logic to do so,

Right, but you don't have to write the logic if you're using SQLite. That's the point. SQLite is a library, which provides a way to do SQL operations on data. Like Word, SQLite has a file format, but it is far more.

I just don't get where you're coming from. Do you not know that the SQLite library can actually do complex SQL queries on data? Or do you think that people shouldn't do that for some reason? Or do you just value SQL queries so little that you don't see any difference between being able to do complex queries and doing `file.Write(json.Marshal(data))`? What is it you're trying to accomplish with this line of argument?


> First of all, you still haven't answered the question: What is it that an RDBMS has that SQLite doesn't have?

An RDBMS is a well-defined thing; it is literally what the acronym expands to mean. This is very old technology with an interesting history, and I really implore you and anybody reading this to go read up on it. It's not just whatever we assume it to be, or some kind of data bucket with SQL.

> Microsoft Word is not a word processor -- it's a file format.

No, don't be obtuse. I'm saying that .docx is a file format.

Word is both an application for editing documents and contains a reference implementation for reading/writing .docx formatted files. There are many libraries that can read .docx files and some of them are also part of document editing software.

> Right, but you don't have to write the logic if you're using SQLite. That's the point. SQLite is a library, which provides a way to do SQL operations on data. Like Word, SQLite has a file format, but it is far more.

> I just don't get where you're coming from. Do you not know that the SQLite library can actually do complex SQL queries on data? Or do you think that people shouldn't do that for some reason? Or do you just value SQL queries so little that you don't see any difference between being able to do complex queries and doing `file.Write(json.Marshal(data))`? What is it you're trying to accomplish with this line of argument?

Precision of thought. People don't go around calling fish oceans, or forks restaurants. The SQLite library does what you've described to SQLite files. But you don't need the SQLite library to work on SQLite files. You don't even need SQL, e.g. https://sqljet.com/

Just because a library offers SQL as a convenient tool to read/write data into its file format, everybody loses their minds and starts to think the library is some kind of feature-reduced Oracle cluster. Go back to my first post. People are approaching what SQLite is from the wrong direction (RDBMS) and it's confusing the fuck out of everybody who gets near it.

This is important. IT departments and governments make very large, very expensive decisions based on whether people know that SQLite is closer to CSV files than to Oracle databases.

I literally sat in a meeting last week where a senior decision-maker at a client wouldn't accept delivery of some software because it used SQLite and didn't want to add maintenance of yet another database to their overworked DBA staff and didn't want to hire a dedicated person to manage it. So now, instead of just taking delivery of the software, some of it has to be rewritten to use the client's RDBMS system, which in turn actually will add workload to the overworked DBA staff and will also perform worse.

SQLite IS A FILE FORMAT with a really nice library for reading/writing to that format.


> SQLite IS A FILE FORMAT with a really nice library for reading/writing to that format.

You keep repeating that, but it is just not the case. SQLite is the name of the library, not the file format. Just see the definition on Wikipedia:

> SQLite is a relational database management system (RDBMS) contained in a C library. In contrast to many other database management systems, SQLite is not a client–server database engine. Rather, it is embedded into the end program.

It is really that simple.


> > Microsoft Word is not a word processor -- it's a file format.

> No, don't be obtuse. I'm saying that .docx is a file format.

I'm afraid I'm not the one being obtuse. That statement is a mirror; please have a look.

> Precision of thought.

Which is why "SQLite is only a file format" is a false statement, and you shouldn't be making it.

> But you don't need the SQLite library to work on SQLite files. You don't need even need SQL.

That's like saying btrfs isn't a filesystem, because grub knows how to read it. The core functionality of SQLite is the query system. The fact that it's got a well-defined file specification which other projects can read is one of its features, not the sum of everything that SQLite is.

> I literally sat in a meeting last week where a senior decision-maker at a client wouldn't accept delivery of some software because it used SQLite and didn't want to add maintenance of yet another database to their overworked DBA staff and didn't want to hire a dedicated person to manage it. So now, instead of just taking delivery of the software, some of it has to be rewritten to use the client's RDBMS system, which in turn actually will add workload to the overworked DBA staff and will also perform worse.

Finally, something remotely concrete, rather than a repetition of the same false statement.

So the problem you're trying to solve is that people see "SQL" and think "Oracle": A massive installation which requires separate resources, both in terms of servers and manpower to maintain it.

I can see why you want to try to correct that false belief. But your solution seems to be to introduce another false belief. Imagine you're successful in getting people to accept that "SQLite is just a file format". Five years from now, someone else will be posting this to HN:

"I literally sat it a meeting last week where a senior decision-maker at a client wouldn't accept delivery of some software because it used SQLite, and he said SQLite is just a file format like JSON; and they need advanced SQL querying, safe transactions, and safe access by multiple accounts. So now, instead of taking the delivery of the software, some of it has to be rewritten to use the client's RDBMS system."

You're not going to fix one misconception by introducing another. One better thing to say would be the truth:

"SQLite allows us to embeds database functionality into your application, so there's no need for a separate stand-alone database."

Or, in fact, to do what this article does, and try to hijack current hype around "serverless":

"SQLite is a serverless database -- you don't need to install and maintain a new RDBMS; it's embedded inside the application itself. No additional maintenance necessary."


It can open multiple files (the transaction log).


Almost any modern programming language can open multiple files.


At a minimum it is multiple file formats. But really it is multiple file formats + a fairly intricate library for dealing with them along with locks etc.

According to their home page the SQLite database file format is a file format, and SQLite is a library.


No, but close. A SQLite file is a single format. The journal file and the WAL file are different formats used for bookkeeping by the library in its attempt to be ACID compliant. The library implements some complex logic to ensure this, but reading/writing a SQLite file does not require any of it.

You could write your own code tomorrow that reads/writes SQLite files but does not produce, read, or write WAL or Journal files. So long as the resultant SQLite file follows the specification, it can be read by any other piece of software that implements the specification, such as the SQLite library or SQLJet (https://sqljet.com/).


> SQLite is a file format with a nice API that uses SQL as the paradigm for reading/writing to the file.

<insert-your-favourite-relational-database-management-system-here> is a collection of bits with a nice interface that uses a query language as the paradigm for reading/writing data.


No, an RDBMS is a piece of software that manages databases in the relational model and access to those databases (such as users and permissions), and provides services such as a server, connection pooling, and so on.

SQLite provides almost no RDBMS features.

This isn't just semantics.

A car is not an engine. A fork is not a kitchen. A SQLite file is not a DBMS.


https://en.wikipedia.org/wiki/Database#Database_management_s...

>Connolly and Begg define database management system (DBMS) as a "software system that enables users to define, create, maintain and control access to the database".[24]

>The functionality provided by a DBMS can vary enormously. The core functionality is the storage, retrieval and update of data. Codd proposed the following functions and services a fully-fledged general purpose DBMS should provide:[25]

[x] Data storage, retrieval and update

[x] User accessible catalog or data dictionary describing the metadata

[x] Support for transactions and concurrency

[x] Facilities for recovering the database should it become damaged

[ ] Support for authorization of access and update of data

[ ] Access support from remote locations

[x] Enforcing constraints to ensure data in the database abides by certain rules

Under this definition SQLite - the library - clearly is an RDBMS that leaves out some common features that do not make sense within its niche but is otherwise fully functional, and under this definition the files that SQLite manages are the database, not merely a file format.


The zip utility.

[x] Data storage, retrieval and update

[x] User accessible catalog or data dictionary describing the metadata

[ ] Support for transactions and concurrency

[x] Facilities for recovering the database should it become damaged

[x] Support for authorization of access and update of data

[ ] Access support from remote locations

[x] Enforcing constraints to ensure data in the database abides by certain rules

Congratulations, apparently zip files are as much of an RDBMS as SQLite. If I bundle zip with ssh (Access support from remote locations) and Linux (Support for transactions and concurrency), did I just create a new RDBMS?

How many checkboxes do I have to tick in order to call anything an RDBMS? Is SSH an RDBMS (access support from remote locations)? Can I just put a catalog in a .txt file and check that box? Is XML an RDBMS because it enforces constraints and supports storage, retrieval and update? Are chmod and chown an RDBMS because they support authorization of access and update of data?

> SQLite manages are the database, not just merely a file format.

It turns out databases can be just files. Those files must follow a described file format. SQLite files are relational databases that are instantiations of the file format specification for SQLite files. The SQLite library implements that file format as well as some clever logic to support SQL and ACID compliance. Some SQLite libraries do not support these things.


I think you’re stuck in 90/10 rule territory here. But even so, SQLite was 220,000 lines the last time they measured, which was five years ago. You can pack a lot of functionality into 22 kloc (the 10%), even ignoring the other 90%, which you shouldn’t.

Lodash, for instance, is much smaller than 22k lines, and it “just” manipulates objects and lists.

If you downplay others like this, I wonder how you feel about your own work. Have you been working hard for years on something that “just” accomplishes a straightforward task? Are you happy? I know I wasn’t.


I'm not downplaying anybody's work. Many people approach SQLite as something in the RDBMS territory. It's not. Almost all of the confusion I've ever seen related to SQLite comes from starting from that basis. If one simply thinks of it as an alternative to fopen() then it makes very simple and intuitive sense.

The people in this thread seem to be very resistant to this simple clarity of thought, but whatever, they can stay confused and keep coming up with feature comparisons of SQLite vs Redshift vs Elasticsearch or some such.

If one were to draw a spectrum:

   file-format:<-x----------------------------->:DBMS
SQLite is the x on this line, and .txt files are about the only thing that's any further left on it.


There is no such spectrum. SQLite is a piece of software that implements some but not all commonly expected RDBMS features. Software is not a file format; software may be written with the expectation that a given file follows the requirements of a certain format, and it may be written so that it produces files that follow that format's specification. Since SQLite - the software - is an RDBMS, the files it produces can be considered to be databases.


I just drew the spectrum. It exists now.

SQLite is a file format. You can read/write SQLite databases without the SQLite library. You can write your own custom reader/writer/creator. You don't have to use SQL. You don't have to be ACID compliant. Right now I could make you a SQLite database that never touched any SQLite software, put data into it, and you could open it with another piece of software that implements the SQLite file format specification.

Likewise, you can use the SQLite library software to create a SQLite database file, put data into it, and I can read it/update it using any other software that implements the SQLite file format specification.

The SQLite library offers some very very basic features, such as ACID compliance, and so on, but those are not part of or guarantees of the file format or the database files. The software that you write that implements the SQLite file format specification does not have to do any of these things to work with or produce a valid SQLite file.

An RDBMS is a kind of DBMS for managing relational databases and providing access to the databases (for example users and permissions). Modern RDBMSs offer extensive features (look at an Oracle or MSSQL Server spec sheet) that are not even hinted at with the SQLite library software.

This is because SQLite is not an RDBMS, it's a file format.


The impression given here is that you only use it as a dumb store of data. My experience is that it's more like:

  file-format:<------------------------x------>:DBMS


Why, because it offers SQL support? That just makes it a relational database that supports SQL. MS Access supports SQL.

If you were to draw up a feature list of Oracle, PostgreSQL, MS SQL and SQLite, SQLite would have almost none of the features of any of the actual RDBMSs.

Here's some examples from MS SQL:

- Support to PMEM devices and bypassing OS storage mechanisms for optimal file read/write access

- Availability Groups and synchronous replica pairs

- Users and permissions

- Secure Enclaves

- Certificate management functionality

- BI tools

- Database tuning advisor

- Machine Learning services

- Service Broker

- Replication services

- Analysis Services

- Reporting services

- Notification services

- Integration services

and so on.

Draw up a set of features for JSON files and jq and compare to SQLite. Is it closer to MS SQL or JSON?


I've seen several software projects that are built for an RDBMS let you use SQLite. It works.


They're using SQLite as the file format for persisting data. It's a great file format for this. You could even build an RDBMS on top of uncompressed WAV files if you wanted. It doesn't make WAV files RDBMSs.


But SQLite has RDBMS features. I remember being able to show tables and do SQL queries in SQLite DBs.


Those are relational database features, not RDBMS features. The SQLite file format specifies a way to organize, store and retrieve fairly arbitrary data using a relational database model.

The library knows how to handle SQL to describe the work being done. The SQL is optional, one can, with the specification, read/write SQLite files in many other ways.

There are almost no RDBMS features in SQLite. There are many many other file formats that store data that offer features that are very similar to SQLite files: indexes, journaling, write logging, etc.

Thought experiment: you can ask a .tar file to give you a listing of what files and directories are stored in it. Are .tar files RDBMSs? Consider:

1 - If you consider each file in the .tar file a "table" you can get list of tables.

2 - If each file follows a regular format, say JSON, you can search the "tables" by extracting the file and grepping it or using jq or whatnot.

3 - You can store a special file that is an index of some kind that lets you know in which file some data is, or even where in the file it is.

4 - You can build logic such that when you want to do other CRUD operations you can record a journal and a write ahead log to help build in ACID compliance.

5 - You can build buffer logic to support Write-ahead-logging, transactions and what not to improve performance.

Are .tar files RDBMSs? Trivially no.

But maybe, if you do all these things, you've invented a terrible database and database engine.

However, you need to build a server, user access controls, connection pooling, import/export tools, partitioning, clustering, etc. before you start to arrive at an RDBMS that uses this engine.


You're making some basic assumptions that do not make sense.

The database can be physically stored in any arbitrary format. One can build an RDBMS that stores all its data in tar or JSON files, no matter how inefficient, as long as software exists that manages the database.

>Are .tar files RDBMSs?

This question doesn't make sense because you are asking if databases can be management systems which is obviously false by definition.

If the 5 steps you have described are implemented in software then that software would be considered an RDBMS and the .tar file clearly would be a database. There is no confusion.

>However, you need to build a server, user access controls, connection pooling, import/export tools, partitioning, clustering, etc. before you start to arrive at an RDBMS that uses this engine.

Those features are not necessary for a piece of software to be called RDBMS but most industry standard RDBMS do indeed support these features and SQLite clearly is an RDBMS that pursues a certain niche that only makes sense in certain situations.


> This question doesn't make sense because you are asking if databases can be management systems which is obviously false by definition.

Yes! And by extension the tar utility is not a DBMS even if it checks some of the boxes for one. And thus SQLite files are not RDBMSs. Looks like you and I agree.

> The database can be physically stored in any arbitrary format. One can build an RDBMS that stores all its data in tar or JSON files. No matter how inefficient as long as software exists that manages the database.

Sure! One can come up with all kinds of very terrible software. But what's the distinction between some random software that just allows CRUD operations on a file format and an RDBMS by your definition? Because you've defined something close to 100% of all software as an RDBMS which makes the distinction between software and RDBMSs meaningless.

There has to be something more than just that to be an RDBMS doesn't there?


Whilst .tar files are not an RDBMS, I can assure you that SQLite more than qualifies as an RDBMS.

Even the popular vote says it is: https://www.google.com/search?q=sqlite+rdbms


If there's no inter-process communication and it doesn't use the OS to write to the filesystem... how does an application write to a SQLite db?

I'm a noob and just curious.


It uses the OS to write to the filesystem. It just does some clever management of the data that needs to be written. You can build your own if you are clever enough to read/write your own file format too.
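
To make that concrete, here's a minimal sketch using Python's built-in sqlite3 bindings (the db name is made up); every call below is just library code doing ordinary file I/O inside your own process:

  import os
  import sqlite3

  # No daemon to start or socket to connect to: connect() simply
  # opens (or creates) a file, and all I/O happens in this process.
  conn = sqlite3.connect("app.db")
  conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
  conn.execute("INSERT OR REPLACE INTO kv VALUES ('greeting', 'hello')")
  conn.commit()
  print(conn.execute("SELECT v FROM kv WHERE k = 'greeting'").fetchone())
  conn.close()

  print(os.path.getsize("app.db"), "bytes on disk -- just a file")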

If you want to build an RDBMS using SQLite as the core, you can. You can also do it using uncompressed WAV audio files if you are clever and hate yourself enough.

To use SQLite in such a scenario you simply have to write the entire RDBMS minus the file handling routines. This includes a connection pooling mechanism and a single process to isolate the connection to the SQLite file so that the OS doesn't get angry when you try to have multiple things writing to it.


Does this mean that you can hack some database storage (w/ sqlite) together on frontend only hosting platforms like Github or Netlify?

I think not, but I wonder if some hack is available by virtue of it simply being a file that you can read (and somehow write to).

The best I came up with: let's say you have a toy project, and you call the GitHub API and replace the file upon every write. Implementing a read is easier as you know where the file is located. This hack shows that you really need direct write access to get any kind of performance out of it, because this hack is super slow.
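
Roughly, the hack would look like this (a sketch against the GitHub contents API; the owner/repo/token values are made up, and every write re-uploads the whole database, which is exactly why it's so slow):

  import base64
  import requests  # third-party: pip install requests

  OWNER, REPO, PATH = "me", "toy-project", "data.db"  # hypothetical
  URL = f"https://api.github.com/repos/{OWNER}/{REPO}/contents/{PATH}"
  HEADERS = {"Authorization": "token <personal-access-token>"}

  def replace_db(local_path, message="update db"):
      # The API needs the current blob's sha to overwrite an existing file.
      resp = requests.get(URL, headers=HEADERS)
      sha = resp.json().get("sha") if resp.ok else None
      with open(local_path, "rb") as f:
          body = {"message": message,
                  "content": base64.b64encode(f.read()).decode()}
      if sha:
          body["sha"] = sha
      requests.put(URL, headers=HEADERS, json=body).raise_for_status()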


> Does this mean that you can hack some database storage (w/ sqlite) together on frontend only hosting platforms like Github or Netlify?

You can run SQLite in the client via WebAssembly and therefore open a SQLite file in the browser to query it, yes; you just can't write anything to it and expect it to persist somehow on the static hosting service itself.

> The best I came up with: let's say you have a toy project, and you call the Github API and replace the file upon every write. Implementing a read is easier as you know where the file is located. This hack shows that you somehow need write access to get any form of performance out of it, because this hack is super slow.

No, you'll need a server for that; you can't make arbitrary HTTP requests to any server from the browser, because of CORS/same-origin policies.

"SQLite is serverless" is meaningless buzzword. It just means that SQLite is equivalent a flat file where you'd shove some data, just that you can use SQL to query that file instead of having to index data in it yourself.


"SQLite is serverless" is by no means a meaningless buzzword. The term had a clear meaning before it was coopted by the current web dev fad. SQLite does not operate on a client-server architecture the way e.g. MySQL or PostgreSQL do.


Just to back that up, it had a clear meaning because -less is a valid suffix to append to English words. When grandma runs out of cookies she is cookieless (when a website doesn't use cookies it is also cookieless). It doesn't have to be "in the dictionary" to make sense in conversation.

A quick search of usenet shows "serverless" being used in 1994. It wasn't a term or a buzzword, it wasn't common, it was just English: https://groups.google.com/d/msg/comp.os.linux.misc/r76oNl98C...


Yes, this is what I was trying to say. It wasn't a thing people would throw around like a buzzword but if someone used it in conversation and especially in context (like the SQLite page, which was written over a decade ago, does) people would understand what you meant.


> does not operate on a client-server architecture

That was never a common usage of the term serverless.


The linked article was written in 2007, for your information.

Software development didn't begin or end with web dev and the cloud.


You could make it persist if you have it uploaded somewhere. But then I guess thats not serverless.


"Serverless" is a garbage marketing term in general. It sounds sexy to nontechnical management folks who are used to hearing a bunch of expensive costs and the word "server" associated with them in some fashion. From that view, anything that gets rid of those pesky "servers" sounds like a win.


It is, I did Azure Functions, which is their "serverless" solution. It's essentially a type of CGI 'trigger', which is not that impressive; however, a 'function' can also be triggered by database actions or file uploads in their Blob Storage. So it's handy if you're all in on Azure, and probably on other clouds too, because you can have code run immediately when events happen for certain services. Outside of that, I find it wasteful.

The other buzzword is microservices. I was working a gig that involved running like 10 on a MBP with 16GB of RAM and it froze up the laptop. Java is not a language for micro anything, at least not in regards to memory. Was fun but dreadful to work with more than 3 local services at once. The creators of the framework suggested to mock services, but it was just messy to do that too.


> The other buzzword is microservices. I was working a gig that involved running like 10 on a MBP with 16GB of RAM and it froze up the laptop. Java is not a language for micro anything, at least not in regards to memory. Was fun but dreadful to work with more than 3 local services at once. The creators of the framework suggested to mock services, but it was just messy to do that too.

This seems like a strange criticism. It appears that you want a full deployment of the platform on your local machine. But Microservices aren’t optimized to run on your local machine; they’re meant to be deployed to kubernetes/docker swarm/mesosphere.

I’m also no fan of Java but I don’t think it’s the language that is preventing you here? All speculation since I don’t know the details but it sounds suspiciously like the JVM is getting stuck trying to allocate heap during startup (this is just a guess).


And when you need to interact with other services in a disconnected way? You have to run them somewhere... and local should generally be an option, even if that's to a local single-node (mini)kube cluster.

There's nothing wrong with needing to run more than one thing while developing/testing and interacting. There are other options, but it's not a clearly bad approach.

That said, I would definitely lean towards Go or Rust myself if given the option (even though I have limited exposure to either) over Java. C#/.Net Core is in the middle imho. Even node works pretty well, but the tons of files thing bugs me sometimes. We're leveraging more node and C# where I work and I definitely prefer node... but for more services I do think that Go and Rust are probably better options.


In general, staging environments are the solution to cross service testing for Microservices. The expectation that a Microservice must be testable end to end locally is not one I would like to encourage. The whole point of splitting out a service this way is that individual services have very narrowly scoped, independent function and can be tested on their own, perhaps with mocked data.

One wouldn’t expect to run a company's entire pipeline locally on a single laptop. Expecting that for microservices is similar.


Java has been used on servers since 256 MB of RAM was common; Java is not a limiting factor in how lightweight you can get on modern Intel hardware, though it might need some tuning.


The bloat was in the framework used, not as much on Java itself.


Not true. While the term is hyped, it is done so for good reasons. It’s not just that management doesn’t want the pesky servers, it’s that most developers themselves don’t want or care for it. Devs want to use platforms that let them quickly deploy their code that implements business logic, and servers/kubernetes and other abstractions aren’t things many devs give a fuck about.


This is overly reductive. “Serverless” might be overhyped, but amongst devs and ops it is a perfectly good way to succinctly describe a very specific architecture with real merit. Paying for exactly the compute time i use is pretty sexy to me when I want to deploy and scale a side project.


I would say (X)aaS (whatever as a service) is probably a better term... like DBaaS in this case, vs Serverless as in in-process, which is probably a closer and better use of the term.


What was wrong with "managed"? It's pretty generic and transfers across many domains and also, more fairly descriptive of what's going on.


"managed" is an AWS instance or a digital ocean droplet. "serverless" is an entirely different thing.


"Serverless" is another level of abstraction and management, that's it. Are you proposing that a term collision adds too much confusion?

"Managed" hosting has been around for decades as have "managed" services, in general, long before Amazon even existed. EC2 brought a new level/layer of automation and management ontop of what colo datacenters used to do, provide scaling etc. To be clear I'm not saying it's simple but "managed" describes much of it.


I would argue that the architectural implications of using a "run individual functions on a remote pool of computing resources and build your client application around this pattern and only pay for the exact amount of compute time you use" is not sufficiently covered by the term "managed". I think there could have perhaps been a better name than "serverless" chosen, but that's the name enough people have agreed on to use when talking about this specific architecture that it would be difficult to use another one.


Agreed, I'm not big in devops, but I can see the usefulness of it if you're already part of a cloud provider, but like I said I only know of the capabilities of it in Azure, I'm not familiar with the capabilities elsewhere.


This perspective has always confused me. Would you say that structured programming was a garbage marketing term, since even though the programming language didn't have goto the compiled machine code has jumps and branches? I would hope not, because the point is that an interface is provided on top of the underlying structure so that you don't have to think about gotos outside of exceptional cases.

Not having to think about what servers in which datacenter are serving a request is pretty convenient. We can argue about whether the pricing for most serverless computing is bad, but as far as the term and the technology itself I don't see how this is different from any other abstraction.


The criticism is on the nomenclature, not what's being provided. "Structured programming" is... pretty well named IMHO. Serverless is a misnomer at best and borders on fraud/false advertising from the name alone.

The technologies are fine and there are potentially useful cases for it. It's no silver bullet though and is overhyped IMHO.


I’ve done that, by persisting to an S3 backend... but there’s no concurrent access and it is all one big race condition.


> Does this mean that you can hack some database storage (w/ sqlite) together on frontend only hosting platforms like Github or Netlify?

The meaning of that 'serverless' and this 'serverless' is incompatible, since the web inherently needs servers.

I think the best you can do is upload the SQLite file to a static hosting platform and decode it in the browser. You can then use it as a file store with DB capabilities. (Which is basically what SQLite really is.)


I will shamelessly plug myself again.

I am the main author of RediSQL [1] and I am about to launch a managed service for it.

It will let you write SQL (SQLite dialect) over HTTP or the Redis protocol; to make things clearer: https://simplesql.carrd.co/

Eventually it will upload your database to an S3 bucket for backup.

[1]: https://redisql.com/


Looks pretty cool!

I can imagine it would be useful for data-science projects, if the pricing is right.


What pricing would feel right to you?


I'm far from an expert on such things, but I like the adjustable fixed-price model (like digital ocean).

Whenever I see a price like $0.0002 per hour, I feel like they're trying to mess with my intuition.

That's especially applicable to data-science, because you're not worried so much about automatic scaling, you just don't want to be surprised at the end of the month.

I don't know if you can match digital ocean's prices (they offer managed postgres, for a fair comparison), but if you can get close then you have a chance.


I was thinking something based on size or performance.

1 req per second, free.

50 req per second, 5€.

Unmetered, 30€.

Maybe not counting per second but per hour or day, so as to accommodate bursts.

Another option would be pricing on features. But that will require time.


If you have full support for SQL, it means some queries can run for a few seconds, possibly even minutes (even SQLite supports some form of BFS, via recursive CTEs). So will you just time out those requests? That would make a lot of possible uses suddenly impossible.
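
(For what it's worth, SQLite itself gives a service a knob for exactly this: a progress handler that can abort a statement mid-flight. A rough sketch of a per-query time budget via Python's bindings:)

  import sqlite3
  import time

  conn = sqlite3.connect(":memory:")
  deadline = time.monotonic() + 2.0  # two-second budget

  # The handler runs every N SQLite VM instructions; a nonzero
  # return value aborts the currently executing statement.
  conn.set_progress_handler(
      lambda: 1 if time.monotonic() > deadline else 0, 10000)

  try:
      conn.execute("WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL "
                   "SELECT x + 1 FROM c) SELECT count(*) FROM c")
  except sqlite3.OperationalError as e:
      print("query aborted:", e)  # typically 'interrupted'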

I might consider the unmetered option, if the performance gain is worth the 6x cost of other providers.

For web and games, maybe your existing model could work, but then always having a recent backup also becomes more important (and for most sites, the lack of ACID is a deal-breaker).

The nice thing about data-science is that you interact with the workspace yourself, so you know exactly when you want to save a snapshot.


Hmmm, what do you mean by data science?

Like running analysis in a Jupyter notebook?

Indeed I am not sure this case is a good fit...


Yes, that's what I meant. And possibly for ML preprocessing.

Just out of curiosity, who do you imagine your users will be?


The problem with data science is that you usually have relatively big datasets, you care more about throughput than latency, and you work in a secure environment where you can definitely have access to the database credentials.

Streaming the result of a big select over the network is not ideal; moreover, I believe data scientists prefer to work with common technologies. I mean that there are already adapters for SQLite or PG or MySQL, while for RediSQL it won't be as straightforward.

I am thinking developers on the JAM stack would be interested in this sort of API. Or people who want a database without having to think too much about it.


Well, many choose to do their data science over AWS and similar, so I'm not sure there's a big difference. I see your point about throughput and network load, but part of DS is data analysis, where the work is mostly exploratory: finding connections in existing data, and working heavily with aggregated data and previews, rather than just using it as a pipeline for other systems.

I think for most JAMs the network is a bigger hindrance than the query time. So, I hope you know what you're doing.

Anyway, cool project. I'll make sure to check back in a while and see where it went. I'm working on an "adapter" (so-to-speak) that queries SQL, so maybe I'll add yours too when the time is right.


The advantage of using an API like this on the JAM stack would be that very sophisticated applications could be written completely client side. Which is quite interesting IMHO.

What is your project?


My project is an interpreted query language that compiles to SQL (with support for several backends).

It is more capable than ORMs, and provides a layer of abstraction that SQL direly needs but lacks, as well as a shorter syntax, that is in-line with other popular languages.

Here is a very early version of it: https://github.com/erezsh/preql

I've kept working on it, but privately, and I'm trying to make it into a product.

I will probably release it as open-source when it's ready. I still need to figure out the right license, financial model, etc.


This document is golden: serverless means the application reading from and writing to the database directly, in-process.


I find this article valuable. I made a mental note to check if SQLite is enough every time I consider one of the “serverless” options.


Choose your own definitions... but SQLite is not serverless by common parlance.

Author creates two definitions for serverless which don't match the common usage. Serverless is more about DevOps / deploy experience than how the program leverages OS processes internally.

Apparently MS and AWS are ISPs?

Maybe SQLite could be serverless if you defined it as incapable of running as a server on its own?


Serverless used to have a very clear definition: not having or using a server, so SQLite is a perfect example of serverlessness. I was extremely confused when the newfangled definition (called neo-serverless in TFA) showed up, or your definition for that matter. Who ever thought up these confusing meanings for a word that used to be perfectly clear?

You don't need an ISP to have a server. Any computer or program that listens to a network port is a server.


Or is it any program that _responds_ to comms on a network port? The "serve" part of "server" ;)


MS and AWS are indeed ISPs, but not because they have servers.

They have their own IPs and global networking infrastructure.


I don't think we fully appreciate just how pervasive SQLite is in our computing devices. Apple recognized its power and utility and embedded it in macOS and iOS, and Google apparently followed suit in Android. The universality of SQLite is due in no small measure to its stellar quality and the trustworthiness of its author, D. Richard Hipp.


Serverless, like all embedded databases ;-)

In the sense you don't need to provision an additional piece of infrastructure to power your application :-)


Yep SQLite is in fact serverless. We based our open source web based IDE on SQLite and it works fantastically. The best part is that you can even take a SQLite database and use sql.js and run it offline in the browser!


"Microsoft Azure Cosmo DB and Amazon S3 are examples of a neo-serverless databases."

Can we back away a bit from the bandwagoning of misused terminology? Serverless literally means "running your apps on somebody else's server". S3 is not a server you run your apps on, it is SaaS that you manipulate through an API - you don't put your apps on it. If S3 is serverless, then literally every network service of any kind is serverless.


I always preferred XaaS nomenclature myself. "XaaS with autoscaling"... is probably better than "Serverless"


The main problem that I see with using SQLite is exactly that it is 'Classic Serverless'. Because how does one keep an up-to-date backup? Deploying to Heroku, Dokku, AWS Lambda and such means the SQLite file will be lost on a crash or new deploy. Even a VM can crash. Export to S3 on every write? Maybe if changes do not happen often, so only for specific use cases (and actually I think you should just generate static HTML in that case).

I use sqlite for local tests for cases where the live app has a 'neo-serverless' database. It is very, very fast so the tests run almost instantly.


> Because how does one keep an up te date backup?

https://www.sqlite.org/backup.html

There's a ".backup" command:

* https://sqlite.org/cli.html#special_commands_to_sqlite3_dot_...

Alternatively, given that it's ACID, you could just take a snapshot of the file system/volume in question, and do a recovery on restore.

Edit: SQLite also has WAL files, so presumably one could just use tar/rsync to create the backup, and only the last file would be 'corrupted', so you'd lose the last (few) transaction(s):

* https://www.sqlite.org/wal.html
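
For example, Python's standard library has exposed the online backup API since 3.7, so a safe hot backup (equivalent to the CLI's ".backup" dot-command) is only a few lines; the db names here are made up:

  import sqlite3

  src = sqlite3.connect("app.db")  # hypothetical live database
  dst = sqlite3.connect("app-backup.db")
  with dst:
      # Copies the database page by page within a consistent snapshot,
      # even while other connections keep reading and writing the source.
      src.backup(dst)
  dst.close()
  src.close()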


Ding! Correct answer. In addition, backups can run in the background, simultaneous with active use in other threads. When the backup is initiated, it is a transaction, so it maintains knowledge of which portion of the database is new since the backup began and does not include that data. Which is what you want for backup integrity.


>how do you back up a file


It's a valid concern insofar as any standalone database setup you would use at a serverless provider already has backups taken care of (for example, AWS Lambda with AWS RDS has backups built into RDS). If I deploy SQLite instead, I have to take care of backups myself, and that might not be trivial to get right.


It's just a single file. Easier to backup than any other database.

Also, I don't think any cloud provider/database provides an always up-to-date backup other than a standby replica (which isn't exactly a backup either).


If I remember the "ways to corrupt SQLite databases" page discussed a few days ago, backing up SQLite isn't entirely trivial. By default there are up to three files that have to be copied simultaneously, else you risk corruption. The optimum is to run the backup command to create a copy, but that requires realizing this exists.


If you do find the link, could you please share it here? I think I've been hit by it once before.

Also, it's comparatively simpler than other DBMSs like Postgres or MySQL.



Thanks. Useful information.

But I've been copying after taking a shared lock on the SQLite db, and I think that's supported, as mentioned on https://www.sqlite.org/backup.html.

The online backup API is nice but it has to be done in-process or through a dedicated application. While the lock and file copy is very easy to do using whatever shell the OS provides.
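
For illustration, the same pattern from Python (a rough sketch; it leans on the fact that a deferred transaction takes the shared lock on its first read, and assumes rollback-journal mode rather than WAL, where readers no longer block writers):

  import shutil
  import sqlite3

  conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit mode
  conn.execute("BEGIN")  # deferred: no lock taken yet
  conn.execute("SELECT count(*) FROM sqlite_master")  # first read -> shared lock
  try:
      # Writers cannot commit while we hold the shared lock.
      shutil.copy("app.db", "app-backup.db")
  finally:
      conn.execute("COMMIT")
      conn.close()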


I like the definitions of classic and neo serverless. Does anyone know of more articles about serverless in the classic sense?


> Neo-Serverless: The database engine runs in a separate namespace from the application, probably on a separate machine, but the database is provided as a turn-key service by the hosting provider, requires no management or administration by the application owners, and is so easy to use that the developers can think of the database as being serverless even if it really does use a server under the covers.

I’m having a failure of imagination here.

What would I use a database for that I could reasonably assert to my peers and superiors that no maintenance whatsoever is required? Backups and restores count as maintenance. Multi region is now common, if not pervasive.

Are there public datasets that are so common that it would be worth it to provide it as a service? What other service would behave like S3 but look like SQLite?

I get that one might be safe to assume that “serverless” isn’t just pure functions. It could reach out to other services that are not serverless and still not consume (further) resources on a set of machines while not in use.

But a serverless database... I’d have to have something aggressively read-mostly, written to S3 at intervals and read from serverless processes. But is that a new thing, or just reading data from S3?


I integrated my database with the app server, but I still open a socket internally to access it.

So my system is also "serverless" in that meaning.

Acronyms used to be confusing, this is just ridiculous.

http://root.rupy.se


CSV is also serverless...


But usually not multi-user-write with transaction support.


CSV does not traditionally involve a server. RDBMS do.


CSV doesn't have a write-ahead log.


The file system it is stored on does.


I don't think SQLite had WAL when that page was created.


I think bob means transactions.


Transactions, durability and acceptable performance with concurrent access.


Very serverless. The new thing.


I use SQLite as a backend for my website and absolutely love it. Not having to set up a full DBMS just for a small blog is a dream, and backups are as easy as copying a file.


I currently use it to build a read-only REST service for the GeoNames gazetteer. Each instance contains a copy of the database, which means it's really linearly scalable.


My understanding of serverless = easily scalable, managed service.

But somehow the word annoys people. Maybe we should find a better word?


> My understanding of serverless = easily scalable, managed service.

There is already a term for that concept: managed services. There is no such thing as a managed service that's designed not to be scalable. Some implementations may be better at scaling than others, but that's it.

The serverless buzzword is pure marketing.


On the one hand, the term serverless is really frustrating to me.

On the other hand, there's a certain amount of amusement I get from seeing "the cloud" become a marketing buzzword in the mid-late '00's, and now seeing the same thing happen with "serverless."

When you look at the implementation of the two "technologies", they're about 95% the same. Yet somehow they're pitched as these big revolutions.

In another decade when terminals or p2p become popular again we'll be hearing about some new buzzword like "Terran" computing, or "social" architecture or something.


For me, serverless is not just "easy" scaling or a managed service, but completely automatic and elastic scaling with metered billing. If you can go from 1 to millions of requests without touching your infrastructure or dropping requests, then that's serverless. If you have to provision larger instances or even wait 10 minutes for your autoscaling policy to spin up new VMs so you can serve those requests, then it's not.

Some services labelled serverless really do reach very close to this ideal (S3, for instance), while others fall short in various ways. "Serverless" Aurora, for example, can't scale writes beyond a single instance, so while it can take you quite far (a 96 core db instance can handle a lot of writes), past a certain point it's no longer really serverless anymore since you'll have to figure out some kind of sharding strategy to keep scaling writes. With S3 or DynamoDB, this doesn't happen. While even those services do have some sanity check limitations, they can scale seamlessly up to the point where you start to approach the scale of AWS itself.


I agree, yours is definitely a better description.


I always explain serverless as "Instead of one server you have countless servers, but they are managed by someone else".

Yes, a better word would be great. I think that ship has sailed though...


'Instanceless'. Because the difference is that you outsource and forget any questions about the instance(s) of the service (and whatever supports it, like the os process, the kernel, the vm, the real machine, the datacenter).


My understanding was that cloud = easily scalable, managed service.


That was mine too. It’s possible that this previously applied to primitives (cpu, memory, network, storage) and now refers to applications (keyvalue store, SQL database, message queues, etc.)


Serverless means no server, yet everything uses servers. Yep, we should find a better expression.


Peer-to-peer networking in its most pure form does not need a dedicated/central server. Each machine hosts a client/server instance locally.


I never found things like DBaaS annoying at all.


"There's no server process" vs "there's no server on your itemised bill".


I don’t get it; then any db can be classic serverless if it runs on the same computer as the app?


>any db can be classic serverless if they run in the _same computer_ as the app?

Not just the "same computer" ... it's the same process id (PID).

Extract of relevant text from that webpage:

>Classic Serverless: The database engine runs within the same process, thread, and address space as the application. There is no message passing or network activity.

In other words, when you compile and link "sqlite3.c" into your own executable, the same PID (process) that handles text input and paints pixels on the screen is the same PID that writes to the SQLite database file. It's all the same process. That's what they mean by "classic serverless".

In contrast, if you make a Go executable that writes to MySQL/PostgreSQL db and make them both run on the same physical computer, that's not "classic serverless". It's because when you enter "ps -aux" to list all running processes, you see separate PIDs for the Go executable and the MySQL db engine.

Other jargon used might be "in-process" vs "out-of-process" or "embedded" vs "external". SQLite is sometimes characterized as "in-process embedded database" but MySQL is an "out-of-process" db.
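
You can see the "same PID, same thread" point for yourself with a little sketch (Python bindings here): register an application-defined SQL function and watch the engine call back into the very process and thread that issued the query:

  import os
  import sqlite3
  import threading

  def whoami():
      return f"pid={os.getpid()} tid={threading.get_ident()}"

  conn = sqlite3.connect(":memory:")
  conn.create_function("whoami", 0, whoami)

  print("application:", whoami())
  print("db engine:  ", conn.execute("SELECT whoami()").fetchone()[0])
  # Both lines print the same pid and tid: the "engine" is just
  # library code running on the caller's own stack.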


No, there’s still some form of IPC going on then. SQLite is an embedded database - everything happens in the memory space of the same application. Of course you can still throw a web front-end in front of your application but there’s still no extra hop to the database, per se.


Yes, technically, although the distinction they make is that it's in the same process.

It's a bad definition. I would stick with the terms "embedded" or "in-process" which have been around for decades and are well-understood.


Isn’t it easier to say “embedded” than “serverless”?


Not in this thread apparently. I guarantee nobody here has ever used the term "serverless" over "embedded" or "in-process" in their entire career, but apparently the purity and nostalgia of SQLite overrides it.


This article was written in 2007, so yes, people have been using the word "serverless" before managed service providers decided to take it up as a buzzword.


Nobody used it before. Even the about page says: "SQLite is an embedded SQL database engine."

"serverless" is a marketing term, then and now. Not sure why there's such a big defense of it. If you have to argue this much over the provenance of a term that clearly isn't used the same way today then it's a good sign that it's not very useful.


> Nobody used it before.

The archive.org links already provided to you prove that you're flat wrong about that. SQLite used the term many years before it was a webshit buzzword. They used it in the intuitive, straightforward English sense; somebody who has no hat is hatless, somebody who has no home is homeless, and a database system that has no server process is serverless. In 2007, when it was written, nobody would have batted an eye at this term; the meaning would have been immediately clear to anybody who had any familiarity with databases.


Just because a single page on the internet used it does not mean the term was used in the industry. It also doesn't matter if it technically makes sense, although it's a stretch (because there is a server process, you just share it).

It's a marketing term, and a poor one at that. That's why SQLite even describes itself as an embedded, in-process database without client/server architecture. Because that's the common jargon.

I'm surprised at the endless defense of a marketing buzzword and the argument over which marketing definition is the "real" one. This is bikeshedding at its finest.


Nobody was saying it was a buzzword in the industry back then (actually...[1]). It was being used as a normal English word, not some misleading marketing drivel like the new use of the term. Somebody who has no shirt is shirtless. An MRE heater that doesn't catch fire is flameless. Somebody who isn't witty is witless. Somebody who doesn't have a clue is clueless.

Are you starting to notice the pattern here? "less" is a suffix that can be applied to nearly any noun to describe the trait of lacking that noun. When this is done, the meaning is clear to native English speakers who've never heard that combination before. When this article was written in 2007, the meaning was clear. SQLite didn't muddy anything, didn't redefine anything.

[1] The term 'serverless' was in fact in use in the decade and a half prior to 2007: https://books.google.com/ngrams/graph?content=serverless&yea... I don't know if the term was being used primarily in a technical context, but I have little doubt the term was being used to describe something that did not have a server. In fact I'm just about certain that spike was the term being used in a tech context. I've found you an example of the term being used in 1995 (DOI 10.1145/224057.224066):

> A serverless network file system distributes storage, cache, and control over cooperating workstations. This approach contrasts with traditional file systems such as Netware, NFS, [...] where a central server machine provides all file system services.

This is not precisely the same way that SQLite has used the term. Rather, they're using the term in an intuitive natural way that fits their context, just as SQLite used it in 2007.

Here is an example from 2007 (DOI: 10.1109/TNET.2006.886289):

> Abstract—We explore exploits possible for cheating in real-time, multiplayer games for both client-server and serverless architectures.

Here we have the term "serverless" clearly being contrasted with "client-server", which seems very similar to the way in which that SQLite document used it.


This is not new or unknown information, why is this on the front page?


Because other people deemed it interesting.


The fact that SQLite is serverless is common knowledge and not interesting unless you're brand new to databases


I think a better name (coined back in the 1990s) is embedded db engine. MySQL, Firebird (InterBase), etc. can also be linked into a binary and run in the same process, without any kind of sockets.


Two definitions of serverless?

I've heard many, but what they're describing isn't one of them.

As far as I can tell SQLite was always called an embedded database, and never a serverless one.


The page is more than 10 years old. And it uses "serverless" in the sense of server-less, aka not having a server. Which is pretty damn sensible terminology, as opposed to the newer "serverless" meaning "runs on some utility server outside of your care or control".


I know, but have you heard anyone calling "embedded databases" "serverless databases" in the last 10 years?

It's like someone found this page and felt very smart about it, because it sticks it to the serverless crowd.


"embedded database" does not imply "serverless database". Other embedded RDBMSes run servers; they just run the server as a separate thread rather than a separate process. SQLite is different in that there is no separate thread of control. SQLite runs in the same thread as the application that calls it. There is no separate thread hanging around to clean up or handle background tasks after an SQLite function call returns.


> SQLite is an example of a classic serverless database engine. With SQLite, there are no other processes, threads, machines, or other mechanisms (apart from host computer OS and filesystem) to help provide database services or implementation. There really is no server.

Well, it's bending the overall consensus defining serverless as a managed and/or stateless service.

I never saw anyone using the term "serverless" to mean "embedded".

Using this definition, anything and everything that does not require a specific server to be served/distributed can be described as serverless.


The SQLite page dates back to 2007 at least: https://web.archive.org/web/20071115173112/https://www.sqlit...

> Using this definition, anything and everything that is not requiring a specific server to be served / distributed can be described as serverless.

The distinction is useful to make when similar systems traditionally rely on a client/server model. Which many DBMS do, to say nothing of RDBMS. That it is server-less is a distinctive feature of SQLite as an RDBMS.


> Well, it's bending the overall consensus defining serverless as a managed / and or stateless service.

That assertion is quite the stretch because there is no consensus on what serverless actually means. The only thing that exists is that the concept of function-as-a-service is being forced as a placeholder for serverless, but some vendors try to manipulate the definition to include their managed services offerings.


There's a section added in 2018 to deal with the apparent confusion.

It'd be a better idea to just delete the page. It may have been written long before the meaning changed, but it's pointless to fight a losing and completely insignificant battle over language.


Hard disagree. The "neo-serverless" version has always been extremely confusing to me. I expect serverless to mean the absence of a server.

In SQLite's particular case, it's subverting the expectation that has persisted since the beginning of time (of databases) that a database must be managed by a server.


Why even write this? It's yet another convoluted use of the term and meaningless for SQLite of all things.

Re: downvotes - what are people disagreeing with? That the term is not convoluted? That it's actually useful? That it helps to have more sub-definitions in an industry known for overloaded terms? I guarantee not a single person here has used "classic serverless" over "in-process" or "embedded" in their entire career.


It describes an important architectural feature of SQLite.

Also, this page was first written in 2007 (or perhaps earlier) [1], long before 'serverless' was applied to things like Amazon Lambda.

[1]: https://web.archive.org/web/20071115173112/https://www.sqlit...


Adding an original timestamp to the page would've been more helpful than the updated section, or just deleting the page entirely. Or just title it as "SQLite is not client/server".

Nobody goes around talking about "classic serverless" instead of "in-process" which has been around for decades.


The article is from 2007, and explicitly calls out that the "neo" usage of the term is different than the original. If you scroll to the fifth line or so, you'll find the update. I didn't have to scroll on my device, though I can imagine there are some smaller devices that may not have that line on the screen when opening the page.


Where does it say 2007? Regardless it's vague and irrelevant material.

Classic serverless has always been known as "in-process". The attempt to distinguish them seems like adding marketing fluff rather than just removing the page entirely.


It doesn't say 2007 anywhere, but the page is clearly at least that old.

https://web.archive.org/web/20071115173112/https://www.sqlit...

I also found material from 2004 with the term.

https://www.tcl.tk/community/tcl2004/Presentations/D.Richard...

> Regardless it's vague and irrelevant material

It's not vague at all, the term makes sense to distinguish "in-process" from client/server models. The page also includes an explanation on the very next line.

> Adding an original timestamp to the page would've been 100x more helpful

Not everyone reading about a database product is a developer. A timestamp would not help anyone unfamiliar with the history of the serverless term. The alternative is to rewrite all instances of "serverless" in the documentation which is a waste of time. Modern "serverless" is a stupid buzzword and this page clearly serves as a protest.


Complete history of the document in question is here: https://www.sqlite.org/docsrc/finfo?name=pages/serverless.in...

It was, indeed, written in 2007, but based on ideas that predate that.


I would love to know who uses "serverless" instead of "in-process". Why add a new term at all?

And if the definition has since been muddied, then all the more reason to avoid using it instead of creating even more niche definitions.


> I would love to know who uses "serverless" instead of "in-process". Why add a new term at all?

"In-process" is meaningless to non-IT people, they don't even know what a process is. The SQLite dev probably created the term for marketing purposes, i.e. the exact same reason cloud providers adopted it 10 years later.

> And if the definition has since been muddied, then all the more reason to avoid using it instead of creating even more niche definitions.

I disagree. The term in relation to SQLite is clearly defined and predates the modern version, there's really no need to go back and change it. You're also disregarding the statement made by the SQLite dev by keeping this page and updating it with a clarification.


SQLite is meaningless to non-IT people.


Those people wouldn't end up on a random documentation page for SQLite then, making this argument superfluous.

> I would love to know who uses "serverless" instead of "in-process". Why add a new term at all?

This discounts which term came first. Back in 2007 it was just fine to talk about this as being serverless, the marketing term gained popularity years later. They even talk about the more recent definition on the page, I really don't get why people in this sub thread get triggered by some random documentation page written over a decade ago. There's no need to further change or delete that page because this discussion is lacking any practical relevance.


I didn't make the argument, the other poster did.

It doesn't matter what definition came first. It was bad back then when embedded and in-process already existed. Now it's even worse.


Non-IT people make IT decisions all the time. If you can sell your IT product to C-levels, they'll force IT to use it. Look at how companies misuse things like blockchain, ML and AI just because of the hype around those words.


Yea, that's not how SQLite has ever been sold. All the other comments on this post have since shown just how useless the "serverless" label is for this.


I have seen that page for around 10 years or so (except the section that was added in 2018). The term has a very clear meaning and consistent use within the docs (and is consistent with my common-sense understanding of the word too). It is not SQLite's fault that AWS and co muddy the waters with their usage of the term.


It's not their fault. It's not anybody's fault. Language evolves.

But if you wouldn't write it this way today, you should just change it, instead of drawing your readers into a pointless fight over semantics.


> another convoluted use of the term and meaningless

Just as the term serverless itself?

I find this, the old version, superior.


> It's yet another

Actually seems more like the original than yet another. And also it makes more sense “serverless” as in “there is no server” and not as in “someone else manages the server for you”.


The same reason supermarkets put "gluten free" on things like butter, water and vegetables. While it may seem stupid and obvious to the informed and educated, there's a point in everyone's life where they don't know anything about a subject and they need a first step. Hopefully, it's a first step toward a deeper understanding and education on the subject at hand.

I'm sure people will google stuff like "Is SQLite serverless?". There are no such thing as stupid questions, you're only stupid if you choose not to learn.


Wouldn't it be better to say "SQLite is an embedded database and not client/server"? Easily understood using very well-known and absolutely clear terms.

Saying "serverless" is trying to sum up that definition into a single word. It wasn't that useful back then and now has been further overloaded. Who is going to google "serverless" today and learn about SQLite's meaning of it?


I mean, I don't really like the term serverless at all anyway, as its meaning isn't really literal; it's more literal and correct in the context of SQLite, IMO. Unfortunately that's just the world we live in. It also annoys me that the word "literally" now has a secondary definition in the dictionary meaning figuratively; you just gotta get used to it, despite my dislike for it.

Meanings and definitions of words change over time as they get used. Another prime example would be the word "Hacker".


It is not meaningless, but perhaps unclearly expressed.

There is a fundamental point to the argument: you can distribute a database without segregating it.

Three methods to building a 1M+ users webapp:

1. A centralized database, eg. PostgreSQL. Typically it has a single-writer beefy machine.

2. A decentralized NewSQL database. Tables are automatically sharded for writes among a set of database-only servers. Typically offered as cloud services: CosmosDB, Cloud Spanner, Aurora.

3. A distributed system segregated per app. Each user has a dedicated sqlite file.

The third option would be simpler to code for, since it won’t have substantial scaling issues.

It is also easier for a lambda-like platform to provide: load the SQLite file corresponding to the authenticated user, plus the lambda code, and execute the code in a sandbox.
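
A sketch of what option 3 might look like in application code (the directory layout and schema are made up):

  import sqlite3
  from pathlib import Path

  DATA_DIR = Path("data")  # hypothetical one-file-per-user directory
  DATA_DIR.mkdir(exist_ok=True)

  def user_db(user_id: str) -> sqlite3.Connection:
      # Each authenticated user gets a dedicated database file, so
      # there is no shared write contention between users.
      conn = sqlite3.connect(str(DATA_DIR / f"{user_id}.db"))
      conn.execute("CREATE TABLE IF NOT EXISTS notes"
                   "(id INTEGER PRIMARY KEY, body TEXT)")
      return conn

  db = user_db("alice")
  db.execute("INSERT INTO notes (body) VALUES ('hello')")
  db.commit()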

Although one negative aspect for the AWS of this world would be lack of lock-in. It is relatively easy to migrate to another cloud service, or to mix cloud services.


The terms "in-process" or "embedded" have been used far longer than this page has existed.

"serverless" is a marketing term, then and now.



