As an ex-game developer and software architecture nerd, I'm very excited about data-oriented design and ECS. It really is a cool pattern, and it's a very common one in shipping games today. It's not just architecture astronaut stuff.
At the same time, the level of hype about ECS today reminds me an awful lot of the amount of hype surrounding OOP in the 90s. Can ECS be a better way to structure your game entities and make your game loop faster? Yes. Will it make your teeth whiter and your partner love you more? No.
(There are ECS frameworks in JavaScript, which gives you absolutely no control over memory layout and thus completely defeats one of the primary purposes of the pattern.)
Like any pattern, it exists to solve concrete problems. It shouldn't be the One True Way To Think About All Programming Henceforth and Forever.
When the author says things like:
> For example, how would you model chat messages in your game? I suppose you’d have to represent that as an entity in your game. How would you represent a constraint that would prevent a health component from being added to your chat message erroneously? In ECS it’s straightforward to create a system which operates on chat messages individually, but how would you query your chat messages so that you can display them in order?
To me, that just means "Don't use ECS for those." I have a really nice coffee mug that is just perfect for holding coffee. It does its job very well. That does not mean I feel any need to use that coffee mug for digging holes in my garden.
Databases and the relational model are great. ECS is great. Object-oriented programming is great. Functional programming is great. But treat them all as tools that should be used for the right job.
"At the same time, the level of hype about ECS today reminds me an awful lot of the amount of hype surrounding OOP in the 90s."
I appreciate that most of the ECS hype has been around specific use cases, though.
OOP was billed not as a specific useful tool but as the answer to all programming, a billing it has not lived up to. It achieved "useful tool", no question, especially as some of the very rough bits were sanded off the original proposals (particularly the idea that objects should only and exactly match real-world objects, an idea which I believe in hindsight was accidentally carried over from a simulation worldview in Simula, where maybe it worked, into the general programming world, where it didn't), but it has certainly failed to be the one true programming methodology.
I haven't seen anyone claiming everything should be rewritten in ECS.
> I appreciate that most of the ECS hype has been around specific use cases, though.
I recently watched the first 30 minutes of Mike Acton's 2014 talk, and while that portion of the talk wasn't about ECS specifically, it very much presented an absolutist perspective.
This is an issue with how most potential solutions or languages are marketed, even to experienced developers. They're all sold as silver bullets, and every experienced programmer should know that there is no such thing.
A lot of it also stems from what I call 'the hello world problem'. Very simple cases are brought up to sell you on the idea, but it's only in complex scenarios that you really learn the worth of something.
The biggest issue is that it's a mischaracterization of the talk.
He spent quite a bit of time at the start of the talk defining the context within which he was talking. He used strong language, but it was never absolutist.
> I appreciate that most of the ECS hype has been around specific use cases, though.
Depends a lot on where you hang out. On amateur gamedev fora, I have seen many many many posts from beginners where they are struggling to cram ECS into their game and feel they need to because it's simply "the way" that one architects a game. Even if their game is written in a language that offers no performance benefits and their simulation benefits nothing from it, they just think they have to.
It's heartbreaking watching someone go, "I know I could just store this piece of data right here in my entity class, but I'm not supposed to because of DoD, so how should I do this?" And then they get back confident answers that involve pages of code and unnecessary systems.
It's exactly like the OOP fad of the 90s, just in the opposite direction. Yes, it turned out you don't need to encapsulate all data in classes. But, also, it is OK to just store data in stuff. You don't have to make every letter of your pop-up dialog a separate component.
"At the same time, the level of hype about ECS today reminds me an awful lot of the amount of hype surrounding OOP in the 90s."
I try not to focus too much on the ECS side.
It is all about understanding the problem you need to solve. If you are making a game, and you know exactly what the game needs to do from a programming perspective... write it.
Yes, I would keep the "data oriented" viewpoint... but do you really need to spend time on some ECS layer? If you know exactly what each character in the game does, write it and solve it. Are you making a game... or trying to create the next Unreal Engine?
> (There are ECS frameworks in JavaScript, which gives you absolutely no control over memory layout and thus completely defeats one of the primary purposes of the pattern.)
While JS does not provide great support for bit-packing complex structs, typed arrays give you quite a bit of control over memory layout for simple numeric types, which is what you usually want for optimal data-oriented code anyway. This is a common technique in fast JS libraries for data visualization.
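A minimal sketch of the structure-of-arrays approach with typed arrays (the names here are made up for illustration, not any particular library's API):

```typescript
// Structure-of-arrays layout: each numeric field gets its own Float32Array,
// so the hot loop walks contiguous memory instead of chasing object references.
const MAX_ENTITIES = 10_000;
const posX = new Float32Array(MAX_ENTITIES);
const posY = new Float32Array(MAX_ENTITIES);
const velX = new Float32Array(MAX_ENTITIES);
const velY = new Float32Array(MAX_ENTITIES);
let count = 0;

function spawn(x: number, y: number, vx: number, vy: number): number {
  const id = count++;
  posX[id] = x; posY[id] = y;
  velX[id] = vx; velY[id] = vy;
  return id;
}

// The "system": one tight pass over flat, contiguous storage.
function integrate(dt: number): void {
  for (let i = 0; i < count; i++) {
    posX[i] += velX[i] * dt;
    posY[i] += velY[i] * dt;
  }
}
```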
This. To me a well-engineered game, from a data perspective, is a set of separate datastores optimized for the job they are doing, referencing one another through handles. I can see the drive to represent this in a general-purpose way, but you nearly always lose performance and/or flexibility. Ironically, the "where do I put chat messages in my ECS" example illustrates that nicely.
Your chapter on ECS in _Game_Programming_Patterns_ fueled my own obsession with the power of such a design, so you could be partly responsible ;).
I’ve spent a lot of time thinking about how to incorporate Data Oriented Design, SoA and ECS into the normal boring business logic at work, and I think it’s interesting to think about keeping data in the same form as it is in the relational database and skipping the impedance mismatch of Object-Relational Mapping.
ECS only makes sense if you have real structs like in .NET or C/C++/Rust/Swift/etc, and tight latency requirements, but I think when someone learns a powerful concept like ECS, they want to invent a reason to use it so they can actually use it in action. I know that is what I’ve experienced at least.
> There are ECS frameworks in JavaScript, which gives you absolutely no control over memory layout and thus completely defeats one of the primary purposes of the pattern.
It doesn't give you direct control over the memory layout, but it's still fairly safe to assume that arrays are going to end up in relatively contiguous memory, which is the relevant part for the performance difference between structs of arrays and arrays of structs.
I don't recall explicitly telling many compiled languages to stick all the items in an array together in memory either - it just happens by default, same as in JS.
JS arrays don't store their elements directly, they store references to their elements. (Unless the elements are primitives and the engine is using NaN-boxing or pointer tagging to store those inline, or something like that, which doesn't apply to ECS components generally.)
Depending on how the GC works, the elements themselves might still wind up next to each other in memory some of the time. But that is definitely not a pattern that you would expect to hold in general: components will be added and removed over the course of the game, with lots of other stuff happening in between.
In C++, the layout of the array elements is actually part of the language semantics. In JS, the language semantics don't even have a way to talk about that layout, and engines in practice don't use the layout you want.
> it's still fairly safe to assume that arrays are going to end up in relatively contiguous memory
A contiguous array of pointers to the actual objects doesn't buy you much. You're still doing an indirection and risking blowing your cache each time you do something with each element. It may be the case that the objects the array elements point to are contiguous in memory, but that's entirely a roll of the dice, and those dice get re-rolled on every garbage collection.
To get this in JS you should do it explicitly with TypedArrays; otherwise you're at the mercy of what the underlying VM does, and current modern implementations most assuredly don't make sure most kinds of arrays end up relatively contiguous. Notably for this conversation, arrays of objects are unlikely to.
I dedicated years to full-time work on unifying the architecture for general software development.
ECS, OOP, functional programming, and others serve as methods to organize software — the developer interface. However, that role should belong to IDEs — an actual user interface for developers.
Everybody has a different hierarchy for their software, essentially developing their own software engine, which sucks, because doing it right is extremely hard and time-consuming.
What Steve Jobs said a long time ago is still true today: "Paradoxically, we need more sophisticated software to make it easier for the user" (paraphrasing). This is what we need: a truly general software engine with an IDE done right, which will unify the organization of all software and beyond.
For some deeper insights, in the context of the Web, I suggest reading Graydon Hoare's post [1].
What about TypedArrays, though? They could be useful to implement an ECS in JS. I do agree with your point, but ECS could be a useful pattern in JS not just for graphics or games but also for OLAP workloads.
With composition, you say, "This object A has pieces B, C, and D. The way to get to B, C, and D is by going through A."
ECS is the exact opposite: "There are components B, C, and D. They may or may not be associated with entity A. In most cases, you shouldn't care because you should be working with B, C, and D directly and not going through A at all."
I think you're latching onto an overly specific definition of composition. The core of composition vs. inheritance is how you add features to a thing, and whether that mechanism is decoupled so you can bolt that feature onto other things. A feature is a blob of code, as opposed to data, which is a blob of memory.
In ECS, to add a feature you write a new system. It's decoupled because you don't have to do anything to anything else in the program; adding a system (feature/code) affects nothing. To bring it to life on a subset of entities, you add components, which are the data portion. It only affects the join of those entities that have the component. So again, data can be added dynamically, even at runtime, without affecting anything else.
With inheritance, you need to add a subclass and restart the program, losing all the data in the process, because the code is coupled with the data (the definition of a class). This can be annoying for some types of program development.
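A rough TypeScript sketch of that point (component and system names invented for illustration): adding a feature is a new component map plus a new system, and nothing existing has to change.

```typescript
type Entity = number;

interface Position { x: number; y: number }
const positions = new Map<Entity, Position>();   // existing data, untouched below

// New feature: just new data...
interface Burning { secondsLeft: number }
const burning = new Map<Entity, Burning>();

// ...plus a new system over that data. Nothing else in the program changes.
function burnSystem(dt: number): void {
  for (const [entity, b] of burning) {
    b.secondsLeft -= dt;
    if (b.secondsLeft <= 0) burning.delete(entity);
  }
}

// Attaching the component at runtime brings the feature to life for one entity.
function ignite(entity: Entity, seconds: number): void {
  burning.set(entity, { secondsLeft: seconds });
}
```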
> But it is just that SQL is the most terse and standard way to express logic.
As someone who has written a ton of complex SQL... I couldn't disagree more.
Trying to shoehorn things that can trivially and intuitively be expressed in a couple of for-loops with a couple of variables, into SQL expressions making use of joins and aggregate functions and GROUP BY's and (god forbid) correlated subqueries... having to replicate subqueries because their results in one part of the query can't be re-used in another part... teaching people arcane terminology distinctions like between WHERE and HAVING... not having basic functions for calculating percentiles or even a basic median... certain basic kinds of logic operations that simply can't be done... flipping a coin on whether the query engine will use the index and execute in 10 ms, or decide to follow a different execution plan and take 5 minutes to run...
I have never encountered more bugs in business logic than in dealing with SQL. It obfuscates what should be clear. SQL isn't a solution for avoiding bugs, it's what causes so many of them.
The types of issues you're enumerating are really part of a learning curve that every language and environment will have.
> not having basic functions for calculating percentiles or even a basic median
I don't think this is really true anymore; window functions are pretty widespread, and I think every major database will have some percentile functions
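For example, in PostgreSQL a median is a one-liner with the ordered-set aggregate percentile_cont. A hedged sketch, with made-up table and column names, written the way you'd embed it in a TypeScript backend:

```typescript
// Median order value via percentile_cont (PostgreSQL ordered-set aggregate).
// "orders" and "total" are hypothetical names.
const medianOrderTotal = `
  SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY total) AS median_total
  FROM orders
`;
```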
> The types of issues you're enumerating are really part of a learning curve that every language and environment will have.
They're not though. That's why excellent coders often despise having to learn and use SQL, and basically just refuse to. SQL is uniquely terrible -- and I say this as someone with a career spanning from Win32 C++ to every major web technology.
> every major database will have some percentile functions
MySQL still doesn't. MariaDB only added it in 2018, and I've been writing queries for a lot longer than that. (And MySQL only added window functions in 2018 as well.)
And remember that, for various reasons, you're often (usually?) stuck using a database engine in production that's several years old, since upgrading a database engine is not something to be taken lightly.
Nearly all of them, because any language designed in the last 30 years permits user-defined libraries in trivial fashion, even amazingly letting you write those libraries in the same language you'll use them in. And even more incredibly, the libraries themselves can build on other libraries! (There's even often a standard allowing libraries to be shared by different implementations... or rather, a standard that actually standardizes things beyond the language aesthetic.)
Databases follow the COBOL model of extensibility — tis for me and not for thee. It is only to be done by the vendor, and maybe some third party specialists
> They're not though. That's why excellent coders often despise having to learn and use SQL, and basically just refuse to. SQL is uniquely terrible -- and I say this as someone with a career spanning from Win32 C++ to every major web technology.
This is basically a strawman + appeal to authority argument. I've no desire to bicker. "Uniquely terrible" is quite subjective, we'll have to agree to disagree on if it applies to sql.
Excuse me, but I'm certainly not "bickering" and I resent the insinuation. And you don't seem to understand what a strawman argument is.
I'm not going to write an article on this for you, but you can certainly see the sentiment frequently expressed on HN that the rise of NoSQL and of ORM's is a direct result of otherwise capable programmers not wanting to deal with SQL and its conceptual building blocks such as joins. Take that for what you will, but you might suspect there's a reason for it, and that it's not merely the learning curve for "every language".
> you can certainly see the sentiment frequently expressed on HN that the rise of NoSQL and of ORM's is a direct result of otherwise capable programmers not wanting to deal with SQL and its conceptual building blocks such as joins
I think I saw that sentiment a lot 5+ years ago. The last 2-3 years have been a lot more for postgresql and writing SQL directly.
Either way I don't think that "excellent coders" "despise" learning SQL, almost every senior coder that has interacted with a medium-to-large database has understood and valued SQL in my experience.
I feel like your perspective might be more narrow than you think it is.
I've been coding professionally for 20 years, and at every job I've ever had we've made the decision to move away from ORMs, or as far away as possible given the constraints of our system, in preference of writing SQL directly. My previous employer dumped Rails in favor of a Go backend (a language whose userbase almost entirely eschews the idea of ORMs), and my current employer never even had one to start with, choosing to write SQL directly in a TypeScript backend.
I'm not going to pull the stats for you but this very website is FULL of articles talking about the benefits of writing your own SQL and I can't even remember the last time someone tried to pitch me a shiny new ORM.
> teaching people arcane terminology distinctions like between WHERE and HAVING
This is akin to complaining about needing to teach people arcane terminology distinctions between FOR and WHILE. I think it's a crime that many CS degrees I see don't include a course in databases and SQL. There are of course some arcane corners of SQL (I sometimes have to look up the different JSONB operators in postgres), but WHERE and HAVING ain't it.
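A small sketch of the distinction, with made-up table and column names, as it might sit in a TypeScript backend: WHERE filters rows before grouping, HAVING filters the aggregated groups afterwards.

```typescript
const bigSpenders = `
  SELECT customer_id, SUM(total) AS lifetime_total
  FROM orders
  WHERE status = 'paid'        -- row-level filter, applied before grouping
  GROUP BY customer_id
  HAVING SUM(total) > 1000     -- group-level filter, applied after aggregation
`;
```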
> flipping a coin on whether the query engine will use the index and execute in 10 ms, or decide to follow a different execution plan and take 5 minutes to run
Heavens, if you think using an ORM is going to help you here, have you got another think coming.
Your take on this is so foreign to me I was honestly shocked to see it. That being said, you probably aren't the only one, so perhaps my perspective is more narrow than I had assumed as well.
ORMs and NoSQL were a direct result of the desire to have an object-oriented data worldview in a programming landscape dominated by OO languages. They were not introduced because programmers didn’t want to learn SQL (implying incompetence etc).
The common sentiment on HN I'd say is the opposite: ORM's are needed for junior devs who didn't have time to learn SQL properly. After NoSQL hype died out (around 5 years ago), the HN attitude towards both ORM's and using NoSQL instead of relational dbs has been mostly negative for sure.
I take it as the same reason they don’t want to use boring technology like HAProxy, or why they think infrastructure in general is beneath them: because they are easily influenced by breathless grifters on Medium promoting the New Thing.
SQL isn’t going away any time soon. At its heart, it’s relational algebra. If you can learn DS&A, you can learn relational algebra.
FWIW, I love HAProxy but hate SQL (for the exact reasons the OC suggests). It turns the simple into soup and makes bugs more likely. Every single non-trivial usage of it I've seen in the wild has been a ball of spaghetti. I bought the learning curve argument for a long time until I finally realized: no, SQL is a victim of its own success. It's popular because it's popular, not because it's a superior language. That popularity has encouraged a gambling addict level of doubling down that continues to gaslight less-experienced developers into believing it's "the way."
You're right that it isn't going away, but when it does, I'll be the first to dance on its grave.
Oh, I've worked with all the major engines. To which we can add: trying to remember which seemingly basic features are supported by which engines, and which aren't (e.g. percentiles are great in Postgres, but you're SOL in MySQL). That there's virtually nothing "standard" about SQL syntax or functionality at all, except for the simplest of queries and the general basic concepts.
For us there's been two major motivations, and another few lesser.
First and most prominent is that Sybase was bought by SAP, and SAP has pivoted SQLAnywhere from a general RDBMS to focus on mobile applications (i.e. more embedded). Apart from product development changing focus, we've struggled to buy licenses for our customers, so the writing on the wall was clear.
The second is that we've got some big customers who run their own MSSQL servers, and they've been pushing harder and harder for us to use their servers so they can handle administration and access control directly.
A third motivation is that while we've been primarily on-prem, a lot of customers these days don't want that, so we've started offering cloud hosted options. Since we're a Windows shop Azure made sense, and as such the hosted MSSQL also makes sense and is quite convenient.
Fourth is that, while MSSQL is quite anemic in terms of SQL language functionality, it does support transactional schema updates, which makes schema upgrades much less error prone.
Given that MSSQL syntax support is quite limited and we now have developed tooling and library support for multiple databases, we'll probably also add support for PostgreSQL later.
In that direction, it's usually going to be about robustness -- more tooling, fewer edge cases, a wider developer base, built-in cloud support, and so forth. The more popular a database is, the easier everything is.
Occasionally it's going to be some very specific aspect of performance limits, or a feature like spatial indexes that work in a special way. And occasionally it's going to be about cost or better integration with a particular cloud or other specific piece of tooling.
That's how you end up with multi-thousand-line SQL files that contain all the historical business logic and edge cases of a company. It just keeps being piled onto, and its complexity keeps growing, because not everything is expressible in SQL. So you end up with a big monster that someone needs to support, and one that isn't easily testable.
Testing SQL is easy. E.g. the JVM has H2, which works for simple stuff, or you can use testcontainers, or just spin up a container and run your tests against that. You just run your migrations, insert mock data, and run your test.
In fact testability is one of the best parts. You can safely test read-only queries against a prod secondary database to see that it gives reasonable results on real data, and use the repl to explore parts of your query to really get a sense that things are working.
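A minimal sketch of that workflow in a TypeScript backend using node-postgres, assuming a throwaway database reachable via a connection string (e.g. one started in a container just for the test run); the table and query are made up.

```typescript
import { Client } from "pg";
import assert from "node:assert";

async function testBigSpendersQuery(connectionString: string): Promise<void> {
  const db = new Client({ connectionString });
  await db.connect();
  try {
    // "Migration" and mock data for the test.
    await db.query(`CREATE TABLE orders (customer_id int, status text, total numeric)`);
    await db.query(`INSERT INTO orders VALUES (1, 'paid', 600), (1, 'paid', 700), (2, 'paid', 50)`);

    // The query under test.
    const { rows } = await db.query(
      `SELECT customer_id, SUM(total) AS lifetime_total
         FROM orders
        WHERE status = 'paid'
        GROUP BY customer_id
        HAVING SUM(total) > 1000`
    );

    assert.strictEqual(rows.length, 1);
    assert.strictEqual(rows[0].customer_id, 1);
  } finally {
    await db.end();
  }
}
```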
SQL queries, yes. What Bury above is probably talking about is obscure triggers/stored procs, etc. that encode the business logic inside the database, far away from any version control or sane ways to track it.
Tracking triggers etc is pretty straightforward, just put them in a repo and apply them as part of a schema change. Testing them can be tricky, as unit tests aren't as likely to capture the locks/time that can be an issue when the database is under load. Tools like pgreplay can be helpful here.
It's not like tracking events in application logic is somehow just a bucket of unicorn farts and rainbows. At least triggers are deterministic and tracked within transactions. Can't tell you how often folks update the DB with an ORM in application logic but forget to update dependent records in other tables in their one-off branch of logic.
Sometimes it's a hard problem.
That said, I think triggers need very well-maintained dependency charts in the docs/comments to ensure they don't ever go cyclical. Those are bad days.
(But again, events in app logic are in no way shape or form immune to cyclic runaways.)
Re ORMs, I'm strongly biased towards only using them when they
(1) are absolutely necessary or provide some extreme benefit
(2) are able to output the raw SQL to be executed on the db
(3) the SQL to be executed is understood.
I've seen them used as a substitute for understanding SQL/databases and, personally, think that's a pretty bad idea.
You can write code in any language without defining functions or modularizing in any way. You can write bad code in any language.
For some reason, folks assume SQL must be written badly since they have only written it badly or seen it written so.
It is absolutely possible and preferable to write maintainable SQL logic into user defined functions, stored procedures, views, materialized views, CTEs, temporary tables, etc. If you're looking at one huge pile of monolithic, untestable SQL, the problem isn't the SQL.
One doesn't write O(n^3) algorithms in C++ and then blame C++ for being slow. For some reason, folks seem pleased with themselves doing exactly that with SQL, every day and twice on Sunday.
Got subselects in each of the fifteen outer joins with NULLs all over your schema, and now you're upset performance is horrible and inconsistent? PEBKAC.
The difference is that most people don't write their business logic in a single flat function in most programming languages, and there is a good chance that individual pieces would have tests.
Neither do good SQL developers: you can easily structure database logic in multiple stored functions, stored procedures, packages, and multiple files. Unit test frameworks are also available.
Testing SQL is pretty popular in most pro-SQL environments as well. There are many tools for that even if many developers appear to remain willfully ignorant of them.
Most other programming languages are more composable than a SQL query. You can make functions, split logic into different classes, extract common logic, etc. Much harder with big sql queries
Counterpoint: don’t do this. The path to hell is paved with good intentions.
Some co-workers and I inherited a code-base where the authors went down the views and stored procedures route. It was basically impossible to untangle; there was no knowing what relied on a view or a proc, so you couldn’t touch them at all, there were no docs, there were duplicates of everything (and again, no way to know what’s used). A good number of them had all kinds of side effects and weird performance characteristics. They weren’t version controlled.
If your colleagues are not applying basic engineering rules to views and stored procedures, it's not fair to say that this is a problem with views and sprocs.
They can and should most definitely be testable, documentable, trackable and version controlled.
Should one also judge all backend programming by PHP standards circa 1998?
I tend to judge things by the average-worst-case that they enable and how easily things go wrong, and IME stored procs specifically, go badly real quick.
Data structures can be a hard part of programming. That doesn't mean folks don't chose the wrong data structures all the time in any language.
An array instead of a linked list or vice versa. A deque instead of a set or vice versa. A binary tree instead of a B+ tree. You can't just throw column names at a DB schema and hope for the best. There's actual engineering to be done, even if many programmers refuse to admit it when interacting with a relational database engine.
You're blaming the language for the bad coding practices of the developers. You can build maintainable database code, and modern SQL IDEs like Datagrip provide similar tools for SQL like you have for other languages, e.g., refactoring, dependency/usage info, versioning, etc. The downsides you mention are not inherent to SQL.
I think SWEs think of databases only as a kind of generic persistence layer and aren't super interested in a lot of the details (more than one has said to me databases are just an implementation detail) or additional capabilities, which I think is a very limited view. Databases are essentially interpreters, just like Python or Ruby. They come with access to a multiuser persistence (etc) service or engine (super handy!) but offer all the affordances of other interpreters. In fact, plpython on PostgreSQL will just use the Python packages on your system.
Do you have to read about it? Sure, just like all programming languages and platforms. But, interestingly, SQL was designed for non-technical analysts, so (to your point) it's more accessible than those others.
Mediocre coders fixate on algorithms. Great coders concentrate on data structures.
SQL schema creation and maintenance is all about choosing the right data structures for the job at hand.
If you neglect attention to your data structures, no algorithms will save you, and you really only have yourself to blame. It's a skill to be learned and honed like any other.
SQL is the semantic layer on top of relational algebra. Adding another layer on top is like having Google translate from Spanish to Russian before translating back to English. Each level of indirection just opens the door for more nonsense to creep in.
I feel like you've come to the right conclusion, but partly for the wrong reasons. Terseness in and of itself is not useful per se. Code golfing languages are the tersest there is, but we don't write code that way.
Or they just used the wrong word. I'd agree with them, but clarify that SQL is not just terse, but concise.
A join or a group by is going to be much clearer than writing the code the query plan is going to generate, creating temporary hash maps, doing nested loops, etc.
Yep, because unlike general purpose programming languages, you describe WHAT data you want from SQL, not HOW to get that data as in most application languages.
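To make that concrete, here is roughly the hand-written TypeScript equivalent of a simple join plus GROUP BY (shapes and names invented for illustration). The declarative version is a few lines of SQL; the imperative version is you building the hash join and the aggregation yourself.

```typescript
// Equivalent of:
//   SELECT c.name, SUM(o.total)
//   FROM customers c JOIN orders o ON o.customer_id = c.id
//   GROUP BY c.name
interface Customer { id: number; name: string }
interface Order    { customerId: number; total: number }

function totalsByCustomer(customers: Customer[], orders: Order[]): Map<string, number> {
  // Build side of the hash join.
  const nameById = new Map<number, string>();
  for (const c of customers) nameById.set(c.id, c.name);

  // Probe side plus GROUP BY, rolled into one pass.
  const totals = new Map<string, number>();
  for (const o of orders) {
    const name = nameById.get(o.customerId);
    if (name === undefined) continue;              // inner-join semantics
    totals.set(name, (totals.get(name) ?? 0) + o.total);
  }
  return totals;
}
```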
We've been doing this for a while and it has some pretty severe limitations at scale. The biggest issue is that humans are human and you cannot make everyone else on the team use your elegant SQL DSL business engine in exactly the same way you had envisioned. Eventually, it will grow into a bit of a monster.
We are headed back in the other direction, but with some hedge. Code with a very simple & transparent CI/CD experience seems equivalent to some SQL configurable thing in our minds now, but with way more potential upside.
In our latest code-based abstractions, nothing is stopping you from breaking out a SQL connection and running any arbitrary query. In fact, all of the data is still in SQL and it's still authoritative, you just now have access to way more powerful tooling to work with that data. Code+SQL together is the answer.
For non-expert team members, modern C# is turning out to be way more intuitive than SQL the moment you encounter a 3+ table join scenario.
I think you make some decent points, but at the same time I really don’t understand this part:
> For non-expert team members, modern C# is turning out to be way more intuitive than SQL the moment you encounter a 3+ table join scenario.
Partly because this is so inefficient that it’s never going to work at scale, but also because of how you’re moving the complexity into the realm of “magic”.
I know it’s very easy and intuitive to use one of C#'s ORMs, but with that comes a reliance on things like LINQ and a model builder, both of which may not work the way you think they do. That's if your developers think about it at all; having seen so many C# developers use IEnumerable where they really should have been using IQueryable… well, if your developers can’t do relatively basic SQL, then I’m not sure I’d trust them with the abstraction either.
I don’t think you’re really wrong, either, but I think it’s much more a question of building developer tools to help your developers handle good data access than it is about picking a particular tech on the consuming side.
As far as data storage goes, however, I think we’re just beginning to see the move away from the classics, because the classics just weren’t built to support how we use data in 2023.
I can’t begin to tell you the nightmare it is to update old-school SQL “data wells” to be capable of temporality and good BI access, as well as being compliant with the various EU legislative rules, and almost none of that can be done above the database. Well, I guess you could, but you’d probably go insane. And that’s just part of it; the other part is just how much data we transact now. It used to be that banks were basically the only organisations to move massive amounts of tiny bits of data, and now we all do it. Like, a single solar plant moves a gazillion points of data into your various systems a minute, where 30 years ago the entire data for that place might’ve been a couple of KB a year; now it’s MBs a day.
Same, every backend service I write now has the majority of the business logic in SQL and a little pre/post-processing in regular code. A well-designed schema will mean that your queries don't get messy. If some more complex thing starts feeling forced, I add a bit more non-SQL code to make it reasonable.
When teammates look at my code, and logic is all right there instead of scattered around, and there's a schema file backing it all that makes it clear what all the relations are, they have an easy time making tweaks or adding on. Yes it's very testable too.
This also kinda depends on having a multi-service architecture if your system is large. Separate database for each. That's a good thing anyway.
The fear of committing SQL to the codebase is one of the most baffling things in modern backends.
To me the most infuriating thing is the "SQL query scattered around multiple files" pattern, where a backend engineer will decompose a perfectly fine SQL query into 3 or 4 files, with multiple functions, often with very artificial separations (for example: a function just for the "select ..." part, another for the joins).
All that in the name of having small files, small functions, small lines. You take complexity away from the "micro" parts and embed it into the invisible parts of your program.
I hate this, also. This was a "design pattern" (antipattern) at a previous company. It was incredibly difficult to navigate SQL built like this. We were also asked to write "unit tests" for each micro-function. This was essentially a pointless exercise that compared each function call to a string, copy-and-pasted from the function itself, with no regard to the correctness of the overall query. At least it was easy to make the tests pass.
We're using some tool at work that recommends putting SQL into dedicated separate files, one file per query. Or you can use the query builder which is just SQL but with extra steps, limitations, and caveats. It's so annoying.
> And that in itself is the most important factor to avoid bugs.
I'm not sure about that. SQL was the first language I learned and the language that has always been there throughout the decades, and is also where I make the most mistakes.
If you were to write the equivalent of a SQL query in some other language, you'll probably make a lot more mistakes. Especially if you're trying to achieve the same performance. And I mean a read-only query, not even something with multi-writes and locking.
I don't think that is true, as I could use a better relational calculus that exhibits all of the same positive attributes that SQL brings to the table. SQL's traps are not all fundamental, just straight-up poor language design. SQL's biggest trap is where it grossly deviates from Codd's relational algebra model. A relational system that is more "pure" would solve many bug vectors right out of the gate.
What's an example of a SQL query that's misleading but wouldn't be if it stayed more like Codd's model?
The issues I have with SQL are simpler things, like `timestamp with time zone` not actually storing a time zone, yet being the thing you want for tz-independent timestamps.
I'm not sure I have a great example in front of me – the bugs get fixed. I can point out that even `SELECT * FROM table` violates the relational model. I don't expect anyone is encountering bugs from that simple query as it is easy enough to reason about, but when you compound the complexity it becomes harder to remember where SQL is non-relational and to remember to account for that.
I suppose it can be avoided with extreme care, like properly managing memory in C, but a good language will help you out. Much like C, SQL is old, so those traps may be excusable in a legacy language, but it is a shame that we haven't put much effort into building something more modern. Where is the 'Rust' of SQL?
I'm not sure GoogleSQL is, for all intents and purposes, anything more than just a particular SQL implementation. Notably, it still breaks from the relational model in the same way and, as such, is prone to the very same bugs of which we speak.
The closest thing to the 'Rust' of SQL is probably Datalog. Even something like QUEL, which Postgres used for the first decade of its life, would have likely left us in a much better place.
But Oracle won the database wars, so we got left with the junk that came with it – for compatibility initially, and now we can't seem to move past SQL because the typical developer somehow has come to think SQL and relational algebra/calculus are the exact same thing...
SQL isn't the purest, but I've yet to see a real-life example of some other query language that gracefully handles queries that would be difficult in SQL. Looking at these other languages with an open mind, I still don't see them making things easier. If there's something better, then there should be examples out there showing a stark contrast in usability.
The only real alternative I can think of is the map-reduce pattern (see PySpark), but it's for a different use case with some overlap. I have replaced SQL queries with map-reduce once before, and only because the application really didn't make sense to use SQL for.
> I've yet to see a real-life example of some other query language that gracefully handles queries that would be difficult in SQL.
That's exactly it. Nothing has come anywhere close to seeing the same kind of effort as SQL put into it. SQL is the best application of relational calculus (or close approximation, at least) we have – but that doesn't make it good. A concerted effort to build a better application of relational calculus would undoubtedly yield a major improvement for developers, on the order of how something like Rust has improved over C, and as I said before it is unfortunate that we as an industry are not showing interest in a better language like we do with regards to general purpose computing.
SQL gets the job done, but we can do better.
> The only real alternative I can think of is the map-reduce pattern
The alternative is to build a better language, not jump to a completely different and unrelated model. Indeed, there is a place for map-reduce for certain problems, but they are outside of where you would use the relational model.
SpacetimeDB looks great for game dev. I've been hearing chitter chatter in the FaaS world about connecting clients directly to the DB as a means of reducing the number of components and the complexity in simple-ish CRUD apps. Not sure how well that design holds up there; time will tell.
For those seeing this (SpacetimeDB) and immediately conjuring images of nightmares to be, consider the following: if you had an extremely latency-sensitive use case and had the opportunity to host your database and business logic on the same machine, why wouldn't you?
The fundamental difference between an ECS and a struct/object layout is that an ECS is column-oriented (aka columnar), while a struct/object layout is row-oriented.
Everything else about how you might query these layouts is more superficial... you can provide the same API with either layout, the same way you can in relational database systems (both layouts can be queried with SQL, but with different performance characteristics.)
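A small illustration of the two layouts in TypeScript (hypothetical names): the row-oriented version keeps each entity's fields together in one object; the column-oriented version keeps each field in its own array, so a system that only touches one field streams over one contiguous column.

```typescript
// Row-oriented ("struct/object") layout: the fields of one particle sit together.
interface Particle { x: number; y: number; ttl: number }
const particlesAoS: Particle[] = [];

function decayAoS(dt: number): void {
  for (const p of particlesAoS) p.ttl -= dt;     // drags x and y through cache too
}

// Column-oriented (ECS-style) layout: one array per field.
const xs   = new Float32Array(10_000);
const ys   = new Float32Array(10_000);
const ttls = new Float32Array(10_000);

function decaySoA(count: number, dt: number): void {
  for (let i = 0; i < count; i++) ttls[i] -= dt; // touches only the ttl column
}
```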
I started learning programming in the mid-to-late 90s... a teenager learning with Turbo C, Turbo Pascal, and VB6... eventually moving to Visual C++... and then, attempting to jump on the OOP bandwagon with Java, I began to dislike coding.
I was questioning whether this was the career for me but, after a few years, decided to give it a go.
Job interviews, particularly then, were about "OOP this" and "OOP that", and I would purposely be agreeable but was unhappy with the code I was writing. Eventually I came out of my shell and shared my views with other devs.
Roll on to 2014, watching Mike Acton's "Data Oriented Design" talk, and I immediately felt at home, not because I consider myself an 'expert' or in the same league as seasoned game programmers, but because I felt I had the right mindset all along!
In my 20 years, I've primarily been a C# developer to pay the bills, but at home my personal projects are written in C; I've also tried D... and now Odin. I just prefer the control they give.
(Scheme as well, I do like Scheme)
Whenever someone asks me (a developer) what Data Oriented Design is, I try to explain to them it is like building a decent database. You are not thinking about the code (so much) but the representation of data. With databases, you create tables that have relations with other tables, with keys and indexes. You are setting up "lookups" etc.
Once they grasp the idea of building a database (which most developers can easily understand) - it is a case of transferring that energy not to tables in a database, but to data structures in your programming language.
Of course, this idea is easier to understand if you have experience in languages like C. For those coming from Python or Java or C#, etc., it can be a little trickier, but only if they think purely in the OOP mindset.
However, if they struggle to grasp this, then ECS is a great way of understanding it in that OOP-ish way. The penny drops when I discuss Entities and Components. You are not building classes and inheritance; you are building entities and components. It is more flexible to create an Entity and "attach" a Component for it to do something. Then you don't have some kind of Update method in a class; you just have one function which processes all components of its type.
Anyway - it all comes back full circle now reading this article. Great reading! Now I can just refer people to this link instead.
I am thinking that iteration is just traversal and traversal is just execution.
Take iterators in C++ or standard library "algorithms" library in C++. Or iterators in Rust or in Java.
You just want to traverse and collect values (and calls on functions in some interleaving), which is like joining tables as this article says.
I'm thinking that the definition of the traversal (such as a Kafka pipeline, Clojure threading syntax, Clojure data-driven design, a JavaScript lodash pipeline) can be mapped to tuples, and then the computer can optimise arbitrary chains of traversals based on how many potential tuples are available and which traversals are equivalent paths or routes to the same traversal.
In other words, every program is a compiler pipeline or database query pipeline.
Maybe Prolog and Datalog can help here. Optimisation of arbitrary traversals and determination of identical traversals.
Or monads are just ordered traversals and OCaml compilers are just traversals of execution (function application) and relationship following, which are joins.
> In other words, every program is a compiler pipeline or database query pipeline.
> Or monads are just ordered traversals and OCaml compilers are just traversals of execution (function application) and relationship following, which are joins.
> The relational model is everywhere.
This is interesting - could you elaborate a bit more?
Author feedback: the large print and the static elements (screen-thieving banner + “AI assistant” button) made attempting to read this blog as presented a vile experience on my phone. The AI button can’t be dismissed; if you ask it how to dismiss itself, it doesn't know, and it won’t even accept negative feedback due to obscured submit buttons.
I think I'm still a bit on the fence after reading this post on whether building out relational queries on top of ECS will ultimately nuke the cache locality and multithreading performance benefits of ECS in the long run.
The great promise of ECS is that once we all have 128-512 core CPUs, that we will actually be making great use of them in the videogames of the future rather than throwing away the vast majority of performance which is the current world with the vast majority of Unity and Unreal games.
Once you hit the real world and start making queries of everything, everywhere, all at once, maybe it'll be lock, lock, lock on your "database queries"
And also, it seems like this DB really demands that a game engine like Bevy actually get finished!
Has anyone here seen a database where your users are users in the db itself? Not just a user(id, name, email, password) table but actual db users with GRANTed permissions and ACLs etc set appropriately, and open access to the DB for these users.
It seems like it would solve a lot of problems by eliminating the need for data broker apps / endpoints that simply put POST or GET parameters into different SQL queries based on which endpoint you hit. I have no idea if it would scale but I don’t see why it couldn’t.
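For what it's worth, the building blocks for this exist in PostgreSQL: one role per user, GRANTs, and row-level security policies. A hedged sketch with made-up table, column, and role names, written as a setup script you might run from a TypeScript migration:

```typescript
const perUserAccessSetup = `
  -- One database role per application user.
  CREATE ROLE alice LOGIN PASSWORD '...';
  GRANT SELECT, INSERT ON messages TO alice;

  -- Row-level security scopes what that role can see and change.
  ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
  CREATE POLICY own_messages ON messages
    USING (author = current_user)         -- rows the role may read
    WITH CHECK (author = current_user);   -- rows the role may write
`;
```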
This sort of idea came up earlier this month in another post as well:
I really resonated with this article and ended up at the same conclusion myself, even going so far as to replace a homegrown ECS system with a relational database in the browser
> I think that people perceive databases to be slow, not because they are slow, but because they often interact with them in the context of persisting data to disk, from across a network, by passing strings back and forth which must be parsed, compiled and executed. If you’ve ever optimized a program, you know that I just listed pretty much the slowest operations you can do on a computer.
I’m sorry, but the difference between "databases are slow" and "the only way I can interact with databases is slow" is only pedantry. I understand that, as a database developer working hard to squeeze out every bit of performance possible, this is frustrating, but as a developer I couldn’t care less about the difference.
Hi spacetimedb.com. You apparently want people to read about your product, but have a non-GDPR-compliant popup that requires people to uncheck multiple "vendors" if your readers do not want to be tracked.
Edit: it actually does not allow any unchecking. Just "you agree to these marketing and tracking cookies by using this site" Nope.
Baffling. It does not lend confidence in your core product. If you're not respectful of your casual reader's data, how can we expect you to be careful with your user's data?
I recommend zero tracking, or if you must, have it be genuinely opt-in.
Nope, nope and nope. Went to the github page for spacetimedb, it does everything that is terrible.
>Instead of deploying a web or game server that sits in between your clients and your database, your clients connect directly to the database and execute your application logic inside the database itself. You can write all of your permission and authorization logic right inside your module just as you would in a normal server.
Why? Databases should never, ever, ever, be used to perform logic, they are datastores, that is it. Your logic goes elsewhere. Stored procedures are the worst "feature" of any database, you are just asking for a hard time debugging, troubleshooting, and increasing the chance that you will fuck up the most valuable part of your system, your data.
> This means that you can write your entire application in a single language, Rust, and deploy it as a single binary.
It also means you have a single point of failure, no read-replicas or redundancy. Hate everything about this.
Can you keep an open mind? We’ve used stored procedures for years. It has worked wonderfully for creating a single source and producer of truth for business data. Instead of potentially having business logic across multiple repos and deployments, everything exists in one place, with absolute unquestionable authority.
It’s not difficult to debug at all, you might just be unskilled.
> It’s not difficult to debug at all, you might just be unskilled.
I agree with your larger point but this seems too harsh: it’s definitely harder to debug simply because, as with microservices, understanding how the app is functioning now requires you to understand different code in multiple languages and locations, you’re highly likely to hit non-portable behavior across databases for authoring and debugging, and you’re never going to get a debugger with the whole flow in context.
That doesn’t mean there aren’t benefits as well and it could be especially useful as a way to force distinctions about contracts for common operations, but I wouldn’t say it’s right for all or even most projects. The sweet spot is going to vary widely.
The tradeoff seems to be "the ability to deploy working software without reliance on a single central authority".
You may get rid of several smaller bottlenecks this way, introducing an enormous, all-encompassing one.
Or am I wrong?
> Databases should never, ever, ever, be used to perform logic
You're talking about a best practice like it's a fundamental law. It's not, it's just how we've mostly been doing things. A lot of interesting innovations in distributed systems / architecture (serverless, graphql, thin clients, thick clients, ORMs, RSC, WSGI, nodejs) have been made because the designers tried relaxing a constraint or taking a counterintuitive idea to a maximalist place.
In fact, if you look harder, there are a fair number of existence proofs of successful systems built on stored procedures. There's even a "best practice" phrase recommending doing compute as close as possible to the data.
Stored procedures ensure all your clients get the same logic. They're only "the worst feature of any database" if the language you're writing them in is not suitable (which is the case with PL/SQL and co, which were tacked on to RDBs and reek of bad 80s syntax and facilities).
If the language is nice and has well designed access to db facilities like records and such, it can be better than writing your code outside the database, especially coupled with a data-oriented design model/ECS (which can be extremely debuggable and offer great visibility).
>and increasing the chance that you will fuck up the most valuable part of your system, your data.
Since you can write anything outside that you can inside, no. I can send a "delete from/drop <table>" from any client at any moment, or make any mistake in updating. That's what backups are for, and databases make them even easier (not to mention transactions being a very good and neglected part of business logic).
> Why? Databases should never, ever, ever, be used to perform logic, they are datastores, that is it. Your logic goes elsewhere. Stored procedures are the worst "feature" of any database, you are just asking for a hard time debugging, troubleshooting, and increasing the chance that you will fuck up the most valuable part of your system, your data.
You cannot sanely use a database with multiple heterogeneous clients without putting at least some logic about what valid data should be into it. This is gonna include some “business logic” in practically any real-world system.
Otherwise you have to elevate the same functionality to some gatekeeper-daemon that’ll almost certainly perform far worse, lack features, and be an eternal source of dumb bugs, including, I can just about guarantee, data corruption bugs.
This is of course what SpacetimeDB does. The stored procedures fail any transactions that are not authorized to be carried out. e.g. if a player is not high enough level or something.
> Databases should never, ever, ever, be used to perform logic, they are datastores, that is it.
I wouldn't go that far. Relational algebra is performing logic. Constraints and foreign keys are logic, as well.
I'm not going to argue that you should go back in time 15-20 years and start shredding XML strings in stored procedures again. But thinking of the database as one step above a flat file is similarly backward thinking.
More than that, the concept of putting application logic adjacent to the data store is sound. That's exactly what a web API or a microservice is doing. They allow a uniform mechanism of requests and responses to a data store. At a concept level, that's exactly the same thing. The concept of keeping logic at or immediately adjacent to the data layer so that a whole range of disparate applications can request and maintain data from the source is what the design goal of "database side logic" is.
> At a concept level, that's exactly the same thing.
But at a practical level they are very different. If you have a middleware layer as you are describing, it's written in a real programming language with all the adjacent tools (source control, debugging, etc).
I'm not hard-core against stored procedures used lightly but they have a lot of downsides and they simply aren't needed. There's no performance advantage. There are complexity advantages and disadvantages that might be a wash.
> If you have a middleware layer as you are describing, it's written in a real programming language with all the adjacent tools (source control, debugging, etc).
I'm not sure what you're using, but you can absolutely use source control for stored procedures. There's any number of database change management tools available. You should already be using one for your schema. Stored procedure debugging tools also exist for most platforms.
I'm not really interested in the "it's not a real programming language" topic. That's almost universally someone going on an ego trip about what they like. If your point is, "We get to use a single language that the whole team is experienced and familiar with," then sure that's valuable. But that's about the team's capabilities more than anything.
> There's no performance advantage.
No, that's an outrageous claim. I've seen and implemented some processes as stored procedures and in some of those cases it's worked much better simply because we don't have to pull the data pool out of the cloud and across the country to wherever the CPU is, manipulate it, and then push it all back up to the data store.
Stored procedures are not some universal panacea that the 4GL crowd wanted them to be, but it's also not something that's universally worse.
> If you have a middleware layer as you are describing, it's written in a real programming language with all the adjacent tools (source control, debugging, etc).
This is very much the point of SpacetimeDB. We write stored procedures in Rust or C# (or theoretically, any language that compiles to WASM), and we get all the same developer tools we're used to.
> There's no performance advantage.
There very much is a performance advantage to executing WASM in-process compared to communicating over an IPC barrier or, god forbid, a network.
>If you have a middleware layer as you are describing, it's written in a real programming language with all the adjacent tools (source control, debugging, etc).
And nothing stops a DB-internal language used to write stored procedures and such from being a "real language" -- with source control, debugging and everything. RDBs could do a whole lot more, offering an even better environment for those than common RDBs do: Smalltalk or Symbolics level good.
In fact in the case of this project, that language is Rust.
Nothing you mentioned is an inherent problem with databases. In fact, databases arguably have more potential to solve these problems than any currently used system.
Like debugging: if every memory access is a database access, then you have built-in logging for all your memory accesses that is both more performant and more capable than any normal debugger. You can take snapshots of your 'memory' pretty much at any time, and the data layout is clearly defined. You can manually edit your memory any time you want, and serialization is already solved for you.
The more I think about it, the more possibilities I come up with.
Damn, this is actually genius!
I'm writing a quite similar system to this so I can give the differences why this is useful (esp for this usecase).
1: Separating stores was crucial during the 90s and earlier, when people were still writing in memory-unsafe languages (C/C++, cough), since simple stray pointers could cause wild corruptions. Process separation was just a sanity thing. As you'll notice, there are other languages in play here, so random corruptions shouldn't happen (memory exhaustion can still be a thing, though, with their model).
2: Yes, debugging stored procedures/triggers on SQL-Server,etc is a PITA because they're database first centric objects, however the idea here is to make the database fill the job of app-servers of the 90s/00s with "regular" debugging workflows for developers. (Don't confuse implementations with the concept)
3: And MOST importantly, this is a game-focused thing. Gamedevs will often end up replicating most cache/database functionality (badly) to squeeze things into main memory and hit realtime performance targets anyhow, so why not forgo that duplicated crazy work with a solid framework?
4: As a corollary to the above, the benefits of separating storage from applications (to run multiple applications against the dataset as it often happens in enterprise scenarios) isn't really the focus, application to database mappings are intended to be more or less 1:1
How do you manage multiple instances of the application and not introduce split brain?
> Or you could do it at disk level (RAID).
That is not comparable. RAID's redundancy is not the same as the redundancy in a multi node database cluster. You have one service, not multiple services, network card gets fried, your database goes down, you can't promote a standby to master and be on your way. Also RAID is a single disk as far as the OS is concerned, so you could hit I/O limits (especially if you have a single binary) that cause your app to chug, you cannot split your writes and reads across different physical or virtual machines that have different disks.
> you are just asking for a hard time debugging, troubleshooting, and increasing the chance that you will fuck up the most valuable part of your system, your data.
This sounds like a tooling problem. One could imagine a database that doesn't have these issues.
> It also means you have a single point of failure, no read-replicas or redundancy.