I'm honestly surprised it took them so long to reach this conclusion.
> That idea quickly loses its appeal, though, when one considers trying to create and maintain a 2,000-member structure, so the project is unlikely to go this way.
As repulsive as this might sound at first, I've seen structures of hundreds of fields work fine if the hierarchy inside them is well organized and they're not just flat. Still, I have no real knowledge of the complexity of the code and wish the Postgres devs all the luck in the world to get this working smoothly.
> I'm honestly surprised it took them so long to reach this conclusion.
I'm not. You can get a long way with conventional IPC, and OS processes provide a lot of value. For most PostgreSQL instances the TLB flush penalty is at least 3rd or 4th on the list of performance concerns, far below prevailing storage and network bottlenecks.
I share the concerns cited in this LWN story. Reworking this massive code base around multithreading carries a large amount of risk. PostgreSQL developers will have to level up substantially to pull it off.
A PostgreSQL-endorsed "second system" with the (likely impossible, but close enough that it wouldn't matter) goal of 100% client compatibility could be a better approach. Adopting a memory-safe language would make this both tractable and attractive (to both developers and users). The home truth is that any "new process model" effort would actually play out exactly this way, so why not be deliberate about it?
From what I gather, Postgres isn't doing conventional IPC; instead it uses shared memory, which is the same mechanism threads use, but with much higher complexity.
IPC, to me, includes the conventional shared-memory resources (memory segments, locks, semaphores, condition variables, etc.) used by these systems: resources acquired by processes for the purpose of communicating with other processes.
I get it, though. The most general concept of shared memory is not coupled to an OS "process." You made me question whether my concept of the term IPC was valid, however. So what does one do when a question appears? Stop thinking immediately and consult a language model!
Q: Is shared memory considered a form of interprocess communication?
GPT-4: Yes, shared memory is indeed considered a form of interprocess communication (IPC). It's one of the several mechanisms provided by an operating system to allow processes to share and exchange data.
...
Why does citing ChatGPT make me feel so ugly inside?
I always understood IPC, "interprocess communication," in the general sense, as anything and everything that can be used by processes to communicate with each other - of course with the narrowing provision that common use of the term refers to means that are typically used for that purpose, are relatively efficient, and where the processes in question run on the same machine.
In that view, I always saw shared memory as IPC, in that it is a tool commonly used to exchange data between processes, but of course it is not strictly tied to any process in particular. This is similar to files, which if you squint are a form of IPC too, and are also not tied to any specific process.
> Why does citing ChatGPT make me feel so ugly inside?
That's probably because, in cases like this, it's not much different from stating it yourself, but is more noisy.
> Why does citing ChatGPT make me feel so ugly inside?
It's the modern "let me Google that for you." Just like people don't care what the #1 result on Google is, they also don't care what ChatGPT has to say about it. If they did, they'd ask it themselves.
With regard to client compatibility, there are related precedents for this already; the PostgreSQL wire protocol has emerged as a de facto standard. CockroachDB and ClickHouse are two examples that come to mind.
The postmaster would just share the already-shared memory between processes (which also contains the locks). That explicit part of memory would opt in to thread-like sharing and thus get fewer TLB flushes and lower memory usage, while all the rest of the state would still be per-process and safe.
tl;dr: super-share the existing shared memory area with a kernel patch.
All operating systems not supporting it would keep working as is.
Yeah. I think as a straightforward, easily correct transition from 2000 globals, a giant structure isn't an awful idea. It's not like the globals were organized before! You're just making the ambient state (awful as it is) explicit.
We did this with a project I worked on. I came on after the code was mature.
While we didn't have 2000 globals, we did have a non-trivial amount, spread over about 300kLOC of C++.
We started by just stuffing them into a "context" struct, and every function that accessed a global thus needed to take a context instance as a new parameter. This was tedious but easy.
However, the upside was that this highlighted poor architecture. Over time we refactored those bits, and the main context struct shrank significantly.
The result was better and more modular code, and overall well worth the effort in our case, in my opinion.
> I think as a straightforward, easily correct transition from 2000 globals, a giant structure isn't an awful idea.
Agree.
> It's not like the globals were organized before!
Using a struct with 2000 fields loses some encapsulation.
When a global is defined in a ".c" file (and not exported via a ".h" file), it can only be accessed in that one ".c" file, sort of like a "private" field in a class.
Switching to a single struct would mean that all globals can be accessed by all code.
There's probably a way to define things that allows you to regain some encapsulation, though. For example, some spin on the opaque type pattern: https://stackoverflow.com/a/29121847/163832
A plain global can be accessed from other compilation units. Agreed that with no ".h" entry it is much more error prone - e.g., you don't know the type - but the variable's name is still exposed to other objects.
At most, some of them would be determined to be read-only constants that are inlined during constant folding. That covers mostly integral-sized/typed scalar values that fit into registers, and nothing you've taken the address of either - those remain as static data.
I think there might be a terminology mix-up here. In C, a global variable with the `static` keyword is still mutable, so it typically can't be constant-folded/inlined.
The `static` modifier in that context just means that the symbol is not exported, so other ".c" files can't access it.
A static variable in C is mutable in the same sense that a local variable is, but since it's not visible outside the current compilation unit the optimizer is allowed to observe that it's never actually modified or published and constant fold it away.
Check out the generated assembly for this simple program, notice that kBase is folded even though it's not marked const: https://godbolt.org/z/h45vYo5x5
It is also possible for a link-time optimizer to observe that a non-static global variable is never modified and optimize that away too.
But the Postgres mailing list is talking about 2000 global variables being a hurdle to multi-threading. I doubt they just didn't realize that most of them can be optimized into constants.
Yea. Just about none of them could be optimized to constants because, uh, they're not constant. We're not perfect, but we do add const etc. to TU-level statics/globals that are actually read-only. And if they are actually read-only, we don't care about them in the context of threading anyway, since they wouldn't need any different behaviour.
Exactly, if you're now forced to put everything in one place you're forced to acknowledge and understand the complexity of your state, and might have incentives to simplify it.
I believe I can safely say that nobody acknowledges and understands the complexity of all state within that class, and that whatever incentives there may be to simplify it are not enough for that to actually happen.
Right but that would still be true if they were globals instead. Putting all the globals in a class doesn't make any difference to how much state you have.
> Putting all the globals in a class doesn't make any difference to how much state you have.
I didn't make any claims about the _amount_ of state. My claim was that "you're forced to acknowledge and understand the complexity of your state" (i.e., that moving it all together into one place helps in understanding the state) is flat-out wrong.
It's not wrong. Obviously putting it all in one place makes you consider just how much of it you have, rather than having it hidden away all over your code.
Yes, it’s the most pragmatic option, and it’s only “awful” because it makes the actual problem visible. It would likely encourage slowly refactoring code to handle its state in a saner way, until you’re only left with the really gnarly stuff, which shouldn’t be too much anymore and can go into individual thread-local storage.
I think my bigger fear is around security. A process per connection keeps things pretty secure for that connection regardless of what the global variables are doing (somewhat hard to mess that up with no concurrency going on in a process).
Merge all that into one process with many threads and it becomes a nightmare to ensure some random add-on didn't decide to change a global variable mid-processing (which would cause wrong data to be read).
Access checking, yes, but the scope of potential memory corruption does increase unavoidably, given the main thing the pgsql-hackers investigating threads want: a single virtual address space shared while toggling between concurrent work.
Of course, there's a huge amount of shared space already, so a willful corruption can already do virtually anything. But, more is more.
I've never really been limited by CPU when running Postgres (few-TB instances). The bottleneck is always I/O. Do others have a different experience? Plus, there's an elegance and a feeling of being in control when you know a query is associated with a specific process, which you can deal with and monitor just like any other process.
But I'm very much clueless about internals, so this is a question rather than an opinion.
I see Postgres become CPU-bound regularly: lots of hash joins, COPY from or to CSV, index or materialized-view rebuilds. PostGIS eats CPU. tds_fdw tends to spend a lot of time doing charset conversion, more than actually talking to MSSQL over the network.
I was surprised when starting with postgres. Then again, I have smaller databases (A few TB) and the cache hit ratio tends to be about 95%. Combine that with SSDs, and it becomes understandable.
Even so, I am wary of this change. Postgres is very reliable, and I have no problem throwing some extra hardware to it in return. But these people have proven they know what they are doing, so I'll go with their opinion.
It's not just CPU - memory usage is also higher. In particular, idle connections still consume significant memory, and this is why PostgreSQL has much lower connection limits than e.g. MySQL. Pooling can help in some cases, but pooling also breaks some important PostgreSQL features (like prepared statements...) since poolers generally can't preserve session state. Other features (e.g. NOTIFY) are just incompatible with pooling. And pooling cannot help with connections that are idle but inside a transaction.
That said, many of these things are solvable without a full switch to a threaded model (eg. by having pooling built-in and session-state-aware).
> solvable without a full switch to a threaded model (eg. by having pooling built-in and session-state-aware).
Yeeeeesssss, but solving that is solving the hardest part of switching to a threaded model. It requires the team to come to terms with the global state and encapsulate session state in a non-global struct.
> That said, many of these things are solvable without a full switch to a threaded model (eg. by having pooling built-in and session-state-aware).
The thing is that that's a lot easier with threads. Much of the session state lives in process private memory (prepared statements etc), and it can't be statically sized ahead of time. If you move all that state into dynamically allocated shared memory, you've basically paid all the price for threading already, except you can't use any tooling for threads.
I've generally had buffer-cache hit rates in the 99.9% range, which ends up meaning minimal read I/O. (This is on AWS Aurora, where there's no disk cache and so shared_buffers is the primary cache, but an equivalent measure for vanilla Postgres exists.)
In those scenarios, there's very little read I/O. CPU is the primary bottleneck. That's why we run as many as 10 Aurora readers (autoscaled with traffic).
"Throw a ridiculous amount of RAM at it" is the more correct assessment. NVMe reads are still "I/O," and that is slow. And for at least 10 years, buying enough RAM to have all of the interesting parts of an OLTP PostgreSQL database either in shared_buffers or in the OS-level buffer cache has been completely feasible.
It's orders of magnitude faster than SAS/SATA SSDs, and you can throw 10 of them into a 1U server. It's nowhere near "slow," and it's still easy enough to be CPU-bottlenecked before you get I/O-bottlenecked.
But yes, a pair of 1TB-RAM servers will cost you less than half a year's worth of a developer's salary.
An array of modern SSDs can reach bandwidth similar to RAM, albeit still with significantly worse latency. It's not that hard to push the bottleneck elsewhere in a lot of workloads. High-performance file servers, for example, need pretty beefy CPUs to keep up.
With modern SSDs that can push 1M+ IOPS, you can get into a situation where I/O latency starts to become a problem, but in my experience they far outpace what the CPU can do. Even the I/O stack can be optimized further in some of these cases, but often that comes with the trade-off of shifting more work onto the CPU.
Postgres uses lots of CPU and memory if you have many connections, especially clients that come and go frequently. Pooling and bouncers help with that. That experience should come out of the box, not from bolting on tools around it.
Plus a significant and unknown set of new problems, including new bugs.
This reminds me of the time they lifted entire streets in Chicago by 14 feet to address new urban requirements. Chicago, we can safely assume, did not have the option of just starting a brand new city a few miles away.
The interesting question here is whether a system design that works quite well up to a certain scale should be abandoned in order to extend its market reach.
Also, even if a 2k-member structure is obnoxious, consider the alternative - having to think about and manage 2k global variables is probably even worse!
I think this is a situation where a message-passing Actor-based model would do well. Maybe pass variable updates to a single writer process/thread through channels or a queue.
Years ago I wrote an algorithmic trader in Python (and Cython for the hotspots) using Multiprocessing and I was able to get away with a lot using that approach. I had one process receiving websocket updates from the exchange, another process writing them to an order book that used a custom data structure, and multiple other processes reading from that data structure. Ran well enough that trade decisions could be made in a few thousand nanoseconds on an average EC2 instance. Not sure what their latency requirements are, though I imagine they may need to be faster.
Obviously mutexes are the bottleneck for them at this point, and while my idea might be a bit slower than a low-load situation, perhaps it would be faster when you start getting to higher load.
> I'm honestly surprised it took them so long to reach this conclusion.
Oracle also uses a process model on Linux. At some point (I think starting with 12.x), it can now be configured on Linux to use a threaded model, but the default is still a process-per-connection model.
Why does everybody think it's a bad thing in Postgres, but nobody thinks it's a bad thing in Oracle?
With the multi-threaded tcc above it scales about as well as multiprocess. With mainline it doesn't scale well at all.
So far I haven't gotten around to reusing anything across libtcc handles/instances, but would eventually like to share mmap()'d headers across instances, as well as cache include paths, and take invocation arguments through stdin one compilation unit per line.
I don't see the problem. All variables are either set in config or at runtime and then for every new query they are read and used by PostgreSQL (at least this is my understanding).
Regarding the threading issue, I think you can make the connections part multithreaded instead of one process per connection and still use IPC between that and the postmaster. Because of the way PostgreSQL currently works, it seems feasible to move parts one by one into a threaded model, and instead of tens/hundreds of processes you could have just a few processes and a lot of threads.
Honestly, they should prototype it, see what it looks like, and then decide on the way forward.
I don't get it. How is a 2000-member structure any different from having 2000 global variables? How is maintaining the struct possibly harder than maintaining the globals? Refactoring globals to struct members is semantically nearly identical, it may as well just be a mechanical, cosmetic change, while also giving the possibility to move to a threaded architecture.
Because global variables can be confined to individual source files, exclusively visible in that compilation unit. That makes them far easier to reason about than hoisting them into the "global and globally visible" situation you get with one gargantuan struct. Which is why a more invasive refactor might be required.
What if the global variable has a greater scope than a single TU? For simple variables of limited scope this approach would work, but for more complex variables that affect multiple "modules" in the code, it would introduce yet another code-design problem to solve.
Yeah, I was really into that before there was even a cross-compiler/cross-platform syntax for declaring TLS values in C++, but I have since “upgraded” to avoiding TLS altogether where possible. The quality of the implementations varies greatly from compiler to compiler and platform to platform, you run into weird issues with thread-at-exit destructors if they’re not primitive types, they run afoul of any fibers/coroutines/etc. that have since become extremely prevalent, and a few other things.