
I'm honestly surprised it took them so long to reach this conclusion.

> That idea quickly loses its appeal, though, when one considers trying to create and maintain a 2,000-member structure, so the project is unlikely to go this way.

As repulsive as this might sound at first, I've seen structures of hundreds of fields work fine if the hierarchy inside them is well organized and they're not just flat. Still, I have no real knowledge of the complexity of the code and wish the Postgres devs all the luck in the world to get this working smoothly.




> I'm honestly surprised it took them so long to reach this conclusion.

I'm not. You can get a long way with conventional IPC, and OS processes provide a lot of value. For most PostgreSQL instances the TLB flush penalty is at least 3rd or 4th on the list of performance concerns, far below prevailing storage and network bottlenecks.

I share the concerns cited in this LWN story. Reworking this massive code base around multithreading carries a large amount of risk. PostgreSQL developers will have to level up substantially to pull it off.

A PostgreSQL-endorsed "second system" with the (likely impossible, but close enough that it wouldn't matter) goal of 100% client compatibility could be a better approach. Adopting a memory-safe language would make this both tractable and attractive (to both developers and users). The home truth is that any "new process model" effort would actually play out exactly this way, so why not be deliberate about it?


From what I gather, Postgres isn't doing conventional IPC; instead it uses shared memory, which is the same mechanism threads use, but with way higher complexity.
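
For anyone unfamiliar, the shared-memory flavor of IPC looks roughly like this (a minimal POSIX sketch, not how Postgres actually sets up its segments; on older glibc, link with -lrt):

    /* Minimal sketch of shared memory as IPC: any process that opens and
       maps the same name sees the same bytes. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/demo_region", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, 4096) != 0)
            return 1;

        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        int *counter = p;
        *counter += 1;   /* unsynchronized: real code needs locks/semaphores */
        printf("counter = %d\n", *counter);
        return 0;
    }

The "way higher complexity" is everything around that mapping: the locks and semaphores, sizing the area up front, and coordinating which process creates and cleans it up.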


As does Oracle, and others. I'm aware.

IPC, to me, includes the conventional shared memory resources (memory segments, locks, semaphores, condition variable, etc.) used by these systems: resources acquired by processes for the purpose of communication with other processes.

I get it though. The most general concept of shared memory is not coupled to an OS "process." You made me question whether my concept of the term IPC was valid, however. So what does one do when a question appears? Stop thinking immediately and consult a language model!

Q: Is shared memory considered a form of interprocess communication?

GPT-4: Yes, shared memory is indeed considered a form of interprocess communication (IPC). It's one of the several mechanisms provided by an operating system to allow processes to share and exchange data.

...

Why does citing ChatGPT make me feel so ugly inside?


I always understood IPC, "interprocess communication", in a general sense, as anything and everything that can be used by processes to communicate with each other - of course with a narrowing provision that common use of the term refers to means that are typically used for that purpose, are relatively efficient, and that the processes in question run on the same machine.

In that view, I always saw shared memory as IPC, in that it is a tool commonly used to exchange data between processes, but of course it is not strictly tied to any process in particular. This is similar to files, which if you squint are a form of IPC too, and are also not tied to any specific process.

> Why does citing ChatGPT make me feel so ugly inside?

That's probably because, in cases like this, it's not much different from stating it yourself, just noisier.


Without a credible source to reconfirm what ChatGPT said, one can’t really assume what ChatGPT says is correct.


> Why does citing ChatGPT make me feel so ugly inside?

It's the modern "let me Google that for you". Just like people don't care what the #1 result on Google is, they also don't care what ChatGPT has to say about it. If they did, they'd ask it themselves.


Not necessarily. Man 3 shmem if you want a journey back to some bad ideas.


What do you think IPC is?


Would this basically be a new front end? Like the part that handles sockets and input?

Or more of a rewrite of subsystems? Like the query planner or storage engine etc.?


Both, I'd imagine.

With regard to client compatibility there are related precedents for this already; the PostgreSQL wire protocol has emerged as a de facto standard. CockroachDB and ClickHouse are two examples that come to mind.


Would something like the opt-in sharing of pages between processes that Oracle has been trying to get into the kernel be the correct option? https://lwn.net/ml/linux-kernel/cover.1682453344.git.khalid....

Postmaster would just share the already-shared memory between processes (containing also the locks). That explicit part of memory would opt in to thread-like sharing and thus get faster/less TLB switching and lower memory usage, while all the rest of the state would still be per-process and safe.

tl;dr super share the existing shared memory area with kernel patch

All operating systems not supporting it would keep working as is.


Yes, it would mitigate the TLB problem. Interesting that Oracle is also looking to solve this problem, but not by multithreading the Oracle RDBMS.


Yeah. I think as a straightforward, easily correct transition from 2000 globals, a giant structure isn't an awful idea. It's not like the globals were organized before! You're just making the ambient state (awful as it is) explicit.


We did this with a project I worked on. I came on after the code was mature.

While we didn't have 2000 globals, we did have a non-trivial amount, spread over about 300kLOC of C++.

We started by just stuffing them into a "context" struct, and every function that accessed a global thus needed to take a context instance as a new parameter. This was tedious but easy.

However the upside was that this highlighted poor architecture. Over time we refactored those bits and the main context struct shrunk significantly.

The result was better and more modular code, and overall well worth the effort in our case, in my opinion.
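
For anyone who hasn't done this kind of refactor, the mechanical step looks roughly like this (an invented sketch, not our actual code):

    /* Before, state lived in file-scope globals like
       "static int retry_count; static char *last_error;".
       After, the same state lives in an explicit context struct that is
       passed to every function that used to touch the globals. */
    #include <stddef.h>

    typedef struct Context {
        int   retry_count;
        char *last_error;
        /* ...many more fields, ideally grouped into sub-structs... */
    } Context;

    static void reset_session(Context *ctx)
    {
        ctx->retry_count = 0;
        ctx->last_error  = NULL;
    }

Tedious, as I said, but each change is small and easy to review, and the parameter lists tell you exactly which code depends on which state.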


> I think as a straightforward, easily correct transition from 2000 globals, a giant structure isn't an awful idea.

Agree.

> It's not like the globals were organized before!

Using a struct with 2000 fields loses some encapsulation.

When a global is defined in a ".c" file (and not exported via a ".h" file), it can only be accessed in that one ".c" file, sort of like a "private" field in a class.

Switching to a single struct would mean that all globals can be accessed by all code.

There's probably a way to define things that allows you to regain some encapsulation, though. For example, some spin on the opaque type pattern: https://stackoverflow.com/a/29121847/163832
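
For example, a sketch of that opaque-type approach with hypothetical names (not an actual Postgres module):

    /* wal_state.h -- only an opaque declaration is exported. */
    typedef struct WalState WalState;
    WalState *wal_state_create(void);
    int       wal_state_flush_count(const WalState *ws);

    /* wal_state.c -- the definition stays private to this file, so other
       code can hold a WalState pointer inside the big per-session struct
       without being able to reach into its fields. */
    #include <stdlib.h>

    struct WalState {
        int flush_count;
        /* ...other formerly file-private globals... */
    };

    WalState *wal_state_create(void)
    {
        return calloc(1, sizeof(WalState));
    }

    int wal_state_flush_count(const WalState *ws)
    {
        return ws->flush_count;
    }

The big struct then only holds one pointer per module, and each module's .c file keeps its fields to itself.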


No, that is what a static in a .c file is for.

A plain global can be accessed from other compilation units - agreed that with no .h entry it is much more error prone, e.g. you don't know the type, but the variable's name is still exposed to other objects.


Wouldn't those statics also be slated for removal with this change?


At most they'd be determined to be read-only constants that are inlined during constant folding. That mostly covers integral-sized scalar values that fit into registers, and nothing you've taken the address of either - those remain as static data.


I think there might be a terminology mix-up here. In C, a global variable with the `static` keyword is still mutable. So it typically can't be constant-folded/inlined.

The `static` modifier in that context just means that the symbol is not exported, so other ".c" files can't access it.


A static variable in C is mutable in the same sense that a local variable is, but since it's not visible outside the current compilation unit the optimizer is allowed to observe that it's never actually modified or published and constant fold it away.

Check out the generated assembly for this simple program, notice that kBase is folded even though it's not marked const: https://godbolt.org/z/h45vYo5x5
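
Something along these lines (a sketch of the same idea; the linked snippet may differ in detail):

    /* kBase is not const, but it is static, never written, and never has
       its address taken, so the optimizer is free to fold it away. */
    static int kBase = 40;

    int answer(void)
    {
        return kBase + 2;   /* at -O2 this typically compiles to "return 42" */
    }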


It is also possible for a link-time optimizer to observe that a non-static global variable is never modified and optimize that away too.

But the Postgres mailing list is talking about 2000 global variables being a hurdle to multi-threading. I doubt they just didn't realize that most of them can be optimized into constants.


Yea. Just about none of them could be optimized to constants because, uh, they're not constant. We're not perfect, but we do add const etc. to TU-level statics/globals that are actually read only. And if they are actually read only, we don't care about them in the context of threading anyway, since they wouldn't need any different behaviour.


Exactly, if you're now forced to put everything in one place you're forced to acknowledge and understand the complexity of your state, and might have incentives to simplify it.


Here's MySQL's all-session-globals-in-one-place-class: https://github.com/mysql/mysql-server/blob/8.0/sql/sql_class...

I believe I can safely say that nobody acknowledges and understands the complexity of all state within that class, and that whatever incentives there may be to simplify it are not enough for that to actually happen.

(It ends on line 4692)


Right but that would still be true if they were globals instead. Putting all the globals in a class doesn't make any difference to how much state you have.


> Putting all the globals in a class doesn't make any difference to how much state you have.

I didn't make any claims about the _amount_ of state. My claim was that "you're forced to acknowledge and understand the complexity of your state" (i.e., moving it all together in one place helps understanding the state) is flat-out wrong.


It's not wrong. Obviously putting it all in one place makes you consider just how much of it you have, rather than having it hidden away all over your code.


Yes, it’s the most pragmatic and it’s only “awful” because it makes the actual problem visible. And would likely encourage slowly refactoring code to handle its state in a more sane way, until you’re only left with the really gnarly stuff, which shouldn’t be too much anymore and you can put them in individual thread local storages.

It’s an easy transition path.


I think my bigger fear is around security. A process per connection keeps things pretty secure for that connection regardless of what the global variables are doing (somewhat hard to mess that up with no concurrency going on in a process).

Merge all that into one process with many threads and it becomes a nightmare problem to ensure some random addon didn't decide to change a global var mid-processing (which causes wrong data to be read).


All postgres processes run under the same system user and all the access checking happens completely in userspace.


Access checking, yes, but the scope of memory corruption does increase unavoidably, given the main thing the pgsql-hackers investigating threads want: a single virtual memory context shared across concurrent work.

Of course, there's a huge amount of shared space already, so a willful corruption can already do virtually anything. But, more is more.


I've never really been limited by CPU when running postgres (few TB instances). The bottleneck is always IO. Do others have different experience? Plus there's elegance and a feeling of being in control when you know query is associated with specific process which you can deal with and monitor just like any other process.

But I'm very much clueless about internals, so this is a question rather than an opinion.


I see Postgres become CPU bound regularly: lots of hash joins, COPY from or to CSV, index or materialized view rebuilds. PostGIS eats CPU. tds_fdw tends to spend a lot of time doing charset conversion, more than actually talking to MSSQL over the network.

I was surprised when starting with Postgres. Then again, I have smaller databases (a few TB) and the cache hit ratio tends to be about 95%. Combine that with SSDs, and it becomes understandable.

Even so, I am wary of this change. Postgres is very reliable, and I have no problem throwing some extra hardware to it in return. But these people have proven they know what they are doing, so I'll go with their opinion.


I've also definitely seen a lot of CPU bounding on postgres.


It's not just CPU - memory usage is also higher. In particular, idle connections still consume significant memory, and this is why PostgreSQL has much lower connection limits than e.g. MySQL. Pooling can help in some cases, but pooling also breaks some important PostgreSQL features (like prepared statements...) since poolers generally can't preserve session state. Other features (e.g. notify) are just incompatible with pooling. And pooling cannot help with connections that are idle but inside a transaction.

That said, many of these things are solvable without a full switch to a threaded model (eg. by having pooling built-in and session-state-aware).


> solvable without a full switch to a threaded model (eg. by having pooling built-in and session-state-aware).

Yeeeeesssss, but solving that is solving the hardest part of switching to a threaded model. It requires the team to come to terms with the global state and to encapsulate session state in a non-global struct.


> That said, many of these things are solvable without a full switch to a threaded model (eg. by having pooling built-in and session-state-aware).

The thing is that that's a lot easier with threads. Much of the session state lives in process private memory (prepared statements etc), and it can't be statically sized ahead of time. If you move all that state into dynamically allocated shared memory, you've basically paid all the price for threading already, except you can't use any tooling for threads.


I've generally had buffer-cache hit rates in the 99.9% range, which ends up being minimal read I/O. (This is on AWS Aurora, where there's no disk cache and so shared_buffers is the primary cache, but an equivalent measure exists for vanilla Postgres.)

In those scenarios, there's very little read I/O. CPU is the primary bottleneck. That's why we run up as many as 10 Aurora readers (autoscaled with traffic).


>I've never really been limited by CPU when running postgres (few TB instances). The bottleneck is always IO.

Throw a few NVMe drives at it and that might change.


Throwing a ridiculous amount of RAM at it is the more correct assessment. NVMe reads are still an "I/O" and that is slow. And for at least 10 years, buying enough RAM to have all of the interesting parts of an OLTP Postgres database either in shared_buffers or in the OS-level buffer cache has been completely feasible.


> NVMe reads are still an “I/O” and that is slow

It's orders of magnitude faster than SAS/SATA SSDs and you can throw 10 of them into a 1U server. It's nowhere near "slow" and it's still easy enough to be CPU-bottlenecked before you get I/O-bottlenecked.

But yes, a pair of 1 TB RAM servers will cost you less than half a year's worth of developer salary.


An array of modern SSDs can get to bandwidth similar to RAM, albeit still with significantly worse latency. It's not that hard to push the bottleneck elsewhere in a lot of workloads. High-performance fileservers, for example, need pretty beefy CPUs to keep up.


Depends on your queries.

If you push a lot of work into the database (including JSON) and have a lot of buffer memory, CPU can easily be limiting.


With modern SSDs that can push 1M+ IOPS, you can get into a situation where I/O latency starts to become a problem, but in my experience, they far outpace what the CPU can do. Even the I/O stack can be optimized further in some of these cases, but often it comes with the trade-off of shifting more work onto the CPU.


Postgres uses lots of CPU and memory if you have many connections, and especially clients that come and go frequently. Pooling and bouncers help with that, but that experience should come out of the box, not from bolting tools on around it.


> I'm honestly surprised it took them so long to reach this conclusion.

On the contrary, it's been discussed for ages. But it's a huge change, with only modest advantages.

I'm skeptical of the ROI, to be honest. Not that it doesn't have value, but whether it has more value than the effort will cost.


> it's a huge change, with only modest advantages

+significant and unknown set of new problems, including new bugs.

This reminds me of the time they lifted entire streets in Chicago by 14 feet to address new urban requirements. Chicago, we can safely assume, did not have the option of just starting a brand new city a few miles away.

The interesting question here is: should a system design that works quite well up to a certain scale be abandoned in order to extend its market reach?


Yeah, and you will run headlong into other unforeseen real-world issues. You may never reach the performance goals.


Also, even if a 2k-member structure is obnoxious, consider the alternative - having to think about and manage 2k global variables is probably even worse!


Each set of globals is in the module it relates to, not in some central file where everything has to be in one struct.

If anything, it's probably easier to understand.


I think this is a situation where a message-passing Actor-based model would do well. Maybe pass variable updates to a single writer process/thread through channels or a queue.

Years ago I wrote an algorithmic trader in Python (and Cython for the hotspots) using Multiprocessing and I was able to get away with a lot using that approach. I had one process receiving websocket updates from the exchange, another process writing them to an order book that used a custom data structure, and multiple other processes reading from that data structure. Ran well enough that trade decisions could be made in a few thousand nanoseconds on an average EC2 instance. Not sure what their latency requirements are, though I imagine they may need to be faster.

Obviously mutexes are the bottleneck for them at this point, and while my idea might be a bit slower in a low-load situation, perhaps it would be faster once you start getting to higher load.


That would most likely be several times slower than the current model.


I think the Actor model is fine if you start there, but I can't imagine incrementally adopting it in a large, preexisting code base.


> I'm honestly surprised it took them so long to reach this conclusion.

Oracle also uses a process model on Linux. At some point (I think starting with 12.x) it became configurable on Linux to use a threaded model, but the default is still a process-per-connection model.

Why does everybody think it's a bad thing in Postgres, but nobody thinks it's a bad thing in Oracle?


Well, for one, Postgres is open source and widely used, so anyone can pick it up and look at its internals; that's not the case for Oracle DB.


This is how I made my fork of libtcc lock-free.

Mainline has a lock so that all backends can use global variables, but only one instance can do codegen at a time.

It was a giant refactoring. Especially fun was when multiple compilation units used the same static variable name, but it all worked in the end.


Out of curiosity, where is this fork? Sounds very interesting.


https://github.com/rsaxvc/tinycc-multithreaded

This is the multi-threaded compiler: https://github.com/rsaxvc/tcc-swarm

With the multi-threaded tcc above it scales about as well as multiprocess. With mainline it doesn't scale well at all.

So far I haven't gotten around to reusing anything across libtcc handles/instances, but I would eventually like to share mmap()'d headers across instances, as well as cache include paths, and take invocation arguments through stdin, one compilation unit per line.


I don't see the problem. All variables are either set in config or at runtime and then for every new query they are read and used by PostgreSQL (at least this is my understanding).

Regarding the threading issue, I think you can do the connections part multithreaded instead of one process per connection and still use IPC between this and the postmaster. Because of the way PostgreSQL currently works, it seems feasible to move parts one by one into a threaded model, and instead of tens/hundreds of processes you could have just a few processes and a lot of threads.

Honestly, they should prototype it, see what it looks like, and then decide on the way forward.


I don't get it. How is a 2000-member structure any different from having 2000 global variables? How is maintaining the struct possibly harder than maintaining the globals? Refactoring globals to struct members is semantically nearly identical; it may as well just be a mechanical, cosmetic change, while also opening up the possibility of moving to a threaded architecture.


Because global variables can be confined to individual .c files, exclusively visible in that compilation unit. That makes them far easier to reason about than hoisting them into the "global and globally visible" bucket of one gargantuan struct. Which is why a more invasive refactor might be required.


What if the global variable has a greater scope than just a single TU? For simple variables of limited scope this approach would work, but for more complex variables that impact multiple "modules" in the code, it would introduce yet another code design problem to solve.


Just use thread local variables.

I abuse them for ridiculous things.
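
E.g. the one-line version of the migration, in C11 (a sketch with a made-up variable name, nothing Postgres-specific):

    /* Before: one copy of the variable shared by the whole process. */
    static int cursor_count;

    /* After: every thread gets its own copy, so existing single-threaded
       code keeps working, at the cost of the state no longer being
       visible across threads. */
    static _Thread_local int cursor_count_tls;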


Yeah, I was really into that before there was even a cross-compiler/cross-platform syntax for declaring TLS values in C++, but have since "upgraded" to avoiding TLS altogether where possible. The quality of the implementations varies greatly from compiler to compiler and platform to platform, you run into weird issues with thread at-exit handling if they're not primitive types, they run afoul of any fibers/coroutines/etc. that have since become extremely prevalent, and a few other things.


Thread locals are both a blessing and a curse - the problem with them is that you have no lifetime control over such variables.


That is the plan for PostgreSQL.


> if the hierarchy inside them is well organized

is this another way to say "in a 2000 member structure, only 10 have significant voting power"?


This statement is not about people, it is about a C struct.



