Hacker News new | past | comments | ask | show | jobs | submit login

If you think glibc pthreads are not optimal then you could supply a patch and transparently fix literally thousands of applications and tens of millions of users. Otherwise I trust the glibc developers I work with who are extremely smart and focused on performance.



I highly recommend you take a look at https://webkit.org/blog/6161/locking-in-webkit/ then. It’s been known for a long time that pthread mutexes are very expensive and terrible. Unfortunately you can’t fix it AND maintain back compat (and you might lose technical compliance with POSIX). But that doesn’t mean that there aren’t drastically better locking primitives and it’s the epitomy of hubirus to think glibc is a great codebase. It’s just widely used and most people really don’t want to dive into writing their own libc. Literally every high performance application I’ve seen uses their own locking primitives, DNS alternatives etc. The only exception is when the locking isn’t a critical bottleneck.

Examples of the same concept in different contexts: WTF lock Folly locks ABSEIL locks Rust mutexes

I have more interesting projects to work on than glibc maintenance.


The reason WebKit rolls their own locks is because they have a specific needs that they can optimize around, which they do quite well. But a two-bit lock isn’t actually capable of supporting recursive locking, or fairness, or all the other things that people expect from a pthread mutex. You could say that most people don’t need those things and could get away with a more optimized lock, and you’d be right, but glibc supports the general case and does a fairly good job, so it’s a reasonable choice except for high-performance applications which end up rolling their own.


You don't need a larger mutex for fairness. For recursive locking, maybe. But most mutexes aren't recursive.

That's one of the big problems with pthread mutexes: they try to pack too much functionality into a single type. A pthread mutex can be recursive, so every mutex needs space to store a recursion count, even though most mutexes aren't recursive. A pthread mutex can have an arbitrary priority ceiling (pthread_mutexattr_setprioceiling), so every mutex needs space to store the priority ceiling, even though most mutexes don't set that attribute. A pthread mutex can be "robust" (pthread_mutexattr_setrobust), which at least under glibc means that every mutex needs previous/next pointers in it so it can be stored in a per-thread linked list, even though most mutexes are not robust. That's an excessive level of generality, when different kinds of mutexes could have just been different types!

The other big problem with pthread mutexes is just that they are old. I could be wrong, but my impression is that much of the interest in smaller mutexes has come about relatively recently, at least compared to the age of these APIs. It might be possible to shrink pthread_mutex_t to some extent despite the aforementioned issues, but it's impossible to do so on any existing operating system (at least on existing architectures) because changing the size of pthread_mutex_t would break ABI.


Another point about glibc implementation of pthread_mutex_t (and IIRC even the glibc internal lll_t that implements simple mutex as few lines of assembly involving futex(2)) is that on 32/64b platforms the layout of the structure is compatible between 32b and 64b ABIs. The reason for that is that pthread_mutex can have pshared attribute and also it is used in implementation of higher-level posix-IPC primitives (sane implementation of all of which involves just placing the particular synchronization struct into shared memory and not somehow translating that into SysV-IPC).


You'd need extra space to store the fact that the mutex is fair rather than unfair. (I agree with you that the API tries to cram too much into one interface, but that's really on POSIX rather than glibc.)


Look at the benchmarks. The fairness locks perform worse than if you didn’t have any fairness because you’re wasting a huge amount of CPU cycles trying to create that fairness.

Recursive locks are generally acknowledged as a terrible design pattern.

You don’t get to simultaneously claim that glibc has an optimal lock implementation and that high performance applications should implement their own. That implies the glibc implementation isn’t actually optimal because you’re paying a penalty for features that aren’t actually necessary.


Sounds like a problem with POSIX rather than glibc. Nothing about glibc has stopped Webkit from using their own locking implementation alongside, or it could be supplied as a library (libraries after all provide a vast array of Linux features which operate above or alongside libc). You could even work through the POSIX committee to get lighter weight locks approved and then have it added to glibc and then work to modify existing programs to use them. So I don't see how this relates to the supposed inefficiency of glibc's pthreads implementation.


Nothing is stopping glibc from shipping a non-POSIX extension to locks / doing that standardization. Like I said, I have more interesting things to spend my time on. In my eyes POSIX is an ossified standard and dead standard that hasn’t shipped anything interesting or consequential in a very long time.

Pthreads are fine as a default for most applications but then most applications shouldn’t be using threading primitives anyway and we should be using higher levels of abstraction that user better locking primitives under the hood.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: