“POSIX advisory locks are broken by design” (sqlite.org)
150 points by okket on July 24, 2018 | 99 comments


I'm pretty sure this is a well-known problem in general on Linux -- file locking is almost never a good solution. It sucks, since there are definitely times where it would be useful.

In any event, this is probably a low priority for glibc/kernel devs -- the fact that syncing doesn't actually behave as expected is much more important (https://lwn.net/Articles/752063/).


They added a solution for this three or four years ago, by adding "Open File Description Locks", which don't have the misbehaviour the SQLite comment is complaining about.

See https://www.gnu.org/software/libc/manual/html_mono/libc.html...
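
For reference, taking an OFD lock looks roughly like this (a minimal sketch assuming Linux with a glibc new enough to define F_OFD_SETLK; error handling trimmed):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Take an exclusive lock on the whole file; the lock belongs to this
       open file description, not to the process. */
    int lock_whole_file(const char *path)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;

        struct flock fl = {0};
        fl.l_type   = F_WRLCK;    /* exclusive write lock     */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;          /* 0 means "the whole file" */
        /* l_pid must stay 0 for OFD locks */

        if (fcntl(fd, F_OFD_SETLK, &fl) < 0) {
            close(fd);
            return -1;
        }
        return fd;   /* closing some other fd on the same file won't drop it */
    }

Unlike a classic POSIX advisory lock, this one is only released when this particular open file description goes away.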


Yes. And there is code in the SQLite source tree (currently on a branch) that supports OFD locks. The problem we have is that SQLite is so widely deployed and is in so many systems and on so many platforms, that we have to continue to support Posix Advisory Locks (PAL) for the foreseeable future, for compatibility. It will be great if someday we can remove the PAL code. But for now, it has to stay in the tree.


Neat! Is there any reason it’s in a branch? OFD locks fully interoperate with regular POSIX locks, so there shouldn’t be any compatibility issue, except that SQLite would need to fall back on older kernels.


Since OFD locks are a GNUism, putting that code into the mainline would immediately break portability. I quite like my SQLite databases on illumos and ZFS, so if it couldn’t work on illumos because of a Linux-specific feature, that’d be a serious problem.


That's what #ifdef is for. SQLite has a number of those, given it runs even on platforms which are not POSIX at all!


I know what they are for and don’t need instruction on that, thank you very much.

I also know that #ifdefs are a bad practice with many pitfalls, so their use should be kept to a minimum. There is a treatise on the subject written in the early ’90s that I now see is very much relevant, judging by your comment. I wish I could quote it, but since I’m using a smartphone to type this it’s too much hassle to find it.

Regardless, from a software architecture’s point of view, introducing a GNUism or a Linuxism where it isn’t strictly necessary is an obviously bad idea.

How’d you fare with that GCC build on the mailing list from a couple of years back, by the way?


Using #ifdefs to get more out of certain architectures or working around quirks of a platform is definitely not something I'd consider bad practice.

As long as there is a default way to fall back on that works everywhere, there is no harm in platform-specific code guarded by #ifdefs or similar mechanisms provided by the language you use (golang has platform and OS detection!).
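
Something like this hypothetical sketch (not anyone's real code) is the pattern I mean: a feature-test guard around the platform-specific call, with the portable call as the default:

    #include <fcntl.h>

    /* Hypothetical helper: prefer OFD locks where the platform provides
       them, fall back to plain POSIX advisory locks everywhere else. */
    static int set_lock(int fd, struct flock *fl)
    {
    #if defined(__linux__) && defined(F_OFD_SETLK)
        return fcntl(fd, F_OFD_SETLK, fl);  /* per open file description */
    #else
        return fcntl(fd, F_SETLK, fl);      /* portable POSIX behaviour   */
    #endif
    }

The #else branch works everywhere, so the platform-specific path is purely an opt-in improvement.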


They are not a bad practice if their use is kept to a minimum, and if the #ifdef in question is picked out thoughtfully.

Dump the built-in preprocessor macros from two different compilers on the same OS and diff the output; it should become immediately clear what I mean. Now do the same thing on a different OS and you’ll see that the number of possible combinations of defines to pick from explodes, and depending on the situation, the common subset might not even be useful for portability, as in not enough overlap.


Then what is the alternative where I don't lose good integration with OS capabilities while also being portable and not bloating the code?


Where POSIX or XPG6 are insufficient, use core functions from libc, having verified that they exist on all OS targets and that they behave the same, and build the required functionality on top of those.


But then you would not be able to, for example, take advantage of OpenBSD's pledge when available or SELinux to protect your application and the user.

Libc doesn't even remotely offer what some libraries want, and it leaves out everything the OS can offer on top of POSIX and libc.

And what if I want to support Windows, Linux and BSD? Windows doesn't really have a libc and is definitely not even remotely compatible with POSIX.


Yes you wouldn't be able to take advantage of OS specific features, and you'd have to write platform agnostic code on top of what you do have, but that's the price to pay for portability.

Windows does have a libc, but unlike UNIX where it comes with the OS, on Windows it comes with Visual Studio. That's what the "Visual Studio redistributables" are, since Windows has no linker maps and therefore no ABI versioning. Also, Windows has had a POSIX subsystem since the Windows NT days. That's what Windows Services for UNIX ("Interix") runs on.

I'm not saying don't use #ifdefs at all; what I am saying is don't use any OS-specific features if you want your program to be portable, or at least don't use those OS specific features which would immediately preclude usage of your software on other operating systems.

The only exception to this rule is if you are specifically writing software optimized for an operating system, like for example AmigaOS, or illumos/Solaris, or FreeBSD. However, unlike GNU/Linux based operating systems, it is a practical affair to find enough common functionality in illumos/Solaris and the BSDs to write software optimized for all of those operating systems and still be portable between all of them. The same does not hold true for software which has been written using GNU features, because they are completely proprietary to GNU and require porting to the BSDs and illumos. Sometimes it's not even feasible to port those features because they are misfeatures or so badly designed that they are broken out of the box.


>I'm not saying don't use #ifdefs at all; what I am saying is don't use any OS-specific features if you want your program to be portable, or at least don't use those OS specific features which would immediately preclude usage of your software on other operating systems.

But this is exactly what #ifdefs solve.

You both have a portable program and can rely on OS-specific features. You can #ifdef GNU behaviour so that you can use the GNU shit on Linux and rely on more portable behaviour if you don't have GNU. Or you use musl, which doesn't pull in as much crap. You can #ifdef that too.

I think you want contradictory things. On one hand you claim that portable programs are desirable; on the other you say that you shouldn't rely on OS-specific behaviour.

All behaviour is OS specific and outside of very trivial programs I challenge you to find an application that does not rely on #ifdefs, OS specific behaviour or duplicating code while also being portable.

Portability is not what I'd consider the utmost goal of software engineering, that would be solving the problem. Then comes maintainability. Portability is something at the end of a fairly long list.


Portability is what made UNIX the most widespread and successful operating system of all time.

As an engineer, yes, I want to solve practical problems with a computer, but I also don’t want to dictate (within reason) which OS the user must use. For example, if one of the potential users of my software has a really advanced FreeBSD or OpenBSD setup and they can compile, link and package my software, it would enable them to keep the advantages of their setup, and my software would make their advanced setup even more useful, like compound return on investment. Another advantage of this approach is that by not dictating the platform, it makes the user more productive and saves their time. People just want to get a task done and solve a problem, and respecting their time should be one of the programmer’s priorities.


We're not dictating what the user is using for their OS, again as I mentioned, there should always be a pure POSIX fallback. But #ifdefs help to take advantage of what the OS can do if the user decides to do it and we should not take that away from the user either.

Additionally, I feel it should be mentioned that even when POSIX and UNIX were the thing, programs had to include #ifdefs because target platforms wouldn't support X, or didn't support Y in the same way as another platform. POSIX may have been designed to help against that, but people still had to port their software a lot, with lots of #ifdefs and runtime shims.


* Henry Spencer and Geoff Collyer (1992). "#ifdef Considered Harmful, or Portability Experience With C News". USENIX ’92 Proceedings. pp 185-197. https://usenix.org/legacy/publications/library/proceedings/s...


Yes, that’s the document! Thanks so much for locating it, really appreciate it!


Best practices in portability have evolved since the 90s. I understand that portable C from that time was a nightmare, but things have changed a lot.

As long as you have a well-defined portability layer, it doesn't really matter if it uses #ifdef or separate files (actually SQLite can be embedded as a single file, so it matters, but still). The important point of Spencer's paper was that portability shouldn't be an afterthought. In fact it also mentions how to use #ifdef, not just when not to use it.


Ah, darn, I'm a few hours late so you will probably miss it but I wanted to ask for a long time:

Sounds like a lot of stuff like this has accumulated. Your dedication to backwards compatibility (and testing) is always very impressive, but don't you ever get the urge to do a parallel "SQLite Next" effort, as it were? Kind of like the Python 2/3 years.


I read somewhere that the developers of SQLite have a multi-decade support agreement with some big companies. If true, they will probably keep all the accumulated stuff. Any changes would only be additions and never break backwards compatibility.


There was an attempt to re-architect sqlite. Try searching for sqlite4.


Here is a discussion about how the sqlite4 experiment ended, from 8 months ago

https://news.ycombinator.com/item?id=15648280 (157 comments)

And the relevant commit (timestamp 2017-10-27):

https://sqlite.org/src4/artifact/56683d66cbd41c2e

> All development work on SQLite4 has ended. The experiment has concluded.

> Lessons learned from SQLite4 have been folded into SQLite3 which continues to be actively maintained and developed. This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.


You can also use the BSD extension, flock(2), which is supported everywhere except commercial SysV clones like Solaris and HP-UX. flock has the semantics you would expect[1] and isn't Linux-only.

This LWN article has a great explanation of all the behaviors: https://lwn.net/Articles/586904/

[1] Effectively identical to the Linux extension/POSIX proposal except that (1) they don't support ranges, (2) don't contend with POSIX locks (they do on real BSDs but not Linux), and (3) BSD flock won't work across NFS.
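
For comparison, minimal flock(2) usage looks roughly like this (a sketch; the file name is made up and error handling is omitted):

    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.db", O_RDWR);

        flock(fd, LOCK_EX);   /* block until we hold an exclusive lock    */
        /* ... read/modify/write the file ... */
        flock(fd, LOCK_UN);   /* explicit unlock; closing fd also works,
                                 because the lock belongs to the open file
                                 description rather than to the process   */
        close(fd);
        return 0;
    }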


I was under the impression that since[1] 2.6.12, when using BSD flock(), NFS will use POSIX locks that emulate BSD locking behavior and work over NFS. So long as you use consistent NFS implementations, shouldn't this work?

[1] http://nfs.sourceforge.net/#faq_d10


I stand corrected.


Page with only the relevant chapter (rather than 5MB monstrosity): https://www.gnu.org/software/libc/manual/html_node/Open-File...


GNU != POSIX.

Glibc work won’t fix (e.g.) the BSDs.


The BSDs already have flock(), as mentioned by M. Ahern on this very page at https://news.ycombinator.com/item?id=17605317 .


File locking seems good enough to me. I wasn't consciously aware of this shortcoming, but when you think about it, it makes total sense: the syscalls for file locks (at least on Linux and macOS) are related to the file descriptors. So when you close the file, the lock is gone.

Why would you hold two file descriptors open for the same file in the same process anyway, especially when you use locking? For a "multi-process application" (this sounds so antiquated), probably one thread should hold the fd and exclusively write to it.


Actually, it wasn't a problem on Linux under the initial pthreads implementation. The API worked as you would expect for threads; the advisory locks were owned by individual threads. Eventually (probably after the change to NPTL and improved kernel support for threads) it was fixed to conform to the POSIX design whereby the locks are owned by the process.


> the advisory locks were owned by individual threads. Eventually (probably after the change to NPTL and improved kernel support for threads)

Isn't that because, in the time prior to that "improved kernel support", threads were really processes, and it was once they became full-fledged (i.e. fully lightweight) threads that they lost the ability to own the advisory locks? Or am I misremembering?


It was one of the known problems.

* http://jdebp.info./FGA/linux-thread-problems.html


An unrelated comment: Looking through the rest of the file, I love how many comments there are in general! I am able to get a sense of what each function does, and what the individual code blocks within the function are doing.

This is something that I do not see as often as I want, and I'm really glad to see it here!


My (admittedly quick) reading of this is that the issues stem, primarily, if not exclusively, from using multiple file descriptors for the same file, necessitated by using/supporting threading for concurrency.

If that's the case, then perhaps it's not so much that PAL was broken by design but that the original design merely didn't anticipate threading. I'm not sure how the advisory locking scheme compares in terms of timeline with threading support, but Unix file locking in general easily predates even SMP.


> If that's the case, then perhaps it's not so much that PAL was broken by design but that the original design merely didn't anticipate threading.

I don't think it's even that. It's simply because the API describes the contract between the kernel and processes, not the kernel and threads. It's a case of determining who is responsible for what.

In this case the process is responsible for coordinating file locks with the kernel, not individual threads. From the kernel's perspective, 2 requests to the locking API from different threads still come from the same actor: the process. From the perspective of the locking API threads are a process's private business.

A developer failing to understand the abstraction levels in an API is not the fault of the API's design, but it could be caused by documentation.


Being unable to control a lock from multiple threads isn't a great API either. Per file descriptor^H^Hion is closer to what I'd call ideal. It gives you a locking context that can map m:n to threads. Even if it is a bit fussy with things like fseek.


I don't think that's correct. According to Wikipedia, SMP dates to approximately the early 1960s, whereas according to the FreeBSD man pages, flock and fcntl were both introduced in 4.2BSD, released in 1983, two decades later. It's at most fair to say that Unix file locking came around the same time as Unix SMP.


"Unix SMP" is not synonymous with a system threading interface -- processes give parallelism as well and predate the introduction of threading interfaces. The pthread interface came about in the 1990s.

The unix interface is broken with respect to threads in many places besides this. Signals. Forking. Problems abound because the concept was bolted on decades later.


signals, forking, and simple things like chdir.

A huge area that is broken w.r.t. threading is the whole internationalization stack: setlocale and all.

What if I want to write a server such that each request is executed in a potentially different locale (because it services globally distributed users?)


Ah yes, I was thinking about the system call interfaces but if we start looking at libc damn near nothing was reentrant until relatively recently (the late 90s and early 00s count as recent right? ... right?)

Just take a gander at all those _r functions where an extra parameter needed to be added for reentrancy.
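
strtok() is the classic example: the original hides its position in static state, so strtok_r() grows an extra pointer that the caller owns:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[] = "a,b,c";
        char *save = NULL;   /* caller-owned state instead of a hidden static */

        for (char *tok = strtok_r(line, ",", &save);
             tok != NULL;
             tok = strtok_r(NULL, ",", &save))
            printf("%s\n", tok);

        return 0;
    }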


Some of them turned out to be hyper-corrective duds, like readdir_r. No need for that one, since the buffer it needs can be allocated in the DIR object. You'd never want multiple threads concurrently doing readdir_r from the same DIR stream; there is no credible use case for it. (Or, maybe, optimization? Since it can store the dirent-s directly into some space designated by the caller, eliminating a copy operation from a program that wants to retain all those dirent-s.)


Is this because, as I recall, a DIR object is opaque while a FILE is not? Allowing the former to be expanded without breaking the API?

edit: Nevermind, that shouldn't matter should it? The important thing is that they're both by reference.


FILE can be opaque. Some implementations made the (arguable) mistake of implementing e.g. fileno() as a macro that accesses FILE internals, and this means they can't change that layout without breaking libc ABI of existing programs. But you can implement all POSIX stdio FILE APIs with an opaque FILE.


What if I want to write a server such that each request is executed in a potentially different locale?

You can use uselocale() to install a thread-specific locale.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/us...
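
Roughly like this (a sketch; handle_request and its parameter are made up, and it assumes glibc/POSIX.1-2008 with the requested locale installed on the system):

    #define _GNU_SOURCE
    #include <locale.h>
    #include <stdio.h>

    /* Handle one request in its own locale without touching the
       process-wide locale set by setlocale(). */
    void handle_request(const char *locale_name)
    {
        locale_t loc = newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
        if (loc == (locale_t)0)
            return;                     /* locale not available */

        locale_t old = uselocale(loc);  /* affects only this thread */

        printf("%.2f\n", 1234.5);       /* decimal separator now follows
                                           the per-thread locale */

        uselocale(old);                 /* restore the previous locale */
        freelocale(loc);
    }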


Then your thread had better only execute one request at a time and carry through every operation involved in that request. Works fine in PHP; not fine in most performant, and even most non-performant, web contexts.


But that's fine. For example, you pull the thread from a pool and bind it to the request. As part of this, the thread pulls the locale object from the request and attaches to it. It services that request, and then returns to the pool. Next time it is pulled from the pool it gets a request in a different locale.

Other disciplines are possible. E.g. some operation involving multiple locales, all in one thread, can just do the uselocale calls as needed.


That's not fine, because processing a single request often involves waiting on some I/O, and the thread can service (a part of) another request in the meantime. Which is what modern event loop driven servers do - but this doesn't play well with thread locals, or abstractions implemented on top of them, because now your single physical thread effectively runs multiple cooperatively scheduled logical threads/requests.


That's useful to know; so they finally patched this, some ten years ago now.


haimez said this already, but I wanted to phrase it another way:

This only works if you're not executing multiple requests in the same thread (i.e., not using non-blocking IO).


Even if you're using non-blocking or asynchronous IO there's always going to be a point at which you switch to the context of a particular request in order to handle it, and that's the point at which you can also switch locales if you want to handle it in a locale-specific way.


IEEE Std 1003.1-2017 actually specifies chdir() as thread-safe (obviously in the sense that it is atomic but influences all of the process's threads).

It strikes me as particularly weird as there are notionally POSIX-conformant OSes that implement it in a way that is not atomic.


Thread safety is not the issue. The issue is that multiple threads may want multiple different working directories simultaneously, for different work items. But there is one working directory for the process. This problem is partly alleviated by the ...at() family of system calls, of course.
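
For example, instead of a per-process chdir(), each work item can carry its own directory fd and resolve names relative to it (a sketch; the names are made up):

    #include <fcntl.h>
    #include <unistd.h>

    /* Resolve "name" relative to "dir" without touching the process-wide
       current working directory. */
    int open_relative(const char *dir, const char *name)
    {
        int dirfd = open(dir, O_RDONLY);
        if (dirfd < 0)
            return -1;

        int fd = openat(dirfd, name, O_RDONLY);  /* relative to dirfd, not cwd */
        close(dirfd);
        return fd;
    }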


It's comments like this which make me think that the state stack should be on the connection handle rather than on a worker unit... or maybe they're the same thing.


Then you just don't use multithreading at all, rather than concluding it's broken?


> processes give parallelism as well and predate the introduction of threading interfaces

Right, but the reverse isn't true, which is what I was getting at. That is, availability of hardware parallelism was a prerequisite for anyone wanting threading in the first place.


I don't get you. Are you saying that the only reason people would want threading capabilities is to exploit multicore/multiprocessor architectures?

That isn't true. Having multiple threads for GUI applications makes sense for responsiveness even on a single-core machine.


> Are you saying that the only reason people would want threading capabilities is to exploit multicore/multiprocessor architectures?

That's exactly what I was saying, though, specifically, kernel threading support.

> Having multiple threads for GUI applications makes sense for responsiveness even on a single-core machine.

I had never heard this before, but I'm hardly an expert. Is there something you could point to, ideally pre-MP (or pre-MP-popularity), that advocates for kernel thread support being needed for improved GUI responsiveness?


Strictly speaking you're right, it's not really about kernel threads. It's about concurrency.

Proper pre-emptive threads in the kernel aren't the only way to achieve that: Java used to use 'green threads' rather than kernel threads. Its advice on using worker threads to keep your Swing application nice and responsive [0], would still have applied. (I believe this is still the case today for GTK programming in Python, which still uses green threads.)

Most GUI toolkits use C-based languages and compile to native code. Cooperative multithreading isn't a popular model in C-family languages, but I can't see a good technical reason that it wouldn't work (after all, it boils down to something akin to green threads with finer control).

A well-designed Qt app (in C++) will use worker threads to keep the UI thread responsive [1]. It doesn't matter if the worker threads run on a different core, or on the same core through time-division multiplexing. It's not a question of compute-throughput. What matters is that the UI thread doesn't get blocked for long.

[0] https://docs.oracle.com/javase/tutorial/uiswing/concurrency/... [1] https://doc.qt.io/archives/qq/qq27-responsive-guis.html


> Strictly speaking you're right, it's not really about kernel threads.

I'm not particularly interested in being right or wrong, but, rather, improving understanding.

The reason I made the (strict) distinction was that the whole discussion has been about kernel-supported file locking. Anything called "threading" outside of the kernel seems inapplicable, no matter how closely related conceptually or in practice.

Of course, I may be missing something, such as if userspace/green threads both significantly preceded kernel threading support and motivated its implementation (presumably by benefiting from it).


> I'm not particularly interested in being right or wrong, but, rather, improving understanding.

Sure, I hadn't meant to seem combative.

> the whole discussion has been about kernel-supported file locking. Anything called "threading" outside of the kernel seems inapplicable, no matter how closely related conceptually or in practice

Is it necessarily inapplicable? You could write a Python program that uses file-based locking, despite that Python threads don't map to kernel threads, no?

Or in C, you could have 'fake threads' where you effectively bounce control around between a few different execution streams (something akin to coroutines or fibers). We could still reason about which 'thread' holds the file lock, despite that there's only one kernel thread. (Aside: the GNU 'Pth' library does something like this.)

> such as if userspace/green threads both significantly preceded kernel threading support and motivated its implementation

I think interpreters tend to use green threads simply for ease of implementation, more than for anything else. I suppose it helps with cross-platform concerns, but these days, it's quite possible to write multi-platform concurrent C/C++ code. Green threads aren't so popular today, now that multicore is the norm.

Long ago, processes predated (kernel) threads. Before my time, but I believe the general idea then was that if you want the kernel to orchestrate concurrent execution (including executing in parallel on multiple cores if possible) then you'd just have to bite the bullet and go multi-process.

Today we still see some use of process-level concurrency/parallelism, such as in Postgres.


> Is it necessarily inapplicable? You could write a Python program that uses file-based locking, despite that Python threads don't map to kernel threads, no?

Sure, but what would be the point? I'm looking for legitimate motivations, not contrived examples of what's merely possible. Remember, we're talking about single-core, single-process, multi-threaded for responsiveness (e.g. in a GUI) here. That single process wants to use kernel file locking?

> Long ago, processes predated (kernel) threads. Before my time, but I believe the general idea then was that if you want the kernel to orchestrate concurrent execution (including executing in parallel on multiple cores if possible) then you'd just have to bite the bullet and go multi-process.

By the time Unix-based MP systems became common (all single-core at the time, just multi-socket), those OSes tended to have kernel threads, or at least lightweight processes. I can't recall when SunOS 4.1 got it, but I think it was around '91 if not before; Sun was pushing SunOS 5 aka Solaris 2 for their big SMP boxes: https://www.dre.vanderbilt.edu/~schmidt/PDF/beyond_mp.pdf

Before that, there wasn't much biting of any bullets, as I recall, as the challenge was time-sharing the scarce resource that was a CPU, not parallelism. Frequent context-switching, process or even thread, would be the performance-killing worry for something real-time like a GUI on a computer with a single user. Per Moore's Law, today's CPUs have something like 11000x the transistor density of the ones in '91.


> what would be the point

The point of my comment there was that I'm not sure what you meant by 'inapplicable'. It would work, right?

As you say, it seems unlikely you'd ever want to use file locks for intra-process concurrency when you could just use the mutex machinery of your language's standard library. The use of green threads doesn't change things at all.

> the challenge was time-sharing the scarce resource that was a CPU, not parallelism

The question there is how pre-emption works, right? They're non-cooperative threads after all. If I understand correctly, Solaris used not to have a direct mapping from userland threads to kernel threads (I'm not sure how the multiplexing worked), but they later moved to a direct mapping (every thread really is backed by a kernel-managed thread).

(I could be completely wrong about that, I just skim-read the 'Solaris Internals' book once.)


> The point of my comment there was that I'm not sure what you meant by 'inapplicable'.

Inapplicable to the context of the over-arching discussion, which is POSIX advisory file locks.

These are not the "threads" you're looking for.

>>> Are you saying that the only reason people would want threading capabilities is to exploit multicore/multiprocessor architectures?

So, to recap, yes, I'm saying the only reason people would want kernel threading capabilities (the only threading capabilities applicable to the topic at hand) is to exploit multicore/multiprocessor architectures.


> the only threading capabilities applicable to the topic at hand

If we're going to discuss the merits of kernel threads, it doesn't make sense to do so in a vacuum.

> I'm saying the only reason people would want kernel threading capabilities [...] is to exploit multicore/multiprocessor architectures

Didn't we already discuss this? The answer is no.

Preemptive multithreading of native-compiled code is a popular concurrency strategy. I gave the example of Qt. GUI toolkits based on C/C++ could use userland-based cooperative multithreading to keep things responsive, but they don't. They use kernel threads, for reasons which have nothing to do with parallelism.


> Didn't we already discuss this? The answer is no.

We did, but you never made the case for the answer being no, which means the answer is still yes.

> GUI toolkits based on C/C++ could use userland-based cooperative multithreading to keep things responsive, but they don't. They use kernel threads

That's descriptive, not normative. You haven't said why that's the case, or, more importantly, why, in the absence of kernel threads, someone would want them to exist only for GUI responsiveness types of use cases.

Your earlier admission led me to believe they wouldn't:

> Cooperative multithreading isn't a popular model in C-family languages, but I can't see a good technical reason that it wouldn't work (after all, it boils down to something akin to green threads with finer control).


> you never made the case for the answer being no, which means the answer is still yes

The widespread use of preemptive concurrency for reasons aside from parallelism demonstrates my point.

> You haven't said why that's the case

If I had to guess, I'd go with this: it's the easiest option for the C/C++ languages and developers, considering how existing functions tend to be blocking, and that C/C++ programmers are more likely to be familiar with preemptive concurrency than cooperative concurrency.

You certainly wouldn't want to use a model that relies on, say, continuations. Intel's TBB library does this. It's a very good library, but the user must write pretty hairy C++ code, despite the benefits of modern C++. It's no help that continuations are trivial in, say, modern C#.

Fibers/coroutines can be done with external C/C++ libraries, but it's seen as a pretty exotic thing to do.

> why, in the absence of kernel threads, someone would want them to exist only for GUI responsiveness types of use cases

Looks like Windows first got 'proper threads' in Windows 98. I suspect it's not purely for GUI, but for other types of application too, such as servers. Again I suspect the reason they're used in preference over the cooperative alternative, is largely just what people are used to.

Looks like Windows provides fibers for C++, so it's not like they've never heard of them. [0]

> Your earlier admission led me to believe they wouldn't:

Sure, I'm not hating on cooperative multithreading. It has advantages: fine-grained user control, no unexpected interleaving, better switching performance. Bit of a pity it's not often used, I suppose.

I google'd the question, but didn't find a decent exploration of why they're so rarely used. This StackOverflow answer thinks the answer is technical, but I'm not convinced. It seems to discount asynchronous IO. https://stackoverflow.com/a/16766549/

[0] https://docs.microsoft.com/en-us/windows/desktop/procthread/...


> The widespread use of preemptive concurrency for reasons aside from parallelism, demonstrates my point.

If your point is that kernel threads are more attractive than the alternative, once they already exist, then yes, but that continues to miss refuting my assertion.

Using something ("off label"[1]) once it's already there, isn't the same thing as wanting/needing it in the first place.

[1] consistent with my assertion that the "label" for threading specifies its use for multiprocessor support. After all, how did the early X server or Sun's NeWS manage to be so responsive before all that newfangled threading stuff in the 90s?


> how did the early X server or Sun's NeWS manage to be so responsive before all that newfangled threading stuff in the 90s?

Good question. I have no idea.

If my intuition is accurate, systems like the DOOM engine just crunch through each frame's workload as a sequential queue, tending to things like the audio buffer whenever necessary. I don't know if there's any asynchronous code to speak of in there.


> Today we still see some use of process-level concurrency/parallelism, such as in Postgres.

FWIW, as somebody working on postgres, there are plenty of reasons to regret this. The unshared memory and fd spaces make some things more complicated / heavyweight. We have cross-process shared memory allocators, but then the pointers all have to be relative. It's harder to separate connections from a "query execution context", making connections more heavyweight than necessary. Worker threads for async IO - which we'd love to have - would need to be duplicated in each process. Etc.

There are still quite a few benefits, like increased robustness in the face of bugs, reduced contention for memory allocators, etc. But I doubt you'd find anything close to a majority inside the postgres team to use threads if we'd need to make the decision again, unencumbered by existing code.


As the sibling comment implies, I don't mean SMP anywhere in computing, I mean in the Unix world, which didn't even exist in the early 1960s.

I'll grant that my early Unix knowledge is far from comprehensive, primarily skewed toward hardware at certain universities and in the database industry, as well as mostly BSD, with little from the "AT&T side" before SVR3. Given that, I'd be hard-pressed to name a Unix port that supported SMP prior to 1983.

And I didn't mean to equate SMP with threads, only that thread support wouldn't make much sense without hardware concurrency, of which multi-processor hardware was an early practical example in the Unix world.

As such, it was merely a (failed) rhetorical device to illustrate how designers of filesystem locking might not even have a notion such as intra-process concurrency on their mental radar.

Even if you are, pedantically, correct, it doesn't refute my overall suggestion.


That design might be for portability, since pread() and pwrite() ought to be thread-safe, so one global FD per file for your multi-threaded app should be enough, at least on Linux and FreeBSD. It does not, however, help if some other part of your application "accidentally" opens that file too (some lib, for example).
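
i.e. something along these lines, where the offset is passed explicitly instead of living in shared fd state (a sketch; the names are made up):

    #include <sys/types.h>
    #include <unistd.h>

    /* Safe to call from several threads on the same fd: pread() takes the
       offset as an argument, so there is no shared file position to race
       on (unlike lseek() followed by read()). */
    ssize_t read_page(int fd, void *buf, size_t page_size, off_t page_no)
    {
        return pread(fd, buf, page_size, page_no * (off_t)page_size);
    }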


Well, threading is a major problem area, but it is most definitely broken by design in any case, in that it has sort-of "spooky action at a distance" semantics, which is just completely uncomposable. Imagine you call a function from a library that simply scans a directory for files with a specific magic value, say. If any other part of your program happens to hold a POSIX lock on any of those files (it could even be another, unrelated library), that scan will release those locks. That's just braindead.
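
Spelled out in code, the trap looks roughly like this (a contrived sketch; the file name is made up, and in practice the second open happens deep inside a library you don't control):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd1 = open("shared.db", O_RDWR);
        int fd2 = open("shared.db", O_RDONLY);  /* e.g. inside some library */

        struct flock fl = {0};
        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;    /* l_start = l_len = 0: the whole file */

        fcntl(fd1, F_SETLK, &fl);  /* lock acquired through fd1 */

        close(fd2);   /* POSIX: closing *any* descriptor for this file drops
                         *all* of this process's locks on it, so the lock
                         taken through fd1 is silently gone now */

        /* ... code here still believes it holds the lock ... */
        close(fd1);
        return 0;
    }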


I can't claim to speak for the designers, but my intuition (having had my first Unix exposure in the early/mid-80s) is that their assumptions were along the lines that it would be used consistent with the part of the Unix philosophy stating:

> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".

This seems inconsistent with the (now totally common) programming styles that use multiple, unrelated libraries or have a single program do something that requires keeping files with locks open, while going off on a tangent to scan a directory. Back then, I suspect the standard practice would be to fork() if not outright delegate the task to existing tools like find/xargs.

I certainly agree that hidden side effects like closing a duplicate fd causing all the locks to disappear is undesirable, but it might not even have occurred to them that anyone would care. Maybe they thought about it and the alternative would be too memory intensive (remember when memory was actually scarce?). It would be interesting if anyone had pointers to something like actual mailing list archives.

Characterizing it as "just braindead" is uncharitable when it's decades later, without at least being able to point to contemporaneous pressures and constraints.


Nope.

AT&T’s implementation had this really stupid behavior and they snuck it into POSIX because that meant they wouldn’t have to change their existing code. No one else realized how stupid it was until it was too late.

The design is stupid. There is no excuse for it.


Right. This article by Jeremy Allison sheds some light on how this broken design ended up in POSIX.

https://www.samba.org/samba/news/articles/low_point/tale_two...

> The reason is historical and reflects a flaw in the POSIX standards process, in my opinion, one that hopefully won't be repeated in the future. I finally tracked down why this insane behavior was standardized by the POSIX committee by talking to long-time BSD hacker and POSIX standards committee member Kirk McKusick (he of the BSD daemon artwork). As he recalls, AT&T brought the current behavior to the standards committee as a proposal for byte-range locking, as this was how their current code implementation worked. The committee asked other ISVs if this was how locking should be done. The ISVs who cared about byte range locking were the large database vendors such as Oracle, Sybase and Informix (at the time). All of these companies did their own byte range locking within their own applications, none of them depended on or needed the underlying operating system to provide locking services for them. So their unanimous answer was "we don't care". In the absence of any strong negative feedback on a proposal, the committee added it "as-is", and took as the desired behavior the specifics of the first implementation, the brain-dead one from AT&T.


That's the first design-related documentation I've seen presented, so thank you.

However, it's merely an indictment of the POSIX standards process and sheds no light on why the AT&T implementation was the way it was in the first place.

I keep reading what, to me (essentially an outsider with no horse in the race), sound like either hyperboles at worst ("brain-dead", "really stupid", "no excuse") or arguments advocating the MIT Method (of "Worse is Better" fame) at best.

It tends to beg the question, "If it's so horrible, why did anyone bother to implement it that way or use it once it was there?" I'd expect, if it's actually as broken as everyone makes it out to be, that it would be worse than nothing and would get no use.

Apparently, the major db vendors didn't bother, at least at the time, but that could well be because there was no reliable [1] cross-platform option.

So, again, how about some actual, contemporaneous evidence of the original design process, for a fair, contextual critique?

[1] i.e. reliable or good enough, not reliable in the sense of implementation correctness covering all corner cases, as seems to be demanded by certain commenters and the MIT Method in general.


What would falsify your position? Is there any evidence that you would not reject with "but maybe it was good enough"?

If I presented a case where a worker was killed because of this locking mechanism, wouldn't you just say "but maybe it prevented the meltdown of a nuclear reactor, so it was a net positive vs. no locking at all, therefore, it was the right thing to do"?


I'm not sure you actually understand my position, which has nothing to do with which philosophy is "right" or "better", but, rather, that the one you seem to object to so strenuously on moral grounds, both existed and was a valid engineering consideration/strategy at the time.

To falsify it, you would merely need documentation that such a philosophy was not a consideration in this design.

If it's falsified, then there may well be something new to be learned about design and mistakes to be avoided.

Otherwise, it's just another example of "Worse Is Better" and isn't worth the effort.


A hacky implementation can be simultaneously good enough for some users and also completely unacceptable for a standard.

It's not worse than nothing, sure. If you use it exactly as expected, and only have one logical user of a file in each process, it does an acceptable job. But in the context of what it's supposed to do, be a generic locking mechanism, it's horrifically broken.

So I'll summarize this way: It's completely okay that they wrote this code, as a "version 0.2". But there's no excuse for presenting it as finished code, with reasonable semantics. It's not hyperbole to say that.


> also completely unacceptable for a standard.

[...]

> It's completely okay that they wrote this code, as a "version 0.2". But there's no excuse for presenting it as finished code, with reasonable semantics. It's not hyperbole to say that.

Am I missing something, or are you just saying "Worse is Worse"?

The "excuse" is that this way (arguably, perhaps) governed the history of Unix development, well before there was even a standard.

What I've been attempting to get people (especially the ones with the most strenuous invectives toward the design) to do is to perform the thought experiment of placing themselves in the "shoes" of the designers, both by trying to imagine being in that past and, much harder, actually believing in that philosophy.

I believe that will go much farther in increasing understanding and, to borrow from the HN guidelines, gratifying intellectual curiosity, than arguing against strawmen (or just non-existent proponents) of design goodness.


> Am I missing something, or are you just saying "Worse is Worse"?

No, I'm not. It's fine that they made that code, and were using it.

But different situations have different requirements.

I'm not objecting to the design work at all.

I'm objecting to the idea of calling it "ready to standardize". This is a presentation problem, not a development problem. It was half-baked, and shouldn't have been set in stone until it was fully baked.


> I'm objecting to the idea of calling it "ready to standardize".

Oh, indeed, that's a distinctly different topic than has been focused on in the rest of the thread. The upthread indictment of the process is quite on point.

It's also a wide topic with a wide variety of involved parties, worthy of its own thread, off of a blog post.


> both existed and was a valid engineering consideration/strategy at the time.

Please define what you mean by "valid".

> To falsify it, you would merely need documentation that such a philosophy was not a consideration in this design.

Well, then I probably don't care? I care whether it was a bad idea, not whether it was a bad idea coming from a bad philosophy or a bad idea standing on its own.


> Well, then I probably don't care? I care whether it was a bad idea, not whether it was a bad idea coming from a bad philosophy or a bad idea standing on its own.

If your goal is to prevent such a "bad idea" in the future, then not caring could easily work against you.

You'd end up having to expend energy challenging each bad idea on its own, possibly failing even to make inroads because there's a bad philosophy that makes it seem like a good idea (maybe even obviously so, rendering your challenge easily dismissed and a wasted effort).

Instead, if you focus your energy on challenging the bad philosophy, you both get to the heart of the matter right away, and you cover all the new bad ideas it enables all at once (and even ahead of time). You also won't have to start from scratch, as the philosophy, unlike every new bad idea, isn't unique, and there's likely plenty of literature out there already that you can use as ammunition.


Well, yeah, of course I care. But not for the determination of whether a bad idea is bad. This may be an instance of worse is better, but it is a bad idea regardless, and thus at best an additional piece of evidence against the philosophy.


Eeeh, this seems like a far-fetched interpretation of the principle. The dbm library, for one, has been around since '79, and I'd guess Ken Thompson would have adhered to the principles instead if they were so pervasive.


I wasn't asserting that they were pervasive or even universal (in the Unix universe).

Perhaps I'm missing your point about dbm, unless it's just that libraries existed, even long ago, that happened to open files. That doesn't refute my point, which is that a process operating on those files via that library and outside of that library is incongruous.

As further evidence for there being at least an idea of such a dual "class" of files, I present the fact that the * glob does not match files starting with a dot.


Well, take a simpler case: A program that takes a lock and then reads a file specified by the user on the command line.

There are tons of perfectly reasonable scenarios that are hilariously complicated to implement correctly, even ones that fit within the more traditional unix approach.

Yes, maybe this is a result of some sort of tradeoff, but I would say it almost certainly was not a sensible tradeoff even back then. I don't see how using a sensible strategy could be much worse in terms of any resources used (I mean, really, it's a matter of where you put the pointers to the lock structures and where you call the automatic unlock handler), and the cost of using the semantics as they are now correctly is so high that it's just not a useful abstraction, really. If anything, this smells like one of those tradeoffs where a terrible abstraction that is incredibly difficult to use correctly was chosen simply because it was less code to write, thus causing every user of the abstraction to have to write tons more code to be correct.

Though if you ask me this looks much more like someone not thinking it through. Locks are for coordinating concurrency, the unit of concurrency is the process, so locks are held by processes. Oh, and we have to clean up locks that aren't released by the process! Well, let's do that when the file is closed ... OK, done!


> Well, take a simpler case: A program that takes a lock and then reads a file specified by the user on the command line.

I'm having trouble seeing the use case under that Unix Philosophy constraint. I admit I might not be imaginative enough, but a program/utility that would be normally expected to operate on its own locked file (by being asked to do so by the user) still seems incongruous.

> it was less code to write, while thus causing every user of the abstraction to have to write tons more code if they want to have correct code.

Actually, that's a completely valid design decision and philosophically compatible, even if you happen to find it distasteful.

It's one of the (caricatured) tenets of "Worse is Better" which has been credited with the (original) success of Unix over more "correct" competitors http://dreamsongs.com/RiseOfWorseIsBetter.html :

> Simplicity -- the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.

> Correctness -- the design must be correct in all observable aspects. It is slightly better to be simple than correct.

> Consistency -- the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.

> Completeness -- the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.

It seems like advisory locking has made a "worse" sacrifice in each category, in favor of the purported benefit. There's even a reference (though I don't recall if it's in that essay, specifically) to making the user do extra coding work for correctness.

Of course, there are plenty of critics of "Worse is Better" or the "New Jersey approach", including the author of that essay. However, his whole point is that it improves survivability, which, arguably, was still important to Unix in the 80s and maybe even into the 90s.

> not thinking it through. Locks are for coordinating concurrency, the unit of concurrency is the process, so locks are held by processes. Oh, and we have to clean up locks that aren't released by the process! Well, let's do that when the file is closed

It's absolutely not thinking every possible future scenario and corner case through, but that, I would argue, is the opposite of "braindead". It worked, especially for the use case you described, which seems perfectly reasonable for early Unix (and adherents to the Unix Philosophy of any era).


> I'm having trouble seeing the use case under that Unix Philosophy constraint. I admit I might not be imaginative enough, but a program/utility that would be normally expected to operate on its own locked file (by being asked to do so by the user) still seems incongruous.

That's just bad engineering. "Seems unlikely, therefore we ignore it" is essentially how you create broken corner cases that result in unreliable systems, i.e., braindead.

> Actually, that's a completely valid design decision and philosophically compatible, even if you happen to find it distasteful.

No, it's not. It's one that people make, that doesn't make it valid, just a reality.

> However, his whole point is that it improves survivability, which, arguably, was still important to Unix in the 80s and maybe even into the 90s.

That's not a justification for building bad systems, that's only an explanation for why bad systems survive once they have been built.

You might as well say that fraud improves survivability. Yeah, under certain circumstances it does, that doesn't mean that winning a market via fraud is in the interest of all market participants or that your product is therefore good, it simply means that you can win against competitors by using fraud.

> It's absolutely not thinking every possible future scenario and corner case through, but that, I would argue, is the opposite of "braindead".

It's obviously wrong scoping, hence braindead. You don't need to think through every possible future scenario, you simply have to try and formulate the composability semantics of the mechanism and you should notice it's broken.

> It worked, especially for the use case you described, which seems perfectly reasonable for early Unix (and adherents to the Unix Philosophy of any era).

In which case that philosophy was braindead then?


> That's just bad engineering. "Seems unlikely, therefore we ignore it" is essentially how you create broken corner cases that result in unreliable systems, i.e., braindead.

I can only assume, then, that your "simple" example wasn't simple at all, but a contrivance to result in such a corner case.

You've also confused "we ignore it" with "we consciously decide not to bother with it".

> That's not a justification for building bad systems, that's only an explanation for why bad systems survive once they have been built.

This is a distinction without a difference. This isn't a moral issue, so "justification" is irrelevant.

There are only tradeoffs. Insisting that the design was just bad, broken, unreliable, or braindead, while not even acknowledging the tradeoffs (i.e. ignoring the positives or, instead, calling them negatives) strikes me as disingenuous.

> You might as well say that fraud improves survivability.

Nope. That's a strawman.

> In which case that philosophy was braindead then?

I urge you to read the essay and the commentary around "Worse is Better". Otherwise, all this is essentially just a shallow dismissal.

You have, in essence, argued in favor of the "MIT method". As I said, this argument was already given, by the original proponents, the mentioned essay's author, and numerous other reasonably well known names in computer science, mostly decades ago. A web search or even just Wikipedia can provide jumping off points.


> This is a distinction without a difference.

Wut? Explanations are the same thing as justifications? I think you have to explain ...

> This isn't a moral issue, so "justification" is irrelevant.

Actually, it is?

> There are only tradeoffs. Insisting that the design was just bad, broken, unreliable, or braindead, while not even acknowledging the tradeoffs (i.e. ignoring the positives or, instead, calling them negatives) strikes me as disingenuous.

There probably were no benefits, other than for the people implementing the kernel, as explained earlier.

> Nope. That's a strawman.

So, what am I misrepresenting then?

> I urge you to read the essay and the commentary around "Worse is Better". Otherwise, all this is essentially just a shallow dismissal.

Yeah, thanks, it doesn't make sense as a justification, only as an explanation.


If you want locks from the same user you'd use other mechanisms than file locks. File locks are for processes that don't have a lighter-weight way to cooperate (e.g. semaphores).

Also, there's a reason that databases have processes dedicated to writing (e.g. Postgres writer process, Oracle dbwr)

Fun fact - IBM's MVS didn't have file locks.


there's a reason that databases have processes dedicated to writing

That's going to be a little tricky in an embedded db.


Sure, but you could still make sure writes are done in a mutex?


Don't forget that it still needs to synchronize across process boundaries, too - two processes, both running SQLite, and both accessing the same database file, is a supported scenario.
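
(POSIX does have process-shared mutexes you can place in shared memory, roughly like the sketch below, but that only works for processes that agree to map the same region, which is exactly what an embedded library can't assume. The names here are made up and this is not what SQLite does.)

    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Sketch: a mutex usable across processes, placed in a POSIX shared
       memory object that every participating process maps.  Only the
       process that creates the object should run the init part. */
    pthread_mutex_t *create_shared_mutex(const char *shm_name)
    {
        int fd = shm_open(shm_name, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return NULL;
        ftruncate(fd, sizeof(pthread_mutex_t));

        pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        close(fd);
        if (m == MAP_FAILED)
            return NULL;

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return m;
    }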


As far as I know these alternative thread libraries are ancient history. I can understand wanting to defend against sabotage like people using hard links to open a database under two names, but is that really a dealbreaker?


When he gets a file descriptor, he could also get the inode. Since hard links share the same inode, the prior lock state, if any, can be obtained and action taken accordingly.
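
i.e. keying the lock state on (st_dev, st_ino) rather than on the path name, roughly:

    #include <sys/stat.h>

    /* Two descriptors refer to the same underlying file (e.g. via a hard
       link) exactly when both the device and the inode number match. */
    int same_file(int fd1, int fd2)
    {
        struct stat a, b;
        if (fstat(fd1, &a) < 0 || fstat(fd2, &b) < 0)
            return -1;
        return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
    }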



