So this is something I’ve never understood. If you modify a shell script while it’s running, the shell executes the modified file. This usually, but not always, causes the script to fail.
Now I’ve known about this behaviour for a very long time and it always seemed very broken to me. It’s not how binaries work (at least not when I was doing that kind of thing).
So I guess bash or whatever does an mmap of the script it’s running, which is presumably why modifications to the script are visible immediately. But if a new file was installed, e.g. using cp/tar/unzip, I’m surprised that this didn’t just unlink the old script and create a new one - which would create a new inode and therefore make the operation atomic, right? And this (I assume) is why a recompiled binary doesn’t have the same problem (because the old binary is first unlinked).
So, how could this (IMO) bad behaviour be fixed? Presumably mmap is used for efficiency, but isn’t it possible to mark a file as in use so it can’t be modified? I’ve certainly seen on some old Unices that you can’t overwrite a running binary. Why can’t we do the same with shell scripts?
Honestly, while it’s great that HP is accepting responsibility, and we know that this sort of thing happens, the behaviour seems both arbitrary and unnecessary to me. Is it fixable?
> isn’t it possible to mark a file as in use so it can’t be modified?
That's the route chosen by Windows for binary executables (exe/dll) and by various other systems. Locking a file against writes, delete/rename, or even reads is just another flag in the Windows equivalent of fopen [1]. This makes for software that's quite easy to reason about, but hard to update. The reason you have to restart Windows to install Windows updates, or even to install some software, is largely this locking mechanism: you can't update files that are open (and rename tricks don't work because locks apply to files, not inodes).
With about three decades of hindsight I'm not sure it's a good tradeoff. It makes it easy to prevent the race conditions that are an endless source of security bugs on Unix-like systems; but on the other hand, most software doesn't use the mechanism because it's not in the lowest-common-denominator file APIs of most programming languages, and MS is paying for it with users refusing to install updates because they don't want to restart their PC.
I've updated .so files on FreeBSD while they were in use. They weren't busy, and a program which had one mmapped in order to run promptly crashed (my update wasn't intended to be hot loaded and wasn't crafted to be safe, although it could have been if I'd known it was possible). And now I won't forget why I should use install instead of cp: by default, install unlinks before writing, while cp opens and overwrites the existing file.
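A minimal sketch of the difference, using ls -i to watch the inode (filenames and inode numbers here are illustrative):

    $ echo v1 > libfoo.so
    $ ls -i libfoo.so                  # note the inode number
    1000001 libfoo.so
    $ echo v2 > new-libfoo.so
    $ cp new-libfoo.so libfoo.so       # cp opens and truncates in place...
    $ ls -i libfoo.so                  # ...same inode; a process with it mmapped sees the new bytes
    1000001 libfoo.so
    $ install new-libfoo.so libfoo.so  # install unlinks the target first...
    $ ls -i libfoo.so                  # ...new inode; the old one lives on for anyone who has it open
    1000002 libfoo.so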
This behavior in shell scripts predates mmap. In very early versions of Unix it was arguably even useful; there was a goto command which was implemented by seeking on the shell-script file descriptor rather than as a shell builtin, for example. I don't know of any use for it since the transition to the Bourne shell, but my knowledge is far from comprehensive. (I suppose if your shell script is not small compared to the size of RAM, it might be undesirable to read it all in at the start of execution; shar files are a real-life example even on non-PDP-11 machines.)
As I understand it, the reason for ETXTBSY ("on some old Unices...you can't overwrite a running binary") was to prevent segfaults.
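Assuming a Linux box (the exact error text varies by system), this is easy to reproduce with a copied binary and paths of your choosing:

    cp /bin/sleep /tmp/mysleep
    /tmp/mysleep 60 &              # execute the copy
    cp /bin/sleep /tmp/mysleep     # opening a running binary for writing fails:
    # cp: cannot create regular file '/tmp/mysleep': Text file busy
    mv /tmp/mysleep /tmp/oldsleep  # rename and unlink are still allowed, so...
    cp /bin/sleep /tmp/mysleep     # ...replacing it via a fresh inode succeeds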
cp usually just opens the file with O_WRONLY|O_TRUNC, which seems like the wrong default; Emacs, for example, usually does create a new file and rename it over the old one when you save, allocating a new inode as you say. By default it makes an exception if there are other hard links to the file.
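That safer pattern, roughly what Emacs does, is easy to sketch in shell (filenames here are hypothetical; the key point is that rename(2) within one filesystem is atomic):

    tmp=$(mktemp script.sh.XXXXXX)   # new file, new inode
    cat new-version.sh > "$tmp"
    chmod +x "$tmp"
    mv "$tmp" script.sh              # atomic: readers see old or new, never a half-written mix

A shell already running the script keeps its open file descriptor on the old inode, so it finishes executing the old version undisturbed.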
Btrfs and xfs have a "reflink" feature that allows you to efficiently make a copy-on-write snapshot of a file, which would be ideal for this sort of thing, since the shell or whatever won't see any changes to the original file, even if it's overwritten in place. Unfortunately I don't think you can make anonymous reflinks, so for the shell to reflink a shell script when it starts executing it would need write access to somewhere in the filesystem to put the reflink, and then it would need to know how to find that place, somehow. And of course that wouldn't help if you were running on ext4fs or, I imagine, Lustre, though apparently an implementation was proposed in 02019: https://wiki.lustre.org/Lreflink_High_Level_Design
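For example, with GNU cp on a reflink-capable filesystem (filenames hypothetical):

    cp --reflink=always script.sh /tmp/script.snapshot.sh   # shares extents, copy-on-write
    bash /tmp/script.snapshot.sh                            # unaffected by later edits to script.sh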
> there was a goto command which was implemented by seeking on the shell-script file descriptor rather than as a shell builtin, for example.
Oh noooo I just realized you could probably implement a shared library loadable module for bash `enable` that does the same thing... just lseek()s the fd...
> Emacs, for example, usually does create a new file and rename it over the old one when you save, allocating a new inode as you say. By default it makes an exception if there are other hard links to the file.
Though the trade-off is that all operation ceases on a full hard drive.
I don’t have a better solution, but it’s worth noting.
Emacs gives you an error message in that case rather than destroying the old version of the file and then failing to completely write the new version, in the cases where it does the tempfile-then-rename dance. This is usually vastly preferable if Emacs or your computer crashes before you manage to free up enough space for a successful save.
It doesn't cease all operation; other Emacs features work as they normally do. Bash, by contrast, stops being able to tab-complete filenames, at precisely the time when you most need to be able to rapidly manipulate your files. At least, that's the case with the default completion setup in a few recent versions of Ubuntu.
Well, it looks like creating another hard link is a nearly free solution. And beyond that, since Emacs already has both behaviors, presumably you can tell it you want the in-place modification.
The reason why modifying a script during execution can have unpredictable results (not demonstrated in this test) is that Unix shells traditionally alternate between reading commands and executing them, instead of reading the entire file (potentially very large compared to 1970s RAM sizes) and executing commands from the in-memory copy. On modern systems, shell scripts are usually negligible in size compared to system RAM, so you can manually cause the entire file to be buffered by enclosing the script in a function or subshell:
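    #!/usr/bin/env bash
    # a minimal sketch: bash parses a compound command in full before
    # executing any of it, so the whole body is buffered in memory
    main() {
        echo "step 1"
        sleep 10
        echo "step 2"   # runs as originally written, even if the file is edited mid-sleep
    }
    main "$@"; exit     # 'exit' guards against anything appended after this line

The trailing exit matters: without it, the shell would keep reading past the call to main and execute whatever had been appended to the file in the meantime.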
bash will read(), do its multi-step expansion and parsing, and then lseek() back so that the next read starts at the next input it needs to handle. This is why the problems described in the story can happen.
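If you want to see it, something like this works on Linux (strace is Linux-specific; the exact trace varies by bash version):

    # watch bash read a chunk of the script, then lseek back to just
    # past the last command it actually consumed:
    strace -e trace=read,lseek bash ./test.sh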
The other way to fix this is simply to use editors that make a new file and rename it over the target on save. I believe vim or neovim does this by default, but the likes of ed or vi do not. Emacs will do something similar on first save if you did not (setq backup-by-copying t), but any write after that will still be done in place. I tested this trivially, without reviewing the Emacs source, by doing the following, and you can too with your $EDITOR of choice:
    #!/usr/bin/env bash
    echo test
    sleep 10
    # evil command below, uncomment me and save
    # echo test2
While the script is in the sleep, uncomment the evil line and save. If "test2" prints after the sleep finishes, your editor wrote the file in place and may cause the problem described.
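A faster check than racing the sleep, assuming your ls supports -i (it's in POSIX): compare inodes across a save.

    ls -i test.sh   # note the inode, then edit and save in $EDITOR
    ls -i test.sh   # same inode = in-place write; new inode = rename-over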
> If you modify a shell script while it’s running, the shell executes the modified file
That depends on the OS. In this case, wasn't the shell script just executed fresh from a cronjob?
I remember on Digital Unix - on an Alpha, so this was a few years ago - that you could change a C program (a loop that printed something then slept, for example), recompile, and it would change the running binary.
> wasn't the shell script just executed fresh from a cronjob?
The description said that the script changed while it was running, so certain newly introduced environment variables didn’t have values and this triggered the issue.
My reading was that this was just a terrible coincidence - the cron job must have started just before the upgrade.
Regarding changing a C program, now you mention it I think that the behaviour you describe might also have happened on DG/UX, after an upgrade. IIRC it used to use ETXTBSY and after an upgrade it would just overwrite.
Not really behaviour that you want (or expect), though.
It's nice to see the same mistakes that people have been making for as long as I've been alive, on small and large systems all over the world, still happening on projects with professional teams from HPE or IBM that cost hundreds of millions of dollars.
From what I know, Linux so far doesn't have a mandatory exclusive-lock capability on a file; Windows does, however. So on Linux you can't mark a file as being in the exclusive possession of a process.
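What Linux does offer out of the box is advisory locking, e.g. via flock(1), but it only serializes processes that opt in (paths here are hypothetical):

    # cooperating writers serialize on the lock file; a process that
    # never calls flock can still modify script.sh underneath it
    flock /tmp/script.sh.lock -c './script.sh'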