> Or: C knows that it doesn't need fixing. People don't like APIs that can rando...

usrbinbash · on Nov 20, 2023

> Why not fix the problem?

Because doing so breaks backwards compatibility, simple as that.

The problem isn't even that `setenv` isn't thread-safe. The problem is that `getenv` returns a `char *` directly into the environment memory space. Many many many programs rely on that being the case.
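To illustrate the point about `getenv` handing out a pointer into the environment itself, here is a minimal sketch. The helper name `getenv_aliases_environ` is made up for this example. POSIX does not promise this layout; the sketch only checks what common libcs such as glibc do in practice.

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: returns 1 if getenv(name) points at the value part
 * of the matching "NAME=value" string in environ, 0 otherwise. */
int getenv_aliases_environ(const char *name)
{
    const char *val = getenv(name);
    size_t n = strlen(name);

    if (val == NULL)
        return 0;

    for (char **e = environ; e != NULL && *e != NULL; ++e) {
        if (strncmp(*e, name, n) == 0 && (*e)[n] == '=')
            return val == *e + n + 1;   /* same memory, not a copy */
    }
    return 0;
}
```

Because the caller holds a pointer into that shared storage, any later `setenv`, `unsetenv` or `putenv` on the same variable can invalidate it.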

> People like you

People like me would like all software to be perfect, but that's not the world we live in, so we are forced to be pragmatic. When fixing something causes more problems, by breaking backwards compatibility promises, than it prevents, then there is no good argument for a fix, and the correct approach is to say "yes, this sucks, let's document it well so people don't waste too much time on this".

The setenv/getenv problem is such a case. Anyone who disagrees is free to fork glibc, implement whatever fix they think is adequate, and then try to compile the software packages found on a typical Linux server against the result.

> so the issue can be reproduced.

"Can be reproduced" and "is a common issue in production code" are not the same.

Fact is, almost all production programs that set envvars do so once, very early in the process lifecycle, and then never again, and so are never affected by this.

mastax · on Nov 20, 2023

So why not implement the fix suggested in the article: improve the existing interface to the extent possible, and introduce a new interface which is easier to use correctly?

fch42 · on Nov 20, 2023

There is nothing to "improve" on the existing interface, really. From a C point of view ... a _hidden_ global lock is worse than no lock at all. Because in the latter case ... you, as the programmer, have a choice what to do. If you never call setenv(), no locks. If you only ever call setenv() in your startup code, no locks. If you only ever call setenv() after fork&co, no locks. And if you do believe you need to call it at runtime, but are singlethreaded ... still no locks. And if you really really really need to call it from a multithreaded process, concurrently with getenv(), then lock around both and make your getenv() "safe" wrapper create you an owned point-in-time copy - basically a getenv_r().
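A minimal sketch of the "lock around both" approach described above, assuming pthreads. The names `env_lock`, `env_set_locked` and `env_get_copy` are hypothetical and not part of libc or POSIX; the copy-out function plays the role of the `getenv_r()` mentioned in the comment.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-wide lock guarding every environment access. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* setenv() under the lock; returns whatever setenv() returns. */
int env_set_locked(const char *name, const char *value)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Copy the current value of `name` into the caller's buffer while holding
 * the lock, so the caller owns a point-in-time snapshot.
 * Returns 0 on success, ENOENT if unset, ERANGE if the buffer is too small. */
int env_get_copy(const char *name, char *buf, size_t buflen)
{
    int rc = 0;

    pthread_mutex_lock(&env_lock);
    const char *val = getenv(name);
    if (val == NULL) {
        rc = ENOENT;
    } else {
        size_t len = strlen(val);
        if (len >= buflen)
            rc = ERANGE;
        else
            memcpy(buf, val, len + 1);   /* include the terminating NUL */
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

This only protects code that goes through the wrappers. Any library that calls `getenv()` or `setenv()` directly bypasses the lock, which is the limitation of doing it at the application level.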

Note also that "global references" like getenv() returns and point-in-time owned snapshots don't behave the same way. Say, a library initializer code could retrieve a number of env var references by calling getenv(), and then use those at runtime. No more need/use for getenv() again after - and even perf-sensitive code could look at the env var. With a func that copies, the perf-sensitive code would need to do that each time (lock, lookup, copy). Not strongly desirable.
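The library-initializer pattern described here might look roughly like the following sketch. The library name `mylib`, the variable `MYLIB_LOG_LEVEL` and both functions are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Reference into environment memory, fetched once at init time. */
static const char *cached_level;

/* Hypothetical initializer: the only place that calls getenv(). */
void mylib_init(void)
{
    cached_level = getenv("MYLIB_LOG_LEVEL");
}

/* Hot-path check: no lookup, no lock, no copy, just a pointer read. */
int mylib_debug_enabled(void)
{
    return cached_level != NULL && strcmp(cached_level, "debug") == 0;
}
```

The cached pointer stays usable only as long as nothing modifies that variable afterwards, which is exactly the assumption such code makes.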

Also ... UNIX is rather flexible ... and if you so wish, you _can_ substitute _your own_ setenv()/getenv() by the magic of dynamic linking. You could create a pair that locks and returns you leaked copies (this changes the semantics of getenv so that the caller must free the pointer to avoid a leak). It's all possible to do this.

I'm getting the impression from this that we see a "go tantrum" here. "I make my own standards but I wanna use that C/Unix standard thing as well but not how it is because it's not nice it should take go into account waaaahwaaah ...".

It is not _nice_ to modify your own env at runtime. Maybe, just maybe ... that's for reasons. Because not everything that can be done is also a great idea.

rerdavies · on Nov 20, 2023

So why not implement it yourself, instead of polluting the standard runtime with functionality that nobody needs?

zare_st · on Nov 20, 2023

> People don't like APIs that can randomly crash your program while there's no good technical reason for why they should. Why not fix the problem?

I think you're not seeing this from the right POV. People that consume POSIX API need to know POSIX API.

https://pubs.opengroup.org/onlinepubs/009604499/functions/se...

It says loud and clear "The setenv() function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe."

> "The unpredictable crashes only happen very rarely" doesn't mean the crashes go away.

If you get a crash over setenv(), reading the manual page of the setenv C call should be your first step. And the only step. The bigger issue is in the design of an application that has wrongly assumed setenv() is thread-safe. That requires a refactoring and is solely due to the developer misunderstanding the API.

tikhonj · on Nov 20, 2023

"RTFM" is not a coherent defense for awful API design and we shouldn't accept it as such.

zare_st · on Nov 20, 2023

Who is "we"?

I've been a UNIX/C programmer for decades and I don't care about this.

There is no such thing as beautiful API design. Every design is a compromise. If you think non-reentrant calls should be deprecated in POSIX take it to the committee.

There is a myriad of non-reentrant code both in the POSIX spec and in libc implementations. You need to RTFM, I'm sorry.

There is no "coherent API" as far as null termination goes too. Some library functions deal with it, some calls don't. You need to RTFM.

I also want to know OP's reason to even use setenv() in a multithreaded piece of software. It's like an oxymoron. setenv and vars are useful to pass on data from parent process to forked children because they inherit the environment. If you use the threading model you don't need it. If your application is a single process setenv() is useless.

usrbinbash · on Nov 20, 2023

What should we accept? That every library is made under the assumption that it has to work as expected, even if people ignore the documentation?

As someone who made and maintains multiple libraries: No. Not gonna happen.

JohnFen · on Nov 20, 2023

Putting aside whether or not the design is awful, the fact that it's standardized and documented is absolutely a valid argument. Changing it now would break backward compatibility. That should always be a showstopper.

Programmers who are using any library code without reading and understanding the documentation are asking for trouble regardless of language.

The correct solution to your objections is to create new functions that behave as you prefer.

wredue · on Nov 20, 2023

The real skinny of it is that it’s in the name: “Environment”.

If you’re calling setenv in the middle of your program, you fucked up.

There are those things in programming that should be extremely triggering to your “what the actual fuck?!” senses, and “setenv in the middle of runtime” is one of those things.

kstrauser · on Nov 20, 2023

True, but for every envvar a program reads, something called setenv on it originally. It’s not like no programs call setenv in the middle of runtime. Examples:

- Shells

- CI runner

- Container launchers

- IDEs

DSMan195276 · on Nov 20, 2023

> but for every envvar a program reads, something called setenv on it originally

That's not true, that's just misunderstanding how it works. `execve()` takes an entirely new copy of environment variables to give to the child, that's the "real" way to do it.

tsukikage · on Nov 20, 2023

The child process's environment for these purposes is constructed without mutating its parent's environment - a copy is used - and before the child process actually runs the target code it was created to run. So there is no possibility of race between mutations to the environment and reads of the environment. If you are writing such a tool but doing something other than this, you are doing it wrong.
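As a rough sketch of what the two comments above describe, a launcher can hand the child a freshly built environment array through `execve()` and never touch its own. The function name `run_sh_with_env` is hypothetical, and the `/bin/sh` path is an assumption about the system.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c script` in a child whose environment is
 * exactly `envp`. Returns the child's exit status, or -1 on failure.
 * The parent's own environment is never modified. */
int run_sh_with_env(const char *script, char *const envp[])
{
    char *const argv[] = { "sh", "-c", (char *)script, NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image, passing the new environment. */
        execve("/bin/sh", argv, envp);
        _exit(127);                     /* only reached if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an array such as `{ "GREETING=hello", NULL }`, the child sees `GREETING` while the parent never calls `setenv()`, so there is nothing for a concurrent `getenv()` in the parent to race with.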

stefan_ · on Nov 20, 2023

No, a process gets its environment variables from the operating system (just like argc, argv) before any code is ever executed and the majority never change them.

ric2b · on Nov 20, 2023

Then why does setenv even exist? Maybe that's the issue and it should be deprecated and throw compilation warnings?