
> Or: C knows that it doesn't need fixing.

People don't like APIs that can randomly crash your program while there's no good technical reason for why they should. Why not fix the problem? People like you, who have no issues with the current implementation, won't see any regressions because you're already a good citizen, and myriad other programmers whose programs do occasionally crash because of this will be helped.

> So, is there a potential issue with thread safety? Yes. Does it matter given where and under what circumstances it occurs? Not really.

"The unpredictable crashes only happen very rarely" doesn't mean the crashes go away.

> What kind of actual real life production code would continuously set envvars while simultaneously calling a function that tries to read the environment?

The reproduction sample calls setenv in a loop so the issue can be reproduced. A single setenv anywhere in the code is enough to trigger the crash, but then you would get one of those "you need to run the program a million times to reproduce it" bug reports that gets pushed down the line.




> Why not fix the problem?

Because doing so breaks backwards compatibility, simple as that.

The problem isn't even that `setenv` isn't thread-safe. The problem is that `getenv` returns a `char *` pointing directly into the environment memory space. Many, many programs rely on that being the case.
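To make the "directly into the environment" point concrete, here is a minimal sketch. The helper name `getenv_points_into_environ` is made up for illustration, and the behavior it checks is what common implementations such as glibc and musl do, not something POSIX promises in these exact terms.

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: returns 1 if the pointer handed back by getenv()
 * is the value part of the "NAME=value" string stored in environ,
 * 0 if the variable is unset or the pointer lives somewhere else. */
int getenv_points_into_environ(const char *name)
{
    const char *value = getenv(name);
    if (value == NULL)
        return 0;

    size_t name_len = strlen(name);
    for (char **entry = environ; entry != NULL && *entry != NULL; entry++) {
        /* Find the first "NAME=" entry and compare addresses, not contents. */
        if (strncmp(*entry, name, name_len) == 0 && (*entry)[name_len] == '=')
            return value == *entry + name_len + 1;
    }
    return 0;
}
```

If this returns 1, the caller is holding a pointer into shared process state rather than a private copy, which is why a later modification of the environment can pull the rug out from under it.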

> People like you

People like me would like all software to be perfect, but that's not the world we live in, so we are forced to be pragmatic. When fixing something causes more problems than it prevents, by breaking backwards compatibility promises, there is no good argument for a fix, and the correct approach is to say "yes, this sucks, let's document it well so people don't waste too much time on this".

The setenv/getenv problem is such a case. Anyone who disagrees is free to fork glibc, implement whatever fix they think is adequate, and then try to compile the software packages found on a typical Linux server against the result.

> so the issue can be reproduced.

"Can be reproduced" and "is a common issue in production code" are not the same.

Fact is, almost all production programs that set envvars do so once, very early in the process lifecycle, and then never again, and so are never affected by this.


So why not implement the fix suggested in the article: improve the existing interface to the extent possible, and introduce a new interface which is easier to use correctly?


There is nothing to "improve" on the existing interface, really. From a C point of view ... a _hidden_ global lock is worse than no lock at all. Because in the latter case ... you, as the programmer, have a choice of what to do. If you never call setenv(), no locks. If you only ever call setenv() in your startup code, no locks. If you only ever call setenv() after fork & co, no locks. And if you do believe you need to call it at runtime, but are single-threaded ... still no locks. And if you really really really need to call it from a multithreaded process, concurrently with getenv(), then lock around both and have your "safe" getenv() wrapper create an owned point-in-time copy for you - basically a getenv_r().
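A rough sketch of that last case, assuming the program routes every environment access through its own wrappers. The names `env_lock`, `locked_setenv` and `locked_getenv_r` are hypothetical; there is no standard getenv_r() in POSIX, this is just one way such a wrapper could look.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-level lock. It only helps if every getenv/setenv
 * call in the process (including libraries) goes through these wrappers. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: modify the environment while holding the lock. */
int locked_setenv(const char *name, const char *value, int overwrite)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, overwrite);
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Hypothetical getenv_r(): copy the value into a caller-owned buffer.
 * Returns 0 on success, ENOENT if the variable is unset,
 * ERANGE if the buffer cannot hold the value plus its terminator. */
int locked_getenv_r(const char *name, char *buf, size_t buflen)
{
    int rc = 0;
    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value == NULL) {
        rc = ENOENT;
    } else {
        size_t len = strlen(value);
        if (len >= buflen)
            rc = ERANGE;
        else
            memcpy(buf, value, len + 1); /* copy includes the trailing NUL */
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

The copy is taken while the lock is held, so the caller never touches the shared string after unlocking. Code that still calls plain getenv() or setenv() behind your back is not protected by this.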

Note also that "global references" like the ones getenv() returns and point-in-time owned snapshots don't behave the same way. Say, library initializer code could retrieve a number of env var references by calling getenv(), and then use those at runtime. No more need/use for getenv() again after - and even perf-sensitive code could look at the env var. With a func that copies, the perf-sensitive code would need to do that each time (lock, lookup, copy). Not strongly desirable.
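For illustration, the initializer pattern might look roughly like this. `mylib_init`, `mylib_debug_enabled` and the `MYLIB_LOG_LEVEL` variable are invented names, and the sketch assumes nobody modifies that variable after initialization, since the cached pointer refers to the environment's own storage.

```c
#include <stdlib.h>
#include <string.h>

/* Cached reference into the environment; hypothetical library state. */
static const char *cached_log_level;

/* Hypothetical initializer: look the variable up once, keep the pointer. */
void mylib_init(void)
{
    cached_log_level = getenv("MYLIB_LOG_LEVEL");
}

/* Hot-path check: no lock, no lookup, no copy, just a string compare
 * against the reference obtained at init time. */
int mylib_debug_enabled(void)
{
    return cached_log_level != NULL && strcmp(cached_log_level, "debug") == 0;
}
```

With a copying interface the library could still cache its own copy at init time, but it would then own that memory and never see the shared string at all, which is the difference in semantics being pointed out.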

Also ... UNIX is rather flexible ... and if you so wish, you _can_ substitute _your own_ setenv()/getenv() by the magic of dynamic linking. You could create a pair that locks and returns copies (which changes the semantics of getenv() so that the caller must free the pointer to avoid a leak). All of this is possible.

I'm getting the impression from this that we see a "Go tantrum" here. "I make my own standards but I wanna use that C/Unix standard thing as well but not how it is because it's not nice it should take Go into account waaaahwaaah ...".

It is not _nice_ to modify your own env at runtime. Maybe, just maybe ... that's for reasons. Because not everything that can be done is also a great idea.


So why not implement it yourself, instead of polluting the standard runtime with functionality that nobody needs?


> People don't like APIs that can randomly crash your program while there's no good technical reason for why they should. Why not fix the problem?

I think you're not seeing this from the right POV. People who consume the POSIX API need to know the POSIX API.

https://pubs.opengroup.org/onlinepubs/009604499/functions/se...

It says loud and clear "The setenv() function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe."

> "The unpredictable crashes only happen very rarely" doesn't mean the crashes go away.

If you get a crash in setenv(), reading the manual page for the setenv() C call should be your first step. And the only step. The bigger issue is the design of an application that wrongly assumes setenv() is thread-safe. That requires refactoring and is solely due to the developer misunderstanding the API.


"RTFM" is not a coherent defense for awful API design and we shouldn't accept it as such.


Who is "we"?

I've been a UNIX/C programmer for decades and I don't care about this.

There is no such thing as beautiful API design. Every design is a compromise. If you think non-reentrant calls should be deprecated in POSIX take it to the committee.

There is a myriad of non-reentrant code both in the POSIX spec and in libc implementations. You need to RTFM, I'm sorry.

There is no "coherent API" as far as null termination goes either. Some library functions deal with it, some calls don't. You need to RTFM.

I also want to know OP's reason for even using setenv() in a multithreaded piece of software. It's like an oxymoron. setenv() and env vars are useful for passing data from a parent process to forked children, because they inherit the environment. If you use the threading model you don't need it. If your application is a single process, setenv() is useless.


What should we accept? That every library is made under the assumption that it has to work as expected, even if people ignore the documentation?

As someone who made and maintains multiple libraries: No. Not gonna happen.


Putting aside whether or not the design is awful, the fact that it's standardized and documented is absolutely a valid argument. Changing it now would break backward compatibility. That should always be a showstopper.

Programmers who are using any library code without reading and understanding the documentation are asking for trouble regardless of language.

The correct solution to your objections is to create new functions that behave as you prefer.


The real skinny of it is that it’s in the name: “Environment”.

If you’re calling setenv in the middle of your program, you fucked up.

There are those things in programming that should be extremely triggering to your “what the actual fuck?!” senses, and “setenv in the middle of runtime” is one of those things.


True, but for every envvar a program reads, something called setenv on it originally. It’s not like no programs call setenv in the middle of runtime. Examples:

- Shells

- CI runners

- Container launchers

- IDEs


> but for every envvar a program reads, something called setenv on it originally

That's not true, that's just misunderstanding how it works. `execve()` takes an entirely new copy of environment variables to give to the child, that's the "real" way to do it.


The child process's environment for these purposes is constructed without mutating its parent's environment - a copy is used - and before the child process actually runs the target code it was created to run. So there is no possibility of race between mutations to the environment and reads of the environment. If you are writing such a tool but doing something other than this, you are doing it wrong.
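As a minimal sketch of that approach: the parent builds a separate environment array and hands it to `execve()` in the child, never touching its own environment. The helper name `run_with_env` is hypothetical, and error handling is kept to the bare minimum.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with the given argv and a caller-built
 * environment. The parent's own environment is neither read nor modified.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int run_with_env(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image, passing envp as the new
         * program's entire environment. */
        execve(path, argv, envp);
        _exit(127); /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A shell or CI runner built along these lines would assemble `envp` from its own data structures (for example, an array like `{"FOO=1", NULL}`) and never needs setenv() in the parent at all.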


No, a process gets its environment variables from the operating system (just like argc, argv) before any code is ever executed and the majority never change them.


Then why does setenv even exist? Maybe that's the issue and it should be deprecated and throw compilation warnings?



