Many of C's problems relate to string handling. These are all legacy functions w...

_kst_ · on March 4, 2021

strncpy() is not a "safer" strcpy(). It can avoid some errors involving writing past the end of the target array (if you tell it the correct length for that array), but it's not a true string function, and it can leave the target unterminated and therefore not a valid string.

http://the-flat-trantor-society.blogspot.com/2012/03/no-strn...
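A minimal sketch of the unterminated case, in case it helps. The helper names `has_terminator` and `strncpy_result_is_string` are made up for illustration, and the 4-byte buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..n-1] contains a NUL byte,
   i.e. if the array holds a valid C string. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Copies src into a 4-byte array with strncpy and reports whether the
   result is a terminated string. */
int strncpy_result_is_string(const char *src)
{
    char buf[4];

    assert(src != NULL);
    strncpy(buf, src, sizeof buf);   /* the "correct length" for buf */
    return has_terminator(buf, sizeof buf);
}
```

With these definitions, `strncpy_result_is_string("abc")` returns 1, while `"abcd"` or anything longer returns 0: the array is full and no terminator was written.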

icedchai · on March 4, 2021

This is true, and many people don't realize it. I used to call a wrapper function that would always set the last byte to 0.
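Such a wrapper might look roughly like this. It is only a sketch; the name `strncpy_terminated` is hypothetical and it assumes the caller passes the real size of the destination array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: strncpy into the first dstsize-1 bytes, then
   force the last byte to 0 so dst is always a valid string. */
char *strncpy_terminated(char *dst, const char *src, size_t dstsize)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
    return dst;
}
```

Note that a source that is too long is silently truncated, and the caller gets no indication that this happened.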

rrauenza · on March 4, 2021

I never could really understand the point of strncpy()... we always end up wrapping to deal with writing an unterminated string.

Was it intended for fixed length records?

tedunangst · on March 4, 2021

It is for fixed length records, which is why it also zeroes the remaining space.
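The zero padding can be observed with a small sketch like the following. The function name and the 8-byte record size are assumptions picked for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills an 8-byte record with 0xFF bytes, copies src into it with
   strncpy, and counts how many bytes of the record are zero afterwards. */
size_t zero_bytes_after_strncpy(const char *src)
{
    char record[8];
    size_t i, zeros = 0;

    assert(src != NULL);
    memset(record, 0xFF, sizeof record);
    strncpy(record, src, sizeof record);
    for (i = 0; i < sizeof record; i++)
        if (record[i] == '\0')
            zeros++;
    return zeros;
}
```

For `"ab"` this returns 6 (every byte after the two characters was zeroed), and for an 8-character source it returns 0.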

ironmagma · on March 4, 2021

Arguably naming it with “str” is itself a security vulnerability.

tedunangst · on March 4, 2021

No argument. At best it is a "string to fixed record" function, hence the name, but it is not a string function.

Someone · on March 4, 2021

Yes. strncpy was intended for copying file names into a buffer that was only zero terminated when the name was shorter than the maximum length of a file name in Unix (14 bytes. See https://stackoverflow.com/a/1454071, https://devblogs.microsoft.com/oldnewthing/20050107-00/?p=36...)

You can also use it to overwrite part of an existing string, but I think that’s a side effect of the above.

spc476 · on March 5, 2021

Yes it was. On early Unix systems, each entry in a directory was basically:

    struct dir
    {
      char name[14];
      int  inode;
    };

Adding a NUL byte might waste a full byte that could otherwise be used---remember, back when C was first developed, 10M disks were large and very expensive.
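A rough sketch of how such a field would be written and compared, using the struct above. The helpers `dir_set_name` and `dir_name_matches` and the `NAME_LEN` macro are hypothetical, not taken from any Unix source.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 14

struct dir
{
    char name[NAME_LEN];   /* zero padded; no NUL if the name is 14 bytes */
    int  inode;
};

/* Hypothetical helper: store a file name in the fixed-width field.
   Names longer than NAME_LEN are cut off. */
void dir_set_name(struct dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    strncpy(d->name, name, NAME_LEN);
}

/* Hypothetical helper: compare a C string against the field. strncmp
   looks at no more than NAME_LEN bytes, so it never reads past an
   unterminated field; only the first NAME_LEN bytes of name count. */
int dir_name_matches(const struct dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    return strncmp(d->name, name, NAME_LEN) == 0;
}
```

Printing the field needs the same care, for example `printf("%.14s", d.name)`, since a plain `%s` would run off the end of a 14-byte name.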

throwaway09223 · on March 4, 2021

In the interest of satisfying pedantry I think we can agree that strncpy() is intended to be a safer strcpy() for a subset of uses.

As you say, it does in fact obviate some errors. A value judgement as to which behaviors are more or less safe may be subjective, but the intent is not.

wahern · on March 5, 2021

strncpy was never intended to be a safer strcpy. It was created for a very specialized use case in the Unix kernel--copying a string-ish identifier to a fixed-size char field that only uses NUL termination if the identifier is shorter than the field size. Because of how the C language and the Unix kernel coevolved, it became part of the standard C library by default. I've seen it used for its original semantics in only a handful of places, but in general it's almost always misused.

To be clear, strncpy does not guarantee NUL termination. It takes a C string as the source argument, but it doesn't write out a C string; it writes out a very esoteric data structure that is unfortunately easily confused with a C string.

By contrast, strlcpy was intended to be a safer string copy routine: https://www.usenix.org/legacy/event/usenix99/full_papers/mil... In particular, it was designed to be what people seem to think strncpy is. Its return value semantics are controversial, though mostly only among the glibc crowd as every other Unix libc, including musl and Solaris, now provides it. But the semantics were designed based on experience in fixing old C code, and observations about how developers tend to write C code, not based on prescriptive theories about how people should manipulate C strings in C code.

cestith · on March 4, 2021

Still, unless you're writing something that has to be very low-level all the way through, it's better to use a string-handling library than the stdlib tools for strings.

stefan_ · on March 4, 2021

The first thing you do is not use any strings. You'll be amazed how much you can get done in languages that aren't so obsessively centered around stringified programming.

Animats · on March 4, 2021

It was a design decision of QNX that the kernel never uses strings. Everything the kernel handles is fixed length, except messages, and messages go from one user process to another. The kernel does not allocate space for them. I think they got that right.

There's a QNX user process that's always present, called "proc", which handles pathnames and the "resource managers", programs which respond to path names. But that's in user space, and has all the tools of a user-space program.

cestith · on March 4, 2021

There are absolutely things that can be written without string handling. Then again, there are things that can't. Not handling strings in the kernel probably was a good decision. That userland I'll bet has string handling though, to be useful to users.

cestith · on March 4, 2021

Most of the code I write has a spec of input and output being some form of text. Still, I tend to write that in languages that have safe string handling and drop into C only when the profiler indicates that's useful.

When handling strings in C, it's useful to use the string functions from glib or pull in one of the specifically safe string handling libraries and not use any C stdlib functions for strings at all.

There are a number of C strings libraries safer to use than the standard library, and many of them are simpler, more feature-rich, or both.

* https://github.com/intel/safestringlib (MIT licensed)

* https://github.com/rurban/safeclib (MITish)

* https://github.com/mpedrero/safeString (MIT licensed)

* https://github.com/antirez/sds (BSD 2-clause, and gives you dynamic strings)

* https://github.com/maxim2266/str (BSD 3-clause)

* https://github.com/xyproto/egcc (GPL 2.0, includes GC on strings)

* https://github.com/composer927/stringstruct (GPL 3.0)

* https://github.com/c-factory/strings (MIT licensed)

* https://github.com/cavaliercoder/c-stringbuilder (MIT licensed, does dynamic)

If one does use the C standard library directly for handling strings, the advisories from CERT, NASA, GitHub, and others should be welcome advice (CERT's advice, BTW, includes recommending a safer strings library right off).

derefr · on March 4, 2021

Yes, sure, write Unix CLI plumbing tools without strings.

pjc50 · on March 4, 2021

Until you want to communicate with the user, filesystem, or web.

Kaze404 · on March 4, 2021

Why are these functions deprecated in favor of others but not removed? I know in Javascript this can happen so as to not break older websites, but in a compiled language this shouldn't be a problem right?

badsectoracula · on March 4, 2021

Removing anything breaks existing source code that has been tested to work. After all just because something may lead to issues it doesn't mean it will always lead to issues.

Also in many systems the C library is linked dynamically and shared among all programs so even though a program is compiled it still relies on the underlying system to provide the function.

Finally i'm certain that if a C standard removes something, it'll be treated as the equivalent to that standard not existing. C programmers are already a conservative bunch without such changes.

pjc50 · on March 4, 2021

In a compiled language, when you remove a function it fails to compile. So removing them from the standard library forces code changes - they're not usually drop in replacements because the semantics were wrong in the first place.

Removing strcpy would make the Python transition look easy.

lalaithion · on March 4, 2021

The expectation of a C89 programmer is that a valid C89 program can be compiled for any machine that has a C89 compiler, and likewise for C95, C99, C11, and C17. Furthermore, it's expected that any C89 program can be compiled unchanged on any future version of C, and the standard library is part of the definition of the language, and therefore functions cannot be removed.

DaiPlusPlus · on March 4, 2021

At a certain point we have to say that it’s wrong for someone to expect C89 should still be the LCD.

And yes: it should all still compile, but none of that prohibits the compiler from issuing flashing red/yellow warning messages to your terminal for using footgun functions, preferably with uncomfortable audible notifications too.

All of this is silly though, because even in a strict C89 environment you can still have your own safe wrappers over the unsafe functions. I find that very little of modern programming has a hard dependency on ultramodern compiler features (e.g. you can theoretically build React/Redux using only ES3 (1998ish) if you like. Generics using type-erasure can be implemented with macros. Etc.).

Also, C89 conformance doesn’t mean much: you can have a conforming C89 system that doesn’t even have a heap - nor a stack for autos! (IBM Z/series uses a linked-list for call-frames, crazy stuff!)

cestith · on March 4, 2021

I think for new code in environments that support newer standards C89 shouldn't be used. For the increasingly rare places new C is being written where C89 is the latest tooling available and the code handles strings, a safer string library is nearly a must. I strongly recommend a safer string library no matter which standard, but I'm nobody.

When updating existing code C89 (maybe K&R) might be what's used so minor code changes won't undo that.

I tend to write most of my code in something higher-level than C and only resort to C or assembly in performance-critical sections as found with a profiler. Plenty of general-purpose languages have memory-safe strings built into the language, and honestly I keep hoping the Cisco/Intel safestrings library or something like SDS gets the standard library blessing one day.

DaiPlusPlus · on March 5, 2021

> I think for new code in environments that support newer standards C89 shouldn't be used

Why stop there? Don't use C. Use Rust!

SAI_Peregrinus · on March 5, 2021

Rust doesn't (yet) support all the targets C does. Mostly weird embedded stuff that needs it, and gccrs might solve that problem, but it's not always possible.

When it is possible, I certainly agree that Rust is nicer.

badsectoracula · on March 5, 2021

> And yes: it should all still compile, but none of that prohibits the compiler from issuing flashing red/yellow warning messages to your terminal for using footgun functions, preferably with uncomfortable audible notifications too.

As long as it is done like in recent versions of Visual C++ where i can disable that useless compiler output pollution with a #define, usually with a snide remark about Visual C++ right above it.

DaiPlusPlus · on March 5, 2021

> disable that useless compiler output pollution

The compiler is trying to help you write better code - suppressing warnings should not be taken lightly.

badsectoracula · on March 5, 2021

This is not the same as the regular warnings though, what Visual C++ is doing isn't helping writing better code - it is suggesting to replace standard functions which are available everywhere in code where i actually know what i'm doing with functions that are available to Visual C++ and pretty much nowhere else.

As i wrote in another comment, something that may lead to issues isn't the same as something that will always lead to issues - e.g. if i check a string's length or actually calculate and allocate the necessary memory before calling strcpy it is perfectly fine and safe to use it, but Visual C++ doesn't know about that, it complains like some stupid greenhorn that read somewhere "never use gotos" and then is surprised when he sees some Linux kernel code with gotos everywhere for cleanup, thinking that those people writing the kernel do not know what they're doing.
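The two patterns described here (allocate exactly what is needed, or check the length first) could be sketched as below. The function names `dup_string` and `copy_if_fits` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate src. strcpy is fine here because the
   buffer was sized as strlen(src) + 1. Returns NULL if malloc fails;
   the caller frees the result. */
char *dup_string(const char *src)
{
    char *copy;

    assert(src != NULL);
    copy = malloc(strlen(src) + 1);
    if (copy != NULL)
        strcpy(copy, src);
    return copy;
}

/* Hypothetical helper: copy only if src, including its terminator,
   fits in dst. Returns 0 on success, -1 (dst untouched) otherwise. */
int copy_if_fits(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) >= dstsize)
        return -1;
    strcpy(dst, src);
    return 0;
}
```

In both cases the safety comes from the check or the allocation that precedes the call, which is exactly the part a per-function compiler warning cannot see.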

maxlybbert · on March 4, 2021

The C Standard Committee doesn’t actually ship a compiler the way the people behind Java, Python, Lua, C#, Go, Rust, etc. do. The best they can do is deprecate particular functions and hope compiler writers and standard library writers follow along. But the compiler writers have vocal customers who insist the deprecations are overly cautious.

syncsynchalt · on March 4, 2021

There are actually very few _dangerous_ functions in C (gets is the only one that comes to mind). Others have massive caveats (strncpy) but still have their place. Others are just known to have certain gotchas (strcpy, strcat, sprintf).

The reality of C is that if we deprecated every objectionable function in the stdlib we wouldn't have anything left.
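As one illustration of a "gotcha" function and its bounded sibling: sprintf has no idea how large the destination is, while C99's snprintf takes the size and reports how long the full output would have been. A sketch, with `format_pair` being a made-up name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format "name=value" into dst. Returns 0 if the
   whole text fit, -1 if it was truncated or snprintf reported an error.
   dst is NUL-terminated whenever dstsize > 0 and no error occurred. */
int format_pair(char *dst, size_t dstsize, const char *name, int value)
{
    int n;

    assert(dst != NULL && name != NULL);
    n = snprintf(dst, dstsize, "%s=%d", name, value);
    return (n >= 0 && (size_t)n < dstsize) ? 0 : -1;
}
```

The return value still has to be checked by the caller, so this moves the gotcha rather than removing it.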

a1369209993 · on March 5, 2021

> There are actually very few _dangerous_ functions in C

I think you mean there are very few functions that cannot possibly be used correctly (namely gets). Most C functions are dangerous - can lead to crashes and security vulnerabilities if used incorrectly - but that's just an expected consequence of using a language with no provisions for memory-safety.

> The reality of C is that if we deprecated every objectionable function in the stdlib we wouldn't have anything left.

Somewhat ironically, malloc is actually perfectly safe[0] - using the return value has some issues, but calling it is always[0] fine.

0: Assuming the OS-level memory allocator is sanely configured WRT overcommit, anyway.

gvx · on March 4, 2021

It's not great if you're working on a new release and you realize you also need to change something unrelated because the language changed under you, especially if it's just a bugfix but a high-priority one, or consider the head-aches caused by source-only distributions suddenly breaking for all your new users (or existing users switching to a new computer or spinning up a fresh VM).

sudomakeup · on March 4, 2021

Why wouldn't it be an issue with a compiled language?

It's nearly the exact same reasoning as "we're not going to break older websites"

Kaze404 · on March 4, 2021

In Javascript there's an expectation that Javascript written 15 years ago for Netscape will also work on Firefox 89. Is that also the case with C, wrt compiler versions? I've always assumed it wasn't.

int_19h · on March 5, 2021

It's very much the case, so long as you stick to standard C (the full limitations of which very few people are actually aware of).

Runtime backwards compatibility is similarly extensive on platforms that care about it. You can still take a DOS app written in ANSI C89 the year that standard was released, and run it on (32-bit) Windows 10, and it'll work exactly the same. In fact, you can do this with apps all the way back to DOS 1.0.

Kaze404 · on March 5, 2021

Wow, that's super interesting. Thank you :)

pix64 · on March 4, 2021

strlcpy() isn't standard. You have to provide your own implementation if you want your code to be portable.

cestith · on March 4, 2021

This is something git does. That's why they prefer it - it's available to git everywhere.

schlupa · on March 5, 2021

It's 4 lines of code to implement it. So even if it is not available on a platform (glibc mmmh because of Drepper's stubbornness), it's no problem.

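    #define _GNU_SOURCE   /* mempcpy() is a GNU extension */
    #include <string.h>

    #define min(a, b) ((a) < (b) ? (a) : (b))

    size_t strlcpy(char *dst, const char *src, size_t dstsize)
    {
      size_t len = strlen(src);
      /* copy at most dstsize-1 bytes, then terminate */
      if(dstsize)
        *((char*)mempcpy(dst, src, min(len, dstsize-1))) = 0;
      return len;
    }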

schlupa · on March 5, 2021

and

     *((char*)mempcpy(dst, src, min(len, dstsize-1))) = 0;

can be replaced by

     ((char*)memcpy(dst, src, min(len, dstsize-1)))[min(len, dstsize-1)] = 0;

if you don't have mempcpy
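Putting the memcpy variant together into something that compiles with only standard headers might look like this. It is a sketch, not the BSD implementation; the name `my_strlcpy` is hypothetical and chosen so it does not clash with a libc that already declares strlcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for strlcpy. Copies at most
   dstsize-1 bytes of src into dst and always terminates dst when
   dstsize is nonzero. Returns strlen(src), so a return value
   >= dstsize tells the caller the copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t len, n;

    assert(src != NULL);
    len = strlen(src);
    if (dstsize != 0) {
        n = len < dstsize - 1 ? len : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A caller can detect truncation with `if (my_strlcpy(buf, s, sizeof buf) >= sizeof buf) { /* too long */ }`.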

SideQuark · on March 4, 2021

These still lead to lots of bugs via off by one errors on lengths or other buffer misuse.