I disagree with quite a few of them. The title should be "Principles for C programming ON UNIX-BASED SYSTEMS". C programming is quite a bit wider than this, and some of the typical points here are a no-no if you want to write /portable/ C.
For example, "Do not use fixed size buffers". It's all very fine, but 1) it can be exploited as well if someone managed to fudge the size you are going to allocate, and 2) on some platforms, you don't have/want malloc(). So it's a lot better to have a fixed buffer and check the sizes carefully before copying into it.
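A minimal sketch of the "fixed buffer, check sizes before copying" approach. The struct, the NAME_CAP capacity and record_set_name() are hypothetical names for illustration; the point is only that the length check happens before the copy and leaves room for the terminator, with no malloc() involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed capacity; pick it from what the protocol/platform allows. */
#define NAME_CAP 32

struct record {
    char name[NAME_CAP];   /* fixed-size buffer, no malloc() needed */
    size_t name_len;
};

/* Copy len bytes from src into rec->name and NUL-terminate it.
 * Returns 0 on success, -1 if the input does not fit (nothing is copied). */
static int record_set_name(struct record *rec, const char *src, size_t len)
{
    assert(rec != NULL && src != NULL);
    /* Check the size *before* copying, leaving room for the terminator. */
    if (len >= sizeof rec->name)
        return -1;
    memcpy(rec->name, src, len);
    rec->name[len] = '\0';
    rec->name_len = len;
    return 0;
}
```

Rejecting oversized input outright (rather than truncating) is a design choice; truncation can be just as surprising to callers as an overflow.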
Another one I dislike (but it's personal preference) is the 'use struct for pointers to structs': well, nope, I don't like that, it's unnecessarily heavy. I typedef my structs all the time, and call them something_t, and * something_p. It's easier to rework, rename and search for, and it's quicker to type, which makes the source code lighter to read. I know it's not popular, and for example the kernel guidelines agree with you, but I don't.
As for "no circumstances should you ever use gcc extensions or glibc extensions", well, sorry, I also disagree here. I love the 'case X ... Y:' syntax, for example, and it's been around for about a million years. Just because the C standards prefer adding idiotic syntax instead of useful features like this doesn't mean I'm going to stick with them and limp along when there is a perfectly nice, clear and very readable alternative.
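For reference, GCC's case-range extension is written with spaces around the dots, `case low ... high:`. The sketch below (char_class() is a hypothetical example) guards the extension with `__GNUC__`, which Clang also defines, and falls back to a plain if-chain elsewhere. It assumes ASCII, where 'a'..'z' and 'A'..'Z' are contiguous.

```c
#include <assert.h>

/* Returns 1 for a digit, 2 for an ASCII letter, 0 otherwise. */
static int char_class(int c)
{
#if defined(__GNUC__)
    /* GCC/Clang case ranges; the spaces around "..." matter. */
    switch (c) {
    case '0' ... '9': return 1;
    case 'a' ... 'z':
    case 'A' ... 'Z': return 2;
    default:          return 0;
    }
#else
    /* Portable fallback with the same behavior. */
    if (c >= '0' && c <= '9')
        return 1;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return 2;
    return 0;
#endif
}
```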
Another one I love but can't use is sub-functions (nested functions). Now that would also have been a lovely extension if the runtime had been perfected a bit, but it was never 'finished'. Talk about easier code to read when your qsort() callback is listed /just above/ the call to qsort().
Another extension is of course the __builtins that you actually do need on modern systems, like memory barriers, compare-and-swap, ffs, popcount and so on. Of course I can write an explicit function to do it (in the case of the last two), but that's the sort of thing that ought to be in the C library anyway. So I'll use these, thanks.
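For the barrier/compare-and-swap part, C11 did standardize a portable subset in <stdatomic.h> (atomic_thread_fence() covers explicit barriers). A minimal sketch, assuming a C11 compiler that doesn't define __STDC_NO_ATOMICS__; try_claim() is a hypothetical helper where the first caller to flip the flag from 0 to 1 wins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared flag: 0 = unclaimed, 1 = claimed. */
static atomic_int claimed = 0;

/* Returns true for exactly one caller, even with concurrent callers.
 * atomic_compare_exchange_strong() is C11's standard compare-and-swap and,
 * like the other non-_explicit atomic functions, uses memory_order_seq_cst. */
static bool try_claim(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&claimed, &expected, 1);
}
```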
As for the rest of the article, about the process, your code reviewers and so on: in many places and on many projects (open source ones are a case in point) you don't have the freedom/time to do that. The rule is 'do the best you can', and that ought to do it in many cases.
I used to typedef my structs but I stopped doing that a while ago. Even if something is an opaque type it's still useful to know that it's a struct and not some random typedef for an integer or similar. You might not be able to copy it, for instance, and even if you can, you might not want to if it turns out to be a few kilobytes in size and harms performance.
I don't agree with everything in the kernel coding style, but it's mostly sensible, and I think their approach to typedefs is perfectly reasonable.
And why do you use separate typedefs for pointer types? That's borderline obfuscation IMO; if I'm dealing with a pointer I want to know it. If it's in order to save a single keystroke it really isn't worth it IMO, and I don't see how it helps reworking anything.
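To make the two positions concrete, here is a hypothetical `struct point` declared kernel-style, with a plain typedef, and with a pointer typedef in the something_p style. The last function shows one real hazard of hiding the `*`: `const point_p` makes the pointer const, not the struct it points to.

```c
#include <assert.h>

/* Kernel style: always spelled "struct point" where it is used. */
struct point {
    int x, y;
};

/* Typedef style: the same type under a shorter name. Struct tags and
 * typedef names live in different namespaces in C, so both can be "point". */
typedef struct point point;

/* Pointer typedef in the something_p style: the '*' is now hidden. */
typedef struct point *point_p;

static int sum_tagged(const struct point *p) { return p->x + p->y; }
static int sum_typedef(const point *p)       { return p->x + p->y; }

/* Pitfall: "const point_p" means "struct point *const", a const pointer to
 * a mutable struct, so this compiles and modifies the caller's struct. */
static void bump_x(const point_p p) { p->x++; }
```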
I agree with the fixed size buffer thing though. There are plenty of situations where fixed size buffers are completely fine, that's way too broad as a general "principle". I guess the idea is "make sure you don't reserve less memory than you need" but that's rather obvious, isn't it?
> 'I typedef my structs all the time, and call them something_t, and * something_p'
I wish people would stop perpetuating this particular naming convention. It's in violation of POSIX, which specifically reserves the entire *_t 'namespace'. Obviously it's fine if this is done on a system or environment for which this is irrelevant, but it's best avoided otherwise.
The reality is that if you are creating a library you probably should prefix your types and functions anyway, and rely on the prefix to minimize collision probability. So it doesn't really matter if you put _t at the end of your type aliases. You will probably not get the collisions anyway, unless POSIX is suddenly going to introduce mylib_array_t or something.
No, but your compiler MIGHT decide in a future release that it's a whole lot faster to ignore the header files for standard types and definitions and just copy a pre-parsed version of the struct into the symbol table when the header is included. It might look at the _t and decide nope, I don't have a definition for this so it's an error, despite your own definitions.
This probably won't happen. But if it does you don't have any grounds for complaint really.
A compiler that did that would also need to drop C standard compatibility (section 7.1.3 of C99). That is probably a good reason to complain and to just stop using that version of this purely theoretical compiler.
> For example, "Do not use fixed size buffers". It's all very fine, but 1) it can be exploited as well if someone managed to fudge the size you are going to allocate
Fuzzing really underscored just how terrible "dynamically sized buffers" are to me as well. Even if your logic is perfectly correct (e.g. no possibility of buffer overflows), something as simple as deserializing a length-prefixed array needs a quota or cap. It's not enough to handle malloc failing: Someone will successfully allocate 1.9GB via your 32-bit deserialization code, spreading the actual allocation failures across the rest of your codebase - including all 3rd party and system libraries - most of which almost certainly have at least one oversight in OOM condition error handling, invoking all kinds of potentially exploitable undefined behavior.
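A sketch of what such a cap might look like when decoding a little-endian u32 count followed by that many u32 values. The wire format, MAX_ITEMS and decode_u32_array() are assumptions for illustration; the cap is checked before any allocation, and the count is checked against the bytes actually present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical quota: the most elements this format should ever carry.
 * Pick it from the protocol, not from what malloc() will let you get. */
#define MAX_ITEMS 4096u

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode [u32 count][count x u32], all little-endian.
 * Returns a malloc()'d array the caller must free() and sets *out_count,
 * or returns NULL on truncated input, an over-quota count, or OOM. */
static uint32_t *decode_u32_array(const uint8_t *buf, size_t len,
                                  size_t *out_count)
{
    assert(buf != NULL && out_count != NULL);
    if (len < 4)
        return NULL;
    uint32_t n = read_le32(buf);
    if (n > MAX_ITEMS)          /* enforce the cap before allocating */
        return NULL;
    if ((len - 4) / 4 < n)      /* the payload must really be there */
        return NULL;
    uint32_t *items = malloc(n > 0 ? n * sizeof *items : 1);
    if (items == NULL)
        return NULL;
    for (uint32_t i = 0; i < n; i++)
        items[i] = read_le32(buf + 4 + (size_t)i * 4);
    *out_count = n;
    return items;
}
```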
I prefer attempting to find an O(1)-space algorithm over dynamic allocation, which I suppose means using fixed-size buffers anyway.
In my experience it is surprising how many programmers will, regardless of language, tend to settle for O(n)-space or higher algorithms when just a little more thought would produce a simpler O(1). Line numbering is a common example of this.
Take, for example, displaying the context source code for an error when you know the file and lineNo of the error.
A naive approach would be to read the entire file into an array and then to output the lines [lineNo-context..lineNo+context], possibly with line numbers prefixed. This is O(N) memory with regards to the size of the file being processed. An 8GB source file will crash your program when built for 32-bit.
Another approach is to read and discard until you've discarded lineNo-context-1 '\n' characters, then copy chunks from your source directly to the output until you've copied another 2*context+1 '\n' characters. This is O(1) memory (strictly speaking, you could do it byte-by-byte with an integer counter or two; practically you might have a fixed size buffer to read/write faster) and would allow you to handle even 1TB files sanely.
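A minimal sketch of the O(1)-memory approach, assuming 1-based line numbers and a context window clamped at line 1. It goes byte-by-byte with fgetc() and only counts '\n' characters, leaving the chunking to stdio's own fixed-size buffer; print_context() and its "N: " prefix format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Copy lines [line_no - context, line_no + context] from `in` to `out`,
 * prefixing each with its number. Line numbers are 1-based and the start
 * of the window is clamped to line 1. Memory use is O(1): we only count
 * '\n' characters. Returns the number of lines written. */
static long print_context(FILE *in, FILE *out, long line_no, long context)
{
    assert(in != NULL && out != NULL && line_no >= 1 && context >= 0);
    long first = line_no - context < 1 ? 1 : line_no - context;
    long last = line_no + context;
    long cur = 1;           /* line number of the next character read */
    long written = 0;
    int at_line_start = 1;
    int c;

    while (cur <= last && (c = fgetc(in)) != EOF) {
        if (cur >= first) {
            if (at_line_start) {
                fprintf(out, "%ld: ", cur);
                written++;
            }
            fputc(c, out);
        }
        at_line_start = (c == '\n');
        if (at_line_start)
            cur++;
    }
    return written;
}
```

For an error on line 3 with one line of context, this writes lines 2 to 4 and stops reading as soon as line 5 begins, so the rest of the file is never touched.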
To be fair, I'm often guilty of the naive approach myself :)
>For example, "Do not use fixed size buffers". It's all very fine, but 1) it can be exploited as well if someone managed to fudge the size you are going to allocate, and 2) on some platforms, you don't have/want malloc(). So it's a lot better to have a fixed buffer and check the sizes carefully before copying into it.
This is reasonable. I mentioned measuring fixed size buffers if you have to use them, but edited it out. I think all programming guidelines should be taken with a grain of salt and adjusted as necessary when sanity demands deviations from them.
>Another one I dislike (but it's personal preference) is the 'use struct for pointers to structs': well, nope, I don't like that, it's unnecessarily heavy. I typedef my structs all the time, and call them something_t, and * something_p. It's easier to rework, rename and search for, and it's quicker to type, which makes the source code lighter to read. I know it's not popular, and for example the kernel guidelines agree with you, but I don't.
It's easier to rework, rename, and search for? How so? The problem with it is that you should be able to easily differentiate structs and scalars, because you should treat them differently. Same for pointers. You should generally be passing structs by reference, not by value, for example. I don't appreciate hiding information about the nature of your types for the sake of ergonomics. The readability gain trumps the extra quarter-second of typing each time you use the type.
>As for "no circumstances should you ever use gcc extensions or glibc extensions", well, sorry, I also disagree here. I love the 'case X ... Y:' syntax, for example, and it's been around for about a million years. Just because the C standards prefer adding idiotic syntax instead of useful features like this doesn't mean I'm going to stick with them and limp along when there is a perfectly nice, clear and very readable alternative.
gcc is not the only compiler in the world. You can't be crying out about the Unix-specific nature of this article and then favor a dependence on gcc.
>Another one I love but can't use is sub-functions (nested functions). Now that would also have been a lovely extension if the runtime had been perfected a bit, but it was never 'finished'. Talk about easier code to read when your qsort() callback is listed /just above/ the call to qsort().
Ugh. Just make a static function.
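For comparison, the static-function version of a qsort() callback: the comparator sits right above its only caller, which recovers most of the locality a nested function would give. cmp_int_asc() and sort_ints() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparator with internal linkage, placed right above its only caller. */
static int cmp_int_asc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow "x - y" can hit */
}

static void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n > 1)
        qsort(v, n, sizeof *v, cmp_int_asc);
}
```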
>Another extension is of course the __builtins that you actually do need on modern systems, like memory barriers, compare-and-swap, ffs, popcount and so on. Of course I can write an explicit function to do it (in the case of the last two), but that's the sort of thing that ought to be in the C library anyway. So I'll use these, thanks.
Why do you need these? If you must, see my comments on abstracting non-standard/non-portable/etc code.
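One way to do that abstraction, as a sketch: wrap the builtins in small functions that use `__builtin_popcount`/`__builtin_ctz` under GCC/Clang and plain C elsewhere, so the non-portable code lives in one place. The names my_popcount() and my_ffs() are hypothetical; my_ffs() follows the 1-based convention of POSIX ffs(), returning 0 for 0.

```c
#include <assert.h>

/* Number of set bits in x. */
static inline int my_popcount(unsigned int x)
{
#if defined(__GNUC__)
    return __builtin_popcount(x);
#else
    int n = 0;
    while (x != 0) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
#endif
}

/* 1-based index of the lowest set bit, or 0 if x == 0 (like POSIX ffs()). */
static inline int my_ffs(unsigned int x)
{
    if (x == 0)
        return 0;     /* __builtin_ctz(0) is undefined, so handle it here */
#if defined(__GNUC__)
    return __builtin_ctz(x) + 1;
#else
    int i = 1;
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
#endif
}
```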
I tend to find fixed size buffers easier to conceptualize than dynamically allocated buffers. I find that even with modern languages like C#/Java, I still see developers use fixed size buffers, even within enterprise apps.
So I think there is something to be said about the whole movement of books/expert advice advocating against fixed size buffers/enums in favor of more dynamic memory models, when people in the trade are still using enums and static sizes. Anecdotally, I've seen more use of them now than at any time in the past.
I've worked on a number of different projects, primarily from a maintenance perspective: C and C++, mostly around VBS2/VBS3 (Operation Flashpoint) and VSS simulation systems. I moved on from the C/C++ simulation market when the money wasn't really in it unless you were the sales people or upper management. It kind of drives it home when you bike into work and all of management are driving BMWs etc. This was a startup where I took a major pay cut.
After that phase I moved into web development: Tomcat/Java, servlet containers, C# and F#. In total I've spent about 12 to 15 years in software development in this field.
When I first started programming I learnt from the Quake/Doom engines by John Carmack. I can attest that his software style and technical finesse are something I admire even today. There was something to be said for looking at well-written C code that was really straightforward and easy to follow. At this time OOP/Java was starting to become mainstream in the marketplace, and most of the hype and push came from marketing and also from compiler writers/contractors wanting to push their sales pitch at universities/schools and management.
Granted, the whole OOP/Java thing was one giant experiment that paid off for other people but never really paid off for me. In my younger years I remember not getting OOP. When I say NOT getting it, I mean I never really felt there was a clear explanation of what OOP was, and my gut feeling was that the theory didn't match the practice for me. So, seeing that I had time to waste, I started researching and reading as many OOP books as I could find. In total I've read about 60 different OOP books and research papers. After all this I still conceptually don't GET OOP. The whole thing is an exercise in cognitive dissonance for me.
You probably GOT OOP but I never did. So more power to you.
My only anecdotal (single data point) experience has been maintaining a large range of different types of software over the years: from procedural code, to functional programming, all the way to Java and Spring (Inversion of Control).
A lot of people smarter than me in the '90s/2000s created these massive taxonomy systems: multiple inheritance anywhere from 10 to 15 layers deep, excessive use of design patterns, and overuse of meta-programming where meta-programming (really) didn't need to be used all the time (looking at you, C++). I remember nights tearing my hair out because these things TOOK up the fucking wall. Literally, the taxonomies were so large, with so many inter-layered derived classes calling derived classes that in turn called the base classes, which would in turn call the derived classes, which would then in turn call some event handler. Yes, you get the picture.
It took about 3-4 years of 10-hour nights to finally get this massive inheritance system (Operation Flashpoint, btw) to the point where you could be productive. Then two months later I quit.
I then moved over to Java and web-app development, where I doubled my wage overnight. The Java projects' designs had also drunk the Kool-Aid during the 2000s, so yet again I was faced with a 7 to 8 layer (single) inheritance system, with each successive developer building on top of it. As you can imagine, the cost/turnaround time and budget for such a system caused them to drop it. They settled on Spring and dependency injection/inversion of control.
Granted, most of the code I see teams writing now is a throwback to procedural code. You have your controller that processes requests and in turn passes them off to services, which in turn access repositories. This is inclusive of your typical MVC model, but most of the services and code I see now is just a singleton instance of a class that acts as a namespace for functions.
Prior to this, in Java land, all enums and statically allocated arrays were bad and unclean. DIK_CODEs or ID identifiers were discarded into the rubbish, replaced by developers using the inheritance type system instead of such crude (primitive) approaches to programming! Look at all those enums, let's replace them using the type system of the language!
So, for example, instead of writing:
#define CUSTOMER_PAGE 1
#define CUSTOMER_CHECK_OUT 2
#define ORDER_FORM 3
You had developers using the type system as its replacement:
class Validator implements IValidator
class Orders extends Validator
class CustomerPage extends Orders
class CustomerCheckOut extends Orders
class OrderForm extends CustomerPage
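For contrast, a sketch of the flat "on this ID, do this" style in C, using the same three IDs as an enum and a single switch. handle_page() and the strings it returns are hypothetical stand-ins for whatever each page actually does.

```c
#include <assert.h>
#include <stddef.h>

/* The same IDs as the #defines above, as an enum. */
enum page_id {
    CUSTOMER_PAGE = 1,
    CUSTOMER_CHECK_OUT = 2,
    ORDER_FORM = 3
};

/* One flat switch, no class per page. The strings are placeholders. */
static const char *handle_page(enum page_id id)
{
    switch (id) {
    case CUSTOMER_PAGE:      return "show customer page";
    case CUSTOMER_CHECK_OUT: return "run checkout";
    case ORDER_FORM:         return "validate order form";
    }
    return NULL;   /* unknown ID */
}
```

With -Wall, GCC and Clang warn when an enumerator is missing from such a switch, which is a cheap safety net when a new ID is added.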
Somebody is going to come here and proclaim `they were doing it wrong`. I just shrug my shoulders and say it doesn't really matter; I'm still stuck looking at this mess.
Today I see developers and new projects coming online that have moved away from such inheritance models toward a more procedural approach and flat design. I do welcome this move, and cheer for the faster turnaround time.
When faced with new projects that use massive inheritance structures, it takes me on average 4-5 days to get my head around the system (if at all). The same or more complicated systems using procedural code with plain old statically defined arrays, enums and defines take about 1 hour to find, isolate and fix the problem.
Other teams may have had success with large-scale OOP code bases. I've mostly found them to be more error prone, more riddled with edge cases and bugs, and more of a nightmare to extend (counter to the whole supposed reason for OOP) than your typical "on this ID, do this" approach. It says something that I've recently started doing development work on the FFmpeg and x264 code bases, and it's a pleasure to be up to speed and doing productive work within 2 to 3 hours.
That was a long-winded rant. I still use OOP, but it's more or less glorified namespaces with functions. Today I'd rather work on procedural code than OOP code. It may have bugs, it may have its own little quirks, but it's like an old rusted vehicle that still keeps going.