I guess it's not handling delete, which probably accounts for a lot of the difference.
EDIT: I see that it's implementing a left-leaning red/black tree. I see some caution about that approach here? Would love to get the author's opinion about this: http://read.seas.harvard.edu/~kohler/notes/llrb.html
You'll find delete on top of the todo list, it's more complex but not horribly so.
I skimmed the link, but from what I can see there are no issues there that outweigh the significantly simpler code for me. This may very well change once I get more experience with them, time will tell.
Anyone who does too much C would know that manually implementing associative array (dictionary) is not fun. I started to use two implementations for my work, both worked pretty well for the things they are good at (and all of them are as "generic" as you can get from C):
I prefer ordered sets with value semantics (which allows using as maps without overhead) to hash tables since I find the performance easier to reason about and often want ordering anyway. The easiest one to implement would be a binary searched vector, but comes with the disadvantage of unstable pointers because of reallocations. rbtrees are definitely trickier, though this one is left leaning which reduces the complexity somewhat (see PDF linked from README).
The rbtree implementation from jemalloc allows you to maintain the value semantics (it is an intrusive design as well). I believe it is also a left-leaning 2-3 red-black tree to be exact. The API is a bit harder to understand but once you get use to it, can be pretty easy to use. It also implemented one trick that embeds the color bit into the right child node pointer, otherwise you are paying a full 8-byte size penalty.
Yeah, that's usually where I draw the line, embedding data in pointers :) Bitfields and packed structs is another angle that I usually don't bother with. Too many details to keep in mind for my feeble brain.
If I were starting a new project in C, I would definitely rely on a solid foundation library like APR[0] or CoreFoundation[1] without reimplementing a bunch of fundamental data structure algorithms by myself. Glib is another alternative, but too bloated in my view.
When I use C, it's typically for performance reasons. I need to do funny things with pointers and bit-wise arithmetic that I can't do in higher-level languages because I'm operating in cases where a few more instructions per loop iteration matter. I usually end up re-implementing very fast, cut-down versions of data structures that work only on a tightly-bounded input set for maximum performance (e.g. very fast hash table for IP addresses recently). This approach doesn't work for the cases where C really shines.
I wouldn’t go as far as to say it’s “developed in the open”. It’s significantly more opaque than Swift or WebKit, for example; I’d describe it as “accepting of patches”.
These structures are intended to be used like the Linux Kernel's data structures, via `container_of` (Though in the library the macro is called `c7_baseof`. But it does the exact same thing as `container_of`).
If you don't know about `container_of`, the idea is that it allows you to take a pointer to an inner field, and transform it into a pointer to an outer 'container'. For the linked-list, this means that if you want to make a list of `struct car` objects, you simply embed a `struct c7_list` into `struct car`, pass the address of that internal `struct c7_list` object to the list manipulation functions provided here, and then use `container_of` (Or `c7_baseof`) to transform the pointer to the `struct c7_list` into a pointer to the containing `struct car`.
In general, I think `container_of` is one of the best things for writing C code, and it provides a surprising amount of flexibility and nice patterns you can use. It does have the obvious issue of type checking (You could have a `struct wheel` that also has a `struct c7_list` on it, and accidentally add ti to the same `struct c7_list` that has `struct car`), and if your `struct car` needs to be on more than one list at a time it can get confusing knowing which `struct c7_list` entries belong to which list, but in general I'd call it a big net positive.
It's also possibly worth noting, you can use this to implement a non-intrusive list if you want, just simply make an object that wraps a `struct c7_list` and a void pointer (Or a typed pointer, if you want), and then write some extra logic around it for allocating nodes and such.
What a neat trick for creating a fairly generic intrusive data structure. I’d love to have (or put together) a little compendium of clever, non-obvious programming tricks like this.
Back when I independently invented this due to being self-taught, I didn't know about offsetof(), and I just put the collection at the start and cast the pointer. Learning C is surprisingly obscure even today.
Yeah, even just 'upcasting' is surprisingly effective as well, it's really just inheritance. The lack of being able to embed more than one 'thing' is painful though. And if you're not careful you can run into ugly strict aliasing issues like `struct sockaddr` has (Though a quick `-fno-strict-aliasing` can make all your problems go away...)
And I would agree, I find a surprising number of people think C is 'easy' because it doesn't contain that many built-in constructs, and then get completely stuck when they try to use it. To get really good at C, you need a very solid understanding of common and effective patterns you can use. You can still write C without such things, but it quickly becomes a mess of random patterns and inconsistencies. And unfortunately, I can't really think of one source you can really point to for learning them (though admittedly I haven't really looked into it tons).
Casting around between the type of the first struct member and the type of the overall struct is okay under strict aliasing, but unfortunately sockaddr violates it because sockaddr isn't a member of sockaddr_in. Not that I knew this at the time, it was just a mysterious rule.
Yeah, I used to spend a lot of time reinventing OOP in C back in the days. Part of the problem is I learned C++ long before I took a serious look at C.
These days I definitely prefer embedding/baseof since it's more explicit.
When I need polymorphism, stuffing some function pointers in a struct usually works well enough.
Using `container_of` lets you avoid having to use void pointers. And the macros like foreach are surprisingly readable once you give them a shot, and they're simple enough they don't really have any big gotchas in how they work.
Not every `container_of` data structure is perfect, but in general I'd say they're just as nice to use as any other language's data structures (Though `container_of` is flexible in different ways). The only big downside is that it not part of the C standard, leading to more than a few implementations, some more featureful than others. This one is probably the "least featureful" version I've seen, which may be intentional. The Linux Kernel's implementation has a lot more utility things like looping in different ways and different types of list manipulations. Some are highly useful, others not at much.
Intrusive lists are very different beasts than lists by value.
There are less allocations involved (perf and points of failure), values can be inserted on different lists without new allocations, deletion is O(1), elements can be heterogeneous (different sizes and types), etc
etc
I seldomly use linked lists, but most of the time i prefer them to be intrusive. There of course are intrusive list implementations on C++.
They were talking about taking a value off of one list and adding it to a different one, which doesn't require an allocation, but the entity is never on more than one list at a time in that scenario.
That said, you can add an entity onto multiple lists if it contains multiple nodes embedded inside. You have to keep track of which nodes are attached to which lists though (Which generally just means being consistent on which you use where). If you give them decent names, then it's not usually a problem, but it can sometimes get a little confusing if you're not careful.
It's all about control and flexibility. If you don't need it you shouldn't pay the price. Having spent quite some time trying to bend the STL to my exact requirements, I don't mind so much.
Sorry, didn't get around to that yet as it's trivial and used enough internally that I figured it has enough test coverage for now. The c7_list struct is embedded in the items, rbpool [0] uses it to keep track of nodes [1] and slabs [2] for example. Note that the same structure is used as root.
Because I always end up switching on the return value, and switching on enums is a lot nicer than ints. Especially when there is no agreement on what values are returned, LT is mostly -1 but is often specified as simply less than zero.
When you have a finite set of cases, enum is just the right solution period. I have no idea why people still insist on using ints.
From a quick glance; it's hashed (as in not ordered), not pool allocated and intrusive as opposed to offering value semantics. Couldn't get more different.
There's nothing new under the sun, I never claimed novelty. This is simply how I would do it, take or leave.
The descriptions of the containers contain more of the kind of information you're looking for, it's a bit much to expect the caption to explain it all.
It's as real world as it gets for now, the test suite should give you an idea.
I will typically add comments where I feel the code doesn't do a good enough job of explaining itself.
I've only glanced at these; can someone more familiar explain how these are different from the BSD "queue" macros (and their offspring, like the tree headers)?
But actually, to me that means a thin, light API with a couple of functions and some macros. Does what it needs to do, no more. Error handling is basic if even provided, usually via return of numerical error codes.
That's correct, it does mostly what you tell it to and stays out of the way. If you want error checking beyond the test suite you have to do it yourself. The issues people have with C comes mostly from treating it like any other language. C is all about control and flexibility.
Further, special care is taken to reduce the number of allocations, often using pools/slab allocators. And value semantics, or rather byte array semantics, which means items are treated as blocks of N bytes rather than pointers/references.
Definitely cool, although the spirit of C is to be overly paranoid about portability and not link to a library, God forbid. (Single pairs of h/c files like this are usually okay with people)
I sort of agree with the first part, though some people actually enjoy having this level of control and flexibility even when they don't absolutely need it.
I find chatty generic code where everything is nailed to the floor ugly, Rust, C++, Swift, Java etc. So I guess it depends on who you ask and their experience.
There is nothing wrong with macros, though I definitely prefer the Common Lisp kind.
If you pass -lcodr7 to the Unix C linker, it would typically go looking for a file called libcodr7.so. You could perhaps view this as mainly historic behaviour.
Some libraries like Cairo seem to be generally known as "Cairo", not "Libcairo". But there is also a trend for libraries to have very straight-to-the-point names. If the PNG parsing library is called libpng.so and the XML parsing library is called libxml.so, it would be confusing to call these libraries just "Png" and "Xml".
As an other example, the two C programs "systemd" and "libsystemd" live in the same repo, but are really two different beasts: libsystemd might be useful to you even on a system which doesn't use systemd.
The code shown here seems to indicate that insert has at least four cases, and delete has six: https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
But the code I see here is very small: https://github.com/codr7/libcodr7/blob/master/source/codr7/r...
I guess it's not handling delete, which probably accounts for a lot of the difference.
EDIT: I see that it's implementing a left-leaning red/black tree. I see some caution about that approach here? Would love to get the author's opinion about this: http://read.seas.harvard.edu/~kohler/notes/llrb.html