Hacker News new | past | comments | ask | show | jobs | submit login

A lot of average C programmers write high-performance linked-lists in C, while they could have used high-performance hashtables in Python/Ruby/Java with the same programming effort.

(and no programs are high-performance if they have a pointer bug that makes the program crash).

"The difference between theory and practice is small in theory and large in practice..."




Who writes high-performance code using linked lists? Linked lists are awful for performance. Every --- every --- large C project I've been involved in has a good generic hash table and a vector-style resizeable array.


If that really is true, it sounds like you don't have much experience in C, to be frank. C apps abound in fixed-size statically allocated arrays and linked lists with next pointers (often multiple pointers, if the objects are part of different lists) embedded in the objects themselves. The hash tables often aren't broken out until the poor performance shows up as a bottleneck.


My resume isn't too hard to find. There's even some C code out there with my name on it if you look hard enough.

Performant C code doesn't use linked lists. Linked lists shred caches and allocators. Your go-to data structure for storing resizeable arrays is... wait for it... the resizeable array. I call mine "vector.h".

(That performant code often doesn't want a hash table is also why I backported STLport's red-black tree to C specialized on void* in my second-to-last major C project, but I'm just saying that because I'm proud of that hack.)


The kinds of lists that barrkel are talking about are what I call hooked-in lists as opposed to the container style lists seen in the STL. That is, objects that already existed are linked together through fields inside the object themselves. Since you're probably already operating on the object, it's not quite as bad for cache performance as the container style lists. Further, since the objects already exist, the memory allocation overhead is minimal. All objects of that type need an extra pointer, but there's no allocation required to add something to the list.

This is how the Linux kernel maintains just about everything. It's also how I've implemented lists inside of an allocator: http://github.com/scotts/streamflow/blob/master/streamflow.h...


I get that void* lists are even worse, but randomly malloc'ing a bunch of struct-foo's and linking them with pointers is worse than keeping a pool of memory chunks of struct-foo size and maintaining a length counter, since hopping from foo to foo isn't likely to hop from page to page.


With hooked-in lists, you rarely need to navigate through the whole list. You usually just need to get the next one. Having implemented hooked-in lists, the backing for them was a fixed size of memory chunks, but that was one level of abstraction lower. The Linux kernel does something similar.

(Felt I should clarify that it's nothing about hooked-in lists that prevent you from navigating the whole list. It's just the algorithms you end up using to solve the problems that pop-up at that level of systems programming.)


Err... why does the existence of shitty "C apps abound" have anything to do with how performant C programs are written?


Even I agree with this. You can't just do whatever you want in C and still get good performance, you have to have a clue, just like with higher-level languages.

I think the original article should add, "performance tends to depend on the programmer, not the language".


Yes and these containers are built into C++ so no need to role your own.


The side-discussion we're having now is also why <list> and <slist> are a pain to use, and why Stroustrop's examples tend to use <vector>. Storing things in contiguous memory and incurring occasional copies is usually a better plan than forcing every read to follow a chain of pointers.


these kinds of varying opinions that seem like wide schisms, or maybe just ultimately are different problem areas, often come up when i'm reading about C, and that makes me think "i'll learn it more when people figure out how to use it" ... is that fair?


> "i'll learn it more when people figure out how to use it" ... is that fair?

No, because you could be learning that today and get a head start on your own future that way. It's not that hard to understand and once you've seen how the clock ticks from the inside it becomes a lot easier to build other clocks.

Lists vs resize-able arrays is a very old debate, both have their places, it all depends on the problem you're trying to solve which one is the most appropriate. If you code your stuff in a not too stupid way (say using a macro for your accesses) it's trivial to replace one approach with the other in case you're not sure which one will perform better.

Lists work well when they're short and you use a an array of pointers to the beginnings of your lists and maintain a 'tail' pointer for fast insertion in case inserts are frequent. If you use resize-able arrays you probably want to allocate a bunch of entries in one go, roughly doubling the size of your array every time you need to increase the size. That seems to waste a lot of memory, but since it is heap memory (allocated using brk) it (usually) won't be mapped to real memory until you use it, so the overhead is smaller than it seems.

I hope that's correct and clear :)


Only if you're comfortable never learning it.


If you need to scan/remove if appropriate? They're optimal for that.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: