Liblfds, a portable, license-free, lock-free data structure library written in C

preetamjinka · on May 31, 2016

Related to this is Concurrency Kit [0], which is BSD-licensed, written in C, and provides concurrency primitives and data structures.

[0] http://concurrencykit.org/

liblfds · on May 31, 2016

I know the author of CK, and he's very, very good. Better than me, I am sure of that!

I believe CK offers a delete-supporting hash, too, so it has a full O(1), which liblfds does not yet have.

pekk · on May 31, 2016

While your humility is refreshing, why wouldn't people just use CK instead of liblfds?

liblfds · on May 31, 2016

The libraries are not identical; I'm not particularly up to speed with CK, I'm not a user, but there's so much that can be done with any data structure library that there will be things liblfds has done which CK has not, and naturally vis-versa, such that for some one of the libraries will be a better (or possibly the only) choice.

What those things might be I cannot much tell you, where I'm not particularly au fait with CK.

chillacy · on June 1, 2016

Might want to get acquainted then... if you've got competition for your library and you expect users to pick you, you should know why you're better IMO: http://concurrencykit.org/man.html

It's free and open source but in some regards it's similar to product design/marketing. If you don't pitch it, some reviewer on the internet will make a stackoverflow post about it, and they won't know your strengths as well as you do.

A blog I read recently outlined some things to figure out: https://blog.orangecaffeine.com/the-art-of-saying-no-db012a2...

From the other comments here it sounds like some unique things you have are:

* versioned datastructures that won't change during library upgrades

* self-documenting code style

liblfds · on June 1, 2016

It depends though where you're coming from.

I write the library for the pleasure of it. I'm aiming to create something I think beautiful - for better or for worse, from other people's point of view :-)

I'm not actually looking for it to be used as such. It may well be, if my idea of beautiful is any good, it will come to be used, but that's... it's like a side effect.

When you paint a picture, or sculpt a sculpture, or write a song, you're not competing with other artists to make what other people will judge to be the better picture, sculpture or song.

sargun · on May 31, 2016

How does ConcurrencyKit compare to this? I've used concurrencykit a bit, and it's great. The only problem I have is that sometimes the documentation is lacking.

liblfds · on June 1, 2016

I went full-on, for documentation.

Many, many years ago, back at Uni, a friend of mine received 100% for his Uni project. I could code many years before I went to Uni, but I was shallow in all other ways - all the ways in which we actually produce quality, reliable, usable software. He showed me some of his project, and I saw his docs. What he'd done was absolutely fantastic - a page per API function, documenting the standard set of information that you need to use an API function. The approach you see in all the major projects today, the MS API docs, for example - this from a twenty year old!

That experience permanently set my expectations for documentation, and is reflects in the library docs. One page per function, per enum, per struct, etc, etc, etc. It's ALL documentated. Painstakingly, exhaustively.

There's absolutely no point in writing an API which people cannot use, for whatever reasons.

When I come to make a release, and I'm done (or rather, I think I'm done) with development, I write the docs.

This induces, always, a large final round of development work, as I come to realise, from attempting to document them, that some of what I've done is simply too complex or too awkward or too inherently confusing to use, and has to be rewritten.

I am of the view from my experiences that all developers should write their own documentation, because of the improvement in code quality that is induced by the code changes which come from keeping the docs comprehendable.

protomyth · on May 31, 2016

That is an amazingly nice home page for a project. No clutter, to the point, all the info I needs, and loads fast.

liblfds · on June 1, 2016

Thankyou! I hope you're right :-)

ddevault · on May 31, 2016

I found this a while ago but the compatability matrix turned me away very quickly. This is one of the least portable C libraries for data structures I've ever seen.

benbenolson · on May 31, 2016

Yeah, they refer to it as "portable", but then go on to show that they haven't tested it on anything except x64 and ARM32; not only that, but it also depends on what you use to compile it (kbuild, GNU make, etc.).

liblfds · on May 31, 2016

That's not the case.

What you get depends only on the processor type.

What does vary is whether or not any given platform has actually been tested - i.e. the code compiled on that platform, and the test programme run.

Currently, only ARM32, MIPS32 and x64 platforms are available, and so the test programme has only been run on those platforms.

Finally, I think you may have been misled in the same way another reader further down was; note that the matrix is the matrix of toolchains which are supported out-of-the-box, i.e. where the library ships with build files for those platforms.

It is NOT the matrix of the ONLY tooclhains which can be used.

rwmj · on May 31, 2016

If two people misunderstand your homepage in the same way, it may be time to change your homepage.

liblfds · on May 31, 2016

Yes (and yes).

==

(Now done).

liblfds · on May 31, 2016

Interesting. Why did it turn you away? do you run code on many of the given platforms?

ddevault · on May 31, 2016

x86, yes, and the limitations on the toolchains you can use is weird and bad.

liblfds · on May 31, 2016

When you say "limitations", I'm not sure what you mean. I wonder if there's a misunderstanding here - the matrix describes the toolchains which are supported out-of-the-box, i.e. build files for them are provided, and they have all been tested and should work.

It's not a matrix of the ONLY platforms which CAN work. Is that what you're thinking? Other platforms might well need the porting abstraction layer to be implemented (the Intel compiler might offer a different set of atomic intrinsics, for example), but that's what it's there for.

ddevault · on May 31, 2016

It's not very clear, then. Reading the page leads me to believe that if x is my toolchain then y won't work, and the idea that such a thing could even be possible for your library speaks very poorly for it. I suggest tweaking it if this is not the case.

In any case, I did re-review the library today and I'll still pass on it because I'm very opinionated about C and this library doesn't match my opinions.

And I would like to see ARM64 support before I depend on this sort of thing, too, for what it's worth.

liblfds · on May 31, 2016

Well, to be fair! the page does say this immediately above the matrix - "The following toolchains and processors are supported out-of-the-box"

However, this is certainly not fully explanitative, and of course is just one line immediately above a large and eye-catching matrix.

I will change it, to make it more clear, and more eye-catching.

Regarding ARM64, it's something I want very much. I will finally have time for it now 7.1.0 is out. I Googled a bit just the other day and there are Pi-like ARM64 dev boards out there now, so I'm going to get one.

Which leaves me needing a SPARC dev board, and a POWERPC dev board, and an x86 dev board, and a MIPS64 dev board, and a... :-)

smw · on May 31, 2016

Wonder if you can use something like this:

http://elinux.org/Test_Lab_Architecture

to test on someone else's hardware?

liblfds · on May 31, 2016

I hope so.

I applied to the GCC compiler farm late last year, but received a reply only after so long a time that I'd deleted the key pair I'd submitted because I thought nothing was going to happen - and my efforts to get back in have yet been unanswered.

The GCC compile farm isn't for intensive testing or benchmarking, though (which is what you need for lock-free, to try to catch race conditions) - that test lab sounds more interesting.

I have to say though having your own dev boards which are yours and only yours and which you can hammer endlessly is very convenient. It's actually just occurred to me I can leave all the devs boards permanently running the test suite...

OTOH, I don't think I can get small, portable dev boards for all the platforms I care about, so at least for some architectures, a third-party farm is probably the only way.

Probably the most pressing requirement though is access to a Mac. Right now there are no build files for that platform.

monocasa · on May 31, 2016

By the way, the Pi3 had ARM64 cores.

liblfds · on May 31, 2016

I didn't know! thanks! I have a Pi 2 right now for ARM32, I'm completely happy with it.

TD-Linux · on May 31, 2016

There is no supported 64 bit kernel yet, though.

ape4 · on May 31, 2016

I guess they are using very specific machine instructions to achieve their ends.

liblfds · on May 31, 2016

Yes and no.

On Linux, the library uses the GCC atomic intrinsics. As such, the library funamentally supports any platform GCC supports with its atomic intrinsics. So the instructions are indeed machine specific, but they're provided by the compiler through a standard API.

Of course, actually getting something to work means actually compiling in on a given platform and seeing the test programme pass.

ape4 · on May 31, 2016

Thanks for the info. This is the first time a library has replied to me ;)

liblfds · on May 31, 2016

Google's huge investment in AI. You'll be hearing a lot more of it in the future :-)

("Do you REALLY want to use that API? I think you should space your comments out a bit more. Hey, your battery is getting low! BTW, I've just emailed your entire source base to Google for assimilation.")

nneonneo · on May 31, 2016

This library claims to be license-free, but some of the modules are documented otherwise. For example, the "Queue (bounded, many-producer, many-consumer)" module (http://liblfds.org/mediawiki/index.php?title=r7.1.0:Queue_%2...) has the following license note:

"Dmitry Vyukov granted email permission for the idea to be used and implemented in liblfds, under the MIT license."

Similarly, the unbounded queue has an "unclear" license, rather than the "standard liblfds" license.

It's not clear to me how the library can be license-free when one component is very clearly MIT-licensed, and when others have no clear license status. I would be quite cautious using this code in production or in a lawyer-sensitive environment until the licenses are clarified.

liblfds · on May 31, 2016

Good point.

When the library was initially released, back in 2009, and until recently, it's been license-free, or at least thought to be. Recently with the addition of Dmitry's design, and a closer analysis of the possible licensing terms, the picture has become more nuanced - hence the inclusion of per-data structure information regarding licensing and what happened is of course classic computing - there's now a regression bug in the one line description of the library! Where the "license-free" term has been in the one-line description for so long, it was overlooked.

I have to figure out what to do about this now!

_bz2r · on May 31, 2016

It looks like you are part of the project. Can you clarify what you mean by "license-free"?

If there literally is "no license" and something isn't public domain, then it means you aren't authorized to use it (because the authors have a copyright, even if they have thus far chosen not to assert it). And in order to be public domain, it must be explicitly put there. That may be the case, but your page would be clearer if you said so.

And if not, then a license must exist, and you should elaborate it.

liblfds · on May 31, 2016

My view is that all these, ah, parameters for licensing - what is or is not, copyright, public domain existing in some countries and not others, etc, etc - none of these are things I have chosen. They are being imposed upon me. This is not okay. I feel no need to go along with it all, just because other entities force it upon me.

This is why I say license-free. It is rejection of these imposed parameters. I know what I want here - for other people to be able to use the library as they wish for whatever they wish, simple as that. No charges, no catches, go use it.

However, in practise, companies and individuals must abide by all this stuff, and so you see about the code being in the public domain, a license of your choice being granted if you wish for one, etc. This is to address the practical problems involved.

Regarding public domain, I think you've overlooked where it is explicitly placed in the public domain - it's the end of a couple of sentences, so it's understandable;

"If however for legal reasons a licence is required, the license of your choice will be granted, and license is hereby granted up front for a range of popular licenses : the MIT license, the BSD license, the Apache license, the GPL and LPGL (all versions thereof) and the Creative Commons licenses (all of them). Additionally, everything is also placed in the public domain."

hp · on May 31, 2016

Legally "all rights reserved" is the default, until/unless you follow the legally appropriate ceremony to disable it. If you get the ceremony wrong in some way then you're back to "all rights reserved" in practice, regardless of intent.

CC0 contains the lawyer-developed appropriate ceremony to do what you want here, including handling weird jurisdictions. The way you've phrased it will confuse the lawyers and may or may not "work" legally speaking.

There are instructions on the CC site about how to apply CC0 to a project. CC0 was developed for precisely this purpose.

el_isma · on May 31, 2016

Maybe you need this license? http://www.wtfpl.net/

liblfds · on May 31, 2016

It's not very polite :-)

"Licensing with attitude!"

_bz2r · on June 2, 2016

Ah. Your last sentence answers it all. Thank you.

huhtenberg · on May 31, 2016

For the lazy, a link to the latest release - https://github.com/liblfds/liblfds/tree/master/liblfds/liblf...

PS. Code has quite a bit of that macro-heavy OpenSSL vibe to it. Definitely not everyone's thing, but it is very readable - https://github.com/liblfds/liblfds/blob/master/liblfds/liblf...

liblfds · on May 31, 2016

Macros are heavily used to avoid making function calls.

I'd glad you find it readable - I hope you're right :-)

pantalaimon · on June 1, 2016

But wouldn't a static inline achieve the same thing while keeping type-safety?

liblfds · on June 2, 2016

Yes.

As I understand it, this requires either that every to-be-inlined function is present in the public header file, so it can be #included into every user source file which wishes to use those functions, so they turn up in the object file and so can be inlined, or we rely on the compiler and linker to inline across object files, which is not commonly supported.

It depends upon the compiler having inlining support - the library is targetting arbitrary compilers on bare metal platforms, so it's not necessarily a valid assumption.

Currently, there stil remain two uses of inlining, and I am aiming to get rid of them too in the next release. This also removes a macro from the porting layer, making it simpler.

cbd1984 · on May 31, 2016

They say it's public domain so it is, indeed, license-free. AFAIK, the public domain dedication makes the rest of the section chaff.

wongarsu · on May 31, 2016

In many jurisdictions there is no concept of public domain, and thus declaring your work as public domain has no effect in many parts of the world.

The author granted a license for all Creative Commons licenses, which includes CC-Zero (public domain where that's possible, the closest possible equivalent everywhere else). That's the real part that makes the rest of the section unnessesary.

Twirrim · on May 31, 2016

SQLite is public domain, but they've made licensed versions of it available for people to buy, because:

a)public domain is a concept that is far from a world wide legally recognised concept (its roots are in old English law)

b) public domain scares company lawyers and it's easier to give them an actual license :D

masklinn · on May 31, 2016

> public domain is a concept that is far from a world wide legally recognised concept

Right, the "continental" (franco-german) IP school having moral rights which the anglo-saxon IP school doesn't have[0][1].

These moral rights tend to be perpetual, non-waivable, non-assignable and inalienable, as a result the concept of public domain in the anglo-saxon sense doesn't really make sense, instead "public domain" is a shorthand for works whose economic rights have lapsed (that is in fact the original meaning of the term as coined by Alfred de Vigny) or works which don't have associated economic rights in the first place (e.g. mathematical proofs, parliamentary transcripts) rather than an actual legal concept, and an author can only waive their economic rights through licensing.

[0] the english Crown Copyright has some similarities

[1] anglo-saxon copyright roughly matches the continental "economic rights" subset of author's rights

liblfds · on May 31, 2016

Indeed. It was with lawyers in mind I went for overkill on the licensing front. I wanted to wholly and completely remove that issue from consideration.

llull · on May 31, 2016

Chaff? For a project where one has total control, yes. When one has to get approval and deal with a specific list of approved licenses (which may not include one's particular favourite, no matter how sensible it seems to oneself), then no. It is appreciated.

weinzierl · on May 31, 2016

The license is interesting but the discussion is off-topic here. I created another thread about the license: https://news.ycombinator.com/item?id=11807463

bogomipz · on June 1, 2016

Can someone comment on lock free data structures vs lock free algorithms. My understanding is that there isn't as much of a symbiosis between the two as there are with other data structures and algorithms. Is it easier to implement a lock free data structure than a lock free algorithm?

liblfds · on June 1, 2016

I may be wrong, but I think an algorithm could only be considered in terms of locking or lock-free if multiple threads were through that algorithm performing work on the same data at the same time (and so potentially running into each other, in ways which would break the work being done).

I've seen server code like that - all the time of course - but the only algorithm I can think of which has that property is a PRNG, and that can normally be implemented on a per-thread basis, each thread holding its own single-threaded PRNG.

This probably reflects my lack of work with algorthims. Could you give an example?

bogomipz · on June 1, 2016

I came across this recently - Treibner's algorithm, mentioned in this:

http://people.csail.mit.edu/shanir/publications/Lock_Free.pd...

liblfds · on June 1, 2016

I think you mean Treiber, right? typo with the "n".

In 1986, on an IBM mainframe, published a white paper describing how to implement a lock-free stack (which is also a freelist, of course). It was the first lock-free data structure. I'm not quite sure why it's being called an algorithm!

The scalable version described in that paper is scalable by dint of adding an elimination layer in front of the stack.

I've added such a thing, although my own design - I've not read their paper, so I don't know how they did theirs - to the freelist in 7.1.0, but it hasn't worked well. It roughly doubled performance on x64, when there are a large number of cores, but reduces performance for lower numbers of cores and on other platforms. When there are a large number of cores, performance is rock bottom,, so doubling means going from like 1m ops/sec/core to 2m ops/sec/core, when if you run on just one core, you'd be doing say 30mm ops/sec on that core.

I'd expect it in principle to permit real scalability - fairly linear performance as the number of cores rises - so I think I've missed something, or done something wrong.

bogomipz · on June 1, 2016

Oh mea culpa, yes that's a typo. I was referring to W.K Treiber and his stack implementation. I think the fact that he referred to it as an "algorithm" is the source of my confusion.

Is it safe to say then that distinctions such as "lock free" and "wait free" always refer to data structures and not algorithms then?

I was curious, could you elaborate on why you think that PRNGs satisfy the property of a lock free algorithm?

Thank you for your detailed response by the way.

liblfds · on June 1, 2016

> Is it safe to say then that distinctions such as "lock free" and "wait free" always refer to data structures and not algorithms then?

Well, for what my thoughts are worth, I would say no, not inherently and catagorically, but in practise, it's hard to imagine algorithms which are multi-threaded in such a way that threads contend on shared data, such that the design of the algorithm could even be lock-free or wait-free.

Parallelized (i.e. multi-threaded) algorithms tend I think to have multiple threads all running the same code but each being given a separate part of the overall data to process - say an image processing algorithm, where each thread receives an equal portion of the image to process.

A PRNG however I think can (for one or two types of PRNG only, not in general) be implemented as a lock-free algorithm. A PRNG is not a data structure - rather, it maintains some internal state and on requests uses (and updates) that state to generate a new random number. It is an algorithm.

The updating process must be atomic or it is compromised - you can't have one thread half way through the state updates, and then another thread jumping in at that point and using the half-updated internal state.

Good PRNGs hold a lot of internal state - up to a kilobytes - so it's not possible to directly use atomic operation to update that state. However, there is a class of PRNGs called "SplitMix" which have an 8 byte state, where on request for a new random number, that state has a particular (quite large) constant added to it, and the result is then passed through a hashing function.

With this design, an atomic add can be used for the addition, and once that's done, the requesting thread then local to itself runs the hashing function (it's single-threaded, and merely takes an 8 byte number for input), and so the design is lock-free - no matter how many threads are using the PRNG, whether or not they have been idled by the OS, all other threads can still get their work done.

This PRNG is a curiosity though - it has almost no genuine use cases. The performance is poor when compared to the single-threaded version of the same PRNG, and a PRNG being what it is, each thread can easily run its own, thread-local PRNG.

bogomipz · on June 1, 2016

Thanks for your detailed response. You mentioned "SplitMix" PRNGs, what is the second one that is implemented lock free?

By the way congrats on getting your release out. This looks like good stuff.

liblfds · on June 1, 2016

Make that "one" then :-)

I don't actually know know of others, but I know there are a range of other PRNGs which run on single-word states, so they might be viable.

Thankyou! I see many flaws in the library though - there's so much that I can see to improve or to add. Still, with such things, there's a lot of value in making a release to get new functionality out so it can actually be used, even if there's much else to be improved, or which is not as it could or should be.

adekok · on May 31, 2016

The documentation is viewed via an embedded frame. So you can't save URLs to the documentation. This is a known bad practice for web sites.

The API changes with every release. WTF?

struct lfds710_freelist_element;

Really? Just... really? In order to use a newer version this library, you have to change your entire application for the updated function names.

And many of the lock-free structures are add-only. The binary tree is unbalanced. Which makes it not a very good binary tree.

This looks like it was written by somone who heard "lock free", and went "cool, I'll write a library!"

And he missed the last 40 years of software best practices.

The web site is bad, the APIs are bad, the functions aren't very useful, the library isn't very portable, etc.

People shouldn't use this library for anything.

[edit] Updated the comment about "someone" being a 16 year-old student.

nneonneo · on May 31, 2016

The library also uses some absurdly long identifiers which are bound to clutter your code one way or another. To iterate a binary tree, use

while(lfds710_btree_au_get_by_absolute_position_and_then_by_relative_position(state, &element, LFDS710_BTREE_AU_SMALLEST_IN_TREE, LFDS710_BTREE_AU_INORDER_WALK_FROM_SMALLEST_TO_LARGEST))

It's not going to be possible to remember what names to use when you need to iterate. Or, take the initialization routine. You first call

lfds710_btree_au_init_valid_on_current_logical_core( &state, key_compare_function, LFDS710_BTREE_AU_EXISTING_KEY_FAIL, NULL );

on one core, which is perhaps reasonable, but then you call

LFDS710_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE()

on other cores. That is one hell of a #define, and it's not even clear from the name what it's doing. It seems like "LFDS710_MISC_SYNC_INITS" or something along those lines would be clearer and easier to remember.

ddevault · on May 31, 2016

Holy shit! That's 71 characters, which leaves you 9 characters for your code. 4 of that goes to indentation, 3 of that goes to (); and you have 2 characters left for the parameters before you break the 80 character limit. That's horrifying.

liblfds · on May 31, 2016

OTOH, you do know exactly what it does, without having to read any docs.

I am of course open to suggestions for improvement.

mjevans · on June 1, 2016

Encourage the adoption of a line-width that is longer than ancient terminals (at least it isn't 40 cols across).

Also I'm much less concerned with absolute line width in my own code than I am with indentation steps or number of arguments something takes without flowing on to multiple lines.

liblfds · on May 31, 2016

Yes, some of the identifiers are very long. The aim was to make the APIs self-explanatory. It should be possible to read an API and know what it does. It's challanging to explain an inorder walk, and its direction, in a shorter form.

The goal of course is for all identifiers to be as short as possible.

ch · on May 31, 2016

What about replacing FROM_SMALLEST_TO_LARGEST with ASCENDING?

liblfds · on May 31, 2016

That could work. ASCENDING and DESCENDING. I'll think it over (in the morning - I'm about comatose right now).

jhpriestley · on May 31, 2016

The criticism of the documentation is fair, but it's an incredibly trivial point. You can save URLs to the documentation if you right-click the link.

For example, here is the link to the rationale for the versioned naming scheme:

http://liblfds.org/mediawiki/index.php?title=Introduction#Co...

Do you even understand the significance of "lock-free" in the title here? Do you have some basis to think that the weaknesses of the data structures, like unbalanced trees and add-only collections, can be remedied without using locks? If you do then that would be an interesting criticism, but as it is it looks like you just opened the page, looked around for a few things to criticize, and then wrote this sour, shallow, dismissive comment.

openasocket · on May 31, 2016

A quick google shows that there are lock-free implementations of AVL trees and red-black trees. And every paper I've ever read about lock-free data structures includes both addition and removal operations.

liblfds · on May 31, 2016

Yes. I have seen the white paper for the red-black tree, and the approach used should work for AVL (I've not seen that paper). I've just not had time to implement yet. A full O(log n) data structure is necessary for the library to be really useful piece of work - but there's a lot of other things which are necessary for that, too, and they have so far taken up most of the development time.

About 10% of the effort so far has gone into coding the actual data structures - almost all the work goes into testing, benchmarking, performance tuning, documentation, porting to new platforms, fundamental improvements in the basic design and so on. The benchmark app alone has taken all in all about six full months to write. I thought it'd be simple, it turned out to be a tar-pit!

openasocket · on May 31, 2016

Honestly, I wouldn't bother with red-black or AVL. I'm not sure about the current state of lock-free data structures, but I prefer a B-Tree to a red-black or AVL tree. At least in the single-threaded case B-Tree operations tend to be faster, have better cache locality, etc. Google reveals there are lock-free implementations of B-Trees, so maybe look into those instead?

liblfds · on May 31, 2016

Could be. I will look into the choice before I commit - figuring out how one of these complex lock-free data structures works and implmenting it correctly and writing all the tests cases is a major undertaking. It's something you want to make sure you're making the right choice about before you invest the time.

As an aside, binary trees turn out to be really great with lock-free. In the obvious locking implementations, there's one lock per tree. You can use read-writer locks to get parallelism on read, but even so, the mere existance of a single lock of any kind induces memory contention which eliminates scaling as processor counts rise.

I'm sure there are in the locking literature better solutions, but I'm up to speed with lock-free, rather than locking, solutions, so I don't much know about them.

Regarding lock-free, there is no single lock, and btrees are beautiful because they naturally distribute memory accesses throughout the tree. Performance basically scales linearly. There's a gnuplot on the site from a 16 core Amazon VM, and that property can clearly be seen. Every new core simply adds another 200k benchmark ops per second. For the other, non-scaling data structures, new cores are death.

nothrabannosir · on May 31, 2016

> This looks like it was written by a 16 year-old student who heard "lock free", and went "cool, I'll write a library!"

Nothing wrong with that, though. In fact, that's great.

Word to the 16-year-olds (or any-year-olds) doing that: put a disclaimer in your README to avoid this kind of reply being top comment in a HN thread about your work :)

adekok · on May 31, 2016

I wrote code when I was 16. There's nothing wrong with being that age.

What is wrong is not learning from others. This library follows many practices which are well known as "worst practices" in the industry.

partycoder · on May 31, 2016

Can you elaborate more on those "worst practices"?

huhtenberg · on May 31, 2016

Who peed in your espresso this morning?

adekok · on May 31, 2016

The best response to a technical review of software is an ad hominem.

Good to know.

DiabloD3 · on May 31, 2016

Well, what we should be doing is a double blind test to find out if pee in espresso improves performance, and what the minimum effective dose is.

ma2rten · on May 31, 2016

Well, ... your technical review was ad hominem in the first place.

adekok · on May 31, 2016

Because saying someone looks 16 is an insult?

loeg · on May 31, 2016

When you say it like you did initially, yes!

arcticbull · on May 31, 2016

I think that's actually the definition of an ad hominem attack :)

jotux · on May 31, 2016

Comments like this are exactly why people are afraid to release projects to the public and contribute to open source.

adekok · on May 31, 2016

If you're afraid of negative feedback, you will never do anything, and you will never learn anything.

In this case, I'm astonished at just how many bad practices the author has engaged in. Do most software projects have structs / functions / whatever where the names are versioned?

It's one thing for TempleOS to invent his own OS, build system, programming language, GUI, etc. He made some odd choices, but at least he's clear he doesn't care (much) about other people using his code. And, it's clear his code works, and does something useful.

The author of this library has invented many of his own programming practices... which are clearly worse than current industry practices. Why? I can't say. But I can say that they are worse.

I just don't understand this attitude that all feedback has to be couched carefully to avoid hurting anyones feelings. If it's bad, it's bad.

Perhaps you could be productive, and post your review of his code. If it's in any way negative, (by your standards), I should jump on the review as being overly negative, and "you're the reason why people don't do open source."

Adults should be able to take criticism. If they can't, I have very little sympathy for them.

jotux · on May 31, 2016

>I just don't understand this attitude that all feedback has to be couched carefully to avoid hurting anyones feelings.

>Adults should be able to take criticism. If they can't, I have very little sympathy for them.

Again, your outlook is toxic to newcomers. Yes people should be able to handle criticism but you should be capable of crafting that criticism in a way that isn't mean spirited and rude.

I wish more projects and communities had code of conduct rules similar to the rust language: https://www.rust-lang.org/conduct.html

Which includes rules your comment would obviously violate:

    Please be kind and courteous. There’s no need to be mean or rude.

cyphar · on May 31, 2016

> >I just don't understand this attitude that all feedback has to be couched carefully to avoid hurting anyones feelings.

> >Adults should be able to take criticism. If they can't, I have very little sympathy for them.

> Again, your outlook is toxic to newcomers. Yes people should be able to handle criticism but you should be capable of crafting that criticism in a way that isn't mean spirited and rude.

I don't think that's a fair statement. If someone has done something badly, they've done it badly and they should be told that. You're not doing anyone any benefit by sugar coating it by first saying "great job on trying". Personally, I prefer getting a frank and honest review of my work rather than having to tease out why someone doesn't like it but they won't state it plainly because they'll "hurt my feelings". We're discussing code, and if code is bad then we should make it clear that it's bad, say how to do it better, and move on.

jotux · on May 31, 2016

I agree completely with what you're saying but this:

>This looks like it was written by a 16 year-old student who heard "lock free", and went "cool, I'll write a library!" And he missed the last 40 years of software best practices.

Is not constructive criticism, it's just being rude.

adekok · on June 1, 2016

You're unwilling to give constructive criticism yourself. I suspect because you know that by your standards, any negative criticism can be labeled as "mean and rude".

You've been given multiple opportunities to give positive criticism yourself. And you've refused. Your position is entirely negative.

For something where you are more offended than the author of the library. He's an adult, and he can defend himself. You're infantilizing him by protecting him from things where he doesn't want protection.

The code of conduct you linked to would seem to prohibit your overly negative behavior.

i.e. your comments have come across (to use a common phrase here), as toxic.

adekok · on May 31, 2016

> Again, your outlook is toxic to newcomers.

I disagree.

> you should be capable of crafting that criticism in a way that isn't mean spirited and rude.

As I suggested, perhaps you could give some positive criticism of your own, and give your review. Not doing that means you're demanding I do something which you won't do.

This is not a trivial point. I'll bet that you'll find it difficult to craft a technical review which points out the flaws in his software, without anyone being able to claim it's being "mean and rude".

Try it. Please.

> I wish more projects and communities had code of conduct rules

That's nice. But I wish more projects had code of conduct rules on people asking questions. I've been doing open source for 20+ years. There is a large subset of people who don't read the documentation, don't ask good questions, who ignore all attempts to help them...

... but ...

their behavior is superficially polite. Until they get frustrated that "no one is helping them", and they start complaining that everyone is being mean to them.

That behavior is toxic, and I have no patience for it.

Yet strangely, I've yet to see a "code of conduct" which deals with this issue. It's open source... you shouldn't demand that other people help you, no matter how politely you make that request.

And (as with most such arguments I've had), I'll note that you're more offended by my comments than the author is. Such over-sensitivity seems to me "problematic". The author is an adult. He can clearly take criticism. Perhaps learning from his example would be good.

devishard · on May 31, 2016

So? I don't think the world needs more bad open-source code. If people only open-source good code, then that saves everyone the trouble of sifting through the junk like this. People being afraid to open source bad code would be a good thing.

And having a top comment on HN warning me that the code is pretty bad saves me time, too, because I don't have to read the code or try using it to find that out. So I really appreciate people publicly pointing out that the code is bad.

I'd feel more positive about this code if it had been submitted with a request for code review and feedback rather than being presented as a "a portable, license-free, lock-free data structure library written in C" with a whole website advertising it as being ready for use.

cyphar · on May 31, 2016

I think making all code free software would allow bad code to be fixed, which would result in a much better state of affairs than having proprietary software that won't ever be made free because the author won't fix it.

liblfds · on May 31, 2016

> The documentation is viewed via an embedded frame. So you > can't save URLs to the documentation. This is a known bad > practice for web sites.

Yes and no. The site currently is arranged with a set of tabs on the left and a frame on the right. It had to be a frame because some of the web-based apps, such as the mediawiki used for documentation, generate output encapsulated in the "<HTML>" tag. As such, their output cannot be captured and placed into a div; it has to go into its own frame.

However, the documentation can be opened in a new tab (right click, open in new tab) and then where it has its own browser window, the URL is displayed as normal.

Of course, this is not the default behaviour - the user has to do something to get it, and to realise that it is possible. That is a problem. The only obvious solution is to have the documentation automatically open in a new window. That's a bit unpleasent though - new windows popping up unexpectedly is not good for the user experience.

> The API changes with every release. WTF? Really? Just... really? In order to use a newer version this library, you have to change your entire application for the updated function names.

No. All versions of the library can concurrently be included and linked against. The reason for this is so that any code which using an existing version of the library does not have to be modified at all - and so does not need to be revalidated - when a new version is published.

IME, once code is written, and given that it works, it tends quite strongly not to change. It works, so why spend time and effort merely to change it? however, if a new version of a library comes out, anything could have happened - so existing code has to be revalidated. It's extra work. This is avoided by supporting concurrent use of multiple library versions.

> And many of the lock-free structures are add-only. The binary tree is unbalanced. Which makes it not a very > good binary tree.

The add-onlys were thrown together in a week or so many months ago, just to get those classes of data strutures out there, as writing the full versions of them will take some time. There is in fact for eaxmple a full, balanced binary tree, in the literature. That's on the list!

Regarding being unbalanced, one way to address this is to hash the key before inserting a new node.

> This looks like it was written by somone who heard "lock free", and went "cool, I'll write a library!"

That's a bit unfair. The library is eight years old now. It's not a flash in the pan.

> The web site is bad, the APIs are bad, the functions aren't very useful, the library isn't very portable, > etc.

It's actually as portable as it is possible for a lock-free library to be. Only a fairly few platforms provide atomic instrinsics. As far as GCC provides support, they are all suppoted, and I believe GCC supports all of them.

devishard · on May 31, 2016

> No. All versions of the library can concurrently be included and linked against. The reason for this is so that any code which using an existing version of the library does not have to be modified at all - and so does not need to be revalidated - when a new version is published.

This is what having consistent interfaces is for. Keep your interface consistent, not your implementation.

This is also a strong argument for not releasing code and advertising it on HN if it's not mature.

> The add-onlys were thrown together in a week or so many months ago, just to get those classes of data strutures out there, as writing the full versions of them will take some time. There is in fact for eaxmple a full, balanced binary tree, in the literature. That's on the list!

Which begs the question why you included immature implementations in your library.

> That's a bit unfair. The library is eight years old now. It's not a flash in the pan.

Which doesn't negate the criticism in any way.

> It's actually as portable as it is possible for a lock-free library to be. Only a fairly few platforms provide atomic instrinsics. As far as GCC provides support, they are all suppoted, and I believe GCC supports all of them.

There are working examples (ConcurrencyKit) of lock-free data structures that target more architectures and more compilers, so "It's actually as portable as it is possible for a lock-free library to be." is provably false.

liblfds · on May 31, 2016

> This is what having consistent interfaces is for. Keep > your interface consistent, not your implementation.

I have to disagree. If the underlying library used has changed, then the code which uses the library has to be revalidated. You cannot assume it simply continues to work - to do so is to fully rely on the library provider getting it right, introducing no bugs, or anything unexpected. This is not viable for serious projects.

The and the ONLY way in which existing code does not have to be revalidated is if it is completely and wholly unchanged.

> Which begs the question why you included immature implementations in your library.

There's nothing immature about them, and I don't know why you are saying there is. They were written about a year ago and writing them means also writing their test suite, which they pass.

The choices you disagree with in the library seem to be being taken to justify the charge of "code immaturity". Those choices may be right, or they may be wrong, and so could potentially be mistakes, but this is an entirely different matter to maturity.

> There are working examples (ConcurrencyKit) of lock-free data > structures that target more architectures and more compilers, so > "It's actually as portable as it is possible for a lock-free > library to be." is provably false.

I was thinking of processor support, rather than compiler support. Processor support is the main part of the work - all of the code has to be written with it in mind. Compiler support is much easier; it's just a matter of porting their atomic instrinsics, which basically means writing about a dozen straightfoward macros.

devishard · on May 31, 2016

> I have to disagree. If the underlying library used has changed, then the code which uses the library has to be revalidated. You cannot assume it simply continues to work - to do so is to fully rely on the library provider getting it right, introducing no bugs, or anything unexpected. This is not viable for serious projects.

> The and the ONLY way in which existing code does not have to be revalidated is if it is completely and wholly unchanged.

And when I integrate a new release, how do I verify that you didn't accidentally change one of the older numbered versions of your code? I'm going to have to test either way when I integrate a new version of your code, so I might as well get any performance improvements or whatever you introduced into the implementation. Versioning your interface does not solve the problem you claim it does.

> There's nothing immature about them, and I don't know why you are saying there is. They were written about a year ago and writing them means also writing their test suite, which they pass.

I am saying they're immature because you said they were immature. You said, "The add-onlys were thrown together in a week or so many months ago, just to get those classes of data strutures out there, as writing the full versions of them will take some time."

> I was thinking of processor support, rather than compiler support.

I was also thinking of both processor support--you're the only one who mentioned any compilers. Given this supports only 2 processors and ConcurrencyKit supports 5, your claim that this is "actually as portable as it is possible for a lock-free library to be" is still provably wrong.

liblfds · on June 1, 2016

> And when I integrate a new release, how do I verify that > you didn't accidentally change one of the older numbered > versions of your code?

They're only released once. No release is changed after its release. Once it's out, it's out. The archive is never touched again.

When a release is made, the source code is exported, archived and uploaded to the library web-site. To accidentally change that code, I would have to accidentally repeat that manual process. It's the same with github - I actually use SVN at home, and I make a new repos per release and upload the code, once, to that repos. Github is a release only, write-once source control system, for me.

> I'm going to have to test either way when I integrate a new version of your code

Well, now I'm confused. If that's so, what's the problem? you keep using the old code, it's not changed, either the API or the library binary, so you don't have to test that; and for any new code, you can use the new release version. To the extent the APIs are unchanged, moving fully from one release to another is a matter of issuing a global search-and-replace for the string "liblfdsNNN" to "liblfdsMMM".

> Given this supports only 2 processors and ConcurrencyKit > supports 5

The library supports on Linux almost every processor GCC offers atomic instrincs for - ARM32, IA64, POWERPC32/64, SPARC32/64, MIPS32/64, x64 and x86. I'm pretty sure the library does not support Alpha, however. Of those platforms, I only have access to actual hardware for ARM32, MIPS32 and x64, so only those platforms actually have the test code compiled and run and being seen to pass.

On Windows, support is provided for ARM32, x86, x64 and IA64. Again, of these, I only have access to x64.

devishard · on June 2, 2016

Ugh, your problem thinking runs too deep for me to power through.

1. Let's say I do things the way you suggest: I use different versions of your code simultaneously. 3 years in I have 5 versions of your code and I find a bug in my code in an area from the second version. Now I have to know 5 libraries to understand the code and I'm constantly getting tripped up by the subtle differences between versions 2 and 3.

2. So to fix this plan I finally persuade my boss to give me a week to address tech debt. With your search-and-replace plan, except I forget to do it in the library that Bob the contractor kept in a separate repo before we fired him. And then there's the separate problem of when I passed it into a function that took a function pointer and so the name is different and not find-and-replacable as being the same function. And then there's the place where Joe, who is a good programmer but kind of inexperienced, poor kid, he thought he'd simplify things by creating a macro that inserts the number, and this got past code review because I reviewed it right after a meeting with the client from hell. So now nothing is working after the find-replace, it takes us the entire week to figure out why, and we don't get to spend any of our tech debt time fixing the place where Bob the contractor decided it would be a good idea to "optimize" the graph generation functions.

For the record, I'm speaking from experience here. The way you wrote this code is an ideology I've worked with before and it's miserable to consume libraries written in this style. You can argue for the "benefits" all you want but I've eaten this dog food before and I'm not going to eat it again.

anarazel · on May 31, 2016

> I was thinking of processor support, rather than compiler support. Processor support is the main part of the work - all of the code has to be written with it in mind. Compiler support is much easier; it's just a matter of porting their atomic instrinsics, which basically means writing about a dozen straightfoward macros.

There's actually rather few alive platforms that don't either have ll/sc or cas. The original i386, armv5, sparcv8 (leon IIRC added cas though), pa-risc come to mind (that's the postgres platforms for which fallback atomics support is used). Once you have CAS it's easy to provide fallback implementation for other atomics. It's also not all that hard to provide a "fallback" implementation for atomics, which guarantees atomicity using some locking mechanism.

liblfds · on May 31, 2016

CAS or LL/SC is straightforward enough (although you have to be aware in your use of them of the differences between processors in terms of whether they lock cache lines, or have exclusive reservation granules, or per logical-core locks, etc, and there is one other minor complication to consider, the presence or absence of contigious double-word CAS or LL/SC), but memory ordering behaviour and support varies significantly across processors.

Intel in that regard are a pain, because they have a mandatory, built-in full memory barrier in their atomic operations. ARM does not, and I see the freelist on ARM running about 25% faster (relatively speaking) than Intel, because of it.

If you look at the first two bars (first is the new GCC atomic instrincs, second the old GCC sync intrinsics) in the one-core chart from these two gunplots, the first gnuplot being ARM32 and the second a Core i5,

http://liblfds.org/pages/images/liblfds710_freelist_push1_th...

You will see on Intel they're level, and on ARM, the atomic bar (the first bar) is about 25% higher.

The new GCC atomic instrincs only issue memory barrier when told to, whereas the old sync instrincs normally (e.g. on most platforms - the docs are a little nebulous) issue memory barriers.

The freelist doesn't need a memory barrier on pop, but on Intel, you get one anyway, and on ARM, with the sync instrincs, you get one anyway. The atomic instrinics also issue on Intel, because Intel forces it to happen, but they do not issue on ARM. I think this gives the 25% performance improvement.

Joky · on May 31, 2016

> This is also a strong argument for not releasing code and advertising it on HN if it's not mature.

I'm not totally convinced about that, cf: https://en.wikipedia.org/wiki/Release_early,_release_often

erichdongubler · on May 31, 2016

Uh...what about typedef?

typedef lfds710_freelist_element freelist_element;

liblfds · on May 31, 2016

I think typedefs are almost always harmful. The explanation is longish and not directly related to the library, so I'll leave it at that.

What I will say though is that the struct already has a type, so as far as I can tell, the benefit consists of not having to type the struct keyword?

Also, the typedef in particular you've given would make it impossible to concurrently include and link multiple versions of the library (which is the purpose of having the version in the public entity names).

cyphar · on May 31, 2016

> Also, the typedef in particular you've given would make it impossible to concurrently include and link multiple versions of the library (which is the purpose of having the version in the public entity names).

Honest question: why do you think that's a useful usecase? I've never thought of using two versions of the same library, mainly due to fear of nasal demons.

Also, you could do symbol versioning like glibc does and then just keep the old code in the library for backwards compatibility.

liblfds · on June 2, 2016

> Honest question: why do you think that's a useful usecase?

I worked for three years on software used by telcos. Carrier grade - they were rigourous about change control and validation, because bugs were totally unacceptable. For them, it would have been exactly what they needed.

Before that, I worked once for a while on a huge, unmaintainable code base. Awful code, basically one huge function, with a million lines of code =-) with that code, you should not risk ANY change. For them, it would have been essential.

In general, across all software I think software quality is not high. Developers do not test enough, and are too casual about change. I am taking an approach which I think is necessary, and it certainly deviates from the usual practise.

> Also, you could do symbol versioning like glibc

I'm not sure I properly understand - I need to read up properly on what this offers. However, it's not clear to me how it changes anything from the C code POV - in the C code, the user still needs to use the old versions of prototypes, to call the correct code in the library, so what do they get for symbol versioning? why not just link to the old binary? and it's important, because the old binary is completely unchanged and it is only by that, that revalidation is unnecessary and so does not have to be performed.

On a more practical level though, the library targets arbitrary compilers on bare metal platforms, and symbol versioning from what I read is rarely offered.

adekok · on May 31, 2016

Sure. Did he include the typedefs in his code?

Not that I can see (on a quick scan).

So the only way to use his library is to create your own wrapper functions. And update them every time he updates his library.

Thanks, but no thanks.

liblfds · on May 31, 2016

Could you explain why this is the only way to use the library? why are typedefs essential for use?

adekok · on May 31, 2016

Let's say that my application uses your library. Great! I code it, it works, and I release it for public consumption.

It get popular, so that Ubuntu packages the application and your library. Millions of people use it.

Now what happens when you release a new version of the library? All of the APIs with 710_foo become 720_foo. Which means that I have to rebuild my application to work with the library. I can't just link to the new version and get the bug fixes.

Even worse, Ubuntu has to rebuild all applications which depend on your library. Every time you release a new version.

This is why no one else puts version numbers into function / struct names. We rely on the C compiler to do type checks at compile time, and we rely on the linker to do symbol checks at run time.

If no one else puts version numbers into function names, it should be a strong hint that you shouldn't do that, either.

Similarly, 80-character symbols / macro names are just too complicated to use / remember. Pretty much no one else does this, either.

liblfds · on May 31, 2016

> I can't just link to the new version and get the bug fixes.

I think this isn't quite right, by omitting the down side - that you will be getting the not just the bug fixes, but new bugs. It's not all roses!

What has occurred between 710 and 720 is change. Ideally, hopefully, intended to be, positive change only. Bug fixes only. Improvements only. We all as software developers know there will in fact be new bugs introduced. Regression testings exists for that reason, and it is an imperfect method.

So I guess what I'm saying is this : nothing, NOTHING, should change for a given application, without revalidation. Swapping a library version under the hood is by that view not good practise. By and large of course this is type of change is absolutely normal practise, because most of the time it seems to work, and it less costly and time consuming.

(Really, it happens I think because humans are optimists and writing code is hard work and time consuming, so we involuntarily hope it's all fine and don't look hard for the bugs we've not found).

adekok · on June 1, 2016

> What has occurred between 710 and 720 is change.

As you note, the changes should be minimal. i.e. the versions should be compatible. By changing the names, you are creating massive change, in order to bludgeon users of your library with the knowledge that there might be minor changes in it.

This is counter-productive.

> Swapping a library version under the hood is by that view not good practise.

And re-naming everything is?

Please, get over yourself. Your practice runs counter to 40 years of computer engineering. There are hundreds of thousands of Open Source projects which keep consistent names across minor releases.

Either they're all wrong, every one of the millions of developers, or you are wrong.

I think my review ends here. You're clearly in the camp of "I know better than everyone else".

cyphar · on May 31, 2016

Having version numbers in struct and function names is what he's saying is a bad idea. Typedefs would be a way to achieve it, but you have to write a wrapper library to do that.

Why not just use symbol versioning like glibc does.

liblfds · on June 1, 2016

You have to remember what I'm looking to achive here is that a new version can be released, and used, while the existing codebase using the old versions is utterly unchanged. The APIs are unchanged, the library they link to is unchanged - totally, utterly, unchanged, not one single byte of difference.

I've not thought about (or really come across - I think I have seen it on Windows) symbol versioning, but from what I've understood (from a pretty terrible explanation online) it would mean the library ends up with a single library file, which gains new members as new releases of the library occur.

This still has the problem of imposing unvalidated change on the user - the library changes under him. What if I introduced new bugs?

cyphar · on June 1, 2016

Well, the solution to that is to maintain branches of your code for each major version change and to only introduce breakages in major versions.

Because your current method means that your library will never delete old code. Not to mention that I don't know how deep your obsession with versioning goes -- if you slightly change a basic function shared by different library versions what will you do then? Or do you have n copies of the same code everywhere? What do you do if there's a bug in it? And how the hell do you expect a distribution to maintain the code, because you are copying large chunks of code in every version with no net benefit before you even start working on new versions.

If you did major versioning right, then only people who link against version X of your library would use the version X symbols. Glibc uses symbol versioning to extend that further and allow you to change glibc versions under users and they will still be using the old functions because you have symbol versions. Unless you're talking about "changing the library under them at compile time" in which case I think that it's a bad idea to do it that way (people just won't update their code).

Alternatively, you could allow users to set a macro before including your headers. This macro sets the version of the library they want to use. Then you don't require people to specify the version in every single symbol.

liblfds · on June 1, 2016

> Well, the solution to that is to maintain branches of your code for each major version change and to only introduce breakages in major versions.

That is what is being done. Versioning is GCC style, "[major].[minor].[bugfix]". When APi breaking changes occur, a new major release is published, and is begins a new line of releases. When non-API breaking changes are made (or backported), a minor release occurs on all previous lines. Bugfix release can occur on previous lines.

So there's 6.0.0, then 6.1.0, and then a few months later 6.0.1 and 6.1.1 came out, with the same bug fix for both releases.

> Because your current method means that your library will never delete old code

Right. That is intentional. The idea is that when a new release is made, there is no change whatsoever in previous versions of the library which are in already in use.

> if you slightly change a basic function shared by different library versions what will you do then?

There are no shared functions, the library is stand-alone.

> If you did major versioning right

It would still require change in the binary being used by application, right? it would mean change, and that means revalidation - the only way it would not mean revalidation is if the application owner decided "what the hell, I'm sure he's got it right and there are no new bugs and it'll all be okay".

Anyway, you must alao remember the library targets bare metal platforms on arbitrary compilers. As it is for example the library is for example wholly independent of the standard library - it doens't even require a freestanding implementation. Symbol versioning is a luxury, high-end, rarely provided functionality. If the library relied on it, it would greatly limit the range of supported platforms.

> Alternatively, you could allow users to set a macro before including your headers. This macro sets the version of the library they want to use. Then you don't require people to specify the version in every single symbol.

That would defeat the very goal being aimed for. The idea is that existing code does not change when a new version of the library si released. The idea that when a new version is release, that all existing code should automatically use that new version, is to be avoided. You are trying to find ways in which it can happen, when that's the very thing which I'm trying to have not happen. That change should be controlled, should be consciously and deliberately chosen at a particular time, and to whatever extent is decided, by the maintainer of the application using the library.

Consider. Say an application is using the queue from 7.0.0 in say six different places. The code works. Why change it? but then there's a performance problem. One of those queues would really benefit from going faster. In this situation, why change all of the queues? those queues could be in completely different sub-components of the overall ssytem. When you update one of those queues, you have to revalidate that part of the system. You don't want to do it - you have limited developer resources, a to-do list the length of your arm and customers screaming for new features and other bugfixes.

cyphar · on May 31, 2016

Such an odd way to license free software. It's as though the author honestly thinks that licenses are bad for free software -- copyright has bad defaults and licenses allow us to make free software.

liblfds · on May 31, 2016

No. Property rights are absolutely, profoundly, completely, totally, utterly central, vital, irreplacable and essential. Licenses are an offshoot of property rights.

However, the arrangement of property rights as they are now being forced upon me - other entities telling me how I must go about managing my rights - this I do object to.

I know what I want to do with the property rights in this case, but I am often told there's all kinds of trouble in achieving this, because of these other entities enforcing their view of property rights upon me and the people who would form a contract with me (by choosing to use the software on the basis I am offering it).

cyphar · on June 1, 2016

You think that the copyright default of "all works are copyrighted essentially forever, and you cannot use, modify or share copyrighted works of any kind" is sane? Seriously?

If you want to talk about property rights, we can talk about those. But we're talking about copyright, which is a very different concept to property (despite attempts to create misleading terms like "intellectual property" to muddy the waters).

Copyright is supposed to be a method of encouraging creative works. It is not meant to be a method for software developers to maintain control over their users. I never said licenses were not important: YOU SAID THAT. Licenses are incredibly important -- free software couldn't exist without them because of draconian copyright laws funded by special interests that don't give a shit about creative works.

liblfds · on June 1, 2016

> I never said licenses were not important: YOU SAID THAT.

I thought you were saying I thought licenses were not important. I was trying to make it clear I think they are fundamentally important, by being an aspect of property rights.

cyphar · on June 1, 2016

Why is the title of the submission "license-less library"? If you agree with me that licenses are very important for free software, why even have that section on the front page?

liblfds · on June 1, 2016

Because until recently, the library was license-free.

Licenses are important as a concept. Licenses can exist, and when they do, they must be honoured. They are an aspect of the property rights of the owner of the software.

That doesn't mean you have to use one, and the library was not using one, so it was license-free.

There might be some confusion here in that when I speak of licensing being important, I mean the concept - I do not particularly mean licensing as it exists today in practise.

As was pointed out in an early post in this discussion, the matter is now not so clear cut. I've added per data-structure information on licensing, and it's no so straightfoward a situation now as to be able to say, in a blanket fashion, "license-free". I'm thinking about what to do about that. The library is much closer to license-free than anything else, but it's hard to explain the situation in the one-line description while keeping it short and to the point.

cyphar · on June 1, 2016

Why don't you just license it under a single free software license and call it a day? Why say "you have an infinite set of dual licenses"? If you really, really really don't care just use CC0.

I don't understand your focus on the "concept of licensing". It's such an odd thing to focus on with such vigour. In my mind, a license is a tool that allows me to ensure that users of my software have freedoms that they deserve. Some people use it as a tool for oppressing users. But licenses are a tool, simple as that. They are a consequence of copyright law. Copyright licenses have literally nothing to do with property rights (you can't put a copyright license on a goat or a piece of land).

p0nce · on May 31, 2016

Now we only need more lock-free problems :) Seriously, I'm pretty much content with regular uncontended mutexes and atomics in the vast majority of cases.

liblfds · on May 31, 2016

The lock-free binary tree on two cores is about 2x faster than the next best method. On sixteen cores, about 20x faster.

Why not use it, if it's packaged up in a library? it's only the difference between typing one API or another, unless there are functionality differences.

wahern · on June 1, 2016

A lock-free self-balancing binary tree (e.g. RedBlack or AVL) would be awesome. But AFAIK that's not possible using commodity hardware. In general, there aren't many places where you can use a regular binary tree when using externally-sourced data because it opens you up to computational complexity attacks. You could using something like SipHash to securely randomize keys, but then it won't be nearly as fast. And of outside of batch processing, who uses a binary tree without deletions?

The only lock-free data structure that is practical is a queue. And that's because a lock-free queue is really the only data structure that doesn't require you to re-architect your entire application stack according to the limitations of the lock-free data structure. And if you're going to re-architect your application, better to re-architect it around message passing, which mostly obviates the need for lock-free data structures, with queues, again, being the obvious exception.

But it's comparatively trivial to implement a lock-free queue using intrinsics, assuming it's a hot-spot in your application. Library dependencies are extremely costly in terms of the long-term maintenance of software projects. Supplying a lock-free queue in a single C file (or header) using common compiler intrinsics, OTOH, and which automagically can be dropped into most existing projects, might be much more useful.

liblfds · on June 1, 2016

Lock-free red-black trees now existing in the literature. They were invented recently, about two or three years ago now.

> And of outside of batch processing, who uses a binary tree without deletions?

IME, it's surprising how often even an add-only data structure is absolutely fine. For example, the benchmark application uses one to store the processor/memory topology, and perform operations upon it, such as generating interesting combinations of cores to execute the benchmarks upon.

One other thing which helps a little is that the tree elements have a key and a value, and the value can change at any time. So there's a bit of flexibility there.

However, of course, the real solution is proper support for delete. The library must have this, and it's pressing - but there's just so many other things which also need to be done, and which have been being done. For example, it's all very well having a tree, but without e benchmark, I have no idea if I've implemented correctly.

Where the benchmark was added in 7.1.0, I was able to see the implementation on ARM was utterly broken - running at about 10% of the speed of locking trees! I investigated, fixed it, they returned to running about 3x that of the next best solution - and that fix also gave about a 25% improvement on all other platforms.

> The only lock-free data structure that is practical is a queue.

There's quite a few lock-free data structure in the literature. Obviously the stack (it's ancient), then the queue (lots of variants thereof), deques, lists (singly-linked, doubly-linked, ordered, unordered, locality aware), then hashes (of vast importance, as they scale magnificently) and now trees.

> And that's because a lock-free queue is really the only data structure that doesn't require you to re-architect your entire application stack according to the limitations of the lock-free data structure.

What limitations do you have in mind? as far as I can see, the API for the lock-free queue is identical to the API for the locking queue, you just get more performance.

> But it's comparatively trivial to implement a lock-free queue using intrinsics

Mmm... I'm not sure I agree. The standard lock-free queue is from Michael and Scott. It takes a while to understand it, and you have to know - really know - all about memory ordering behaviour. I've seen more broken implementations of this data structure than working (although they work fine under low load, of course). My original implementation made the same mistakes.

Moreover, today, although I may well be completely wrong, I think I know of two bugs in that queue, both of which are fixed in liblfds. Getting beyond even the white paper implementation to realising there could be issues in the queue is another leap.

> Supplying a lock-free queue in a single C file (or > header) using common compiler intrinsics, OTOH, and which > automagically can be dropped into most existing projects, > might be much more useful.

To the extent this is true, it's true for all libraries, not just lock-free data structure libraries. It's a question of packaging. Do you have multiple C files and multiple header files, and compile them all up and have the user link, or do you bundle them all together into a single file (in this case so you have a single file per data structure, where the abstraction layers are duplicated between them) and ship that?

That single file is going to be some hundreds of kilobyets of code. Maintaining that arrangement would be quite painful (and error prone) for the library developer. What you'd need in fact is to maintain the code in the form of multiple files, and then have a release mechanism bundle them up.

In both cases, the users take your source file (or files) and add them to their projects. In the former case, the file is added to their build files. In the latter case, building the library is added to their build files, and then linking also.

p0nce · on June 1, 2016

Sure, but I somehow never encounter a problem where I need a real lockfree data structure other than a ringbuffer or a variation thereof. Interlocked queues are almost giving me enough.

liblfds · on June 1, 2016

What do you mean by interlocked queue?

On Windows, the atomic instrinsic APIs all begin with "interlocked", and although I'm not sure what you mean, it's making me think of that.

partycoder · on May 31, 2016

It would be good to know more about the thread safety of the structures.

liblfds · on May 31, 2016

Lock-free data structures are process, thread and interrupt safe (i.e. the same data structure instance can be safely used across processes, threads and both inside and outside of interrupt handlers).

partycoder · on May 31, 2016

How is this guaranteed? by which process was this safety declared?

liblfds · on May 31, 2016

To answer that, I need to basically explain the difference between lock-free and locking data structures.

If we consider a locking data structure, say a binary tree using a mutex to serialise access, we see a couple of things;

1. when a thread holds the mutex, no other thread can get any work done

2. if a thread holding the mutex is idled by the operating system, every other thread cannot get any work done on the tree until that thread is finally swapped back in

3. if a thread is holding the mutex when an interrupt occurs, the interrupt handler cannot use the tree, because the tree may be in an invalid state - the mutex holding thread could be right in the middle of adjusting a bunch of pointers

The fact that thread can be prevented from working by other threads is what is meant by saying it is a locking dats structure.

Lock-free data structures are implemented such that, broadly speaking (there are nuances in this which are not necessary here), all threads can continue with their work, regardless of how many other threads are using the data structure or whether or not they've been idled by the operating system.

In short, a lock-free data structure is written in such a way that all possible operations, and any number of them in any order can happen at ANY time, i.e. between the execution of one instruction and the next, and the data structure remains valid at all time.

It is this which makes the data structure process, thread and interrupt safe.

Typically, lock-free data strutures achieve this though the use atomic operations and careful design. (Superior designs also distribute memory accesses away from just one or a few pointers, an improvement which is necessary but insufficient for scalability).

abimaelmartell · on May 31, 2016

There's a different repository for every version :P

https://github.com/liblfds?tab=repositories

liblfds · on May 31, 2016

Yes. The reason for this is that every version of the library can be concurrently included and linked against. The reason for this is so that when a new version of the library is incorporated, existing code does not even have to be revalidated, because the code it is using is unchanged.

In this latest release, I've actually moved over to a single repo, which will gain a new directory for each new release. The reason for this is that the benchmark app builds and links against earlier version of the library, so it can benchmark them. The gnuplots show not only the performance of locking data strcutures running the same benchmarks, but also of the earlier versions of liblfds.

abimaelmartell · on May 31, 2016

Yeah, that doesn't sound right, if you are already using git, try to use branches, and tags.

Also the fact that the whole API uses the version number as some kind of namespace is weird.

liblfds · on May 31, 2016

If it does not, how can you concurrently include and link against multiple versions of the library?

wahern · on June 1, 2016

On Linux and maybe Solaris you can use ELF symbol versioning.

Otherwise, if the concern is both libfoo, libbar, and bin/acme linking to liblfds, then they should be linking it statically. Either liblfds or the depending code (libfoo, libbar, and bin/acme) can be compiled in such a way that liblfds symbols do not leak outside the component they're used in. On modern systems you're not limited to static or extern linkage scopes, even across compilation units.

Windows might be difficult. But shared, dynamically linked DLLs aren't common on Windows. At least, not in the sense where the DLL can be sourced independently. That's because historically DLLs were only compatible with code compiled with the same Visual Studio release (because CRT intrinsics and data structures were not backwards or forwards compatible), which meant that in practice you always bundled dependencies, even if dynamically linked, to ensure everything was compatible.

liblfds · on June 1, 2016

> On Linux and maybe Solaris you can use ELF symbol versioning.

Remember that the library supports bare metal platforms. It is intended to be used in the absence of an operating system.

I'm afraid I'm not sure I undersand what you've written.

My aim is to ensure that for applications using the library, that when a new release comes out, that they can use it while seeing absolutely no change in the code they are already using (header files, the binary being linked to, etc), as this is the and the only way to know that existing code will be wholly unaffected.

int_19h · on May 31, 2016

Look at other popular libraries, and consider how they solve this problem - and whether it even is a problem for them (and if not, why).

Usually, this sort of versioning is done if and only if a breaking API change occurs, and such changes are avoided if possible for that very reason. If you only rev the major version number on breaking changes, then you only need to reflect that number in your API (and even then only if/when you actually rev it).

liblfds · on May 31, 2016

I an of the view that it is a problem, but one which is almost wholly ignored and as such, not addressed.

Generally, software projects version as you have described, based on breaking the external API. If the API is unchanged, the library is deemed to be effectively unchanged.

I am not happy with this. ANY change in a library requires revalidation - and so the only way to deal with this is to have no change at all, not even in the library binary. To do otherwise is to assume correctness, which as we know, given the nature of software, this is something which is never always true.

This level of care about change is I think almost never used and normally software libraries take for themselves the advantages of fixing versioning at the API-breaking level - it makes development easier, at some of cost to the users of reliability (if the users accept the changes without revalidation) or at some cost to the users of work (if the users revalidate).

I think a library should act as a factorization - it should take into itself all possible work which would otherwise have to be performed by ALL its users, so that that multiplicity of work is converted into a single instance of that work, performed by the library authors.

glibgil · on May 31, 2016

Shouldn't this type of library be done in rust from now on?

liblfds · on May 31, 2016

I've heard of Rust. People seem to like it. However, I am a C programmer and have always been a C programmer, and I write the library for the pleasure of it, so other languages don't really connect.

monocasa · on May 31, 2016

This library has been around for many years, and predates anything resembling stable rust by more than half a decade.

liblfds · on May 31, 2016

God, I'm old.