Show HN: A hash array-mapped trie implementation in C (github.com/mkirchner)
171 points by mkirchner on July 10, 2023 | hide | past | favorite | 56 comments
Long-simmering side project that is finally ready to see the light. HAMTs are a cool persistent data structure and implementing one has been a lot of fun. Beyond the code, there is likely some value in the extensive and largely complete implementation docs; basic benchmarks are linked in the README, too.

Kind of aiming to be "the libavl for HAMTs". That is obviously a high and aspirational bar but a distinct possibility if it stirs up a little interest and/or contribution.

Anyways, it's time for this to go out, collect feedback and maybe even some use outside of toy projects. Let me know how it goes.




I first discovered HAMTs when Erlang added support for first-class maps, and it was sort of a "holy shit!" moment for me. They felt like the "holy grail" of data structures: I can treat any update as a "copy" without the cost of a copy.

About a year later, I learned Clojure, and fell even more in love with the data structure; when the language fully embraces a useful data structure, it changes the way you think about the entire program, and now it's sort of hard for me to go back to languages that don't have a good HAMT implementation.

I mean, I still do it, but I do think that having a "go to standard" in C really has the potential to set a great precedent.


Just a note about your 'exported memory allocation' API:

    struct hamt_allocator {
        void *(*malloc)(const size_t size);
        void *(*realloc)(void *chunk, const size_t size);
        void (*free)(void *chunk);
    };
This whole thing could just be:

    struct hamt_allocator {
        void *cookie;
        void*  (*realloc) (struct hamt_allocator* h, void* chk, const size_t size);
    };
With the following constraints:

1. `realloc(H, nullptr, N)` -- allocates N bytes

2. `realloc(H, p, 0)` -- frees the pointer p

3. `realloc(H, p, N)` -- resizes the pointer p

And, the user has access to a 'context' (`cookie`) so they can use a (for instance) pool allocation scheme. Personally, I like a slightly different API:

    struct hamt_allocator {
        void *cookie;
        void*  (*realloc) (struct hamt_allocator* h, void* chk, const size_t oldsize, const size_t newsize);
    };
With the following constraints:

1. `realloc(H, nullptr, 0, N)` -- allocates N bytes

2. `realloc(H, p, N, 0)` -- frees the pointer p

3. `realloc(H, p, N, M)` -- resizes the pointer p

But I know a lot of people get confused and/or don't like having to pass (& thus keep) so much information to the allocator.


Sadly, this pattern doesn't work with standard realloc() anymore. C23 makes this undefined behavior due to existing non-conforming implementations.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf


The pattern was never specified as fully working. Not in C99 and C90.

It's due to the following reason: it was never specified that realloc(ptr, 0) behaves like free(ptr).

The case of size == 0 is not separately discussed in the C99 description of realloc.

realloc(ptr, 0) can behave like (free(ptr), malloc(0)), where malloc(0) doesn't necessarily behave like ((void *) 0). Malloc may return a unique object that may be later freed.

That is to say, realloc can effectively reduce an object to zero size without freeing it, or possibly even free an object only to replace it with a different, unique zero-sized object.

Note that in the case when ptr == NULL, which is discussed in C99, realloc behaves like malloc. So in the case realloc(NULL, 0), we know that it's the same as malloc(0). That's a case when size == 0, and realloc is required to behave like malloc.

This is a standard-conforming realloc, and always has been:

  void *realloc(void *oldptr, size_t newsize)
  {
    size_t oldsize = __allocated_size(oldptr); // this handles NULL, and returns 0

    void *newptr = malloc(newsize);

    if (newptr != NULL && oldptr != NULL) {
       memcpy(newptr, oldptr, min(newsize, oldsize));
       free(oldptr); // a successful realloc releases the old block
    }

    return newptr;
  }
This realloc will not behave like free if malloc(0) returns unique pointers.

If you want a realloc-like function that behaves like an "all in one allocator", you have to write your own:

   void *git_reset_of_allocators(void *oldptr, size_t newsize)
   {
      if (newsize == 0) {
        free(oldptr);
        return NULL;
      } else { 
        // handles all other cases OK
        return realloc(oldptr, newsize);
      }
   }


I stand corrected! Thanks :)



> Sadly, this pattern doesn't work with standard realloc() anymore

It never really did and it wasn't supposed to.


I would drop the API directly and concentrate on the algorithm.

If users integrating HAMT need a different allocator situation, they can solve that problem by themselves, without a run-time indirection shim.

You can help those users by providing some macros somewhere like #define hamt_malloc(ctx, x) malloc(x) and so forth, so it can be retargeted in one place. Leave an ignored context argument in place for those who would like to have a run-time switch per-instance.


I used macros to override the allocation functions in my libraries before, but later switched to runtime callbacks; same for logging, btw (I'd need to look through the issues list for the exact reasons that led to this decision). In any case, passing callbacks as part of the initialization "feels" right, and I haven't heard any complaints since the switch (and it's definitely a lot more convenient for the library user).

Also, if the library calls those functions so frequently that the pointer indirection would be a performance problem, then there's arguably something very wrong with the library's design.


I've seen this type of API design in C before, but not with a context. I'm curious where HAMT would get the ctx instance to pass to hamt_malloc in this design?


If the macro doesn't use the argument, it doesn't have to exist.

Currently the code does use macros, e.g. mem_alloc(h->ator, size).

Firstly, I'd hide the detail of how the allocator is derived from h into the allocator wrapper routines and just make it mem_alloc(h, size). It's mem_alloc which can do h->ator.

Now the nice thing is that h always exists. So even if mem_alloc ignores the first parameter, we don't have to pass something fictional as an argument:

  #define mem_alloc(h, size) malloc(size)
then remove the h->ator and related cruft. Someone who needs a context for their allocator can hack that in.

By the way, if you're going to have allocator providers, you want:

  #define mem_alloc(h, size) h->ator->alloc(h->ator, size)
                                            ^^^^^^^
and not:

  #define mem_alloc(h, size) h->ator->alloc(size)
like the code has it now.

Thou shalt not define a C callback interface without a context pointer.

Without a context pointer, you cannot have an allocator module where you can bind different allocator arenas to different objects.



The biggest advantages here are not having a single function, but instead:

1. Having a context.

2. Having free and realloc receive the old size. Not passing the size is widely regarded as a mistake in the C stdlib.


The advantage of the alloc/realloc/free interface is that you can simply plug in the standard functions malloc, realloc and free though, and any other allocator implementation which follows this convention without your own wrapper function inbetween.


> But I know a lot of people get confused and/or don't like having to pass (& thus keep) so much information to the allocator.

That's not why I think it's a bad idea.

I prefer the original because:

1. I can use the existing `free`, `malloc` and `realloc` implementations. With a function that takes a different set of params I cannot do that; the user always has to write the `realloc` function you propose.

2. The interface for free, malloc and realloc is already documented; I don't have to explain to programmers what constraints their memory allocator function must satisfy. Under your proposal the programmer now has to have, and read, the documentation.


I generally like this pattern, but why pass the allocator instead of the cookie?


If you want to "shim" the API, then it's easier to have the whole previous object, rather than just the cookie.


You don't need the cookie then. You can just allocate a larger struct (sort of subclassing it). You save a pointer indirection.


There's a limit to what can be crammed into a HN comment!


I have discovered a truly marvelous allocator pattern which this HN comment is too small to contain.


[350 years later] I have confirmed the optimality of the allocator pattern as a special case in my proof of the Inter-universal Teichmüller theorem (Springer, 879pp).


Fair enough:

    struct hamt_allocator {
        void *(*realloc)(struct hamt_allocator *h, void *chk, size_t oldsize,
                        size_t newsize);
    };
    struct my_allocator {
        struct hamt_allocator parent;
        size_t used;
        char buffer[8192];
    };
    static void *alloc_func(struct hamt_allocator *h, void *chk,
                            const size_t oldsize, const size_t newsize)
    {
        if (!newsize) {
            return NULL;
        }
        if (chk && newsize <= oldsize) {
            return chk;  // shrinking in place is always fine
        }
        // this cast is allowed by the standard: a pointer to a struct
        // may be converted to a pointer to its first member and back
        struct my_allocator *alloc = (struct my_allocator *) h;
        if (newsize > sizeof(alloc->buffer) - alloc->used) {
            return NULL;
        }
        void *ret = alloc->buffer + alloc->used;
        alloc->used += newsize;
        if (chk && oldsize) {
            memcpy(ret, chk, oldsize);
        }
        return ret;
    }
    void func_taking_allocator(struct hamt_allocator *h);
    void user_code() {
        struct my_allocator alloc = { .parent.realloc = alloc_func };
        func_taking_allocator(&alloc.parent);
    }


While this was not the point of the comment, one thing missing is that oldsize and newsize should be rounded for alignment purposes:

    struct my_allocator {
        struct hamt_allocator parent;
        size_t used;
        union {
            char buffer[8192];
            max_align_t _align;
        };
    };
    
    static inline size_t align_size(size_t value)
    {
        size_t alignment = _Alignof(max_align_t);
    
        // if (value % _Alignof(max_align_t) == 0) {
        //     return value;
        // } else {
        //     return ((value / alignment) + 1) * alignment;
        // }
    
        return (value + (alignment - 1)) & ~(alignment - 1);
    }
Even better, the allocator function should receive the alignment, so as not to waste bytes unnecessarily.


Beautiful, thanks!


The solution is obviously to realloc the comment.


Good datastructure, code looks quite clean for C.

Your API is missing some of the advantages over hash tables though. Because it's a tree, operations like union and difference of two instances can be sublinear in the size of the instances. E.g. union can copy subtrees when the other instance is empty at the corresponding position.

You're also missing a batch construction call, create a new tree out of N key/value pairs. That's much faster than inserting one at a time because you can sort the array up front and then create the tree without any temporary nodes.

The function pointers in the interface are probably difficult for the compiler to devirtualise. Changing that in C means macros or code generators though, does some damage to ease of use.

Thanks for sharing it


Thank you, happy to share. These are excellent pointers.

Regarding the batch construction, it's not immediately clear to me how to implement the sorting, since the order is implicit in the hash function (it seems one would need to construct a trie to build a trie?), but I might be wrong...


Hash the keys before/while sorting. The repeated hashing with different seeds is an interesting approach to collisions but would add some annoyance here.

Roughly do just enough work to put the key/values in the same order that you would see them in when iterating through the corresponding trie.

Another way to go would be to implement merge, then do the batch construction by partitioning the initial array, building tries out of the pieces, then merging them. That could also be the fallback for when the first hash collides. If the partitioning were by the first five bits of the hash you'd get a reasonable approximation to doing the sort.


How does this compare to https://github.com/arximboldi/immer (other than the C/C++ difference)?

Also, it's my understanding that, in practice, persistent data structures require a garbage collector in order to handle deallocation when used in a general-purpose way. How does your implementation handle that?

Also, have you seen https://github.com/cnuernber/ham-fisted ? I think there are a few other Java-based persistent collections as well in the overall Clojure ecosystem that also improve on Hickey's original implementation, but I can't recall them now…


> How does your implementation handle that?

This is explained in the readme[1]. You can pass custom allocation functions (malloc, realloc, free), so you can plug in Boehm, for example.

[1]: https://github.com/mkirchner/hamt#memory-management


So refcounting?


Boehm is not refcounting, it's a tracing GC.


Boehm is not tracing, it's a conservative GC.

PS admittedly, terminology is not precise enough in this space.


I would say that the terminology is pretty good in this space, and BDWGC is a conservative tracing GC. A tracing GC determines reachability by following (i.e. tracing) chains of references. A precise GC will retain exactly the set of reachable objects, a conservative GC will retain potentially more.

Tracing GCs might be precise (or not), they might handle internal-pointers (or not), they might move the data (or not), they might handle variable sized allocations (or not).


Are these the same as Crit-bit trees [0]?

[0]: https://cr.yp.to/critbit.html

Edit: I think I understand now from the design section on your page, they are both tries, but the HAMT uses the hash of the key to locate the node whereas Crit-bit uses the key itself.


Crit bit tries are also, therefore, sorted. They can be iterated in order.

See also https://dotat.at/prog/qp/README.html


Very cool!


How do HAMTs compare with more recent designs like Swiss Tables? [1]

[1] https://abseil.io/about/design/swisstables


Swiss Tables are hash tables. O(1) lookup, but expensive to copy.

HAMTs are hash tries. O(log(n)) lookup, but persistent / cheap to copy.

They are not really comparable, since hash tables are not persistent.

In functional languages, persistent data structures are MUCH more natural to work with. Persistent HAMTs were originally created for the Clojure standard library, IIRC.

HAMTs lend themselves to more elegant/performant implementations than self-balancing trees, since they don't need to rebalance as long as your hash function is good.

Also, HAMTs generally have a high branching factor, so the search can be as fast as a hash table for small-to-medium maps. Though I don't know of any HAMTs using SIMD tricks like Swiss Table.

(EDIT: I guess the popcnt thing that HAMTs do would be considered a SIMD trick. Larger registers would allow the branching factor to be raised.)

Like hash tables, HAMTs don't require intermediate key comparison for lookup.

The original HAMT paper is a good read: http://infoscience.epfl.ch/record/64398/files/idealhashtrees...


> HAMTs are hash tries. O(log(n)) lookup, but persistent / cheap to copy.

O(log(k)) might be a clearer bound than n.


For a hash trie, the depth is bounded by the log of the number of elements: O(log(n)).

I think O(log(k)) would mean that the bound is based on the size of the largest key. This may be true of regular tries, but not of hash tries.


Completely unrelated.

The primary advantage of HAMTs is that they're persistent: immutable with cheap updates, yet with efficient lookup and good cache behaviour thanks to the dense nodes and high branching factor.


Not much to add other than this is cool and the README was very informative. Nice work!


I have spotted a couple of ways to improve performance:

Increase the size of the bitmap in each node from 32 bits to 64 bits. You are wasting 4 bytes per node, and wider nodes mean fewer indirections for each lookup.

Change the recursion in the lookup to iteration. You used iteration in other traversals, and it should be much more efficient.


Thanks! Yes, basically log_64(n) vs. log_32(n), and it would also require switching to a 64-bit hash function and adjusting the hash exhaustion and bit-fiddling arithmetic accordingly.

Re the impact of the recursion, that's actually zero for the search code since clang does a tail call optimization; it's a fair point for the removal code.


Bagwell’s HAMT paper is one of my favourite data structure papers! Thanks for sharing!

In case this is unfamiliar topic - immutable value based programming is super cool because it simplifies the cognitive load of reasoning about your program - hence enabling you to write better programs per-unit-of-effort consumed.


If you're interested in persistent data structures for C++, then I highly recommend Immer. High quality and easy to work with.

https://github.com/arximboldi/immer


I just did a bit of a write-up on HAMTs and you can find it here: https://photonlines.substack.com/p/grokking-hash-array-mappe...

I actually included your repo as well in the mentions at the end of the article so hopefully it helps :)


Nice! I think you should really add a userdata member to hamt_allocator. Not every memory allocator is global!


Agree. I was trying to mostly mimic the libavl API, hence the design decision.


That's cool! I recently had GPT-4 write me a HAMT in C++, and it somewhat works, a lot of the basics are there. I ran out of time but am planning on fixing it up with instruct. It missed a lot of stuff and its tests run but they aren't very good.


Doc fix: the Iterators section repeats a code block with these declarations from the previous section:

  size_t hamt_size(const struct hamt *trie);
  const void *hamt_get(const struct hamt *trie, void *key);


Is enum { N = 5; }; legal C now, with the semicolon inside the braces?


Nice work! what were some of the biggest challenges getting this to work?


#1 doing it in many 20-30min batches with little kids

#2 see #1

Jokes aside, IMHO there was not a single challenge that stood out in terms of implementation; getting the pieces to work together and finding residual bugs was hard. LLDB was my friend, but I still miss valgrind on the Mac. Writing (and re-writing) the docs really helped with mental clarity, and not having a deadline to stress about didn't hurt either. I often just closed the lid and made it my future self's problem, with good success ;-)

Oh, and one thing I am proud of: how well the recursive search generalized to path copying. That came together very nicely.


I like how readable the code and docs are. Thanks for sharing.



