
Ordering is not the only concern here. std::set actually provides a logarithmic worst-case guarantee, whereas std::unordered_set does not. This is a factor to consider depending on the application, regardless of whether ordering is necessary. Whichever one prefers in any case, though, is beside my point—I'm merely trying to use trees and hashtables to illustrate a far more general CS phenomenon that can occur in lots of data structures and languages.


If performance is a concern, you should still avoid std::set by default, though. A logarithmic worst case isn't much use when it's just always slow.

There may be a benchmark out there where std::set can beat std::unordered_set, but you'll be hard-pressed to find it.


Sure, but this isn't a benchmarking paper.


Imho "but technically..." is not a valid objection when in practice the access is O(1) on average. Yeah, sure, it becomes linear if your hash is "return 42;", but the standard can't guarantee that you supplied a good hasher.


It's not just "technically". A worst-case guarantee is required for some applications. Hash maps have better amortized complexity, but some implementations have bad worst-case time.

Java uses balanced trees instead of linked lists in its chained-hashtable implementation, if I recall correctly.


There are lots of reasons to prefer trees (and, correspondingly, lots of reasons to prefer hashtables); I just pointed out that ordering isn't the only one, and I merely gave another (the worst-case performance guarantee). Yet another one is iterator stability: insertion can invalidate iterators of unordered containers, making them useless for some applications that need to analyze data efficiently as they insert.

I could go on, but it's very much detracting from the point of the paper to argue about the data structure choice when the paper isn't on this topic at all or trying to analyze any particular application to begin with.


"Technically" is a very valid objection if you're writing software that needs to be robust and secure in the face of user input. std::set is guaranteed to be logarithmic in the number of comparisons in the worst case, without exception.

std::unordered_set has some very difficult-to-reason-about worst-case scenarios that must factor in both the size of the collection and the hash function. Unless you are very careful about the kind of hashing you use, you can easily be vulnerable to denial-of-service attacks where each operation scales as O(k * n) for key size k and container size n.


Note that this isn't limited to user-supplied bad hashers. The hash for std::string in the GNU library is not much harder to attack for collisions than your example. It also hashes the whole string, so it is slow on long strings. I haven't investigated whether other STL implementations are any better.



