
Ordering is not the only concern here. std::set actually provides a logarithmic worst-case guarantee, whereas std::unordered_set does not. This is a factor to consider depending on the application, regardless of whether ordering is necessary. Whichever one prefers in any case, though, is beside my point—I'm merely trying to use trees and hashtables to illustrate a far more general CS phenomenon that can occur in lots of data structures and languages.


If performance is a concern, you should still avoid std::set by default, though. A logarithmic worst case isn't much use when it's just always slow.

There may be a benchmark out there where std::set can beat std::unordered_set, but you'll be hard-pressed to find it.


Sure, but this isn't a benchmarking paper.


Imho "but technically..." is not a valid objection when in practice the access is O(1) on average. Yeah, sure, it becomes linear if your hash is "return 42;", but the standard can't guarantee that you supplied a good hasher.


It's not just "technically". A worst-case guarantee is required for some applications. Hash maps have better amortized complexity, but some implementations have bad worst-case time.

Java uses balanced trees instead of linked lists in its chained-hashtable implementation, if I recall correctly.


There are lots of reasons to prefer trees (and, correspondingly, lots of reasons to prefer hashtables); I just pointed out that ordering isn't the only one, and I merely gave another (the worst-case performance guarantee). Yet another one is iterator stability: insertion can invalidate iterators of unordered containers, making them useless for some applications that need to analyze data efficiently as they insert.

I could go on, but it's very much detracting from the point of the paper to argue about the data structure choice when the paper isn't on this topic at all or trying to analyze any particular application to begin with.


"Technically" is a very valid objection if you're writing software that needs to be robust and secure in the face of user input. std::set is guaranteed to be logarithmic in the number of comparisons in the worst case, without exception.

std::unordered_set has some very difficult-to-reason-about worst-case scenarios that must factor in both the size of the collection and the hash function. Unless you are very careful about the kind of hashing you use, you can easily be vulnerable to denial-of-service attacks where each operation scales as O(k * n) for key size k and container size n.


Note that this isn't limited to user-supplied bad hashers. The hash for std::string in the GNU library is not much harder to attack for collisions than your example. It also hashes the whole string, so it is slow on long strings. I haven't investigated whether other STL implementations are any better.



