Maps and sets can have quadratic-time performance (lemire.me)
35 points by ingve on Jan 30, 2017 | 11 comments



A paper came up on HN recently showing that C arrays can exhibit O(log N) behavior for individual accesses if you antagonize the TLB hard enough. It got more discussion on Reddit than on HN, though.

https://arxiv.org/abs/1212.0703

https://news.ycombinator.com/item?id=12385458

https://www.reddit.com/r/programming/comments/509qhd/randoml...


Assuming your hash table is power-of-two sized, could your iterator step forward by (say) 19 elements at a time instead of one? Since 19 is relatively prime to any power of two, you'll eventually visit every element if you take the index modulo the capacity each time you step past the end. Is there a way to pick the step size so that the quadratic behavior never happens?
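
For concreteness, a minimal sketch of that stepping scheme over a hypothetical power-of-two bucket array (buckets and capacity are made-up names, not anything from the stdlib):

    // Stride through a power-of-two array by 19. Since gcd(19, capacity)
    // is 1, the index visits every slot exactly once before wrapping back
    // to its starting point.
    let capacity = 16                       // must be a power of two
    let buckets = Array(0..<capacity)       // stand-in for the table's slots
    var index = 0
    for _ in 0..<capacity {
        print(buckets[index])               // visit this slot
        index = (index + 19) % capacity     // take the remainder to wrap
    }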

EDIT: The bug description at https://bugs.swift.org/browse/SR-3268 mentions this solution but dismisses it due to poor cache locality.


If I'm reading right, this is the same problem as http://accidentallyquadratic.tumblr.com/post/153545455987/ru... , only it hasn't been fixed yet in Swift?


Yes. However, Rust's precise mitigation only works because it doesn't use hashcoding (Swift does). A possible mitigation similar to Rust's would be for each Dictionary to post-process hashes with a per-Dictionary key/salt.
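
A minimal sketch of that idea, with made-up names (this is not Dictionary's actual internals): a per-instance salt feeds a multiply-shift remix, so the same keys land in different bucket orders in different tables.

    import Foundation

    struct SaltedTable {
        // Per-instance random salt, forced odd so the multiply permutes.
        let salt: UInt64 = (UInt64(arc4random()) << 32 | UInt64(arc4random())) | 1
        let capacityBits = 10               // capacity 1024, for illustration

        func bucket(forHash hash: Int) -> Int {
            // Remix keyed by the salt and keep the *high* bits, so equal
            // low bits in raw hashes no longer imply equal buckets.
            let mixed = UInt64(bitPattern: Int64(hash)) &* salt
            return Int(mixed >> UInt64(64 - capacityBits))
        }
    }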


Thanks, this is a much clearer description of the problem!


I wonder if there is a way to provoke hash collisions. A while back there was a bug in Python dicts that let you do exactly that: construct a bunch of strings that all hash to the same bucket, feed them to an application that stores them in a dict, and you could cause extreme resource consumption, IIRC. The remedy was to add a random (per-process) seed to the hash function. It would be interesting to see whether Swift/ObjC has this vulnerability, too.
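
A sketch of that remedy (hypothetical names; this is not something Swift 3's stdlib exposes): seed the string hash with a value chosen once per process, so a colliding set of keys can't be precomputed offline.

    import Foundation

    // Chosen once per process, in the spirit of Python's PYTHONHASHSEED.
    let processSeed = UInt64(arc4random()) << 32 | UInt64(arc4random())

    // Toy seeded string hash (FNV-1a folded with the seed); a production
    // implementation would use a keyed hash such as SipHash.
    func seededHash(_ s: String) -> UInt64 {
        var h = processSeed ^ 0xcbf29ce484222325    // seeded offset basis
        for byte in s.utf8 {
            h = (h ^ UInt64(byte)) &* 0x100000001b3 // FNV-1a prime
        }
        return h
    }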


Could we get "swift" added to the title?

Thanks!


Alternate proposal: it's the iteration over the set, in its internal order, that's causing the slowdown.

    // Build: insert the values 1...size into the set d.
    let nanobuild = Swimsuit.nanotime() {
        for i in 1...size {
            d.insert(i)
        }
    }
vs.

    // Copy: insert d's elements into dcopy, in d's iteration order.
    let nanocopy = Swimsuit.nanotime() {
        for i in d {
            dcopy.insert(i)
        }
    }
Clearly the biggest difference between the two snippets is where the inserted values come from. If he changed the latter loop from "for i in d" to the original "for i in 1...size", I suspect he'd see equivalent performance.
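
That is, something like this (reusing the article's variables):

    let nanocopy = Swimsuit.nanotime() {
        for i in 1...size {                 // same values, but not in d's
            dcopy.insert(i)                 // internal iteration order
        }
    }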

This code does not demonstrate the issue with hash maps outlined in the article. I'm not familiar with Swift's libraries, but I bet there's a load-factor parameter; the author could set it to >1.0, load a bunch of data in, and then the code would show what the article claims.


There's a load factor in Swift's Dictionary, but it's hardcoded. Same for Rust. (I've maintained both implementations.)

I assure you the bug is as described. I wrote up the initial bug report and what needed to be done to mitigate it here [1].

Most critically, my analysis across different sizes demonstrates that it's ultimately a consequence of how hashes are truncated modulo the table capacity. Increasing the size without increasing capacity (i.e. a higher load factor) can in fact decrease execution time.
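
Concretely, the truncation looks something like this (a simplified sketch, not the actual stdlib code):

    // With a power-of-two capacity, the bucket index is just the low bits
    // of the hash (same as hash % capacity for nonnegative hashes).
    func bucket(hash: Int, capacity: Int) -> Int {
        return hash & (capacity - 1)
    }

Iterating a table therefore yields elements roughly grouped by the low bits of their hashes; re-inserting in that order into a smaller, growing table sends long runs of keys into the same bucket, which is where the quadratic probing cost comes from.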

[1] https://bugs.swift.org/plugins/servlet/mobile#issue/SR-3268

(sorry this comment is poorly written, my phone was about to die; read the issue for a proper discussion)


To be fair, my hash table implementations always keep an nth-key iterator cache to make the default case of 0..n fast. If the standard library doesn't do this, I'd call that an implementation miss.
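
Hypothetically, something like this (not what Swift's standard library actually does):

    // Remember the last (n, index) pair so sequential lookups of element
    // 0, 1, 2, ... advance one step each instead of rescanning from the
    // start of the table.
    struct CachedSet {
        private let storage: Set<Int>
        private var cachedN = 0
        private var cachedIndex: Set<Int>.Index

        init(_ s: Set<Int>) {
            storage = s
            cachedIndex = s.startIndex
        }

        mutating func element(at n: Int) -> Int {
            if n < cachedN {                // cursor overshot: restart
                cachedN = 0
                cachedIndex = storage.startIndex
            }
            while cachedN < n {             // advance the cached cursor
                cachedIndex = storage.index(after: cachedIndex)
                cachedN += 1
            }
            return storage[cachedIndex]
        }
    }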


I hope this isn't too off topic, but this is the first time I've read Swift and, to my eyes, it looks gorgeous. I had no idea iOS apps could be written in such a nice-looking language. I could never stomach Objective-C.



