Hacker News new | past | comments | ask | show | jobs | submit login

>If we set _foo to 0 and have two threads that both execute incl (_foo) 10000 times each, incrementing the same location with a single instruction 20000 times, is guaranteed not to exceed 20000, but it could (theoretically) be as low as 2.

Initially it seemed to me the theoretical minimum should be 10000 (as the practical minimum seems to be). But it does indeed seem possible to get 2:

1) Both threads load 0

2) Thread 1 increments 9999 times and stores

3) Thread 2 increments and stores 1

5) Both threads load 1

5) Thread 2 increments 9999 times and stores

6) Thread 1 increments and stores 2

Is there anything in the x86 coherency setup that would disallow this and make 10000 the actual minimum?




I think not. The only limit seems to be the number of times one CPU can increment while the other is between load and store operations.

If you could manipulate your two cores with enough dexterity, you could force a 2 result, but without enough fine-grained control, the practical limit is going to be determined by the number of increments that can be "wasted" by sandwiching them between the load and store operations of the other thread. You would essentially need to stop and start each CPU at four very specific operations.

The practical results show that "load + add + store + load + add + store" on one thread probably never happens during a single "add" on the other thread. You would need that to happen at least once to get below 10000. Otherwise, each increment can waste no more than one increment for the other thread, and you end up with at least 10000.

The experimental numbers are probably indicative of how long the add portion of INCL takes in relation to the whole thing.


Unless you use a "lock inc", increment is semantically a read-modify-write, so I believe those steps can technically happen on x86. However, since a thread cannot be interrupted in the middle of an instruction, it's very unlikely that 9999 operations can happen in one thread while another executes an increment. If you wrote it as a mov, an add, and another mov, all it would take is a context switch at exactly the right time.

On the other hand, 20000 is unlikely as well, if the threads have any concurrency at all.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: