Hacker News

I was hoping for more information on how they implement it. There's nothing in there about which hardware facilities they use, and they say that at worst STM degrades to a global lock for the process.

Hopefully they're at least using some kind of compiler analysis so that two transactions only share a lock when they might touch the same memory addresses (determined pessimistically, of course)?




GCC will probably not use any specific hardware facilities, which means this is probably going to be implemented with regular atomic operations.

Within a transaction block, the results of all reads are stored (in local, hidden variables). When the transaction is about to finish, all reads are repeated, and if any of them yields a different result, the transaction is restarted. When the transaction is committed, there will likely be some kind of global lock (held only very briefly).

As GCC probably doesn't want to depend on any particular threading or locking library, the write lock will most likely be a spinlock built from an atomic read-modify-write plus some kind of yield hint (monitor/mwait on newer CPUs, pause on older ones).

As far as I can see, there really aren't lots of other methods to implement STM, especially from within the C compiler.


The article hints that GCC uses a combination of HTM and STM, if HTM is available.


The paper I linked to claims they have an STM, and a hybrid HTM-STM system if hardware support is available.


Since you can't do all reads in one atomic instruction and you also need to make it atomic with the write (CAS), wouldn't that still require a lock for the whole operation?


As long as you write to separate parts of memory and are cautious with freeing memory you don't need locks for reads.


How can you make sure there are no writes to the memory you are reading from?


Don't write to the same location. Everything needs to be behind a pointer, but updating a pointer is an atomic operation. E.g., assume a is (a pointer to) an integer.

  a->(0x00010001)->5
  a->(0x00030001)->6
You can keep reading 0x00010001 and getting 5 all day, even though a is "actually" 6 by now. This also works with strings, objects, etc.; the only downsides are that you tend to eat up a fair amount of memory, and you need to avoid freeing 0x00010001 while something still thinks a's value is stored there.


You still can't read or write more than two pointers atomically on x86_64, so my question remains.


All you need to do is read the location the pointer points to with an atomic operation. So allocating a 50kb string would work the same way, as long as you can store it in a specific process's memory.

  PX a=(0x00010001)    //which points to 5
  P0 x0=a=(0x00010001) //which points to 5
  P1 y0=a
  P1 pointer y1=0
  P1 y1=malloc(sizeof(int))
  P2 x1=a=(0x00010001) //which points to 5
  P1 *y1=*y0+1         //i.e. 6
  P3 x2=a=(0x00010001) //which points to 5
  P1 a=y1              //you could use a CAS to verify that a == (0x00010001), but if you don't care about lost updates you can also do this as a plain atomic store
  P3 x3=a=(0x00030001) //which points to 6
And once x0, x1 and x2 stop pointing to (0x00010001) you can free that memory. The assumption is that each xN and yN is a process-local variable, preferably a register.


I couldn't really follow your example. I can't see how this solves race conditions where multiple threads can intermingle those instructions at will. That's the basic problem with lock-free algorithms.

Did you maybe suggest adding one more indirection, so that all variables in a transaction live in one memory block behind a single pointer, which can be atomically updated to point at a newly allocated copy?

I wonder what the overhead of that would be. All the allocations plus either keeping track of references or doing garbage collection...


Sorry, I will try to be clearer. There are some good descriptions of this technique on the web, but I can't remember what it's called. Still, you have the basic idea and its downsides. Note that normally you're dealing with larger objects, so the overhead is a little different.

Just think about how you normally work with objects. Normally you have a pointer to some allocated memory, locationObjectPointer = (0x00100), which points to a chunk of memory the size of 3 floats (x, y, z). When you update x and y, you normally overwrite their memory locations, which is fast in single-threaded programming, but you need to worry about something reading after you update x at (0x00100) but before you update y at (0x00104).

However, suppose that instead of updating that memory block in place, you created a new object, locationObjectPointer2 = (0x00600), copied what was in the original object, and then, when you're done, changed the pointer from the old object to your new one. As you say, you have the overhead of keeping track of references or doing garbage collection, but at the same time you get a very fast commit: when you change the pointer, you can simultaneously test whether the object was changed underneath you. That test is actually hard to do with most kinds of memory management systems and normally creates a lot of overhead, so it's a trade-off.



