x86 enforces (essentially) total store order across all sockets: http://www.cl.cam.ac.uk/~pes20/weakmemory/index3.html. The barriers are still useful for kernel code because other processors on the machine usually don't participate in the cache-coherency protocol.
Do you have a reference for the strong memory ordering on x86? I'd like to read more about it.