The author is incorrect in the section about memory fences. x86 has strong memory ordering [1], which means that writes always appear in program order to other cores. Use a memory fence when you need to guarantee that reads and writes are visible on the memory bus.
The example that the author gives does not apply to x86.
[1] There are memory types that do not have strong memory ordering, and if you use non-temporal instructions for streaming SIMD, SFENCE/LFENCE/MFENCE are useful.
Modern x86 machines do not quite have strong memory ordering. Loads may be reordered with older stores to different locations. This breaks some of the overly clever "lock-free" algorithms:
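To make the store/load reordering concrete, here is a minimal sketch of the usual "store buffering" litmus test in C++. The function name `count_both_zero` and the `use_fence` flag are made up for illustration; they are not from the article or from any library. Each thread stores to one variable and then loads the other. Without a fence, both threads are allowed to read 0, because each load may complete before the other core sees the older store. With a sequentially consistent fence between the store and the load (which compilers typically implement on x86 with MFENCE or a locked instruction), that outcome is ruled out.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test (illustrative sketch, hypothetical names).
// Returns how many iterations ended with BOTH threads reading 0.
int count_both_zero(bool use_fence, int iterations) {
    std::atomic<int> x{0}, y{0};
    int both_zero = 0;

    for (int i = 0; i < iterations; ++i) {
        x.store(0);
        y.store(0);
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            // Full fence: keeps the load below from passing the store above.
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        a.join();
        b.join();

        // r1 and r2 are safe to read here: join() synchronizes with each thread.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(true, n)` should always return 0. Calling it with `false` may or may not return a nonzero count on a given run: spawning a thread per iteration makes the race window small, so a zero result there does not show the reordering is impossible, only that it was not observed.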
x86 enforces (essentially) total store order across all sockets: http://www.cl.cam.ac.uk/~pes20/weakmemory/index3.html. The barriers are still useful for kernel code because other processors on the machine usually don't participate in the cache-coherency protocol.