Even if one programs in assembly, such features are a "black box", because instruction sets rarely provide any instructions for controlling them, apart from cache prefetch hints, which the CPU is not obliged to honour.
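To show how weak that one control is, here is a minimal C sketch of a software prefetch hint. It assumes GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` builtin emits a prefetch instruction where the target has one; on other compilers the hypothetical `PREFETCH_READ` macro expands to a no-op. The prefetch distance of 16 elements is an arbitrary assumption. Because the CPU is free to drop the hint, the result is identical either way.

```c
#include <stddef.h>

/* GCC/Clang builtin: rw = 0 (read), locality = 3 (keep in all cache levels).
 * Other compilers: expand to nothing so the code still builds. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)(p))
#endif

/* Hypothetical look-ahead distance in elements; would need tuning per machine. */
#define PREFETCH_DISTANCE 16

long sum_with_prefetch(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only a hint: the CPU may ignore it without affecting the result. */
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&a[i + PREFETCH_DISTANCE]);
        total += a[i];
    }
    return total;
}
```

Calling `sum_with_prefetch` on an array holding 0..99 returns 4950 whether or not the hints are honoured; the only thing the hint can change is timing.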
I do not think one has full control over out-of-order (OOO) execution on x86-64 processors. Nor do I believe one can control the execution pipeline even in assembly, although I am not sure exactly what you mean by that, so this may just be a misunderstanding.
And headers for the memory subsystem (DDR SDRAM)! I know there is usually cache-line-sized interleaving that repeats across the memory channels, but it would be nice to be able to reliably control which memory access goes to which memory module.
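A simplified model of that cache-line interleaving, as a C sketch. It assumes 64-byte cache lines, plain round-robin assignment of consecutive lines to channels, and a physical (not virtual) address. Real memory controllers often hash higher address bits into the channel choice, and the exact mapping is generally undocumented, which is why software cannot rely on it. `channel_of` is a hypothetical helper, not a real API.

```c
#include <stdint.h>

/* Assumed cache-line size in bytes (64 on current x86-64 parts). */
#define CACHE_LINE_BYTES 64u

/* Round-robin model: line 0 -> channel 0, line 1 -> channel 1, ...
 * wrapping after num_channels lines. Ignores any address hashing. */
unsigned channel_of(uint64_t phys_addr, unsigned num_channels)
{
    return (unsigned)((phys_addr / CACHE_LINE_BYTES) % num_channels);
}
```

With two channels, addresses 0 to 63 map to channel 0, 64 to 127 to channel 1, and 128 wraps back to channel 0.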
With NUMA, the physical memory layout gets even uglier: usually each socket's memory is one big contiguous chunk, but sometimes it is interleaved every 4 kB.
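The two NUMA layouts can be modelled the same way. This C sketch assumes every node has the same amount of memory in the chunked case (and ignores address holes such as the PCI MMIO region), and a fixed 4 kB round-robin granularity in the interleaved case. `numa_node_chunked` and `numa_node_interleaved` are hypothetical helpers, not an OS or firmware API.

```c
#include <stdint.h>

/* Assumed interleave granularity: 4 kB. */
#define NUMA_INTERLEAVE_BYTES 4096u

/* Chunked layout: node 0 owns [0, node_bytes), node 1 the next chunk, etc. */
unsigned numa_node_chunked(uint64_t phys_addr, uint64_t node_bytes)
{
    return (unsigned)(phys_addr / node_bytes);
}

/* Interleaved layout: consecutive 4 kB blocks rotate across the nodes. */
unsigned numa_node_interleaved(uint64_t phys_addr, unsigned num_nodes)
{
    return (unsigned)((phys_addr / NUMA_INTERLEAVE_BYTES) % num_nodes);
}
```

On a two-socket machine with 1 GiB per node, address 2^30 + 5 lands on node 1 in the chunked layout, while in the interleaved layout the node flips every 4096 bytes.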
Additionally, not all C compilers provide such headers.
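One way to cope with missing headers is to detect them at compile time and fall back to doing nothing. This C sketch assumes a compiler that supports `__has_include` (standard in C23, available earlier in GCC and Clang), tested with `#if defined` first so older compilers skip it. It also guards on the target architecture, since an x86 intrinsics header can be present on the include path yet still refuse to compile for other targets. `prefetch_hint` and `HAVE_XMMINTRIN_H` are hypothetical names.

```c
/* Only consider the x86 intrinsics header when targeting x86. */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  if defined __has_include
#    if __has_include(<xmmintrin.h>)
#      include <xmmintrin.h>
#      define HAVE_XMMINTRIN_H 1
#    endif
#  endif
#endif
#ifndef HAVE_XMMINTRIN_H
#  define HAVE_XMMINTRIN_H 0
#endif

static inline void prefetch_hint(const void *p)
{
#if HAVE_XMMINTRIN_H
    /* SSE prefetch into all cache levels; still only a hint. */
    _mm_prefetch((const char *)p, _MM_HINT_T0);
#else
    (void)p;  /* header unavailable: silently skip the hint */
#endif
}
```

Either branch leaves program behaviour unchanged, so code using `prefetch_hint` stays portable even where the header is missing.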