For CPUs, you need to allow programming at a much lower level, at the very least, because the workloads are so much more diverse. The average programmer will never write anything that cannot be specified in C, but they will certainly use such code (operating systems, for one). A good CPU needs to be versatile and flexible, but most of its internal machinery doesn't need to be exposed to third-party developers.
Even CPUs' architectural transparency has its limits. You could squeeze a few percent out of your parallel code if you knew exactly how writes and reads were ordered on a particular system. As a rule, however, whoever made your CPU will not tell you; there's the 'trade secrets' thing, sure, but more importantly, telling you would impede their ability to do things a slightly different but better way next time, because your code would rely on the old behavior in ways that cannot yet be automatically detected or fixed. Instead, they specify a memory consistency model[1][2][3], an interface you can code to once and rely on for future generations.
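To make "code to the model, not the chip" concrete, here is a minimal sketch in C++11 atomics. The names (`payload`, `ready`, `message_passing_demo`) are made up for illustration, and it uses the C++ language-level memory model rather than any particular vendor's hardware model; the compiler maps the release/acquire pair onto whatever the target CPU actually requires.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

// Message passing: one thread writes data and then sets a flag,
// the other waits for the flag and then reads the data.
int message_passing_demo() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;
        // Release: the write to payload may not be reordered after this store.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&seen] {
        // Acquire: once we observe true, the producer's earlier writes are visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The model guarantees that `message_passing_demo()` returns 42 without you knowing anything about store buffers or cache protocols. If both operations were `memory_order_relaxed` instead, reading `payload` would be a data race, and whether it "happened to work" would depend on exactly the kind of undocumented ordering behavior the vendor wants to stay free to change.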
[1] http://en.wikipedia.org/wiki/Consistency_model
[2] http://en.wikipedia.org/wiki/Memory_ordering
[3] http://en.wikipedia.org/wiki/Weak_consistency