This is one thing that amazed me about early mainframes: the hardware was the API. It was designed to be used, not abstracted over and cleaned up. In the modern world each abstraction layer fixes problems of the layer below while exposing its own flaws, which are then fixed by the layer above. With exokernels, nanokernels, picokernels and unikernels maybe we can get some of those qualities back. If an abstraction becomes pervasive enough, then we can make hardware that implements the abstract hardware interface.
The evolutionary growth of the PC platform provided for some powerful things. But we've lost a bit of the "engineered system" principle in the process.
Like the Apple II, I suspect such an integrated system was a pain to expand. There were later Amiga OS releases, so perhaps it was possible, but I don't know how well they maintained compatibility with the original programs.
As long as one programmed through the OS API, things were guaranteed to work. Even the AGA CustomChips were backwards compatible with ECS, just as ECS was backwards compatible with OCS: the addresses were the same, and so were the control/function bits in them. Newer ChipSets simply added additional CustomChip registers. The biggest source of incompatibility on the Amiga was the processors, as newer ones introduced instruction and data caches: software which used self-modifying code, a concept taken over from the C=64, would crash instantly because the instruction cache was out of sync with memory.
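To make the cache issue concrete, here is a minimal, hypothetical C sketch of the flush such code needed once the 68020+ caches arrived; it assumes the exec.library CacheClearU() call (Kickstart 2.0 / V37 and later), and patch_opcode is an illustrative name, not an OS function:

    /* Sketch only: patching an instruction goes through the data side,
       but the CPU may still execute a stale copy from its instruction
       cache unless the caches are flushed afterwards. */
    #include <exec/types.h>
    #include <proto/exec.h>

    void patch_opcode(UWORD *code, UWORD new_opcode)
    {
        *code = new_opcode;  /* write lands in RAM / data cache        */
        CacheClearU();       /* exec.library V37+: flush caches so the
                                patched instruction is refetched       */
    }

On a cacheless 68000 the first line alone was enough, which is why so much C=64-style code got away with it.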
Hardware expansion was and remains a breeze with the AutoConfig protocol and the Zorro-II and Zorro-III expansion buses. Many people run their Amigas with additional sound and graphics cards. There are even PCI bridge expansion cards.
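For illustration, a rough sketch of how the AutoConfig'd boards can be walked through expansion.library; FindConfigDev() with -1/-1 matches any manufacturer and product, the printed fields come from the standard ConfigDev structure, and error handling is kept minimal:

    /* Sketch: list the Zorro boards AutoConfig bound at boot. Each
       board announces a manufacturer/product ID and the OS assigns
       it an address range. */
    #include <exec/types.h>
    #include <libraries/configvars.h>
    #include <proto/exec.h>
    #include <proto/expansion.h>
    #include <stdio.h>

    struct Library *ExpansionBase;

    int main(void)
    {
        struct ConfigDev *cd = NULL;

        ExpansionBase = OpenLibrary("expansion.library", 36);
        if (ExpansionBase == NULL)
            return 20;

        /* -1, -1: match any manufacturer and any product */
        while ((cd = FindConfigDev(cd, -1, -1)) != NULL)
            printf("Board %u/%u at %p, size %lu\n",
                   (unsigned)cd->cd_Rom.er_Manufacturer,
                   (unsigned)cd->cd_Rom.er_Product,
                   cd->cd_BoardAddr,
                   (unsigned long)cd->cd_BoardSize);

        CloseLibrary(ExpansionBase);
        return 0;
    }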
My main concern is that if you have a publicly documented chip that explicitly concerns itself with timing, it becomes very hard to shave off cycles in the future.
The Apple II had similar problems. Woz moved a lot of stuff to software to save chips, but it caused compatibility issues in the future.
I can't help wondering if this is the root of the schism forming around Linux.
Where more and more developers, and more and more "admins", are less than interested in the underlying hardware, because all they see during the day is piles upon piles of VMs and containers that abstract away the actual hardware being used.
And then there is a dwindling group of people with at least a token interest in the hardware, but they are either being ignored or ostracized by the abstract software people.
That is a side effect of the UNIX culture, because the hardware doesn't matter as long as you have C and POSIX.
Even OpenGL and audio APIs made it so that, most of the time, the cards you have plugged in are irrelevant.
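As a trivial illustration: the same portable GL calls run on whatever card happens to be plugged in, and about the only place the hardware shows through is an informational string (sketch assumes a current GL context already exists):

    /* Sketch: vendor-neutral OpenGL; the driver hides the actual card. */
    #include <stdio.h>
    #include <GL/gl.h>

    void print_gpu_info(void)  /* call with a GL context made current */
    {
        printf("Vendor:   %s\n", (const char *)glGetString(GL_VENDOR));
        printf("Renderer: %s\n", (const char *)glGetString(GL_RENDERER));
    }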
Being an old-timer who also spent quite some time with Amiga users/devs back in its golden age, it took me a few years to realise that, when it comes to desktop graphics programming, the macOS and Windows communities are much more welcoming than the UNIX-focused ones.
You see this quite clearly on macOS: those that came from the Mac OS days (pre-OS X) focus on UI/UX and the whole experience as a dev, taking advantage of a unified software/hardware stack.
Those that came from BSD/Linux just use CLI tools as if they were in any other UNIX.
Which is one reason why the demoscene never thrived on GNU/Linux.
I think it has less to do with C and POSIX, and more with the fact that Unix was multi-user from day one: you can't have someone "bitbang" the hardware when there are multiple users accessing it.
Also, I think perhaps you're conflating issues here.
While, sure, they may be more interested in the GUI (though you will find plenty of TUI programs in the Unix world, and even some in Windows that have been inherited from DOS), the newer generation is less likely to be interested in the actual hardware beyond whether it works to run their precious "apps".
And IMO the demo scene was largely "dead" by the time Linux came on stage anyway.
This is because the fixed hardware models of the C64 and the like were being supplanted by the mix-and-match PC, where the only commonality is the BIOS and the CPU ISA.
So you're saying the open ISA still doesn't allow access to the command processor? What sort of things does that preclude? Suppose you decided to write a game directly to the hardware instead of via DirectX or OpenGL, or a molecular dynamics simulation directly to the hardware instead of via CUDA, what would happen?
If you tried to do it directly, you'd a) find it very hard as it's largely undocumented, and b) it'd probably break when the vendors decide to change it without warning on their new GPUs.
As for applications, well, I dunno. One particular example would be Render To Vertex Buffer, something ATI cards used to expose an extension for but NVidia cards didn't, even though basically _any_ GPU could do it if the driver decided to expose it.
A better use would be to allow the GPU to read the game's scene structure directly without even needing an API in-between.
I wasn't suggesting a way to write your own command buffers, but instead a way to write your own code that reads those command buffers on the GPU side (i.e. running on the GPU's command processor).