Heavier things, like graphics processing or sound mixing, were typically written in assembler.
Some things, like texture mapping, you could only write in assembler, because register pressure forced you to use the low and high halves of the x86 word registers (AL and AH) as separate variables. Spilling to the stack could have caused a 50%+ slowdown.
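To make the register pressure concrete, here is a hypothetical affine span loop sketched in C (the function and names are mine, not code from any actual engine): u, v, their deltas, the texture pointer and the destination pointer are all live at once, which is more than the handful of general-purpose x86 registers, so the hand-written asm packed e.g. the integer and fractional parts of a coordinate into AL/AH.

    #include <stdint.h>

    /* Hypothetical inner loop of an affine texture mapper, 256x256 texture,
       16.16 fixed-point u/v. Six values live across every iteration; in
       hand-written x86 asm you'd split 8.8 coordinates across AL/AH, BL/BH
       to keep everything in registers. */
    void draw_span(uint8_t *dst, const uint8_t *texture,
                   uint32_t u, uint32_t v, uint32_t du, uint32_t dv, int count)
    {
        while (count--) {
            /* texel index from the integer parts of v (row) and u (column) */
            *dst++ = texture[((v >> 8) & 0xFF00u) | ((u >> 16) & 0xFFu)];
            u += du;
            v += dv;
        }
    }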
486 era you needed assembler to work around quirks like AGI stalls.
On the Pentium the reason for assembler was to use the FPU efficiently in parallel with the integer code (an FPU divide per pixel for perspective correction). Of course you also needed to carefully hand-optimize for the Pentium U and V pipes. If you did it correctly, you could execute up to 2 instructions per clock. If not, you lost up to half of the performance (or even more if you messed up register dependency chains, which were sometimes a bit weird).
One also needs to remember that compilers in the nineties were not very good at optimization. You could run circles around them in assembler.
Mind you, I still need to write some things in assembler even on modern x86, but it's very little nowadays. SIMD stuff (SSE/AVX) you can mostly do in "almost assembler" with instruction intrinsics, without needing to worry about instruction scheduling and so on.
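To show what I mean by "almost assembler": each intrinsic maps to one instruction, but the compiler does the register allocation and scheduling. A minimal sketch (the function name and the alignment/length assumptions are mine):

    #include <stddef.h>
    #include <xmmintrin.h>  /* SSE */

    /* Add two float arrays four lanes at a time. Assumes 16-byte aligned
       pointers and n being a multiple of 4. */
    void add_f32(float *dst, const float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(a + i);            /* movaps */
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(dst + i, _mm_add_ps(va, vb)); /* addps + movaps */
        }
    }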
>486 era you needed assembler to work around quirks like AGI stalls.
Plus, nobody had a 486 in the 80s (it was released in 1989). People would be lucky to have a 286, but usually just some home computer (Apple II, Spectrum, Commodore 64, Atari ST, Amiga 500, Amstrad CPC, etc).
Yeah, back then assembler was even more pervasive. It was the only way to write something publishable on 8-bit systems. Well, there were some action games (like Beach Head on the C64), plus trivia and adventure games, written in BASIC.
You spent a lot of time on 8-bitters getting code and asset size down so that it'd even fit on the machine in the first place. Forget about luxuries like divide and multiply instructions: the most advanced math those things could do natively was little more than add, subtract and bitwise ops (and, or, xor) on two 8-bit numbers. Even shifts and rotates could only move 1 bit left or right at a time.
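For example, with no multiply instruction you built one out of those 1-bit shifts and adds (or out of lookup tables); roughly what a hand-rolled 6502/Z80 routine did, sketched here in C:

    #include <stdint.h>

    /* 8x8 -> 16-bit multiply by shift-and-add, the way you'd hand-roll it
       on a CPU that can only add, shift one bit at a time, and branch. */
    uint16_t mul8(uint8_t a, uint8_t b)
    {
        uint16_t result = 0;
        uint16_t acc = a;      /* running a * 2^i */
        while (b) {
            if (b & 1)         /* low bit of b set? */
                result += acc;
            acc <<= 1;         /* one 1-bit shift per step, like ASL/ROL */
            b >>= 1;           /* like LSR */
        }
        return result;
    }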
CPU clocks were measured in low single-digit MHz. On top of that, 8-bitters were very inefficient: each instruction took 2-8 clock cycles (6510) or 4-23 cycles (Z80).
On a PAL C64 you have 19656 clock cycles per 50 Hz frame, minus 25 bad lines. So you could realistically expect to execute only 5-8k instructions per frame. If you spent all the frame time just copying memory, you'd be able to move only about 2 kB (a plain load/store copy costs roughly 8-10 cycles per byte). Just scrolling the character RAM (ignoring color RAM) took half of the available raster time (yes, I know about VSP tricks, but those weren't known in the eighties).
16-bit systems allowed some C, but most Amiga and Atari ST games were written in assembler. I'd guess the same is true for the 286 era, but I'm not sure.
I'm not sure this is correct but: I remember reading or hearing rumours that Psygnosis' Barbarian (for the Amiga and Atari ST) was written in C and that it was so slow because of that.
What I mean is: we perceived it as being slow because of the rumours. Just typical nerd attitude that we still see now.
Sure, some games were written completely in C. I believe a significant number were hybrids, with the performance-sensitive parts in assembler and the rest in C.
Some games were prototyped in C and optimized afterwards.
For example, Amiga Turrican 2 required a 33 MHz 68030 CPU during the development phase. Of course, the final version ran fine on a 7 MHz 68000 Amiga.
The last time I used SSE intrinsics (GCC 4.9, I think), I had a lot of trouble with register usage. It looked like it was compiling everything down to a single SSE register instead of parallelizing across them.
I tried the same algorithm on godbolt with some clang versions and it was slightly better, using two or three registers, but not by much. So I ended up breaking it out into inline assembly.
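For reference, the kind of escape hatch I mean: GCC extended asm with the "x" constraint, so you pick the instruction while the compiler still allocates the xmm registers. This is just an illustrative sketch, not the actual code:

    #include <xmmintrin.h>

    /* Force a specific SSE instruction; "x" means any SSE register, so
       the compiler still chooses which xmm registers to use. */
    static inline __m128 add4(__m128 a, __m128 b)
    {
        __asm__("addps %1, %0" : "+x"(a) : "x"(b));
        return a;
    }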
> It looked like it was compiling everything down to a single SSE register instead of parallelizing across them.
Yeah, that's a common problem and it leads to nasty dependency stalls. MSVC is horrible in the same way, at least the 2015 version; I haven't tried newer ones yet. Intel's ICC seems to generate good code most of the time.
Yes, it has. I've written a lot of SIMD code and spent a good amount of time reading the compiler assembly output and there has been huge improvement over the last decade.
GCC register allocation wasn't great, then it got better with x86 SSE but still sucked at ARM NEON, and now it seems to be decent with both.
Clang was better at SIMD code before GCC was. It was equally good with SSE and NEON.
In my experience, compilers are much better than humans at instruction scheduling. Especially when using portable vector extensions, you don't have to write the same code twice and then tweak the scheduling for every architecture separately.
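By portable vector extensions I mean something like GCC/Clang's vector_size attribute: you write the operation once and the compiler lowers and schedules it for each target (SSE/AVX on x86, NEON on ARM). A tiny sketch:

    /* One source, many targets: the compiler emits mulps/addps, vfmadd,
       fmla, etc. and handles the per-architecture scheduling. */
    typedef float v4f __attribute__((vector_size(16)));  /* 4 x float */

    v4f madd(v4f a, v4f b, v4f c)
    {
        return a * b + c;  /* element-wise */
    }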
> In my experience, compilers are much better than humans at instruction scheduling.
It'd be more accurate to say they're much better than humans when the heuristics (or whatever they use) work. Sometimes the compiler messes up badly.
The workflow is often to compile and then examine disassembly to see whether the compiler managed to generate something sensible or not.
Another issue is that the compiler's pattern matching sometimes fails and doesn't generate the right SIMD instruction, even when the data is aligned to the SIMD width. For example, I recently saw ICC fail to generate a horizontal add in the most basic scenario imaginable. *shrug*
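Not the exact code I hit, but the shape of it: a final four-lane reduction that you can always force by hand with the SSE3 intrinsic when the compiler won't pattern-match the scalar loop:

    #include <pmmintrin.h>  /* SSE3 */

    /* Sum the four lanes of v: two haddps steps leave the total in lane 0.
       A compiler may emit this (or a shuffle/add sequence) from the
       equivalent scalar loop, but sometimes it doesn't vectorize it at all. */
    float hsum4(__m128 v)
    {
        v = _mm_hadd_ps(v, v);
        v = _mm_hadd_ps(v, v);
        return _mm_cvtss_f32(v);
    }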
Things like this make me question the wisdom of ever using higher-level languages. We abstracted our description of what we want away from processor instructions so that the same code could compile on multiple architectures without changes, but the reality is that we still often need to special-case things even without performance considerations. And the further we abstract, the more performance seems to suffer, and the more often we end up jumping through abstraction hoops rather than getting things done.
The minimalist in me wonders if maybe just using some kind of macro system on top of assembler plus a bytecode VM with the ability to drop to native instructions wouldn't ultimately be better.
Things are much harder nowadays due to complexity: from CPU cores that are almost impossible to understand, to massive amounts of third-party code, to modern requirements (IoT, ouch!).
Debugging a predictable single-threaded, single-core system was also child's play compared to distributed, networked beasts running on lots of cores and thousands of threads.
Nineties problems were contained in a small box. Oh, and there was no internet like today, so you had to order books and magazines, and use BBSes and Usenet. Even then, a lot of it was reinventing the wheel again and again.
Modern problems are sometimes nearly uncontained (think software like web browsers, etc.).
Just jumping in to agree here. I've always thought of it this way:
In the 90s, the code I wrote was more difficult. Today coding is much easier (better languages, tooling, etc.). However, the systems I build are many times more complex.
In the 90s we hired the best programmers; today we hire the people who are best at managing ambiguity and complexity.
All of this is very hand-wavy as there are certainly still disciplines where pure programming skill is most important, but those seem to be fewer and fewer every day.
Anyway, I envy you guys; nowadays things tend to be too high-level and you lose that holistic view of the computing machine.
Back then you felt pretty skilled, I'm sure; now I often feel anybody could do my job. Unless you're working for the big 4 or similar, many jobs don't give you that excitement.
Back then assembly language was much more ergonomic than it is now, and the machines were simpler. Many of my favourite games in the 80s were made by teenagers programming in their bedrooms. Check out this 68000 assembly tutorial video from Scoopex, the Amiga demo scene group: https://youtu.be/bqT1jsPyUGw
He gets a simple graphical effect going on the Amiga in only a few lines of assembly. Doing the same thing using DirectX in C++ would take you all day!
This is an interesting line of argument - in what way, and what could be done to improve the ergonomics?
> He gets a simple graphical effect going on the Amiga in only a few lines of assembly. Doing the same thing using DirectX in C++ would take you all day!
This is absolutely true, but in something like ShaderToy you can go back to producing complex pixel-bashing effects with a huge amount of processing power.
It's just that the external Tower of Babel from boot to usability has got a lot larger.
My notion of ergonomic mostly comes from programming the Amiga in 68000 and then moving to the PC and being horrified by x86!
In 68k you had eight 32-bit data registers (d0-d7) and eight address registers (a0-a7).
If you wanted to work with bytes, 16-bit words, or 32-bit longwords, you could do so like this:
move.w #123,d0          ; move a 16-bit number into d0
move.b #123,d1          ; move a byte into d1
move.l #SOME_ADDRESS,a0 ; set address reg a0 to point to a memory location
move.b d1,(a0)          ; move the contents of d1 to the memory location a0 points to
Nice and easy to work with and remember.
On x86, thanks to its long and convoluted history, you have all kinds of doubled-up registers that you have to refer to by different names depending on what you're doing, and tons of historical cruft.
Beyond the CPU, the old home computers had no historical cruft, and it was very easy to talk to the hardware or system firmware; usually you'd just be reading and writing fixed memory locations. I can read an Amiga mouse click in one line of 68k; I've no idea how you'd do it on a modern PC, or even in Java! Modern systems just aren't as integrated, for better and worse.
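For the curious, that "one line" is just a bit test on a memory-mapped CIA register; here it is sketched in C (Amiga-specific address, no OS involved, function name is mine):

    #include <stdint.h>

    /* Left mouse button = bit 6 of CIA-A port register A at $BFE001,
       active low. Just a fixed address, no driver, no API. */
    static int left_button_down(void)
    {
        volatile uint8_t *ciaa_pra = (volatile uint8_t *)0xBFE001;
        return (*ciaa_pra & 0x40) == 0;
    }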
Assembly language was also part of mainstream programming back then. You'd learn BASIC, then go straight to assembly if you wanted to do anything serious. So there were computer magazine articles on assembly, and even children's books [1]. My first assembler, Devpac, came from a magazine coverdisk with a tutorial from Bullfrog, Peter Molyneux's old game company [2].
So there were a whole range of cultural and technical reasons why assembly language was much more of a human-usable technology back in the day.
> It's just that the external Tower of Babel from boot to usability has got a lot larger.
Yes I agree, I kinda miss being able to see the ground, which is probably why I find retro programming so appealing.
> This is an interesting line of argument - in what way, and what could be done to improve the ergonomics?
Due to the enormous complexity of modern CPUs, I'm not sure there's anything that could be done. With the 486 and contemporary (and earlier) uarches you could largely expect the CPU to execute exactly what you wrote, so understanding the performance impact of any given bit of assembly was pretty straightforward. Then CPUs started adding features like superscalar, speculative, and out-of-order execution, branch prediction, deep pipelines, register renaming, and multi-level caching that massively complicate modeling the performance of any given code.
For example you may need to explicitly clear an architectural register before reusing it for a new calculation to avoid creating a false dependency in the uarch which would prevent the CPU from executing the calculations in parallel. Knowing when this is necessary can be hard and the rules are usually different between different uarches, even within the same uarch family.
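A concrete instance of the register-clearing case, written with intrinsics for readability (the exact rules differ per uarch, as noted): cvtsi2ss only writes the low lane of its destination, so without the explicit zeroing the result carries a false dependency on whatever last wrote that register.

    #include <xmmintrin.h>

    /* Zeroing the destination first breaks the dependency chain; this
       typically compiles to xorps xmm,xmm followed by cvtsi2ss. */
    static inline __m128 int_to_ss(int x)
    {
        return _mm_cvtsi32_ss(_mm_setzero_ps(), x);
    }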
Good assembly programmers who are aware of all this complexity can still beat compilers, but they certainly can't do that at the scale of code that compilers routinely handle. Thankfully compilers are generally "good enough" these days, and assembly only needs to be hand-written for the very hot inner loops of performance-critical code, or for cryptographic code where the exact performance characteristics could leak information if not handled correctly.
> Back then assembly language was much more ergonomic than it is now, and the machines were simpler.
In a way that's true. I also enjoyed writing assembler back then, especially on 68k.
However, to really extract the last cycle you often ended up generating a block of code at runtime (a kind of very basic JIT).
Sometimes self-modifying code provided the final oomph to get something running fast enough. The reasons varied: sometimes it was running out of registers, sometimes dynamically changing the operation performed without a branch.
Too bad for the later CPUs with instruction prefetching and caching...
Have you looked through the SDL source, though? Sure, I can get a window open and paint lines very easily, writing only ~50-150 LOC myself. However, I've silently added thousands of lines (and at least one DLL) to my project. The library only hides some of the complexity (which I love about it), but the complexity still exists.
My point is, if you're allowing arbitrarily deep abstraction you can make almost any task trivial by just using something that basically already does what you're trying to do. The point the parent was making was how, with almost no abstraction at all, you could do graphics on an Amiga in a few lines, but to do the equivalent on a modern system at the lowest reasonable level of abstraction takes a lot more effort. Comparing that to a giant abstraction layer is missing the point.
> you can't use OpenGL without it.
Yes you can. You can't even pretend that coding once against SDL will absolve you of having to deal with platform issues, it just helps a lot.
> and creation of 3D images in bmp is excellent exercise you should try.
I'm a hobbyist game dev who almost exclusively uses software rendering (albeit to a framebuffer that gets pasted onto the screen with OpenGL as the path of least resistance). I've also written image libraries. None of this has anything to do with the parent comment.