> The undocumented C9 opcode is identical to the documented CB, far return instruction.
I remember this.
Once ... gosh, it's hard to believe how long ago that was now, but once I knew the entire Z80 opcode table by heart. I could read Z80 machine code and disassemble it in my head. High school crowded that stuff out — I went to a special math school, and it was very, very hard. Except... C9 was RET (and the Z80 is an extension of the Intel 8080). That is burned into me so deep I still remember it, across more than 35 years. I will be 30 ;) years old in two weeks.
There is a plaque with a hex dump at the entrance of the Computer History Museum in Mountain View, California.
I immediately recognized the opcodes on it because I knew Z80 and they were in 8080 code (I don't remember seeing any ED B0 or similar Z80 extensions). 01, 11, 21, C3 and CDs all over. I think it was either a boot loader or BASIC ROM startup code, but I'm not sure.
8080 probably implies Altair BASIC, because there was a “(C) MICROSOFT” string in the code.
As a total aside, domain expertise is kind of amazing.
Like, you can look at a string of opcodes from a processor that you haven’t touched for a decade-plus, and still recognize what’s going on.
It’s important to remember that applies in all fields.
That bumpkin that grew up on a farm?
Sure, he might not be able to write an award-winning essay, but he could probably tell you everything you need to know about your local soil chemistry and its suitability for various crops just by smelling it.
In a similar vein, C000 == 49152 has been burned into my brain. I recognize it immediately, because that’s where most C64 assembly programs were loaded, followed by a “SYS 49152” to run them.
A000 for example I recognize, but can’t tell you as quickly what it is in decimal as I can with C000.
I just know 49152 on another level, as SYS49152 was part of very early childhood. No thinking at all necessary, even as trivial as 40960=10x4096. 49152 is ingrained.
Nope, it's because it's the only 4K of memory that BASIC did not use and that wasn't mapped under the system ROM, so it was a common place to put assembly language code to be loaded with a BASIC program, or to be used while the BASIC interpreter was running. Cartridge ROM can appear at various places, since cartridges have access to the memory bus, so $C000 isn't the one fixed home for cartridge ROM.
I grew up behind the Iron Curtain; the ZX Spectrum we had was smuggled into the country by my parents in 1985. What printer...? There was no printer to buy! What's more, when the data cable broke we had to solder it ourselves, because what today looks like a bog-standard 3.5mm mono cable was impossible to replace. It was not available, and that was the end of the story.
A lot of the undocumented instruction "match" bits make sense, such as POP CS at [0F]. However, it's clear that the authors of the microcode deliberately made every opcode match to some routine, as evidenced by Jcc being mirrored into the [60..6F] region, LOCK into [F1], group 2 /6 into /7, etc. It wouldn't've "cost" anything extra to make Jcc only match on its documented [70..7F] region (and others) like later processors do. What's the advantage of matching on undefined byte sequences?
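The kind of matching in question can be pictured as mask/value comparison, where cleared mask bits are "don't cares" — a hypothetical sketch, not the 8086's actual decoder logic:

```python
def matches(byte, value, mask):
    # Compare only the bits selected by mask; cleared bits are "don't care".
    return (byte & mask) == value

# 8086-style Jcc: bit 4 is a don't-care, so 60-6F mirrors 70-7F.
jcc_8086 = [b for b in range(256) if matches(b, 0x60, 0xE0)]

# Later-CPU-style Jcc: bit 4 is compared, so only 70-7F matches.
jcc_later = [b for b in range(256) if matches(b, 0x70, 0xF0)]
```

Dropping one bit from the comparison doubles the matched range, which is exactly the [60..6F] mirror described above.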
Also, SALC is still technically undocumented by Intel (AMD documents it, though). It doesn't have a dedicated section in the SDM (would be in Volume 2, Chapter 4), and in the opcode map (Volume 2, Appendix A), there's a blank there. One actually has to go to Volume 3, Chapter 23 "Architecture Compatibility", Section 15 "Undefined Opcodes" (of version 080 from June) to see it mentioned. It's weird. They even call it out as SALC "when not in 64-bit mode" and that it performs "IF (CF=1), AL=FF, ELSE, AL=0", but refuse to officially document it.
There's a reason why every 8086 opcode matches something. If an instruction didn't match anything, I think the microcode engine would spin idle and the instruction would never terminate. (You need a RNI micro-instruction to end microcode processing.) Having your processor lock up due to a bad opcode is something that the designers presumably explicitly avoided.
The 6502 on the other hand, didn't take such precautions. There are opcodes that cause the internal timing state machine to sort of fall off the end, causing the CPU to lock up and even an interrupt won't rescue you. You need a RESET signal.
Also the 6800, with its famous HCF (Halt and Catch Fire) instruction:
> With the advent of the MC6800 (introduced in 1974), a design flaw was discovered by programmers. Due to incomplete opcode decoding, two illegal opcodes, 0x9D and 0xDD, will cause the program counter on the processor to increment endlessly, which locks the processor until reset. Those codes have been unofficially named HCF. During the design process of the MC6802, engineers originally planned to remove this instruction, but kept it as-is for testing purposes. As a result, HCF was officially recognized as a real instruction.
It also has a lot of soap opera syndrome, where the characters are all Mary Sues, and constantly make new problems when there are none to be dramatic about. They are entirely unlikeable and spend all the time they're not name-dropping nerd bait being assholes and punishing you as a viewer for being invested in anything.
The show feels like if you were supposed to take It's Always Sunny seriously, for drama. Meanwhile there just happens to be 25 years of computer history occasionally shoved into the background, that magically is the fault of the same like four people.
Although by the time I got to computing, the HCF didn't actually cause a conflagration.
I think I recall a 1960s computer design that would poll a limited set of (magnetic core) memory addresses in its idle loop, which led to overheating in those memory elements. Boot loader for PDP-8? I don't think it was the CDC-6600...
When implementing the decoding in silicon, each bit you want to compare requires one or more transistors; but no transistors are required for the "don't care" bits.
Since the decoder is implemented in a PLA-style matrix, you don't save any space by omitting transistors. So that's not a motivation.
Random thing: chip transistor counts usually count "transistor sites" rather than physical transistors, so the omitted transistors still show up in the transistor counts.
Although these registers are normally not accessible by the programmer, some undocumented instructions provide access to these registers, as will be described later.
Also, I'm not sure if you've explored 8f/1-7 yet, since it's not mentioned in the article, but I suspect it's just the same as pop r/m16 (8f/0) as it ignores the subopcode bits completely. It's very weird that the push and pop are in completely different places in the opcode map.
I hadn't noticed that characteristic of 8F. Looking at the microcode, the subopcode bits are ignored, so I think your analysis is correct.
I agree that it's very weird that PUSH rm and POP rm are arranged completely differently. The obvious place to put POP would be in FF/7, especially since that spot is unused. I can't think of any reason why they wouldn't do that.
> It's very weird that the push and pop are in completely different places in the opcode map.
Are they? There are a lot of different opcodes for push and pop, but if I'm reading right, for a given addressing mode the opcodes for push and pop always differ by one bit (though which bit changes varies).
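A quick check of that claim against opcode values from the standard 8086 map (the helper name is made up):

```python
def differing_bits(a, b):
    # Number of bit positions in which two opcode bytes differ.
    return bin(a ^ b).count("1")

# PUSH r16 = 50+r, POP r16 = 58+r: differ only in bit 3.
assert differing_bits(0x50, 0x58) == 1
# PUSH ES = 06, POP ES = 07: differ only in bit 0.
assert differing_bits(0x06, 0x07) == 1
# But PUSH r/m16 = FF /6 vs POP r/m16 = 8F /0: three opcode bits differ,
# plus the /reg subopcode field -- the one-bit rule breaks down there.
assert differing_bits(0xFF, 0x8F) == 3
```

So the one-bit pattern holds for the register and segment-register forms, but not for the r/m form, which is the oddity the thread is about.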
Aside: If you are writing stuff this good (and this niche) I would say ditch the ads. They really will turn off the audience you want to attract. Or at least use one of the developer-focused ad networks that just inline ads rather than popping them up.
> They really will turn off the audience you want to attract.
I disagree. The vast majority of the internet doesn’t care, because it’s the internet and this is normal.
The types of readers who will read a blog post and critique the author for things unrelated to the content of the blog (such as the presence of ads) are generally not a great audience to cultivate, and definitely not a great vocal minority to cater to.
There are numerous examples of YouTubers being hesitant to turn on ads in their early days because they think it will drive away potential subscribers and drive down view counts. Then they turn them on eventually, growth continues exactly as before, and their only regret is not turning them on sooner.
Be careful not to mistake the vocal minority’s complaints as a common concern.
I use an adblocker, but even if I didn’t, I’d have a very hard time believing that someone who is actually interested in the subject would think this is an “ad ridden” site and close it. The quality of the blog posts is apparent immediately.
I have the AdBlock extension, but like all my extensions it's set to "This can read and change site data > when you click the extension", and I only turn it on for domains that misbehave with sticky ads, videos, or other annoying flashing things. Otherwise I'm just hurting content creators for no good reason.
I would not enable an extension on any domain where I input my email, password, or otherwise have sensitive private information.
> Otherwise I'm just hurting content creators for no good reason.
It's less about the content creators and more about supporting a system / business model that isn't really sustainable and reliant on business practices that distract and in many cases actively harm consumers. Most content creators have Patreon profiles or some similar service for taking donations in a structured and community-driven way, why don't you use an adblocker and donate there?
That’s why Safari has content blocker extensions. Those cannot read the content or do network requests, they can only block content.
I use 1Blocker. It’s made of several extensions, only one or two non-essential ones that are not pure content blockers (and so can read the site), and which I have turned off, with still very good results.
I see this kind of “ads? I didn’t see any ads in my ad-blocking browser!” response practically every time someone mentions ads on here. Can I ask what you’re trying to achieve? Do you think the HN audience is unaware of ad blockers?
It's just an expression of irritation towards people who contribute nothing of value to the discussion about the content of the article. "Oh no, I see ads on your site, it sucks!" Erm, okay? Who cares? Install an ad blocker if it bothers you so much. It's faster than typing a comment on HN, with a greater payoff.
> Aside: If you are writing stuff this good (and this niche) I would say ditch the ads. They really will turn off the audience you want to attract. Or at least use one of the developer-focused ad networks that just inline ads not pop them up.
Similarly, not having HTTPS properly configured is a huge turnoff - it literally takes 5 minutes with Let's Encrypt.
I love this blog series that just goes on and on. Keep up the good work!
One thing I'm thinking about: I grew up writing 6502 assembly on the VIC-20 and C64. We hated the 8086 back then, as kids do with hardware. When I grew up I saw that we were partly right: the 6502 performed very well compared to the IBM PC on practical tasks. But the 6502 is dirt simple, while the 8086 is surprisingly complex. Was the PC slow because of the 8088 rather than the 8086?
I can of course just Google it or check the op code table. But always fun to talk about old technology :)
I can only imagine the nail-biting at Intel when repurposing the 8086's undocumented instruction aliases for new 80[123]86 instructions. Intel had no way of knowing whether someone, somewhere, had written software using these aliases.
It is doubtful they cared one bit. The 186 was really targeted at the embedded market (which is why it integrated so many of the usual support-chip roles onto the CPU), so this concern likely did not figure much there.
For the 286, do keep in mind that prior to IBM releasing the PC-AT, no one at Intel likely considered that anyone would want to use their newfangled 286 as nothing more than a "fast 8086". It appears that Intel's plan for the 286 was that everyone would also have a 286 protected mode OS to run upon it, and 8086 real mode was included only for the purposes of setting up the minimum necessary protected mode data tables to enable a switch into protected mode. This is one of the reasons why the 286 provided no documented way to leave protected mode once the OS flipped the PE bit in the MSW to 1. And in 'protected mode' one has to provide protection against executing invalid opcodes for it to properly be a 'protected mode'.
The fact that most 286 system purchasers were running their shiny new 286 as nothing more than a fast 8086 for many years probably caused much more consternation at Intel than the fact that the [12]86 generated exceptions for undocumented 8086 opcodes.
I wonder what happens when a 286 in protected mode flips PE back to 0, assuming sane values in seg regs etc. Do you end up in some kind of unreal mode, does it lock up completely, or what happens?
Attempting to set the PE bit back to zero has no effect:
> The CPU is put into protected mode by setting the PE bit in MSW using the LMSW or LOADALL instructions. Clearing the PE bit has no effect using either of those instructions, thus it is not possible to switch back to real mode.
My guess is that they didn't care much when they made the 186/286 because it wasn't yet clear how big the PC would get -- and those chips did a lot to weed out bad programs (the opcodes they didn't use caused INT 6), so they felt perfectly safe when they made the 386.
This begs for a challenge: what would a program look like if the opcodes it could use consisted exclusively of ASCII characters?
This has been a niche programming challenge, one that was popular before I even knew it existed, so I upped the ante by going alphanumeric-only.
The usable opcodes were practically IMUL and XOR, with severe limitations on registers and offsets. But with them I managed to create a random number generator that magically outputs fragments of code at the location where the next instruction would be located. This snowballs, adding more opcodes/functionality as it unrolls into a complete application.
The instruction set felt so restricting that it seemed as if the designers had deliberately mapped the most critical instructions to exactly the opcode values that make this possible. It has made me wonder what the considerations were for matching byte values to instructions. If MUL/XOR were mapped differently, this project most likely would not have existed.
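For a feel of why the set is so tight, here is a hand-picked (and deliberately incomplete) slice of the 8086/186 opcode map restricted to alphanumeric byte values — the mnemonics are from the standard opcode map, but the table itself is just for illustration:

```python
# Partial 8086/186 opcode map for alphanumeric byte values (illustrative).
ALNUM_OPCODES = {
    0x30: "XOR r/m8, r8",                   # '0'
    0x31: "XOR r/m16, r16",                 # '1'
    0x32: "XOR r8, r/m8",                   # '2'
    0x33: "XOR r16, r/m16",                 # '3'
    0x34: "XOR AL, imm8",                   # '4'
    0x35: "XOR AX, imm16",                  # '5'
    0x38: "CMP r/m8, r8",                   # '8'
    0x39: "CMP r/m16, r16",                 # '9'
    0x61: "POPA (186+)",                    # 'a'
    0x68: "PUSH imm16 (186+)",              # 'h'
    0x69: "IMUL r16, r/m16, imm16 (186+)",  # 'i'
    0x6A: "PUSH imm8 (186+)",               # 'j'
    0x6B: "IMUL r16, r/m16, imm8 (186+)",   # 'k'
    0x70: "JO rel8",                        # 'p'
    0x7A: "JP rel8",                        # 'z'
}

for byte, name in sorted(ALNUM_OPCODES.items()):
    print(f"{byte:02X} '{chr(byte)}': {name}")
```

The XOR family lands on the ASCII digits and IMUL on 'i'/'k', which is why those two instructions end up doing nearly all the work in alphanumeric code.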
Tom Murphy VII (tom7) did such a thing. It's a partial C89 compiler that outputs only printable characters. In meta fashion, the compiler is compiled by itself to also contain only printable characters. The TXT and EXE files below are exactly the same.
Yes, like many others. That's why I upped the ante by stating "numeric and lowercase characters only". And the glyph differences between them also let me create a "hidden" image.
Edited: What you refer to is a compiler, and something completely different.
Both projects stem from the same root, TXT=EXE. However, both branch in totally different directions. One says "any printable goes" whereas the other says "only the smallest subset (preferably also usable for ASCII art) goes", mainly being “0123456789acemnorsuvwxz”.
These are two different projects with different design goals and challenges. Tom's built a compiler around it, I created a bootloader that consists of only MUL and XOR.
I don't understand why I feel that I have to defend myself; I thought hackers loved these kinds of projects, yet it seems I got cancelled.
A true innovation would be to combine both projects. One that inputs source code plus an image and outputs ASCII art that runs like a program.
A friend of mine (and one of the best CTF players I’ve ever met) once wrote shellcode that used only the [0-9A-F] ASCII character range and self-modified to access other instructions.
Based on AAM using CORD for the division operation, will D4 00 generate the same division-by-zero error as a DIV with a zero divisor? (I can’t think of any reason this would be useful, but I do like thinking about edge cases and how things break.)
My transistor-level simulator shows that the 8086 will generate a divide by 0 interrupt if you give the AAM instruction a divisor of 0. But I haven't tried this on a real chip.
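The behavior under discussion can be modeled in a few lines — a sketch of AAM's documented semantics plus the zero-divisor case, not a claim about the actual microcode:

```python
def aam(al, base=0x0A):
    # AAM (D4 ib): AH = AL / ib, AL = AL mod ib.  The documented form uses
    # base 0x0A; other immediates are the undocumented "AAM base" variant.
    if base == 0:
        # Per the transistor-level simulation described above, the 8086
        # raises the divide-error interrupt (INT 0), like DIV by zero.
        raise ZeroDivisionError("#DE: AAM with zero immediate")
    return al // base, al % base   # (AH, AL)

print(aam(0x2A))   # 42 -> (4, 2): AH=4, AL=2
```

So 'AAM 0' hits the same zero-divisor check inside CORD that DIV does, which is exactly the edge case being asked about.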
With the one exception being the 80186, where 'AAM 00h' will result in AH=FFh. It passes the divisor directly to the ALU engine without checking for zero, the most likely rationale being that operands other than 0Ah are undocumented :)
That was fixed (either in microcode or hardware) on all later generations. And the 8086/88 did the check in the CORD subroutine of course, which is used by both DIV and AAM.
Yes, the 186 microcode is remarkably similar to that of the original 8086, except for some encoding differences.
Multiply/divide is now assisted by the ALU, so the microcode mainly has to set up the registers and do the check for zero divisor / overflow. There is also a final adjustment to the result, because the hardware uses a non-restoring algorithm that may underflow. And for signed division, the operands are converted to unsigned first and the result possibly negated at the end, using the F1 flag to store the sign like on the 8086.
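A sketch of what a non-restoring division loop with that final fix-up looks like in general — illustrative only, not the 186's actual hardware algorithm:

```python
def nonrestoring_div(dividend, divisor, bits):
    # Non-restoring unsigned division: on each step, add or subtract the
    # divisor depending on the sign of the running remainder, instead of
    # restoring the remainder after a failed subtraction.
    rem, quot = 0, 0
    for i in range(bits - 1, -1, -1):
        bit_in = (dividend >> i) & 1
        if rem >= 0:
            rem = (rem << 1 | bit_in) - divisor
        else:
            rem = (rem << 1 | bit_in) + divisor
        quot = (quot << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        # Final adjustment: the remainder may end up "underflowed"
        # negative and needs one corrective add, as mentioned above.
        rem += divisor
    return quot, rem

print(nonrestoring_div(42, 5, 8))   # (8, 2)
```

The corrective add at the end is the kind of final adjustment the microcode has to perform on the hardware's raw result.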
The 286's instruction decoder and microcode ROM are entirely different: 1536 words of 35 bits, with the entry point determined by a separate PLA.
Huh. I thought they were identical except for the 286 having more hardware and more microcode for all the protected mode stuff -- and less for not having timer/interrupt controller/DMA/etc.
Where can one learn more about microcode and how it's implemented in silicon? As I understand it, most machine code is actually "VM bytecode", and the "real" CPU is the microcode processor?
A computer architecture book such as Hennessy and Patterson will describe microcode in detail.
It's a bit confusing because microcode has changed meaning a bit over time. "Classical" microcode, such as the 8086, replaces hard-wired control logic with micro-instructions. The processor steps through the appropriate micro-instructions, which are decoded to generate control signals.
The Pentium Pro introduced a new model, where machine instructions are broken down into independent micro-ops, which are handed off to the core processor engine and processed independently, in parallel. At the end, the micro-ops are "retired" in a sequential order, so your program appears sequential.
Most micro-ops are generated by decoders that convert a machine instruction into a small number of micro-ops. However, complicated machine instructions are converted into micro-ops by microcode. This is similar to classical microcode, except it's not executing micro-instructions but generating micro-ops that then get run by the underlying processor.
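A toy model of the "classical" scheme: the opcode selects an entry point into a micro-instruction ROM, and micro-instructions run until an RNI-style terminator. All names and encodings here are made up for illustration:

```python
# Registers of a toy CPU.
regs = {"A": 0, "B": 0, "TMP": 0}

# Micro-instruction "ROM": each opcode's entry is a list of micro-ops
# ending in "rni" (run next instruction), which terminates the sequence.
# Without an RNI, the engine would never hand control back to fetch.
MICRO_ROM = {
    "MOV_A_B": [("copy", "B", "A"), ("rni",)],
    "ADD_A_B": [("copy", "A", "TMP"),
                ("add", "TMP", "B", "A"),
                ("rni",)],
}

def run_instruction(opcode):
    for uop in MICRO_ROM[opcode]:     # decoder: opcode -> entry point
        op = uop[0]
        if op == "rni":
            return                    # end of microcode for this opcode
        if op == "copy":
            _, src, dst = uop
            regs[dst] = regs[src]
        elif op == "add":
            _, a, b, dst = uop
            regs[dst] = (regs[a] + regs[b]) & 0xFFFF
```

In the classical model the micro-ops are decoded into control signals each step; in the post-Pentium-Pro model the same kind of sequence is emitted as independent micro-ops for the out-of-order core instead.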
I wouldn't recommend anything from Hennessy and Patterson for learning about microcode; they are entirely RISC proponents, after all. Instead, books and articles from the 1950s-70s on computer design are probably far more detailed and relevant.
That is the start of the videos where the control logic gets microcoded. It's pretty basic, but over the next few videos he comes up with about 10 different opcodes and programs their microcode (a series of control-logic activations). It's pretty amazing to see it all come together and work in the end.
If you want to learn about simple microcode and how a (non-superscalar, which matches the 8086) CPU generally works, this is a brilliant resource, and fun to watch.
His entire channel is brilliant. I also love his homemade GPU. I forget sometimes just how fast CPUs are so watching him 'race the beam' (not literally anymore, but HDMI has timings just the same) is fascinating.
Provided you can find a copy, the book "Computer Organization", 2nd ed, by V. Carl Hamacher, Zvonko G. Vranesic and Safwat G. Zaky, published by McGraw-Hill, contains a very understandable chapter on microcoded control. Depending on your background, you may want to read some of the prior chapters first before starting on the microcode one, to get some foundational knowledge.
Do note that the 2nd ed was published in 1984, so finding a copy may be difficult. I do not know how (or even if) the microcode control chapter was updated in subsequent editions.
On a side note - it would also be interesting to know how invalid-instruction detection is implemented in later processors. It's hard to imagine a giant lookup table, because that sounds like a waste of resources.
Not exactly. It used the two different encodings possible for the source/destination operands when both are registers.
For example, 'ADD AL,CL' can be encoded as (values in octal):
000 310 : 3=reg/reg 1=source is CL 0=dest. is AL
002 301 : 3=reg/reg 0=dest is AL 1=src. is CL
Other assemblers always use one of these consistently; A86 switches between them depending on some bits of the opcode and operands (but each instruction will always be encoded the same way whenever it appears, so there is no steganographic message embedded).
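The two encodings can be built mechanically from the ModR/M layout (a quick sketch, with register numbers per the standard 8086 encoding):

```python
def modrm(mod, reg, rm):
    # ModR/M byte: mod (2 bits) | reg (3 bits) | rm (3 bits).
    return (mod << 6) | (reg << 3) | rm

AL, CL = 0, 1   # 8-bit register numbers

# ADD r/m8, r8 (opcode 00): the reg field is the source.
enc1 = bytes([0x00, modrm(3, CL, AL)])
# ADD r8, r/m8 (opcode 02): the reg field is the destination.
enc2 = bytes([0x02, modrm(3, AL, CL)])

print(" ".join(f"{b:03o}" for b in enc1))  # 000 310
print(" ".join(f"{b:03o}" for b in enc2))  # 002 301
```

Both byte pairs decode to 'ADD AL,CL'; an assembler is free to pick either, which is the redundancy A86 exploits.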