There was actually a tool "com2txt" back in the DOS days. So you could convert an executable and put it into an email....
Update: I have my data well sorted enough that I found it :-) It even comes with code and is under a vague free license: https://github.com/hannob/com2txt
Back in the BBS/FidoNet days someone sent me a comic GIF that, when renamed with a .COM extension, would execute under DOS. It was a nifty little demo, although .COM files were going the way of the dinosaur around then.
Wish I could remember the trick, but in the DOS days I used to type in a few characters (5ish IIRC) at the start of text files that allowed them to be renamed and executed as COM files.
If you read the paper I think you'll find it is something like "ZM~~_#____PRinty__C", where _ is a space (HTML compresses spaces). See section 8, also the beginning of the paper.
.COM files were straight binary with no header, loaded at offset 256 (0x100) of any 64KB segment. All addresses were local to that segment, hence the 64KB limit on .COM files.
The characters you were typing are (probably) the code for a jump to an entry point somewhere else in the file.
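To make the "no header, loaded at 0x100, jump at the start" layout concrete, here is a minimal sketch in Python that assembles such a file by hand. The helper names (`short_jump`, `build_com`) are made up for illustration, and the image has not been run on real DOS. Note that the short-jump opcode 0xEB is outside 7-bit ASCII, so this shows the layout only, not necessarily the exact characters the parent was typing.

```python
LOAD_OFFSET = 0x100  # DOS loads a .COM image at offset 0x100 of its segment


def short_jump(from_offset: int, to_offset: int) -> bytes:
    """Encode an x86 short jump (opcode 0xEB, signed 8-bit displacement)."""
    rel = to_offset - (from_offset + 2)  # relative to the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])


def build_com(message: bytes) -> bytes:
    """Lay out: jump, '$'-terminated message, then code that prints it."""
    if b"$" in message:
        raise ValueError("DOS function 09h treats '$' as the terminator")
    text = message + b"$"
    code_offset = 2 + len(text)   # code sits right after the jump and the text
    text_addr = LOAD_OFFSET + 2   # in-memory address of the message
    code = bytes([
        0xB4, 0x09,                              # mov ah, 09h (print string)
        0xBA, text_addr & 0xFF, text_addr >> 8,  # mov dx, text_addr
        0xCD, 0x21,                              # int 21h
        0xC3,                                    # ret (returns to DOS in a .COM)
    ])
    return short_jump(0, code_offset) + text + code


image = build_com(b"Hello")
print(image.hex(" "))
```

The jump itself is relative, so it does not care about the load address; only the absolute pointer to the message needs the 0x100 added.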
No, COM files had no header. [2] I think that's part of why they were replaced. ZM was for EXEs. I think that one was someone's initials. (Yes, looked it up. [1])
I think what parent is remembering were characters that effectively created a jump instruction at the beginning of the file.
It didn't just create a text file. It created an executable text file.
First two paragraphs of the README:
Com2txt is a tool on MS-DOS which converts a com file to a text file. It's DOS generic. Unlike tools such as uuencode, the text file generated by com2txt works as a com file, exactly like the original com file does. Using com2txt, you can create a com file which can be sent through networks such as internet, and runs without any decoding.
Moreover, the text file got by com2txt consists only of ECHOable characters; it doesn't contain characters such as `<' or `|'. So, using ECHO command, you can easily generate the textized com file and use it in a batch file. For detail see section 4.
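A rough way to check the "ECHOable" property described in that README excerpt, sketched in Python. The README only names `<` and `|` as examples; the rest of the excluded set below (`>` and `%`) is my assumption about what a DOS ECHO line would mangle, not com2txt's actual rule.

```python
# '<' and '|' come from the README; '>' and '%' are assumed additions.
SHELL_SPECIAL = set("<>|%")


def is_echoable(data: bytes) -> bool:
    """True if every byte is printable ASCII and not shell-special."""
    return all(
        0x20 <= b <= 0x7E and chr(b) not in SHELL_SPECIAL
        for b in data
    )


def offending_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte) pairs that would break an ECHO line."""
    return [(i, b) for i, b in enumerate(data) if not is_echoable(bytes([b]))]


print(is_echoable(b"Hello, world"))
print(offending_bytes(b"a|b\xeb"))
```

An ordinary machine-code .COM file fails this check almost immediately, which is what makes a converter like com2txt interesting.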
True, but there are a few gotchas with that version. It won't work with execve, it doesn't cache the binary, it won't work if called with "source", it doesn't set argv[0] properly when the binary is called, and a few other things. It is nice and terse though.
execve compatibility is not easy at all, since it requires a shebang line which isn't valid C. However thanks to emmelaich's // idea I just figured out a way to do it using fewer lines than your #if 0 solution.
Yeah, a shebang is required, and unfortunately it isn't valid C so you can no longer feed the file directly to the compiler. I just figured out how to whittle it down to 2 lines though! Thanks for the // idea. Here it is in both shebang (2 line) and non-shebang (1 line) versions: https://gist.github.com/jdarpinian/1952a58b823222627cc1a8b83...
Because most people don't have tcc installed, and the convenience is ruined if you have to install things for this to work. I really like that tcc has a flag for this; GCC and Clang really should copy it.
When I read the source and started on the comments, I thought: won't be long until someone drops in TCC. But yes, TCC does limit you in this regard a bit.
> Dennis Ritchie invents a powerful gun that shoots both forward and backward simultaneously. Not satisfied with the number of deaths and permanent maimings from that invention he invents C and Unix.
Absolutely. It's very subtle humor. Students of computer architecture consider x86 to be one of the least elegant architectures around. Its many warts include segment registers (originally a hacky workaround to stretch 64k of memory to 1M), and an extremely complex instruction encoding employing prefix bytes. Many of the legacy issues (such as not having enough registers) have been papered over, leaving traces behind. Many people felt that the complexity would doom the architecture, and that a cleaner, leaner RISC approach would win out.
However, Intel has used their advantage in process technology to throw massive numbers of transistors at the problems caused by all this complexity, and has done well. RISC has done well in the mobile space because those transistors tend to be power-hungry, but everywhere else x86 is today almost the only game in town.
One reason it's especially funny is that "HLT" is one of those legacy instructions that has pretty much no use in a modern system, yet takes up a whole slot in the byte encoding, while common operations like MOV or ADD often require extra prefix bytes to specify the size of the operands.
Segment registers did not evolve from the hacky address space expansion mechanisms. It may look that way looking at nothing but the Intel history, but descriptor-style segment registers existed in mainframe architectures before the 8086/88 existed. The 8086 has trivial segment registers (which were just scaled offset addresses), which then morphed into mainframe-like descriptors in its successors (registers being indices into tables of segment descriptors). That could have been the plan all along, though.
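For anyone who hasn't seen the "scaled offset" scheme, the 8086 real-mode address calculation is just segment times 16 plus offset, truncated to 20 bits. A small Python sketch (the function name is mine):

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: (segment * 16 + offset), wrapped to a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF


# Many segment:offset pairs alias the same physical byte.
print(hex(physical_address(0x1234, 0x0100)))
print(physical_address(0x1000, 0x0010) == physical_address(0x1001, 0x0000))
```

Two 16-bit values reach 1M of memory this way, but any single segment value still only covers a 64KB window.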
> Students of computer architecture consider x86 to be one of the least elegant architectures around.
I guess it depends when and where one studied.
Having grown up with Z80 and x86, it surely looked kind of alright to me.
I only missed the flat addressing from 68000, but given that I only had access to it on Amigas available at some dev meetings, it wasn't something I bothered much with.
Also I don't remember anyone jumping for joy during the MIPS assignments (using SPIM).
HLT is absolutely used in modern systems, and ARM has the equivalent WFI and WFE (wait for interrupt, wait for event).
They're essential for dropping into low power modes, though admittedly HLT wasn't used for that back in the day.
It does have a slight advantage over ARM / most other RISC architectures in that the instructions are fairly small, meaning that you can get quite good decoding throughput without going wider. That advantage doesn't get entirely cancelled out by how badly allocated things are, since instructions can decode to multiple "actual" instructions (µops).
I'm still curious as to how Intel thought a mobile x86 chip could ever work.
HLT is still used by kernels when they want to idle the processor until the next interrupt. There are many examples of instructions that aren't actually used much, if at all, nowadays, like POPAD and PUSHAD, or the binary-coded decimal instructions.
The histogram on the last page counts the occurrences of each character in the paper (all of them printable, of course). But because the histogram's counts are made of characters too, the author had to add a few extra numbers to make the histogram "converge". Brilliant.
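The self-counting problem is easy to play with. Below is a toy sketch in Python, not the author's method: it ignores whitespace (my simplification), renders the histogram as "char count" lines, and naively recounts until nothing changes. All function names are made up.

```python
from collections import Counter


def render(counts: dict) -> str:
    """Render the histogram as text, one 'char count' pair per line."""
    return "\n".join(f"{ch} {n}" for ch, n in sorted(counts.items()))


def recount(body: str, counts: dict) -> dict:
    """Count non-whitespace characters of body plus the rendered histogram."""
    doc = body + "\n" + render(counts)
    return dict(Counter(ch for ch in doc if not ch.isspace()))


def find_fixed_point(body: str, max_rounds: int = 100):
    """Iterate until the histogram describes itself; None if it never does."""
    counts = recount(body, {})
    for _ in range(max_rounds):
        updated = recount(body, counts)
        if updated == counts:
            return counts
        counts = updated
    return None


result = find_fixed_point("aab")
print(render(result) if result else "no fixed point found by plain iteration")
```

Plain iteration is not guaranteed to settle, since changing one count changes the digits being counted, which is presumably why extra numbers had to be added in the paper.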
Meta literate programs. Not only do you have the code and a descriptive document about the code in the same document, but you also have the executable!
No, it is the _data_ that the compiler precomputes to help change the value of the AL register. It is unused here; in fact it is cropped to 160 columns and has a caption in the middle, so it's even wrong. He included it just because it looks cool.
I must be missing something, but I don't see how the actual text of the paper originates from the source code. Those C instructions actually compile into the sentences of the paper as well?
From a quick look at the source[1], it seems the compiler will always generate an executable with the text from the paper (which is read from the "paper/" directory, and some bits hard-coded in the compiler source). Or something. I don't really know SML.
From what I can tell, the .exe file generated by the compiler must be really big anyway (since the relevant sizes in the header can't be small because they have to be printable). So there must be some text, it might as well be the paper.
Ah so all the x86 bytes that the actual text generates are basically just filler for the actually relevant sections of the paper (i.e. the jumble of bytes that appears)? They're never actually read or executed by the CPU?