The Undocumented Microsoft “Rich” Header (2018) (bytepointer.com)
123 points by Lammy on Oct 17, 2021 | 23 comments



Rich Shupak, huh?

I used to work with his brother Paul, who had many stories about Rich.

The only ones I remember, sadly, were:

1. Rich filed some large percentage of the bug reports on Excel (and possibly other Microsoft executables), along with the recommended code fixes and patches in assembly -- without ever having access to the source code.

2. He was fairly quickly hired by Microsoft, and is responsible for the design and implementation of Visual Basic.


One use that the article doesn’t mention: If a large company ships a binary that was compiled with Visual Studio Community, Microsoft knows that they should do a license audit :)


It might just be nostalgia for the first computers I used, but the sheer complexity, mystery, and stability that allow this stuff to accrete in Windows fill me with a sense of Wonder, even if it is also terrible.

Unix is much less bloated, which is better, but also means there is basically no sense of Wonder. So Linux, my daily driver, I gripe about, but Windows, which I thankfully never have to use, makes my eyebrows rise and my lips smile from a safe distance.


> the sheer complexity, mystery, and stability that allow this stuff to accrete in Windows fill me with a sense of Wonder, even if it is also terrible.

If "complexity, mystery and stability" induce in you a sense of wonder – learn yourself some IBM z/OS, and prepare to take that wonder to a whole new level

Open the JCL reference manual [0] to a random page (in true lectio divina style) – it is like reading some esoteric religious text, you have no idea what it even means, only a sense that it must be something profound. I think some other manuals, such as the many volumes of the Authorized Assembler Services Reference [1], are even better at inducing this effect

No doubt some of that is due to the mistake of reading the Reference manual without reading the Guide manual first – but I think many will find the Guides almost as inscrutable as the References are, and much in the Reference manuals will remain incomprehensible even after the Guides have been fully digested. The endless cross-references from one weighty tome to another form a maze of twisty little passages, all alike.

[0] https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pag...

[1] https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pag...


They are consistent, so for someone who did old DOS (the mainframe one, not the PC one) and MVS/ESA support, they are nothing special. Just a job. And I suspect there is no one person (Amdahl aside) whose name is attached to these.


> Unix is much less bloated

Depends on which UNIX you are talking about, especially the commercial UNIXes.


I suppose Unix's sense of wonder is in the culture surrounding it. So much lore to read about. Using Unix must have been quite the social experience back in those days. Completely unlike today's multi-user systems being used by a single person on a computer at home.


In college (early 90’s), anyone who took a CS class got a Unix account, mainly for access to email. The machine was running, I believe, SunOS 4. No idea what the exact hardware was, but obviously there were a bunch of user accounts on it, and unsurprisingly they had disk quotas enabled. Eventually someone figured out that if you created a subdirectory in $HOME and chown’d it to a professor’s user ID, it wouldn’t be counted against your quota. And since you still owned the parent directory, you could change the ownership back any time you wanted. Fun times...


Linux is like using a constructed language. It's precise, constantly pared down, and optimized. It's effective and boring.

Windows is like English. A mishmash of different languages with no perfect rules. It's a pain in the ass to get right. Lots of fun nuggets to find amusing though.


> A mishmash of different languages with no perfect rules

There is some of that in Unix/Linux as well, even if to a lesser degree than Windows exhibits it.

One good example is the dd command–both the command name and its rather unique syntax are inspired by the IBM mainframe operating system MVS (nowadays rebranded as z/OS) and the omnipresent DD statement of its (in)famous JCL job control language, although the inspiration is only at a very high level and the actual details are quite different.

Another is the fact that one of the fields in the /etc/passwd file is named pw_gecos, after the mainframe operating system General Electric Comprehensive Operating System (GECOS), which was used at Bell Labs in the early days of Unix. Some of those early Unix systems used the Bell Labs GECOS mainframe for print spooling, and pw_gecos was used to store the GECOS account name for each user. pw_gecos is still around, and has even spread from Unix into much newer systems such as LDAP and SCIM, although nowadays it stores the user's full name instead, and sometimes other contact details such as email or phone number as well. GECOS is still around too (if a shadow of its former self), having been passed from General Electric to Honeywell and since then to the French firm Groupe Bull (now part of Atos), and having been renamed from GECOS to GCOS along the way.
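To make the pw_gecos point concrete, here is a minimal sketch of pulling the field out of a passwd-style line. The sample line and the helper name `parse_passwd_line` are made up for illustration, and the comma-separated subfield layout (full name first) is only a common convention, not something the file format enforces. Python's standard `pwd` module exposes the same field as `pw_gecos` for real accounts.

```python
def parse_passwd_line(line: str) -> dict:
    """Split one /etc/passwd-style line into its seven colon-separated fields."""
    name, _password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    # Assumption: the GECOS field follows the common "full name,room,phone,..."
    # convention, so the text before the first comma is the user's full name.
    full_name = gecos.split(",")[0]
    return {
        "name": name,
        "uid": int(uid),
        "gid": int(gid),
        "gecos": gecos,
        "full_name": full_name,
        "home": home,
        "shell": shell,
    }


# Hypothetical sample entry, not taken from any real system.
SAMPLE = "alice:x:1000:1000:Alice Example,Room 101,555-0100:/home/alice:/bin/sh"
entry = parse_passwd_line(SAMPLE)
print(entry["full_name"])
```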

The Unix terminal subsystem (line drivers, termios, terminfo/termcap, curses, etc) is full of all kinds of crazy legacy crud which made sense back when people used to use real physical terminals running over RS-232, but now in the 21st century it is basically just legacy complexity we are stuck with due to inertia, backward compatibility, and the inability of any redesign to offer enough value to overcome all that inertia.

Current versions of Solaris [0] and its derivatives (such as Illumos) still contain the header file "rje.h", which is for an old Bell Labs system for submitting batch jobs to IBM mainframes. I doubt the actual code has been present for a long time – the header references "/dev/dn2", which I believe is a PDP-11 modem device (see [1], [2]) – but I take it the Solaris developers never removed this file in case doing so broke backward compatibility: some developer somewhere may have included it in their C code, and removing the file would stop that code from compiling. Linux, lacking the same heritage as Solaris, doesn't contain anything quite as ancient as this.

The use of "VTOC" as a term for partition table by a number of Unix systems (including Solaris) is inherited from IBM mainframe systems, where the VTOC is one of the most fundamental on-disk structures. In normal mainframe use it is more like a root directory with embedded allocation information than an actual partition table. The partition table usage comes from the fact that when Unix was ported to IBM mainframes, the mainframe VTOC filesystem was used as a partition table, with each Unix filesystem being stored as a separate file ("dataset" to use the proper mainframe terminology) in the mainframe filesystem. Linux can run on IBM mainframes (z/Linux), and on them it literally supports the IBM VTOC filesystem as a partition table format [3], along with a bunch of other weird mainframe features, such as use of 3270 terminals as a system console [4] and even integration with virtual card punches and card readers [5].

Filesystems in particular are an area in which the Linux kernel supports a huge amount of legacy cruft, with all manner of rarely used filesystems supported for backward compatibility, for example the ADFS filesystem of Acorn RISC OS [6], the classic Mac OS HFS filesystem [7], the HPFS filesystem of OS/2 [8], the "System V" filesystem used by old x86 Unix systems such as Xenix and Coherent [9], and the "Boot File System" used by the boot process of SCO Unixware [10].

[0] see for example https://github.com/illumos/illumos-gate/blob/master/usr/src/...

[1] http://squoze.net/UNIX/v5man/pdf/man4/dn.pdf

[2] https://github.com/illumos/illumos-gate/blob/master/usr/src/...

[3] https://github.com/torvalds/linux/blob/master/arch/s390/incl...

[4] https://github.com/torvalds/linux/blob/master/Documentation/...

[5] https://github.com/torvalds/linux/blob/master/drivers/s390/c...

[6] https://github.com/torvalds/linux/blob/master/Documentation/...

[7] https://github.com/torvalds/linux/blob/master/Documentation/...

[8] https://github.com/torvalds/linux/blob/master/Documentation/...

[9] https://github.com/torvalds/linux/blob/master/Documentation/...

[10] https://github.com/torvalds/linux/blob/master/Documentation/...


Thanks for the interesting tidbits!


> Linux is like using a constructed language. It's precise, constantly pared down, and optimized. It's effective and boring.

That depends on what you consider "Linux". The Linux kernel probably sits within that definition, and to be fair so does the Windows kernel (which a lot of folks forget about). It's all the other stuff that sits on top of the kernels that makes up the "mishmash".


That's an interesting analogy. I would say even the market share is similar to Esperanto and English, respectively.


I was hoping to read this article on my iPhone. But the darn thing won’t narrow and stay narrowed so that it can be read without scrolling left and right to read each line.

Is it specific to me or is this some behavior specific to this site?


I had the same issue but switching to reader view “fixed it” :)


It's because all the text is inside of a big table and there are several long rows. Combined, that means that the text can't really be resized below 1250px.

Don't use tables for design, folks.


> Don't use tables for design, folks.

Wow, feels like 15 years ago, when everybody moved from table layouts to divs.


Click the aA button in Safari on the left of the URL bar -> "Show reader view"

It won't show the highlighting on the hex dump, but you get the article text in a more readable format for the device


Same on Android, but Chrome pops up a "Show simplified view" on the bottom that fixes it right up (at the expense of wrapping the hex dumps).


> The problem was that Microsoft intentionally hid and encrypted this information. Since the structure doesn't officially exist, there isn't going to be an official way to disable it. Anyone who develops with Microsoft tools gets this structure crammed in their executable whether they like it or not. Failing to document this fact can be considered a questionable practice.

Yes, questionable, but if they had documented it or even left it unencrypted, that would have made it trivial to spoof the contents, which would defeat the apparent purpose of the fingerprint. Most of the benefits of this thing would have been lost.

Benefits to Microsoft, of course.


This seems like it might be useful for reverse engineering projects aiming to do source code reconstruction; if a Rich header is present, presumably that would guide you to being able to figure out if you have the exact builds of the tools that the original binary was made with. Interesting.
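For anyone who wants to try this, below is a rough sketch of decoding the header, based on the layout as it is commonly described: the block ends with the literal "Rich" followed by a 4-byte XOR key, starts with "DanS" XORed with that key plus three padding dwords, and holds (comp.id, count) dword pairs in between, with the product ID in the high 16 bits of comp.id and the build number in the low 16 bits. The function name `parse_rich_header` is my own, and the sketch assumes "Rich" does not occur earlier in the data, so treat it as a starting point rather than a robust parser.

```python
import struct


def parse_rich_header(data: bytes):
    """Return (xor_key, [(product_id, build, count), ...]) or None if not found."""
    end = data.find(b"Rich")
    if end == -1 or end + 8 > len(data):
        return None
    # The XOR key is the dword stored right after the "Rich" marker.
    (key,) = struct.unpack_from("<I", data, end + 4)
    # The header begins with "DanS" XORed with the same key.
    dans = struct.unpack("<I", b"DanS")[0]
    start = data.find(struct.pack("<I", dans ^ key), 0, end)
    if start == -1:
        return None
    # Skip the marker and three padding dwords; the rest must be 8-byte pairs.
    first_entry = start + 16
    if first_entry > end or (end - first_entry) % 8 != 0:
        return None
    entries = []
    for offset in range(first_entry, end, 8):
        comp_id, count = struct.unpack_from("<II", data, offset)
        comp_id ^= key
        count ^= key
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return key, entries
```

Given the bytes of a PE file, the returned (product ID, build number, count) triples are what you would compare against the toolchain builds you have on hand.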


Could someone explain how this “header” aids in debugging? There’s this:

> To Microsoft's credit, the "Rich" header offers invaluable debugging statistics about how a given executable was built.

But how does knowing the compiler version help with debugging? Are we assuming compiler bugs?


“Debugging statistics” might mean something like, when analyzing WER data they could see if a new compiler build was having an elevated error rate, for example.

That might not imply compiler bugs; it could also be the result of fixes for things that now crash instead of doing something unsafe.



