Stuff like that is definitely fun. In the 1990s I bought a Sharp PC-E500S pocket computer and hacked the CPU's instruction set. With no internet and no documentation about the processor, I invented my own assembler syntax for the instructions. The assembler, disassembler, and hex monitor (all written in Basic) still work to this day.
All my notes at the time were made with pencil on paper. Even if I could find them, I'm not sure they would still be readable. The Basic programs could only be copied by re-typing them manually on a contemporary computer. Presenting this pre-internet stuff on a website would just be too much work, sorry.
If you do find the documents though, please consider just scanning them and uploading them to Internet Archive and posting the links to HN. That way someone else in the future can find it and decide if they want to do the manual re-typing etc themselves :)
That just makes it a meta challenge…for some unknown engineer who wants to reverse-engineer an engineer’s program that reverse-engineered a program with an unknown instruction set.
Probably the second-best fun I ever had was reverse engineering a discrete-TTL processor and the firmware written for it. These were embedded in some Xerox Diablo daisy-wheel printers dating from the latter half of the 20th Century. And the best fun I ever had was hacking that code to better suit the unique needs of my customer!
I wrote about the Diablos and their multi-axis realtime motion control here [1]. The good stuff about the hacking starts just over halfway down the page, "the Diablo proprietary processor."
HN has honored me in the past by recognizing other items on the site, such as "One-Bit Computing at 60 Hertz" [2] and "the KK Computer - a radical 6502 redesign" [3].
I wonder what the mystery instruction set in the slides actually is? (Assuming it is a real instruction set and not just something made up to demo the idea.)
It's a reverse-engineering conference presentation by 2 Russian authors who highlight that they aren't providing any details about the context despite the obvious extreme relevance, and where their solution does not handle any obfuscation at all. So they are probably not decompiling APT malware running in nested VMs, but I'm going to guess reverse-engineering old highly-secret Russian military hardware where the only docs are high-level ones about the usage and repair, not what the chips are doing, and where the contractor wants to bugfix or develop new versions but needs to understand all the inner logic and what empirical ad hoc corrections it might be incorporating through the wisdom of long-dead Russian mega-brain engineers.
Yeah I'm very familiar with 'file', I just wondered in what context one needs the ability to identify 38 machine languages, i.e. why does an organization deal with files containing unknown machine code, and have the need to identify them?
Sounds like maybe reverse engineering/security "research"-oriented work, perhaps.
I was basically leveraging my eidetic memory of opcodes and operands and their bitfields.
It all started with writing pure assembly for the MOS 6502 (for arcades) and the PDP-11, and eventually ended with ARM/RISC/MIPS. The most esoteric one is the Transmeta VLIW (TMS3200-02).
On slide 9, they show the frequency of each 16-bit value. In compressed code, the frequency of each value should be almost equal.
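A rough way to check that on your own blob, as a sketch only: count the fixed-width words and compute their Shannon entropy. The function name `word_entropy` and the 2-byte default width are my own choices here, not anything from the slides. A near-flat distribution gives an entropy close to the maximum, which is the smaller of 16 bits and log2 of the number of words in the sample, so small files can never reach 16.

```python
import math
from collections import Counter


def word_entropy(data: bytes, width: int = 2) -> float:
    """Shannon entropy, in bits per word, of the fixed-width words in data."""
    # Ignore a trailing partial word, if any.
    usable = len(data) - len(data) % width
    words = [data[i:i + width] for i in range(0, usable, width)]
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

Plain machine code usually scores well below the maximum because a few opcodes dominate; compressed or encrypted data tends to sit near it.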
10 or 20 years ago, when reverse engineering an unknown file, it was a good bet to assume it was not compressed, and you could get some insight by looking at it in a hex editor and hoping for the best. Now many are compressed, so a good first step is to change the extension to .zip and try WinRAR (or look for a header if you are not lazy).
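For the "look for a header" route, here is a minimal sketch that compares the first bytes against a handful of well-known magic numbers. The table is deliberately tiny and the names `MAGICS` and `sniff` are made up for this example; a real tool like `file` knows far more signatures.

```python
# Leading magic bytes of a few common container/compression formats.
MAGICS = {
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"Rar!\x1a\x07": "rar",
}


def sniff(data: bytes):
    """Return the format name if data starts with a known magic, else None."""
    for magic, name in MAGICS.items():
        if data.startswith(magic):
            return name
    return None
```

This only looks at offset 0, so it will miss a compressed payload embedded somewhere inside a larger firmware image.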
I assume that with compressed code you can use the same strategy: assume it's using a well-known compression algorithm, and cross your fingers.
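The finger-crossing can be automated, at least for the algorithms that ship with Python's standard library. This is only a sketch under that assumption: `try_decompress` is a hypothetical helper name, and it simply throws the blob at each decompressor and keeps whatever does not raise.

```python
import bz2
import gzip
import lzma
import zlib


def try_decompress(data: bytes):
    """Try a few standard decompressors; return (name, output) for each success."""
    candidates = {
        "zlib": zlib.decompress,
        # Negative wbits means a raw deflate stream with no zlib header.
        "raw deflate": lambda d: zlib.decompress(d, -15),
        "gzip": gzip.decompress,
        "bzip2": bz2.decompress,
        "xz/lzma": lzma.decompress,
    }
    hits = []
    for name, decompress in candidates.items():
        try:
            out = decompress(data)
        except Exception:
            # Wrong format (or corrupt data): move on to the next candidate.
            continue
        hits.append((name, out))
    return hits
```

An empty result does not prove the data is uncompressed; it may use a custom or less common scheme, or the stream may start at some offset other than 0.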