Hacker News
Reverse Engineering for Everyone (0xinfection.github.io)
621 points by udev4096 6 months ago | 66 comments



I'd like to add that reverse engineering can also be done without any peeking at the thing you're trying to reverse-engineer.

Andrew Tridgell explaining how he reverse engineered Microsoft's SMB protocol with the "French cafe technique": https://www.samba.org/ftp/tridge/misc/french_cafe.txt

Tridge also reverse engineered BitKeeper, the proprietary software that Linus foolishly used to host Linux kernel development for a while. He noticed that if you telnet to the BitKeeper address:port rather than use its proprietary client, you can type "help" and it then spits out a list of commands to try...

You can then interrogate the repository with these commands and get a complete understanding of all the internal data structures, without ever using the proprietary software, let alone having to disassemble it.
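
For anyone wondering what "telnet to the port and type help" amounts to in practice, here is a minimal sketch in Python. It is not BitKeeper's actual protocol: the newline-terminated command, the timeout, and the function name `send_command` are all assumptions made for illustration.

```python
import socket


def send_command(host: str, port: int, command: str, timeout: float = 2.0) -> str:
    """Connect to a plain-text TCP service, send one command line, return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: the service reads newline-terminated ASCII commands.
        conn.sendall(command.encode("ascii") + b"\n")
        chunks = []
        try:
            while True:
                data = conn.recv(4096)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # the server went quiet; treat what we have as the full reply
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling something like `send_command("repo.example.org", 5000, "help")` (hypothetical host and port) would print whatever the server volunteers, which is the whole trick: no client binary is involved.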

The fact that Tridge did this reverse-engineering led BitKeeper's owner, Larry McVoy, to rescind the Linux community's use of his software, so Linus wrote git.


> Tridge also reverse engineered BitKeeper, the proprietary software that Linus foolishly used to host Linux kernel development for a while.

I wouldn't necessarily call it 'foolish': Linus used the best available tool at the time. (I don't know whether BitKeeper was the best available tool in some absolute sense, but Linus looked around and evaluated many of them.)

> [...] you can type "help" and it then spits out a list of commands to try...

That was actually a nice engineering / UI decision by the BitKeeper developers, but I'm afraid the moral of the story would be not to make your software too helpful?

> You can then interrogate the repository with these commands and get a complete understanding of all the internal data structures, without ever using the proprietary software, let alone having to disassemble it.

That's a strange use of 'use'? Clearly, talking to some software over the network is 'using' it?


> That's a strange use of 'use'? Clearly, talking to some software over the network is 'using' it?

The point is that the proprietary client software was not used.


That makes sense.


> I wouldn't necessarily call it 'foolish'

It was foolish because, by selecting it, Linus was endorsing use of non-free software to work on one of the premier free software projects. It gave Larry McVoy unwarranted control over Linux kernel developers.

As rms said at the time: "The spirit of the Bitkeeper license is the spirit of the whip hand. It is the spirit that says, 'You have no right to use Bitkeeper, only temporary privileges that we can revoke. Be grateful that we allow you to use Bitkeeper. Be grateful, and don't do anything we dislike, or we may revoke those privileges.'" https://marc.info/?l=linux-kernel&m=103454948625224&w=2

It caused animosity for years and was resolved by Linus writing git, famously in 10 days. Could he not have taken 10 days off in 2002 and written his preferred DVCS then?

The moral of the story is: don't use proprietary software. It will bite you in the ass.

"Torvalds seems to have fallen for the “free beer” argument: He didn’t have to pay for BitKeeper, so he figured it was good enough. But not having to pay is not, and has never been, the real purpose of free software. The point is to avoid the situation Torvalds eventually found himself in: McVoy didn’t like how his product was being used, so he took his ball and went home. Could you afford to switch gears in the middle of a project if one of your key software vendors did the same?" https://www.infoworld.com/article/2211030/linus-torvalds-bit...

> That's a strange use of 'use'?

To "use" software is to copy it into your computer's memory/CPU to execute it, for which courts have said you need a copyright license. You don't need a copyright license to connect to an open network port and interrogate it (or even capture packets of other people's conversations). US courts have also affirmed that web-scraping is a legal way to collect information because you're just throwing the data out there to anyone who asks; if you want to force people to agree to terms and conditions to see data or "use" web-software, you have to make them log in or supply a key that you only issue _after_ they agree to your license or contract.

Andrew Tridgell did not even use anything which would require him to accede to Larry's license. It wrecked Larry's desire that nobody work on a "competing" tool to his, and there was nothing he could do about it, which is why he took his ball and went home.


Linus said it took longer to design; maybe he wasn't ready in 2002.

"So I’d like to stress that while it really came together in just about ten days or so (at which point I did my first kernel commit using git), it wasn’t like it was some kind of mad dash of coding. The actual amount of that early code is actually fairly small, it all depended on getting the basic ideas right. And that I had been mulling over for a while before the whole project started. I’d seen the problems others had. I’d seen what I wanted to avoid doing." https://www.linuxfoundation.org/blog/blog/10-years-of-git-an...


> Clearly, talking to some software over the network is 'using' it?

In some sense, yes. But I wouldn't say my mom uses Linux when she uses her iPad to visit a website hosted on a Linux server.


Possibly not; however, the passive "Linux is being used" would still be a valid observation.

In this case, the software (or a component of the software's ecosystem) was "in use" over the network.


Maybe. But in some sense, if you use a classic X application, it's all done over the network, too.


So Andrew saved us not only once, but twice!

It goes to show that yes, you need someone to score, but you also need someone to make that critical pass of the ball.


An undergrad highlight for me was hearing the bitkeeper/git story from Tridge one afternoon that he happened to be in the faculty lunch room :)


I bet Larry McVoy highly regrets his actions.


Yes, he is still butthurt about it. This is from 20 days ago: https://news.ycombinator.com/item?id=40887244


That's fantastic. He deserved his comeuppance.


That's insane

Is git a "clone" of bitkeeper?


I never thought of reversing as something you pick up a book for. Everything I learned was through application from a young age.

1. Learned how to use Cheat Engine to scan video game process memory and modify games.

2. Learned how to read/replay packets in an MMO to try and cheat.

3. Learned how to craft DLLs, hooks and inject them in processes.

4. Learned how to create patches for executables to solve some crackme challenges.

5. Mess with real world software that requires a license key, to suddenly not require a license key (or accept any key).

6. Mess with binary formats to try and reverse how game saves worked to... you guessed it, cheat.

7. Get a real job and make money with the skills and knowledge I acquired.
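
For readers who have never used a memory scanner, the core loop behind step 1 can be sketched in a few lines of Python. This is a toy model under loud assumptions: "memory" is just a `bytes` snapshot rather than a live process, values are aligned little-endian 32-bit ints, and `first_scan` / `next_scan` are names made up here, not Cheat Engine's API.

```python
import struct


def first_scan(memory: bytes, value: int) -> list[int]:
    """Return every 4-byte-aligned offset whose little-endian int32 equals value."""
    needle = struct.pack("<i", value)
    return [off for off in range(0, len(memory) - 3, 4) if memory[off:off + 4] == needle]


def next_scan(candidates: list[int], memory: bytes, value: int) -> list[int]:
    """Keep only the candidate offsets that hold the new value in a later snapshot."""
    needle = struct.pack("<i", value)
    return [off for off in candidates if memory[off:off + 4] == needle]


# Toy "process memory": snapshots before and after the player loses health.
before = struct.pack("<4i", 100, 7, 100, 42)
after = struct.pack("<4i", 100, 7, 85, 42)

candidates = first_scan(before, 100)          # two offsets hold 100
survivors = next_scan(candidates, after, 85)  # only one of them changed to 85
```

Repeating the narrowing step until one address survives is the whole idea; a real tool reads the target process through OS debugging APIs instead of a byte string.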


Same. I learned reverse engineering by staring at CE/IDA for entirely too many hours as a kid, which means whenever someone asks me for advice on how to learn reverse engineering I don't really have any good answers :)

I think in reality it's the type of thing you do just have to try and spend some time on. The OP tutorial comes across as very sparse, both trying to cover too much and also not really teaching reverse engineering skills more than most people would be able to pick up in a few hours of messing around. beginners.re in contrast is massive, but also much more in-depth and goes step-by-step; on the other hand crackmes are probably better hands on challenges to try.


Wow, did you really have access to IDA as a kid? Even with adult money it seems expensive to me.


Most people used a cracked old version of IDA. I actually just used the freeware version, which was ancient and didn't come with any decompiler. Which was definitely difficult, and people having access to Ghidra for free these days is definitely a lot better!


Everyone pirated IDA as a young reverse engineer, that's just a rite of passage.


Numega's SoftIce for me, but I always preferred interactive exploration over static disassembly.

Disassembling a large binary would get you a massive text file that was painful to navigate - and often times I'd find that the code I was interested in removing "Invalid license key" (ahem) would be stored in some unrelated DLL.

So for me setting breakpoints on MessageBoxEx, and similar things, was by far the quickest and easiest way to go.


Going straight for reverse-engineering is doable, but it's significantly harder without some engineering background, either formal or self-taught.

I have an ongoing reverse-engineering project for a video game and I ended up getting in contact with a self-taught modder of the game, who doesn't know how to program. He learned more in a couple of evening Discord calls with me showing him around the reverse-engineered Ghidra project, explaining the basics of computer program engineering as we went, than he did flipping bits with Cheat Engine.

He then proceeded to recreate a fairly ambitious mod that was showcased in a Youtube video 15 years ago but never released, something that had been bugging him for years but that he had been unable to recreate. I steered him throughout, but by seeing how the pieces fit together he then managed to do the same mod on the sequel (which was never done before) all by himself.

Experience with engineering gives you perspective when reverse-engineering.


It depends what you mean by "engineering". You need to understand the "memory model" (I don't know the proper term). So that memory has addresses, you can point to them, the stack, registers, etc.
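
A rough illustration of the "memory has addresses and you can point to them" part, using Python's `ctypes` so it runs anywhere. It only covers addresses and aliasing, not the stack or registers, and the variable names are arbitrary.

```python
import ctypes
import sys

value = ctypes.c_int32(1234)
address = ctypes.addressof(value)  # numeric address of the 4 bytes backing `value`

# Read the raw bytes at that address and decode them using the machine's byte order.
raw = ctypes.string_at(address, ctypes.sizeof(value))
decoded = int.from_bytes(raw, sys.byteorder, signed=True)

# A second object mapped onto the same address behaves like a dereferenced pointer:
# writing through it changes the original.
alias = ctypes.c_int32.from_address(address)
alias.value = 5678
```

After the last line, `value.value` reads 5678 as well, because both names refer to the same four bytes.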

I have met many software developers that have almost no understanding about that stuff. They wouldn't help much when it comes to reverse engineering.

At the end of the day, there's a bunch of knowledge you need to be able to reverse engineer efficiently. It doesn't really matter if you're coming from flipping bits in CE to programming or vice versa, but you need both. Having someone around who knows both to guide you is a massive help.

For what it's worth, I also started reverse engineering first and programming second. There were many concepts I knew but didn't know the name of. I remember seeing a weird function where a pointer to an object was passed via ecx. I had no idea that how functions were called was a "calling convention" and that Microsoft called that a __thiscall. But at the end of the day, I did figure out what was going on, I just couldn't tell you what the original C++ code was until years later (when I finally "learned" C++).


Understanding the low level details helps, but another benefit of having engineering experience is being able to empathize with the original engineers.


I don't think this is true, or at least I'm not convinced by a single anecdote. The majority of good reverse engineers I know picked up reverse engineering first and programming second (and a lot of them are still frankly not great programmers), and likewise I know plenty of good programmers who would be completely lost reverse engineering. Reverse engineering is a very different skillset than programming.


While I am reverse-engineering a video game by myself, I'm not really part of the reverse-engineering scene, so this one anecdote is really the only data point I have about "mentoring" someone, if it even counts. I fall into the category of people who picked up programming first and then reverse-engineering second. I don't know what I'm worth compared to other reverse-engineers and my signature technique is extremely fringe. I don't really have a reference point of what's normal or not.

That being said, I believe that there's a large skillset overlap between comparable reverse-engineering and programming activities. Knowing various programming patterns and architectures is helpful for making sense of (de)compiled code during static analysis. Get knee-deep in the bowels of a misbehaving program armed with GDB and you're getting a taste of dynamic analysis. Throw in some missing debugging symbols or advanced optimization work and you'll pick up some assembly on the way.

In my eyes, the only real difference is the mindset. On one side you're building software, on the other you're deconstructing it. Maybe I've been at it in the trenches for so long that I can't tell the difference anymore.


I agree with you both to some extent. It's all anecdotal though, really.

I think a fair point is that there are common idioms that you need to learn one way or another. Whether that is formal training or intuition or just plain force of will, you need to come to understand the meaning of what you are looking at and not just what the individual instructions are doing.

Otherwise, it's a similar idea to saying, "nobody needs to learn how to read music because look how great Jimi Hendrix was and he couldn't".


Right, I'm not saying that learning software engineering wouldn't help. I'm specifically pushing back against "it's significantly harder without some engineering background", since a lot of good reverse engineers I know still don't have a good software engineering background. Being able to identify program constructs and idioms from the programming side instead of the reverse engineer side is definitely one way to do it, but I don't think it's the only way and I'm not sure it's even the best way, since a lot of programming details are surprisingly irrelevant for RE so going through a full CS degree program will also spend a lot of time teaching you things you don't need to know for RE.


My point was about relevant engineering background for a particular task. For example, if you're trying to binary patch something, having prior assembly programming experience would help a lot, but knowing the runtime complexity characteristics of various sorting algorithms wouldn't.

I'm not suggesting that aspiring reverse-engineers need to pursue a full-blown CS degree first, but most reverse-engineering activities usually have at least one counterpart engineering activity. You can power through without learning it first, but I'm not convinced that it's easier or faster to learn that way.

As for me, I've spent quite a lot of time doing low-level software engineering beforehand (stuff like OSDev, bare-metal programming and GDB debugging sessions with missing symbols...) and I've picked up on reverse-engineering very quickly I believe, thanks to lots of relevant prior engineering experience. Had I spent my time making cracks and mods instead, I highly doubt I would've been able to later pivot towards software engineering that easily, due to a lack of foundational CS knowledge.


You entirely underestimate the power of structured learning and reinforcing exercises. While critical reasoning, curiosity, and passion are things that may be difficult to impart, a well-written book can cut hours of trial and error to something suitably reasonable. Notice that there are plenty of books but there are only a handful of "good" books.


Structured learning is great, but I think you're overestimating the power of books. Especially in a domain like reverse engineering. The moment a book is published it's out of date. What worked yesterday doesn't work tomorrow.

I never suggested people learn entirely on their own. I learned in a loosely structured way by reading thousands of forum posts, asking questions on forums, sitting in IRC channels talking to people, etc.


Perhaps, but like I said, there are books and then there are good books. Besides, state of the art might change rapidly but the fundamentals rarely do.


I never thought of cooking as something you pick up a book for.


Damn, you’re so good, can I get an autograph?


there's a book for most things.

everything you listed here could be in a book used to help you gain those skills


if I ever get around to learning reverse engineering, I don't expect a book like this to teach me how to do it. I expect it to inform me of what I don't know that I don't know. For that it seems okay as a starting point.


This is the way.

Some books do greatly help in getting there, though. I learnt a lot from "Reversing: Secrets of Reverse Engineering" by Eldad Eilam (might be a bit dated now).


> Get a real job and make money with the skills and knowledge I acquired.

Do you mind sharing what kind of job is that?


My first job was working at a video surveillance company... My specific job was reversing multiple proprietary video streams, transcoding them and stitching them into a single output stream and sending it to a browser. For example, taking nine 1080p video streams and stitching them into a single 3x3 video stream that totaled 1080p.

It was a chaotic mess of C++.

I did it for a year before joining a startup, and on and on.
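
The arithmetic behind that 3x3 example is simple enough to sketch in Python. This assumes "1080p" means a 1920x1080 frame and that every tile gets an equal share of the output; `grid_layout` is a made-up helper, not anything from the actual product.

```python
def grid_layout(out_w: int, out_h: int, rows: int, cols: int) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) for each tile of the output frame, row by row."""
    tile_w, tile_h = out_w // cols, out_h // rows
    return [(c * tile_w, r * tile_h, tile_w, tile_h)
            for r in range(rows) for c in range(cols)]


# Nine 1080p inputs squeezed into one 1080p output: each is scaled down to 640x360.
tiles = grid_layout(1920, 1080, rows=3, cols=3)
```

So each source stream is downscaled by a factor of three in both dimensions before being placed at its tile's offset.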


One word: plastics.


It's also how I learnt programming when I was 14! It was so much fun.


Lmao we must be related. U explained my childhood


I found so many mistakes, badly understood concepts and entirely wrong explanations just reading for 5 minutes, that I can't possibly recommend this. It's obviously written by amateurs / people with minimal experience of the domain.

Much better resources are Eldad Eilam's "Reversing: Secrets of Reverse Engineering"; for Windows, "Practical Reverse Engineering"; and for the absolute basics, Patterson's "Computer Organization and Design".


Secrets of Reverse Engineering is from 2005. Is there a more recent book you would recommend?


It seems like a high-level overview, good for somebody new to the topic.

It also linked to this resource, which was more in depth... https://github.com/mytechnotalent/Reverse-Engineering

EDIT: Whoops... it looks like it mostly links back to the original article.


This brought back memories of when I was reading reversing tutorials from searchlores.org and fravia.com...

It's in web archive now, https://web.archive.org/web/20191201105759/http://search.lor...


Reverse engineering Java is cool as well, especially fishy Android apps which control some appliances via byzantine Bluetooth protocols.


If only there was an equivalent to DnSpyEx for Java. Can't wait for Recaf4 to be ready.


Neat, I know the guy that wrote this guide! Glad to see it made it onto HN. If you have any specific feedback I can pass it along.


This is way too short on engagement and visuals. Way too much telling and walls of text. For these reasons alone this is not for "everyone".


Props to the author for writing this – that being said, I felt the same way.

Very long, windy and hard to parse sentences.

For example, Part 2

> There are two basic techniques that you can employ when analyzing malware. The first being static analysis and the other being dynamic analysis.

> Static analysis uses software tools to examine the executable without running the actual decompiled instructions in Assembly. We will not focus on this type of analysis here as we are going to focus on actual disassembled binaries instead however in future courses we will.

> Dynamic analysis uses disassemblers and debuggers to analyze malware binaries while actually running them. The most popular tool in the market today is called IDA which is a multi-platform, multi-processor disassembler and debugger. There are other disassembler/debugger tools as well on the market today such as Hopper Disassembler, OllyDbg and many more.

> A disassembler will convert an executable binary written in Assembly, C, C++, etc into Assembly Language instructions that you can debug and manipulate.

> Reverse engineering is much more than just malware analysis. At the end of our series, our capstone tutorial will utilize IDA as we will create a real-world scenario where you will be tasked by the CEO of ABC Biochemicals to secretly try to ethically hack his companies software that controls a bullet-proof door in a very sensitive Bio-Chemical lab in order to test how well the software works against real threats. The project will be very basic however it will ultimately showcase the power of Assembly Language and how one can use it to reverse engineer and ultimately provide solutions on how to better design the code to make it safer.

> In our next lesson we will discuss various types of malware.

could be written:

> There are two basic techniques that you can employ when analyzing malware: static analysis and dynamic analysis.

> Static analysis examines the executable without running it. We will not focus on this type of analysis here, however in future courses we will.

> Dynamic analysis uses disassemblers and debuggers to analyze malware binaries while running them.

> A disassembler converts an executable binary into Assembly Language instructions that you can debug and manipulate. There are many disassembler/debugger tools available such as Hopper, OllyDbg, IDA and many more. The most popular being IDA, a multi-platform, multi-processor disassembler and debugger.

> Reverse engineering is much more than just malware analysis.

> At the end of our series, we will use IDA in a fictional scenario where you will be tasked by the CEO of ABC Biochemicals – a very sensitive Bio-Chemical lab – to ethically hack his company’s bullet-proof door control-system.

> The project, while basic, will showcase the power of Assembly Language and how one can use it to reverse engineer black-box binaries and ultimately find solutions to make the code safer.

> In our next lesson we will discuss various types of malware.


> hard to parse sentences

That's because you're supposed to reverse engineer them :)


Book links are not working for me, did you get the HN hug of death?


Hmm, yeah I don't know. This reads like a lot of fluff or immediately unimportant stuff.

Reverse engineering in the real world takes a few forms, some of which the writer takes on too briefly towards the end of the material. Applied reverse engineering is usually modifying an existing piece of software, so:

    * .dll/lib injection
    * signature scanning and patching
    * packet interception and rewriting
    * mitm HTTP(s) calls
These are just a few places where you see reverse engineering used, usually to modify existing software.
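
To make "signature scanning and patching" from the list above concrete, here is a minimal Python sketch over an in-memory buffer. The byte sequence is invented for the example (it stands in for `call rel32; test eax, eax; jne rel8` on x86), and `parse_signature` / `find_signature` are hypothetical helper names, not any tool's API.

```python
from typing import Optional


def parse_signature(text: str) -> list[Optional[int]]:
    """Turn 'E8 ?? ?? 75' into [0xE8, None, None, 0x75]; None matches any byte."""
    return [None if tok in ("?", "??") else int(tok, 16) for tok in text.split()]


def find_signature(data: bytes, signature: list[Optional[int]]) -> int:
    """Return the offset of the first match, or -1 if the pattern is absent."""
    for start in range(len(data) - len(signature) + 1):
        if all(want is None or data[start + i] == want
               for i, want in enumerate(signature)):
            return start
    return -1


# Made-up bytes standing in for a code section.
image = bytearray.fromhex("90 90 E8 11 22 33 44 85 C0 75 05 90")
sig = parse_signature("E8 ?? ?? ?? ?? 85 C0 75")
offset = find_signature(bytes(image), sig)

# Patch the conditional short jump (0x75, JNE) into an unconditional one (0xEB, JMP).
jump_at = offset + len(sig) - 1
image[jump_at] = 0xEB
```

The wildcards cover the call's relative offset, which changes between builds, so the same signature keeps matching after an update as long as the surrounding instructions stay put.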

I'm curious if there's any reading out there that covers this stuff from the meat and potatoes and less of this CS 101 stuff.

I've done all of the above, and you can usually learn about this stuff from some different forums on the web, but I don't know of any good bibles on the subject matter.


Applied reverse-engineering is all about bending the rules of engineering. Because of this, I think it can be learned through experience, but I doubt it can be taught through theory (or at least not in an effective manner). At its core, it's about spotting metapatterns to gain an understanding of a program and applying leverage to affect it. That's more art than science, no matter how much tooling you throw at it.

Honestly, I think the most effective way to learn about how to reverse-engineer something is to learn engineering at the same layer first and then start tinkering. If you want to binary patch a program, learn assembly. If you want to inject a .dll, learn how to write and use dynamic libraries. If you want to MITM a REST API, learn how to call a REST API. Because once you know the rules well, you can start breaking them and see exactly how much you can get away with.

I wrote a series of articles on reverse-engineering on my blog, about studying and modifying a program that outputs an ASCII table, mostly because I needed a way to introduce delinking as a technique. I would not say it's good, but it starts with how to build the case study and then it handholds the reader through the meat and potatoes.


This. There's a lot to be said about understanding registers and assembly and different languages and how a USB packet is constructed, but efficiency in reverse engineering comes down to effective pattern recognition.

A binary is likely to have a reasonable amount of often-called code for memory operations (memset, memcpy, strcat, strlen, sscanf, log) and a lot of library code (Flexcomm_Init, Clock_AttachClk, SPI1_Handler, NVIC_EnableIRQ) and then probably fairly little actual application code. For Ghidra users, being able to ignore the boilerplate (mem and BSP code) and quickly find and analyze the application code saves a TON of time.

(Conversely, if I know a binary is written using FreeRTOS, finding the task creation function would be my first step, as this reveals nearly all of the application code.)

There are techniques to help (setting a flash memory region as non-write so string references are recognized and disassembled correctly, loading a chip SVD so all the library code is more obvious) but those come with experience or a good hands-on tutorial, and they still won't tell you everything about the application code.

In my own breakdown of one Cortex-M binary (bare metal, no objects known) the only reason I was able to get the firmware in the first place was by noticing and decoding a base64 string in an unpacked Electron app used for USB communication with the device. This ended up holding plaintext credentials for their update server which had two channels: one for encrypted production binaries and the other for unencrypted development binaries.

In this specific case, it helped to know what base64 looks like, but that's like how knowing different methods of slicing onions might help you figure out a recipe by tasting a cooked meal. Very often such background knowledge is irrelevant. Once in a while it will be the only realistic way forward.
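
As a sketch of what "knowing what base64 looks like" can mean mechanically, the Python below scans a blob for long runs of base64 characters and keeps the ones that decode to printable ASCII. It is a heuristic with assumed thresholds (16+ characters, length a multiple of 4), and the embedded credential is invented for the example.

```python
import base64
import re

# Runs of 16+ base64 alphabet characters, optionally followed by padding.
BASE64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_strings(blob: bytes) -> list[bytes]:
    """Return decoded payloads for runs that are valid base64 and decode to printable ASCII."""
    decoded = []
    for match in BASE64_RUN.finditer(blob):
        candidate = match.group()
        if len(candidate) % 4 != 0:
            continue  # not a whole number of base64 quanta
        try:
            payload = base64.b64decode(candidate, validate=True)
        except ValueError:
            continue  # looked like base64 but is not
        if payload.isascii() and payload.decode("ascii").isprintable():
            decoded.append(payload)
    return decoded


# Made-up blob: a hypothetical credential string surrounded by unrelated bytes.
secret = base64.b64encode(b"user:example-password")
blob = b"\x00\x01" + secret + b"\x00\xff"
found = find_base64_strings(blob)
```

A filter like this throws up false positives on ordinary long identifiers, so in practice it only narrows down where to look.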


> I wrote a series of articles on reverse-engineering on my blog, about studying and modifying a program that outputs an ASCII table,

Would you mind sharing the links? I would be interested!


You can find the table of contents for the series here: https://boricj.net/reverse-engineering/2023/05/01/introducti...

I expect that you'll be mostly interested in parts 2 through 6. Part 1 explains how a toolchain works in general (so mostly CS 101 stuff as the OP put it). Parts 7 to 10 demonstrate the delinking technique by easing into it, a technique which is as powerful as it is esoteric, but probably not what you're looking for in a beginner's guide.


> I'm curious if there's any reading out there that covers this stuff from the meat and potatoes

In my experience using radare2 to peek at the code is pretty much the meat and potatoes of reverse engineering binaries and far from "CS 101 stuff". You certainly don't need to modify a binary to MITM an API or inspect/alter packets or inject code via dynamic loading; nor is it the most convenient or clean or easy to maintain way to do so.

Secondly, this is a shockingly dismissive attitude for such a large resource. It took me a few minutes to just read through the table of contents.


Just because it's large doesn't mean it's relevant: using radare2, IDA Pro, or some other tool doesn't mean you're going to be able to do anything besides look at a binary.

I mean, you said you read the table of contents, yeah? Doing the same thing across different CPU architectures isn't doing something at length, it's just doing the same thing over and over again in rhymes.

In practice, yeah, people in the wild are absolutely modifying binaries, injecting, stubbing .dlls and redirecting calls, or creating proxy servers that alter payloads, for sure.

Learning how to compile a program isn't exactly reverse engineering worthy content to write about.


I disagree, learning how to compile a program is a prime example of something you'd want in a book about reverse engineering "for everyone". A book which focuses only on specific methods of changing software behavior would be useful only to those who know how to understand said software. In fact the term "reverse engineering" itself does not imply modification at all.


> Just because it's large doesn't mean it's relevant: using radare2, IDA Pro, or some other tool doesn't mean you're going to be able to do anything besides look at a binary.

Looking at a binary is like 99% of the work, though. Or at least looking at some secondary form of it (e.g. assembly, decompilation, etc). Tools are absolutely critical to the work.

> people in the wild are absolutely modifying binaries, injecting, stubbing .dlls and redirecting calls, or creating proxy servers that alter payloads, for sure

I would call modifying a binary "cracking" it but it's been a few decades since I was involved in that scene. I also think that the topic is large enough to warrant multiple focuses—to me, at least, writing a MITM server is much more trivial than extracting a private key from a binary (or a running process) that makes that MITM server functionally useful.

> Learning how to compile a program isn't exactly reverse engineering worthy content to write about.

That's a disingenuous characterization of most of the content here. Coding at the instruction level requires a different way of reading and writing code than you're otherwise exposed to. Most programmers aren't used to handling bits directly, and certainly not to the extent that it rewards you at the instruction level for learning and knowing. With the tools here you can, in fact, sit down and inspect the license verification function of a piece of software (although I'm not sure how much that's true or beneficial these days with code-signing etc).

EDIT: Or you could do what I did and work with `as`, `otool`, and a hex editor, and learn extremely slowly & painfully why custom-built reverse engineering tools are so valuable to learn.

There's always more to learn, of course, but that's no reason to belittle what you've already learned and other people still have yet to learn.


Yeah, I'm sure what I'm saying probably comes off as belittling, but that's not my intent. It's just more productive to understand who the audience is. The author writes "free PDF" content with Guy Fawkes mask header images in the README.mds.

If you're going to target script kiddies, at least show them how to Hello, World! from a DLL_PROCESS_ATTACH, and then teach them sigscanning.


Resources exist, but are only so helpful IMO.

One can't necessarily build an airplane after watching a documentary on it.

Even if there was some "bible" on it, reverse engineering is one of those things that you have to put the reps in for to get good at it and actually develop understanding.

The "bible" is tackling reverse-engineering related projects independently over the course of months/years and picking up knowledge along the way.

Starting with something like cracking software (and making increasingly-advanced cracks) is always my advice for beginners.


> places where you see reverse engineering used, usually to modify existing software.

funnily enough I have a team reverse engineering binary data formats, which is often more easily accomplished by other means + only dropping down to the disassembly/decompilation where absolutely necessary. and which as far as I am aware never involves binary patching

but yeah about the article it seems like if you know this much about assembly / chips etc. to be able to read it, then general problem solving ability should be able to cover most of the article's content


> The x64 Architecture

Ahh yes, the famous 8064 computer.

Just kidding, looks like great work.



