A Gentle Primer on Reverse Engineering (emily.st)
192 points by luu on Jan 28, 2015 | 60 comments



First: great article.

One nit, though. There's a subtle error in the main function:

    char* input;
    printf("Please input a word: ");
    scanf("%s", input);
Local variables are not automatically initialized in C, and we never assign input to point to any particular block of memory. This means it's probably pointing off to some random location - basically whatever address happened to be sitting on the stack when main was called. So when scanf writes the user password to input, it's going to go to some unpredictable location with unpredictable results. This could lead to code execution, or at least a straightforward denial of service.
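
For illustration, here is a minimal sketch of one way to give the input real storage. The buffer size `INPUT_SIZE` and the helper name `read_word` are hypothetical, not from the article, and the width in the format string has to stay one less than the buffer size.

```c
#include <stdio.h>

#define INPUT_SIZE 64  /* hypothetical size; not taken from the article */

/* Reads one whitespace-delimited word from `in` into `buf`.
   Returns 1 if a word was read, 0 on EOF or error. */
int read_word(FILE *in, char buf[INPUT_SIZE]) {
    /* %63s stores at most 63 characters plus the terminating NUL,
       so the write can never run past the 64-byte buffer. */
    return fscanf(in, "%63s", buf) == 1;
}
```

In `main` this would be called as `read_word(stdin, input)` with `char input[INPUT_SIZE];` declared as a local array, so the storage lives on the stack and nothing is left pointing at a random address.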


[flagged]


Don't write things like this on HN.


The author addresses this in footnote #2. They're simplifying the C code to get to the point of the article faster.


I don't really buy that this is a simplification. This code isn't even guaranteed to work on correct input.

A really simple fix would be just to use a character array.


Hmm, that footnote was very hard to notice, so I was quick to point out these errors to the person who maintains the blog, only to get blocked by her on Twitter. Not sure why she went that far.


Because, if you're going to criticize the article, maybe go read everything there is to read? Because every one of your points was addressed somewhere in the article.


The footnote was very difficult to spot.

Furthermore, I'm not sure why the author didn't write out the complete code. Even if they knew the code was incorrect, it would have been trivial to initialize the variable once this was pointed out.


Because the author made a stylistic choice to publish slightly incomplete code to get to the point faster (and pointed that out in a footnote, which was just as easy to spot as the "bug"). The point of the essay is not a perfectly executed C program, it's to demonstrate code disassembly and reverse engineering. More attention on perfect C code means less attention (reader attention and author attention alike) on the essential facts the author is trying to communicate.

In addition, I'm not an expert at this stuff, but it seems like initializing that variable would make the disassembly more complex and thus obfuscate the point just that much further.


The gist with the full source code that accompanies the article contains the same bug. I'm not convinced it was a stylistic choice in the blog for a clearer explanation.

I would expect the full code to be correct/complete. This type of bug goes well beyond "I also make some assumptions in string handling that are considered gravely unsafe to use in a modern program, so please do not use this code in the real world." IMO.


The C code is literally the least interesting part of this essay. This bug, such as it is, does not matter. It is entirely and completely beside the point.


[flagged]


Not helping.


Regarding your last sentence: reverse engineering is a complex task, and obfuscation is but one of the many challenges a reverse engineer faces. Granted, a malloc/free pair or a stack variable in the example binary is not one of them.


The title of the essay starts with "A Gentle Primer". Think less "this is a complex and deep thing that is hard to understand" and more "check out this neat thing you can do with computers! computers are awesome!"


Simplifying does not mean incorrect. Initializing the variable would not add appreciable complexity to the code. It's a snippet and should be fixed lest it make the author appear incompetent.


I'm not 100% confident, but I'm pretty sure the author does not care if you think they're incompetent. The author's skill and apparent interest is in explaining an inherently difficult concept (reverse engineering a compiled executable) in a way that won't cause their audience (not you) to tune out and/or go running for the hills.

Could they adjust the snippet? Sure. Will it add anything to the essay to do so? Nope.


Eh... I'm not really buying it. I think it was an honest mistake. C'mon; this makes that snippet harder to read?

    char input[SIZE];
I really don't think so. I'm all for simplifying example code to get the core concept across, but I wouldn't go as far as to invoke undefined behavior. I also don't understand the use of scanf. At all. A seasoned, competent C user would never consider using scanf.

Anyway, I didn't mean to derail things too much here. It's really a nitpick and has little to do with the article itself.


Forget the author appearing competent, this is a primer. Primers should have code you can copy and paste and be correct every time. Most beginners following along who run into a weird issue with the program they're using to reverse engineer, before even getting to the fun part, are just going to give up.


I would expect an article on reverse engineering to have correct C code.

If you think that having undefined behavior in your code is fine as long as it works for you, do not be surprised that one moment a vile dragon appears and starts spewing fire.


The author was wrong, and then corrected their post. Why get indignant on their behalf?


The wrongness, such as it was, is completely beside the point. The author's audience is a group of people for whom this has a high probability of being their first exposure to any C at all. That `malloc` is one more thing for them to get tripped up and distracted by.

I'm indignant because this happens on almost every marginally interesting article that makes the front page and I'm sick of it. I get that the code was wrong and a bug and C must be correct at all times or the world will literally light on fire. It's just exhausting reading comment after comment about the tiniest little nit in an otherwise perfectly wonderful essay.


C must be correct at all times or the world will literally light on fire.

Yes, this is frustrating. It's why we try not to write anything in C anymore.


Kind of sad that 75% of the comments in this thread are negative or pedantic. I work as a reverse engineer and I thought it was a good "gentle primer".


Whenever I read a comment like this of sufficient age I wish I could see a snapshot of the page at the time. Right now there's only 48 comments in total and I don't really notice much negativity. There's some back and forth nit-picking, do you consider that negative?


This was great. If there's ever a "Gentle Primer, pt. II" I'd love to see a walkthrough on replacing the existing hardcoded password with a different string.


I used emacs hexl-mode and http://support.amd.com/TechDocs/24594.pdf to edit a je to a jne which caused the program to think I put in the correct password. That was fun.


But it doesn't work anymore with the correct password...

I always found it cleaner to either force (jmp) or remove (nop nop) the jump rather than invert its condition. It's more explicit.

Also, in the real world, cracking's usually a bit more than finding the right jump to force/remove. Although, if it's enough to reach your goal, you should do it.
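
A rough sketch of those two patches on an in-memory copy of the code bytes. The function names are hypothetical, and it only handles the two-byte short form of je/jne (opcodes 0x74 and 0x75); a compiler may emit the longer near form (0F 84 / 0F 85) instead, which this does not cover.

```c
#include <stddef.h>
#include <stdint.h>

/* x86 one-byte opcodes; the short jumps are followed by a 1-byte displacement */
enum {
    OP_JE_SHORT  = 0x74,  /* je/jz rel8   */
    OP_JNE_SHORT = 0x75,  /* jne/jnz rel8 */
    OP_JMP_SHORT = 0xEB,  /* jmp rel8     */
    OP_NOP       = 0x90   /* nop          */
};

/* Returns 1 if a short je/jne fits at `off` within `len` bytes. */
static int is_short_jcc(const uint8_t *code, size_t len, size_t off) {
    if (len < 2 || off > len - 2) return 0;
    return code[off] == OP_JE_SHORT || code[off] == OP_JNE_SHORT;
}

/* Force the branch: make it unconditional, keeping the displacement byte.
   Returns 0 on success, -1 if there is no short je/jne at `off`. */
int force_jump(uint8_t *code, size_t len, size_t off) {
    if (!is_short_jcc(code, len, off)) return -1;
    code[off] = OP_JMP_SHORT;
    return 0;
}

/* Remove the branch: overwrite opcode and displacement with two nops.
   Returns 0 on success, -1 if there is no short je/jne at `off`. */
int remove_jump(uint8_t *code, size_t len, size_t off) {
    if (!is_short_jcc(code, len, off)) return -1;
    code[off] = OP_NOP;
    code[off + 1] = OP_NOP;
    return 0;
}
```

Which of the two you want depends on whether the "good" path is the jump target or the fall-through. Either way the result no longer depends on the comparison, whereas swapping je for jne just moves the failure to the correct password.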


The firstest trick in the book.


The title seems a bit misleading, e.g. one could reverse engineer source code into UML. Perhaps a more appropriate title would be: A Gentle Primer on Code disassembling.


For what she's describing, "reverse engineering" is actually the more common phrase rather than "code disassembling".

If you search amazon.com, "reverse engineering" is in the titles of the first 2 books:

http://www.amazon.com/s/ref=nb_sb_noss_1?&field-keywords=rev...

The way most people use the terminology now, I'd say "reverse engineering" encompasses all the strategies of analyzing and unraveling the logic of assembler code. On the other hand, "code disassembling" is more specific about using a tool such as IDA to disassemble the binary executable into assembly before the intellectual task of reverse engineering begins.


That's a bit misleading. Reverse engineering has, on the face, little to do with assembler of any sort; it's about understanding your target and choosing the best angle of attack to learn more about it. That often depends on how the target was produced in the first place. If it's written in C or C++, sure, go to the assembler. If it's some Adobe Air program that is really just a launcher for a SWF, well, you're not going to have much luck. Similarly, the application logic in a program like EVE Online is almost entirely written in Python; you wouldn't exactly know it from the outside, because they've wrapped all the native things they need. But then you won't learn much either just looking at the assembly of what are various layers and layers of wrappers.


Something interesting I found out whilst following these instructions: My version of linux has two different echo commands `echo` and `/bin/echo`.

The basic one cannot accept any flags and when I type `echo "\x01"` it prints `\x01`

The one in bin can accept flags, and requires the -e flag to interpret backslashes.

This changes the echo line of the program to this:

`/bin/echo -e "\x01"`

I found this out because it was changing the byte from 00 to 5C rather than 01, because 5C is the ASCII for \
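
If the shell's `echo` can't be trusted to interpret escapes, the same one-byte write can be sketched in C, where the value is passed as a number and no escape handling is involved. `patch_byte` and its parameters are hypothetical names, and the offset to patch depends on your own binary.

```c
#include <stdio.h>

/* Overwrites a single byte at `offset` in a stream opened for update
   (for example with fopen(path, "r+b")).
   Returns 0 on success, -1 on failure. */
int patch_byte(FILE *f, long offset, unsigned char value) {
    if (fseek(f, offset, SEEK_SET) != 0) return -1;
    if (fputc(value, f) == EOF) return -1;
    /* Push the write out before the caller reads or closes the stream. */
    return fflush(f) == 0 ? 0 : -1;
}
```

Calling it with `0x01` writes exactly that byte, so there is no way for a literal backslash (0x5C) to end up in the file.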


Interesting. You might be seeing the bash (or other shell) `echo` builtin.


$ which echo

echo: shell built-in command.


This is slightly offtopic but I find the vulgar example passwords amusing. I used to use a lot of vulgar language in my code but when you have to commit it and other people (e.g. your supervisor) read it, you have to be more careful.


I find it especially amusing because "poop" and "butts" are my default "I need to enter some text here" strings.


This is why we have [foo, bar, baz, quux] and example.com though: https://www.iana.org/domains/reserved


God, I hate these, especially in API documentation. Come up with a simple example of how I might want to use your API. The relationship between fruit and bananas is much more clear than between foos and bazzes.


I recently had to expand my list and came up with:

    foo bar baz buz qux pip pop tut rof art uff dex dom zed


Except there's already foo, bar, baz, qux, quux, garply, waldo, fred, plugh, xyzzy, thud.

http://www.catb.org/~esr/jargon/html/F/foo.html


Same here except I go much more vulgar.

It's such a hard habit to break. Also, if I can name a character in a videogame, it's almost always Dickbutt.


I usually try to make my passwords obscene if I suspect that they're being stored in something that can be reverted back to plain text, or if I suspect that someone may force me to give it to them (e.g., clueless IT team).


If you're a fan of Abbott and Costello, and you believe someone might ask you for your password, try this one:

"I always forget, so I wrote it down."

A: What's your password?

B: "I always forget, so I wrote it down."

A: Where?

B: Where what?

A: Where did you write down your password?

B: I don't write down my password.

A: You just told me you did!

B: Why would I do that?

A: Because I was asking you for your password.

B: And I gave it to you.

A: Did you, or did you not, forget your password?

B: No.

A: So let's have it.

B: "I always forget, so I wrote it down."

A: Do you... ah... need to search your pockets?

B: Why?

A: To help you remember.

B: Remember what?

A: Your password.

B: No. It's very distinctive. I always remember it.


'------------...IsSomethingNobodyShouldHaveToRead', get as offensive as necessary and then include this 36-character salt. Like that?


C lacks a boolean type

This is false, as of C99 we have booleans in C, just include stdbool.h in your code, e.g.:

    #include <stdbool.h>
    
    ...

    bool test = true;

    ...


Let's look at the source of stdbool.h:

    #define true 1
    #define false 0
http://clang.llvm.org/doxygen/stdbool_8h_source.html


That simply explains the truth behind the words. Re-type-ing basic types into other names gives us the ability to share intent without commentary. A function returning 'bool' is intended to be treated as having two valid return values. A function returning int might be intended to return any of the large range of values that int can support.


You're missing

    #define bool _Bool
Since bool wasn't reserved prior to C99, they use the _Bool keyword (which was reserved). [http://stackoverflow.com/questions/8724349/difference-betwee...]


I know that stdbool.h actually contains macros for defining true and false, but _Bool is a new type in C99 ... But it is in the standard, implemented by all major compilers, so it should be used for the sake of clarity.

IMHO this:

    bool is_valid(char* password);
is more clear than this:

    int is_valid(char* password);
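
A small sketch of what the `bool` version could look like, assuming a C99 compiler. The "poop" literal is the example password used elsewhere in this thread; `as_bool` is a hypothetical helper added only to show that `_Bool` is more than a renamed int.

```c
#include <stdbool.h>
#include <string.h>

/* Two possible results, and the return type says so. */
bool is_valid(const char *password) {
    return strcmp(password, "poop") == 0;
}

/* Converting to bool (_Bool) normalizes any nonzero value to 1,
   which a plain int return type would not do. */
bool as_bool(int value) {
    return value;
}
```

In the disassembly the result is still just an integer in a register, so the type is there for the reader of the source, not for the machine.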


But here's the thing: in the assembly (the very essence of the essay), booleans are still represented as integers. Using a standard macro would obfuscate the path to getting to the payoff of being able to switch a single byte from 0 to 1 and crack the program, because the author would then have to explain macros and delve into a bunch of ancillary stuff.


Starting to work with Reverse Engineering assumes prior knowledge of programming and/or some basic information about assembly for the specific platform.

Furthermore, as the examples are given in C, it is implied that the reader needs to be familiar with the language, and as a consequence, knowledge of macros should be, if not a given, then at least required.


The original audience of this content (i.e. not HN) is a broad interest group so you can't really assume that kind of knowledge. Again, think "hey this is neat!" vs "this is absolutely technically correct in all ways."


Of course it should be bool.

I'm glad that the OP at least made it a const char *password though, since that's also a must-have. Always const input pointers. And local temporaries. And everything else that you can const, except scalar arguments pretty much.


Given that more than half of that file is a copyright comment, and the other half basically has to follow a standard exactly, it almost makes me wonder if it meets the minimum threshold of creativity to be copyrightable.


Again (hackernews should just prepopulate my comment box with this text), if you enjoyed this, you'll probably enjoy https://microcorruption.com/ Surprised to see that it wasn't already mentioned, especially since one of the creators, tptacek, was active on this thread.


Slightly off-topic. Could somebody recommend good resources on reverse engineering (especially C and Linux)? I write C code for a living, but binary-level security is not my strong suit and I wish to improve it.


I don't do this for a living, but an accessible resource is the UIUC ACM SIGMil guide. It is a little old, but relevant.

http://althing.cs.dartmouth.edu/local/www.acm.uiuc.edu/sigmi...


Wouldn't it be cleaner to implement is_valid like this:

  int is_valid(char* password) {
    return strcmp(password, "poop") == 0;
  }


Maybe, but if/else is more approachable especially if you haven't seen that pattern before.


Would have been a bit more realistic if the program had been stripped of symbols before disassembly. Still, the call to strcmp would be pretty obvious.


FWIW I would wager that a male would not have to inb4 pedantic arguments. Also that a male would not get so much criticism in regards to nit picky things.

But maybe it's just a common trait of detail oriented programmer types to be pedantic?



