A Gentle Primer on Reverse Engineering

sdevlin · on Jan 29, 2015

First: great article.

One nit, though. There's a subtle error in the main function:

    char* input;
    printf("Please input a word: ");
    scanf("%s", input);

Local variables are not automatically initialized in C, and we never assign input to point to any particular block of memory. This means it's probably pointing off to some random location - basically whatever address happened to be sitting on the stack when main was called. So when scanf writes the user password to input, it's going to go to some unpredictable location with unpredictable results. This could lead to code execution, or at least a straightforward denial of service.

fezz · on Jan 29, 2015

[flagged]

tptacek · on Jan 29, 2015

Don't write things like this on HN.

zrail · on Jan 29, 2015

The author addresses this in footnote #2. They're simplifying the C code to get to the point of the article faster.

sdevlin · on Jan 29, 2015

I don't really buy that this is a simplification. This code isn't even guaranteed to work on correct input.

A really simple fix would be just to use a character array.
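A minimal sketch of that fix, assuming a 64-byte buffer is plenty for the demo. `WORD_CAP` and `read_word` are names made up for this sketch, not taken from the article, and the helper takes a `FILE *` only so it can be exercised without typing at a terminal:

```c
#include <stdio.h>

/* Hypothetical capacity for this sketch: 63 characters plus the terminating NUL. */
#define WORD_CAP 64

/*
 * Reads one whitespace-delimited word from `in` into `buf`.
 * The "%63s" width stops fscanf from writing past the end of the array;
 * it has to be kept in sync with WORD_CAP by hand.
 * Returns 1 if a word was read, 0 on EOF or error.
 */
int read_word(FILE *in, char buf[WORD_CAP]) {
    return fscanf(in, "%63s", buf) == 1;
}
```

In `main` this would look like `char input[WORD_CAP]; read_word(stdin, input);`. The array is real storage on the stack, so the write goes somewhere well defined, and the width caps how much can be written.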

farmdve · on Jan 29, 2015

Hmm, that footnote was very hard to notice, so I was quick to point out these errors to the person who maintains the blog, only to get blocked by her on Twitter. Not sure why she went that far.

zrail · on Jan 29, 2015

Because, if you're going to criticize the article, maybe go read everything there is to read? Because every one of your points was addressed somewhere in the article.

farmdve · on Jan 29, 2015

The footnote was very difficult to spot.

Furthermore, I am not sure why the author didn't write out the complete code. Even if the author knew the code was incorrect, it would have been trivial to initialize the variable once that was pointed out.

zrail · on Jan 29, 2015

Because the author made a stylistic choice to publish slightly incomplete code to get to the point faster (and pointed that out in a footnote, which was just as easy to spot as the "bug"). The point of the essay is not a perfectly executed C program, it's to demonstrate code disassembly and reverse engineering. More attention on perfect C code means less attention (reader attention and author attention alike) on the essential facts the author is trying to communicate.

In addition, I'm not an expert at this stuff, but it seems like initializing that variable would make the disassembly more complex and thus obfuscate the point just that much further.

sokoloff · on Jan 29, 2015

The gist accompanying the article of the full source code contains the same bug. I'm not convinced it was a stylistic choice in the blog for a clearer explanation.

I would expect the full code to be correct/complete. This type of bug goes well beyond "I also make some assumptions in string handling that are considered gravely unsafe to use in a modern program, so please do not use this code in the real world." IMO.

zrail · on Jan 29, 2015

The C code is literally the least interesting part of this essay. This bug, such as it is, does not matter. It is entirely and completely beside the point.

digler990 · on Jan 29, 2015

[flagged]

tptacek · on Jan 29, 2015

Not helping.

farmdve · on Jan 29, 2015

Regarding your last sentence: reverse engineering is a complex task, and obfuscation is but one of the many challenges a reverse engineer faces. Granted, malloc/free versus a stack variable in the example binary is not one of them.

zrail · on Jan 29, 2015

The title of the essay starts with "A Gentle Primer". Think less "this is a complex and deep thing that is hard to understand" and more "check out this neat thing you can do with computers! computers are awesome!"

EpicEng · on Jan 29, 2015

Simplifying does not mean incorrect. Initializing the variable would not add appreciable complexity to the code. It's a snippet and should be fixed lest it make the author appear incompetent.

zrail · on Jan 29, 2015

I'm not 100% confident, but I'm pretty sure the author does not care if you think they're incompetent. The author's skill and apparent interest is in explaining an inherently difficult concept (reverse engineering a compiled executable) in a way that won't cause their audience (not you) to tune out and/or go running for the hills.

Could they adjust the snippet? Sure. Will it add anything to the essay to do so? Nope.

EpicEng · on Jan 29, 2015

Eh... I'm not really buying it. I think it was an honest mistake. C'mon; this makes that snippet harder to read?

    char input[SIZE];

I really don't think so. I'm all for simplifying example code to get the core concept across, but I wouldn't go as far as to invoke undefined behavior. I also don't understand the use of scanf. At all. A seasoned, competent C user would never consider using scanf.
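The comment doesn't name a replacement for scanf. One common choice is `fgets`, which takes the buffer size as an argument. A rough sketch under that assumption, with `read_line` as a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

/*
 * Reads one line from `in` into `buf` (capacity `size`, which must be > 0)
 * and strips the trailing newline, if there is one.
 * Returns 1 on success, 0 on EOF or read error.
 */
int read_line(FILE *in, char *buf, size_t size) {
    if (fgets(buf, (int)size, in) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';  /* drop the newline that fgets keeps */
    return 1;
}
```

Paired with the array above, the call would be `read_line(stdin, input, sizeof input)`. Input longer than the buffer is truncated rather than written out of bounds; the remainder stays in the stream for the next read.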

Anyway, I didn't mean to derail things too much here. It's really a nitpick and has little to do with the article itself.

benihana · on Jan 29, 2015

Forget the author appearing competent, this is a primer. Primers should have code you can copy and paste and be correct every time. Most beginners following along who run into a weird issue with the program they're using to reverse engineer, before even getting to the fun part, are just going to give up.

sireat · on Jan 29, 2015

I would expect an article on reverse engineering to have correct C code.

If you think that having undefined behavior in your code is fine as long as it works for you, do not be surprised that one moment a vile dragon appears and starts spewing fire.

tptacek · on Jan 29, 2015

The author was wrong, and then corrected their post. Why get indignant on their behalf?

zrail · on Jan 29, 2015

The wrongness, such as it was, is completely beside the point. The author's audience is a group of people for whom this is very likely their first exposure to any C at all. That `malloc` is one more thing for them to get tripped up and distracted by.

I'm indignant because this happens on almost every marginally interesting article that makes the front page and I'm sick of it. I get that the code was wrong and a bug and C must be correct at all times or the world will literally light on fire. It's just exhausting reading comment after comment about the tiniest little nit in an otherwise perfectly wonderful essay.

tptacek · on Jan 29, 2015

"C must be correct at all times or the world will literally light on fire."

Yes, this is frustrating. It's why we try not to write anything in C anymore.

_nullandnull_ · on Jan 29, 2015

Kind of sad that 75% of the comments in this thread are negative or pedantic. I work as a reverse engineer and I thought it was a good "gentle primer".

aptwebapps · on Jan 29, 2015

Whenever I read a comment like this of sufficient age I wish I could see a snapshot of the page at the time. Right now there's only 48 comments in total and I don't really notice much negativity. There's some back and forth nit-picking, do you consider that negative?

jkubicek · on Jan 28, 2015

This was great. If there's ever a "Gentle Primer, pt. II" I'd love to see a walkthrough on replacing the existing hardcoded password with a different string.

shanemhansen · on Jan 28, 2015

I used emacs hexl-mode and http://support.amd.com/TechDocs/24594.pdf to edit a je to a jne which caused the program to think I put in the correct password. That was fun.

palmer_eldritch · on Jan 29, 2015

But it doesn't work anymore with the correct password...

I always found it was cleaner to either force (jmp) or remove (nop nop) the jump rather than inverting its condition. It's more explicit.
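To make the two options concrete, here is a sketch that patches an in-memory copy of the code bytes. It assumes the branch uses the 2-byte short encoding (opcode plus an 8-bit displacement); the thread doesn't say which encoding the example binary uses, and `force_jump` and `remove_jump` are hypothetical helper names:

```c
#include <stddef.h>

/* x86 single-byte opcodes, assuming the short (rel8) jump encoding. */
enum {
    OP_JE_SHORT  = 0x74,  /* je/jz rel8   */
    OP_JNE_SHORT = 0x75,  /* jne/jnz rel8 */
    OP_JMP_SHORT = 0xEB,  /* jmp rel8     */
    OP_NOP       = 0x90   /* nop          */
};

/* Returns 1 if code[off] starts a short je/jne that fits inside the buffer. */
static int is_short_cond_jump(const unsigned char *code, size_t len, size_t off) {
    if (len < 2 || off > len - 2) {
        return 0;
    }
    return code[off] == OP_JE_SHORT || code[off] == OP_JNE_SHORT;
}

/* Force the branch: make it unconditional, keeping the original displacement. */
int force_jump(unsigned char *code, size_t len, size_t off) {
    if (!is_short_cond_jump(code, len, off)) {
        return -1;
    }
    code[off] = OP_JMP_SHORT;
    return 0;
}

/* Remove the branch: overwrite both bytes with nops so execution falls through. */
int remove_jump(unsigned char *code, size_t len, size_t off) {
    if (!is_short_cond_jump(code, len, off)) {
        return -1;
    }
    code[off] = OP_NOP;
    code[off + 1] = OP_NOP;
    return 0;
}
```

Either patch makes the outcome independent of the comparison, so the correct password keeps working too. The near forms (`0F 84` / `0F 85` with a 32-bit displacement) are six bytes long and would need different handling.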

Also, in the real world, cracking's usually a bit more than finding the right jump to force/remove. Although, if it's enough to reach your goal, you should do it.

agumonkey · on Jan 29, 2015

The firstest trick in the book.

i5rider · on Jan 28, 2015

The title seems a bit misleading, e.g. one could reverse engineer source code into UML. Perhaps a more appropriate title would be: A Gentle Primer on Code disassembling.

jasode · on Jan 29, 2015

For what she's describing, "reverse engineering" is actually the more common phrase rather than "code disassembling".

If you search amazon.com, "reverse engineering" is in the titles of the first 2 books:

http://www.amazon.com/s/ref=nb_sb_noss_1?&field-keywords=rev...

The way most people use the terminology now, I'd say "reverse engineering" encompasses all the strategies of analyzing and unraveling the logic of assembler code. On the other hand, "code disassembling" is more specific: using a tool such as IDA to disassemble the binary executable into assembly before the intellectual task of reverse engineering begins.

revelation · on Jan 29, 2015

That's a bit misleading. Reverse engineering has, on the face, little to do with assembler of any sort; it's about understanding your target and choosing the best angle of attack to learn more about it. That often depends on how the target was produced in the first place. If it's written in C or C++, sure, go to the assembler. If it's some Adobe Air program that is really just a launcher for a SWF, well, you're not going to have much luck. Similarly, the application logic in a program like EVE Online is almost entirely written in Python; you wouldn't exactly know it from the outside, because they've wrapped all the native things they need. But then you won't learn much either just looking at the assembly of what are various layers and layers of wrappers.

GotAnyMegadeth · on Jan 29, 2015

Something interesting I found out whilst following these instructions: My version of linux has two different echo commands `echo` and `/bin/echo`.

The basic one cannot accept any flags and when I type `echo "\x01"` it prints `\x01`

The one in bin can accept flags, and requires the -e flag to interpret backslashes.

This changes the echo line of the program to this:

`/bin/echo -e "\x01"`

I found this out because it was changing the byte from 00 to 5C rather than 01, because 5C is the ASCII for \
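C string literals draw the same line between an interpreted escape and a literal backslash, which gives a quick way to see the two byte values. This is only an analogy, not how either echo is implemented, and `byte_at` is a made-up helper:

```c
#include <stddef.h>

/* Value of the byte at index i of s, as an unsigned number. */
unsigned byte_at(const char *s, size_t i) {
    return (unsigned char)s[i];
}
```

Here `byte_at("\x01", 0)` is 0x01 because the compiler interprets the escape, while `byte_at("\\x01", 0)` is 0x5C: the string holds the four characters `\`, `x`, `0`, `1`, which is what the builtin echo was writing.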

zrail · on Jan 29, 2015

Interesting. You might be seeing the bash (or other shell) `echo` builtin.

GotAnyMegadeth · on Jan 29, 2015

    $ which echo
    echo: shell built-in command.

Kenji · on Jan 28, 2015

This is slightly offtopic but I find the vulgar example passwords amusing. I used to use a lot of vulgar language in my code but when you have to commit it and other people (e.g. your supervisor) read it, you have to be more careful.

jkubicek · on Jan 28, 2015

I find it especially amusing because "poop" and "butts" are my default "I need to enter some text here" strings.

seanp2k2 · on Jan 29, 2015

This is why we have [foo, bar, baz, quux] and example.com though: https://www.iana.org/domains/reserved

coldpie · on Jan 29, 2015

God, I hate these, especially in API documentation. Come up with a simple example of how I might want to use your API. The relationship between fruit and bananas is much more clear than between foos and bazzes.

caipre · on Jan 29, 2015

I recently had to expand my list and came up with:

    foo bar baz buz qux pip pop tut rof art uff dex dom zed

juliangregorian · on Jan 29, 2015

Except there's already foo, bar, baz, qux, quux, garply, waldo, fred, plugh, xyzzy, thud.

http://www.catb.org/~esr/jargon/html/F/foo.html

busterarm · on Jan 29, 2015

Same here except I go much more vulgar.

It's such a hard habit to break. Also, if I can name a character in a videogame, it's almost always Dickbutt.

pavel_lishin · on Jan 28, 2015

I usually try to make my passwords obscene if I suspect that they're being stored in something that can be reverted back to plain text, or if I suspect that someone may force me to give it to them (e.g., clueless IT team).

logfromblammo · on Jan 29, 2015

If you're a fan of Abbott and Costello, and you believe someone might ask you for your password, try this one:

"I always forget, so I wrote it down."

A: What's your password?

B: "I always forget, so I wrote it down."

A: Where?

B: Where what?

A: Where did you write down your password?

B: I don't write down my password.

A: You just told me you did!

B: Why would I do that?

A: Because I was asking you for your password.

B: And I gave it to you.

A: Did you, or did you not, forget your password?

B: No.

A: So let's have it.

B: "I always forget, so I wrote it down."

A: Do you... ah... need to search your pockets?

B: Why?

A: To help you remember.

B: Remember what?

A: Your password.

B: No. It's very distinctive. I always remember it.

ics · on Jan 28, 2015

'------------...IsSomethingNobodyShouldHaveToRead', get as offensive as necessary and then include this 36-character salt. Like that?

AlexeyBrin · on Jan 29, 2015

"C lacks a boolean type"

This is false, as of C99 we have booleans in C, just include stdbool.h in your code, e.g.:

    #include <stdbool.h>
    
    ...

    bool test = true;

    ...

zrail · on Jan 29, 2015

Let's look at the source of stdbool.h:

    #define true 1
    #define false 0

http://clang.llvm.org/doxygen/stdbool_8h_source.html

delinka · on Jan 29, 2015

That simply explains the truth behind the words. Re-type-ing basic types into other names gives us the ability to share intent without commentary. A function returning 'bool' is intended to be treated as having two valid return values. A function returning int might be intended to return any of the large range of values that int can support.

samsaga2 · on Jan 29, 2015

You're missing

    #define bool _Bool

Since bool wasn't reserved prior to C99, they use the _Bool keyword (which was reserved). [http://stackoverflow.com/questions/8724349/difference-betwee...]
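One way to see that `_Bool` is a distinct type and not just an int with macros on top is its conversion rule: any nonzero value converts to 1, where an int conversion truncates. A small sketch, with `as_bool` and `as_int` as hypothetical names:

```c
#include <stdbool.h>

/* Conversion to bool (_Bool) yields 1 for any nonzero value, 0 otherwise. */
bool as_bool(double x) {
    return x;
}

/* Conversion to int truncates toward zero instead. */
int as_int(double x) {
    return (int)x;
}
```

So `as_bool(0.5)` is `true` while `as_int(0.5)` is 0. A plain `typedef int bool` could not give the first result.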

AlexeyBrin · on Jan 29, 2015

I know that stdbool.h actually contains macros for defining true and false, but _Bool is a new type in C99 ... But it is in the standard, implemented by all major compilers, so it should be used for the sake of clarity.

IMHO this:

    bool is_valid(char* password);

is more clear than this:

    int is_valid(char* password);

zrail · on Jan 29, 2015

But here's the thing: in the assembly (the very essence of the essay), booleans are still represented as integers. Using a standard macro would obfuscate the path to getting to the payoff of being able to switch a single byte from 0 to 1 and crack the program, because the author would then have to explain macros and delve into a bunch of ancillary stuff.

farmdve · on Jan 29, 2015

Starting to work with Reverse Engineering assumes prior knowledge of programming and/or some basic information about assembly for the specific platform.

Furthermore, as the examples are given in C, it is implied that the reader needs to be familiar with the language. Consequently, knowing about macros should be, if not a given, then at least required.

zrail · on Jan 29, 2015

The original audience of this content (i.e. not HN) is a broad interest group so you can't really assume that kind of knowledge. Again, think "hey this is neat!" vs "this is absolutely technically correct in all ways."

unwind · on Jan 29, 2015

Of course it should be bool.

I'm glad that the OP at least made it a const char *password though, since that's also a must-have. Always const input pointers. And local temporaries. And everything else that you can const, except scalar arguments pretty much.
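Putting both suggestions together might look like the following. This is a sketch, not the article's code; "poop" is the example password mentioned elsewhere in this thread:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if `password` matches the hardcoded example password.
 * `const` documents that the function only reads the caller's string.
 */
bool is_valid(const char *password) {
    return strcmp(password, "poop") == 0;
}
```

The `const` also lets callers pass string literals and other read-only buffers without a cast.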

userbinator · on Jan 29, 2015

Given that more than half of that file is a copyright comment, and the other half basically has to follow a standard exactly, it almost makes me wonder if it meets the minimum threshold of creativity to be copyrightable.

zellyn · on Jan 29, 2015

Again (hackernews should just prepopulate my comment box with this text), if you enjoyed this, you'll probably enjoy https://microcorruption.com/ Surprised to see that it wasn't already mentioned, especially since one of the creators, tptacek, was active on this thread.

kbart · on Jan 29, 2015

Slightly off-topic. Could somebody recommend good resources on reverse engineering (especially C and Linux)? I'm writing C code for a living, but binary-level security is not my strong suit and I wish to improve it.

Qworg · on Jan 29, 2015

I don't do this for a living, but an accessible resource is the UIUC ACM SIGMil guide. It is a little old, but relevant.

http://althing.cs.dartmouth.edu/local/www.acm.uiuc.edu/sigmi...

davenportw15 · on Jan 29, 2015

Wouldn't it be cleaner to implement is_valid like this:

    #include <string.h>

    int is_valid(char* password) {
        return strcmp(password, "poop") == 0;
    }

codygman · on Jan 29, 2015

Maybe, but if/else is more approachable especially if you haven't seen that pattern before.

mwcampbell · on Jan 28, 2015

Would have been a bit more realistic if the program had been stripped of symbols before disassembly. Still, the call to strcmp would be pretty obvious.

kschmit90 · on Jan 29, 2015

FWIW I would wager that a male would not have to inb4 pedantic arguments. Also that a male would not get so much criticism in regards to nit picky things.

But maybe it's just a common trait of detail oriented programmer types to be pedantic?