One nit, though. There's a subtle error in the main function:
char* input;
printf("Please input a word: ");
scanf("%s", input);
Local variables are not automatically initialized in C, and we never assign input to point to any particular block of memory. This means it's probably pointing off to some random location - basically whatever address happened to be sitting on the stack when main was called. So when scanf writes the user password to input, it's going to go to some unpredictable location with unpredictable results. This could lead to code execution, or at least a straightforward denial of service.
Hmm, that footnote was very hard to notice, so I was quick to point out these errors to the person who maintains the blog, only to get blocked by her on Twitter. Not sure why she'd go that far.
Because, if you're going to criticize the article, maybe go read everything there is to read? Because every one of your points was addressed somewhere in the article.
Furthermore, I'm not sure why the author didn't just write out the complete code. Even if the author knew the code was not correct, once it was pointed out it would have been trivial to initialize the variable.
Because the author made a stylistic choice to publish slightly incomplete code to get to the point faster (and pointed that out in a footnote, which was just as easy to spot as the "bug"). The point of the essay is not a perfectly executed C program, it's to demonstrate code disassembly and reverse engineering. More attention on perfect C code means less attention (reader attention and author attention alike) on the essential facts the author is trying to communicate.
In addition, I'm not an expert at this stuff, but it seems like initializing that variable would make the disassembly more complex and thus obfuscate the point just that much further.
The gist with the full source code that accompanies the article contains the same bug. I'm not convinced that leaving it out of the blog post was a stylistic choice made for a clearer explanation.
I would expect the full code to be correct and complete. IMO, this type of bug goes well beyond "I also make some assumptions in string handling that are considered gravely unsafe to use in a modern program, so please do not use this code in the real world."
The C code is literally the least interesting part of this essay. This bug, such as it is, does not matter. It is entirely and completely beside the point.
Regarding your last sentence: reverse engineering is a complex task, and obfuscation is but one of the many challenges a reverse engineer faces, though granted, whether the example binary uses malloc/free or a stack variable isn't one of them.
The title of the essay starts with "A Gentle Primer". Think less "this is a complex and deep thing that is hard to understand" and more "check out this neat thing you can do with computers! computers are awesome!"
Simplifying does not mean incorrect. Initializing the variable would not add appreciable complexity to the code. It's a snippet and should be fixed lest it make the author appear incompetent.
I'm not 100% confident, but I'm pretty sure the author does not care if you think they're incompetent. The author's skill and apparent interest are in explaining an inherently difficult concept (reverse engineering a compiled executable) in a way that won't cause their audience (not you) to tune out and/or go running for the hills.
Could they adjust the snippet? Sure. Will it add anything to the essay to do so? Nope.
Eh... I'm not really buying it. I think it was an honest mistake. C'mon; this makes that snippet harder to read?
char input[SIZE];
I really don't think so. I'm all for simplifying example code to get the core concept across, but I wouldn't go as far as to invoke undefined behavior. I also don't understand the use of scanf. At all. A seasoned, competent C user would never consider using scanf.
Anyway, I didn't mean to derail things too much here. It's really a nitpick and has little to do with the article itself.
Forget the author appearing competent, this is a primer. Primers should have code you can copy and paste and that is correct every time. Most beginners following along who run into a weird issue with the program they're supposed to reverse engineer, before even getting to the fun part, are just going to give up.
I would expect an article on reverse engineering to have correct C code.
If you think that having undefined behavior in your code is fine as long as it works for you, don't be surprised when, one day, a vile dragon appears and starts spewing fire.
The wrongness, such as it was, is completely beside the point. The author's audience is a group of people for whom this is quite likely their first exposure to any C at all. That `malloc` is one more thing for them to get tripped up and distracted by.
I'm indignant because this happens on almost every marginally interesting article that makes the front page and I'm sick of it. I get that the code was wrong and a bug and C must be correct at all times or the world will literally light on fire. It's just exhausting reading comment after comment about the tiniest little nit in an otherwise perfectly wonderful essay.
Kind of sad that 75% of the comments in this thread are negative or pedantic. I work as a reverse engineer and I thought it was a good "gentle primer".
Whenever I read a comment like this one after it has aged a bit, I wish I could see a snapshot of the page at the time it was written. Right now there are only 48 comments in total and I don't really notice much negativity. There's some back-and-forth nitpicking; do you consider that negative?
This was great. If there's ever a "Gentle Primer, pt. II" I'd love to see a walkthrough on replacing the existing hardcoded password with a different string.
I used emacs hexl-mode and http://support.amd.com/TechDocs/24594.pdf to edit a je to a jne which caused the program to think I put in the correct password. That was fun.
But it doesn't work anymore with the correct password...
I always found it cleaner to either force (jmp) or remove (nop nop) the jump rather than invert its condition. It's more explicit.
Also, in the real world, cracking's usually a bit more than finding the right jump to force/remove. Although, if it's enough to reach your goal, you should do it.
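The options discussed above (inverting the condition versus forcing or removing the jump) can be sketched as byte edits on a buffer holding the code. This assumes the jump is the two-byte short form of x86 `je` (opcode 0x74 followed by an 8-bit displacement); near jumps (0x0F 0x84 plus a 32-bit displacement) are encoded differently and aren't handled. The function names are hypothetical, and in practice you'd apply the edit at the file offset you located in the disassembler.

```c
#include <stddef.h>

/* x86 short-jump opcodes (each followed by a one-byte relative displacement). */
enum {
    OP_JE_SHORT  = 0x74,  /* jump if equal (ZF = 1)     */
    OP_JNE_SHORT = 0x75,  /* jump if not equal (ZF = 0) */
    OP_JMP_SHORT = 0xEB,  /* unconditional jump         */
    OP_NOP       = 0x90   /* one-byte no-op             */
};

/* Each patch expects a short je at code[off]. It returns 0 (and changes
   nothing) if that byte isn't 0x74 or the two bytes don't fit in the buffer. */
static int is_short_je(const unsigned char *code, size_t len, size_t off)
{
    return len >= 2 && off <= len - 2 && code[off] == OP_JE_SHORT;
}

/* Invert: je -> jne. The branch now fires on exactly the opposite inputs. */
int patch_invert(unsigned char *code, size_t len, size_t off)
{
    if (!is_short_je(code, len, off)) return 0;
    code[off] = OP_JNE_SHORT;
    return 1;
}

/* Force: je -> jmp, keeping the same displacement byte. */
int patch_force(unsigned char *code, size_t len, size_t off)
{
    if (!is_short_je(code, len, off)) return 0;
    code[off] = OP_JMP_SHORT;
    return 1;
}

/* Remove: overwrite opcode and displacement with two nops, so execution
   always falls through to the next instruction. */
int patch_remove(unsigned char *code, size_t len, size_t off)
{
    if (!is_short_je(code, len, off)) return 0;
    code[off] = OP_NOP;
    code[off + 1] = OP_NOP;
    return 1;
}
```

The inverted version explains the "doesn't work anymore with the correct password" observation: the check still runs, it just accepts the opposite set of inputs. Forcing or removing makes the outcome independent of the comparison; which of the two lets you in depends on whether the jump leads to the success path or away from it.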
The title seems a bit misleading, e.g. one could reverse engineer source code into UML. Perhaps a more appropriate title would be: A Gentle Primer on Code disassembling.
The way most people use the terminology now, I'd say "reverse engineering" encompasses all the strategies of analyzing and unraveling the logic of assembler code. On the other hand, "code disassembling" is more specific about using a tool such as IDA to disassemble the binary executable into assembly before the intellectual task of reverse engineering begins.
That's a bit misleading. Reverse engineering has, on the face of it, little to do with assembler of any sort; it's about understanding your target and choosing the best angle of attack to learn more about it. That often depends on how the target was produced in the first place. If it's written in C or C++, sure, go to the assembler. If it's some Adobe AIR program that is really just a launcher for a SWF, well, you're not going to have much luck. Similarly, the application logic in a program like EVE Online is almost entirely written in Python; you wouldn't know it from the outside, because they've wrapped all the native things they need. But you won't learn much by just looking at the assembly of layer upon layer of wrappers either.
This is slightly offtopic but I find the vulgar example passwords amusing. I used to use a lot of vulgar language in my code but when you have to commit it and other people (e.g. your supervisor) read it, you have to be more careful.
God, I hate these, especially in API documentation. Come up with a simple example of how I might want to use your API. The relationship between fruit and bananas is much more clear than between foos and bazzes.
I usually try to make my passwords obscene if I suspect that they're being stored in something that can be reverted back to plain text, or if I suspect that someone may force me to give it to them (e.g., clueless IT team).
That simply explains the truth behind the words. Giving basic types new names lets us share intent without commentary. A function returning 'bool' is intended to be treated as having two valid return values. A function returning int might be intended to return any of the large range of values that int can support.
I know that stdbool.h actually just contains macros for defining true and false, but _Bool is a new type in C99 ... But it is in the standard and implemented by all major compilers, so it should be used for the sake of clarity.
But here's the thing: in the assembly (the very essence of the essay), booleans are still represented as integers. Using a standard macro would obfuscate the path to getting to the payoff of being able to switch a single byte from 0 to 1 and crack the program, because the author would then have to explain macros and delve into a bunch of ancillary stuff.
Starting to work with reverse engineering assumes prior knowledge of programming and/or some basic information about assembly for the specific platform.
Furthermore, since the examples are given in C, it is implied that the reader needs to be familiar with the language; as a consequence, knowledge of macros should be, if not a given, then at least required.
The original audience of this content (i.e. not HN) is a broad interest group so you can't really assume that kind of knowledge. Again, think "hey this is neat!" vs "this is absolutely technically correct in all ways."
I'm glad that the OP at least made it a const char *password though, since that's also a must-have. Always const input pointers. And local temporaries. And everything else that you can const, except scalar arguments pretty much.
Given that more than half of that file is a copyright comment, and the other half basically has to follow a standard exactly, it almost makes me wonder if it meets the minimum threshold of creativity to be copyrightable.
Again (hackernews should just prepopulate my comment box with this text), if you enjoyed this, you'll probably enjoy https://microcorruption.com/ Surprised to see that it wasn't already mentioned, especially since one of the creators, tptacek, was active on this thread.
Slightly off-topic: could somebody recommend good resources on reverse engineering (especially C and Linux)? I write C code for a living, but binary-level security is not my strong suit and I wish to improve it.
Would have been a bit more realistic if the program had been stripped of symbols before disassembly. Still, the call to strcmp would be pretty obvious.
FWIW I would wager that a male would not have to inb4 pedantic arguments. Also that a male would not get so much criticism in regards to nit picky things.
But maybe it's just a common trait of detail oriented programmer types to be pedantic?