Show HN: 'Hello, World ' in x86 assembly, but make it gibberish (github.com/phoreverpheebs)
101 points by phoreverpheebs on March 17, 2023 | 21 comments



The SIGSEGVs you are seeing are caused by these instructions:

    0x8048199 <_start+299>     add    cl, 0x2
    0x804819c <_start+302>     jp     0x80481a3 <_start+309>
At this point, ECX holds a value derived from the initial value of your stack pointer, ESP. With ASLR enabled, this is randomized by the ELF loader. The parity flag (PF) is set if the number of set bits in the least significant byte of the arithmetic result (here, just CL) is even, and cleared if it is odd. PF ends up clear about 50% of the time, in which case the branch isn't taken and execution falls through to the instruction below. Because EAX is 1 (the return value of the previous syscall), its memory operand is an invalid access, which causes the SIGSEGV:

    0x804819e <_start+304>     pblendw xmm2, XMMWORD PTR [eax+0x4eb3909], 0xd9
You should be able to reliably reproduce this in GDB by providing additional arguments to your program, which will adjust the initial value of ESP as they are placed on the stack. For example, this worked to always hit the SIGSEGV on my system:

    gdb -ex r --args ./gibberish 1 2 3 4 5 6 7
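The "about 50%" figure for the `add cl, 0x2` / `jp` pair can be sanity-checked with a small model of the parity flag. This is only a sketch: the function names are made up for illustration, and it assumes every starting value of CL is equally likely, which the actual stack layout may not guarantee.

```python
def parity_flag(value: int) -> bool:
    """Return True if x86 PF would be set: even number of 1 bits in the low byte."""
    return bin(value & 0xFF).count("1") % 2 == 0


def jp_taken_after_add(cl: int, imm: int = 2) -> bool:
    """Model `add cl, imm` followed by `jp`: the branch is taken when PF is set."""
    result = (cl + imm) & 0xFF  # 8-bit wraparound, as in the CL register
    return parity_flag(result)


# Count how many of the 256 possible starting CL values take the branch.
taken = sum(jp_taken_after_add(cl) for cl in range(256))
print(f"jp taken for {taken} of 256 starting CL values")
```

Half of all byte values have an even number of set bits, so under that uniformity assumption the fall-through path (and the SIGSEGV) is hit on roughly every other run.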
There's another potential issue here, in that these instructions assume that the most significant bit is set in ESP, which EBP has been derived from (via ECX):

    0x8048127 <_start+185>     neg    ebp
    0x8048129 <_start+187>     jns    0x804812d <_start+191>
This isn't always the case. For instance, when running your binary under QEMU's user-mode emulation, the stack is likely to be placed in the lower end of the address space, so that branch isn't taken and you will get a SIGSEGV here:

    0x804812b <_start+189>     cmovb  edx, DWORD PTR [eax-0x72dbfbe0]
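The dependence of the `neg ebp` / `jns` pair on the top bit can be modelled the same way. The two addresses below are hypothetical examples of a "high" and a "low" stack, not values taken from the binary, and the sketch assumes EBP has the same most significant bit as ESP at that point.

```python
MASK32 = 0xFFFFFFFF


def neg32(value: int) -> int:
    """Two's-complement negation of a 32-bit register value."""
    return (-value) & MASK32


def jns_taken_after_neg(value: int) -> bool:
    """Model `neg reg` followed by `jns`: taken when the sign flag is clear."""
    sign_flag = neg32(value) >> 31  # SF mirrors bit 31 of the result
    return sign_flag == 0


high_stack = 0xFFFFD000  # hypothetical: top bit set, typical for native 32-bit Linux
low_stack = 0x40800000   # hypothetical: top bit clear, as might happen under emulation

for name, value in (("high", high_stack), ("low", low_stack)):
    print(name, hex(neg32(value)), "jns taken:", jns_taken_after_neg(value))
```

With the top bit set, negation yields a small positive number and the jump skips the bad instruction; with the top bit clear, the result is negative and execution falls through.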


thank you! you are wonderful for this <3 idk why but the stack being in the lower end of the address space doesn't feel right. nevertheless this is an amazing explanation and thank you for taking the time <3


Memory layout in gdb is consistent because gdb disables ASLR by default.

This is usually what you want so that memory addresses don't change unnecessarily from one run to the next. But if you're debugging a problem that only shows up when ASLR is in effect then you can turn it back on. https://visualgdb.com/gdbreference/commands/set_disable-rand...


gibberish.asm (at least on my machine) runs into a segmentation fault about 50% of the time, which raises an interesting question for me and will lead me to look into how memory is laid out at the beginning of a process's execution on a standard Linux system. In gdb, the memory seems to be laid out consistently, so the exception never occurs, though in normal execution the layout seems to be slightly different.

Wonderful!


Reminded me of the movfuscator that compiles programs into only mov instructions!

https://github.com/xoreaxeaxeax/movfuscator

And also REpsych from the same developer which really isn't for the fainthearted...

https://github.com/xoreaxeaxeax/REpsych


i love Domas' work! this project did have a bit of inspiration from him


Finally finished this "proof of concept", that obfuscates a string in a binary by scattering its bytes across the program's opcodes.
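As a toy illustration of the general idea only (not the project's actual method, which hand-picks real x86 instructions whose encodings contain the needed bytes), here is a sketch that scatters a string through a larger byte sequence at a fixed stride and reads it back. The `scatter` and `gather` helpers, the stride, and the `0x90` filler are all assumptions made for the example.

```python
def scatter(message: bytes, stride: int, filler: int = 0x90) -> bytes:
    """Place each message byte `stride` bytes apart, padding with filler bytes."""
    blob = bytearray([filler]) * (len(message) * stride)
    blob[::stride] = message  # every stride-th slot receives one message byte
    return bytes(blob)


def gather(blob: bytes, stride: int, length: int) -> bytes:
    """Recover `length` message bytes from a blob produced by scatter()."""
    return blob[0 : length * stride : stride]


hidden = scatter(b"Hello, World", stride=5)
print(hidden.hex(" "))
print(gather(hidden, stride=5, length=12))
```

A plain `strings` pass over `hidden` would not show the message as one run, which is the effect the post describes, although a fixed stride like this is far easier to spot than bytes hidden in varied instruction encodings.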


Makes me think of the "when is main not a function" article from 2015:

http://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-...


If you are interested in obfuscation and anti-reverse engineering I can recommend Josh Stroschein's courses. He does a fantastic job explaining various techniques.


It's like the inverse of Enterprise FizzBuzz (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...)


That's literally some of the worst code I've ever seen. Since that's sort-of what you were attempting, good job! It appears to be a misaligned instruction sequence (but I know better since I read the github).


Looks like what happens when a dumb disassembler misaligns instruction boundaries. All those immediate constants looking like instructions is a huge hint.


"'Hello, World ' in x86 assembly, but make it gibberish"

Well, there's a pleonasm if I ever saw one:)


How did you learn this stuff? All of these assembly instructions remind me of Diablo 2 runes.


most of the program is standard assembly, then sometimes it's just manipulating some bits of data in bizarre ways to get to a value, either way it's mostly just consulting the x86 instruction documentation for functionality and encoding.

in terms of the instructions being wild, some of them are chosen because the opcode matches the byte we need at a specific spot, but other times the instruction is just a randomly chosen one. however, it was still usually important that the instruction could take a long operand instead of an immediate 8-bit value


Can't this be reversed by static analysis? Most approaches focus on the source, but IIRC there are also bytecode tools for at least Java


Yeah, it's not a way to 100% obfuscate the functionality, but instead it could be more of a way to throw off someone looking at the binary


If I was reverse engineering a program and I saw this I think I might just quit and rethink my life.


Beautiful! I love obfuscated code, but I normally do it in high level languages. Always wanted to pick up golfing in asm.


Was this done by hand?


sadly yes lmao, it was a fun challenge trying to not reuse instructions, but nearing the end where most modr/m bytes started repeating, i ended up having to resort to jmps to jump from a long operand to another long operand



