Show HN: 'Hello, World ' in x86 assembly, but make it gibberish

fortyonepercent · on March 17, 2023

The SIGSEGVs you are seeing are caused by these instructions:

    0x8048199 <_start+299>     add    cl, 0x2
    0x804819c <_start+302>     jp     0x80481a3 <_start+309>

At this point, ECX has a value based on the initial value of your stack pointer ESP. With ASLR enabled, this is randomized by the ELF loader. The parity flag (PF) is set if the number of set bits in the least significant byte of the arithmetic result (here, just CL) is even, and cleared if it is odd. PF ends up clear about 50% of the time, in which case the branch isn't taken and execution falls through to the next instruction. Because EAX is 1 (the return value of the previous syscall), that instruction's memory operand is an invalid address, causing SIGSEGV:

    0x804819e <_start+304>     pblendw xmm2, XMMWORD PTR [eax+0x4eb3909], 0xd9
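To make the parity argument concrete, here is a minimal Python sketch of `add cl, 0x2` followed by `jp`. The function names are mine, and it assumes every starting value of CL is equally likely, which a real (aligned) stack pointer may not satisfy:

```python
def parity_flag(byte: int) -> int:
    """Return x86 PF for an 8-bit result: 1 if the count of set bits is even."""
    return 1 if bin(byte & 0xFF).count("1") % 2 == 0 else 0


def jp_taken_after_add(cl: int, imm: int = 0x2) -> bool:
    """Model `add cl, imm` followed by `jp`: the jump is taken when PF == 1."""
    result = (cl + imm) & 0xFF  # 8-bit wraparound, as in the CL register
    return parity_flag(result) == 1


# Count how many of the 256 possible CL values fall through the `jp`
# (assumption: all starting values are equally likely).
fall_through = sum(not jp_taken_after_add(cl) for cl in range(256))
print(fall_through, "of 256 starting values fall through to the faulting instruction")
```

Exactly half of all byte values have an odd number of set bits, and adding a constant only permutes them, so under that assumption the fall-through case is hit half the time.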

You should be able to reliably reproduce this in GDB by providing additional arguments to your program, which will adjust the initial value of ESP as they are placed on the stack. For example, this worked to always hit the SIGSEGV on my system:

    gdb -ex r --args ./gibberish 1 2 3 4 5 6 7

There's another potential issue here, in that these instructions assume that the most significant bit is set in ESP, which EBP has been derived from (via ECX):

    0x8048127 <_start+185>     neg    ebp
    0x8048129 <_start+187>     jns    0x804812d <_start+191>

This isn't always the case. For instance, if running your binary under QEMU's user-mode emulation, the stack is likely to be placed in the lower end of the address space, so that branch isn't taken and you will get a SIGSEGV here:

    0x804812b <_start+189>     cmovb  edx, DWORD PTR [eax-0x72dbfbe0]
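The sign-flag dependence can be sketched the same way. This is only an illustration: the two addresses are made-up examples of a "high" and a "low" stack, and it assumes EBP still has the same top bit as ESP when `neg` runs:

```python
MASK32 = 0xFFFFFFFF


def neg32(value: int) -> int:
    """Two's-complement negation truncated to 32 bits, like `neg ebp`."""
    return (-value) & MASK32


def jns_taken_after_neg(value: int) -> bool:
    """Model `neg` followed by `jns`: the jump is taken when SF == 0."""
    sign_flag = neg32(value) >> 31  # SF is the top bit of the result
    return sign_flag == 0


# Hypothetical stack-derived values, chosen only to show both outcomes.
high_stack = 0xFFFFD000  # top bit set, typical of a native 32-bit stack
low_stack = 0x40800000   # top bit clear, as might happen under emulation

for name, value in (("high", high_stack), ("low", low_stack)):
    print(name, hex(neg32(value)), "jns taken:", jns_taken_after_neg(value))
```

With the top bit set, the negated value is a small positive number and the branch skips the faulting `cmovb`. With the top bit clear, the result is negative and execution falls into it.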

phoreverpheebs · on March 18, 2023

thank you! you are wonderful for this <3 idk why but the stack being in the lower end of the address space doesn't feel right. nevertheless this is an amazing explanation and thank you for taking the time <3

abbeyj · on March 17, 2023

Memory layout in gdb is consistent because gdb disables ASLR by default.

This is usually what you want so that memory addresses don't change unnecessarily from one run to the next. But if you're debugging a problem that only shows up when ASLR is in effect then you can turn it back on. https://visualgdb.com/gdbreference/commands/set_disable-rand...

actionfromafar · on March 17, 2023

gibberish.asm (at least on my machine) runs into a segmentation fault about 50% of the time, which to me raises an interesting question that will lead me to look into how memory is laid out at the beginning of a process's execution on a standard Linux system. In gdb, the memory seems to be allocated in a consistent manner, which causes the exception to never occur, though in normal execution it seems to be slightly different.

Wonderful!

antegamisou · on March 18, 2023

Reminded me of the movfuscator, which compiles programs into only mov instructions!

https://github.com/xoreaxeaxeax/movfuscator

And also REpsych from the same developer which really isn't for the fainthearted...

https://github.com/xoreaxeaxeax/REpsych

phoreverpheebs · on March 18, 2023

i love Domas' work! this project did have a bit of inspiration from him

phoreverpheebs · on March 17, 2023

Finally finished this "proof of concept" that obfuscates a string in a binary by scattering its bytes across the program's opcodes.
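For anyone who wants the general idea in miniature, here is a toy Python sketch. It is not the layout gibberish.asm actually uses: the two instructions, the `CODE` bytes and the `OFFSETS` table are all invented for illustration, and the string bytes sit in immediates here rather than in opcodes:

```python
# Machine code for two ordinary-looking 32-bit x86 instructions:
#   b8 48 00 00 00    mov  eax, 0x48
#   6a 69             push 0x69
# The bytes 0x48 ('H') and 0x69 ('i') double as string data.
CODE = bytes.fromhex("b8480000006a69")

# Hypothetical table of where the string's bytes live inside the code.
OFFSETS = [1, 6]


def gather(code: bytes, offsets: list[int]) -> str:
    """Reassemble a string from bytes scattered through machine code."""
    return bytes(code[i] for i in offsets).decode("ascii")


print(gather(CODE, OFFSETS))
```

A `strings` pass over such a blob finds nothing, because the characters are never adjacent. The real program does the gathering at runtime in assembly.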

Teknoman117 · on March 17, 2023

Makes me think of the "when is main not a function" article from 2015:

http://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-...

weinzierl · on March 17, 2023

If you are interested in obfuscation and anti-reverse engineering I can recommend Josh Stroschein's courses. He does a fantastic job explaining various techniques.

TremendousJudge · on March 17, 2023

It's like the inverse of Enterprise FizzBuzz (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...)

IncRnd · on March 18, 2023

That's literally some of the worst code I've ever seen. Since that's sort-of what you were attempting, good job! It appears to be a misaligned instruction sequence (but I know better since I read the github).

userbinator · on March 17, 2023

Looks like what happens when a dumb disassembler misaligns instruction boundaries. All those immediate constants looking like instructions is a huge hint.

larsrc · on March 18, 2023

"'Hello, World ' in x86 assembly, but make it gibberish"

Well, there's a pleonasm if I ever saw one:)

thdespou · on March 18, 2023

How did you learn this stuff? All of these assembly instructions remind me of Diablo 2 runes.

phoreverpheebs · on March 18, 2023

most of the program is standard assembly, then sometimes it's just manipulating some bits of data in bizarre ways to get to a value, either way it's mostly just consulting the x86 instruction documentation for functionality and encoding.

in terms of the instructions being wild, some of them are chosen because the opcode matches the byte we need at a specific spot; other times the instruction is just a randomly chosen one. either way, it was usually important that the instruction could take a long operand instead of an immediate 8-bit value

csdvrx · on March 17, 2023

Can't this be reversed by static analysis? Most approaches focus on the source, but IIRC there are also bytecode tools for at least Java

phoreverpheebs · on March 17, 2023

Yeah, it's not a way to 100% obfuscate the functionality, but instead it could be more of a way to throw off someone looking at the binary

antibasilisk · on March 17, 2023

If I was reverse engineering a program and I saw this I think I might just quit and rethink my life.

juliusgeo · on March 17, 2023

Beautiful! I love obfuscated code, but I normally do it in high level languages. Always wanted to pick up golfing in asm.

saagarjha · on March 18, 2023

Was this done by hand?

phoreverpheebs · on March 18, 2023

sadly yes lmao, it was a fun challenge trying to not reuse instructions, but nearing the end where most modr/m bytes started repeating, i ended up having to resort to jmps to jump from a long operand to another long operand