Can I just say, thanks to the person who posted this for waiting until this week to do so. (Side note: I suspect it was due to the recent coverage from C++ Weekly which is a great resource: https://www.youtube.com/watch?v=h3F0Fw0R7ME)
As recently as last week we had some horrible performance problems but it looks like the queue (https://dogbolt.org/queue) is mostly still fine! Other than the long pole of a few of the decompilers being backed up, things are humming along quite smoothly! Josh + Glenn have done some great work on it! (https://github.com/decompiler-explorer/decompiler-explorer/c...)
Wow, I really could have used this for my Ph.D. research (deep learning for obfuscated code).
I ditched Ghidra early on in my experiments in favor of angr, because Ghidra did not play nicely with multiprocessing and I had a lot of data to process. Well, maybe it does, but it was much easier for me to achieve the same thing with angr.
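For what it's worth, the pattern that worked for me looks roughly like this (a sketch only; `analyze_one`/`analyze_all` are names I made up, and it assumes angr's `Project`/`CFGFast` API): one angr project per worker process.

```python
import multiprocessing as mp

def analyze_one(path):
    # angr is imported inside the worker so the parent process stays light
    # and each worker builds its own Project (they aren't picklable, so
    # they can't be shared across processes anyway).
    import angr
    proj = angr.Project(path, auto_load_libs=False)
    cfg = proj.analyses.CFGFast()  # fast static CFG recovery
    return path, len(cfg.kb.functions)

def analyze_all(paths, jobs=4):
    # One binary per task; results come back as (path, function_count) pairs.
    with mp.Pool(jobs) as pool:
        return dict(pool.map(analyze_one, paths))
```

The per-worker import is the whole trick: each process owns its analysis state, so nothing heavyweight has to cross the process boundary.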
Love the name! Although I feel compelled to point out that Compiler Explorer is the name of the project and Godbolt is its author's last name, but I suppose if people are to the point of using Godbolt as a verb the ship has sailed.
My work focuses on recognizing known functions in obfuscated binaries, but there are some papers you might want to check out related to deobfuscation, if not necessarily using ML for deobfuscation or decompilation.
My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low hanging fruit, and it's what I worked on most, but adoption is slow. On the other hand, "hard" obfuscations like virtualized functions or programs which embed JIT compilers to obfuscate at runtime... as far as I know, those are still unsolved problems.
https://www.jinyier.me/papers/DATE19_Obf.pdf uses deobfuscation for RTL logic (FPGA/ASIC domain) with SAT solvers. Might be useful for a point of view from a fairly different domain.
https://advising.cs.arizona.edu/~debray/Publications/generic... uses "semantics-preserving transformations" to shed obfuscation. I think this approach is the way to go, especially when combined with dynamic/symbolic analysis to mitigate virt/jit types of transformations.
Eventually I think SBOM tools like Black Duck[1] and SLSA[2] will incorporate ML to improve the accuracy of even figuring out what dependencies a piece of software actually has.
> My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low hanging fruit, and it's what I worked on most, but adoption is slow.
If I wanted to implement my own toy Hex-Rays-like decompiler using a few of these techniques to decompile x86-64 binaries, is there any high-quality, up-to-date paper/resource you would recommend?
Or do you think that "A Generic Approach to Automatic Deobfuscation of Executable Code" paper is a good enough start?
"A Generic Approach" seems like a good starting point for a classical approach: building a set of reusable components and heuristics to recognize idioms, etc.
Might also be worth considering an approach integrating LLMs for summarizing code. Maybe you could fine-tune a pretrained model that already "understands" source code to associate sources with generated code? If going this route I would still probably use a disassembler to preprocess, and maybe also extract basic blocks to use as my "target" domain for fine-tuning.
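To make the basic-block idea concrete, here's a sketch of the classic "leader" algorithm for splitting a linear disassembly into basic blocks. In practice the instruction list would come from a disassembler such as capstone; here it's hard-coded so the sketch stays self-contained, and the branch mnemonic set is deliberately simplified.

```python
# Very simplified: real x86-64 has many more control-flow mnemonics.
BRANCHES = {"jmp", "je", "jne", "jz", "jnz", "call", "ret"}

def basic_blocks(insns):
    """insns: list of (addr, mnemonic, operand) tuples in address order."""
    addrs = [a for a, _, _ in insns]
    leaders = {addrs[0]}  # the first instruction always starts a block
    for i, (addr, mnem, op) in enumerate(insns):
        if mnem in BRANCHES:
            # The instruction after a branch starts a new block...
            if i + 1 < len(insns):
                leaders.add(addrs[i + 1])
            # ...and so does the branch target, when it's a literal address.
            try:
                leaders.add(int(op, 16))
            except ValueError:
                pass
    blocks, current = [], []
    for ins in insns:
        if ins[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    if current:
        blocks.append(current)
    return blocks

demo = [
    (0x00, "mov", "eax, 1"),
    (0x05, "cmp", "eax, 2"),
    (0x08, "jne", "0x10"),
    (0x0A, "mov", "ebx, 3"),
    (0x10, "ret", ""),
]
print([len(b) for b in basic_blocks(demo)])  # -> [3, 1, 1]
```

Blocks like these, serialized as token sequences, are one plausible "target" domain for the fine-tuning idea above.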
As for Tigress, I used it extensively and found it to be really great most of the time. There are some limitations to be aware of: it only works with C code, and you have to turn your multi-file projects into a single file with a main() function. Also, its C parser (CIL) has some limitations (e.g. doesn't recognize the word static in "struct foo x[static 1]") so you might need to translate your C code first. I translated manually because it was a really rare issue for the code I started with. I also had mixed results using Virtualize and JIT. Sometimes they would emit invalid code, so I ended up just throwing out that data.
In my view, the up-and-coming Tigress challenger is obfuscator-llvm. I think it is very promising for future work because it inherently supports more languages than only C. But currently obfuscator-llvm is much more limited (~3 transformations compared to ~48). So if you're using C, today I would pick Tigress.
IRs generally aren't suited to examination by a human when you're starting with a full binary. I would imagine something like that would only work well for very small bits of assembly. Relatedly, you might be interested in BNIL, which is an entire stack of ILs that Binary Ninja is built on. (You can see it exposed in the cloud.binary.ninja UI or the demo.)
Qemu works by translating a binary to an IR then doing stuff with it. Valgrind likewise. There's an optimiser called bolt (associated with facebook) which has the same idea.
Yup, I'm aware of both of those, but none of the tools listed so far are intended for their IR to be human-consumable, unlike disassemblers and decompilers. You think disassembly is verbose compared to a decompiler? Go look at the equivalent VEX (Valgrind's IR) for any non-trivial disassembly. It's suuuper verbose.
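For the curious, it's easy to see for yourself with pyvex (which ships with angr). A sketch, assuming pyvex and archinfo are installed; `show_vex` is my name for it:

```python
def show_vex(code: bytes, addr: int = 0x400000):
    # Imported inside the function so the sketch loads even without
    # pyvex/archinfo installed.
    import pyvex
    import archinfo
    # Lift raw machine code at the given address to a VEX IR superblock.
    irsb = pyvex.lift(code, addr, archinfo.ArchAMD64())
    irsb.pp()  # pretty-prints the block: many IR statements per instruction

# e.g. show_vex(b"\x48\x01\xd8\xc3")  # add rax, rbx ; ret
```

Even those two instructions expand into a long run of temporaries, flag-thunk writes, and guest-state puts, which is exactly why nobody reads VEX for fun.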
As far as I know, BNIL (https://docs.binary.ninja/dev/bnil-overview.html) is the only one that is designed to be readable and it still wouldn't make sense to include it in an IL comparison such as the one done here for decompilation in my opinion.
Speaking of decompilers, would Binary Ninja be a safe bet to pick? I've been told IDA is the gold standard, but it's also expensive for someone who wants to recreationally reverse engineer.
Binja's decompiler is more-or-less fine. It's not as mature as IDA's or Ghidra's, but it's not a bad decompiler.
Though for me the big selling point of Binja is the Intermediate Languages (ILs). High-Level IL is the decompiler output, but you also get Low-Level and Medium-Level ILs as steps between assembly and source. If the decompiler output is a bit funky, you can look at the ILs to get a better idea of what is happening. The ILs are also just much nicer to read than plain assembly, so I tend to use them a lot.
It's a feature that isn't really matched on any other platform. Ghidra and IDA both have a single IL that is more machine-readable compared to Binja's human-readable ones.
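As a rough sketch of what that looks like from the Python API (assuming a recent Binary Ninja with headless access; exact property names vary a bit across versions, and `dump_ils` is my own name):

```python
def dump_ils(path):
    # Binary Ninja's Python API needs a licensed install; imported inside
    # the function so the sketch loads without it.
    import binaryninja
    bv = binaryninja.load(path)
    for func in bv.functions:
        print(func.name)
        # The same function at each level of lift: LLIL -> MLIL -> HLIL.
        for name, il in (("LLIL", func.llil),
                         ("MLIL", func.mlil),
                         ("HLIL", func.hlil)):
            print(f"  [{name}]")
            for insn in il.instructions:
                print(f"    {insn.address:#x}  {insn}")
```

Diffing the three dumps for one funky-looking function is a quick way to see at which lifting stage the decompiler lost the plot.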
Honestly, just use Ghidra. It has its quirks but it's pretty good. And open source.
If it's good enough for the NSA it's probably good enough for recreational use.
The code is open source and has been looked at by several people over the years.
It would be quite hard for the NSA to sneak in a backdoor, though it is never out of the question.
Still, the risk is extremely minuscule compared to the alternatives, which are not even open source.
Very nice. As a parallel: I've been working on an emulator project recently, implementing my own disassembler, and I keep thinking about how I would turn patterns of machine code into a generalized form that could then be turned into something like C-like pseudo-code. So this has really been tempting me lately to implement my own toy decompiler.
BinaryNinja does this. They have several layers of intermediate representations[1], which they build their decompiler on top of. Ghidra does something similar with their PCode: they disassemble to PCode and then decompile the PCode[2].
Love this - I can almost imagine the convincing for other companies wasn't even needed when they realized a small binary size and comparison to competitors would net them more business. A perfect little solution for triaging issues between services and comparing solutions.
That was indeed the logic. The two main commercial solutions included (Binary Ninja, made by Vector 35, where I'm one of the founders) and Hex-Rays both pay for all the hosting costs. And it's not particularly cheap -- there's a fair amount of compute to drive the decompilers, especially as some of them are... not very efficient.
Binary Ninja's queue is likewise empty and keeps up just fine as well. It's not a coincidence that the two commercial products funding it are both confident enough to put their stuff online like this.
> All submitted binaries are saved and made available to any of the authors of the tools used so they may improve their decompilers. If you're such an author who would like access, let us know!
If you believe that content you submit to websites is not examined by interested parties associated with that website, then - I have a bridge to sell you... or perhaps I should say a Google account to give you, free of charge.
> In short: your source code is stored in plaintext for the minimum time feasible to be able to process your request. After that, it is discarded and is inaccessible. In very rare cases your code may be kept for a little longer (at most a week) to help debug issues in Compiler Explorer.
My bias may be showing, being a ctf-scene enthusiast. Most of these (tools on dogbolt) look like foss utilities you can run yourself. The rest, I'd imagine you are welcome to pay for licenses. Binary Ninja in particular, while maybe not cheap for everybody, isn't sky-high.
1. If a third-party does their link-shortening, which gets the program text, then - it doesn't matter how nice they are. And if that party is Google then, well...
2. The language you quoted still allows them to keep effectively all information through mining aspects of it rather than keeping the entire code as a stretch of plain text.
3. If Godbolt or its servers are subject to US law, then there may be National Security Letters which compel it to pass information on to the US government, and to keep that secret. And this is not a conspiracy theory; this is what Snowden exposed about Google, Apple, Microsoft, Yahoo, etc.
So - I respect and like the GodBolt'ers, but you don't have a good guarantee of your data being kept private.
I think they changed it recently, but all of the code you submit is embedded in the URL (after an anchor). So it's stored by Google's link-shortening service, but is resubmitted to the site every time you load it.
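To be clear about the mechanics, this is the general "state in the URL fragment" trick. Illustrative only; this is not Compiler Explorer's actual encoding scheme, just a minimal sketch of the idea using stdlib compression and URL-safe base64:

```python
import base64
import zlib

def encode_fragment(source: str) -> str:
    # Compress the source and encode it so it can live after the '#'.
    packed = zlib.compress(source.encode("utf-8"))
    return base64.urlsafe_b64encode(packed).decode("ascii")

def decode_fragment(fragment: str) -> str:
    packed = base64.urlsafe_b64decode(fragment.encode("ascii"))
    return zlib.decompress(packed).decode("utf-8")

url = "https://example.org/#" + encode_fragment("int main() { return 0; }")
# The fragment is never sent to the server in the HTTP request itself;
# the page's JavaScript reads location.hash and resubmits the code on load.
```

That last point is why the "stored by the link shortener but resubmitted on every load" distinction matters: the shortener sees the full fragment, while an ordinary web server only sees the path.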
The name of this is a reference to the incredibly useful godbolt compiler explorer. If you are interested in this you will likely enjoy the other as well:
Sculpey terracotta would be a fitting choice. It's pretty easy to sculpt when kneaded, bakes in a conventional oven, and keeps its details. Perfect for silicone mold making.
It's never been called anything but either "GCC Explorer" or "Compiler Explorer", by me, anyway... The URL it's accessible for is an accident of the one I had hanging around :) (it's now available at compiler-explorer.com too, but...the name other people use has stuck so I'll never be able to reclaim my own domain...)
I think you _could_ reclaim your own domain if you wanted. You'd want to have a banner at the top with a clear note directing people to the new domain for the compiler explorer, so that people realize immediately that you're not domain squatting. A few people might put up a stink, but I'm pretty confident that most people wouldn't mind, especially since the tool itself is so useful. The name, for those who don't know it as your last name, is fun, but it isn't the reason people use the tool. Eventually, over enough time, people would start remembering the new URL, and you could shrink or remove the banner (and/or put a note elsewhere on the page).
Honestly "godbolt" is so memorable I can find it instantly even though I rarely use it; but "compiler-explorer" sounds like some generic SEO spam site that I'd probably never click on.
Even then, the internet (and even books) is full of "godbolt" links, to the tool itself and to specific code samples. It will take quite some time until all those become irrelevant.
Links to specific examples are less of a problem, as he could redirect those to compiler-explorer.com and just keep that redirect up forever. Really the only URL that would need to be "reclaimed" is https://godbolt.org/, and having a prominent link to compiler-explorer.com there would solve that issue.
OTOH, the godbolt domain is at least not actively used under a number of other TLDs, so getting one of those might be an easier option.
It’s such a memorable name for a tool like that. Other than losing your domain name to the topic, how do you feel about the de facto name?
To a far far lesser degree, I’ve experienced many examples of “you named it X but everyone at work calls it Y and now you have to live with that.” It used to really irk me for some reason.
It is a fantastic name for an equally fantastic tool. The day I found out it was your last name made me chuckle and like it even more. And since I am here: thank you very much for it!
I always call it the compiler explorer but the url, as a sibling comment says, is memorable.
Could be misremembering, but IIRC it was called Compiler Explorer and used to live only on a subdomain of godbolt.org. But, it was so useful that it became presumably vastly higher traffic than the personal homepage part and people often referred to it as just "Godbolt" probably because it sounds cooler and is shorter than saying "Compiler Explorer" (and it may not be obvious the domain name is a last name rather than just a cool name for something.)
To be fair it's an amazing last name and it feels like there probably is a story, it just has to do with this guy's ancestors rather than the assembler tool we all know and love.
It makes for a nice parallel, since the original version of godbolt was just a split tmux session with vim running on one side, and "watch 'gcc -S -o /dev/stdout'" on the other. The main advantage of putting it online is not needing all of the compilers locally.
It might also be a bit of a portmanteau with a second reference to dogpile.com which was a pre-Google "search engine" that compiled search results from multiple search engines. Back in the day you often had to separately search altavista.com, lycos.com, askjeeves.com, yahoo.com, etc. because some of them would work for your query but others would not and it was difficult to predict the performance of any particular search engine, but usually at least one of them would have the result you wanted/needed.
Dogpile was an automated way to search all of the search engines at the same time with one query.