
Idea: automatically name variables and members of structs based on how code interacts with them.

E.g., the next pointer in a linked list should be easy to identify as 'next'.

That would be done by downloading all of GitHub, then seeing what variables in GitHub code have the most similar layouts and interactions, and then if the confidence is high enough, using those names.
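A minimal sketch of the "most similar interactions, then a confidence cutoff" idea, assuming each variable or field has already been reduced to a bag of usage features. The feature names, the tiny corpus and the 0.9 threshold are all made up for illustration; a real system would mine them from source code at scale.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_name(features, corpus, threshold=0.9):
    """Return the name of the most similar corpus entry, or None if below threshold."""
    query = Counter(features)
    best_name, best_score = None, 0.0
    for name, known_features in corpus:
        score = cosine(query, Counter(known_features))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical corpus: (name, usage features) pairs mined from source code.
CORPUS = [
    ("next", ["ptr_to_own_struct", "assigned_in_loop", "null_checked"]),
    ("length", ["integer", "compared_with_index", "passed_to_malloc"]),
]

field_features = ["ptr_to_own_struct", "assigned_in_loop", "null_checked"]
print(suggest_name(field_features, CORPUS))  # identical features: prints next
print(suggest_name(["integer"], CORPUS))     # weak match: prints None
```

Returning None below the threshold is the "only if the confidence is high enough" part: the decompiler would keep its placeholder name rather than guess.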




In the past we considered doing something like this by hand. For instance, since we already detect induction variables, we could rename them to `i`.
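To make the hand-written approach concrete, here is a rough sketch on a toy three-address IR. The `Assign` type and the `var_8` style names are invented for the example and are not rev.ng's representation. It only recognizes the simplest case: a variable whose single definition in the loop is `v = v + c` or `v = v - c` with a constant `c`.

```python
from typing import NamedTuple, Union

class Assign(NamedTuple):
    """Toy three-address statement: dest = lhs op rhs."""
    dest: str
    op: str
    lhs: Union[str, int]
    rhs: Union[str, int]

def basic_induction_variables(loop_body):
    """Variables whose only definition in the loop is v = v +/- constant."""
    defs = {}
    for stmt in loop_body:
        defs.setdefault(stmt.dest, []).append(stmt)
    found = []
    for var, stmts in defs.items():
        if len(stmts) != 1:
            continue  # defined more than once: not a basic induction variable
        s = stmts[0]
        if s.op in ("+", "-") and s.lhs == var and isinstance(s.rhs, int):
            found.append(var)
    return found

def rename_map(loop_body, names=("i", "j", "k")):
    """Map each detected induction variable to a conventional loop name."""
    return dict(zip(basic_induction_variables(loop_body), names))

body = [
    Assign("var_8", "+", "var_8", 1),          # counter
    Assign("var_10", "+", "var_10", "var_8"),  # accumulator, step is not constant
]
print(rename_map(body))  # prints {'var_8': 'i'}
```

A real pass would also handle the commuted form `c + v` and derived induction variables, which this sketch skips.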

However, nowadays it seems pretty obvious that the right way to do these things is with LLMs.

That said, at this stage we see ourselves as people building robust infrastructure. Once the infrastructure is there, using some off-the-shelf model to rename things or add comments is relatively easy.

Basically: we do the hard decompilation work that needs 100% accuracy, and then we can adopt LLMs for things that are OK to be approximate such as names, comments and the like.

Anyway, writing a script that renames stuff is pretty easy. Check out the docs: https://docs.rev.ng/user-manual/model-tutorial/


If an LLM is used, it's unclear how to best do it.

One could try to train one's own LLM from scratch, using an encoder-decoder (translation, aka seq2seq) architecture that tries to predict the correct variable name given the decompiled output.

One could try to use something like GPT-4 with a carefully designed prompt: "Given this data structure, what might be the name for this field?"

One could try to use something pretrained like llama, but then fine-tune it on hundreds of thousands of compiled and decompiled programs.
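For the prompting route, most of the work is assembling the context. A small sketch of a prompt builder follows; the struct text, the usage snippets and the exact wording are placeholders I made up, and sending the string to a model is deliberately left out.

```python
def build_prompt(struct_layout: str, usage_snippets: list, field: str) -> str:
    """Assemble a field-naming prompt from decompiled context (wording is illustrative)."""
    usages = "\n".join(f"- {snippet}" for snippet in usage_snippets)
    return (
        "Given this data structure recovered by a decompiler:\n\n"
        f"{struct_layout}\n\n"
        f"and these uses of `{field}` in the decompiled code:\n{usages}\n\n"
        f"what might be the name for the field `{field}`? "
        "Answer with a single C identifier, or UNKNOWN if unsure."
    )

# Hypothetical decompiler output for a linked-list node.
layout = "struct struct_1 {\n    int32_t field_0;\n    struct struct_1 *field_8;\n};"
uses = ["while (p != NULL) { p = p->field_8; }"]
print(build_prompt(layout, uses, "field_8"))
```

Asking for a single identifier or UNKNOWN makes the reply easy to validate before applying it as a rename.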


Option 4:

One could take a pretrained model like llama, train it on only a few thousand compiled and decompiled programs, then feed it compiled programs, have it decompile them, and evaluate that output to build a new dataset and fine-tune it again. Repeat until satisfactory.
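The loop in option 4 can be sketched with the model-specific parts passed in as functions. `decompile`, `accept` and `fine_tune` are hypothetical hooks, and the toy versions below exist only so the loop runs; in practice `accept` might recompile the output and compare it against the input binary, which is my assumption about what "evaluate" would mean.

```python
def self_training_loop(model, seed_pairs, binaries, decompile, accept, fine_tune, rounds=3):
    """Grow a (binary, source) dataset from the model's own accepted outputs."""
    dataset = list(seed_pairs)
    model = fine_tune(model, dataset)
    for _ in range(rounds):
        seen = set(dataset)
        new_pairs = []
        for binary in binaries:
            pair = (binary, decompile(model, binary))
            if pair not in seen and accept(*pair):
                new_pairs.append(pair)
        if not new_pairs:
            break  # nothing new passed the check, so more rounds would not help
        dataset.extend(new_pairs)
        model = fine_tune(model, dataset)
    return model, dataset

# Toy stand-ins: the "model" is a skill level equal to the dataset size, and it
# can decompile binaries up to one step harder than what it has seen.
def toy_fine_tune(model, dataset):
    return len(dataset)

def toy_decompile(model, binary):
    return f"src{binary}" if binary <= model + 1 else "garbage"

def toy_accept(binary, candidate):
    return candidate == f"src{binary}"

model, dataset = self_training_loop(
    0, [(1, "src1"), (2, "src2")], range(1, 7),
    toy_decompile, toy_accept, toy_fine_tune,
)
print(model, len(dataset))  # prints 5 5
```

The quality of the whole scheme rests on `accept`: if it lets wrong decompilations through, each round trains on its own mistakes.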


Would be very cool indeed, something like http://jsnice.org/

Paper that describes what JSNice is doing behind the scenes: https://files.sri.inf.ethz.ch/website/papers/jsnice15.pdf


Sounds like Sidekick for Binary Ninja.


Sort of like GitHub Copilot but for reversing?



