If an LLM is used, it's unclear how best to do it.

One could try to train one's own model from scratch, using an encoder-decoder (translation-style, aka seq2seq) architecture that predicts the correct variable name given the decompiled output.
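For a sense of what that from-scratch version could look like, here's a minimal PyTorch sketch (vocab sizes, dimensions, and layer counts are placeholders, and it assumes you've already tokenized decompiled snippets and their ground-truth names):

    import torch
    import torch.nn as nn

    # Toy encoder-decoder: decompiled-code tokens in, variable-name tokens out.
    # Vocab sizes, dims, and layer counts are placeholders, not tuned values.
    class NameTranslator(nn.Module):
        def __init__(self, src_vocab=32000, tgt_vocab=8000, d_model=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, d_model)
            self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
            self.transformer = nn.Transformer(
                d_model=d_model, nhead=4,
                num_encoder_layers=3, num_decoder_layers=3,
                batch_first=True)
            self.out = nn.Linear(d_model, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # src_ids: decompiled code tokens; tgt_ids: name tokens shifted right
            mask = self.transformer.generate_square_subsequent_mask(
                tgt_ids.size(1)).to(tgt_ids.device)
            h = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids),
                                 tgt_mask=mask)
            return self.out(h)  # logits over the name vocabulary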

One could try to use something like GPT-4 with a carefully designed prompt: "Given this data structure, what might be the name for this field?"
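A sketch of that prompt route with the OpenAI Python client (the model name, prompt wording, and example struct are just placeholders):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    decompiled_struct = """
    struct s1 { int field_0; char *field_8; unsigned field_10; };
    """

    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You suggest names for fields in decompiled C structs."},
            {"role": "user",
             "content": "Given this data structure, what might be the name "
                        "for each field?\n" + decompiled_struct},
        ],
    )
    print(resp.choices[0].message.content)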

One could try to use something pretrained like llama, then fine-tune it on hundreds of thousands of pairs of compiled and decompiled programs.
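With Hugging Face tooling, that fine-tune could be sketched roughly like this (the checkpoint name, dataset layout, and hyperparameters are assumptions, and in practice something like LoRA would likely be needed to keep memory in check):

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    base = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Assumed JSONL: one {"text": "<decompiled code>\n### names:\n<ground truth>"} per line.
    ds = load_dataset("json", data_files="pairs.jsonl")["train"]
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=ds.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments("llama-decomp",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=16,
                               num_train_epochs=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()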




Option 4:

One could take a pretrained model like llama, train it on only a few thousand compiled and decompiled programs, then feed it compiled programs, have it decompile them, evaluate that output to build a new dataset, and fine-tune it again. Repeat until satisfactory.
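In pseudocode-ish Python, that loop might look like the following, where finetune(), decompile_with(), and score() are hypothetical stand-ins for the actual training, inference, and evaluation steps:

    # Hypothetical self-training loop; finetune(), decompile_with(), and score()
    # are placeholders for real training, inference, and evaluation code.
    def bootstrap(base_model, seed_pairs, unlabeled_binaries, rounds=5,
                  keep_threshold=0.9):
        model = finetune(base_model, seed_pairs)  # a few thousand pairs to start
        dataset = list(seed_pairs)
        for _ in range(rounds):
            new_pairs = []
            for binary in unlabeled_binaries:
                decompiled = decompile_with(model, binary)
                # Keep only outputs the evaluator judges good enough,
                # e.g. the result recompiles and behaves like the original.
                if score(binary, decompiled) >= keep_threshold:
                    new_pairs.append((binary, decompiled))
            dataset += new_pairs
            model = finetune(model, dataset)  # fine-tune on the grown dataset
        return model

The hard part is score(): it has to filter out the model's own bad outputs, or each round just reinforces its mistakes.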



