If an LLM is used, it's unclear how best to apply it.
One could try to train one's own LLM from scratch, using an encoder-decoder (translation, a.k.a. seq2seq) architecture trained to predict the correct variable name given the decompiled output.
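A minimal sketch of that seq2seq idea, assuming a Hugging Face encoder-decoder backbone such as CodeT5; the training pair shown is a made-up example, not a real corpus entry:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# CodeT5 is one off-the-shelf encoder-decoder pretrained on source code.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One hypothetical training pair: decompiled code in, code with real names out.
decompiled = "int sub_401000(int a1) { return a1 * a1; }"
original = "int square(int value) { return value * value; }"

inputs = tokenizer(decompiled, return_tensors="pt", truncation=True)
labels = tokenizer(original, return_tensors="pt", truncation=True).input_ids

# A single supervised step; a real run would loop over many such pairs.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```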
One could try to use something like GPT-4 with a carefully designed prompt, e.g. "Given this data structure, what might be a good name for this field?"
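A minimal sketch of that prompting approach, using the OpenAI Python client; the prompt wording and the struct are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

decompiled_struct = """
struct s_1 {
    int field_0;
    char field_4[64];
    long field_48;
};
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You suggest names for fields in decompiled C structs."},
        {"role": "user",
         "content": "Given this data structure, what might be a good name "
                    f"for each field?\n{decompiled_struct}"},
    ],
)
print(response.choices[0].message.content)
```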
One could take a pretrained model like LLaMA and fine-tune it on hundreds of thousands of compiled and decompiled programs.
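A minimal sketch of that fine-tuning setup, assuming LoRA via the peft library; the model name, the `pairs.jsonl` file with `decompiled`/`original` fields, and the hyperparameters are all placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)
# Train only small LoRA adapters instead of all 7B parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         task_type="CAUSAL_LM"))

# Hypothetical JSONL corpus of (decompiled, original) pairs.
ds = load_dataset("json", data_files="pairs.jsonl", split="train")

def tokenize(example):
    text = (f"### Decompiled:\n{example['decompiled']}\n"
            f"### Original:\n{example['original']}")
    return tokenizer(text, truncation=True, max_length=1024)

ds = ds.map(tokenize, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-decomp",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```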
One could take a pretrained model like LLaMA, train it on only a few thousand compiled and decompiled programs, then feed it new compiled programs, have it decompile them, evaluate that output to build a new dataset, and fine-tune it again, repeating until the results are satisfactory.
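A rough sketch of that bootstrapping loop; every helper here (`finetune`, `decompile`, `generate_names`, `passes_quality_check`) is hypothetical and stands in for real tooling such as a headless Ghidra pipeline, model inference, and a fine-tuning job:

```python
def bootstrap(model, seed_pairs, unlabeled_binaries, rounds=3):
    # Round 0: supervised fine-tuning on the few thousand known pairs.
    model = finetune(model, seed_pairs)

    for _ in range(rounds):
        new_pairs = []
        for binary in unlabeled_binaries:
            decompiled = decompile(binary)              # e.g. Ghidra headless output
            prediction = generate_names(model, decompiled)
            # Keep only outputs that pass some automatic check, such as
            # recompiling cleanly or preserving the binary's behavior.
            if passes_quality_check(binary, prediction):
                new_pairs.append((decompiled, prediction))

        # Fine-tune again on the model's own vetted outputs, then repeat.
        model = finetune(model, seed_pairs + new_pairs)

    return model
```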