SmaliLLM
SmaliLLM is a large language model designed to decompile Smali code into Java code. Reconstructing Smali language representations into high-level languages such as Java holds significant practical engineering value. This transformation not only lowers the technical barrier for reverse engineering but also provides the necessary semantic foundation for subsequent tasks such as static analysis and vulnerability detection.
SmaliLLM is a series of models fine-tuned from Qwen3, Qwen2.5-Coder, and Gemma3 on nearly 1,000 "Smali2Java" examples.
The following code snippet illustrates how to use the model to generate Java code from a given Smali input.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "MoxStone/SmaliLLM-Qwen2.5-Coder-0.5B-Instruct-Finetuned"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Smali Code You Want to Decompile"
messages = [
    {"role": "system", "content": "Decompile following smali code to java code."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("Java code:", content)
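A Smali file usually holds several methods, and it may be practical to decompile them one at a time so that each prompt stays short. The sketch below is only an illustration of that idea, not part of the SmaliLLM release: the helper names `split_smali_methods` and `build_messages` and the `SAMPLE_SMALI` input are hypothetical, and the system prompt is simply copied from the snippet above. Each resulting message list could be passed to `tokenizer.apply_chat_template` in place of `messages`.

```python
import re

# System prompt copied from the usage snippet above.
SYSTEM_PROMPT = "Decompile following smali code to java code."

# Matches one Smali method, from its ".method" directive through ".end method".
METHOD_RE = re.compile(
    r"^[ \t]*\.method\b.*?^[ \t]*\.end method[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_smali_methods(smali_source: str) -> list[str]:
    """Return every .method ... .end method block found in the Smali text."""
    return [m.group(0).strip() for m in METHOD_RE.finditer(smali_source)]


def build_messages(smali_snippet: str) -> list[dict]:
    """Wrap one Smali snippet in the chat format used by the snippet above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": smali_snippet},
    ]


# Hypothetical input, written by hand for illustration only.
SAMPLE_SMALI = """\
.class public LExample;
.super Ljava/lang/Object;

.method public static add(II)I
    .registers 3
    add-int v0, p0, p1
    return v0
.end method

.method public static one()I
    .registers 1
    const/4 v0, 0x1
    return v0
.end method
"""

methods = split_smali_methods(SAMPLE_SMALI)
prompts = [build_messages(m) for m in methods]
print(f"Found {len(methods)} methods to decompile")
```

Splitting per method drops class-level context such as fields and the superclass, so for small classes sending the whole file as a single prompt may give better results.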
Base model
Qwen/Qwen2.5-0.5B