The script can be called with the following (example) command:

```bash
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
```
After conversion, the model and tokenizer can be loaded via:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/output/path")
model = LlamaForCausalLM.from_pretrained("/output/path")
```
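
As a quick sanity check, the converted model can then be used for generation. The following is a minimal sketch continuing from the loading snippet above; the prompt and generation settings are illustrative, not prescribed by the conversion script:

```python
# Continuing from the snippet above: tokenize an example prompt
# and generate a short continuation with the converted model.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```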
Note that executing the script requires enough CPU RAM to host the whole model in float16 precision: even though the biggest versions are split across several checkpoints, each checkpoint contains a part of every weight of the model, so all of them need to be loaded in RAM at once.
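
For a rough estimate, float16 uses two bytes per parameter, so the RAM needed is about twice the parameter count in bytes. A back-of-the-envelope sketch (the helper function and parameter counts below are illustrative approximations, not part of the script):

```python
# Rough CPU RAM estimate for hosting a model in float16 (2 bytes per parameter).
# Parameter counts are approximate, for illustration only.
def fp16_ram_gb(num_params: float) -> float:
    return num_params * 2 / 1024**3

for name, params in [("7B", 7e9), ("13B", 13e9), ("65B", 65e9)]:
    print(f"{name}: ~{fp16_ram_gb(params):.0f} GB of CPU RAM")
```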