Convert to HF format
This PR converts the weights and configs to HF format, following the PR I just merged directly into Transformers: https://github.com/huggingface/transformers/pull/36939
We also added multimodal support to the chat template, which comes in a separate PR (https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/56).
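For context, here is a minimal sketch of how a multimodal chat template is typically consumed through the processor once both PRs are merged. The message schema below is an assumption based on the common Transformers convention, not taken verbatim from the PR:

```python
# Sketch only: typical multimodal chat-template usage via the processor.
# The exact content schema for Phi-4-multimodal may differ slightly.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/Phi-4-multimodal-instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},  # placeholder the template expands into image tokens
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]

# Render the prompt using the chat template shipped with the checkpoint.
prompt = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
print(prompt)
```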
@cyrilvallez @RaushanTurganbay Thank you for your contributions! In the PR, I see the checkpoints have been modified. However, since the HF checkpoint is also used by vLLM (supported in vLLM 0.7.3+), I wonder whether your changes to the checkpoint/configs are still compatible with vLLM? Thanks in advance!
cc: @nguyenbh
@cyrilvallez
When I tried the `sample_inference_phi4mm.py` from your PR with `transformers==4.51.0`, it shows the error below. Could you help check the issue? Thanks!
```
Traceback (most recent call last):
  File "/home/weijianxu/code/phi-o/Phi-4-multimodal-instruct-for-pr/sample_inference_phi4mm.py", line 13, in <module>
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
  File "/home/weijianxu/anaconda3/envs/phi4mm_hf/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 347, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/weijianxu/anaconda3/envs/phi4mm_hf/lib/python3.13/site-packages/transformers/processing_utils.py", line 1082, in from_pretrained
    return cls.from_args_and_dict(args, processor_dict, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/weijianxu/anaconda3/envs/phi4mm_hf/lib/python3.13/site-packages/transformers/processing_utils.py", line 876, in from_args_and_dict
    processor = cls(*args, **processor_dict)
  File "/home/weijianxu/anaconda3/envs/phi4mm_hf/lib/python3.13/site-packages/transformers/models/phi4_multimodal/processing_phi4_multimodal.py", line 74, in __init__
    super().__init__(image_processor, audio_processor, tokenizer, **kwargs)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/weijianxu/anaconda3/envs/phi4mm_hf/lib/python3.13/site-packages/transformers/processing_utils.py", line 464, in __init__
    raise TypeError(f"Unexpected keyword argument {key}.")
TypeError: Unexpected keyword argument fake_audio_token_pattern.
```
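(For anyone hitting the same `TypeError` before the PRs are merged, one possible workaround is to strip the stale key from a local copy of the checkpoint. This is only a sketch, under the assumption that the key lives in `processor_config.json` as the traceback suggests; the local path is hypothetical.)

```python
# Sketch of a workaround: drop the legacy key that the new
# Phi4MultimodalProcessor.__init__ no longer accepts.
import json
import os

model_path = "./Phi-4-multimodal-instruct-for-pr"  # local checkout (hypothetical path)
config_path = os.path.join(model_path, "processor_config.json")

with open(config_path) as f:
    cfg = json.load(f)

cfg.pop("fake_audio_token_pattern", None)  # the key named in the TypeError

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```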
@xwjabc for vLLM some weights might need to be adapted; I am not sure if Cyril has specific keys that need to be changed. It won't be hard, since vLLM internally has a mapping from HF weight names to vLLM weight names, so we'll just need to update that mapping.
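For illustration, a minimal sketch of the kind of name remapping involved; the prefixes below are hypothetical examples, not the actual Phi-4 keys:

```python
# Sketch: HF checkpoint key -> vLLM model key renaming.
# Prefixes are made-up examples for illustration only.
HF_TO_VLLM_PREFIXES = {
    "model.embed_tokens_extend.audio_embed.": "audio_tower.",
    "model.embed_tokens_extend.image_embed.": "vision_tower.",
}

def remap_hf_weight_name(hf_name: str) -> str:
    """Rename one HF checkpoint key to the name vLLM's model definition expects."""
    for old, new in HF_TO_VLLM_PREFIXES.items():
        if hf_name.startswith(old):
            return new + hf_name[len(old):]
    return hf_name  # keys without a mapping pass through unchanged
```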
For the error, did you try merging both PRs (the current one and the linked one)? The processor in `transformers==4.51.0` was updated to accommodate the chat template modifications as well. Unfortunately I don't have access to push commits to existing PRs, so I made a separate one.
@RaushanTurganbay Thank you for your reply! I will try merging both PRs.
In addition, I see your PR 56 removes `processor_config.json` and adds a chat template, so I believe your PR depends on the processor in PR 55 and `transformers==4.51.0`. Is my understanding correct? Thanks!
You should merge this PR first, then @RaushanTurganbay's PR! That way the processor will be up-to-date!
Yeah, I rebased on #55 before making changes, and the config should indeed be deleted to work with `transformers==4.51.0`.
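(For testing before the merge: since PR 56 is rebased on #55, both changes can be exercised at once by loading the processor straight from the Hub PR ref. Sketch below; `refs/pr/56` assumes this thread's PR numbering.)

```python
# Sketch: load the processor directly from the open Hub PR for a quick check.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "microsoft/Phi-4-multimodal-instruct",
    revision="refs/pr/56",  # Hub PR ref (assumes the PR number in this thread)
)
print(type(processor).__name__)
```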