Running in vLLM with tool calling support
#1 opened by LLukas22
Is it currently possible to run this quantized model with support for tool calling?
If I run it according to the vLLM documentation, the tool call is returned in the content
section of the response, which isn't compatible with most libraries. The tool parser also produces the following warning: [mistral_tool_parser.py:55] Non-Mistral tokenizer detected when using a Mistral model...
Example command:
--model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 --chat-template examples/tool_chat_template_mistral_parallel.jinja --tool-call-parser mistral --limit-mm-per-prompt image=1 --enable-auto-tool-choice
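For reference, here is a minimal client-side check of what I mean (a sketch, assuming the server from the command above is running locally on port 8000 and using a hypothetical `get_weather` tool). With `--enable-auto-tool-choice` I would expect the parsed call in `message.tool_calls`, but with this setup it comes back as raw text in `message.content`:

```python
from openai import OpenAI

# Assumes the vLLM OpenAI-compatible server started above is reachable locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition, only used here to trigger a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
# Expected: the parsed call in message.tool_calls.
# Observed with this setup: the raw call string in message.content instead.
print("tool_calls:", message.tool_calls)
print("content:", message.content)
```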
If I try to use the original Mistral tokenizer, the engine fails to start with an AttributeError: 'MistralTokenizer' object has no attribute 'init_kwargs'
error (Related vLLM Issue).
Example command:
--model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 --chat-template examples/tool_chat_template_mistral_parallel.jinja --tool-call-parser mistral --limit-mm-per-prompt image=1 --enable-auto-tool-choice --tokenizer-mode mistral --tokenizer mistralai/Mistral-Small-3.1-24B-Instruct-2503