Running in vLLM with tool calling support

by LLukas22 - opened

Is it currently possible to run this quantized model with support for tool calling?

If I run it according to the vLLM documentation, the tool call gets returned in the content field of the response, which isn't compatible with most client libraries. The tool parser also produces the following warning: [mistral_tool_parser.py:55] Non-Mistral tokenizer detected when using a Mistral model...

Example command:

--model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16  --chat-template examples/tool_chat_template_mistral_parallel.jinja --tool-call-parser mistral --limit-mm-per-prompt image=1 --enable-auto-tool-choice
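To illustrate what breaks on the client side, here is a minimal sketch (assuming a local OpenAI-compatible vLLM server on port 8000 and a hypothetical get_weather tool): libraries expect the call in message.tool_calls, but with the setup above it comes back as plain text in message.content.

```python
# Minimal sketch, assuming a local vLLM OpenAI-compatible server on port 8000
# and a hypothetical get_weather tool. With a working tool parser the call
# should show up in message.tool_calls, not as text in message.content.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

message = response.choices[0].message
print(message.tool_calls)  # expected: a structured list of tool calls
print(message.content)     # observed: the tool call serialized as plain text
```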

If I try to use the original Mistral tokenizer, the engine fails to start with the error AttributeError: 'MistralTokenizer' object has no attribute 'init_kwargs' (related vLLM issue).

Example command:

--model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 --chat-template examples/tool_chat_template_mistral_parallel.jinja --tool-call-parser mistral --limit-mm-per-prompt image=1 --enable-auto-tool-choice --tokenizer-mode mistral --tokenizer mistralai/Mistral-Small-3.1-24B-Instruct-2503