Running in vLLM with tool calling support
#1 opened by LLukas22
Is it currently possible to run this quantized model with support for tool calling?
If I run it according to the vLLM documentation, the tool call is returned in the content
section of the response, which isn't compatible with most libraries. The tool parser also produces the following warning: [mistral_tool_parser.py:55] Non-Mistral tokenizer detected when using a Mistral model...
Example command:
--model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 --chat-template examples/tool_chat_template_mistral_parallel.jinja --tool-call-parser mistral --limit-mm-per-prompt image=1 --enable-auto-tool-choice
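For reference, here is a minimal client-side check of what I mean (a sketch, assuming the server from the command above is running locally on port 8000 and using a hypothetical `get_weather` tool). With `--enable-auto-tool-choice` I would expect the parsed call in `message.tool_calls`, but with this setup it comes back as raw text in `message.content`:

```python
from openai import OpenAI

# Assumes the vLLM OpenAI-compatible server started above is reachable locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition, only used here to trigger a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
# Expected: the parsed call in message.tool_calls.
# Observed with this setup: the raw call string in message.content instead.
print("tool_calls:", message.tool_calls)
print("content:", message.content)
```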
If I try to use the original Mistral tokenizer, the engine fails to start with an AttributeError: 'MistralTokenizer' object has no attribute 'init_kwargs'
error (Related vLLM Issue).
Example command:
--model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 --chat-template examples/tool_chat_template_mistral_parallel.jinja --tool-call-parser mistral --limit-mm-per-prompt image=1 --enable-auto-tool-choice --tokenizer-mode mistral --tokenizer mistralai/Mistral-Small-3.1-24B-Instruct-2503