Phi-4-mini-instruct with llama-server (Tool-Enhanced Version)

This repository contains instructions for running a modified version of the Phi-4-mini-instruct model using llama-server. This version has been enhanced to support tool usage, allowing the model to interact with external tools and APIs through an OpenAI-compatible chat interface.

Model Capabilities

This modified version of Phi-4-mini-instruct includes:

  • Full support for tool usage and function calling
  • Custom chat template optimized for tool interactions
  • Ability to process and respond to tool outputs
  • OpenAI-compatible API interface

Prerequisites

  • llama.cpp installed (provides the llama-server binary)
  • The Phi-4-mini-instruct model in GGUF format

Installation

  1. Install llama.cpp so that the llama-server binary is available, e.g. by building from source or via a package manager such as Homebrew:
brew install llama.cpp
  2. Ensure your model file is in the correct location (one way to download it is shown below):
models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf
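If you do not already have the GGUF file, one way to fetch it is with the Hugging Face CLI. This is a sketch, assuming the file is published in the ariel-pillar/phi-4_function_calling repository under the filename used above:

# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download ariel-pillar/phi-4_function_calling \
    Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --local-dir models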

Running the Server

Start the llama-server with the following command:

llama-server \
    --model models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --port 8080 \
    --jinja

This will start the server with:

  • The model loaded in memory
  • The server listening on port 8080
  • Jinja chat template formatting enabled, which is required for tool use
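Depending on your hardware and setup, a few other common llama.cpp flags may be useful. The values below are illustrative only, not recommendations:

# --ctx-size:     context window size in tokens
# --n-gpu-layers: layers to offload to the GPU (if llama.cpp was built with GPU support)
# --host:         listen on all interfaces instead of only localhost
# --verbose:      enable verbose request logging
llama-server \
    --model models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --port 8080 \
    --jinja \
    --ctx-size 4096 \
    --n-gpu-layers 99 \
    --host 0.0.0.0 \
    --verbose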

Testing the API

You can test the server using curl commands. Here are some examples:

Example 1: Using Tools

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "python",
          "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
          "parameters": {
            "type": "object",
            "properties": {
              "code": {
                "type": "string",
                "description": "The code to run in the ipython interpreter."
              }
            },
            "required": ["code"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Print a hello world message with python."
      }
    ]
  }'
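When the model decides to use the tool, the response carries a tool_calls array (OpenAI format) instead of plain text content; piping the request above through jq '.choices[0].message.tool_calls' is a convenient way to inspect it. After executing the requested code yourself, you can send the result back in a follow-up request with a role "tool" message. A sketch of that second request follows; the tool_call_id here is illustrative, so use the id returned by the server:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role": "user", "content": "Print a hello world message with python."},
      {"role": "assistant", "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "python", "arguments": "{\"code\": \"print(\\\"hello world\\\")\"}"}
      }]},
      {"role": "tool", "tool_call_id": "call_1", "content": "hello world"}
    ]
  }'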

Example 2: Tell a Joke

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role":"system","content":"You are a helpful clown instruction assistant"},
      {"role":"user","content":"tell me a funny joke"}
    ]
  }'

Example 3: Generate HTML Hello World

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role":"system","content":"You are a helpful coding assistant"},
      {"role":"user","content":"give me an html hello world document"}
    ]
  }'

API Endpoints

The server provides an OpenAI-compatible API with the following main endpoints:

  • /v1/chat/completions - For chat completions
  • /v1/completions - For text completions
  • /v1/models - To list available models
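For a quick sanity check that the server is up, list the available models:

curl http://localhost:8080/v1/models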

Notes

  • The server uses the same API format as OpenAI's Chat Completions API, making it compatible with many existing tools and libraries; see the example below for pointing such clients at this server
  • The --jinja flag enables proper chat template formatting for the model, which is essential for tool usage
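For instance, the official OpenAI client libraries read the following environment variables, so they can often be pointed at this server without code changes. llama-server does not require an API key unless started with --api-key, so the value below is an arbitrary placeholder:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=sk-no-key-required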

Troubleshooting

If you encounter issues:

  1. Ensure the model file exists at the specified path
  2. Check that port 8080 is not already in use by another application
  3. Verify that llama.cpp is installed and the llama-server binary is on your PATH
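The commands below can help verify these points (assuming standard Unix tools; llama-server also exposes a simple health endpoint):

# Does the model file exist?
ls -lh models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf

# Is something already listening on port 8080?
lsof -i :8080

# Is the server responding?
curl http://localhost:8080/health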

License

Please ensure you comply with the model's license terms when using it.

Model Details

  • Format: GGUF, 4-bit quantization (Q4_K_M)
  • Parameters: 3.84B
  • Architecture: phi3
  • Repository: ariel-pillar/phi-4_function_calling