Phi-4-mini-instruct with llama-server (Tool-Enhanced Version)

This repository contains instructions for running a modified version of the Phi-4-mini-instruct model using llama-server. This version has been enhanced to support tool usage, allowing the model to interact with external tools and APIs through an OpenAI-compatible chat interface.

Model Capabilities

This modified version of Phi-4-mini-instruct includes:

  • Full support for tool usage and function calling
  • Custom chat template optimized for tool interactions
  • Ability to process and respond to tool outputs
  • OpenAI-compatible API interface

Prerequisites

  • llama.cpp installed (provides the llama-server binary)
  • The Phi-4-mini-instruct model in GGUF format

Installation

  1. Install llama.cpp so that the llama-server binary is available, e.g. by building from source or via a package manager such as Homebrew:
brew install llama.cpp
  2. Ensure your model file is in the correct location (one way to download it is shown below):
models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf
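If you do not already have the GGUF file, one way to fetch it is with the Hugging Face CLI. This is a sketch, assuming the file is published in the ariel-pillar/phi-4_function_calling repository under the filename used above:

# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download ariel-pillar/phi-4_function_calling \
    Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --local-dir models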

Running the Server

Start the llama-server with the following command:

llama-server \
    --model models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --port 8080 \
    --jinja

This will start the server with:

  • The model loaded in memory
  • The server listening on port 8080
  • Jinja chat template formatting enabled, which is required for tool use
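Depending on your hardware and setup, a few other common llama.cpp flags may be useful. The values below are illustrative only, not recommendations:

# --ctx-size:     context window size in tokens
# --n-gpu-layers: layers to offload to the GPU (if llama.cpp was built with GPU support)
# --host:         listen on all interfaces instead of only localhost
# --verbose:      enable verbose request logging
llama-server \
    --model models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --port 8080 \
    --jinja \
    --ctx-size 4096 \
    --n-gpu-layers 99 \
    --host 0.0.0.0 \
    --verbose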

Testing the API

You can test the server using curl commands. Here are some examples:

Example 1: Using Tools

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "python",
          "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
          "parameters": {
            "type": "object",
            "properties": {
              "code": {
                "type": "string",
                "description": "The code to run in the ipython interpreter."
              }
            },
            "required": ["code"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Print a hello world message with python."
      }
    ]
  }'
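When the model decides to use the tool, the response carries a tool_calls array (OpenAI format) instead of plain text content; piping the request above through jq '.choices[0].message.tool_calls' is a convenient way to inspect it. After executing the requested code yourself, you can send the result back in a follow-up request with a role "tool" message. A sketch of that second request follows; the tool_call_id here is illustrative, so use the id returned by the server:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role": "user", "content": "Print a hello world message with python."},
      {"role": "assistant", "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "python", "arguments": "{\"code\": \"print(\\\"hello world\\\")\"}"}
      }]},
      {"role": "tool", "tool_call_id": "call_1", "content": "hello world"}
    ]
  }'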

Example 2: Tell a Joke

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role":"system","content":"You are a helpful clown instruction assistant"},
      {"role":"user","content":"tell me a funny joke"}
    ]
  }'

Example 3: Generate HTML Hello World

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role":"system","content":"You are a helpful coding assistant"},
      {"role":"user","content":"give me an html hello world document"}
    ]
  }'

API Endpoints

The server provides an OpenAI-compatible API with the following main endpoints:

  • /v1/chat/completions - For chat completions
  • /v1/completions - For text completions
  • /v1/models - To list available models
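For a quick sanity check that the server is up, list the available models:

curl http://localhost:8080/v1/models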

Notes

  • The server uses the same API format as OpenAI's Chat Completions API, making it compatible with many existing tools and libraries; see the example below for pointing such clients at this server
  • The --jinja flag enables proper chat template formatting for the model, which is essential for tool usage
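For instance, the official OpenAI client libraries read the following environment variables, so they can often be pointed at this server without code changes. llama-server does not require an API key unless started with --api-key, so the value below is an arbitrary placeholder:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=sk-no-key-required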

Troubleshooting

If you encounter issues:

  1. Ensure the model file exists at the specified path
  2. Check that port 8080 is not already in use by another application
  3. Verify that llama.cpp is installed and the llama-server binary is on your PATH
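The commands below can help verify these points (assuming standard Unix tools; llama-server also exposes a simple health endpoint):

# Does the model file exist?
ls -lh models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf

# Is something already listening on port 8080?
lsof -i :8080

# Is the server responding?
curl http://localhost:8080/health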

License

Please ensure you comply with the model's license terms when using it.

Model Details

  • Format: GGUF, 4-bit quantization (Q4_K_M)
  • Parameters: 3.84B
  • Architecture: phi3
  • Repository: ariel-pillar/phi-4_function_calling