---
license: apache-2.0
datasets:
- Congliu/Chinese-DeepSeek-R1-Distill-data-110k
- cognitivecomputations/dolphin-r1
- a-m-team/AM-DeepSeek-R1-0528-Distilled
language:
- zh
- en
base_model:
- Zhihu-ai/Zhi-Create-Qwen3-32B
tags:
- qwen3
- eagle3
library_name: transformers
---

# Zhi-Create-Qwen3-32B-Eagle3

This is a speculator (draft) model for [Zhihu-ai/Zhi-Create-Qwen3-32B](https://huggingface.co/Zhihu-ai/Zhi-Create-Qwen3-32B), based on the [EAGLE-3](https://arxiv.org/abs/2503.01840) speculative decoding algorithm. It was trained with the [SpecForge](https://github.com/sgl-project/SpecForge/) library on a subset of the supervised fine-tuning (SFT) data used for Zhi-Create-Qwen3-32B, covering both thinking and non-thinking modes.

You can start a service with [SGLang](https://github.com/sgl-project/sglang):

```bash
pip install "sglang[all]>=0.4.9"

python3 -m sglang.launch_server \
    --model Zhihu-ai/Zhi-Create-Qwen3-32B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 2 \
    --speculative-num-draft-tokens 8 \
    --tp 2 \
    --port 8000 \
    --dtype bfloat16 \
    --reasoning-parser deepseek-r1 \
    --served-model-name Zhi-Create-Qwen3-32B

# send a request (prompt: "In the voice of Lu Xun, write an essay introducing West Lake vinegar fish")
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-Create-Qwen3-32B",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```

Alternatively, use the OpenAI-compatible API:

```python
from openai import OpenAI

client = OpenAI(
    api_key="empty",
    base_url="http://127.0.0.1:8000/v1",
)

def get_answer(messages):
    response = client.chat.completions.create(
        messages=messages,
        model="Zhi-Create-Qwen3-32B",
        max_tokens=4096,
        temperature=0.3,
        top_p=0.95,
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    )
    answer = ""
    reasoning_content_all = ""
    for chunk in response:
        delta = chunk.choices[0].delta
        # Each streamed chunk carries either reasoning tokens or answer tokens.
        content = getattr(delta, "content", None)
        reasoning_content = getattr(delta, "reasoning_content", None)
        if reasoning_content is not None:
            reasoning_content_all += reasoning_content
            print(reasoning_content, end="", flush=True)
        if content is not None:
            answer += content
            print(content, end="", flush=True)
    return answer, reasoning_content_all

# Prompt: "In the voice of Lu Xun, write an essay introducing West Lake vinegar fish"
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [{"role": "user", "content": prompt}]
answer, reasoning_content_all = get_answer(messages)
```
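For intuition, speculative decoding follows a draft-and-verify loop: the small speculator proposes a block of tokens (`--speculative-num-draft-tokens`), the target model verifies them in a single pass, and the longest accepted prefix is kept, so the output matches what the target alone would produce. Below is a minimal toy sketch of that loop under the greedy case; the `target_next`/`draft_next` functions are hypothetical deterministic stand-ins, not the real EAGLE-3 mechanism (which drafts from the target model's hidden features):

```python
# Toy draft-and-verify loop illustrating speculative decoding (greedy case).
# The "models" are hypothetical deterministic next-token functions.

def target_next(seq):
    # Stand-in target model: next token is (sum of sequence) % 7.
    return sum(seq) % 7

def draft_next(seq):
    # Stand-in draft model: agrees with the target except when sum % 5 == 0.
    guess = sum(seq) % 7
    return (guess + 1) % 7 if sum(seq) % 5 == 0 else guess

def speculative_decode(seq, num_draft_tokens=8, total=16):
    accepted_counts = []
    while len(seq) < total:
        # 1) Draft: the speculator proposes a block of tokens autoregressively.
        draft, ctx = [], list(seq)
        for _ in range(num_draft_tokens):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: the target checks the drafted tokens; keep the longest
        #    prefix the target agrees with.
        n_ok, ctx = 0, list(seq)
        for t in draft:
            if target_next(ctx) == t:
                ctx.append(t)
                n_ok += 1
            else:
                break
        # 3) The target always contributes one token: the correction at the
        #    first rejected position, or one extra after a fully accepted block.
        ctx.append(target_next(ctx))
        seq = ctx[:total]
        accepted_counts.append(n_ok)
    return seq, accepted_counts

seq, accepted = speculative_decode([3, 1])
# seq is identical to plain greedy decoding with the target alone, but the
# target runs fewer verification passes when the draft is usually right.
```

The higher the draft model's acceptance rate, the fewer target passes are needed per generated token, which is where the speedup comes from.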