---
license: apache-2.0
datasets:
- Congliu/Chinese-DeepSeek-R1-Distill-data-110k
- cognitivecomputations/dolphin-r1
- a-m-team/AM-DeepSeek-R1-0528-Distilled
language:
- zh
- en
base_model:
- Zhihu-ai/Zhi-Create-Qwen3-32B
tags:
- qwen3
- eagle3
library_name: transformers
---
# Zhi-Create-Qwen3-32B-Eagle3
This is a speculator model designed for use with [Zhihu-ai/Zhi-Create-Qwen3-32B](https://huggingface.co/Zhihu-ai/Zhi-Create-Qwen3-32B), based on the [EAGLE-3](https://arxiv.org/abs/2503.01840) speculative decoding algorithm.
It was trained with the [SpecForge](https://github.com/sgl-project/SpecForge/) library on a subset of the supervised fine-tuning (SFT) data used for Zhihu-ai/Zhi-Create-Qwen3-32B.
The model was trained in both thinking and non-thinking modes.
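To see why a draft model speeds up generation without changing the output, here is a toy sketch of the general draft-and-verify loop behind speculative decoding. This is not the actual EAGLE-3 implementation (which drafts with a small head over the target model's hidden states and uses tree-shaped proposals); both "models" below are hypothetical stand-in functions so the accept/reject mechanics are easy to follow.

```python
# Toy draft-and-verify loop (illustrative only, not EAGLE-3 itself).

def target_next(prefix):
    # Stand-in for the large target model: deterministic "greedy" next token.
    return (sum(prefix) * 31 + len(prefix)) % 100

def draft_next(prefix):
    # Stand-in for the cheap draft model: agrees with the target most of
    # the time, but occasionally guesses wrong.
    tok = target_next(prefix)
    return tok if len(prefix) % 7 else (tok + 1) % 100

def speculative_generate(prompt, new_tokens, k=4):
    seq = list(prompt)
    end = len(prompt) + new_tokens
    while len(seq) < end:
        # 1) The draft model proposes up to k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) The target verifies; every committed token is the target's own
        #    choice, so the output is identical to target-only decoding.
        for tok in draft:
            expected = target_next(seq)
            seq.append(expected)  # equals tok whenever the draft was right
            if tok != expected or len(seq) >= end:
                break  # reject the rest of the draft (or we're done)
    return seq[len(prompt):]

def plain_generate(prompt, new_tokens):
    # Baseline: one target call per token.
    seq = list(prompt)
    for _ in range(new_tokens):
        seq.append(target_next(seq))
    return seq[len(prompt):]

# Same output as plain decoding, but several tokens can be committed per
# verification step whenever the draft's guesses are accepted.
assert speculative_generate([1, 2, 3], 16) == plain_generate([1, 2, 3], 16)
```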
You can easily start a service using [SGLang](https://github.com/sgl-project/sglang).
```bash
pip install "sglang[all]>=0.4.9"

python3 -m sglang.launch_server \
    --model Zhihu-ai/Zhi-Create-Qwen3-32B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 2 \
    --speculative-num-draft-tokens 8 \
    --tp 2 \
    --port 8000 \
    --dtype bfloat16 \
    --reasoning-parser deepseek-r1 \
    --served-model-name Zhi-Create-Qwen3-32B
# send request
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Zhi-Create-Qwen3-32B",
"prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
"max_tokens": 4096,
"temperature": 0.6,
"top_p": 0.95
}'
```
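The `--speculative-num-steps` and `--speculative-num-draft-tokens` flags trade draft cost against acceptance. As a rough back-of-envelope (a simplification: it assumes a single draft chain where each token is accepted independently with probability `a`, whereas EAGLE-3's tree drafts do better), the expected number of tokens committed per target forward pass can be sketched as:

```python
# Expected tokens committed per target verification step when k chained
# draft tokens are each accepted independently with probability a:
#   E = 1 + a + a^2 + ... + a^k
# (the longest accepted prefix, plus the verifier's own token).
def expected_tokens(a, k):
    return sum(a ** i for i in range(k + 1))

for a in (0.6, 0.8, 0.9):
    print(f"acceptance={a}: ~{expected_tokens(a, 8):.2f} tokens/step")
```

Higher acceptance rates (a well-matched draft head) are what make larger draft budgets pay off.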
```python
# Alternative: using the OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    api_key="empty",
    base_url="http://127.0.0.1:8000/v1",
)

def get_answer(messages):
    response = client.chat.completions.create(
        messages=messages,
        model="Zhi-Create-Qwen3-32B",
        max_tokens=4096,
        temperature=0.3,
        top_p=0.95,
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    )
    answer = ""
    reasoning_content_all = ""
    for chunk in response:
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None)
        reasoning_content = getattr(delta, "reasoning_content", None)
        if content is not None:
            answer += content
            print(content, end="", flush=True)
        if reasoning_content is not None:
            reasoning_content_all += reasoning_content
            print(reasoning_content, end="", flush=True)
    return answer, reasoning_content_all

# "In the voice of Lu Xun, write an essay introducing West Lake vinegar fish"
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [{"role": "user", "content": prompt}]
answer, reasoning_content_all = get_answer(messages)
```