Transformers
Safetensors
Chinese
English
llama
qwen3
eagle3
text-generation-inference
Parkerlambert123 committed · verified
Commit 8bed2d6 · 1 Parent(s): ef79d95

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ images/writingbench_score.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,261 @@
- ---
- license: apache-2.0
- ---

---
license: apache-2.0
datasets:
- Congliu/Chinese-DeepSeek-R1-Distill-data-110k
- cognitivecomputations/dolphin-r1
- a-m-team/AM-DeepSeek-R1-0528-Distilled
language:
- zh
- en
base_model:
- Qwen/Qwen3-32B
tags:
- qwen3
library_name: transformers
---

# Zhi-Create-Qwen3-32B-Eagle3

This is a speculator model designed for use with [Zhihu-ai/Zhi-Create-Qwen3-32B](https://huggingface.co/Zhihu-ai/Zhi-Create-Qwen3-32B), based on the [EAGLE-3](https://arxiv.org/abs/2503.01840) speculative decoding algorithm.
It was trained with the [SpecForge](https://github.com/sgl-project/SpecForge/) library on a subset of the supervised fine-tuning (SFT) data used to train Zhihu-ai/Zhi-Create-Qwen3-32B, covering both thinking and non-thinking modes.
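
To use the speculator at inference time, a serving engine loads it as a draft model alongside the target model. The sketch below uses SGLang's EAGLE-3 speculative decoding support; the flag names and tuning values reflect SGLang's speculative decoding options as we understand them, and the draft-model repo id is assumed from this card's name, so verify both against your installed version.

```bash
# Minimal sketch: serve the target model with this repo as the EAGLE-3 draft model.
# Flag names, tuning values, and the draft repo id are assumptions to double-check
# (see `python -m sglang.launch_server --help`).
python -m sglang.launch_server \
    --model-path Zhihu-ai/Zhi-Create-Qwen3-32B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 4 \
    --speculative-num-draft-tokens 16 \
    --served-model-name Zhi-Create-Qwen3-32B \
    --port 8000
```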

# Zhi-Create-Qwen3-32B

## 1. Introduction

Zhi-Create-Qwen3-32B is a fine-tuned model derived from [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B), with a focus on enhancing creative writing capabilities. Through careful optimization, the model shows promising improvements in creative writing as measured by [WritingBench](https://github.com/X-PLUG/WritingBench): it attains a score of **82.08**, a significant improvement over the base Qwen3-32B model's score of **78.97**.

Additionally, to maintain general capabilities such as knowledge and reasoning, we performed fine-grained data-mixture experiments combining general knowledge, mathematics, code, and other data types. The final evaluation shows that general capabilities remain stable, with no significant decline compared to the base model.

## 2. Training Process

### Data

The model's training corpus comprises three primary data sources: rigorously filtered open-source datasets, synthesized chain-of-thought reasoning corpora, and curated question-answer pairs from Zhihu.

To achieve optimal domain coverage, we carefully balanced the distribution of these datasets through data-mixture optimization experiments. The datasets include [Dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1), [Congliu/Chinese-DeepSeek-R1-Distill-data-110k](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k), and [a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/datasets/a-m-team/AM-DeepSeek-R1-0528-Distilled), alongside high-quality content from Zhihu. All datasets underwent quality assurance through our Reward Model (RM) filtering pipeline. To preserve the model's foundational knowledge and reasoning capabilities, creative writing data accounts for approximately 23% of the training data, with the remainder consisting of mathematics, code, and general-knowledge data. The chain-of-thought (CoT) reasoning components of the training data were synthesized using [deepseek-ai/DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) and similar models.

The detailed data distribution is shown in the figure below:

![data-distribution](./images/data_distribution.png)

<figcaption style="text-align:center; font-size:0.9em; color:#666">
Figure 1: Training data distribution showing the composition of different data sources, with creative writing data accounting for approximately 23% of the total training corpus, alongside mathematics, code, and general knowledge data.
</figcaption>

### Training

**Supervised Fine-tuning (SFT)**: We employed a curriculum learning strategy for supervised fine-tuning. This approach systematically enhances creative writing capabilities while incorporating diverse domain data to maintain core competencies and mitigate catastrophic forgetting. Using a multi-stage progressive iteration scheme, we selected samples that were insufficiently trained in previous rounds and categorized samples by reasoning complexity and context length, gradually increasing the difficulty of training samples to improve model performance step by step.

**Direct Preference Optimization (DPO)**: We integrated the RAFT (Reward-Ranked Fine-Tuning) method, combining rule-based systems and LLM-as-judge approaches to identify correct and incorrect samples. This enables the construction of DPO preference pairs that address issues such as Chinese-English code-mixing and undesirable repetition, while simultaneously improving the model's reasoning capabilities.

## 3. Evaluation Results

We evaluated the model with WritingBench, a comprehensive framework for assessing the writing capabilities of large language models. Zhi-Create-Qwen3-32B achieves a score of 82.08 (with Claude 3.7 Sonnet as the judge), a substantial improvement over the base Qwen3-32B model's score of 78.97.

The performance comparison across six domains is presented in the figure below:

![writingbench](./images/writingbench_score.png)

<figcaption style="text-align:center; font-size:0.9em; color:#666">
Figure 2: WritingBench performance comparison between Zhi-Create-Qwen3-32B and Qwen3-32B across six domains, evaluated with Claude 3.7 Sonnet as the judge model. The domains are: (D1) Academic & Engineering, (D2) Finance & Business, (D3) Politics & Law, (D4) Literature & Art, (D5) Education, and (D6) Advertising & Marketing.
</figcaption>

## 4. How to Run Locally

Zhi-Create-Qwen3-32B can be deployed on a single 80 GB GPU such as an H20, A800, or H800. For more accessible deployment, we offer quantized versions: the FP8 model (Zhi-Create-Qwen3-32B-FP8) runs on a dual RTX 4090 setup, while the Q4_K_M quantized version can be deployed on a single RTX 4090.
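
As one illustration of the dual RTX 4090 path, the FP8 checkpoint can be sharded across both GPUs with vLLM's tensor parallelism. This is a sketch under assumptions: the repo id Zhihu-ai/Zhi-Create-Qwen3-32B-FP8 is inferred from the model name above, and the context length is capped only to stay conservative on memory.

```bash
# Hypothetical sketch: serve the FP8 checkpoint across two RTX 4090s with vLLM.
# The repo id is assumed; adjust --max-model-len to your memory budget.
vllm serve Zhihu-ai/Zhi-Create-Qwen3-32B-FP8 \
    --served-model-name Zhi-Create-Qwen3-32B-FP8 \
    --tensor-parallel-size 2 \
    --max-model-len 8192 \
    --port 8000
```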
70
+
71
+ ### Transformers
72
+
73
+ ```python
74
+ from transformers import AutoModelForCausalLM, AutoTokenizer
75
+ from transformers.generation import GenerationConfig
76
+
77
+ MODEL_NAME = "Zhihu-ai/Zhi-Create-Qwen3-32B"
78
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
79
+
80
+ # use bf16
81
+ # model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", trust_remote_code=True, bf16=True).eval()
82
+ # use fp16
83
+ # model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", trust_remote_code=True, fp16=True).eval()
84
+ # use cpu only
85
+ # model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cpu", trust_remote_code=True).eval()
86
+ # use auto mode, automatically select precision based on the device.
87
+ model = AutoModelForCausalLM.from_pretrained(
88
+ MODEL_NAME,
89
+ device_map="auto",
90
+ trust_remote_code=True
91
+ ).eval()
92
+
93
+ # Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.
94
+ # model.generation_config = GenerationConfig.from_pretrained(MODEL_NAME, trust_remote_code=True)
95
+
96
+ generate_configs = {
97
+ "temperature": 0.6,
98
+ "do_sample": True,
99
+ "top_p": 0.95,
100
+ "max_new_tokens": 4096
101
+ }
102
+
103
+ prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
104
+ messages = [
105
+ {"role": "user", "content": prompt}
106
+ ]
107
+ text = tokenizer.apply_chat_template(
108
+ messages,
109
+ tokenize=False,
110
+ add_generation_prompt=True
111
+ )
112
+
113
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
114
+
115
+ generated_ids = model.generate(
116
+ **model_inputs,
117
+ **generate_configs
118
+ )
119
+ generated_ids = [
120
+ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
121
+ ]
122
+
123
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
124
+ print(response)
125
+ ```

### vLLM

For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
# install vllm
pip install "vllm>=0.6.4.post1"

# serve the Hugging Face model id
vllm serve Zhihu-ai/Zhi-Create-Qwen3-32B --served-model-name Zhi-Create-Qwen3-32B --port 8000

# or serve a local path
vllm serve /path/to/model --served-model-name Zhi-Create-Qwen3-32B --port 8000

# send a request
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-Create-Qwen3-32B",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```

### SGLang

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
# install SGLang
pip install "sglang[all]>=0.4.5" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python

# serve the Hugging Face model id
python -m sglang.launch_server --model-path Zhihu-ai/Zhi-Create-Qwen3-32B --served-model-name Zhi-Create-Qwen3-32B --port 8000

# or serve a local path
python -m sglang.launch_server --model-path /path/to/model --served-model-name Zhi-Create-Qwen3-32B --port 8000

# send a request
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-Create-Qwen3-32B",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```

Alternatively, query the server with the OpenAI-compatible Python client:

```python
from openai import OpenAI

openai_api_key = "empty"
openai_api_base = "http://127.0.0.1:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base
)

def get_answer(messages):
    response = client.chat.completions.create(
        messages=messages,
        model="Zhi-Create-Qwen3-32B",
        max_tokens=4096,
        temperature=0.3,
        top_p=0.95,
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}}
    )
    answer = ""
    reasoning_content_all = ""
    for each in response:
        if hasattr(each.choices[0].delta, "content"):
            each_content = each.choices[0].delta.content
        else:
            each_content = None
        if hasattr(each.choices[0].delta, "reasoning_content"):
            reasoning_content = each.choices[0].delta.reasoning_content
        else:
            reasoning_content = None
        if each_content is not None:
            answer += each_content
            print(each_content, end="", flush=True)
        if reasoning_content is not None:
            reasoning_content_all += reasoning_content
            print(reasoning_content, end="", flush=True)
    return answer, reasoning_content_all

prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [
    {"role": "user", "content": prompt}
]

answer, reasoning_content_all = get_answer(messages)
```
225
+
226
+ ### ollama
227
+
228
+ You can download ollama using [this](https://ollama.com/download/)
229
+
230
+ * quantization: Q4_K_M
231
+
232
+ ```bash
233
+ ollama run zhihu/zhi-create-qwen3-32b
234
+ ```
235
+
236
+ * bf16
237
+
238
+ ```bash
239
+ ollama run zhihu/zhi-create-qwen3-32b:bf16
240
+ ```
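
If you prefer calling the local Ollama server over HTTP rather than the interactive CLI, here is a minimal sketch against Ollama's REST API (default port 11434; the model tag matches the `ollama run` commands above):

```bash
# Chat with the locally pulled model through Ollama's HTTP API.
curl http://localhost:11434/api/chat -d '{
    "model": "zhihu/zhi-create-qwen3-32b",
    "messages": [{"role": "user", "content": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"}],
    "stream": false,
    "options": {"temperature": 0.6, "top_p": 0.95}
}'
```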

## 5. Usage Recommendations

For optimal performance, we recommend setting the temperature between 0.5 and 0.7 (0.6 recommended) and top-p to 0.95 for a good balance between creativity and coherence.
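
For example, these settings map directly onto the OpenAI-compatible server started in Section 4 (host, port, and served model name assumed from the earlier vLLM/SGLang examples):

```bash
# Recommended sampling settings applied to the chat completions endpoint.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-Create-Qwen3-32B",
        "messages": [{"role": "user", "content": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"}],
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```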
245
+
246
+ ## 6. Citation
247
+
248
+ ```text
249
+ @misc{Zhi-Create-Qwen3-32B,
250
+ title={Zhi-Create-Qwen3-32B: RAFT-Enhanced Direct Preference Optimization and Curriculum Learning for Robust Creative Writing in LLMs},
251
+ author={Jiewu Wang, Xu Chen, Wenyuan Su, Chao Huang, Hongkui Gao, Lin Feng, Shan Wang, Jingjing Wang, Zebin Ou},
252
+ year={2025},
253
+ eprint={},
254
+ archivePrefix={},
255
+ url={https://huggingface.co/Zhihu-ai/Zhi-Create-Qwen3-32B},
256
+ }
257
+ ```
258
+
259
+ ## 7. Contact
260
+
261
+ If you have any questions, please raise an issue or contact us at [ai@zhihu.com](mailto:ai@zhihu.com).
config.json ADDED
@@ -0,0 +1,33 @@
{
  "architectures": [
    "LlamaForCausalLMEagle3"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "draft_vocab_size": 32000,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 25600,
  "max_position_embeddings": 40960,
  "max_window_layers": 64,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 64,
  "num_hidden_layers": 1,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
images/data_distribution.png ADDED
images/writingbench_score.png ADDED

Git LFS Details

  • SHA256: 1d6feb3f54dbad00c9c24cf3f3b485e8534b443b55cdcbf20113ea978c51ac36
  • Pointer size: 131 Bytes
  • Size of remote file: 128 kB
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81f91df26b525a30110e6538e33efbd71f820b9cda8070104bbdf7234bd18994
size 3121274856