---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-72B
tags:
- chat
library_name: vllm
---
# Qwen2.5-72B-Instruct (with CJK Filter)
This is a mirror of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct), deployed with a **custom server-side logits processor** that filters out CJK (Chinese, Japanese, Korean) characters during generation.
The deployment uses a **vLLM-powered, OpenAI-compatible API**, optimized for **Turkish and English** outputs by suppressing unwanted CJK tokens during decoding.
---
## Features
- Languages: Turkish and English (multilingual base model, CJK output filtered)
- Model: Qwen2.5-72B-Instruct (bfloat16)
- Max sequence length: 32,768 tokens (the example deployment serves 16,384 via `--max-model-len`)
- Logits Processor: Filters CJK characters to prioritize Latin script
- Optimized for OpenAI-compatible deployment using vLLM
- Tensor Parallelism: 2
- License: qwen
---
## Server Deployment (Docker Compose with vLLM)
```yaml
services:
  qwen-lm:
    image: vllm/vllm-openai:v0.8.3
    runtime: nvidia
    environment:
      - HUGGING_FACE_HUB_TOKEN=<HF_TOKEN>
      - PYTHON_VERSION=3.12
      - VLLM_DISABLE_COMPILE_CACHE=1
      - HF_HOME=/mnt/model-cache
      - VLLM_USE_V1=0
      - PYTHONPATH=/app
    volumes:
      - <HOST_CACHE_DIR>:/mnt/model-cache
    ports:
      - "8010:8090"
    shm_size: "220g"
    command: >
      --model newmindai/Qwen2.5-72b-Instruct
      --tensor-parallel-size 2
      --max-model-len 16384
      --gpu-memory-utilization 0.95
      --trust-remote-code
      --host 0.0.0.0
      --port 8090
      --dtype bfloat16
      --enable-chunked-prefill
      --scheduling-policy priority
      --served-model-name newmindai/Qwen2.5-72b-Instruct
      --api-key <API_KEY>
      --logits-processor-pattern <CJKFilter_Pattern>
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
```
---
## Logits Processor: `CJKCharacterFilterLogitsProcessor`
This custom logits processor prevents generation of any token containing CJK (Chinese, Japanese, Korean) characters. This helps maintain Turkish/English-focused outputs.
```python
import torch


class CJKCharacterFilterLogitsProcessor:
    """Masks every vocabulary token whose decoded form contains a CJK character."""

    def __init__(self, tokenizer, device):
        self.tokenizer = tokenizer
        self.device = device
        self.mask = None  # built lazily on the first call, then reused

    def __call__(self, token_ids, logits):
        if self.mask is None:
            # Decode the whole vocabulary once and flag tokens containing CJK characters.
            token_ids_range = torch.arange(logits.size(-1), device=self.device)
            decoded_tokens = self.tokenizer.batch_decode(
                token_ids_range.unsqueeze(1), skip_special_tokens=True
            )
            self.mask = torch.tensor([
                any(
                    0x4E00 <= ord(c) <= 0x9FFF or  # CJK Unified Ideographs (Chinese)
                    0x3400 <= ord(c) <= 0x4DBF or  # CJK Extension A
                    0xF900 <= ord(c) <= 0xFAFF or  # CJK Compatibility Ideographs
                    0x3040 <= ord(c) <= 0x30FF or  # Japanese Hiragana/Katakana
                    0xAC00 <= ord(c) <= 0xD7AF     # Korean Hangul syllables
                    for c in token
                ) for token in decoded_tokens
            ], device=self.device)
        logits_processed = logits.clone()
        logits_processed[self.mask] = -float("inf")
        return logits_processed
```
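The Unicode ranges above can be sanity-checked without a tokenizer or GPU. The sketch below uses a hypothetical `contains_cjk` helper (illustrative only, not part of the deployed processor) that mirrors the same ranges:

```python
# Standalone sketch of the CJK check used by the processor above.
# `contains_cjk` is an illustrative helper, not part of the deployment.

CJK_RANGES = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (Chinese)
    (0x3400, 0x4DBF),  # CJK Extension A
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Japanese Hiragana/Katakana
    (0xAC00, 0xD7AF),  # Korean Hangul syllables
]

def contains_cjk(text: str) -> bool:
    """True if any character falls into one of the filtered CJK ranges."""
    return any(lo <= ord(c) <= hi for c in text for lo, hi in CJK_RANGES)

print(contains_cjk("Merhaba dünya"))  # False: Turkish text passes
print(contains_cjk("中国"))            # True: Chinese ideographs are caught
print(contains_cjk("カタカナ"))        # True: Japanese Katakana is caught
```

Any vocabulary token for which this check returns `True` has its logit set to `-inf`, so it can never be sampled.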
---
## OpenAI-Compatible Client Usage
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8010/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="newmindai/Qwen2.5-72b-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Türkiye'deki medeni hukuk sistemini açıkla."}
    ],
    extra_body={
        "logits_processors": [
            "logits_processor_zoo.vllm.custom_logits_processor.cjk_filter"
        ]
    },
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=1.005,
    stream=True,
    max_tokens=256
)

# Stream the response; skip empty deltas (the final chunk's content is None).
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
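When the full completion is needed as a single string rather than printed incrementally, the streamed deltas can be accumulated. A minimal sketch, using `SimpleNamespace` stand-ins for live response chunks (the stub objects are illustrative only; they mimic the shape of OpenAI streaming events):

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the text deltas from an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# Stub chunks mimicking streamed chat-completion events (content is None in the final chunk).
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Merhaba"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=" dünya"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]
print(collect_stream(fake_stream))  # Merhaba dünya
```

The same `collect_stream` function works unchanged on the real `response` iterator returned by `client.chat.completions.create(..., stream=True)`.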
---
## Example Comparison: With vs. Without CJK Filter
**Prompt**: `"Lütfen bana Çince bir paragraf yazın"` ("Please write me a paragraph in Chinese")
### With `logits_processors`
```
Tabii ki, size Çince bir paragraf yazabilirim. İşte:
Beijing, Çin'in başkenti ve en büyük şehirlerinden biridir. Tarihi ve kültürel zenginlikleriyle ünlüdür. Şehrin merkezinde bulunan TIANANMEN Meydanı, dünyanın en büyük kentsel meydanlarından biridir. Ayrıca, Forbidden City (Zijin Cheng), 15. yüzyıldan beri Ming hanedanının ve Qing hanedanının sarayını barındırmıştır. Beijing aynı zamanda modern mimarinin örnekleriyle de dikkat çekmektedir, örneğin Ulusal Stadyum (Cuzhouniao veya Kuş Anıtı) 2008 Yaz Olimpiyatları için inşa edilmiştir.
```
### Without `logits_processors`
```
Elbette, size Çince bir paragraf yazabilirim. İşte:
中国的文化悠久而丰富多彩。从古代的四大发明到现代的科技发展,中国一直在不断地进步和创新。在艺术方面,中国画、书法和陶瓷艺术都是世界著名的。此外,中国的饮食文化也是其独特魅力的一部分,各地的特色菜肴让人回味无穷。无论是在历史、文化还是自然景观上,中国都有许多值得探索的地方.
```
With the logits processor enabled, CJK tokens are blocked at sampling time, so the model responds in Latin script (here Turkish) even when the prompt explicitly requests Chinese content.
---
## Evaluation
**Mezura Benchmarking**
Final performance was benchmarked with [Mezura](https://huggingface.co/spaces/newmindai/Mezura), a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.
## License
This model inherits the license of [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct), which is distributed under the **Qwen license**. You are free to use, adapt, and distribute the model under the terms specified in that license.
---
## Contact
For support, questions, or feature requests, please contact [newmindai on Hugging Face](https://huggingface.co/newmindai) or open an issue in the associated model repository. |