---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-72B
tags:
- chat
library_name: vllm
---
# Qwen2.5-72B-Instruct (with CJK Filter)
This is a mirror of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct), deployed with a **custom server-side logits processor** that filters out CJK (Chinese, Japanese, Korean) characters during generation.
The deployment uses a **vLLM-powered, OpenAI-compatible API**, optimized for **Turkish and English** outputs by suppressing unwanted CJK tokens during decoding.
---
## Features
- Languages: Turkish and English (multilingual base model, CJK output filtered)
- Model: Qwen2.5-72B-Instruct (bfloat16)
- Max sequence length: 32,768 tokens (the example deployment serves 16,384 via `--max-model-len`)
- Logits Processor: Filters CJK characters to prioritize Latin script
- Optimized for OpenAI-compatible deployment using vLLM
- Tensor Parallelism: 2
- License: qwen
---
## Server Deployment (Docker Compose with vLLM)
```yaml
services:
  qwen-lm:
    image: vllm/vllm-openai:v0.8.3
    runtime: nvidia
    environment:
      - HUGGING_FACE_HUB_TOKEN=<HF_TOKEN>
      - PYTHON_VERSION=3.12
      - VLLM_DISABLE_COMPILE_CACHE=1
      - HF_HOME=/mnt/model-cache
      - VLLM_USE_V1=0
      - PYTHONPATH=/app
    volumes:
      - <HOST_CACHE_DIR>:/mnt/model-cache
    ports:
      - "8010:8090"
    shm_size: "220g"
    command: >
      --model newmindai/Qwen2.5-72b-Instruct
      --tensor-parallel-size 2
      --max-model-len 16384
      --gpu-memory-utilization 0.95
      --trust-remote-code
      --host 0.0.0.0
      --port 8090
      --dtype bfloat16
      --enable-chunked-prefill
      --scheduling-policy priority
      --served-model-name newmindai/Qwen2.5-72b-Instruct
      --api-key <API_KEY>
      --logits-processor-pattern <CJKFilter_Pattern>
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
```
---
## Logits Processor: `CJKCharacterFilterLogitsProcessor`
This custom logits processor prevents generation of any token containing CJK (Chinese, Japanese, Korean) characters. This helps maintain Turkish/English-focused outputs.
```python
import torch


class CJKCharacterFilterLogitsProcessor:
    """Masks every vocabulary token whose decoded form contains a CJK character."""

    def __init__(self, tokenizer, device):
        self.tokenizer = tokenizer
        self.device = device
        self.mask = None  # built lazily on the first call, then reused

    def __call__(self, token_ids, logits):
        if self.mask is None:
            # Decode the whole vocabulary once and flag tokens containing CJK characters.
            token_ids_range = torch.arange(logits.size(-1), device=self.device)
            decoded_tokens = self.tokenizer.batch_decode(
                token_ids_range.unsqueeze(1), skip_special_tokens=True
            )
            self.mask = torch.tensor([
                any(
                    0x4E00 <= ord(c) <= 0x9FFF or  # CJK Unified Ideographs (Chinese)
                    0x3400 <= ord(c) <= 0x4DBF or  # CJK Extension A
                    0xF900 <= ord(c) <= 0xFAFF or  # CJK Compatibility Ideographs
                    0x3040 <= ord(c) <= 0x30FF or  # Japanese Hiragana/Katakana
                    0xAC00 <= ord(c) <= 0xD7AF     # Korean Hangul syllables
                    for c in token
                ) for token in decoded_tokens
            ], device=self.device)
        logits_processed = logits.clone()
        logits_processed[self.mask] = -float("inf")
        return logits_processed
```
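The Unicode ranges above can be sanity-checked without a tokenizer or GPU. The sketch below uses a hypothetical `contains_cjk` helper (illustrative only, not part of the deployed processor) that mirrors the same ranges:

```python
# Standalone sketch of the CJK check used by the processor above.
# `contains_cjk` is an illustrative helper, not part of the deployment.

CJK_RANGES = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (Chinese)
    (0x3400, 0x4DBF),  # CJK Extension A
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Japanese Hiragana/Katakana
    (0xAC00, 0xD7AF),  # Korean Hangul syllables
]

def contains_cjk(text: str) -> bool:
    """True if any character falls into one of the filtered CJK ranges."""
    return any(lo <= ord(c) <= hi for c in text for lo, hi in CJK_RANGES)

print(contains_cjk("Merhaba dünya"))  # False: Turkish text passes
print(contains_cjk("中国"))            # True: Chinese ideographs are caught
print(contains_cjk("カタカナ"))        # True: Japanese Katakana is caught
```

Any vocabulary token for which this check returns `True` has its logit set to `-inf`, so it can never be sampled.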
---
## OpenAI-Compatible Client Usage
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8010/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="newmindai/Qwen2.5-72b-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Türkiye'deki medeni hukuk sistemini açıkla."}
    ],
    extra_body={
        "logits_processors": [
            "logits_processor_zoo.vllm.custom_logits_processor.cjk_filter"
        ]
    },
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=1.005,
    stream=True,
    max_tokens=256
)

# Stream the response; skip empty deltas (the final chunk's content is None).
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
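When the full completion is needed as a single string rather than printed incrementally, the streamed deltas can be accumulated. A minimal sketch, using `SimpleNamespace` stand-ins for live response chunks (the stub objects are illustrative only; they mimic the shape of OpenAI streaming events):

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the text deltas from an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# Stub chunks mimicking streamed chat-completion events (content is None in the final chunk).
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Merhaba"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=" dünya"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]
print(collect_stream(fake_stream))  # Merhaba dünya
```

The same `collect_stream` function works unchanged on the real `response` iterator returned by `client.chat.completions.create(..., stream=True)`.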
---
## Example Comparison: With vs. Without CJK Filter
**Prompt**: `"Lütfen bana Çince bir paragraf yazın"` ("Please write me a paragraph in Chinese")
### With `logits_processors`
```
Tabii ki, size Çince bir paragraf yazabilirim. İşte:
Beijing, Çin'in başkenti ve en büyük şehirlerinden biridir. Tarihi ve kültürel zenginlikleriyle ünlüdür. Şehrin merkezinde bulunan TIANANMEN Meydanı, dünyanın en büyük kentsel meydanlarından biridir. Ayrıca, Forbidden City (Zijin Cheng), 15. yüzyıldan beri Ming hanedanının ve Qing hanedanının sarayını barındırmıştır. Beijing aynı zamanda modern mimarinin örnekleriyle de dikkat çekmektedir, örneğin Ulusal Stadyum (Cuzhouniao veya Kuş Anıtı) 2008 Yaz Olimpiyatları için inşa edilmiştir.
```
### Without `logits_processors`
```
Elbette, size Çince bir paragraf yazabilirim. İşte:
中国的文化悠久而丰富多彩。从古代的四大发明到现代的科技发展,中国一直在不断地进步和创新。在艺术方面,中国画、书法和陶瓷艺术都是世界著名的。此外,中国的饮食文化也是其独特魅力的一部分,各地的特色菜肴让人回味无穷。无论是在历史、文化还是自然景观上,中国都有许多值得探索的地方.
```
With the logits processor enabled, CJK tokens are blocked at sampling time, so the model responds in Latin script (here Turkish) even when the prompt explicitly requests Chinese content.
---
## Evaluation
**Mezura Benchmarking**
Final performance was benchmarked with [Mezura](https://huggingface.co/spaces/newmindai/Mezura), a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.
## License
This model inherits the license of [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct), which is distributed under the **Qwen license**. You are free to use, adapt, and distribute the model under the terms specified in that license.
---
## Contact
For support, questions, or feature requests, please contact [newmindai on Hugging Face](https://huggingface.co/newmindai) or open an issue in the associated model repository. |