Commit a067993 (verified) by Alphatao · parent 4993722

Upload README.md with huggingface_hub

Files changed: README.md (+69)
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3-0324
- deepseek-ai/DeepSeek-R1
pipeline_tag: text-generation
---
# DeepSeek-R1T-Chimera

<div align="center">
<img src="https://354918363417-runtime-assets.s3.eu-central-1.amazonaws.com/company_logo_light.svg"
alt="TNG Logo"
width="400"
style="display: inline-block; vertical-align: middle;"/>
</div>
<br>
<div align="center">
<a href="LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<br>
<div align="center">
<a href="https://x.com/tngtech/status/1916284566127444468" style="margin: 2px;">
<img alt="Benchmarks" src="R1T-Chimera_Benchmarks_20250427_V1.jpg" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

**Model merge of DeepSeek-R1 and DeepSeek-V3 (0324)**

An open-weights model combining the intelligence of R1 with the token efficiency of V3.

For details on the construction process and analyses of Chimera model variants, please [read our paper](https://arxiv.org/abs/2506.14794).

[Paper on arXiv](https://arxiv.org/abs/2506.14794) | [Announcement on X](https://x.com/tngtech/status/1916284566127444468) | [LinkedIn post](https://www.linkedin.com/posts/tng-technology-consulting_on-the-weekend-we-released-deepseek-r1t-chimera-activity-7323008947236290560-Cf2m) | [Try it on OpenRouter](https://openrouter.ai/tngtech/deepseek-r1t-chimera:free)
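
As a quick illustration, here is a minimal sketch of querying the model through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug comes from the OpenRouter link above; the endpoint URL, the payload shape, and the `build_request` helper are assumptions based on OpenRouter's standard API, not part of this release.

```python
# Minimal sketch (assumption: OpenRouter's OpenAI-compatible API).
# Replace "OPENROUTER_API_KEY" with a real key before sending a request.
import json
import urllib.request


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completions request for DeepSeek-R1T-Chimera on OpenRouter."""
    payload = {
        # Model slug taken from the OpenRouter link above.
        "model": "tngtech/deepseek-r1t-chimera:free",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# To actually send the request (requires a valid API key and network access):
# req = build_request("Explain model merging in one sentence.", "OPENROUTER_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```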

## Model Details

- **Architecture**: DeepSeek-MoE Transformer-based language model
- **Combination Method**: Merged model weights from DeepSeek-R1 and DeepSeek-V3 (0324)
- **Release Date**: 2025-04-27

## Use, Out-of-scope Use, Limitations, Risks, Recommendations, et al.

For R1T-Chimera, we ask you to follow the careful guidelines that Microsoft created for their DeepSeek-based "MAI-DS-R1" model.

These guidelines are available [here on Hugging Face](https://huggingface.co/microsoft/MAI-DS-R1).

## Contact

- Email: research@tngtech.com
- X.com: @tngtech

## Citation

```bibtex
@misc{tng_technology_consulting_gmbh_2025,
  author    = {TNG Technology Consulting GmbH},
  title     = {DeepSeek-R1T-Chimera},
  year      = 2025,
  month     = {April},
  url       = {https://huggingface.co/tngtech/DeepSeek-R1T-Chimera},
  doi       = {10.57967/hf/5330},
  publisher = {Hugging Face}
}
```