What is the minimum hardware configuration for training the 70B model with llama-factory?
#20 opened about 1 month ago
by
Lraos

Do not require reasoning, just the output
1
#19 opened about 2 months ago
by
ameyv6
Why does the chat_template strip the <think> process from the assistant role?
2
#18 opened about 2 months ago
by
zhm0
Could an AWQ version of the model be released: deepseek-r1-distill-llama-70b-AWQ
#17 opened about 2 months ago
by
classdemo
Update README.md
#16 opened 2 months ago
by
shubham001213
Does DeepSeek-Llama-70B support tensor parallelism for multi-GPU inference?
1
#14 opened 2 months ago
by
Merk0701234
Weight file naming does not follow a consistent pattern
#13 opened 3 months ago
by
haili-tian
How much VRAM do you need?
8
#12 opened 3 months ago
by
hyun10
Upload IMG_4815.jpeg
#11 opened 3 months ago
by
H3mzy11

Amazon Sagemaker deployment failing with CUDA OutOfMemory error
3
#10 opened 3 months ago
by
neelkapadia
Is <thinking> the proper tag?
1
4
#8 opened 3 months ago
by
McUH
Add pipeline tag
#7 opened 3 months ago
by
nielsr

Template
1
#6 opened 3 months ago
by
tugot17
Is SFT (non-RL) distillation this good on a sub-100B model?
3
#2 opened 3 months ago
by
KrishnaKaasyap