What is the minimum hardware configuration for training the 70B model with llama-factory?
#20 opened about 1 month ago
by
Lraos

Do not require reasoning, just the output
1
#19 opened about 2 months ago
by
ameyv6
Why does the chat_template strip the <think> process from the assistant role?
2
#18 opened about 2 months ago
by
zhm0
Could an AWQ version of the model be released: deepseek-r1-distill-llama-70b-AWQ
#17 opened about 2 months ago
by
classdemo
Update README.md
#16 opened 2 months ago
by
shubham001213
Does DeepSeek-Llama-70B support tensor parallelism for multi-GPU inference?
1
#14 opened 2 months ago
by
Merk0701234
Weight file naming does not follow a consistent pattern
#13 opened 3 months ago
by
haili-tian
How much VRAM do you need?
8
#12 opened 3 months ago
by
hyun10
Upload IMG_4815.jpeg
#11 opened 3 months ago
by
H3mzy11

Amazon Sagemaker deployment failing with CUDA OutOfMemory error
3
#10 opened 3 months ago
by
neelkapadia
Is <thinking> the proper tag?
1
4
#8 opened 3 months ago
by
McUH
Add pipeline tag
#7 opened 3 months ago
by
nielsr

Template
1
#6 opened 3 months ago
by
tugot17
Is SFT (non-RL) distillation this good on a sub-100B model?
3
#2 opened 3 months ago
by
KrishnaKaasyap