Burkov
Andriy
Andriy's activity
Issues with FSDP and DeepSpeed During Distributed Training for Gemma
2
5
#30 opened 9 months ago by anandhperumal
How does v0.2 manage to support 32k token context without Sliding Window Attention?
4
#85 opened about 1 year ago by Andriy
What is the max. content length of Mistral-7B-Instruct-v0.2?
17
#43 opened about 1 year ago by hanshupe
Longer inference time
2
#4 opened about 1 year ago by dittops

What is the SFT data?
3
5
#7 opened over 1 year ago by Ede-CH
Dataset?
5
#1 opened about 1 year ago by 0xbitches
Questions about architecture (+ LoRA)
2
#16 opened about 1 year ago by alex0dd

Can you tell us the original models that you merged to create this model?
3
1
#3 opened over 1 year ago by Bruce001