Paper: Byte Latent Transformer: Patches Scale Better Than Tokens • arXiv:2412.09871 • Published Dec 13, 2024
Collection: Gemma 3 QAT • Quantization-Aware Trained (QAT) Gemma 3 checkpoints that preserve quality comparable to half precision while using 3x less memory • 15 items
Paper: Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages • arXiv:2503.20212 • Published Mar 2025
Article: Training and Finetuning Reranker Models with Sentence Transformers v4
Collection: MambaVision • A Hybrid Mamba-Transformer Vision Backbone, including both 1K and 21K pretrained models • 13 items
Collection: Llama Nemotron • Open, Production-ready Enterprise Models • 4 items
Article: NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets • Published Mar 18, 2025
Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Published Mar 12, 2025
Article: Train 400x faster Static Embedding Models with Sentence Transformers • Published Jan 15, 2025