🚀 Optimized Models: torchao & Pruna Quantization
Quantized Models using torchao & Pruna for efficient inference and deployment.
SmolVLM2‑2.2B‑Base Quantized
This is a quantized version of SmolVLM2‑2.2B‑Base, a compact yet capable vision-language model from Hugging Face. It is designed for multimodal understanding, including single images, multi-image inputs, and videos, while offering faster and more memory-efficient inference thanks to int8 quantization. This makes it well suited for on-device and resource-constrained deployments.
Method: torchao quantization
Weight Precision: int8
Activation Precision: int8 (dynamic)
Technique: Symmetric mapping
Base model: HuggingFaceTB/SmolLM2-1.7B (the language backbone underlying SmolVLM2‑2.2B‑Base)
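The settings above correspond to torchao's int8 dynamic-activation / int8-weight scheme, which can be applied through the transformers integration. The snippet below is a minimal sketch, not the exact script used for this upload; it assumes the public HuggingFaceTB/SmolVLM2-2.2B-Base checkpoint and the AutoModelForImageTextToText class, so adjust the model id and class to match your checkpoint and transformers version.

```python
# Minimal sketch: load SmolVLM2-2.2B-Base with torchao quantization
# (int8 weights + int8 dynamic activations, symmetric mapping).
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText, TorchAoConfig

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Base"  # assumed base checkpoint id

# Matches the settings listed above: int8 weights, int8 dynamic activations.
quant_config = TorchAoConfig("int8_dynamic_activation_int8_weight")

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
```

From there, generation works the same as with the full-precision checkpoint, with the quantized linear layers handling inference in int8.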