# WAN 2.1 14B Quantized with Q-DiT (W8A8)
This repository contains the WAN 2.1 14B video generation model quantized to W8A8 using Q-DiT.
## Model Details
- Original Model: WAN 2.1 14B (FirstIntelligence/Wan2.1-T2V-14B-Diffusers)
- Quantization Method: Q-DiT
- Configuration: W8A8 (8-bit weights, 8-bit activations)
- Size Reduction: ~28 GB → ~14 GB (50% reduction)
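The size figures follow directly from the parameter count: at 16 bits per weight, 14 billion parameters occupy roughly 28 GB, and at 8 bits roughly half that. A quick back-of-the-envelope check (the exact parameter count is assumed from the model name and may differ slightly):

```python
params = 14e9  # ~14B parameters, inferred from the model name

fp16_gb = params * 2 / 1e9  # 2 bytes per parameter in fp16/bf16
int8_gb = params * 1 / 1e9  # 1 byte per parameter with 8-bit weights

print(f"fp16: ~{fp16_gb:.0f} GB, int8: ~{int8_gb:.0f} GB")
```

This ignores per-group quantization scales, which add a small overhead (one scale per 128 weights, see Quantization Details below).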
## Important Note
This repository contains only the quantized transformer. The VAE encoder/decoder is not quantized and must be loaded separately from the original model.
## Usage
```python
import torch
from diffusers import AutoencoderKL

# Load the quantized transformer weights
quantized_state_dict = torch.load("quantized_transformer.pt", map_location="cuda")

# Load the VAE separately (not quantized)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").cuda()

# You'll need to instantiate the transformer architecture yourself and load
# these weights into it; see the original Q-DiT repository for the model
# implementation.
```
## Quantization Details
- Weight bits: 8
- Activation bits: 8
- Weight group size: 128
- Activation group size: 128
- Calibration: 32 video samples
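With a group size of 128, each run of 128 consecutive weights shares a single scale factor, which keeps the quantization error local to each group. A minimal NumPy sketch of symmetric per-group 8-bit quantization (an illustration only, not the actual Q-DiT code, which also quantizes activations and uses calibration data):

```python
import numpy as np

def quantize_groupwise(w, group_size=128, n_bits=8):
    """Symmetric per-group quantization: each group of `group_size`
    consecutive weights shares one scale factor."""
    qmax = 2 ** (n_bits - 1) - 1           # 127 for signed 8-bit
    flat = w.reshape(-1, group_size)       # one row per group
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-12)     # guard against all-zero groups
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_groupwise(q, scales, shape):
    # Rescale each group and restore the original tensor shape
    return (q.astype(np.float32) * scales).reshape(shape)

# Round-trip a random weight matrix and check the reconstruction error
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

The worst-case error per weight is half a quantization step, i.e. about `scale / 2`, so groups containing one large outlier pay a higher error on their other 127 weights; this is the trade-off that calibration-based methods like Q-DiT aim to manage.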