WAN 2.1 14B Quantized with Q-DiT (W8A8)

This repository contains the WAN 2.1 14B video generation model quantized to W8A8 using Q-DiT.

Model Details

  • Original Model: WAN 2.1 14B (FirstIntelligence/Wan2.1-T2V-14B-Diffusers)
  • Quantization Method: Q-DiT
  • Configuration: W8A8 (8-bit weights, 8-bit activations)
  • Size Reduction: ~28GB → ~14GB (50% reduction)
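The ~50% figure follows from the weight format alone. A rough back-of-envelope check (assuming a 14B-parameter model, a 2-byte fp16/bf16 baseline, and 1 byte per int8 weight, ignoring the small overhead of per-group scales):

```python
# Back-of-envelope size estimate for 14B parameters.
# Assumptions: fp16/bf16 baseline (2 bytes/weight), int8 (1 byte/weight);
# per-group quantization scales add a small extra overhead not counted here.
params = 14e9
fp16_gb = params * 2 / 1e9  # baseline size in GB
int8_gb = params * 1 / 1e9  # 8-bit weight size in GB
print(fp16_gb, int8_gb)     # ~28 GB vs ~14 GB
```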

Important Note

This repository only contains the quantized transformer model. The VAE encoder/decoder should be loaded separately from the original model.

Usage

import torch
from diffusers import AutoencoderKLWan

# Load the quantized transformer state dict
quantized_state_dict = torch.load("quantized_transformer.pt", map_location="cuda")

# Load the VAE separately (not quantized) from the original model
vae = AutoencoderKLWan.from_pretrained(
    "FirstIntelligence/Wan2.1-T2V-14B-Diffusers", subfolder="vae"
).to("cuda")

# You'll need to implement the transformer architecture and load the weights.
# See the original Q-DiT repository for the model implementation.
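Once the transformer class is implemented, loading the saved weights is the standard construct-then-`load_state_dict` pattern. A toy sketch (using a plain `nn.Linear` as a stand-in, since the actual WAN transformer class is not provided here):

```python
import io
import torch
import torch.nn as nn

# Toy stand-in module; in practice this would be the Q-DiT WAN transformer.
model = nn.Linear(8, 8)

# Simulate a saved checkpoint in memory (in practice: quantized_transformer.pt).
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)

# Construct a fresh instance and load the weights; strict=True raises
# on any missing or unexpected keys, which catches architecture mismatches.
rebuilt = nn.Linear(8, 8)
result = rebuilt.load_state_dict(torch.load(buf, map_location="cpu"), strict=True)
```

`strict=True` is worth keeping while wiring up the architecture: a key mismatch fails loudly instead of silently leaving layers randomly initialized.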

Quantization Details

  • Weight bits: 8
  • Activation bits: 8
  • Weight group size: 128
  • Activation group size: 128
  • Calibration: 32 video samples
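To make the configuration above concrete, here is a minimal sketch of symmetric per-group int8 weight quantization with group size 128. This is an illustration of the general group-wise scheme, not the actual Q-DiT implementation (which also calibrates activation quantizers on the video samples):

```python
import numpy as np

def quantize_groupwise(flat_weights, group_size=128, bits=8):
    """Symmetric per-group quantization: each group of `group_size`
    consecutive weights shares one scale (illustrative sketch only)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for signed int8
    groups = flat_weights.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Reconstruct approximate fp32 weights from int8 values and scales.
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal(256 * 128).astype(np.float32)
q, s = quantize_groupwise(w, group_size=128)
w_hat = dequantize(q, s).reshape(-1)
max_err = np.abs(w - w_hat).max()   # bounded by half a quantization step
```

A smaller group size gives each scale fewer weights to cover, reducing quantization error at the cost of storing more scales; 128 is a common middle ground.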