WAN 2.1 14B Quantized with Q-DiT (W8A8)

This repository contains the WAN 2.1 14B video generation model quantized to W8A8 using Q-DiT.

Model Details

  • Original Model: WAN 2.1 14B (FirstIntelligence/Wan2.1-T2V-14B-Diffusers)
  • Quantization Method: Q-DiT
  • Configuration: W8A8 (8-bit weights, 8-bit activations)
  • Size Reduction: ~28GB → ~14GB (50% reduction)
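The ~50% figure follows from the weight format alone. A rough back-of-envelope check (assuming a 14B-parameter model, a 2-byte fp16/bf16 baseline, and 1 byte per int8 weight, ignoring the small overhead of per-group scales):

```python
# Back-of-envelope size estimate for 14B parameters.
# Assumptions: fp16/bf16 baseline (2 bytes/weight), int8 (1 byte/weight);
# per-group quantization scales add a small extra overhead not counted here.
params = 14e9
fp16_gb = params * 2 / 1e9  # baseline size in GB
int8_gb = params * 1 / 1e9  # 8-bit weight size in GB
print(fp16_gb, int8_gb)     # ~28 GB vs ~14 GB
```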

Important Note

This repository only contains the quantized transformer model. The VAE encoder/decoder should be loaded separately from the original model.

Usage

import torch
from diffusers import AutoencoderKLWan

# Load the quantized transformer state dict
quantized_state_dict = torch.load("quantized_transformer.pt", map_location="cuda")

# Load the VAE separately (not quantized) from the original model
vae = AutoencoderKLWan.from_pretrained(
    "FirstIntelligence/Wan2.1-T2V-14B-Diffusers", subfolder="vae"
).to("cuda")

# You'll need to implement the transformer architecture and load the weights.
# See the original Q-DiT repository for the model implementation.
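Once the transformer class is implemented, loading the saved weights is the standard construct-then-`load_state_dict` pattern. A toy sketch (using a plain `nn.Linear` as a stand-in, since the actual WAN transformer class is not provided here):

```python
import io
import torch
import torch.nn as nn

# Toy stand-in module; in practice this would be the Q-DiT WAN transformer.
model = nn.Linear(8, 8)

# Simulate a saved checkpoint in memory (in practice: quantized_transformer.pt).
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)

# Construct a fresh instance and load the weights; strict=True raises
# on any missing or unexpected keys, which catches architecture mismatches.
rebuilt = nn.Linear(8, 8)
result = rebuilt.load_state_dict(torch.load(buf, map_location="cpu"), strict=True)
```

`strict=True` is worth keeping while wiring up the architecture: a key mismatch fails loudly instead of silently leaving layers randomly initialized.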

Quantization Details

  • Weight bits: 8
  • Activation bits: 8
  • Weight group size: 128
  • Activation group size: 128
  • Calibration: 32 video samples
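To make the configuration above concrete, here is a minimal sketch of symmetric per-group int8 weight quantization with group size 128. This is an illustration of the general group-wise scheme, not the actual Q-DiT implementation (which also calibrates activation quantizers on the video samples):

```python
import numpy as np

def quantize_groupwise(flat_weights, group_size=128, bits=8):
    """Symmetric per-group quantization: each group of `group_size`
    consecutive weights shares one scale (illustrative sketch only)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for signed int8
    groups = flat_weights.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Reconstruct approximate fp32 weights from int8 values and scales.
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal(256 * 128).astype(np.float32)
q, s = quantize_groupwise(w, group_size=128)
w_hat = dequantize(q, s).reshape(-1)
max_err = np.abs(w - w_hat).max()   # bounded by half a quantization step
```

A smaller group size gives each scale fewer weights to cover, reducing quantization error at the cost of storing more scales; 128 is a common middle ground.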