This model is released solely for academic research purposes. It is initialized from Stable Diffusion v2.1 base (CreativeML Open RAIL++-M License) and further trained on a subset of the Re-LAION-5B (research-safe) dataset. Users must review and comply with the terms and licenses of both the pretrained model and the training dataset, and bear responsibility for any use of this model.


SD v2-1-base, trained with realigned covariances (DFT-colored noise)

This repository contains a version of Stable Diffusion v2.1 base adapted with realigned covariances, using colored noise instead of white noise. The weights are initialized from the pretrained model (Stable Diffusion v2.1 base), and training was done on a 100,000-sample subset of the Re-LAION-5B research-safe dataset.

This model is intended for academic research use only and is not suitable for production deployment.

Usage

import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import _get_model_file
from safetensors.torch import load_file

pretrained_model_name_or_path = "EPFL-IVRL/sd2.1-base-colorednoiseDFT"
pipe = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path).to("cuda")

prompt = "An astronaut riding a horse."

# Load the DFT variance spectrum stored with the model (initial_noise_loader/stats.safetensors)
stats = load_file(_get_model_file(pretrained_model_name_or_path, weights_name="stats.safetensors", subfolder="initial_noise_loader"))
variance_spectrum = stats["variance_spectrum_vae64_dft"].to("cuda")

# Sample white Gaussian noise in the 4x64x64 latent space
generator = torch.manual_seed(123456)
initial_noise = torch.randn((1, 4, 64, 64), generator=generator).to("cuda")

# Color the noise: scale its centered 2-D DFT by the square root of the variance spectrum, then transform back
dft = torch.fft.fftshift(torch.fft.fftn(initial_noise, dim=(-2, -1), norm="ortho"), dim=(-2, -1))
dft *= torch.sqrt(variance_spectrum)
initial_noise = torch.real(torch.fft.ifftn(torch.fft.ifftshift(dft, dim=(-2, -1)), dim=(-2, -1), norm="ortho"))

# Generate an image starting from the colored initial noise
pipe(prompt, latents=initial_noise).images[0].show()

Generated image
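
The noise-coloring step in the usage code can be checked in isolation, without a GPU or the model weights. The following is a minimal NumPy sketch of the same DFT operations; the function name `color_noise` and the flat test spectra are hypothetical and only serve as a sanity check. The actual spectrum comes from `stats.safetensors`, and its exact shape is assumed here to broadcast against the latents.

```python
import numpy as np


def color_noise(white, variance_spectrum):
    """Scale the centered 2-D DFT of `white` by sqrt(variance_spectrum).

    Hypothetical helper mirroring the torch code in the Usage section;
    `variance_spectrum` is assumed to broadcast against `white`.
    """
    dft = np.fft.fftshift(
        np.fft.fftn(white, axes=(-2, -1), norm="ortho"), axes=(-2, -1)
    )
    dft = dft * np.sqrt(variance_spectrum)
    colored = np.fft.ifftn(
        np.fft.ifftshift(dft, axes=(-2, -1)), axes=(-2, -1), norm="ortho"
    )
    return colored.real


rng = np.random.default_rng(0)
white = rng.standard_normal((1, 4, 64, 64))

# A flat unit spectrum should leave the white noise unchanged.
flat = np.ones((4, 64, 64))
same = color_noise(white, flat)

# A flat spectrum of 4 scales every frequency by sqrt(4) = 2.
doubled = color_noise(white, 4.0 * flat)
```

With the orthonormal DFT (`norm="ortho"`), the transform preserves energy, so the entries of the variance spectrum act directly as per-frequency variances of the resulting noise.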

Model Description

Citation

@article{everaert2024covariancemismatch,
    author   = {Everaert, Martin Nicolas and Süsstrunk, Sabine and Achanta, Radhakrishna},
    title    = {{C}ovariance {M}ismatch in {D}iffusion {M}odels},
    journal  = {Infoscience preprint Infoscience:20.500.14299/242173},
    month    = {November},
    year     = {2024},
}

Training details

  • Dataset size: 100k image-caption pairs from Re-LAION-5B research-safe
  • Hardware: 1 × NVIDIA A100-SXM4-80GB
  • Training Time: 9h55min
  • Pretrained model: Stable Diffusion v2.1 base
  • Covariance realignment method:
    • original data (without data whitening)
    • colored noise (DFT approximation)
    • no reweighting of components in the loss
  • Optimizer: AdamW (32-bit, no quantization)
    • betas: (0.9, 0.999)
    • weight_decay: 0.01
    • eps: 1e-08
    • lr: Constant 1e-05
  • Batch size: 32 (no gradient accumulation)
  • Caption dropout: 10%
  • Exponential Moving Average (EMA) decay: 0.99
  • Training steps: 20,000 (intermediate checkpoint at training step 10,000 in the unet_10000 subfolder)
  • Training range of noise levels:
  • Training loss:
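
To make the numbers in the list above easier to relate to each other, the following sketch collects them in a small config object and derives a few quantities. The training script itself is not part of this card, so the class `TrainingConfig` and the helpers `ema_update` and `maybe_drop_caption` are hypothetical. The EMA formula and the empty-string caption dropout are common conventions assumed here, not confirmed details of the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the training details list; names are hypothetical.
    dataset_size: int = 100_000
    batch_size: int = 32
    training_steps: int = 20_000
    training_time_s: int = 9 * 3600 + 55 * 60  # 9h55min
    caption_dropout: float = 0.10
    ema_decay: float = 0.99
    lr: float = 1e-5
    betas: tuple = (0.9, 0.999)
    weight_decay: float = 0.01
    eps: float = 1e-8


cfg = TrainingConfig()

# Derived quantities (simple arithmetic on the listed values).
samples_seen = cfg.batch_size * cfg.training_steps          # 640,000 samples
passes_over_data = samples_seen / cfg.dataset_size           # about 6.4
seconds_per_step = cfg.training_time_s / cfg.training_steps  # about 1.785 s


def ema_update(ema_value, new_value, decay=cfg.ema_decay):
    """Standard EMA step (assumed form): keep `decay` of the old value."""
    return decay * ema_value + (1.0 - decay) * new_value


def maybe_drop_caption(caption, rng, p=cfg.caption_dropout):
    """Replace the caption with an empty string with probability p (assumed convention)."""
    return "" if rng.random() < p else caption
```

Under these assumptions, the model sees each image of the 100k subset roughly 6.4 times over the 20,000 steps, at about 1.8 seconds per step on the single A100.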