By clicking "Agree", you acknowledge that this model is released solely for academic research purposes. It is initialized from Stable Diffusion v2.1 base (CreativeML Open RAIL++-M License) and further trained on a subset of the Re-LAION-5B (research-safe) dataset. You agree to review and comply with the terms and licenses of both the pretrained model and training dataset, and you bear responsibility for any use of this model.
SD v2-1-base, trained with realigned covariances (DFT-colored noise)
This repository contains a version of Stable Diffusion v2.1 base adapted with realigned covariances, using colored noise instead of white noise. The weights are initialized from the pretrained model (Stable Diffusion v2.1 base), and training was done on a 100,000-sample subset of the Re-LAION-5B research-safe dataset.
This model is intended for academic research use only and is not suitable for production deployment.
Usage
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import _get_model_file
from safetensors.torch import load_file

pretrained_model_name_or_path = "EPFL-IVRL/sd2.1-base-colorednoiseDFT"
pipe = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path).to("cuda")
prompt = "An astronaut riding a horse."

# Load the variance spectrum stored in the repository's initial_noise_loader subfolder.
stats = load_file(_get_model_file(pretrained_model_name_or_path, weights_name="stats.safetensors", subfolder="initial_noise_loader"))
variance_spectrum = stats["variance_spectrum_vae64_dft"].to("cuda")

# Sample white Gaussian noise in the 4 x 64 x 64 latent space (seeded on CPU, then moved to the GPU).
generator = torch.manual_seed(123456)
initial_noise = torch.randn((1, 4, 64, 64), generator=generator).to("cuda")

# Color the noise: scale each centered DFT coefficient by the square root of its variance.
dft = torch.fft.fftshift(torch.fft.fftn(initial_noise, dim=(-2, -1), norm="ortho"), dim=(-2, -1))
dft *= torch.sqrt(variance_spectrum)
initial_noise = torch.real(torch.fft.ifftn(torch.fft.ifftshift(dft, dim=(-2, -1)), dim=(-2, -1), norm="ortho"))

# Generate an image starting from the colored latents.
pipe(prompt, latents=initial_noise).images[0].show()
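The noise-coloring step above can also be tried in isolation, without downloading the model. The following is a minimal NumPy sketch of the same sequence (forward DFT, shift, scale by the square root of the variance spectrum, inverse shift, inverse DFT). The function name `color_noise` and the `toy_spectrum` array are hypothetical and only serve as illustration; the real spectrum is the one stored in `stats.safetensors`, and its exact shape is not assumed here beyond being broadcastable to the latent shape.

```python
import numpy as np

def color_noise(white_noise, variance_spectrum):
    """Scale the centered 2-D DFT of `white_noise` by sqrt(variance_spectrum)."""
    # Orthonormal DFT over the two spatial axes, zero frequency moved to the center.
    dft = np.fft.fftshift(np.fft.fft2(white_noise, norm="ortho"), axes=(-2, -1))
    # Multiplying by the standard deviation gives each frequency the target variance.
    dft = dft * np.sqrt(variance_spectrum)
    # Undo the shift and return to the spatial domain, keeping the real part.
    colored = np.fft.ifft2(np.fft.ifftshift(dft, axes=(-2, -1)), norm="ortho")
    return colored.real

rng = np.random.default_rng(123456)
white = rng.standard_normal((1, 4, 64, 64))

# Hypothetical spectrum for illustration only: more variance at low frequencies.
fy = np.fft.fftshift(np.fft.fftfreq(64))[:, None]
fx = np.fft.fftshift(np.fft.fftfreq(64))[None, :]
toy_spectrum = 1.0 / (1.0 + 100.0 * (fx**2 + fy**2))

colored = color_noise(white, toy_spectrum)
# A flat spectrum of ones leaves the white noise unchanged.
identity = color_noise(white, np.ones((64, 64)))
```

With a flat spectrum the function reduces to the identity, which is a quick sanity check that the shift and normalization conventions are consistent.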
Model Description
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: This model is intended for academic research use only, not for production use. See the EPFL source code academic license. The pretrained model, Stable Diffusion v2.1 base, is licensed under the CreativeML Open RAIL++-M License.
- Adapted from model: Stable Diffusion v2.1 base
- Resources for more information: Project page, GitHub repository
- Cite as:
Citation
@article{everaert2024covariancemismatch,
author = {Everaert, Martin Nicolas and Süsstrunk, Sabine and Achanta, Radhakrishna},
title = {{C}ovariance {M}ismatch in {D}iffusion {M}odels},
journal = {Infoscience preprint Infoscience:20.500.14299/242173},
month = {November},
year = {2024},
}
Training details
- Dataset size: 100k image-caption pairs from Re-LAION-5B research-safe
- Hardware: 1 × NVIDIA A100-SXM4-80GB
- Training Time: 9h55min
- Pretrained model: Stable Diffusion v2.1 base
- Covariance realignment method:
  - original data (without data whitening)
  - colored noise (DFT approximation)
  - no reweighting of components in the loss
- Optimizer: AdamW (32-bit, no quantization)
  - betas: (0.9, 0.999)
  - weight_decay: 0.01
  - eps: 1e-08
  - lr: constant 1e-05
- Batch size: 32 (no gradient accumulation)
- Caption dropout: 10%
- Exponential Moving Average (EMA) decay: 0.99
- Training steps: 20,000 (intermediate checkpoint at training step 10,000 in the unet_10000 subfolder)
- Training range of noise levels:
  - Same noise scheduler as Stable Diffusion v2.1 base, i.e. $SNR \in [0.0047, 1175.4403]$
- Training loss:
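The SNR range quoted under "Training range of noise levels" can be reproduced approximately from the noise schedule. The sketch below assumes the scheduler settings commonly used with Stable Diffusion (a "scaled linear" beta schedule with `beta_start=0.00085`, `beta_end=0.012`, and 1000 training timesteps); these values are an assumption, not something stated in this card, and the helper name `snr_range` is hypothetical. It uses $SNR_t = \bar{\alpha}_t / (1 - \bar{\alpha}_t)$ with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$.

```python
def snr_range(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Return (min, max) SNR over all timesteps of a scaled-linear beta schedule."""
    # Scaled linear: interpolate linearly in sqrt(beta), then square.
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    alpha_bar = 1.0
    snrs = []
    for i in range(num_train_timesteps):
        beta = (lo + (hi - lo) * i / (num_train_timesteps - 1)) ** 2
        alpha_bar *= 1.0 - beta  # cumulative product of (1 - beta)
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return min(snrs), max(snrs)

snr_min, snr_max = snr_range()
print(f"SNR in [{snr_min:.4f}, {snr_max:.4f}]")
```

Under these assumptions the result should land close to the range listed above; small differences in the last digits can come from floating-point precision (this sketch uses 64-bit Python floats).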