---
license: apple-amlr
---

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Website | arXiv | GitHub | 🤗 Demo | BibTeX

Official implementation and pre-trained models for:
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length, arXiv 2025
Roman Bachmann*, Jesse Allardice*, David Mizrahi*, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan

Installation

For install instructions, please see https://github.com/apple/ml-flextok.

Usage

To load the 8-channel VAE-GAN directly from the Hugging Face Hub and autoencode a sample image, run:

from diffusers.models import AutoencoderKL
from flextok.utils.demo import imgs_from_urls

vae = AutoencoderKL.from_pretrained(
    'EPFL-VILAB/flextok_vae_c8', low_cpu_mem_usage=False
).eval()

# Load example images of shape (B, 3, H, W), normalized to [-1,1]
imgs = imgs_from_urls(urls=['https://storage.googleapis.com/flextok_site/nb_demo_images/0.png'])

# Autoencode with the VAE
latents = vae.encode(imgs).latent_dist.sample() # Shape (B, 8, H//8, W//8)
reconst = vae.decode(latents).sample # Shape (B, 3, H, W)
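As a follow-up, the snippet below is a minimal sketch (not part of the official example) showing how to map the reconstruction from the [-1, 1] range back to [0, 1] and write it to disk for inspection. It assumes torchvision is installed, and the output filename reconstruction.png is arbitrary.

import torch
from torchvision.utils import save_image

# Reconstructions are normalized to [-1, 1]; map them to [0, 1] before saving.
with torch.no_grad():
    imgs_01 = (reconst.clamp(-1, 1) + 1) / 2

# Writes the batch as a single image grid.
save_image(imgs_01, 'reconstruction.png')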

Citation

If you find this repository helpful, please consider citing our work:

@article{flextok,
    title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
    author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
    journal={arXiv 2025},
    year={2025},
}

License

The model weights in this repository are released under the Apple Model License for Research.