MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data

Paul Borne--Pons, Mikolaj Czerkawski, Rosalie Martin, Romain Rouffet

CVPR 2025 Workshop MORSE

MESA is a novel generative model based on latent denoising diffusion that generates 2.5D representations of terrain conditioned on natural-language text prompts. The model produces two co-registered modalities: an optical image and a depth map. It is a fine-tune of stable-diffusion-2-1 and builds upon Hugging Face’s Diffusers library.
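
To make the 2.5D idea concrete, the sketch below pairs each pixel of an optical image with the elevation at the same position in a depth map, giving a list of coloured 3D points. It is only an illustration and not part of the MESA code base: the function name `heightmap_to_points`, the `pixel_size` and `z_scale` parameters, and the toy values are all assumptions, and the inputs are plain nested lists rather than whatever types the pipeline actually returns.

```python
# Illustrative sketch only: not part of the MESA code base.
# Assumes the optical image and the depth map are same-sized row-major grids.

def heightmap_to_points(rgb, dem, pixel_size=1.0, z_scale=1.0):
    """Pair each RGB pixel with its elevation to form coloured 3D points."""
    if len(rgb) != len(dem) or any(len(r) != len(d) for r, d in zip(rgb, dem)):
        raise ValueError("rgb and dem must be co-registered (same height and width)")
    points = []
    for row, (rgb_row, dem_row) in enumerate(zip(rgb, dem)):
        for col, (colour, elevation) in enumerate(zip(rgb_row, dem_row)):
            x = col * pixel_size        # horizontal position from the column index
            y = row * pixel_size        # horizontal position from the row index
            z = elevation * z_scale     # height taken from the depth map
            points.append(((x, y, z), colour))
    return points

# Toy 2x2 example with made-up colours and elevations.
rgb = [[(34, 85, 34), (60, 110, 50)],
       [(120, 120, 110), (240, 240, 240)]]
dem = [[100.0, 250.0],
       [900.0, 1800.0]]

points = heightmap_to_points(rgb, dem, pixel_size=10.0)
```

Because the two modalities are co-registered, no resampling or alignment step is needed: the same row and column index addresses both the colour and the height.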

Model Description

Paper: MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data
GitHub: https://github.com/PaulBorneP/MESA
Project page: https://paulbornep.github.io/mesa-terrain/

Installation

# Clone the repository
git clone https://github.com/PaulBorneP/MESA
cd MESA
# using python 3.11.12
pip install -r requirements.txt

Model Download

mkdir weights
huggingface-cli download NewtNewt/MESA --local-dir ./weights

Usage

import torch
from MESA.pipeline_terrain import TerrainDiffusionPipeline

# Load the weights downloaded to ./weights, in half precision
pipe = TerrainDiffusionPipeline.from_pretrained("./weights", torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "A sentinel-2 image of montane forests and mountains in Mexico in August"
# The call returns the optical image and its co-registered depth map (named dem here)
image, dem = pipe(prompt, num_inference_steps=50, guidance_scale=7.5)
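
The example prompt names a sensor, a land cover, a location, and a month. When generating many terrains, it can help to assemble prompts of that shape programmatically. The helper below is a hypothetical convenience, not part of the MESA repository: the name `build_prompt` and its parameters are assumptions, and how strongly the model responds to each field is not stated here.

```python
# Hypothetical helper, not part of the MESA repository.
# It only mirrors the wording of the single example prompt shown above.

def build_prompt(land_cover, region, month, sensor="sentinel-2"):
    """Compose a prompt with the same structure as the usage example."""
    return f"A {sensor} image of {land_cover} in {region} in {month}"

prompt = build_prompt("montane forests and mountains", "Mexico", "August")
print(prompt)
```

The resulting string can be passed to the pipeline in place of a hand-written prompt.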

Citation

@inproceedings{mesa2025,
title={MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data},
author={Paul Borne--Pons and Mikolaj Czerkawski and Rosalie Martin and Romain Rouffet},
year={2025},
booktitle={MORSE Workshop at CVPR 2025},
eprint={2504.07210},
url={https://arxiv.org/abs/2504.07210}
}

Acknowledgements

This model is the product of a collaboration between Φ-lab, European Space Agency (ESA), and Adobe Research (Paris, France).
