zeeshanp/scaling_diffusion_perception

zeeshanp committed 12 days ago

Commit 1d5f707 (verified) · 1 Parent(s): 97b3d5b

Update README.md

Files changed (1)

README.md +23 -1

@@ -1,3 +1,25 @@
 ---
 license: apache-2.0
----
+tags:
+- diffusion
+- image-to-image
+- depth-estimation
+- optical-flow
+- amodal-segmentation
+---
+# Scaling Properties of Diffusion Models for Perceptual Tasks
+### CVPR 2025
+**Rahul Ravishankar\*, Zeeshan Patel\*, Jathushan Rajasegaran, Jitendra Malik**
+[[Paper](https://arxiv.org/abs/2411.08034)] · [[Project Page](https://scaling-diffusion-perception.github.io/)]
+In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute.
+## Getting started
+You can download our DiT-MoE Generalist model [here](https://huggingface.co/zeeshanp/scaling_diffusion_perception/blob/main/dit_moe_generalist.pt). Please see instructions on how to use our model in the [GitHub README](https://github.com/scaling-diffusion-perception/scaling-diffusion-perception).
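
The download link in the added "Getting started" section points at the Hub's `blob` viewer page rather than the raw file. As a minimal sketch of fetching the checkpoint programmatically, the snippet below uses `hf_hub_download` from the `huggingface_hub` package, which is an assumption here since the README only links to the file. The repository ID and filename are taken from that link; the helper names `resolve_url` and `download_checkpoint` are hypothetical. Loading the checkpoint into a model is not shown, since the README defers those steps to the GitHub repository.

```python
from pathlib import Path
from typing import Optional

# Both values are read off the download link in the README.
REPO_ID = "zeeshanp/scaling_diffusion_perception"
FILENAME = "dit_moe_generalist.pt"


def resolve_url(repo_id: str = REPO_ID, filename: str = FILENAME,
                revision: str = "main") -> str:
    """Build the Hub 'resolve' URL, which serves the raw file
    (the 'blob' URL in the README opens the web viewer instead)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_checkpoint(cache_dir: Optional[str] = None) -> Path:
    """Download the checkpoint into the local Hub cache and return its path.

    Assumes `pip install huggingface_hub`; the import is deferred so this
    module can be imported without the package installed.
    """
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID, filename=FILENAME, cache_dir=cache_dir
    )
    return Path(local_path)


# Example usage (performs a network download):
#   checkpoint_path = download_checkpoint()
#   print(checkpoint_path)
```

Repeated calls to `hf_hub_download` reuse the cached copy rather than downloading again, which is why this sketch prefers it over fetching `resolve_url()` by hand.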