Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation • Paper • 2507.08441 • Published 21 days ago • 59
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments • Paper • 2507.10548 • Published 18 days ago • 33
UniTok: A Unified Tokenizer for Visual Generation and Understanding • Paper • 2502.20321 • Published Feb 27 • 30
TEXGen: a Generative Diffusion Model for Mesh Textures • Paper • 2411.14740 • Published Nov 22, 2024 • 18
Image Inpainting via Iteratively Decoupled Probabilistic Modeling • Paper • 2212.02963 • Published Dec 6, 2022
Is synthetic data from generative models ready for image recognition? • Paper • 2210.07574 • Published Oct 14, 2022
Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing • Paper • 2207.09935 • Published Jul 20, 2022
GO-NeRF: Generating Virtual Objects in Neural Radiance Fields • Paper • 2401.05750 • Published Jan 11, 2024
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation • Paper • 2312.08754 • Published Dec 14, 2023 • 11
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning • Paper • 2503.06960 • Published Mar 10 • 3
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection • Paper • 2411.14794 • Published Nov 22, 2024 • 13
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More • Paper • 2410.06270 • Published Oct 8, 2024 • 1
MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds • Paper • 2307.09316 • Published Jul 18, 2023 • 1