VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published 13 days ago • 30
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 16 days ago • 168
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9 • 29
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections Paper • 2409.14677 • Published Sep 23, 2024 • 16
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 130
BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion Paper • 2408.04785 • Published Aug 8, 2024 • 9
Perturbed Attention Guidance pipelines Collection Pipelines for Perturbed Attention Guidance with 🧨 library • 8 items • Updated Jun 26, 2024 • 6
Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification Paper • 2405.11574 • Published May 19, 2024 • 1