VLMs - a HeartofSheep Collection

HeartofSheep 's Collections

3D

VLMs

Image Representation

LLMs

VLMs

updated Jul 3

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published Mar 17 • 32
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16 • 26
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 198
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 134
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185
Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 150
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 96
OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 75