VLM - a NothingLQH Collection

updated Jun 16

FocusedAD: Character-centric Movie Audio Description

Paper • 2504.12157 • Published Apr 16 • 9
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Paper • 2504.10465 • Published Apr 14 • 27
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17 • 18
OS-Copilot/OS-Atlas-Base-7B

Image-Text-to-Text • 8B • Updated Nov 19, 2024 • 2.39k • 42
google/siglip-so400m-patch14-384

Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 3.7M • 585
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Paper • 2505.14231 • Published May 20 • 53
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization

Paper • 2504.12083 • Published Apr 16 • 4
Running on Zero

87

D-Fine - SOTA Real-Time Object Detector

⚡

Object Detection on Images and Video
ByteDance-Seed/BAGEL-7B-MoT

Any-to-Any • 15B • Updated Jun 23 • 844 • 1.11k
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3 • 51
Running

16

Leaderboard: Physical Reasoning from Video

🏃

Submit and evaluate model performance on video and text tasks
Running on Zero

MCP

28

Gaze LLE

👀

Gaze Target Estimation
Hcompany/Holo1-7B

Image-Text-to-Text • 8B • Updated Jun 10 • 3.94k • 219
OpenGVLab/InternVL3-9B

Image-Text-to-Text • 9B • Updated May 29 • 11.4k • 26