TinyLLaVA-Video-R1

We introduce TinyLLaVA-Video-R1, a small-scale video reasoning model built on the traceably trained model TinyLLaVA-Video. After reinforcement learning on general Video-QA datasets, the model not only shows significantly improved reasoning and thinking abilities, but also exhibits the emergent "aha moment" behavior.

Results

Model (HF Path)	Video-MME	MVBench	MLVU	MMVU
Zhang199/TinyLLaVA-Video-R1	46.6	49.5	52.4	46.9

Model size: 3.63B params
Tensor type: BF16 (Safetensors)
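
As a rough guide to what the figures above imply for hardware, the weight footprint can be estimated from the parameter count and the 2 bytes that each BF16 value occupies. The sketch below is only back-of-the-envelope arithmetic; the helper name `weight_memory_gb` is hypothetical, and the estimate covers the weights alone, not activations, the KV cache, or video frame buffers.

```python
# Back-of-the-envelope estimate of weight memory for this checkpoint.
# Assumption: every parameter is stored in BF16 (2 bytes each).

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

params = 3.63e9                       # 3.63B params, as listed on the card
bf16_gb = weight_memory_gb(params)    # 2 bytes per BF16 value
bf16_gib = params * 2 / 2**30         # same quantity in GiB (2^30 bytes)

print(f"BF16 weights: about {bf16_gb:.2f} GB ({bf16_gib:.2f} GiB)")
```

Actual memory use at inference time will be higher than this figure and depends on the number of sampled frames and the generation length.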
