---
license: apache-2.0
pipeline_tag: video-text-to-text
library_name: transformers
---
Here, we introduce TinyLLaVA-Video-R1, a small-scale video reasoning model built on the traceably trained TinyLLaVA-Video. After reinforcement learning on general Video-QA datasets, the model not only significantly improves its reasoning and thinking abilities but also exhibits the emergent characteristic of "aha moments".
## Results
| Model (HF Path) | Video-MME (wo sub) | MVBench | MLVU | MMVU (mc) |
|---|---|---|---|---|
| Zhang199/TinyLLaVA-Video-R1 | 46.6 | 49.5 | 52.4 | 46.9 |