---
license: apache-2.0
pipeline_tag: video-text-to-text
library_name: transformers
---

# TinyLLaVA-Video-R1

arXiv | GitHub

We introduce TinyLLaVA-Video-R1, a small-scale video reasoning model built on the traceably trained TinyLLaVA-Video. After reinforcement learning on general Video-QA datasets, the model not only improves significantly in reasoning ability but also exhibits the emergent characteristic of "aha moments".

## Results

| Model (HF Path) | Video-MME (w/o sub) | MVBench | MLVU | MMVU (mc) |
|---|---|---|---|---|
| Zhang199/TinyLLaVA-Video-R1 | 46.6 | 49.5 | 52.4 | 46.9 |