PALO: A Polyglot Large Multimodal Model for 5B People • Paper • arXiv:2402.14818 • Published Feb 22, 2024
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models • Paper • arXiv:2311.13435 • Published Nov 22, 2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models • Paper • arXiv:2306.05424 • Published Jun 8, 2023