shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame64-s1t4 Video-Text-to-Text • Updated 14 days ago • 175
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame96-s1t6 Video-Text-to-Text • Updated 14 days ago • 73
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024 • 11
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Paper • 2405.05949 • Published May 9, 2024 • 3
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 88
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Paper • 2403.14773 • Published Mar 21, 2024 • 11