FasterViT: Fast Vision Transformers with Hierarchical Attention Paper • 2306.06189 • Published Jun 9, 2023 • 30
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks Paper • 2306.14306 • Published Jun 25, 2023
Global Vision Transformer Pruning with Hessian-Aware Saliency Paper • 2110.04869 • Published Oct 10, 2021
RegionGPT: Towards Region Understanding Vision Language Model Paper • 2403.02330 • Published Mar 4, 2024 • 2
LITA: Language Instructed Temporal-Localization Assistant Paper • 2403.19046 • Published Mar 27, 2024 • 20
X-VILA: Cross-Modality Alignment for Large Language Model Paper • 2405.19335 • Published May 29, 2024
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 53
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation Paper • 2410.21271 • Published Oct 28, 2024 • 7
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge Paper • 2411.12915 • Published Nov 19, 2024
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 93
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26, 2024 • 48
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 88