Hongxu Yin

yinhongxu

Organizations

yinhongxu's activity

authored 15 papers 23 days ago

FasterViT: Fast Vision Transformers with Hierarchical Attention

Paper • 2306.06189 • Published Jun 9, 2023 • 30

Adaptive Sharpness-Aware Pruning for Robust Sparse Networks

Paper • 2306.14306 • Published Jun 25, 2023

Global Vision Transformer Pruning with Hessian-Aware Saliency

Paper • 2110.04869 • Published Oct 10, 2021

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14, 2024 • 27

RegionGPT: Towards Region Understanding Vision Language Model

Paper • 2403.02330 • Published Mar 4, 2024 • 2

Global Context Vision Transformers

Paper • 2206.09959 • Published Jun 20, 2022

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 20

X-VILA: Cross-Modality Alignment for Large Language Model

Paper • 2405.19335 • Published May 29, 2024

Flextron: Many-in-One Flexible Large Language Model

Paper • 2406.10260 • Published Jun 11, 2024 • 2

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 53

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Paper • 2409.04429 • Published Sep 6, 2024

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation

Paper • 2410.21271 • Published Oct 28, 2024 • 7

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 60

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Paper • 2411.12915 • Published Nov 19, 2024

Scaling Vision Pre-Training to 4K Resolution

Paper • 2503.19903 • Published 24 days ago • 39

upvoted a paper 24 days ago

Scaling Vision Pre-Training to 4K Resolution

Paper • 2503.19903 • Published 24 days ago • 39

authored a paper about 1 month ago

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 93

upvoted a paper 4 months ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 60

authored a paper 7 months ago

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published Sep 26, 2024 • 48

authored a paper 8 months ago

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28, 2024 • 88