Yuhao Dong

THUdyh

THUdyh's activity

upvoted a paper about 11 hours ago

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Paper • 2504.15271 • Published 1 day ago • 48

upvoted 2 papers 8 days ago

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published 12 days ago • 42

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 8 days ago • 239

upvoted a paper 11 days ago

Kimi-VL Technical Report

Paper • 2504.07491 • Published 13 days ago • 118

upvoted a paper 20 days ago

Synthetic Video Enhances Physical Fidelity in Video Synthesis

Paper • 2503.20822 • Published 28 days ago • 16

upvoted a paper 26 days ago

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

Paper • 2503.21755 • Published 26 days ago • 32

upvoted a paper about 2 months ago

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5 • 42

upvoted a collection about 2 months ago

EgoLife

Collection

CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 17

upvoted 2 papers 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 143

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 30

upvoted 3 papers 3 months ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 120

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

Paper • 2501.16411 • Published Jan 27 • 19

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Paper • 2501.04003 • Published Jan 7 • 27

upvoted 7 papers 4 months ago

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Paper • 2412.09645 • Published Dec 10, 2024 • 37

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published Dec 12, 2024 • 49