InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 9 days ago • 239
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 13 days ago • 42
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 14 days ago • 73
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 15 days ago • 81
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 16 days ago • 169
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 13 days ago • 27
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published 27 days ago • 39
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 22 days ago • 15
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 22 days ago • 62
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 20 days ago • 30
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published Mar 18 • 8
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias Paper • 2503.13834 • Published Mar 18 • 5
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 28
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 663