Manan Shah

cs-mshah

https://cs-mshah.github.io/

AI & ML interests

Computer Vision

Recent Activity

liked a model 18 days ago

reducto/RolmOCR

cs-mshah's activity

upvoted a paper 8 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 13 days ago • 30

upvoted a paper 13 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 16 days ago • 168

upvoted a paper about 1 month ago

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published Mar 9 • 29

upvoted an article 2 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • Published Feb 20 • 237

upvoted a paper 7 months ago

Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections

Paper • 2409.14677 • Published Sep 23, 2024 • 16

upvoted 2 papers 8 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 130

BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion

Paper • 2408.04785 • Published Aug 8, 2024 • 9

upvoted a collection 9 months ago

Perturbed Attention Guidance pipelines

Pipelines for Perturbed Attention Guidance with 🧨 library • 8 items • Updated Jun 26, 2024 • 6

upvoted a paper 10 months ago

Scalable 3D Captioning with Pretrained Models

Paper • 2306.07279 • Published Jun 12, 2023 • 15

upvoted a paper 11 months ago

Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

Paper • 2405.11574 • Published May 19, 2024 • 1