Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.07998

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Running

12

12

PyVision

💬

Ask questions about images to get detailed responses
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 164
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12 • 20
Jan-nano Technical Report

Paper • 2506.22760 • Published Jun 28 • 9
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 62
WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 110

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 28
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 51

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Interesting findings

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models

Paper • 2412.15235 • Published Dec 12, 2024

Images are Worth Variable Length of Representations

Paper • 2506.03643 • Published Jun 4 • 4
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 156

about 10 hours ago

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 35
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 28
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 127
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 23

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

Running

12

12

PyVision

💬

Ask questions about images to get detailed responses
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 164
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Interesting findings

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models

Paper • 2412.15235 • Published Dec 12, 2024

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12 • 20
Jan-nano Technical Report

Paper • 2506.22760 • Published Jun 28 • 9
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 62
WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 110

Images are Worth Variable Length of Representations

Paper • 2506.03643 • Published Jun 4 • 4
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 156

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 28
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 51

about 10 hours ago

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 35
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 28
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 127
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 23

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs