Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations Paper • 2508.09789 • Published 10 days ago • 4
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published 9 days ago • 16
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents Paper • 2508.04038 • Published 18 days ago • 1
MultiRef: Controllable Image Generation with Multiple Visual References Paper • 2508.06905 • Published 14 days ago • 19
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published 4 days ago • 50
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 17 days ago • 102
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward Paper • 2508.12800 • Published 5 days ago • 4
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends Paper • 2508.11548 • Published 8 days ago • 5
Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge Paper • 2508.08777 • Published 11 days ago • 13
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer Paper • 2508.09131 • Published 11 days ago • 14
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published 3 days ago • 26
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery Paper • 2508.14111 • Published 6 days ago • 25
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 185
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published Jan 9 • 15
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 75
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Paper • 2406.14562 • Published Jun 20, 2024 • 29
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 66
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19 • 16
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 82
ComposeAnything: Composite Object Priors for Text-to-Image Generation Paper • 2505.24086 • Published May 30 • 5
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 86
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9, 2024 • 48
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Paper • 2403.12884 • Published Mar 19, 2024 • 1
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography Paper • 2504.10090 • Published Apr 14
Visual Programming: Compositional visual reasoning without training Paper • 2211.11559 • Published Nov 18, 2022 • 1
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning Paper • 2408.02210 • Published Aug 5, 2024 • 9
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks Paper • 2412.18072 • Published Dec 24, 2024 • 20
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8 • 111
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 56
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7 • 134
LLM Inference Unveiled: Survey and Roofline Model Insights Paper • 2402.16363 • Published Feb 26, 2024 • 2
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures Paper • 2504.11750 • Published Apr 16
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published Oct 15, 2024 • 18
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions Paper • 2504.19056 • Published Apr 27 • 18
Personalized Image Generation with Deep Generative Models: A Decade Survey Paper • 2502.13081 • Published Feb 18
Diffusion Models: A Comprehensive Survey of Methods and Applications Paper • 2209.00796 • Published Sep 2, 2022
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Paper • 2502.09411 • Published Feb 13 • 21
Text-to-image Diffusion Models in Generative AI: A Survey Paper • 2303.07909 • Published Mar 14, 2023