AI-paper - a shankars Collection

updated about 6 hours ago

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published 10 days ago • 4
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 9 days ago • 16
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published 18 days ago • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 4 days ago • 38
MultiRef: Controllable Image Generation with Multiple Visual References

Paper • 2508.06905 • Published 14 days ago • 19
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos

Paper • 2508.14041 • Published 4 days ago • 50
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published 17 days ago • 102
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Paper • 2508.12800 • Published 5 days ago • 4
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

Paper • 2508.11548 • Published 8 days ago • 5
Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge

Paper • 2508.08777 • Published 11 days ago • 13
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Paper • 2508.09131 • Published 11 days ago • 14
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published 3 days ago • 26
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

Paper • 2508.14111 • Published 6 days ago • 25
RynnEC: Bringing MLLMs into Embodied World

Paper • 2508.14160 • Published 4 days ago • 12
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185
Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 116
A Survey on Large Language Model Benchmarks

Paper • 2508.15361 • Published 2 days ago • 11
Deep Think with Confidence

Paper • 2508.15260 • Published 3 days ago • 30
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9 • 15
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published Apr 21 • 75
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

Paper • 2406.14562 • Published Jun 20, 2024 • 29
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10 • 66
Thinking with Generated Images

Paper • 2505.22525 • Published May 28 • 14
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Paper • 2505.13444 • Published May 19 • 16
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 82
ComposeAnything: Composite Object Priors for Text-to-Image Generation

Paper • 2505.24086 • Published May 30 • 5
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 86
Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16 • 57
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 48
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Paper • 2403.12884 • Published Mar 19, 2024 • 1
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography

Paper • 2504.10090 • Published Apr 14
Visual Programming: Compositional visual reasoning without training

Paper • 2211.11559 • Published Nov 18, 2022 • 1
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning

Paper • 2408.02210 • Published Aug 5, 2024 • 9
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

Paper • 2412.18072 • Published Dec 24, 2024 • 20
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published 2 days ago • 185
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published Apr 8 • 111
Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 56
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
LLM Inference Unveiled: Survey and Roofline Model Insights

Paper • 2402.16363 • Published Feb 26, 2024 • 2
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures

Paper • 2504.11750 • Published Apr 16
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15, 2024 • 18
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions

Paper • 2504.19056 • Published Apr 27 • 18
Personalized Image Generation with Deep Generative Models: A Decade Survey

Paper • 2502.13081 • Published Feb 18
Diffusion Models: A Comprehensive Survey of Methods and Applications

Paper • 2209.00796 • Published Sep 2, 2022
An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published Apr 8 • 63
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Paper • 2502.09411 • Published Feb 13 • 21
A survey of Generative AI Applications

Paper • 2306.02781 • Published Jun 5, 2023
Text-to-image Diffusion Models in Generative AI: A Survey

Paper • 2303.07909 • Published Mar 14, 2023