APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay • Paper • 2504.03601 • Published 19 days ago • 16
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback • Paper • 2306.14898 • Published Jun 26, 2023
xLAM: A Family of Large Action Models to Empower AI Agent Systems • Paper • 2409.03215 • Published Sep 5, 2024 • 4
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments • Paper • 2411.02305 • Published Nov 4, 2024
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding • Paper • 2411.04282 • Published Nov 6, 2024 • 36
SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs • Paper • 2411.13547 • Published Nov 20, 2024
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models • Paper • 2503.22673 • Published 26 days ago • 12
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • Paper • 2406.04338 • Published Jun 6, 2024 • 40
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement • Paper • 2402.07456 • Published Feb 12, 2024 • 45
Tree of Thoughts: Deliberate Problem Solving with Large Language Models • Paper • 2305.10601 • Published May 17, 2023 • 12