tau-bench results

by quantizor - opened 26 days ago

Thank you so much for sharing these models with the community! Would it be possible to evaluate new models on tau-bench to measure their tool-calling performance? https://github.com/sierra-research/tau-bench

In the quest for a high-quality open-source alternative to Claude for agentic coding, this benchmark seems especially relevant, since it appears to be part of Anthropic's evaluation mix.
