tau-bench results

#5
by quantizor - opened

Thank you so much for sharing these models with the community! Would it be possible to evaluate new models on tau-bench to measure tool-calling performance? https://github.com/sierra-research/tau-bench

In the quest for a high-quality open-source alternative to Claude for agentic coding, this benchmark seems to be part of Anthropic's evaluation mix.
