tau-bench results
#5 opened by quantizor
Thank you so much for sharing these models with the community! Would it be possible to test new models with tau-bench to evaluate their tool-calling performance? https://github.com/sierra-research/tau-bench
In the search for a high-quality open-source alternative to Claude for agentic coding, this benchmark seems especially relevant, since it appears to be part of Anthropic's evaluation mix.