VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

VitaBench is a challenging benchmark that evaluates agents on versatile interactive tasks grounded in real-world settings, comprising 66 tools and 400 tasks.

Website: https://vitabench.github.io
Added on 2026-06-30
