“The evals tools we've looked at, including Braintrust, are built for single-turn benchmarking, not for teams like ours that need to evaluate complex, multi-turn conversational workflows across the full development lifecycle. That gap is a real blocker for us.”
Riley Jameson, Product Lead at Zuma