Backed by Y Combinator

Evals for AI Agents That Actually Work

Define your tool calls, schemas, and prompts. ashr runs the full eval suite and gives you scores, failures with ideal answers, and complete tool call traces.

New Post: Why's This Shit Broken? Our pivot story and why AI testing infrastructure is fundamentally broken. Read →

What We Do

ashr is a complete eval platform for AI agents. Enter your tool calls, schemas, and test prompts — we run your agent against every case and deliver scored results, failed test breakdowns with ideal answers, and full tool call path traces.

How It Works

1. Define Tool Calls

Register the tools your agent can use — function names, parameters, and expected return types. ashr understands your agent's capabilities.
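
As a sketch of what that registration might look like (illustrative field names only, not ashr's actual format), each tool is just a name plus typed parameters and a return type:

# Hypothetical tool registry; the field names here are illustrative, not ashr's schema.
TOOLS = [
    {
        "name": "get_balance",
        "parameters": {"user_id": "str"},   # parameter name -> expected type
        "returns": "float",                 # expected return type
    },
    {
        "name": "transfer",
        "parameters": {"user_id": "str", "amount": "float"},
        "returns": "bool",
    },
]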

2. Set Up Schemas & Prompts

Enter your data schemas and write test prompts. Define the ideal outputs and expected tool call sequences for each case.
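
For example, a single test case might pair a prompt with the tool call sequence and answer you expect. This is a hypothetical layout; the user id and wording are made up:

# Hypothetical test case format, for illustration only.
CASES = [
    {
        "prompt": "Transfer $500 to savings",
        # Tool calls a correct agent should make, in order: (tool name, arguments)
        "expected_calls": [
            ("get_balance", {"user_id": "u_123"}),
            ("transfer", {"user_id": "u_123", "amount": 500.0}),
        ],
        # The answer a correct agent should produce
        "ideal_output": "Transferred $500.00 to savings.",
    },
]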

3. Run the Eval

ashr runs your agent against every test case, scoring accuracy, tool selection, and output quality in real time.
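
Conceptually the run is a loop over your cases: send each prompt to the agent, compare the tool call trace and final answer against the expectations, and score the case. A minimal sketch in that spirit, assuming the hypothetical case format above (real scoring is richer than exact matching):

# Minimal scoring loop, assuming the case format sketched above.
# agent(prompt) is assumed to return (tool_call_trace, final_output).
def run_eval(agent, cases):
    results = []
    for case in cases:
        calls, output = agent(case["prompt"])
        results.append({
            "prompt": case["prompt"],
            "tools_ok": calls == case["expected_calls"],            # right tools, right args, right order
            "output_ok": output.strip() == case["ideal_output"].strip(),
            "trace": calls,                                         # full tool call path
        })
    passed = sum(r["tools_ok"] and r["output_ok"] for r in results)
    return passed / len(results), results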

4. Review & Iterate

Get scored results with failures broken down alongside ideal answers. See full tool call traces and track regressions across runs.
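
Regression tracking, at its simplest, is a diff of per-case results between two runs; a toy version, assuming the result records from the sketch above:

def find_regressions(previous, current):
    # Prompts that passed in the previous run but fail in the current one.
    prev_pass = {r["prompt"] for r in previous if r["tools_ok"] and r["output_ok"]}
    return [r for r in current
            if r["prompt"] in prev_pass and not (r["tools_ok"] and r["output_ok"])]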

See What ashr Can Do

Eval scores, failed cases, and tool call paths — track every metric that matters. ashr surfaces regressions, shows ideal answers alongside failures, and traces every tool call so you know exactly where your agent breaks.

Define your agent's tools, schemas, and prompts:

tools: [get_balance, transfer, ...]
schema: { user_id: str, amount: float }
prompt: "Transfer $500 to savings"

ashr runs your agent against every configuration and scores the results.
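
One failed case in those results might look roughly like this (an illustrative record with made-up values, not ashr's actual output format):

# Hypothetical scored result for a single failing case.
failure = {
    "prompt": "Transfer $500 to savings",
    "passed": False,
    "ideal_output": "Transferred $500.00 to savings.",
    "actual_output": "I couldn't find a savings account for this user.",
    "trace": [("get_balance", {"user_id": "u_123"})],   # the tool call path the agent actually took
}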

Plug in our SDK. Ship fast, ship tested.

Questions? Let's Talk

Schedule a quick call. We'll walk you through how ashr runs evals on your agents and answer any questions.