# Ashr Labs Python SDK
A Python SDK for evaluating AI agents against Ashr Labs test datasets. Generate datasets, run your agent against test scenarios, compare expected vs actual behavior, and submit results — all with zero external dependencies.
## Quick Links
- Testing Your Agent — start here (end-to-end guide with EvalRunner, debugging failures)
- VM Integration — browser/desktop agents with VM stream logging
- Installation
- Quick Start
- Authentication
- API Reference — EvalRunner, Agent, comparators, RunBuilder, client methods
- SDK Notes — platform advisories delivered to your SDK
- Error Handling
- Examples
## Requirements
- Python 3.10 or higher
- No external dependencies required
## Installation

```bash
pip install ashr-labs
```
## Quick Example
Any agent with `respond()` and `reset()` methods works out of the box:

```python
from ashr_labs import AshrLabsClient, EvalRunner

client = AshrLabsClient(api_key="tp_your_api_key_here")
runner = EvalRunner.from_dataset(client, dataset_id=322)
runner.run_and_deploy(my_agent, client, dataset_id=322)
```
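The runner duck-types your agent rather than requiring a base class. As a rough sketch, a minimal agent might look like the following (the exact signatures of `respond()` and `reset()` are assumptions here; consult the Agent section of the API Reference for the real protocol):

```python
class EchoAgent:
    """Toy agent for smoke-testing the eval pipeline.

    The respond()/reset() signatures below are assumptions for
    illustration; see the API Reference for the actual contract.
    """

    def __init__(self):
        self.history = []

    def respond(self, message):
        # Record the incoming message and return a canned reply.
        self.history.append(message)
        return f"echo: {message}"

    def reset(self):
        # Clear per-scenario state so scenarios stay independent.
        self.history.clear()


my_agent = EchoAgent()
```

Swapping `EchoAgent` for your real agent is the only change needed; the runner calls `reset()` between scenarios so state never leaks across tests.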
Or with more control:
```python
from ashr_labs import AshrLabsClient, EvalRunner

client = AshrLabsClient(api_key="tp_your_api_key_here")

# Generate a dataset
dataset_id, source = client.generate_dataset(
    request_name="My Agent Eval",
    config={ ... },  # Your agent config
)

# Run the eval with progress logging
runner = EvalRunner(source)
run = runner.run(my_agent, on_scenario=lambda sid, s: print(f"Running: {s['title']}"))

# Submit and wait for server-side grading
created = run.deploy(client, dataset_id=dataset_id)
graded = client.poll_run(created["id"])

metrics = graded["result"]["aggregate_metrics"]
print(f"Passed: {metrics['tests_passed']}/{metrics['total_tests']}")
```
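The graded payload above exposes aggregate metrics as a plain dict. A small helper can turn those counts into a pass rate for dashboards or CI thresholds; note this sketch assumes only the `tests_passed` and `total_tests` keys shown in the example, and the full metrics schema may contain more fields:

```python
def pass_rate(metrics):
    """Compute a fractional pass rate from aggregate metrics.

    Assumes the 'tests_passed' / 'total_tests' keys shown in the
    example above; other fields in the schema are ignored.
    """
    total = metrics.get("total_tests", 0)
    if total == 0:
        # Avoid division by zero when a run graded no tests.
        return 0.0
    return metrics["tests_passed"] / total


# Example with a hypothetical graded payload:
metrics = {"tests_passed": 9, "total_tests": 12}
print(f"Pass rate: {pass_rate(metrics):.0%}")  # → Pass rate: 75%
```

A ratio (rather than raw counts) makes it easy to gate CI on a fixed threshold even as the dataset grows.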
## Support
For issues and feature requests, reach out at support@ashr.io or book a call.