# Ashr Labs Python SDK
A Python SDK for evaluating AI agents against Ashr Labs test datasets. Generate datasets, run your agent against test scenarios, compare expected vs actual behavior, and submit results — all with zero external dependencies.
## Quick Links
- Testing Your Agent — start here (end-to-end guide with EvalRunner, debugging failures)
- VM Integration — browser/desktop agents with VM stream logging
- Installation
- Quick Start
- Authentication
- API Reference — EvalRunner, Agent, comparators, RunBuilder, client methods
- SDK Notes — platform advisories delivered to your SDK
- Error Handling
- Examples
## Requirements
- Python 3.10 or higher
- No external dependencies required
## Installation

```bash
pip install ashr-labs
```
## Quick Example
Any agent with `respond()` and `reset()` methods works out of the box:

```python
from ashr_labs import AshrLabsClient, EvalRunner

client = AshrLabsClient(api_key="tp_your_api_key_here")
runner = EvalRunner.from_dataset(client, dataset_id=322)
runner.run_and_deploy(my_agent, client, dataset_id=322)
```
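The runner duck-types your agent rather than requiring a base class. As a rough sketch, a minimal agent might look like the following (the exact signatures of `respond()` and `reset()` are assumptions here; consult the Agent section of the API Reference for the real protocol):

```python
class EchoAgent:
    """Toy agent for smoke-testing the eval pipeline.

    The respond()/reset() signatures below are assumptions for
    illustration; see the API Reference for the actual contract.
    """

    def __init__(self):
        self.history = []

    def respond(self, message):
        # Record the incoming message and return a canned reply.
        self.history.append(message)
        return f"echo: {message}"

    def reset(self):
        # Clear per-scenario state so scenarios stay independent.
        self.history.clear()


my_agent = EchoAgent()
```

Swapping `EchoAgent` for your real agent is the only change needed; the runner calls `reset()` between scenarios so state never leaks across tests.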
Or with more control:
```python
from ashr_labs import AshrLabsClient, EvalRunner

client = AshrLabsClient(api_key="tp_your_api_key_here")

# Generate a dataset
dataset_id, source = client.generate_dataset(
    request_name="My Agent Eval",
    config={ ... },  # Your agent config
)

# Run the eval with progress logging
runner = EvalRunner(source)
run = runner.run(my_agent, on_scenario=lambda sid, s: print(f"Running: {s['title']}"))

# Submit and wait for server-side grading
created = run.deploy(client, dataset_id=dataset_id)
graded = client.poll_run(created["id"])

metrics = graded["result"]["aggregate_metrics"]
print(f"Passed: {metrics['tests_passed']}/{metrics['total_tests']}")
```
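The graded payload above exposes aggregate metrics as a plain dict. A small helper can turn those counts into a pass rate for dashboards or CI thresholds; note this sketch assumes only the `tests_passed` and `total_tests` keys shown in the example, and the full metrics schema may contain more fields:

```python
def pass_rate(metrics):
    """Compute a fractional pass rate from aggregate metrics.

    Assumes the 'tests_passed' / 'total_tests' keys shown in the
    example above; other fields in the schema are ignored.
    """
    total = metrics.get("total_tests", 0)
    if total == 0:
        # Avoid division by zero when a run graded no tests.
        return 0.0
    return metrics["tests_passed"] / total


# Example with a hypothetical graded payload:
metrics = {"tests_passed": 9, "total_tests": 12}
print(f"Pass rate: {pass_rate(metrics):.0%}")  # → Pass rate: 75%
```

A ratio (rather than raw counts) makes it easy to gate CI on a fixed threshold even as the dataset grows.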
## Support
For issues and feature requests, reach out at support@ashr.io or book a call.