API Reference
Complete reference for all classes and methods in the Ashr Labs TypeScript SDK.
AshrLabsClient
The main client class for interacting with the Ashr Labs API.
Constructor
new AshrLabsClient(
apiKey: string,
baseUrl?: string,
timeout?: number,
)
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
apiKey | string | Yes | - | Your API key (must start with tp_) |
baseUrl | string | No | Production URL | Base URL of the API |
timeout | number | No | 30 | Request timeout in seconds |
Throws:
Error: If the API key format is invalid
Example:
// Minimal — just pass your API key
const client = new AshrLabsClient("tp_your_key_here");
// Custom timeout
const client = new AshrLabsClient("tp_your_key_here", undefined, 60);
fromEnv (static method)
Create a client from environment variables.
AshrLabsClient.fromEnv(timeout?: number): AshrLabsClient
Reads ASHR_LABS_API_KEY (required) and ASHR_LABS_BASE_URL (optional) from the environment.
Throws:
Error: If ASHR_LABS_API_KEY is not set
Example:
// export ASHR_LABS_API_KEY="tp_your_key_here"
const client = AshrLabsClient.fromEnv();
Session Methods
init
Initialize a session and validate authentication.
async init(): Promise<Record<string, unknown>>
Returns: Session information containing user and tenant data
Throws:
AuthenticationError: If the API key is invalid or expired
Example:
// Validate credentials and get user/tenant info
const session = await client.init();
const user = session.user as Record<string, unknown>;
const tenant = session.tenant as Record<string, unknown>;
console.log(`User ID: ${user.id}`);
console.log(`Email: ${user.email}`);
console.log(`Tenant ID: ${tenant.id}`);
console.log(`Tenant Name: ${tenant.tenant_name}`);
Dataset Methods
getDataset
Retrieve a dataset by ID.
async getDataset(
datasetId: number,
includeSignedUrls?: boolean,
urlExpiresSeconds?: number,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
datasetId | number | Yes | - | The ID of the dataset |
includeSignedUrls | boolean | No | false | Include signed S3 URLs for media |
urlExpiresSeconds | number | No | 3600 | URL expiration time in seconds |
Returns: The dataset object
Throws:
NotFoundError: Dataset not found
AuthorizationError: No access to this dataset
Example:
const dataset = await client.getDataset(42, true, 7200);
console.log(dataset.name);
listDatasets
List datasets for a tenant.
async listDatasets(
tenantId?: number | null,
limit?: number,
offset?: number,
includeSignedUrls?: boolean,
urlExpiresSeconds?: number,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
limit | number | No | 50 | Maximum results to return |
offset | number | No | 0 | Number of results to skip |
includeSignedUrls | boolean | No | false | Include signed S3 URLs |
urlExpiresSeconds | number | No | 3600 | URL expiration time |
Returns: Object with keys:
status: "ok"
datasets: Array of dataset objects
Example:
// tenantId auto-resolved from API key
const response = await client.listDatasets(undefined, 10);
const datasets = response.datasets as Record<string, unknown>[];
for (const dataset of datasets) {
console.log(`${dataset.id}: ${dataset.name}`);
}
Run Methods
createRun
Create a new test run.
async createRun(
datasetId: number,
result: Record<string, unknown>,
tenantId?: number | null,
runnerId?: number | null,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
datasetId | number | Yes | - | The dataset ID |
result | Record<string, unknown> | Yes | - | Run results (metrics, status, etc.) |
tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
runnerId | number | No | null | ID of user who ran the test |
Returns: The created run object
Example:
const run = await client.createRun(42, {
status: "passed",
score: 0.95,
metrics: {
accuracy: 0.98,
latency_ms: 150,
},
});
getRun
Retrieve a run by ID.
async getRun(runId: number): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
runId | number | Yes | The run ID |
Returns: The run object
Throws:
NotFoundError: Run not found
Example:
const run = await client.getRun(99);
const result = run.result as Record<string, unknown>;
console.log(`Score: ${result.score}`);
listRuns
List runs for a tenant or dataset.
async listRuns(
datasetId?: number | null,
tenantId?: number | null,
limit?: number,
offset?: number,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
datasetId | number | No | null | Filter by dataset |
tenantId | number | No | auto | Filter by tenant (auto-resolved if omitted) |
limit | number | No | 50 | Maximum results |
offset | number | No | 0 | Results to skip |
Returns: Object with keys:
status: "ok"
runs: Array of run objects
Example:
// Get runs for a specific dataset
const response = await client.listRuns(42);
const runs = response.runs as Record<string, unknown>[];
for (const run of runs) {
const result = run.result as Record<string, unknown>;
console.log(`Run #${run.id}: ${result.status}`);
}
deleteRun
Delete a test run.
async deleteRun(runId: number): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
runId | number | Yes | The run ID to delete |
Returns: Confirmation of deletion
Throws:
NotFoundError: Run not found
Example:
await client.deleteRun(99);
console.log("Run deleted");
Observability — Production Agent Tracing
Trace your agent's production behavior — LLM calls, tool invocations, retrieval steps, guardrail checks, and more. Traces are stored in Postgres and optionally forwarded to Langfuse.
Production-safe: tracing never rejects or interferes with your agent. If the backend is unreachable, trace.end() resolves with an error object instead of rejecting.
client.trace
const trace = client.trace(
name: string,
opts?: {
userId?: string;
sessionId?: string;
metadata?: Record<string, unknown>;
tags?: string[];
},
): Trace
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Name for this trace (e.g. "handle-ticket") |
opts.userId | string | No | End-user ID for grouping |
opts.sessionId | string | No | Conversation/session ID |
opts.metadata | Record | No | Arbitrary metadata |
opts.tags | string[] | No | Tags for filtering |
Trace methods
| Method | Description |
|---|---|
trace.span(name, opts?) | Create a top-level span |
trace.generation(name, opts?) | Create a top-level generation (LLM call) |
trace.event(name, opts?) | Record a point-in-time event |
trace.wrap(fn) | Run callback, auto-flush on completion |
await trace.end(opts?) | Flush the trace to the backend. Never rejects. |
trace.traceId | Server-assigned trace ID (available after end()) |
Span methods
| Method | Description |
|---|---|
span.span(name, opts?) | Create a child span |
span.generation(name, opts?) | Create a child generation |
span.event(name, opts?) | Record an event under this span |
span.wrap(fn) | Run callback, auto-end on completion |
span.end(opts?) | Mark the span as complete |
If wrap() callback throws, the span auto-ends with level: "ERROR" and the exception message captured in statusMessage.
Generation methods
Inherits all Span methods, plus:
| Method | Description |
|---|---|
gen.end(opts?) | Mark complete. Accepts output, usage: { input_tokens, output_tokens }, statusMessage, level. |
wrap() pattern (recommended)
wrap() ensures spans/traces are always ended, even if your code throws:
await trace.wrap(async (t) => {
await t.span("tool:search", { input: { q: "..." } }).wrap(async (s) => {
const data = await search(...);
s.end({ output: data });
return data;
});
const gen = t.generation("respond", { model: "claude-sonnet-4-6" });
const response = await callLlm(...);
gen.end({ output: response, usage: { input_tokens: 100, output_tokens: 50 } });
});
// trace.end() called automatically
Manual instrumentation
const trace = client.trace("support-chat", { userId: "user_42", sessionId: "conv_abc" });
const gen = trace.generation("classify", { model: "claude-sonnet-4-6",
input: [{ role: "user", content: "Reset my password" }] });
gen.end({ output: { intent: "password_reset" },
usage: { input_tokens: 50, output_tokens: 12 } });
const tool = trace.span("tool:reset_password", { input: { user_id: "42" } });
tool.end({ output: { success: true } });
trace.event("guardrail-check", { input: { passed: true } });
const result = await trace.end({ output: { resolution: "password_reset_complete" } });
console.log(trace.traceId);
listObservabilityTraces
await client.listObservabilityTraces(opts?: {
userId?: string;
sessionId?: string;
limit?: number;
page?: number;
}): Promise<{ status: string; traces: object[]; total: number }>
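The response is paginated via limit and page. A minimal paging loop, sketched against a stand-in stub so it runs on its own — the real call is client.listObservabilityTraces on an initialized AshrLabsClient, and the stub with its canned data is illustrative only:

```typescript
type TracePage = { status: string; traces: object[]; total: number };

// Stand-in for an initialized AshrLabsClient — illustrative stub only.
const client = {
  async listObservabilityTraces(opts?: { limit?: number; page?: number }): Promise<TracePage> {
    const all = [{ id: "tr_1" }, { id: "tr_2" }, { id: "tr_3" }];
    const limit = opts?.limit ?? 50;
    const page = opts?.page ?? 1;
    return { status: "ok", traces: all.slice((page - 1) * limit, page * limit), total: all.length };
  },
};

// Collect every trace by walking pages until `total` entries have been seen.
async function fetchAllTraces(limit = 50): Promise<object[]> {
  const out: object[] = [];
  for (let page = 1; ; page++) {
    const res = await client.listObservabilityTraces({ limit, page });
    out.push(...res.traces);
    if (out.length >= res.total || res.traces.length === 0) break;
  }
  return out;
}
```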
getObservabilityTrace
await client.getObservabilityTrace(traceId: string): Promise<{ status: string; trace: object }>
getObservabilityAnalytics
await client.getObservabilityAnalytics(days?: number): Promise<{
status: string;
overview: { total_traces, avg_latency_ms, total_input_tokens, total_output_tokens, error_rate, total_tool_calls, ... };
tool_performance: { tool_name, total_calls, error_rate, avg_latency_ms }[];
model_usage: { model, total_calls, total_tokens, avg_latency_ms }[];
}>
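For example, the tool_performance rows can be scanned for tools whose error rate is out of budget. A sketch — the row type mirrors the fields in the signature above, and the 5% threshold is an arbitrary choice:

```typescript
// Row shape follows the `tool_performance` entries in the signature above.
type ToolPerf = { tool_name: string; total_calls: number; error_rate: number; avg_latency_ms: number };

// Return the names of tools whose error rate exceeds the budget.
function flagUnhealthyTools(perf: ToolPerf[], maxErrorRate = 0.05): string[] {
  return perf
    .filter((t) => t.error_rate > maxErrorRate)
    .map((t) => t.tool_name);
}

const names = flagUnhealthyTools([
  { tool_name: "search", total_calls: 1200, error_rate: 0.01, avg_latency_ms: 90 },
  { tool_name: "reset_password", total_calls: 300, error_rate: 0.12, avg_latency_ms: 210 },
]);
console.log(names); // ["reset_password"]
```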
getObservabilityErrors / getObservabilityToolErrors
await client.getObservabilityErrors(opts?: { days?: number; limit?: number; page?: number })
await client.getObservabilityToolErrors(opts?: { days?: number; limit?: number; page?: number })
SDK Notes — Platform Advisories
SDK Notes are platform advisories delivered to your SDK from Ashr Labs. They communicate context changes, best practices, deprecations, or breaking changes that may affect how you configure or run your agent.
Notes are automatically fetched when the client initializes (via init()).
You can also refresh them on demand.
client.notes (getter)
Get cached SDK notes from the last init() or getNotes() call. No network request is made.
get notes(): Record<string, unknown>[]
Returns: List of active notes for your tenant.
Example:
const client = new AshrLabsClient("tp_...");
await client.init();
// Notes are auto-fetched on init
for (const note of client.notes) {
console.log(`[${note.severity}] ${note.title}: ${note.content}`);
}
getNotes
Fetch fresh SDK notes from the platform. Updates the cached client.notes.
async getNotes(agentId?: number | null): Promise<Record<string, unknown>[]>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
agentId | number | null | No | undefined | Include notes targeted at this specific agent |
Returns: List of active notes (global + tenant-specific, plus agent-specific if agentId is provided).
Example:
// Refresh notes
const notes = await client.getNotes();
// Filter by agent
const notes = await client.getNotes(42);
// Check for breaking changes
const breaking = notes.filter(n => n.category === "breaking_change");
if (breaking.length) {
console.log("Warning: Breaking changes detected:");
for (const n of breaking) {
console.log(` ${n.title}: ${n.content}`);
}
}
Note categories: info, warning, breaking_change, best_practice, deprecation
Severity levels: info, warning, critical
Request Methods
createRequest
Create a dataset generation request.
async createRequest(
requestName: string,
request: Record<string, unknown>,
requestInputSchema?: Record<string, unknown> | null,
tenantId?: number | null,
requestorId?: number | null,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
requestName | string | Yes | - | Name/title for the request |
request | Record<string, unknown> | Yes | - | The generation config (see below) |
requestInputSchema | Record<string, unknown> | No | auto | JSON Schema for validating the request. A permissive default is sent if omitted. If your agent has tools, include them here under the "tools" key so they're auto-saved as skill templates. |
tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
requestorId | number | No | auto | ID of requesting user (auto-resolved if omitted) |
Returns: The created request object
Generation config structure (the request object):
{
metadata: {
dataset_name: "My Eval Dataset",
description: "Description of what this dataset tests",
},
agent: {
name: "My Agent",
description: "What the agent does",
system_prompt: "You are a helpful assistant...",
tools: [
{
name: "tool_name",
description: "What the tool does",
parameters: {
type: "object",
properties: { arg: { type: "string" } },
required: ["arg"],
},
},
],
accepted_inputs: { text: true, audio: false, file: false, image: false, video: false },
output_format: { type: "text" },
},
context: {
domain: "ecommerce",
use_case: "Customers asking about orders",
scenario_context: "An online retail store",
},
test_config: {
num_variations: 3, // how many test scenarios to generate
coverage: {
happy_path: true,
edge_cases: true,
error_handling: true,
multi_turn: true,
},
},
generation_options: {
generate_audio: false, // set true to generate audio files
generate_files: false,
generate_simulations: false,
},
}
Validation: The request config is validated before sending:
config.agent must be a non-empty object with at least one of: name, description, system_prompt
config.context must be a non-empty object with at least one of: domain, use_case, scenario_context
Example:
const req = await client.createRequest(
"Support Agent Eval",
{
metadata: { dataset_name: "Support Eval" },
agent: {
name: "Support Bot",
description: "Answers customer questions",
system_prompt: "You are a helpful support agent.",
tools: [{ name: "lookup_order", description: "Look up an order",
parameters: { type: "object", properties: { order_id: { type: "string" } }, required: ["order_id"] } }],
accepted_inputs: { text: true, audio: false, file: false, image: false, video: false },
output_format: { type: "text" },
},
context: { domain: "ecommerce", use_case: "Customers asking about orders", scenario_context: "Online store" },
test_config: { num_variations: 3, coverage: { happy_path: true, edge_cases: true } },
generation_options: { generate_audio: false, generate_files: false, generate_simulations: false },
},
);
// Poll for completion
const completed = await client.waitForRequest(req.id as number);
console.log(`Status: ${completed.request_status}`);
getRequest
Retrieve a request by ID.
async getRequest(requestId: number): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
requestId | number | Yes | The request ID |
Returns: The request object
Throws:
NotFoundError: Request not found
Example:
const req = await client.getRequest(123);
console.log(`Status: ${req.request_status}`);
listRequests
List requests for a tenant.
async listRequests(
tenantId?: number | null,
status?: string | null,
limit?: number,
offset?: number,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
status | string | No | null | Filter by status |
limit | number | No | 50 | Maximum results |
offset | number | No | 0 | Results to skip |
Returns: Object with keys:
status: "ok"
requests: Array of request objects
Example:
// Get pending requests
const response = await client.listRequests(undefined, "pending");
const requests = response.requests as Record<string, unknown>[];
for (const req of requests) {
console.log(`Request #${req.id}: ${req.request_name}`);
}
API Key Methods
listApiKeys
List API keys for your tenant.
async listApiKeys(
includeInactive?: boolean,
): Promise<Record<string, unknown>[]>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
includeInactive | boolean | No | false | Include revoked keys |
Returns: Array of API key objects
Note: For security, only the key prefix is returned, not the full key.
Example:
const keys = await client.listApiKeys();
for (const key of keys) {
console.log(`${key.key_prefix}... - ${key.name}`);
}
revokeApiKey
Revoke an API key.
async revokeApiKey(apiKeyId: number): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
apiKeyId | number | Yes | The API key ID to revoke |
Returns: Confirmation of revocation
Throws:
NotFoundError: API key not found
Example:
await client.revokeApiKey(123);
console.log("API key revoked");
Convenience Methods
waitForRequest
Block until a request reaches a terminal state (completed or failed).
async waitForRequest(
requestId: number,
timeout?: number,
pollInterval?: number,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
requestId | number | Yes | - | The request ID to poll |
timeout | number | No | 600 | Maximum seconds to wait |
pollInterval | number | No | 5 | Seconds between polls |
Returns: The final request object
Throws:
Error: If the request doesn't finish within timeout seconds
AshrLabsError: If the request fails
Example:
const req = await client.createRequest("My Eval", config);
const completed = await client.waitForRequest(req.id as number, 300);
console.log(`Status: ${completed.request_status}`);
generateDataset
Create a dataset generation request, wait for completion, and fetch the result. Combines createRequest + waitForRequest + getDataset into one call.
async generateDataset(
requestName: string,
config: Record<string, unknown>,
requestInputSchema?: Record<string, unknown> | null,
timeout?: number,
pollInterval?: number,
): Promise<[number, Record<string, unknown>]>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
requestName | string | Yes | - | Name/title for the request |
config | Record<string, unknown> | Yes | - | The generation config (same as createRequest's request parameter) |
requestInputSchema | Record<string, unknown> | No | auto | Optional JSON Schema for validation |
timeout | number | No | 600 | Maximum seconds to wait |
pollInterval | number | No | 5 | Seconds between polls |
Returns: A tuple of [datasetId, datasetSource] where datasetSource is the object containing "runs".
Throws:
Error: If generation doesn't finish in time
AshrLabsError: If generation fails or no datasets are found
Example:
const [datasetId, source] = await client.generateDataset(
"Support Agent Eval",
{
metadata: { dataset_name: "Support Eval" },
agent: { /* ... */ },
context: { /* ... */ },
test_config: { num_variations: 10, coverage: { happy_path: true } },
generation_options: { generate_audio: false, generate_files: false, generate_simulations: false },
},
);
const runs = (source.runs ?? {}) as Record<string, unknown>;
console.log(`Dataset #${datasetId}: ${Object.keys(runs).length} scenarios`);
Utility Methods
healthCheck
Check if the API is reachable.
async healthCheck(): Promise<Record<string, unknown>>
Returns: Status information
Example:
const status = await client.healthCheck();
console.log(`API Status: ${status.status}`);
toString
Get a string representation of the client.
toString(): string
Example:
console.log(client.toString());
// => AshrLabsClient(baseUrl='https://api.ashr.io/testing-platform-api', apiKey='tp_abc12...')
RunBuilder
A builder for incrementally constructing run result objects as an agent executes tests. Once complete, the result can be deployed via the client.
Constructor
new RunBuilder()
No parameters. Creates a run in "pending" status.
RunBuilder.start
Mark the run as started. Records the current timestamp.
run.start(): this
Returns: this (for chaining)
RunBuilder.addTest
Create and register a new test within this run.
run.addTest(testId: string): TestBuilder
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
testId | string | Yes | Unique identifier for the test case |
Returns: TestBuilder — A builder for the individual test
RunBuilder.complete
Mark the run as completed. Records the current timestamp.
run.complete(status?: string): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
status | string | No | "completed" | Final status ("completed" or "failed") |
Returns: this (for chaining)
RunBuilder.build
Serialize the full run result to an object.
run.build(): Record<string, unknown>
Returns: An object matching the run result schema, ready to be passed to client.createRun(datasetId, result). Aggregate metrics are computed automatically from action results.
RunBuilder.deploy
Build the result and submit it as a new run via the API.
run.deploy(
client: AshrLabsClient,
datasetId: number,
tenantId?: number,
runnerId?: number,
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
client | AshrLabsClient | Yes | - | An authenticated client instance |
datasetId | number | Yes | - | The dataset this run is for |
tenantId | number | No | auto | The tenant (auto-resolved if omitted) |
runnerId | number | No | undefined | ID of the user who ran the test |
Returns: The created run object from the API
Example:
import { AshrLabsClient, RunBuilder } from "ashr-labs";
const client = new AshrLabsClient("tp_...");
const run = new RunBuilder();
run.start();
const test = run.addTest("bank_analysis");
test.start();
test.addUserText("Analyze this", "User prompt");
test.addToolCall(
{ name: "analyze", arguments: { data: "input" } },
{ name: "analyze", arguments: { data: "input" } },
"exact",
);
test.complete();
run.complete();
const createdRun = await run.deploy(client, 42);
console.log(`Run #${createdRun.id} created`);
TestBuilder
Builds a single test result incrementally. Returned by RunBuilder.addTest().
TestBuilder.start
Mark the test as started. Records the current timestamp.
test.start(): this
Returns: this (for chaining)
TestBuilder.addUserFile
Record a user file input action.
test.addUserFile(
filePath: string,
description: string,
actionIndex?: number,
): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
filePath | string | Yes | - | Path to the file in the dataset |
description | string | Yes | - | Description of the action |
actionIndex | number | No | auto | Explicit index, or auto-incremented |
Returns: this (for chaining)
TestBuilder.addUserText
Record a user text input action.
test.addUserText(
text: string,
description: string,
actionIndex?: number,
): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
text | string | Yes | - | The user's text input |
description | string | Yes | - | Description of the action |
actionIndex | number | No | auto | Explicit index, or auto-incremented |
Returns: this (for chaining)
TestBuilder.addToolCall
Record an agent tool call action with expected vs actual comparison.
test.addToolCall(
expected: Record<string, unknown>,
actual: Record<string, unknown>,
matchStatus: string,
divergenceNotes?: string | null,
actionIndex?: number,
): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
expected | Record<string, unknown> | Yes | - | Expected tool call (name, arguments) |
actual | Record<string, unknown> | Yes | - | Actual tool call made by the agent |
matchStatus | string | Yes | - | "exact", "partial", or "mismatch" |
divergenceNotes | string | No | null | Notes explaining the divergence |
actionIndex | number | No | auto | Explicit index, or auto-incremented |
Returns: this (for chaining)
TestBuilder.addAgentResponse
Record an agent text response with expected vs actual comparison.
test.addAgentResponse(
expectedResponse: Record<string, unknown>,
actualResponse: Record<string, unknown>,
matchStatus: string,
semanticSimilarity?: number | null,
divergenceNotes?: string | null,
actionIndex?: number,
): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
expectedResponse | Record<string, unknown> | Yes | - | The expected response content |
actualResponse | Record<string, unknown> | Yes | - | The actual response from the agent |
matchStatus | string | Yes | - | "exact", "similar", or "divergent" |
semanticSimilarity | number | No | null | Similarity score (0.0 to 1.0) |
divergenceNotes | string | No | null | Notes explaining the divergence |
actionIndex | number | No | auto | Explicit index, or auto-incremented |
Returns: this (for chaining)
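semanticSimilarity can come from any scorer you like. A minimal sketch using word-level Jaccard overlap — the 0.70/0.40 thresholds mirror the EvalRunner defaults, and a real setup might substitute an embedding-based scorer:

```typescript
// Word-level Jaccard similarity — a deliberately simple stand-in scorer.
function jaccard(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 1 : inter / union;
}

// Map a score onto the statuses addAgentResponse accepts.
function toMatchStatus(score: number, exact = 0.7, similar = 0.4): string {
  return score >= exact ? "exact" : score >= similar ? "similar" : "divergent";
}

const score = jaccard("Your password has been reset", "Password has been reset for you");
// test.addAgentResponse({ text: expected }, { text: actual }, toMatchStatus(score), score);
```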
TestBuilder.setVmStream
Attach VM session logs to this test. For agents that operate in a browser or virtual machine.
test.setVmStream(
provider: string,
opts?: {
sessionId?: string;
durationMs?: number;
logs?: Record<string, unknown>[];
metadata?: Record<string, unknown>;
},
): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | Yes | - | VM provider name (e.g. "kernel", "browserbase", "steel") |
opts.sessionId | string | No | - | Provider session ID for linking |
opts.durationMs | number | No | - | Total session duration in milliseconds |
opts.logs | Record[] | No | - | Timestamped log entries (see below) |
opts.metadata | Record | No | - | Additional provider-specific metadata |
Log entry format: Each entry should have ts (number, ms offset from start) and type (string):
{ ts: 0, type: "navigation", data: { url: "https://..." } }
{ ts: 1200, type: "action", data: { action: "click", selector: "#btn" } }
{ ts: 3000, type: "error", data: { message: "Element not found" } }
Example:
test.setVmStream("browserbase", {
sessionId: "sess_abc123",
durationMs: 12000,
logs: [
{ ts: 0, type: "navigation", data: { url: "https://app.example.com" } },
{ ts: 2000, type: "action", data: { action: "click", selector: "#submit" } },
{ ts: 5000, type: "network", data: { method: "POST", url: "/api/order", status: 201 } },
],
});
Returns: this (for chaining)
TestBuilder.setKernelVm
Convenience method for attaching a Kernel browser session. Sets provider="kernel" and exposes Kernel-specific metadata fields. Fields map to Kernel's browser API response.
test.setKernelVm(
sessionId: string,
opts?: {
durationMs?: number;
logs?: Record<string, unknown>[];
liveViewUrl?: string;
cdpWsUrl?: string;
replayId?: string;
replayViewUrl?: string;
headless?: boolean;
stealth?: boolean;
viewport?: { width: number; height: number };
},
): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
sessionId | string | Yes | - | Kernel browser session ID |
opts.durationMs | number | No | - | Total session duration in milliseconds |
opts.logs | Record[] | No | - | Timestamped log entries (same format as setVmStream) |
opts.liveViewUrl | string | No | - | Remote live-view URL (browser_live_view_url) |
opts.cdpWsUrl | string | No | - | Chrome DevTools Protocol WebSocket URL |
opts.replayId | string | No | - | ID of the session recording |
opts.replayViewUrl | string | No | - | URL to view the session replay |
opts.headless | boolean | No | - | Whether the session ran in headless mode |
opts.stealth | boolean | No | - | Whether anti-bot stealth mode was enabled |
opts.viewport | object | No | - | Browser viewport, e.g. { width: 1920, height: 1080 } |
Example:
test.setKernelVm("kern_sess_abc123", {
durationMs: 15000,
logs: [
{ ts: 0, type: "navigation", data: { url: "https://app.example.com" } },
{ ts: 1200, type: "action", data: { action: "click", selector: "#login" } },
{ ts: 3000, type: "screenshot", data: { s3_key: "vm-streams/.../frame.png" } },
],
replayId: "replay_abc123",
replayViewUrl: "https://www.kernel.sh/replays/replay_abc123",
stealth: true,
viewport: { width: 1920, height: 1080 },
});
Returns: this (for chaining)
TestBuilder.complete
Mark the test as completed. Records the current timestamp.
test.complete(status?: string): this
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
status | string | No | "completed" | Final status ("completed" or "failed") |
Returns: this (for chaining)
TestBuilder.build
Serialize this test to an object matching the run result schema.
test.build(): Record<string, unknown>
Returns: An object with test_id, status, action_results, started_at, and completed_at.
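An illustrative sketch of the returned shape — the field names come from the description above, while the values (and the ISO 8601 timestamp format) are placeholders:

```typescript
// Illustrative shape of test.build() output — values are placeholders.
const builtTest = {
  test_id: "bank_analysis",
  status: "completed",
  action_results: [
    // one entry per addUserText / addToolCall / addAgentResponse / ... call
  ] as Record<string, unknown>[],
  started_at: "2025-01-01T00:00:00.000Z",
  completed_at: "2025-01-01T00:00:05.000Z",
};
console.log(Object.keys(builtTest));
```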
EvalRunner
Runs an agent against every scenario in a dataset and records results. This is the high-level API that encapsulates the full eval loop — iterating scenarios, calling the agent, comparing tool calls and text, and producing a RunBuilder.
Constructor
new EvalRunner(
datasetSource: Record<string, unknown>,
options?: {
toolComparator?: ToolComparator;
textComparator?: TextComparator;
similarityThresholds?: { exact?: number; similar?: number };
},
)
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
datasetSource | Record<string, unknown> | Yes | - | The dataset_source object from a dataset (contains "runs") |
options.toolComparator | ToolComparator | No | compareToolArgs | Custom (expected, actual) => [status, notes] function |
options.textComparator | TextComparator | No | textSimilarity | Custom (textA, textB) => number function |
options.similarityThresholds | object | No | { exact: 0.70, similar: 0.40 } | Score thresholds for match status |
Type aliases:
type ToolComparator = (
expected: Record<string, unknown>,
actual: Record<string, unknown>,
) => [string, string | null];
type TextComparator = (a: string, b: string) => number;
Example:
import { EvalRunner } from "ashr-labs";
const runner = new EvalRunner(source);
// With custom thresholds
const runner = new EvalRunner(source, {
similarityThresholds: { exact: 0.85, similar: 0.50 },
});
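A custom toolComparator must return a [status, notes] tuple. A sketch of one that requires the tool name to match exactly and tolerates extra arguments — the statuses mirror the "exact" / "partial" / "mismatch" values used by addToolCall, and the comparator name is illustrative:

```typescript
type ToolComparator = (
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
) => [string, string | null];

// Name must match; arguments that differ from the expected ones downgrade the match.
const lenientComparator: ToolComparator = (expected, actual) => {
  if (expected.name !== actual.name) {
    return ["mismatch", `expected tool ${String(expected.name)}, got ${String(actual.name)}`];
  }
  const expArgs = (expected.arguments ?? {}) as Record<string, unknown>;
  const actArgs = (actual.arguments ?? {}) as Record<string, unknown>;
  const diffKeys = Object.keys(expArgs).filter(
    (k) => JSON.stringify(actArgs[k]) !== JSON.stringify(expArgs[k]),
  );
  if (diffKeys.length === 0) return ["exact", null];
  return ["partial", `arguments differ: ${diffKeys.join(", ")}`];
};

// const runner = new EvalRunner(source, { toolComparator: lenientComparator });
```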
EvalRunner.fromDataset (static method)
Create an EvalRunner by fetching a dataset from the API.
EvalRunner.fromDataset(
client: AshrLabsClient,
datasetId: number,
options?: {
toolComparator?: ToolComparator;
textComparator?: TextComparator;
similarityThresholds?: { exact?: number; similar?: number };
},
): Promise<EvalRunner>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
client | AshrLabsClient | Yes | An authenticated client |
datasetId | number | Yes | The dataset ID to fetch |
options | object | No | Passed to EvalRunner constructor |
Returns: EvalRunner — A configured runner ready to call .run()
Example:
const runner = await EvalRunner.fromDataset(client, 322);
EvalRunner.run
Run the agent against every scenario and return a populated RunBuilder.
async runner.run(
agent: Agent | (() => Agent),
options?: {
onScenario?: OnScenarioCallback;
onAction?: OnActionCallback;
maxWorkers?: number;
},
): Promise<RunBuilder>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
agent | Agent | (() => Agent) | Yes | - | An object implementing the Agent interface, or a factory function |
options.onScenario | OnScenarioCallback | No | undefined | Called at the start of each scenario: (scenarioId, scenarioDict) |
options.onAction | OnActionCallback | No | undefined | Called for each action: (actionIndex, actionDict) |
options.maxWorkers | number | No | 1 | Number of scenarios to run in parallel. When >1, scenarioId is passed to respond() and reset() so the agent can key state per scenario. |
Type aliases:
type OnScenarioCallback = (scenarioId: string, scenario: Record<string, unknown>) => void;
type OnActionCallback = (actionIndex: number, action: Record<string, unknown>) => void;
Returns: RunBuilder — A populated builder ready for .build() or .deploy()
Example:
// Sequential (default)
const run = await runner.run(agent);
const result = run.build();
console.log(result.aggregate_metrics);
// Parallel — run 4 scenarios at a time
const run = await runner.run(agent, { maxWorkers: 4 });
// With factory function
const run = await runner.run(() => new MyAgent(), { maxWorkers: 4 });
EvalRunner.runAndDeploy
Run the eval and submit results in one call.
async runner.runAndDeploy(
agent: Agent | (() => Agent),
client: AshrLabsClient,
datasetId?: number,
options?: {
onScenario?: OnScenarioCallback;
onAction?: OnActionCallback;
maxWorkers?: number;
tenantId?: number;
runnerId?: number;
},
): Promise<Record<string, unknown>>
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
agent | Agent | (() => Agent) | Yes | - | An object implementing the Agent interface, or a factory function |
client | AshrLabsClient | Yes | - | An authenticated client |
datasetId | number | No | undefined | The dataset to submit against |
options.onScenario | OnScenarioCallback | No | undefined | Callback per scenario |
options.onAction | OnActionCallback | No | undefined | Callback per action |
options.maxWorkers | number | No | 1 | Number of scenarios to run in parallel (default sequential) |
options.tenantId | number | No | auto | The tenant (auto-resolved if omitted) |
options.runnerId | number | No | undefined | ID of the user who ran the test |
Returns: The created run object from the API
Example:
// Sequential
const created = await runner.runAndDeploy(agent, client, 322);
console.log(`Run #${created.id} submitted`);
// Parallel
const created = await runner.runAndDeploy(agent, client, 322, { maxWorkers: 4 });
Agent Interface
The contract every agent must implement so EvalRunner can drive it.
interface Agent {
respond(
message: string,
scenarioId?: string,
): Record<string, unknown> | Promise<Record<string, unknown>>;
reset(scenarioId?: string): void | Promise<void>;
}
respond
Process a user message and return the agent's response.
Parameters:
| Parameter | Type | Description |
|---|---|---|
message | string | The user's message text |
scenarioId | string | Optional scenario ID (passed during parallel execution) |
Returns: An object (or Promise of an object) with:
"text"(string): The agent's text response"tool_calls"(Array): Tool calls made during this turn, each with"name"(string) and"arguments"(object) keys
arguments vs arguments_json: The Agent interface returns tool arguments as an object under the "arguments" key. However, RunBuilder and the API store them as a JSON string under "arguments_json". EvalRunner handles this conversion automatically. If you use RunBuilder directly, pass "arguments_json" (a JSON string) to addToolCall(). The extractToolArgs() helper accepts both formats, so comparators work either way.
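The conversion between the two formats can be sketched as follows (illustrative code, not the SDK's internals; the "lookup_order" tool name is a made-up example):

```typescript
// Agent-format tool call: arguments as a plain object.
const agentCall = { name: "lookup_order", arguments: { order_id: "ORD-123" } };

// API/RunBuilder format: arguments serialized to a JSON string.
const apiCall = {
  name: agentCall.name,
  arguments_json: JSON.stringify(agentCall.arguments),
};

// And back again, as extractToolArgs does for the string form.
const parsed = JSON.parse(apiCall.arguments_json) as Record<string, unknown>;
```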
reset
Clear conversation state for a new scenario. Called before each scenario begins.
Parameters:
| Parameter | Type | Description |
|---|---|---|
scenarioId | string | Optional scenario ID (passed during parallel execution) |
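A minimal implementation might look like the sketch below. EchoAgent and the lookup_order tool are illustrative names, not part of the SDK; the per-scenario Map shows one way to key state by scenarioId for parallel runs, as described under maxWorkers.

```typescript
interface Agent {
  respond(
    message: string,
    scenarioId?: string,
  ): Record<string, unknown> | Promise<Record<string, unknown>>;
  reset(scenarioId?: string): void | Promise<void>;
}

class EchoAgent implements Agent {
  // Conversation history keyed by scenario so parallel scenarios don't collide.
  private history = new Map<string, string[]>();

  respond(message: string, scenarioId = "default"): Record<string, unknown> {
    const turns = this.history.get(scenarioId) ?? [];
    turns.push(message);
    this.history.set(scenarioId, turns);
    return {
      text: `You said: ${message}`,
      tool_calls: [{ name: "lookup_order", arguments: { query: message } }],
    };
  }

  reset(scenarioId = "default"): void {
    this.history.delete(scenarioId);
  }
}
```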
Comparator Functions
All comparator functions are standalone and importable from the top-level package.
stripMarkdown
Remove markdown formatting from text.
stripMarkdown(text: string): string
Removes bold/italic markers, headers, bullets, and markdown links. Collapses whitespace.
Example:
stripMarkdown("**Bold** and [link](https://x.com)");
// => "Bold and link"
tokenize
Lowercase, strip markdown and punctuation, split into word tokens.
tokenize(text: string): string[]
Example:
tokenize("Order **ORD-123** shipped!");
// => ["order", "ord123", "shipped"]
fuzzyStrMatch
Check if two strings are semantically close enough to count as matching.
fuzzyStrMatch(a: string, b: string, threshold?: number): boolean
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
a | string | Yes | - | First string |
b | string | Yes | - | Second string |
threshold | number | No | adaptive | Word-overlap threshold. If undefined: 0.35 for <=5 words, 0.40 for <=8, 0.55 otherwise |
Returns: true if the strings match closely enough
Checks in order: exact match after normalization, containment, then word-set overlap.
Example:
fuzzyStrMatch("Customer wants a refund", "customer wants refund"); // true
fuzzyStrMatch("apple banana", "cherry grape"); // false
extractToolArgs
Extract arguments from a tool call object, handling both formats.
extractToolArgs(toolCall: Record<string, unknown>): Record<string, unknown>
Handles { arguments: {...} } (object form) and { arguments_json: "..." } (JSON string form). Prefers the object form if both are present.
Example:
extractToolArgs({ arguments_json: '{"order_id": "ORD-123"}' });
// => { order_id: "ORD-123" }
extractToolArgs({ arguments: { order_id: "ORD-123" } });
// => { order_id: "ORD-123" }
compareToolArgs
Compare expected vs actual tool call arguments.
compareToolArgs(
expected: Record<string, unknown>,
actual: Record<string, unknown>,
): [string, string | null]
Parameters:
| Parameter | Type | Description |
|---|---|---|
expected | Record<string, unknown> | Expected tool call (with arguments or arguments_json) |
actual | Record<string, unknown> | Actual tool call made by the agent |
Returns: A tuple of [matchStatus, divergenceNotes]:
matchStatus:"exact","partial", or"mismatch"divergenceNotes: Human-readable diff summary, ornullif exact
String arguments are compared using fuzzyStrMatch. Non-string values use JSON.stringify equality. Extra arguments in the actual call don't cause divergence.
Example:
const [status, notes] = compareToolArgs(
{ arguments: { order_id: "ORD-123" } },
{ arguments: { order_id: "ORD-123", extra: "field" } },
);
// => ["exact", null]
const [status, notes] = compareToolArgs(
{ arguments: { order_id: "ORD-123", reason: "damaged item" } },
{ arguments: { order_id: "ORD-999", reason: "item was damaged" } },
);
// => ["partial", "'order_id': expected='ORD-123' actual='ORD-999'"]
textSimilarity
Compute similarity between two text strings.
textSimilarity(textA: string, textB: string): number
Returns: A number between 0.0 and 1.0
Uses cosine similarity on word frequency vectors, plus:
- Entity bonus (+0.20): for matching order IDs (ORD-*), refund IDs (REF-*), prices ($*), dates (YYYY-MM-DD), and tracking URLs
- Concept bonus (+0.10): for matching domain concepts (refund/credited, shipped/transit/delivered, stock/available, etc.)
Example:
textSimilarity(
"Your order ORD-123 has shipped and is on the way",
"Order ORD-123 has been shipped and is in transit",
);
// => 0.78
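The cosine-similarity core can be sketched as below. This is an illustrative reimplementation only, without the entity and concept bonuses the SDK layers on top, and with a simplified tokenizer:

```typescript
// Count word occurrences in a lowercased, punctuation-stripped text.
function wordFreq(text: string): Map<string, number> {
  const freq = new Map<string, number>();
  for (const w of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    freq.set(w, (freq.get(w) ?? 0) + 1);
  }
  return freq;
}

// Cosine similarity of the two word-frequency vectors, in [0, 1].
function cosine(a: string, b: string): number {
  const fa = wordFreq(a);
  const fb = wordFreq(b);
  let dot = 0;
  for (const [w, c] of fa) dot += c * (fb.get(w) ?? 0);
  const norm = (f: Map<string, number>) =>
    Math.sqrt([...f.values()].reduce((s, c) => s + c * c, 0));
  const denom = norm(fa) * norm(fb);
  return denom === 0 ? 0 : dot / denom;
}
```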
Data Types
User
interface User {
id?: number;
created_at?: string;
email?: string;
name?: string | null;
tenant?: number;
is_active?: boolean;
}
Tenant
interface Tenant {
id?: number;
created_at?: string;
tenant_name?: string;
is_active?: boolean;
}
Session
interface Session {
status: string;
user: User;
tenant: Tenant;
}
Dataset
interface Dataset {
id?: number;
created_at?: string;
tenant?: number;
creator?: number;
name?: string;
description?: string | null;
dataset_source?: Record<string, unknown>;
}
Run
interface Run {
id?: number;
created_at?: string;
dataset?: number;
tenant?: number;
runner?: number;
result?: Record<string, unknown>;
}
SdkNote
interface SdkNote {
id?: number;
created_at?: string;
updated_at?: string;
title?: string;
content?: string;
category?: string; // "info" | "warning" | "breaking_change" | "best_practice" | "deprecation"
severity?: string; // "info" | "warning" | "critical"
tenant_id?: number | null;
agent_id?: number | null;
active_from?: string;
expires_at?: string | null;
is_archived?: boolean;
note_metadata?: Record<string, unknown>;
}
Request
interface Request {
id?: number;
created_at?: string;
requestor_id?: number;
requestor_tenant?: number;
request_name?: string;
request_status?: string;
request_input_schema?: Record<string, unknown> | null;
request?: Record<string, unknown>;
}
APIKey
interface APIKey {
id?: number;
key?: string; // Only present on creation
key_prefix?: string;
name?: string;
scopes?: string[];
user_id?: number;
tenant_id?: number;
created_at?: string;
last_used_at?: string | null;
expires_at?: string | null;
is_active?: boolean;
}
ToolCall
interface ToolCall {
name?: string;
arguments_json?: string;
}
ExpectedResponse
interface ExpectedResponse {
tool_calls?: ToolCall[];
text?: string;
}
Action
interface Action {
actor?: string; // "user" or "agent"
content?: string;
name?: string;
expected_response?: ExpectedResponse;
}
Scenario
interface Scenario {
title?: string;
actions?: Action[];
}
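A sample scenario (with made-up values) shows how Scenario, Action, ExpectedResponse, and ToolCall nest. Note that the expected tool call carries arguments_json as a JSON string, per the ToolCall interface:

```typescript
// Illustrative scenario object conforming to the interfaces above.
const scenario = {
  title: "Refund request",
  actions: [
    {
      actor: "user",
      content: "I want a refund for order ORD-123",
    },
    {
      actor: "agent",
      expected_response: {
        tool_calls: [
          { name: "issue_refund", arguments_json: '{"order_id": "ORD-123"}' },
        ],
        text: "Your refund for ORD-123 has been issued.",
      },
    },
  ],
};
```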