
API Reference

Complete reference for all classes and methods in the Ashr Labs TypeScript SDK.

AshrLabsClient

The main client class for interacting with the Ashr Labs API.

Constructor

new AshrLabsClient(
  apiKey: string,
  baseUrl?: string,
  timeout?: number,
)

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| apiKey | string | Yes | - | Your API key (must start with tp_) |
| baseUrl | string | No | Production URL | Base URL of the API |
| timeout | number | No | 30 | Request timeout in seconds |

Throws:

  • Error: If the API key format is invalid

Example:

// Minimal — just pass your API key
const client = new AshrLabsClient("tp_your_key_here");

// Custom timeout (60 seconds)
const clientWithTimeout = new AshrLabsClient("tp_your_key_here", undefined, 60);

fromEnv (static method)

Create a client from environment variables.

AshrLabsClient.fromEnv(timeout?: number): AshrLabsClient

Reads ASHR_LABS_API_KEY (required) and ASHR_LABS_BASE_URL (optional) from the environment.

Throws:

  • Error: If ASHR_LABS_API_KEY is not set

Example:

// export ASHR_LABS_API_KEY="tp_your_key_here"
const client = AshrLabsClient.fromEnv();

Session Methods

init

Initialize a session and validate authentication.

async init(): Promise<Record<string, unknown>>

Returns: Session information containing user and tenant data

Throws:

  • AuthenticationError: If the API key is invalid or expired

Example:

// Validate credentials and get user/tenant info
const session = await client.init();

const user = session.user as Record<string, unknown>;
const tenant = session.tenant as Record<string, unknown>;
console.log(`User ID: ${user.id}`);
console.log(`Email: ${user.email}`);
console.log(`Tenant ID: ${tenant.id}`);
console.log(`Tenant Name: ${tenant.tenant_name}`);

Dataset Methods

getDataset

Retrieve a dataset by ID.

async getDataset(
  datasetId: number,
  includeSignedUrls?: boolean,
  urlExpiresSeconds?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | number | Yes | - | The ID of the dataset |
| includeSignedUrls | boolean | No | false | Include signed S3 URLs for media |
| urlExpiresSeconds | number | No | 3600 | URL expiration time in seconds |

Returns: The dataset object

Throws:

  • NotFoundError: Dataset not found
  • AuthorizationError: No access to this dataset

Example:

const dataset = await client.getDataset(42, true, 7200);
console.log(dataset.name);

listDatasets

List datasets for a tenant.

async listDatasets(
  tenantId?: number | null,
  limit?: number,
  offset?: number,
  includeSignedUrls?: boolean,
  urlExpiresSeconds?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| limit | number | No | 50 | Maximum results to return |
| offset | number | No | 0 | Number of results to skip |
| includeSignedUrls | boolean | No | false | Include signed S3 URLs |
| urlExpiresSeconds | number | No | 3600 | URL expiration time |

Returns: Object with keys:

  • status: "ok"
  • datasets: Array of dataset objects

Example:

// tenantId auto-resolved from API key
const response = await client.listDatasets(undefined, 10);
const datasets = response.datasets as Record<string, unknown>[];
for (const dataset of datasets) {
  console.log(`${dataset.id}: ${dataset.name}`);
}

Run Methods

createRun

Create a new test run.

async createRun(
  datasetId: number,
  result: Record<string, unknown>,
  tenantId?: number | null,
  runnerId?: number | null,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | number | Yes | - | The dataset ID |
| result | Record<string, unknown> | Yes | - | Run results (metrics, status, etc.) |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| runnerId | number | No | null | ID of user who ran the test |

Returns: The created run object

Example:

const run = await client.createRun(42, {
  status: "passed",
  score: 0.95,
  metrics: {
    accuracy: 0.98,
    latency_ms: 150,
  },
});

getRun

Retrieve a run by ID.

async getRun(runId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| runId | number | Yes | The run ID |

Returns: The run object

Throws:

  • NotFoundError: Run not found

Example:

const run = await client.getRun(99);
const result = run.result as Record<string, unknown>;
console.log(`Score: ${result.score}`);

listRuns

List runs for a tenant or dataset.

async listRuns(
  datasetId?: number | null,
  tenantId?: number | null,
  limit?: number,
  offset?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | number | No | null | Filter by dataset |
| tenantId | number | No | auto | Filter by tenant (auto-resolved if omitted) |
| limit | number | No | 50 | Maximum results |
| offset | number | No | 0 | Results to skip |

Returns: Object with keys:

  • status: "ok"
  • runs: Array of run objects

Example:

// Get runs for a specific dataset
const response = await client.listRuns(42);
const runs = response.runs as Record<string, unknown>[];
for (const run of runs) {
  const result = run.result as Record<string, unknown>;
  console.log(`Run #${run.id}: ${result.status}`);
}

deleteRun

Delete a test run.

async deleteRun(runId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| runId | number | Yes | The run ID to delete |

Returns: Confirmation of deletion

Throws:

  • NotFoundError: Run not found

Example:

await client.deleteRun(99);
console.log("Run deleted");

Observability — Production Agent Tracing

Trace your agent's production behavior — LLM calls, tool invocations, retrieval steps, guardrail checks, and more. Traces are stored in Postgres and optionally forwarded to Langfuse.

Production-safe: tracing never rejects or interferes with your agent. If the backend is unreachable, trace.end() resolves with an error object instead of rejecting.

client.trace

client.trace(
  name: string,
  opts?: {
    userId?: string;
    sessionId?: string;
    metadata?: Record<string, unknown>;
    tags?: string[];
  },
): Trace

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name for this trace (e.g. "handle-ticket") |
| opts.userId | string | No | End-user ID for grouping |
| opts.sessionId | string | No | Conversation/session ID |
| opts.metadata | Record | No | Arbitrary metadata |
| opts.tags | string[] | No | Tags for filtering |

Trace methods

| Method | Description |
| --- | --- |
| trace.span(name, opts?) | Create a top-level span |
| trace.generation(name, opts?) | Create a top-level generation (LLM call) |
| trace.event(name, opts?) | Record a point-in-time event |
| trace.wrap(fn) | Run callback, auto-flush on completion |
| await trace.end(opts?) | Flush the trace to the backend. Never rejects. |
| trace.traceId | Server-assigned trace ID (available after end()) |

Span methods

| Method | Description |
| --- | --- |
| span.span(name, opts?) | Create a child span |
| span.generation(name, opts?) | Create a child generation |
| span.event(name, opts?) | Record an event under this span |
| span.wrap(fn) | Run callback, auto-end on completion |
| span.end(opts?) | Mark the span as complete |

If the wrap() callback throws, the span auto-ends with level: "ERROR" and the exception message captured in statusMessage.

Generation methods

Inherits all Span methods, plus:

| Method | Description |
| --- | --- |
| gen.end(opts?) | Mark complete. Accepts output, usage: { input_tokens, output_tokens }, statusMessage, level. |

wrap() ensures spans/traces are always ended, even if your code throws:

await trace.wrap(async (t) => {
  await t.span("tool:search", { input: { q: "..." } }).wrap(async (s) => {
    const data = await search(...);
    s.end({ output: data });
    return data;
  });

  const gen = t.generation("respond", { model: "claude-sonnet-4-6" });
  const response = await callLlm(...);
  gen.end({ output: response, usage: { input_tokens: 100, output_tokens: 50 } });
});
// trace.end() called automatically

Manual instrumentation

const trace = client.trace("support-chat", { userId: "user_42", sessionId: "conv_abc" });

const gen = trace.generation("classify", {
  model: "claude-sonnet-4-6",
  input: [{ role: "user", content: "Reset my password" }],
});
gen.end({
  output: { intent: "password_reset" },
  usage: { input_tokens: 50, output_tokens: 12 },
});

const tool = trace.span("tool:reset_password", { input: { user_id: "42" } });
tool.end({ output: { success: true } });

trace.event("guardrail-check", { input: { passed: true } });

const result = await trace.end({ output: { resolution: "password_reset_complete" } });
console.log(trace.traceId);

listObservabilityTraces

await client.listObservabilityTraces(opts?: {
  userId?: string;
  sessionId?: string;
  limit?: number;
  page?: number;
}): Promise<{ status: string; traces: object[]; total: number }>

getObservabilityTrace

await client.getObservabilityTrace(traceId: string): Promise<{ status: string; trace: object }>

getObservabilityAnalytics

await client.getObservabilityAnalytics(days?: number): Promise<{
  status: string;
  overview: { total_traces, avg_latency_ms, total_input_tokens, total_output_tokens, error_rate, total_tool_calls, ... };
  tool_performance: { tool_name, total_calls, error_rate, avg_latency_ms }[];
  model_usage: { model, total_calls, total_tokens, avg_latency_ms }[];
}>
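
The analytics payload lends itself to simple client-side post-processing. The snippet below sketches a few derived numbers from a response shaped like the documented return value; all field values are invented for illustration.

```typescript
// Illustrative only: a sample object shaped like the documented
// getObservabilityAnalytics() return value (values are made up).
const analytics = {
  status: "ok",
  overview: {
    total_traces: 1200,
    avg_latency_ms: 840,
    total_input_tokens: 450_000,
    total_output_tokens: 120_000,
    error_rate: 0.025,
    total_tool_calls: 3100,
  },
  tool_performance: [
    { tool_name: "lookup_order", total_calls: 2000, error_rate: 0.01, avg_latency_ms: 120 },
  ],
  model_usage: [
    { model: "claude-sonnet-4-6", total_calls: 1200, total_tokens: 570_000, avg_latency_ms: 900 },
  ],
};

// Derive a few headline numbers from the overview.
const totalTokens =
  analytics.overview.total_input_tokens + analytics.overview.total_output_tokens;
const errorPct = (analytics.overview.error_rate * 100).toFixed(1);
const slowTools = analytics.tool_performance.filter((t) => t.avg_latency_ms > 100);
```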

getObservabilityErrors / getObservabilityToolErrors

await client.getObservabilityErrors(opts?: { days?: number; limit?: number; page?: number })
await client.getObservabilityToolErrors(opts?: { days?: number; limit?: number; page?: number })

SDK Notes — Platform Advisories

SDK Notes are platform advisories delivered to your SDK from Ashr Labs. They communicate context changes, best practices, deprecations, or breaking changes that may affect how you configure or run your agent.

Notes are automatically fetched when the client initializes (via init()). You can also refresh them on demand.

client.notes (getter)

Get cached SDK notes from the last init() or getNotes() call. No network request is made.

get notes(): Record<string, unknown>[]

Returns: List of active notes for your tenant.

Example:

const client = new AshrLabsClient("tp_...");
await client.init();

// Notes are auto-fetched on init
for (const note of client.notes) {
console.log(`[${note.severity}] ${note.title}: ${note.content}`);
}

getNotes

Fetch fresh SDK notes from the platform. Updates the cached client.notes.

async getNotes(agentId?: number | null): Promise<Record<string, unknown>[]>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| agentId | number \| null | No | undefined | Include notes targeted at this specific agent |

Returns: List of active notes (global + tenant-specific, plus agent-specific if agentId is provided).

Example:

// Refresh notes
const notes = await client.getNotes();

// Filter by agent
const agentNotes = await client.getNotes(42);

// Check for breaking changes
const breaking = notes.filter(n => n.category === "breaking_change");
if (breaking.length) {
  console.log("Warning: Breaking changes detected:");
  for (const n of breaking) {
    console.log(`  ${n.title}: ${n.content}`);
  }
}

Note categories: info, warning, breaking_change, best_practice, deprecation

Severity levels: info, warning, critical


Request Methods

createRequest

Create a dataset generation request.

async createRequest(
  requestName: string,
  request: Record<string, unknown>,
  requestInputSchema?: Record<string, unknown> | null,
  tenantId?: number | null,
  requestorId?: number | null,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| requestName | string | Yes | - | Name/title for the request |
| request | Record<string, unknown> | Yes | - | The generation config (see below) |
| requestInputSchema | Record<string, unknown> | No | auto | JSON Schema for validating the request. A permissive default is sent if omitted. If your agent has tools, include them here under the "tools" key so they're auto-saved as skill templates. |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| requestorId | number | No | auto | ID of requesting user (auto-resolved if omitted) |

Returns: The created request object

Generation config structure (the request object):

{
  metadata: {
    dataset_name: "My Eval Dataset",
    description: "Description of what this dataset tests",
  },
  agent: {
    name: "My Agent",
    description: "What the agent does",
    system_prompt: "You are a helpful assistant...",
    tools: [
      {
        name: "tool_name",
        description: "What the tool does",
        parameters: {
          type: "object",
          properties: { arg: { type: "string" } },
          required: ["arg"],
        },
      },
    ],
    accepted_inputs: { text: true, audio: false, file: false, image: false, video: false },
    output_format: { type: "text" },
  },
  context: {
    domain: "ecommerce",
    use_case: "Customers asking about orders",
    scenario_context: "An online retail store",
  },
  test_config: {
    num_variations: 3, // how many test scenarios to generate
    coverage: {
      happy_path: true,
      edge_cases: true,
      error_handling: true,
      multi_turn: true,
    },
  },
  generation_options: {
    generate_audio: false, // set true to generate audio files
    generate_files: false,
    generate_simulations: false,
  },
}

Validation: The request config is validated before sending:

  • config.agent must be a non-empty object with at least one of: name, description, system_prompt
  • config.context must be a non-empty object with at least one of: domain, use_case, scenario_context

Example:

const req = await client.createRequest(
  "Support Agent Eval",
  {
    metadata: { dataset_name: "Support Eval" },
    agent: {
      name: "Support Bot",
      description: "Answers customer questions",
      system_prompt: "You are a helpful support agent.",
      tools: [{
        name: "lookup_order",
        description: "Look up an order",
        parameters: { type: "object", properties: { order_id: { type: "string" } }, required: ["order_id"] },
      }],
      accepted_inputs: { text: true, audio: false, file: false, image: false, video: false },
      output_format: { type: "text" },
    },
    context: { domain: "ecommerce", use_case: "Customers asking about orders", scenario_context: "Online store" },
    test_config: { num_variations: 3, coverage: { happy_path: true, edge_cases: true } },
    generation_options: { generate_audio: false, generate_files: false, generate_simulations: false },
  },
);

// Poll for completion
const completed = await client.waitForRequest(req.id as number);
console.log(`Status: ${completed.request_status}`);

getRequest

Retrieve a request by ID.

async getRequest(requestId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | number | Yes | The request ID |

Returns: The request object

Throws:

  • NotFoundError: Request not found

Example:

const req = await client.getRequest(123);
console.log(`Status: ${req.request_status}`);

listRequests

List requests for a tenant.

async listRequests(
  tenantId?: number | null,
  status?: string | null,
  limit?: number,
  offset?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| status | string | No | null | Filter by status |
| limit | number | No | 50 | Maximum results |
| offset | number | No | 0 | Results to skip |

Returns: Object with keys:

  • status: "ok"
  • requests: Array of request objects

Example:

// Get pending requests
const response = await client.listRequests(undefined, "pending");
const requests = response.requests as Record<string, unknown>[];
for (const req of requests) {
  console.log(`Request #${req.id}: ${req.request_name}`);
}

API Key Methods

listApiKeys

List API keys for your tenant.

async listApiKeys(
  includeInactive?: boolean,
): Promise<Record<string, unknown>[]>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| includeInactive | boolean | No | false | Include revoked keys |

Returns: Array of API key objects

Note: For security, only the key prefix is returned, not the full key.

Example:

const keys = await client.listApiKeys();
for (const key of keys) {
  console.log(`${key.key_prefix}... - ${key.name}`);
}

revokeApiKey

Revoke an API key.

async revokeApiKey(apiKeyId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKeyId | number | Yes | The API key ID to revoke |

Returns: Confirmation of revocation

Throws:

  • NotFoundError: API key not found

Example:

await client.revokeApiKey(123);
console.log("API key revoked");

Convenience Methods

waitForRequest

Block until a request reaches a terminal state (completed or failed).

async waitForRequest(
  requestId: number,
  timeout?: number,
  pollInterval?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| requestId | number | Yes | - | The request ID to poll |
| timeout | number | No | 600 | Maximum seconds to wait |
| pollInterval | number | No | 5 | Seconds between polls |

Returns: The final request object

Throws:

  • Error: If the request doesn't finish within timeout seconds
  • AshrLabsError: If the request fails

Example:

const req = await client.createRequest("My Eval", config);
const completed = await client.waitForRequest(req.id as number, 300);
console.log(`Status: ${completed.request_status}`);
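
The polling behavior can be pictured as a simple loop: fetch the request, stop on a terminal status, otherwise wait and retry. The sketch below is a simplified, synchronous version of that logic with the fetcher injected so it is easy to test; the actual SDK method awaits getRequest() and sleeps pollInterval seconds between attempts.

```typescript
// Simplified sketch of waitForRequest's polling logic: repeatedly check a
// status function until a terminal state or the attempt budget runs out.
// The status values and attempt-based budget are illustrative assumptions;
// the real method polls getRequest() on a time budget.
type RequestStatus = "pending" | "processing" | "completed" | "failed";

function pollUntilTerminal(
  getStatus: () => RequestStatus,
  maxAttempts: number,
): RequestStatus {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = getStatus();
    if (status === "completed" || status === "failed") return status;
  }
  throw new Error(`Request did not finish within ${maxAttempts} attempts`);
}

// Fake fetcher: pending twice, then completed.
const statuses: RequestStatus[] = ["pending", "pending", "completed"];
let i = 0;
const finalStatus = pollUntilTerminal(() => statuses[i++], 10);
```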

generateDataset

Create a dataset generation request, wait for completion, and fetch the result. Combines createRequest + waitForRequest + getDataset into one call.

async generateDataset(
  requestName: string,
  config: Record<string, unknown>,
  requestInputSchema?: Record<string, unknown> | null,
  timeout?: number,
  pollInterval?: number,
): Promise<[number, Record<string, unknown>]>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| requestName | string | Yes | - | Name/title for the request |
| config | Record<string, unknown> | Yes | - | The generation config (same as createRequest's request parameter) |
| requestInputSchema | Record<string, unknown> | No | auto | Optional JSON Schema for validation |
| timeout | number | No | 600 | Maximum seconds to wait |
| pollInterval | number | No | 5 | Seconds between polls |

Returns: A tuple of [datasetId, datasetSource] where datasetSource is the object containing "runs".

Throws:

  • Error: If generation doesn't finish in time
  • AshrLabsError: If generation fails or no datasets are found

Example:

const [datasetId, source] = await client.generateDataset(
  "Support Agent Eval",
  {
    metadata: { dataset_name: "Support Eval" },
    agent: { /* ... */ },
    context: { /* ... */ },
    test_config: { num_variations: 10, coverage: { happy_path: true } },
    generation_options: { generate_audio: false, generate_files: false, generate_simulations: false },
  },
);
const runs = (source.runs ?? {}) as Record<string, unknown>;
console.log(`Dataset #${datasetId}: ${Object.keys(runs).length} scenarios`);

Utility Methods

healthCheck

Check if the API is reachable.

async healthCheck(): Promise<Record<string, unknown>>

Returns: Status information

Example:

const status = await client.healthCheck();
console.log(`API Status: ${status.status}`);

toString

Get a string representation of the client.

toString(): string

Example:

console.log(client.toString());
// => AshrLabsClient(baseUrl='https://api.ashr.io/testing-platform-api', apiKey='tp_abc12...')

RunBuilder

A builder for incrementally constructing run result objects as an agent executes tests. Once complete, the result can be deployed via the client.

Constructor

new RunBuilder()

No parameters. Creates a run in "pending" status.


RunBuilder.start

Mark the run as started. Records the current timestamp.

run.start(): this

Returns: this (for chaining)


RunBuilder.addTest

Create and register a new test within this run.

run.addTest(testId: string): TestBuilder

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| testId | string | Yes | Unique identifier for the test case |

Returns: TestBuilder — A builder for the individual test


RunBuilder.complete

Mark the run as completed. Records the current timestamp.

run.complete(status?: string): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| status | string | No | "completed" | Final status ("completed" or "failed") |

Returns: this (for chaining)


RunBuilder.build

Serialize the full run result to an object.

run.build(): Record<string, unknown>

Returns: An object matching the run result schema, ready to be passed to client.createRun(datasetId, result). Aggregate metrics are computed automatically from action results.
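
As a mental model for the automatic aggregation, the sketch below computes a tool-call match rate from a list of action results. This is hypothetical: the field names (type, match_status) follow the action shapes documented in this reference, but the SDK's actual metric names and formulas may differ.

```typescript
// Hypothetical sketch of aggregating tool-call match statuses the way
// RunBuilder.build() might; the SDK's real metric names may differ.
interface ActionResult {
  type: string;
  match_status?: string;
}

function aggregate(actions: ActionResult[]) {
  const toolCalls = actions.filter((a) => a.type === "tool_call");
  const exact = toolCalls.filter((a) => a.match_status === "exact").length;
  return {
    total_tool_calls: toolCalls.length,
    exact_matches: exact,
    exact_match_rate: toolCalls.length ? exact / toolCalls.length : 0,
  };
}

const metrics = aggregate([
  { type: "user_text" },
  { type: "tool_call", match_status: "exact" },
  { type: "tool_call", match_status: "mismatch" },
]);
```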


RunBuilder.deploy

Build the result and submit it as a new run via the API.

run.deploy(
  client: AshrLabsClient,
  datasetId: number,
  tenantId?: number,
  runnerId?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| client | AshrLabsClient | Yes | - | An authenticated client instance |
| datasetId | number | Yes | - | The dataset this run is for |
| tenantId | number | No | auto | The tenant (auto-resolved if omitted) |
| runnerId | number | No | undefined | ID of the user who ran the test |

Returns: The created run object from the API

Example:

import { AshrLabsClient, RunBuilder } from "ashr-labs";

const client = new AshrLabsClient("tp_...");

const run = new RunBuilder();
run.start();

const test = run.addTest("bank_analysis");
test.start();
test.addUserText("Analyze this", "User prompt");
test.addToolCall(
  { name: "analyze", arguments: { data: "input" } },
  { name: "analyze", arguments: { data: "input" } },
  "exact",
);
test.complete();

run.complete();
const createdRun = await run.deploy(client, 42);
console.log(`Run #${createdRun.id} created`);

TestBuilder

Builds a single test result incrementally. Returned by RunBuilder.addTest().

TestBuilder.start

Mark the test as started. Records the current timestamp.

test.start(): this

Returns: this (for chaining)


TestBuilder.addUserFile

Record a user file input action.

test.addUserFile(
  filePath: string,
  description: string,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| filePath | string | Yes | - | Path to the file in the dataset |
| description | string | Yes | - | Description of the action |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.addUserText

Record a user text input action.

test.addUserText(
  text: string,
  description: string,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | - | The user's text input |
| description | string | Yes | - | Description of the action |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.addToolCall

Record an agent tool call action with expected vs actual comparison.

test.addToolCall(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
  matchStatus: string,
  divergenceNotes?: string | null,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| expected | Record<string, unknown> | Yes | - | Expected tool call (name, arguments) |
| actual | Record<string, unknown> | Yes | - | Actual tool call made by the agent |
| matchStatus | string | Yes | - | "exact", "partial", or "mismatch" |
| divergenceNotes | string | No | null | Notes explaining the divergence |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.addAgentResponse

Record an agent text response with expected vs actual comparison.

test.addAgentResponse(
  expectedResponse: Record<string, unknown>,
  actualResponse: Record<string, unknown>,
  matchStatus: string,
  semanticSimilarity?: number | null,
  divergenceNotes?: string | null,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| expectedResponse | Record<string, unknown> | Yes | - | The expected response content |
| actualResponse | Record<string, unknown> | Yes | - | The actual response from the agent |
| matchStatus | string | Yes | - | "exact", "similar", or "divergent" |
| semanticSimilarity | number | No | null | Similarity score (0.0 to 1.0) |
| divergenceNotes | string | No | null | Notes explaining the divergence |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.setVmStream

Attach VM session logs to this test. For agents that operate in a browser or virtual machine.

test.setVmStream(
  provider: string,
  opts?: {
    sessionId?: string;
    durationMs?: number;
    logs?: Record<string, unknown>[];
    metadata?: Record<string, unknown>;
  },
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| provider | string | Yes | - | VM provider name (e.g. "kernel", "browserbase", "steel") |
| opts.sessionId | string | No | - | Provider session ID for linking |
| opts.durationMs | number | No | - | Total session duration in milliseconds |
| opts.logs | Record[] | No | - | Timestamped log entries (see below) |
| opts.metadata | Record | No | - | Additional provider-specific metadata |

Log entry format: Each entry should have ts (number, ms offset from start) and type (string):

{ ts: 0, type: "navigation", data: { url: "https://..." } }
{ ts: 1200, type: "action", data: { action: "click", selector: "#btn" } }
{ ts: 3000, type: "error", data: { message: "Element not found" } }

Example:

test.setVmStream("browserbase", {
  sessionId: "sess_abc123",
  durationMs: 12000,
  logs: [
    { ts: 0, type: "navigation", data: { url: "https://app.example.com" } },
    { ts: 2000, type: "action", data: { action: "click", selector: "#submit" } },
    { ts: 5000, type: "network", data: { method: "POST", url: "/api/order", status: 201 } },
  ],
});

Returns: this (for chaining)


TestBuilder.setKernelVm

Convenience method for attaching a Kernel browser session. Sets provider="kernel" and exposes Kernel-specific metadata fields. Fields map to Kernel's browser API response.

test.setKernelVm(
  sessionId: string,
  opts?: {
    durationMs?: number;
    logs?: Record<string, unknown>[];
    liveViewUrl?: string;
    cdpWsUrl?: string;
    replayId?: string;
    replayViewUrl?: string;
    headless?: boolean;
    stealth?: boolean;
    viewport?: { width: number; height: number };
  },
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| sessionId | string | Yes | - | Kernel browser session ID |
| opts.durationMs | number | No | - | Total session duration in milliseconds |
| opts.logs | Record[] | No | - | Timestamped log entries (same format as setVmStream) |
| opts.liveViewUrl | string | No | - | Remote live-view URL (browser_live_view_url) |
| opts.cdpWsUrl | string | No | - | Chrome DevTools Protocol WebSocket URL |
| opts.replayId | string | No | - | ID of the session recording |
| opts.replayViewUrl | string | No | - | URL to view the session replay |
| opts.headless | boolean | No | - | Whether the session ran in headless mode |
| opts.stealth | boolean | No | - | Whether anti-bot stealth mode was enabled |
| opts.viewport | object | No | - | Browser viewport, e.g. { width: 1920, height: 1080 } |

Example:

test.setKernelVm("kern_sess_abc123", {
  durationMs: 15000,
  logs: [
    { ts: 0, type: "navigation", data: { url: "https://app.example.com" } },
    { ts: 1200, type: "action", data: { action: "click", selector: "#login" } },
    { ts: 3000, type: "screenshot", data: { s3_key: "vm-streams/.../frame.png" } },
  ],
  replayId: "replay_abc123",
  replayViewUrl: "https://www.kernel.sh/replays/replay_abc123",
  stealth: true,
  viewport: { width: 1920, height: 1080 },
});

Returns: this (for chaining)


TestBuilder.complete

Mark the test as completed. Records the current timestamp.

test.complete(status?: string): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| status | string | No | "completed" | Final status ("completed" or "failed") |

Returns: this (for chaining)


TestBuilder.build

Serialize this test to an object matching the run result schema.

test.build(): Record<string, unknown>

Returns: An object with test_id, status, action_results, started_at, and completed_at.


EvalRunner

Runs an agent against every scenario in a dataset and records results. This is the high-level API that encapsulates the full eval loop — iterating scenarios, calling the agent, comparing tool calls and text, and producing a RunBuilder.

Constructor

new EvalRunner(
  datasetSource: Record<string, unknown>,
  options?: {
    toolComparator?: ToolComparator;
    textComparator?: TextComparator;
    similarityThresholds?: { exact?: number; similar?: number };
  },
)

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetSource | Record<string, unknown> | Yes | - | The dataset_source object from a dataset (contains "runs") |
| options.toolComparator | ToolComparator | No | compareToolArgs | Custom (expected, actual) => [status, notes] function |
| options.textComparator | TextComparator | No | textSimilarity | Custom (textA, textB) => number function |
| options.similarityThresholds | object | No | { exact: 0.70, similar: 0.40 } | Score thresholds for match status |

Type aliases:

type ToolComparator = (
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
) => [string, string | null];

type TextComparator = (a: string, b: string) => number;
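
Under the documented defaults ({ exact: 0.70, similar: 0.40 }), a text-similarity score presumably maps to a match status like this. This is an illustrative sketch of the threshold logic, not the SDK's actual code:

```typescript
// Illustrative sketch: map a similarity score to a match status using
// similarityThresholds. Not the SDK's actual implementation.
function matchStatus(
  score: number,
  thresholds = { exact: 0.70, similar: 0.40 },
): "exact" | "similar" | "divergent" {
  if (score >= thresholds.exact) return "exact";
  if (score >= thresholds.similar) return "similar";
  return "divergent";
}

const a = matchStatus(0.9);
const b = matchStatus(0.5);
const c = matchStatus(0.6, { exact: 0.85, similar: 0.65 }); // stricter thresholds
```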

Example:

import { EvalRunner } from "ashr-labs";

const runner = new EvalRunner(source);

// With custom thresholds
const strictRunner = new EvalRunner(source, {
  similarityThresholds: { exact: 0.85, similar: 0.50 },
});

EvalRunner.fromDataset (static method)

Create an EvalRunner by fetching a dataset from the API.

static async EvalRunner.fromDataset(
  client: AshrLabsClient,
  datasetId: number,
  options?: {
    toolComparator?: ToolComparator;
    textComparator?: TextComparator;
    similarityThresholds?: { exact?: number; similar?: number };
  },
): Promise<EvalRunner>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| client | AshrLabsClient | Yes | An authenticated client |
| datasetId | number | Yes | The dataset ID to fetch |
| options | object | No | Passed to EvalRunner constructor |

Returns: EvalRunner — A configured runner ready to call .run()

Example:

const runner = await EvalRunner.fromDataset(client, 322);

EvalRunner.run

Run the agent against every scenario and return a populated RunBuilder.

async runner.run(
  agent: Agent | (() => Agent),
  options?: {
    onScenario?: OnScenarioCallback;
    onAction?: OnActionCallback;
    maxWorkers?: number;
  },
): Promise<RunBuilder>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| agent | Agent \| (() => Agent) | Yes | - | An object implementing the Agent interface, or a factory function |
| options.onScenario | OnScenarioCallback | No | undefined | Called at the start of each scenario: (scenarioId, scenarioDict) |
| options.onAction | OnActionCallback | No | undefined | Called for each action: (actionIndex, actionDict) |
| options.maxWorkers | number | No | 1 | Number of scenarios to run in parallel. When >1, scenarioId is passed to respond() and reset() so the agent can key state per scenario. |

Type aliases:

type OnScenarioCallback = (scenarioId: string, scenario: Record<string, unknown>) => void;
type OnActionCallback = (actionIndex: number, action: Record<string, unknown>) => void;

Returns: RunBuilder — A populated builder ready for .build() or .deploy()

Example:

// Sequential (default)
const run = await runner.run(agent);
const result = run.build();
console.log(result.aggregate_metrics);

// Parallel — run 4 scenarios at a time
const parallelRun = await runner.run(agent, { maxWorkers: 4 });

// With factory function
const factoryRun = await runner.run(() => new MyAgent(), { maxWorkers: 4 });

EvalRunner.runAndDeploy

Run the eval and submit results in one call.

async runner.runAndDeploy(
  agent: Agent | (() => Agent),
  client: AshrLabsClient,
  datasetId?: number,
  options?: {
    onScenario?: OnScenarioCallback;
    onAction?: OnActionCallback;
    maxWorkers?: number;
    tenantId?: number;
    runnerId?: number;
  },
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| agent | Agent \| (() => Agent) | Yes | - | An object implementing the Agent interface, or a factory function |
| client | AshrLabsClient | Yes | - | An authenticated client |
| datasetId | number | No | undefined | The dataset to submit against |
| options.onScenario | OnScenarioCallback | No | undefined | Callback per scenario |
| options.onAction | OnActionCallback | No | undefined | Callback per action |
| options.maxWorkers | number | No | 1 | Number of scenarios to run in parallel (default sequential) |
| options.tenantId | number | No | auto | The tenant (auto-resolved if omitted) |
| options.runnerId | number | No | undefined | ID of the user who ran the test |

Returns: The created run object from the API

Example:

// Sequential
const created = await runner.runAndDeploy(agent, client, 322);
console.log(`Run #${created.id} submitted`);

// Parallel
const parallelCreated = await runner.runAndDeploy(agent, client, 322, { maxWorkers: 4 });

Agent Interface

An interface that defines the contract agents must implement.

interface Agent {
  respond(
    message: string,
    scenarioId?: string,
  ): Record<string, unknown> | Promise<Record<string, unknown>>;

  reset(scenarioId?: string): void | Promise<void>;
}

respond

Process a user message and return the agent's response.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| message | string | The user's message text |
| scenarioId | string | Optional scenario ID (passed during parallel execution) |

Returns: An object (or Promise of an object) with:

  • "text" (string): The agent's text response
  • "tool_calls" (Array): Tool calls made during this turn, each with "name" (string) and "arguments" (object) keys

arguments vs arguments_json: The Agent interface returns tool arguments as an object under the "arguments" key. However, RunBuilder and the API store them as a JSON string under "arguments_json". EvalRunner handles this conversion automatically. If you use RunBuilder directly, pass "arguments_json" (a JSON string) to addToolCall(). The extractToolArgs() helper accepts both formats, so comparators work either way.
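
The conversion between the two forms is plain JSON serialization. For instance, with a hypothetical tool call:

```typescript
// Converting between the Agent-interface form ({ arguments: {...} }) and
// the stored form ({ arguments_json: "..." }) is plain JSON serialization.
// The tool call below is a made-up example.
const agentForm = { name: "lookup_order", arguments: { order_id: "ORD-123" } };

const storedForm = {
  name: agentForm.name,
  arguments_json: JSON.stringify(agentForm.arguments),
};

const roundTripped = JSON.parse(storedForm.arguments_json) as Record<string, unknown>;
```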

reset

Clear conversation state for a new scenario. Called before each scenario begins.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| scenarioId | string | Optional scenario ID (passed during parallel execution) |
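Putting respond and reset together, a minimal agent might look like the sketch below. The EchoAgent class, its canned reply, and the per-scenario history map are illustrative, not part of the SDK:

```typescript
// Minimal Agent implementation with per-scenario state (illustrative).
class EchoAgent {
  private history = new Map<string, string[]>();

  respond(message: string, scenarioId: string = "default"): Record<string, unknown> {
    // Track the conversation for this scenario.
    const turns = this.history.get(scenarioId) ?? [];
    turns.push(message);
    this.history.set(scenarioId, turns);

    // Return the shape the runner expects: text plus any tool calls.
    return { text: `You said: ${message}`, tool_calls: [] };
  }

  reset(scenarioId: string = "default"): void {
    // Drop state so the next scenario starts fresh.
    this.history.delete(scenarioId);
  }
}
```

Because scenarioId is passed during parallel execution, keying state by it keeps concurrent scenarios isolated.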

Comparator Functions

All comparator functions are standalone and importable from the top-level package.

stripMarkdown

Remove markdown formatting from text.

stripMarkdown(text: string): string

Removes bold/italic markers, headers, bullets, and markdown links. Collapses whitespace.

Example:

stripMarkdown("**Bold** and [link](https://x.com)");
// => "Bold and link"

tokenize

Lowercase the text, strip markdown and punctuation, and split it into word tokens.

tokenize(text: string): string[]

Example:

tokenize("Order **ORD-123** shipped!");
// => ["order", "ord123", "shipped"]

fuzzyStrMatch

Check if two strings are semantically close enough to count as matching.

fuzzyStrMatch(a: string, b: string, threshold?: number): boolean

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| a | string | Yes | - | First string |
| b | string | Yes | - | Second string |
| threshold | number | No | adaptive | Word-overlap threshold. If undefined: 0.35 for <=5 words, 0.40 for <=8, 0.55 otherwise |

Returns: true if the strings match closely enough

Checks in order: exact match after normalization, containment, then word-set overlap.

Example:

fuzzyStrMatch("Customer wants a refund", "customer wants refund");  // true
fuzzyStrMatch("apple banana", "cherry grape"); // false

extractToolArgs

Extract arguments from a tool call object, handling both formats.

extractToolArgs(toolCall: Record<string, unknown>): Record<string, unknown>

Handles { arguments: {...} } (object form) and { arguments_json: "..." } (JSON string form). Prefers the object form if both are present.

Example:

extractToolArgs({ arguments_json: '{"order_id": "ORD-123"}' });
// => { order_id: "ORD-123" }

extractToolArgs({ arguments: { order_id: "ORD-123" } });
// => { order_id: "ORD-123" }

compareToolArgs

Compare expected vs actual tool call arguments.

compareToolArgs(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
): [string, string | null]

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| expected | Record<string, unknown> | Expected tool call (with arguments or arguments_json) |
| actual | Record<string, unknown> | Actual tool call made by the agent |

Returns: A tuple of [matchStatus, divergenceNotes]:

  • matchStatus: "exact", "partial", or "mismatch"
  • divergenceNotes: Human-readable diff summary, or null if exact

String arguments are compared using fuzzyStrMatch. Non-string values use JSON.stringify equality. Extra arguments in the actual call don't cause divergence.

Example:

const [status, notes] = compareToolArgs(
  { arguments: { order_id: "ORD-123" } },
  { arguments: { order_id: "ORD-123", extra: "field" } },
);
// => ["exact", null]

const [status2, notes2] = compareToolArgs(
  { arguments: { order_id: "ORD-123", reason: "damaged item" } },
  { arguments: { order_id: "ORD-999", reason: "item was damaged" } },
);
// => ["partial", "'order_id': expected='ORD-123' actual='ORD-999'"]

textSimilarity

Compute similarity between two text strings.

textSimilarity(textA: string, textB: string): number

Returns: A number between 0.0 and 1.0

Uses cosine similarity on word frequency vectors, plus:

  • Entity bonus (+0.20): for matching order IDs (ORD-*), refund IDs (REF-*), prices ($*), dates (YYYY-MM-DD), and tracking URLs
  • Concept bonus (+0.10): for matching domain concepts (refund/credited, shipped/transit/delivered, stock/available, etc.)

Example:

textSimilarity(
  "Your order ORD-123 has shipped and is on the way",
  "Order ORD-123 has been shipped and is in transit",
);
// => 0.78

Data Types

User

interface User {
  id?: number;
  created_at?: string;
  email?: string;
  name?: string | null;
  tenant?: number;
  is_active?: boolean;
}

Tenant

interface Tenant {
  id?: number;
  created_at?: string;
  tenant_name?: string;
  is_active?: boolean;
}

Session

interface Session {
  status: string;
  user: User;
  tenant: Tenant;
}

Dataset

interface Dataset {
  id?: number;
  created_at?: string;
  tenant?: number;
  creator?: number;
  name?: string;
  description?: string | null;
  dataset_source?: Record<string, unknown>;
}

Run

interface Run {
  id?: number;
  created_at?: string;
  dataset?: number;
  tenant?: number;
  runner?: number;
  result?: Record<string, unknown>;
}

SdkNote

interface SdkNote {
  id?: number;
  created_at?: string;
  updated_at?: string;
  title?: string;
  content?: string;
  category?: string; // "info" | "warning" | "breaking_change" | "best_practice" | "deprecation"
  severity?: string; // "info" | "warning" | "critical"
  tenant_id?: number | null;
  agent_id?: number | null;
  active_from?: string;
  expires_at?: string | null;
  is_archived?: boolean;
  note_metadata?: Record<string, unknown>;
}

Request

interface Request {
  id?: number;
  created_at?: string;
  requestor_id?: number;
  requestor_tenant?: number;
  request_name?: string;
  request_status?: string;
  request_input_schema?: Record<string, unknown> | null;
  request?: Record<string, unknown>;
}

APIKey

interface APIKey {
  id?: number;
  key?: string; // Only present on creation
  key_prefix?: string;
  name?: string;
  scopes?: string[];
  user_id?: number;
  tenant_id?: number;
  created_at?: string;
  last_used_at?: string | null;
  expires_at?: string | null;
  is_active?: boolean;
}

ToolCall

interface ToolCall {
  name?: string;
  arguments_json?: string;
}

ExpectedResponse

interface ExpectedResponse {
  tool_calls?: ToolCall[];
  text?: string;
}

Action

interface Action {
  actor?: string; // "user" or "agent"
  content?: string;
  name?: string;
  expected_response?: ExpectedResponse;
}

Scenario

interface Scenario {
  title?: string;
  actions?: Action[];
}
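To show how these types fit together, here is a hypothetical object matching the Scenario, Action, ExpectedResponse, and ToolCall shapes above (the title, messages, and tool name are all made up):

```typescript
// A scenario with one user turn and the agent response we expect back.
const scenario = {
  title: "Refund request",
  actions: [
    { actor: "user", content: "I want a refund for ORD-123" },
    {
      actor: "agent",
      expected_response: {
        text: "I've started the refund for order ORD-123.",
        tool_calls: [
          // arguments_json is a JSON string, per the ToolCall interface.
          { name: "create_refund", arguments_json: '{"order_id":"ORD-123"}' },
        ],
      },
    },
  ],
};
```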