
API Reference

Complete reference for all classes and methods in the Ashr Labs TypeScript SDK.

AshrLabsClient

The main client class for interacting with the Ashr Labs API.

Constructor

new AshrLabsClient(
  apiKey: string,
  baseUrl?: string,
  timeout?: number,
)

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| apiKey | string | Yes | - | Your API key (must start with tp_) |
| baseUrl | string | No | Production URL | Base URL of the API |
| timeout | number | No | 30 | Request timeout in seconds |

Throws:

  • Error: If the API key format is invalid

Example:

// Minimal — just pass your API key
const client = new AshrLabsClient("tp_your_key_here");

// Custom timeout (60 seconds)
const clientWithTimeout = new AshrLabsClient("tp_your_key_here", undefined, 60);

fromEnv (static method)

Create a client from environment variables.

AshrLabsClient.fromEnv(timeout?: number): AshrLabsClient

Reads ASHR_LABS_API_KEY (required) and ASHR_LABS_BASE_URL (optional) from the environment.

Throws:

  • Error: If ASHR_LABS_API_KEY is not set

Example:

// export ASHR_LABS_API_KEY="tp_your_key_here"
const client = AshrLabsClient.fromEnv();

Session Methods

init

Initialize a session and validate authentication.

async init(): Promise<Record<string, unknown>>

Returns: Session information containing user and tenant data

Throws:

  • AuthenticationError: If the API key is invalid or expired

Example:

// Validate credentials and get user/tenant info
const session = await client.init();

const user = session.user as Record<string, unknown>;
const tenant = session.tenant as Record<string, unknown>;
console.log(`User ID: ${user.id}`);
console.log(`Email: ${user.email}`);
console.log(`Tenant ID: ${tenant.id}`);
console.log(`Tenant Name: ${tenant.tenant_name}`);

Dataset Methods

getDataset

Retrieve a dataset by ID.

async getDataset(
  datasetId: number,
  includeSignedUrls?: boolean,
  urlExpiresSeconds?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | number | Yes | - | The ID of the dataset |
| includeSignedUrls | boolean | No | false | Include signed S3 URLs for media |
| urlExpiresSeconds | number | No | 3600 | URL expiration time in seconds |

Returns: The dataset object

Throws:

  • NotFoundError: Dataset not found
  • AuthorizationError: No access to this dataset

Example:

const dataset = await client.getDataset(42, true, 7200);
console.log(dataset.name);

listDatasets

List datasets for a tenant.

async listDatasets(
  tenantId?: number | null,
  limit?: number,
  offset?: number,
  includeSignedUrls?: boolean,
  urlExpiresSeconds?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| limit | number | No | 50 | Maximum results to return |
| offset | number | No | 0 | Number of results to skip |
| includeSignedUrls | boolean | No | false | Include signed S3 URLs |
| urlExpiresSeconds | number | No | 3600 | URL expiration time |

Returns: Object with keys:

  • status: "ok"
  • datasets: Array of dataset objects

Example:

// tenantId auto-resolved from API key
const response = await client.listDatasets(undefined, 10);
const datasets = response.datasets as Record<string, unknown>[];
for (const dataset of datasets) {
  console.log(`${dataset.id}: ${dataset.name}`);
}

Run Methods

createRun

Create a new test run.

async createRun(
  datasetId: number,
  result: Record<string, unknown>,
  tenantId?: number | null,
  runnerId?: number | null,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | number | Yes | - | The dataset ID |
| result | Record<string, unknown> | Yes | - | Run results (metrics, status, etc.) |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| runnerId | number | No | null | ID of user who ran the test |

Returns: The created run object

Example:

const run = await client.createRun(42, {
  status: "passed",
  score: 0.95,
  metrics: {
    accuracy: 0.98,
    latency_ms: 150,
  },
});

getRun

Retrieve a run by ID.

async getRun(runId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| runId | number | Yes | The run ID |

Returns: The run object

Throws:

  • NotFoundError: Run not found

Example:

const run = await client.getRun(99);
const result = run.result as Record<string, unknown>;
console.log(`Score: ${result.score}`);

listRuns

List runs for a tenant or dataset.

async listRuns(
  datasetId?: number | null,
  tenantId?: number | null,
  limit?: number,
  offset?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | number | No | null | Filter by dataset |
| tenantId | number | No | auto | Filter by tenant (auto-resolved if omitted) |
| limit | number | No | 50 | Maximum results |
| offset | number | No | 0 | Results to skip |

Returns: Object with keys:

  • status: "ok"
  • runs: Array of run objects

Example:

// Get runs for a specific dataset
const response = await client.listRuns(42);
const runs = response.runs as Record<string, unknown>[];
for (const run of runs) {
  const result = run.result as Record<string, unknown>;
  console.log(`Run #${run.id}: ${result.status}`);
}

deleteRun

Delete a test run.

async deleteRun(runId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| runId | number | Yes | The run ID to delete |

Returns: Confirmation of deletion

Throws:

  • NotFoundError: Run not found

Example:

await client.deleteRun(99);
console.log("Run deleted");

Observability — Production Agent Tracing

Trace your agent's production behavior — LLM calls, tool invocations, retrieval steps, guardrail checks, and more. Traces are stored in Postgres and optionally forwarded to Langfuse.

Production-safe: tracing never rejects or interferes with your agent. If the backend is unreachable, trace.end() resolves with an error object instead of rejecting.

client.trace

client.trace(
  name: string,
  opts?: {
    userId?: string;
    sessionId?: string;
    metadata?: Record<string, unknown>;
    tags?: string[];
  },
): Trace

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name for this trace (e.g. "handle-ticket") |
| opts.userId | string | No | End-user ID for grouping |
| opts.sessionId | string | No | Conversation/session ID |
| opts.metadata | Record | No | Arbitrary metadata |
| opts.tags | string[] | No | Tags for filtering |

Trace methods

| Method | Description |
| --- | --- |
| trace.span(name, opts?) | Create a top-level span |
| trace.generation(name, opts?) | Create a top-level generation (LLM call) |
| trace.event(name, opts?) | Record a point-in-time event |
| trace.wrap(fn) | Run callback, auto-flush on completion |
| await trace.end(opts?) | Flush the trace to the backend. Never rejects. |
| trace.traceId | Server-assigned trace ID (available after end()) |

Span methods

| Method | Description |
| --- | --- |
| span.span(name, opts?) | Create a child span |
| span.generation(name, opts?) | Create a child generation |
| span.event(name, opts?) | Record an event under this span |
| span.wrap(fn) | Run callback, auto-end on completion |
| span.end(opts?) | Mark the span as complete |

If the wrap() callback throws, the span auto-ends with level: "ERROR" and the exception message captured in statusMessage.

Generation methods

Inherits all Span methods, plus:

| Method | Description |
| --- | --- |
| gen.end(opts?) | Mark complete. Accepts output, usage: { input_tokens, output_tokens }, statusMessage, level. |

wrap() ensures spans/traces are always ended, even if your code throws:

await trace.wrap(async (t) => {
  await t.span("tool:search", { input: { q: "..." } }).wrap(async (s) => {
    const data = await search(...);
    s.end({ output: data });
    return data;
  });

  const gen = t.generation("respond", { model: "claude-sonnet-4-6" });
  const response = await callLlm(...);
  gen.end({ output: response, usage: { input_tokens: 100, output_tokens: 50 } });
});
// trace.end() called automatically

Manual instrumentation

const trace = client.trace("support-chat", { userId: "user_42", sessionId: "conv_abc" });

const gen = trace.generation("classify", {
  model: "claude-sonnet-4-6",
  input: [{ role: "user", content: "Reset my password" }],
});
gen.end({
  output: { intent: "password_reset" },
  usage: { input_tokens: 50, output_tokens: 12 },
});

const tool = trace.span("tool:reset_password", { input: { user_id: "42" } });
tool.end({ output: { success: true } });

trace.event("guardrail-check", { input: { passed: true } });

const result = await trace.end({ output: { resolution: "password_reset_complete" } });
console.log(trace.traceId);

listObservabilityTraces

await client.listObservabilityTraces(opts?: {
  userId?: string;
  sessionId?: string;
  limit?: number;
  page?: number;
}): Promise<{ status: string; traces: object[]; total: number }>

getObservabilityTrace

await client.getObservabilityTrace(traceId: string): Promise<{ status: string; trace: object }>

getObservabilityAnalytics

await client.getObservabilityAnalytics(days?: number): Promise<{
  status: string;
  overview: { total_traces, avg_latency_ms, total_input_tokens, total_output_tokens, error_rate, total_tool_calls, ... };
  tool_performance: { tool_name, total_calls, error_rate, avg_latency_ms }[];
  model_usage: { model, total_calls, total_tokens, avg_latency_ms }[];
}>
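
The analytics payload lends itself to simple client-side post-processing. The snippet below sketches a few derived numbers from a response shaped like the documented return value; all field values are invented for illustration.

```typescript
// Illustrative only: a sample object shaped like the documented
// getObservabilityAnalytics() return value (values are made up).
const analytics = {
  status: "ok",
  overview: {
    total_traces: 1200,
    avg_latency_ms: 840,
    total_input_tokens: 450_000,
    total_output_tokens: 120_000,
    error_rate: 0.025,
    total_tool_calls: 3100,
  },
  tool_performance: [
    { tool_name: "lookup_order", total_calls: 2000, error_rate: 0.01, avg_latency_ms: 120 },
  ],
  model_usage: [
    { model: "claude-sonnet-4-6", total_calls: 1200, total_tokens: 570_000, avg_latency_ms: 900 },
  ],
};

// Derive a few headline numbers from the overview.
const totalTokens =
  analytics.overview.total_input_tokens + analytics.overview.total_output_tokens;
const errorPct = (analytics.overview.error_rate * 100).toFixed(1);
const slowTools = analytics.tool_performance.filter((t) => t.avg_latency_ms > 100);
```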

getObservabilityErrors / getObservabilityToolErrors

await client.getObservabilityErrors(opts?: { days?: number; limit?: number; page?: number })
await client.getObservabilityToolErrors(opts?: { days?: number; limit?: number; page?: number })

SDK Notes — Platform Advisories

SDK Notes are platform advisories delivered to your SDK from Ashr Labs. They communicate context changes, best practices, deprecations, or breaking changes that may affect how you configure or run your agent.

Notes are automatically fetched when the client initializes (via init()). You can also refresh them on demand.

client.notes (getter)

Get cached SDK notes from the last init() or getNotes() call. No network request is made.

get notes(): Record<string, unknown>[]

Returns: List of active notes for your tenant.

Example:

const client = new AshrLabsClient("tp_...");
await client.init();

// Notes are auto-fetched on init
for (const note of client.notes) {
console.log(`[${note.severity}] ${note.title}: ${note.content}`);
}

getNotes

Fetch fresh SDK notes from the platform. Updates the cached client.notes.

async getNotes(agentId?: number | null): Promise<Record<string, unknown>[]>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| agentId | number \| null | No | undefined | Include notes targeted at this specific agent |

Returns: List of active notes (global + tenant-specific, plus agent-specific if agentId is provided).

Example:

// Refresh notes
const notes = await client.getNotes();

// Filter by agent
const agentNotes = await client.getNotes(42);

// Check for breaking changes
const breaking = notes.filter(n => n.category === "breaking_change");
if (breaking.length) {
  console.log("Warning: Breaking changes detected:");
  for (const n of breaking) {
    console.log(`  ${n.title}: ${n.content}`);
  }
}

Note categories: info, warning, breaking_change, best_practice, deprecation

Severity levels: info, warning, critical


Request Methods

createRequest

Create a dataset generation request.

async createRequest(
  requestName: string,
  request: Record<string, unknown>,
  requestInputSchema?: Record<string, unknown> | null,
  tenantId?: number | null,
  requestorId?: number | null,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| requestName | string | Yes | - | Name/title for the request |
| request | Record<string, unknown> | Yes | - | The generation config (see below) |
| requestInputSchema | Record<string, unknown> | No | auto | JSON Schema for validating the request. A permissive default is sent if omitted. If your agent has tools, include them here under the "tools" key so they're auto-saved as skill templates. |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| requestorId | number | No | auto | ID of requesting user (auto-resolved if omitted) |

Returns: The created request object

Generation config structure (the request object):

{
  metadata: {
    dataset_name: "My Eval Dataset",
    description: "Description of what this dataset tests",
  },
  agent: {
    name: "My Agent",
    description: "What the agent does",
    system_prompt: "You are a helpful assistant...",
    tools: [
      {
        name: "tool_name",
        description: "What the tool does",
        parameters: {
          type: "object",
          properties: { arg: { type: "string" } },
          required: ["arg"],
        },
      },
    ],
    accepted_inputs: { text: true, audio: false, file: false, image: false, video: false },
    output_format: { type: "text" },
  },
  context: {
    domain: "ecommerce",
    use_case: "Customers asking about orders",
    scenario_context: "An online retail store",
  },
  test_config: {
    num_variations: 3, // how many test scenarios to generate
    coverage: {
      happy_path: true,
      edge_cases: true,
      error_handling: true,
      multi_turn: true,
    },
  },
  generation_options: {
    generate_audio: false, // set true to generate audio files
    generate_files: false,
    generate_simulations: false,
  },
}

Validation: The request config is validated before sending:

  • config.agent must be a non-empty object with at least one of: name, description, system_prompt
  • config.context must be a non-empty object with at least one of: domain, use_case, scenario_context

Example:

const req = await client.createRequest(
  "Support Agent Eval",
  {
    metadata: { dataset_name: "Support Eval" },
    agent: {
      name: "Support Bot",
      description: "Answers customer questions",
      system_prompt: "You are a helpful support agent.",
      tools: [{
        name: "lookup_order",
        description: "Look up an order",
        parameters: { type: "object", properties: { order_id: { type: "string" } }, required: ["order_id"] },
      }],
      accepted_inputs: { text: true, audio: false, file: false, image: false, video: false },
      output_format: { type: "text" },
    },
    context: { domain: "ecommerce", use_case: "Customers asking about orders", scenario_context: "Online store" },
    test_config: { num_variations: 3, coverage: { happy_path: true, edge_cases: true } },
    generation_options: { generate_audio: false, generate_files: false, generate_simulations: false },
  },
);

// Poll for completion
const completed = await client.waitForRequest(req.id as number);
console.log(`Status: ${completed.request_status}`);

getRequest

Retrieve a request by ID.

async getRequest(requestId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | number | Yes | The request ID |

Returns: The request object

Throws:

  • NotFoundError: Request not found

Example:

const req = await client.getRequest(123);
console.log(`Status: ${req.request_status}`);

listRequests

List requests for a tenant.

async listRequests(
  tenantId?: number | null,
  status?: string | null,
  limit?: number,
  offset?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| tenantId | number | No | auto | The tenant ID (auto-resolved if omitted) |
| status | string | No | null | Filter by status |
| limit | number | No | 50 | Maximum results |
| offset | number | No | 0 | Results to skip |

Returns: Object with keys:

  • status: "ok"
  • requests: Array of request objects

Example:

// Get pending requests
const response = await client.listRequests(undefined, "pending");
const requests = response.requests as Record<string, unknown>[];
for (const req of requests) {
  console.log(`Request #${req.id}: ${req.request_name}`);
}

API Key Methods

listApiKeys

List API keys for your tenant.

async listApiKeys(
  includeInactive?: boolean,
): Promise<Record<string, unknown>[]>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| includeInactive | boolean | No | false | Include revoked keys |

Returns: Array of API key objects

Note: For security, only the key prefix is returned, not the full key.

Example:

const keys = await client.listApiKeys();
for (const key of keys) {
  console.log(`${key.key_prefix}... - ${key.name}`);
}

revokeApiKey

Revoke an API key.

async revokeApiKey(apiKeyId: number): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKeyId | number | Yes | The API key ID to revoke |

Returns: Confirmation of revocation

Throws:

  • NotFoundError: API key not found

Example:

await client.revokeApiKey(123);
console.log("API key revoked");

Convenience Methods

waitForRequest

Block until a request reaches a terminal state (completed or failed).

async waitForRequest(
  requestId: number,
  timeout?: number,
  pollInterval?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| requestId | number | Yes | - | The request ID to poll |
| timeout | number | No | 600 | Maximum seconds to wait |
| pollInterval | number | No | 5 | Seconds between polls |

Returns: The final request object

Throws:

  • Error: If the request doesn't finish within timeout seconds
  • AshrLabsError: If the request fails

Example:

const req = await client.createRequest("My Eval", config);
const completed = await client.waitForRequest(req.id as number, 300);
console.log(`Status: ${completed.request_status}`);
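
The polling behavior can be pictured as a simple loop: fetch the request, stop on a terminal status, otherwise wait and retry. The sketch below is a simplified, synchronous version of that logic with the fetcher injected so it is easy to test; the actual SDK method awaits getRequest() and sleeps pollInterval seconds between attempts.

```typescript
// Simplified sketch of waitForRequest's polling logic: repeatedly check a
// status function until a terminal state or the attempt budget runs out.
// The status values and attempt-based budget are illustrative assumptions;
// the real method polls getRequest() on a time budget.
type RequestStatus = "pending" | "processing" | "completed" | "failed";

function pollUntilTerminal(
  getStatus: () => RequestStatus,
  maxAttempts: number,
): RequestStatus {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = getStatus();
    if (status === "completed" || status === "failed") return status;
  }
  throw new Error(`Request did not finish within ${maxAttempts} attempts`);
}

// Fake fetcher: pending twice, then completed.
const statuses: RequestStatus[] = ["pending", "pending", "completed"];
let i = 0;
const finalStatus = pollUntilTerminal(() => statuses[i++], 10);
```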

generateDataset

Create a dataset generation request, wait for completion, and fetch the result. Combines createRequest + waitForRequest + getDataset into one call.

async generateDataset(
  requestName: string,
  config: Record<string, unknown>,
  requestInputSchema?: Record<string, unknown> | null,
  timeout?: number,
  pollInterval?: number,
): Promise<[number, Record<string, unknown>]>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| requestName | string | Yes | - | Name/title for the request |
| config | Record<string, unknown> | Yes | - | The generation config (same as createRequest's request parameter) |
| requestInputSchema | Record<string, unknown> | No | auto | Optional JSON Schema for validation |
| timeout | number | No | 600 | Maximum seconds to wait |
| pollInterval | number | No | 5 | Seconds between polls |

Returns: A tuple of [datasetId, datasetSource] where datasetSource is the object containing "runs".

Throws:

  • Error: If generation doesn't finish in time
  • AshrLabsError: If generation fails or no datasets are found

Example:

const [datasetId, source] = await client.generateDataset(
  "Support Agent Eval",
  {
    metadata: { dataset_name: "Support Eval" },
    agent: { /* ... */ },
    context: { /* ... */ },
    test_config: { num_variations: 10, coverage: { happy_path: true } },
    generation_options: { generate_audio: false, generate_files: false, generate_simulations: false },
  },
);
const runs = (source.runs ?? {}) as Record<string, unknown>;
console.log(`Dataset #${datasetId}: ${Object.keys(runs).length} scenarios`);

Utility Methods

healthCheck

Check if the API is reachable.

async healthCheck(): Promise<Record<string, unknown>>

Returns: Status information

Example:

const status = await client.healthCheck();
console.log(`API Status: ${status.status}`);

toString

Get a string representation of the client.

toString(): string

Example:

console.log(client.toString());
// => AshrLabsClient(baseUrl='https://api.ashr.io/testing-platform-api', apiKey='tp_abc12...')

RunBuilder

A builder for incrementally constructing run result objects as an agent executes tests. Once complete, the result can be deployed via the client.

Constructor

new RunBuilder()

No parameters. Creates a run in "pending" status.


RunBuilder.start

Mark the run as started. Records the current timestamp.

run.start(): this

Returns: this (for chaining)


RunBuilder.addTest

Create and register a new test within this run.

run.addTest(testId: string): TestBuilder

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| testId | string | Yes | Unique identifier for the test case |

Returns: TestBuilder — A builder for the individual test


RunBuilder.complete

Mark the run as completed. Records the current timestamp.

run.complete(status?: string): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| status | string | No | "completed" | Final status ("completed" or "failed") |

Returns: this (for chaining)


RunBuilder.build

Serialize the full run result to an object.

run.build(): Record<string, unknown>

Returns: An object matching the run result schema, ready to be passed to client.createRun(datasetId, result). Aggregate metrics are computed automatically from action results.
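
As a mental model for the automatic aggregation, the sketch below computes a tool-call match rate from a list of action results. This is hypothetical: the field names (type, match_status) follow the action shapes documented in this reference, but the SDK's actual metric names and formulas may differ.

```typescript
// Hypothetical sketch of aggregating tool-call match statuses the way
// RunBuilder.build() might; the SDK's real metric names may differ.
interface ActionResult {
  type: string;
  match_status?: string;
}

function aggregate(actions: ActionResult[]) {
  const toolCalls = actions.filter((a) => a.type === "tool_call");
  const exact = toolCalls.filter((a) => a.match_status === "exact").length;
  return {
    total_tool_calls: toolCalls.length,
    exact_matches: exact,
    exact_match_rate: toolCalls.length ? exact / toolCalls.length : 0,
  };
}

const metrics = aggregate([
  { type: "user_text" },
  { type: "tool_call", match_status: "exact" },
  { type: "tool_call", match_status: "mismatch" },
]);
```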


RunBuilder.deploy

Build the result and submit it as a new run via the API.

run.deploy(
  client: AshrLabsClient,
  datasetId: number,
  tenantId?: number,
  runnerId?: number,
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| client | AshrLabsClient | Yes | - | An authenticated client instance |
| datasetId | number | Yes | - | The dataset this run is for |
| tenantId | number | No | auto | The tenant (auto-resolved if omitted) |
| runnerId | number | No | undefined | ID of the user who ran the test |

Returns: The created run object from the API

Example:

import { AshrLabsClient, RunBuilder } from "ashr-labs";

const client = new AshrLabsClient("tp_...");

const run = new RunBuilder();
run.start();

const test = run.addTest("bank_analysis");
test.start();
test.addUserText("Analyze this", "User prompt");
test.addToolCall(
  { name: "analyze", arguments: { data: "input" } },
  { name: "analyze", arguments: { data: "input" } },
  "exact",
);
test.complete();

run.complete();
const createdRun = await run.deploy(client, 42);
console.log(`Run #${createdRun.id} created`);

TestBuilder

Builds a single test result incrementally. Returned by RunBuilder.addTest().

TestBuilder.start

Mark the test as started. Records the current timestamp.

test.start(): this

Returns: this (for chaining)


TestBuilder.addUserFile

Record a user file input action.

test.addUserFile(
  filePath: string,
  description: string,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| filePath | string | Yes | - | Path to the file in the dataset |
| description | string | Yes | - | Description of the action |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.addUserText

Record a user text input action.

test.addUserText(
  text: string,
  description: string,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | - | The user's text input |
| description | string | Yes | - | Description of the action |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.addToolCall

Record an agent tool call action with expected vs actual comparison.

test.addToolCall(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
  matchStatus: string,
  divergenceNotes?: string | null,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| expected | Record<string, unknown> | Yes | - | Expected tool call (name, arguments) |
| actual | Record<string, unknown> | Yes | - | Actual tool call made by the agent |
| matchStatus | string | Yes | - | "exact", "partial", or "mismatch" |
| divergenceNotes | string | No | null | Notes explaining the divergence |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.addAgentResponse

Record an agent text response with expected vs actual comparison.

test.addAgentResponse(
  expectedResponse: Record<string, unknown>,
  actualResponse: Record<string, unknown>,
  matchStatus: string,
  semanticSimilarity?: number | null,
  divergenceNotes?: string | null,
  actionIndex?: number,
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| expectedResponse | Record<string, unknown> | Yes | - | The expected response content |
| actualResponse | Record<string, unknown> | Yes | - | The actual response from the agent |
| matchStatus | string | Yes | - | "exact", "similar", or "divergent" |
| semanticSimilarity | number | No | null | Similarity score (0.0 to 1.0) |
| divergenceNotes | string | No | null | Notes explaining the divergence |
| actionIndex | number | No | auto | Explicit index, or auto-incremented |

Returns: this (for chaining)


TestBuilder.setVmStream

Attach VM session logs to this test. For agents that operate in a browser or virtual machine.

test.setVmStream(
  provider: string,
  opts?: {
    sessionId?: string;
    durationMs?: number;
    logs?: Record<string, unknown>[];
    metadata?: Record<string, unknown>;
  },
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| provider | string | Yes | - | VM provider name (e.g. "kernel", "browserbase", "steel") |
| opts.sessionId | string | No | - | Provider session ID for linking |
| opts.durationMs | number | No | - | Total session duration in milliseconds |
| opts.logs | Record[] | No | - | Timestamped log entries (see below) |
| opts.metadata | Record | No | - | Additional provider-specific metadata |

Log entry format: Each entry should have ts (number, ms offset from start) and type (string):

{ ts: 0, type: "navigation", data: { url: "https://..." } }
{ ts: 1200, type: "action", data: { action: "click", selector: "#btn" } }
{ ts: 3000, type: "error", data: { message: "Element not found" } }

Example:

test.setVmStream("browserbase", {
  sessionId: "sess_abc123",
  durationMs: 12000,
  logs: [
    { ts: 0, type: "navigation", data: { url: "https://app.example.com" } },
    { ts: 2000, type: "action", data: { action: "click", selector: "#submit" } },
    { ts: 5000, type: "network", data: { method: "POST", url: "/api/order", status: 201 } },
  ],
});

Returns: this (for chaining)


TestBuilder.setKernelVm

Convenience method for attaching a Kernel browser session. Sets provider="kernel" and exposes Kernel-specific metadata fields. Fields map to Kernel's browser API response.

test.setKernelVm(
  sessionId: string,
  opts?: {
    durationMs?: number;
    logs?: Record<string, unknown>[];
    liveViewUrl?: string;
    cdpWsUrl?: string;
    replayId?: string;
    replayViewUrl?: string;
    headless?: boolean;
    stealth?: boolean;
    viewport?: { width: number; height: number };
  },
): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| sessionId | string | Yes | - | Kernel browser session ID |
| opts.durationMs | number | No | - | Total session duration in milliseconds |
| opts.logs | Record[] | No | - | Timestamped log entries (same format as setVmStream) |
| opts.liveViewUrl | string | No | - | Remote live-view URL (browser_live_view_url) |
| opts.cdpWsUrl | string | No | - | Chrome DevTools Protocol WebSocket URL |
| opts.replayId | string | No | - | ID of the session recording |
| opts.replayViewUrl | string | No | - | URL to view the session replay |
| opts.headless | boolean | No | - | Whether the session ran in headless mode |
| opts.stealth | boolean | No | - | Whether anti-bot stealth mode was enabled |
| opts.viewport | object | No | - | Browser viewport, e.g. { width: 1920, height: 1080 } |

Example:

test.setKernelVm("kern_sess_abc123", {
  durationMs: 15000,
  logs: [
    { ts: 0, type: "navigation", data: { url: "https://app.example.com" } },
    { ts: 1200, type: "action", data: { action: "click", selector: "#login" } },
    { ts: 3000, type: "screenshot", data: { s3_key: "vm-streams/.../frame.png" } },
  ],
  replayId: "replay_abc123",
  replayViewUrl: "https://www.kernel.sh/replays/replay_abc123",
  stealth: true,
  viewport: { width: 1920, height: 1080 },
});

Returns: this (for chaining)


TestBuilder.complete

Mark the test as completed. Records the current timestamp.

test.complete(status?: string): this

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| status | string | No | "completed" | Final status ("completed" or "failed") |

Returns: this (for chaining)


TestBuilder.build

Serialize this test to an object matching the run result schema.

test.build(): Record<string, unknown>

Returns: An object with test_id, status, action_results, started_at, and completed_at.


EvalRunner

Runs an agent against every scenario in a dataset and records results. This is the high-level API that encapsulates the full eval loop — iterating scenarios, calling the agent, comparing tool calls and text, and producing a RunBuilder.

Constructor

new EvalRunner(
  datasetSource: Record<string, unknown>,
  options?: {
    toolComparator?: ToolComparator;
    textComparator?: TextComparator;
    similarityThresholds?: { exact?: number; similar?: number };
  },
)

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetSource | Record<string, unknown> | Yes | - | The dataset_source object from a dataset (contains "runs") |
| options.toolComparator | ToolComparator | No | compareToolArgs | Custom (expected, actual) => [status, notes] function |
| options.textComparator | TextComparator | No | textSimilarity | Custom (textA, textB) => number function |
| options.similarityThresholds | object | No | { exact: 0.70, similar: 0.40 } | Score thresholds for match status |

Type aliases:

type ToolComparator = (
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
) => [string, string | null];

type TextComparator = (a: string, b: string) => number;
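
Under the documented defaults ({ exact: 0.70, similar: 0.40 }), a text-similarity score presumably maps to a match status like this. This is an illustrative sketch of the threshold logic, not the SDK's actual code:

```typescript
// Illustrative sketch: map a similarity score to a match status using
// similarityThresholds. Not the SDK's actual implementation.
function matchStatus(
  score: number,
  thresholds = { exact: 0.70, similar: 0.40 },
): "exact" | "similar" | "divergent" {
  if (score >= thresholds.exact) return "exact";
  if (score >= thresholds.similar) return "similar";
  return "divergent";
}

const a = matchStatus(0.9);
const b = matchStatus(0.5);
const c = matchStatus(0.6, { exact: 0.85, similar: 0.65 }); // stricter thresholds
```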

Example:

import { EvalRunner } from "ashr-labs";

const runner = new EvalRunner(source);

// With custom thresholds
const strictRunner = new EvalRunner(source, {
  similarityThresholds: { exact: 0.85, similar: 0.50 },
});

EvalRunner.fromDataset (static method)

Create an EvalRunner by fetching a dataset from the API.

static async EvalRunner.fromDataset(
  client: AshrLabsClient,
  datasetId: number,
  options?: {
    toolComparator?: ToolComparator;
    textComparator?: TextComparator;
    similarityThresholds?: { exact?: number; similar?: number };
  },
): Promise<EvalRunner>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| client | AshrLabsClient | Yes | An authenticated client |
| datasetId | number | Yes | The dataset ID to fetch |
| options | object | No | Passed to EvalRunner constructor |

Returns: EvalRunner — A configured runner ready to call .run()

Example:

const runner = await EvalRunner.fromDataset(client, 322);

EvalRunner.run

Run the agent against every scenario and return a populated RunBuilder.

async runner.run(
  agent: Agent | (() => Agent),
  options?: {
    onScenario?: OnScenarioCallback;
    onAction?: OnActionCallback;
    maxWorkers?: number;
  },
): Promise<RunBuilder>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| agent | Agent \| (() => Agent) | Yes | - | An object implementing the Agent interface, or a factory function |
| options.onScenario | OnScenarioCallback | No | undefined | Called at the start of each scenario: (scenarioId, scenarioDict) |
| options.onAction | OnActionCallback | No | undefined | Called for each action: (actionIndex, actionDict) |
| options.maxWorkers | number | No | 1 | Number of scenarios to run in parallel. When >1, scenarioId is passed to respond() and reset() so the agent can key state per scenario. |

Type aliases:

type OnScenarioCallback = (scenarioId: string, scenario: Record<string, unknown>) => void;
type OnActionCallback = (actionIndex: number, action: Record<string, unknown>) => void;

Returns: RunBuilder — A populated builder ready for .build() or .deploy()

Example:

// Sequential (default)
const run = await runner.run(agent);
const result = run.build();
console.log(result.aggregate_metrics);

// Parallel — run 4 scenarios at a time
const parallelRun = await runner.run(agent, { maxWorkers: 4 });

// With factory function
const factoryRun = await runner.run(() => new MyAgent(), { maxWorkers: 4 });

EvalRunner.runAndDeploy

Run the eval and submit results in one call.

async runner.runAndDeploy(
  agent: Agent | (() => Agent),
  client: AshrLabsClient,
  datasetId?: number,
  options?: {
    onScenario?: OnScenarioCallback;
    onAction?: OnActionCallback;
    maxWorkers?: number;
    tenantId?: number;
    runnerId?: number;
  },
): Promise<Record<string, unknown>>

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| agent | Agent \| (() => Agent) | Yes | - | An object implementing the Agent interface, or a factory function |
| client | AshrLabsClient | Yes | - | An authenticated client |
| datasetId | number | No | undefined | The dataset to submit against |
| options.onScenario | OnScenarioCallback | No | undefined | Callback per scenario |
| options.onAction | OnActionCallback | No | undefined | Callback per action |
| options.maxWorkers | number | No | 1 | Number of scenarios to run in parallel (default sequential) |
| options.tenantId | number | No | auto | The tenant (auto-resolved if omitted) |
| options.runnerId | number | No | undefined | ID of the user who ran the test |

Returns: The created run object from the API

Example:

// Sequential
const created = await runner.runAndDeploy(agent, client, 322);
console.log(`Run #${created.id} submitted`);

// Parallel
const parallelCreated = await runner.runAndDeploy(agent, client, 322, { maxWorkers: 4 });

Agent Interface

An interface that defines the contract agents must implement.

interface Agent {
  respond(
    message: string,
    scenarioId?: string,
  ): Record<string, unknown> | Promise<Record<string, unknown>>;

  reset(scenarioId?: string): void | Promise<void>;
}

respond

Process a user message and return the agent's response.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| message | string | The user's message text |
| scenarioId | string | Optional scenario ID (passed during parallel execution) |

Returns: An object (or Promise of an object) with:

  • "text" (string): The agent's text response
  • "tool_calls" (Array): Tool calls made during this turn, each with "name" (string) and "arguments" (object) keys

arguments vs arguments_json: The Agent interface returns tool arguments as an object under the "arguments" key. However, RunBuilder and the API store them as a JSON string under "arguments_json". EvalRunner handles this conversion automatically. If you use RunBuilder directly, pass "arguments_json" (a JSON string) to addToolCall(). The extractToolArgs() helper accepts both formats, so comparators work either way.
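
The conversion between the two forms is plain JSON serialization. For instance, with a hypothetical tool call:

```typescript
// Converting between the Agent-interface form ({ arguments: {...} }) and
// the stored form ({ arguments_json: "..." }) is plain JSON serialization.
// The tool call below is a made-up example.
const agentForm = { name: "lookup_order", arguments: { order_id: "ORD-123" } };

const storedForm = {
  name: agentForm.name,
  arguments_json: JSON.stringify(agentForm.arguments),
};

const roundTripped = JSON.parse(storedForm.arguments_json) as Record<string, unknown>;
```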

reset

Clear conversation state for a new scenario. Called before each scenario begins.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| scenarioId | string | Optional scenario ID (passed during parallel execution) |
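Putting respond and reset together, a minimal agent might look like the sketch below. The EchoAgent class, its canned reply, and the per-scenario history map are illustrative, not part of the SDK:

```typescript
// Minimal Agent implementation with per-scenario state (illustrative).
class EchoAgent {
  private history = new Map<string, string[]>();

  respond(message: string, scenarioId: string = "default"): Record<string, unknown> {
    // Track the conversation for this scenario.
    const turns = this.history.get(scenarioId) ?? [];
    turns.push(message);
    this.history.set(scenarioId, turns);

    // Return the shape the runner expects: text plus any tool calls.
    return { text: `You said: ${message}`, tool_calls: [] };
  }

  reset(scenarioId: string = "default"): void {
    // Drop state so the next scenario starts fresh.
    this.history.delete(scenarioId);
  }
}
```

Because scenarioId is passed during parallel execution, keying state by it keeps concurrent scenarios isolated.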

Comparator Functions

All comparator functions are standalone and importable from the top-level package.

stripMarkdown

Remove markdown formatting from text.

stripMarkdown(text: string): string

Removes bold/italic markers, headers, bullets, and markdown links. Collapses whitespace.

Example:

stripMarkdown("**Bold** and [link](https://x.com)");
// => "Bold and link"

tokenize

Lowercase the text, strip markdown and punctuation, and split it into word tokens.

tokenize(text: string): string[]

Example:

tokenize("Order **ORD-123** shipped!");
// => ["order", "ord123", "shipped"]

fuzzyStrMatch

Check if two strings are semantically close enough to count as matching.

fuzzyStrMatch(a: string, b: string, threshold?: number): boolean

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| a | string | Yes | - | First string |
| b | string | Yes | - | Second string |
| threshold | number | No | adaptive | Word-overlap threshold. If undefined: 0.35 for <=5 words, 0.40 for <=8, 0.55 otherwise |

Returns: true if the strings match closely enough

Checks in order: exact match after normalization, containment, then word-set overlap.

Example:

fuzzyStrMatch("Customer wants a refund", "customer wants refund");  // true
fuzzyStrMatch("apple banana", "cherry grape"); // false

extractToolArgs

Extract arguments from a tool call object, handling both formats.

extractToolArgs(toolCall: Record<string, unknown>): Record<string, unknown>

Handles { arguments: {...} } (object form) and { arguments_json: "..." } (JSON string form). Prefers the object form if both are present.

Example:

extractToolArgs({ arguments_json: '{"order_id": "ORD-123"}' });
// => { order_id: "ORD-123" }

extractToolArgs({ arguments: { order_id: "ORD-123" } });
// => { order_id: "ORD-123" }

compareToolArgs

Compare expected vs actual tool call arguments.

compareToolArgs(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
): [string, string | null]

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| expected | Record<string, unknown> | Expected tool call (with arguments or arguments_json) |
| actual | Record<string, unknown> | Actual tool call made by the agent |

Returns: A tuple of [matchStatus, divergenceNotes]:

  • matchStatus: "exact", "partial", or "mismatch"
  • divergenceNotes: Human-readable diff summary, or null if exact

String arguments are compared using fuzzyStrMatch. Non-string values use JSON.stringify equality. Extra arguments in the actual call don't cause divergence.

Example:

const [status, notes] = compareToolArgs(
  { arguments: { order_id: "ORD-123" } },
  { arguments: { order_id: "ORD-123", extra: "field" } },
);
// => ["exact", null]

const [status2, notes2] = compareToolArgs(
  { arguments: { order_id: "ORD-123", reason: "damaged item" } },
  { arguments: { order_id: "ORD-999", reason: "item was damaged" } },
);
// => ["partial", "'order_id': expected='ORD-123' actual='ORD-999'"]

textSimilarity

Compute similarity between two text strings.

textSimilarity(textA: string, textB: string): number

Returns: A number between 0.0 and 1.0

Uses cosine similarity on word frequency vectors, plus:

  • Entity bonus (+0.20): for matching order IDs (ORD-*), refund IDs (REF-*), prices ($*), dates (YYYY-MM-DD), and tracking URLs
  • Concept bonus (+0.10): for matching domain concepts (refund/credited, shipped/transit/delivered, stock/available, etc.)

Example:

textSimilarity(
  "Your order ORD-123 has shipped and is on the way",
  "Order ORD-123 has been shipped and is in transit",
);
// => 0.78

Data Types

User

interface User {
  id?: number;
  created_at?: string;
  email?: string;
  name?: string | null;
  tenant?: number;
  is_active?: boolean;
}

Tenant

interface Tenant {
  id?: number;
  created_at?: string;
  tenant_name?: string;
  is_active?: boolean;
}

Session

interface Session {
  status: string;
  user: User;
  tenant: Tenant;
}

Dataset

interface Dataset {
  id?: number;
  created_at?: string;
  tenant?: number;
  creator?: number;
  name?: string;
  description?: string | null;
  dataset_source?: Record<string, unknown>;
}

Run

interface Run {
  id?: number;
  created_at?: string;
  dataset?: number;
  tenant?: number;
  runner?: number;
  result?: Record<string, unknown>;
}

SdkNote

interface SdkNote {
  id?: number;
  created_at?: string;
  updated_at?: string;
  title?: string;
  content?: string;
  category?: string; // "info" | "warning" | "breaking_change" | "best_practice" | "deprecation"
  severity?: string; // "info" | "warning" | "critical"
  tenant_id?: number | null;
  agent_id?: number | null;
  active_from?: string;
  expires_at?: string | null;
  is_archived?: boolean;
  note_metadata?: Record<string, unknown>;
}

Request

interface Request {
  id?: number;
  created_at?: string;
  requestor_id?: number;
  requestor_tenant?: number;
  request_name?: string;
  request_status?: string;
  request_input_schema?: Record<string, unknown> | null;
  request?: Record<string, unknown>;
}

APIKey

interface APIKey {
  id?: number;
  key?: string; // Only present on creation
  key_prefix?: string;
  name?: string;
  scopes?: string[];
  user_id?: number;
  tenant_id?: number;
  created_at?: string;
  last_used_at?: string | null;
  expires_at?: string | null;
  is_active?: boolean;
}

ToolCall

interface ToolCall {
  name?: string;
  arguments_json?: string;
}

ExpectedResponse

interface ExpectedResponse {
  tool_calls?: ToolCall[];
  text?: string;
}

Action

interface Action {
  actor?: string; // "user" or "agent"
  content?: string;
  name?: string;
  expected_response?: ExpectedResponse;
}

Scenario

interface Scenario {
  title?: string;
  actions?: Action[];
}
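To show how these types fit together, here is a hypothetical object matching the Scenario, Action, ExpectedResponse, and ToolCall shapes above (the title, messages, and tool name are all made up):

```typescript
// A scenario with one user turn and the agent response we expect back.
const scenario = {
  title: "Refund request",
  actions: [
    { actor: "user", content: "I want a refund for ORD-123" },
    {
      actor: "agent",
      expected_response: {
        text: "I've started the refund for order ORD-123.",
        tool_calls: [
          // arguments_json is a JSON string, per the ToolCall interface.
          { name: "create_refund", arguments_json: '{"order_id":"ORD-123"}' },
        ],
      },
    },
  ],
};
```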