Voice Observability

This guide walks through tracing realtime voice agents in production with the Ashr Labs SDK. Voice sessions land in the same Observability panel as text traces, but the dashboard renders them as a turn timeline with transcripts, per-stage STT / LLM / TTS breakdown, mixed-audio replay, barge-in metrics, and per-turn cost.

Voice observability is part of the broader Observability product — same API key, same dashboard, same backend. This page covers the realtime/voice-specific surfaces. For text agent tracing (chatbots, RAG pipelines, batch LLM jobs) see the main Observability guide.

Two integration paths

Path	When to use	Setup effort
LiveKit plugin	Your voice agent runs on LiveKit Agents	2 lines
Generic primitives	Any other stack — Pipecat, custom WebRTC pipeline, server-side voice loop	Open a session, wrap each turn

Both paths produce the same dashboard rows. The LiveKit plugin is just a thin adapter on top of the generic primitives that maps LiveKit's event bus to our Session / Turn / Stage model automatically.

Path 1: LiveKit (the happy path)

If your agent is a LiveKit AgentSession, the entire instrumentation is two lines.

Install

pip install ashr-labs[livekit]

Attach in your worker

import os
from ashr_labs.voice_obs.livekit import VoiceObservability

obs = VoiceObservability(api_key=os.environ["ASHR_LABS_API_KEY"])
obs.attach(
    session,                         # your LiveKit AgentSession
    agent_id="support_v3",           # logical name — shows in the dashboard
    agent_version="v42",             # optional, for A/B comparisons
    stt_model="deepgram/nova-3",     # provider/model strings used by AgentSession
    llm_model="openai/gpt-4.1-mini", # — needed for cost rollups, see below
    tts_model="cartesia/sonic-2",
)

That's the whole instrumentation. STT, LLM, and TTS metrics, turn boundaries, and barge-ins are captured automatically by hooking the AgentSession's event surface. Mixed-audio replay is enabled by default: agent TTS and remote participant audio are mixed at 24 kHz mono and uploaded, and the dashboard's audio player streams the recording through a short-lived presigned URL.

Why pass `stt_model` / `llm_model` / `tts_model`

LiveKit's STTMetrics / LLMMetrics / TTSMetrics only carry a label field — never the provider/model. Without these hints, the cost-pricing table can't look anything up and every per-turn cost lands as 0.0. Pass the same provider/model strings you used to construct the AgentSession (e.g. "deepgram/nova-3", "openai/gpt-4.1-mini"); the SDK splits on / to recover provider + model.
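
The split described above can be pictured with a few lines of Python. This is only a sketch of the idea, not the SDK's code: the helper name `split_model_hint` is hypothetical, and splitting on the first `/` (rather than the last) is an assumption.

```python
from typing import Optional, Tuple


def split_model_hint(hint: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a "provider/model" hint into (provider, model).

    Hypothetical helper: returns (None, None) when the hint is missing or
    has no usable "/" separator, which is the case where a pricing lookup
    would have nothing to key on.
    """
    if not hint:
        return None, None
    # Assumption: split on the first "/" only.
    provider, sep, model = hint.partition("/")
    if not sep or not provider or not model:
        return None, None
    return provider, model
```

With this sketch, `split_model_hint("openai/gpt-4.1-mini")` yields `("openai", "gpt-4.1-mini")`, while a bare `"nova-3"` yields `(None, None)`, which corresponds to the zero-cost symptom described above.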

`attach(...)` parameters

Parameter	Type	Required	Description
`livekit_session`	`AgentSession`	Yes	The LiveKit session to instrument (positional)
`agent_id`	`str`	Yes	Logical agent name — what shows up in the dashboard's agent filter
`agent_version`	`str`	No	Version tag for comparing rollouts side-by-side
`tenant_id`	`int`	No	Defaults to `0`; set if you're using multi-tenancy
`room_id`	`str`	No	LiveKit room ID; auto-derived if omitted
`user_id`	`str`	No	End-user identifier for grouping
`external_session_id`	`str`	No	Your own session ID, for cross-system joining
`stt_model`	`str`	No	`"provider/model"` hint for cost rollup
`llm_model`	`str`	No	`"provider/model"` hint for cost rollup
`tts_model`	`str`	No	`"provider/model"` hint for cost rollup

Graceful shutdown

VoiceObservability.attach(...) returns immediately and does its work on the worker's event loop. On worker shutdown, call:

obs.shutdown()  # drains the buffer with a 5-second timeout

This is optional — the SDK registers an atexit flush as a safety net — but recommended in livekit_worker.entrypoint's shutdown hook so any in-flight turns land before the process exits.
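
One way to wire this up is sketched below. It assumes your entrypoint receives a LiveKit `JobContext` whose `add_shutdown_callback` accepts an async callable; check that against the LiveKit Agents version you run. The helper name `register_obs_shutdown` is hypothetical. Because `obs.shutdown()` can block while it drains, the sketch runs it in a worker thread instead of on the event loop.

```python
import asyncio
from typing import Any


def register_obs_shutdown(ctx: Any, obs: Any) -> None:
    """Flush voice observability when the LiveKit job shuts down.

    ctx: assumed to expose add_shutdown_callback(async_callable).
    obs: the VoiceObservability instance (anything with .shutdown()).
    """

    async def _flush_voice_obs() -> None:
        # shutdown() may block for up to its drain timeout, so keep it
        # off the event loop thread.
        await asyncio.to_thread(obs.shutdown)

    ctx.add_shutdown_callback(_flush_voice_obs)
```

Call `register_obs_shutdown(ctx, obs)` once inside your entrypoint, after `obs.attach(...)`.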

Required env vars (for the demo agents)

The shipped demos read configuration from environment:

LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET — your LiveKit project
ASHR_LABS_API_KEY (or ASHR_VOICE_OBS_API_KEY) — your Ashr Labs API key
ASHR_VOICE_OBS_TENANT_ID — your tenant ID
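
If you adapt the demos, it can help to fail fast when one of these variables is missing. The following is a minimal sketch, not code from the SDK: the function name `load_demo_config` is hypothetical, and both the precedence between the two API key variables and the treatment of the tenant ID as mandatory are assumptions.

```python
from typing import Dict, Mapping

REQUIRED = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "ASHR_VOICE_OBS_TENANT_ID",
)
# Assumption: the first variable that is set wins.
API_KEY_NAMES = ("ASHR_LABS_API_KEY", "ASHR_VOICE_OBS_API_KEY")


def load_demo_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect the demo settings from env, or raise naming what is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    api_key = next((env[n] for n in API_KEY_NAMES if env.get(n)), None)
    if api_key is None:
        missing.append(" or ".join(API_KEY_NAMES))
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    config["api_key"] = api_key
    return config
```

In a worker you would call `load_demo_config(os.environ)` before constructing `VoiceObservability`.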

Runnable demo agents

Two examples ship with the SDK so you can see voice observability flow end-to-end without writing any agent code:

# Minimal — connects to LiveKit, attaches observability, greets the participant
python -m ashr_labs.voice_obs.examples.livekit_worker dev

# Full — a more "real-feeling" support agent built on the same primitives
python -m ashr_labs.voice_obs.examples.ashr_support_agent dev

What gets captured automatically (LiveKit)

For each AgentSession, the plugin maps native events into the dashboard's data model:

LiveKit event	What the plugin records
`user_state_changed` → `speaking`	Open a user turn, fire `user_speech_start` event
`user_state_changed` → `listening`	Close the user turn, fire `user_speech_end` event
`agent_state_changed` → `speaking`	Open an agent turn, compute & attach TTFA, fire `agent_speech_start`
`agent_state_changed` → `listening`	Close the agent turn, fire `agent_speech_end`
`user_input_transcribed` (`is_final=True`)	Set transcript on the active user turn
`conversation_item_added` (assistant)	Set transcript on the active agent turn
`metrics_collected` → `STTMetrics`	One `stt` stage on the user turn (with cost)
`metrics_collected` → `LLMMetrics`	One `llm` stage on the agent turn (with TTFT, tokens, cost)
`metrics_collected` → `TTSMetrics`	One `tts` stage on the agent turn (with TTFB, audio duration, cost)
`metrics_collected` → `InterruptionMetrics`	Mark user turn as interrupting agent turn, fire `barge_in` event
`agent_false_interruption`	Fire `failed_interrupt` event on the active turn
`close`	Close any open turns and end the session

TTFA (time-to-first-audio, the user-perceived latency from "I stopped speaking" to "agent started speaking") is computed using a monotonic clock between the user's listening transition and the agent's speaking transition.
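
The TTFA calculation can be sketched as a small state machine. This illustrates the described behavior and is not the plugin's source: `TTFATracker` and its method names are hypothetical, and the clock is injectable only so the logic can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class TTFATracker:
    """Measure time from the user's 'listening' transition to the agent's
    'speaking' transition, using a monotonic clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._user_stopped_at: Optional[float] = None

    def on_user_listening(self) -> None:
        # The user just stopped speaking: start the TTFA stopwatch.
        self._user_stopped_at = self._clock()

    def on_agent_speaking(self) -> Optional[float]:
        # No preceding user utterance (e.g. a greeting): TTFA is undefined.
        if self._user_stopped_at is None:
            return None
        ttfa_ms = (self._clock() - self._user_stopped_at) * 1000.0
        self._user_stopped_at = None  # one measurement per user utterance
        return ttfa_ms
```

Returning `None` when the agent speaks first mirrors the "TTFA is `None` on agent turns" row in the troubleshooting table.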

Mixed-audio replay taps the LiveKit AudioOutput sink and the remote participant track, mixes them at 24 kHz mono, and uploads the result to object storage. The dashboard's audio player streams the recording through a presigned URL that is valid for 5 minutes.
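
To make the mixing step concrete, here is a minimal sketch of summing two mono streams. It is not the plugin's mixer. It assumes both inputs are already 16-bit signed PCM at the same sample rate (24 kHz in the plugin's case) in native byte order, and the function name `mix_pcm16_mono` is hypothetical.

```python
from array import array


def mix_pcm16_mono(agent_pcm: bytes, user_pcm: bytes) -> bytes:
    """Mix two 16-bit mono PCM buffers sample by sample.

    The shorter buffer is padded with silence, and sums are clamped to the
    int16 range so loud overlaps clip instead of wrapping around.
    """
    agent = array("h")
    agent.frombytes(agent_pcm)
    user = array("h")
    user.frombytes(user_pcm)

    length = max(len(agent), len(user))
    agent.extend([0] * (length - len(agent)))
    user.extend([0] * (length - len(user)))

    mixed = array(
        "h",
        (max(-32768, min(32767, a + u)) for a, u in zip(agent, user)),
    )
    return mixed.tobytes()
```

A production mixer would also need to resample and time-align the two tracks; the sketch leaves that out.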

Path 2: Generic primitives (any stack)

If you're not on LiveKit, use the same Client directly. The model is: open a Session, wrap each user/agent turn in a context manager, wrap each stage (STT/LLM/TTS) in a nested context manager, end the session.

from ashr_labs.voice_obs import Client, STTPayload, LLMPayload, TTSPayload, Message

client = Client(api_key="vo_...your_ingest_key...")

session = client.start_session(
    agent_id="support_v3",
    transport="webrtc",
    user_id="user_42",
    external_session_id="my-call-id-abc",
)

# A user turn — wrap STT, set the final transcript.
with session.user_turn() as user_turn:
    with user_turn.stage("stt", provider="deepgram", model="nova-3") as stt:
        # ... your STT call here ...
        stt.set_payload(STTPayload(
            audio_duration_ms=2400,
            request_duration_ms=180,
            final_transcript="I can't log in",
            final_confidence=0.97,
            language="en",
            cost_usd=0.0012,
        ))
    user_turn.set_transcript("I can't log in")

# An agent turn — wrap LLM, then TTS.
with session.agent_turn() as agent_turn:
    with agent_turn.stage("llm", provider="openai", model="gpt-4.1-mini") as llm:
        # ... your LLM call here ...
        llm.set_payload(LLMPayload(
            prompt_tokens=420, completion_tokens=80,
            ttft_ms=320, tokens_per_second=72.5,
            prompt=[Message(role="user", content="I can't log in")],
            completion="Let me help reset your password.",
            cost_usd=0.0021,
        ))
    with agent_turn.stage("tts", provider="cartesia", model="sonic-2") as tts:
        # ... your TTS call here ...
        tts.set_payload(TTSPayload(
            text_input="Let me help reset your password.",
            audio_bytes=18000,
            audio_duration_ms=1900,
            ttfb_ms=140,
            voice_id="cartesia-en-female-1",
            cost_usd=0.0008,
        ))
    agent_turn.set_transcript("Let me help reset your password.")

session.end(reason="user_left")
client.shutdown()
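
The example above hard-codes `ttft_ms` and `tokens_per_second`; on the generic path your own loop has to measure them. Below is a sketch of doing so for a streaming LLM response. It assumes each streamed chunk counts as one token and that throughput is measured from the first chunk to the last; both are assumptions, and `measure_stream` is a hypothetical helper rather than part of the SDK.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Consume a token stream and derive LLM timing fields from it."""
    started = clock()
    first_at: Optional[float] = None
    parts: List[str] = []
    for chunk in chunks:
        if first_at is None:
            first_at = clock()  # time of first token
        parts.append(chunk)
    ended = clock()

    ttft_ms = None if first_at is None else (first_at - started) * 1000.0
    generation_s = 0.0 if first_at is None else ended - first_at
    tokens_per_second = len(parts) / generation_s if generation_s > 0 else None
    return {
        "completion": "".join(parts),
        "completion_tokens": len(parts),
        "ttft_ms": ttft_ms,
        "tokens_per_second": tokens_per_second,
    }
```

The returned values map onto the `completion`, `completion_tokens`, `ttft_ms`, and `tokens_per_second` arguments of `LLMPayload` shown earlier. If your provider reports exact token counts, prefer those over the chunk count.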

Recording barge-ins manually

When a user starts speaking before the agent finishes, mark it on the user turn:

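# current_agent_turn_id is the ID of the agent turn that was still speaking
# when the user cut in; your own loop is responsible for tracking it.
with session.user_turn() as user_turn:
    user_turn.mark_interrupted(
        by_turn_id=current_agent_turn_id,
        interrupt_latency_ms=180,
    )
    user_turn.event("barge_in", {
        "agent_turn_id": current_agent_turn_id,
        "interrupt_latency_ms": 180,
    })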
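
Your loop needs to know which agent turn is speaking and how long the agent took to stop. A small tracker like the one sketched here can supply both values. It is hypothetical and not part of the SDK, and it assumes interrupt latency means the delay between the user starting to speak and the agent's audio stopping; this page does not define the term, so adjust the sketch if your definition differs.

```python
import time
from typing import Callable, Optional


class BargeInTracker:
    """Track the speaking agent turn and time how long it takes to stop."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._agent_turn_id: Optional[str] = None
        self._user_started_at: Optional[float] = None

    def agent_started(self, turn_id: str) -> None:
        self._agent_turn_id = turn_id
        self._user_started_at = None

    def user_started(self) -> Optional[str]:
        """Return the interrupted agent turn ID, or None if no barge-in."""
        if self._agent_turn_id is not None and self._user_started_at is None:
            self._user_started_at = self._clock()
        return self._agent_turn_id

    def agent_stopped(self) -> Optional[float]:
        """Return interrupt latency in ms if the agent was cut off."""
        latency_ms = None
        if self._user_started_at is not None:
            latency_ms = (self._clock() - self._user_started_at) * 1000.0
        self._agent_turn_id = None
        self._user_started_at = None
        return latency_ms
```

The turn ID returned by `user_started()` and the latency returned by `agent_stopped()` are the values you would pass as `by_turn_id` and `interrupt_latency_ms`.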

Recording transport quality

For WebRTC transports, push MOS / packet loss / jitter onto the session as you observe them. The most recent value is forwarded at session close:

session.set_transport_quality(mos_score=4.2, packet_loss_pct=0.3, jitter_ms=18)

Custom turn events

Beyond barge_in / failed_interrupt, you can attach any string event type to a turn. These show up inline in the dashboard's timeline:

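# `turn` is any open user or agent turn, e.g. the object yielded by
# session.user_turn() or session.agent_turn().
turn.event("guardrail:toxicity", {"flagged": False, "score": 0.04})
turn.event("tool:lookup_account", {"user_id": "user_42", "found": True})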

Stage kinds beyond STT/LLM/TTS

The supported kind values are stt, llm, tts, vad, nlu, dialogue_management, api_call. Each has a corresponding typed payload (STTPayload, VADPayload, NLUPayload, …) under ashr_labs.voice_obs.schemas. The dashboard has dedicated rendering for STT/LLM/TTS; the others are rendered as a generic span row with the payload pretty-printed.

`Client(...)` parameters

Parameter	Default	Description
`api_key`	required	Your voice obs ingest key (`vo_...`)
`base_url`	`https://api.ashr.io`	Override for self-hosted backends
`flush_interval_seconds`	`5.0`	How often the background worker flushes the buffer
`flush_threshold`	`10`	Flush early when this many spans accumulate
`max_buffer_size`	`10_000`	Hard cap; oldest dropped if exceeded (warning logged)
`masker`	default registry	Override to add custom PII masking rules
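
How `flush_threshold` and `max_buffer_size` interact is easiest to see in code. This sketch illustrates the documented behavior (flush early at the threshold, drop the oldest span at the cap and log a warning). It is not the SDK's implementation, and the class name `BoundedSpanBuffer` is hypothetical.

```python
import logging
import threading
from collections import deque
from typing import Any, Deque, List

log = logging.getLogger("span_buffer_sketch")


class BoundedSpanBuffer:
    def __init__(self, flush_threshold: int = 10, max_buffer_size: int = 10_000) -> None:
        self.flush_threshold = flush_threshold
        self.max_buffer_size = max_buffer_size
        self.dropped = 0
        self._spans: Deque[Any] = deque()
        self._lock = threading.Lock()

    def enqueue(self, span: Any) -> bool:
        """Add a span; return True when an early flush should be triggered."""
        with self._lock:
            if len(self._spans) >= self.max_buffer_size:
                # Hard cap reached: drop the oldest span, keep the newest.
                self._spans.popleft()
                self.dropped += 1
                log.warning("span buffer full; dropped oldest span")
            self._spans.append(span)
            return len(self._spans) >= self.flush_threshold

    def drain(self) -> List[Any]:
        """Remove and return everything buffered (called by the flusher)."""
        with self._lock:
            batch = list(self._spans)
            self._spans.clear()
            return batch
```

In this model a background thread would call `drain()` every `flush_interval_seconds`, or sooner when `enqueue()` returns `True`.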

Audio replay

The LiveKit plugin uploads mixed audio automatically. For the generic primitives path, upload it yourself once you have the file:

with open("call.opus", "rb") as f:
    client.upload_audio(
        session_id=session.session_id,
        audio=f,
        duration_ms=180_000,
        codec="opus",
    )

Audio is stored in object storage with a 5-minute presigned URL minted on-demand by the dashboard's audio player. Anything stored is encrypted at rest.

Reading your data

Voice sessions are surfaced through the same query API as text traces, plus voice-specific endpoints. See the Reading your data section in the Observability guide for the text-trace API; for voice-specific reads, the dashboard renders:

Turn timeline — every user/agent turn with transcript, duration, and stage breakdown
Per-stage cost — STT + LLM + TTS cost rolled up per turn and per session
Barge-in metrics — count of interruptions, p50/p95 interrupt latency
TTFA distribution — user-perceived latency, p95 surfaced as a session-level rollup
Mixed-audio player — 24 kHz mono replay with turn markers
Transport quality — MOS / packet loss / jitter
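
To show what the cost and latency rollups amount to, here is a sketch that reuses the sample stage costs from the generic-primitives example. The dashboard's exact percentile method is not documented on this page; nearest-rank is assumed here, and both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method, see the note above)."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def rollup_costs(turns: Sequence[Dict[str, float]]) -> Dict[str, object]:
    """Sum stage costs per turn, then per session.

    Each turn is a mapping of stage kind to cost_usd.
    """
    per_turn: List[float] = [sum(stages.values()) for stages in turns]
    return {"per_turn": per_turn, "session": sum(per_turn)}


# Sample data: one user turn (STT) and one agent turn (LLM + TTS).
example_turns = [{"stt": 0.0012}, {"llm": 0.0021, "tts": 0.0008}]
example_latencies_ms = [120.0, 180.0, 240.0, 900.0]
```

For the sample data, the agent turn costs about 0.0029 USD and the session about 0.0041 USD. With only four interrupt latencies, the nearest-rank p95 is simply the largest value.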

Sessions land in the Observability → Voice tab of the Ashr Labs dashboard.

Safety properties

Never raises into your agent. Every public method catches and logs its own errors. If the backend is unreachable, sessions buffer locally and flush on retry; if the buffer overflows the cap, the oldest spans are dropped with a warning.
Never blocks the hot path. Public API is enqueue-and-return. HTTP flush happens on a background thread on a 5-second cadence (or when the buffer hits 10 spans).
Bounded memory. Default cap is 10k spans per process. Override with Client(max_buffer_size=...) if your throughput justifies more.
Monotonic clocks for duration_ms and TTFA so durations stay accurate across NTP adjustments. Wall-clock is only used for timestamps sent over the wire.
Lazy LiveKit import. ashr_labs.voice_obs itself doesn't depend on livekit-agents; the plugin only resolves it when you call .attach(...). Importing ashr_labs.voice_obs.Client works fine without LiveKit installed.
Default PII masking. A masker registry runs against transcripts and prompts before they leave the process. The default masks emails, phone numbers, and credit-card numbers; pass masker= to extend.
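
For a sense of what masking before data leaves the process looks like, here is a regex-based sketch. It does not reproduce the SDK's default rules, and it does not show the interface that `masker=` expects, which this page does not document. The patterns are deliberately simple and will miss some real-world formats.

```python
import re

# Illustrative patterns only; order matters (cards before phones, because
# a card number would otherwise match the looser phone pattern).
_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
    (re.compile(r"\+?\d[\d ().-]{7,}\d"), "[PHONE]"),
]


def mask_pii(text: str) -> str:
    """Replace emails, card-like numbers, and phone-like numbers in text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

For example, `mask_pii("Reach me at jo@example.com or +1 (415) 555-0100")` returns `"Reach me at [EMAIL] or [PHONE]"`.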

Troubleshooting

Symptom	Likely cause
All per-turn `cost_usd` show as `0.0`	LiveKit `STTMetrics`/`LLMMetrics`/`TTSMetrics` don't carry provider/model. Pass `stt_model="provider/model"`, `llm_model=...`, `tts_model=...` to `obs.attach(...)`.
Sessions appear, transcripts are empty	The agent isn't emitting `conversation_item_added` (assistant role) or `user_input_transcribed` with `is_final=True`. Confirm your STT plugin is configured for final results.
TTFA is `None` on agent turns	The plugin couldn't observe a `user_state_changed → listening` before `agent_state_changed → speaking`. Usually means the agent spoke without a preceding user utterance (greetings, proactive prompts) — expected.
Mixed audio is missing	The LiveKit plugin's audio recorder needs to attach to the AudioOutput sink before the agent speaks. Confirm `obs.attach(...)` runs before `session.start()`.
Session never closes (stays `active`)	Your worker exited without firing LiveKit's `close` event. Call `obs.shutdown()` in your shutdown hook, or call `session.end(reason=...)` directly on the generic primitives path.

Next steps

Observability — text trace tracing, analytics, and reading data back
Quick Start — minting an API key and your first instrumented call
Authentication — managing and rotating ingest keys
LiveKit's AgentSession docs for the upstream surface this plugin hooks into

The SDK never crashes your agent, but it does log to stderr. If you hit issues during development, set logging.getLogger("ashr_labs").setLevel(logging.DEBUG) to see the full picture.
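
A minimal way to turn that on with the standard library is sketched below. The logger name `ashr_labs` comes from this page; the helper name and the log format are just one reasonable choice.

```python
import logging
import sys


def enable_sdk_debug_logging() -> logging.Logger:
    """Send log records to stderr and lower the SDK logger to DEBUG."""
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(
        stream=sys.stderr,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger("ashr_labs")
    logger.setLevel(logging.DEBUG)
    return logger
```

Call it once at worker startup, and remove it or raise the level again before deploying to production.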
