Scenario — Lost Item Claim

Customer calls about lost item

Audio: I left my laptop bag on the plane. Flight DL 847 from ATL to ORD.

Agent creates lost item report

I'll file a lost item report. Can you describe the bag or send a photo?

create_lost_item(flight="DL847", date="2026-03-12")

Customer sends photo of item

Image: Here's a photo of the same bag from my last trip.

Agent matches against found items

match_item(report="LI-3382", image=true)

Agent locates the item

File: shipping_label_LI-3382.pdf

generate_shipping(report="LI-3382", dest="customer_address")

Scenario — Flight Rebooking

Customer requests rebooking

Audio: My connecting flight was cancelled. I need to get to Seattle by tonight.

Agent searches alternatives

search_flights(dest="SEA", date="2026-03-12", after="14:00")

Agent offers options

I found two options: UA 512 at 4:15 PM or AS 330 at 6:00 PM. Which works?

Customer picks earlier flight

The 4:15 please. Will my checked bag transfer automatically?

Agent rebooks and transfers bag

rebook_pax(pnr="XK73M", new_flight="UA512")

transfer_bag(tag="SEA-0042917")

Scenario — Refund Dispute

Customer disputes charge

Chat: I was charged twice for my hotel stay on Jan 5. Confirmation #H-88201.

Agent pulls billing records

get_billing(conf="H-88201", guest_id="G-4419")

Agent confirms duplicate

You're right — I see two charges of $189. I'll reverse the duplicate now.

issue_refund(txn="TXN-90221", amount=189.00)

Customer asks about timeline

How long until it shows on my card?

Agent provides refund ETA

The $189 refund should appear within 3–5 business days.

File: refund_receipt_TXN-90221.pdf

Lost Item Claim — Results

✓ verified_identity: PASS

✓ correct_report_filed: PASS

✓ image_match: PASS

✗ missing_confirmation: FAIL

✓ shipping_generated: PASS

Flight Rebooking — Results

✓ alternatives_offered: PASS

✓ booking_confirmed: PASS

✓ bag_transfer: PASS

✗ timeout_exceeded: FAIL

✓ polite_tone: PASS

Refund Dispute — Results

✓ duplicate_detected: PASS

✓ correct_refund_amount: PASS

✓ receipt_generated: PASS

✗ incomplete_summary: FAIL

✓ compliance_check: PASS
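
The check lists above can be rolled up into a pass rate per scenario. What follows is a minimal sketch, not ashr's SDK: the `RESULTS` layout (a list of name/passed pairs per scenario) and the `summarize` helper are assumptions made for illustration, with the outcomes copied from the results shown.

```python
from typing import Dict, List, Tuple

# Check outcomes copied from the results above. The (name, passed) layout
# is an assumption made for this sketch, not a documented format.
RESULTS: Dict[str, List[Tuple[str, bool]]] = {
    "Lost Item Claim": [
        ("verified_identity", True),
        ("correct_report_filed", True),
        ("image_match", True),
        ("missing_confirmation", False),
        ("shipping_generated", True),
    ],
    "Flight Rebooking": [
        ("alternatives_offered", True),
        ("booking_confirmed", True),
        ("bag_transfer", True),
        ("timeout_exceeded", False),
        ("polite_tone", True),
    ],
    "Refund Dispute": [
        ("duplicate_detected", True),
        ("correct_refund_amount", True),
        ("receipt_generated", True),
        ("incomplete_summary", False),
        ("compliance_check", True),
    ],
}


def summarize(checks: List[Tuple[str, bool]]) -> Tuple[float, List[str]]:
    """Return (pass rate, names of failed checks) for one scenario."""
    if not checks:
        return 0.0, []
    failed = [name for name, passed in checks if not passed]
    return (len(checks) - len(failed)) / len(checks), failed


for scenario, checks in RESULTS.items():
    rate, failed = summarize(checks)
    print(f"{scenario}: {rate:.0%} passed, failed: {', '.join(failed) or 'none'}")
```

Returning the failed check names alongside the rate keeps the summary actionable: the number says how bad a run was, the names say where to look.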

Evals for Agents That Actually Work

Ship AI agents with confidence. Run evals, catch regressions, fix failures — before your users find them.

Get Started Read the Docs

Running evals for AI teams at

HumanBehavior

Pax Historia

SkillSync

New post: "Why's This Shit Broken?" Our pivot story and why AI testing infrastructure is fundamentally broken.

ashr understands your system and proactively tests what's most likely to fail

Every dataset your agent has run. Status, traces, scores — one click.

Full test timelines. Every speaker, every tool call, every response.

Expected vs. actual, side by side. See exactly where it broke.
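
One way to picture an expected-vs-actual comparison is to walk two tool-call sequences in step and report the first point where they differ. The sketch below is illustrative only: `ToolCall` and `first_divergence` are hypothetical names, the expected calls are taken from the Flight Rebooking scenario, and the "actual" run is an invented failure, not real output.

```python
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ToolCall:
    """A single tool invocation: function name plus keyword arguments."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def first_divergence(
    expected: List[ToolCall], actual: List[ToolCall]
) -> Optional[Tuple[int, Optional[ToolCall], Optional[ToolCall]]]:
    """Return (step index, expected call, actual call) at the first mismatch.

    Returns None when both sequences are identical. A missing call on
    either side shows up as None in the corresponding slot.
    """
    for index, (exp, act) in enumerate(zip_longest(expected, actual)):
        if exp != act:
            return index, exp, act
    return None


expected = [
    ToolCall("search_flights", {"dest": "SEA", "date": "2026-03-12", "after": "14:00"}),
    ToolCall("rebook_pax", {"pnr": "XK73M", "new_flight": "UA512"}),
    ToolCall("transfer_bag", {"tag": "SEA-0042917"}),
]

# Hypothetical failing run: the agent booked the later flight and never
# transferred the bag.
actual = [
    ToolCall("search_flights", {"dest": "SEA", "date": "2026-03-12", "after": "14:00"}),
    ToolCall("rebook_pax", {"pnr": "XK73M", "new_flight": "AS330"}),
]

mismatch = first_divergence(expected, actual)
if mismatch is not None:
    step, exp, act = mismatch
    print(f"step {step}: expected {exp}, got {act}")
```

Stopping at the first mismatch is a deliberate simplification: later steps usually diverge as a consequence of the first one, so that step is the most useful place to start reading.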

Version every prompt. Inline diffs, pass rates per version. Know which edit broke it.
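
As a rough illustration of pairing prompt diffs with pass rates, the sketch below uses Python's standard `difflib` module. The prompt texts, version labels, and run outcomes are all invented for this example, and `prompt_diff` and `pass_rate` are hypothetical helpers rather than ashr functions.

```python
import difflib
from typing import Dict, List

# Hypothetical prompt versions and eval outcomes, invented for illustration.
PROMPTS: Dict[str, str] = {
    "v1": "You are a support agent.\nConfirm the action with the customer before closing.\n",
    "v2": "You are a support agent.\nClose the ticket as soon as the tool call succeeds.\n",
}
RUNS: Dict[str, List[bool]] = {
    "v1": [True, True, True, True],
    "v2": [True, False, True, False],
}


def prompt_diff(old: str, new: str) -> str:
    """Return a unified diff between two stored prompt versions."""
    old_lines = PROMPTS[old].splitlines(keepends=True)
    new_lines = PROMPTS[new].splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old, tofile=new)
    )


def pass_rate(version: str) -> float:
    """Fraction of eval runs that passed for a prompt version."""
    runs = RUNS[version]
    return sum(runs) / len(runs) if runs else 0.0


print(prompt_diff("v1", "v2"))
for version in PROMPTS:
    print(f"{version}: {pass_rate(version):.0%} pass rate")
```

Reading the diff next to the per-version pass rates is what ties a regression to a specific edit: here the drop between the two invented versions lines up with the single changed instruction.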

Plug in our SDK. Ship fast, ship tested.

Ready to ship?

Schedule a Call