booking-flow-v2
Run #482 · production · 3.2s

Passed: 13
Failed: 2
Pass Rate: 86.7%
Avg Score: 0.87
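Pass Rate is the share of passed tests among all tests in the run. The minimal check below uses only the two counts above. The report does not say how Avg Score is aggregated, so the sketch does not try to reproduce it.

```python
# Counts reported for run #482 of booking-flow-v2.
passed = 13
failed = 2
total = passed + failed

# Pass Rate = passed / (passed + failed).
pass_rate = passed / total
print(f"{passed}/{total} passed, pass rate {pass_rate:.1%}")
```

This prints `13/15 passed, pass rate 86.7%`, which matches the dashboard figure.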

Validation Scores
embeddings: 91%
llm-judge: 84%
exact-match: 67%

Run History
#482: 13/15
#479: 11/15
#475: 14/15
#471: 15/15
#468: 12/15
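Each Run History entry is a passed/total pair, so the entries can be turned into per-run pass rates and compared. The sketch below parses the five entries shown. `parse_score` is a hypothetical helper, and the percentage-point delta is one possible way to express the change between runs, not a metric the dashboard reports.

```python
# Run History as listed on the dashboard, most recent run first.
history = [
    ("#482", "13/15"),
    ("#479", "11/15"),
    ("#475", "14/15"),
    ("#471", "15/15"),
    ("#468", "12/15"),
]


def parse_score(score: str) -> tuple[int, int]:
    """Split a 'passed/total' string such as '13/15' into two integers."""
    passed, total = score.split("/")
    return int(passed), int(total)


# Pass rate per run.
rates = {}
for run, score in history:
    passed, total = parse_score(score)
    rates[run] = passed / total

# Change of the latest run against the previous one, in percentage points.
latest, previous = history[0][0], history[1][0]
delta_pp = (rates[latest] - rates[previous]) * 100
print(f"{latest}: {rates[latest]:.1%} ({delta_pp:+.1f} pp vs {previous})")
```

This prints `#482: 86.7% (+13.3 pp vs #479)`. Run #471 remains the only run at 15/15.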
Actions (5)
Run #482 passed

Action 1 · agent · tool_call · exact · 97% similar
Expected 3 · Actual 3 · Exact 3 · Partial 0 · Missed 0 · Extra 0
Expected: search_flights(from="SFO", to="JFK", date="2026-03-15") → 3 results
Actual:   search_flights(from="SFO", to="JFK", date="2026-03-15") → 3 results
Matching (3): from, to, date
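The per-action counters (Exact, Partial, Missed, Extra) read like a comparison of argument names and values between the expected and actual call. The sketch below assumes the following reading:

- A shared argument with an identical value and type is exact.
- A shared argument whose value differs, including only by type as with action 4's `342` vs `342.00`, is partial.
- An argument that appears only in the expected call is missed.
- An argument that appears only in the actual call is extra.

`compare_args` is a hypothetical helper, not the evaluation tool's API.

```python
def compare_args(expected: dict, actual: dict) -> dict[str, list[str]]:
    """Bucket argument names into exact / partial / missed / extra.

    Hypothetical helper: assumes 'exact' means same value AND same type,
    so 342 (int) vs 342.0 (float) is counted as 'partial'.
    """
    buckets = {"exact": [], "partial": [], "missed": [], "extra": []}
    for name, want in expected.items():
        if name not in actual:
            buckets["missed"].append(name)
        elif type(actual[name]) is type(want) and actual[name] == want:
            buckets["exact"].append(name)
        else:
            buckets["partial"].append(name)
    # Arguments the agent passed that the expected call did not contain.
    buckets["extra"] = [name for name in actual if name not in expected]
    return buckets


# Action 1: search_flights with identical expected and actual arguments.
expected = {"from": "SFO", "to": "JFK", "date": "2026-03-15"}
actual = {"from": "SFO", "to": "JFK", "date": "2026-03-15"}
result = compare_args(expected, actual)
print({bucket: len(names) for bucket, names in result.items()})
```

For action 1 this prints `{'exact': 3, 'partial': 0, 'missed': 0, 'extra': 0}`, which matches the counters above.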

Action 2 · agent · tool_call · mismatch · 41% similar
Expected 2 · Actual 1 · Exact 1 · Partial 0 · Missed 1 · Extra 0
Expected: confirm_cancel(booking_id="BK-1492", refund=true) → refund $342.00
Actual:   cancel_booking(booking_id="BK-1492") → cancelled, no refund
Matching (1): booking_id
Missing (1): refund
The agent called cancel_booking instead of confirm_cancel and skipped the refund confirmation step.
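Action 2 fails on the tool name itself, so argument overlap alone would understate the problem. The minimal call-level check below flags a different tool name and lists which expected arguments the actual call kept or dropped. `ToolCall` and `check_call` are hypothetical names used only for illustration, and `refund=true` is written as Python's `True`.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A recorded tool invocation: tool name plus keyword arguments."""
    name: str
    args: dict = field(default_factory=dict)


def check_call(expected: ToolCall, actual: ToolCall) -> dict:
    """Compare two calls by tool name and by argument names (not values)."""
    return {
        # A different tool name means the wrong action was taken.
        "name_matches": expected.name == actual.name,
        "matching": sorted(expected.args.keys() & actual.args.keys()),
        "missing": sorted(expected.args.keys() - actual.args.keys()),
        "extra": sorted(actual.args.keys() - expected.args.keys()),
    }


expected = ToolCall("confirm_cancel", {"booking_id": "BK-1492", "refund": True})
actual = ToolCall("cancel_booking", {"booking_id": "BK-1492"})
print(check_call(expected, actual))
```

The result reports the wrong tool, with `booking_id` as matching and `refund` as missing. This mirrors the Matching (1) / Missing (1) breakdown above.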

Action 3 · agent · tool_call · exact · 96% similar
Expected 2 · Actual 2 · Exact 2 · Partial 0 · Missed 0 · Extra 0
Expected: get_seat_map(flight_id="AA1492", class="economy") → 42 seats available
Actual:   get_seat_map(flight_id="AA1492", class="economy") → 42 seats available
Matching (2): flight_id, class

Action 4 · agent · tool_call · partial · 78% similar
Expected 4 · Actual 4 · Exact 3 · Partial 1 · Missed 0 · Extra 0
Expected: book_flight(flight_id="AA1492", seat="14A", passenger="Emily Chen", amount=342)
Actual:   book_flight(flight_id="AA1492", seat="14A", passenger="Emily Chen", amount=342.00)
Matching (3): flight_id, seat, passenger
Different (1): amount: 342 → 342.00
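In action 4, only the representation of `amount` differs. The two values are numerically equal but differ as text and, once parsed, as types. That is a plausible reason for a partial match rather than a miss, though the scorer's exact rule is not stated here. The snippet shows each view in Python. `Decimal` is included as one common way to compare currency amounts by value. That is an assumption, not the scorer's documented behavior.

```python
import json
from decimal import Decimal

# The amount exactly as it appears in each call.
expected_text, actual_text = "342", "342.00"

# A text comparison sees two different values.
print(expected_text == actual_text)                      # False

# JSON parsing yields int vs float: equal by value, different by type.
expected_num = json.loads(expected_text)                 # 342 (int)
actual_num = json.loads(actual_text)                     # 342.0 (float)
print(expected_num == actual_num)                        # True
print(type(expected_num) is type(actual_num))            # False

# Decimal compares by value, so trailing zeros do not matter.
print(Decimal(expected_text) == Decimal(actual_text))    # True
```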

Action 5 · agent · tool_call · exact · 94% similar
Expected 2 · Actual 2 · Exact 2 · Partial 0 · Missed 0 · Extra 0
Expected: process_payment(booking_id="BK-1492", amount=342.00) → payment confirmed
Actual:   process_payment(booking_id="BK-1492", amount=342.00) → payment confirmed
Matching (2): booking_id, amount