Why's This Shit Broken?

If you were a fly in our Dogpatch apartment during the late weeks of December, you would have heard my cofounder and me barking the same sentence back and forth at each other. "Why's this shit broken?"

Turns out we'd decide to dedicate our 20s to answering that question for other engineers.

Ashr started the way many of the AI startups reading this post did: with the thesis that AI was a fundamental driving force that would revolutionize the way entire sectors do business. We worked with SMBs, automating workflows, scheduling, and payroll.

The whole way through this process, we struggled. Clients wouldn't let us use their employees' payroll information as a testing ground for an AI system, current clients weren't representative of every case or the scale at which we wanted to operate, and no existing testing suite was malleable enough to fit our seemingly esoteric use case.

We thought this was due to the novelty of our space; no other AI-native service had primarily targeted this side of the SMB market.

When we generated test suites with Claude Code, Cursor, or Codex, they would test basic schema matching, but they collapsed at catching subtle silent failures and never warned us when the tooling wasn't set up for the agent to retrieve information.

Further, these agents completely failed to validate the quality of responses. For any good agentic service, calling the right tools and delivering a truly high-quality response go hand in hand; you need both to ship a good product.
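To make the distinction concrete, here's a minimal, hypothetical sketch of what we mean. The agent, the `run_payroll_agent` function, and the expected values are all invented for illustration; the point is that a schema check passes a well-formed but silently wrong answer, while only a quality check catches it.

```python
# Hypothetical illustration: why schema checks alone miss silent failures.
# `run_payroll_agent` and all values below are made up for this example.

def run_payroll_agent(query: str) -> dict:
    # Imagine an agent that never actually called the payroll tool,
    # so it returns a well-formed but hallucinated default answer.
    return {"employee": "Jane Doe", "net_pay": 0.0, "currency": "USD"}

def schema_test(response: dict) -> bool:
    # What generated test suites typically check: the right keys and types.
    return (
        isinstance(response.get("employee"), str)
        and isinstance(response.get("net_pay"), float)
        and isinstance(response.get("currency"), str)
    )

def quality_test(response: dict, expected_net_pay: float) -> bool:
    # What we actually needed: does the substance of the answer hold up?
    return abs(response["net_pay"] - expected_net_pay) < 0.01

response = run_payroll_agent("What was Jane Doe's net pay last period?")
print(schema_test(response))           # True  -- the silent failure slips through
print(quality_test(response, 4150.0))  # False -- only the quality check flags it
```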

Then we ventured out and talked to founders. Voice agent companies were testing their software manually, driving themselves crazy speaking to their agents to exercise every tool path. Companies in banking, legal, and healthcare couldn't test features beyond what the limited data they have access to happens to cover.

We knew one thing: current testing infrastructure is structurally broken and doesn't match the speed of development.

What's more, quality assurance was completely absent. With no evaluation metric or North Star to compare their results against, teams are left in the dark about the true utility or accuracy of their product.

So, in week 3 of YC, we took a step forward and completely pivoted.

We want to help other founders improve their products before customers complain. We're providing testing and evaluation as a service, because the speed of testing and quality validation should match the speed at which we now code.