A search API that became an agentic platform

Trio Health’s customers needed on-demand insights from clinical patient notes. The nuance behind a question like why a patient stopped taking a medication isn’t captured in structured data. That constraint, combined with the sheer data volume (billions of patient notes), meant our solution had to be highly price-performant.

the bet

A search API could be used by agents to answer these complicated questions at scale, cost-effectively.

the investment

I helped Trio build a two-stage indexing pipeline (extraction → embedding) with hybrid retrieval (BM25 + semantic + reranking) and classifiers, running on AWS EKS with KEDA autoscaling. We wrapped this in a custom API and an agent harness that we used to perform searches and produce structured results.
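
To make the hybrid retrieval step concrete, here is a minimal sketch of one common way to merge a lexical (BM25) result list with a semantic one before reranking: reciprocal rank fusion. This is an assumption for illustration, not necessarily the fusion method in Trio’s pipeline. The function name, the chunk IDs, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each list contributes 1 / (k + rank) per chunk, so chunks that rank
    well in more than one list rise to the top. k=60 is a conventional
    default, assumed here rather than taken from the source.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for one query from the two retrievers.
bm25_hits = ["c3", "c1", "c7"]
semantic_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)
```

In a setup like this, the fused list would typically be truncated to a top-N and handed to the reranker, which is the expensive stage and benefits from a small candidate set.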

the kpis

Launched January 23. We’ve indexed 137K patients, 37M notes, and 500M chunks — all on our own minimal compute spend, stored in S3 via Turbopuffer. Here’s a bit about the design constraints we cared about, and why we chose what we chose.

TCO. Constraining TCO led us to Turbopuffer and to assembling our own embedding pipeline. That was a real build investment, but it paid off: the cost difference against the other systems we compared was considerable.
Latency. We decided we were okay with up to 30 seconds of latency on queries, especially for agents. We also chose to offer a variety of API options, some more exhaustive than others.
Usage. We wanted to see what usage we’d get from web applications as well as an agentic harness, and what shapes those took.

the learnings

Since we just launched, we’re still closing the loop on many of the early learnings. But here are several first impressions.

FTS-heavy. Both agent and application patterns tend to search clinical data mostly using lexical approaches. Vector similarity is still helpful for query expansion, but less so for recall.
Agentic usage naturally dwarfs human usage. Not a novel insight, but staggering to see borne out in the data. Our agentic runs accounted for several orders of magnitude more usage than our early human beta testers.
Agentic extraction is the future. It’s still early days for measuring the full accuracy of these runs, but the innovation happening here is transforming how long it takes to answer these complicated questions at scale.

let’s talk

If you’re weighing an agentic-first retrieval platform — or trying to work out the unit economics of search at clinical-data scale — start a conversation.