A two-week CLI sprint for an LLM evals platform

the problem

The platform had a strong web app for prompt management, trace inspection, and experiments — but the day-to-day developer loop kept bouncing between the IDE, the terminal, the web UI, and a separate cost dashboard. Prompt state was hard to reason about across environments. Trace and session hierarchies were easier to inspect by clicking through the UI than by reading code. And there was no obvious agent surface — every workflow assumed a human in front of a browser.

The ask: build a CLI that closes those loops without trying to replace the web app.

what shipped

A full CLI — auth, init, prompt status (modeled on git status), tree and list views of sessions and traces, deep-linking back into the web app via open, and an agent-friendly TOON output mode. All non-interactive by default, with the TUI layered on for browsing. About two weeks end to end.

The design moves that mattered:

CLI-first, agent-second. Every command works headless. The TUI is supplementary; agents see the same surface developers do.
prompts status as the unit of progress. A single command answers “where am I drifting?” across local files, remote prompts, and deployed environments. Decouples the application from the platform.
Costs in the tree view. No more leaving the platform to check OpenRouter — model, duration, and cost render alongside the trace hierarchy.
open as the bridge. Every CLI artifact has a one-keystroke path back to the full web UI. The CLI doesn’t compete with the web app — it accelerates the path to it.

what changed

The web app stopped being the only entry point. Adding the platform to an existing project went from a 20-minute scavenger hunt to a single init command. Verifying observability instrumentation became a tree or ls invocation instead of a click-through. Prompt drift across environments became visible without leaving the terminal.

The CLI-first abstraction also opened the door to using the same platform from inside agent loops — the same interface that helped me also helped my agents.

the lesson

Build the CLI on purpose and the web app gets stronger, not weaker. Designing the same surface to serve a developer and an agent on the same day is the actual unlock — not “we have a CLI now.”