Engineering

How we reduced analysis time from 3 minutes to under 2

April 1, 2026 · 8 min read

The problem

When ARI launched, a full analysis took an average of 3 minutes 20 seconds. That's not bad — manual QA takes 3 days — but it felt slow. Engineers would trigger an analysis, context-switch to something else, and forget to check the result.

We set a target: under 2 minutes, always.

What was taking so long

We profiled hundreds of analyses and found the time was split roughly like this:

- 46%: crawling and simulating user flows
- 31%: AI inference (the Claude API calls)
- 15%: regression comparison against the previous baseline
- 8%: report generation and storage
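
To make those shares concrete, here is a small sketch that converts them into seconds. It assumes the percentages apply to the 3 minute 20 second launch average (200 seconds); the profile itself covered hundreds of runs, so treat the outputs as rough figures. The function and variable names are illustrative only.

```python
# Assumption: the shares apply to the 200 s (3 min 20 s) launch average.
TOTAL_SECONDS = 3 * 60 + 20

PHASE_SHARE = {
    "crawling and simulating user flows": 0.46,
    "AI inference": 0.31,
    "regression comparison": 0.15,
    "report generation and storage": 0.08,
}


def phase_seconds(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Convert each phase's fractional share of the total into seconds."""
    return {phase: round(total * share, 1) for phase, share in shares.items()}


for phase, seconds in phase_seconds(TOTAL_SECONDS, PHASE_SHARE).items():
    print(f"{phase}: ~{seconds:.0f}s")
```

Under that assumption, crawling comes to roughly 92 seconds and regression comparison to roughly 30 seconds, which lines up with the ~90s and ~30s starting points quoted in the fixes.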

The low-hanging fruit was obvious: we were doing most of this sequentially.

Fix 1: Parallel flow execution

Previously, we crawled critical user flows one at a time: login, then checkout, then signup, then search, and so on. Each flow took 15–25 seconds, so five flows could take up to 125 seconds (5 × 25) before we even started AI inference.

We refactored to run all flows concurrently using a worker pool. Each flow gets its own headless browser instance. The pool scales based on application complexity — simple apps get 3 workers, complex apps get up to 8.
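
A minimal sketch of the bounded worker pool idea, assuming Python's asyncio. The crawl itself is replaced by a sleep, and `crawl_flow`, `crawl_all`, and `pool_size` are hypothetical names, not our production code. The sizing rule here keys off flow count purely for illustration; the real pool scales on application complexity.

```python
import asyncio


async def crawl_flow(name: str, duration: float) -> str:
    """Stand-in for driving one headless browser instance through a flow."""
    await asyncio.sleep(duration)
    return f"{name}: ok"


async def crawl_all(flows: dict[str, float], max_workers: int) -> list[str]:
    """Crawl every flow concurrently with at most max_workers in flight."""
    slots = asyncio.Semaphore(max_workers)

    async def worker(name: str, duration: float) -> str:
        async with slots:  # wait for a free worker slot
            return await crawl_flow(name, duration)

    # gather returns results in the same order the flows were submitted
    return await asyncio.gather(*(worker(n, d) for n, d in flows.items()))


def pool_size(flow_count: int) -> int:
    """Hypothetical sizing rule: clamp the pool between 3 and 8 workers."""
    return max(3, min(8, flow_count))


flows = {"login": 0.02, "checkout": 0.03, "signup": 0.02, "search": 0.01}
print(asyncio.run(crawl_all(flows, max_workers=pool_size(len(flows)))))
```

With enough workers for every flow, wall-clock time tracks the slowest single flow rather than the sum of all flows, which is where the savings come from.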

Result: flow crawling dropped from ~90s to ~22s.

Fix 2: Streaming AI inference

We used to wait for the full crawl to complete before sending anything to the AI. Now we stream flow results to the AI as they complete. By the time the last flow finishes crawling, the AI has already processed the first three.

This required restructuring our prompt pipeline significantly — the AI now receives partial context and updates its analysis as new flow data arrives.
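
The overlap can be sketched as a producer/consumer pipeline. This is an illustration under assumptions: it uses asyncio, `analyze_partial` is a stand-in for the real AI call (no API is contacted), and the running "analysis" is just a list of flow names.

```python
import asyncio


async def crawl_flow(name: str, duration: float) -> dict:
    """Stand-in for crawling one flow; returns a tiny fake result."""
    await asyncio.sleep(duration)
    return {"flow": name}


async def analyze_partial(analysis: list[str], flow_result: dict) -> list[str]:
    """Stand-in for an AI call that folds one more flow into the analysis."""
    await asyncio.sleep(0.01)
    return analysis + [flow_result["flow"]]


async def run_pipeline(flows: dict[str, float]) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()

    async def producer(name: str, duration: float) -> None:
        # Hand each result over as soon as its crawl finishes.
        await queue.put(await crawl_flow(name, duration))

    async def consumer() -> list[str]:
        analysis: list[str] = []
        for _ in range(len(flows)):
            result = await queue.get()  # earliest finished flow first
            analysis = await analyze_partial(analysis, result)
        return analysis

    consumer_task = asyncio.create_task(consumer())
    await asyncio.gather(*(producer(n, d) for n, d in flows.items()))
    return await consumer_task


flows = {"login": 0.01, "checkout": 0.05, "signup": 0.03, "search": 0.07}
print(asyncio.run(run_pipeline(flows)))
```

The consumer starts before any crawl finishes, so analysis of early flows runs while later flows are still crawling. Flows are analyzed in completion order, not submission order.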

Result: overlapping AI processing with crawling cut total time by ~35s.

Fix 3: Incremental regression comparison

Comparing against the previous baseline used to reload the entire baseline snapshot from storage. For large apps, this could be 50MB+ of crawl data.

We switched to a diff-based approach: we store a compact hash fingerprint of each flow state, and only load full snapshots for flows where the fingerprint changed.
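
A rough sketch of the fingerprint check. SHA-256 over a canonical JSON encoding is an assumption chosen for illustration (this post does not specify the hash or the encoding), and `fingerprint` and `changed_flows` are hypothetical names.

```python
import hashlib
import json


def fingerprint(flow_state: dict) -> str:
    """Hash a canonical JSON encoding so equal states give equal digests."""
    canonical = json.dumps(flow_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_flows(baseline: dict[str, str], current: dict[str, dict]) -> list[str]:
    """Return flows whose fingerprint is new or differs from the baseline."""
    return [
        name
        for name, state in current.items()
        if baseline.get(name) != fingerprint(state)
    ]


# Stored from the previous run: one small digest per flow, not the snapshot.
baseline = {
    "login": fingerprint({"steps": 3, "errors": []}),
    "checkout": fingerprint({"steps": 5, "errors": []}),
}

current = {
    "login": {"errors": [], "steps": 3},  # same state, different key order
    "checkout": {"steps": 5, "errors": ["timeout"]},  # changed
    "search": {"steps": 2, "errors": []},  # new flow, no baseline yet
}

# Only these flows would need their full baseline snapshot loaded.
print(changed_flows(baseline, current))
```

Sorting keys before hashing keeps the fingerprint stable when the same state is serialized in a different key order, so unchanged flows are skipped reliably.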

Result: regression comparison dropped from ~30s to ~4s for most analyses.

The result

Average analysis time is now 1 minute 58 seconds. P95 is under 3 minutes even for large applications.

More importantly, completion rates improved — engineers now stay on the analysis page instead of switching away, because the result arrives before they lose focus.
