---
title: "Selective Test Execution at Stripe: Dynamic Discovery and Verification at Massive Scale"
date: 2026-07-01
author: "Chokmah LLC"
tags: ["engineering", "ci-cd", "testing", "stripe", "verification", "dynamic-analysis", "chokmah"]
source: '.\stripe-ste-loop-engineering-analysis.md'
---
Selective Test Execution at Stripe: Dynamic Discovery and Verification at Massive Scale
Autonomously synthesized by the Chokmah Content-to-Blog Loop from: .\stripe-ste-loop-engineering-analysis.md
What Stripe Built
Stripe runs a ~50-million-line Ruby monorepo with roughly 100,000 test files. A full sequential run of the suite would take four months, and they ship ~50,000 builds per week. Their solution is Selective Test Execution (STE): on average a build runs only about 5% of the test suite (the median is ~0.5%), while safety stays high.
Instead of the naive “run everything” or brittle static analysis, Stripe built a dynamic observation system.
How It Works: File Access Interception
The core idea is deceptively simple and extremely effective:
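They use a small in-house C++ library (file_access_interceptor) loaded via LD_PRELOAD. Because LD_PRELOAD makes the dynamic linker load the library ahead of libc, its wrappers around the libc functions that issue open syscalls run in place of libc's own, in user space rather than inside the kernel. Each wrapper records the path being opened, which tells Stripe which files each test actually touched.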
Key design choices that make it work at scale:
- It propagates into child processes (forks and spawns).
- It does almost no work inside the hot open path: it just appends a record, and all heavy lifting happens later.
- It naturally captures not just Ruby code, but config files, templates, fixtures, generated artifacts, and everything else a test depends on.
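Stripe's interceptor is a C++ library injected with LD_PRELOAD, which a short portable example cannot reproduce. As a rough in-process analogue, the Python sketch below swaps in the standard library's audit hooks (sys.addaudithook, Python 3.8+), which raise an "open" event whenever the interpreter opens a file. It mirrors the two design choices above: the hook only appends the raw path, and decoding and de-duplication are deferred to a separate step. The names record_file_access and normalize are hypothetical, and unlike LD_PRELOAD this only observes opens made through Python in the current process, not child processes or native code calling libc directly.

```python
import os
import sys
import threading
from contextlib import contextmanager

# Per-thread slot holding the list that is currently collecting opens.
# (Hypothetical analogue; Stripe's interceptor is a C++ LD_PRELOAD library.)
_state = threading.local()


def _audit_hook(event, args):
    # Runs for every audit event in the process, so do as little as possible.
    if event != "open":
        return
    records = getattr(_state, "records", None)
    if records is not None:
        records.append(args[0])  # raw path only; normalization is deferred


# Audit hooks cannot be removed once installed, so install exactly one.
sys.addaudithook(_audit_hook)


@contextmanager
def record_file_access():
    """Collect the raw path of every file opened on this thread inside the block."""
    records = []
    _state.records = records
    try:
        yield records
    finally:
        _state.records = None


def normalize(records):
    """Deferred heavy lifting: decode, absolutize, and de-duplicate paths."""
    paths = set()
    for raw in records:
        if raw is None or isinstance(raw, int):  # opening an existing fd has no path
            continue
        paths.add(os.path.abspath(os.fsdecode(os.fspath(raw))))
    return paths
```

A test runner built this way would wrap each test in record_file_access() and store normalize() of the collected list as that test's dependency set.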
This is dynamic discovery done right. It observes reality instead of trying to statically model Ruby’s metaprogramming and dynamism.
Scopes and the Honest Global Surface
Tests are attributed using a hierarchical scope system. Everything loaded before tests start running lives in the “root scope”, and anything in the root scope becomes a global dependency, so changing it can (correctly) force many tests to run.
Stripe treats this as a feature: the system is explicit about shared global state instead of pretending perfect isolation is possible.
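The attribution rule can be sketched as a parent-linked tree of scopes. For illustration only, this assumes three levels (root, test file, test case); the source names only the root scope. A test's dependencies are the union of the files opened in its own scope and in every enclosing scope, which is why a change to a root-scope file selects every test. Scope and impacted_tests are hypothetical names, and plain sets stand in for whatever representation Stripe actually uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """One level of the attribution hierarchy (e.g. root -> test file -> test case)."""
    name: str
    parent: Optional["Scope"] = None
    files: set = field(default_factory=set)  # files opened while this scope was active

    def dependencies(self):
        # A scope depends on what it opened plus everything its ancestors opened,
        # so files loaded in the root scope become dependencies of every test.
        deps, scope = set(), self
        while scope is not None:
            deps |= scope.files
            scope = scope.parent
        return deps


def impacted_tests(tests, changed):
    """Names of the test scopes whose dependencies intersect the changed files."""
    changed = set(changed)
    return sorted(t.name for t in tests if t.dependencies() & changed)
```

Changing a file opened only inside one test selects just that test, while changing a root-scope file such as a boot script selects all of them.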
Verification + Guardrails
Raw file-access data is turned into a compact selection index using roaring bitmaps. This allows very fast lookup of “which tests are impacted by these changed files?”
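The lookup can be illustrated with an inverted index from file path to a bitmap of test IDs: the impacted set for a change is the bitwise OR of the bitmaps of the changed files. Roaring bitmaps add compression on top of this idea; to stay dependency-free, the sketch below substitutes plain Python integers as uncompressed bitsets. SelectionIndex and its methods are hypothetical names.

```python
class SelectionIndex:
    """Inverted index: file path -> bitset of the test IDs that touched it."""

    def __init__(self, tests):
        self.tests = list(tests)  # test ID (list position) -> test name
        self._ids = {name: i for i, name in enumerate(self.tests)}
        self._by_file = {}  # path -> int used as a bitset of test IDs

    def record(self, test, paths):
        """Mark every path in `paths` as a dependency of `test`."""
        bit = 1 << self._ids[test]
        for path in paths:
            self._by_file[path] = self._by_file.get(path, 0) | bit

    def impacted(self, changed_paths):
        """OR together the bitsets of all changed paths, then decode test names."""
        mask = 0
        for path in changed_paths:
            mask |= self._by_file.get(path, 0)  # unknown paths select nothing
        return [name for i, name in enumerate(self.tests) if (mask >> i) & 1]
```

Note that a path the index has never seen selects nothing. That blind spot for newly added files is what the mandatory-test guardrail covers.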
But the system is not blindly data-driven. It includes important guardrails:
- Mandatory tests: tests that use directory globbing or other file discovery always run, because a newly added file can change their behavior even though no earlier run ever opened it.
- Previously failing tests are always re-run.
- Special handling for linters so they don’t trigger unnecessary full-repo scans.
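Combining the index result with the first two guardrails might look like the sketch below. How a test gets flagged as using file discovery is an assumption here (an interceptor-based design could, for example, note directory listings during a run), and the linter special case is left out because its mechanics are not described. TrackedTest and apply_guardrails are hypothetical names; recording a reason for each selected test keeps the final selection explainable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedTest:
    name: str
    uses_file_discovery: bool = False  # assumed flag: globbed or listed a directory
    failed_last_run: bool = False


def apply_guardrails(impacted, tests):
    """Return {test name: reason} for every test the build must run."""
    reasons = {name: "impacted" for name in impacted}
    for test in tests:
        if test.name in reasons:
            continue
        if test.uses_file_discovery:
            # New files can change its behavior without any recorded open.
            reasons[test.name] = "mandatory"
        elif test.failed_last_run:
            reasons[test.name] = "previously failing"
    return reasons
```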
These guardrails are the practical “human judgment” layer on top of the automated selection.
Persistence and Reproducibility at Scale
Every build produces:
- A file inventory (path + hash for every file, including generated artifacts, computed with hashdeep)
- The compact selection index
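Stripe computes the inventory with hashdeep. The Python sketch below substitutes hashlib SHA-256 over os.walk to show the shape of the data, and adds one plausible use that is an assumption rather than something stated here: diffing a baseline inventory against the current one yields the changed paths to look up in the selection index. build_inventory and changed_paths are hypothetical names.

```python
import hashlib
import os


def build_inventory(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    inventory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):  # 64 KiB reads
                    digest.update(chunk)
            inventory[os.path.relpath(full, root)] = digest.hexdigest()
    return inventory


def changed_paths(base, head):
    """Paths that were added, removed, or modified between two inventories."""
    return sorted(p for p in base.keys() | head.keys() if base.get(p) != head.get(p))
```

Hashing content rather than trusting timestamps means a regenerated artifact with identical bytes does not count as a change.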
This data is stored with Monotonic Revision IDs (MRIs) so that a single fast database query can retrieve the correct baseline for any new build. This gives both performance and debuggability (“which STE base did we use for this selection?”).
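A minimal sketch of the baseline lookup, using an in-memory SQLite table. The schema, the ste_base table name, and the rule "use the newest stored base whose MRI does not exceed the build's merge-base MRI" are all assumptions made for illustration. The point it shows is that monotonic integer IDs turn baseline selection into a single indexed range query, and the returned MRI answers "which STE base did we use?" directly.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE ste_base (
               mri        INTEGER PRIMARY KEY,  -- monotonic revision ID (hypothetical schema)
               commit_sha TEXT NOT NULL,
               index_blob BLOB NOT NULL         -- serialized selection index
           )"""
    )
    return db


def save_base(db, mri, commit_sha, index_blob):
    db.execute("INSERT INTO ste_base VALUES (?, ?, ?)", (mri, commit_sha, index_blob))


def find_base(db, merge_base_mri):
    """Newest stored base at or before the build's merge base, or None."""
    return db.execute(
        "SELECT mri, commit_sha, index_blob FROM ste_base "
        "WHERE mri <= ? ORDER BY mri DESC LIMIT 1",
        (merge_base_mri,),
    ).fetchone()
```

Because mri is the table's integer primary key, SQLite can answer the range query by walking the primary-key index rather than scanning the table.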
Why This Matters (Loop Engineering View)
Stripe’s STE is one of the best large-scale real-world examples of a properly engineered loop:
- Discovery: Dynamic, low-overhead observation by intercepting file-open calls instead of static analysis.
- Verification: Independent selection index + explicit guardrails (mandatory tests, special cases).
- Persistence: Compact bitmaps + reproducible baselines with MRIs.
- Honest boundaries: Root scope makes global dependencies visible instead of hidden.
It dramatically reduces compute waste and feedback time, and it preserves safety through pragmatic exceptions rather than by trying to automate everything perfectly.
This is exactly the kind of disciplined, observable, guardrail-equipped automation we advocate for in sovereign systems, agentic workflows, simulation platforms, and safety-critical pipelines.
Key Takeaways
- Dynamic observation at the right boundary (file I/O) beats trying to statically model highly dynamic languages.
- Make the irreducible shared surface explicit (root scope / global dependencies).
- Layer pragmatic guardrails on top of data-driven automation instead of pretending the model is perfect.
- Invest in reproducible, queryable state so selection decisions are fast and debuggable.
Stripe’s approach is portable to many other domains where you have high volume, high dynamism, and a need for both speed and safety.
Generated autonomously by the Chokmah Content-to-Blog Loop. Raw source preserved in draft artifacts for traceability.