Evidence
We test wavebird the way we would want our infrastructure tested. Every claim on this page is backed by reproducible evidence from controlled benchmark runs and a comprehensive pre-pilot validation campaign with fault injection.
Last updated: 2026-04-07
Speed
Measured runtime path from request entry to a sponsoring decision being ready, excluding the AI model’s own wait time. p99 means 99% of requests were this fast or faster.
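To make the p99 definition concrete, here is a minimal sketch of a nearest-rank percentile. The function name and the sample latencies are illustrative assumptions, not part of the wavebird benchmark harness, which may use a different percentile method.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank of the sample that covers pct% of the data
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies: 1 ms, 2 ms, ..., 200 ms (one request each)
latencies_ms = list(range(1, 201))
p99 = percentile_nearest_rank(latencies_ms, 99)
print(p99)
```

With 200 samples, the p99 value is the 198th smallest latency: 198 of the 200 requests (99%) were that fast or faster.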
Reliability
Over 8 hours (263,534 total slots processed) with fault injection active, we observed 0 missing proofs across the 260,321 terminal slots expected to produce proof.
Resilience
We simulated seven exchange failure modes (plus three PostgreSQL failures). Result: 0 crashes, 0 unrecoverable states, correct circuit breaker activation and recovery.
In March and April 2026, we ran our pre-pilot validation campaign: a set of automated tests designed to find problems before the first real partner connects. The final run completed on April 7, 2026. We did not test under ideal conditions. We deliberately broke things.
Our Mock-SSP chaos mode randomly injected network delays, server errors, malformed responses, dropped connections, and traffic spikes into the test runs. The goal is simple: prove correct behavior under failure before we connect a live partner.
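As a rough illustration of this kind of fault injection, the sketch below returns either a normal response or a randomly chosen fault. Everything in it is an assumption made for the example: the function name, the fault rate, the payload fields, and the value ranges are hypothetical and do not describe the actual Mock-SSP implementation. Traffic spikes are a property of the load generator rather than of a single response, so they are left out.

```python
import random

# Per-response fault kinds, mirroring the failure types named above
FAULTS = ("delay", "server_error", "malformed", "dropped")


def mock_ssp_response(rng, fault_rate):
    """Return (kind, payload); inject a random fault with probability fault_rate."""
    if rng.random() < fault_rate:
        fault = rng.choice(FAULTS)
        if fault == "delay":
            return ("delay", {"sleep_ms": rng.randint(100, 2000)})
        if fault == "server_error":
            return ("server_error", {"status": 503})
        if fault == "malformed":
            return ("malformed", {"body": "{not json"})
        return ("dropped", {})
    # Healthy path: a hypothetical bid priced in micro-units
    return ("ok", {"status": 200, "bid_micros": rng.randint(1, 5_000_000)})


# A seeded generator makes a chaos run repeatable
rng = random.Random(42)
counts = {}
for _ in range(1000):
    kind, _payload = mock_ssp_response(rng, fault_rate=0.2)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)
```

Seeding the random generator is the design choice that matters here: a failing chaos run can be replayed with the same fault sequence.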
Mock-SSP
Mock-SSP simulates an ad exchange response inside the benchmark harness and inside the pre-pilot chaos campaign so we can measure the internal ad path without public network noise.
We processed 10,000 sponsoring slots at 100 concurrent connections with fault injection active. Result: 0 missing proofs, 0 invalid signatures, 0 orphaned beacons.
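The three counters in that result can be pictured as a simple audit over slots, proofs, and beacons. The sketch below is a hypothetical reconstruction: it assumes HMAC-SHA256 signatures over the slot ID and string slot IDs, neither of which is confirmed by this page, and `audit` is an illustrative name.

```python
import hashlib
import hmac


def sign(key, slot_id):
    """Hypothetical proof signature: HMAC-SHA256 over the slot ID."""
    return hmac.new(key, slot_id.encode(), hashlib.sha256).hexdigest()


def audit(terminal_slots, proofs, beacons, key):
    """Count missing proofs, invalid signatures, and orphaned beacons.

    terminal_slots: slot IDs expected to produce a proof
    proofs: dict mapping slot ID -> signature (hex string)
    beacons: slot IDs referenced by received beacons
    """
    terminal = set(terminal_slots)
    missing = terminal - set(proofs)
    invalid = {
        slot_id
        for slot_id, signature in proofs.items()
        if not hmac.compare_digest(signature, sign(key, slot_id))
    }
    orphaned = set(beacons) - terminal
    return {
        "missing_proofs": len(missing),
        "invalid_signatures": len(invalid),
        "orphaned_beacons": len(orphaned),
    }


key = b"demo-key"
report = audit(
    terminal_slots=["s1", "s2", "s3"],
    proofs={"s1": sign(key, "s1"), "s2": "00" * 32},  # s2 tampered, s3 absent
    beacons=["s1", "s9"],  # s9 matches no terminal slot
    key=key,
)
print(report)
```

A clean run is one where all three counters are zero; the demo input deliberately trips each counter once.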
We ran 5,000 slots through 6 billing scenarios — including micro-unit price boundaries, duplicate detection, and multi-SSP fallback attribution. Result: exact reconciliation in every scenario (0 billing errors).
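Exact reconciliation is easiest to reason about when prices are integers. The sketch below assumes prices are stored as integer micro-units (one millionth of a currency unit) and that each billable event carries a unique ID; both are assumptions for illustration, and `reconcile` is a hypothetical helper rather than wavebird's settlement code.

```python
def reconcile(events, invoiced_micros):
    """Sum billable events per SSP, skip duplicates, and diff against invoices.

    events: iterable of (event_id, ssp, price_micros), prices as integers
    invoiced_micros: dict mapping ssp -> invoiced total in micro-units
    Returns (totals_by_ssp, duplicate_count, diff_by_ssp).
    """
    seen = set()
    totals = {}
    duplicates = 0
    for event_id, ssp, price_micros in events:
        if event_id in seen:
            duplicates += 1  # duplicate detection: bill each event once
            continue
        seen.add(event_id)
        totals[ssp] = totals.get(ssp, 0) + price_micros
    all_ssps = set(totals) | set(invoiced_micros)
    diff = {s: totals.get(s, 0) - invoiced_micros.get(s, 0) for s in all_ssps}
    return totals, duplicates, diff


events = [
    ("e1", "ssp-a", 1),          # smallest representable price
    ("e2", "ssp-a", 999_999),
    ("e2", "ssp-a", 999_999),    # duplicate delivery of e2
    ("e3", "ssp-b", 1_000_001),  # just above one whole unit
]
totals, duplicates, diff = reconcile(
    events, {"ssp-a": 1_000_000, "ssp-b": 1_000_001}
)
print(totals, duplicates, diff)
```

Integer arithmetic avoids floating-point rounding, so "exact" means a difference of zero micro-units for every SSP.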
We tested 7 SSP failure scenarios plus 3 PostgreSQL failure scenarios. Result: 0 crashes and correct circuit breaker activation and recovery in all scenarios.
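For readers unfamiliar with the pattern, the following is a minimal circuit breaker sketch showing activation and recovery. The threshold, cooldown, and state names are illustrative assumptions; wavebird's actual breaker parameters are not stated on this page.

```python
class CircuitBreaker:
    """Opens after repeated failures, then allows a probe after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def state(self, now):
        if self.opened_at is None:
            return "closed"
        if now - self.opened_at >= self.cooldown_s:
            return "half_open"  # cooldown elapsed: one probe may pass
        return "open"

    def allow(self, now):
        return self.state(now) != "open"

    def record_success(self, now):
        # Any success (including a half-open probe) closes the breaker
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now):
        if self.state(now) == "half_open":
            self.opened_at = now  # probe failed: reopen and restart cooldown
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


cb = CircuitBreaker(failure_threshold=3, cooldown_s=30.0)
for t in (0.0, 1.0, 2.0):
    cb.record_failure(t)      # third failure opens the breaker at t=2
print(cb.state(3.0))          # requests are blocked during the cooldown
print(cb.state(32.0))         # cooldown elapsed, a probe is allowed
cb.record_success(32.0)
print(cb.state(33.0))         # probe succeeded, breaker is closed again
```

Time is passed in explicitly rather than read from a clock, which keeps the breaker deterministic under test.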
Found and fixed during the campaign
Settlement attribution bug in multi-SSP fallback: slots were incorrectly attributed to the timed-out primary SSP.
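The class of bug described here can be shown in a few lines. This sketch is illustrative only: the attempt format, outcome names, and both functions are hypothetical and are not the actual wavebird settlement code or its fix.

```python
def attribute_slot_buggy(attempts):
    """Illustrates the bug class: always credit the first SSP that was tried."""
    return attempts[0][0] if attempts else None


def attribute_slot(attempts):
    """Credit the SSP that actually filled the slot, or None if none did.

    attempts: ordered list of (ssp, outcome), where outcome is one of
    "filled", "timeout", "no_bid", or "error".
    """
    for ssp, outcome in attempts:
        if outcome == "filled":
            return ssp
    return None


# Primary SSP times out, fallback SSP fills the slot
attempts = [("ssp-primary", "timeout"), ("ssp-fallback", "filled")]
print(attribute_slot_buggy(attempts))
print(attribute_slot(attempts))
```

In the fallback case the two functions disagree, which is exactly the situation a settlement test with multi-SSP fallback needs to cover.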
The final pre-pilot campaign run completed on April 7, 2026 with chaos fault injection active. It covered release verification, resilience, proof integrity, settlement accuracy, concurrency profiling, and the full 8-hour sustained load test.
Memory management: slot eviction, ledger compaction, streaming settlement, projection pruning, rate-limiter sweeps, and automatic settlement snapshots are all active. The final 8-hour run completed with stable memory under all pass criteria.
We pushed the system from 10 to 200 concurrent connections to find where it starts to struggle. The answer: it never crashed. p99 response time passes one second at 75 connections and throughput falls as concurrency rises, but the system keeps working.
“c100” means 100 concurrent connections.
| Concurrent connections | Response time (p99) | Throughput | Errors |
|---|---|---|---|
| 10 | 64 ms | 333 ops/s | 0 |
| 25 | 293 ms | 126 ops/s | 0 |
| 50 | 695 ms | 92 ops/s | 0 |
| 75 | 1,203 ms | 73 ops/s | 0 |
| 100 | 1,764 ms | 64 ops/s | 0 |
| 150 | 3,267 ms | 52 ops/s | 0 |
| 200 | 3,590 ms | 33 ops/s | 0 |
The “Errors” column counts HTTP-level errors. In these runs, every response was valid (2xx) at every concurrency level. At 200 concurrent connections, p99 response time rises to about 3.6 seconds. Under heavy load we do observe decision poll timeouts (2 at c100, 130 at c150, and 1,871 at c200), so the system degrades gracefully rather than failing hard. When load drops from c200 back to c25, the system recovers within 30 seconds.
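The degradation in the concurrency table can be summarized with a little arithmetic. The sketch below copies the table values and computes how much p99 grew and how much throughput remained between the lowest and highest concurrency levels; the variable names are our own.

```python
# (concurrent connections, p99 in ms, throughput in ops/s) from the table
rows = [
    (10, 64, 333),
    (25, 293, 126),
    (50, 695, 92),
    (75, 1203, 73),
    (100, 1764, 64),
    (150, 3267, 52),
    (200, 3590, 33),
]

_, base_p99, base_tput = rows[0]
_, last_p99, last_tput = rows[-1]

p99_growth = last_p99 / base_p99          # how many times slower at c200
throughput_kept = last_tput / base_tput   # fraction of c10 throughput left

# First concurrency level at which p99 exceeds one second
first_over_1s = next(conc for conc, p99, _ in rows if p99 > 1000)

print(f"p99 at c200 is {p99_growth:.1f}x the c10 value")
print(f"throughput at c200 is {throughput_kept:.1%} of the c10 value")
print(f"p99 first exceeds 1 s at c{first_over_1s}")
```

On these numbers, p99 at c200 is roughly 56 times the c10 value, and throughput falls to about a tenth of its c10 level.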
We ran the system continuously for 8 hours with fault injection active. All 9 validation steps passed, including the full 8-hour sustained load test with memory stability verification.
Memory stability
The runtime now includes slot eviction, ledger compaction, streaming settlement, projection pruning, rate-limiter sweeps, and automatic settlement snapshots. The 8-hour soak test passed all memory stability criteria with these optimizations active.
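One common way to express a memory stability criterion is the trend of resident memory over the run. The sketch below fits a least-squares slope to memory samples. It is an assumption about what such a check could look like: the actual wavebird pass criteria are not published here, and the sample values are synthetic, not measurements.

```python
def slope_per_hour(samples):
    """Ordinary least-squares slope of memory over time, in MB per hour.

    samples: list of (hours_elapsed, rss_mb) pairs
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


# Synthetic data: hourly samples over 8 hours
stable = [(h, 512 + (h % 2)) for h in range(9)]   # oscillates, no trend
leaking = [(h, 512 + 20 * h) for h in range(9)]   # grows 20 MB every hour

print(slope_per_hour(stable))
print(slope_per_hour(leaking))
```

A run would pass such a check when the slope stays below a chosen threshold; the threshold itself is a policy decision and is not specified on this page.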
The benchmark suite and the pre-pilot campaign were both run under controlled conditions. The goal was to measure the wavebird runtime itself, not the public internet or live model providers.
Per-run variation data is recorded internally and will be published once the sanitized artifact bundle is ready. The original benchmark methodology is unchanged, and the March 23 results remain valid.
March 23, 2026
Firewall p99 latency
Filtering step before any ad request leaves the runtime.
Mock-SSP round-trip p99 latency
Internal ad path against a controlled exchange substitute.
End-to-end p99 latency
Measured runtime path with external model wait time excluded.
Settlement max runtime
Longest measured settlement run in the current evidence pack.
Mock-SSP request throughput
Controlled request throughput inside the benchmark harness.
April 7, 2026
Proof integrity
10,000 slots processed at c100 with 0 missing proofs.
Settlement accuracy
6 scenarios with exact reconciliation.
SSP resilience
0 crashes across SSP failure scenarios.
Concurrency tested
Graceful degradation under spike load.
Sustained load
263,534 slots processed over 8 hours with all memory stability criteria passed.
What this does not claim
We are transparent about what this evidence does and does not prove:
What is still open
Two things are still open in the original benchmark suite: beacon processing slows down above 50 concurrent connections, and jobs/sec remains below target. The sustained 8-hour memory stability finding from the earlier run is resolved in the final campaign.
Artifacts
Downloadable artifacts will be published once the sanitized bundle is ready for public release. Pre-pilot campaign reports are available internally as machine-readable JSON artifacts.
Next step
If the runtime evidence is what you needed, the next step is the integration path.