We measure before we claim. And we publish our failures.
N71 overall: 0.574, the highest of any memory system in the study (best published: 0.42)
Cascade: 0.628 vs 0.03. When a fact changes, dependent facts update.
Absence: 0.42 vs 0.01. Knowing what it no longer knows.
The full 100-episode suite: the benchmark's published episodes and judges, run through our production pipeline. All 1,188 questions are downloadable for validation.