RFC-0029: Fuzz infrastructure status, deferred work catalog, and next-priority recommendations
- Status: Draft (Status report + forward-looking catalog)
- Author: Mark Truluck mark.truluck@cogiton.com
- Created: 2026-05-18
- Closes (supersedes): Roadmap #172 — “Fuzz runtime exec pipeline (new infrastructure)”
Framing — read this first. Roadmap #172 framed the fuzz harness as “produces .frm source + compiles via framec; doesn’t run the generated programs.” That framing is months stale — the harness already executes generated programs across all 17 backends in two complementary modes. This RFC documents what’s actually built, what’s been deliberately deferred, and what remains genuinely actionable. It does not propose new infrastructure; instead it converts #172 into an honest catalog so future work picks the right sub-task at the right time. Pattern matches RFC-0028, which did the same for the in-process API claim.
Summary
The fuzz harness in framec-test-env/fuzz/ is a mature, multi-phase
runtime-execution corpus:
- 35,740 generated cases across 20+ phase directories
- 14 phase generators (gen_perm.py, gen_persist.py, etc.) producing per-phase Frame source
- Two execution modes:
  - Differential trace (diff_harness/run_fuzz.py, phases 2-7): transpile + compile + run each case on every applicable backend, byte-diff the trace stream against the Python oracle. Format contract in TRACE_FORMAT.md. Defects surface as runtime divergence.
  - Assertion-driven (run_<phase>.sh, phases 8-14, 21, 24): each case ships its own PASS/FAIL driver; the runner compiles + executes via the per-language toolchain (python3, node, cargo build, javac+java, kotlinc, gcc/g++, erlc+erl, etc.) and greps stdout for PASS:/FAIL:.
- Meta-runner (run_all.sh) with --tier=smoke|core|full / --tag=<comma-list> / --lang=<name> flag contract, parallel-by-lang.
- 17-backend coverage for most phases (some phases skip inapplicable backends, e.g. Go for async).
- DEFECTS.md — closed-defect log; D1 through D18 found and fixed by the fuzz harness in 2026-04 / 2026-05.
The roadmap entry’s framing (“major new infra; multi-week”) was correct when it was written, months ago. As of 2026-05-18, the infrastructure exists, has caught 18 distinct framec defects in production, and runs as part of the developer workflow (not yet in CI; see § “CI integration question” below).
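The two execution modes differ mainly in how a verdict is derived from a backend's output. The following is a minimal sketch of both checks, not the harness's actual code: the function names are hypothetical, the line-based split assumes newline-delimited trace output, and the rule that any FAIL: outweighs a PASS: is an assumption about the runner rather than something FUZZ_PLAN.md or TRACE_FORMAT.md is known to specify.

```python
from typing import Optional, Tuple


def first_divergence(
    oracle: bytes, candidate: bytes
) -> Optional[Tuple[int, Optional[bytes], Optional[bytes]]]:
    """Differential-trace check (hypothetical helper).

    Returns None when the two trace streams are byte-identical. Otherwise
    returns (1-based line number, oracle line, candidate line) for the first
    differing line; a side that ran out of lines is reported as None.
    """
    if oracle == candidate:
        return None
    oracle_lines = oracle.split(b"\n")
    candidate_lines = candidate.split(b"\n")
    for i in range(max(len(oracle_lines), len(candidate_lines))):
        a = oracle_lines[i] if i < len(oracle_lines) else None
        b = candidate_lines[i] if i < len(candidate_lines) else None
        if a != b:
            return (i + 1, a, b)
    return None  # not reached: unequal byte strings always differ on some line


def classify_assertion_output(stdout: str) -> str:
    """Assertion-driven check (hypothetical helper).

    Mirrors a grep for PASS:/FAIL: markers anywhere in a stdout line.
    Assumption: a single FAIL: marker fails the whole case.
    """
    lines = stdout.splitlines()
    if any("FAIL:" in line for line in lines):
        return "fail"
    if any("PASS:" in line for line in lines):
        return "pass"
    return "no-verdict"  # the driver printed neither marker
```

Reporting the first divergent line keeps a failure report short even when a single early divergence shifts every later trace line.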
What the harness covers today
Per FUZZ_PLAN.md § Phases 1-10 status (2026-04-28):
| Phase | What it exercises | Cases | Backends | Status |
|---|---|---|---|---|
| 1 | Trace harness foundation | — | — | shipped |
| 2 | @@persist save/load | 1,377 | 17 | clean |
| 3 | @@:self recursive dispatch | 2,646 | 16 + Erlang 54 of 162 | clean |
| 4 | HSM parent/child dispatch | 1,377 | 17 | clean |
| 5 | Operations (interface call shapes) | 459 | 17 | clean |
| 6 | Async dispatch | 220 | 11 wired | clean |
| 7 | Multi-system | 168 | 14 | clean |
| 8 | Negative — E-code triggering | 18 | 17 | clean |
| 9 | Nested syntax (curated regression) | 85 | 17 | clean |
| 10 | Expression cross-product | 7,820 | 17 | clean (D1/D2/D3 closed) |
| 11 | Stmt-pair | 1,700 | 17 | clean |
| 12 | Ctrl-flow | 4,800 | 17 | clean |
| 13 | Shadow (var scoping) | 1,700 | 17 | clean |
| 14 | HSM-cross | 1,360 | 17 | clean |
| 15 (wave 3) | State-args typed (int/str/bool/float) | 7 patterns × 17 | 17 | clean — D4-D8 fixed |
| 17 (wave 2) | Multievent + self-call mid-handler | 2,040 | 17 | clean |
| 18 (wave 1) | Stress / boundary (N=10/N=100) | 102 | 17 | clean |
| 19 (wave 4) | Push/pop depth-3 + HSM forward | 2,380 | 17 | clean |
| 20 (wave 1) | const + @@:system.state | 480 | 16 (Erlang skipped) | clean |
| 21 (wave 1) | Arithmetic edges | 680 | 17 | clean |
| 24 (wave 7) | Persist × { state-args, enter-args, HSM, push, multi-event, async, multi-system } | 75 + per-wave | 17 incremental | clean — D16/D17/D18 fixed |
Totals: ~30,000 case-runs passing across the 17-backend matrix. 18 framec defects (D1-D18) caught by the harness and fixed in framepiler over April-May 2026.
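The "~30,000" figure can be reproduced by summing the Cases column, which appears to count case-runs (cases × backends) rather than distinct cases. The short check below is a sketch under two assumptions: phase 15 is counted as 7 patterns × 17 backends, and phase 24 contributes only its stated base of 75, because the "per-wave" extras are not quantified in the table.

```python
# Case-run counts copied from the Cases column of the table above.
# Assumptions: phase 15 = 7 patterns x 17 backends; phase 24 = base of 75 only.
CASE_RUNS = {
    "2": 1377, "3": 2646, "4": 1377, "5": 459, "6": 220, "7": 168,
    "8": 18, "9": 85, "10": 7820, "11": 1700, "12": 4800, "13": 1700,
    "14": 1360, "15": 7 * 17, "17": 2040, "18": 102, "19": 2380,
    "20": 480, "21": 680, "24": 75,
}

# Phase 3 cross-check: 16 full backends x 162 cases, plus Erlang's 54 of 162.
assert CASE_RUNS["3"] == 16 * 162 + 54

total = sum(CASE_RUNS.values())
print(total)  # 29606, consistent with the "~30,000" total
```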
What’s been deliberately deferred
Per FUZZ_PLAN.md § Remaining roadmap (2026-04-29):
Phase 22 — Error / panic recovery: SKIP (documented per-backend semantics, not framec bugs)
The behavior of generated code when a handler raises (Rust panic,
Java exception, Python raise) is per-target-language semantics —
not a framec correctness question. A per-language guide section
covering panic propagation through dispatch is more useful than a
fuzz corpus. Not a framec defect surface.
Phase 23 — True concurrency: SKIP (out-of-scope by design)
Frame is single-threaded by spec — a single system instance dispatches events serially. Multi-threaded access is the user application’s responsibility (lock externally). Not a framec concern.
Phase 18 wave 2 (N≥1000 + wide domain + deep HSM): DEFERRED
The wave 1 endurance tests at N=10/N=100 cover the basic axes; N≥1000 would require language-native loop emission in the generators, which is significant generator work for marginal additional defect yield. Re-evaluate if/when a long-run defect is suspected.
Phase 20 wave 2 (const-as-transition-arg, const-from-system-param, @@:system.state in conditions, Erlang atom normalization): DEFERRED
Wave 1 (480 cases) covered the basic axes clean. Wave 2 axes are plausibly defect-bearing but speculatively so. Land when a real const-related issue surfaces.
Phase 21 wave 2 (type coercion): DEFERRED — target-side behavior
Wave 1 (680 cases) covered arithmetic edges. Wave 2 (int ↔ str / float, signed ↔ unsigned) would document per-target coercion semantics, not framec correctness. Same reasoning as Phase 22.
Persist × { list-typed state-args }: DEFERRED
Phase 15 wave 3 typed state-args list candidates were noted with “unclear value-density.” Could be opened if a list-typed-persist defect appears.
Architectural — nested system instance auto-rehydrate on JSON persist: OUT OF SCOPE
Per Phase 24 wave 7 finding (2026-04-30): JSON-based persist backends lose class info when a nested system is saved as a domain field. Primitive round-trip works, but method invocation post-restore does not. Auto-rehydration would need either a Frame-level type registry pass or per-backend custom serializers. Documented limitation; test 82 deliberately tests only primitive round-trip. Not a fuzz problem to solve.
Genuinely-remaining executable work
These are the items where future work would have measurable defect-finding value:
A. Persist cross-product completion: extend test 80-87 patterns to remaining 14 backends
Phase 24 (gen_persist_x.py) shipped wave 1+2 for 3 backends
(python_3, javascript, typescript) covering 5 patterns: state-args,
enter-args, HSM-state-args, push-state-args, multi-event-cross-save.
Wave 3+ would extend to the remaining 14 backends. Per the
“Wave 3+ candidates” note: Rust, Java, Erlang are highest priority
because they received the most invasive parts of the persist bulk
fix and are most likely to harbor remaining serialization defects.
Effort: ~2-3 hours per backend (each has its own JSON serialization shape — record builds for Erlang, derive(Serialize) for Rust, Jackson for Java, etc.) × 14 backends = ~30-40 hours, realistically a 1-2 week part-time effort.
Defect yield: Probably 3-5 framec defects per the historical ratio (D4-D8 + D16-D18 all came from persist cross-product runs).
Trigger to start: a real persist-related defect that the existing 9-backend coverage didn’t catch. No urgency today.
B. Phase 3 (selfcall) Erlang completion: 54 of 162 cases
Erlang’s selfcall coverage is partial because some @@:self
shapes produced erlc-incompatible code prior to Wave-1 phase 17
fixes (D1). The remaining 108 cases may now compile cleanly.
Effort: ~1-2 hours to re-run the deferred subset, identify which cases now pass, and triage any remaining failures.
Defect yield: Low if the D1 fix was comprehensive; higher if there’s a different class of issue lurking.
Trigger to start: bandwidth, or specific Erlang selfcall issue surfacing.
C. CI integration — fuzz smoke as a gate
RESOLVED 2026-05-18 by RFC-0031. This RFC originally argued the trigger to start was “a defect that gets through the matrix but that fuzz would have caught — none known today; this is preventative.” Within days of writing that, the RFC-0030 fuzz repair surfaced five such defects (Rust clone, C++ factory bypass, E606 assignment-form gap, Erlang split_inline_arm × 2). All would have shipped to users if the matrix had stayed the only signal. The preventative argument is no longer hypothetical.
RFC-0031 Layer 2 codifies fuzz-smoke (plus matrix + framec-local checks) as a pre-merge gate on every push. Implementation tracked as roadmap #437.
(Original analysis retained below for context.)
The fuzz harness today runs developer-side. CI runs only the
matrix (5,455 fixtures, 143s wall). run_all.sh --tier=smoke
takes ~2 min per FUZZ_PLAN.md § Wall clock budget targets,
which makes it a plausible CI gate.
Effort: ~2-4 hours: docker image with framec + all 17 toolchains for fuzz exec (mostly already in the matrix image), GitHub Actions workflow, smoke-tier wiring.
Defect yield: originally assessed as indirect (it wouldn’t find new bugs).
Revised 2026-05-18: surfaces real defects on every breaking RFC, and
prevents the corpus-rot pattern that this RFC documented as
already in progress.
D. Diagnostic-axis-by-axis test discipline (process improvement)
Per FUZZ_PLAN.md § Diagnostic-axis lesson: Phase 19 wave 3
confirmed that isolated single-axis tests catch contract bugs
that cross-product tests miss. The Erlang frame_stack defect
took until wave 3 to surface because waves 1+2 always exercised
push/pop combined with other state shapes; the bug only triggered
when push/pop was tested in isolation.
Action: when adding new waves to existing phases, include at
least one test per invariant the runtime promises to preserve,
exercising ONLY that invariant. Document in
FUZZ_PLAN.md so future wave authors follow the pattern.
Effort: process discipline, not new code. Add a paragraph to
FUZZ_PLAN.md. ~15 min.
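The rule in item D can be checked mechanically if each case records which axes it exercises. The sketch below assumes such metadata exists; the function name, the case identifiers, and the axis names are all hypothetical and do not come from FUZZ_PLAN.md or the generators.

```python
from typing import Dict, FrozenSet, Iterable, Set


def invariants_missing_isolated_test(
    invariants: Iterable[str],
    case_axes: Dict[str, FrozenSet[str]],
) -> Set[str]:
    """Return the invariants that no case exercises in isolation.

    case_axes maps a case id to the set of axes that case exercises
    (hypothetical metadata). A case counts as isolated when it exercises
    exactly one axis.
    """
    isolated = {next(iter(axes)) for axes in case_axes.values() if len(axes) == 1}
    return set(invariants) - isolated


# Illustrative data only: push/pop has a single-axis case, HSM forward does not.
example_cases = {
    "w1_001": frozenset({"push_pop", "hsm_forward"}),
    "w2_014": frozenset({"push_pop", "state_args"}),
    "w3_001": frozenset({"push_pop"}),
}
missing = invariants_missing_isolated_test(
    ["push_pop", "hsm_forward"], example_cases
)
print(sorted(missing))  # ['hsm_forward']
```

A check like this would flag the situation described above, where waves 1+2 only ever exercised push/pop combined with other state shapes.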
E. Per-language guides for fuzz patterns (TODO section in FUZZ_PLAN.md)
There’s a TODO section for per-language idiomatic-Frame guides. This is documentation work, not test infrastructure.
Effort: ~2-4 hours per language × 17 = 1-2 weeks if pursued.
Defect yield: zero — pure documentation.
Trigger: external contributor confusion, or onboarding cost becoming a real friction point.
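The effort figures for items A and E are simple products of a per-unit hour range and a backend or language count. A small worked check, with a hypothetical helper name, shows the unrounded ranges behind the estimates:

```python
from typing import Tuple


def effort_range(hours_low: int, hours_high: int, units: int) -> Tuple[int, int]:
    """Scale a per-unit hour range by the number of units (backends or languages)."""
    return (hours_low * units, hours_high * units)


item_a = effort_range(2, 3, 14)  # persist cross-product: 14 remaining backends
item_e = effort_range(2, 4, 17)  # per-language guides: 17 languages
print(item_a)  # (28, 42), which item A rounds to "~30-40 hours"
print(item_e)  # (34, 68)
```

Item E's 34-68 hours matches the "1-2 weeks" figure if a week is taken as roughly 40 working hours, which is an assumption rather than something the estimate states.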
CI integration question
RESOLVED 2026-05-18 by RFC-0031. Condition (a) of the original recommendation below, “a real regression slips past the matrix that fuzz-smoke would have caught,” was satisfied during the very next session. The RFC-0030 fuzz repair surfaced 5 framec defects (Rust clone, C++ factory bypass, E606 assignment-form gap, Erlang split_inline_arm × 2). None were caught by the matrix; all were caught by the fuzz harness once repaired. The “not urgent” framing turned out to be wrong.
RFC-0031 codifies fuzz-smoke as a pre-merge CI gate (Layer 2) alongside the matrix and framec-local checks. Implementation tracked as roadmap #437.
(Original question and analysis retained below for context. The “Recommendation: defer” was a mistake — the defects existed at the time; nobody had run the harness to notice.)
Today’s split:
- Matrix (make test, 143s wall, 5,455 fixtures): runs on every commit via framec pre-commit hook (doc validator) and manual make test. Not in CI workflow per .github/. Catches fixture-corpus regressions.
- Fuzz (run_all.sh, ~2 min smoke / ~45 min full): developer-side only. Not in CI. Catches structurally-generated edge cases the matrix doesn’t.
Question: should run_all.sh --tier=smoke run as a CI gate?
Arguments for:
- 2-minute budget is reasonable for a PR check
- 18 historical defects all caught by fuzz first — high signal
- Catches regressions the matrix corpus misses by construction (the matrix is hand-written; fuzz is generated)
Arguments against:
- Docker setup is heavy for CI (17 toolchain images, ~5-15 GB)
- Smoke tier overlaps significantly with matrix coverage
- Developers run it locally anyway when touching codegen
Mistaken recommendation (originally labelled “Recommendation”): defer until either
(a) a real regression slips past the matrix that fuzz-smoke would
have caught, or (b) CI infrastructure is being re-architected for
other reasons. Not urgent.
The mistake was treating (a) as a hypothetical-future condition rather than a present one. The defects were already in the code the day this RFC was written — fuzz just wasn’t being run to find them. RFC-0031 fixes the pattern by making fuzz-smoke a hard pre-merge gate, eliminating the “nobody ran the harness” failure mode entirely.
What this RFC recommends
- Close roadmap #172 as superseded by this RFC. The “major new infrastructure” framing was correct months ago when it was written; it’s stale now. The infrastructure exists.
- Don’t open #172 sub-tasks proactively. Each deferred item (A through E) has a clear “trigger to start” condition. Wait for the trigger.
- Treat fuzz as developer-side preventative infrastructure for now. Run ./run_all.sh --tier=smoke locally before any codegen-touching commit; run --tier=full before any major codegen change (RFC milestone, backend rewrite, etc.).
- Adopt diagnostic-axis-by-axis discipline (Item D above) for future fuzz wave authoring. Document the pattern in FUZZ_PLAN.md to make it a standing process rule.
- If a real defect-density argument emerges for one of the deferred items (A: persist × 14 remaining backends; B: Erlang selfcall remainder), open a dedicated roadmap task with that item’s effort + yield estimate, not “complete #172.”
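For scripting the local runs recommended above, the run_all.sh flag contract from the Summary (--tier, --tag, --lang) can be wrapped in a small argv builder. This is a sketch only: the helper name is hypothetical, the tag values are placeholders, and it assumes the three flags may be combined freely and that --lang accepts backend names such as python_3, neither of which this RFC confirms.

```python
from typing import List, Optional, Sequence

VALID_TIERS = ("smoke", "core", "full")  # from --tier=smoke|core|full


def build_run_all_argv(
    tier: str = "smoke",
    tags: Sequence[str] = (),
    lang: Optional[str] = None,
) -> List[str]:
    """Build an argv list for ./run_all.sh (hypothetical helper)."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {VALID_TIERS}")
    argv = ["./run_all.sh", f"--tier={tier}"]
    if tags:
        argv.append("--tag=" + ",".join(tags))  # --tag takes a comma-separated list
    if lang:
        argv.append(f"--lang={lang}")
    return argv


# Before a codegen-touching commit:
print(build_run_all_argv("smoke"))
# Before a major codegen change, narrowed to one backend (placeholder tag names):
print(build_run_all_argv("full", tags=["persist", "hsm"], lang="python_3"))
```

The resulting list can be passed to subprocess.run with check=True so that a failing tier aborts the calling script.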
Drawbacks
- Closing #172 may look like work was punted. Same caveat as RFC-0028: the work was done, just unlabeled. Pattern of stale roadmap entries has happened ≥5 times in the 2026-05-17/18 session (#402, #406, #435, #403, #171, #172). This is a memory-drift issue, not real backlog.
- No active push to extend fuzz coverage. Per the deferral rationale, this is correct: speculative coverage extension has lower defect yield than reacting to defect surfaces.
- CI not gated by fuzz. Accepted today; reconsider if a matrix-passes-but-fuzz-would-have-caught regression occurs.
Unresolved questions
- Is the Phase 22 / Phase 23 “skip” stable, or worth revisiting as Frame matures? Both are documented as out-of-scope today (Phase 22 = target-side semantics; Phase 23 = single-threaded by design). If Frame ever introduces explicit concurrency primitives, Phase 23 needs revisiting. Today: stay skipped.
- Should the differential-trace harness (phases 2-7) absorb the assertion-driven phases (8-14)? Differential trace is strictly more thorough: byte-diff against the Python oracle catches semantic divergence the assertion driver doesn’t. But the assertion driver was easier to write per-phase. Migration would be ~1-2 days per phase. Not urgent: both modes work, and the duplication isn’t expensive.
- Should cases_* directories be checked into git, or generated on-demand? Currently 35,740 cases × ~17 backend variants are committed. They’re regenerable from the gen_*.py scripts. Pros for committing: stable regression corpus, no generation-time. Cons: large repo, every git checkout touches many files. Net: current “commit them” is fine; flag if repo size becomes a problem.
References
- framec-test-env/fuzz/FUZZ_PLAN.md: authoritative phase catalog + remaining-roadmap detail
- framec-test-env/fuzz/TEST_INFRA_ROADMAP.md: runner contract (tier + tag + lang flags) + wall-clock budgets
- framec-test-env/fuzz/TRACE_FORMAT.md: differential-trace line format contract (the “byte-identical stdout” spec)
- framec-test-env/fuzz/DEFECTS.md: closed-defect log (D1-D18, 18 framec defects caught by fuzz)
- framec-test-env/fuzz/diff_harness/run_fuzz.py: phase 2-7 differential trace runner
- framec-test-env/fuzz/run_all.sh: meta-runner entry point
- Roadmap #172 (closed by this RFC): original framing as “major new infra”
- RFC-0028: parallel closure of the in-process-API stale-framing pattern