RFC-0027: In-tree snapshot tests per backend (insta)
- Status: Accepted
- Author: Mark Truluck mark.truluck@cogiton.com
- Created: 2026-05-17
- Companion to: RFC-0025 (quality remediation; this RFC’s regression detection is the safety net under RFC-0025’s mechanical sweeps), RFC-0026 (the cause-side companion to this RFC’s symptom-side fix)
Motivation
The full persona-critique origin is recorded in RFC-0025. The two relevant excerpts for this RFC:
Bill Gates: “Tests: 400 #[test]s in-tree is decent. Zero snapshot tests, zero end-to-end fixtures in the framec repo itself, and the real coverage matrix lives in a separate repo. That’s a hand-grenade. Land at least one snapshot per backend in this tree so a contributor can cargo test and trust the result.”
Snarky Reddit dev: “Tests in a separate repo is a handgrenade.”
The concrete incident that triggered this RFC: a cookbook batch
landed on main with regressions across seven typed backends —
Java filename mismatch (×24 fixtures), C++ string-not-std::string
in domain: declarations (×18), Go method-call
capitalization (×24), Swift argument labels (×24), C T_init(&t)
vs T* t = @@T() factory mismatch (×7), C# base keyword
collision (×1), GDScript _init() duplicate-function collision
(×35). None of these were caught locally by the framec author. All
were caught by the external framec-test-env matrix five minutes
after merge, with batched opaque output. Triage and remediation
took an entire session.
Every one of those regressions would have shown up as a snapshot
diff against a single representative fixture per backend, before
the merge, in cargo test output. That’s the gap this RFC closes.
Summary
Add in-tree snapshot testing to framec using the
insta crate. Each fixture is one Frame
source file plus an expected compiled-output snapshot file checked
into git. cargo test runs framec on each fixture and diffs the
output against its snapshot. Mismatch = either a bug (fix the
code) or an intentional change (cargo insta review to re-bless
the snapshot).
Initial corpus: 3 representative fixtures × 17 backends = 51 snapshots, covering basic dispatch, HSM lifecycle, and persist serialization paths. Three-phase rollout: skeleton (1 day), full backend coverage (1–2 days), corpus extension (ongoing).
What snapshot testing is (since the user asked)
In conventional unit testing, you write:
assert_eq!(framec_compile(input, "python"), expected_output);
The expected_output is a string literal — usually 5–500 lines —
embedded in the test. When the codegen changes intentionally, you
hand-edit the literal. The change is reviewable in PR but tedious
to update.
Snapshot testing replaces the inline literal with a separate file:
let output = framec_compile(input, "python");
insta::assert_snapshot!(output);
On first run, insta writes the current output to a .snap file
checked into git. On every subsequent run, insta diffs the
current output against the snapshot. If they differ, the test
fails and insta shows a colored diff.
When the diff is a bug, you fix the code; the next run passes.
When the diff is intentional (you changed the codegen on purpose),
you run cargo insta review and accept the new snapshot
interactively. The new .snap file is then committed alongside
the codegen change in the same PR — so the diff is part of the
code review.
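The record-then-compare cycle described above can be sketched with the standard library alone. This is a minimal illustration of the idea, not insta's implementation; `check_snapshot` and `SnapshotResult` are hypothetical names introduced here, and insta adds pending-snapshot files, metadata headers, and an interactive review step on top of this basic loop.

```rust
use std::fs;
use std::path::Path;

/// Outcome of comparing fresh output against a stored snapshot.
#[derive(Debug, PartialEq)]
enum SnapshotResult {
    /// No snapshot existed yet; the current output was written as the baseline.
    Created,
    /// The stored snapshot matches the current output.
    Matched,
    /// The stored snapshot differs; both sides are kept so a diff can be shown.
    Mismatch { expected: String, actual: String },
}

/// Hypothetical helper: the first run records `actual`, later runs compare against it.
fn check_snapshot(path: &Path, actual: &str) -> std::io::Result<SnapshotResult> {
    if !path.exists() {
        fs::write(path, actual)?;
        return Ok(SnapshotResult::Created);
    }
    let expected = fs::read_to_string(path)?;
    if expected == actual {
        Ok(SnapshotResult::Matched)
    } else {
        Ok(SnapshotResult::Mismatch {
            expected,
            actual: actual.to_string(),
        })
    }
}
```

A `Mismatch` corresponds to the failing test with a diff; accepting the change amounts to overwriting the stored file with the new output and committing it.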
The whole point: codegen changes become reviewable diffs in the PR. A change to the Java backend that accidentally affects the C# backend is visible in the PR before merge, not five minutes later in an external matrix.
Implementation
Dependencies
Add to framec/Cargo.toml:
[dev-dependencies]
insta = { version = "1", features = ["yaml"] }
The yaml feature enables insta’s assert_yaml_snapshot! macro for
serde-serializable values. The string-based assert_snapshot! used
throughout this RFC does not need it, so the feature is optional here.
Directory layout
framec/
  tests/
    backends/
      mod.rs                  # shared fixture loader
      python_snapshots.rs     # one file per backend
      java_snapshots.rs
      ...
    fixtures/
      01_linear_fsm.frm
      02_hsm.frm
      03_persist.frm
    snapshots/                # auto-managed by insta
      backends__python_snapshots__linear_fsm.snap
      backends__python_snapshots__hsm.snap
      backends__python_snapshots__persist.snap
      ... (51 files)
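The snapshot file names in the layout follow a pattern: the Rust module path with `::` replaced by `__`, then the test name, then `.snap`. The helper below reproduces that pattern so a contributor can work out which file belongs to which test. `snapshot_file_name` is a hypothetical name used only for illustration, and the rule is inferred from the names listed above rather than taken from insta's source.

```rust
/// Builds a snapshot file name from a Rust module path and a test name,
/// mirroring the names in the directory layout (`::` becomes `__`).
fn snapshot_file_name(module_path: &str, test_name: &str) -> String {
    format!("{}__{}.snap", module_path.replace("::", "__"), test_name)
}
```

For example, the module path `backends::python_snapshots` and the test `linear_fsm` map to `backends__python_snapshots__linear_fsm.snap`.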
Initial fixture corpus
Three fixtures, each exercising a distinct codegen surface:
- 01_linear_fsm.frm: three flat states ($Idle, $Active, $Done), simple transitions, no HSM, no persist. Covers basic state dispatch, interface methods, and the bread-and-butter codegen path. ~15 LOC of Frame.
- 02_hsm.frm: one parent $P, two children $A and $B, both extending $P. Handlers in both children cascade-call => $^. Tests lifecycle ($>/<$) on cascade. Covers HSM dispatch, lifecycle emission, and the per-backend variation around inheritance representation. ~25 LOC.
- 03_persist.frm: three states with typed state-args (int, str, List<int>), @@[persist(json)], @@[save(serialize)], @@[load(restore)]. Covers serialization codegen, the type-coding paths, and the new-contract factory shape. ~30 LOC.
These three together touch the great majority of the codegen surface. They are not exhaustive — that’s what phase 3 extends.
Phase 3 corpus expansion (2026-05-18)
The corpus was expanded from 3 to 12 fixtures in one batch after a
Track B regression demonstrated the value of broader snapshot
coverage. Nine new fixtures, each target-agnostic (no native
blocks, no @@[target(...)]), each verified to compile clean on
all 17 backends:
- 04_state_args.frm: typed state args ($Holding(value: i32)) + transition with args + read in @@:(value). Covers the per-event return enum + state-arg propagation paths Track B ships on Rust.
- 05_pushpop.frm: push$ + -> pop$ modal stack. Covers the _state_stack push/pop runtime infrastructure on the 14 backends that support it.
- 06_selfcall.frm: @@:self.method() self-dispatch inside a handler body. Tests recursive event dispatch + the context-stack push/pop the runtime does around it.
- 07_forward.frm: -> => $State forward transition. Covers the rare-but-real forward-event re-dispatch on the new compartment.
- 08_lifecycle.frm: $>(args) enter with args + <$() exit body + transition-arg passing. Covers the lifecycle variant of FrameEvent and the entry/exit handler emission paths.
- 09_return_explicit.frm: @@:return(<expr>) form (the explicit alternative to the @@:(<expr>) shorthand used in 03).
- 10_actions.frm: actions: block with _helper(n) called from a handler body. Covers the action-call rewrite path.
- 11_consts.frm: system-level params with defaults (@@system Consts(step: i32 = 5, limit: i32 = 20)), what Frame calls “const” state. Covers the constructor + domain shadow handling.
- 12_no_persist.frm: @@[no_persist] mixed with @@[persist(String)] on the same system. Covers the per-field save/load skip logic on all 17 backends.
Skipped from this expansion (intentional):
- @@async: backend-specific syntax (some backends don’t support async; the corpus must compile on all 17). Track separately if/when async snapshot coverage becomes needed.
- Native blocks inside handler bodies: by definition target-specific syntax.
- Cross-system construction: @@SubSys() call shape varies per backend.
Current corpus: 12 fixtures × 17 backends = 204 snapshots.
Per-backend test module
Example, tests/backends/python_snapshots.rs:
use crate::backends::compile_fixture;

// Each test compiles one fixture for the python target and
// compares the result against its stored snapshot.
#[test]
fn linear_fsm() {
    insta::assert_snapshot!(compile_fixture("01_linear_fsm.frm", "python"));
}

#[test]
fn hsm() {
    insta::assert_snapshot!(compile_fixture("02_hsm.frm", "python"));
}

#[test]
fn persist() {
    insta::assert_snapshot!(compile_fixture("03_persist.frm", "python"));
}
The compile_fixture helper lives in tests/backends/mod.rs,
calls framec’s library API to compile a fixture string for a
target, and returns the output string.
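A rough sketch of what such a helper might look like follows. This RFC does not name framec's library entry point, so the compiler is passed in as a closure; `compile_fixture_with`, its parameters, and the fixture directory argument are all assumptions made for illustration. The real `compile_fixture` would presumably hard-wire the fixtures directory and call framec directly.

```rust
use std::fs;
use std::path::Path;

/// Reads `<dir>/<name>` and compiles it for `target` using `compile`.
/// `compile` stands in for framec's library API, whose real name and
/// signature are not specified in this RFC.
fn compile_fixture_with<F>(dir: &Path, name: &str, target: &str, compile: F) -> String
where
    F: Fn(&str, &str) -> Result<String, String>,
{
    let path = dir.join(name);
    // A missing fixture is a test-setup error, so fail loudly with the path.
    let source = fs::read_to_string(&path)
        .unwrap_or_else(|e| panic!("cannot read fixture {}: {}", path.display(), e));
    // A compile error fails the test and names the fixture and target.
    compile(&source, target)
        .unwrap_or_else(|e| panic!("framec failed on {} for target {}: {}", name, target, e))
}
```

Panicking on a read or compile error keeps the per-backend tests to a single line each, and the panic message identifies which fixture and target broke.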
Phasing
- Phase 1 (1 day): Skeleton + python backend.
  - Wire insta into Cargo.toml.
  - Create tests/backends/mod.rs with compile_fixture helper.
  - Author the three fixtures.
  - Write tests/backends/python_snapshots.rs with 3 tests.
  - Run cargo test; bless initial snapshots; commit.
  - Acceptance: cargo test passes with 3 new snapshot files checked in.
- Phase 2 (1–2 days): Roll out to the remaining 16 backends.
  - Copy python_snapshots.rs to 16 sibling files (one per backend), changing the target string.
  - For each, run cargo test, hand-review the generated .snap, bless if clean, fix if anomalous.
  - Acceptance: 51 .snap files committed; cargo test passes; per-backend snapshot files visible under tests/snapshots/.
- Phase 3 (ongoing): Extend the fixture corpus as new patterns ship.
  - Add fixtures for multi-system, async, lifecycle edge cases, and any new feature that lands in a future RFC.
  - Each new fixture adds 17 snapshots. The cost is bounded.
Re-bless workflow
When a contributor intentionally changes codegen, snapshots will diff and the test will fail. The workflow:
cargo install cargo-insta # one-time
cargo test # see failure
cargo insta review # interactive accept/reject UI
git add tests/snapshots/ # commit the new .snap files
This goes in CONTRIBUTING.md so contributors know what to do
when a snapshot diff appears. The framing in CONTRIBUTING.md
should be: a snapshot diff is a code review artifact, not a
test failure to suppress. If you intentionally changed codegen,
the diff is exactly what the reviewer needs to see.
Drawbacks
- Maintenance burden on intentional codegen changes. Every change to codegen produces snapshot diffs to review. The cost is small (cargo insta review is two keystrokes per snapshot) but it’s nonzero and recurring. The trade-off is that unintentional codegen changes, the kind that caused this RFC’s motivating incident, are caught at PR time instead of in the external matrix.
- 51 snapshot files in git. Each .snap file is small (tens to hundreds of lines) but it’s still ~5,000 lines of generated output in the repo. They are not source code; they are expected-output specifications. The repository grows by a small fixed amount.
- Phase 3 corpus discipline. If contributors add fixtures ad-hoc, the corpus can grow to thousands of snapshots without a corresponding test-value gain. Mitigation: a short policy note (“snapshot fixtures must exercise a codegen surface not covered by an existing fixture”) in CONTRIBUTING.md.
Unresolved questions
- Snapshot stderr too? framec sometimes emits warnings (W7xx series). Should those be in the snapshot? Probably yes: warning regressions are a real failure mode. Recommend: capture both stdout and stderr into a combined snapshot, delimited.
- Multi-file fixtures? RFC-0024 cross-file scenarios involve multiple .frm files. Snapshot testing those requires a multi-file fixture loader. Defer to Phase 3 or later RFC.
- Insta version pin? Pin to a major version (1.*) and let semver protect against breaking changes; revisit if the crate ever ships 2.0 with new defaults that would invalidate snapshots.
- Interaction with RFC-0025 Track A. If snapshot tests land before Track A’s wave 2 (backend unwrap sweep), Track A’s regressions surface as snapshot diffs. Recommend RFC-0027 Phase 1 lands before RFC-0025 Track A wave 2 starts. (Phase 1 is 1 day. Track A wave 2 is the longest wave. The sequencing works.)
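For the first question above, a combined and delimited snapshot could be assembled as in the sketch below. The function name `combined_snapshot` and the delimiter lines are illustrative assumptions, not an agreed format; the point is only that both streams end up in one string with an unambiguous boundary.

```rust
/// Joins compiler output and diagnostics into one snapshot string.
/// The delimiter lines are illustrative, not an agreed format.
fn combined_snapshot(stdout: &str, stderr: &str) -> String {
    let mut out = String::new();
    out.push_str("---- stdout ----\n");
    // Trailing whitespace is trimmed so a stray final newline does not cause a diff.
    out.push_str(stdout.trim_end());
    out.push_str("\n---- stderr ----\n");
    out.push_str(stderr.trim_end());
    out.push('\n');
    out
}
```

The resulting string can be handed to insta::assert_snapshot! like any other output, so a new or vanished warning shows up in the same diff as a codegen change.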
References
- insta crate documentation
- RFC-0025: quality remediation companion; shares the persona-critique origin context.
- RFC-0026: the cause-side companion (the per-backend invariant gap that this RFC’s snapshot diffs would surface symptomatically).
- _scratch/roadmap.md: task #431 (Land RFC-0027).
- CONTRIBUTING.md: will gain the re-bless workflow section upon Phase 1 landing.
- CHANGELOG.md: once shipped, the release notes record the version.