RFC-0027: In-tree snapshot tests per backend (insta)

  • Status: Accepted
  • Author: Mark Truluck <mark.truluck@cogiton.com>
  • Created: 2026-05-17
  • Companion to: RFC-0025 (quality remediation; this RFC’s regression detection is the safety net under RFC-0025’s mechanical sweeps), RFC-0026 (the cause-side companion to this RFC’s symptom-side fix)

Motivation

The full persona-critique origin is recorded in RFC-0025. The two relevant excerpts for this RFC:

Bill Gates: “Tests: 400 #[test]s in-tree is decent. Zero snapshot tests, zero end-to-end fixtures in the framec repo itself, and the real coverage matrix lives in a separate repo. That’s a hand-grenade. Land at least one snapshot per backend in this tree so a contributor can cargo test and trust the result.”

Snarky Reddit dev: “Tests in a separate repo is a handgrenade.”

The concrete incident that triggered this RFC was a cookbook batch that landed on main with regressions across seven typed backends: Java filename mismatch (×24 fixtures), C++ string-not-std::string in domain: declarations (×18), Go method-call capitalization (×24), Swift argument labels (×24), C T_init(&t) vs T* t = @@T() factory mismatch (×7), C# base keyword collision (×1), and GDScript _init() duplicate-function collision (×35). None of these were caught locally by the framec author. All were caught by the external framec-test-env matrix five minutes after merge, with batched opaque output. Triage and remediation took an entire session.

Every one of those regressions would have shown up as a snapshot diff against a single representative fixture per backend, before the merge, in cargo test output. That’s the gap this RFC closes.

Summary

Add in-tree snapshot testing to framec using the insta crate. Each fixture is one Frame source file plus one expected compiled-output snapshot file per backend, checked into git. cargo test runs framec on each fixture and diffs the output against its snapshot. A mismatch is either a bug (fix the code) or an intentional change (run cargo insta review to re-bless the snapshot).

Initial corpus: 3 representative fixtures × 17 backends = 51 snapshots, covering basic dispatch, HSM lifecycle, and persist serialization paths. Three-phase rollout: skeleton (1 day), full backend coverage (1–2 days), corpus extension (ongoing).

What snapshot testing is

In conventional unit testing, you write:

assert_eq!(framec_compile(input, "python"), expected_output);

The expected_output is a string literal, usually 5–500 lines, embedded in the test. When the codegen changes intentionally, you hand-edit the literal. The change is reviewable in the PR, but the literal is tedious to update.

Snapshot testing replaces the inline literal with a separate file:

let output = framec_compile(input, "python");
insta::assert_snapshot!(output);

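On the first run there is no snapshot yet, so by default insta records the current output as a pending snapshot (a .snap.new file) and the test fails until you accept it with cargo insta review. The accepted .snap file is checked into git. On every subsequent run, insta diffs the current output against the snapshot. If they differ, the test fails and insta shows a colored diff.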

When the diff is a bug, you fix the code; the next run passes. When the diff is intentional (you changed the codegen on purpose), you run cargo insta review and accept the new snapshot interactively. The new .snap file is then committed alongside the codegen change in the same PR — so the diff is part of the code review.

The whole point: codegen changes become reviewable diffs in the PR. A change to the Java backend that accidentally affects the C# backend is visible in the PR before merge, not five minutes later in an external matrix.
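The compare-then-bless cycle described above can be sketched with the standard library alone. This is a simplified illustration of the idea, not insta's implementation: SnapshotStatus, check_snapshot, and accept_snapshot are hypothetical names, and the real crate adds metadata headers, pending .snap.new files, and a proper diff.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of comparing fresh output against a stored snapshot.
#[derive(Debug, PartialEq)]
enum SnapshotStatus {
    /// A stored snapshot exists and matches the output.
    Matches,
    /// No stored snapshot yet; the output is waiting for review.
    Missing,
    /// The stored snapshot differs; carries the first differing line (1-based).
    Differs { first_diff_line: usize },
}

/// Compare `actual` against the snapshot at `path` without modifying it.
fn check_snapshot(path: &Path, actual: &str) -> io::Result<SnapshotStatus> {
    let stored = match fs::read_to_string(path) {
        Ok(s) => s,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(SnapshotStatus::Missing),
        Err(e) => return Err(e),
    };
    if stored == actual {
        return Ok(SnapshotStatus::Matches);
    }
    let first_diff_line = stored
        .lines()
        .zip(actual.lines())
        .position(|(old, new)| old != new)
        // If every shared line matches, the difference starts just past the shorter text.
        .unwrap_or_else(|| stored.lines().count().min(actual.lines().count()))
        + 1;
    Ok(SnapshotStatus::Differs { first_diff_line })
}

/// Accept ("bless") the output as the new snapshot, creating directories as needed.
fn accept_snapshot(path: &Path, actual: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, actual)
}
```

A test would call check_snapshot and fail on anything other than Matches. Accepting is a separate, deliberate step, which is the role cargo insta review plays in the real workflow.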

Implementation

Dependencies

Add to framec/Cargo.toml:

[dev-dependencies]
insta = { version = "1", features = ["yaml"] }

The yaml feature enables insta’s assert_yaml_snapshot! macro for serde-serializable values. The tests in this RFC snapshot plain strings with assert_snapshot!, which works without the feature, so it is optional here.

Directory layout

framec/
  tests/
    backends/
      mod.rs                       # shared fixture loader
      python_snapshots.rs          # one file per backend
      java_snapshots.rs
      ...
    fixtures/
      01_linear_fsm.frm
      02_hsm.frm
      03_persist.frm
    snapshots/                     # auto-managed by insta
      backends__python_snapshots__linear_fsm.snap
      backends__python_snapshots__hsm.snap
      backends__python_snapshots__persist.snap
      ... (51 files)
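
The snapshot file names in the layout above follow a mechanical pattern: the Rust module path with :: replaced by __, then the test name, then .snap. A minimal sketch of that mapping follows. Both helper names are hypothetical, and in this RFC the test function names (linear_fsm, hsm, persist) are written by hand rather than derived from the fixture file name.

```rust
/// Strip a leading numeric ordering prefix and the `.frm` extension,
/// e.g. "01_linear_fsm.frm" becomes "linear_fsm". (Hypothetical helper.)
fn test_name_for_fixture(fixture: &str) -> &str {
    let stem = fixture.strip_suffix(".frm").unwrap_or(fixture);
    match stem.split_once('_') {
        Some((prefix, rest))
            if !prefix.is_empty() && prefix.chars().all(|c| c.is_ascii_digit()) =>
        {
            rest
        }
        // No numeric prefix: keep the stem unchanged.
        _ => stem,
    }
}

/// Build a snapshot file name from a module path and a test name,
/// mirroring the naming shown in the directory layout.
fn snapshot_file_name(module_path: &str, test_name: &str) -> String {
    format!("{}__{}.snap", module_path.replace("::", "__"), test_name)
}
```

For example, the module path backends::python_snapshots with the test name linear_fsm maps to backends__python_snapshots__linear_fsm.snap.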

Initial fixture corpus

Three fixtures, each exercising a distinct codegen surface:

  • 01_linear_fsm.frm: three flat states ($Idle, $Active, $Done), simple transitions, no HSM, no persist. Covers basic state dispatch, interface methods, and the bread-and-butter codegen path. ~15 LOC of Frame.

  • 02_hsm.frm: one parent $P, two children $A and $B, both extending $P. Handlers in both children cascade-call => $^. Tests lifecycle ($> / <$) on cascade. Covers HSM dispatch, lifecycle emission, and the per-backend variation around inheritance representation. ~25 LOC.

  • 03_persist.frm: three states with typed state-args (int, str, List<int>), @@[persist(json)], @@[save(serialize)], @@[load(restore)]. Covers serialization codegen, the type-coding paths, and the new-contract factory shape. ~30 LOC.

These three together touch the great majority of the codegen surface. They are not exhaustive — that’s what phase 3 extends.

Phase 3 corpus expansion (2026-05-18)

The corpus was expanded from 3 to 12 fixtures in one batch after a Track B regression demonstrated the value of broader snapshot coverage. Nine new fixtures, each target-agnostic (no native blocks, no @@[target(...)]), each verified to compile cleanly on all 17 backends:

  • 04_state_args.frm: typed state args ($Holding(value: i32)) + transition with args + read in @@:(value). Covers the per-event return enum + state-arg propagation paths Track B ships on Rust.
  • 05_pushpop.frm: push$ + -> pop$ modal stack. Covers the _state_stack push/pop runtime infrastructure on the 14 backends that support it.
  • 06_selfcall.frm: @@:self.method() self-dispatch inside a handler body. Tests recursive event dispatch + the context-stack push/pop the runtime does around it.
  • 07_forward.frm: -> => $State forward transition. Covers the rare-but-real forward-event re-dispatch on the new compartment.
  • 08_lifecycle.frm: $>(args) enter with args + <$() exit body + transition-arg passing. Covers the lifecycle variant of FrameEvent and the entry/exit handler emission paths.
  • 09_return_explicit.frm: @@:return(<expr>) form (the explicit alternative to the @@:(<expr>) shorthand used in 03).
  • 10_actions.frm: an actions: block with _helper(n) called from a handler body. Covers the action-call rewrite path.
  • 11_consts.frm: system-level params with defaults (@@system Consts(step: i32 = 5, limit: i32 = 20)), which Frame calls “const” state. Covers the constructor + domain shadow handling.
  • 12_no_persist.frm: @@[no_persist] mixed with @@[persist(String)] on the same system. Covers the per-field save/load skip logic on all 17 backends.

Skipped from this expansion (intentional):

  • @@async — backend-specific syntax (some backends don’t support async; the corpus must compile on all 17). Track separately if/when async snapshot coverage becomes needed.
  • Native blocks inside handler bodies — by definition target-specific syntax.
  • Cross-system construction — @@SubSys() call shape varies per backend.

Current corpus: 12 fixtures × 17 backends = 204 snapshots.
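
The snapshot count is the cross product of fixtures and backends, which a small helper makes explicit. This is an illustrative sketch only: snapshot_matrix is a hypothetical name, and the backend list is left as a parameter because this RFC does not enumerate all 17 targets.

```rust
/// Every (fixture, backend) pair the corpus is expected to cover,
/// grouped by backend. (Hypothetical helper for illustration.)
fn snapshot_matrix<'a>(
    fixtures: &[&'a str],
    backends: &[&'a str],
) -> Vec<(&'a str, &'a str)> {
    backends
        .iter()
        .flat_map(|&backend| fixtures.iter().map(move |&fixture| (fixture, backend)))
        .collect()
}
```

With 3 fixtures and 17 backends the matrix has 51 entries. With 12 fixtures it has 204, and each additional fixture adds one entry per backend.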

Per-backend test module

Example, tests/backends/python_snapshots.rs:

use crate::backends::compile_fixture;

#[test]
fn linear_fsm() {
    insta::assert_snapshot!(compile_fixture("01_linear_fsm.frm", "python"));
}

#[test]
fn hsm() {
    insta::assert_snapshot!(compile_fixture("02_hsm.frm", "python"));
}

#[test]
fn persist() {
    insta::assert_snapshot!(compile_fixture("03_persist.frm", "python"));
}

The compile_fixture helper lives in tests/backends/mod.rs, calls framec’s library API to compile a fixture string for a target, and returns the output string.
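
A minimal sketch of what such a helper might look like, under stated assumptions: framec’s library entry point is not named in this RFC, so the compiler is passed in as a closure, and compile_fixture_with and its parameters are hypothetical names.

```rust
use std::fs;
use std::path::Path;

/// Load `fixture` from `fixtures_dir` and compile it for `target`.
///
/// `compile` stands in for framec's library API (an assumption: its real
/// name and signature are not given here). It takes (source, target) and
/// returns the generated code or an error message.
fn compile_fixture_with<F>(fixtures_dir: &Path, fixture: &str, target: &str, compile: F) -> String
where
    F: FnOnce(&str, &str) -> Result<String, String>,
{
    let path = fixtures_dir.join(fixture);
    let source = fs::read_to_string(&path)
        .unwrap_or_else(|e| panic!("cannot read fixture {}: {}", path.display(), e));
    // A compile error fails the test loudly instead of being snapshotted.
    compile(&source, target)
        .unwrap_or_else(|e| panic!("{} failed to compile for {}: {}", fixture, target, e))
}
```

In tests/backends/mod.rs the fixtures directory would typically be derived from env!("CARGO_MANIFEST_DIR"), and the closure would be replaced by the real framec call, leaving the two-argument compile_fixture(fixture, target) shape used in the tests above.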

Phasing

  • Phase 1 (1 day) — Skeleton + python backend.
    • Wire insta into Cargo.toml.
    • Create tests/backends/mod.rs with compile_fixture helper.
    • Author the three fixtures.
    • Write tests/backends/python_snapshots.rs with 3 tests.
    • Run cargo test; bless initial snapshots; commit.
    • Acceptance: cargo test passes with 3 new snapshot files checked in.
  • Phase 2 (1–2 days) — Roll out to the remaining 16 backends.
    • Copy python_snapshots.rs to 16 sibling files (one per backend), changing the target string.
    • For each, run cargo test, hand-review the generated .snap, bless if clean, fix if anomalous.
    • Acceptance: 51 .snap files committed; cargo test passes; per-backend snapshot files visible under tests/snapshots/.
  • Phase 3 (ongoing) — Extend the fixture corpus as new patterns ship.
    • Add fixtures for multi-system, async, lifecycle edge cases, and any new feature that lands in a future RFC.
    • Each new fixture adds 17 snapshots. The cost is bounded.

Re-bless workflow

When a contributor intentionally changes codegen, snapshots will diff and the test will fail. The workflow:

cargo install cargo-insta            # one-time
cargo test                           # see failure
cargo insta review                   # interactive accept/reject UI
git add tests/snapshots/             # commit the new .snap files

This goes in CONTRIBUTING.md so contributors know what to do when a snapshot diff appears. The framing in CONTRIBUTING.md should be: a snapshot diff is a code review artifact, not a test failure to suppress. If you intentionally changed codegen, the diff is exactly what the reviewer needs to see.

Drawbacks

  • Maintenance burden on intentional codegen changes. Every change to codegen produces snapshot diffs to review. The cost is small (cargo insta review is two keystrokes per snapshot) but it’s nonzero and recurring. The trade-off is that unintentional codegen changes — the kind that caused this RFC’s motivating incident — are caught at PR time instead of in the external matrix.
  • 51 snapshot files in git. Each .snap file is small (tens to hundreds of lines) but it’s still ~5,000 lines of generated output in the repo. They are not source code; they are expected-output specifications. The repository grows by a small fixed amount.
  • Phase 3 corpus discipline. If contributors add fixtures ad-hoc, the corpus can grow to thousands of snapshots without a corresponding test-value gain. Mitigation: a short policy note (“snapshot fixtures must exercise a codegen surface not covered by an existing fixture”) in CONTRIBUTING.md.

Unresolved questions

  • Snapshot stderr too? framec sometimes emits warnings (W7xx series). Should those be in the snapshot? Probably yes — warning regressions are a real failure mode. Recommend: capture both stdout and stderr into a combined snapshot, delimited.
  • Multi-file fixtures? RFC-0024 cross-file scenarios involve multiple .frm files. Snapshot testing those requires a multi-file fixture loader. Defer to Phase 3 or later RFC.
  • Insta version pin? Pin to a major version (1.*) and let semver protect against breaking changes; revisit if the crate ever ships 2.0 with new defaults that would invalidate snapshots.
  • Interaction with RFC-0025 Track A. If snapshot tests land before Track A’s wave 2 (backend unwrap sweep), Track A’s regressions surface as snapshot diffs. Recommend RFC-0027 Phase 1 lands before RFC-0025 Track A wave 2 starts. (Phase 1 is 1 day. Track A wave 2 is the longest wave. The sequencing works.)
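
For the stderr question above, one possible shape for a combined, delimited snapshot body is sketched below. The delimiter text and the function names are assumptions for illustration. The RFC only recommends that the two streams be captured together and delimited.

```rust
/// Append one labeled section, making sure a non-empty body ends with a newline
/// so the next delimiter always starts on its own line.
fn push_section(out: &mut String, label: &str, body: &str) {
    out.push_str("---- ");
    out.push_str(label);
    out.push_str(" ----\n");
    out.push_str(body);
    if !body.is_empty() && !body.ends_with('\n') {
        out.push('\n');
    }
}

/// Join generated code (stdout) and diagnostics (stderr) into one snapshot body.
/// (Hypothetical helper; the delimiter format is an assumption.)
fn combined_snapshot(stdout: &str, stderr: &str) -> String {
    let mut out = String::with_capacity(stdout.len() + stderr.len() + 40);
    push_section(&mut out, "stdout", stdout);
    push_section(&mut out, "stderr", stderr);
    out
}
```

Keeping the stderr section present even when empty means a newly introduced warning shows up as added lines under a stable delimiter, not as a structural change to the snapshot.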

References

  • insta crate documentation
  • RFC-0025 — quality remediation companion; shares the persona-critique origin context.
  • RFC-0026 — the cause-side companion (the per-backend invariant gap that this RFC’s snapshot diffs would surface symptomatically).
  • _scratch/roadmap.md — task #431 (Land RFC-0027).
  • CONTRIBUTING.md — will gain the re-bless workflow section upon Phase 1 landing.
  • CHANGELOG.md — once shipped, the release notes record the version.