RFC-0034: In-process compile checks for every backend’s snapshot fixtures

Summary

framec ships a snapshot-test corpus per backend: each canonical fixture is compiled to every target language and the emitted text is frozen into a .snap file. A regression in the codegen surfaces as a diff in PR review. But the snapshot tests never invoke the target language’s compiler on the emitted text — they only diff strings. A fixture can emit syntactically invalid Java / Python / Go / etc. and the test suite passes as long as the new invalid output matches its frozen invalid snapshot. This RFC closes the gap by adding an in-process compile check for every backend: each fixture, compiled to its target, is then piped through the target language’s parser or type-checker (no execution, no link).

Motivation

This issue surfaced during an audit in RFC-0033’s Rust polish work. Compiling the framec-emitted output for 01_linear_fsm.frm through cargo clippy -- -D warnings revealed multiple clippy violations that the Rust snapshot test had silently been freezing. Pushing the audit further surfaced two fixtures (07_forward.frm, 10_actions.frm) for which framec emitted Rust that rustc rejected as syntactically invalid, yet the Rust snapshot tests passed cleanly. The matrix test-env DOES compile and run emitted code, but it uses a separate fixture corpus (framec-test-env/tests/) that can drift from the in-tree one.

A codegen regression that emits invalid output sails through every in-tree test once its snapshot is accepted: the test compares the new invalid output against a frozen snapshot that is equally invalid. The question “do we execute all our tests?” has a misleading answer: yes, the test suite runs, but it doesn’t verify the central correctness property, namely whether framec emits code that parses in the target language.

RFC-0027 set up the snapshot infrastructure. RFC-0033 closed the gap for Rust by adding a rustc --emit=metadata check. This RFC extends the pattern to every other backend so the same shape of bug can’t survive review in any target.

The contract

The key words MUST, SHOULD, MAY are to be interpreted as in RFC 2119.

Per-backend compile check

For each backend <lang> framec targets, the snapshot test file framec/tests/<lang>_snapshots.rs MUST contain a test named rfc0034_all_fixtures_compile that, for every fixture in the canonical corpus:

  1. compiles the fixture through framec for the target,
  2. invokes the target language’s compiler / type-checker / syntax-parser on the resulting source, in a mode that does not execute code and does not require external dependencies beyond the target’s standard toolchain,
  3. asserts the tool exits cleanly.

When the target toolchain is not available on the host, the test MUST skip rather than fail. libtest has no runtime skip (#[ignore] is fixed at compile time), so a skipped check returns early and Cargo reports it as passed; the notice described under Skip conditions is what tells a skip apart from real coverage. CI MUST run the test on a host where the toolchain is present, so coverage is binding there. Per-developer machines may legitimately lack some toolchains (a Rust developer isn’t required to install Swift to run cargo test).

Tool selection per backend

The fastest non-executing check the target’s standard toolchain offers:

Backend     Tool                                            Mode
----------  ----------------------------------------------  -------------------------------
Python      python3 -m py_compile                           Compile to bytecode
TypeScript  tsc --noEmit                                    Type-check
JavaScript  node --check                                    Parse
Java        javac -d <tempdir>                              Compile (class files discarded)
Kotlin      kotlinc -nowarn -d <tempdir>                    Compile (class files discarded)
Swift       swiftc -typecheck                               Type-check
C           cc -fsyntax-only                                Parse
C++         c++ -fsyntax-only -std=c++17                    Parse
C#          dotnet-script --check or csc /target:library    Compile (DLL discarded)
Go          gofmt -e (parse only) or go build               Parse / build
Dart        dart analyze --fatal-warnings                   Analyze
PHP         php -l                                          Lint (parse only)
Ruby        ruby -c                                         Syntax check only
Lua         luac -p                                         Parse only
Erlang      erlc -o <tempdir>                               Compile (.beam discarded)
GDScript    godot --headless --check-only --script <file>   Parse
Rust        rustc --emit=metadata (RFC-0033)                Type-check

<tempdir> is the scratch directory that holds the emitted file; anything the tool writes there is discarded with it.

Where multiple tools are listed, the test MAY use whichever is available; the priority is “fastest non-executing check the backend’s standard toolchain ships.”
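A minimal sketch of that fallback, assuming a hypothetical CANDIDATES table (the target keys and tool order are illustrative, following the table’s listing order) and an availability predicate such as a PATH probe. The first available tool in priority order wins:

```rust
/// Hypothetical per-target candidate list, in priority order (first = preferred).
const CANDIDATES: &[(&str, &[&str])] = &[
    ("python_3", &["python3"]),
    ("go", &["gofmt", "go"]),
    ("csharp", &["dotnet-script", "csc"]),
    ("rust", &["rustc"]),
];

/// Returns the first candidate tool for `target` that `is_available` accepts,
/// or None if the target is unknown or none of its tools are present.
fn pick_tool(target: &str, is_available: impl Fn(&str) -> bool) -> Option<&'static str> {
    CANDIDATES
        .iter()
        .find(|(t, _)| *t == target)
        .and_then(|(_, tools)| tools.iter().copied().find(|tool| is_available(*tool)))
}
```

Passing the predicate in keeps the selection logic testable on a machine with no toolchains installed.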

Fixtures with external dependencies

Some fixtures pull in target-language libraries that the in-process compile check cannot resolve without a package manager. The canonical examples are 03_persist.frm and 12_no_persist.frm, which emit serde_json::Value references when targeting Rust.

When a fixture’s emitted output depends on an external library, it MAY be excluded from the per-backend compile check. Each exclusion MUST be:

  1. Listed in a backend-specific constant in the test file with a one-line reason (“emits serde_json::Value; rustc-alone can’t resolve”).
  2. Covered by a corresponding fixture in the matrix test-env (which uses each target’s package manager via Docker, but draws on its own corpus rather than the in-tree one).

The exclusions are the seam where the in-process check hands off to the matrix.
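One way the backend-specific constant could carry the one-line reason the contract requires. Exclusion, RUST_EXCLUSIONS, and is_excluded are hypothetical names; the reason text is the one quoted in item 1:

```rust
/// One excluded fixture plus the one-line reason for excluding it.
struct Exclusion {
    fixture: &'static str,
    reason: &'static str,
}

/// Hypothetical Rust-backend exclusion list.
const RUST_EXCLUSIONS: &[Exclusion] = &[
    Exclusion { fixture: "03_persist", reason: "emits serde_json::Value; rustc-alone can't resolve" },
    Exclusion { fixture: "12_no_persist", reason: "emits serde_json::Value; rustc-alone can't resolve" },
];

/// Returns the reason if `fixture` is excluded, so a test can report why it skipped it.
fn is_excluded(exclusions: &[Exclusion], fixture: &str) -> Option<&'static str> {
    exclusions.iter().find(|e| e.fixture == fixture).map(|e| e.reason)
}
```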

Diagnostic quality

On failure, the test MUST include in the panic message:

  • the fixture name,
  • the target language,
  • the tool that rejected the output (e.g. “rustc”),
  • the first 200 lines of the rejected source (so the developer doesn’t have to find / regenerate it).

Stack traces alone are insufficient. The developer reading the failure should be able to diagnose the issue without leaving the test output.
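A sketch of a message that meets the list above, assuming a hypothetical format_diagnostic helper. It also echoes the tool’s stderr, which the list does not require, and prefixes each echoed line with its number so the tool’s line:column reports can be matched against the source:

```rust
use std::fmt::Write;

/// Source lines echoed into the panic message (the RFC's limit).
const MAX_SOURCE_LINES: usize = 200;

/// Hypothetical helper: builds the failure message for a rejected fixture.
/// The `unwrap`s are safe because writing to a String cannot fail.
fn format_diagnostic(
    fixture: &str,
    target: &str,
    tool: &str,
    source: &str,
    tool_stderr: &str,
) -> String {
    let total = source.lines().count();
    let shown = total.min(MAX_SOURCE_LINES);
    let mut msg = String::new();
    writeln!(msg, "{tool} rejected fixture `{fixture}` for target `{target}`").unwrap();
    writeln!(msg, "--- {tool} stderr ---\n{tool_stderr}").unwrap();
    writeln!(msg, "--- emitted source (first {shown} of {total} lines) ---").unwrap();
    // Number each line so compiler line:column reports can be matched up.
    for (i, line) in source.lines().take(MAX_SOURCE_LINES).enumerate() {
        writeln!(msg, "{:>4} | {line}", i + 1).unwrap();
    }
    if total > MAX_SOURCE_LINES {
        writeln!(msg, "... {} more lines truncated", total - MAX_SOURCE_LINES).unwrap();
    }
    msg
}
```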

Test-helper consolidation

Every backend follows the same shape. To avoid 17 near-duplicate test bodies:

  • A helper common::compile_check(fixture, target, tool_runner) in framec/tests/common/mod.rs performs the common steps: compile through framec, write to a tempfile, invoke tool_runner with the path, assert success, format the diagnostic on failure.
  • Each backend’s snapshot test calls compile_check for every fixture in its corpus.
  • The tool_runner closure is the only per-backend code: it takes the path and returns a std::process::Output.

A new fixture added to tests/fixtures/ doesn’t require touching 17 test files individually — the shared corpus list lives in common/mod.rs.
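The helper’s post-framec half might look like the sketch below. framec’s Rust API is not specified here, so check_emitted (a hypothetical name) takes the already-emitted source plus a file extension; the real compile_check would call framec first and then delegate. It uses std::env::temp_dir rather than a tempfile crate, and returns a Result so the caller decides how to fail:

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Output;

/// Lines of emitted source echoed into a failure message (the RFC's limit).
const SHOWN_LINES: usize = 200;

/// Hypothetical post-framec half of compile_check: writes `source` to a
/// scratch file, hands the path to `tool_runner`, and turns a non-zero exit
/// status into an error message carrying the required diagnostics.
fn check_emitted(
    fixture: &str,
    target: &str,
    ext: &str,
    tool: &str,
    source: &str,
    tool_runner: impl FnOnce(&Path) -> Output,
) -> Result<(), String> {
    // One directory per (process, target, fixture) so checks never share a file.
    let dir: PathBuf = std::env::temp_dir()
        .join(format!("rfc0034-{}-{target}-{fixture}", std::process::id()));
    fs::create_dir_all(&dir).map_err(|e| format!("create {}: {e}", dir.display()))?;
    let path = dir.join(format!("{fixture}.{ext}"));
    fs::write(&path, source).map_err(|e| format!("write {}: {e}", path.display()))?;

    let output = tool_runner(&path);
    let result = if output.status.success() {
        Ok(())
    } else {
        let shown: String = source
            .lines()
            .take(SHOWN_LINES)
            .map(|line| format!("{line}\n"))
            .collect();
        Err(format!(
            "{tool} rejected fixture `{fixture}` for target `{target}`\n\
             --- {tool} stderr ---\n{}\n\
             --- first {SHOWN_LINES} lines of emitted source ---\n{shown}",
            String::from_utf8_lossy(&output.stderr)
        ))
    };
    // Best-effort cleanup; a leftover scratch directory is harmless.
    let _ = fs::remove_dir_all(&dir);
    result
}
```

A backend test would then panic with the returned message (if let Err(msg) = check_emitted(...) { panic!("{msg}") }), which supplies the assert-success step and its diagnostic in one place.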

Skip conditions

The skip mechanism for unavailable toolchains MUST be a Command::new(...).output() probe, not a hard-coded environment variable. A developer who installs the toolchain mid-session gets coverage on the next test run without having to flip a flag.

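When skipping, the test SHOULD print one line to stdout (“python_3 compile check skipped: python3 not on PATH”) so the developer notices coverage gaps rather than seeing a silent green. libtest captures the output of passing tests, so the line shows up under cargo test -- --nocapture (or --show-output).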
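A sketch of the probe, assuming find_tool (the helper the examples call, whose body this RFC does not specify) returns the bare program name once a spawn succeeds. Probing with --version is an assumption; only the spawn result is checked, so a tool that rejects the flag still counts as present:

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical find_tool: the toolchain counts as available if the program
/// can be spawned at all. The exit status is ignored because not every tool
/// accepts --version (luac, for instance, reports its version with -v).
fn find_tool(name: &str) -> Option<PathBuf> {
    Command::new(name)
        .arg("--version")
        // output(): stdin reads hit EOF, stdout/stderr are captured, not printed.
        .output()
        .ok()
        .map(|_| PathBuf::from(name))
}

/// The one-line notice printed when a toolchain is missing.
fn skip_notice(target: &str, tool: &str) -> String {
    format!("{target} compile check skipped: {tool} not on PATH")
}
```

Because the probe runs on every test invocation, installing the toolchain mid-session is picked up on the next run with no flag to flip.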

Examples

Python backend

framec/tests/python_snapshots.rs:

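mod common;

use std::process::Command;

use common::{compile_check, excluded_for, find_tool, FIXTURES_ALL};

#[test]
fn rfc0034_all_fixtures_compile() {
    // Probe at run time; a missing toolchain is a skip (early return), not a failure.
    let py3 = match find_tool("python3") {
        Some(p) => p,
        None => {
            println!("python_3 compile check skipped: python3 not on PATH");
            return;
        }
    };
    let excluded = excluded_for("python_3");
    for fixture in FIXTURES_ALL.iter().copied().filter(|f| !excluded.contains(f)) {
        compile_check(fixture, "python_3", |path| {
            // py_compile exits non-zero on a SyntaxError; it writes a .pyc
            // but never runs the module.
            Command::new(&py3)
                .args(["-m", "py_compile"])
                .arg(path)
                .output()
                .expect("python3 process")
        });
    }
}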

Java backend

framec/tests/java_snapshots.rs:

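mod common;

use std::process::Command;

use common::{compile_check, excluded_for, find_tool, FIXTURES_ALL};

#[test]
fn rfc0034_all_fixtures_compile() {
    let javac = match find_tool("javac") {
        Some(p) => p,
        None => {
            println!("java compile check skipped: javac not on PATH");
            return;
        }
    };
    // Java excludes the persist fixtures (JSON needs a library dependency).
    let excluded = excluded_for("java");
    for fixture in FIXTURES_ALL.iter().copied().filter(|f| !excluded.contains(f)) {
        compile_check(fixture, "java", |path| {
            // -d: write class files into the fixture's temp dir; the test
            //     only cares about javac's exit status.
            let dir = path.parent().expect("path has parent");
            Command::new(&javac)
                .arg("-d")
                .arg(dir)
                .arg(path)
                .output()
                .expect("javac process")
        });
    }
}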
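For comparison with the Python and Java runners, a sketch of a Rust runner in the shape of RFC-0033’s check. rustc_metadata_command is a hypothetical helper; --crate-type=lib, --crate-name=emitted, and --edition=2021 are assumptions about the emitted Rust (library code without a main, and fixture stems such as 01_linear_fsm not being used as crate names). Building the Command separately from running it lets a unit test inspect the arguments with Command::get_args:

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: the rustc invocation for one emitted file. rustc
/// type-checks and writes only crate metadata (.rmeta) into the file's
/// scratch directory; nothing is linked or executed.
fn rustc_metadata_command(rustc: &Path, src: &Path) -> Command {
    let out_dir = src.parent().expect("path has parent");
    let mut cmd = Command::new(rustc);
    cmd.args([
        "--crate-type=lib",
        // Explicit name instead of one derived from a file stem like 01_linear_fsm.
        "--crate-name=emitted",
        "--edition=2021",
        "--emit=metadata",
        "--out-dir",
    ])
    .arg(out_dir)
    .arg(src);
    cmd
}
```

Inside rfc0034_all_fixtures_compile the closure would then be |path| rustc_metadata_command(&rustc, path).output().expect("rustc process").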

Cross-backend fixture exclusions

The persist-using fixtures are excluded the same way for every backend whose JSON serialization needs a library dependency:

// In common/mod.rs
pub const FIXTURES_ALL: &[&str] = &[
    "01_linear_fsm",
    "02_hsm",
    "03_persist",
    "04_state_args",
    // ... 12 total
];

// Per-backend exclusions, keyed by target.
pub fn excluded_for(target: &str) -> &'static [&'static str] {
    match target {
        // Rust + Java + C# + Kotlin + Swift: serde-style
        // JSON serialization needs library deps the in-process
        // compile check can't resolve. Matrix covers these.
        "rust" | "java" | "csharp" | "kotlin" | "swift" => {
            &["03_persist", "12_no_persist"]
        }
        _ => &[],
    }
}
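A sketch of how a backend test might turn FIXTURES_ALL and excluded_for into its work list; fixtures_for is a hypothetical name. The stale-entry check is an optional extra, not part of the contract: it fails loudly if an exclusion names a fixture that was renamed or deleted, so the exclusion list can’t quietly drift from the corpus:

```rust
/// Hypothetical helper: the fixtures a backend's compile check should visit.
/// Panics if an exclusion names a fixture that is not in the corpus.
fn fixtures_for(all: &[&'static str], excluded: &[&'static str]) -> Vec<&'static str> {
    for name in excluded {
        assert!(all.contains(name), "exclusion `{name}` is not in the fixture corpus");
    }
    all.iter().copied().filter(|f| !excluded.contains(f)).collect()
}
```

A backend test would then loop over fixtures_for(FIXTURES_ALL, excluded_for("java")).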

Alternatives

Run the matrix in CI as the sole compile check

Considered: rely entirely on the matrix test-env to catch invalid emission. Rejected because:

  • The matrix uses a separate fixture corpus (framec-test-env/tests/) that can drift from the in-tree one. A bug in an in-tree fixture’s emitted output is invisible to the matrix.
  • The matrix requires Docker images and takes ~5+ minutes of wall-clock time; developer iteration on a per-backend codegen change benefits from sub-second feedback. cargo test --test python_snapshots finishes in <1s, which lets the developer iterate; “wait for the matrix” does not.

Compile to bytecode/IR and diff that instead of source

Considered: snapshot the bytecode (Python .pyc, Java .class, etc.) so the check is “does the bytecode match?” — that requires the source to compile to produce the bytecode, so compile-success falls out for free. Rejected because:

  • Bytecode is implementation-defined and version-sensitive (Python 3.10 vs 3.12 produce different bytecode for identical source). Diffs would surface compiler-version changes as regressions, which is noise.
  • Source-text snapshots are reviewable; bytecode is not.

Add per-fixture # rustc: ok markers

Considered: each fixture carries an inline directive declaring which targets should compile cleanly. Rejected because the canonical expectation IS “every fixture compiles in every backend”; opting in fixture by fixture turns the test into a partial check and invites silent gaps when someone forgets to add the marker. The exclusion list (persist fixtures that need a JSON library) is the only legitimate hole, and coding it once in common/mod.rs covers it.

Migration

Source-additive: adds new tests and changes nothing that exists. Each backend’s snapshot test file gains one new test function. This RFC ships no change to framec’s emission and no fixture changes. A fixture that fails the new test exposes an existing bug, to be fixed separately; see RFC-0033 for examples of fixtures that needed correction once the Rust compile check was added.

A backend whose toolchain isn’t available on a developer’s machine sees the new test skip with a one-line notice: no breakage, no false red.

References

  • Frame language reference
  • Glossary
  • CHANGELOG.md
  • RFC-0027 — In-tree snapshot tests per backend (the infrastructure this RFC extends).
  • RFC-0033 — Idiomatic Rust output (where the Rust compile check shipped; this RFC generalizes the same shape to every backend).