RFC-0035: Dogfooding inventory — existing FSMs, migration candidates, and single-state test corpus

  • Status: Draft
  • Author: Mark Truluck <mark.truluck@cogiton.com>
  • Created: 2026-05-19
  • Builds on: RFC-0027, RFC-0033, RFC-0034

Summary

framec uses Frame .frs specifications to generate large portions of its own implementation — every per-target syntax skipper and body closer, the attribute scanner, the context parser, several backend-specific assemblers. The “dogfood” pattern is a stated project discipline: when a part of the compiler is a state machine on bytes, write it as a Frame system and generate the Rust from it. This RFC catalogues every existing .frs spec, identifies hand-coded subsystems that fit the dogfood pattern but were written in plain Rust, and proposes a small corpus of single-state “test-via-Frame” programs that exercise the compiler without needing transitions.

The motivation is concrete: four consecutive iterations of the same class of bug (FRAMEC_BUGS #24 → #25-A → #25-B → #26) all came from a hand-coded structural scanner. Each round patched the latest reported case while opening another. Migrating that scanner to a Frame FSM closed all four with one change because the spec is declarative — every edge of the state machine is visible in one document. The same risk applies to every other hand-coded scanner-like subsystem in framec.

Motivation

The pattern is well-established. Of the 39 .frs files in framec/src/:

  • 30 are per-target lexical machines (16 body closers + 14 syntax skippers, one per code-generating target).
  • 4 are shared Frame-syntax parsers (attribute scanner, context parser, expression scanner, state-var parser).
  • 4 are compiler-utility FSMs (output-block lexer + parser, Erlang scope scanner, GDScript multi-system assembler).
  • 1 is the structural shim added in #26 to fix the whack-a-mole cycle in the GraphViz pipeline (frame_structural × body_closer + skipper).

Each is a Rust state machine on bytes whose lexical rules are declared in Frame source, compiled to Rust via framec compile -l rust. The glue file wires the generated FSM into the appropriate trait. Re-bless instructions are at the top of each glue file.

Outside that catalog, several subsystems in framec do scanner-like work in plain Rust:

  • The Java handler-body native rewriter (added in RFC-0033 for Issue #18) lowers self.X → this.X and await EXPR → EXPR.join() in Java handler bodies. It walks bytes, recognizes // line comments and "..." strings to avoid rewriting inside them, and applies token-level substitutions.
  • The Erlang per-line native rewriter classifies Frame-emitted body lines into ActionCall / RecordUpdate / etc. shapes. Hand-coded; works correctly today but exists at the same level of complexity as the per-target skippers.
  • Several smaller helpers in frame_expansion/ and handler_body.rs walk bytes with hand-coded state tracking (statement-terminator detection, ; insertion at user→Frame-expansion boundaries).

Each is a place where a future contributor might add a new edge case and step on the same trap. Migration is not urgent — the existing code works for the inputs it was tested against — but the RFC names them so a future round of “scanner is missing one edge” has a clear answer: write the spec, generate the FSM.

Separately, the dogfood discipline pays a second dividend: every Frame program framec compiles is a regression test for the compiler itself. The 39 specs above already double as exercises of the language. There is room for more — small, single-state “utility” Frame programs that don’t model a state machine but use Frame syntax to express a function (a type mapper, a name converter, a single-shot validator). They generate Rust code framec then compiles into its own binary. Adding them expands the dogfood surface and is cheaper than authoring external integration tests.

The contract

Inventory of existing FSMs

The complete current catalogue. Every entry is a Frame system that compiles to Rust as part of framec’s own build.

Per-target lexical machines (30 files)

Body closers — find matching } for a state body, respecting target-language strings and comments

Target       .frs                                Variants
C            body_closer/c.frs                   // + /* */ + "..." + 'X'
C++          body_closer/cpp.frs                 Same as C + raw strings
C#           body_closer/csharp.frs              @"..." verbatim + $"..." interpolated
Erlang       body_closer/erlang.frs              % ... comments + "..." only
Go           body_closer/go.frs                  `...` raw + standard "..."
Java         body_closer/java.frs                Standard C-family
JavaScript   body_closer/javascript.frs          `...` template literals + standard
Kotlin       body_closer/kotlin.frs              """...""" raw + standard
Lua          body_closer/lua.frs                 [[...]] long strings + --[[...]] comments
PHP          body_closer/php.frs                 Heredoc/nowdoc + <?php/?> tags
Python       body_closer/python.frs              Triple-quoted + f-strings
Ruby         body_closer/ruby.frs                %w[...] literals + #{...} interpolation
Rust         body_closer/rust_lang.frs           Nested /* */ + r#"..."# raw + 'X'
Swift        body_closer/swift.frs               """...""" + #"..."# raw + \(...)
TypeScript   body_closer/typescript.frs          Same as JavaScript
Structural   body_closer/frame_structural.frs    Both // and # line comments; for GraphViz

Syntax skippers — skip_comment / skip_string / find_line_end / balanced_paren_end

Target       .frs
C            native_region_scanner/c_skipper.frs
C++          native_region_scanner/cpp_skipper.frs
C#           native_region_scanner/csharp_skipper.frs
Erlang       native_region_scanner/erlang_skipper.frs
Java         native_region_scanner/java_skipper.frs
JavaScript   native_region_scanner/javascript_skipper.frs
Kotlin       native_region_scanner/kotlin_skipper.frs
Lua          native_region_scanner/lua_skipper.frs
PHP          native_region_scanner/php_skipper.frs
Python       native_region_scanner/python_skipper.frs
Ruby         native_region_scanner/ruby_skipper.frs
Rust         native_region_scanner/rust_skipper.frs
Swift        native_region_scanner/swift_skipper.frs
TypeScript   native_region_scanner/typescript_skipper.frs
Structural   native_region_scanner/frame_structural_skipper.frs

(Dart and Go currently use C-like helpers without a dedicated .frs — see Migration candidates below.)

Shared Frame-syntax parsers (4 files)

Spec                                          Purpose
attribute_scanner/attribute_scanner.frs       Parses @@[name(args)] attribute headers on declarations
native_region_scanner/context_parser.frs      Recognizes @@:return, @@:event, @@:data[k], etc.
native_region_scanner/expr_scanner.frs        Pulls expressions out of native bodies for Frame-token detection
native_region_scanner/state_var_parser.frs    Parses $.var: type = init declarations

Compiler-utility FSMs (4 files)

Spec                                              Purpose
codegen/output_block_lexer.frs                    Tokenizes the @@output { ... } block embedded in source
codegen/output_block_parser.frs                   Builds an AST from the @@output token stream
native_region_scanner/erlang_scope_scanner.frs    Tracks Erlang case/end scope boundaries in handler bodies
gdscript_multisys/multisys_assembler.frs          Sequences GDScript multi-system class re-arrangement

Migration candidates

Hand-coded subsystems whose shape matches the existing dogfood pattern and would benefit from migration.

Java handler-body native rewriter

File: framec/src/frame_c/compiler/codegen/state_dispatch/handler_methods/java_native_rewrite.rs (~194 LOC).

Today: walks bytes in a Java-target NativeBlock and applies two token-level substitutions, respecting // and /* */ comments and "..." strings:

  • self.X → this.X
  • await EXPR → EXPR.join() (where EXPR is a balanced parenthesized call form).

Should be: body_closer/java_handler_rewrite.frs (or a shared handler_rewrite/ module) modeled on body_closer/java.frs. States: $Scanning, $InString, $InLineComment, $InBlockComment. Output: a string-builder domain field collecting the rewritten bytes.

Why: rewriting native code while respecting comments and strings is the exact same shape as brace matching, and the brace matcher is a dogfooded FSM. The Rust target’s &str → String auto-promotion (RFC-0033) and the self.X → this.X lowering (also RFC-0033) are parallel patches that will likely need to extend to other backends (PHP’s self.X → $this->X was filed as a follow-up). Each new backend’s rewriter would benefit from the same dogfood pattern.

Effort: one day. Same complexity as adding a per-target body_closer.
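
As a rough illustration of the shape the spec would generate, the four states named above can be written directly as a Rust byte walker. This is a minimal sketch, not the shipped rewriter: the function name rewrite_self_to_this and the helper follows_identifier are hypothetical, only the self.X → this.X substitution is covered, and Java char literals are assumed not to matter for the example.

```rust
/// Lexical modes, mirroring the four states named for the proposed spec.
#[derive(Clone, Copy)]
enum Mode {
    Scanning,
    InString,
    InLineComment,
    InBlockComment,
}

/// True when the byte before `i` could be part of an identifier, so that
/// `myself.x` is not mistaken for `self.x`. (Hypothetical helper.)
fn follows_identifier(bytes: &[u8], i: usize) -> bool {
    i > 0 && (bytes[i - 1].is_ascii_alphanumeric() || bytes[i - 1] == b'_')
}

/// Hypothetical sketch: rewrites `self.` to `this.` outside comments and
/// string literals. Char literals and the `await` lowering are not handled.
fn rewrite_self_to_this(src: &str) -> String {
    let bytes = src.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut mode = Mode::Scanning;
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let next = bytes.get(i + 1).copied();
        // `take` is how many input bytes this step copies through verbatim.
        let mut take = 1;
        match mode {
            Mode::Scanning => {
                if b == b'/' && next == Some(b'/') {
                    mode = Mode::InLineComment;
                    take = 2;
                } else if b == b'/' && next == Some(b'*') {
                    mode = Mode::InBlockComment;
                    take = 2;
                } else if b == b'"' {
                    mode = Mode::InString;
                } else if bytes[i..].starts_with(b"self.") && !follows_identifier(bytes, i) {
                    // The only rewriting edge: emit the replacement, skip the match.
                    out.extend_from_slice(b"this.");
                    i += 5;
                    continue;
                }
            }
            Mode::InString => {
                if b == b'\\' && next.is_some() {
                    take = 2; // keep an escaped byte (such as \") inside the string
                } else if b == b'"' {
                    mode = Mode::Scanning;
                }
            }
            Mode::InLineComment => {
                if b == b'\n' {
                    mode = Mode::Scanning;
                }
            }
            Mode::InBlockComment => {
                if b == b'*' && next == Some(b'/') {
                    mode = Mode::Scanning;
                    take = 2;
                }
            }
        }
        out.extend_from_slice(&bytes[i..i + take]);
        i += take;
    }
    // Only ASCII was replaced with ASCII, so the bytes are still valid UTF-8.
    String::from_utf8(out).expect("rewrite preserves UTF-8")
}
```

Every edge lives in one match, which is the property the Frame spec would make explicit: adding a new lexical mode means adding a state, not threading another flag through the loop.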

Erlang per-line native rewriter — REVIEWED, NOT A FIT

File: framec/src/frame_c/compiler/codegen/erlang_system/native_rewrite.rs (~266 LOC).

Today: classifies each line of a Frame-emitted Erlang handler body into one of seven shapes (ActionCall, ActionCallWithBind, InterfaceCall, InterfaceCallWithBind, RecordUpdate, Reply, Plain). The classification drives how Data threads through the gen_statem reply.

Verdict on closer inspection: not a fit for the FSM dogfood pattern. The Erlang rewriter is a line classifier, not a byte walker. It takes a whole line, tries seven ordered starts_with / contains / find pattern checks in priority order, extracts a field name + args + similar via string ops, and returns an enum variant. The byte-walking work it does need (self.field → DataN#data.field substitution) is already delegated to replace_outside_strings_and_comments, which is FSM-driven through the per-target skipper trait.

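Forcing this into a Frame FSM would be performative: a single state, a single handler, and seven if branches inside the handler body. The code would not be more declarative, only more ceremonious. It was therefore skipped from the initial migration backlog. (Round 3 below later migrated it anyway, as a single-state system, once dogfooding became policy.)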

The hand-coded classifier remains an audit target the same way the validator is: well-suited for direct Rust, but every new branch should be added carefully and the pattern-priority order documented in the function comment (which it is, today).

Statement-terminator insertion at user→Frame-expansion boundaries

File: framec/src/frame_c/compiler/codegen/frame_expansion/handler_body.rs (needs_statement_terminator function, ~40 LOC).

Today: a single Rust function that asks “if I’m about to emit a Frame expansion after this NativeCode tail, do I need to insert ; first?” Walks the tail backward to find the last non-whitespace character, returns true unless it’s an in-expression continuation (=, +, (, ,, etc.).

Should be: marginal — this is small enough that a function is fine. Listed for completeness; not recommending migration.

Why not: the function is 40 LOC, has no recursion, no token context, and a flat enumeration of “this is an expression continuation” characters. The FSM would be a single state with one decision; the spec would be longer than the function.
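
For reference, the decision described above fits in a few lines. The sketch below is an approximation under stated assumptions, not the shipped needs_statement_terminator: the name needs_terminator_sketch is hypothetical, the continuation set contains only the four characters the text lists (the real list is longer), and an empty or all-whitespace tail is assumed to need no terminator.

```rust
/// Hypothetical sketch of the ";"-insertion check at a user -> Frame-expansion
/// boundary. `tail` is the native code emitted just before the expansion.
fn needs_terminator_sketch(tail: &str) -> bool {
    // Walk backward to the last non-whitespace character.
    match tail.chars().rev().find(|c| !c.is_whitespace()) {
        // Assumption: nothing to terminate when the tail is empty or blank.
        None => false,
        // Assumed continuation set; the RFC lists `=`, `+`, `(`, `,` "etc."
        Some(c) => !matches!(c, '=' | '+' | '(' | ','),
    }
}
```

A flat character test like this has no modes to switch between, which is why a single-state spec would add ceremony without adding visibility.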

Per-target type mappers, name converters, dispatch helpers

Files: scattered across codegen/codegen_utils.rs and per-backend modules.

Today: small helpers (to_snake_case, pascal_case_variant, frame_type_to_rust_type, java_map_type, rust_dispatch_convert, etc.) — pure functions, often single-pass.

Should be: see Single-state Frame components below — these are the canonical “use Frame syntax for the function structure, even without transitions” candidates.

Single-state Frame components — DOGFOOD AS POLICY

The initial RFC proposed migrating ~30 inline helper functions (to_snake_case, pascal_case_variant, per-target type mappers, dispatch-convert helpers, etc.) into single-state Frame systems — one @@system Foo with one state and one handler per function. An earlier triage of this section said “skip, cost-benefit not justified.” The owner has since reversed that call:

“I don’t care about code bloat etc. I want to implement framec as much as possible in Frame to 1) test the language, 2) show as diverse a set of problems it CAN be applied to as possible. If any are truly bad fits — that is a good thing to document.”

The migration is now an explicit goal of this RFC, not a deferred backlog item. Each helper migrated:

  1. Tests Frame on a real-world function shape.
  2. Documents Frame’s reach through the inventory in this RFC.
  3. Surfaces ergonomic gaps worth recording (e.g., @@:(value) sets the return value but doesn’t return early; recursion via a single-state FSM has to route through the public glue function rather than calling “self”).

Code-size + per-call dispatch overhead are accepted costs. For framec internals these helpers are not the hot path: the compile-time cost of an FSM dispatch is negligible compared to codegen and IO, and the wall-clock per-fixture impact of Round 1 + Round 2 was unmeasurable.

Round 1 (shipped)

Two name converters now live in framec/src/frame_c/compiler/name/:

Helper                       Spec                       Notes
to_snake_case(&str)          to_snake_case.frs          CamelCase → snake_case; loop + char branch
pascal_case_variant(&str)    pascal_case_variant.frs    snake_case → PascalCase; loop + boolean

Both call sites now delegate to the FSM; existing public signatures preserved.
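
The two handler bodies have roughly the following shape in plain Rust. This is a sketch under assumptions: the function names are hypothetical, acronym runs such as "HTTPServer" are not treated specially, and the shipped converters may apply rules not described in this RFC.

```rust
/// "Loop + char branch": insert `_` before each interior uppercase letter.
/// Assumption: no special handling for acronym runs or digits.
fn to_snake_case_sketch(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// "Loop + boolean": a flag remembers that the next letter starts a word.
fn pascal_case_sketch(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = true;
    for ch in name.chars() {
        if ch == '_' {
            upper_next = true; // drop the separator, capitalize what follows
        } else if upper_next {
            out.push(ch.to_ascii_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}
```

Both are single-pass folds with one piece of carried state at most, which is what makes them natural single-state, single-handler systems.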

Round 2 (shipped)

Per-target type mappers + RFC-0033 Rust dispatch helpers, plus a target-language predicate:

Helper                               Spec                         Module
csharp_map_type(&str)                csharp_map_type.frs          compiler/type_map/
java_map_type(&str)                  java_map_type.frs            compiler/type_map/
kotlin_map_type(&str)                kotlin_map_type.frs          compiler/type_map/
go_map_type(&str)                    go_map_type.frs              compiler/type_map/
cpp_map_type(&str)                   cpp_map_type.frs             compiler/type_map/
swift_map_type(&str)                 swift_map_type.frs           compiler/type_map/ (recursive)
frame_type_to_rust_type(&Type)       rust_map_type.frs            compiler/type_map/
rust_dispatch_convert(&Type)         rust_dispatch_convert.frs    compiler/type_map/
rust_owned_promotion(&str)           rust_owned_promotion.frs     compiler/type_map/
is_dynamic_target(TargetLanguage)    is_dynamic_target.frs        compiler/target_query/

Frame ergonomic observations recorded during Round 2:

  • @@:(value) sets the return, not “return early.” All four early-return-style helpers had to be restructured to compute a single result in let result = ... then call @@:(result) once at the end. The natural Rust idiom (if cond { return x; }) doesn’t translate verbatim.
  • Recursion routes through the glue function. Inside the Swift mapper’s handler body, the recursive call has to be crate::frame_c::compiler::type_map::swift_map_type(base), not “call self on a different argument.” Frame’s single-state shape has no built-in self.dispatch_again(x) affordance.
  • bool is not first-class in Frame interfaces yet. The is_dynamic_target predicate stringifies its answer ("true" / "false") and the glue function parses it back. Worth following up with a Frame-language proposal for bool/enum interface types.
  • The TargetLanguage enum doesn’t round-trip cleanly through a Frame event param. The glue function flattens the enum to the canonical lowercased name string, the FSM matches on the string. Same observation: Frame interfaces need first-class enum support to avoid string round-trips.
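
The last two observations describe a double string round-trip at the FSM boundary, which can be sketched as below. Everything here is illustrative: the TargetLanguage subset, the canonical_name method, the IsDynamicTargetFsm stand-in (a hand-written struct in place of generated code), and in particular which targets count as dynamic are all assumptions, not taken from framec.

```rust
/// Hypothetical subset of the real enum, for illustration only.
#[derive(Clone, Copy, Debug)]
enum TargetLanguage {
    Python,
    JavaScript,
    Rust,
    Java,
}

impl TargetLanguage {
    /// Flattens the enum to the lowercased name that crosses the FSM boundary.
    fn canonical_name(self) -> &'static str {
        match self {
            TargetLanguage::Python => "python",
            TargetLanguage::JavaScript => "javascript",
            TargetLanguage::Rust => "rust",
            TargetLanguage::Java => "java",
        }
    }
}

/// Stand-in for the generated single-state FSM: strings in, strings out.
struct IsDynamicTargetFsm;

impl IsDynamicTargetFsm {
    fn query(&mut self, name: String) -> String {
        // Assumed classification; the real predicate may differ.
        let dynamic = matches!(name.as_str(), "python" | "javascript");
        dynamic.to_string() // "true" or "false"
    }
}

/// Glue function: typed signature outside, stringly-typed call inside.
fn is_dynamic_target_sketch(target: TargetLanguage) -> bool {
    let mut fsm = IsDynamicTargetFsm;
    fsm.query(target.canonical_name().to_string()) == "true"
}
```

The typed signature is preserved for callers, but both conversions exist only to cross the interface, which is the cost first-class bool and enum interface types would remove.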

Round 3 (shipped)

Erlang per-line native rewriter / classifier migrated to a single Frame system in compiler/erlang_classifier/. The classifier walks each line of spliced Erlang handler text and tags it as one of seven ErlangRewrite variants (ActionCall, ActionCallWithBind, InterfaceCall, InterfaceCallWithBind, RecordUpdate, Plain, Reply).

Helper                                                                        Spec                          Module
erlang_rewrite_native_classified_full(line, actions, interfaces, data_var)    erlang_line_classifier.frs    compiler/erlang_classifier/

This was the round that set out to document where Frame doesn’t fit cleanly. Three concrete findings:

  1. “Multi-state Frame” is a misnomer for line classifiers. The natural shape looks like a single $Classifying state that transitions to one of 7 terminal states (one per variant), with the destination state’s $> entry handler emitting the encoded result. Under Frame semantics, however, classify() returns BEFORE $> fires on the destination state, so the entry handler can’t set the classify() return value. The result has to be computed in classify()’s body before transitioning, which makes the multi-state framing purely cosmetic. Round 3 therefore ships as a single-state system with a branching body, which is the right shape for classifiers.

  2. Rich return types round-trip through pipe-delimited strings. ErlangRewrite has 7 variants, several with struct payloads (field+method+args, etc.). The FSM emits a tagged string (InterfaceCallWithBind|field=X|method=Y|args=Z) and the glue Rust function parses it back into the typed variant. This is a real ergonomic cost — Frame interfaces need first-class sum types / structs for this kind of classifier to be natural.

  3. Slice / Vec params have the same gap. action_names and interface_names are &[String] in the original Rust signature. They flatten to comma-separated strings at the FSM boundary and re-split inside the handler body. Same underlying gap: Frame interfaces need first-class slice params.
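
The tagged-string round-trip from finding 2 looks roughly like this on the glue side. The sketch is hedged: the Rewrite enum models only the one variant whose encoding is shown above plus a payload-free Plain (an assumption made for brevity), and encode / decode are hypothetical names.

```rust
/// Hypothetical cut-down version of the seven-variant result type.
#[derive(Debug, PartialEq)]
enum Rewrite {
    InterfaceCallWithBind { field: String, method: String, args: String },
    Plain,
}

/// What the FSM side emits: a tag followed by `key=value` pairs.
fn encode(r: &Rewrite) -> String {
    match r {
        Rewrite::InterfaceCallWithBind { field, method, args } => format!(
            "InterfaceCallWithBind|field={}|method={}|args={}",
            field, method, args
        ),
        Rewrite::Plain => "Plain".to_string(),
    }
}

/// What the glue side does: parse the tagged string back into the enum.
/// Returns None for unknown tags, unknown keys, or missing fields.
fn decode(encoded: &str) -> Option<Rewrite> {
    let mut parts = encoded.split('|');
    let tag = parts.next()?;
    let (mut field, mut method, mut args) = (None, None, None);
    for part in parts {
        let (key, value) = part.split_once('=')?;
        match key {
            "field" => field = Some(value.to_string()),
            "method" => method = Some(value.to_string()),
            "args" => args = Some(value.to_string()),
            _ => return None,
        }
    }
    match tag {
        "InterfaceCallWithBind" => Some(Rewrite::InterfaceCallWithBind {
            field: field?,
            method: method?,
            args: args?,
        }),
        "Plain" => Some(Rewrite::Plain),
        _ => None,
    }
}
```

Note the fragility this format carries: a `|` inside args would split the payload, so a real encoding needs escaping or a guarantee that the delimiter cannot occur.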

framec body_closer bug surfaced (worth tracking separately): the initial spec used a Rust labeled-break construct ('classify: { ... break 'classify EXPR; ... }). The body_closer (rust_lang.frs) treats the ' apostrophe as the start of a char literal and miscounts braces, dumping the framec module’s pub use statement in the middle of the user handler body. Worked around in this round by using a nested Rust fn classify_one(...) -> String { ... } instead. The underlying framec bug should be fixed in rust_lang.frs.

Second framec finding — W415 false positive: Frame’s W415 warning (“return EXPR in event handler — return value is lost”) fires on Rust early-returns inside nested fns/closures inside the handler body. The lint is a lexical return keyword search; it cannot distinguish “Frame handler early-return” from “nested Rust scope early-return.” Round 3 hits this 6 times. The warnings are advisory and don’t block compilation; a fix to make W415 scope-aware is a separate follow-up.

Round 4 (shipped)

E113 section-order validator migrated to a genuinely multi-state Frame system in compiler/section_order_validator/.

Helper                                                                   Spec                           Module
validate_section_order(kinds: &[SystemSectionKind]) -> Option<String>    section_order_validator.frs    compiler/section_order_validator/

This is the round where Frame’s natural shape (state machine with transitions) actually fits the problem cleanly. The validator has two semantic phases:

  • $Walking — accept sections in canonical order, tracking the highest section index seen as a domain field.
  • $OutOfOrder — terminal; further sections are absorbed (E113 is reported once per system per the existing contract).

The transition $Walking → $OutOfOrder fires on the first violation. This is exactly the “error-absorbing terminal state” pattern Frame state machines were designed for. The match is so clean that the validator reads as a state machine on the page — no awkward shoehorning needed.

Bonus dogfood moment: framec rejected the first version of its own validator spec. The initial .frs placed domain: between interface: and machine: (a natural authoring order). framec’s parser rejected it with E113: blocks out of order. Expected: operations:, interface:, machine:, actions:, domain:. In other words, framec validated the very Frame system that implements the E113 check. The fix was to move the domain: section to after machine: (canonical position). Pure self-application; a satisfying RFC-0035 outcome.

The 6 unit tests cover: empty sequence, full canonical order, partial canonical order, two specific out-of-order patterns, and the “report once” contract (subsequent out-of-order sections after the first one are absorbed by $OutOfOrder). All existing E113 tests in frame_validator/tests continue to pass against the FSM-backed implementation.
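
The two-phase shape described above can be sketched in plain Rust as follows. The names Section, OrderValidator and validate_order_sketch are hypothetical stand-ins, the error text is illustrative rather than framec's actual E113 message, and repeated sections are assumed to be acceptable to this check.

```rust
/// Hypothetical stand-in for SystemSectionKind; declaration order is the
/// canonical order quoted in the E113 message.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Section {
    Operations,
    Interface,
    Machine,
    Actions,
    Domain,
}

#[derive(Clone, Copy)]
enum State {
    Walking,
    OutOfOrder,
}

struct OrderValidator {
    state: State,
    highest: Option<Section>, // "domain field": highest section seen so far
    error: Option<String>,
}

impl OrderValidator {
    fn new() -> Self {
        OrderValidator { state: State::Walking, highest: None, error: None }
    }

    fn section(&mut self, kind: Section) {
        match self.state {
            State::Walking => match self.highest {
                Some(prev) if kind < prev => {
                    // First violation: record it and move to the terminal state.
                    self.error = Some(format!("{:?} out of order after {:?}", kind, prev));
                    self.state = State::OutOfOrder;
                }
                _ => self.highest = Some(kind),
            },
            // Terminal: later sections are absorbed, so the error is reported once.
            State::OutOfOrder => {}
        }
    }
}

fn validate_order_sketch(kinds: &[Section]) -> Option<String> {
    let mut validator = OrderValidator::new();
    for &kind in kinds {
        validator.section(kind);
    }
    validator.error
}
```

The "report once" contract falls out of the state structure: no flag has to be checked, because the terminal state simply has no handler that reports.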

Round 5 (shipped)

E413 HSM parent-chain cycle detector migrated to a genuinely multi-state Frame system in compiler/hsm_cycle_validator/. This is the first round where Frame is applied to a graph algorithm.

Helper                                                                                Spec                    Module
validate_hsm_cycles(parents: &[(String, Option<String>)]) -> Vec<(String, String)>    hsm_cycle_walker.frs    compiler/hsm_cycle_validator/

Each state’s parent chain is walked by a fresh FSM instance with four states:

  • $Initial — first step seeds start_name and visited, transitions to $Walking.
  • $Walking — each step receives the next parent. Three outcomes: empty parent → $ChainRoot; parent already in visited → $CycleFound; otherwise record and self-loop.
  • $CycleFound — terminal error state.
  • $ChainRoot — terminal success state.

This is the canonical “graph walk as state machine” pattern: each state in the FSM corresponds to a phase of the algorithm. The visited set threads through as a domain field that evolves over event calls — not a state-arg, because state-args are per-state and visited needs to persist across the transition from $Initial to $Walking.

Frame ergonomic observation: HashSet is not a Frame interface type yet, so visited is a comma-separated string. Lookup is O(n) instead of O(1). For HSM chains (typical depth ≤10) this is fine, but on larger graphs the linear scan would be measurable. A first-class Set interface type would help, though for non-pathological HSM hierarchies the string CSV is a clean-enough stand-in.

Bonus observation: the FSM-driven design separates concerns cleanly. The orchestrating Rust caller does:

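// One fresh walker per state whose parent chain is being checked.
let mut walker = HsmCycleWalker::__create();
let _ = walker.step(state_name.to_string());  // seed
let mut current = first_parent;
loop {
    // A missing parent becomes the empty string, which ends the walk at the root.
    let result = walker.step(current.unwrap_or_default());
    if result.starts_with("CYCLE|") { cycles.push(...); break; }
    if result == "ROOT" { break; }
    current = parent_map.get(...);  // advance to the next parent in the chain
}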

The FSM owns the algorithmic state (visited set, current phase); the Rust caller owns the I/O (parent-map lookup, error pushing). This is a clearer separation than the original hand-coded version where both concerns interleaved in one function. Whether the clarity gain is worth the FSM scaffolding is the kind of judgment call RFC-0035 was set up to capture — for this case the answer is yes, but the heuristic is “graph algorithms with non-trivial visited-tracking benefit; pure linear scans probably don’t.”
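
To make the division of labor concrete, here is a self-contained approximation of both halves. It is a sketch, not the generated code: CycleWalker and chain_has_cycle are hypothetical names, the "SEEDED", "CONTINUE" and "TERMINAL" replies and the CYCLE| payload layout are assumptions, and state names are assumed not to contain commas.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum Phase {
    Initial,
    Walking,
    CycleFound,
    ChainRoot,
}

/// Hand-written stand-in for the walker FSM.
struct CycleWalker {
    phase: Phase,
    start_name: String,
    visited: String, // comma-separated, mirroring the CSV domain field
}

impl CycleWalker {
    fn new() -> Self {
        CycleWalker { phase: Phase::Initial, start_name: String::new(), visited: String::new() }
    }

    fn step(&mut self, name: String) -> String {
        match self.phase {
            Phase::Initial => {
                // Seed start_name and visited, then begin walking.
                self.start_name = name.clone();
                self.visited = name;
                self.phase = Phase::Walking;
                "SEEDED".to_string()
            }
            Phase::Walking => {
                if name.is_empty() {
                    self.phase = Phase::ChainRoot;
                    "ROOT".to_string()
                } else if self.visited.split(',').any(|seen| seen == name) {
                    self.phase = Phase::CycleFound;
                    format!("CYCLE|{}|{}", self.start_name, name)
                } else {
                    // Record the parent and stay in Walking (self-loop).
                    self.visited.push(',');
                    self.visited.push_str(&name);
                    "CONTINUE".to_string()
                }
            }
            // Terminal states absorb further steps.
            Phase::CycleFound | Phase::ChainRoot => "TERMINAL".to_string(),
        }
    }
}

/// Caller side: owns the parent-map lookups, as in the snippet above.
fn chain_has_cycle(start: &str, parents: &HashMap<&str, &str>) -> bool {
    let mut walker = CycleWalker::new();
    let _ = walker.step(start.to_string());
    let mut current = parents.get(start).copied();
    loop {
        let result = walker.step(current.unwrap_or("").to_string());
        if result.starts_with("CYCLE|") {
            return true;
        }
        if result == "ROOT" {
            return false;
        }
        current = current.and_then(|name| parents.get(name).copied());
    }
}
```

The loop terminates because every non-terminal step adds a new name to visited, and the parent map is finite.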

Round 6 (shipped)

W414 reachable-states BFS migrated to a 3-state Frame system in compiler/reachable_validator/. This is the second graph algorithm in the arc; where Round 5 used one FSM instance per chain walk, Round 6 uses ONE FSM instance for the entire BFS over the state-machine graph, with both visited and queue threaded through domain fields that evolve across many events.

Helper                                                                Spec                    Module
validate_reachable_states(start, edges, all_states) -> Vec<String>    reachable_walker.frs    compiler/reachable_validator/

Three states with real transitions:

  • $Initial — first call must be seed(start), which seeds queue and visited and transitions to $Walking.
  • $Walking — enqueue(name) adds neighbors (de-duped against visited); next() pops the queue head or transitions to $Done when the queue empties; unreachable(all_csv) works as a read-only diff.
  • $Done — terminal. next() returns "DONE" idempotently; unreachable() works the same as in $Walking; enqueue() is absorbed (defensive — a caller that loops past DONE shouldn’t crash).

The orchestrating Rust caller (frame_validator/machine.rs) builds the edge map from the AST (transition targets + ancestor chain, with pop$ pre-filtered) and drives the walk:

walker.seed(start);
loop {
    let head = walker.next();
    if head == "DONE" { break; }
    for n in edges[head] { walker.enqueue(n); }
}
walker.unreachable(all_states_csv)

The FSM owns BFS state; the caller owns AST extraction. Same separation-of-concerns benefit Round 5 surfaced.
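
A runnable approximation of the walker and its driver is sketched below. It is illustrative only: ReachWalker and unreachable_states are hypothetical names, native HashSet / VecDeque collections stand in for whatever the domain fields actually hold, unreachable returns a Vec rather than an encoded string, calling next() before seed() is assumed to answer "DONE", and a state literally named "DONE" would collide with the sentinel.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Clone, Copy, PartialEq)]
enum Phase {
    Initial,
    Walking,
    Done,
}

/// Hand-written stand-in for the 3-state BFS walker.
struct ReachWalker {
    phase: Phase,
    visited: HashSet<String>,
    queue: VecDeque<String>,
}

impl ReachWalker {
    fn new() -> Self {
        ReachWalker { phase: Phase::Initial, visited: HashSet::new(), queue: VecDeque::new() }
    }

    fn seed(&mut self, start: &str) {
        if self.phase == Phase::Initial {
            self.visited.insert(start.to_string());
            self.queue.push_back(start.to_string());
            self.phase = Phase::Walking;
        }
    }

    fn enqueue(&mut self, name: &str) -> &'static str {
        // Absorbed outside Walking; empty names are skipped.
        if self.phase != Phase::Walking || name.is_empty() {
            return "SKIP";
        }
        // `insert` returns false when the name was already visited.
        if !self.visited.insert(name.to_string()) {
            return "SKIP";
        }
        self.queue.push_back(name.to_string());
        "ENQUEUED"
    }

    fn next(&mut self) -> String {
        if self.phase != Phase::Walking {
            return "DONE".to_string();
        }
        match self.queue.pop_front() {
            Some(head) => head,
            None => {
                self.phase = Phase::Done;
                "DONE".to_string()
            }
        }
    }

    /// Read-only diff of a comma-separated state list against `visited`.
    fn unreachable(&self, all_csv: &str) -> Vec<String> {
        all_csv
            .split(',')
            .filter(|s| !s.is_empty() && !self.visited.contains(*s))
            .map(String::from)
            .collect()
    }
}

/// Caller side: owns the edge map, drives the walk.
fn unreachable_states(start: &str, edges: &HashMap<&str, Vec<&str>>, all_csv: &str) -> Vec<String> {
    let mut walker = ReachWalker::new();
    walker.seed(start);
    loop {
        let head = walker.next();
        if head == "DONE" {
            break;
        }
        if let Some(neighbors) = edges.get(head.as_str()) {
            for n in neighbors {
                walker.enqueue(n);
            }
        }
    }
    walker.unreachable(all_csv)
}
```

Because enqueue de-dupes against visited before touching the queue, cyclic graphs terminate: each state enters the queue at most once.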

Frame ergonomic finding (new): @@:(value) sets the return value but does NOT exit the handler. Without a transition, the body keeps running past the @@: call. The Round 6 enqueue() body originally read:

if name.is_empty() {
    @@:("SKIP".to_string())
}
if already_visited {
    @@:("SKIP".to_string())
}
// add to visited + queue
@@:("ENQUEUED".to_string())

This had the BFS append the name to visited + queue regardless of the early-set returns, producing an infinite loop on cyclic input. The fix was an explicit Frame return after each @@:(...):

if name.is_empty() {
    @@:("SKIP".to_string())
    return
}

The same shape is documented in W415’s warning message (“return to exit”), but it’s a recurring papercut: Round 3 worked around with nested-fn returns, Round 5 stayed safe because every conditional branch had a transition (which DOES emit return; in the generated Rust), Round 4 has a latent fall-through that happens to be harmless. A first-class “return value and exit” Frame syntax (e.g. @@!:(value) as a combined set+exit) would prevent this class of bug.

Round 7 (shipped)

Pipeline supervisor — the framec compile pipeline expressed as a 5-state Frame system in compiler/pipeline_supervisor/. This is Frame at the META level: a state machine that describes the compiler’s own pipeline.

Helper                       Spec                       Module
PipelineSupervisor::new()    pipeline_supervisor.frs    compiler/pipeline_supervisor/

Five states:

  • $Idle — initial; no phase has begun.
  • $Running — actively executing a phase; non-fatal errors collected, pipeline continues.
  • $Aborted — fatal error (e.g. segmentation failure); the pipeline cannot continue. Terminal.
  • $Failed — pipeline completed but non-fatal errors were collected. Terminal.
  • $Done — clean exit, zero errors. Terminal.

As shipped in R7 the supervisor was observational — it did not drive the orchestrator’s control flow; pipeline/compiler.rs stayed native Rust and called into the supervisor at each phase boundary (begin_phase("segment"), abort("E001", ...), etc.), so wrong supervisor logic only affected --debug output, not compilation correctness.

Superseded by Round 8 (below). The observer shape was a half-measure: it paid for the FSM scaffolding without letting the FSM control anything. Round 8 makes the FSM actually drive compile_ast_based. The earlier “multi-day arc / high regression risk” estimate for that conversion turned out to be wrong: the conversion landed in a single session, byte-identical across all snapshot suites, by extracting the phases into functions and letting the FSM sequence them (the phase bodies stay native; Frame owns the control structure).

Bonus self-describing artifact: framec renders its own pipeline diagram.

$ framec compile -l graphviz \
    framec/src/frame_c/compiler/pipeline_supervisor/pipeline_supervisor.frs \
    > docs/pipeline_supervisor.gv
$ dot -Tsvg docs/pipeline_supervisor.gv > docs/pipeline_supervisor.svg

The .frs source IS the pipeline spec; the SVG is the documentation. Changing the pipeline’s phase model means editing the .frs and regenerating both the supervisor implementation AND the diagram. Single source of truth.

Frame architectural finding (new): Observer FSMs are a viable Round-7 shape when the underlying control flow is too intertwined to rewrite. The pattern:

  1. Define the high-level state machine in Frame.
  2. Have the native-Rust controller call into the FSM at boundary events.
  3. The FSM tracks state, accumulates errors, produces summary output.

This unblocks Frame-as-meta-compiler for ANY system whose controller is already in a complex form (validators, codegen pipelines, optimization passes). The FSM is the model; the existing code is the implementation. Migration is additive — no rewrite required.
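
The three-step observer pattern above can be sketched as a small Rust type that a native controller notifies at boundaries. Only begin_phase and abort appear in this RFC; the error, finish and summary methods, the Supervisor name, and the exact transition rules are assumptions made for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum PipelineState {
    Idle,
    Running,
    Aborted,
    Failed,
    Done,
}

/// Hand-written stand-in for an observer FSM: it records, it does not drive.
struct Supervisor {
    state: PipelineState,
    phase: String,
    errors: Vec<String>,
}

impl Supervisor {
    fn new() -> Self {
        Supervisor { state: PipelineState::Idle, phase: String::new(), errors: Vec::new() }
    }

    /// Boundary event: a phase is starting. Ignored in terminal states.
    fn begin_phase(&mut self, name: &str) {
        if matches!(self.state, PipelineState::Idle | PipelineState::Running) {
            self.phase = name.to_string();
            self.state = PipelineState::Running;
        }
    }

    /// Non-fatal error: collected, pipeline continues. (Assumed method.)
    fn error(&mut self, code: &str, message: &str) {
        if self.state == PipelineState::Running {
            self.errors.push(format!("{}: {}", code, message));
        }
    }

    /// Fatal error: record it and enter the terminal Aborted state.
    fn abort(&mut self, code: &str, message: &str) {
        if self.state == PipelineState::Running {
            self.errors.push(format!("{}: {}", code, message));
            self.state = PipelineState::Aborted;
        }
    }

    /// End of pipeline: Done when clean, Failed otherwise. (Assumed method.)
    fn finish(&mut self) {
        if self.state == PipelineState::Running {
            self.state = if self.errors.is_empty() {
                PipelineState::Done
            } else {
                PipelineState::Failed
            };
        }
    }

    fn summary(&self) -> String {
        format!("{:?} after phase '{}' with {} error(s)", self.state, self.phase, self.errors.len())
    }
}
```

Since nothing in the controller branches on the supervisor's state, a bug here can only distort the summary, which matches the low-risk property noted for the R7 shape.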

Future rounds (open)

The migration backlog has covered: name converters (R1), type mappers + dispatch helpers (R2), line classifiers (R3), stateful validators with terminal error states (R4), graph algorithms in two flavors (R5 per-chain FSM, R6 whole-walk FSM), and meta-compiler observer FSMs (R7). The RFC has its “diverse problems Frame CAN be applied to” inventory.

A 2026-05-25 whole-codebase audit (against the “FSM everywhere we can — including single-state systems” mandate) found that the inventory is extensive but not complete, and that one shipped FSM is decorative. These are the immediate roadmap — concrete native state machines to convert, and one fake to make real, each parity-gated against the matrix + fuzz:

  • Round 8 — make the pipeline supervisor drive. ✅ SHIPPED (2026-05-25). compile_ast_based (~1000 lines of intertwined linear logic) was carved into six phase functions (do_segment / do_parse / do_module_gates / do_graphviz / do_validate_codegen / do_assemble) over a shared owned PipelineCtx. The old observer PipelineSupervisor was replaced by PipelineFsm (compiler/pipeline_supervisor/): one state per phase ($Idle → $Segment → $Parse → $ModuleGates → $Graphviz → $ValidateCodegen → $Assemble → $Done), each state’s $> enter handler running its do_* phase and transitioning — to the next phase, or, on early exit, stashing the CompileResult in early and jumping to $Done. The machine self-drives from a single run() call (RFC-0039 B1 “backbone owns the state”: PipelineCtx + early are domain fields). compile_ast_based is now a three-line delegation to run_pipeline. The transition graph is the pipeline control flow, and framec compile -l graphviz on the .frs renders the real pipeline. Byte-identical output, full suite + parity green.
  • Round 9 — domain.rs outer scanner. ✅ SHIPPED (2026-05-25). The domain: section line-walk is now DomainScannerFsm (compiler/domain_scanner/): $Start → $Scan (self-loops one physical line per pass) → $Done, owning bytes/pos/vars/pending buffers and delegating @@[…] + default-expression sub-scans to the existing scan_attribute / ExprScannerFsm. parse_domain is a thin caller. Byte-identical; gate green.
  • Round 10 — assembler/mod.rs call-site scanner. ✅ SHIPPED (2026-05-25). expand_system_instantiations is now a lexer: CallSiteScannerFsm (compiler/call_site_scanner/) walks a native region into a CallToken stream (Literal verbatim / Call per @@[!]Name(args)), delegating comment/string/balanced-paren detection to the language SyntaxSkipper. The assembler expands each Call via expand_one (it holds the borrowed system-params maps, which never enter the FSM). Byte-identical; gate green.
  • Round 11 — native_region_scanner/unified/metadata.rs. ✅ SHIPPED (2026-05-25), scoped honestly. On inspection extract_segment_metadata is a flat match kind dispatch, not a scanner — a top-level FSM would be decorative (the R8 lesson). Its one genuinely parser-shaped arm, the transition-string grammar (exit)? -> (=>)? (enter)? ($State(args)? | pop$) "label"?, became a real multi-state grammar FSM: TransitionMetaScannerFsm (compiler/transition_meta_scanner/) walks one state per element ($Target → $ExitArgs → $EnterArgs → $StateArgs → $LabelForward → $Done), each extracting its piece. The Transition arm is now a one-line delegation; the rest of metadata.rs stays the dispatch it honestly is (369 → 238 LOC). Byte-identical; gate green.
  • Round 12 — Erlang scanners. ✅ SHIPPED (2026-05-25), scoped honestly. lexical.rs is mostly stateless string utilities (replace_word, erlang_op_name, split_top_level_commas, …), not scanners. Its one genuine state machine — and the most state-machine-like thing here — is paren_balance_unclosed: a lexical-mode scanner (string / quoted-atom / escape) counting brackets, now ParenBalanceFsm (compiler/paren_balance_scanner/) whose STATES are the lexical modes ($Normal / $InString / $InAtom / $StringEscape / $AtomEscape). body_processor.rs (1639 LOC) is a line-transform pipeline with a dozen tiny intertwined local scanners — not a clean single FSM; forcing it would be decorative + R8-scale, so it is assessed, not converted (the R8/R11 lesson). Byte-identical; gate green.
  • Round 13 (judgement call) — the pipeline_parser parse_* oracle methods. ⊘ EVALUATED, NOT CONVERTED (2026-05-25). The judgement: there is no non-ceremonial standalone conversion to make here — the genuine parser-as-FSM work is RFC-0039’s (now merged onto this line; see below). Reasoning:
    • The parse_* methods are LL(1) recursive-descent dispatch (match token → build node → recurse). Their genuinely-stateful byte-level scanning already delegates to leaf FSMs (ExprScannerFsm, AttributeScannerFsm, and the R9–R12 domain / call-site / transition-meta / paren-balance scanners). What remains is token dispatch + mutual recursion.
    • The one genuinely scanner-shaped method, parse_body_block (a token-dispatch loop driving the lexer), is exactly the RFC-0039 “parser-as-Frame backbone” ($Machine → $StateHeader / $StateBody) — now merged onto this line, where the SystemBackbone Frame system already drives the state-body loop. That conversion is therefore done, by RFC-0039, not a fresh R13 task.
    • Turning the recursive descent into an explicit push/pop PDA is RFC-0039’s “(B) backbone-as-Frame-system” half, which RFC-0039 itself evaluated as a self-hosting proof, not a clarity or correctness win for framec’s flat, shallow, LL(1) grammar — a deliberate multi-day arc, not a quick round.

    Net: the genuine “parser as composed FSMs” lives in RFC-0039; there is no non-ceremonial standalone parse_* conversion left for RFC-0035 to make. (Refusing to pad the round count with a degenerate FSM is the R8/R11/R12 lesson applied.)
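To make the Round 12 "states are the lexical modes" shape concrete, here is a minimal hand-written Rust sketch of a bracket counter driven by the five modes named above. It is an illustration only, not the generated ParenBalanceFsm: the function name unclosed_brackets, the Mode enum, and the choice to return a net depth are assumptions, and Erlang details such as $( character literals are deliberately ignored.

```rust
/// Lexical modes, mirroring the state names listed for ParenBalanceFsm.
/// (Hand-written stand-in; the real states are generated from a .frs spec.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Normal,
    InString,
    InAtom,
    StringEscape,
    AtomEscape,
}

/// Net count of brackets opened but not closed, ignoring anything inside
/// double-quoted strings and single-quoted atoms. The return convention is
/// an assumption made for this sketch.
fn unclosed_brackets(src: &str) -> i32 {
    let mut mode = Mode::Normal;
    let mut depth: i32 = 0;
    for b in src.bytes() {
        // One (mode, byte) pair selects exactly one edge of the machine.
        mode = match (mode, b) {
            (Mode::Normal, b'"') => Mode::InString,
            (Mode::Normal, b'\'') => Mode::InAtom,
            (Mode::Normal, b'(' | b'[' | b'{') => {
                depth += 1;
                Mode::Normal
            }
            (Mode::Normal, b')' | b']' | b'}') => {
                depth -= 1;
                Mode::Normal
            }
            (Mode::Normal, _) => Mode::Normal,
            (Mode::InString, b'\\') => Mode::StringEscape,
            (Mode::InString, b'"') => Mode::Normal,
            (Mode::InString, _) => Mode::InString,
            (Mode::InAtom, b'\\') => Mode::AtomEscape,
            (Mode::InAtom, b'\'') => Mode::Normal,
            (Mode::InAtom, _) => Mode::InAtom,
            // An escape consumes exactly one byte, then returns to its mode.
            (Mode::StringEscape, _) => Mode::InString,
            (Mode::AtomEscape, _) => Mode::InAtom,
        };
    }
    depth
}
```

Every edge is a single match arm, which is the property the rounds above rely on: a missing case shows up as a missing arm rather than as a forgotten branch buried in a byte walker.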

Lower-value / explicitly-deferred candidates (record, don’t rush): codegen backend emit dispatch (ceremony, not a real demo); per-target attribute scanners (only if a new attribute family lands).

Process for adding a new FSM

When migrating a hand-coded scanner / parser / classifier to a Frame FSM:

  1. Read the closest existing .frs as a template. rust_lang.frs is the model for “scan bytes, recognize lexical shapes, output a position or error.” attribute_scanner.frs is the model for “scan a structured construct.”
  2. Write the new .frs next to the hand-coded module it replaces. Include a header comment explaining what it does and what the per-state transition graph means.
  3. Generate the .gen.rs via framec compile -l rust -o <dir>/ <spec>.frs, then rename the output file to <name>.gen.rs.
  4. Write the glue .rs that include!s the .gen.rs and implements the trait the hand-coded module used to satisfy. Same shape as body_closer/rust.rs etc.
  5. Delete the hand-coded module. The build catches any missed references; nothing should silently retain the old implementation.
  6. Run cargo test --workspace + the matrix Rust target. Both must stay green.
  7. Document the regen in the glue file’s module comment so future contributors can rebuild after editing the spec.
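The glue step (step 4) can be sketched as follows. So that the example compiles on its own, an inline module stands in for the generated file; in a real glue module that module body would be replaced by an include! of the .gen.rs. Everything named here (ExampleSkipperFsm, StringSkipper, ExampleSkipper, the create/run methods) is hypothetical and chosen for illustration; the real trait and the real generated API should be taken from the existing body_closer glue files.

```rust
// Stand-in for the generated file. In an actual glue module this would be
// something like `include!("example_skipper.gen.rs");` (hypothetical name).
mod generated {
    #[derive(Clone, Copy)]
    enum State {
        Init,
        InString,
        Escape,
        Done,
        Failed,
    }

    /// Hypothetical FSM: skip one double-quoted string starting at `pos`.
    pub struct ExampleSkipperFsm<'a> {
        bytes: &'a [u8],
        pos: usize,
        state: State,
    }

    impl<'a> ExampleSkipperFsm<'a> {
        pub fn create(bytes: &'a [u8], start: usize) -> Self {
            Self { bytes, pos: start, state: State::Init }
        }

        /// Drive the machine to completion. Some(end) is the index just
        /// past the closing quote; None means no well-formed string.
        pub fn run(&mut self) -> Option<usize> {
            loop {
                let byte = self.bytes.get(self.pos).copied();
                self.state = match (self.state, byte) {
                    (State::Done, _) => return Some(self.pos),
                    (State::Failed, _) => return None,
                    (State::Init, Some(b'"')) => State::InString,
                    (State::Init, _) => State::Failed,
                    (State::InString, Some(b'\\')) => State::Escape,
                    (State::InString, Some(b'"')) => State::Done,
                    (State::InString, Some(_)) => State::InString,
                    (State::Escape, Some(_)) => State::InString,
                    (_, None) => State::Failed, // ran off the end
                };
                if !matches!(self.state, State::Failed) {
                    self.pos += 1; // consume the byte just classified
                }
            }
        }
    }
}

/// Hypothetical trait the hand-coded module used to satisfy.
pub trait StringSkipper {
    fn skip_string(&self, bytes: &[u8], start: usize) -> Option<usize>;
}

/// The glue: a thin adapter from the trait to the generated FSM.
pub struct ExampleSkipper;

impl StringSkipper for ExampleSkipper {
    fn skip_string(&self, bytes: &[u8], start: usize) -> Option<usize> {
        generated::ExampleSkipperFsm::create(bytes, start).run()
    }
}
```

The adapter holds no logic of its own, which is what lets step 5 delete the hand-coded module outright: callers keep using the trait, and the only implementation left is the generated one.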

Migration outcome

The backlog is being worked round by round, with each round shipping a coherent slice and recording the Frame ergonomic observations it surfaces. Current state:

Migration target | Status | Location / notes
Java handler-body native rewriter | Done | java_await_rewrite.frs
Name converters (to_snake_case, pascal_case_variant) | Done (Round 1) | compiler/name/
Per-target type mappers + Rust dispatch helpers (10 fns) | Done (Round 2) | compiler/type_map/ + compiler/target_query/
Erlang per-line native rewriter (line classifier) | Done (Round 3) | compiler/erlang_classifier/
E113 section-order validator (multi-state) | Done (Round 4) | compiler/section_order_validator/
E413 HSM cycle detector (multi-state graph walker) | Done (Round 5) | compiler/hsm_cycle_validator/
W414 reachable-states BFS (whole-walk multi-state FSM) | Done (Round 6) | compiler/reachable_validator/
Pipeline supervisor (meta-compiler driver FSM) | Driving (Round 8, shipped 2026-05-25) | compiler/pipeline_supervisor/ (PipelineFsm). One state per compile phase; the enter-handler chain runs each do_* phase and transitions. compile_ast_based delegates to run_pipeline; the transition graph IS the control flow. (R7 shipped this observational; R8 closed that half-measure.)
domain.rs outer line scanner | Open (Round 9) | Native byte-walk; only sub-parts FSM’d.
assembler/mod.rs call-site scanner | Open (Round 10) |
unified/metadata.rs scanner | Done (Round 11, shipped 2026-05-25) | compiler/transition_meta_scanner/ (TransitionMetaScannerFsm). Only the transition-string arm was converted; the rest of metadata.rs stays a flat dispatch.
Erlang lexical.rs + body_processor.rs scanners | Done (Round 12, shipped 2026-05-25) | compiler/paren_balance_scanner/ (ParenBalanceFsm) replaces paren_balance_unclosed. body_processor.rs was assessed, not converted.
pipeline_parser parse_* oracle methods | Evaluated, not converted (Round 13, 2026-05-25) | Judgement call: the genuine parser-as-FSM work lives in RFC-0039.
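The "one state per compile phase" shape recorded for the pipeline supervisor can be pictured with the small Rust sketch below. It is not PipelineFsm: the phase names, the do_* bodies, and the log field are assumptions invented for illustration, and a real supervisor would be generated from a Frame spec rather than written by hand.

```rust
/// Hypothetical compile phases; the real phase list belongs to PipelineFsm.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Phase {
    Parse,
    Validate,
    Codegen,
    Done,
    Failed,
}

pub struct Pipeline {
    phase: Phase,
    source: String,
    /// Records which phases actually ran, in order.
    pub log: Vec<&'static str>,
}

impl Pipeline {
    pub fn new(source: &str) -> Self {
        Self { phase: Phase::Parse, source: source.to_string(), log: Vec::new() }
    }

    // Placeholder phase bodies: each returns whether the phase succeeded.
    fn do_parse(&mut self) -> bool {
        self.log.push("parse");
        !self.source.trim().is_empty()
    }
    fn do_validate(&mut self) -> bool {
        self.log.push("validate");
        true
    }
    fn do_codegen(&mut self) -> bool {
        self.log.push("codegen");
        true
    }

    /// Each state runs its phase on entry and picks the next state, so the
    /// transition graph is the control flow.
    pub fn run(&mut self) -> Phase {
        loop {
            self.phase = match self.phase {
                Phase::Parse => {
                    if self.do_parse() { Phase::Validate } else { Phase::Failed }
                }
                Phase::Validate => {
                    if self.do_validate() { Phase::Codegen } else { Phase::Failed }
                }
                Phase::Codegen => {
                    if self.do_codegen() { Phase::Done } else { Phase::Failed }
                }
                Phase::Done | Phase::Failed => return self.phase,
            };
        }
    }
}
```

In this form a skipped or reordered phase is visible as a changed edge in one match, rather than as a moved call inside a long driver function.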

The first migration (Java await rewriter) closed an immediate whack-a-mole risk: a 90-LOC hand-coded byte walker was replaced with a Frame spec modeled on body_closer/rust_lang.frs. Round 1 and Round 2 added 12 single-state systems — Frame functioning as a function-definition wrapper around native bodies — and surfaced the ergonomic observations documented above (return-value vs. return-early semantics, recursion routing, missing bool/enum interface types).

The RFC’s lasting value is the inventory + the process documentation: a future contributor whose change touches a scanner-like subsystem can read this RFC and decide quickly whether the FSM approach fits. The whack-a-mole risk that motivated the RFC (#24 → #25 → #26) only applies to genuine byte-walking scanners; the inventory makes that distinction explicit.

Examples

Migrating a hand-coded scanner to an FSM — the #26 path

The fix for FRAMEC_BUGS #26 already demonstrates the process. Before: framec/src/frame_c/compiler/native_region_scanner/frame_structural.rs contained a 150-line hand-coded skipper. After:

body_closer/frame_structural.frs          ← the state-machine spec
body_closer/frame_structural.gen.rs       ← generated by framec
body_closer/frame_structural.rs           ← 30-line glue file
native_region_scanner/frame_structural_skipper.frs
native_region_scanner/frame_structural_skipper.gen.rs
native_region_scanner/frame_structural.rs ← glue

The before-and-after LOC ratio is roughly 1:1, but the after version is declarative and dogfoods framec’s own compiler. All four bugs in the arc (#24, #25-A, #25-B, #26) closed with that one change.

Single-state Frame for a name converter

name/to_snake_case.frs:

@@system ToSnakeCase {
    interface:
        convert(s: String): String

    machine:
        $Active {
            convert(s: String): String {
                @@:({
                    let mut out = String::new();
                    let mut prev_lower = false;
                    for c in s.chars() {
                        if c.is_ascii_uppercase() {
                            if prev_lower { out.push('_'); }
                            out.push(c.to_ascii_lowercase());
                            prev_lower = false;
                        } else {
                            out.push(c);
                            prev_lower = c.is_ascii_lowercase();
                        }
                    }
                    out
                })
            }
        }
}

framec compiles this to a single Rust struct + impl in its own source tree. The previously-inline to_snake_case function in codegen_utils.rs becomes a thin wrapper that delegates to the generated system:

pub fn to_snake_case(s: &str) -> String {
    ToSnakeCase::create().convert(s.to_string())
}

Same call sites continue to work. The function body now lives in a .frs file alongside its peers, and any codegen regression that affects how this gets emitted breaks framec’s build immediately.
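To check that the wrapper preserves behaviour, the following self-contained Rust sketch pairs it with a hand-written stand-in for the generated struct. The struct layout and the &mut self receiver are assumptions (the real shape is whatever framec emits for the spec above); only the conversion body is copied from the .frs listing.

```rust
/// Hand-written approximation of what a single-state system might compile
/// to. The real struct is generated; this shape is an assumption.
pub struct ToSnakeCase;

impl ToSnakeCase {
    pub fn create() -> Self {
        ToSnakeCase
    }

    /// Body copied from the $Active handler in the spec.
    pub fn convert(&mut self, s: String) -> String {
        let mut out = String::new();
        let mut prev_lower = false;
        for c in s.chars() {
            if c.is_ascii_uppercase() {
                // Insert a separator only on a lower-to-upper boundary.
                if prev_lower {
                    out.push('_');
                }
                out.push(c.to_ascii_lowercase());
                prev_lower = false;
            } else {
                out.push(c);
                prev_lower = c.is_ascii_lowercase();
            }
        }
        out
    }
}

/// The wrapper that keeps existing call sites unchanged.
pub fn to_snake_case(s: &str) -> String {
    ToSnakeCase::create().convert(s.to_string())
}
```

With this body, to_snake_case("HelloWorld") yields "hello_world" and to_snake_case("parseXML") yields "parse_xml". A run of capitals is not split, so "HTTPServer" becomes "httpserver"; that follows directly from the prev_lower rule and is the kind of edge a snapshot test would pin down when migrating a converter.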

Alternatives

Migrate the top-level lexer and parser to Frame FSMs

Considered: write pipeline_parser.frs and replace the hand-written parser. Rejected for this RFC because the top-level parser handles all of Frame’s grammar (every keyword, every operator, every section header) and would be a substantial multi-week effort. Worth pursuing separately if and when the parser’s complexity creates its own whack-a-mole problems. The lexer and parser are stable enough today that the cost-benefit doesn’t justify the work right now.

Migrate the validator to Frame FSMs

Considered: each validate_X function becomes a single-state Frame system. Rejected as a one-shot bulk migration. Instead, new validators that match the FSM shape (multi-state walking the AST) should be written as FSMs from the start. The existing flat-function validators stay flat: they’re simple enough that the migration adds noise without payoff.

Single big “framec utilities” Frame program

Considered: one mega-system containing every helper as an interface method. Rejected — Frame systems are state machines, and a state machine with one state and 30 interface methods is just a class. Splitting into one system per function is more honest about the shape, makes each spec independently editable, and matches the existing per-target-skipper layout.

Skip the inventory; just migrate as bugs surface

Considered: do nothing now, migrate each hand-coded module the next time it acquires a bug. Rejected because the FRAMEC_BUGS #24-#26 cycle is exactly the cost: each round looked like a small fix but cumulatively was four iterations of debugging the same class of bug. The inventory exists so a future contributor can spot the pattern and reach for the FSM template before adding the third hand-coded special case.

Migration

Source-additive. New .frs specs join the existing 39. The generated .gen.rs files join the existing per-target gen files. Hand-coded modules are deleted as their replacements ship — no parallel implementations, no transitional period.

Each migration is independently mergeable. The RFC commits to the inventory and the process, not to a deadline for any specific item.

References

  • Frame language reference
  • Glossary
  • CHANGELOG.md
  • RFC-0027 — In-tree snapshot tests (the regression net that catches whether a migrated .frs reproduces its hand-coded predecessor’s output).
  • RFC-0033 — Idiomatic Rust output. Introduced java_native_rewrite.rs, the largest current migration candidate.
  • RFC-0034 — In-process compile checks. Verifies the generated output of every .frs (and every other fixture) parses in its target language.
  • FRAMEC_BUGS.md #24 / #25 / #26 — the bug arc that motivated this RFC. The structural skipper was migrated to an FSM as part of #26’s fix; future hand-coded scanners should follow the same path before they accumulate similar bugs.