RFC-0035: Dogfooding inventory — existing FSMs, migration candidates, and single-state test corpus
- Status: Draft
- Author: Mark Truluck mark.truluck@cogiton.com
- Created: 2026-05-19
- Builds on: RFC-0027, RFC-0033, RFC-0034
Summary
framec uses Frame .frs specifications to generate large portions
of its own implementation — every per-target syntax skipper and
body closer, the attribute scanner, the context parser, several
backend-specific assemblers. The “dogfood” pattern is a stated
project discipline: when a part of the compiler is a state machine
on bytes, write it as a Frame system and generate the Rust from
it. This RFC catalogues every existing .frs spec, identifies
hand-coded subsystems that fit the dogfood pattern but were
written in plain Rust, and proposes a small corpus of single-state
“test-via-Frame” programs that exercise the compiler without
needing transitions.
The motivation is concrete: four consecutive iterations of the same class of bug (FRAMEC_BUGS #24 → #25-A → #25-B → #26) all came from a hand-coded structural scanner. Each round patched the latest reported case while opening another. Migrating that scanner to a Frame FSM closed all four with one change because the spec is declarative — every edge of the state machine is visible in one document. The same risk applies to every other hand-coded scanner-like subsystem in framec.
Motivation
The pattern is well-established. Of the 39 .frs files in
framec/src/:
- 30 are per-target lexical machines (16 body closers + 14 syntax skippers, one per code-generating target).
- 4 are shared Frame-syntax parsers (attribute scanner, context parser, expression scanner, state-var parser).
- 4 are compiler-utility FSMs (output-block lexer + parser, Erlang scope scanner, GDScript multi-system assembler).
- 1 is the structural shim added in #26 to fix the whack-a-mole cycle in the GraphViz pipeline (frame_structural × body_closer + skipper).
Each is a Rust state machine on bytes whose lexical rules are
declared in Frame source, compiled to Rust via framec compile
-l rust. The glue file wires the generated FSM into the
appropriate trait. Re-bless instructions are at the top of each
glue file.
Outside that catalog, several subsystems in framec do scanner-like work in plain Rust:
- The Java handler-body native rewriter (added in RFC-0033 for Issue #18) lowers self.X → this.X and await EXPR → EXPR.join() in Java handler bodies. It walks bytes, recognizes // line comments and "..." strings to avoid rewriting inside them, and applies token-level substitutions.
- The Erlang per-line native rewriter classifies Frame-emitted body lines into ActionCall / RecordUpdate / etc. shapes. Hand-coded; works correctly today but exists at the same level of complexity as the per-target skippers.
- Several smaller helpers in frame_expansion/ and handler_body.rs walk bytes with hand-coded state tracking (statement-terminator detection, ; insertion at user→Frame-expansion boundaries).
Each is a place where a future contributor might add a new edge case and step on the same trap. Migration is not urgent — the existing code works for the inputs it was tested against — but the RFC names them so a future round of “scanner is missing one edge” has a clear answer: write the spec, generate the FSM.
Separately, the dogfood discipline pays a second dividend: every Frame program framec compiles is a regression test for the compiler itself. The 39 specs above already double as exercises of the language. There is room for more — small, single-state “utility” Frame programs that don’t model a state machine but use Frame syntax to express a function (a type mapper, a name converter, a single-shot validator). They generate Rust code framec then compiles into its own binary. Adding them expands the dogfood surface and is cheaper than authoring external integration tests.
The contract
Inventory of existing FSMs
The complete current catalogue. Every entry is a Frame system that compiles to Rust as part of framec’s own build.
Per-target lexical machines (30 files)
Body closers — find matching } for a state body, respecting target-language strings and comments
| Target | .frs | Variants |
|---|---|---|
| C | body_closer/c.frs | // + /* */ + "..." + 'X' |
| C++ | body_closer/cpp.frs | Same as C + raw strings |
| C# | body_closer/csharp.frs | @"..." verbatim + $"..." interpolated |
| Erlang | body_closer/erlang.frs | % ... comments + "..." only |
| Go | body_closer/go.frs | `...` raw + standard "..." |
| Java | body_closer/java.frs | Standard C-family |
| JavaScript | body_closer/javascript.frs | `...` template literals + standard |
| Kotlin | body_closer/kotlin.frs | """...""" raw + standard |
| Lua | body_closer/lua.frs | [[...]] long strings + --[[...]] comments |
| PHP | body_closer/php.frs | Heredoc/nowdoc + <?php/?> tags |
| Python | body_closer/python.frs | Triple-quoted + f-strings |
| Ruby | body_closer/ruby.frs | %w[...] literals + #{...} interpolation |
| Rust | body_closer/rust_lang.frs | Nested /* */ + r#"..."# raw + 'X' |
| Swift | body_closer/swift.frs | """...""" + #"..."# raw + \(...) |
| TypeScript | body_closer/typescript.frs | Same as JavaScript |
| Structural | body_closer/frame_structural.frs | Both // and # line comments; for GraphViz |
Syntax skippers — skip_comment / skip_string / find_line_end / balanced_paren_end
| Target | .frs |
|---|---|
| C | native_region_scanner/c_skipper.frs |
| C++ | native_region_scanner/cpp_skipper.frs |
| C# | native_region_scanner/csharp_skipper.frs |
| Erlang | native_region_scanner/erlang_skipper.frs |
| Java | native_region_scanner/java_skipper.frs |
| JavaScript | native_region_scanner/javascript_skipper.frs |
| Kotlin | native_region_scanner/kotlin_skipper.frs |
| Lua | native_region_scanner/lua_skipper.frs |
| PHP | native_region_scanner/php_skipper.frs |
| Python | native_region_scanner/python_skipper.frs |
| Ruby | native_region_scanner/ruby_skipper.frs |
| Rust | native_region_scanner/rust_skipper.frs |
| Swift | native_region_scanner/swift_skipper.frs |
| TypeScript | native_region_scanner/typescript_skipper.frs |
| Structural | native_region_scanner/frame_structural_skipper.frs |
(Dart and Go currently use C-like helpers without a dedicated
.frs — see Migration candidates below.)
Shared Frame-syntax parsers (4 files)
| Spec | Purpose |
|---|---|
| attribute_scanner/attribute_scanner.frs | Parses @@[name(args)] attribute headers on declarations |
| native_region_scanner/context_parser.frs | Recognizes @@:return, @@:event, @@:data[k], etc. |
| native_region_scanner/expr_scanner.frs | Pulls expressions out of native bodies for Frame-token detection |
| native_region_scanner/state_var_parser.frs | Parses $.var: type = init declarations |
Compiler-utility FSMs (4 files)
| Spec | Purpose |
|---|---|
| codegen/output_block_lexer.frs | Tokenizes the @@output { ... } block embedded in source |
| codegen/output_block_parser.frs | Builds an AST from the @@output token stream |
| native_region_scanner/erlang_scope_scanner.frs | Tracks Erlang case/end scope boundaries in handler bodies |
| gdscript_multisys/multisys_assembler.frs | Sequences GDScript multi-system class re-arrangement |
Migration candidates
Hand-coded subsystems whose shape matches the existing dogfood pattern and would benefit from migration.
Java handler-body native rewriter
File: framec/src/frame_c/compiler/codegen/state_dispatch/handler_methods/java_native_rewrite.rs
(~194 LOC).
Today: walks bytes in a Java-target NativeBlock and applies
two token-level substitutions, respecting // and /* */
comments and "..." strings:
- self.X → this.X
- await EXPR → EXPR.join() (where EXPR is a balanced parenthesized call form).
Should be: body_closer/java_handler_rewrite.frs (or a
shared handler_rewrite/ module) modeled on
body_closer/java.frs. States: $Scanning, $InString,
$InLineComment, $InBlockComment. Output: a string-builder
domain field collecting the rewritten bytes.
Why: rewriting native code while respecting comments and
strings is the exact same shape as brace matching — and the
brace matcher is a dogfooded FSM. The Rust target’s &str →
String auto-promotion (RFC-0033) and the self.X → this.X
lowering (also RFC-0033) are parallel patches that will likely
need to extend to other backends (PHP’s self.X → $this->X
was filed as a follow-up). Each new backend’s rewriter would
benefit from the same dogfood pattern.
Effort: one day. Same complexity as adding a per-target body_closer.
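To make the proposed state set concrete, here is a minimal plain-Rust sketch of the byte walk that such a spec would declare. It is a hypothetical stand-in, not the existing java_native_rewrite.rs and not generated output: the function name and the Mode enum are invented for illustration, only the self.X → this.X substitution is shown (the await lowering is left out), and Java char literals are not handled.

```rust
/// Lexical modes; these mirror the four states named above
/// ($Scanning, $InString, $InLineComment, $InBlockComment).
#[derive(Clone, Copy, PartialEq)]
enum Mode {
    Scanning,
    InString,
    InLineComment,
    InBlockComment,
}

/// True when the byte before position `i` could be part of an identifier,
/// in which case `self.` at `i` is only the tail of a longer name.
fn prev_is_ident(b: &[u8], i: usize) -> bool {
    i > 0 && (b[i - 1].is_ascii_alphanumeric() || b[i - 1] == b'_')
}

/// Hypothetical sketch: rewrites `self.` to `this.` outside comments and
/// string literals, collecting output in a byte buffer (the "string-builder
/// domain field" of the proposal).
fn rewrite_self_to_this(src: &str) -> String {
    let b = src.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(b.len());
    let mut mode = Mode::Scanning;
    let mut i = 0;
    while i < b.len() {
        match mode {
            Mode::Scanning => {
                if b[i..].starts_with(b"//") {
                    mode = Mode::InLineComment;
                    out.extend_from_slice(b"//");
                    i += 2;
                } else if b[i..].starts_with(b"/*") {
                    mode = Mode::InBlockComment;
                    out.extend_from_slice(b"/*");
                    i += 2;
                } else if b[i] == b'"' {
                    mode = Mode::InString;
                    out.push(b'"');
                    i += 1;
                } else if b[i..].starts_with(b"self.") && !prev_is_ident(b, i) {
                    out.extend_from_slice(b"this.");
                    i += 5;
                } else {
                    out.push(b[i]);
                    i += 1;
                }
            }
            Mode::InString => {
                if b[i] == b'\\' && i + 1 < b.len() {
                    // Copy the escape pair so an escaped quote cannot end the string.
                    out.extend_from_slice(&b[i..i + 2]);
                    i += 2;
                } else {
                    if b[i] == b'"' {
                        mode = Mode::Scanning;
                    }
                    out.push(b[i]);
                    i += 1;
                }
            }
            Mode::InLineComment => {
                if b[i] == b'\n' {
                    mode = Mode::Scanning;
                }
                out.push(b[i]);
                i += 1;
            }
            Mode::InBlockComment => {
                if b[i..].starts_with(b"*/") {
                    mode = Mode::Scanning;
                    out.extend_from_slice(b"*/");
                    i += 2;
                } else {
                    out.push(b[i]);
                    i += 1;
                }
            }
        }
    }
    // Every input byte is copied in order and only ASCII is substituted,
    // so the buffer is still valid UTF-8.
    String::from_utf8(out).expect("only ASCII bytes were substituted")
}
```

Each match arm corresponds to one state of the would-be spec, which is the point of the migration: the edges are enumerable in one place.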
Erlang per-line native rewriter — REVIEWED, NOT A FIT
File: framec/src/frame_c/compiler/codegen/erlang_system/native_rewrite.rs
(~266 LOC).
Today: classifies each line of a Frame-emitted Erlang handler
body into one of seven shapes (ActionCall, ActionCallWithBind,
InterfaceCall, InterfaceCallWithBind, RecordUpdate, Reply,
Plain). The classification drives how Data threads through
the gen_statem reply.
Verdict on closer inspection: not a fit for the FSM dogfood
pattern. The Erlang rewriter is a line classifier, not a
byte walker. It takes a whole line, tries seven ordered
starts_with / contains / find pattern checks in priority
order, extracts a field name + args + similar via string ops,
returns an enum variant. The byte-walking work it does need —
self.field → DataN#data.field substitution — is already
delegated to replace_outside_strings_and_comments, which is
FSM-driven through the per-target skipper trait.
Forcing this into a Frame FSM would be performative — a single
state, single handler, seven if branches inside the handler
body. The code would not be more declarative, only more
ceremonious. Skipped from the migration backlog.
The hand-coded classifier remains an audit target the same way the validator is: well-suited for direct Rust, but every new branch should be added carefully and the pattern-priority order documented in the function comment (which it is, today).
Statement-terminator insertion at user→Frame-expansion boundaries
File: framec/src/frame_c/compiler/codegen/frame_expansion/handler_body.rs
(needs_statement_terminator function, ~40 LOC).
Today: a single Rust function that asks “if I’m about to
emit a Frame expansion after this NativeCode tail, do I need
to insert ; first?” Walks the tail backward to find the last
non-whitespace character, returns true unless it’s an
in-expression continuation (=, +, (, ,, etc.).
Should be: marginal — this is small enough that a function is fine. Listed for completeness; not recommending migration.
Why not: the function is 40 LOC, has no recursion, no token context, and a flat enumeration of “this is an expression continuation” characters. The FSM would be a single state with one decision; the spec would be longer than the function.
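For reference, the decision described above fits in a few lines. The following is a sketch under stated assumptions, not the code in handler_body.rs: the continuation list contains only the four characters the text names (the real list is longer), and returning false for an empty or whitespace-only tail is an assumption.

```rust
/// Hypothetical sketch: should `;` be inserted before a Frame expansion that
/// follows `native_tail`?
fn needs_statement_terminator(native_tail: &str) -> bool {
    // Characters that mean "we are still inside an expression".
    // Only the ones named in the text; the real enumeration is longer.
    const CONTINUATIONS: &[char] = &['=', '+', '(', ','];

    // Walk the tail backward to the last non-whitespace character.
    match native_tail.chars().rev().find(|c| !c.is_whitespace()) {
        Some(last) => !CONTINUATIONS.contains(&last),
        // Assumption: nothing precedes the expansion, so nothing to terminate.
        None => false,
    }
}
```

A one-state FSM wrapping this single lookup would add a spec file, a generated file, and a glue file around what is effectively one table membership test.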
Per-target type mappers, name converters, dispatch helpers
Files: scattered across codegen/codegen_utils.rs and per-backend modules.
Today: small helpers (to_snake_case, pascal_case_variant,
frame_type_to_rust_type, java_map_type, rust_dispatch_convert,
etc.) — pure functions, often single-pass.
Should be: see Single-state Frame components below — these are the canonical “use Frame syntax for the function structure, even without transitions” candidates.
Single-state Frame components — DOGFOOD AS POLICY
The initial RFC proposed migrating ~30 inline helper functions
(to_snake_case, pascal_case_variant, per-target type
mappers, dispatch-convert helpers, etc.) into single-state
Frame systems — one @@system Foo with one state and one
handler per function. An earlier triage of this section said
“skip, cost-benefit not justified.” The owner has since
reversed that call:
“I don’t care about code bloat etc. I want to implement framec as much as possible in Frame to 1) test the language, 2) show as diverse a set of problems it CAN be applied to as possible. If any are truly bad fits — that is a good thing to document.”
The migration is now an explicit goal of this RFC, not a deferred backlog item. Each helper migrated:
- Tests Frame on a real-world function shape.
- Documents Frame’s reach through the inventory in this RFC.
- Surfaces ergonomic gaps worth recording (e.g., @@:(value) sets the return value but doesn’t return early; recursion via a single-state FSM has to route through the public glue function rather than calling “self”).
Code-size + per-call dispatch overhead are accepted costs. For framec internals these helpers are not the hot path: the compile-time cost of an FSM dispatch is negligible compared to codegen and IO, and the wall-clock per-fixture impact of Round 1 + Round 2 was too small to measure.
Round 1 (shipped)
Two name converters now live in
framec/src/frame_c/compiler/name/:
| Helper | Spec | Notes |
|---|---|---|
| to_snake_case(&str) | to_snake_case.frs | CamelCase → snake_case; loop + char branch |
| pascal_case_variant(&str) | pascal_case_variant.frs | snake_case → PascalCase; loop + boolean |
Both call sites now delegate to the FSM; existing public signatures preserved.
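The “loop + char branch” shape in the table can be pictured in plain Rust. This is a hypothetical stand-in for the handler body, not the shipped spec or its generated code; in particular, treating every ASCII uppercase letter as the start of a new word (so acronyms split letter by letter) is an assumption made for the sketch.

```rust
/// Hypothetical sketch of a CamelCase -> snake_case converter written in the
/// single-result style a one-handler Frame system needs: build `out`, then
/// hand it back once at the end.
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // Word boundary: separate from the previous word unless at the start.
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```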
Round 2 (shipped)
Per-target type mappers + RFC-0033 Rust dispatch helpers, plus a target-language predicate:
| Helper | Spec | Module |
|---|---|---|
| csharp_map_type(&str) | csharp_map_type.frs | compiler/type_map/ |
| java_map_type(&str) | java_map_type.frs | compiler/type_map/ |
| kotlin_map_type(&str) | kotlin_map_type.frs | compiler/type_map/ |
| go_map_type(&str) | go_map_type.frs | compiler/type_map/ |
| cpp_map_type(&str) | cpp_map_type.frs | compiler/type_map/ |
| swift_map_type(&str) | swift_map_type.frs | compiler/type_map/ (recursive) |
| frame_type_to_rust_type(&Type) | rust_map_type.frs | compiler/type_map/ |
| rust_dispatch_convert(&Type) | rust_dispatch_convert.frs | compiler/type_map/ |
| rust_owned_promotion(&str) | rust_owned_promotion.frs | compiler/type_map/ |
| is_dynamic_target(TargetLanguage) | is_dynamic_target.frs | compiler/target_query/ |
Frame ergonomic observations recorded during Round 2:
- @@:(value) sets the return, not “return early.” All four early-return-style helpers had to be restructured to compute a single result in let result = ... then call @@:(result) once at the end. The natural Rust idiom (if cond { return x; }) doesn’t translate verbatim.
- Recursion routes through the glue function. Inside the Swift mapper’s handler body, the recursive call has to be crate::frame_c::compiler::type_map::swift_map_type(base), not “call self on a different argument.” Frame’s single-state shape has no built-in self.dispatch_again(x) affordance.
- bool is not first-class in Frame interfaces yet. The is_dynamic_target predicate stringifies its answer ("true"/"false") and the glue function parses it back. Worth following up with a Frame-language proposal for bool/enum interface types.
- The TargetLanguage enum doesn’t round-trip cleanly through a Frame event param. The glue function flattens the enum to the canonical lowercased name string, and the FSM matches on the string. Same observation: Frame interfaces need first-class enum support to avoid string round-trips.
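The last two observations describe the same glue shape: typed values are flattened to strings on the way in and parsed back on the way out. A minimal sketch, assuming a four-variant TargetLanguage and a free function standing in for the generated FSM; the variant list, canonical_name, fsm_is_dynamic, and the choice of which targets count as dynamic are all illustrative, not framec's actual definitions.

```rust
/// Illustrative subset of targets; the real enum has more variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TargetLanguage {
    Python,
    JavaScript,
    Rust,
    Java,
}

/// Glue, inbound side: flatten the enum to a lowercased name string.
fn canonical_name(target: TargetLanguage) -> &'static str {
    match target {
        TargetLanguage::Python => "python",
        TargetLanguage::JavaScript => "javascript",
        TargetLanguage::Rust => "rust",
        TargetLanguage::Java => "java",
    }
}

/// Stand-in for the generated FSM: string in, stringified bool out.
/// The membership test here is an assumption made for the example.
fn fsm_is_dynamic(target_name: &str) -> String {
    let result = matches!(target_name, "python" | "javascript");
    result.to_string()
}

/// Glue, outbound side: parse the stringified answer back into a bool,
/// preserving the typed public signature.
fn is_dynamic_target(target: TargetLanguage) -> bool {
    fsm_is_dynamic(canonical_name(target)) == "true"
}
```

Callers only ever see is_dynamic_target(TargetLanguage) -> bool; the two string conversions are the cost that first-class bool and enum interface types would remove.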
Round 3 (shipped)
Erlang per-line native rewriter / classifier migrated to a
single Frame system in compiler/erlang_classifier/. The
classifier walks each line of spliced Erlang handler text and
tags it as one of seven ErlangRewrite variants
(ActionCall, ActionCallWithBind, InterfaceCall,
InterfaceCallWithBind, RecordUpdate, Plain, Reply).
| Helper | Spec | Module |
|---|---|---|
| erlang_rewrite_native_classified_full(line, actions, interfaces, data_var) | erlang_line_classifier.frs | compiler/erlang_classifier/ |
This was the round that set out to document where Frame doesn’t fit cleanly. Three concrete findings:
- “Multi-state Frame” is a misnomer for line classifiers. The natural shape is a single $Classifying state that transitions to one of 7 terminal states (one per variant) with the destination state’s $> entry handler emitting the encoded result. Frame semantics: classify() returns BEFORE $> fires on the destination state, so the entry handler can’t set the classify() return value. The result has to be computed in classify()’s body before transitioning, which makes the multi-state framing purely cosmetic. Round 3 ships as a single-state system with a branching body, the right shape for classifiers.
- Rich return types round-trip through pipe-delimited strings. ErlangRewrite has 7 variants, several with struct payloads (field+method+args, etc.). The FSM emits a tagged string (InterfaceCallWithBind|field=X|method=Y|args=Z) and the glue Rust function parses it back into the typed variant. This is a real ergonomic cost: Frame interfaces need first-class sum types / structs for this kind of classifier to be natural.
- Slice / Vec params have the same gap. action_names and interface_names are &[String] in the original Rust signature. They flatten to comma-separated strings at the FSM boundary and re-split inside the handler body. Same underlying gap: Frame interfaces need first-class slice params.
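The glue side of the pipe-delimited round-trip might look like the following. It is a sketch, not the shipped parser: only two of the seven variants are modeled, the Plain payload and the decode name are assumptions, and a payload that itself contains | is rejected rather than escaped.

```rust
/// Two of the seven variants, enough to show the decoding shape.
#[derive(Debug, PartialEq)]
enum ErlangRewrite {
    InterfaceCallWithBind { field: String, method: String, args: String },
    /// Assumed payload: the untouched line text.
    Plain(String),
}

/// Hypothetical glue: turn the FSM's tagged string back into a typed variant.
/// Returns None for an unknown tag or a malformed key=value part.
fn decode(encoded: &str) -> Option<ErlangRewrite> {
    let mut parts = encoded.split('|');
    let tag = parts.next()?;
    match tag {
        "InterfaceCallWithBind" => {
            let (mut field, mut method, mut args) = (None, None, None);
            for kv in parts {
                let (key, value) = kv.split_once('=')?;
                match key {
                    "field" => field = Some(value.to_string()),
                    "method" => method = Some(value.to_string()),
                    "args" => args = Some(value.to_string()),
                    _ => return None,
                }
            }
            Some(ErlangRewrite::InterfaceCallWithBind {
                field: field?,
                method: method?,
                args: args?,
            })
        }
        // Everything after the tag is the payload.
        "Plain" => Some(ErlangRewrite::Plain(parts.collect::<Vec<_>>().join("|"))),
        _ => None,
    }
}
```

Every variant with a payload needs an arm like the first one, which is the ergonomic cost the finding describes.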
framec body_closer bug surfaced (worth tracking
separately): the initial spec used a Rust labeled-break
construct ('classify: { ... break 'classify EXPR; ... }).
The body_closer (rust_lang.frs) treats the ' apostrophe as
the start of a char literal and miscounts braces, dumping the
framec module’s pub use statement in the middle of the user
handler body. Worked around in this round by using a nested
Rust fn classify_one(...) -> String { ... } instead. The
underlying framec bug should be fixed in rust_lang.frs.
Second framec finding — W415 false positive: Frame’s W415
warning (“return EXPR in event handler — return value is
lost”) fires on Rust early-returns inside nested fns/closures
inside the handler body. The lint is a lexical return
keyword search; it cannot distinguish “Frame handler
early-return” from “nested Rust scope early-return.” Round 3
hits this 6 times. The warnings are advisory and don’t block
compilation; a fix to make W415 scope-aware is a separate
follow-up.
Round 4 (shipped)
E113 section-order validator migrated to a genuinely
multi-state Frame system in compiler/section_order_validator/.
| Helper | Spec | Module |
|---|---|---|
| validate_section_order(kinds: &[SystemSectionKind]) -> Option<String> | section_order_validator.frs | compiler/section_order_validator/ |
This is the round where Frame’s natural shape (state machine with transitions) actually fits the problem cleanly. The validator has two semantic phases:
- $Walking: accept sections in canonical order, tracking the highest section index seen as a domain field.
- $OutOfOrder: terminal; further sections are absorbed (E113 is reported once per system per the existing contract).
The transition $Walking → $OutOfOrder fires on the first
violation. This is exactly the “error-absorbing terminal state”
pattern Frame state machines were designed for. The match is so
clean that the validator reads as a state machine on the page —
no awkward shoehorning needed.
Bonus dogfood moment: framec rejected the first version of
its own validator spec. The initial .frs placed domain:
between interface: and machine: (a natural authoring
order). framec’s parser rejected with E113: blocks out of
order. Expected: operations:, interface:, machine:, actions:,
domain: — i.e. framec validated the very Frame system that
implements the E113 check. The fix was to move the domain:
section to after machine: (canonical position). Pure
self-application; a satisfying RFC-0035 outcome.
The 6 unit tests cover: empty sequence, full canonical order,
partial canonical order, two specific out-of-order patterns,
and the “report once” contract (subsequent out-of-order
sections after the first one are absorbed by $OutOfOrder).
All existing E113 tests in frame_validator/tests continue to
pass against the FSM-backed implementation.
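The two-phase shape can be sketched in plain Rust as follows. This is an illustration of the pattern, not the generated code: the SectionOrderFsm type, the error wording, and the decision to accept a repeated section are assumptions; the canonical order is the one quoted in the E113 message above.

```rust
/// Sections in canonical order; the derived ordering follows declaration order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum SystemSectionKind {
    Operations,
    Interface,
    Machine,
    Actions,
    Domain,
}

enum State {
    Walking,
    OutOfOrder,
}

/// Hypothetical stand-in for the Frame system: `highest` and `error` play the
/// role of domain fields.
struct SectionOrderFsm {
    state: State,
    highest: Option<SystemSectionKind>,
    error: Option<String>,
}

impl SectionOrderFsm {
    fn new() -> Self {
        Self { state: State::Walking, highest: None, error: None }
    }

    /// One event per section encountered in the source.
    fn section(&mut self, kind: SystemSectionKind) {
        match self.state {
            State::Walking => match self.highest {
                Some(h) if kind < h => {
                    // First violation: record it once, then stop checking.
                    self.error = Some(format!("E113: {:?} appears after {:?}", kind, h));
                    self.state = State::OutOfOrder;
                }
                _ => self.highest = Some(kind),
            },
            // Terminal state: later sections are absorbed.
            State::OutOfOrder => {}
        }
    }
}

fn validate_section_order(kinds: &[SystemSectionKind]) -> Option<String> {
    let mut fsm = SectionOrderFsm::new();
    for &kind in kinds {
        fsm.section(kind);
    }
    fsm.error
}
```

The “report once” contract falls out of the terminal state: nothing after the first violation can overwrite the recorded error.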
Round 5 (shipped)
E413 HSM parent-chain cycle detector migrated to a genuinely
multi-state Frame system in compiler/hsm_cycle_validator/.
This is the first round where Frame is applied to a graph
algorithm.
| Helper | Spec | Module |
|---|---|---|
| validate_hsm_cycles(parents: &[(String, Option<String>)]) -> Vec<(String, String)> | hsm_cycle_walker.frs | compiler/hsm_cycle_validator/ |
Each state’s parent chain is walked by a fresh FSM instance with four states:
- $Initial: first step seeds start_name and visited, transitions to $Walking.
- $Walking: each step receives the next parent. Three outcomes: empty parent → $ChainRoot; parent already in visited → $CycleFound; otherwise record and self-loop.
- $CycleFound: terminal error state.
- $ChainRoot: terminal success state.
This is the canonical “graph walk as state machine” pattern: each
state in the FSM corresponds to a phase of the algorithm. The
visited set threads through as a domain field that evolves over
event calls — not a state-arg, because state-args are per-state
and visited needs to persist across the transition from
$Initial to $Walking.
Frame ergonomic observation: HashSet is not a Frame
interface type yet, so visited is a comma-separated string.
Lookup is O(n) instead of O(1). For HSM chains (typical
depth ≤10) this is fine, but on larger graphs the linear scan
would be measurable. A first-class Set interface type would
help, though for non-pathological HSM hierarchies the string CSV
is a clean-enough stand-in.
Bonus observation: the FSM-driven design separates concerns cleanly. The orchestrating Rust caller does:
    let mut walker = HsmCycleWalker::__create();
    let _ = walker.step(state_name.to_string()); // seed
    let mut current = first_parent;
    loop {
        let result = walker.step(current.unwrap_or_default());
        if result.starts_with("CYCLE|") { cycles.push(...); break; }
        if result == "ROOT" { break; }
        current = parent_map.get(...);
    }
The FSM owns the algorithmic state (visited set, current phase); the Rust caller owns the I/O (parent-map lookup, error pushing). This is a clearer separation than the original hand-coded version where both concerns interleaved in one function. Whether the clarity gain is worth the FSM scaffolding is the kind of judgment call RFC-0035 was set up to capture — for this case the answer is yes, but the heuristic is “graph algorithms with non-trivial visited-tracking benefit; pure linear scans probably don’t.”
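To show both halves of that separation in runnable form, here is a plain-Rust stand-in for the walker together with a driver in the shape of the caller above. It is a sketch, not the generated HsmCycleWalker: the constructor name, the "SEEDED" / "CONTINUE" / "TERMINAL" return strings, the payload after "CYCLE|", and the find_cycle driver are assumptions; only the "CYCLE|" prefix, "ROOT", and the four states come from the text. State names containing a comma would break the CSV visited set.

```rust
use std::collections::HashMap;

enum State {
    Initial,
    Walking,
    CycleFound,
    ChainRoot,
}

/// Stand-in for the Frame system; `start_name` and `visited` are its domain fields.
struct HsmCycleWalker {
    state: State,
    start_name: String,
    visited: String, // comma-separated, as in the shipped spec
}

impl HsmCycleWalker {
    fn new() -> Self {
        Self { state: State::Initial, start_name: String::new(), visited: String::new() }
    }

    fn step(&mut self, name: String) -> String {
        match self.state {
            State::Initial => {
                self.start_name = name.clone();
                self.visited = name;
                self.state = State::Walking;
                "SEEDED".to_string()
            }
            State::Walking => {
                if name.is_empty() {
                    self.state = State::ChainRoot;
                    "ROOT".to_string()
                } else if self.visited.split(',').any(|v| v == name.as_str()) {
                    self.state = State::CycleFound;
                    format!("CYCLE|{}|{}", self.start_name, name)
                } else {
                    // Record and self-loop.
                    self.visited.push(',');
                    self.visited.push_str(&name);
                    "CONTINUE".to_string()
                }
            }
            State::CycleFound | State::ChainRoot => "TERMINAL".to_string(),
        }
    }
}

/// Driver: owns the parent-map lookup, leaves all algorithmic state to the walker.
fn find_cycle(start: &str, parents: &HashMap<String, Option<String>>) -> Option<String> {
    let mut walker = HsmCycleWalker::new();
    let _ = walker.step(start.to_string()); // seed
    let mut current = parents.get(start).cloned().flatten();
    loop {
        let result = walker.step(current.clone().unwrap_or_default());
        if result.starts_with("CYCLE|") {
            return Some(result);
        }
        if result == "ROOT" {
            return None;
        }
        current = current.and_then(|p| parents.get(&p).cloned().flatten());
    }
}
```

The loop always terminates: each non-terminal step adds a new name to visited, so a finite parent map ends in either ROOT or CYCLE.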
Round 6 (shipped)
W414 reachable-states BFS migrated to a 3-state Frame system
in compiler/reachable_validator/. This is the second graph
algorithm in the arc; where Round 5 used one FSM instance per
chain walk, Round 6 uses ONE FSM instance for the entire BFS
over the state-machine graph, with both visited and queue
threaded through domain fields that evolve across many events.
| Helper | Spec | Module |
|---|---|---|
| validate_reachable_states(start, edges, all_states) -> Vec<String> | reachable_walker.frs | compiler/reachable_validator/ |
Three states with real transitions:
- $Initial: first call must be seed(start), which seeds queue and visited and transitions to $Walking.
- $Walking: enqueue(name) adds neighbors (de-duped against visited); next() pops the queue head or transitions to $Done when the queue empties; unreachable(all_csv) works as a read-only diff.
- $Done: terminal. next() returns "DONE" idempotently; unreachable() works the same as in $Walking; enqueue() is absorbed (defensive: a caller that loops past DONE shouldn’t crash).
The orchestrating Rust caller (frame_validator/machine.rs)
builds the edge map from the AST (transition targets +
ancestor chain, with pop$ pre-filtered) and drives the walk:
    walker.seed(start);
    loop {
        let head = walker.next();
        if head == "DONE" { break; }
        for n in edges[head] { walker.enqueue(n); }
    }
    walker.unreachable(all_states_csv)
The FSM owns BFS state; the caller owns AST extraction. Same separation-of-concerns benefit Round 5 surfaced.
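A runnable approximation of this split, with a plain-Rust struct standing in for the generated walker. Assumptions are loud here: the real domain fields are not shown in this RFC, so visited and queue are modeled as a Vec and a VecDeque; unreachable takes a slice rather than a CSV string; the edge map type is invented; and a state literally named "DONE" would collide with the sentinel.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(PartialEq)]
enum State {
    Initial,
    Walking,
    Done,
}

/// Stand-in for the single FSM instance that owns the whole BFS.
struct ReachableWalker {
    state: State,
    visited: Vec<String>,
    queue: VecDeque<String>,
}

impl ReachableWalker {
    fn new() -> Self {
        Self { state: State::Initial, visited: Vec::new(), queue: VecDeque::new() }
    }

    fn seed(&mut self, start: &str) {
        if self.state == State::Initial {
            self.visited.push(start.to_string());
            self.queue.push_back(start.to_string());
            self.state = State::Walking;
        }
    }

    /// Each early exit is an explicit `return`; setting a result alone would
    /// fall through and enqueue the name anyway.
    fn enqueue(&mut self, name: &str) -> &'static str {
        if self.state != State::Walking || name.is_empty() {
            return "SKIP";
        }
        if self.visited.iter().any(|v| v.as_str() == name) {
            return "SKIP";
        }
        self.visited.push(name.to_string());
        self.queue.push_back(name.to_string());
        "ENQUEUED"
    }

    fn next(&mut self) -> String {
        if self.state == State::Walking {
            if let Some(head) = self.queue.pop_front() {
                return head;
            }
            self.state = State::Done;
        }
        "DONE".to_string()
    }

    /// Read-only diff: every known state that the walk never visited.
    fn unreachable(&self, all_states: &[&str]) -> Vec<String> {
        all_states
            .iter()
            .copied()
            .filter(|s| !self.visited.iter().any(|v| v.as_str() == *s))
            .map(|s| s.to_string())
            .collect()
    }
}

/// Driver in the shape of the caller above: it owns the edge lookup only.
fn validate_reachable_states(
    start: &str,
    edges: &HashMap<String, Vec<String>>,
    all_states: &[&str],
) -> Vec<String> {
    let mut walker = ReachableWalker::new();
    walker.seed(start);
    loop {
        let head = walker.next();
        if head == "DONE" {
            break;
        }
        if let Some(neighbors) = edges.get(&head) {
            for n in neighbors {
                walker.enqueue(n);
            }
        }
    }
    walker.unreachable(all_states)
}
```

Because enqueue de-dupes against visited, a cyclic graph drains the queue and reaches $Done instead of looping.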
Frame ergonomic finding (new): @@:(value) sets the return
value but does NOT exit the handler. Without a transition, the
body keeps running past the @@: call. The Round 6 enqueue()
body originally read:
    if name.is_empty() {
        @@:("SKIP".to_string())
    }
    if already_visited {
        @@:("SKIP".to_string())
    }
    // add to visited + queue
    @@:("ENQUEUED".to_string())
This had the BFS append the name to visited + queue regardless
of the early-set returns, producing an infinite loop on cyclic
input. The fix was an explicit Frame return after each
@@:(...):
    if name.is_empty() {
        @@:("SKIP".to_string())
        return
    }
The same shape is documented in W415’s warning message
(“return to exit”), but it’s a recurring papercut: Round 3
worked around with nested-fn returns, Round 5 stayed safe
because every conditional branch had a transition (which DOES
emit return; in the generated Rust), Round 4 has a latent
fall-through that happens to be harmless. A first-class
“return value and exit” Frame syntax (e.g. @@!:(value) as a
combined set+exit) would prevent this class of bug.
Round 7 (shipped)
Pipeline supervisor — the framec compile pipeline expressed as
a 5-state Frame system in compiler/pipeline_supervisor/.
This is Frame at the META level: a state machine that
describes the compiler’s own pipeline.
| Helper | Spec | Module |
|---|---|---|
| PipelineSupervisor::new() | pipeline_supervisor.frs | compiler/pipeline_supervisor/ |
Five states:
- $Idle: initial; no phase has begun.
- $Running: actively executing a phase; non-fatal errors collected, pipeline continues.
- $Aborted: fatal error (e.g. segmentation failure); the pipeline cannot continue. Terminal.
- $Failed: pipeline completed but non-fatal errors were collected. Terminal.
- $Done: clean exit, zero errors. Terminal.
As shipped in R7 the supervisor was observational — it did
not drive the orchestrator’s control flow; pipeline/compiler.rs
stayed native Rust and called into the supervisor at each phase
boundary (begin_phase("segment"), abort("E001", ...), etc.),
so wrong supervisor logic only affected --debug output, not
compilation correctness.
Superseded by Round 8 (below). The observer shape was a
half-measure — the worst middle. Round 8 makes the FSM actually
drive compile_ast_based. The “multi-day arc / high regression
risk” estimate quoted here turned out to be wrong: the conversion
landed in a single session, byte-identical across all snapshot
suites, by extracting the phases into functions and letting the
FSM sequence them (the phase bodies stay native — Frame owns the
control structure).
Bonus self-describing artifact: framec renders its own pipeline diagram.
    $ framec compile -l graphviz \
        framec/src/frame_c/compiler/pipeline_supervisor/pipeline_supervisor.frs \
        > docs/pipeline_supervisor.gv
    $ dot -Tsvg docs/pipeline_supervisor.gv > docs/pipeline_supervisor.svg
The .frs source IS the pipeline spec; the SVG is the
documentation. Changing the pipeline’s phase model means
editing the .frs and regenerating both the supervisor
implementation AND the diagram. Single source of truth.
Frame architectural finding (new): Observer FSMs are a viable Round-7 shape when the underlying control flow is too intertwined to rewrite. The pattern:
- Define the high-level state machine in Frame.
- Have the native-Rust controller call into the FSM at boundary events.
- The FSM tracks state, accumulates errors, produces summary output.
This unblocks Frame-as-meta-compiler for ANY system whose controller is already in a complex form (validators, codegen pipelines, optimization passes). The FSM is the model; the existing code is the implementation. Migration is additive — no rewrite required.
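The observer pattern above, reduced to a plain-Rust sketch with the five Round 7 states. It stands in for the generated supervisor and is not its actual interface: begin_phase and abort are named in this RFC, while error, finish, is_live, the field layout, and the message format are assumptions, and the E113 code in the usage note is only an example value.

```rust
#[derive(Debug, PartialEq)]
enum State {
    Idle,
    Running,
    Aborted,
    Failed,
    Done,
}

/// Observer: tracks pipeline state and collects errors, but never decides
/// what the native controller does next.
struct PipelineSupervisor {
    state: State,
    phase: String,
    errors: Vec<String>,
}

impl PipelineSupervisor {
    fn new() -> Self {
        Self { state: State::Idle, phase: String::new(), errors: Vec::new() }
    }

    /// True while the pipeline has not reached a terminal state.
    fn is_live(&self) -> bool {
        matches!(self.state, State::Idle | State::Running)
    }

    fn begin_phase(&mut self, name: &str) {
        if self.is_live() {
            self.state = State::Running;
            self.phase = name.to_string();
        }
    }

    /// Non-fatal: recorded, pipeline continues (hypothetical method).
    fn error(&mut self, code: &str, msg: &str) {
        if self.state == State::Running {
            self.errors.push(format!("{} [{}]: {}", code, self.phase, msg));
        }
    }

    /// Fatal: recorded, and every later event is absorbed.
    fn abort(&mut self, code: &str, msg: &str) {
        if self.is_live() {
            self.errors.push(format!("{} [{}]: {}", code, self.phase, msg));
            self.state = State::Aborted;
        }
    }

    /// End of pipeline (hypothetical method): pick the terminal state.
    fn finish(&mut self) {
        if self.state == State::Running {
            self.state = if self.errors.is_empty() { State::Done } else { State::Failed };
        }
    }
}
```

A controller would call begin_phase("segment") at each boundary, error("E113", ...) or abort("E001", ...) as problems arise, and finish() at the end; because terminal states absorb later calls, a controller that keeps calling after an abort cannot corrupt the summary.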
Future rounds (open)
The migration backlog has covered: name converters (R1), type mappers + dispatch helpers (R2), line classifiers (R3), stateful validators with terminal error states (R4), graph algorithms in two flavors (R5 per-chain FSM, R6 whole-walk FSM), and meta-compiler observer FSMs (R7). The RFC has its “diverse problems Frame CAN be applied to” inventory.
A 2026-05-25 whole-codebase audit (against the “FSM everywhere we can — including single-state systems” mandate) found that the inventory is extensive but not complete, and that one shipped FSM is decorative. These are the immediate roadmap — concrete native state machines to convert, and one fake to make real, each parity-gated against the matrix + fuzz:
- Round 8: make the pipeline supervisor drive. ✅ SHIPPED (2026-05-25). compile_ast_based (~1000 lines of intertwined linear logic) was carved into six phase functions (do_segment / do_parse / do_module_gates / do_graphviz / do_validate_codegen / do_assemble) over a shared owned PipelineCtx. The old observer PipelineSupervisor was replaced by PipelineFsm (compiler/pipeline_supervisor/): one state per phase ($Idle → $Segment → $Parse → $ModuleGates → $Graphviz → $ValidateCodegen → $Assemble → $Done), each state’s $> enter handler running its do_* phase and transitioning either to the next phase or, on early exit, stashing the CompileResult in early and jumping to $Done. The machine self-drives from a single run() call (RFC-0039 B1 “backbone owns the state”: PipelineCtx + early are domain fields). compile_ast_based is now a three-line delegation to run_pipeline. The transition graph is the pipeline control flow, and framec compile -l graphviz on the .frs renders the real pipeline. Byte-identical output, full suite + parity green.
- Round 9: domain.rs outer scanner. ✅ SHIPPED (2026-05-25). The domain: section line-walk is now DomainScannerFsm (compiler/domain_scanner/): $Start → $Scan (self-loops one physical line per pass) → $Done, owning bytes/pos/vars/pending buffers and delegating @@[…] + default-expression sub-scans to the existing scan_attribute / ExprScannerFsm. parse_domain is a thin caller. Byte-identical; gate green.
- Round 10: assembler/mod.rs call-site scanner. ✅ SHIPPED (2026-05-25). expand_system_instantiations is now a lexer: CallSiteScannerFsm (compiler/call_site_scanner/) walks a native region into a CallToken stream (Literal verbatim / Call per @@[!]Name(args)), delegating comment/string/balanced-paren detection to the language SyntaxSkipper. The assembler expands each Call via expand_one (it holds the borrowed system-params maps, which never enter the FSM). Byte-identical; gate green.
- Round 11: native_region_scanner/unified/metadata.rs. ✅ SHIPPED (2026-05-25), scoped honestly. On inspection extract_segment_metadata is a flat match kind dispatch, not a scanner, so a top-level FSM would be decorative (the R8 lesson). Its one genuinely parser-shaped arm, the transition-string grammar (exit)? -> (=>)? (enter)? ($State(args)? | pop$) "label"?, became a real multi-state grammar FSM: TransitionMetaScannerFsm (compiler/transition_meta_scanner/) walks one state per element ($Target → $ExitArgs → $EnterArgs → $StateArgs → $LabelForward → $Done), each extracting its piece. The Transition arm is now a one-line delegation; the rest of metadata.rs stays the dispatch it honestly is (369 → 238 LOC). Byte-identical; gate green.
- Round 12: Erlang scanners. ✅ SHIPPED (2026-05-25), scoped honestly. lexical.rs is mostly stateless string utilities (replace_word, erlang_op_name, split_top_level_commas, …), not scanners. Its one genuine state machine, and the most state-machine-like thing here, is paren_balance_unclosed: a lexical-mode scanner (string / quoted-atom / escape) counting brackets, now ParenBalanceFsm (compiler/paren_balance_scanner/) whose STATES are the lexical modes ($Normal / $InString / $InAtom / $StringEscape / $AtomEscape). body_processor.rs (1639 LOC) is a line-transform pipeline with a dozen tiny intertwined local scanners, not a clean single FSM; forcing it would be decorative + R8-scale, so it is assessed, not converted (the R8/R11 lesson). Byte-identical; gate green.
- Round 13 (judgement call): the pipeline_parser parse_* oracle methods. ⊘ EVALUATED, NOT CONVERTED (2026-05-25). The judgement: there is no non-ceremonial standalone conversion to make here; the genuine parser-as-FSM work is RFC-0039’s (now merged onto this line; see below). Reasoning:
  - The parse_* methods are LL(1) recursive-descent dispatch (match token → build node → recurse). Their genuinely-stateful byte-level scanning already delegates to leaf FSMs (ExprScannerFsm, AttributeScannerFsm, and the R9–R12 domain / call-site / transition-meta / paren-balance scanners). What remains is token dispatch + mutual recursion.
  - The one genuinely scanner-shaped method, parse_body_block (a token-dispatch loop driving the lexer), is exactly the RFC-0039 “parser-as-Frame backbone” ($Machine → $StateHeader / $StateBody), now merged onto this line, where the SystemBackbone Frame system already drives the state-body loop. That conversion is therefore done, by RFC-0039, not a fresh R13 task.
  - Turning the recursive descent into an explicit push/pop PDA is RFC-0039’s “(B) backbone-as-Frame-system” half, which RFC-0039 itself evaluated as a self-hosting proof, not a clarity or correctness win for framec’s flat, shallow, LL(1) grammar: a deliberate multi-day arc, not a quick round.

  Net: the genuine “parser as composed FSMs” lives in RFC-0039; there is no non-ceremonial standalone parse_* conversion left for RFC-0035 to make. (Refusing to pad the round count with a degenerate FSM is the R8/R11/R12 lesson applied.)
Lower-value / explicitly-deferred candidates (record, don’t rush):
codegen backend emit dispatch (ceremony, not a real demo);
per-target attribute scanners (only if a new attribute family
lands).
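The lexical-mode idea behind the Round 12 paren-balance scanner can be sketched in plain Rust. This is an illustration only, not the code framec generates: the mode names mirror the state names quoted above, while the function name `unclosed_brackets`, its signature, and the choice to count `(`, `[` and `{` together are assumptions. Erlang `$c` character literals are deliberately not handled.

```rust
/// Lexical modes, mirroring the state names quoted in the Round 12 entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Normal,
    InString,
    InAtom,
    StringEscape,
    AtomEscape,
}

/// Returns how many brackets are still open at the end of `src`, ignoring
/// brackets inside "strings" and 'quoted atoms'.
/// Hypothetical helper: the real `paren_balance_unclosed` signature is not
/// shown in this RFC.
fn unclosed_brackets(src: &str) -> i32 {
    let mut mode = Mode::Normal;
    let mut depth: i32 = 0;
    for b in src.bytes() {
        // Each (mode, byte) pair names exactly one edge of the machine.
        mode = match (mode, b) {
            (Mode::Normal, b'"') => Mode::InString,
            (Mode::Normal, b'\'') => Mode::InAtom,
            (Mode::Normal, b'(' | b'[' | b'{') => {
                depth += 1;
                Mode::Normal
            }
            (Mode::Normal, b')' | b']' | b'}') => {
                depth -= 1;
                Mode::Normal
            }
            (Mode::Normal, _) => Mode::Normal,
            (Mode::InString, b'\\') => Mode::StringEscape,
            (Mode::InString, b'"') => Mode::Normal,
            (Mode::InString, _) => Mode::InString,
            (Mode::InAtom, b'\\') => Mode::AtomEscape,
            (Mode::InAtom, b'\'') => Mode::Normal,
            (Mode::InAtom, _) => Mode::InAtom,
            // An escape consumes exactly one byte, then returns to its mode.
            (Mode::StringEscape, _) => Mode::InString,
            (Mode::AtomEscape, _) => Mode::InAtom,
        };
    }
    depth
}
```

Because every edge is one `match` arm, a missing case (for example, an escaped quote inside an atom) shows up as a missing arm rather than as a special case buried in a byte loop.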
Process for adding a new FSM
When migrating a hand-coded scanner / parser / classifier to a Frame FSM:
- Read the closest existing .frs as a template. rust_lang.frs is the model for “scan bytes, recognize lexical shapes, output a position or error.” attribute_scanner.frs is the model for “scan a structured construct.”
- Write the new .frs next to the hand-coded module it replaces. Include a header comment explaining what it does and what the per-state transition graph means.
- Generate the .gen.rs via framec compile -l rust -o <dir>/ <spec>.frs, then rename the output file to <name>.gen.rs.
- Write the glue .rs that include!s the .gen.rs and implements the trait the hand-coded module used to satisfy. Same shape as body_closer/rust.rs etc.
- Delete the hand-coded module. The build catches any missed references; nothing should silently retain the old implementation.
- Run cargo test --workspace + the matrix Rust target. Both must stay green.
- Document the regen in the glue file’s module comment so future contributors can rebuild after editing the spec.
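The generate-and-glue steps above might look roughly like the sketch below. Every name in it is hypothetical (`my_scanner.gen.rs`, `MyScannerFsm`, the `Skipper` trait and its `skip` signature); the real traits and generated types in framec's tree are not shown in this RFC. So that the sketch compiles on its own, an inline `generated` module stands in for the `include!` of the generated file.

```rust
// Regen (per the steps above): framec compile -l rust -o <dir>/ <spec>.frs,
// then rename the output file to <name>.gen.rs.

// Stand-in for the generated file. In a real glue module this body would be
// `include!("my_scanner.gen.rs");` (hypothetical file name).
mod generated {
    /// Hypothetical generated FSM: recognizes a `//` line comment.
    pub struct MyScannerFsm;

    impl MyScannerFsm {
        pub fn new() -> Self {
            MyScannerFsm
        }

        /// Returns the index just past the comment text (the newline or the
        /// end of input), or None if no comment starts at `start`.
        pub fn run(&mut self, bytes: &[u8], start: usize) -> Option<usize> {
            let rest = bytes.get(start..)?;
            if !rest.starts_with(b"//") {
                return None;
            }
            let len = rest.iter().position(|&b| b == b'\n').unwrap_or(rest.len());
            Some(start + len)
        }
    }
}

/// Hypothetical trait that the hand-coded module used to implement.
pub trait Skipper {
    fn skip(&self, bytes: &[u8], start: usize) -> Option<usize>;
}

/// The glue type: same public surface as before, FSM underneath.
pub struct MyScanner;

impl Skipper for MyScanner {
    fn skip(&self, bytes: &[u8], start: usize) -> Option<usize> {
        // One-line delegation; all lexical rules live in the spec.
        generated::MyScannerFsm::new().run(bytes, start)
    }
}
```

Keeping the trait implementation to a single delegation is what lets the hand-coded module be deleted outright: call sites only ever see the trait.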
Migration outcome
The backlog is being worked round by round, with each round shipping a coherent slice and recording the Frame ergonomic observations it surfaces. Current state:
| Migration target | Status |
|---|---|
| Java handler-body native rewriter | Done. java_await_rewrite.frs |
| Name converters (to_snake_case, pascal_case_variant) | Done, Round 1. compiler/name/ |
| Per-target type mappers + Rust dispatch helpers (10 fns) | Done, Round 2. compiler/type_map/ + compiler/target_query/ |
| Erlang per-line native rewriter (line classifier) | Done, Round 3. compiler/erlang_classifier/ |
| E113 section-order validator (multi-state) | Done, Round 4. compiler/section_order_validator/ |
| E413 HSM cycle detector (multi-state graph walker) | Done, Round 5. compiler/hsm_cycle_validator/ |
| W414 reachable-states BFS (whole-walk multi-state FSM) | Done, Round 6. compiler/reachable_validator/ |
| Pipeline supervisor (meta-compiler driver FSM) | Driving, Round 8 (shipped 2026-05-25). compiler/pipeline_supervisor/ (PipelineFsm). One state per compile phase; the enter-handler chain runs each do_* phase and transitions. compile_ast_based delegates to run_pipeline; the transition graph IS the control flow. (R7 shipped this observational; R8 closed that half-measure.) |
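| domain.rs outer line scanner | Open, Round 9. Native byte-walk; only sub-parts FSM’d. |
| assembler/mod.rs call-site scanner | Open, Round 10. |
| unified/metadata.rs scanner | Shipped, Round 11 (2026-05-25), scoped. Only the transition-string arm became an FSM: compiler/transition_meta_scanner/ (TransitionMetaScannerFsm). |
| Erlang lexical.rs + body_processor.rs scanners | Shipped, Round 12 (2026-05-25), scoped. paren_balance_unclosed became ParenBalanceFsm (compiler/paren_balance_scanner/); body_processor.rs assessed, not converted. |
| pipeline_parser parse_* oracle methods | Evaluated, not converted, Round 13 (2026-05-25, judgement call). The parser-as-FSM work lives in RFC-0039. |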
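The pipeline-supervisor row describes one state per compile phase, with each state's enter handler doing the phase's work and then transitioning. A minimal plain-Rust sketch of that shape follows. The phase names, the `Pipeline` struct, the `run_phases` function, and the empty-input failure rule are all assumptions made for illustration; the RFC does not list PipelineFsm's actual states.

```rust
/// Hypothetical phase names; the RFC does not list PipelineFsm's real states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Phase {
    Parse,
    Validate,
    Codegen,
    Done,
    Failed,
}

struct Pipeline {
    phase: Phase,
    /// Records which phases ran, so the control flow is observable.
    trace: Vec<&'static str>,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { phase: Phase::Parse, trace: Vec::new() }
    }

    /// Plays the role of an enter handler: do the phase's work, then name
    /// the state to transition to.
    fn enter(&mut self, source: &str) -> Phase {
        match self.phase {
            Phase::Parse => {
                self.trace.push("parse");
                // Stand-in failure rule: blank input cannot be parsed.
                if source.trim().is_empty() { Phase::Failed } else { Phase::Validate }
            }
            Phase::Validate => {
                self.trace.push("validate");
                Phase::Codegen
            }
            Phase::Codegen => {
                self.trace.push("codegen");
                Phase::Done
            }
            Phase::Done | Phase::Failed => self.phase,
        }
    }
}

/// Drives the machine until it reaches a terminal state.
fn run_phases(source: &str) -> (Phase, Vec<&'static str>) {
    let mut p = Pipeline::new();
    while !matches!(p.phase, Phase::Done | Phase::Failed) {
        let next = p.enter(source);
        p.phase = next;
    }
    (p.phase, p.trace)
}
```

The driver loop contains no phase-specific logic; which phase follows which is decided entirely by the transitions, which is the sense in which the transition graph is the control flow.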
The first migration (Java await rewriter) closed an immediate
whack-a-mole risk: a 90-LOC hand-coded byte walker was
replaced with a Frame spec modeled on
body_closer/rust_lang.frs. Round 1 and Round 2 added
12 single-state systems (Frame functioning as a
function-definition wrapper around native bodies) and surfaced the
ergonomic observations documented above (return-value vs.
return-early semantics, recursion routing, missing bool/enum
interface types).
The RFC’s lasting value is the inventory + the process documentation: a future contributor whose change touches a scanner-like subsystem can read this RFC and decide quickly whether the FSM approach fits. The whack-a-mole risk that motivated the RFC (#24 → #25 → #26) only applies to genuine byte-walking scanners; the inventory makes that distinction explicit.
Examples
Migrating a hand-coded scanner to an FSM — the #26 path
The fix for FRAMEC_BUGS #26 already demonstrates the process.
Before: framec/src/frame_c/compiler/native_region_scanner/frame_structural.rs
contained a 150-line hand-coded skipper. After:
```
body_closer/frame_structural.frs                        ← the state-machine spec
body_closer/frame_structural.gen.rs                     ← generated by framec
body_closer/frame_structural.rs                         ← 30-line glue file
native_region_scanner/frame_structural_skipper.frs
native_region_scanner/frame_structural_skipper.gen.rs
native_region_scanner/frame_structural.rs               ← glue
```
The before-and-after LOC ratio is roughly 1:1, but the after version is declarative and dogfoods framec’s own compiler. Four prior bugs (#24, #25-A, #25-B, #26) closed simultaneously.
Single-state Frame for a name converter
name/to_snake_case.frs:
```
@@system ToSnakeCase {
    interface:
        convert(s: String): String
    machine:
        $Active {
            convert(s: String): String {
                @@:({
                    let mut out = String::new();
                    let mut prev_lower = false;
                    for c in s.chars() {
                        if c.is_ascii_uppercase() {
                            if prev_lower { out.push('_'); }
                            out.push(c.to_ascii_lowercase());
                            prev_lower = false;
                        } else {
                            out.push(c);
                            prev_lower = c.is_ascii_lowercase();
                        }
                    }
                    out
                })
            }
        }
}
```
framec compiles this to a single Rust struct + impl in its own
source tree. The previously-inline to_snake_case function in
codegen_utils.rs becomes a thin wrapper:
```rust
pub fn to_snake_case(s: &str) -> String {
    ToSnakeCase::create().convert(s.to_string())
}
```
Same call sites continue to work. The function body now lives
in a .frs file alongside its peers, and any codegen regression
that affects how this gets emitted breaks framec’s build
immediately.
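To see what the native body in the spec computes, the same loop can be lifted into a standalone function and exercised directly. This is a sketch for checking behavior only: it copies the loop from the spec above, and the free-function form is an assumption, not how framec wires it.

```rust
/// Plain-function copy of the native body from the ToSnakeCase spec.
fn to_snake_case(s: &str) -> String {
    let mut out = String::new();
    let mut prev_lower = false;
    for c in s.chars() {
        if c.is_ascii_uppercase() {
            // Insert a separator only at a lowercase-to-uppercase boundary.
            if prev_lower {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
            prev_lower = false;
        } else {
            out.push(c);
            // Digits and underscores reset the boundary flag.
            prev_lower = c.is_ascii_lowercase();
        }
    }
    out
}
```

With this body, `fooBar` becomes `foo_bar`, while an acronym run such as `HTTPServer` becomes `httpserver`, because a separator is emitted only when the previous character was lowercase. Pinning cases like these in a snapshot test is one way to confirm that a migrated spec reproduces its predecessor's output.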
Alternatives
Migrate the top-level lexer and parser to Frame FSMs
Considered: write pipeline_parser.frs and replace the
hand-written parser. Rejected for this RFC because the top-level
parser handles all of Frame’s grammar (every keyword, every
operator, every section header) and would be a substantial
multi-week effort. Worth pursuing separately if and when the
parser’s complexity creates its own whack-a-mole problems. The
lexer and parser are stable enough today that the cost-benefit
doesn’t justify the work right now.
Migrate the validator to Frame FSMs
Considered: each validate_X function becomes a single-state
Frame system. Rejected as a one-shot bulk migration; instead, new validators
that match the FSM shape (multi-state walking the AST) should
be written as FSMs. The existing flat-function validators stay
flat — they’re simple enough that the migration adds noise
without payoff.
Single big “framec utilities” Frame program
Considered: one mega-system containing every helper as an interface method. Rejected — Frame systems are state machines, and a state machine with one state and 30 interface methods is just a class. Splitting into one system per function is more honest about the shape, makes each spec independently editable, and matches the existing per-target-skipper layout.
Skip the inventory; just migrate as bugs surface
Considered: do nothing now, migrate each hand-coded module the next time it acquires a bug. Rejected because the FRAMEC_BUGS #24-#26 cycle is exactly the cost: each round looked like a small fix but cumulatively was four iterations of debugging the same class of bug. The inventory exists so a future contributor can spot the pattern and reach for the FSM template before adding the third hand-coded special case.
Migration
Source-additive. New .frs specs join the existing 39. The
generated .gen.rs files join the existing per-target gen
files. Hand-coded modules are deleted as their replacements
ship — no parallel implementations, no transitional period.
Each migration is independently mergeable. The RFC commits to the inventory and the process, not to a deadline for any specific item.
References
- Frame language reference
- Glossary
- CHANGELOG.md
- RFC-0027: In-tree snapshot tests (the regression net that catches whether a migrated .frs reproduces its hand-coded predecessor’s output).
- RFC-0033: Idiomatic Rust output. Introduced java_native_rewrite.rs, the largest current migration candidate.
- RFC-0034: In-process compile checks. Verifies the generated output of every .frs (and every other fixture) parses in its target language.
- FRAMEC_BUGS.md #24 / #25 / #26: the bug arc that motivated this RFC. The structural skipper was migrated to an FSM as part of #26’s fix; future hand-coded scanners should follow the same path before they accumulate similar bugs.