RFC-0044: Kernel context-stack must clean up on exception
- Status: Partially implemented. The C++ row is implemented (2026-06-13, issue #86): the interface dispatch wrapper now uses a method-local RAII scope-guard (struct __CtxGuard { decltype(_context_stack)& s; ~__CtxGuard(){ s.pop_back(); } }) instead of try/catch(...) { pop; throw; }. This both keeps the cleanup invariant on every exit path AND makes C++ output compile under -fno-exceptions (the Godot-web requirement that motivated #86). The remaining per-backend rows below (including Swift’s defer, deferred to avoid entangling #86 with a pre-existing async-stdout test flake) are still Phase-2 work. Original status: Draft, surfaced by RFC-0043 fixture-expansion research (2026-06-01).
- Author: Mark Truluck mark.truluck@cogiton.com
- Created: 2026-06-01
- Builds on: RFC-0020 (runtime kernel), RFC-0043 (async layered architecture)
- Relates to: the framec interface-dispatch lowering in
framec/src/frame_c/compiler/codegen/interface_gen.rs
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as described in RFC 2119.
Summary
Every framec interface method on a system today emits the context-stack push/kernel-call/pop sequence without exception cleanup:
self._context_stack.append(__ctx)
self.__kernel(__e) # or: await self.__kernel(__e)
return self._context_stack.pop()._return
If __kernel raises (a handler body throws an unhandled exception, the
async machinery sees CancelledError, etc.), the pop never runs. The
context stack accumulates a stale entry. Subsequent dispatches read the
wrong _return slot, see a phantom _transitioned flag from a previous
turn, or, under sustained failure, grow the stack without bound.
This RFC requires every backend’s interface dispatch to wrap the push
through the pop in language-idiomatic try/finally (or equivalent
RAII / catch+rethrow / scope-exit construct), so the context-stack
invariant len(stack_after) == len(stack_before) holds whether the
dispatch returns normally OR raises.
The same defect applies to the init() lifecycle method (which fires
the start $> event) — already shown to use the same shape in
async_wrap.rs::emit_init_method.
Motivation
The context stack is the kernel’s per-dispatch state record: each
interface call pushes a fresh FrameContext carrying the event, the
return slot, the _transitioned flag, the per-call _data map, and
any internal state the kernel needs for the duration of one event’s
trip through the state machine. Handler code reads
_context_stack[-1] to access these fields — @@:(value) sets
_context_stack[-1]._return, transition machinery sets
_context_stack[-1]._transitioned, and so on.
The invariant the kernel relies on is “every push is paired with a pop in the same interface call.” Today that’s enforced by hand in the straight-line happy path. The moment an exception escapes the kernel chain, the invariant is silently broken.
Concrete failure mode
class _Repo:
    async def fetch(self, key):
        __e = FrameEvent("fetch", [key])
        __ctx = FrameContext(__e, None)
        self._context_stack.append(__ctx)          # +1, stack depth = 1
        await self.__kernel(__e)                   # <-- raises ValueError
        return self._context_stack.pop()._return   # NEVER REACHED

# Caller:
try:
    await repo.fetch("a")
except ValueError:
    pass
# Now the stack carries a stale entry from the failed call.
status = await repo.get_status()
# This pushes context #2 on top of the stale #1. The kernel reads
# _context_stack[-1], which is fine THIS turn.
# But _context_stack[-1]._transitioned may still hold True from #1's
# partial run, mis-routing transition logic.
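The stale entry can be reproduced without the Frame runtime. The following is a minimal synchronous sketch; Repo, FrameContext, _kernel and _dispatch are simplified stand-ins written for illustration and are not the emitted code:

```python
class FrameContext:
    """Simplified stand-in for the per-dispatch record."""
    def __init__(self, event):
        self.event = event
        self._return = None
        self._transitioned = False


class Repo:
    """Toy system using the current (unguarded) dispatch shape."""
    def __init__(self):
        self._context_stack = []

    def _kernel(self, event):
        # Stand-in kernel: "fetch" fails, every other event succeeds.
        if event == "fetch":
            raise ValueError("handler failed")
        self._context_stack[-1]._return = "ok"

    def _dispatch(self, event):
        self._context_stack.append(FrameContext(event))
        self._kernel(event)
        return self._context_stack.pop()._return  # skipped when _kernel raises

    def fetch(self):
        return self._dispatch("fetch")

    def get_status(self):
        return self._dispatch("get_status")


repo = Repo()
try:
    repo.fetch()
except ValueError:
    pass
depth_after_failure = len(repo._context_stack)  # stale entry left behind
status = repo.get_status()
depth_after_success = len(repo._context_stack)  # stale entry is still there
print(depth_after_failure, status, depth_after_success)
```

In this toy the later call still returns its own value, but the stack never returns to depth 0 once a dispatch has raised.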
This isn’t theoretical. RFC-0043’s fixture-expansion deep-dive surfaced
it across every async-capable backend (the same emission pattern appears
in Python, TypeScript, JavaScript, Rust, Java, C#, Kotlin, Swift, Dart,
GDScript, C++ — see interface_gen.rs). It also affects the sync
dispatch path (non-@@[async] systems) because handler code can throw
in any language.
Why this surfaced now
Pre-RFC-0043, async systems were unstructured single-class emissions.
A handler exception always propagated as a hard error and the program
either restarted or terminated — the stale-stack window was invisible.
RFC-0043’s layered casing introduced the architectural distinction
between “external boundary” (recoverable error) and “internal
dispatch” (structured kernel chain). The boundary now expects to
recover from E703 and from user-handler exceptions; the kernel
chain still hasn’t caught up.
Specification
The contract
Every emitted interface method on a Frame system MUST preserve the
context-stack length across the dispatch call, regardless of whether
__kernel returns normally or raises. The push-through-pop sequence
MUST be wrapped in a language-idiomatic cleanup construct that
runs the pop on both code paths.
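One way to state the contract in executable form is a scope helper that owns both the push and the pop. This is only an illustrative sketch: dispatch_scope and check_contract are hypothetical names, and the emitted code uses inline try/finally rather than a helper like this.

```python
from contextlib import contextmanager


@contextmanager
def dispatch_scope(stack, ctx):
    """Push ctx, yield it, and pop on every exit path (hypothetical helper)."""
    stack.append(ctx)
    try:
        yield ctx
    finally:
        stack.pop()


def check_contract(kernel):
    """Return (depth_before, depth_after) around one guarded dispatch."""
    stack = []
    before = len(stack)
    try:
        with dispatch_scope(stack, {"_return": None}):
            kernel()
    except Exception:
        pass  # the caller recovers; the stack must already be balanced
    return before, len(stack)


def failing_kernel():
    raise RuntimeError("boom")


normal = check_contract(lambda: None)
raised = check_contract(failing_kernel)
```

Both normal and raised compare equal before and after, which is the invariant the contract asks every backend to preserve.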
Per-backend cleanup mechanism
| Backend | Construct |
|---|---|
| Python | try: ... finally: self._context_stack.pop() |
| TypeScript / JavaScript | try { ... } finally { this._context_stack.pop(); } |
| Java / C# / Kotlin | try { ... } finally { ... pop ... } |
| Swift | defer { ... pop ... } |
| Dart | try { ... } finally { ... pop ... } |
| Rust | RAII guard struct with Drop impl (split-borrow on the stack) |
| C++ | RAII guard struct with destructor on _context_stack.pop_back() — implemented (#86) |
| GDScript | manual cleanup (no try/finally) — see § GDScript caveat |
| Erlang | gen_statem handles natively |
| Go / PHP / Ruby / Lua / C | not currently async-capable; sync dispatch needs the same fix |
Return-value handling
The current shape fuses the return-value extraction with the pop:
return self._context_stack.pop()._return
Under the new contract, the value must be extracted before the pop runs
in finally:
self._context_stack.append(__ctx)
try:
    self.__kernel(__e)
    return self._context_stack[-1]._return
finally:
    self._context_stack.pop()
For Python this works because return evaluates its operand before the
finally block runs. For C# async Task<T>, same evaluation order. For
Rust, the value extraction happens before the RAII guard’s drop runs.
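The evaluation order for Python can be checked in isolation. In this small sketch, Ctx and dispatch are illustrative stand-ins, not emitted names:

```python
class Ctx:
    def __init__(self, value):
        self._return = value


def dispatch(stack, value):
    stack.append(Ctx(value))
    try:
        # The operand stack[-1]._return is evaluated here, while the entry
        # is still on the stack; the result is held until finally completes.
        return stack[-1]._return
    finally:
        stack.pop()


stack = []
result = dispatch(stack, 42)
```

After the call, result holds 42 and stack is empty again: the value was read before the pop, and the pop still ran.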
GDScript caveat
GDScript has no try/finally. Two paths exist for the cleanup
guarantee:
- Manual cleanup at every exit point: emit the pop before every return and after await completes. Brittle, code-size cost.
- Wrap the kernel call in a generator that always pops: uses GDScript’s Callable + coroutine. Implementation cost moderate.
For initial RFC-0044 implementation, GDScript adopts the manual cleanup pattern, with the architectural caveat documented in the RFC’s known limitations.
__router and state-method recursion
__router and the per-state dispatch methods (_state_<Name>) also
push to the context stack indirectly (via state-args and per-handler
data scoping). Auditing those paths is Phase 2 of this RFC; the
push/pop shape is similar but interacts with the HSM walk.
Per-backend implementation notes
Python — straightforward
self._context_stack.append(__ctx)
try:
    await self.__kernel(__e)
    return self._context_stack[-1]._return
finally:
    self._context_stack.pop()
For void methods (no return), drop the return line:
self._context_stack.append(__ctx)
try:
    await self.__kernel(__e)
finally:
    self._context_stack.pop()
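The async shape can be exercised end to end with asyncio, including the CancelledError case mentioned in the Summary. This is a sketch under stated assumptions: Sys, Ctx, _kernel and the mode argument are invented for the demonstration.

```python
import asyncio


class Ctx:
    def __init__(self):
        self._return = None


class Sys:
    def __init__(self):
        self._context_stack = []

    async def _kernel(self, mode):
        if mode == "raise":
            raise ValueError("handler failed")
        if mode == "hang":
            await asyncio.sleep(3600)  # cancelled by the caller below
        self._context_stack[-1]._return = mode

    async def call(self, mode):
        self._context_stack.append(Ctx())
        try:
            await self._kernel(mode)
            return self._context_stack[-1]._return
        finally:
            self._context_stack.pop()


async def main():
    s = Sys()
    depths = []
    await s.call("ok")
    depths.append(len(s._context_stack))
    try:
        await s.call("raise")
    except ValueError:
        pass
    depths.append(len(s._context_stack))
    task = asyncio.create_task(s.call("hang"))
    await asyncio.sleep(0)  # give the task a chance to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    depths.append(len(s._context_stack))
    return depths


depths = asyncio.run(main())
```

The recorded depth is 0 after the normal return, after the raised ValueError, and after the cancellation.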
TypeScript / JavaScript — same pattern, semicolons
this._context_stack.push(__ctx);
try {
    await this.__kernel(__e);
    return this._context_stack[this._context_stack.length - 1]._return;
} finally {
    this._context_stack.pop();
}
Java — synchronous dispatch (CompletableFuture wraps)
this._context_stack.add(__ctx);
try {
    this.__kernel(__e);
    return CompletableFuture.completedFuture(
        this._context_stack.get(this._context_stack.size() - 1)._return
    );
} finally {
    this._context_stack.remove(this._context_stack.size() - 1);
}
C# — async Task path
this._context_stack.Add(__ctx);
try {
    await this.__kernel(__e);
    return this._context_stack[this._context_stack.Count - 1]._return;
} finally {
    this._context_stack.RemoveAt(this._context_stack.Count - 1);
}
Kotlin — suspend fun
this._context_stack.add(__ctx)
try {
    this.__kernel(__e)
    return this._context_stack[this._context_stack.size - 1]._return
} finally {
    this._context_stack.removeAt(this._context_stack.size - 1)
}
Swift — defer
self._context_stack.append(__ctx)
defer { self._context_stack.removeLast() }
await self.__kernel(__e)
return self._context_stack[self._context_stack.count - 1]._return
Dart — try/finally
this._context_stack.add(__ctx);
try {
    await this.__kernel(__e);
    return this._context_stack[this._context_stack.length - 1]._return;
} finally {
    this._context_stack.removeLast();
}
Rust — RAII guard
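struct _ContextStackGuard<'a> {
    stack: &'a mut Vec<FrameContext>,
}
impl Drop for _ContextStackGuard<'_> {
    fn drop(&mut self) {
        self.stack.pop();
    }
}
self._context_stack.push(__ctx);
let _guard = _ContextStackGuard { stack: &mut self._context_stack };
self.__kernel(&__e).await;
// _guard drops when this scope exits and pops the entry. Works on
// both happy path and panic unwind.
// Note: as sketched, _guard's &mut borrow of self._context_stack is
// still live across the self.__kernel call, so the emitted code needs
// the split-borrow arrangement noted in the table to borrow-check.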
C++ — RAII guard
struct _ContextStackGuard {
    std::vector<FrameContext>& stack;
    ~_ContextStackGuard() { stack.pop_back(); }
};
_context_stack.push_back(std::move(__ctx));
_ContextStackGuard __guard{_context_stack};
co_await __kernel(_context_stack.back()._event);
// __guard's destructor runs the pop on the happy path AND on
// exception unwinding.
GDScript — manual cleanup, documented limitation
GDScript has no try/finally. Adopt the manual cleanup at every exit:
self._context_stack.append(__ctx)
var __result = await self.__kernel(__e)
var __return = self._context_stack.pop_back()._return
return __return
A handler exception (which in GDScript is a hard error / push_error / runtime crash) leaves the stale entry. This is the known limitation that GDScript fixtures explicitly verify and document.
A more thorough mitigation (Callable-based scope guard, or a small helper class) is RFC-0044’s Phase 2 work.
Drawbacks
- Code-size growth. Every interface method gains ~4 lines of cleanup boilerplate. For systems with many interface methods, the emitted output grows measurably.
- Compile-time growth. Negligible.
- Per-backend variance. Each backend needs its own template. The existing interface_gen.rs already has a per-backend match; adding the cleanup follows the same shape.
Rationale and alternatives
Why not a kernel-side cleanup? Could the cleanup live inside
__kernel instead of every interface method? The kernel doesn’t know
where the push was — it’s the interface method that holds that scope.
A kernel-side cleanup would need a more sophisticated scope mechanism
(e.g. a “dispatch token” pattern) that’s harder to verify.
Why not catch-and-rethrow? Several backends could use catch-all:
self._context_stack.append(__ctx)
try:
    await self.__kernel(__e)
except:
    self._context_stack.pop()   # failure path: pop, then re-raise
    raise
return self._context_stack.pop()._return   # normal path: pop after success
Equivalent at runtime, but Python’s bare except: also catches
KeyboardInterrupt and SystemExit, which is generally discouraged.
try/finally is the canonical idiom. Same shape for TS/JS: try/catch
{ throw } works but is uglier than try/finally.
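The difference between the guard styles shows up when the kernel raises something outside the Exception hierarchy. The sketch below compares try/finally, a bare except, and the narrower except Exception; all function names are hypothetical and the stack holds a placeholder string instead of a real context.

```python
def run(guard, exc):
    """Run one dispatch under the given guard style; return final stack depth."""
    stack = []

    def kernel():
        raise exc

    try:
        guard(stack, kernel)
    except BaseException:
        pass  # swallowed here only so the demo can inspect the stack
    return len(stack)


def with_finally(stack, kernel):
    stack.append("ctx")
    try:
        kernel()
    finally:
        stack.pop()


def with_bare_except(stack, kernel):
    stack.append("ctx")
    try:
        kernel()
    except:  # deliberately bare, mirroring the alternative discussed above
        stack.pop()
        raise
    stack.pop()


def with_except_exception(stack, kernel):
    stack.append("ctx")
    try:
        kernel()
    except Exception:
        stack.pop()
        raise
    stack.pop()


results = {}
for guard in (with_finally, with_bare_except, with_except_exception):
    # Each entry is (depth after ValueError, depth after KeyboardInterrupt).
    results[guard.__name__] = (
        run(guard, ValueError()),
        run(guard, KeyboardInterrupt()),
    )
```

try/finally and the bare except both leave the stack balanced in each case, while except Exception leaks an entry on KeyboardInterrupt. A catch-and-rethrow variant therefore has to be the bare form to match try/finally, which is part of why try/finally is the simpler choice.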
Why not patch only the layered (@@[async]) backends? The bug
exists on every backend with sync dispatch too — a handler that
throws on Python (without @@[async]) still leaves the stale entry.
Fix it everywhere.
Acceptance / done criteria
- Every backend’s interface_gen.rs emission wraps push-through-pop in the language-idiomatic cleanup mechanism.
- A fixture per backend that verifies len(stack) before == len(stack) after whether the kernel returns normally OR raises.
- A fixture per backend that verifies _return and _transitioned on a fresh dispatch see clean state after a previous dispatch raised.
- GDScript’s manual cleanup is documented; the deeper Callable-based fix is filed as Phase 2.
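The two fixture criteria could be checked along these lines. This is a hedged sketch only: Machine, FrameContext, _kernel, _dispatch and the "fail"/"ok" events are hypothetical stand-ins, and the real fixtures would drive framec-generated systems instead.

```python
class FrameContext:
    def __init__(self, event):
        self.event = event
        self._return = None
        self._transitioned = False


class Machine:
    def __init__(self):
        self._context_stack = []

    def _kernel(self, event):
        ctx = self._context_stack[-1]
        if event == "fail":
            ctx._transitioned = True  # partial progress before the failure
            raise RuntimeError("handler failed")
        ctx._return = "done"

    def _dispatch(self, event):
        self._context_stack.append(FrameContext(event))
        try:
            self._kernel(event)
            top = self._context_stack[-1]
            return top._return, top._transitioned
        finally:
            self._context_stack.pop()


def test_stack_balanced_and_state_clean():
    m = Machine()
    before = len(m._context_stack)
    try:
        m._dispatch("fail")
    except RuntimeError:
        pass
    # Criterion 1: depth is unchanged even though the kernel raised.
    assert len(m._context_stack) == before
    # Criterion 2: a fresh dispatch sees clean _return / _transitioned state.
    ret, transitioned = m._dispatch("ok")
    assert (ret, transitioned) == ("done", False)
    assert len(m._context_stack) == before
    return True
```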
Forward references
- Phase 2: __router audit. State-args and HSM walk also push to ancillary state; verify the same invariant.
- Phase 3: async machinery cleanup. Compartment chain push/pop (_state_stack.push during push$ transitions) needs the same audit.
Cross-references
- RFC-0020: Runtime kernel (the machine’s dispatch path)
- RFC-0043: Async layered architecture (the fixture-expansion arc that surfaced this RFC)
- framec/src/frame_c/compiler/codegen/interface_gen.rs: the emission site
- framec/src/frame_c/compiler/codegen/system_codegen/async_wrap.rs: the init() lifecycle method’s emission, same pattern
- CHANGELOG.md