RFC-0044: Kernel context-stack must clean up on exception

  • Status: Partially implemented. The C++ row is implemented (2026-06-13, issue #86): the interface dispatch wrapper now uses a method-local RAII scope guard (struct __CtxGuard { decltype(_context_stack)& s; ~__CtxGuard(){ s.pop_back(); } }) instead of try/catch(...) { pop; throw; }. This keeps the cleanup invariant on every exit path and lets the C++ output compile under -fno-exceptions (the Godot-web requirement that motivated #86). The remaining per-backend rows below are still Phase-2 work; that includes Swift’s defer, which was postponed to avoid entangling #86 with a pre-existing async-stdout test flake. Original status: Draft, surfaced by RFC-0043 fixture-expansion research (2026-06-01).
  • Author: Mark Truluck <mark.truluck@cogiton.com>
  • Created: 2026-06-01
  • Builds on: RFC-0020 (runtime kernel), RFC-0043 (async layered architecture)
  • Relates to: the framec interface-dispatch lowering in framec/src/frame_c/compiler/codegen/interface_gen.rs

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as described in RFC 2119.

Summary

Every framec interface method on a system today emits the context-stack push/kernel-call/pop sequence without exception cleanup:

self._context_stack.append(__ctx)
self.__kernel(__e)            # or: await self.__kernel(__e)
return self._context_stack.pop()._return

If __kernel raises (a handler body throws an unhandled exception, the async machinery sees CancelledError, etc.), the pop never runs and the context stack keeps a stale entry. Subsequent dispatches can then read the wrong _return slot, see a phantom _transitioned flag from a previous turn, or, under sustained failure, grow the stack without bound.

This RFC requires every backend’s interface dispatch to wrap the push through the pop in language-idiomatic try/finally (or equivalent RAII / catch+rethrow / scope-exit construct), so the context-stack invariant len(stack_after) == len(stack_before) holds whether the dispatch returns normally OR raises.

The same defect applies to the init() lifecycle method (which fires the start $> event): async_wrap.rs::emit_init_method emits it with the same push/kernel-call/pop shape.

Motivation

The context stack is the kernel’s per-dispatch state record: each interface call pushes a fresh FrameContext carrying the event, the return slot, the _transitioned flag, the per-call _data map, and any internal state the kernel needs for the duration of one event’s trip through the state machine. Handler code reads _context_stack[-1] to access these fields — @@:(value) sets _context_stack[-1]._return, transition machinery sets _context_stack[-1]._transitioned, and so on.

The invariant the kernel relies on is “every push is paired with a pop in the same interface call.” Today that’s enforced by hand in the straight-line happy path. The moment an exception escapes the kernel chain, the invariant is silently broken.

Concrete failure mode

class _Repo:
    async def fetch(self, key):
        __e = FrameEvent("fetch", [key])
        __ctx = FrameContext(__e, None)
        self._context_stack.append(__ctx)   # +1, stack depth = 1
        await self.__kernel(__e)            # <-- raises ValueError
        return self._context_stack.pop()._return  # NEVER REACHED

# Caller:
try:
    await repo.fetch("a")
except ValueError:
    pass

# Now the stack carries a stale entry from the failed call.
status = await repo.get_status()
# This pushes context #2 on top of the stale #1. The kernel reads
# _context_stack[-1] — which is fine THIS turn.
# But _context_stack[-1]._transitioned may still hold True from #1's
# partial run, mis-routing transition logic.

This isn’t theoretical. RFC-0043’s fixture-expansion deep-dive surfaced it across every async-capable backend (the same emission pattern appears in Python, TypeScript, JavaScript, Rust, Java, C#, Kotlin, Swift, Dart, GDScript, C++ — see interface_gen.rs). It also affects the sync dispatch path (non-@@[async] systems) because handler code can throw in any language.
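The failure mode above can be reproduced in a few lines. The following is a minimal sketch, not framec output: FrameContext, Repo, _kernel and _dispatch are hypothetical stand-ins that keep only the unguarded push/kernel-call/pop shape, and the single-underscore names sidestep Python's name mangling for brevity.

```python
import asyncio


class FrameContext:
    """Hypothetical stand-in for the generated per-dispatch record."""

    def __init__(self, event):
        self.event = event
        self._return = None


class Repo:
    """Toy system that uses the unguarded push / kernel-call / pop shape."""

    def __init__(self):
        self._context_stack = []

    async def _kernel(self, event):
        # Stand-in kernel: the "fetch" handler throws, anything else succeeds.
        if event == "fetch":
            raise ValueError("handler failed")
        self._context_stack[-1]._return = "ok"

    async def _dispatch(self, event):
        self._context_stack.append(FrameContext(event))
        await self._kernel(event)                  # may raise
        return self._context_stack.pop()._return   # skipped when it does

    async def fetch(self):
        return await self._dispatch("fetch")

    async def get_status(self):
        return await self._dispatch("get_status")


async def main():
    repo = Repo()
    depths = [len(repo._context_stack)]        # depth before any call
    try:
        await repo.fetch()
    except ValueError:
        pass
    depths.append(len(repo._context_stack))    # stale entry left behind
    status = await repo.get_status()
    depths.append(len(repo._context_stack))    # still one entry too many
    return depths, status


depths, status = asyncio.run(main())
print(depths, status)
```

The later call still returns its own value, which is why the leak goes unnoticed: only the stack depth shows that an entry from the failed dispatch is still present.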

Why this surfaced now

Pre-RFC-0043, async systems were unstructured single-class emissions. A handler exception always propagated as a hard error and the program either restarted or terminated — the stale-stack window was invisible. RFC-0043’s layered casing introduced the architectural distinction between “external boundary” (recoverable error) and “internal dispatch” (structured kernel chain). The boundary now expects to recover from E703 and from user-handler exceptions; the kernel chain still hasn’t caught up.

Specification

The contract

Every emitted interface method on a Frame system MUST preserve the context-stack length across the dispatch call, regardless of whether __kernel returns normally or raises. The push-through-pop sequence MUST be wrapped in a language-idiomatic cleanup construct that runs the pop on both code paths.

Per-backend cleanup mechanism

Backend                    | Construct
---------------------------+------------------------------------------------------------
Python                     | try: ... finally: self._context_stack.pop()
TypeScript / JavaScript    | try { ... } finally { this._context_stack.pop(); }
Java / C# / Kotlin         | try { ... } finally { ... pop ... }
Swift                      | defer { ... pop ... }
Dart                       | try { ... } finally { ... pop ... }
Rust                       | RAII guard struct with Drop impl (split-borrow on the stack)
C++                        | RAII guard struct whose destructor calls _context_stack.pop_back(); implemented (#86)
GDScript                   | manual cleanup (no try/finally); see § GDScript caveat
Erlang                     | gen_statem handles natively
Go / PHP / Ruby / Lua / C  | not currently async-capable; sync dispatch needs the same fix

Return-value handling

The current shape moves the return-value extraction INTO the pop:

return self._context_stack.pop()._return

Under the new contract, the value MUST be extracted before the pop runs in finally:

self._context_stack.append(__ctx)
try:
    self.__kernel(__e)
    return self._context_stack[-1]._return
finally:
    self._context_stack.pop()

For Python this works because return evaluates its operand before the finally block runs. C# async Task<T> methods follow the same evaluation order. For Rust, the value extraction happens before the RAII guard’s drop runs.
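The Python ordering can be checked directly. This is an illustrative sketch only: Ctx, dispatch and the events log are hypothetical names, and the property exists purely to record the moment the return slot is read.

```python
events = []


class Ctx:
    """Hypothetical context whose _return read is logged."""

    def __init__(self, value):
        self._value = value

    @property
    def _return(self):
        events.append("read _return")
        return self._value


def dispatch(stack, value):
    stack.append(Ctx(value))
    try:
        return stack[-1]._return   # operand is evaluated first
    finally:
        events.append("pop")
        stack.pop()                # cleanup runs afterwards


stack = []
result = dispatch(stack, 42)
print(result, events, len(stack))
```

The read is logged before the pop, so the caller receives the value from the entry that finally then removes.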

GDScript caveat

GDScript has no try/finally. Two paths exist for the cleanup guarantee:

  1. Manual cleanup at every exit point — emit the pop before every return and after await completes. Brittle, code-size cost.
  2. Wrap the kernel call in a generator that always pops — uses GDScript’s Callable + coroutine. Implementation cost moderate.

For initial RFC-0044 implementation, GDScript adopts the manual cleanup pattern, with the architectural caveat documented in the RFC’s known limitations.

__router and state-method recursion

__router and the per-state dispatch methods (_state_<Name>) also push to the context stack indirectly (via state-args and per-handler data scoping). Auditing those paths is Phase 2 of this RFC; the push/pop shape is similar but interacts with the HSM walk.

Per-backend implementation notes

Python — straightforward

self._context_stack.append(__ctx)
try:
    await self.__kernel(__e)
    return self._context_stack[-1]._return
finally:
    self._context_stack.pop()

For void methods (no return), drop the return line:

self._context_stack.append(__ctx)
try:
    await self.__kernel(__e)
finally:
    self._context_stack.pop()
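The Summary names CancelledError as one way __kernel can raise. The sketch below, which assumes a hypothetical System class with a plain dict standing in for FrameContext, suggests that the try/finally shape also restores the stack when the awaiting task is cancelled mid-dispatch.

```python
import asyncio


class System:
    """Hypothetical system whose handler is parked on a long await."""

    def __init__(self):
        self._context_stack = []

    async def _kernel(self, event):
        await asyncio.sleep(3600)   # stand-in for a slow handler body

    async def slow(self):
        self._context_stack.append({"event": "slow", "_return": None})
        try:
            await self._kernel("slow")
            return self._context_stack[-1]["_return"]
        finally:
            self._context_stack.pop()   # also runs when the task is cancelled


async def main():
    system = System()
    task = asyncio.create_task(system.slow())
    await asyncio.sleep(0)                       # let the task reach its await
    depth_during = len(system._context_stack)    # entry is on the stack
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return depth_during, len(system._context_stack)


depth_during, depth_after = asyncio.run(main())
print(depth_during, depth_after)
```

Cancellation is delivered as an exception raised at the suspended await, so it takes the same path through finally as a handler error.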

TypeScript / JavaScript — same pattern, semicolons

this._context_stack.push(__ctx);
try {
    await this.__kernel(__e);
    return this._context_stack[this._context_stack.length - 1]._return;
} finally {
    this._context_stack.pop();
}

Java — synchronous dispatch (CompletableFuture wraps)

this._context_stack.add(__ctx);
try {
    this.__kernel(__e);
    return CompletableFuture.completedFuture(
        this._context_stack.get(this._context_stack.size() - 1)._return
    );
} finally {
    this._context_stack.remove(this._context_stack.size() - 1);
}

C# — async Task path

this._context_stack.Add(__ctx);
try {
    await this.__kernel(__e);
    return this._context_stack[this._context_stack.Count - 1]._return;
} finally {
    this._context_stack.RemoveAt(this._context_stack.Count - 1);
}

Kotlin — suspend fun

this._context_stack.add(__ctx)
try {
    this.__kernel(__e)
    return this._context_stack[this._context_stack.size - 1]._return
} finally {
    this._context_stack.removeAt(this._context_stack.size - 1)
}

Swift — defer

self._context_stack.append(__ctx)
defer { self._context_stack.removeLast() }
await self.__kernel(__e)
return self._context_stack[self._context_stack.count - 1]._return

Dart — try/finally

this._context_stack.add(__ctx);
try {
    await this.__kernel(__e);
    return this._context_stack[this._context_stack.length - 1]._return;
} finally {
    this._context_stack.removeLast();
}

Rust — RAII guard

struct _ContextStackGuard<'a> {
    stack: &'a mut Vec<FrameContext>,
}
impl Drop for _ContextStackGuard<'_> {
    fn drop(&mut self) {
        self.stack.pop();
    }
}

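self._context_stack.push(__ctx);
// NOTE: the guard's `&mut self._context_stack` borrow stays live until
// `_guard` drops, so the `self.__kernel(...)` call below only compiles
// with the split-borrow arrangement noted in the per-backend table.
let _guard = _ContextStackGuard { stack: &mut self._context_stack };
self.__kernel(&__e).await;
// _guard drops when this scope exits and pops the entry. Works on
// both happy path and panic unwind.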

C++ — RAII guard

struct _ContextStackGuard {
    std::vector<FrameContext>& stack;
    ~_ContextStackGuard() { stack.pop_back(); }
};

_context_stack.push_back(std::move(__ctx));
_ContextStackGuard __guard{_context_stack};
co_await __kernel(_context_stack.back()._event);
// __guard's destructor runs the pop on the happy path AND on
// exception unwinding.

GDScript — manual cleanup, documented limitation

GDScript has no try/finally. Adopt the manual cleanup at every exit:

self._context_stack.append(__ctx)
var __result = await self.__kernel(__e)
var __return = self._context_stack.pop_back()._return
return __return

A handler exception (which in GDScript is a hard error / push_error / runtime crash) leaves the stale entry. This is the known limitation that GDScript fixtures explicitly verify and document.

A more thorough mitigation (Callable-based scope guard, or a small helper class) is RFC-0044’s Phase 2 work.

Drawbacks

  • Code-size growth. Every interface method gains ~4 lines of cleanup boilerplate. For systems with many interface methods, the emitted output grows measurably.
  • Compile-time growth. Negligible.
  • Per-backend variance. Each backend needs its own template. The existing interface_gen.rs already has a per-backend match; adding the cleanup follows the same shape.

Rationale and alternatives

Why not a kernel-side cleanup? Could the cleanup live inside __kernel instead of every interface method? The kernel doesn’t know where the push was — it’s the interface method that holds that scope. A kernel-side cleanup would need a more sophisticated scope mechanism (e.g. a “dispatch token” pattern) that’s harder to verify.

Why not catch-and-rethrow? Several backends could use catch-all:

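self._context_stack.append(__ctx)
try:
    await self.__kernel(__e)
except:
    self._context_stack.pop()
    raise
return self._context_stack.pop()._return

This is equivalent at runtime, but only because the pop is emitted twice: once in the handler and once on the normal path. Python’s bare except: also catches KeyboardInterrupt and SystemExit, and although that matches what finally does, bare except: is generally discouraged. try/finally states the cleanup in one place and is the canonical idiom. The same holds for TS/JS: try/catch { throw } works but is uglier than try/finally.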

Why not patch only the layered (@@[async]) backends? The bug exists on every backend with sync dispatch too — a handler that throws on Python (without @@[async]) still leaves the stale entry. Fix it everywhere.

Acceptance / done criteria

  • Every backend’s interface_gen.rs emission wraps push-through-pop in the language-idiomatic cleanup mechanism.
  • A fixture per backend that verifies len(stack) before == len(stack) after whether the kernel returns normally OR raises.
  • A fixture per backend that verifies _return and _transitioned on a fresh dispatch see clean state after a previous dispatch raised.
  • GDScript’s manual cleanup is documented; the deeper Callable-based fix is filed as Phase 2.
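The two fixture criteria above could take roughly the following form for the Python backend. This is a sketch under assumptions, not the project's fixture format: Machine, stack_delta, and the boom/probe interface methods are hypothetical, and the toy kernel stands in for generated code.

```python
class FrameContext:
    """Hypothetical stand-in for the generated per-dispatch record."""

    def __init__(self, event):
        self.event = event
        self._return = None
        self._transitioned = False


class Machine:
    """Toy system whose dispatch uses the try/finally shape."""

    def __init__(self):
        self._context_stack = []

    def _kernel(self, event):
        ctx = self._context_stack[-1]
        if event == "boom":
            ctx._transitioned = True    # partial work before failing
            raise RuntimeError("handler failed")
        # Report what a fresh dispatch observes on entry.
        ctx._return = (len(self._context_stack), ctx._transitioned)

    def _dispatch(self, event):
        self._context_stack.append(FrameContext(event))
        try:
            self._kernel(event)
            return self._context_stack[-1]._return
        finally:
            self._context_stack.pop()

    def boom(self):
        return self._dispatch("boom")

    def probe(self):
        return self._dispatch("probe")


def stack_delta(system, call):
    """Change in stack depth across one interface call, raising or not."""
    before = len(system._context_stack)
    try:
        call()
    except RuntimeError:
        pass
    return len(system._context_stack) - before


machine = Machine()
delta_ok = stack_delta(machine, machine.probe)     # normal return
delta_raise = stack_delta(machine, machine.boom)   # kernel raises
observed = machine.probe()                         # fresh dispatch afterwards
print(delta_ok, delta_raise, observed)
```

A real fixture would drive the generated system rather than a toy kernel, but the two assertions (zero depth change on both paths, and clean state on the next dispatch) would keep the same shape.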

Forward references

  • Phase 2 — __router audit. State-args and HSM walk also push to ancillary state; verify the same invariant.
  • Phase 3 — async machinery cleanup. Compartment chain push/pop (_state_stack.push during push$ transitions) needs the same audit.

Cross-references

  • RFC-0020 — Runtime kernel (the machine’s dispatch path)
  • RFC-0043 — Async layered architecture (the fixture-expansion arc that surfaced this RFC)
  • framec/src/frame_c/compiler/codegen/interface_gen.rs — the emission site
  • framec/src/frame_c/compiler/codegen/system_codegen/async_wrap.rs — the init() lifecycle method’s emission, same pattern
  • CHANGELOG.md