RFC-0031: Post-release process — RC validation, CI gates, and discipline against rot

Status: Accepted (Process)
Author: Mark Truluck <mark.truluck@cogiton.com>
Created: 2026-05-18
Resolves: RFC-0029 § “CI integration question”; the “matrix + fuzz both green before merge” recommendation from the Bill Gates second-pass review.

Origin

This RFC was created in response to the second persona-critique review (Bill Gates, post-fuzz-repair) conducted at the end of the 2026-05-17/18 RC-validation session. The review’s top three asks were:

1. CI gate that runs matrix + fuzz-smoke on every push. ~3-4 minutes wall. Stops the fuzz rot pattern. Stops the matrix from being silently broken. “The single highest-leverage change available.”
2. Split frame_validator.rs by error-code family.
3. Document runtime overhead in framepiler_design.md.

Items 2 and 3 are technical roadmap tasks (#438, #439). Item 1 is process — and the conditions that made it necessary are process too. This RFC codifies the release+post-release process so the conditions that produced months of silent fuzz rot don’t recur.

The motivating pattern, in two sentences: RFCs landed, the matrix was the working signal, nobody ran run_all.sh to notice the fuzz had rotted. Five framec defects shipped in that window that fuzz would have caught. (Documented in RFC-0030 Track A.2 outcome and the Bill Gates second-pass review.)

What this RFC defines

A four-layer discipline:

RC validation — the exact set of checks that must pass before a tag is cut.
Merge gate — what CI runs on every push to main.
Release sequence — RC → tag → changelog → announce → monitor.
Drift detection — how we notice when a tier of the test pyramid is silently broken.

Each layer is small. The discipline is in running all four every release, not in any one layer being elaborate.

Layer 1: RC validation

Before tagging vX.Y.Z, the RC author runs:

# framec local — must all return clean exit
cd /path/to/framec
cargo test --release            # unit + RFC-0027 snapshot tests
cargo clippy --release -- -D warnings
cargo fmt --check
python3 scripts/validate_doc_samples.py

# Matrix — 17 backends × curated fixture corpus
cd /path/to/framec-test-env/docker
make test                       # expect: "17 languages clean, 0 with failures"

# Fuzz smoke — 21 phases × 17 backends, ~2-4 min
cd /path/to/framec-test-env/fuzz
FRAMEC=/path/to/framec/target/release/framec \
  ./run_all.sh --tier=smoke     # expect: 0 fails per-lang aggregated

RC bar: every command above returns clean. No exceptions for “known flakes”: if make test flakes, re-run it; if it flakes twice, the flake is a release blocker, not a personality trait.
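
The flake rule can be sketched as a small helper. This is an illustrative sketch, not existing project tooling: run_once, check_with_flake_policy, and rc_bar_is_clean are hypothetical names, and extending the one-re-run allowance to every Layer 1 command (not only make test) is an assumption.

```python
import subprocess
from typing import Callable, Iterable, Sequence

def run_once(cmd: Sequence[str]) -> bool:
    """Run one command; True means it exited with status 0."""
    return subprocess.run(list(cmd)).returncode == 0

def check_with_flake_policy(
    cmd: Sequence[str],
    run: Callable[[Sequence[str]], bool] = run_once,
) -> str:
    """Apply the RC flake rule to a single check.

    "clean"       : passed on the first run
    "flaked-once" : failed once, passed on the single allowed re-run
    "blocker"     : failed twice, so it blocks the release
    """
    if run(cmd):
        return "clean"
    if run(cmd):
        return "flaked-once"
    return "blocker"

def rc_bar_is_clean(
    cmds: Iterable[Sequence[str]],
    run: Callable[[Sequence[str]], bool] = run_once,
) -> bool:
    """The RC bar holds only if no check ends up as a blocker."""
    return all(check_with_flake_policy(c, run) != "blocker" for c in cmds)
```

The run parameter is injectable so the policy can be exercised without executing anything; a real runner would also have to set each command's working directory.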

RC validation discipline (per the 2026-05-17/18 session pattern):

Don’t trust auto-memory snapshots over current code state. Verify each roadmap - [ ] item against current code/git before working it. (Discipline note already added to _scratch/roadmap.md header.)
If a stale claim surfaces (RFC cited as “blocker” that already shipped; roadmap task that’s already done), close the stale entry immediately and continue.
When in doubt, run the command, not the recollection.

Layer 2: Merge gate (CI)

Every push to main (and every PR head update) runs:

# .github/workflows/ci.yml — to be authored by roadmap #437
jobs:
  framec-local:           # ~30s
    - cargo build --release
    - cargo test --release
    - cargo clippy --release -- -D warnings
    - cargo fmt --check
    - python3 scripts/validate_doc_samples.py
  matrix:                 # ~3-4 min
    - cd docker && make test
  fuzz-smoke:             # ~3-4 min
    - cd fuzz && FRAMEC="$GITHUB_WORKSPACE"/framec/target/release/framec \
        ./run_all.sh --tier=smoke

Three jobs, parallel. Total wall: ~4 min. Merge blocks if any fail. No “merge anyway” override without a written exception in the PR description naming the RFC or roadmap task that tracks the regression.
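
The override rule could be enforced mechanically along these lines. This is a sketch under stated assumptions: override_allowed and may_merge are hypothetical names, and the accepted reference formats (RFC-NNNN identifiers and #NNN roadmap task numbers) are inferred from how this RFC cites them, not from an existing check.

```python
import re
from typing import Dict

# Assumed reference formats: "RFC-0030" style IDs and "#438" style task numbers.
_TRACKING_REF = re.compile(r"\bRFC-\d{4}\b|#\d+")

def override_allowed(pr_description: str) -> bool:
    """True if the PR description names at least one RFC or roadmap task."""
    return bool(_TRACKING_REF.search(pr_description))

def may_merge(job_results: Dict[str, bool], pr_description: str) -> bool:
    """Merge when every job passed, or when a written exception is present."""
    if all(job_results.values()):
        return True
    return override_allowed(pr_description)
```

A pattern this loose also matches a bare PR number such as #12, so a reviewer still has to confirm that the cited item actually tracks the regression.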

Why this is the highest-leverage change available (per the Gates review):

Without it, the matrix is the working signal and fuzz is a developer’s-discretion side check. We just discovered fuzz had been silently broken for months and was hiding multiple real defects (Rust clone, C++ factory bypass, E606 assignment-form gap, Erlang split_inline_arm string-arrow match). All would have shipped to users.
With it, the gate flags rot the moment a generator emits invalid code or a renderer falls behind a contract. The problem becomes “a CI run failed, what changed?” — discoverable and bounded — rather than “everything looks fine until we run the harness someday and find six months of accumulated drift.”

Why ~4 min is acceptable for a per-push gate:

Matrix is ~143s warm
Fuzz smoke is ~120s on the 12-core measured host
The two run in parallel, so the gate finishes in about max(143, 120) = 143s (call it ~150s) plus CI cold-start overhead.
Compare to the cost of shipping a regression: minutes of CI vs. days of customer-facing breakage.
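
The wall-time arithmetic above, written out. The figures are the ones quoted in this RFC (~30s for framec-local, ~143s for the warm matrix, ~120s for fuzz smoke); the function names are illustrative, and CI cold-start overhead is deliberately left out because it is not measured here.

```python
# Per-job durations quoted in this RFC, in seconds.
JOB_SECONDS = {"framec-local": 30, "matrix": 143, "fuzz-smoke": 120}

def parallel_wall(job_seconds):
    """Wall time when all jobs start together: the slowest job decides."""
    return max(job_seconds.values())

def serial_wall(job_seconds):
    """Wall time if the same jobs ran one after another."""
    return sum(job_seconds.values())

print(parallel_wall(JOB_SECONDS))  # 143
print(serial_wall(JOB_SECONDS))    # 293
```

Running the three jobs in parallel roughly halves the wait compared to running them in sequence, which is what keeps the gate near the 4-minute budget once cold-start is added.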

If the matrix container build cold-start is heavy in CI (Docker layer cache miss), pre-build the per-backend images on a nightly schedule and tag them; the per-push run pulls cached images and just runs.

Layer 3: Release sequence

Once the RC bar (Layer 1) is clean and CI (Layer 2) is green:

Tag: git tag -a vX.Y.Z -m "release X.Y.Z" — semver per the policy below.
Push tag: git push origin vX.Y.Z — triggers the GitHub Actions release workflow (auto-build cross-platform binaries per RFC-0024 pattern). The CI release pipeline already exists (commits 7989c7c, 548be57 from earlier this month).
Changelog: docs/CHANGELOG.md — append an entry summarizing the user-visible changes. Auto-generated from commits-since-last-tag with an editor pass for clarity, NOT raw git log dump. Each entry has: features, fixes, breaking-changes, deprecations. Reference RFCs / roadmap tasks by number.
Release notes page: add docs/releases/X.Y.Z.md per the release-notes style guide — a skimmable, why-focused summary (Highlights, plus Breaking changes / Action required when relevant), one page per published release. The section auto-orders newest-first (child_nav_order: reversed), so no renumbering is needed. This is the docs-site Release Notes section, served from main /docs. Also replace the auto-generated GitHub release body (a bare PR-title list) with the same summary so both read well. Skip RC tags and formatting-only point releases.
Announce: when a release page is up on GitHub, link it from any user-facing channel (org README’s release-callout section, project-status page, etc.).
Monitor for 24 hours: watch GitHub issues + any user-facing channels for regression reports. If one lands, triage immediately — don’t let it sit until “next release.”
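
For the changelog step, the four required headings can be kept consistent with a small renderer. This is a sketch only: the plain-text layout, the render_changelog_entry name, and the "none" placeholder for empty sections are assumptions, not the existing docs/CHANGELOG.md format. The version string and item text below are placeholders.

```python
from typing import Dict, List

# The four headings every entry carries, in a fixed order.
SECTIONS = ("features", "fixes", "breaking-changes", "deprecations")

def render_changelog_entry(version: str, changes: Dict[str, List[str]]) -> str:
    """Render one entry; sections with no items are listed as "none"."""
    unknown = sorted(set(changes) - set(SECTIONS))
    if unknown:
        raise ValueError(f"unknown changelog sections: {unknown}")
    lines = [version, ""]
    for section in SECTIONS:
        lines.append(f"{section}:")
        items = changes.get(section) or ["none"]
        lines.extend(f"  - {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"

entry = render_changelog_entry(
    "vX.Y.Z",
    {"fixes": ["example fix text (roadmap #NNN)"]},
)
print(entry)
```

Rejecting unknown section names keeps a typo such as "fix" from silently dropping items, and the editor pass for clarity still happens on the rendered text.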

Semver policy

Major (X.0.0): user-facing source incompatibility. A Frame v3 source no longer compiles, or an RFC hard-cut (RFC-0013 @@target, RFC-0015 factory, RFC-0024 @@import) removes syntax. Requires a migration guide in the release notes.
Minor (X.Y.0): new features, new language constructs, new backends. Existing valid Frame source still compiles identically.
Patch (X.Y.Z): bug fixes, codegen improvements that produce semantically-equivalent output, performance work, doc fixes.

Pre-release tags use vX.Y.Z-rc.N (hyphen → CI auto-flags as pre-release per commit 9986fcd). Promote rc.N to vX.Y.Z when Layer 1 + Layer 2 + Layer 3 monitoring are all clean.
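
The tag shapes and bump levels above can be checked mechanically. A minimal sketch, assuming tags always match vX.Y.Z or vX.Y.Z-rc.N exactly; ReleaseTag, parse_tag, is_prerelease, and bump_kind are hypothetical helpers. Deciding whether a change deserves a major, minor, or patch bump remains a human judgement under the policy; the code only classifies what a pair of tags says.

```python
import re
from typing import NamedTuple, Optional

class ReleaseTag(NamedTuple):
    major: int
    minor: int
    patch: int
    rc: Optional[int]  # None for a final release

_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$")

def parse_tag(tag: str) -> ReleaseTag:
    """Parse vX.Y.Z or vX.Y.Z-rc.N; anything else is rejected."""
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"not a release tag: {tag!r}")
    major, minor, patch, rc = m.groups()
    return ReleaseTag(int(major), int(minor), int(patch),
                      None if rc is None else int(rc))

def is_prerelease(tag: str) -> bool:
    """The hyphenated rc suffix is what marks a pre-release."""
    return parse_tag(tag).rc is not None

def bump_kind(old: str, new: str) -> str:
    """Name the most significant version field that differs."""
    a, b = parse_tag(old), parse_tag(new)
    if b.major != a.major:
        return "major"
    if b.minor != a.minor:
        return "minor"
    if b.patch != a.patch:
        return "patch"
    return "none"
```

For example, bump_kind("v1.2.3", "v1.3.0") returns "minor", and is_prerelease("v1.3.0-rc.1") is true.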

Rollback

If a regression surfaces post-tag:

For a patch release: cut a fast-follow vX.Y.(Z+1) with the fix and a one-line changelog entry pointing at the regression.
For a minor/major with multiple shipped consumers: decide case by case. Yanking a published binary asset is loud; it is usually better to ship the fix as a fast-follow and document the affected version in the changelog and release notes.

Don’t git push --force on tags. If a tag must be moved, delete and re-tag with the next version number.
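
Computing the fast-follow tag is a one-field increment. A small sketch, assuming the regression was found in a final (non-rc) release; fast_follow_tag is a hypothetical helper, not an existing script.

```python
import re

_FINAL_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

def fast_follow_tag(released_tag: str) -> str:
    """Return the next patch tag after a final release tag."""
    m = _FINAL_TAG.match(released_tag)
    if m is None:
        raise ValueError(f"not a final release tag: {released_tag!r}")
    major, minor, patch = (int(part) for part in m.groups())
    return f"v{major}.{minor}.{patch + 1}"

print(fast_follow_tag("v1.4.2"))  # v1.4.3
```

The existing tag is never touched; the fix ships under the new tag, which is what makes force-pushing tags unnecessary.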

Layer 4: Drift detection

This is the layer that prevents the next “fuzz rotted for months” incident.

Nightly comprehensive run

A nightly GitHub Actions job runs:

(cd docker && make test)              # matrix full
(cd fuzz && ./run_all.sh --tier=full) # fuzz full, all 35k+ cases

Total wall: ~50 min. If either fails, file a GitHub issue auto-tagged nightly-regression with the failure summary. The issue is the prompt for the next workday: “what changed yesterday that broke this?”
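
The issue-filing step could build its payload as below. This is a sketch, not the nightly job itself: nightly_issue is a hypothetical function, the title and body wording are assumptions, and the dictionary only mirrors the title/body/labels fields that GitHub's create-issue request accepts. Actually sending it is left to the workflow.

```python
from typing import Dict, Optional

def nightly_issue(results: Dict[str, bool], run_date: str) -> Optional[dict]:
    """Build an issue payload from suite results (suite name -> passed).

    Returns None when every suite passed, so no issue is filed.
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return None
    body = [f"Nightly run on {run_date} failed.", "", "Failed suites:"]
    body += [f"- {name}" for name in failed]
    body += ["", "What changed yesterday that broke this?"]
    return {
        "title": f"Nightly regression {run_date}: {', '.join(failed)}",
        "body": "\n".join(body),
        "labels": ["nightly-regression"],
    }
```

Returning None on a green run keeps the tracker quiet, so an open nightly-regression issue always means something is currently broken.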

Stale-roadmap audit

Quarterly (or before any release): run the verification pattern from RFC-0030 / the discipline note in _scratch/roadmap.md:

For every - [ ] task, verify against current state (git log, file contents, build output) that the work hasn’t silently been done. The 2026-05-17/18 session closed 11 stale entries this way. The audit takes ~30 min if done as a focused pass; it prevents future sessions from spending hours on already-shipped work or building RFC plans on phantom premises.
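
The first step of the audit, listing every open task, is easy to automate; the verification against git and build output stays manual. A minimal sketch, assuming open tasks are written as Markdown checkboxes ("- [ ] ...") one per line; open_tasks is a hypothetical helper.

```python
import re
from typing import List, Tuple

# An unchecked Markdown task: optional indent, "-" or "*", then "[ ]".
_OPEN_TASK = re.compile(r"^\s*[-*]\s+\[ \]\s+(.*\S)\s*$")

def open_tasks(roadmap_text: str) -> List[Tuple[int, str]]:
    """Return (line number, task text) for every unchecked task."""
    tasks = []
    for lineno, line in enumerate(roadmap_text.splitlines(), start=1):
        m = _OPEN_TASK.match(line)
        if m:
            tasks.append((lineno, m.group(1)))
    return tasks

sample = "\n".join([
    "Roadmap",
    "- [x] example task already closed",
    "- [ ] example task still open",
    "  - [ ] nested example task",
])
for lineno, text in open_tasks(sample):
    print(lineno, text)
```

Each returned entry is a claim to verify, not a to-do: the audit closes the ones whose work has already shipped.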

Auto-memory hygiene

Auto-memory snapshots (the ~/.claude/projects/.../memory/ entries) decay faster than the codebase. Per the existing discipline rule in _scratch/roadmap.md: treat memory as a hypothesis, not a fact. Don’t cite memory in an RFC without verifying against current state.

When a memory entry becomes stale (the thing it described has moved): update the memory file’s content + the MEMORY.md index line in the same pass. The 2026-05-17/18 session fixed two stale memory entries (no_persist_codegen_arc.md, todo_remove_oracle_jdk8.md); similar drift will accumulate unless every closure does the same.

What this RFC does NOT cover

Test infrastructure design (matrix, fuzz, snapshot tests themselves) — covered by RFC-0027 (snapshots), RFC-0029 (fuzz status), RFC-0030 (fuzz catch-up).
Language design RFCs — handled per-RFC.
Bug triage workflow — owned by _scratch/FRAMEC_BUGS.md (its own numbering), referenced from the roadmap.
External user issue triage — out of scope until user-facing GitHub Issues becomes the source of truth (currently the roadmap doc + RFCs are, per Mark’s note 2026-05-17).

Drawbacks

CI takes ~4 minutes per push. Real cost. Worth it.
Nightly run uses CI quota. ~50 min/day. Modest for GitHub Actions’ free tier; cheap on any paid plan.
Stale-audit discipline depends on someone running the audit. If it doesn’t get run, the drift returns. The 2026-05-17/18 audit happened because Mark explicitly asked for it. A scheduled reminder (e.g., open an audit issue every quarter) would make this self-enforcing.
Rollback discipline puts pressure on the RC process. If the RC bar in Layer 1 is rigorous, rollbacks should be rare. If rollbacks become routine, that’s a signal the RC bar is too loose and a layer is being skipped.

Unresolved questions

Should the CI gate also run a single-fixture differential trace through diff_harness/run_fuzz.py? That tier (Phase 2-7) is the property-based check against the Python oracle. The smoke tier of the shell-phase runners doesn’t exercise it. A bare diff_harness/run_fuzz.py --max 5 per phase would add ~30s to the CI gate and catch trace-divergence bugs. Recommendation: add it for v1 of the gate.
Pre-merge vs. post-merge gate? Above describes pre-merge (PR gate blocks). Post-merge gate (run after merge, alert on break) is cheaper but doesn’t prevent regressions from reaching main. Recommend pre-merge for matrix and fuzz-smoke (~4 min is acceptable); post-merge only as a fallback if a CI budget constraint forces it.
What about external contributors? Today there are none. When external PRs land, the same CI gate runs against the PR branch. Contributors who can’t run the matrix locally (no Docker, etc.) get the CI result as their feedback.
Should we add a cargo bench step? Performance regressions would surface in nightlies if we had them. Today there are no benchmarks in framec. Not blocking; flag for a future RFC if perf becomes a release concern.

References

RFC-0027 — snapshot test infra
RFC-0029 — fuzz infrastructure status (raised the CI integration question this RFC resolves)
RFC-0030 — fuzz catch-up plan (executed 2026-05-18; surfaced the defects that motivate this RFC)
_scratch/roadmap.md § discipline note — verify-before-claim rule for - [ ] entries
Bill Gates second-pass review (recorded in session log 2026-05-18) — top-three asks that drove this RFC
docs/CHANGELOG.md — release-history doc updated per Layer 3
.github/workflows/ — to be authored by roadmap #437 to implement Layer 2