RFC-0031: Post-release process — RC validation, CI gates, and discipline against rot
- Status: Accepted (Process)
- Author: Mark Truluck mark.truluck@cogiton.com
- Created: 2026-05-18
- Resolves: RFC-0029 § “CI integration question”; the “matrix + fuzz both green before merge” recommendation from the Bill Gates second-pass review.
Origin
This RFC was created in response to the second persona-critique review (Bill Gates, post-fuzz-repair) conducted at the end of the 2026-05-17/18 RC-validation session. The review’s top three asks were:
- CI gate that runs matrix + fuzz-smoke on every push. ~3-4 minutes wall. Stops the fuzz rot pattern. Stops the matrix from being silently broken. “The single highest-leverage change available.”
- Split frame_validator.rs by error-code family.
- Document runtime overhead in framepiler_design.md.
Items 2 and 3 are technical roadmap tasks (#438, #439). Item 1 is process — and the conditions that made it necessary are process too. This RFC codifies the release+post-release process so the conditions that produced months of silent fuzz rot don’t recur.
The motivating pattern, in two sentences: RFCs landed and the
matrix was the working signal, but nobody ran run_all.sh, so
nobody noticed the fuzz had rotted. Five framec defects shipped
in that window that fuzz would have caught. (Documented in
RFC-0030 Track A.2 outcome and the Bill Gates second-pass
review.)
What this RFC defines
A four-layer discipline:
- RC validation — the exact set of checks that must pass before a tag is cut.
- Merge gate — what CI runs on every push to main.
- Release sequence — RC → tag → changelog → announce → monitor.
- Drift detection — how we notice when a tier of the test pyramid is silently broken.
Each layer is small. The discipline is in running all four every release, not in any one layer being elaborate.
Layer 1: RC validation
Before tagging vX.Y.Z, the RC author runs:
# framec local — must all return clean exit
cd /path/to/framec
cargo test --release # unit + RFC-0027 snapshot tests
cargo clippy --release -- -D warnings
cargo fmt --check
python3 scripts/validate_doc_samples.py
# Matrix — 17 backends × curated fixture corpus
cd /path/to/framec-test-env/docker
make test # expect: "17 languages clean, 0 with failures"
# Fuzz smoke — 21 phases × 17 backends, ~2-4 min
cd /path/to/framec-test-env/fuzz
FRAMEC=/path/to/framec/target/release/framec \
./run_all.sh --tier=smoke # expect: 0 fails per-lang aggregated
RC bar: every command above exits cleanly. No exceptions for
“known flakes”: if make test flakes, re-run it; if it flakes
twice, the flake is a release blocker, not a personality trait.
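The RC bar and its flake rule can be sketched as a small driver. This is a minimal illustration under stated assumptions, not the project's tooling: the names Check, RC_CHECKS, run_subprocess, and rc_bar are hypothetical, and applying the single re-run to every check is an assumption (the text names it only for make test). The commands and placeholder paths are copied from the checklist above.

```python
import os
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Check:
    cwd: str                                  # directory to run in
    argv: tuple[str, ...]                     # command and arguments
    env: tuple[tuple[str, str], ...] = ()     # extra environment variables


# Mirrors the checklist above. Paths are the same placeholders the text uses.
FRAMEC = "/path/to/framec"
TEST_ENV = "/path/to/framec-test-env"
RC_CHECKS = (
    Check(FRAMEC, ("cargo", "test", "--release")),
    Check(FRAMEC, ("cargo", "clippy", "--release", "--", "-D", "warnings")),
    Check(FRAMEC, ("cargo", "fmt", "--check")),
    Check(FRAMEC, ("python3", "scripts/validate_doc_samples.py")),
    Check(f"{TEST_ENV}/docker", ("make", "test")),
    Check(f"{TEST_ENV}/fuzz", ("./run_all.sh", "--tier=smoke"),
          env=(("FRAMEC", f"{FRAMEC}/target/release/framec"),)),
)


def run_subprocess(check: Check) -> bool:
    """Run one check; True means the command exited with status 0."""
    env = {**os.environ, **dict(check.env)}
    return subprocess.run(list(check.argv), cwd=check.cwd, env=env).returncode == 0


def rc_bar(checks: Sequence[Check],
           runner: Callable[[Check], bool] = run_subprocess,
           max_attempts: int = 2) -> list[Check]:
    """Return the checks that block the release; an empty list means clean.

    A failing check gets one re-run (max_attempts=2). any() stops at the
    first success, so a passing check is never run twice.
    """
    return [c for c in checks
            if not any(runner(c) for _ in range(max_attempts))]
```

Calling rc_bar(RC_CHECKS) with real paths substituted would return the list of blocking checks. The sketch only looks at exit status; confirming the expected summary lines (for example "17 languages clean, 0 with failures") would still be a manual read of the output.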
RC validation discipline (per the 2026-05-17/18 session pattern):
- Don’t trust auto-memory snapshots over current code state. Verify each roadmap - [ ] item against current code/git before working it. (Discipline note already added to the _scratch/roadmap.md header.)
- If a stale claim surfaces (RFC cited as “blocker” that already shipped; roadmap task that’s already done), close the stale entry immediately and continue.
- When in doubt, run the command, not the recollection.
Layer 2: Merge gate (CI)
Every push to main (and every PR head update) runs:
# .github/workflows/ci.yml (job outline; to be authored by roadmap #437)
jobs:
  framec-local:   # ~30s
    - cargo build --release
    - cargo test --release
    - cargo clippy --release -- -D warnings
    - cargo fmt --check
    - python3 scripts/validate_doc_samples.py
  matrix:         # ~3-4 min
    - cd docker && make test
  fuzz-smoke:     # ~3-4 min
    - cd fuzz && FRAMEC=$GITHUB_WORKSPACE/framec/target/release/framec \
        ./run_all.sh --tier=smoke
Three jobs, parallel. Total wall: ~4 min. Merge blocks if any fail. No “merge anyway” override without a written exception in the PR description naming the RFC or roadmap task that tracks the regression.
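The blocking rule can be sketched as a single predicate. This is a hedged illustration only: the function name merge_allowed and the "CI-exception:" marker line are hypothetical conventions invented for the example, since the RFC requires a written exception naming an RFC or roadmap task but does not prescribe its format.

```python
import re

# The three job names from the outline above.
REQUIRED_JOBS = ("framec-local", "matrix", "fuzz-smoke")

# Hypothetical convention: a PR-description line starting with "CI-exception:"
# that names an RFC (RFC-0030) or a roadmap task (#438) on the same line.
EXCEPTION_LINE = re.compile(r"^CI-exception:.*?(RFC-\d{4}|#\d+)", re.MULTILINE)


def merge_allowed(job_results: dict[str, bool], pr_description: str) -> bool:
    """Allow the merge if every required job passed, or if the PR
    description carries a written exception that names its tracker."""
    if all(job_results.get(job, False) for job in REQUIRED_JOBS):
        return True
    return EXCEPTION_LINE.search(pr_description) is not None
```

A missing job result counts as a failure here, so a job that was skipped or never reported cannot make the gate look green.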
Why this is the highest-leverage change available (per the Gates review):
- Without it, the matrix is the working signal and fuzz is a developer’s-discretion side check. We just discovered fuzz had been silently broken for months and was hiding multiple real defects (Rust clone, C++ factory bypass, E606 assignment-form gap, Erlang split_inline_arm string-arrow match). All would have shipped to users.
- With it, the gate flags rot the moment a generator emits invalid code or a renderer falls behind a contract. The problem becomes “a CI run failed, what changed?” — discoverable and bounded — rather than “everything looks fine until we run the harness someday and find six months of accumulated drift.”
Why ~4 min is acceptable for a per-push gate:
- Matrix is ~143s warm
- Fuzz smoke is ~120s on the 12-core measured host
- The two run in parallel; jobs converge in ~max(143, 120) ≈ 150s plus CI cold-start overhead.
- Compare to the cost of shipping a regression: minutes of CI vs. days of customer-facing breakage.
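The timing argument above is simple arithmetic and can be checked directly. A small sketch, using only the two measurements quoted in the list; the function name gate_wall_seconds and the overhead parameter are illustrative, and no CI cold-start figure is assumed because the text does not give one.

```python
def gate_wall_seconds(job_seconds: dict[str, float],
                      overhead_seconds: float = 0.0) -> float:
    """Parallel jobs finish when the slowest one does, plus any fixed overhead."""
    return max(job_seconds.values()) + overhead_seconds


# Measurements quoted above: matrix warm, fuzz smoke on the 12-core host.
measured = {"matrix": 143, "fuzz-smoke": 120}

parallel = gate_wall_seconds(measured)   # bounded by the matrix job
serial = sum(measured.values())          # what a single sequential job would cost
```

With these numbers the parallel gate is bounded by the matrix job at 143s, against 263s if the two ran back to back, which is why the matrix run is the one worth caching first.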
If the matrix container build cold-start is heavy in CI (Docker layer cache miss), pre-build the per-backend images on a nightly schedule and tag them; the per-push run pulls cached images and just runs.
Layer 3: Release sequence
Once the RC bar (Layer 1) is clean and CI (Layer 2) is green:
- Tag: git tag -a vX.Y.Z -m "release X.Y.Z" (semver per the policy below).
- Push tag: git push origin vX.Y.Z. This triggers the GitHub Actions release workflow (auto-build cross-platform binaries per RFC-0024 pattern). The CI release pipeline already exists (commits 7989c7c, 548be57 from earlier this month).
- Changelog: docs/CHANGELOG.md. Append an entry summarizing the user-visible changes. Auto-generated from commits-since-last-tag with an editor pass for clarity, NOT a raw git log dump. Each entry has: features, fixes, breaking-changes, deprecations. Reference RFCs / roadmap tasks by number.
- Release notes page: add docs/releases/X.Y.Z.md per the release-notes style guide: a skimmable, why-focused summary (Highlights, plus Breaking changes / Action required when relevant), one page per published release. The section auto-orders newest-first (child_nav_order: reversed), so no renumbering is needed. This is the docs-site Release Notes section, served from main/docs. Also replace the auto-generated GitHub release body (a bare PR-title list) with the same summary so both read well. Skip RC tags and formatting-only point releases.
- Announce: when a release page is up on GitHub, link it from any user-facing channel (org README’s release-callout section, project-status page, etc.).
- Monitor for 24 hours: watch GitHub issues + any user-facing channels for regression reports. If one lands, triage immediately — don’t let it sit until “next release.”
Semver policy
- Major (X.0.0): user-facing source incompatibility. A Frame v3 source no longer compiles, or an RFC hard-cut (RFC-0013 @@target, RFC-0015 factory, RFC-0024 @@import) removes syntax. Requires a migration guide in the release notes.
- Minor (X.Y.0): new features, new language constructs, new backends. Existing valid Frame source still compiles identically.
- Patch (X.Y.Z): bug fixes, codegen improvements that produce semantically-equivalent output, performance work, doc fixes.
Pre-release tags use vX.Y.Z-rc.N (hyphen → CI auto-flags as
pre-release per commit 9986fcd). Promote rc.N to vX.Y.Z
when Layer 1 + Layer 2 + Layer 3 monitoring are all clean.
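The tag scheme above can be sketched as a small parser. This is an illustrative helper, not part of framec or its release workflow; Version, parse_tag, bump_kind, promote, and format_tag are hypothetical names. It only reads the numbers in the tag: whether a change really is source-incompatible remains a human judgment under the policy.

```python
import re
from typing import NamedTuple, Optional

# vX.Y.Z for releases, vX.Y.Z-rc.N for pre-releases.
TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    rc: Optional[int] = None   # None means a final release

    @property
    def is_prerelease(self) -> bool:
        return self.rc is not None


def parse_tag(tag: str) -> Version:
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"not a release tag: {tag!r}")
    major, minor, patch, rc = m.groups()
    return Version(int(major), int(minor), int(patch),
                   None if rc is None else int(rc))


def format_tag(v: Version) -> str:
    base = f"v{v.major}.{v.minor}.{v.patch}"
    return base if v.rc is None else f"{base}-rc.{v.rc}"


def promote(v: Version) -> Version:
    """Drop the rc suffix once the layers above are all clean."""
    return v._replace(rc=None)


def bump_kind(old: Version, new: Version) -> str:
    """Name the highest component that changed between two releases."""
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    if new.patch != old.patch:
        return "patch"
    raise ValueError("versions differ only in rc suffix, or not at all")
```

For example, promote(parse_tag("v1.4.0-rc.2")) formats back to v1.4.0, and comparing v1.3.7 with v1.4.0 reports a minor bump.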
Rollback
If a regression surfaces post-tag:
- For a patch release: cut a fast-follow vX.Y.(Z+1) with the fix and a one-line changelog entry pointing at the regression.
- For a minor/major with multiple shipped consumers: decide case by case. Yanking a published binary asset is loud; it is usually better to ship the fix as a fast-follow and document the affected version in the changelog and release notes.
Don’t git push --force on tags. If a tag must be moved, delete
and re-tag with the next version number.
Layer 4: Drift detection
This is the layer that prevents the next “fuzz rotted for months” incident.
Nightly comprehensive run
A nightly GitHub Actions job runs:
cd docker && make test # matrix full
cd fuzz && ./run_all.sh --tier=full # fuzz full, all 35k+ cases
Total wall: ~50 min. If either fails, file a GitHub issue
auto-tagged nightly-regression with the failure summary. The
issue is the prompt for the next workday: “what changed
yesterday that broke this?”
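The "file an issue on failure" step can be sketched as a function that builds the GitHub CLI invocation without running it. This is an assumption-laden illustration: TierResult and nightly_issue_command are hypothetical names, the title and body layout are invented for the example, and how the nightly job captures each tier's failure summary is left open. The gh issue create flags used (--title, --body, --label) are standard GitHub CLI options.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TierResult:
    name: str      # e.g. "matrix" or "fuzz-full"
    passed: bool
    summary: str   # failure summary captured from the runner's output


def nightly_issue_command(results: Sequence[TierResult],
                          run_date: str) -> Optional[list[str]]:
    """Return the gh argv that files the regression issue, or None if green."""
    failed = [r for r in results if not r.passed]
    if not failed:
        return None
    title = f"Nightly regression {run_date}: " + ", ".join(r.name for r in failed)
    body = "\n\n".join(f"{r.name}\n{r.summary}" for r in failed)
    return ["gh", "issue", "create",
            "--title", title,
            "--body", body,
            "--label", "nightly-regression"]
```

Returning the argv instead of executing it keeps the decision testable; the workflow step would pass a non-None result to subprocess.run or an equivalent shell step.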
Stale-roadmap audit
Quarterly (or before any release): run the verification pattern
from RFC-0030 / the discipline note in _scratch/roadmap.md:
For every - [ ] task, verify against current state (git log,
file contents, build output) that the work hasn’t silently been
done. The 2026-05-17/18 session closed 11 stale entries this
way. The audit takes ~30 min if done as a focused pass; it
prevents future sessions from spending hours on already-shipped
work or building RFC plans on phantom premises.
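The first step of the audit, listing every open task so each can be checked against current state, can be sketched as follows. This is a minimal sketch assuming the roadmap uses standard "- [ ]" checkbox lines and refers to work by RFC-NNNN or #NNN; the function names are hypothetical, and the actual layout of _scratch/roadmap.md is not shown in this RFC.

```python
import re

# An open checkbox item, possibly indented under a parent item.
OPEN_TASK = re.compile(r"^[ \t]*- \[ \] (.+)$", re.MULTILINE)
# Identifiers worth searching for in git log and file contents.
TRACKER_ID = re.compile(r"RFC-\d{4}|#\d+")


def open_tasks(roadmap_text: str) -> list[str]:
    """Return the text of every unchecked task in the roadmap."""
    return [m.group(1).strip() for m in OPEN_TASK.finditer(roadmap_text)]


def audit_worklist(roadmap_text: str) -> list[tuple[str, list[str]]]:
    """Pair each open task with the identifiers it mentions, so the
    auditor knows what to look up before trusting the entry."""
    return [(task, TRACKER_ID.findall(task)) for task in open_tasks(roadmap_text)]
```

The worklist only says what to verify. Deciding whether a task has silently been done still means reading git log, file contents, and build output, as the text describes.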
Auto-memory hygiene
Auto-memory snapshots (the ~/.claude/projects/.../memory/
entries) decay faster than the codebase. Per the existing
discipline rule in _scratch/roadmap.md: treat memory as a
hypothesis, not a fact. Don’t cite memory in an RFC without
verifying against current state.
When a memory entry becomes stale (the thing it described has
moved): update the memory file’s content + the MEMORY.md index
line in the same pass. The 2026-05-17/18 session fixed two stale
memory entries (no_persist_codegen_arc.md,
todo_remove_oracle_jdk8.md); similar drift will accumulate
unless every closure does the same.
What this RFC does NOT cover
- Test infrastructure design (matrix, fuzz, snapshot tests themselves): covered by RFC-0027 (snapshots), RFC-0029 (fuzz status), RFC-0030 (fuzz catch-up).
- Language design RFCs: handled per-RFC.
- Bug triage workflow: owned by _scratch/FRAMEC_BUGS.md (its own numbering), referenced from the roadmap.
- External user issue triage: out of scope until user-facing GitHub Issues becomes the source of truth (currently the roadmap doc + RFCs are, per Mark’s note 2026-05-17).
Drawbacks
- CI takes ~4 minutes per push. Real cost. Worth it.
- Nightly run uses CI quota. ~50 min/day. Modest for GitHub Actions’ free tier; cheap on any paid plan.
- Stale-audit discipline depends on someone running the audit. If it doesn’t get run, the drift returns. The 2026-05-17/18 audit happened because Mark explicitly asked for it. A scheduled reminder (e.g., open an audit issue every quarter) would make this self-enforcing.
- Rollback discipline puts pressure on the RC process. If the RC bar in Layer 1 is rigorous, rollbacks should be rare. If rollbacks become routine, that’s a signal the RC bar is too loose and a layer is being skipped.
Unresolved questions
- Should the CI gate also run a single-fixture differential trace through diff_harness/run_fuzz.py? That tier (Phase 2-7) is the property-based check against the Python oracle. The smoke tier of the shell-phase runners doesn’t exercise it. A bare diff_harness/run_fuzz.py --max 5 per phase would add ~30s to the CI gate and catch trace-divergence bugs. Recommendation: add it for v1 of the gate.
- Pre-merge vs. post-merge gate? The text above describes a pre-merge gate (the PR gate blocks). A post-merge gate (run after merge, alert on break) is cheaper but doesn’t prevent regressions from reaching main. Recommendation: pre-merge for matrix and fuzz-smoke (~4 min is acceptable); post-merge only as a fallback if a CI budget constraint forces it.
- What about external contributors? Today there are none. When external PRs land, the same CI gate runs against the PR branch. Contributors who can’t run the matrix locally (no Docker, etc.) get the CI result as their feedback.
- Should we add a cargo bench step? Performance regressions would surface in nightlies if we had benchmarks. Today there are no benchmarks in framec. Not blocking; flag for a future RFC if perf becomes a release concern.
References
- RFC-0027: snapshot test infra
- RFC-0029: fuzz infrastructure status (raised the CI integration question this RFC resolves)
- RFC-0030: fuzz catch-up plan (executed 2026-05-18; surfaced the defects that motivate this RFC)
- _scratch/roadmap.md § discipline note: verify-before-claim rule for - [ ] entries
- Bill Gates second-pass review (recorded in session log 2026-05-18): top-three asks that drove this RFC
- docs/CHANGELOG.md: release-history doc updated per Layer 3
- .github/workflows/: to be authored by roadmap #437 to implement Layer 2