Mem2reg (Analysis + Transform plan)#998
Conversation
Add the SW_mem2reg switch, an identity-stub Core pass (core_mem2reg.ml), pipeline wiring, and 10 CI tests covering simple scalars, branches, loops, address-taken variables, structs, and mixed cases. All 10 tests pass with the stub (identity transform). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1: run all pre-existing CI tests with --sw mem2reg and verify output is unchanged (regression safety). Phase 2: run --pp core --sw mem2reg on the 10 new mem2reg tests and count create( occurrences to verify promotable variables are eliminated. - run-mem2reg.sh: thin dispatcher that runs phase1 then phase2 - run-mem2reg-phase1.sh: regression check — --sw mem2reg must not change output of existing CI tests (0001–0340) - run-mem2reg-phase2.sh: elimination check — counts create() in Core IR for all 10 mem2reg tests (0341–0350); current expected values reflect the stub pass, with target values noted in a comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the identity-transform stub with a full promotability analysis:
find all Create(PrefSource)-bound pointer syms in each Proc and determine
which ones are only ever used as Load0/Store0/Kill addresses (never
address-taken or passed to arbitrary expressions).
Key details:
- sym_occurs_in_{pexpr,expr,action}: conservative occurrence check
- classify_action: precise Load0/Store0/Kill address classification
- collect_uses: handles the Core load-alias idiom
(let weak tmp = pure(ptr) in load(ty, tmp)) via a special Ewseq arm
- collect_creates: finds Esseq-bound Create(PrefSource _) syms
- Debug output at level 3: "[mem2reg] f: N promotable: [...]"
- Note in collect_creates on the via-pointer calling-convention assumption
and how CN's value-passing convention would allow extending this to
non-address-taken function parameters
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ries Under Normal_callconv only PrefSource (C locals) are promoted; under Inner_arg_callconv PrefFunArg Create bindings are also included, since in that convention the callee owns the argument slot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nalysis check_definitely_init now correctly handles: - Ewseq: e2 receives already_init (not ia1), since Neg stores in e1 are not sequenced before e2; init_after = ia1 || ia2 - Eunseq: all arms receive already_init; safe/init_after both require AND across arms (no arm's result is visible to siblings) - SeqRMW: classified as Use_seqrmw (promotable); (safe=already_init, init_after=true) — atomically reads then writes Add no_mixed_unseq_uses predicate: blocks promotion when a write arm and >=2 mentioning arms coexist in an Eunseq, preserving Cerberus's unsequenced-race detector for cases like `x + (x = 42)`. find_promotable now requires all three predicates: use_is_promotable && check_definitely_init && no_mixed_unseq_uses Add seven CI tests (0351-0357) covering Ewseq promotion, uninit/init unsequenced races, read-only Eunseq arms, post/pre-increment SeqRMW, and x+(x++) UB detection. Update run-mem2reg-phase2.sh and tests.sh. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Pasting this discussion here for visiblity. @cp526: @neel-krishnaswami: It's only when you hit a @dc-mak: AFAIR, elaboration ensures that I think the fact that The transformation for @cp526 For what it's worth, there's two alternatives to doing this as a core-to-core rewrite: option (1), which Kayvan was considering, is to adjust the elaboration around the variables mem2reg would optimise; option(2) would be to make this a mucore-to-mucore rewrite. We want to merge the core and mucore expression grammars, but will want to keep mucore's separate handling of labels (with labels as function-local procedures rather than Esave), so then you wouldn't have to think about how mem2reg should work for Esave |
|
@cp526: Other related points raised in discussion:
|
…tracking See doc/history/2026-03-18_mem2reg-esave-erun.md for the full design. Previously, any occurrence of a CREATE-bound sym in an Esave's args or body conservatively blocked promotion. The fix rests on two invariants of the elaborator: 1. Every CREATE-bound local pointer sym that appears inside an Esave body is passed in as a bare PEsym default arg — it is never referenced directly from the body under its outer name. (Non-CREATE-bound syms such as function parameters may appear freely; the invariant is narrowly about CREATE-bound locals.) Consequently, if a sym is not aliased by any param, it cannot reach the body, and the analysis can skip it entirely. 2. Erun is terminal: control jumps unconditionally into the Esave body and the sequential continuation is unreachable, so init_after = true. Erun args for CREATE-bound syms are bare PEsym aliases of the target Esave's params, so alias resolution via find_single_direct_alias is sufficient and no structural occurrence check is needed. Both collect_uses and check_definitely_init use the (cached) result of recursing on the Esave body using the param_sym which is an alias of the Erun arg or Esave default arg. They then combine that with any existing information about the arg itself to calculate the result. New infrastructure: - collect_esave_defs: pre-walk that builds a memo table of Esave definitions (label_sym → params + body), enabling Erun sites to resolve forward references without re-traversal. - find_single_direct_alias: finds the unique param whose default is a bare PEsym equal to the outer sym. - Memoised collect_uses / check_definitely_init: results cached per (label_sym, param_sym); a sentinel is written before the recursive call to handle back-edge Eruns without diverging. For collect_uses the sentinel [] is exact (a back-edge Erun carries no semantic use). For check_definitely_init the sentinel (true, true) is a sound over-approximation: any load-before-store on an exit path sets safe=false through &&-propagation before the back-edge Erun is reached, so the sentinel cannot mask a real unsafety. - expr_writes_sym and no_mixed_unseq_uses updated with the same param-alias indirection for Esave. New CI tests: 0361–0364 cover pre-init loop reads, escaping writes, nested loops, and uninit loads inside loops. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The definite-initialisation check previously used a plain bool for "is sym initialised after this expression", conflating two distinct notions: a Paction Pos store (committed before any Esseq continuation) and a Paction Neg store inside Ewseq (unsequenced w.r.t. the Ewseq continuation). This made the analysis overly conservative for the common case of compound literals, whose Core IR is: Ewseq(_, Store0(Paction Pos, ptr, val), Load0(ptr)) The Pos store is sequenced before the Load in memory-model terms, but the old bool could not express this, so the Load was conservatively rejected and the variable was not promoted. Replace the bool with a three-element lattice Write_kind = No_write < Weak < Strong. Esseq promotion now treats any write (Weak or Strong) as initialised for its continuation; Ewseq only propagates Strong. Load/SeqRMW safety checks use is_strong. Branch and unsequenced joins use meet (&&); sequential joins use join (||). Adds test 0365-mem2reg_compound_lit.c verifying that an int variable initialised via a compound literal produces the correct result under --sw mem2reg, and adds it to run-mem2reg-phase2.sh with the stub create-count of 2 (fully-working target: 0). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Records the finding that the Write_kind refactoring does not change promotable-variable counts in practice: check_definitely_init checks for Load0(PEsym sym) directly, but C-generated Core always loads locals through an intermediate alias (Ewseq(tmp, pure(sym), load(tmp))), so the Ewseq rule change never fires. Includes annotated Core output for the compound-literal test to show why a_508 is filtered by collect_uses (Use_other) before check_definitely_init is reached, and why x is promotable under both old and new analysis for unrelated reasons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces SW_copy_prop / --sw copy_prop, a new Core pass that runs immediately before mem2reg in core_passes. The pass finds bindings of the form `let alias = pure(sym)` in Ewseq/Esseq/Elet/PElet and substitutes sym for alias throughout the body, replacing the named pattern with a wildcard. It also handles the extended case where the RHS is effectful but its tail expression is pure(sym) with sym not locally bound within the RHS (the compound literal motivating case from 2026-03-20_mem2reg-weak-init.md). Key design point: binding nodes are always kept (pattern → wildcard), never dropped, so source-location annotations are preserved. New files: copy_propagation.mli/.ml, tests 0366-0373, phase1/phase2 test scripts, history doc. All 208 CI tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
I've been coding and thinking some more and I'm now less sure about doing a mem2reg transformation on Core.
I always found the treatment of local vars in CN to be confusing (and I was not alone in this #443). Could it be possible to end up in a situation where updating a mem2reg pass would cause existing verified code to break? That would be bonkers, but I am confused right now. These considerations push me towards favouring an analysis and then passing those locations to CN, so they can be treated specially/on a fast path in the checker itself. But I am not decided/am still unsure. |
Replaces the three-pass promotability analysis (check_definitely_init, expr_writes_sym, no_mixed_unseq_uses) with a single abstract interpretation, sequentialisable, that tracks Mem_event sequences and an env (Uninit / Init pe / DelayedInit pe / Killed) through the IR. Key improvements over the old Write_kind lattice approach: - Tracks the concrete stored value (pexpr) so the transform phase can substitute it directly at Load sites without a second traversal. - DelayedInit models Neg-polarity (Ewseq) stores, blocking any Load/Store on the same sym before the sequence point. - SeqRMW is handled: atomically reads the current value, substitutes it into the update expression, and records the result as Init. - Eunseq sequencing-violation detection is now integrated: raises Not_sequentialisable if a write arm coexists with any other arm that mentions sym. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regrettably, this commit fixes up many small and big things in mem2reg. - Mem_event => Event - Mem_event.t list => Set.Make(Event).t - Remove DelayedInit and fix handling of Neg_stores in wseq - Remove alias case for wseq in collect_uses (handled by copy_prop pass) - Use Core_aux.unsafe_subst_pexpr instead of hand-rolled version - Use Pexpr (.. PEundef ..) for env "value" of non-returning case-branches to retain pattern-completeness - Handle End and Epar same as Eunseq - Fix the Ebound case to keep the env
- Replace tail_sym/tail_sym_acc with tail_pexpr/tail_pexpr_acc, which returns any safe pure expression at the tail of an effectful expression, not just PEsym. Two new helpers support this: pexpr_safe (conservative free-variable check) and binders_of_pat (collects all named symbols from any pattern, including CaseCtor tuples). - Change type env from (sym, sym) Pmap.map to (sym, pexpr) Pmap.map. Bindings for Ewseq/Esseq/Elet now store the full pexpr value in the env; PEsym substitution returns the stored pexpr directly. All 208 CI tests pass; copy-prop phase-1 regression suite unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Generalise tail_pexpr to return a pexpr_tree rose tree that mirrors Eunseq nesting, and add match_pat_tree to walk a Core pattern and tree in lock-step, partially wildcarding tuple positions where a known pure expression is available. This lets copy_prop propagate constants out of unsequenced sub-expressions — comparison operators, function-pointer loads, etc. — into the continuation. Also fix a Core typechecker asymmetry exposed by the new propagations: the checking rule for Cspecified was stricter than inference — it did not allow Specified(pe) to be checked against BTy_storable even when pe has an object type, which is semantically fine. New CI test: 0374-copy_prop_tuple.c. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tracking the actual pexpr value in the environment per sym complicates
the code needlessly (at this stage) and is also just plain incorrect,
if done one symbol at a time.
Hence this commit removes it, which simplifies much of the code
drastically. This commit also makes sure that non-returning code in a
wseq or an unseq throws a Not_sequentialisable error, since we can't
soundly approximate the events to detect a race condition (though I'm
fairly sure the elaboration never places a run inside either of those
constructs).
For example, for this C program:
```c
int main()
{
void *p;
int houdini;
escape:
p = &houdini;
return p ? 0 : 1;
}
```
After copy propagation, we get this label.
```ocaml
save escape: unit (p: pointer:= p, houdini: pointer:= houdini) in
let strong _: loaded pointer =
bound(
...
let weak _: unit = neg(store('void*', p, Specified(houdini))) in
pure(Specified(houdini))
) in
pure(Unit) ;
```
And we'd have to be very careful with tracking the environment for `p` after
the save block, to make sure it doesn't refer to the save-symbol houdini,
but to the procedure-level one.
Leaving the values in place still causes mem2reg to fail with plenty Use_others, so updating copy propagtion to rewrite to Units removes such false negative uses. This rewrite found that evaluation of PEare_compatible is stricter than it needed to be/can take multiple steps, due the fact (a) pure expression can diverge (b) so function calls reduce in more than 1 step (c) let pat = pe1 in pe2 eagerly evaluates pe1 to a value in more than 1 step but (d) are_compatible(pe1, pe2) only gives pe1 and pe2 one step each so (e) rewriting `let x = params_nth(param, 0) in are_compatible(_, x)` with `let _ = Unit in are_compatible(_, params_nt(param, 0))` caused evaluation to fail (test 0021-fact.c, amongst others). This was fixed by updating param_nth to return the whole expression when reducing one step each didn't produce a value (values of the wrong type are still rejected, of course). I still don't understand why marking PEcfunction as unsafe makes it not substituted - I would have thought the pattern matching on the tuples would have prevented it.
Previously, a tuple like let (x,y) = pure((v, cfunction(v)) in .. would be discarded from consideration altogether due to the cfunction on the right, even though x can be propagated. The reason for this was that pure tuples were replaced with Unit as a whole rather than per branch; this commit fixes that to do it per component recursively. It also tidies up much of the comments and function names along the way (sorry!).
Replace the two-phase pexpr_tree build + match_pat_tree approach with a single-pass analyze_pat_pexpr / analyze_pat_expr that takes the binding pattern alongside the expression and returns a coherent (bindings, new_pat, new_expr) triple. Previously, prop_and_rm could replace e.g. Specified(ptr) with unit while match_pat_tree left the Cspecified(x) pattern unchanged, producing a type-incoherent let Specified(x) = pure(unit). The new code recurses into Cspecified wrappers and rebuilds them consistently. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For CN, the elaboration annotates pure(<int>) literal expressions with a C type so to support the use of signed and sized bit-vector typing. When replacing such expressions with pure(Unit), this messes up CN typing and so this commit removes such annotations from the units, but propagates them inside the pure expression so that propagated integer values get them. It also pushes the location annotations into pure expressions similarly. Testing with CN also uncovered rems-project#1000, hence the integer annotations aren't required/assumed to be unique.
This PR adds support for the analysis part of a Core-to-Core mem2reg pass. This would be useful to speed up execution of Core programs and also CN proof (and Fulminate?) substantially.
It was developed with Claude and so I've included the plans that fed into the code changes for documentation in this PR - the final PR can have these deleted (including a fresh branch so they don't even exist in the history).
For example, Claude seemed to think that an Ail/Ail+Core analysis would be more complicated than a simple Core-to-Core, so the reasoning for this is included in doc/history/2026-03-13_mem2reg-plan.md
It's already got a few commits and a fair amount of code, but the PR can and should be read one commit at a time.
The analysis pass is quite naive - it does multiple traversals for one thing at a time (instead of one complex one to collect all the info), but it's easy to understand and I figured it can be improved later once we have a working implementation that passes tests (or not, if it turns out to be fast enough), e.g. I assume perhaps the Core rewriter library does some sort of pass-fusion?