Skip to content

Mem2reg (Analysis + Transform plan)#998

Draft
dc-mak wants to merge 23 commits intorems-project:masterfrom
dc-mak:mem2reg
Draft

Mem2reg (Analysis + Transform plan)#998
dc-mak wants to merge 23 commits intorems-project:masterfrom
dc-mak:mem2reg

Conversation

@dc-mak
Copy link
Copy Markdown
Collaborator

@dc-mak dc-mak commented Mar 16, 2026

This PR adds support for the analysis part of a Core-to-Core mem2reg pass. This would be useful to speed up execution of Core programs and also CN proof (and Fulminate?) substantially.

It was developed with Claude and so I've included the plans that fed into the code changes for documentation in this PR - the final PR can have these deleted (including a fresh branch so they don't even exist in the history).

For example, Claude seemed to think that an Ail/Ail+Core analysis would be more complicated than a simple Core-to-Core, so the reasoning for this is included in doc/history/2026-03-13_mem2reg-plan.md

It's already got a few commits and a fair amount of code, but the PR can and should be read one commit at a time.

The analysis pass is quite naive - it does multiple traversals for one thing at a time (instead of one complex one to collect all the info), but it's easy to understand and I figured it can be improved later once we have a working implementation that passes tests (or not, if it turns out to be fast enough), e.g. I assume perhaps the Core rewriter library does some sort of pass-fusion?

dc-mak and others added 6 commits March 16, 2026 15:42
Add the SW_mem2reg switch, an identity-stub Core pass
(core_mem2reg.ml), pipeline wiring, and 10 CI tests covering simple
scalars, branches, loops, address-taken variables, structs, and mixed
cases. All 10 tests pass with the stub (identity transform).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1: run all pre-existing CI tests with --sw mem2reg and verify
output is unchanged (regression safety).
Phase 2: run --pp core --sw mem2reg on the 10 new mem2reg tests and
count create( occurrences to verify promotable variables are eliminated.

- run-mem2reg.sh: thin dispatcher that runs phase1 then phase2
- run-mem2reg-phase1.sh: regression check — --sw mem2reg must not change
  output of existing CI tests (0001–0340)
- run-mem2reg-phase2.sh: elimination check — counts create() in Core IR
  for all 10 mem2reg tests (0341–0350); current expected values reflect
  the stub pass, with target values noted in a comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the identity-transform stub with a full promotability analysis:
find all Create(PrefSource)-bound pointer syms in each Proc and determine
which ones are only ever used as Load0/Store0/Kill addresses (never
address-taken or passed to arbitrary expressions).

Key details:
- sym_occurs_in_{pexpr,expr,action}: conservative occurrence check
- classify_action: precise Load0/Store0/Kill address classification
- collect_uses: handles the Core load-alias idiom
  (let weak tmp = pure(ptr) in load(ty, tmp)) via a special Ewseq arm
- collect_creates: finds Esseq-bound Create(PrefSource _) syms
- Debug output at level 3: "[mem2reg] f: N promotable: [...]"
- Note in collect_creates on the via-pointer calling-convention assumption
  and how CN's value-passing convention would allow extending this to
  non-address-taken function parameters

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ries

Under Normal_callconv only PrefSource (C locals) are promoted; under
Inner_arg_callconv PrefFunArg Create bindings are also included, since
in that convention the callee owns the argument slot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nalysis

check_definitely_init now correctly handles:
- Ewseq: e2 receives already_init (not ia1), since Neg stores in e1 are not
  sequenced before e2; init_after = ia1 || ia2
- Eunseq: all arms receive already_init; safe/init_after both require AND
  across arms (no arm's result is visible to siblings)
- SeqRMW: classified as Use_seqrmw (promotable); (safe=already_init,
  init_after=true) — atomically reads then writes

Add no_mixed_unseq_uses predicate: blocks promotion when a write arm and
>=2 mentioning arms coexist in an Eunseq, preserving Cerberus's
unsequenced-race detector for cases like `x + (x = 42)`.

find_promotable now requires all three predicates:
  use_is_promotable && check_definitely_init && no_mixed_unseq_uses

Add seven CI tests (0351-0357) covering Ewseq promotion, uninit/init
unsequenced races, read-only Eunseq arms, post/pre-increment SeqRMW,
and x+(x++) UB detection. Update run-mem2reg-phase2.sh and tests.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dc-mak
Copy link
Copy Markdown
Collaborator Author

dc-mak commented Mar 16, 2026

Pasting this discussion here for visiblity.

@cp526:
I suspect for Esave one wants to transform the label type. In the normal Core elaboration, for any in-scope C variable x, the Esave will take as an argument a pointer to the stack location of x. The mem2reg transformation should make it so the label instead takes the current value of x.

@neel-krishnaswami:
Operationally, save is basically just a let-binding, so it's easy.

It's only when you hit a run that anything weird happens. Its operational rule uses the "labelmap" which (annoyingly) shows up in the run operational rule and nowhere else in Kayvan's thesis. Conceptually it's pretty straightforward but there may be details which live only in the Cerberus codebase and/or Kayvan's brain.

@dc-mak:
Ok so it seems like for the analysis, the default parameter of save counts as a use, but a slightly unusual one. As Neel pointed out, it is basically a let-binding, so we need to consider the binder as an alias for analysing the body. But for the transformation/executing the save as straightline code (not from a run), we would want it to count the default value as a load.

AFAIR, elaboration ensures that saves are closed wrt C variables in scope (ie. only C local vars are bound in its binders). However, if this were not the case, any uses of a C local variable in the body (which were not bound by the save params), they would disqualify a local variable from promotion (since the save can be reached from straightline, forward run, and backward run, but any transformation would effectively lock-in the straightline case).

I think the fact that save parameters only bind the C local vars means that run should be easy. Unlike function calls, where passing the address of a local var would escape and disqualify it, passing the address of a local var to run would basically count as a load. I don't think labelmap will be much of a blocker, I remember understanding it once and I'll revisit it to be sure.

The transformation for save will be a little fiddly, like with if, since any mutated local vars will need to passed out the final expression, rebound with a let strong, and the environment updated.

@cp526 For what it's worth, there's two alternatives to doing this as a core-to-core rewrite: option (1), which Kayvan was considering, is to adjust the elaboration around the variables mem2reg would optimise; option(2) would be to make this a mucore-to-mucore rewrite. We want to merge the core and mucore expression grammars, but will want to keep mucore's separate handling of labels (with labels as function-local procedures rather than Esave), so then you wouldn't have to think about how mem2reg should work for Esave

@dc-mak
Copy link
Copy Markdown
Collaborator Author

dc-mak commented Mar 17, 2026

@cp526:
For what it's worth, there's two alternatives to doing this as a core-to-core rewrite: option (1), which Kayvan was considering, is to adjust the elaboration around the variables mem2reg would optimise; option(2) would be to make this a mucore-to-mucore rewrite. We want to merge the core and mucore expression grammars, but will want to keep mucore's separate handling of labels (with labels as function-local procedures rather than Esave), so then you wouldn't have to think about how mem2reg should work for Esave


Other related points raised in discussion:

  • Consider the impact on the code which supports reading local variables in at a program point within annotations (CN statements). We'd need to thread the environment of the transform through to that point.
  • Consider that the promotable variables info could be useful to Fulminate; this may push towards an Ail-analysis at least, if not an Ail-transform too. IIRC, the symbols are the same in Ail and Core so this may not be a big concern.
  • Another option is analysing at Ail/Core, but emulating the transform in the type system. This would make the type system more complicated.

dc-mak and others added 5 commits March 19, 2026 14:30
…tracking

See doc/history/2026-03-18_mem2reg-esave-erun.md for the full design.

Previously, any occurrence of a CREATE-bound sym in an Esave's args or
body conservatively blocked promotion.  The fix rests on two invariants
of the elaborator:

1. Every CREATE-bound local pointer sym that appears inside an Esave body
   is passed in as a bare PEsym default arg — it is never referenced
   directly from the body under its outer name.  (Non-CREATE-bound syms
   such as function parameters may appear freely; the invariant is
   narrowly about CREATE-bound locals.)  Consequently, if a sym is not
   aliased by any param, it cannot reach the body, and the analysis can
   skip it entirely.

2. Erun is terminal: control jumps unconditionally into the Esave body
   and the sequential continuation is unreachable, so init_after = true.
   Erun args for CREATE-bound syms are bare PEsym aliases of the target
   Esave's params, so alias resolution via find_single_direct_alias is
   sufficient and no structural occurrence check is needed.

Both collect_uses and check_definitely_init use the (cached) result of
recursing on the Esave body using the param_sym which is an alias of the
Erun arg or Esave default arg. They then combine that with any existing
information about the arg itself to calculate the result.

New infrastructure:
- collect_esave_defs: pre-walk that builds a memo table of Esave
  definitions (label_sym → params + body), enabling Erun sites to
  resolve forward references without re-traversal.
- find_single_direct_alias: finds the unique param whose default is a
  bare PEsym equal to the outer sym.
- Memoised collect_uses / check_definitely_init: results cached per
  (label_sym, param_sym); a sentinel is written before the recursive
  call to handle back-edge Eruns without diverging.  For collect_uses
  the sentinel [] is exact (a back-edge Erun carries no semantic use).
  For check_definitely_init the sentinel (true, true) is a sound
  over-approximation: any load-before-store on an exit path sets
  safe=false through &&-propagation before the back-edge Erun is
  reached, so the sentinel cannot mask a real unsafety.
- expr_writes_sym and no_mixed_unseq_uses updated with the same
  param-alias indirection for Esave.

New CI tests: 0361–0364 cover pre-init loop reads, escaping writes,
nested loops, and uninit loads inside loops.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The definite-initialisation check previously used a plain bool for
"is sym initialised after this expression", conflating two distinct
notions: a Paction Pos store (committed before any Esseq continuation)
and a Paction Neg store inside Ewseq (unsequenced w.r.t. the Ewseq
continuation).  This made the analysis overly conservative for the
common case of compound literals, whose Core IR is:

  Ewseq(_, Store0(Paction Pos, ptr, val), Load0(ptr))

The Pos store is sequenced before the Load in memory-model terms, but
the old bool could not express this, so the Load was conservatively
rejected and the variable was not promoted.

Replace the bool with a three-element lattice Write_kind = No_write <
Weak < Strong.  Esseq promotion now treats any write (Weak or Strong)
as initialised for its continuation; Ewseq only propagates Strong.
Load/SeqRMW safety checks use is_strong.  Branch and unsequenced joins
use meet (&&); sequential joins use join (||).

Adds test 0365-mem2reg_compound_lit.c verifying that an int variable
initialised via a compound literal produces the correct result under
--sw mem2reg, and adds it to run-mem2reg-phase2.sh with the stub
create-count of 2 (fully-working target: 0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Records the finding that the Write_kind refactoring does not change
promotable-variable counts in practice: check_definitely_init checks
for Load0(PEsym sym) directly, but C-generated Core always loads locals
through an intermediate alias (Ewseq(tmp, pure(sym), load(tmp))), so
the Ewseq rule change never fires.  Includes annotated Core output for
the compound-literal test to show why a_508 is filtered by collect_uses
(Use_other) before check_definitely_init is reached, and why x is
promotable under both old and new analysis for unrelated reasons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces SW_copy_prop / --sw copy_prop, a new Core pass that runs
immediately before mem2reg in core_passes.

The pass finds bindings of the form `let alias = pure(sym)` in
Ewseq/Esseq/Elet/PElet and substitutes sym for alias throughout the
body, replacing the named pattern with a wildcard.  It also handles
the extended case where the RHS is effectful but its tail expression
is pure(sym) with sym not locally bound within the RHS (the compound
literal motivating case from 2026-03-20_mem2reg-weak-init.md).

Key design point: binding nodes are always kept (pattern → wildcard),
never dropped, so source-location annotations are preserved.

New files: copy_propagation.mli/.ml, tests 0366-0373, phase1/phase2
test scripts, history doc.  All 208 CI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dc-mak
Copy link
Copy Markdown
Collaborator Author

dc-mak commented Mar 20, 2026

I've been coding and thinking some more and I'm now less sure about doing a mem2reg transformation on Core.

  1. It can affect source locations.
  2. The eliminated and new symbols will need to be threaded through to any backends like anyways CN to support existing features to make sure that changes in mem2reg don't break specifications. The mem2reg pass would essentially need to preserve the environment used to support the ghost load in the middle of the function, or loop invariants.
    int main() {
        int x = 10;
        while (x > 0)
        /*@ inv 0i32 < x; x <= 10i32 @*/
       { x--; }
       /*@ assert (x == 0i32); @*/
       return x;
     }

I always found the treatment of local vars in CN to be confusing (and I was not alone in this #443).

Could it be possible to end up in a situation where updating a mem2reg pass would cause existing verified code to break? That would be bonkers, but I am confused right now.

These considerations push me towards favouring an analysis and then passing those locations to CN, so they can be treated specially/on a fast path in the checker itself. But I am not decided/am still unsure.

dc-mak and others added 6 commits March 21, 2026 10:02
Replaces the three-pass promotability analysis (check_definitely_init,
expr_writes_sym, no_mixed_unseq_uses) with a single abstract
interpretation, sequentialisable, that tracks Mem_event sequences and
an env (Uninit / Init pe / DelayedInit pe / Killed) through the IR.

Key improvements over the old Write_kind lattice approach:
- Tracks the concrete stored value (pexpr) so the transform phase can
  substitute it directly at Load sites without a second traversal.
- DelayedInit models Neg-polarity (Ewseq) stores, blocking any
  Load/Store on the same sym before the sequence point.
- SeqRMW is handled: atomically reads the current value, substitutes
  it into the update expression, and records the result as Init.
- Eunseq sequencing-violation detection is now integrated: raises
  Not_sequentialisable if a write arm coexists with any other arm
  that mentions sym.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regrettably, this commit fixes up many small and big things in mem2reg.

- Mem_event => Event
- Mem_event.t list => Set.Make(Event).t
- Remove DelayedInit and fix handling of Neg_stores in wseq
- Remove alias case for wseq in collect_uses (handled by copy_prop pass)
- Use Core_aux.unsafe_subst_pexpr instead of hand-rolled version
- Use Pexpr (.. PEundef ..) for env "value" of non-returning
  case-branches to retain pattern-completeness
- Handle End and Epar same as Eunseq
- Fix the Ebound case to keep the env
- Replace tail_sym/tail_sym_acc with tail_pexpr/tail_pexpr_acc,
  which returns any safe pure expression at the tail of an effectful
  expression, not just PEsym.  Two new helpers support this:
  pexpr_safe (conservative free-variable check) and binders_of_pat
  (collects all named symbols from any pattern, including CaseCtor
  tuples).

- Change type env from (sym, sym) Pmap.map to (sym, pexpr) Pmap.map.
  Bindings for Ewseq/Esseq/Elet now store the full pexpr value in
  the env; PEsym substitution returns the stored pexpr directly.

All 208 CI tests pass; copy-prop phase-1 regression suite unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Generalise tail_pexpr to return a pexpr_tree rose tree that mirrors
Eunseq nesting, and add match_pat_tree to walk a Core pattern and tree
in lock-step, partially wildcarding tuple positions where a known pure
expression is available.  This lets copy_prop propagate constants out
of unsequenced sub-expressions — comparison operators, function-pointer
loads, etc. — into the continuation.

Also fix a Core typechecker asymmetry exposed by the new propagations:
the checking rule for Cspecified was stricter than inference — it did
not allow Specified(pe) to be checked against BTy_storable even when
pe has an object type, which is semantically fine.

New CI test: 0374-copy_prop_tuple.c.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tracking the actual pexpr value in the environment per sym complicates
the code needlessly (at this stage) and is also just plain incorrect,
if done one symbol at a time.

Hence this commit removes it, which simplifies much of the code
drastically. This commit also makes sure that non-returning code in a
wseq or an unseq throws a Not_sequentialisable error, since we can't
soundly approximate the events to detect a race condition (though I'm
fairly sure the elaboration never places a run inside either of those
constructs).

For example, for this C program:

```c
int main()
{
    void *p;
    int houdini;
escape:
    p = &houdini;
    return p ? 0 : 1;
}
```

After copy propagation, we get this label.

```ocaml
  save escape: unit (p: pointer:= p, houdini: pointer:= houdini) in
    let strong _: loaded pointer =
      bound(
	...
        let weak _: unit = neg(store('void*', p, Specified(houdini))) in
        pure(Specified(houdini))
      ) in
    pure(Unit) ;
```

And we'd have to be very careful with tracking the environment for `p` after
the save block, to make sure it doesn't refer to the save-symbol houdini,
but to the procedure-level one.
dc-mak added 3 commits March 24, 2026 11:41
Leaving the values in place still causes mem2reg to fail
with plenty Use_others, so updating copy propagtion to rewrite
to Units removes such false negative uses.

This rewrite found that evaluation of PEare_compatible is stricter
than it needed to be/can take multiple steps, due the fact (a) pure
expression can diverge (b) so function calls reduce in more than
1 step (c) let pat = pe1 in pe2 eagerly evaluates pe1 to a value
in more than 1 step but (d) are_compatible(pe1, pe2) only gives
pe1 and pe2 one step each so (e) rewriting

`let x = params_nth(param, 0) in are_compatible(_, x)`
with `let _ = Unit in are_compatible(_, params_nt(param, 0))`
caused evaluation to fail (test 0021-fact.c, amongst others).

This was fixed by updating param_nth to return the whole expression when
reducing one step each didn't produce a value (values of the wrong type
are still rejected, of course).

I still don't understand why marking PEcfunction as unsafe makes it not
substituted - I would have thought the pattern matching on the tuples
would have prevented it.
Previously, a tuple like
let (x,y) = pure((v, cfunction(v)) in ..
would be discarded from consideration altogether due to the cfunction on
the right, even though x can be propagated.

The reason for this was that pure tuples were replaced with Unit as a
whole rather than per branch; this commit fixes that to do it per
component recursively.

It also tidies up much of the comments and function names along the way
(sorry!).
dc-mak and others added 2 commits March 27, 2026 11:06
Replace the two-phase pexpr_tree build + match_pat_tree approach with a
single-pass analyze_pat_pexpr / analyze_pat_expr that takes the binding
pattern alongside the expression and returns a coherent (bindings,
new_pat, new_expr) triple.

Previously, prop_and_rm could replace e.g. Specified(ptr) with unit
while match_pat_tree left the Cspecified(x) pattern unchanged, producing
a type-incoherent let Specified(x) = pure(unit). The new code recurses
into Cspecified wrappers and rebuilds them consistently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For CN, the elaboration annotates pure(<int>) literal expressions with a
C type so to support the use of signed and sized bit-vector typing.

When replacing such expressions with pure(Unit), this messes up CN
typing and so this commit removes such annotations from the units,
but propagates them inside the pure expression so that propagated
integer values get them.

It also pushes the location annotations into pure expressions similarly.

Testing with CN also uncovered rems-project#1000, hence the integer annotations
aren't required/assumed to be unique.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant