Skip to content

Replace CIR interpreter with LIR interpreter#9272

Draft
lukewilliamboswell wants to merge 138 commits intomainfrom
lir-interpreter
Draft

Replace CIR interpreter with LIR interpreter#9272
lukewilliamboswell wants to merge 138 commits intomainfrom
lir-interpreter

Conversation

@lukewilliamboswell
Copy link
Copy Markdown
Collaborator

@lukewilliamboswell lukewilliamboswell commented Mar 18, 2026

Summary

This PR replaces the old CIR-level interpreter with a new pipeline that lowers CIR → MIR → LIR before interpreting. This means the interpreter now runs on the same monomorphized IR that the dev and wasm backends use, sharing more of the compiler pipeline and eliminating an entire class of bugs where the interpreter diverged from compiled behavior.

Key changes:

  • New interpreter pipeline: Comptime evaluation now goes through the full monomorphization and LIR lowering passes before interpretation, instead of directly interpreting CIR nodes
  • Stack-safe eval engine: Replaced recursive evaluation with an explicit WorkStack/ValueStack architecture, eliminating stack overflows on deeply nested expressions
  • Deleted old interpreter infrastructure: Removed interpreter_layout, interpreter_values, StackValue.zig, and ~15 recursive eval functions that are no longer needed
  • Test infrastructure overhaul: New parallel eval test runner with fork-based crash isolation, shared test harness extracted for reuse across CLI and eval tests, and kcov-based coverage tooling
  • Bug fixes across the pipeline: Fixed exponential blowup in MIR lowering and monomorphizer, memory leaks in list/string RC, register spill issues on x86_64, and numerous crashes in edge cases
  • Wasm backend improvements: Delegated host functions to shared builtins instead of reimplementing them, added compiler-rt intrinsics for wasm32 self-containment

Switch the eval backend from the direct CIR-level interpreter to a
CIR → MIR → LIR → RC pipeline that interprets LIR directly. This
unifies the lowering path with the dev/wasm backends and removes the
large CIR interpreter (~24k lines replaced).

Key changes:
- Add cir_to_lir.zig (LirProgram): shared CIR→MIR→LIR→RC lowering
- Add lirInterpreterEval: typed result extraction (int/float/dec/bool/str)
  without Str.inspect wrapping, avoiding double-inspect and type confusion
- Add DivisionByZero error to interpreter with proper message propagation
- Enable infinite-while-loop detection in comptime evaluator
- Add NodeStore.ensureScratch() for deserialized stores
- Fix Monotype API: resolve() → getMonotype(), funcRet() → .func.ret
- Remove is_try_suffix check (not present in LIR match_expr)
- Skip tests that trigger monomorphize panics (signal 6) on invalid code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lukewilliamboswell and others added 28 commits March 23, 2026 11:29
The synthetic symbol created in lowerRecord for record update extension
bindings (e.g. `{..acc, sum: acc.sum + item}`) inherited `reassignable = true`
from `Ident.Idx.NONE`, causing MirToLir to emit `cell_load` instead of a
regular `lookup`. Since the binding is `decl_const`, no `cell_init` was ever
emitted, and the interpreter failed at runtime.

Use an explicit non-reassignable ident template instead of `Ident.Idx.NONE`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
callUsesAnnotationOnlyIntrinsic only handled e_lookup_external and
e_lookup_required, so calling a local annotation-only function
(e.g. `foo : Str -> Str` then `foo("test")`) panicked with signal 6
in monomorphize. Add e_lookup_local handling by scanning module defs
to find the matching pattern.

Also fix MirToLir to emit crash expressions instead of panicking for
non-intrinsic annotation-only calls/accesses, so the evaluator properly
counts them as runtime crashes.

Unskips 9 previously-skipped tests (5 new annotation-only tests, 1
uncommented annotation-only test, 1 decode dispatch test, 1 while(True)
test, 1 closure capture test).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The MIR monotype resolver would infinitely recurse (stack overflow) when
resolving layouts for recursive tag unions without explicit Box, such as
Tree := [Node(Str, List(Tree)), Text(Str), Wrapper(Tree)]. The Wrapper(Tree)
variant directly references the enclosing type, causing unbounded recursion
in buildRefForMonotype.

Three fixes:

1. mir_monotype_resolver: Reserve and cache the tag_union graph node
   BEFORE recursing into variant payloads. Track actively-building
   tag unions; when a back-edge is detected (and we're not inside an
   explicit Box), wrap the reference in a box graph node to provide
   the indirection needed for finite layout sizes.

2. MirToLir lowerTag: When the resolved layout for a tag expression is
   box(tag_union), unwrap to the inner layout for tag construction,
   then box the result with box_box.

3. Re-enable the issue #8754 regression test (simplified to test value
   creation; full pattern matching on boxed recursive types needs
   additional match-lowering work as a follow-up).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ve test

The recursive function with record test (issue #8813) requires 1000 call
frames but the interpreter limit was 512. Bump to 1024 since interpreter
call frames are heap-allocated and lightweight.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The C builtin str_from_utf8 returns a flat FromUtf8Try struct, but the
Roc type system treats the result as a Result Str [BadUtf8 ...] tag union.
The interpreter was memcpy'ing the raw C struct bytes directly into the
tag union value, causing match expressions to fail because the discriminant
was at the wrong offset (e.g. reading the string length as the discriminant).

Fix by resolving Ok/Err variant indices and error record field offsets from
the layout store, then writing the tag union fields at the correct positions.
This mirrors the approach used by the dev backend (LirCodeGen.zig).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three root causes fixed:

1. Monomorphize: resolveLookupExprProcInst and inferDirectCallProcInst
   called resolveTemplateDefiningContextProcInst for closures without
   checking that an active proc inst context existed. Added guards
   matching the existing pattern in materializeDemandedExprProcInst.

2. MIR Lower: when a closure is lowered directly as a call target (not
   through its defining function's body), the defining function's
   parameter symbols were never bound in pattern_symbols. Debug tracing
   confirmed the store and lookup used different Lower instances
   (inst=1205 vs inst=1209) despite identical keys. Added
   ensureDefiningContextParamsBound to lazily bind the defining
   function's arg patterns before capture resolution.

3. REPL interpreter: evalCrash didn't extract the message from the LIR
   crash expression, and the interpreter REPL path skipped the
   getDeferredCompileCrash check. Fixed both so annotation-only
   function calls produce proper "Crash:" messages.

Also updates dev_object snapshot hashes and adds TODO_REPL_FAILURES.md
documenting remaining issues (docs panics, multiline_string segfault,
cross-def closure evaluation regression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the sequential Zig built-in test runner for eval tests with a
standalone parallel binary. Worker threads pull tests from a shared
atomic index, each loading its own builtins to avoid shared mutable
state. Crash protection uses threadlocal setjmp/longjmp + signal
handlers (following the snapshot tool pattern) so segfaults are
recorded and the runner continues.

`zig build test-eval` now builds and runs the new runner.
Supports --filter, --threads, and --verbose via run args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd compare results

The parallel eval test runner was only exercising the interpreter. Now each
test also runs the dev, wasm, and llvm backends via Str.inspect, then
compares all outputs to catch cross-backend mismatches.

Key changes:
- compareAllBackends() runs dev/wasm/llvm via helpers.devEvaluatorStr,
  wasmEvaluatorStr, llvmEvaluatorStr and checks agreement
- Restore eval module to zig build test (was accidentally removed)
- Wire test-eval-parallel into zig build test
- Export devEvaluatorStr/wasmEvaluatorStr/llvmEvaluatorStr as pub in helpers.zig
- Fix runTestProblem UB (was passing undefined to cleanup), fix SA.NODEFER
  portability, remove unused ThreadBuiltins, implement dev_only_str

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l tests

The old eval module tests are temporarily removed from `zig build test`
while tests are ported to the new parallel runner format. The parallel
runner (test-eval) is wired into `zig build test` as the replacement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… step

The eval runner was hanging under kcov because the dev backend uses fork()
for crash isolation, and kcov can't trace forked children properly.

- Add --coverage CLI flag: disables fork and forces single-threaded
- Add force_no_fork flag to helpers.zig devEvaluatorStr
- Move eval coverage out of `zig build coverage` into standalone
  `zig build coverage-eval` step that passes --coverage to the runner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
kcov instrumentation skews timing measurements, so suppress the
aggregate stats table and slowest-tests ranking when --coverage is
active. Per-test breakdowns still show in --verbose for debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The coverage-eval step was pulling in the full parser coverage pipeline
via a transitive dependency on mkdir_step. Fixed by giving eval its own
codesign step.

Also made CoverageSummaryStep generic: label and min_coverage are now
configurable so eval coverage prints "EVAL CODE COVERAGE SUMMARY" and
uses its own threshold (0% while tests are being ported).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…unner

- Add `skip` field to TestCase with flags for interpreter/dev/wasm/llvm,
  allowing individual backends to be disabled per test. Any test with a
  skip reports as SKIP rather than PASS to keep partial coverage visible.
- Add per-phase monotonic timing (std.time.Timer) for parse, canonicalize,
  typecheck, interpreter, dev, wasm, and llvm phases with statistical
  summary (min/max/mean/median/stddev/P95) and slowest-5 breakdown.
- Add --help/-h with documentation of all options, timing instrumentation,
  and backend coverage philosophy.
- Update MIGRATE_EVAL_TEST_PROMPT.md with skip field usage examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix use-after-free: arena-allocated failure messages are now duped to
  the GPA so they survive arena resets between test iterations.

- Fix signal handler: remove SA.NODEFER to prevent re-entrant signals
  during longjmp. After recovery, explicitly unblock SEGV/BUS/ILL via
  sigprocmask so future crashes are still caught.

- Reduce duplication: consolidate six runTest* functions into a single
  runNormalTest with a switch on Expected variant. Extract runBackend
  helper for compareAllBackends. Rewrite runTestProblem to reuse
  parseAndCanonicalizeExpr.

- Strict layout checks: remove silent fallbacks in value assertions
  (e.g., i64_val no longer silently handles Dec layout). Each Expected
  variant now validates the exact layout type before reading the value.

- Remove redundant int_dec variant (i64_val already covers integers,
  dec_val covers Dec values).

- Fix i64_val type: i128 -> i64 to match the name.

- Fix test data: untyped number literals default to Dec in Roc, so
  tests now use dec_val instead of i64_val.

- Consistent Timer.start() error handling: use catch unreachable
  everywhere.

- Document LLVM evaluator bitrot in LLVM_EVAL_ISSUE.md (MonoLlvmCodeGen
  and lirExprResultLayout reference removed APIs). Fix monomorphization
  step in llvm_evaluator.zig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… runner

- Replace ParTestEnv with shared TestEnv (fixes alignment-unsafe realloc
  that used Allocator.realloc instead of rawAlloc+memcpy, and removes
  80 lines of duplicated host ops code)
- Remove numericStringsEqual/boolStringsEquivalent — all backends use
  Str.inspect so direct byte comparison is correct
- Fix compareBackendResults OOM path: return static error string instead
  of null (which silently swallowed real mismatches)
- Remove int_dec variant from migration guide (not implemented)
- Remove hardcoded MAX_THREADS=64, dynamically allocate thread array
  capped by CPU count
- Document signal handler setjmp/longjmp UB as TODO
- Document wasm evaluator thread safety (per-call instances + threadlocal)
- Improve --help to explain the -- separator requirement
- Delete LLVM_EVAL_ISSUE.md (belongs in a GitHub issue, not repo root)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all tests using supported Expected variants (i64_val, dec_val,
bool_val, str_val, f32_val, f64_val, err_val, problem,
type_mismatch_crash, dev_only_str) from eval_test.zig into the
data-driven eval_tests.zig table consumed by `zig build test-eval`.

Key decision: unsuffixed numeric literals in Roc default to Dec, not
I64. The old runExpectI64 silently converted Dec→int, masking the
actual type. Migrated tests now use .dec_val for unsuffixed literals
and .i64_val only for suffixed integer types (e.g. 42.I64, 255.U8),
making the expected types accurate.

62 test blocks remain in eval_test.zig using helpers that have no
parallel runner variant yet (runExpectRecord, runExpectTuple,
runExpectListI64, runExpectListZst, runExpectEmptyListI64,
runExpectIntDec, runExpectSuccess) plus custom infrastructure tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 53 closure tests use unsuffixed numeric literals, so numeric results
use .dec_val. String results use .str_val. The old file is deleted and
its refAllDecls removed from mod.zig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add per-type Expected variants (u8_val, u16_val, u32_val, u64_val,
u128_val, i8_val, i16_val, i32_val, i128_val) to the parallel runner
so type-annotated expressions use the correct storage type. All integer
variants share the same handler pattern via intExpected() helper.

Covers all 10 integer types (U8-U128, I8-I128), F32, F64, Dec,
Dec.to_str, and type mismatch tests. Old file deleted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 11 list_refcount files migrated (10 with tests, 1 placeholder).
All tests use unsuffixed numeric literals → .dec_val. String tests
use .str_val. All files deleted and refAllDecls removed from mod.zig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nner

Add `inspect_str` Expected variant to the parallel test runner that
compares RocValue.format() output (interpreter) and Str.inspect output
(compiled backends) against an expected string. This enables testing
records, tuples, lists, and other composite types without building
complex structured value comparisons.

Migrates record fold tests (26), list I64/ZST tests (16+6), tuple tests
(2), Dec fold/sum tests (6), literal evaluation tests (~15), and issue
regression tests to the parallel runner (987 total test cases).

5 tests remain in eval_test.zig: 2 infrastructure tests (crash callback,
ModuleEnv serialization), 3 tag-union-result tests that can't use
inspect_str (RocValue.format hits unreachable for tag_union layout).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nner

- Add TagUnionNotSupported error to interpreter and RocValue.format()
  so tag union tests can gracefully fall back to compiled-backend comparison
- Migrate 3 tag union regression tests from eval_test.zig to parallel runner
- Fix formatting/indentation across eval_tests.zig test cases
- Update dev_object snapshot hashes for nested tag codegen changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete MIGRATE_EVAL_TEST_PROMPT.md (migration task complete)
- Add FUZZ_EVAL_COVERAGE_PROMPT.md for LLM-driven coverage improvement
- Add scripts/eval_coverage_gaps.py to analyze kcov output and find
  uncovered interpreter code regions
- Add SKIP_ALL constant to eval_tests.zig for bug-documenting tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Crash message test → TestEnv.zig (tests its own crash callback)
- ModuleEnv serialization + interpreter test → module_env_test.zig
  (joins existing serialization roundtrip tests)
- Remove eval_test.zig refAllDecls from eval/mod.zig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Monomorphize pass produced .unit monotypes instead of .func for type
module method calls during comptime evaluation. This cascaded through
MIR Lower and MirToLir, hitting unreachable at each stage and crashing
all four docs snapshot tests (docs_static_dispatch, docs_type_module,
docs_type_module_visibility, docs_transitive_modules).

Root cause: compile_package.zig runs ComptimeEvaluator.evalAll() during
module compilation, which triggers Monomorphize for type module methods.
monotypeFromTypeVarInStore failed to resolve function types across module
boundaries, producing degenerate .unit monotypes instead.

Fixes:
- Monomorphize: guard resolveLookupExprProcInst and inferDirectCallProcInst
  to skip creating proc insts with non-func monotypes
- Monomorphize: defensive returns in bindCurrentCallFromProcInst and
  finalizeResolvedDirectCallProcInst for non-func monotypes
- Lower: defensive returns in bindFlatTypeMonotypesInStore,
  bindFlatTypeMonotypes, bindProcTemplateBoundaryMonotypes,
  lowerLambdaSpecialized, and procInstReturnMonotype
- Lower: e_lookup_external non-def-node targets emit runtime_err_type
- Lower: relax debug assertion in lowerCall for missing proc insts
- MirToLir: runtime_err_type/runtime_err_can callees emit crash LIR
  expression instead of panicking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lukewilliamboswell and others added 23 commits March 25, 2026 21:07
Two performance optimizations for the eval test runner (3.4x speedup):

1. Load builtin module once in the parent process before forking.
   Children inherit the data via copy-on-write instead of each
   independently deserializing builtins (~83% reduction in parse time).

2. Wrap the nested forkAndEval child allocator in ArenaAllocator
   instead of using page_allocator directly. Backend evaluators were
   doing hundreds of individual mmap/munmap syscalls per test; arena
   batches these into a few large chunks (~58-81% reduction in
   backend eval time).

Wall-clock: 2249ms → 663ms on 1303 tests with 16 processes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d match tests

- Remove interpreter+wasm skip from 31 "dev only" tests (Bool, U32, List, Str,
  polymorphic HOFs) — all pass on all three backends now
- Remove wasm skip from 3 match regression tests — wasm now handles tag unions
- Unskip "early return: ? in closure passed to List.fold" — passes on all
  backends after prior monomorphization fix

Eval tests: 1287 passed, 0 failed, 0 crashed, 16 skipped (was 1252/51)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the 5 sequential test_runner invocations in `zig build test-cli`
with a single fork-based parallel runner (parallel_cli_runner.zig) that
runs all 87 platform tests concurrently. Features:

- Unified data-driven spec covering int/str/fx platforms x backends
- Fork-based process pool with configurable worker count
- Per-test timing, statistics summary (min/max/mean/median/P95)
- Quiet output: only shows failures with stderr capture and repro instructions
- Filter support via `zig build test-cli -- --test-filter "pattern"`
- Timeout detection with process group cleanup (setsid + kill(-pid))
- TTY-aware progress reporting

Also add per-step timing to the minici build step with a summary table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Forward all --test-filter values to parallel_cli_runner, not just the
  first one (build.zig)
- Accept multiple --filter args in the parallel runner
- Match filters against both the formatted test name and the raw
  roc_file path, so filters from roc_subcommands naming conventions
  also work
- Suppress "No tests matched filter." when zero tests match — the
  parallel runner is one part of the test-cli umbrella step, so a
  filter targeting roc_subcommands_test legitimately matches nothing
  here

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate the interpreter's public API into a single Interpreter.eval()
entry point that always takes caller-supplied RocOps, routing dbg,
roc_expect_failed, roc_crashed, memory ops, and hosted functions through
the caller. The interpreter no longer promotes a failed expect into a
synthetic crash. Update all callers (runner, comptime_evaluator, repl,
test_runner, interpreter_shim, test helpers) to pass explicit ops.

Fix dbg lowering in Lower.zig so `dbg x` evaluates x once, applies
Str.inspect for the dbg effect, then returns the original value. This
resolves the 42 vs 42.0 stderr mismatch and the DebugGlue decref/
use-after-free on complex values. Add RocOps.expectFailed() helper in
host_abi.zig and wire up a real dbg hook in TestEnv.zig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… TestEnv

std.debug.print pulls in std.Thread and std.posix which don't exist on
freestanding targets. Replace with a debugPrint wrapper that is a no-op
on wasm32-freestanding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…edFunctions helper

Move the caller-provided RocOps from an EvalRequest field passed on each
eval() call to an Interpreter.init() parameter, simplifying the API and
removing the activate/deactivate dance in InterpreterRocEnv. Also add
host_abi.emptyHostedFunctions() to replace scattered `{ .count = 0,
.fns = undefined }` patterns with a safe, initialized function table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- REPL: decref the inspected string after copying it out
- list_sort_with: incref elements in cloned list, decref consumed source
- str_concat: release consumed input parts after concatenation
- str_escape_and_quote: decref consumed input string
- Match branches: drop owned matched values on wildcard-only patterns
- disc_switch: decref consumed tag union value
- Plumb value_layout through match dispatch/guard-check work items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create src/build/test_harness.zig with reusable infrastructure for
fork-based parallel test runners:

- Comptime-generic ProcessPool with configurable callbacks for test
  execution, serialization, and deserialization
- TimingStats, computeTimingStats, printStatsHeader, printStatsRow,
  printSlowestN for performance reporting
- writeAll, readStr pipe I/O helpers
- parseStandardArgs for consistent CLI flag handling (--filter,
  --threads, --timeout, --verbose)

Update parallel_cli_runner.zig to import the harness, removing ~500
lines of duplicated process pool, statistics, and CLI parsing code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update parallel_runner.zig to use test_harness.ProcessPool for its
fork-based process pool, replacing ~300 lines of duplicated ChildSlot,
launchChild, reapChild, drainPipe, processPoolMain, and
runTestsSequential code.

Also:
- Replace local TimingStats/computeTimingStats/nsToMs/printStatsRow
  with harness equivalents
- Replace local writeAll/readStr with harness.writeAll/harness.readStr
- Use harness.parseStandardArgs for CLI parsing (consistent --filter,
  --threads, --timeout, --verbose flags)
- Support multiple --filter values (fix test_filters[0] truncation
  in build.zig)
- Forward all test_filters from build.zig to eval runner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the overloaded test-cli umbrella into three independently
runnable steps with clear verb-target naming:

  build-test-hosts   — build platform host .a libraries (renamed from
                       test-platforms, which didn't actually test)
  test-platforms     — platform integration tests (int/str/fx build+run)
  test-subcommands   — roc CLI subcommand tests
  test-glue          — glue command tests
  test-cli           — umbrella depending on all three (backwards compat)

Each sub-step has its own --test-filter scope. No sequential chaining
between them — they can run independently or together via test-cli.
minici is unchanged (still calls zig build test-cli).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d help handling

- Preserve help_requested and timeout_provided flags in StandardArgs instead
  of returning empty defaults on --help; treat --threads 0 as "use default"
- Add stabilizeResult hook to PoolConfig so the sequential no-fork fallback
  deep-copies arena-owned data before the arena resets (fixes Windows path)
- Both runners (eval, cli) now implement stabilizeResult, honor --help
  explicitly, and use timeout_provided for explicit-timeout semantics
- Add unit tests for arg parsing edge cases
- Register test_harness.zig in ci/tidy.zig dead-file allowlist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…unds

The interpreter's refcount handling had several divergences from the
compiled backends (Dev, WASM):

- .tag_union in performRcPlan was a no-op, causing leaks when tag unions
  with refcounted payloads were dropped (e.g. Str.from_utf8 Results)
- discriminant_switch_dispatch manually decreffed the scrutinee, but RC
  insertion already handles scrutinee cleanup via tail decrefs
- dropOwnedPatternValue recursively decreffed pattern sub-values on
  no-binding match branches, duplicating the aggregate RC plan
- str_concat_collect manually decreffed parts, but RC insertion processes
  str_concat parts with borrow semantics
- str_escape_and_quote manually decreffed a value RC insertion already
  modeled as consumed

Changes:
- Implement discriminant-aware recursive .tag_union RC in performRcPlan,
  mirroring Dev backend (LirCodeGen.zig:11857-11917)
- Remove manual decref in discriminant_switch_dispatch
- Remove eager match decrefs in all 3 no-binding match paths (match_dispatch,
  match_guard_check pass, match_guard_check fail)
- Remove manual str_concat_collect part decrefs
- Remove manual str_escape_and_quote decref
- Add isolated REPL snapshot tests for each RC ownership shape: wildcard
  match, extracting match, is_ok, is_err, ok_or, and REPL sequences

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a public dropValue(val, layout_idx) method to LirInterpreter that
wraps the existing RC decref machinery so callers can release ownership
of evaluated results. Uses it from all test-helper eval sites
(lirInterpreterEval, lirInterpreterInspectedStr, TestRunner.eval, and
module_env_test) with defer, and arms TestEnv.checkForLeaks() on every
exit path so interpreter-side memory leaks are caught at test time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
….join_with

Three root causes:

1. List child element cleanup: list_decref/list_free in performRcPlan
   only freed the list allocation without iterating child elements.
   Added decrefListElements to recursively decref each element when
   the list is unique, mirroring RocList.decref in the builtins.

2. Spurious incref in list pattern matching: matchPattern created
   seamless slices (with incref) for rest patterns purely to check
   if the pattern matched — removed since the length check suffices.
   bindPattern also incref'd via listSliceValue for rest bindings,
   but the LIR manages those lifetimes through explicit RC expressions.
   Added listSliceValueNoIncref for use in bindPattern.

3. Str.join_with not consuming input list: the interpreter had a custom
   evalStrJoinWith that didn't free the input list, while the native
   strJoinWithC consumes it. Replaced with a direct call to the builtin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- helpers.zig: Reorder dropValue before checkForLeaks (not in defer)
  so leaks are caught as test errors instead of panics after cleanup.
  Wrap result extraction in labeled block to ensure drop always runs.
- parallel_runner.zig: Propagate error names (e.g. MemoryLeak) from
  child processes through the pipe so failures show the actual error
  instead of generic "ChildExecFailed".
- test_runner.zig: Read result before dropValue, then check for leaks.
- module_env_test.zig: Same reordering for serialization eval tests.
- snapshot_tool/main.zig: Fix snapshotRocRealloc to use rawAlloc+rawFree
  instead of realloc to preserve alignment correctness.
- Add rc_box snapshot tests for Box.box/unbox refcount scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cycle detection to lowerStrInspectNominal to prevent infinite
recursion during compile-time inspection code generation for types
like Node := [Text(Str), Element(Str, List(Node))]. When a nominal
type is encountered again while already being inspected, emit "..."
instead of recursing infinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…preter shim

- evalBoxBox: Use getBoxInfo and allocRocDataWithRc with correct
  contains_refcounted flag, matching the dev backend. Previously
  hardcoded elements_refcounted=false, causing alloc/dealloc header
  size mismatch when the box contained refcounted types (e.g. Str),
  leading to crashes in the debug allocator.
- interpreter_shim: Check getExpectMessage() after successful eval
  so inline expect failures are reported as crashes instead of
  silently succeeding.
- TestEnv: Add doc comment on LeakError to satisfy zig lints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…isited sets

The monomorphizer's scanExprInternal bypassed the visited_exprs dedup
check entirely when active_bindings was set, allowing unbounded
re-scanning through lookup→def→call chains. This caused hangs on
complex programs like glue specs with many mutually-referencing
functions.

Replace the crude binding_scan_depth > 512 depth limit with a proper
per-convergence-loop visited set keyed by ContextExprKey (not plain
expr key), so the same source expression remains visitable under
different proc-inst contexts within one iteration. Each convergence
loop (scanProcInst, completeTemplateBindingsFromBody) owns a local
visited map that resets between iterations, preserving binding
propagation while preventing cycles within a single traversal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lowerDotAccess lowered da.receiver twice for non-is_eq method calls:
once inside the structural_eq block and again after it. For method
chains like a.m1().m2().m3(), this caused 2^depth recursive lowering
calls, hanging indefinitely on ZigGlue's deeply nested match/method
chains.

Hoist the receiver lowering before the structural_eq block so it is
computed exactly once and reused by both paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plan is intentionally left here for a follow up PR... I didn't want to lost this context -- I probably could have made GH Issue instead but this seemed ok.

lukewilliamboswell and others added 6 commits March 26, 2026 22:37
…ProcInst

getProcInst returns a pointer into the proc_insts ArrayList backing buffer.
scanProcInst recursively discovers new proc instances and appends them,
which can reallocate the buffer and invalidate the pointer. Capture
fn_monotype_module_idx by value before the scan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The parallel test runner was only comparing expected values for
inspect_str tests. For all other types (dec_val, bool_val, str_val,
f32_val, f64_val, and all integer types), the value_ok check was
always true — meaning wrong expected values would silently pass as
long as all backends agreed with each other.

Add matchesInspectOutput() on Expected with type-appropriate comparison
for every variant: RocDec formatting for decimals, epsilon tolerance for
floats, quote-stripping/unescaping for strings, and numeric string
equality for integers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants