Skip to content

Latest commit

 

History

History
226 lines (192 loc) · 9.58 KB

File metadata and controls

226 lines (192 loc) · 9.58 KB

program.md — BITR Solver Task Specification

Goal

Build a competitive BTOR2 model checker based on Bitvector Decision Diagrams (BVDDs). The solver should handle both bitvector and array benchmarks from HWMCC'24, with bitwuzla as the reference solver.

Architecture

  • bvdd/: Standalone BVDD library — value sets, terms, constraints, BVCs, BVDD nodes, HSC
  • bitr/: BTOR2 solver — parser, Canonicalize/Solve engine, theory blasting, BMC loop
  • See .claude/commands/bitr-expert.md for complete algorithmic reference

Current Status

Phase: 10+ — Correctness foundation landed; competitiveness push in flight Last updated: 2026-04-17

2026-04-17 — Correctness + Incremental BMC landed (commits 72d7d1a, 5c88523, d98856b)

See the plan at /Users/ck/.claude/plans/make-a-plan-on-melodic-dolphin.md and the detailed log entries in experiments.log.

  • A1 BMC bound-exhausted → Unknown (was Unsat) — eliminates wrong-UNSAT
  • A2 SAT witness verification (extract_witness_verified + BMC/kind downgrade)
  • A3 --verify flag: oracle cross-check with panic-on-mismatch
  • A4 Golden tests lock in 16 tiny benchmarks
  • A5 Reset-signal soundness fence (collect_non_reset_uses)
  • B Incremental SAT BMC module + BitBlaster snapshot API
  • C2 Bitblaster ITE memoization + 11 peepholes
  • Internal tier timeouts now capped by outer budget; Python sweep runner (scripts/run_sweep.py) enforces wall-clock via SIGKILL on timeout+5s
  • 120/120 tests passing, 0 clippy warnings
  • HWMCC BV 155-benchmark sweep measurement (in progress)

Open (post-sweep direction)

  • Default-on --incremental-bmc if sweep shows net gain
  • Phase D: inductive generalization for kind_only timeouts
  • AIG structural rewriting in bitblaster (larger than just ITE)
  • Cooperative timeout cancellation inside substitute_states (term DAG walk can blow budget; Python wrapper masks this today)

Phase 0 (complete)

  • Repository structure created
  • Cargo workspace configured
  • Core types defined (bvdd/src/types.rs)
  • ValueSet with tests (bvdd/src/valueset.rs)
  • Term table with hash consing (bvdd/src/term.rs)
  • Constraint table with simplification (bvdd/src/constraint.rs)
  • BVC manager stub (bvdd/src/bvc.rs)
  • BVDD node representation (bvdd/src/bvdd.rs)
  • HSC stub (bvdd/src/hsc.rs)
  • BTOR2 parser (bitr/src/btor2.rs)
  • CLI entry point (bitr/src/main.rs)
  • Tiny benchmarks committed
  • cargo test all passing (40 tests)
  • cargo clippy clean

Phase 1 (complete)

  • ValueSet: xor, insert, remove, min/max_element, from_fn, iter (11 tests)
  • TermTable: SubstAndFold with constant folding (4 tests)
  • Recursive term evaluator — all OpKind variants (12 tests)
  • Compiled term evaluator (bvdd/src/eval.rs) — register machine (10 tests)
  • Cross-validation: compiled vs recursive agree on all inputs

Phase 2 (complete)

  • Constraint hash consing via unique table (12 tests)
  • Restrict operation with short-circuit AND/OR
  • Predicate simplification (empty→FALSE, full→TRUE)
  • AND/OR normalization for better hash consing
  • collect_preds, has_no_predicates helpers
  • BVC Apply: structural (PRED constraints) and lifted (fresh vars) (6 tests)
  • BTOR2→BVC lifter (bitr/src/lifter.rs) — all operators + constants (5 tests)
  • 61 total tests, 0 clippy warnings

Phase 3 (complete)

  • BVDD unique table hash consing (terminals + decision nodes)
  • FALSE terminal (BvddId(0)) with FullyCanonical canonicity
  • Edge merging: same-child edges get OR'd value sets
  • Empty-edge filtering and single-FULL-edge collapse
  • Bottom-up flag computation (can_be_true, is_ground)
  • All-FALSE-children collapse to FALSE terminal
  • Direct-mapped computed cache (64K entries) for Solve results
  • restrict_to_valueset with cache integration
  • 74 total tests, 0 clippy warnings

Phase 4-5 (complete)

  • Solve algorithm with ground check, terminal→Canonicalize, decision traversal
  • Canonicalize with 6 phases: ground, SAT witness, UNSAT prune, no-preds, Decide, Restrict
  • Decide: predicate selection + coarsest partition
  • Generalized blast: narrowest-variable-first enumeration (budget 2^20)
  • Theory resolution for predicate-free constraints
  • BTOR2→BVC lifter with all operators, slice/ext special handling
  • CLI integration: parse → lift → solve → output sat/unsat
  • 9/9 tiny benchmarks correct (6 SAT, 3 UNSAT)
  • 82 total tests, 0 clippy warnings

Phase 8 (complete)

  • BMC loop: init/next substitution, step-by-step unrolling
  • Automatic sequential/combinational detection
  • State variable substitution with SubstAndFold
  • Fixed Solve ground check to verify value against target set
  • CLI: --bound N option for BMC depth
  • 11/11 benchmarks correct (9 combinational + 2 sequential: counter_sat, counter_unsat)
  • 82 total tests, 0 clippy warnings

Phase 6 (complete)

  • Boolean decomposition: find comparison subterms in 1-bit expressions, branch true/false
  • Compiled evaluator wired into generalized blast for fast single-variable enumeration
  • Domain budget: 2^28 with compiled evaluator (up from 2^20)
  • 4-stage theory resolution cascade: bool decomp → compiled blast → byte-blast → oracle

Phase 7 (complete)

  • SMT oracle (bitr/src/oracle.rs): term→SMT-LIB2, subprocess invocation, result parsing
  • Oracle caching via HashMap
  • Auto-detect solver binary (bitwuzla, boolector, z3)
  • Byte-blast oracle in solver: split widest var MSB byte, enumerate 256×LSB values
  • Adaptive bailout (25% threshold), max depth 4
  • Oracle integration via set_oracle() callback on SolverContext

Phase 9 (complete)

  • Array state tracking (Base / Write chain) in lifter
  • READ-over-WRITE expansion: READ(WRITE(a,w,v), r) → ITE(EQ(r,w), v, READ(a,r))
  • Nested write chains → nested ITE chains
  • Array input/state handling with sort detection
  • Array benchmarks: array_row_sat, array_row_unsat
  • 13/13 benchmarks correct (9 combinational + 2 sequential + 2 array)
  • 82 total tests, 0 clippy warnings

Completions (post-Phase 9)

  • Oracle wired into CLI: auto-detect bitwuzla/boolector/z3, --no-oracle flag
  • BMC term-for-term substitution: subst_term for symbolic next-state expressions
  • HSC cascade implemented: MSB→LSB 8-bit slice decomposition + tests
  • Multi-variable compiled blast: full enumeration via CompiledProgram (≤65536 total)
  • Counterexample extraction: extract_witness walks BVDD + records during blast
  • Witness display: --verbose prints variable assignments on SAT
  • Fixed Uext/Sext arity (1, not 2)
  • New benchmark: shift_reg_sat (4-bit shift register, symbolic BMC)
  • 14/14 benchmarks correct, 87 tests, 0 clippy warnings
  • Native CDCL bit-blasting via splr SAT solver (Tseitin encoding, gate memoization)
  • CDCL bitblaster moved to Stage 2b (before byte-blast/parallel blast)
  • 6,600x speedup on twovars_16_unsat (25.4s → 0.004s), 3,200x on wide_mul_32
  • Total perf time: 22.7s → 1.4s (16x improvement, near-parity with bitwuzla)
  • 35/35 benchmarks correct, 103 tests, 0 clippy warnings
  • K-induction prover (bitr/src/kinduction.rs): inductive safety proofs
  • HWMCC BV: 32/155 → 60/155 solved via k-induction + CDCL bitblaster in BMC

Implementation Phases

Phase 1: Value Sets + Terms (bvdd/)

  • Finalize ValueSet operations (tested)
  • Complete TermTable with SubstAndFold
  • Add term evaluation (recursive + compiled)
  • Unit tests for all operations

Phase 2: Constraints + BVCs (bvdd/)

  • Full constraint table with hash consing
  • BVC Apply operation (cross-product)
  • BVC from BTOR2 operators
  • Lifted variable definitions

Phase 3: BVDD Core (bvdd/)

  • Unique table for hash consing
  • Apply operation
  • Edge merging (same-child → bitmask OR)
  • HSC cascade construction
  • Computed cache (direct-mapped)
  • C API header generation

Phase 4: BTOR2 Lifting

  • BTOR2 parser → BVC construction
  • BVC → BVDD lifting (lazy mode)
  • Handle all BTOR2 operators
  • Validate on tiny benchmarks

Phase 5: Canonicalize/Solve

  • Implement 6-phase Canonicalize
  • Implement Solve (decision node traversal)
  • Decide (predicate selection + partition)
  • Restrict (constraint restriction)
  • Test on tiny sat/unsat benchmarks

Phase 6: Theory Resolution

  • Boolean decomposition (1-bit terms)
  • Generalized blast (variable enumeration)
  • Compiled evaluator
  • Domain budget checks

Phase 7: External Oracle

  • Bitwuzla integration via C FFI
  • Oracle caching
  • Byte-blast oracle (recursive splitting)
  • Budget management (depth, time)

Phase 8: BMC Loop

  • State unrolling
  • Init/Next substitution
  • Bad property checking per step
  • Counterexample extraction

Phase 9: Array Support

  • WRITE → READ-over-WRITE expansion
  • ITE chain construction for READ
  • Array-track benchmark testing

Phase 10+: Optimization

  • Profile hot paths
  • SIMD for ValueSet operations
  • Tune domain budgets
  • Maximize HWMCC'24 solved count

Evaluation

  • Timeout: 300s per benchmark
  • Memory: 8GB limit
  • Correctness: paramount — wrong answers are fatal
  • Primary metric: number of benchmarks solved correctly
  • Secondary metric: total solving time (PAR-2)

Experiment Protocol

Before each benchmark run:

  1. Note the change being evaluated
  2. Record git commit hash
  3. Run full benchmark suite with timeout
  4. Record results in results/ as CSV
  5. Append summary to experiments.log
  6. Compare against previous best and reference solver