**Important:** Review skipped. Draft detected; please check the settings in the CodeRabbit UI.
📝 Walkthrough

The changes implement mutation-tracking infrastructure and optimization fast-paths across VM built-in types. A version counter system is introduced to the dictionary layer, enabling downstream mutation detection. Iterator types gain fast-path accessors, and PyFunction/PyType receive version management capabilities to support specialization and caching optimizations.
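The version-counter idea described above can be sketched in isolation. This is a hedged illustration, not RustPython's actual code: the type name `VersionedDict` and the method `mutate` are hypothetical, but the pattern (bump on every mutation, compare against a recorded value to validate a cache) is the one the walkthrough describes.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative sketch of the mutation-tracking pattern: every mutation
// bumps a counter, and a cached lookup stays valid only while the counter
// still matches the value recorded when the cache was filled.
pub struct VersionedDict {
    version: AtomicU64,
    // real dictionary entries elided
}

impl VersionedDict {
    pub fn new() -> Self {
        Self {
            version: AtomicU64::new(0),
        }
    }

    /// Lock-free read used by cache-validation fast paths.
    pub fn version(&self) -> u64 {
        self.version.load(Ordering::Acquire)
    }

    /// Called after every mutation so downstream caches can detect it.
    pub fn mutate(&self) {
        // ... perform the actual mutation here ...
        self.version.fetch_add(1, Ordering::Release);
    }
}

fn main() {
    let d = VersionedDict::new();
    let cached = d.version(); // recorded at specialization time
    d.mutate();
    assert_ne!(d.version(), cached); // the cache must be re-validated
}
```

A specialized instruction would record `d.version()` when it fills its cache and compare on each execution, deoptimizing when the values differ.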
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
**Code has been automatically formatted.** To pull the formatting changes into your local branch:

```shell
git pull origin specialization
```
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/vm/src/builtins/range.rs (1)
**663-674**: Avoid duplicating step logic between `next_fast` and `fast_next`. Both methods currently implement the same increment/check path. Please keep one source of truth to prevent divergence.
♻️ Suggested simplification

```diff
 impl PyRangeIterator {
     /// Fast path for FOR_ITER specialization. Returns the next isize value
     /// without allocating PyInt or PyIterReturn.
     pub(crate) fn fast_next(&self) -> Option<isize> {
-        let index = self.index.fetch_add(1);
-        if index < self.length {
-            Some(self.start + (index as isize) * self.step)
-        } else {
-            None
-        }
+        self.next_fast()
     }
 }
```

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/range.rs` around lines 663 - 674, The iteration logic in PyRangeIterator is duplicated between next_fast and fast_next (the fast path) — extract the shared increment/check logic into a single helper (e.g., a private method on PyRangeIterator like next_index_or_none or advance_and_get_index) and have both next_fast and fast_next call that helper to compute the next index/value; ensure the helper uses self.index.fetch_add(1), compares to self.length, and returns either the computed isize value (start + index * step) or None so both next_fast and fast_next reuse the same implementation and avoid divergence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 611-621: The get_version_for_current_state implementation allows
FUNC_VERSION_COUNTER to wrap and reuse version tags; update it to atomically
advance FUNC_VERSION_COUNTER without ever returning a recycled nonzero version
by using an atomic compare-and-swap/loop (or Atomics::fetch_update) that: loads
the current counter (FUNC_VERSION_COUNTER), if it's 0 or u32::MAX treat as
exhausted and return 0, otherwise compute next = current.wrapping_add(1) and
attempt compare_exchange to set it to next; once you successfully set the global
counter, store that nonzero new version into self.func_version and return it;
reference get_version_for_current_state, FUNC_VERSION_COUNTER, and
self.func_version for locating the change.
In `@crates/vm/src/dict_inner.rs`:
- Around line 263-270: The version() accessor and bump_version() updater must
use proper Acquire/Release ordering and bump_version must be called while
holding the dict write lock: change version() to load with Acquire and
bump_version() to fetch_add with Release (function names: version and
bump_version in dict_inner.rs), and move every call to bump_version so it
executes inside the same write-lock critical section where the dictionary
mutation occurs (the callers in this repo include the mutation sites referenced
in frame.rs that read the version lock-free for LOAD_GLOBAL caching); ensure the
write lock remains held across the mutation and the bump_version call so the
Release store synchronizes with readers that use Acquire.
---
Nitpick comments:
In `@crates/vm/src/builtins/range.rs`:
- Around line 663-674: The iteration logic in PyRangeIterator is duplicated
between next_fast and fast_next (the fast path) — extract the shared
increment/check logic into a single helper (e.g., a private method on
PyRangeIterator like next_index_or_none or advance_and_get_index) and have both
next_fast and fast_next call that helper to compute the next index/value; ensure
the helper uses self.index.fetch_add(1), compares to self.length, and returns
either the computed isize value (start + index * step) or None so both next_fast
and fast_next reuse the same implementation and avoid divergence.
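As a concrete illustration of the refactor this prompt describes, here is a minimal standalone sketch. The field names (`index`, `length`, `start`, `step`) follow the review comment, but the real `PyRangeIterator` differs; this only demonstrates the shared-helper shape.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

struct RangeIter {
    index: AtomicUsize,
    length: usize,
    start: isize,
    step: isize,
}

impl RangeIter {
    /// Single source of truth for the increment/check logic.
    fn advance(&self) -> Option<isize> {
        let index = self.index.fetch_add(1, Relaxed);
        if index < self.length {
            Some(self.start + (index as isize) * self.step)
        } else {
            None
        }
    }

    /// Existing fast path delegates to the shared helper.
    fn next_fast(&self) -> Option<isize> {
        self.advance()
    }

    /// FOR_ITER specialization path delegates to the same helper.
    fn fast_next(&self) -> Option<isize> {
        self.advance()
    }
}

fn main() {
    // Equivalent of range(5, 11, 2): yields 5, 7, 9, then exhausts.
    let it = RangeIter {
        index: AtomicUsize::new(0),
        length: 3,
        start: 5,
        step: 2,
    };
    assert_eq!(it.next_fast(), Some(5));
    assert_eq!(it.fast_next(), Some(7));
    assert_eq!(it.next_fast(), Some(9));
    assert_eq!(it.fast_next(), None);
}
```

Either public method can be removed or inlined later without touching the iteration logic itself.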
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- crates/vm/src/builtins/dict.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/list.rs
- crates/vm/src/builtins/object.rs
- crates/vm/src/builtins/property.rs
- crates/vm/src/builtins/range.rs
- crates/vm/src/builtins/tuple.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/dict_inner.rs
- crates/vm/src/frame.rs
```rust
pub fn get_version_for_current_state(&self) -> u32 {
    let v = self.func_version.load(Relaxed);
    if v != 0 {
        return v;
    }
    let new_v = FUNC_VERSION_COUNTER.fetch_add(1, Relaxed);
    if new_v == 0 {
        return 0; // Counter overflow
    }
    self.func_version.store(new_v, Relaxed);
    new_v
}
```
🧩 Analysis chain
🌐 Web query:
Rust std::sync::atomic::AtomicU32 fetch_add overflow behavior wrapping
💡 Result:
std::sync::atomic::AtomicU32::fetch_add(val, order) performs wrapping (modulo 2³²) addition on overflow and returns the previous value (it does not panic on overflow). (doc.rust-lang.org)
So if the atomic currently holds u32::MAX, then fetch_add(1, …) will store 0 (wraparound) and return u32::MAX. (doc.rust-lang.org)
Citation:
- https://doc.rust-lang.org/beta/std/sync/atomic/struct.AtomicU32.html
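The wrapping behavior is easy to confirm with a standalone check (not project code):

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

fn main() {
    // fetch_add performs wrapping (mod 2^32) addition and returns the
    // previous value, so incrementing past u32::MAX silently restarts at 0.
    let counter = AtomicU32::new(u32::MAX);
    let prev = counter.fetch_add(1, Relaxed);
    assert_eq!(prev, u32::MAX); // old value is returned
    assert_eq!(counter.load(Relaxed), 0); // stored value wrapped around
}
```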
Prevent version-tag reuse after counter overflow.
fetch_add wraps on overflow. The current check (if new_v == 0) only catches exhaustion on the first wrap; after u32::MAX increments, the counter cycles back and old version tags become reusable. This violates the "no new valid versions after exhaustion" contract.
🛠️ Suggested fix (checked increment loop)
```diff
 pub fn get_version_for_current_state(&self) -> u32 {
     let v = self.func_version.load(Relaxed);
     if v != 0 {
         return v;
     }
-    let new_v = FUNC_VERSION_COUNTER.fetch_add(1, Relaxed);
-    if new_v == 0 {
-        return 0; // Counter overflow
-    }
-    self.func_version.store(new_v, Relaxed);
-    new_v
+    loop {
+        let current = FUNC_VERSION_COUNTER.load(Relaxed);
+        let Some(next) = current.checked_add(1) else {
+            return 0; // Exhausted permanently
+        };
+        if FUNC_VERSION_COUNTER
+            .compare_exchange_weak(current, next, Relaxed, Relaxed)
+            .is_ok()
+        {
+            self.func_version.store(current, Relaxed);
+            return current;
+        }
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/function.rs` around lines 611 - 621, The
get_version_for_current_state implementation allows FUNC_VERSION_COUNTER to wrap
and reuse version tags; update it to atomically advance FUNC_VERSION_COUNTER
without ever returning a recycled nonzero version by using an atomic
compare-and-swap/loop (or Atomics::fetch_update) that: loads the current counter
(FUNC_VERSION_COUNTER), if it's 0 or u32::MAX treat as exhausted and return 0,
otherwise compute next = current.wrapping_add(1) and attempt compare_exchange to
set it to next; once you successfully set the global counter, store that nonzero
new version into self.func_version and return it; reference
get_version_for_current_state, FUNC_VERSION_COUNTER, and self.func_version for
locating the change.
```rust
pub fn version(&self) -> u64 {
    self.version.load(Relaxed)
}

/// Bump the version counter after any mutation.
fn bump_version(&self) {
    self.version.fetch_add(1, Relaxed);
}
```
🧩 Analysis chain
🏁 Script executed (repository: RustPython/RustPython):

```shell
# 1) Show where version bumps happen relative to lock scopes.
rg -n -C4 'bump_version\(' crates/vm/src/dict_inner.rs
```
🏁 Script executed:

```shell
# 2) Show consumers of dict version values (to validate whether reads are lock-free).
rg -n -C4 '\.version\(\)' crates/vm/src
```
🏁 Script executed:

```shell
# 3) Get the full context around lines 263-270 and surrounding code.
head -n 300 crates/vm/src/dict_inner.rs | tail -n +250
```
Use Acquire/Release ordering for version field and perform version bumps under the write lock.
The version() and bump_version() methods use Relaxed ordering. Currently, bump_version() is called after the write lock is released at lines 319, 383, 441, 465, 499, 536, 738, and 756. Meanwhile, version() is read lock-free in frame.rs (lines 4644, 4680, 7076) to validate LOAD_GLOBAL specialization caches. With Relaxed ordering, a reader may see stale version values after mutations complete, causing cache validation to incorrectly accept invalidated caches.
🔧 Suggested direction
```diff
-use core::sync::atomic::{AtomicU64, Ordering::Relaxed};
+use core::sync::atomic::{AtomicU64, Ordering::{Acquire, Release}};
 ...
 pub fn version(&self) -> u64 {
-    self.version.load(Relaxed)
+    self.version.load(Acquire)
 }
 ...
 fn bump_version(&self) {
-    self.version.fetch_add(1, Relaxed);
+    self.version.fetch_add(1, Release);
 }
```

Also move each `bump_version()` call into the same write-lock critical section as the corresponding mutation to ensure version updates are ordered with respect to dict mutations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/dict_inner.rs` around lines 263 - 270, The version() accessor
and bump_version() updater must use proper Acquire/Release ordering and
bump_version must be called while holding the dict write lock: change version()
to load with Acquire and bump_version() to fetch_add with Release (function
names: version and bump_version in dict_inner.rs), and move every call to
bump_version so it executes inside the same write-lock critical section where
the dictionary mutation occurs (the callers in this repo include the mutation
sites referenced in frame.rs that read the version lock-free for LOAD_GLOBAL
caching); ensure the write lock remains held across the mutation and the
bump_version call so the Release store synchronizes with readers that use
Acquire.
- Add debug_assert preconditions in invoke_exact_args
- Add get_version_for_current_state() for lazy version reassignment after func_version invalidation
- Document NEXT_TYPE_VERSION overflow policy

- COMPARE_OP: CompareOpInt, CompareOpFloat, CompareOpStr
- TO_BOOL: ToBoolBool, ToBoolInt, ToBoolNone, ToBoolList, ToBoolStr
- FOR_ITER: ForIterRange, ForIterList, ForIterTuple with fast_next()
- LOAD_GLOBAL: LoadGlobalModule, LoadGlobalBuiltin with dict version guard
- Add version counter to Dict for mutation tracking

…ation
- BinaryOpSubscrListInt, BinaryOpSubscrTupleInt, BinaryOpSubscrDict
- ContainsOpDict, ContainsOpSet
- UnpackSequenceTwoTuple, UnpackSequenceTuple, UnpackSequenceList
- StoreAttrInstanceValue with type_version guard
- Deoptimize bytecode for marshal serialization (original_bytes)
- Separate co_code (deoptimized) from _co_code_adaptive (quickened)

…Isinstance, CallType1 specialization

…al, ForIterGen, CallListAppend specialization

- LoadAttrNondescriptorNoDict: plain class attr on objects without dict
- LoadAttrNondescriptorWithValues: plain class attr with dict fallback
- LoadAttrClass: handler for type attribute access (not yet routed)
- CallMethodDescriptorNoargs: method descriptor with 0 args
- CallMethodDescriptorO: method descriptor with 1 arg
- CallMethodDescriptorFast: method descriptor with multiple args
- Use HAS_DICT flag instead of obj.dict().is_some() for method/nondescriptor routing

- CallBuiltinFast: native function calls with arbitrary positional args
- CallNonPyGeneral: fallback for unmatched callables (custom __call__, etc.)
- All builtin function calls now specialize (CallBuiltinFast as default)
- specialize_call now always produces a specialized instruction

- SendGen: direct coro.send() for generator/coroutine receivers
- Add adaptive counter to Send instruction
- specialize_send checks builtin_coro for PyGenerator/PyCoroutine

- LoadAttrSlot: direct obj.get_slot(offset) bypassing descriptor protocol
- StoreAttrSlot: direct obj.set_slot(offset, value) bypassing descriptor protocol
- Detect PyMemberDescriptor with MemberGetter::Offset in specialize_load_attr/store_attr
- Cache slot offset in cache_base+3

…ltinFastWithKeywords, CallMethodDescriptorFastWithKeywords specialization

Fix LoadSuperAttrMethod to push unbound descriptor + self instead of bound method + self, which caused double self binding. Fix LoadSuperAttrAttr obj_arg condition for classmethod detection.

Remove unnecessary CPython references, FIXME→TODO, redundant Note: prefix, and "Same as" cross-references.
📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

- [x] lib: cpython/Lib/doctest.py
  - dependencies:
  - dependent tests: (33 tests)