Specialized ops #7301

Draft
youknowone wants to merge 21 commits into RustPython:main from youknowone:specialization

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 1, 2026

Summary by CodeRabbit

  • Refactor
    • Added internal version tracking to dictionaries and functions
    • Optimized iteration performance for built-in collections (lists, tuples, ranges) with new fast-path methods
    • Enhanced VM type versioning system with improved counter management and property accessor methods

@coderabbitai
Contributor

coderabbitai bot commented Mar 1, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes implement mutation-tracking infrastructure and optimization fast-paths across VM built-in types. A version counter system is introduced to the dictionary layer, enabling downstream mutation detection. Iterator types gain fast-path accessors, and PyFunction/PyType receive version management capabilities to support specialization and caching optimizations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dictionary Mutation Tracking: crates/vm/src/dict_inner.rs, crates/vm/src/builtins/dict.rs | Added monotonically increasing version counter (AtomicU64) to track dictionary mutations. Dict now maintains a version field with a public accessor, an internal bump_version() helper, and instrumentation across insert, clear, delete, pop, setdefault, and related mutation paths. PyDict delegates via a new version() method. |
| Iterator Fast-Paths: crates/vm/src/builtins/list.rs, crates/vm/src/builtins/tuple.rs, crates/vm/src/builtins/range.rs | Added pub(crate) fast_next() methods to PyListIterator, PyTupleIterator, and PyRangeIterator. These allocation-free fast-path entry points enable FOR_ITER specialization: the list and tuple variants return an Option of the next element, the range variant returns Option<isize>. Existing next() implementations are unchanged. |
| Version Management: crates/vm/src/builtins/function.rs, crates/vm/src/builtins/type.rs | PyFunction gains pub get_version_for_current_state(), which manages the global FUNC_VERSION_COUNTER with overflow handling. PyType introduces a NEXT_TYPE_VERSION static counter and updates assign_version_tag() to return 0 on counter exhaustion via checked_add. Added runtime debug assertions in invoke_exact_args to validate exact-args fast-path preconditions. |
| API Expansion: crates/vm/src/builtins/object.rs, crates/vm/src/builtins/property.rs | Visibility change: slot_setattro on PyBaseObject is now pub(crate) (was private). PyProperty gains a new pub(crate) get_fget() accessor method mirroring the existing fget getter. |
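
A minimal sketch of the version-guard idea summarized in the table, under stated assumptions: `VersionedDict` and `GlobalCache` are hypothetical names made up for illustration and are not RustPython's `Dict` or its LOAD_GLOBAL cache. It is single-threaded on purpose (a plain `u64` instead of an atomic) and only shows how bumping a counter on every mutation lets a cached lookup be revalidated with one integer comparison.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary that counts its own mutations.
pub struct VersionedDict {
    entries: HashMap<String, i64>,
    version: u64,
}

impl VersionedDict {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), version: 0 }
    }

    /// Number of mutations performed so far.
    pub fn version(&self) -> u64 {
        self.version
    }

    pub fn insert(&mut self, key: &str, value: i64) {
        self.entries.insert(key.to_owned(), value);
        self.version += 1; // every mutation invalidates outstanding caches
    }

    pub fn remove(&mut self, key: &str) -> Option<i64> {
        let removed = self.entries.remove(key);
        if removed.is_some() {
            self.version += 1;
        }
        removed
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.entries.get(key).copied()
    }
}

/// Hypothetical one-entry cache guarded by the dictionary version.
pub struct GlobalCache {
    key: String,
    seen_version: u64,
    value: Option<i64>,
}

impl GlobalCache {
    pub fn new(dict: &VersionedDict, key: &str) -> Self {
        Self { key: key.to_owned(), seen_version: dict.version(), value: dict.get(key) }
    }

    /// Returns the value and whether the cached entry was still valid.
    pub fn load(&mut self, dict: &VersionedDict) -> (Option<i64>, bool) {
        if dict.version() == self.seen_version {
            return (self.value, true); // guard passed: no hash lookup needed
        }
        // Guard failed: fall back to a real lookup and refresh the cache.
        self.seen_version = dict.version();
        self.value = dict.get(&self.key);
        (self.value, false)
    }
}
```

The guard is deliberately coarse: any mutation of the dictionary invalidates the cache, even one that touches an unrelated key. That trades some extra misses for a check that costs a single comparison.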

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hops through versions, fast paths now clear,
Mutations tracked without a fear,
Iterators swift, no waste in sight,
Version tags keep caches bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Specialized ops' is vague and generic, using non-descriptive terms that don't convey meaningful information about the specific changes in the changeset. | Use a more descriptive title that explains which operations are being specialized or what the main optimization goal is, such as 'Add version tracking and fast-path accessors for iteration specialization'. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@youknowone youknowone force-pushed the specialization branch 3 times, most recently from 8fc678d to 57cd6b2 Compare March 2, 2026 04:16
@github-actions
Contributor

github-actions bot commented Mar 2, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all

Please pull the latest changes before pushing again:

    git pull origin specialization

@youknowone youknowone marked this pull request as ready for review March 2, 2026 07:53
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
crates/vm/src/builtins/range.rs (1)

663-674: Avoid duplicating step logic between next_fast and fast_next.

Both methods currently implement the same increment/check path. Please keep one source of truth to prevent divergence.

♻️ Suggested simplification
 impl PyRangeIterator {
     /// Fast path for FOR_ITER specialization. Returns the next isize value
     /// without allocating PyInt or PyIterReturn.
     pub(crate) fn fast_next(&self) -> Option<isize> {
-        let index = self.index.fetch_add(1);
-        if index < self.length {
-            Some(self.start + (index as isize) * self.step)
-        } else {
-            None
-        }
+        self.next_fast()
     }
 }

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
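
One way to keep a single source of truth, sketched under stated assumptions: `RangeIter` is a hypothetical, simplified stand-in for `PyRangeIterator`, it uses `std::sync::atomic::AtomicUsize` where the snippet above calls a one-argument `fetch_add`, and `next_boxed` is an invented placeholder for the allocating slow path. Only the field names (`index`, `start`, `step`, `length`) come from the review snippet.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical simplified range iterator (not RustPython's type).
pub struct RangeIter {
    index: AtomicUsize,
    start: isize,
    step: isize,
    length: usize,
}

impl RangeIter {
    pub fn new(start: isize, step: isize, length: usize) -> Self {
        Self { index: AtomicUsize::new(0), start, step, length }
    }

    /// The only place that advances the index and computes the value.
    fn next_fast(&self) -> Option<isize> {
        let index = self.index.fetch_add(1, Ordering::Relaxed);
        if index < self.length {
            Some(self.start + (index as isize) * self.step)
        } else {
            None
        }
    }

    /// Allocation-free entry point for a specialized loop: pure delegation.
    pub fn fast_next(&self) -> Option<isize> {
        self.next_fast()
    }

    /// Placeholder for the generic path, which wraps the same value in an
    /// allocated object (a Box here, standing in for a Python int).
    pub fn next_boxed(&self) -> Option<Box<isize>> {
        self.next_fast().map(Box::new)
    }
}
```

Because both public methods go through `next_fast`, a later change to the stepping rule cannot make the fast path and the generic path disagree.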

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/range.rs` around lines 663 - 674, The iteration logic
in PyRangeIterator is duplicated between next_fast and fast_next (the fast path)
— extract the shared increment/check logic into a single helper (e.g., a private
method on PyRangeIterator like next_index_or_none or advance_and_get_index) and
have both next_fast and fast_next call that helper to compute the next
index/value; ensure the helper uses self.index.fetch_add(1), compares to
self.length, and returns either the computed isize value (start + index * step)
or None so both next_fast and fast_next reuse the same implementation and avoid
divergence.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c315033 and b870090.

📒 Files selected for processing (10)
  • crates/vm/src/builtins/dict.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/list.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/builtins/property.rs
  • crates/vm/src/builtins/range.rs
  • crates/vm/src/builtins/tuple.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/dict_inner.rs
  • crates/vm/src/frame.rs

Comment on lines 611 to 621
    pub fn get_version_for_current_state(&self) -> u32 {
        let v = self.func_version.load(Relaxed);
        if v != 0 {
            return v;
        }
        let new_v = FUNC_VERSION_COUNTER.fetch_add(1, Relaxed);
        if new_v == 0 {
            return 0; // Counter overflow
        }
        self.func_version.store(new_v, Relaxed);
        new_v

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Rust std::sync::atomic::AtomicU32 fetch_add overflow behavior wrapping

💡 Result:

std::sync::atomic::AtomicU32::fetch_add(val, order) performs wrapping (modulo 2³²) addition on overflow and returns the previous value (it does not panic on overflow). (doc.rust-lang.org)

So if the atomic currently holds u32::MAX, then fetch_add(1, …) will store 0 (wraparound) and return u32::MAX. (doc.rust-lang.org)


Prevent version-tag reuse after counter overflow.

fetch_add wraps on overflow and returns the previous value. The current check (if new_v == 0) only catches the single call that observes the wrapped counter; the very next call returns 1 again, so previously issued version tags are handed out a second time. This violates the "no new valid versions after exhaustion" contract.

🛠️ Suggested fix (checked increment loop)
 pub fn get_version_for_current_state(&self) -> u32 {
     let v = self.func_version.load(Relaxed);
     if v != 0 {
         return v;
     }
-    let new_v = FUNC_VERSION_COUNTER.fetch_add(1, Relaxed);
-    if new_v == 0 {
-        return 0; // Counter overflow
-    }
-    self.func_version.store(new_v, Relaxed);
-    new_v
+    loop {
+        let current = FUNC_VERSION_COUNTER.load(Relaxed);
+        let Some(next) = current.checked_add(1) else {
+            return 0; // Exhausted permanently
+        };
+        if FUNC_VERSION_COUNTER
+            .compare_exchange_weak(current, next, Relaxed, Relaxed)
+            .is_ok()
+        {
+            self.func_version.store(current, Relaxed);
+            return current;
+        }
+    }
 }
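
A variant of the same idea, sketched with `AtomicU32::fetch_update` from the standard library in place of the hand-written `compare_exchange_weak` loop. `VersionAllocator`, `starting_at`, and `allocate` are hypothetical names for illustration; the sketch assumes that 0 is the reserved "no version" value and that a counter which has reached `u32::MAX` counts as exhausted, so that last value is never issued.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical version allocator that never wraps around.
pub struct VersionAllocator {
    next: AtomicU32,
}

impl VersionAllocator {
    /// `first` is the first version that will be handed out.
    pub const fn starting_at(first: u32) -> Self {
        Self { next: AtomicU32::new(first) }
    }

    /// Returns a fresh nonzero version, or 0 once the counter is exhausted.
    pub fn allocate(&self) -> u32 {
        self.next
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |current| {
                if current == 0 {
                    None // 0 is reserved, treat it as exhausted
                } else {
                    current.checked_add(1) // None at u32::MAX: stop for good
                }
            })
            // On success fetch_update yields the previous value, which is the
            // version issued to this caller. On failure nothing was stored.
            .unwrap_or(0)
    }
}
```

Once the closure returns `None` the counter is left untouched, so every later call also returns 0 and no tag can be reissued.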
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 611 - 621, The
get_version_for_current_state implementation allows FUNC_VERSION_COUNTER to wrap
and reuse version tags; update it to atomically advance FUNC_VERSION_COUNTER
without ever returning a recycled nonzero version by using an atomic
compare-and-swap loop (or AtomicU32::fetch_update) that: loads the current counter
(FUNC_VERSION_COUNTER), if it's 0 or u32::MAX treat as exhausted and return 0,
otherwise compute next = current.wrapping_add(1) and attempt compare_exchange to
set it to next; once you successfully set the global counter, store that nonzero
new version into self.func_version and return it; reference
get_version_for_current_state, FUNC_VERSION_COUNTER, and self.func_version for
locating the change.

Comment on lines +263 to +270
    pub fn version(&self) -> u64 {
        self.version.load(Relaxed)
    }

    /// Bump the version counter after any mutation.
    fn bump_version(&self) {
        self.version.fetch_add(1, Relaxed);
    }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# 1) Show where version bumps happen relative to lock scopes.
rg -n -C4 'bump_version\(' crates/vm/src/dict_inner.rs

Repository: RustPython/RustPython

Length of output: 2663


🏁 Script executed:

# 2) Show consumers of dict version values (to validate whether reads are lock-free).
rg -n -C4 '\.version\(\)' crates/vm/src

Repository: RustPython/RustPython

Length of output: 3008


🏁 Script executed:

# 3) Get the full context around lines 263-270 and surrounding code
head -n 300 crates/vm/src/dict_inner.rs | tail -n +250

Repository: RustPython/RustPython

Length of output: 1941


Use Acquire/Release ordering for version field and perform version bumps under the write lock.

The version() and bump_version() methods use Relaxed ordering. Currently, bump_version() is called after the write lock is released at lines 319, 383, 441, 465, 499, 536, 738, and 756. Meanwhile, version() is read lock-free in frame.rs (lines 4644, 4680, 7076) to validate LOAD_GLOBAL specialization caches. With Relaxed ordering, a reader may see stale version values after mutations complete, causing cache validation to incorrectly accept invalidated caches.

🔧 Suggested direction
-use core::sync::atomic::{AtomicU64, Ordering::Relaxed};
+use core::sync::atomic::{AtomicU64, Ordering::{Acquire, Release}};
...
 pub fn version(&self) -> u64 {
-    self.version.load(Relaxed)
+    self.version.load(Acquire)
 }
...
 fn bump_version(&self) {
-    self.version.fetch_add(1, Relaxed);
+    self.version.fetch_add(1, Release);
 }

Also move each bump_version() call into the same write-lock critical section as the corresponding mutation to ensure version updates are ordered with respect to dict mutations.
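
The two parts of this recommendation (Acquire/Release ordering plus bumping inside the critical section) can be sketched together as follows. This is an illustration under assumptions, not the code in dict_inner.rs: `VersionedMap` is a hypothetical type built on `std::sync::RwLock` and `HashMap`, and the real dictionary's lock type and entry layout are not shown in this review.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical map whose version is bumped while the write lock is held.
pub struct VersionedMap {
    entries: RwLock<HashMap<String, i64>>,
    version: AtomicU64,
}

impl VersionedMap {
    pub fn new() -> Self {
        Self { entries: RwLock::new(HashMap::new()), version: AtomicU64::new(0) }
    }

    /// Lock-free read used by cache guards; pairs with the Release bump.
    pub fn version(&self) -> u64 {
        self.version.load(Ordering::Acquire)
    }

    pub fn insert(&self, key: &str, value: i64) -> Option<i64> {
        let mut entries = self.entries.write().unwrap();
        let previous = entries.insert(key.to_owned(), value);
        // Bump before the guard is dropped so the mutation and the version
        // change happen inside the same critical section.
        self.version.fetch_add(1, Ordering::Release);
        previous
    } // write lock released here, after the bump

    pub fn get(&self, key: &str) -> Option<i64> {
        self.entries.read().unwrap().get(key).copied()
    }
}
```

With the bump inside the lock, no other writer can slip in between a mutation and its version increment, which is what the cache-validation logic relies on.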

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/dict_inner.rs` around lines 263 - 270, The version() accessor
and bump_version() updater must use proper Acquire/Release ordering and
bump_version must be called while holding the dict write lock: change version()
to load with Acquire and bump_version() to fetch_add with Release (function
names: version and bump_version in dict_inner.rs), and move every call to
bump_version so it executes inside the same write-lock critical section where
the dictionary mutation occurs (the callers in this repo include the mutation
sites referenced in frame.rs that read the version lock-free for LOAD_GLOBAL
caching); ensure the write lock remains held across the mutation and the
bump_version call so the Release store synchronizes with readers that use
Acquire.

@youknowone youknowone marked this pull request as draft March 2, 2026 08:01
@youknowone youknowone force-pushed the specialization branch 3 times, most recently from 1b34652 to 9df2a06 Compare March 2, 2026 11:47
- Add debug_assert preconditions in invoke_exact_args
- Add get_version_for_current_state() for lazy version reassignment
  after func_version invalidation
- Document NEXT_TYPE_VERSION overflow policy
- COMPARE_OP: CompareOpInt, CompareOpFloat, CompareOpStr
- TO_BOOL: ToBoolBool, ToBoolInt, ToBoolNone, ToBoolList, ToBoolStr
- FOR_ITER: ForIterRange, ForIterList, ForIterTuple with fast_next()
- LOAD_GLOBAL: LoadGlobalModule, LoadGlobalBuiltin with dict version guard
- Add version counter to Dict for mutation tracking
…ation

- BinaryOpSubscrListInt, BinaryOpSubscrTupleInt, BinaryOpSubscrDict
- ContainsOpDict, ContainsOpSet
- UnpackSequenceTwoTuple, UnpackSequenceTuple, UnpackSequenceList
- StoreAttrInstanceValue with type_version guard
- Deoptimize bytecode for marshal serialization (original_bytes)
- Separate co_code (deoptimized) from _co_code_adaptive (quickened)
…al, ForIterGen, CallListAppend specialization
- LoadAttrNondescriptorNoDict: plain class attr on objects without dict
- LoadAttrNondescriptorWithValues: plain class attr with dict fallback
- LoadAttrClass: handler for type attribute access (not yet routed)
- CallMethodDescriptorNoargs: method descriptor with 0 args
- CallMethodDescriptorO: method descriptor with 1 arg
- CallMethodDescriptorFast: method descriptor with multiple args
- Use HAS_DICT flag instead of obj.dict().is_some() for method/nondescriptor routing
- CallBuiltinFast: native function calls with arbitrary positional args
- CallNonPyGeneral: fallback for unmatched callables (custom __call__, etc.)
- All builtin function calls now specialize (CallBuiltinFast as default)
- specialize_call now always produces a specialized instruction
- SendGen: direct coro.send() for generator/coroutine receivers
- Add adaptive counter to Send instruction
- specialize_send checks builtin_coro for PyGenerator/PyCoroutine
- LoadAttrSlot: direct obj.get_slot(offset) bypassing descriptor protocol
- StoreAttrSlot: direct obj.set_slot(offset, value) bypassing descriptor protocol
- Detect PyMemberDescriptor with MemberGetter::Offset in specialize_load_attr/store_attr
- Cache slot offset in cache_base+3
…ltinFastWithKeywords, CallMethodDescriptorFastWithKeywords specialization
Fix LoadSuperAttrMethod to push unbound descriptor + self
instead of bound method + self which caused double self binding.
Fix LoadSuperAttrAttr obj_arg condition for classmethod detection.
Remove unnecessary CPython references, FIXME→TODO,
redundant Note: prefix, and "Same as" cross-references.
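
For readers unfamiliar with the pattern these commits apply, here is a generic sketch of a self-specializing comparison site. It is an assumption-laden illustration, not RustPython's frame.rs: `Value`, `CompareOp`, `CompareSite`, and `generic_less_than` are hypothetical names, loosely modeled on the CompareOpInt and CompareOpStr variants listed above, and the adaptive counter is omitted for brevity.

```rust
/// Hypothetical runtime value with just two kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// Which form the instruction at this site currently has.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompareOp {
    Generic,
    Int,
    Str,
}

/// One comparison site in the bytecode, rewritten in place as types are seen.
pub struct CompareSite {
    pub op: CompareOp,
}

impl CompareSite {
    pub fn less_than(&mut self, a: &Value, b: &Value) -> Option<bool> {
        match (self.op, a, b) {
            // Specialized forms: the pattern itself is the type guard.
            (CompareOp::Int, Value::Int(x), Value::Int(y)) => Some(x < y),
            (CompareOp::Str, Value::Str(x), Value::Str(y)) => Some(x < y),
            // Unspecialized, or guard failed: pick a new form, then take the
            // generic path for this execution.
            _ => {
                self.op = match (a, b) {
                    (Value::Int(_), Value::Int(_)) => CompareOp::Int,
                    (Value::Str(_), Value::Str(_)) => CompareOp::Str,
                    _ => CompareOp::Generic,
                };
                generic_less_than(a, b)
            }
        }
    }
}

/// Slow path that handles every operand combination.
fn generic_less_than(a: &Value, b: &Value) -> Option<bool> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Some(x < y),
        (Value::Str(x), Value::Str(y)) => Some(x < y),
        _ => None, // unsupported mix in this sketch
    }
}
```

The specialized arms never produce a wrong answer for unexpected operands, because a failed guard always falls through to the generic path.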
@github-actions
Contributor

github-actions bot commented Mar 2, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/doctest.py
[ ] test: cpython/Lib/test/test_doctest (TODO: 6)

dependencies:

  • doctest

dependent tests: (33 tests)

  • doctest: test_builtin test_cmd test_code test_collections test_ctypes test_decimal test_deque test_descrtut test_difflib test_doctest test_doctest2 test_enum test_extcall test_generators test_getopt test_heapq test_http_cookies test_itertools test_listcomps test_math test_metaclass test_pep646_syntax test_pickle test_pickletools test_setcomps test_statistics test_syntax test_threading_local test_typing test_unpack test_unpack_ex test_weakref test_zipimport

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated
