Specialized ops by youknowone · Pull Request #7301 · RustPython/RustPython

youknowone · 2026-03-01T15:56:36Z

Summary by CodeRabbit

Refactor
- Added internal version tracking to dictionaries and functions
- Optimized iteration performance for built-in collections (lists, tuples, ranges) with new fast-path methods
- Enhanced VM type versioning system with improved counter management and property accessor methods

coderabbitai · 2026-03-01T15:56:47Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes implement mutation-tracking infrastructure and optimization fast-paths across VM built-in types. A version counter system is introduced to the dictionary layer, enabling downstream mutation detection. Iterator types gain fast-path accessors, and PyFunction/PyType receive version management capabilities to support specialization and caching optimizations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dictionary Mutation Tracking: `crates/vm/src/dict_inner.rs`, `crates/vm/src/builtins/dict.rs` | Added a monotonically increasing version counter (AtomicU64) to track dictionary mutations. Dict now maintains a version field with a public accessor, an internal bump_version() helper, and instrumentation across insert, clear, delete, pop, setdefault, and related mutation paths. PyDict delegates via a new version() method. |
| Iterator Fast-Paths: `crates/vm/src/builtins/list.rs`, `crates/vm/src/builtins/tuple.rs`, `crates/vm/src/builtins/range.rs` | Added pub(crate) fast_next() methods to PyListIterator, PyTupleIterator, and PyRangeIterator. These allocation-free fast-path entry points enable FOR_ITER specialization: the list and tuple variants return an `Option` of the next element, and the range variant returns `Option<isize>`. Existing next() implementations are unchanged. |
| Version Management: `crates/vm/src/builtins/function.rs`, `crates/vm/src/builtins/type.rs` | PyFunction gains pub get_version_for_current_state(), which manages the global FUNC_VERSION_COUNTER with overflow handling. PyType introduces a NEXT_TYPE_VERSION static counter and updates assign_version_tag() to return 0 on counter exhaustion via checked_add. Added runtime debug assertions in invoke_exact_args to validate exact-args fast-path preconditions. |
| API Expansion: `crates/vm/src/builtins/object.rs`, `crates/vm/src/builtins/property.rs` | Visibility change: slot_setattro on PyBaseObject is now pub(crate) (was private). PyProperty gains a new pub(crate) get_fget() accessor that mirrors the existing fget getter. |
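
To make the "Dictionary Mutation Tracking" row concrete, here is a minimal sketch of how a version counter can guard a cached lookup, in the spirit of the LOAD_GLOBAL dict version guard mentioned later in this PR. `VersionedDict` and `GlobalCache` are hypothetical names invented for illustration, and a plain `u64` stands in for the `AtomicU64` the PR describes; none of this is RustPython's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary that counts its mutations.
pub struct VersionedDict {
    entries: HashMap<String, i64>,
    version: u64,
}

impl VersionedDict {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), version: 0 }
    }

    /// Mutation count; it changes whenever the contents may have changed.
    pub fn version(&self) -> u64 {
        self.version
    }

    pub fn insert(&mut self, key: &str, value: i64) {
        self.entries.insert(key.to_owned(), value);
        self.version += 1; // every mutation path bumps the version
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.entries.get(key).copied()
    }
}

/// Hypothetical inline cache: one lookup result plus the version it was read at.
pub struct GlobalCache {
    seen_version: u64,
    value: Option<i64>,
}

impl GlobalCache {
    pub fn fill(dict: &VersionedDict, key: &str) -> Self {
        Self { seen_version: dict.version(), value: dict.get(key) }
    }

    /// Fast path: reuse the cached value only while the dict is unchanged.
    pub fn load(&mut self, dict: &VersionedDict, key: &str) -> Option<i64> {
        if self.seen_version != dict.version() {
            *self = Self::fill(dict, key); // slow path: refill after a mutation
        }
        self.value
    }
}
```

The guard is a single integer comparison, which is why a version counter is attractive for specialization: the hash lookup is skipped entirely while the version stays the same.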

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hops through versions, fast paths now clear,
Mutations tracked without a fear,
Iterators swift, no waste in sight,
Version tags keep caches bright! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Specialized ops' is vague and generic, using non-descriptive terms that don't convey meaningful information about the specific changes in the changeset. | Use a more descriptive title that explains which operations are being specialized or what the main optimization goal is, such as 'Add version tracking and fast-path accessors for iteration specialization'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

github-actions · 2026-03-02T04:22:27Z

Code has been automatically formatted

The code in this PR has been formatted using:

cargo fmt --all

Please pull the latest changes before pushing again:

git pull origin specialization

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

crates/vm/src/builtins/range.rs (1)

663-674: Avoid duplicating step logic between next_fast and fast_next.

Both methods currently implement the same increment/check path. Please keep one source of truth to prevent divergence.

♻️ Suggested simplification

 impl PyRangeIterator {
     /// Fast path for FOR_ITER specialization. Returns the next isize value
     /// without allocating PyInt or PyIterReturn.
     pub(crate) fn fast_next(&self) -> Option<isize> {
-        let index = self.index.fetch_add(1);
-        if index < self.length {
-            Some(self.start + (index as isize) * self.step)
-        } else {
-            None
-        }
+        self.next_fast()
     }
 }

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
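
A minimal sketch of the "one source of truth" shape this nitpick asks for, assuming a simplified iterator. `RangeIter` and `next_boxed` are hypothetical names, `std::sync::atomic::AtomicUsize` is used here in place of whatever atomic type the real `PyRangeIterator` holds, and `Box<isize>` merely stands in for an allocating return value.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified range iterator (not RustPython's actual type).
pub struct RangeIter {
    index: AtomicUsize,
    start: isize,
    step: isize,
    length: usize,
}

impl RangeIter {
    pub fn new(start: isize, step: isize, length: usize) -> Self {
        Self { index: AtomicUsize::new(0), start, step, length }
    }

    /// The only place that advances the index and computes the value.
    pub fn fast_next(&self) -> Option<isize> {
        let index = self.index.fetch_add(1, Ordering::Relaxed);
        if index < self.length {
            Some(self.start + (index as isize) * self.step)
        } else {
            None
        }
    }

    /// Allocating path layered on top of `fast_next`, so the two cannot diverge.
    pub fn next_boxed(&self) -> Option<Box<isize>> {
        self.fast_next().map(Box::new)
    }
}
```

Because the allocating path only wraps the result, any later change to the stepping rule has to be made in exactly one function.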

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/range.rs` around lines 663 - 674, The iteration logic
in PyRangeIterator is duplicated between next_fast and fast_next (the fast path)
— extract the shared increment/check logic into a single helper (e.g., a private
method on PyRangeIterator like next_index_or_none or advance_and_get_index) and
have both next_fast and fast_next call that helper to compute the next
index/value; ensure the helper uses self.index.fetch_add(1), compares to
self.length, and returns either the computed isize value (start + index * step)
or None so both next_fast and fast_next reuse the same implementation and avoid
divergence.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 611-621: The get_version_for_current_state implementation allows
FUNC_VERSION_COUNTER to wrap and reuse version tags; update it to atomically
advance FUNC_VERSION_COUNTER without ever returning a recycled nonzero version
by using an atomic compare-and-swap loop (or AtomicU32::fetch_update) that: loads
the current counter (FUNC_VERSION_COUNTER), if it's 0 or u32::MAX treat as
exhausted and return 0, otherwise compute next = current.wrapping_add(1) and
attempt compare_exchange to set it to next; once you successfully set the global
counter, store that nonzero new version into self.func_version and return it;
reference get_version_for_current_state, FUNC_VERSION_COUNTER, and
self.func_version for locating the change.

In `@crates/vm/src/dict_inner.rs`:
- Around line 263-270: The version() accessor and bump_version() updater must
use proper Acquire/Release ordering and bump_version must be called while
holding the dict write lock: change version() to load with Acquire and
bump_version() to fetch_add with Release (function names: version and
bump_version in dict_inner.rs), and move every call to bump_version so it
executes inside the same write-lock critical section where the dictionary
mutation occurs (the callers in this repo include the mutation sites referenced
in frame.rs that read the version lock-free for LOAD_GLOBAL caching); ensure the
write lock remains held across the mutation and the bump_version call so the
Release store synchronizes with readers that use Acquire.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c315033 and b870090.

📒 Files selected for processing (10)

crates/vm/src/builtins/dict.rs
crates/vm/src/builtins/function.rs
crates/vm/src/builtins/list.rs
crates/vm/src/builtins/object.rs
crates/vm/src/builtins/property.rs
crates/vm/src/builtins/range.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/builtins/type.rs
crates/vm/src/dict_inner.rs
crates/vm/src/frame.rs

coderabbitai · 2026-03-02T08:01:20Z

crates/vm/src/builtins/function.rs

+    pub fn get_version_for_current_state(&self) -> u32 {
+        let v = self.func_version.load(Relaxed);
+        if v != 0 {
+            return v;
+        }
+        let new_v = FUNC_VERSION_COUNTER.fetch_add(1, Relaxed);
+        if new_v == 0 {
+            return 0; // Counter overflow
+        }
+        self.func_version.store(new_v, Relaxed);
+        new_v


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Rust std::sync::atomic::AtomicU32 fetch_add overflow behavior wrapping

💡 Result:

std::sync::atomic::AtomicU32::fetch_add(val, order) performs wrapping (modulo 2³²) addition on overflow and returns the previous value; it does not panic on overflow. [1]

So if the atomic currently holds u32::MAX, then fetch_add(1, …) will store 0 (wraparound) and return u32::MAX. [1]

Citations:

1: https://doc.rust-lang.org/beta/std/sync/atomic/struct.AtomicU32.html

Prevent version-tag reuse after counter overflow.

fetch_add wraps on overflow. The current check (if new_v == 0) catches only the single call that observes 0 right after the wrap; the counter then advances to 1, and later calls hand out version tags that were already issued. This violates the "no new valid versions after exhaustion" contract.
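
The wraparound can be reproduced in isolation. The sketch below mirrors only the `fetch_add` step of the reviewed function; `alloc_wrapping` and its `counter` parameter are hypothetical names used for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Returns the previous counter value, like the reviewed allocator does.
/// `fetch_add` wraps silently at `u32::MAX` instead of panicking.
pub fn alloc_wrapping(counter: &AtomicU32) -> u32 {
    counter.fetch_add(1, Ordering::Relaxed)
}
```

Starting from `u32::MAX`, three consecutive calls return `u32::MAX`, then `0`, then `1`. Only the middle call would trip an `== 0` check, and the third call reissues tag `1`.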

🛠️ Suggested fix (checked increment loop)

 pub fn get_version_for_current_state(&self) -> u32 {
     let v = self.func_version.load(Relaxed);
     if v != 0 {
         return v;
     }
-    let new_v = FUNC_VERSION_COUNTER.fetch_add(1, Relaxed);
-    if new_v == 0 {
-        return 0; // Counter overflow
-    }
-    self.func_version.store(new_v, Relaxed);
-    new_v
+    loop {
+        let current = FUNC_VERSION_COUNTER.load(Relaxed);
+        let Some(next) = current.checked_add(1) else {
+            return 0; // Exhausted permanently
+        };
+        if FUNC_VERSION_COUNTER
+            .compare_exchange_weak(current, next, Relaxed, Relaxed)
+            .is_ok()
+        {
+            self.func_version.store(current, Relaxed);
+            return current;
+        }
+    }
 }
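
As an alternative to the hand-written compare-exchange loop, the same checked increment could be sketched with the standard library's `AtomicU32::fetch_update`, which retries internally. `alloc_checked` is a hypothetical free function, not part of the PR, and it assumes the counter is initialised to 1 so that 0 is never issued as a valid tag.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hands out each nonzero tag at most once and returns 0 once exhausted.
/// Assumption: the counter starts at 1, so 0 always means "no version".
pub fn alloc_checked(counter: &AtomicU32) -> u32 {
    counter
        // The closure returns None at u32::MAX, which leaves the counter
        // untouched and makes fetch_update return Err(previous value).
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| cur.checked_add(1))
        .unwrap_or(0)
}
```

Once the counter reaches `u32::MAX` it stays there, so every later call returns 0 and no previously issued tag is handed out again.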

coderabbitai · 2026-03-02T08:01:20Z

crates/vm/src/dict_inner.rs

+    pub fn version(&self) -> u64 {
+        self.version.load(Relaxed)
+    }
+
+    /// Bump the version counter after any mutation.
+    fn bump_version(&self) {
+        self.version.fetch_add(1, Relaxed);
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# 1) Show where version bumps happen relative to lock scopes.
rg -n -C4 'bump_version\(' crates/vm/src/dict_inner.rs

Repository: RustPython/RustPython

Length of output: 2663

🏁 Script executed:

# 2) Show consumers of dict version values (to validate whether reads are lock-free).
rg -n -C4 '\.version\(\)' crates/vm/src

Repository: RustPython/RustPython

Length of output: 3008

🏁 Script executed:

# 3) Get the full context around lines 263-270 and surrounding code
head -n 300 crates/vm/src/dict_inner.rs | tail -n +250

Repository: RustPython/RustPython

Length of output: 1941

Use Acquire/Release ordering for version field and perform version bumps under the write lock.

The version() and bump_version() methods use Relaxed ordering. Currently, bump_version() is called after the write lock is released at lines 319, 383, 441, 465, 499, 536, 738, and 756. Meanwhile, version() is read lock-free in frame.rs (lines 4644, 4680, 7076) to validate LOAD_GLOBAL specialization caches. With Relaxed ordering, a reader may see stale version values after mutations complete, causing cache validation to incorrectly accept invalidated caches.

🔧 Suggested direction

-use core::sync::atomic::{AtomicU64, Ordering::Relaxed};
+use core::sync::atomic::{AtomicU64, Ordering::{Acquire, Release}};
 ...
 pub fn version(&self) -> u64 {
-    self.version.load(Relaxed)
+    self.version.load(Acquire)
 }
 ...
 fn bump_version(&self) {
-    self.version.fetch_add(1, Relaxed);
+    self.version.fetch_add(1, Release);
 }

Also move each bump_version() call into the same write-lock critical section as the corresponding mutation to ensure version updates are ordered with respect to dict mutations.
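
The suggested direction can be sketched on a toy type: mutate under the write lock, bump the version with `Release` before the guard is dropped, and read the version lock-free with `Acquire`. `SharedDict` is a hypothetical stand-in built on `std::sync::RwLock`; the lock type and data layout in `dict_inner.rs` are not shown in this thread and may differ.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical dict: the map sits behind a lock, the version beside it.
pub struct SharedDict {
    entries: RwLock<HashMap<String, i64>>,
    version: AtomicU64,
}

impl SharedDict {
    pub fn new() -> Self {
        Self { entries: RwLock::new(HashMap::new()), version: AtomicU64::new(0) }
    }

    /// Lock-free read, as a cache guard would perform it.
    pub fn version(&self) -> u64 {
        self.version.load(Ordering::Acquire)
    }

    pub fn insert(&self, key: &str, value: i64) {
        let mut guard = self.entries.write().unwrap();
        guard.insert(key.to_owned(), value);
        // Bump while the write guard is still alive, so the mutation and the
        // version change belong to the same critical section.
        self.version.fetch_add(1, Ordering::Release);
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.entries.read().unwrap().get(key).copied()
    }
}
```

With this shape, a writer can never release the lock while the version still reports the pre-mutation state.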

- COMPARE_OP: CompareOpInt, CompareOpFloat, CompareOpStr
- TO_BOOL: ToBoolBool, ToBoolInt, ToBoolNone, ToBoolList, ToBoolStr
- FOR_ITER: ForIterRange, ForIterList, ForIterTuple with fast_next()
- LOAD_GLOBAL: LoadGlobalModule, LoadGlobalBuiltin with dict version guard
- Add version counter to Dict for mutation tracking
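
For readers unfamiliar with the pattern behind names like CompareOpInt, the following is a rough sketch of adaptive specialization: a generic instruction warms up, rewrites itself into a type-specialized form, and falls back when its guard fails. Everything here (`Value`, `Op`, `execute`, the warm-up values) is hypothetical and far simpler than the PR's actual instruction set.

```rust
/// Hypothetical dynamically typed value.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical instruction slot that can be rewritten in place.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    /// Generic form with a warm-up counter.
    CompareLt { warmup: u8 },
    /// Specialized form: assumes both operands are ints.
    CompareLtInt,
}

fn generic_lt(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => x < y,
        (Value::Str(x), Value::Str(y)) => x < y,
        _ => false, // a real VM would raise TypeError here
    }
}

/// Executes the instruction and may rewrite `op` (specialize or deoptimize).
pub fn execute(op: &mut Op, a: &Value, b: &Value) -> bool {
    match *op {
        Op::CompareLtInt => {
            if let (Value::Int(x), Value::Int(y)) = (a, b) {
                return x < y; // guard passed: fast path
            }
            *op = Op::CompareLt { warmup: 2 }; // guard failed: deoptimize
            generic_lt(a, b)
        }
        Op::CompareLt { warmup } => {
            let ints = matches!((a, b), (Value::Int(_), Value::Int(_)));
            *op = if ints && warmup == 0 {
                Op::CompareLtInt // warm and type-stable: specialize
            } else {
                Op::CompareLt { warmup: warmup.saturating_sub(1) }
            };
            generic_lt(a, b)
        }
    }
}
```

The specialized arm never needs to be correct for non-int operands; it only has to detect them cheaply and hand control back to the generic path.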

…ation

- BinaryOpSubscrListInt, BinaryOpSubscrTupleInt, BinaryOpSubscrDict
- ContainsOpDict, ContainsOpSet
- UnpackSequenceTwoTuple, UnpackSequenceTuple, UnpackSequenceList
- StoreAttrInstanceValue with type_version guard
- Deoptimize bytecode for marshal serialization (original_bytes)
- Separate co_code (deoptimized) from _co_code_adaptive (quickened)

- LoadAttrNondescriptorNoDict: plain class attr on objects without dict
- LoadAttrNondescriptorWithValues: plain class attr with dict fallback
- LoadAttrClass: handler for type attribute access (not yet routed)
- CallMethodDescriptorNoargs: method descriptor with 0 args
- CallMethodDescriptorO: method descriptor with 1 arg
- CallMethodDescriptorFast: method descriptor with multiple args
- Use HAS_DICT flag instead of obj.dict().is_some() for method/nondescriptor routing

- CallBuiltinFast: native function calls with arbitrary positional args
- CallNonPyGeneral: fallback for unmatched callables (custom __call__, etc.)
- All builtin function calls now specialize (CallBuiltinFast as default)
- specialize_call now always produces a specialized instruction

- LoadAttrSlot: direct obj.get_slot(offset) bypassing descriptor protocol
- StoreAttrSlot: direct obj.set_slot(offset, value) bypassing descriptor protocol
- Detect PyMemberDescriptor with MemberGetter::Offset in specialize_load_attr/store_attr
- Cache slot offset in cache_base+3
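
A minimal sketch of what a cached slot load could look like once the offset is known. `Obj`, `SlotCache`, `SlotLoad`, and `load_attr_slot` are hypothetical, and the type-version guard shown here is an assumption borrowed from the StoreAttrInstanceValue note in this PR; the commit message does not say which guard LoadAttrSlot actually uses.

```rust
/// Hypothetical object layout: a type version tag plus fixed slots.
pub struct Obj {
    pub type_version: u32,
    pub slots: Vec<Option<i64>>,
}

/// Hypothetical inline cache written when the instruction is specialized.
pub struct SlotCache {
    pub type_version: u32,
    pub offset: usize,
}

#[derive(Debug, PartialEq)]
pub enum SlotLoad {
    /// The slot held a value.
    Hit(i64),
    /// The slot exists but is empty (a real VM would raise AttributeError).
    Unset,
    /// Cached assumptions no longer hold; take the generic path.
    GuardFail,
}

/// Reads the slot directly by cached offset, skipping descriptor lookup.
pub fn load_attr_slot(obj: &Obj, cache: &SlotCache) -> SlotLoad {
    if obj.type_version != cache.type_version {
        return SlotLoad::GuardFail;
    }
    match obj.slots.get(cache.offset) {
        Some(Some(v)) => SlotLoad::Hit(*v),
        Some(None) => SlotLoad::Unset,
        None => SlotLoad::GuardFail,
    }
}
```

Keeping "unset" distinct from "guard failed" matters: the first is a legitimate program error, while the second only means the cache is stale.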

github-actions · 2026-03-02T12:18:47Z

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/doctest.py
[ ] test: cpython/Lib/test/test_doctest (TODO: 6)

dependencies:

doctest

dependent tests: (33 tests)

doctest: test_builtin test_cmd test_code test_collections test_ctypes test_decimal test_deque test_descrtut test_difflib test_doctest test_doctest2 test_enum test_extcall test_generators test_getopt test_heapq test_http_cookies test_itertools test_listcomps test_math test_metaclass test_pep646_syntax test_pickle test_pickletools test_setcomps test_statistics test_syntax test_threading_local test_typing test_unpack test_unpack_ex test_weakref test_zipimport

Legend:

[+] path exists in CPython
[x] up-to-date, [ ] outdated

youknowone force-pushed the specialization branch 3 times, most recently from 8fc678d to 57cd6b2 Compare March 2, 2026 04:16

youknowone force-pushed the specialization branch from 843794e to 5ac7f95 Compare March 2, 2026 07:43

youknowone marked this pull request as ready for review March 2, 2026 07:53

coderabbitai bot reviewed Mar 2, 2026

View reviewed changes

youknowone marked this pull request as draft March 2, 2026 08:01

youknowone force-pushed the specialization branch 3 times, most recently from 1b34652 to 9df2a06 Compare March 2, 2026 11:47

youknowone added 18 commits March 2, 2026 21:09

Add debug_assert to invoke_exact_args, lazy func_version reassignment

12f3cde

- Add debug_assert preconditions in invoke_exact_args
- Add get_version_for_current_state() for lazy version reassignment after func_version invalidation
- Document NEXT_TYPE_VERSION overflow policy

working

2ddfb7d

Add STORE_SUBSCR, BinaryOpAddUnicode, ToBoolAlwaysTrue, CallLen, Call…

63adb74

…Isinstance, CallType1 specialization

Add BinaryOpSubscrStrInt, CallStr1, CallTuple1 specialization

01bc2ae

Add BinaryOpInplaceAddUnicode specialization

c41dde4

Add LoadAttrModule, CallBuiltinO, CallPyGeneral, CallBoundMethodGener…

785e121

…al, ForIterGen, CallListAppend specialization

Add SendGen specialization for generator/coroutine send

4e99c10

- SendGen: direct coro.send() for generator/coroutine receivers
- Add adaptive counter to Send instruction
- specialize_send checks builtin_coro for PyGenerator/PyCoroutine

Add LoadSuperAttrAttr, LoadSuperAttrMethod, CallBuiltinClass, CallBui…

0f09ed7

…ltinFastWithKeywords, CallMethodDescriptorFastWithKeywords specialization

Add LoadAttrProperty specialization for property descriptor access

4ab6fb4

Add LoadAttrClass specialization for class attribute access

011e0db

Add BinaryOpSubscrListSlice specialization

d8afb5b

Add CallKwPy, CallKwBoundMethod, CallKwNonPy specialization

7fc4656

Fix LoadSuperAttrMethod to push unbound descriptor + self instead of bound method + self which caused double self binding. Fix LoadSuperAttrAttr obj_arg condition for classmethod detection.

Clean up comments in specialization code

beec790

Remove unnecessary CPython references, FIXME→TODO, redundant Note: prefix, and "Same as" cross-references.

youknowone added 2 commits March 2, 2026 21:09

fix doctest

c51df5d

fix check_signals

0f8da0c

youknowone force-pushed the specialization branch from 9df2a06 to 0f8da0c Compare March 2, 2026 12:10

Auto-format: cargo fmt --all

87bd158

Specialized ops #7301
youknowone wants to merge 21 commits into RustPython:main from youknowone:specialization
