Fix DataFrame.OrderBy to perform stable sorting#7586
Fix DataFrame.OrderBy to perform stable sorting#7586kubaflo wants to merge 4 commits intodotnet:mainfrom
Conversation
ae879f2 to
5c2e34b
Compare
There was a problem hiding this comment.
Pull request overview
Fixes DataFrame.OrderBy() / OrderByDescending() stability so rows with equal sort keys preserve their original relative order, addressing issue #6443.
Changes:
- Added a stable
MergeSortIndicesimplementation and updated heap-merge output ordering to support stable ordering (especially for descending). - Switched per-buffer sorts in primitive and string columns from introsort to stable merge sort; descending now uses a reversed comparer.
- Added new unit tests validating stability across duplicates, nulls, and descending order.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/Microsoft.Data.Analysis/DataFrameColumn.cs |
Adds stable merge sort helper and changes heap merge to write left-to-right (with reversed comparer for descending). |
src/Microsoft.Data.Analysis/PrimitiveDataFrameColumn.Sort.cs |
Replaces per-buffer introsort with stable merge sort; uses reversed comparer for descending; updates heap usage accordingly. |
src/Microsoft.Data.Analysis/DataFrameColumns/StringDataFrameColumn.cs |
Same stable sort + reversed comparer changes for string columns. |
test/Microsoft.Data.Analysis.Tests/DataFrameTests.Sort.cs |
Adds stability regression tests for OrderBy/OrderByDescending and null handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Replace the unstable IntrospectiveSort (introsort) with a stable merge sort implementation for per-buffer sorting in both PrimitiveDataFrameColumn and StringDataFrameColumn. Also fix the multi-buffer merge phase to always write left-to-right using a direction-aware comparer, ensuring stability for both ascending and descending sorts. Fixes dotnet#6443 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5c2e34b to
3dfcc70
Compare
🔍 Multimodal Code Review — PR #7586: Stable Sort for DataFrame.OrderByReviewed independently by 3 AI models: Claude Sonnet 4.6 · GPT-5.2 · Gemini 3 Pro Preview 🚨 Cross-Model Consensus (All 3 Models Agree)🔴 BUG: Multi-buffer merge still breaks stability for equal keys spanning buffersFile: The LIFO→FIFO switch + Concrete scenario (all 3 models produced equivalent traces): Impact: Only affects DataFrames with columns spanning >1 internal buffer (~1M+ rows for int, ~536M+ for larger types). All 5 new tests use ≤100 rows and never exercise this path. Suggested fix (Gemini's is most precise): When re-inserting into 🟡 WARNING:
|
| Area | Verdict |
|---|---|
Merge sort <= 0 comparison |
✅ Correct for stability (left half wins ties) |
ArrayPool.Shared.Rent/Return |
✅ Correct usage, proper finally cleanup |
| Reversed comparer for descending | ✅ Sound approach, SortedDictionary enumerates in comparer order |
| Recursive depth | ✅ O(log n) per buffer, no stack overflow risk |
Null-index starting position after removing ascending param |
✅ Correct |
| Per-buffer sort (single-buffer path) | ✅ Fully stable and correct |
🔵 Suggestions (Individual Model Findings)
| Model | Finding | Severity |
|---|---|---|
| Gemini | LOH pressure: ArrayPool.Shared doesn't pool arrays >2²⁰ (~1M); 100M-row sorts allocate ~400MB on every call |
ℹ️ Note |
| Claude Sonnet | Code duplication between Merge (Span) and MergeList (IList) — could unify via interface or delegate |
🔵 Suggestion |
| Gemini | Rename LargeDataset test → ManyDuplicates (100 rows isn't "large") |
🔵 Suggestion |
Summary Scorecard
| Category | Rating |
|---|---|
| Correctness (single-buffer) | ✅ Solid |
| Correctness (multi-buffer) | 🔴 Bug remains |
| Test coverage | 🟡 Insufficient for multi-buffer |
| Performance | 🟡 O(n) RemoveAt(0) risk |
| Code quality | ✅ Good, minor duplication |
Bottom line: The per-buffer sort fix (introsort → merge sort) is correct and well-implemented. However, the multi-buffer merge in PopulateColumnSortIndicesWithHeap has a remaining stability bug that all three models independently identified with equivalent reproduction traces. This should be fixed before merge.
Address issues identified by multimodal code review (Claude Sonnet 4.6, GPT-5.2, Gemini 3 Pro): 1. Fix cross-buffer stability: When re-inserting a buffer's next element with the same key into the heap, use AddFirst instead of Add(append). The popped buffer has a lower index than remaining entries, so its next element must come before them to preserve stable order. 2. Fix O(n) RemoveAt(0) performance: Replace List<ValueTuple<int,int>> with LinkedList<ValueTuple<int,int>> as the SortedDictionary value type. LinkedList provides O(1) RemoveFirst/AddFirst vs List's O(n) RemoveAt(0)/Insert(0,...). 3. Rename misleading test: TestOrderBy_StableSort_LargeDataset (100 rows) → TestOrderBy_StableSort_ManyDuplicates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 2 multimodal review (GPT-5.2 + Gemini 3 Pro) identified that AddFirst is incorrect when a higher-indexed buffer reaches a key after a lower-indexed buffer already queued it. Example: Buffer0=[100], Buffer1=[10,100] Pop 10(Buf1) → advance → 100 → AddFirst puts Buf1 before Buf0 → outputs 100(Buf1) before 100(Buf0) — stability violated. Fix: insert new entries in buffer-index order via linear scan of the LinkedList. The list is bounded by buffer count (tiny), so O(k) scan is negligible. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Multimodal Re-Review (Round 2) — PR #7586Reviewed independently by 3 AI models: Claude Sonnet 4.6 · GPT-5.2 · Gemini 3 Pro Preview Round 1 Issues — Status
🔴 NEW BUG Found in Round 2:
|
| Area | Verdict | Models |
|---|---|---|
Merge sort <= 0 comparison |
✅ Stable (left half wins ties) | All 3 |
ArrayPool.Shared.Rent/Return |
✅ Correct | All 3 |
| Reversed comparer for descending | ✅ Sound | All 3 |
LinkedList migration (callers consistent) |
✅ Complete | All 3 |
| Null-index position logic | ✅ Correct | All 3 |
| Per-buffer sort (single-buffer path) | ✅ Fully stable | All 3 |
🟡 Remaining Suggestions
| Model | Finding | Severity |
|---|---|---|
| Claude Sonnet | AddFirst/ordered-insert code path has zero test coverage — all 5 tests use single-buffer data (< MaxCapacity rows). Multi-buffer path is untestable without >536M rows or test hooks. |
🟡 Warning |
| Gemini | LinkedList allocates a LinkedListNode per row processed. For very large datasets this increases GC pressure vs List (where the list is bounded by buffer count). Acceptable trade-off given O(1) removal. |
ℹ️ Note |
| Claude Sonnet | Code duplication between Merge (Span) and MergeList (IList) |
🔵 Suggestion |
Summary Scorecard (Post-Fix)
| Category | Rating |
|---|---|
| Correctness (single-buffer) | ✅ Solid |
| Correctness (multi-buffer) | ✅ Fixed (ordered insert by bufferIndex) |
| Test coverage | 🟡 Good for single-buffer; multi-buffer untestable |
| Performance | ✅ O(1) removal, O(k) insertion (k=buffer count) |
| Code quality | ✅ Good |
All 17 sort tests pass. Fix pushed.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #7586 +/- ##
==========================================
- Coverage 69.07% 69.06% -0.02%
==========================================
Files 1483 1483
Lines 274513 274722 +209
Branches 28285 28305 +20
==========================================
+ Hits 189625 189733 +108
- Misses 77503 77614 +111
+ Partials 7385 7375 -10
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Address Codecov report showing 15 uncovered lines — all in the multi-buffer merge path of PopulateColumnSortIndicesWithHeap. Trick: set StringDataFrameColumn.MaxCapacity to a small value (2-3) to force multiple internal buffers with only a few elements, then verify stable sort across buffer boundaries. 3 new tests: - TestOrderBy_StableSort_MultipleStringBuffers: 2 buffers, duplicate keys spanning both, verifies ascending stability - TestOrderBy_StableSort_MultipleBuffers_InterleavedKeys: exercises the interleaved scenario (higher buffer reaches a key after lower buffer already queued it) that broke AddFirst in round 2 - TestOrderByDescending_StableSort_MultipleBuffers: reversed comparer with multi-buffer data Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Multimodal Review — Round 3 — PR #7586Reviewed by: Claude Sonnet 4.6 ✅ · GPT-5.2 ✅ · Gemini 3 Pro ⏱️ (timed out) Round 2 Issues — Status
✅ No Bugs Found (Round 3)Both Claude Sonnet 4.6 and GPT-5.2 confirm the ordered insertion by Claude Sonnet 4.6 — Full Trace Verification
Key findings:
GPT-5.2 — Confirmed Fix
🟡 One Minor Concern RaisedThread safety of
|
| Area | Verdict |
|---|---|
Merge sort (<= 0 stability) |
✅ Correct |
| ArrayPool usage | ✅ Correct |
| Reversed comparer for descending | ✅ Sound |
| LinkedList migration | ✅ Complete and consistent |
| Ordered insertion by bufferIndex | ✅ Correct (traced end-to-end) |
| Multi-buffer test coverage | ✅ Both AddBefore and AddLast exercised |
| MaxCapacity trick safety | ✅ Safe (parallelization disabled) |
| Null handling | ✅ Correct |
Final Summary
| Category | R1 | R2 | R3 |
|---|---|---|---|
| Correctness (single-buffer) | ✅ | ✅ | ✅ |
| Correctness (multi-buffer) | 🔴 | ✅ Fixed | ✅ Verified |
| Performance | 🟡 | ✅ Fixed | ✅ |
| Test coverage | 🟡 | 🟡 | ✅ Fixed |
| Code quality | ✅ | ✅ | ✅ |
Bottom line: All bugs identified across 3 rounds of multimodal review have been fixed and verified. The PR is ready for merge. 🚀
Summary
Fixes #6443 —
DataFrame.OrderBy()did not perform stable sorting. Rows with equal sort keys were reordered unpredictably instead of preserving their original relative order.Root Cause
Two issues:
IntrospectiveSort(introsort = quicksort + heapsort fallback), which is inherently unstable. The code even had a comment acknowledging this: "Bug fix: QuickSort is not stable."PopulateColumnSortIndicesWithHeapwrote results right-to-left for descending order, which reversed the relative order of equal elements.Fix
Per-buffer sort:
IntrospectiveSort→MergeSortIndicesAdded a stable merge sort implementation to
DataFrameColumn.csand replacedIntrospectiveSortcalls in bothPrimitiveDataFrameColumn.Sort.csandStringDataFrameColumn.cs.Merge sort is:
<= 0comparison favoring the left half on ties)ArrayPool<int>.Sharedto minimize GC pressureDescending sort fix
For descending order, callers now create the
SortedDictionarywith a reversed comparer and sort each buffer in descending order.PopulateColumnSortIndicesWithHeapalways writes left-to-right, preserving stability in both directions. The unusedascendingparameter was removed from the method signature.Files Changed
src/Microsoft.Data.Analysis/DataFrameColumn.csMergeSortIndices(stable merge sort); fixedPopulateColumnSortIndicesWithHeapto always write left-to-right; removed unusedascendingparametersrc/Microsoft.Data.Analysis/PrimitiveDataFrameColumn.Sort.csIntrospectiveSort→MergeSortIndices; use reversed comparer for descendingsrc/Microsoft.Data.Analysis/DataFrameColumns/StringDataFrameColumn.cstest/Microsoft.Data.Analysis.Tests/DataFrameTests.Sort.csTests
5 new tests added:
TestOrderBy_StableSort_PreservesOriginalOrder— Reproduces the exact example from issue DataFrame.OrderBy(string columnName) does not perform stable sorting! #6443 (8 rows, 3 duplicate key groups)TestOrderByDescending_StableSort_PreservesOriginalOrder— Verifies descending stabilityTestOrderBy_StableSort_WithNullsAndDuplicates— Stability with null values mixed inTestStringColumnSort_StableSort— String column stabilityTestOrderBy_StableSort_LargeDataset— 100 rows with 5 duplicate keys, triggers quicksort's unstable code path (partitions > 16 elements). This test fails without the fix (Unstable sort at row 3: Key=0, ID=25 should be > previous ID=90) and passes with it.All 470 existing + new tests pass, 0 failures.
Multi-Model AI Review (Cross-Pollination)
This fix was developed with AI assistance and then reviewed by two additional AI models to catch issues:
Gemini 3 Pro Review
<= 0comparison favors left half on ties)ArrayPool<int>.Sharedinstead ofnew int[]to reduce GC pressure → Appliedascendingparameter → AppliedGPT-5.2 Review
ArrayPoolrecommendation → AppliedPopulateColumnSortIndicesWithHeapcould affect stability when duplicate keys span multiple buffers (>536M rows forint). This is a pre-existing issue not introduced by this change, and is impractical to test. Added a note in the test file.Both models agreed the fix is correct. All actionable suggestions were applied.