Migrate from pandas to polars #11

Merged
gitronald merged 9 commits into dev from update/pandas-to-polars
Mar 13, 2026

@gitronald
Owner

Summary

  • Replace pandas/numpy with polars across the entire codebase
  • Fix multi-parent concatenation bug in add_parent_nodes
  • Add integration tests with real abortion tree fixture data
  • Add CI workflow for automated testing

Metanode fix examples

Before (multi-parent concatenation corrupted diffs):

source:     "abortion in farsi"
parent:     "abortion meaning in farsi abortion definition in farsi"  (TWO parents joined)
source_add: "farsi farsi"  (duplicated tokens from bloated parent set)

After (single parent selected):

source:     "abortion in farsi"
parent:     "abortion meaning in farsi"  (one parent)
source_add: "farsi"
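The difference between the two behaviors can be sketched in plain Python. `join_parents`, `select_parent`, and `repeated_tokens` are hypothetical helpers, not functions from this codebase. Taking the first candidate as the parent is an assumed selection rule: the PR only says that a single parent is now selected. Joining the two candidate parents reproduces the bloated parent string from the "Before" example, and its repeated tokens are what leak into the diff.

```python
from collections import Counter
from typing import List, Optional


def join_parents(parents: List[str]) -> str:
    """Hypothetical reproduction of the old behavior: all parents are joined."""
    return " ".join(parents)


def select_parent(parents: List[str]) -> Optional[str]:
    """Hypothetical fix: keep exactly one parent.

    ASSUMPTION: the first candidate wins; the PR does not state the real rule.
    """
    return parents[0] if parents else None


def repeated_tokens(text: str) -> List[str]:
    """Whitespace-split tokens that occur more than once in `text`, sorted."""
    counts = Counter(text.split())
    return sorted(tok for tok, n in counts.items() if n > 1)


parents = ["abortion meaning in farsi", "abortion definition in farsi"]
print(repeated_tokens(join_parents(parents)))   # ['abortion', 'farsi', 'in']
print(repeated_tokens(select_parent(parents)))  # []
```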

Overall impact on test data (12,112 edges):

| Metric | Before | After |
|---|---:|---:|
| Repeated tokens in source_add | 339 | 27 |
| Repeated tokens in target_add | 78 | 58 |
| Parent strings > 100 chars | 716 | 0 |
| Null source_add | 4 | 0 |
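The following is a rough sketch of how metrics like these could be tallied over a list of edges. Several details are assumptions: the row layout (dicts with `parent`, `source_add`, and `target_add` keys), the helper names, and the reading of "repeated tokens" as "rows containing at least one token that appears more than once." The project itself stores edges in polars DataFrames, and its exact metric definitions are not given here. The sample rows are illustrative only. The first two mirror the before/after example above, and the third is synthetic.

```python
from collections import Counter
from typing import Dict, List, Optional


def has_repeated_token(text: Optional[str]) -> bool:
    """True if any whitespace-split token occurs more than once (None -> False)."""
    if not text:
        return False
    return any(n > 1 for n in Counter(text.split()).values())


def edge_metrics(edges: List[Dict[str, Optional[str]]], max_parent_len: int = 100) -> Dict[str, int]:
    """Count edges failing each quality check (hypothetical metric definitions)."""
    return {
        "repeated_source_add": sum(has_repeated_token(e["source_add"]) for e in edges),
        "repeated_target_add": sum(has_repeated_token(e["target_add"]) for e in edges),
        "long_parent": sum(len(e["parent"] or "") > max_parent_len for e in edges),
        "null_source_add": sum(e["source_add"] is None for e in edges),
    }


edges = [
    # Mirrors the "Before" example (target_add left empty for brevity).
    {"parent": "abortion meaning in farsi abortion definition in farsi",
     "source_add": "farsi farsi", "target_add": None},
    # Mirrors the "After" example.
    {"parent": "abortion meaning in farsi", "source_add": "farsi", "target_add": None},
    # Synthetic row exercising the remaining checks.
    {"parent": "x" * 101, "source_add": None, "target_add": "y y"},
]
print(edge_metrics(edges))
```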

Known remaining issues

  • Case-sensitive token diff breaks on Google entity suggestions (41 rows)
  • Bare print() in _compute_metanode circle-back path (917 invocations)
  • Missing unit tests for multi-parent edge cases
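The following sketch shows one possible direction for the first two items, under stated assumptions. It uses a token diff that compares casefolded tokens, so a capitalized entity suggestion still matches its lowercase counterpart. It also reports through the standard `logging` module instead of a bare `print()`. `token_diff` is a hypothetical helper, since the real diff in `_compute_metanode` is not shown in this PR. The inputs below are made-up examples.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def token_diff(text: str, other: str, case_sensitive: bool = False) -> List[str]:
    """Tokens of `text` (in order) whose normalized form is absent from `other`.

    Hypothetical helper: casefolds tokens unless case_sensitive=True.
    """
    def norm(tok: str) -> str:
        return tok if case_sensitive else tok.casefold()

    other_tokens = {norm(t) for t in other.split()}
    added = [t for t in text.split() if norm(t) not in other_tokens]
    # Debug-level logging replaces a bare print(); silent unless enabled.
    logger.debug("token_diff(%r, %r) -> %r", text, other, added)
    return added


# Case-sensitive comparison treats "Rights" and "rights" as different tokens.
print(token_diff("abortion rights in iran", "Abortion Rights", case_sensitive=True))
# Casefolded comparison reports only the genuinely new tokens.
print(token_diff("abortion rights in iran", "Abortion Rights"))
```

Casefolding trades precision for robustness. If case carries meaning for some suggestions, the `case_sensitive` flag keeps the old behavior available.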

gitronald merged commit 1035277 into dev on Mar 13, 2026
4 checks passed
gitronald deleted the update/pandas-to-polars branch on March 21, 2026 at 23:09