Diffly is a Python package for comparing Polars DataFrames and analyzing how they differ. It identifies differences between two datasets, including schema differences, row-level mismatches, missing rows, and changes in column values.
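To make "schema differences" concrete, here is a minimal sketch of what comparing two schemas involves. It models each schema as a plain dict mapping column names to dtype names. With Polars, a comparable mapping could be built with `{name: str(dtype) for name, dtype in df.schema.items()}`. `compare_schemas` and `SchemaDiff` are hypothetical names used only for illustration. They are not part of diffly's API and do not reflect its implementation. This sketch also ignores column order.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Hypothetical result type (not diffly's API): how two schemas differ."""
    left_only: list[str] = field(default_factory=list)
    right_only: list[str] = field(default_factory=list)
    dtype_changes: dict[str, tuple[str, str]] = field(default_factory=dict)

    def matches(self) -> bool:
        # Schemas match when no column is one-sided and no shared dtype differs
        return not (self.left_only or self.right_only or self.dtype_changes)


def compare_schemas(left: dict[str, str], right: dict[str, str]) -> SchemaDiff:
    """Hypothetical helper: compare {column name: dtype name} mappings, ignoring order."""
    diff = SchemaDiff()
    # Columns present on only one side
    diff.left_only = [c for c in left if c not in right]
    diff.right_only = [c for c in right if c not in left]
    # Shared columns whose dtype differs between the two sides
    for col in left:
        if col in right and left[col] != right[col]:
            diff.dtype_changes[col] = (left[col], right[col])
    return diff


same = compare_schemas({"id": "String", "value": "Float64"},
                       {"id": "String", "value": "Float64"})
print(same.matches())  # True

changed = compare_schemas({"id": "String", "value": "Float64"},
                          {"id": "String", "value": "Int64", "note": "String"})
print(changed.right_only)     # ['note']
print(changed.dtype_changes)  # {'value': ('Float64', 'Int64')}
```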
You can install diffly using your favorite package manager, e.g., pixi or pip:
pixi add diffly
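pip install diffly

The following example compares two frames keyed on the id column and prints a summary if they differ:

import polars as pl
from diffly import compare_frames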
left = pl.DataFrame({
    "id": ["a", "b", "c"],
    "value": [1.0, 2.0, 3.0],
})
right = pl.DataFrame({
    "id": ["a", "b", "d"],
    "value": [1.0, 2.5, 4.0],
})
comparison = compare_frames(left, right, primary_key="id")
if not comparison.equal():
    summary = comparison.summary(
        top_k_column_changes=1,
        show_sample_primary_key_per_change=True,
    )
    print(summary)

For this input, the printed summary looks like this:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Diffly Summary ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Primary key: id
Schemas
▔▔▔▔▔▔▔
Schemas match exactly (column count: 2).
Rows
▔▔▔▔
Left count Right count
3 (no change) 3
┏━┯━┯━┯━┯━┓
┃-│-│-│-│-┃ 1 left only (33.33%)
┠─┼─┼─┼─┼─┨╌╌╌┏━┯━┯━┯━┯━┓╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╮
┃ │ │ │ │ ┃ = ┃ │ │ │ │ ┃ 1 equal (50.00%) │
┠─┼─┼─┼─┼─┨╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌├╴ 2 joined
┃ │ │ │ │ ┃ ≠ ┃ │ │ │ │ ┃ 1 unequal (50.00%) │
┗━┷━┷━┷━┷━┛╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯
┃+│+│+│+│+┃ 1 right only (33.33%)
┗━┷━┷━┷━┷━┛
Columns
▔▔▔▔▔▔▔
┌───────┬────────┬───────────────────────────┐
│ value │ 50.00% │ 2.0 -> 2.5 (1x, e.g. "b") │
└───────┴────────┴───────────────────────────┘
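The figures in this summary can be reproduced by hand. The sketch below models each example frame as a plain dict keyed by the primary key. This assumption lets it run without Polars or diffly. `row_summary` and `top_column_changes` are hypothetical helpers, not diffly functions. The denominators are inferred from this example only, so treat them as an assumption about how diffly computes its percentages. One-sided rows are measured against their own frame. Equal, unequal, and per-column changes are measured against the joined rows.

```python
from collections import Counter

# Hypothetical stand-ins for the example frames: {primary key: {column: value}}
left = {"a": {"value": 1.0}, "b": {"value": 2.0}, "c": {"value": 3.0}}
right = {"a": {"value": 1.0}, "b": {"value": 2.5}, "d": {"value": 4.0}}


def row_summary(left, right):
    """Partition primary keys into left-only, right-only, equal and unequal rows."""
    joined = sorted(left.keys() & right.keys())
    return {
        "left_only": sorted(left.keys() - right.keys()),
        "right_only": sorted(right.keys() - left.keys()),
        "equal": [k for k in joined if left[k] == right[k]],
        "unequal": [k for k in joined if left[k] != right[k]],
    }


def top_column_changes(left, right, column, top_k=1):
    """Return (share of joined rows whose column changed, [(old, new, count, sample key)])."""
    joined = sorted(left.keys() & right.keys())
    changed = [k for k in joined if left[k][column] != right[k][column]]
    counts = Counter((left[k][column], right[k][column]) for k in changed)
    top = []
    for (old, new), count in counts.most_common(top_k):
        # First primary key showing this change, like the sample key in the summary
        sample = next(k for k in changed
                      if (left[k][column], right[k][column]) == (old, new))
        top.append((old, new, count, sample))
    share = len(changed) / len(joined) if joined else 0.0
    return share, top


rows = row_summary(left, right)
n_joined = len(rows["equal"]) + len(rows["unequal"])
print(f"{len(rows['left_only'])} left only ({len(rows['left_only']) / len(left):.2%})")
print(f"{len(rows['equal'])} equal ({len(rows['equal']) / n_joined:.2%})")
print(f"{len(rows['unequal'])} unequal ({len(rows['unequal']) / n_joined:.2%})")
print(f"{len(rows['right_only'])} right only ({len(rows['right_only']) / len(right):.2%})")

share, top = top_column_changes(left, right, "value", top_k=1)
for old, new, count, sample in top:
    print(f'value {share:.2%} {old} -> {new} ({count}x, e.g. "{sample}")')
```

Run as-is, this prints `1 left only (33.33%)`, `1 equal (50.00%)`, `1 unequal (50.00%)`, and `1 right only (33.33%)`, followed by `value 50.00% 2.0 -> 2.5 (1x, e.g. "b")`. These lines match the Rows and Columns sections above.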
See more examples in the documentation.