Skip to content

Quantco/diffly

Repository files navigation


diffly — A utility package for comparing 🐻‍❄️ DataFrames

CI conda-forge pypi-version python-version codecov

🗂 Table of Contents

📖 Introduction

Diffly is a Python package for comparing Polars DataFrames with detailed analysis capabilities. It identifies differences between datasets including schema differences, row-level mismatches, missing rows, and column value changes.

💿 Installation

You can install diffly using your favorite package manager, e.g., pixi or pip:

pixi add diffly
pip install diffly

🎯 Usage

import polars as pl
from diffly import compare_frames

left = pl.DataFrame({
    "id": ["a", "b", "c"],
    "value": [1.0, 2.0, 3.0],
})

right = pl.DataFrame({
    "id": ["a", "b", "d"],
    "value": [1.0, 2.5, 4.0],
})

comparison = compare_frames(left, right, primary_key="id")

if not comparison.equal():
    summary = comparison.summary(
        top_k_column_changes=1,
        show_sample_primary_key_per_change=True
    )
    print(summary)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                     Diffly Summary                                     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
   Primary key: id

 Schemas
 ▔▔▔▔▔▔▔
   Schemas match exactly (column count: 2).

 Rows
 ▔▔▔▔
   Left count             Right count
       3      (no change)      3

   ┏━┯━┯━┯━┯━┓
   ┃-│-│-│-│-┃                1  left only   (33.33%)
   ┠─┼─┼─┼─┼─┨╌╌╌┏━┯━┯━┯━┯━┓╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╮
   ┃ │ │ │ │ ┃ = ┃ │ │ │ │ ┃  1  equal       (50.00%)  │
   ┠─┼─┼─┼─┼─┨╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌├╴  2  joined
   ┃ │ │ │ │ ┃ ≠ ┃ │ │ │ │ ┃  1  unequal     (50.00%)  │
   ┗━┷━┷━┷━┷━┛╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯
                 ┃+│+│+│+│+┃  1  right only  (33.33%)
                 ┗━┷━┷━┷━┷━┛

 Columns
 ▔▔▔▔▔▔▔
   ┌───────┬────────┬───────────────────────────┐
   │ value │ 50.00% │ 2.0 -> 2.5 (1x, e.g. "b") │
   └───────┴────────┴───────────────────────────┘

See more examples in the documentation.

About

Utility package for comparing polars data frames.

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Contributors

Languages