datafold/dataiku-demo

Dataiku Demo — Customer Analytics Pipeline

End-to-end customer analytics pipeline that ingests Snowflake data into Dataiku DSS, computes RFM scores, CLV estimates, and churn risk, writes results back to Snowflake, and is mirrored on Databricks with validated parity.


Dataiku Flow

[Image: Dataiku Demo Flow]

Architecture

Snowflake (DEV.DATAIKU_DEMO)
  ├── CUSTOMERS        (1,000 rows)
  └── TRANSACTIONS     (8,000 rows)
          │
          ▼  Dataiku DSS (DEMO project)
  ┌───────────────────────────────────────────┐
  │  [Shaker]  filter STATUS = 'completed'    │
  │      → transactions_completed             │
  │                                           │
  │  [Join]    LEFT JOIN on CUSTOMER_ID       │
  │      → customer_transactions_joined       │
  │                                           │
  │  [Python]  RFM + CLV + Churn analytics    │
  │      → CUSTOMER_ANALYTICS_OUTPUT          │
  └───────────────────────────────────────────┘
          │
          ▼
  Snowflake  DEV.DATAIKU_DEMO.CUSTOMER_ANALYTICS_OUTPUT
  Databricks dev.dataiku_demo.customer_analytics_output  ← migrated, parity verified
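
The three recipe steps in the diagram can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual recipe code: the column names `TRANSACTION_DATE` and `AMOUNT`, the function name `rfm_by_customer`, and the sample rows are assumptions made for the example. It computes only the raw recency, frequency, and monetary values; the scoring, CLV, and churn logic is not described in this README and is left out.

```python
from collections import defaultdict
from datetime import date


def rfm_by_customer(customers, transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Hypothetical helper; column names other than CUSTOMER_ID and STATUS
    are assumed for illustration.
    """
    # [Shaker] keep only completed transactions
    completed = [t for t in transactions if t["STATUS"] == "completed"]

    # [Join] LEFT JOIN on CUSTOMER_ID: index transactions by customer so that
    # every customer is kept, including those with no completed transactions
    by_customer = defaultdict(list)
    for t in completed:
        by_customer[t["CUSTOMER_ID"]].append(t)

    # [Python] raw RFM values per customer
    result = {}
    for c in customers:
        txns = by_customer.get(c["CUSTOMER_ID"], [])
        if txns:
            last_purchase = max(t["TRANSACTION_DATE"] for t in txns)
            recency_days = (as_of - last_purchase).days
        else:
            recency_days = None  # no completed purchase on record
        frequency = len(txns)
        monetary = sum(t["AMOUNT"] for t in txns)
        result[c["CUSTOMER_ID"]] = (recency_days, frequency, monetary)
    return result


# Made-up sample rows, not taken from the demo tables
sample_customers = [{"CUSTOMER_ID": 1}, {"CUSTOMER_ID": 2}, {"CUSTOMER_ID": 3}]
sample_transactions = [
    {"CUSTOMER_ID": 1, "STATUS": "completed", "AMOUNT": 50.0, "TRANSACTION_DATE": date(2024, 1, 1)},
    {"CUSTOMER_ID": 1, "STATUS": "completed", "AMOUNT": 25.0, "TRANSACTION_DATE": date(2024, 1, 22)},
    {"CUSTOMER_ID": 2, "STATUS": "pending", "AMOUNT": 10.0, "TRANSACTION_DATE": date(2024, 1, 15)},
    {"CUSTOMER_ID": 3, "STATUS": "completed", "AMOUNT": 5.0, "TRANSACTION_DATE": date(2024, 1, 31)},
]
sample_rfm = rfm_by_customer(sample_customers, sample_transactions, as_of=date(2024, 2, 1))
```

Because the join is a LEFT JOIN, customer 2 (whose only transaction is pending) still appears in the result, with a frequency of 0 and no recency value.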

Parity validation with Datafold

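Parity between the Snowflake and Databricks outputs was validated using Datafold, a data reliability platform that runs cross-database diffs at scale using bisection hashing: it compares checksums of table segments and subdivides only the segments whose checksums differ.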

The validate_parity.py script uses the same open-source data-diff library that powers Datafold Cloud.
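
To make the idea of bisection hashing concrete, here is a small in-memory sketch. It is not the data-diff implementation and not the contents of validate_parity.py: the function names `checksum` and `bisect_diff`, the `threshold` parameter, and the sample tables are all hypothetical. Each table is modeled as a dict from an integer key to a row tuple. The real library works against the databases themselves, but the control flow is the same in spirit: hash a key range on both sides, stop if the hashes match, otherwise split the range and recurse.

```python
import hashlib


def checksum(rows, lo, hi):
    """Hash every row whose key falls in [lo, hi), in key order."""
    h = hashlib.md5()
    for key in sorted(k for k in rows if lo <= k < hi):
        h.update(repr((key, rows[key])).encode())
    return h.hexdigest()


def bisect_diff(a, b, lo, hi, threshold=2):
    """Return the sorted keys in [lo, hi) whose rows differ between a and b."""
    if checksum(a, lo, hi) == checksum(b, lo, hi):
        return []  # segment matches on both sides, nothing to inspect
    if hi - lo <= threshold:
        # Segment is small enough: compare the rows directly
        keys = {k for k in set(a) | set(b) if lo <= k < hi}
        return sorted(k for k in keys if a.get(k) != b.get(k))
    mid = (lo + hi) // 2
    # Recurse into both halves; matching halves return immediately
    return bisect_diff(a, b, lo, mid, threshold) + bisect_diff(a, b, mid, hi, threshold)


# Made-up tables: keys 1..8, with one changed row and one missing row
source_rows = {i: ("customer", i * 10) for i in range(1, 9)}
target_rows = dict(source_rows)
target_rows[5] = ("customer", 999)  # value mismatch
del target_rows[7]                  # row missing on the target side

mismatched_keys = bisect_diff(source_rows, target_rows, lo=1, hi=9)
print(mismatched_keys)  # [5, 7]
```

When the two tables are nearly identical, most segments match on the first checksum and are never read row by row, which is what keeps this approach cheap on large tables.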
