This repository contains our COMP7108 project implementation using the PaySim transaction dataset and Neo4j/Cypher for fraud-oriented graph analytics.
The project covers the assignment pipeline end-to-end:
- Property-graph modeling and ingestion of PaySim data
- Fraud pattern discovery with graph traversal queries
- Fraud-subgraph visualization
- Graph algorithm analysis (GDS)
- Temporal fraud behavior analysis
- PaySim (Kaggle): https://www.kaggle.com/datasets/ealaxi/paysim1
- Source note: scripts/dataset-source.txt
Note: the raw CSV is not included in this repository. Download it and place it where Neo4j can read it for import (by default, Neo4j's `import` directory).
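Before importing, it can help to confirm that the downloaded file has the header the graph model relies on. The sketch below is not part of the repository: it assumes the import script reads the standard Kaggle PaySim column names, and `missing_columns` is a hypothetical helper name.

```python
import csv
import io

# Column names of the public PaySim CSV header (note the dataset's own
# spelling "oldbalanceOrg" next to "newbalanceOrig").
PAYSIM_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]


def missing_columns(csv_file):
    """Return the expected PaySim columns that are absent from the header row."""
    header = next(csv.reader(csv_file), [])
    return [name for name in PAYSIM_COLUMNS if name not in header]


# In-memory stand-in for the real file; for the actual dataset, pass
# open("data.csv", newline="") instead (the file name is an assumption).
sample = io.StringIO(",".join(PAYSIM_COLUMNS) + "\n")
print(missing_columns(sample))  # []
```

An empty list means every expected column is present; any names returned point to a renamed or truncated file.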
- Project/taskA: ingestion, schema reset, constraints/indexes, validation
- Project/taskB: pattern queries (fan-in, fan-out, mule, cycle attempts, shortest path)
- Project/taskC: fraud subgraph visualization query + exported figures
- Project/taskD: graph algorithm queries (PageRank, community, betweenness)
- Project/taskE: temporal-analysis outputs and Python post-processing scripts
- scripts: assignment brief and project guidance documents
- Neo4j (for Cypher execution)
- Neo4j Graph Data Science plugin (required for Task D queries)
- Python 3.9+ (for Task E scripts)
- Python packages:
- pandas
- matplotlib
- Download PaySim CSV from Kaggle.
- Rename/place the file so that the Cypher import path in Project/taskA/taskA03_import.cypher is valid (`file:///data.csv` by default).
- Project/taskA/taskA01_reset_and_schema.cypher
- Project/taskA/taskA02_constraints_and_indexes.cypher
- Project/taskA/taskA03_import.cypher
- Project/taskA/taskA04_validation_queries.cypher
- Task B queries:
- Task C visualization:
- Task D algorithms (GDS required):
- (E01) Run Project/taskE/taskE01_transaction_spike.cypher and export the results as `taskE01_outgoing.csv` and `taskE01_incoming.csv`.
- (E01) Edit paths in Project/taskE/taskE01_transaction_spike.py to your local files, then run the script to generate spike results.
- (E02) Run Project/taskE/taskE02_balance_anomaly.cypher and export anomaly records.
- (E03) Run Project/taskE/taskE03_fraud_timeline.cypher and export timeline CSVs.
- (E03) For fraud timeline plotting, use Project/taskE/taskE03_fraud_timeline_plot.py with Project/taskE/output/taskE03_fraud_timeline.csv.
Important: the current Python scripts contain hard-coded Windows absolute paths; update them before execution.
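One way to avoid editing absolute paths on every machine is to resolve files relative to the script's own location. This is a minimal sketch, not the scripts' current behavior: `task_output` is a hypothetical helper, and it assumes the CSVs live in an `output` folder next to the script, as the Task E layout suggests.

```python
from pathlib import Path


def task_output(script_file, filename):
    """Return <directory of script_file>/output/<filename> as an absolute path."""
    return Path(script_file).resolve().parent / "output" / filename


# Inside a Task E script, pass __file__ as the first argument. The literal
# path is used here only so the example runs on its own.
outgoing_csv = task_output(
    "Project/taskE/taskE01_transaction_spike.py", "taskE01_outgoing.csv"
)
print(outgoing_csv)
```

Because the path is built with `pathlib`, the same code works on Windows, macOS, and Linux without changing separators.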
- Reset/schema initialization: Project/taskA/taskA01_reset_and_schema.cypher
- Constraints and indexes (optimized lookup/filtering): Project/taskA/taskA02_constraints_and_indexes.cypher
- Batch import (`IN TRANSACTIONS OF 1000 ROWS`): Project/taskA/taskA03_import.cypher
- Validation checks (counts, distributions, fraud stats): Project/taskA/taskA04_validation_queries.cypher
- Fan-in detection: Project/taskB/taskB01_fan_in.cypher
- Fan-out detection: Project/taskB/taskB02_fan_out.cypher
- Mule pattern detection: Project/taskB/taskB03_mule_detection.cypher
- Cycle detection attempts: Project/taskB/taskB04_cycle_detection.cypher
- Shortest path and local path exploration: Project/taskB/taskB05_shortest_path_and_exploration.cypher
- Fraud account labeling and subgraph extraction: Project/taskC/taskC_visual.cypher
- Exported figures: Project/taskC/C_step.png, Project/taskC/C_amount.png
- PageRank ranking and fraud cross-check: Project/taskD/taskD01_pagerank.cypher
- Community detection + betweenness on local suspicious structure: Project/taskD/taskD02_Community_Betweenness.cypher
- Supporting result tables: CSV files in Project/taskD
- E01 — Transaction spike detection:
- Cypher export query: Project/taskE/taskE01_transaction_spike.cypher
- Python spike analysis script: Project/taskE/taskE01_transaction_spike.py
- Results: Project/taskE/output/taskE01_outgoing.csv, Project/taskE/output/taskE01_incoming.csv
- Top spike accounts: Project/taskE/output/taskE01_outgoing_spikes.csv, Project/taskE/output/taskE01_incoming_spikes.csv
- E02 — Balance anomaly detection:
- Cypher query: Project/taskE/taskE02_balance_anomaly.cypher
- Detailed anomaly records: Project/taskE/output/taskE02_detailed_anomaly_records.csv
- Summary by transaction type: Project/taskE/output/taskE02_summary_transaction_type.csv
- E03 — Fraud activity timeline:
- Cypher queries (fraud counts per step + account activity span): Project/taskE/taskE03_fraud_timeline.cypher
- Python plot script: Project/taskE/taskE03_fraud_timeline_plot.py
- Timeline data: Project/taskE/output/taskE03_fraud_timeline.csv, Project/taskE/output/taskE03_flagged_timeline.csv
- Account activity span: Project/taskE/output/taskE03_account_activity.csv
- Timeline chart: Project/taskE/output/taskE03_fraud_timeline.png
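To illustrate the kind of post-processing E01 performs, the sketch below flags steps where an account's transaction count is unusually high compared with its own history. It is an assumption-laden example, not the logic of taskE01_transaction_spike.py: the `(account, step, tx_count)` row shape, the `find_spikes` name, and the mean-plus-`k`-standard-deviations rule are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_spikes(rows, k=2.0):
    """Flag (account, step, tx_count) rows whose count exceeds the account's
    mean per-step count by more than k population standard deviations."""
    per_account = defaultdict(list)
    for account, step, count in rows:
        per_account[account].append((step, count))

    spikes = []
    for account, series in per_account.items():
        counts = [count for _, count in series]
        threshold = mean(counts) + k * pstdev(counts)
        spikes.extend(
            (account, step, count) for step, count in series if count > threshold
        )
    return spikes


# Toy data: C1 is quiet for five steps and then bursts; C2 is constant.
rows = [
    ("C1", 1, 1), ("C1", 2, 1), ("C1", 3, 1), ("C1", 4, 1), ("C1", 5, 1),
    ("C1", 6, 10),
    ("C2", 1, 2), ("C2", 2, 2),
]
print(find_spikes(rows))  # [('C1', 6, 10)]
```

A per-account baseline keeps naturally busy accounts from dominating the results; the choice of `k` trades recall against false positives and would need tuning on the exported CSVs.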
Aligned with assignment deliverables:
- Ingestion scripts: Project/taskA
- Query collection: Project/taskB, Project/taskC, Project/taskD, Project/taskE
- Visualization outputs: Project/taskB/output, Project/taskC, Project/taskE
- Algorithm/temporal outputs: CSV/PNG files under Project/taskD and Project/taskE