A Python-based data pipeline that ingests, validates, and serves Chicago traffic crash data from multiple SODA APIs, supporting spatial analysis and automated refresh workflows. Includes a public dashboard for visualizing crash data.
The service orchestrates end-to-end ETL for four interconnected datasets from the Chicago Open Data Portal, keeps them synchronized in PostgreSQL/PostGIS, and exposes a FastAPI layer with an admin UI and documentation.
It also includes a full-stack public dashboard built with Next.js, featuring interactive maps, trend charts, and real-time statistics, aimed at advocacy organizations such as Lakeview Urbanists.
Prerequisites: Python 3.11+, pip, Docker, Node 18+, npm, and GNU Make.
git clone https://github.com/MisterClean/chicago-crashes-pipeline.git
cd chicago-crashes-pipeline
python -m venv .venv && source .venv/bin/activate
make install
cp .env.example .env # update credentials as needed
make docker-up # start Postgres/PostGIS and supporting services
make migrate
make serve # FastAPI + admin portal at http://localhost:8000
A full-stack public dashboard for visualizing Chicago crash data, built with Next.js 15, MapLibre, and Recharts.
Features:
- Interactive map with 10,000+ crash points color-coded by severity
- Weekly trend charts showing crashes, injuries, and fatalities
- Key metrics: total crashes, injuries, fatalities, pedestrians, cyclists, hit & runs
- Date range filtering with quick presets (7 days, 30 days, 1 year)
- Responsive design for desktop and mobile
# Start the backend (if not already running)
cd docker && docker-compose up -d postgres redis
cd .. && source .venv/bin/activate
uvicorn src.api.main:app --reload --port 8000
# Start the frontend
cd frontend && npm install && npm run dev
# Dashboard at http://localhost:3001/dashboard
Deploy everything with Docker Compose:
# One-time: Download Chicago basemap tiles (~50MB)
./docker/tiles/download-basemap.sh
# Start all services
docker-compose -f docker/docker-compose.fullstack.yml up -d
# Access points:
# - Dashboard: http://localhost
# - API: http://localhost/api
# - Tiles: http://localhost/tiles

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 15 (App Router) | Server components, TypeScript |
| Charts | Recharts | Trend visualization |
| Maps | react-map-gl + MapLibre | Interactive crash map |
| Tiles | Martin | Vector tiles from PostGIS |
| Basemap | PMTiles | Self-hosted Chicago map tiles |
| Proxy | Nginx | Reverse proxy, caching, rate limiting |
- Traffic Crashes – Crashes: Primary crash records (~1M+ rows)
- Traffic Crashes – People: Person-level injury details
- Traffic Crashes – Vehicles: Vehicle and unit information
- Traffic Crashes – Vision Zero Fatalities: Curated fatality data
graph TB
subgraph External["External Data Sources"]
CDA[Chicago Data Portal - SODA API]
SHP[Shapefiles - Geographic Boundaries]
end
subgraph WebUI["Web Interface"]
ADMIN[Admin Portal]
API_DOCS[API Documentation]
end
subgraph API["API Layer"]
FASTAPI[FastAPI Application]
SYNC_R[Sync Router]
JOBS_R[Jobs Router]
HEALTH_R[Health Router]
SPATIAL_R[Spatial Router]
end
subgraph Services["Business Logic"]
ETL[ETL Service]
VALIDATOR[Data Validator]
JOB_SVC[Job Service]
SCHEDULER[Job Scheduler]
DB_SVC[Database Service]
SPATIAL_SVC[Spatial Service]
end
subgraph Processing["Data Processing"]
SANITIZER[Data Sanitizer]
RATE_LIMITER[Rate Limiter]
end
subgraph Storage["Data Storage"]
PG[(PostgreSQL with PostGIS)]
CRASHES[(Crashes Table)]
PEOPLE[(People Table)]
VEHICLES[(Vehicles Table)]
FATALITIES[(Fatalities Table)]
JOBS[(Jobs Table)]
EXECUTIONS[(Executions Table)]
end
subgraph Infra["Infrastructure"]
DOCKER[Docker Container]
VENV[Python Environment]
end
ADMIN --> FASTAPI
API_DOCS --> FASTAPI
FASTAPI --> SYNC_R
FASTAPI --> JOBS_R
FASTAPI --> HEALTH_R
FASTAPI --> SPATIAL_R
SYNC_R --> ETL
JOBS_R --> JOB_SVC
SPATIAL_R --> SPATIAL_SVC
ETL --> CDA
ETL --> RATE_LIMITER
ETL --> VALIDATOR
VALIDATOR --> SANITIZER
JOB_SVC --> SCHEDULER
JOB_SVC --> DB_SVC
SCHEDULER --> ETL
SPATIAL_SVC --> SHP
SPATIAL_SVC --> DB_SVC
DB_SVC --> PG
PG --> CRASHES
PG --> PEOPLE
PG --> VEHICLES
PG --> FATALITIES
PG --> JOBS
PG --> EXECUTIONS
PG -.-> DOCKER
FASTAPI -.-> VENV
Key components
- FastAPI service exposing sync controls, health checks, and spatial endpoints
- Admin portal for job orchestration and monitoring
- Async ETL pipeline with rate limiting, validation, and sanitization stages
- Scheduler and job tracking with retries and execution history
- PostgreSQL/PostGIS storage optimised for spatial and analytic workloads
- Automated initial/backfill loads plus incremental syncs with progress tracking
- Batch ingestion (~50k rows per request) using connection pooling and COPY for heavy loads
- Spatial enrichment via PostGIS and shapefile lookups
- Dockerised local stack with reproducible Python environment
- Structured logging and metrics for observability
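The batch ingestion described above can be sketched as a paging loop over a SODA endpoint. This is a minimal illustration, not the pipeline's actual code: `soda_page_urls` is a hypothetical helper and the resource URL is a placeholder. The `$limit`, `$offset`, and `$order` parameters are standard SODA query parameters, and ordering by `:id` keeps pages stable between requests.

```python
from urllib.parse import urlencode


def soda_page_urls(resource_url, total_rows, page_size=50_000, order_by=":id"):
    """Yield one SODA request URL per page of `page_size` rows."""
    for offset in range(0, total_rows, page_size):
        query = {"$limit": page_size, "$offset": offset, "$order": order_by}
        # Keep "$" and ":" unescaped so the generated URLs stay readable.
        yield f"{resource_url}?{urlencode(query, safe='$:')}"


# Placeholder resource URL (assumption); substitute the real dataset endpoint.
urls = list(soda_page_urls("https://data.example.org/resource/abcd-1234.json", 120_000))
for url in urls:
    print(url)
```

With 120,000 rows and the default page size, the generator produces three URLs with offsets 0, 50000, and 100000.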
Validation & sanitization
- Geographic bounds, date parsing, age and vehicle year checks
- Duplicate pruning, Unicode handling, and whitespace cleanup
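As a rough sketch of what these checks might look like, the example below validates only the fields present in a record and cleans free-text values. The bounding box, the age and vehicle-year limits, and the function names are assumptions chosen for illustration; the project's own validators may use different rules.

```python
import unicodedata
from datetime import date, datetime

# Assumed, deliberately loose bounding box around Chicago (not the project's values).
LAT_RANGE = (41.6, 42.1)
LON_RANGE = (-88.0, -87.5)


def clean_text(value):
    """Normalize Unicode compatibility forms and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", value).split())


def _in_range(raw, low, high):
    """True if `raw` parses as a number within [low, high]."""
    try:
        return low <= float(raw) <= high
    except (TypeError, ValueError):
        return False


def validate_record(record, today=None):
    """Return a list of problems; only fields present in `record` are checked."""
    today = today or date.today()
    problems = []
    if "latitude" in record and not _in_range(record["latitude"], *LAT_RANGE):
        problems.append("latitude outside bounds")
    if "longitude" in record and not _in_range(record["longitude"], *LON_RANGE):
        problems.append("longitude outside bounds")
    if "crash_date" in record:
        try:
            datetime.fromisoformat(record["crash_date"])
        except (TypeError, ValueError):
            problems.append("crash_date not ISO 8601")
    if "age" in record and not _in_range(record["age"], 0, 120):
        problems.append("age out of range")
    # Allow next model year, since vehicles are sold ahead of the calendar year.
    if "vehicle_year" in record and not _in_range(
        record["vehicle_year"], 1900, today.year + 1
    ):
        problems.append("vehicle_year out of range")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller log every issue for a row before deciding whether to drop or repair it.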
Resilience
- Circuit breaker with exponential backoff to respect API limits
- Partial failure recovery to keep ingestion moving
- Detailed error reporting to the job history tables
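The circuit breaker and backoff pattern named above can be illustrated with a small, self-contained sketch. The class and function names, thresholds, and delays here are all assumptions made for the example; they are not taken from the project's code.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are refused because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=60.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Exponentially growing delays, capped at `max_delay`."""
    return [min(base * factor**i, max_delay) for i in range(attempts)]


def call_with_retries(func, breaker, delays, sleep=time.sleep):
    """Retry `func` through the breaker, sleeping between failed attempts."""
    for delay in delays:
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # do not keep hammering an open circuit
        except Exception:
            sleep(delay)
    return breaker.call(func)  # final attempt; errors propagate to the caller
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting. In use, `func` would be whatever callable issues the SODA request.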
Sync & Health:
- GET /sync/status – Current sync status and last run time
- POST /sync/trigger – Manual sync trigger with optional date range
- GET /health – Service health check
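A manual sync could be triggered from a script along these lines. The path `/sync/trigger` comes from the list above, but the JSON field names `start_date` and `end_date` are assumptions; check the API reference for the actual request schema before relying on them.

```python
import json
from urllib.request import Request


def build_sync_trigger_request(base_url, start_date=None, end_date=None):
    """Build (but do not send) a POST request for the sync trigger endpoint."""
    payload = {}
    # Hypothetical field names for the optional date range.
    if start_date:
        payload["start_date"] = start_date
    if end_date:
        payload["end_date"] = end_date
    return Request(
        f"{base_url.rstrip('/')}/sync/trigger",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_trigger_request("http://localhost:8000", "2024-01-01", "2024-01-31")
print(request.get_method(), request.full_url)
```

With the API server running, passing `request` to `urllib.request.urlopen` would send it.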
Dashboard (for frontend):
- GET /dashboard/stats – Aggregate statistics (crashes, injuries, fatalities)
- GET /dashboard/trends/weekly – Weekly crash trends for charts
- GET /dashboard/crashes/geojson – Crash points as GeoJSON for maps
- GET /dashboard/crashes/by-hour – Hourly distribution analysis
- GET /dashboard/crashes/by-cause – Top contributory causes
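For consumers other than the bundled frontend, a client might build dashboard URLs and read crash points from the GeoJSON response roughly as follows. The helper names and the `start_date`/`end_date` query parameters are assumptions for illustration. The only firm fact used is that GeoJSON Point coordinates are ordered longitude first, then latitude.

```python
from urllib.parse import urlencode


def dashboard_url(base_url, endpoint, **params):
    """Build a dashboard endpoint URL, skipping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/dashboard/{endpoint.lstrip('/')}"
    return f"{url}?{query}" if query else url


def crash_points(feature_collection):
    """Extract (longitude, latitude) pairs from Point features."""
    points = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            lon, lat = geometry["coordinates"][:2]
            points.append((lon, lat))
    return points


# Hand-written sample shaped like a GeoJSON FeatureCollection (not real data).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-87.63, 41.88]}, "properties": {}},
        {"type": "Feature", "geometry": None, "properties": {}},
    ],
}
print(dashboard_url("http://localhost:8000", "crashes/geojson", start_date="2024-01-01"))
print(crash_points(sample))
```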
Comprehensive documentation is available at http://localhost:8000/documentation/ when running the API server.
Key Resources:
- Quick Start Guide - Get up and running quickly
- Configuration Guide - Environment setup
- API Reference - Complete API documentation
- Admin Portal Guide - Using the web interface
- Troubleshooting - Common issues and solutions
- Contributing Guidelines - How to contribute
- Security Policy - Security best practices and reporting
Building Documentation:
npm install
npm run start # Dev server at http://localhost:3000
npm run build # Static build to src/static/documentation/
The documentation is built with Docusaurus 3 and includes:
- Architecture diagrams and data flow
- API reference with examples
- Development guides and testing strategies
- Operations and deployment guides
- Data catalog and schema documentation
We take security seriously. Please review:
- SECURITY.md - Security policy and vulnerability reporting
- Security Best Practices - Deployment security guide
Important: Never use default credentials in production. See the security documentation for configuration guidance.
We welcome contributions! See CONTRIBUTING.md for guidelines.
Quick Start for Contributors:
- Fork and clone the repository
- Set up development environment: make dev-install
- Create a feature branch: git checkout -b feature/your-feature
- Make changes and add tests
- Run tests and linters: make test && make lint
- Submit a pull request
make install # Install dependencies
make test # Run tests
make lint # Lint the codebase
make format # Apply formatting
make docker-build # Build Docker images
make migrate # Run database migrations
chicago-crash-pipeline/
├── src/
│ ├── api/ # FastAPI service
│ ├── etl/ # ETL pipeline modules
│ ├── models/ # SQLAlchemy models
│ ├── validators/ # Data validation rules
│ ├── spatial/ # Spatial data handlers
│ └── utils/ # Shared helpers
├── frontend/ # Next.js dashboard
│ ├── app/ # App Router pages
│ ├── lib/ # API client, map config
│ └── Dockerfile # Production build
├── docker/
│ ├── docker-compose.fullstack.yml # Full stack deployment
│ ├── martin.yaml # Vector tile server config
│ ├── nginx.conf # Reverse proxy config
│ └── tiles/ # PMTiles basemap
├── migrations/ # Alembic migrations
├── tests/
├── config/
└── docs/
- Traffic Crashes – Crashes
- Traffic Crashes – People
- Traffic Crashes – Vehicles
- Vision Zero Fatalities
Import errors with relative imports
- Run Python commands from the repository root so src stays on the path
- For ad-hoc scripts, add sys.path.append("src")
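For an ad-hoc script that may be launched from any working directory, a slightly more defensive variant of the tip above is to resolve the path explicitly. This is a sketch: `add_src_to_path` is a hypothetical helper, and it assumes the caller knows where the repository root is.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root):
    """Put <repo_root>/src on sys.path once and return the absolute path used."""
    src = str(Path(repo_root).resolve() / "src")
    if src not in sys.path:
        sys.path.insert(0, src)
    return src


# Assumes the script is started from the repository root; pass another path otherwise.
src_dir = add_src_to_path(".")
print(src_dir)
```

Using an absolute path avoids the case where a relative "src" entry stops resolving after the script changes directory.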
Database connection issues
- Confirm .env contains valid credentials
- Verify PostgreSQL/PostGIS is running and the postgis extension is enabled
API rate limits
- Default configuration respects the 1000 requests/hour cap
- Request a Chicago Data Portal token for higher throughput
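To make the numbers concrete: 1000 requests per hour works out to at most one request every 3.6 seconds. The sketch below spaces calls accordingly and shows how an app token is usually attached. `X-App-Token` is the standard Socrata header for app tokens; the `Throttle` class and `soda_headers` helper are hypothetical names and not the project's rate limiter.

```python
import time


def soda_headers(app_token=None):
    """Headers for a SODA request; the token is optional."""
    return {"X-App-Token": app_token} if app_token else {}


class Throttle:
    """Space calls so that at most `requests_per_hour` are issued per hour."""

    def __init__(self, requests_per_hour=1000, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / requests_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


throttle = Throttle()
print(f"minimum spacing: {throttle.interval} seconds")
```

Calling `throttle.wait()` before each request keeps a long backfill under the cap without needing to track a rolling window.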
Large data loads
- Tune the batch_size configuration and container memory for initial loads
MIT License - see LICENSE for details.
