A curated list of awesome machine learning libraries for marketing, including media mix models, multi-touch attribution, causal inference, and more (shakostats.com).
Star ⭐ the repo if it helps you, and feel free to contribute your own favorite resources.
A collection of open-source repositories and libraries.
- ChannelAttribution - Python and R library that employs a k-order Markov representation to identify structural correlations in customer journey data.
- fractribution - Fractional attribution using machine learning (by Google).
- Marketing-Attribution-Models - A collection of marketing attribution models.
- markov-chain-attribution - Implementation of Markov Chain attribution.
- mta - Multi-touch attribution models.
- pychattr - Python implementation of the excellent R ChannelAttribution library.
- shapley - Shapley values for attribution modeling.
- shapley-attribution-model-zhao-naive - Naive set-based Shapley attribution modeling.
- Deep Conversion Attribution - Deep Conversion Attribution for Online Advertising.
- ChannelAttribution - Python library for channel attribution.
- Regression Based Attribution - Regression-based attribution by Google.
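The Markov-chain entries above (ChannelAttribution, markov-chain-attribution, pychattr) credit channels by their "removal effect": how much the overall conversion probability drops when a channel is taken out of the journey graph. The sketch below is a minimal first-order version of that idea, assuming journeys are lists of channel names ending in the hypothetical terminal states `"conv"` or `"null"`. It is not the API of any of those libraries, which also support higher-order chains and other refinements.

```python
from collections import defaultdict


def transition_probs(paths):
    """First-order transition probabilities estimated from customer journeys.

    Each path is a list of channel names ending in "conv" (converted) or
    "null" (no conversion); a synthetic "start" state is prepended.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        states = ["start"] + list(path)
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_prob(probs, removed=None, iters=200):
    """Probability of being absorbed in "conv" from "start" (fixed-point iteration).

    A removed channel is treated as a dead end with zero chance of converting.
    """
    p = defaultdict(float, {"conv": 1.0})
    for _ in range(iters):
        for state, dsts in probs.items():
            if state != removed:
                p[state] = sum(pr * (0.0 if dst == removed else p[dst])
                               for dst, pr in dsts.items())
    return p["start"]


def removal_effect_shares(paths):
    """Share of conversions credited to each channel via its removal effect."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = {c for path in paths for c in path if c not in ("conv", "null")}
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Hypothetical toy journeys.
journeys = [
    ["search", "display", "conv"],
    ["display", "null"],
    ["search", "conv"],
    ["social", "search", "conv"],
    ["social", "null"],
]
shares = removal_effect_shares(journeys)
```

With these five toy journeys the base conversion probability is 0.6 (three of five journeys convert), and search receives 15/26 of the credit, display 6/26, and social 5/26. The sketch assumes at least one journey converts; otherwise the base probability is zero and the shares are undefined.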
- BayesianMMM - Bayesian Media Mix Modeling with Python and PyMC3.
- dammmdatagen - Media Mix Modeling data generator (R).
- lightweight-mmm - A lightweight Bayesian Marketing Mix Modeling library by Google.
- mamimo - Small Media Mix Models designed to be used in conjunction with ML libraries (e.g., scikit-learn).
- mmm-stan - Marketing Mix Modeling with Stan.
- pymc-marketing - Bayesian marketing mix modeling and customer lifetime value in Python.
- Robyn - Facebook's automated Marketing Mix Modeling (MMM) code.
- Meridian - Google's new open-source Bayesian MMM framework (successor to LightweightMMM).
- Ecommerce Marketing Spend Optimization - Machine learning model for optimizing marketing budgets.
- MMM Prior Elicitation - Tools for prior elicitation in MMM.
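The MMM libraries above typically model each channel's media effect with a carryover (adstock) transform and a diminishing-returns (saturation) curve. The sketch below shows one common pairing, geometric adstock followed by a Hill curve, as plain NumPy functions. The parameter names (`decay`, `half_sat`, `slope`) and the channel coefficient are illustrative assumptions; the exact functional forms, parameterizations, and ordering differ between LightweightMMM, Robyn, pymc-marketing, and Meridian, and this is not any of their APIs.

```python
import numpy as np


def geometric_adstock(spend, decay):
    """Adstock: each period keeps a fraction `decay` of the previous period's effect."""
    spend = np.asarray(spend, dtype=float)
    out = np.empty_like(spend)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out


def hill_saturation(x, half_sat, slope):
    """Diminishing returns: 0 at x=0, 0.5 at x=half_sat, approaching 1 as x grows."""
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + half_sat**slope)


# Hypothetical weekly spend for one channel and illustrative parameters.
spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(spend, decay=0.5)  # [100, 50, 25, 62.5]
response = hill_saturation(adstocked, half_sat=50.0, slope=1.0)
contribution = 2.0 * response  # hypothetical channel coefficient (beta)
```

In a Bayesian MMM, `decay`, `half_sat`, `slope`, and the channel coefficient would be given priors and estimated jointly with the baseline, rather than fixed by hand as here.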
- trimmed_match - Trimmed Match design for randomized paired geo experiments.
- matched_markets - Time-based regression matched markets (by Google).
- GeoexperimentsResearch - Geo experiments research (R).
- GeoLift - An R package to design and analyze geo-lift experiments (by Facebook).
- CausalImpact (R library)
- Meta Geolift (R library)
- Quasi - Causal Inference for the Brave and True
- Murray (Python library)
- MarketMatching - Market matching and causal impact analysis.
- Geo RCT Methodology - Methodology for Geo Randomized Controlled Trials.
- scpi: Uncertainty Quantification for Synthetic Control Methods
- Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption
- Prediction Intervals for Synthetic Control Methods
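The synthetic control resources above all build the counterfactual for a treated unit (for example, a test region) as a weighted average of untreated donor units. In the vanilla formulation the weights are non-negative, sum to one, and are chosen to match the treated unit's pre-treatment outcomes. The sketch below solves that constrained least-squares problem with projected gradient descent, under the assumption that only pre-period outcomes are matched; real implementations also match covariates, use dedicated solvers, and add inference such as the prediction intervals discussed above.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(y_treated, Y_donors, steps=5000):
    """Minimise 0.5 * ||y_treated - Y_donors @ w||^2 over the simplex.

    y_treated: pre-treatment outcomes of the treated unit, shape (T_pre,)
    Y_donors:  pre-treatment outcomes of the donor units, shape (T_pre, J)
    """
    y = np.asarray(y_treated, dtype=float)
    X = np.asarray(Y_donors, dtype=float)
    w = np.full(X.shape[1], 1.0 / X.shape[1])  # start from equal weights
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w


# Toy pre-period: the treated unit is exactly 0.7 * donor_a + 0.3 * donor_b.
donor_a = np.array([1, 2, 3, 4, 5, 6], dtype=float)
donor_b = np.array([2, 1, 2, 1, 2, 1], dtype=float)
donor_c = np.full(6, 5.0)
Y_donors = np.column_stack([donor_a, donor_b, donor_c])
y_treated = 0.7 * donor_a + 0.3 * donor_b
w = synthetic_control_weights(y_treated, Y_donors)
```

After fitting, the post-period counterfactual is the post-period donor matrix multiplied by `w`, and the estimated effect is the treated unit's actual outcome minus that counterfactual.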
- CausalImpact - An R package for causal inference using Bayesian structural time-series models.
- CausalPy - A Python package for causal inference in quasi-experimental settings.
- dowhy - Python library for causal inference that supports explicit modeling and testing of causal assumptions.
- SyntheticControlMethods - Causal inference using Synthetic Control.
- tfcausalimpact - Google's CausalImpact algorithm implemented on top of TensorFlow Probability.
- upliftml - Scalable unconstrained and constrained uplift modeling from experimental data using PySpark and H2O.
- scikit-uplift - Uplift modeling in scikit-learn style in Python 🐍
- EconML - AI, Econometrics and Causal Inference modelling.
- statsmodels - Statistical modeling including time series and econometrics.
- pysyncon - Multiple Synthetic Control implementations.
- scpi - Synthetic Control Methods with Prediction Intervals (R/Python/Stata).
- SparseSC - Sparse Synthetic Control Models in Python by Microsoft.
- Causal Inference in Python - Code for "Causal Inference in Python".
- CausalLift - Uplift modeling for causal inference.
- MatchIt - Nonparametric preprocessing for parametric causal inference (R).
- TensorFlow CausalImpact - Python implementation of CausalImpact using TensorFlow Probability.
- Causmos - An open-source web application for Causal Impact analysis (by Google Marketing Solutions).
- Deep Causal MMM - Deep Causal Marketing Mix Modeling.
- ScidesignR - Scientific design of experiments in R.
- CausalImpact Python - Another Python implementation of CausalImpact.
- mlsynth - A Python library for doing policy evaluation using panel data estimators.
- causalml - Uplift modeling and causal inference with machine learning.
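Several entries above (upliftml, scikit-uplift, CausalLift, causalml) do uplift modeling: estimating how much a treatment, such as an ad or a coupon, changes each customer's outcome. One of the simplest estimators is the two-model approach, often called a T-learner: fit one outcome model on treated customers, another on control customers, and take the difference of their predictions. The sketch below assumes ordinary least squares as the base learner and uses noise-free synthetic data; it is not any listed library's API, and in practice the base models would be flexible regressors.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def t_learner_uplift(X, y, treated, X_new):
    """Two-model (T-learner) uplift: one outcome model per arm, then the difference."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    coef_t = fit_linear(X[treated], y[treated])    # treated-arm outcome model
    coef_c = fit_linear(X[~treated], y[~treated])  # control-arm outcome model
    A_new = np.column_stack([np.ones(len(X_new)), np.asarray(X_new, dtype=float)])
    return A_new @ coef_t - A_new @ coef_c


# Hypothetical randomized experiment with one feature; true uplift is 0.5 + x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
treated = rng.random(200) < 0.5
y = 1.0 + 2.0 * x + treated * (0.5 + x)  # noise-free toy outcome
uplift = t_learner_uplift(x, y, treated, X_new=[0.0, 1.0])
```

Because the toy outcome is exactly linear in each arm, the two regressions recover the true uplift at `x = 0` and `x = 1`. With real, noisy data the two separately fitted models can disagree in ways that inflate uplift variance, which is one motivation for the other meta-learners these libraries provide.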
- btyd - Buy Till You Die and CLV statistical models in Python.
- lifetimes - Measure customer lifetime value in Python.
- lucius-ltv - CLV for subscriptions.
- amazon-denseclus - Python module for clustering both categorical and numerical data using UMAP and HDBSCAN (by Amazon).
- rfm - RFM Analysis and Customer Segmentation.
- retentioneering-tools - Retentioneering: product analytics, data-driven customer journey map optimization, marketing analytics, web analytics, transaction analytics, graph visualization...
- ecommercetools - Data science toolkit for those working in technical ecommerce, marketing science, and technical SEO and includes a wide range of features to aid analysis and mod...
- lifelines - Survival analysis in Python.
- pysurvival - An open-source Python package for survival analysis modeling.
- scikit-survival - Survival analysis built on top of scikit-learn.
- EconML - Automated Learning and Intelligence for Causation and Economics.
- arules - Association Rules (apriori, eclat) in R.
- BTYDplus - Extended BTYD models (R).
- mr-uplift - Uplift Modeling with Multiple Treatments/Responses.
- BTYD - Buy Till You Die - Probability Models for Customer-Base Analysis (R).
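RFM analysis, which the rfm package above implements, segments customers by recency (days since the last order), frequency (number of orders), and monetary value (total spend). A minimal sketch, assuming transactions arrive as hypothetical `(customer_id, order_date, amount)` tuples and that each metric is scored by rank into equal-sized bins; many implementations use quintiles instead, and this is not the rfm package's API.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Raw (recency_days, frequency, monetary) per customer.

    transactions: iterable of (customer_id, order_date, amount) tuples.
    """
    stats = {}
    for cid, day, amount in transactions:
        last, freq, total = stats.get(cid, (day, 0, 0.0))
        stats[cid] = (max(last, day), freq + 1, total + amount)
    return {cid: ((as_of - last).days, freq, total)
            for cid, (last, freq, total) in stats.items()}


def rank_scores(values, n_bins, reverse=False):
    """Map values to scores 1..n_bins by rank; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + rank * n_bins // len(values)
    return scores


def rfm_scores(transactions, as_of, n_bins=3):
    """(R, F, M) scores per customer; higher is better on every axis."""
    table = rfm_table(transactions, as_of)
    ids = list(table)
    # Fewer days since the last order is better, so recency is ranked in reverse.
    r = rank_scores([table[c][0] for c in ids], n_bins, reverse=True)
    f = rank_scores([table[c][1] for c in ids], n_bins)
    m = rank_scores([table[c][2] for c in ids], n_bins)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(ids)}


# Hypothetical order history for three customers.
orders = [
    ("a", date(2024, 1, 5), 20.0), ("a", date(2024, 3, 1), 35.0),
    ("b", date(2023, 11, 20), 10.0),
    ("c", date(2024, 2, 14), 120.0), ("c", date(2024, 2, 28), 80.0),
    ("c", date(2024, 3, 10), 60.0),
]
scores = rfm_scores(orders, as_of=date(2024, 3, 31))
```

Here customer `c` (recent, frequent, high spend) scores (3, 3, 3) and lapsed customer `b` scores (1, 1, 1). The BTYD-family models listed above go a step further by modeling purchase and churn probabilistically rather than scoring by rank.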
- NeuralProphet - A hybrid forecasting framework based on PyTorch and trained with standard deep learning methods.
- pmdarima - A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
- prophet - Additive time series modelling by Facebook.
- sktime - A unified framework for ML with Time Series.
- StatsForecast - Lightning ⚡️ fast forecasting with statistical and econometric models.
- stumpy - STUMPY computes something called the matrix profile, which is just an academic way of saying "for every subsequence automatically identify its corresponding nea...
- temporian - Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖.
- tbats - BATS and TBATS time series forecasting.
- tslearn - The machine learning toolkit for time series analysis in Python.
- NeuralForecast - Scalable and user-friendly neural forecasting algorithms.
- Nixtla - TimeGPT-1: production-ready pre-trained Time Series Foundation Model for forecasting and anomaly detection.
- MLForecast - Scalable machine 🤖 learning for time series forecasting.
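The stumpy entry above describes the matrix profile: for every subsequence of a time series, the distance to (and position of) its nearest-neighbor subsequence. A brute-force sketch of that definition follows, assuming z-normalized Euclidean distance and an exclusion zone of a quarter of the window length (at least one position) to skip trivial self-matches. STUMPY itself uses far faster algorithms and its own defaults; this is an illustration of the concept, not its API.

```python
import numpy as np


def znorm(x):
    """Z-normalise a subsequence (a constant one is only mean-centred)."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile: nearest-neighbour distance and index per window.

    Distances are z-normalised Euclidean; neighbours within an exclusion zone
    of m // 4 positions (at least 1) are ignored to skip trivial self-matches.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 4)
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf  # exclusion zone
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index


# Toy series in which the pattern [0, 1, 3, 1, 0] occurs twice.
series = [0, 1, 3, 1, 0, 5, 5, 5, 0, 1, 3, 1, 0]
profile, index = naive_matrix_profile(series, m=4)
```

Low profile values mark repeated patterns (motifs); here the window starting at position 0 matches the one at position 8 exactly. High values mark subsequences with no close match, which is how the matrix profile is used for anomaly (discord) detection. The brute force costs O(n² m), which is why dedicated libraries matter for long series.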
- lightfm - A Python implementation of a number of popular recommendation algorithms.
- openrec - A Modular Framework for Extensible and Adaptable Recommendation Algorithms.
- recmetrics - Library of metrics for evaluating recommender systems.
- recommenders - Best Practices on Recommendation Systems (by Microsoft).
- Surprise - Scikit for building and analyzing recommender systems that deal with explicit rating data.
- gapandas4 - Python package for querying the Google Analytics Data API for GA4 and displaying the results in a Pandas dataframe.
- Decoy - Synthetic Data Generator using DuckDB at its core.
- SDV - Python library designed to be your one-stop shop for creating tabular synthetic data.
Articles, papers, and other resources organized by topic.
- CausalImpact (paper) - An important problem in econometrics and marketing is to infer the causal impact that a designed market intervention has exerted on an outcome metric over time. In order to allocate a given budget ...
- Meta Geolift (paper)
- Online A/B Testing - Trustworthy Online Controlled Experiments
- External Resource - How to measure the incremental Return On Ad Spend (iROAS) is a fundamental problem for the online advertising industry. A standard modern tool is to run randomized geo experiments, where experiment...
- External Resource - Gaussian processes are powerful non-parametric probabilistic models for stochastic functions. However, the direct implementation entails a complexity that is computationally intractable when the nu...
- Bayesian - The challenges posed by high-dimensional data and use of the simplex constraint are two major concerns in the empirical application of the synthetic control method (SCM) in econometric studies. To ...
- Inference for simplex weights - In many applications, the parameter of interest involves a simplex-valued weight which is identified as a solution to an optimization problem. Examples include synthetic control methods with group-...
- synthetic business cycles - This paper investigates the use of synthetic control methods for causal inference in macroeconomic settings when dealing with possibly nonstationary data. While the synthetic control approach has g...
- Synthetic Control Method (Vanilla SCM)
- Augmented Difference-in-Differences
- Forward Difference-in-Differences
- Two Step Synthetic Control
- Synthetic Control Method with Nonlinear Outcomes - The synthetic control estimator (Abadie et al., 2010) is asymptotically unbiased assuming that the outcome is a linear function of the underlying predictors and that the treated unit can be well ap...
- Proximal Causal Inference for SCM (Surrogates) - The synthetic control method (SCM) has become a popular tool for estimating causal effects in policy evaluation, where a single treated unit is observed, and a heterogeneous set of untreated units ...
- Proximal SCM Framework - Synthetic control (SC) methods are commonly used to estimate the treatment effect on a single treated unit in panel data settings. An SC is a weighted average of control units built to match the tr...
- Relaxed Balanced Synthetic Control - The synthetic control method (SCM) is widely used for constructing the counterfactual of a treated unit based on data from control units in a donor pool. Allowing the donor pool contains more contr...
- L1-INF Synthetic Control - This paper reinterprets the Synthetic Control (SC) framework through the lens of weighting philosophy, arguing that the contrast between traditional SC and Difference-in-Differences (DID) reflects ...
- Synthetic Control with Multiple Outcomes (TLP and SBMF) - We generalize the synthetic control (SC) method to a multiple-outcome framework, where the conventional pre-treatment time dimension is supplemented with the extra dimension of related outcomes in ...
- Synthetic Controls for Experimental Design - This article studies experimental design in settings where the experimental units are large aggregate entities (e.g., markets), and only one or a small number of units can be exposed to the treatme...
- DeepTCN paper - We present a probabilistic forecasting framework based on convolutional neural network for multiple related time series forecasting. The framework can be applied to estimate probability density und...
- Chronos-2 report - Pretrained time series models have enabled inference-only forecasting systems that produce accurate predictions without task-specific training. However, existing approaches largely focus on univari...
- Conformalized Prediction - Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conf...
- Matched Markets paper - Although randomized controlled trials are regarded as the "gold standard" for causal inference, advertisers have been hesitant to embrace them as their primary method of experimental desi...
- TBR paper - Two previously published papers (Vaver and Koehler, 2011, 2012) describe a model for analyzing geo experiments. This model was designed to measure advertising effectiveness using the rigor of...
- GeoX paper - Advertisers have a fundamental need to quantify the effectiveness of their advertising. For search ad spend, this information provides a basis for formulating strategies related to bidding, budgeti...
- Benidis et al. - Deep learning based forecasting methods have become the methods of choice in many applications of time series prediction or forecasting often outperforming other approaches. Consequently, over the ...
- Orbit: Probabilistic Forecast with Exponential Smoothing - Time series forecasting is an active research topic in academia as well as industry. Although we see an increasing amount of adoptions of machine learning methods in solving some of those forecasti...
- Treatment Effects with Instruments paper
- Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS) - We consider the estimation of heterogeneous treatment effects with arbitrary machine learning methods in the presence of unobserved confounders with the aid of a valid instrument. Such settings ari...
- arXiv preprint arXiv:1806.04823 - This paper proposes a Lasso-type estimator for a high-dimensional sparse parameter identified by a single index conditional moment restriction (CMR). In addition to this parameter, the moment funct...
- arXiv preprint arXiv:1608.00060 - Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically ...
- CausalML: Python package for causal machine learning - CausalML is a Python implementation of algorithms related to causal inference and machine learning. Algorithms combining causal inference and machine learning have been a trending topic in recent y...
- arXiv paper - We introduce Gluon Time Series (GluonTS, available at https://gluon-ts.mxnet.io), a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation w...
- Measuring Ad Effectiveness Using Geo Experiments
- Estimating Ad Effectiveness Using Geo Experiments in a Time-Based Regression Framework - Two previously published papers (Vaver and Koehler, 2011, 2012) describe a model for analyzing geo experiments. This model was designed to measure advertising effectiveness using the rigor of...
- NeuralProphet - We introduce NeuralProphet, a successor to Facebook Prophet, which set an industry standard for explainable, scalable, and user-friendly forecasting frameworks. With the proliferation of time serie...
- Be Careful When Interpreting Predictive Models in Search of Causal Insights
- 2021 Conference on Digital Experimentation @ MIT (CODE@MIT)
- The Kernel Cookbook: Advice on Covariance functions
- Gaussian Processes: HSGP Reference & First Steps
- Gaussian Processes: HSGP Advanced Usage
- DeepAR paper - Probabilistic forecasting with autoregressive recurrent networks (Amazon).
- N-BEATS paper - Neural basis expansion analysis for interpretable time series forecasting.
- N-HiTS paper - Neural Hierarchical Interpolation for Time Series Forecasting.
- TCN paper - An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling.
- Overview
- Applications - Pricing
- Applications - Stitchfix Experimentation
- Applications - Amazon Causal Marketing
- Applications - Meta Ad Placement
- Applications - Application to Performance Marketing
- xDeepFM - eXtreme Deep Factorization Machine: Combining Explicit and Implicit Feature Interactions for Recommender Systems.
- Google Primer - Introduction to Recommendation Systems.
- Susan Athey - The Economics of Technology Professor at Stanford Graduate School of Business. Leading researcher in the intersection of machine learning and causal inference.
- Guido Imbens - Applied Econometrics Professor and Professor of Economics at Stanford Graduate School of Business. Nobel Laureate (2021) for methodological contributions to the analysis of causal relationships.
- Peter Fader - Frances and Pei-Yuan Chia Professor of Marketing at The Wharton School. Author of Customer Centricity.
- Byron Sharp - Professor of Marketing Science and Director of the Ehrenberg-Bass Institute. Author of How Brands Grow.
- Ron Berman - Associate Professor of Marketing at The Wharton School. Focuses on online marketing, marketing analytics, and game theory.
- Randall Lewis - Economic Research Scientist at Netflix. Known for work on "Ghost Ads" and measuring advertising effectiveness.
- Stefan Wager - Associate Professor of Operations, Information & Technology at Stanford GSB. Research on causal inference and statistical learning.
- Catherine Tucker - Sloan Distinguished Professor of Management at MIT Sloan. Expert in digital marketing, privacy, and online advertising.
- Dominique Hanssens - Distinguished Research Professor of Marketing at UCLA Anderson. Known for Long-Term Impact of Marketing.
- Garrett Johnson - Associate Professor of Marketing at Boston University. Co-author of "Ghost Ads" and research on privacy/GDPR.
- Probabilistic Programming & Bayesian Methods for Hackers - An introduction to Bayesian methods + Probabilistic Programming.
- Causal Inference for the Brave and True - An open-source book on causal inference.
- Statistical Rethinking - PyMC port of Richard McElreath's Statistical Rethinking.
- Causal Inference and Discovery in Python - Code repository for the book.
- Marketing-Data-Science - Analytics and data science business case studies to identify opportunities and inform decisions about products and features. Topics include Markov chains, A/B testing, customer segme...
- Hands-On Data Science for Marketing - Code for the Packt book.
- Bayesian Analysis with Python - Code for the third edition of the book.
- Data Science for Marketing Analytics - Code for the Packt book.
- Doing Bayesian Data Analysis - Python port of John Kruschke's book.
- Intuitive Bayes - A course on Intuitive Bayesian statistics.
- Forecasting: Principles and Practice - Forecasting bible with R examples.
- Market Segmentation Analysis - Customer Segmentation Book with Python Examples.
- Algorithmic Marketing - Overview of general applications of data science in marketing.
- The Book of Why - Judea Pearl's popular science book on causal inference.
- Causality - Judea Pearl's seminal technical book.
- Causal Inference: The Mixtape - An accessible introduction to causal inference (Scott Cunningham).
- Probably Overthinking It - Allen Downey's book on using data to answer questions.
- Causal Inference in Python - Applying Causal Inference in Industry (Matheus Facure).
- Decision Making Processes in Marketing Mix Modelling (PDF)
- Causal Sales Analytics: Are my sales incremental or cannibalistic?
- Causal Analysis with PyMC: Answering "What If?" with the New do Operator
- Bayesian Media Mix Modeling for Marketing Optimization
- Modelling Changes in Marketing Effectiveness Over Time
- Reducing Customer Acquisition Costs: How we helped optimizing HelloFresh's marketing budget
- How Wayfair Uses Geo Experiments to Measure Incrementality
- Using Geographic Splitting & Optimization Techniques to Measure Marketing Performance
- The Future is Modeled: A How-to Guide for Advanced Marketing Mix Models
- Python/STAN Implementation of Multiplicative Marketing Mix Model
- Unified online marketing measurement - Think with Google
- Unified Marketing Measurement: The Power of Blending Methodologies (PDF)
- An Analyst's Guide to MMM | Robyn
- "Shapley Value Methods for Attribution Modeling in Online Advertising" by Zhao, et al. - This paper re-examines the Shapley value methods for attribution analysis in the area of online advertising. As a credit allocation solution in cooperative game theory, Shapley value method directl...
- this paper - Google Research.
- Challenges and Opportunities in Media Mix Modeling - Google Research.
- Feature Selection Methods for Uplift Modeling - Uplift modeling is a causal learning technique that estimates subgroup-level treatment effects. It is commonly used in industry and elsewhere for tasks such as targeting ads. In a typical setting, ...
- https://arxiv.org/abs/1808.03737 - In online advertising, the Internet users may be exposed to a sequence of different ad campaigns, i.e., display ads, search, or referrals from multiple channels, before led up to any final sales co...
- darts - A Python library for easy manipulation and forecasting of time series.
- gluonts - Probabilistic time series modeling in Python (by Amazon).
- orbit - Bayesian Time Varying Coefficients (by Uber).
- tsfresh - Automatic extraction of relevant features from time series.
- awesome-digital-marketing - 😎 A curated list of awesome digital marketing guides, resources, services, & more.
- awesome-marketing - A curated list of resources related to internet marketing.
- awesome-marketing-machine-learning - A curated list of awesome machine learning libraries for marketing, including media mix models, multi-touch attribution, causal inference, and more.
- awesome-Marketing-Analytics - Resources 💼 to learn/practice 🎯 Marketing analytics 💹 🚨
- AwesomeMarketing
- "Auto-Keras: Efficient Neural Architecture Search with Network Morphism" - Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. ...
- Trimmed Match Design for Randomized Paired Geo Experiments - How to measure the incremental return on Ad spend (iROAS) is a fundamental problem for the online advertising industry. A standard modern tool is to run randomized geo experiments, where experiment...
- Measuring Ad Effectiveness Using Geo Experiments - Advertisers have a fundamental need to quantify the effectiveness of their advertising. For search ad spend, this information provides a basis for formulating strategies related to bidding, budgeti...
- Estimating Ad Effectiveness using Geo Experiments in a Time-Based Regression Framework - Two previously published papers (Vaver and Koehler, 2011, 2012) describe a model for analyzing geo experiments. This model was designed to measure advertising effectiveness using the rigor of...
- A Time-Based Regression Matched Markets Approach for Designing Geo Experiments - Although randomized controlled trials are regarded as the "gold standard" for causal inference, advertisers have been hesitant to embrace them as their primary method of experimental desi...
- Matheus Facure's Blog - Focus on Causal Inference.
- Juan Orduz's Blog - Great resources on MMM and causal inference.
- Think with Google: Marketing Strategies
- HelloFresh Engineering: Bayesian Media Mix Modeling
- Recast Blog
- PyMC Labs Blog
- Share of Search as a Predictive Metric - Les Binet on using Share of Search to predict market share.
- Demystifying Brand Equity - Kantar's guide on definition, measurement, and strategy for brand equity.
- Validating Google's ABCD framework - Kantar research on the impact of creative elements on sales and brand equity.
- When it comes to advertising effectiveness, what is key? - Nielsen study on the drivers of advertising effectiveness (Creatives, Reach, Targeting).
- Achieve your business goals with our modern measurement playbook - Overview of the modern measurement framework.
- The Modern Measurement Playbook - Google's guide to unifying MMM, attribution, and incrementality.
- Attribution model overview
- Ebay Hybrid Geo/User experiment
- Switchback tests - DoorDash Engineering.
- Conversion lift tests are dead - Primer on why geo experiments are important.
- Understanding the R-hat statistic
- A New Gold Standard for Digital Ad Measurement - Harvard Business Review.
- Trimmed Match Design for Randomized Paired Geo Experiments
- Ghost Ads: Improving the Economics of Measuring Online Ad Effectiveness
- Designing and Deploying Online Field Experiments - One of the first papers describing the application of geo experiments to marketing.
- Estimating Ad Effectiveness Using Geo Experiments in a Time-Based Regression Framework - Google Research.
- A Time-Based Regression Matched Markets Approach for Designing Geo Experiments - Google Research.
- Bayesian Time Varying Coefficients
- Hierarchical MMM with sign constraints - Marketing mix models (MMMs) are statistical models for measuring the effectiveness of various marketing activities such as promotion, media advertisement, etc. In this research, we propose a compre...
- Hierarchical Bayesian MMM using Category Data - Google Research.
- Geo-level Bayesian Hierarchical Media Mix Modeling - Google Research.
Feel free to submit an issue or pull request with any suggestions!
This list is maintained by Shako Stats.
Connect with me on LinkedIn.