This project is designed as a foundational framework, inviting the community to develop custom algorithms for stock price movement prediction. Its modular design also allows for adaptation to other machine learning applications, such as website click-through rate prediction or advertisement targeting.
The current release features:
- A robust framework for training machine learning models to predict U.S. stock price changes.
Problem Statement:
- Many popular U.S. stock prediction projects rely on Large Language Models (LLMs), which can incur significant API costs. This project offers a more cost-effective alternative.
- Frequent intraday trading can get an account classified as a pattern day trader, which under FINRA rules requires at least $25,000 of equity in a margin account, and constant market monitoring is mentally taxing. This project focuses on predicting next-day price changes, which reduces the need for constant monitoring and offers a more sustainable approach to investing.
This project is provided for research and demonstration purposes only and does not constitute investment advice. Any financial decisions made based on this model are entirely at your own risk.
- Clone the repository and navigate to the ML4Investment working directory:
  ```bash
  git clone https://github.com/Beryex/ML4Investment.git --depth 1
  cd ML4Investment
  ```
- Set up the environment:
  ```bash
  conda create -n ml4i python=3.11 -y
  conda activate ml4i
  pip install uv
  uv pip install -e '.[dev]'
  ```
- Create the local configuration file:
  ```bash
  cp ml4investment/.env.example ml4investment/.env
  ```
- Open the newly created `.env` file and enter your credentials, such as your trading platform's API key and secret. API access must be requested through the Schwab Developer Portal.
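To illustrate the file format, here is a minimal standard-library sketch that parses `KEY=VALUE` lines like those in `.env`. The key names in the example are hypothetical placeholders (use the ones listed in `.env.example`), and the project itself may load the file differently, for example with a library such as python-dotenv.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines, # comments and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional single/double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(path: str = "ml4investment/.env") -> dict[str, str]:
    """Read and parse an .env file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = """
    # Hypothetical key names, not necessarily those in .env.example
    SCHWAB_APP_KEY=your-app-key
    SCHWAB_APP_SECRET="your-app-secret"
    """
    print(parse_env(sample))
```

Keeping secrets in an untracked `.env` file rather than in source code means credentials never end up in version control.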
All data and optimization results are stored in the ml4investment/data folder.
The project supports two ways of collecting data: loading local files and fetching the latest data through the Schwab API. To fetch data via the API, run:
```bash
python fetch_data.py
```
To load local files instead, run:
```bash
python fetch_data.py -lld -ldp PATH_TO_STOCK_DATA_FOLDER
```
I recommend loading the provided local files to extend your training data several years back, and using the Schwab API to fetch the most recent stock data.
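Combining the two sources comes down to merging two time-indexed price histories. A minimal sketch, assuming each source is a mapping from an ISO-8601 date string to a bar record and that freshly fetched API data should override local data on overlapping dates. The record layout and the `merge_histories` helper are hypothetical, not the format `fetch_data.py` actually uses.

```python
def merge_histories(
    local: dict[str, dict], api: dict[str, dict]
) -> list[tuple[str, dict]]:
    """Merge two {date: bar} histories; API bars override local bars on the same date."""
    merged = {**local, **api}  # the later mapping wins on duplicate keys
    # ISO-8601 strings in one consistent format sort chronologically as plain strings.
    return sorted(merged.items())


if __name__ == "__main__":
    local = {"2024-01-02": {"open": 1.0}, "2024-01-03": {"open": 2.0}}
    api = {"2024-01-03": {"open": 2.5}, "2024-01-04": {"open": 3.0}}
    for date, bar in merge_histories(local, api):
        print(date, bar)
```

Letting the API data win on overlaps keeps the most recent, authoritative values while the local files supply the long history.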
After downloading the data, navigate to the ml4investment directory, which contains the main codebase.
To train your own model, run:
```bash
python train.py  # Train the model with default hyperparameters
```
You can also run `python train.py -odsp` to optimize the data sampling proportion, `python train.py -of` to optimize features, and `python train.py -omhp` to optimize model hyperparameters.
To evaluate the model's performance via backtesting, run:
```bash
python backtest.py -v
```
This shows detailed statistics on the test dataset, such as MAE, sign accuracy, and the actual gain, as well as stock-level and daily-level performance.
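For reference, MAE and sign accuracy can be computed from predicted and realized price change ratios as in the minimal sketch below. The gain function assumes a hypothetical long-only rule that buys every stock with a positive prediction in equal weight; this is an illustrative assumption and may differ from how `backtest.py` computes the actual gain.

```python
from statistics import mean


def mae(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error between predicted and realized change ratios."""
    return mean(abs(p - a) for p, a in zip(pred, actual))


def sign_accuracy(pred: list[float], actual: list[float]) -> float:
    """Fraction of cases whose predicted direction matches the realized one.

    A value of exactly zero counts as non-positive.
    """
    return mean(float((p > 0) == (a > 0)) for p, a in zip(pred, actual))


def long_only_gain(pred: list[float], actual: list[float]) -> float:
    """Hypothetical rule: mean realized return of an equal-weight long on every positive prediction."""
    picked = [a for p, a in zip(pred, actual) if p > 0]
    return mean(picked) if picked else 0.0


if __name__ == "__main__":
    pred = [0.02, -0.01, 0.03, -0.02]
    actual = [0.01, 0.01, 0.05, -0.03]
    print(mae(pred, actual), sign_accuracy(pred, actual), long_only_gain(pred, actual))
```

MAE measures how close the predicted magnitudes are, while sign accuracy measures whether the direction is right, which matters more for deciding what to buy.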
For daily usage, run:
```bash
bash scripts/daily_usage.sh
```
This fetches the latest U.S. stock data, backtests the current model, generates predictions for the price change ratio from the next trading day's open price to the following trading day's open price, and automatically places the orders.
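The prediction target described above can be written out explicitly. Given a chronological list of daily open prices, the minimal sketch below computes, for each day t, the ratio from the open of day t+1 to the open of day t+2. It assumes one open price per trading day and features built from data available up to day t; `next_day_targets` is a hypothetical helper, not part of the codebase.

```python
def next_day_targets(opens: list[float]) -> list[float]:
    """For each day t that has two later opens, return opens[t + 2] / opens[t + 1] - 1.

    A prediction made after day t is acted on at the next trading day's open (t + 1)
    and evaluated at the following trading day's open (t + 2).
    """
    return [opens[t + 2] / opens[t + 1] - 1 for t in range(len(opens) - 2)]


if __name__ == "__main__":
    print(next_day_targets([100.0, 102.0, 102.0, 96.9]))
```

The last two days have no target because their future opens are not yet known; these are exactly the days the daily script predicts for.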
Run the test suite from the repository root:
```bash
pytest
```
For a more verbose run (matching the current development workflow):
```bash
python -m pytest -l -v -s
```
The preceding sections covered the daily usage of this project. Here, I'll guide you on how to enhance the pipeline and develop your own customized models.
To familiarize you with the pipeline, the table below lists the main settings and their file locations:
| Description | File Path |
|---|---|
| Pipeline Hyperparameter Settings | ml4investment/config/global_settings.py |
| LightGBM Model Hyperparameters | ml4investment/data/prod_model_hyperparams.json |
| Feature Selection Result | ml4investment/data/prod_features.json |
| Stocks for Training | ml4investment/config/train_stocks.json |
| Target Stocks | ml4investment/config/target_stocks.json |
| Stocks for Prediction | ml4investment/data/predict_stocks.json |
The stocks used for prediction are selected from the target stocks through an optimization step.
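As a purely illustrative sketch (the actual criterion may differ), one way to derive a prediction list from the target stocks is to keep those whose per-stock backtest sign accuracy clears a threshold. The metric, the 0.55 threshold, and `select_predict_stocks` are all hypothetical, and the accuracies in the example are made up.

```python
def select_predict_stocks(
    per_stock_accuracy: dict[str, float], threshold: float = 0.55
) -> list[str]:
    """Keep tickers whose backtest sign accuracy meets the threshold, best first.

    Ties are broken alphabetically so the output is deterministic.
    """
    kept = [s for s, acc in per_stock_accuracy.items() if acc >= threshold]
    return sorted(kept, key=lambda s: (-per_stock_accuracy[s], s))


if __name__ == "__main__":
    # Made-up accuracies for illustration only.
    acc = {"AAPL": 0.60, "MSFT": 0.50, "NVDA": 0.58, "AMZN": 0.60}
    print(select_predict_stocks(acc))
```

Selecting on held-out backtest results rather than training fit reduces, but does not eliminate, the risk of keeping stocks that only looked good by chance.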
To enhance model performance, consider refining the feature engineering process within ml4investment/utils/feature_calculating.py and ml4investment/utils/feature_processing.py as well as optimizing the model training process in ml4investment/utils/model_training.py.
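As an illustration of the kind of feature that could be added in feature_calculating.py, here is a minimal standard-library sketch of two common price-based features: the one-day return and the ratio of the close to its trailing moving average. These are generic examples, not the project's actual feature set, and the function names are hypothetical.

```python
from __future__ import annotations


def daily_returns(closes: list[float]) -> list[float | None]:
    """closes[t] / closes[t - 1] - 1; undefined (None) for the first day."""
    if not closes:
        return []
    return [None] + [closes[t] / closes[t - 1] - 1 for t in range(1, len(closes))]


def ma_ratio(closes: list[float], window: int = 5) -> list[float | None]:
    """closes[t] divided by the mean of the last `window` closes, including day t.

    Only past and current data are used, so the feature has no look-ahead.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out: list[float | None] = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            avg = sum(closes[t + 1 - window : t + 1]) / window
            out.append(closes[t] / avg)
    return out


if __name__ == "__main__":
    closes = [10.0, 11.0, 12.1, 11.0]
    print(daily_returns(closes))
    print(ma_ratio(closes, window=2))
```

Whatever features you add, make sure each value at day t depends only on data available by day t; otherwise the backtest will look better than live trading.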
To run a hyperparameter search, set the hyperparameters to search and their candidate value lists directly in ml4investment/scripts/hyperparameter_search.sh, then run:
```bash
bash scripts/hyperparameter_search.sh
```
If you find issues or discover tricks that could improve the algorithm, feel free to modify the code and submit a pull request.
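If the search script tries every combination of the listed values, it performs a grid search. A minimal Python sketch of that idea follows, with an example search space and a placeholder `evaluate` function standing in for a training-and-validation run. `num_leaves` and `learning_rate` are real LightGBM parameter names, but the values, the toy objective, and `grid_search` itself are hypothetical and do not reflect the script's actual variables or scoring.

```python
from collections.abc import Callable
from itertools import product


def grid_search(
    space: dict[str, list], evaluate: Callable[[dict], float]
) -> tuple[dict, float]:
    """Evaluate every combination in `space`; return the best config and score (higher is better).

    On ties, the first combination encountered is kept.
    """
    names = list(space)
    best_cfg: dict = {}
    best_score = float("-inf")
    for values in product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg: dict) -> float:
    # Placeholder: a real run would train a model with `cfg` and return a validation score.
    return -abs(cfg["num_leaves"] - 31) - abs(cfg["learning_rate"] - 0.05)


if __name__ == "__main__":
    space = {"num_leaves": [15, 31, 63], "learning_rate": [0.01, 0.05]}
    print(grid_search(space, toy_objective))
```

The number of runs is the product of the list lengths (six here), so keep each value list short or the search cost grows quickly.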
```bibtex
@software{ml4investment,
  author = {Boyao Wang},
  title = {ML4Investment: Machine Learning for Investment},
  url = {https://github.com/Beryex/ML4Investment},
  year = {2025},
  note = {GitHub repository}
}
```
