This repository contains the dataset and code for the paper: "MemeTrans: A Dataset for Detecting High-Risk Memecoin Launches on Solana"
- Python 3.9
- Conda recommended
- Install packages using:

      pip install -r requirements.txt

To train a risk prediction model:

    cd MemeTrans/risk_prediction
    python ml_model_train.py --model rf

To run memecoin selection:

    python memecoin_selection.py

The data_pipeline/ directory contains the full pipeline for reproducing the dataset from scratch.
| Directory | Description |
|---|---|
| memecoin/ | Collects Pump.fun token migration transactions from the Raydium fee account via Solana RPC, producing the memecoin list (raw_data/memecoin.jsonl). |
| transaction/ | Queries raw transactions from Google BigQuery, splits memecoin windows into CSV parts, generates BigQuery load scripts and JOIN SQL, and parses raw transactions (inner/outer) into structured records. |
| bundle/ | Queries the Jito bundle API to identify MEV bundles, and traces on-chain fund flow to find wallets that share a creator, which is used to detect bundled accounts. |
| feature/ | Generates the full feature set from parsed transactions, including holding concentration, market activity, bundle/cluster statistics, and OHLCV time series. |
| annotation/ | Manipulation detection and label annotation (coming soon). |
    memecoin/ → transaction/ (BigQuery + parse) → bundle/ (Jito + fund flow)
    ↓
    feature/ (feat_gen)
    ↓
    annotation/
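
The memecoin list produced by the first stage is a JSON Lines file, so downstream steps can stream it one record at a time. The following is a minimal sketch of such a reader, not the repository's own loader: the helper name `iter_jsonl` is hypothetical, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


if __name__ == "__main__":
    # Path taken from the table above; adjust if your layout differs.
    memecoin_path = Path("raw_data/memecoin.jsonl")
    if memecoin_path.exists():
        count = sum(1 for _ in iter_jsonl(memecoin_path))
        print(f"{count} memecoin records found")
```

Streaming with a generator avoids loading the whole list into memory, which may matter if the file is large.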
Scripts that query the Solana RPC read their endpoints from data_pipeline/rpc_endpoints.txt (one URL per line; the file is gitignored). Create this file with your own RPC endpoints before running the memecoin collection or bundle scripts.
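
As an illustration of the expected file format, the sketch below loads one URL per line and cycles through the endpoints in round-robin order. It is an assumption about how such a file could be consumed, not the code the pipeline scripts actually use: `load_rpc_endpoints` and `endpoint_rotation` are hypothetical names, and skipping lines that start with `#` is a convenience added here.

```python
from itertools import cycle
from pathlib import Path
from typing import Iterator, List


def load_rpc_endpoints(path) -> List[str]:
    """Read one RPC URL per line, ignoring blank lines and '#' comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    endpoints = [
        ln.strip()
        for ln in lines
        if ln.strip() and not ln.strip().startswith("#")  # assumption: '#' marks a comment
    ]
    if not endpoints:
        raise ValueError(f"no RPC endpoints found in {path}")
    return endpoints


def endpoint_rotation(endpoints: List[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the endpoints."""
    return cycle(endpoints)


if __name__ == "__main__":
    endpoints_file = Path("data_pipeline/rpc_endpoints.txt")
    if endpoints_file.exists():
        rotation = endpoint_rotation(load_rpc_endpoints(endpoints_file))
        print("first endpoint to use:", next(rotation))
```

Rotating across several endpoints is one simple way to spread requests when a single provider applies rate limits.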
Since the raw transaction data from BigQuery is very large (>1TB), we provide the parsed transaction datasets on Google Drive:
- inner_tx.zip — Pre-migration (bonding curve) transactions
- outer_tx.zip — Post-migration (Raydium DEX) transactions
Download and extract them into raw_data/parsed_tx/ to skip the coin_collection.py → BigQuery → parse_* steps and run the downstream pipeline directly.
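
The extraction step can also be scripted. The sketch below assumes both archives have already been downloaded into the current working directory; the helper name `extract_parsed_tx` is hypothetical, and the layout of files inside the archives is not assumed.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def extract_parsed_tx(archives: Iterable[Path], dest: Path) -> List[str]:
    """Extract each zip archive into dest and return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    extracted: List[str] = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted


if __name__ == "__main__":
    # Assumption: the downloads sit next to where this script is run.
    archives = [Path("inner_tx.zip"), Path("outer_tx.zip")]
    present = [a for a in archives if a.exists()]
    if present:
        names = extract_parsed_tx(present, Path("raw_data/parsed_tx"))
        print(f"extracted {len(names)} entries into raw_data/parsed_tx/")
```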
If you have any questions, please open an issue or contact the corresponding author at: husihao26@gmail.com
We will respond as soon as possible.