Fine-tuning Llama 3.1 8B to automatically convert doctor-patient conversation transcripts into structured clinical notes using QLoRA.
- Python 3.10+
- Conda (Anaconda or Miniconda) — for local environment setup
- A Google account (if running on Google Colab)
- A HuggingFace account with access to Meta Llama models (https://huggingface.co/meta-llama)
- GPU — this project requires a CUDA-capable GPU. An A100 is recommended, but T4/V100 will also work (slower training).
```bash
conda create -n clinical-note-gen python=3.10 -y
conda activate clinical-note-gen
```

Install PyTorch with the appropriate CUDA version for your system. For CUDA 12.1:

```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y
```

For other CUDA versions, check https://pytorch.org/get-started/locally/

```bash
pip install -r requirements.txt
pip install git+https://github.com/google-research/bleurt.git
jupyter notebook fine_tuned_LLM.ipynb
```

If you don't have a local GPU, use Google Colab instead. No environment setup is needed — the notebook installs everything automatically.
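After installation, a quick sanity check can confirm the core dependencies resolved before launching the notebook. This is a minimal sketch — `missing_packages` is a hypothetical helper, and the package list is illustrative of the stack a QLoRA fine-tuning notebook typically needs:

```python
import importlib.util

def missing_packages(pkgs):
    """Return the packages from pkgs that are not importable in this environment."""
    return [p for p in pkgs if importlib.util.find_spec(p) is None]

# Illustrative core stack for QLoRA fine-tuning; an empty list means all resolved
print(missing_packages(["torch", "transformers", "peft", "bitsandbytes"]))
```

If anything is reported missing, rerun `pip install -r requirements.txt` inside the activated environment.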
Upload fine_tuned_LLM.ipynb to Google Colab, or open it directly if hosted on GitHub/Google Drive.
- Go to Runtime > Change runtime type
- Set Hardware accelerator to GPU
- Select A100 if available (under the "High-RAM" option), otherwise T4 or V100
- Click Save
- Go to Runtime > Run all
- Alternatively, run cells one by one using Shift + Enter
Important: After the first cell (installation cell), you may need to restart the runtime before continuing. Colab will prompt you if needed. After restarting, run all cells from the second cell onward.
That's it. The notebook handles everything automatically:
- Installs all dependencies
- Downloads the ACI-Bench dataset
- Loads and prepares the data
- Loads the pre-trained model
- Runs baseline evaluation
- Trains 3 hyperparameter configurations
- Evaluates and compares all models
- Performs error analysis
- Saves the best model
- Runs a demo inference
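To illustrate the data-preparation step, the sketch below shows one way a transcript/note pair could be rendered into a Llama 3.1 chat-style training string. The system prompt is invented for illustration, and while the special tokens follow Llama 3.1's published chat format, the notebook's actual template may differ — treat this as an assumption, not the notebook's exact code:

```python
# Hypothetical prompt builder for one training example.
# The special tokens follow Llama 3.1's chat format; verify against the notebook.
SYSTEM = "You are a clinical scribe. Convert the conversation into a structured clinical note."

def build_example(dialogue: str, note: str) -> str:
    """Render one transcript/note pair as a single chat-formatted training string."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{SYSTEM}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{dialogue}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{note}<|eot_id|>"
    )

example = build_example(
    "[doctor] How are you feeling today?",
    "CHIEF COMPLAINT: Follow-up visit.",
)
print(example)
```

During training, the loss is typically computed only on the assistant portion (the note), so the model learns to generate notes rather than transcripts.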
.
├── fine_tuned_LLM.ipynb # Main notebook — run this
├── requirements.txt # Python dependencies
├── README.md # This file
.
├── aci-bench/ # Cloned dataset (created automatically)
├── best_model_lora/ # Saved best fine-tuned LoRA adapters
├── outputs_config1/ # Training checkpoints for Config 1
├── outputs_config2/ # Training checkpoints for Config 2
├── outputs_config3/ # Training checkpoints for Config 3
├── evaluation_results.pkl # Full evaluation results (with predictions)
├── evaluation_results.json # Metrics summary in JSON format
├── evaluation_results.png # Visualization charts
├── sample_predictions.txt # Example generated notes for review
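Once a run completes, the configurations can be compared programmatically from `evaluation_results.json`. The snippet below is a sketch: the key names (`config1`, `rougeL`) are assumptions about the file's layout, not its actual schema — check the JSON produced by your run:

```python
import json

def best_config(results: dict, metric: str = "rougeL") -> str:
    """Return the config name with the highest value for the given metric."""
    return max(results, key=lambda name: results[name][metric])

# Toy numbers for illustration; in practice load the real file:
#   with open("evaluation_results.json") as f:
#       results = json.load(f)
results = {
    "config1": {"rougeL": 0.31},
    "config2": {"rougeL": 0.35},
    "config3": {"rougeL": 0.33},
}
print(best_config(results))  # config2
```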
- "CUDA out of memory" — Restart the runtime and try again. If it persists, reduce
MAX_SEQ_LENGTHfrom 4096 to 2048 in the model loading cell. - "Module not found" errors after installation — Restart the runtime (or your Jupyter kernel) after the installation cell, then run from the imports cell onward.
- Slow training — Make sure you are using a GPU runtime. CPU-only will not work for this project.
- Conda environment issues — If `pip install -r requirements.txt` fails, try installing packages one at a time to identify the problematic dependency.
- BLEURT installation fails — BLEURT is optional. The notebook will skip BLEURT scoring if it is not installed and still run all other metrics.
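The one-at-a-time install can be scripted. This rough sketch assumes a simple `requirements.txt` — one plain package spec per line, with no comments or `-r` includes:

```shell
# Install each requirement individually; record any spec that fails
while read -r pkg; do
  [ -z "$pkg" ] && continue            # skip blank lines
  pip install "$pkg" || echo "FAILED: $pkg" >> failed_packages.txt
done < requirements.txt
```

Afterwards, `failed_packages.txt` (if it exists) lists the problematic dependencies to investigate individually.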