# train-transformer

A decoder-only, GPT-style Transformer implemented from scratch in PyTorch and trained character-by-character on the TinyShakespeare dataset.

## Features
- Multi-head self-attention with causal masking and optional KV caching for fast inference
- Sinusoidal positional encoding
- 6 decoder layers, each with pre-norm LayerNorm, MHA, GELU MLP (4× expansion), and residual connections
- Character-level tokenizer with `<start>`, `<end>`, and `<pad>` special tokens
- Beam search and top-k sampling generation strategies
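The sinusoidal positional encoding mentioned above follows the standard fixed-frequency scheme from "Attention Is All You Need". Below is a minimal sketch of that computation; the function name `sinusoidal_positions` is illustrative, not the name used in `transformer.py`:

```python
import torch


def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)      # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-torch.log(torch.tensor(10000.0)) / d_model))   # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even indices: sine
    pe[:, 1::2] = torch.cos(pos * div)  # odd indices: cosine
    return pe


# Shapes matching this repo's defaults (seq_len=40, d_model=512):
pe = sinusoidal_positions(40, 512)
```

Because the encodings are deterministic functions of position, they add no learned parameters and extrapolate to any sequence length.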
## Project structure

```
train-transformer/
├── transformer.py        # Model definition (Transformer, Decoder, MHA, KV cache)
├── utils.py              # Tokenizer, data prep, text generation utilities
├── train_transformer.py  # Training & evaluation CLI script
├── tinyshakespeare.txt   # Training corpus
└── requirements.txt
```
## Installation

```
pip install -r requirements.txt
```

**Apple Silicon (M1/M2/M3):** PyTorch will automatically use the MPS backend. On other hardware it falls back to CPU. For CUDA GPU support, install the appropriate PyTorch build from pytorch.org.
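The CUDA → MPS → CPU fallback described above is typically implemented with a short device-selection snippet like the following (a generic sketch, not necessarily the exact code in this repo):

```python
import torch

# Pick the best available backend: CUDA GPU, then Apple Silicon MPS, else CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
```

Model and batches are then moved to this device with `.to(device)`.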
## Training

```
python train_transformer.py --train
```

This trains for 10 epochs and saves two checkpoint files:
- `model_checkpoint.pth` — full checkpoint (model + optimizer state)
- `model_state_dict.pth` — model weights only (used for eval)
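The two checkpoint formats can be produced and reloaded with standard `torch.save` / `torch.load` calls. In this sketch a small `nn.Linear` stands in for the real Transformer, and the AdamW optimizer is an assumption for illustration:

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the Transformer defined in transformer.py.
model = nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=5e-4)  # optimizer choice assumed

# Weights-only checkpoint, as in model_state_dict.pth:
torch.save(model.state_dict(), "model_state_dict.pth")
model.load_state_dict(torch.load("model_state_dict.pth", map_location="cpu"))

# Full checkpoint (model + optimizer state), as in model_checkpoint.pth:
torch.save({"model": model.state_dict(), "optimizer": opt.state_dict()},
           "model_checkpoint.pth")
```

Saving the optimizer state alongside the weights lets training resume mid-schedule instead of restarting the LR cycle.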
## Evaluation

```
python train_transformer.py --eval
```

Loads `model_state_dict.pth` and generates continuations for two seed phrases using beam search:

- `"I speak from"`
- `"Some parcels of"`
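Alongside beam search, the repo lists top-k sampling as a generation strategy. The core of top-k sampling is small: keep only the k highest-scoring logits, renormalize, and draw one token. A generic sketch (function name is illustrative, not from `utils.py`):

```python
import torch
import torch.nn.functional as F


def sample_top_k(logits: torch.Tensor, k: int = 10) -> int:
    """Sample the next token id from the k highest-probability logits."""
    topk_vals, topk_idx = torch.topk(logits, k)       # keep the k best logits
    probs = F.softmax(topk_vals, dim=-1)              # renormalize over those k
    choice = torch.multinomial(probs, num_samples=1)  # draw one of the k
    return topk_idx[choice].item()
```

Restricting sampling to the top k tokens cuts off the low-probability tail, which reduces incoherent characters at some cost in diversity.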
To train and then evaluate in one run:

```
python train_transformer.py --train --eval
```

## Hyperparameters

| Parameter | Value |
|---|---|
| Sequence length | 40 |
| Batch size | 32 |
| Model dimension (`d_model`) | 512 |
| Attention heads | 8 |
| Decoder layers | 6 |
| Learning rate | 5e-4 |
| Epochs | 10 |
| LR schedule | OneCycleLR |
| Label smoothing | 0.02 |
| Gradient clipping | 1.0 |
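The table's optimization settings (OneCycleLR, label smoothing 0.02, gradient clipping at 1.0) compose as in the sketch below. The `nn.Linear` model, the AdamW optimizer, the vocabulary size of 65, and `steps_per_epoch=100` are placeholder assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 65)  # stand-in for the Transformer; 65 ≈ char vocab size
criterion = nn.CrossEntropyLoss(label_smoothing=0.02)       # label smoothing 0.02
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # lr from the table
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, epochs=10, steps_per_epoch=100)  # steps assumed

# One training step:
x = torch.randn(32, 512)            # batch size 32, as in the table
y = torch.randint(0, 65, (32,))
loss = criterion(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip grads at 1.0
optimizer.step()
scheduler.step()                    # OneCycleLR steps once per batch
optimizer.zero_grad()
```

Note that OneCycleLR is stepped per batch, not per epoch, so `steps_per_epoch` must match the actual number of batches.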