Video Transcriber App

A desktop application that transcribes video and audio files to text using OpenAI's Whisper speech recognition model. It provides a PyQt6 GUI, batch processing, and text post-processing that includes filler word removal.

✨ Features

- 🎥 Multi-Format Support: Process MP4, AVI, MKV, MOV, and WEBM video files, plus MP3 audio
- 🚀 GPU Acceleration: Automatic CUDA detection, for up to 10-20x faster processing than on CPU
- 📦 Batch Processing: Queue multiple files for automated transcription
- 🧹 Advanced Text Processing:
  - Automatic filler word removal ("um", "uh", "like", "you know")
  - Smart punctuation and capitalization
  - Paragraph formatting for readability
- 🎯 Flexible Model Selection: Choose from the tiny, base, small, medium, or large Whisper models
- 💾 Custom Model Loading: Load pre-downloaded models to work offline
- 📊 Real-time Progress: Track processing with time estimates and progress bars
- ⏸️ Pause/Resume: Control processing without losing progress
- 🎨 Modern UI: Clean, intuitive interface with drag-and-drop support

📥 Installation

Option 1: Download Pre-built Executable (Windows)

- Download the latest VideoTranscriber.exe from the Releases page
- Download the Whisper model files (see Model Setup below)
- Run VideoTranscriber.exe

Option 2: Run from Source

Prerequisites

- Python 3.11 or higher
- NVIDIA GPU (optional, for faster processing)
- CUDA 11.8 or 12.1 (if using a GPU)

Step 1: Clone the Repository

```
git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber
```

Step 2: Create Virtual Environment

```
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
```

Step 3: Install Dependencies

```
pip install -r requirements.txt
```

Step 4: Install PyTorch with CUDA (for GPU support)

```
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CPU only
pip install torch torchvision torchaudio
```

Step 5: Run the Application

```
python run.py
```

🎯 Model Setup

Understanding Whisper Models

The app uses OpenAI's Whisper models for transcription. Each model size offers different trade-offs:

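| Model | Parameters | Speed | Quality | Download Size |
|-------|------------|-------|---------|---------------|
| tiny | 39M | Very Fast | Basic | ~75 MB |
| base | 74M | Fast | Good | ~140 MB |
| small | 244M | Moderate | Better | ~460 MB |
| medium | 769M | Slow | Very Good | ~1.5 GB |
| large | 1550M | Very Slow | Best | ~2.9 GB |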

Automatic Model Download

On first use, the app automatically downloads the selected model from OpenAI. This requires an internet connection.

Manual Model Setup (Recommended for Offline Use)

Download Model Files
- Download .pt files from OpenAI Whisper
- Or from Hugging Face

Place Models in a Folder

```
C:\WhisperModels\
├── tiny.pt
├── base.pt
├── small.pt
├── medium.pt
└── large-v3.pt
```

Load in Application
- Click "Load Model Folder" button
- Navigate to your models folder
- Select the folder containing .pt files
- The app will remember this location
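As a rough illustration of how a model folder like the one above could be scanned, the sketch below maps each model size named in this README to a `.pt` file whose name contains it. The function name `find_models` and the matching rule are assumptions made for illustration, not the app's actual code.

```python
from pathlib import Path

# Model sizes named in this README.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def find_models(folder):
    """Map each recognised model size to a .pt file found in `folder`."""
    found = {}
    for path in sorted(Path(folder).glob("*.pt")):
        name = path.stem.lower()
        for size in MODEL_SIZES:
            if size in name:        # e.g. "large-v3" contains "large"
                found[size] = path
                break
    return found
```

A path found this way can be passed to `whisper.load_model`, which accepts either an official model name or a path to a checkpoint file.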

🚀 Usage Guide

Basic Workflow

Start the Application
- Run VideoTranscriber.exe or python run.py

Configure Settings
- Select the output directory for transcripts
- Choose a Whisper model size (larger = better quality, slower)
- (Optional) Load a custom model folder

Add Videos
- Click "Add Files" to select videos
- Or "Add Directory" to process entire folders
- Or drag and drop files directly

Process Videos
- Click "Start Processing"
- Monitor progress in real-time
- Pause/resume as needed

Get Results
- Transcripts are saved as .txt files
- Same filename as the video, with a .txt extension
- Located in your selected output directory
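The file handling in this workflow can be sketched in a few lines. The example below is a minimal sketch, not the app's implementation: `collect_media` and `transcript_path` are hypothetical names, the extension set is copied from the feature list, and the directory scan is assumed to be non-recursive.

```python
from pathlib import Path

# Extensions taken from the feature list in this README.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov", ".webm", ".mp3"}


def collect_media(directory):
    """Return the supported media files directly inside `directory`."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def transcript_path(video, output_dir):
    """Same base name as the video, with a .txt extension, in output_dir."""
    return Path(output_dir) / (Path(video).stem + ".txt")
```

For example, `transcript_path("talks/intro.mp4", "out")` gives the path `out/intro.txt`.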

Advanced Features

GPU Acceleration

The app automatically detects and uses NVIDIA GPUs. Check the status in the console output:

- "Model loaded successfully on cuda" means the GPU is active ✅
- "Model loaded successfully on cpu" means the app is running on CPU only ⚠️
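To check outside the app whether PyTorch can see a GPU, a small script like the following may help. It is a sketch that assumes the app's detection is based on `torch.cuda.is_available()`, which the README does not confirm; `pick_device` is a hypothetical helper name.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"PyTorch would use: {pick_device()}")
```

If this prints cpu on a machine with an NVIDIA GPU, the installed PyTorch build is probably the CPU-only one (see Step 4 of the installation).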

Batch Processing Tips

- The queue processes videos in order (FIFO)
- Each video's transcript is saved immediately upon completion
- Failed videos don't stop the queue
- Time estimates improve as more videos are processed
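The queue behaviour described above can be pictured with the following sketch. It assumes caller-supplied `transcribe` and `save` callables (both hypothetical) and a simple running-average time estimate; the app's real scheduler may work differently.

```python
from collections import deque
import time


def process_queue(files, transcribe, save):
    """Process files in FIFO order; a failure is recorded, not fatal."""
    queue = deque(files)
    durations, failed = [], []
    while queue:
        path = queue.popleft()          # oldest entry first (FIFO)
        started = time.perf_counter()
        try:
            text = transcribe(path)
            save(path, text)            # write as soon as this file is done
        except Exception as exc:
            failed.append((path, exc))  # remember the failure and move on
            continue
        durations.append(time.perf_counter() - started)
        # The average steadies as more files finish, so the estimate improves.
        eta = sum(durations) / len(durations) * len(queue)
        print(f"{path}: done, about {eta:.1f}s remaining")
    return failed
```

Returning the failures rather than raising lets the caller report them once the rest of the batch has finished.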

Text Processing Options

The app automatically:

- Removes filler words while preserving meaning
- Adds proper punctuation and capitalization
- Creates readable paragraphs
- Fixes common transcription errors
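To give a feel for this kind of clean-up, here is a minimal regex-based sketch. It is not the app's post-processing code: the function name `clean` is hypothetical, and "like" is deliberately left out because it is often a real word. A production version would need more care, for example with fillers that sit between two commas.

```python
import re

# Filler phrases from the feature list, without "like" (see above).
FILLERS = ("you know", "um", "uh")

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def clean(text):
    """Drop filler phrases, tidy spacing, and capitalise sentence starts."""
    text = _FILLER_RE.sub("", text)
    text = re.sub(r"\s+([,.!?])", r"\1", text)   # no space before punctuation
    text = re.sub(r"\s{2,}", " ", text).strip()  # collapse repeated spaces
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
```

The word boundaries (`\b`) keep the pattern from touching words that merely contain a filler, such as "umbrella".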

🔧 Building from Source

Creating Executable

Install PyInstaller
```
pip install pyinstaller
```

Run Build Script

```
# Windows
build_exe.bat

# Or manually
pyinstaller VideoTranscriber.spec --clean
```

Find Executable
- Located in dist/VideoTranscriber.exe
- Single file, ready for distribution

Customizing Build

Edit VideoTranscriber.spec to:

- Add a custom icon
- Include additional files
- Modify build options

🐛 Troubleshooting

Common Issues

"No model found" error

- Ensure the .pt files are in the selected folder
- File names should contain the model size (e.g., large.pt, large-v3.pt)

Slow processing on CPU

- Install CUDA-enabled PyTorch (see Installation, Step 4)
- Use a smaller model (base or small)
- Check that the GPU is detected in the console output

"CUDA out of memory" error

- Use a smaller model
- Close other GPU applications
- Process shorter videos
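One way to pick a model that fits is to compare free GPU memory against the approximate VRAM figures listed in the openai/whisper README. The sketch below does that; `largest_model_that_fits` is a hypothetical helper, and real memory use also depends on audio length and decoding options.

```python
# Approximate VRAM needs in GB, as listed in the openai/whisper README.
REQUIRED_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Largest first, so the first match is the best model that fits.
ORDER = ("large", "medium", "small", "base", "tiny")


def largest_model_that_fits(free_vram_gb):
    """Return the biggest model whose approximate VRAM need fits, or None."""
    for name in ORDER:
        if REQUIRED_VRAM_GB[name] <= free_vram_gb:
            return name
    return None
```

With 6 GB free, for instance, this returns "medium".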

Transcription has repeated text

- The app includes automatic repetition removal
- Update to the latest version
- Report persistent issues

Performance Tips

- For Speed: Use a GPU with a smaller model (base/small)
- For Quality: Use the large model with a GPU
- For Long Videos: Long videos are split into segments automatically
- For Batch Processing: Queue files overnight with the large model
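The README does not say how long videos are split, so the following is only a sketch of one simple approach: fixed-length segments with no overlap. The 600-second default and the name `segment_bounds` are assumptions made for illustration.

```python
def segment_bounds(duration_s, segment_s=600.0):
    """Yield (start, end) pairs in seconds that cover the whole duration."""
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)  # last segment may be shorter
        yield (start, end)
        start = end
```

A 25-minute file (1500 seconds) would be covered by two full segments and a final 300-second one.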

📁 Project Structure

```
video-transcriber/
├── src/
│   ├── ui/                     # GUI components
│   ├── transcription/          # Whisper integration
│   ├── audio_processing/       # Video/audio conversion
│   ├── post_processing/        # Text enhancement
│   ├── input_handling/         # File management
│   └── config/                 # Settings management
├── run.py                      # Application entry point
├── requirements.txt            # Python dependencies
├── VideoTranscriber.spec       # PyInstaller configuration
└── build_exe.bat               # Build script
```

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
