A powerful desktop application that converts video files into accurate text transcripts using OpenAI's Whisper AI model. Features a modern GUI built with PyQt6, batch processing capabilities, and advanced text post-processing with filler word removal.
- Multi-Format Support: Process MP4, AVI, MKV, MOV, and WEBM video files, as well as MP3 audio
- GPU Acceleration: Automatic CUDA detection for 10-20x faster processing than on CPU
- Batch Processing: Queue multiple files for automated transcription
- Advanced Text Processing:
  - Automatic filler word removal ("um", "uh", "like", "you know")
  - Smart punctuation and capitalization
  - Paragraph formatting for readability
- Flexible Model Selection: Choose from tiny, base, small, medium, or large Whisper models
- Custom Model Loading: Load pre-downloaded models to work offline
- Real-time Progress: Track processing with time estimates and progress bars
- Pause/Resume: Control processing without losing progress
- Modern UI: Clean, intuitive interface with drag-and-drop support
To use the prebuilt Windows executable:

1. Download the latest `VideoTranscriber.exe` from the Releases page.
2. Download the Whisper model files (see the model setup steps below).
3. Run `VideoTranscriber.exe`.

To run from source, you need:
- Python 3.11 or higher
- NVIDIA GPU (optional, for faster processing)
- CUDA 11.8 or 12.1 (if using GPU)
```bash
git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber
python -m venv venv

# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

pip install -r requirements.txt
```

Install the PyTorch build that matches your hardware:

```bash
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CPU only
pip install torch torchvision torchaudio
```

Then start the app:

```bash
python run.py
```

The app uses OpenAI's Whisper models for transcription. Each model size offers different trade-offs:
| Model | Parameters | Speed | Quality | Download Size |
|---|---|---|---|---|
| tiny | 39M | Very Fast | Basic | ~72 MB |
| base | 74M | Fast | Good | ~139 MB |
| small | 244M | Moderate | Better | ~461 MB |
| medium | 769M | Slow | Very Good | ~1.4 GB |
| large | 1550M | Very Slow | Best | ~2.9 GB |
On first use, the app automatically downloads the selected model from OpenAI (this requires an internet connection).
To work offline with pre-downloaded models:

1. Download model files
   - Download `.pt` files from OpenAI Whisper
   - Or from Hugging Face
2. Place the models in a folder
   ```
   C:\WhisperModels\
   ├── tiny.pt
   ├── base.pt
   ├── small.pt
   ├── medium.pt
   └── large-v3.pt
   ```
3. Load them in the application
   - Click the "Load Model Folder" button
   - Navigate to your models folder
   - Select the folder containing the `.pt` files
   - The app will remember this location
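A minimal sketch of how a model folder could be matched to model sizes, assuming only the naming rule given in the troubleshooting notes (each file name contains the size). `find_models` and the rule of preferring the highest `-vN` suffix are hypothetical illustrations, not the app's actual code:

```python
import re
from pathlib import Path

# Whisper model sizes from the model table.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def _version(stem: str) -> int:
    """Return N for names like 'large-v3', or 0 when there is no -vN suffix."""
    match = re.search(r"-v(\d+)", stem)
    return int(match.group(1)) if match else 0


def find_models(folder: str) -> dict[str, Path]:
    """Map each model size to a .pt file in `folder` whose name contains it."""
    found: dict[str, Path] = {}
    for path in Path(folder).glob("*.pt"):
        stem = path.stem.lower()
        for size in MODEL_SIZES:
            if size in stem:
                current = found.get(size)
                # Hypothetical tie-break: prefer the highest version suffix,
                # e.g. large-v3.pt over large.pt.
                if current is None or _version(stem) > _version(current.stem.lower()):
                    found[size] = path
                break
    return found
```

On the folder layout shown above, `find_models(r"C:\WhisperModels")` would map `"large"` to `large-v3.pt`. When two files share a size and version (say `tiny.pt` and `tiny.en.pt`), this sketch keeps whichever one `glob` yields first.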
To transcribe videos:

1. Start the application
   - Run `VideoTranscriber.exe` or `python run.py`
2. Configure settings
   - Select the output directory for transcripts
   - Choose the Whisper model size (larger = better quality, slower)
   - (Optional) Load a custom model folder
3. Add videos
   - Click "Add Files" to select videos
   - Or "Add Directory" to process entire folders
   - Or drag and drop files directly
4. Process videos
   - Click "Start Processing"
   - Monitor progress in real time
   - Pause/resume as needed
5. Get results
   - Transcripts are saved as `.txt` files
   - Each transcript has the same filename as its video, with a `.txt` extension
   - Transcripts are located in your selected output directory
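The "Add Directory" step and the output naming rule can be sketched with `pathlib`. This assumes a non-recursive scan, case-insensitive matching of the extensions from the feature list, and hypothetical helper names (`collect_media`, `transcript_path`) that are not part of the app:

```python
from pathlib import Path

# Extensions from the supported-format list, compared case-insensitively.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov", ".webm", ".mp3"}


def collect_media(directory: str) -> list[Path]:
    """Return supported files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def transcript_path(video: Path, output_dir: str) -> Path:
    """Keep the video's base name, swap the extension for .txt, use output_dir."""
    return Path(output_dir) / video.with_suffix(".txt").name
```

Under this rule, a video named `talk.mp4` produces `talk.txt` in the output directory.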
The app automatically detects and uses NVIDIA GPUs. Check the status in the console output:

- `Model loaded successfully on cuda`: GPU active
- `Model loaded successfully on cpu`: CPU only (slower)
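The CPU/GPU choice behind those messages most likely comes down to a CUDA availability check. A minimal sketch, assuming PyTorch is the backend; `pick_device` is a hypothetical name, and the import guard lets the snippet run (reporting `cpu`) even where PyTorch is not installed:

```python
try:
    import torch
except ImportError:  # PyTorch missing: the sketch falls back to CPU
    torch = None


def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(f"Selected device: {pick_device()}")
```

With the `openai-whisper` package, the returned string could then be passed as the `device` argument of `whisper.load_model`.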
Batch processing:

- The queue processes videos in order (FIFO)
- Each video's transcript is saved immediately upon completion
- Failed videos don't stop the queue
- Time estimates improve as more videos are processed
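The queue behavior described above maps naturally onto a `deque` consumed from the left. In this sketch, `run_queue`, `transcribe`, and `save` are hypothetical: the two callables stand in for the Whisper call and the file write, and the time estimate is assumed to be a simple running average:

```python
import time
from collections import deque
from collections.abc import Callable


def run_queue(
    files: list[str],
    transcribe: Callable[[str], str],
    save: Callable[[str, str], None],
) -> list[str]:
    """Process files first-in, first-out and return the ones that failed."""
    queue = deque(files)
    failed: list[str] = []
    durations: list[float] = []
    while queue:
        path = queue.popleft()  # FIFO: oldest entry first
        start = time.monotonic()
        try:
            save(path, transcribe(path))  # written as soon as this file is done
        except Exception as exc:  # one failure must not stop the queue
            failed.append(path)
            print(f"Failed: {path} ({exc})")
            continue
        durations.append(time.monotonic() - start)
        # ETA = mean time per finished file x files remaining; the mean
        # becomes more representative as more files complete.
        eta = sum(durations) / len(durations) * len(queue)
        print(f"Done: {path}; about {eta:.0f}s remaining")
    return failed
```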
The app automatically:
- Removes filler words while preserving meaning
- Adds proper punctuation and capitalization
- Creates readable paragraphs
- Fixes common transcription errors
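A rule-based sketch of these clean-up steps, using regular expressions rather than whatever the app's post-processor actually does. It is an assumption-laden simplification: it only capitalizes sentence starts and terminates an unpunctuated final sentence (it does not insert punctuation), groups a fixed number of sentences per paragraph, and the name `clean_transcript` is hypothetical:

```python
import re

# Fillers removed by this sketch. "like" is deliberately not included: it is
# often meaningful ("I like it"), so removing it safely needs more context.
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", re.IGNORECASE)


def clean_transcript(text: str, sentences_per_paragraph: int = 3) -> str:
    """Remove fillers, capitalize sentences, and group them into paragraphs."""
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([,.!?])", r"\1", text)  # no space before punctuation
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    sentences = [s[0].upper() + s[1:] for s in sentences]
    if sentences and sentences[-1][-1] not in ".!?":
        sentences[-1] += "."  # terminate a trailing unpunctuated sentence
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```

For example, `clean_transcript("Um, hello there. uh this is a test")` returns `"Hello there. This is a test."`.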
To build the executable yourself:

1. Install PyInstaller
   ```bash
   pip install pyinstaller
   ```
2. Run the build script
   ```bash
   # Windows
   build_exe.bat
   # Or manually
   pyinstaller VideoTranscriber.spec --clean
   ```
3. Find the executable
   - Located in `dist/VideoTranscriber.exe`
   - Single file, ready for distribution
Edit VideoTranscriber.spec to:
- Add custom icon
- Include additional files
- Modify build options
"No model found" error
- Ensure the `.pt` files are in the selected folder
- File names should contain the model size (e.g., `large.pt`, `large-v3.pt`)
Slow processing on CPU
- Install CUDA-enabled PyTorch (see the installation steps above)
- Use a smaller model (base or small)
- Check that the GPU is detected in the console output

"CUDA out of memory" error
- Use a smaller model
- Close other GPU applications
- Process shorter videos
Transcription has repeated text
- The app includes automatic repetition removal
- Update to the latest version
- Report persistent issues on GitHub Issues
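Whisper can occasionally get stuck repeating the same phrase. A minimal sketch of one way to filter that out, assuming repeats show up as identical consecutive sentences; `remove_repeats` is hypothetical and is not the app's implementation, which the README does not describe:

```python
import re


def remove_repeats(text: str) -> str:
    """Collapse runs of identical consecutive sentences into one occurrence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept: list[str] = []
    for sentence in sentences:
        # Compare case-insensitively so "Thanks." and "thanks." count as repeats.
        if kept and sentence.lower() == kept[-1].lower():
            continue
        kept.append(sentence)
    return " ".join(kept)
```

Non-consecutive repeats are kept on purpose, since a speaker may legitimately say the same sentence twice in a talk.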
Performance tips:

- For Speed: Use GPU + smaller models (base/small)
- For Quality: Use the large model with a GPU
- For Long Videos: Videos are automatically split into segments
- For Batch Processing: Queue files overnight with the large model
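The README does not say how long videos are split. One common approach, assumed here, is fixed-length windows with a short overlap so that a word cut at a boundary appears whole in at least one segment. `segment_bounds` and its default lengths are hypothetical, and the sketch only computes time ranges (it does not cut any audio):

```python
def segment_bounds(
    duration: float, segment_len: float = 600.0, overlap: float = 5.0
) -> list[tuple[float, float]]:
    """Split [0, duration] seconds into (start, end) windows that overlap slightly."""
    if duration <= 0:
        return []
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while True:
        end = min(start + segment_len, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back so words at the cut appear twice
    return bounds
```

For a 25-minute (1500 s) video this yields `[(0.0, 600.0), (595.0, 1195.0), (1190.0, 1500.0)]`. The overlapping seconds are transcribed twice, so merged text would need its boundary duplicates trimmed.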
```
video-transcriber/
├── src/
│   ├── ui/                 # GUI components
│   ├── transcription/      # Whisper integration
│   ├── audio_processing/   # Video/audio conversion
│   ├── post_processing/    # Text enhancement
│   ├── input_handling/     # File management
│   └── config/             # Settings management
├── run.py                  # Application entry point
├── requirements.txt        # Python dependencies
├── VideoTranscriber.spec   # PyInstaller configuration
└── build_exe.bat           # Build script
```
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for the amazing transcription model
- PyQt6 for the GUI framework
- MoviePy for video processing
- PyTorch for ML framework
For help and discussion:

- Issues: GitHub Issues
- Discussions: GitHub Discussions

Roadmap:

- Support for more video formats
- Real-time transcription preview
- Speaker diarization
- Multiple language support
- Cloud processing option
- Export to SRT/VTT subtitles
- Integration with video editing software