
Video Transcriber App

A powerful desktop application that converts video files into accurate text transcripts using OpenAI's Whisper AI model. Features a modern GUI built with PyQt6, batch processing capabilities, and advanced text post-processing with filler word removal.


✨ Features

  • 🎥 Multi-Format Support: Process MP4, AVI, MKV, MOV, and WEBM video files, plus MP3 audio
  • 🚀 GPU Acceleration: Automatic CUDA detection for 10-20x faster processing
  • 📦 Batch Processing: Queue multiple files for automated transcription
  • 🧹 Advanced Text Processing:
    • Automatic filler word removal ("um", "uh", "like", "you know")
    • Smart punctuation and capitalization
    • Paragraph formatting for readability
  • 🎯 Flexible Model Selection: Choose from tiny, base, small, medium, or large Whisper models
  • 💾 Custom Model Loading: Load pre-downloaded models to work offline
  • 📊 Real-time Progress: Track processing with time estimates and progress bars
  • ⏸️ Pause/Resume: Control processing without losing progress
  • 🎨 Modern UI: Clean, intuitive interface with drag-and-drop support

📥 Installation

Option 1: Download Pre-built Executable (Windows)

  1. Download the latest VideoTranscriber.exe from the Releases page
  2. Download Whisper model files (see Model Setup below)
  3. Run VideoTranscriber.exe

Option 2: Run from Source

Prerequisites

  • Python 3.11 or higher
  • NVIDIA GPU (optional, for faster processing)
  • CUDA 11.8 or 12.1 (if using GPU)

Step 1: Clone the Repository

git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber

Step 2: Create Virtual Environment

python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Install PyTorch with CUDA (for GPU support)

# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CPU only
pip install torch torchvision torchaudio

Step 5: Run the Application

python run.py

🎯 Model Setup

Understanding Whisper Models

The app uses OpenAI's Whisper models for transcription. Each model size offers different trade-offs:

| Model  | Parameters | Speed     | Quality   | Download Size |
|--------|------------|-----------|-----------|---------------|
| tiny   | 39M        | Very Fast | Basic     | ~39 MB        |
| base   | 74M        | Fast      | Good      | ~74 MB        |
| small  | 244M       | Moderate  | Better    | ~244 MB       |
| medium | 769M       | Slow      | Very Good | ~769 MB       |
| large  | 1550M      | Very Slow | Best      | ~1.5 GB       |

Automatic Model Download

On first use, the app will automatically download the selected model from OpenAI (requires internet connection).

Manual Model Setup (Recommended for Offline Use)

  1. Download Model Files

  2. Place Models in a Folder

    C:\WhisperModels\
    ├── tiny.pt
    ├── base.pt
    ├── small.pt
    ├── medium.pt
    └── large-v3.pt
    
  3. Load in Application

    • Click "Load Model Folder" button
    • Navigate to your models folder
    • Select the folder containing .pt files
    • The app will remember this location

🚀 Usage Guide

Basic Workflow

  1. Start the Application

    • Run VideoTranscriber.exe or python run.py
  2. Configure Settings

    • Select output directory for transcripts
    • Choose Whisper model size (larger = better quality, slower)
    • (Optional) Load custom model folder
  3. Add Videos

    • Click "Add Files" to select videos
    • Or "Add Directory" to process entire folders
    • Or drag and drop files directly
  4. Process Videos

    • Click "Start Processing"
    • Monitor progress in real-time
    • Pause/resume as needed
  5. Get Results

    • Transcripts saved as .txt files
    • Same filename as video with .txt extension
    • Located in your selected output directory
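The output naming in step 5 amounts to swapping the video's extension for `.txt` inside the chosen output directory. A minimal sketch (the function name is illustrative, not the app's internal API):

```python
from pathlib import Path

def transcript_path(video_file: str, output_dir: str) -> Path:
    """Map e.g. clips/talk.mp4 + out/ to out/talk.txt."""
    return Path(output_dir) / (Path(video_file).stem + ".txt")
```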

Advanced Features

GPU Acceleration

The app automatically detects and uses NVIDIA GPUs. Check status in console output:

  • Model loaded successfully on cuda = GPU active ✅
  • Model loaded successfully on cpu = CPU only ⚠️
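The status line reduces to a one-line device choice; in the app the flag would presumably come from `torch.cuda.is_available()`, but the sketch below takes it as a plain parameter so it runs without PyTorch:

```python
def load_status(cuda_available: bool) -> str:
    """Reproduce the console message shown after the model loads."""
    device = "cuda" if cuda_available else "cpu"
    return f"Model loaded successfully on {device}"
```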

Batch Processing Tips

  • Queue processes videos in order (FIFO)
  • Each video's transcript is saved immediately upon completion
  • Failed videos don't stop the queue
  • Time estimates improve as more videos are processed
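The queue behaviour described above — FIFO order, per-file saves, failures skipped — can be sketched roughly like this, where `transcribe_one` and `save_transcript` stand in for the app's real internals:

```python
from collections import deque

def process_queue(files, transcribe_one, save_transcript):
    """Process files first-in-first-out; a failure is recorded
    and the queue keeps going."""
    queue, failed = deque(files), []
    while queue:
        video = queue.popleft()
        try:
            text = transcribe_one(video)   # may raise on a bad file
        except Exception:
            failed.append(video)           # failed videos don't stop the queue
            continue
        save_transcript(video, text)       # saved immediately upon completion
    return failed
```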

Text Processing Options

The app automatically:

  • Removes filler words while preserving meaning
  • Adds proper punctuation and capitalization
  • Creates readable paragraphs
  • Fixes common transcription errors
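Filler-word removal of the kind listed above can be approximated with a word-boundary regex. This is a simplified sketch, not the app's actual pipeline — note that a real implementation needs context to avoid stripping meaningful uses of words such as "like":

```python
import re

# Conservative filler list; "like" is omitted here because removing it
# blindly would change meaning ("I like it").
FILLERS = ("um", "uh", "you know")

def remove_fillers(text: str) -> str:
    """Strip common fillers, eating an adjacent comma, then tidy spacing."""
    alternatives = "|".join(re.escape(f) for f in FILLERS)
    pattern = rf"(?:,\s*)?\b(?:{alternatives})\b,?"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```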

🔧 Building from Source

Creating Executable

  1. Install PyInstaller

    pip install pyinstaller
  2. Run Build Script

    # Windows
    build_exe.bat
    
    # Or manually
    pyinstaller VideoTranscriber.spec --clean
  3. Find Executable

    • Located in dist/VideoTranscriber.exe
    • Single file, ready for distribution

Customizing Build

Edit VideoTranscriber.spec to:

  • Add custom icon
  • Include additional files
  • Modify build options

πŸ› Troubleshooting

Common Issues

"No model found" error

  • Ensure .pt files are in the selected folder
  • File names should contain model size (e.g., large.pt, large-v3.pt)

Slow processing on CPU

  • Install CUDA-enabled PyTorch (see installation)
  • Use smaller model (base or small)
  • Check GPU is detected in console output

"CUDA out of memory" error

  • Use smaller model
  • Close other GPU applications
  • Process shorter videos

Transcription has repeated text

  • App includes automatic repetition removal
  • Update to latest version
  • Report persistent issues

Performance Tips

  1. For Speed: Use GPU + smaller models (base/small)
  2. For Quality: Use large model with GPU
  3. For Long Videos: Videos auto-split into segments
  4. For Batch Processing: Queue overnight with large model

πŸ“ Project Structure

video-transcriber/
├── src/
│   ├── ui/                 # GUI components
│   ├── transcription/      # Whisper integration
│   ├── audio_processing/   # Video/audio conversion
│   ├── post_processing/    # Text enhancement
│   ├── input_handling/     # File management
│   └── config/             # Settings management
├── run.py                  # Application entry point
├── requirements.txt        # Python dependencies
├── VideoTranscriber.spec   # PyInstaller configuration
└── build_exe.bat           # Build script

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ’¬ Support

πŸš€ Roadmap

  • Support for more video formats
  • Real-time transcription preview
  • Speaker diarization
  • Multiple language support
  • Cloud processing option
  • Export to SRT/VTT subtitles
  • Integration with video editing software
