VLM Runtime

Real-time vision-language model running entirely in your browser
WebGPU • Zero dependencies • 100% Private

What is this?

A browser-based implementation of FastVLM-0.5B that lets you ask questions about what your camera sees. Everything runs locally on your GPU—no servers, no API calls, complete privacy.

In simple terms: Point your camera at something, ask a question, get an instant AI response. All processing happens in your browser.

Why This Matters

🔒 Privacy First — Your camera feed never leaves your device
💰 Zero API Costs — Runs 100% locally using WebGPU acceleration
⚡ Fast & Responsive — Real-time inference with GPU acceleration
📚 Educational — See how modern vision-language models work in the browser
🎨 Beautiful Design — Inspired by Apple/WWDC's design language

How It Works

Processing Pipeline

📹 Camera Feed → WebGPU Processing → FastVLM-0.5B Model → Live Captions

Technology Stack

Component	Technology
Language	Vanilla JavaScript (no frameworks)
GPU Acceleration	WebGPU
Model Runtime	Transformers.js + ONNX Runtime
UI Design	CSS + vanilla JS components
Styling	Apple-inspired glass morphism design

Performance Optimizations

✅ Lazy Loading — 60% faster startup, models load on-demand
✅ Frame Downscaling — 50% faster inference via 640px resolution
✅ UI Throttling — Batched updates every 100ms
✅ Canvas Caching — Dimension tracking prevents reflows
✅ Async Control — Abortable operations for responsive UI
✅ FP16 Support — 2-3× speed boost on compatible GPUs
✅ Dynamic Frame Timing — Automatic adjustment based on GPU speed

Architecture

State Machine

The app uses a formal event-driven state machine for predictable state transitions and robust error handling:

State Separation:

ViewState — UI screens: permission | welcome | loading | runtime | error | image-upload
RuntimeState — Execution state: idle | warming | running | paused | recovering | failed
LoadingPhase — Model loading: loading-wgpu | loading-model | warming-up | complete

15+ Formal Transitions:

PERMISSION_GRANTED → welcome screen
START → loading screen
WGPU_READY → model starts loading
MODEL_LOADED → warmup begins
WARMUP_COMPLETE → runtime (live captioning)
STREAM_ENDED → error screen with recovery
RETRY → back to permission flow

Error Handling:

Formal error states with codes: CAMERA_DENIED, MODEL_LOAD_FAILED, STREAM_LOST, etc.
Recovery actions: retry, reload, fallback modes
Technical details collapsible for debugging

Why:

Declarative transitions with guards prevent invalid states
Centralized error recovery flows
Better debugging with event logs
Production-ready architecture at scale

See src/js/utils/state-machine.js for full implementation.

Quick Start

Requirements

✅ WebGPU-enabled browser (Chrome 113+, Edge 113+, Firefox 141+)
✅ Camera/Webcam
✅ Local server (file:// protocol not supported)
✅ HTTPS connection (or localhost) — required for camera access

Check WebGPU compatibility: webgpu.io/test

Local Development

# Clone the repository
git clone <repository-url>
cd Vision-Language-Runtime

# Start a local server (CORS requirement)
cd src
python -m http.server 8000

# Open in browser
http://localhost:8000

Production Deployment

No build step required! Deploy the src/ directory directly:

Build command:          (empty - no build needed)
Build output directory: src/

Host on:

Cloudflare Pages (recommended - free HTTPS + CDN)
Vercel
Netlify
GitHub Pages
Any static file host

See CONTRIBUTING.md for detailed deployment guides.

Mobile & HTTPS Guide

Mobile Devices

Camera access on mobile requires HTTPS (except localhost). Follow these steps:

iOS/Safari

Deploy app to HTTPS (Cloudflare Pages recommended)
Settings → Safari → Camera → Allow
Grant permissions when prompted
Reload page after permissions change

Android/Chrome

Navigate via HTTPS
Grant camera permissions
Enable WebGPU support: chrome://flags → search "webgpu" → enable developer features
Reload and check console for WebGPU status

Testing on Local Network

# Find your machine's IP
ifconfig          # Mac/Linux
ipconfig          # Windows

# Start server (from src/ directory)
python -m http.server 8000

# Access from mobile on same network
http://YOUR_LOCAL_IP:8000

Recommended: Deploy to Cloudflare Pages

Get instant HTTPS + global CDN (free tier available):

Push to GitHub
Connect repo to Cloudflare Pages
Set build output directory to src/
Auto-deploy on push

Result: https://your-app.pages.dev with zero build steps

Features

Core Features

✨ Real-time Inference — 1-3 seconds per frame (varies by GPU)
🔒 Private by Default — 100% on-device processing
🎨 Glass UI Design — Apple/WWDC-inspired monochromatic interface
⚡ GPU Accelerated — WebGPU + FP16 support for 2-3× speed boost

Customization

📝 Custom Prompts — Ask any question about what the camera sees
🌍 Multilingual Presets — 10+ language options for live captions
🎭 Live ASCII Art — Psychedelic art background from camera feed
🖼️ Freeze Frame Analysis — Pause and analyze static images

Productivity

📊 Caption History — Store last 20 captions with JSON export
🔗 Smart URL Detection — Automatic URL recognition with security checks
📱 Device Optimization — Auto-detection for iOS/Safari/Android
🎥 Camera Switching — Fast switching between front/rear cameras

Developer Tools

🔧 Diagnostics Panel — Press Ctrl+Shift+D for real-time stats
📋 Debug Logger — Full logging console for troubleshooting
✅ Type Checking — TypeScript validation (optional dev tool)

Performance tuning

The app automatically detects your GPU and selects an optimal performance tier (low, medium, or high). Each tier controls inference size, token limit, and minimum frame timing.

Adjust performance per tier

Edit src/js/utils/constants.js in the QOS_PROFILES object:

// For slower/older GPUs - edit QOS_PROFILES.low
low: {
    MAX_INFERENCE_SIZE: 320,  // Default inference resolution
    MAX_NEW_TOKENS: 32,       // Output token limit
    TIMING_DELAY_MS: 5000,    // Minimum ms between frames
    SYSTEM_PROMPT: '...'      // Prompt for fast responses
}

// For mid-range GPUs - edit QOS_PROFILES.medium
medium: {
    MAX_INFERENCE_SIZE: 480,
    MAX_NEW_TOKENS: 64,
    TIMING_DELAY_MS: 3500,
    SYSTEM_PROMPT: '...'
}

// For high-end GPUs - edit QOS_PROFILES.high
high: {
    MAX_INFERENCE_SIZE: 640,  // Default: 640px
    MAX_NEW_TOKENS: 128,
    TIMING_DELAY_MS: 2000,
    SYSTEM_PROMPT: '...'
}

Frame timing: The app uses dynamic frame delays based on actual inference time. It waits at least TIMING_DELAY_MS and adds 20% buffer to measured time: delay = max(TIMING_DELAY_MS, inferenceTime * 1.2). This prevents GPU throttling on slower hardware.

Enable debug mode

// In src/js/utils/constants.js, find MODEL_CONFIG
MODEL_CONFIG.DEBUG = true  // Auto-enabled on localhost

Enable FP16 for 2× speed boost

FP16 (half-precision floating-point) can double inference speed on compatible GPUs. The app automatically detects and uses FP16 if available.

Hardware tier auto-detection: The app runs GPU capability tests on startup to determine if it's low, medium, or high tier. This controls frame resolution, token limits, and waiting times. You can see the detection results in the browser console on first load.

Check FP16 status: Open browser console on first load - you'll see WebGPU detection results including FP16 availability.

Enable FP16 on mobile (Samsung S24+, Pixel 9, etc):

Open Chrome and navigate to:
```
chrome://flags
```
Search and enable:
```
#enable-webgpu-developer-features
```
Restart browser and verify:
```
chrome://gpu
```
Look for shader-f16 in WebGPU Features list

Enable FP16 on desktop:

Chrome/Edge: chrome://flags → enable #enable-unsafe-webgpu
Restart browser
Check console for "🚀 FP16 enabled" message

Performance impact with FP16:

Samsung S24+ (Adreno 750): ~2-3s per frame (vs 4-6s)
Desktop RTX 4090: ~1-2s per frame (vs 3-4s)
iPhone 15 Pro (A17): ~3-4s per frame (partial support)

Troubleshooting

Issue	Solution
WebGPU not available	Update browser to latest version. Check webgpu.io for compatibility. App falls back to image upload mode.
Model won't load	Clear cache (Ctrl+Shift+Delete), check console for CORS errors, verify internet connection
Slow performance	Reduce `MAX_INFERENCE_SIZE` in `src/js/utils/constants.js` or increase `TIMING_DELAY_MS`. Close GPU-intensive apps.
Camera blocked on mobile	Make sure using HTTPS (camera requires HTTPS on mobile). Check Settings → Safari → Camera → Allow. Reload page.
Camera blocked on desktop	Check browser permissions: Settings → Site Permissions → Camera. Reload page. Try incognito mode.
"Insecure Connection" warning	Camera requires HTTPS. Use secure connection or run on localhost for testing.
URLs in captions aren't clickable	Click the URL badge to open with security confirmation. Never open untrusted links!

Debug Mode

Enable debug logging in src/js/utils/constants.js:

MODEL_CONFIG.DEBUG = true  // Auto-enabled on localhost

Open browser DevTools (F12) to see detailed logs for:

GPU detection
Model loading
Inference timing
State transitions
Error details

Credits & Attribution

Base Model & Framework:

Apple/FastVLM-0.5B — Efficient vision-language model by Apple Research
Transformers.js — JavaScript runtime by Hugging Face
ONNX Runtime Web — Inference engine

Implementation:

Browser-native implementation in Vanilla JavaScript
Zero external dependencies in production
Performance optimizations for GPU efficiency
Apple-inspired design language

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

# Optional: Install dev dependencies (for testing only)
npm install

# Run tests
npm run test:unit      # Unit tests
npm run test:e2e       # E2E tests with Playwright
npm run type-check     # TypeScript validation

# Start development server
cd src && python -m http.server 8000

Note: Development tools are optional. The production app has zero external dependencies.

License

Licensed under the MIT License. See LICENSE file for details.

Model License: FastVLM-0.5B is subject to its own license terms on Hugging Face.

Made with ☕ and questionable engineering decisions

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
.github		.github
assets		assets
docs		docs
src		src
tests		tests
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
.prettierrc		.prettierrc
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MODULARIZATION_SUMMARY.md		MODULARIZATION_SUMMARY.md
README.md		README.md
SECURITY.md		SECURITY.md
package.json		package.json
playwright.config.js		playwright.config.js
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

VLM Runtime

Table of Contents

What is this?

Why This Matters

How It Works

Processing Pipeline

Technology Stack

Performance Optimizations

Architecture

State Machine

Quick Start

Requirements

Local Development

Production Deployment

Mobile & HTTPS Guide

Mobile Devices

iOS/Safari

Android/Chrome

Testing on Local Network

Recommended: Deploy to Cloudflare Pages

Features

Core Features

Customization

Productivity

Developer Tools

Performance tuning

Adjust performance per tier

Enable debug mode

Enable FP16 for 2× speed boost

Troubleshooting

Debug Mode

Credits & Attribution

Contributing

Development Setup

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages