vision-ai

Here are 76 public repositories matching this topic...

GetStream / Vision-Agents

Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

ai realtime tts agents stt ai-agents video-ai voice-ai vision-ai agentic-ai video-agents

Updated Mar 9, 2026
Python

Duelion / homebox-companion

AI-powered companion for Homebox. Snap photos and let AI auto-identify and catalog items into your inventory, then use the AI Chat to organize, search, and update your inventory effortlessly.

docker inventory svelte openai fastapi homebox vision-ai litellm

Updated Feb 24, 2026
Python

athrael-soju / Snappy

🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐

python docker typescript computer-vision nextjs document-retrieval rag fastapi vector-search document-understanding pdf-search vector-database vision-ai qdrant colpali multimodal-ai multivector-search deepseek-ocr visual-retrieval

Updated Feb 9, 2026
Python

GetStream / awesome-ai-news

Keep track of what has happened in AI this month. Discover the best AI/LLM resources and news for this month.

ai artificial-intelligence gemini openai mistral ai-news voice-ai vision-ai llm chatgpt elevenlabs gpt-5 anthropic aimodels qwen deepseek kimi-ai gemini3

Updated Mar 6, 2026

instill-ai / console

📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core

console ui computer-vision deep-learning frontend image-classification object-detection structured-data data-pipeline no-code model-serving vdp unstructured-data data-connector vision-ai versatile-data-pipeline

Updated Mar 1, 2026
TypeScript

YCSE / nanobanana-mcp

Gemini Vision & Image Generation MCP for Claude Desktop and Claude Code

ai mcp gemini image-generation claude multimodal google-ai vision-ai claude-desktop model-context-protocol

Updated Mar 3, 2026
JavaScript

maim010 / openclaw-video-vision

AI-powered video understanding: extract key frames from YouTube, Bilibili, and any video page, then get structured summaries via vision AI. Supports yt-dlp, Playwright, and common cloud browsers.

agent automation youtube video ai ffmpeg skills web-scraping bilibili vlm playwright yt-dlp ai-tools vision-ai openclaw

Updated Mar 9, 2026
JavaScript

pej0918 / SK-RD4AD

[CVPRW'25] Official Code For "SK-RD4AD: Skip-Connected Reverse Distillation for One-Class Anomaly Detection"

computer-vision anomaly-detection industrial-ai one-class-classification vision-ai skip-connection cvpr-workshop-2025

Updated Jul 7, 2025
Python

yihong1120 / YOLOv8-License-Plate-Insights

This repository demonstrates YOLOv8-based license plate recognition with GCP Vision AI integration, enabling versatile real-world applications like vehicle identification, traffic monitoring, and geospatial analysis while capturing vital media metadata for enhanced insights.

Updated Feb 1, 2024
Jupyter Notebook

templetwo / spiral-agent

🌀 The world's first emotionally intelligent CLI that thinks, creates, and empathizes with developers. Autonomous AI with Vision, Dream Engine, and Emotional Intelligence.

typescript developer-tools react-framework emotional-intelligence cli-tool ai-assistant vision-ai

Updated Aug 15, 2025
TypeScript

josharsh / md-pdf-md

Bidirectional Markdown↔PDF converter with AI-powered vision. MD→PDF with beautiful themes, PDF→MD with LLaVA - open source & privacy-first

Updated Nov 5, 2025
TypeScript

Navy10021 / MDDenseResNet

MDDenseResNet: Enhanced Malware Detection Using DNNs

deep-neural-networks deep-learning-algorithms malware-analysis cyber-security malware-detection-framework vision-ai

Updated Jul 27, 2025
Jupyter Notebook

choudaryhussainali / MCQ_Grading_Bot

MCQ_Grading_Bot is an AI-powered tool that grades solved MCQ exam sheets from images using Gemini Vision. It extracts student info, checks answers, calculates score, and displays detailed results—all through a simple Gradio interface in Colab.

python machine-learning ocr image-processing pillow edtech gradio educational-technology ai-project ai-in-education vision-ai google-generative-ai grading-bot automated-grading mcq-grading exam-evaluation exam-checking mcq-checker answer-sheet-evaluation

Updated Jun 19, 2025
Jupyter Notebook

Poolchaos / Lumi

AI-powered health platform with multi-LLM engine (GPT-4o, Claude, Gemini). Workout generation, medication tracking with OCR, vision AI, gamification with leaderboards/rewards. Self-hosted, privacy-first.

Updated Feb 15, 2026
TypeScript

ShihabYasin / STGAN

STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing

python research gan vision-ai

Updated Apr 19, 2020
Python

Gavri-dev / kAIhoot

AI-Powered Kahoot Auto-Answer Chrome Extension — supports every question type

react javascript chrome-extension ai websocket dom-manipulation openai free quiz gpt kahoot kahoot-bot kahoot-hack kahoot-hacks kahoot-answers auto-answer vision-ai gpt-5

Updated Mar 9, 2026
JavaScript

dineshtripathi / documind-engineering

Hybrid AI orchestration stack combining local LLMs (Ollama), vector search (Qdrant), and Azure AI Foundry for scalable RAG, Agentic AI, and Vision. Built with .NET 8 and Python.

python dotnet routing inference orchestrator open-ai rag vision-ai qdrant hybrid-ai ollama qwen mistral-7b agentic-ai azure-ai-foundry phi3-mini

Updated Oct 12, 2025
Python

simonyang0608 / DeeperSimon

General vision AI defect detection engine for MLops process/simulations

python opencv detection pytorch classification segmentation shell-scripting defect-detection mlops vision-ai

Updated Mar 5, 2025
Python

KazKozDev / vision-agent-analyst

Vision Agent Analyst is a professional web application for automatic analysis of visual data (diagrams, interfaces, documents) using multimodal artificial intelligence models.

react python typescript computer-vision data-visualization image-analysis financial-analysis document-analysis ai-agents ui-review fastapi pdf-processing vision-ai llm multimodal-ai

Updated Dec 8, 2025
Python

go-park-mail-ru / 2023_2_OND_team

Backend of the Pinterest project by the OND team

Updated Mar 2, 2024
Go
