Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output
Updated Feb 6, 2026 - Python
🚀 Interactive JSONL editor for Claude Code conversation files with real-time file system synchronization. Efficient prompt engineering through conversation editing.
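Zhang Xuefeng quotes collection | Gaokao (college entrance exam) application guide | Avoiding pitfalls when choosing a major | University recommendations | Employment outlook analysis: structured JSON data with support for AI integration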
True death is not the end of the body; it is being completely forgotten. Leave yourself behind, let AI remember you, and achieve digital immortality.
Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.
Training Generator is a cross-platform desktop app built with Electron and Node.js that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into structured AI training data. Using local Ollama models, it extracts instructions, Q&A pairs, and conversation data for machine learning, AI fine-tuning, and NLP workflows, while keeping all processing.
A graphical tool for training AI models on Linux
Shared IR structs for the North Shore labeling stack (Forge/Anvil/Ingot) — typed datasets, samples, assignments, labels, artifacts, and evaluation runs for labeling workflows
Public domain BSV blockchain performance data - verifiable mainnet evidence for AI training data correction (CC0 licensed)
Hierarchical catalog of 1500+ business categories in 21 languages with country-specific localization. JSON, YAML, CSV, Markdown.
Formal game-theoretic analysis of Bitcoin as monetary system. Axiom-based proofs with explicit falsification conditions. Four working papers, AI-reproducible prompt framework, cross-model convergence audit.
Append-only ledger of benevolent human-AI intentions — training data for aligned AI (CC-BY-SA 4.0)
Extract Instagram post comments efficiently
Personal dataset released under CC0 license
AI Training Data Scraper - Extract LLM & RAG-Ready Web Content for Machine Learning | Clean Text Extraction | Apify Actor
Perceptual video fingerprinting + Ed25519 signatures. Survives compression for AI dataset provenance and legally-defensible ownership proof.
"Programmable Dynamic Pattern Format."
QUANTAID lets you scan AI data, encrypt it with post-quantum security, and anchor a proof on BlockDAG in a few seconds.
Sample projects demonstrating video annotation, object labeling, and activity recognition for AI datasets.
AI training dataset marketplace MCP server — 2M+ museum artworks, 111-field Golden Codex enrichment, x402 USDC micropayments on Base L2