Sovereign AI agent engine. Local-first. Written in Rust.
Quick Start · Architecture · Tools · Memory · Documentation
Ern-OS is a high-performance AI agent engine that runs entirely on your hardware. No cloud. No telemetry. No API keys required. Point it at any GGUF model via llama-server, and you get a full agentic system: a dual-layer inference engine with ReAct reasoning, a 29-tool executor, a 7-tier persistent memory system, an observer audit pipeline, autonomous learning, and a 12-tab WebUI dashboard — all compiled into a single Rust binary.
Created by @mettamazza
```bash
# 1. Clone
git clone https://github.com/mettamazza/ErnosAgent.git
cd ErnosAgent

# 2. Place a GGUF model
mkdir -p models
# Copy your model to models/ (e.g., gemma-4-27b-it-Q4_K_M.gguf)

# 3. Configure (edit ern-os.toml with your model path)

# 4. Run
cargo run --release
```

Opens http://localhost:3000 — the full dashboard with chat, memory explorer, tool logs, training controls, and more.
| Dependency | Purpose |
|---|---|
| Rust 1.75+ | Build the engine |
| llama-server | Serve GGUF models locally |
| A GGUF model file | The brain (any model works — Gemma, Llama, Mistral, etc.) |
Optional: Kokoro TTS (voice), Flux (image generation), code-server (VS Code IDE) — each auto-launches if configured and available.
```
User ──→ WebUI (localhost:3000)
         │
         ├─ WebSocket: Chat / Voice / Video
         │
┌────────┴─────────────────────────────────┐
│        Dual-Layer Inference Engine       │
│                                          │
│  Layer 1 (L1): Fast single-shot reply    │
│   ─ 20 tools, streaming, sub-second      │
│                                          │
│  Layer 2 (L2): ReAct reasoning loop      │
│   ─ 27 tools, multi-turn, autonomous     │
│   ─ Model-driven turn management         │
│   ─ Observer audit on every reply        │
├──────────────────────────────────────────┤
│             29-Tool Executor             │
│  shell · web · files · browser · memory  │
│  sub-agents · artifacts · codebase edit  │
│  image gen · SAE · steering · learning   │
├──────────────────────────────────────────┤
│         7-Tier Persistent Memory         │
│  timeline · scratchpad · lessons ·       │
│  synaptic · procedures · embeddings ·    │
│  consolidation                           │
├──────────────────────────────────────────┤
│             Learning Pipeline            │
│  golden buffer · rejection buffer ·      │
│  LoRA · GRPO · sleep consolidation       │
├──────────────────────────────────────────┤
│       Provider Trait (model-neutral)     │
│  llamacpp · ollama · openai-compatible   │
└──────────────────────────────────────────┘
```
Layer 1 handles straightforward requests — the model gets a single inference call with 20 tools (including memory, search, files, browser, planning, verification, and escalation). If the task requires multi-step reasoning, it escalates to Layer 2.
Layer 2 runs a full ReAct loop: the model reasons, calls tools, observes results, and continues until it decides it's done. Turn management is model-driven — the model requests extensions when it needs more turns. An Observer audits every reply for quality, hallucination, and completeness before it reaches the user.
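The model-driven turn management described above can be sketched as a small loop. This is a minimal, hypothetical illustration — the `Action` enum and `run_react_loop` names are assumptions, not the actual Ern-OS API:

```rust
// Hypothetical sketch of Layer 2's model-driven turn management.
// Each turn, the model picks an action; it may request a budget
// extension, and the loop ends when it emits a final answer.

#[derive(Debug)]
enum Action {
    ToolCall(String), // model wants to run a tool
    ExtendTurns(u32), // model requests more turns
    Done(String),     // final answer
}

fn run_react_loop<M>(mut model: M, mut budget: u32) -> Option<String>
where
    M: FnMut(u32) -> Action,
{
    let mut turn = 0;
    while turn < budget {
        match model(turn) {
            Action::ToolCall(_tool) => { /* execute tool, feed observation back */ }
            Action::ExtendTurns(extra) => budget += extra,
            Action::Done(answer) => return Some(answer),
        }
        turn += 1;
    }
    None // budget exhausted without a final answer
}

fn main() {
    // Mock model: calls a tool, asks for two more turns, then finishes.
    let script = |turn: u32| match turn {
        0 => Action::ToolCall("web_search".to_string()),
        1 => Action::ExtendTurns(2),
        _ => Action::Done("final answer".to_string()),
    };
    assert_eq!(run_react_loop(script, 2), Some("final answer".to_string()));
}
```

The key point is that the turn budget is mutable state the model itself can grow, rather than a hard cap imposed from outside.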
Ern-OS doesn't care what model you run. The Provider trait abstracts all inference:
- llamacpp — local GGUF models via `llama-server` (default, recommended)
- ollama — Ollama-managed models
- openai-compatible — any OpenAI-compatible API endpoint
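As a rough sketch of what a model-neutral provider abstraction might look like in Rust (trait, type, and method names here are illustrative, not the actual Ern-OS code):

```rust
// Illustrative provider abstraction: the engine only ever talks to
// `dyn Provider`, so backends are swappable without touching the core.

trait Provider {
    fn name(&self) -> &str;
    /// Send a prompt to the backend and return the completion.
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

struct LlamaCpp {
    endpoint: String,
}

impl Provider for LlamaCpp {
    fn name(&self) -> &str {
        "llamacpp"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        // Real code would POST to llama-server's HTTP API at self.endpoint;
        // stubbed here so the sketch stays self-contained.
        Ok(format!("[{} via {}] reply to: {}", self.name(), self.endpoint, prompt))
    }
}

fn main() {
    let provider: Box<dyn Provider> = Box::new(LlamaCpp {
        endpoint: "http://localhost:8080".to_string(),
    });
    let reply = provider.complete("hello").unwrap();
    assert!(reply.contains("llamacpp"));
}
```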
29 native tools, all executing locally:
| Tool | What It Does |
|---|---|
| `run_bash_command` | Execute shell commands with working directory control |
| `web_search` | Search the web and visit URLs (8-engine waterfall: Brave, Serper, Tavily, SerpAPI, DuckDuckGo, Google, Wikipedia, Google News RSS) |
| `file_read` / `file_write` | Read and write files on the local filesystem |
| `codebase_search` | Recursive grep across directories |
| `codebase_edit` | Find-replace, insert, multi-patch with auto-checkpoint |
| `browser` | Headless Chrome — open, navigate, click, type, screenshot |
| `memory` | Store, recall, and search across the memory system |
| `scratchpad` / `timeline` / `lessons` / `synaptic` | Direct access to individual memory tiers |
| `self_skills` | Create, store, and execute learned skill procedures |
| `spawn_sub_agent` | Launch a child agent with scoped tool access |
| `propose_plan` | Create an implementation plan for user approval before execution |
| `create_artifact` | Generate structured documents and reports |
| `generate_image` | Text-to-image via local Flux server |
| `learning` | Trigger LoRA training, manage preference buffers |
| `interpretability` | SAE feature analysis, activation inspection |
| `steering` | Runtime steering vectors for behaviour modification |
| `system_recompile` | Hot-recompile the engine from its own source |
| `system_logs` | Read and search runtime logs |
| `checkpoint` | Create named restore points during codebase edits |
| `plan_and_execute` | Decompose a complex objective into a DAG of sub-tasks and execute via sub-agents |
| `verify_code` | Run the verification pipeline (compile → test → browser) to validate code changes |
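A registry-plus-dispatch design like the one implied by this table can be sketched in a few lines; all names here are hypothetical, not the actual executor code:

```rust
// Minimal sketch of a tool registry: tools register under a name,
// and the executor dispatches call arguments to the matching one.

use std::collections::HashMap;

type Tool = fn(&str) -> Result<String, String>;

struct Executor {
    tools: HashMap<&'static str, Tool>,
}

impl Executor {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }
    fn register(&mut self, name: &'static str, tool: Tool) {
        self.tools.insert(name, tool);
    }
    fn execute(&self, name: &str, args: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool(args),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}

fn main() {
    let mut exec = Executor::new();
    exec.register("file_read", |path| Ok(format!("contents of {path}")));
    assert!(exec.execute("file_read", "/tmp/x").is_ok());
    assert!(exec.execute("nonexistent", "").is_err());
}
```

An unknown tool name becomes a structured error the model can observe and recover from, rather than a crash.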
7 tiers of persistent memory, all stored locally as JSON:
| Tier | Purpose | Persistence |
|---|---|---|
| Timeline | Chronological event log — every tool call, every interaction | Append-only |
| Scratchpad | Working memory for the current task | Session-scoped |
| Lessons | Distilled learnings from past mistakes and successes | Permanent |
| Synaptic | High-signal knowledge graph with weighted connections | Permanent |
| Procedures | Executable skill recipes synthesised from experience | Permanent |
| Embeddings | Vector store for semantic recall | Permanent |
| Consolidation | Sleep-cycle memory compression and pruning | Scheduled |
Memory is automatically recalled at inference time and injected into the system prompt. The consolidation engine runs on a configurable schedule to compress, prune, and strengthen memory based on access patterns.
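The recall-and-inject step might look roughly like this — a toy sketch with a deliberately naive relevance score; the function names are assumptions:

```rust
// Illustrative recall-at-inference: rank stored memories against the
// query and prepend the top hits to the system prompt.

/// Toy relevance score: count the query's words that appear in the memory.
/// A real system would use the embeddings tier for semantic similarity.
fn score(memory: &str, query: &str) -> usize {
    query.split_whitespace().filter(|w| memory.contains(*w)).count()
}

fn build_system_prompt(base: &str, memories: &[&str], query: &str, top_k: usize) -> String {
    let mut ranked: Vec<&str> = memories.to_vec();
    ranked.sort_by_key(|m| std::cmp::Reverse(score(m, query)));
    let recalled = &ranked[..top_k.min(ranked.len())];
    format!("{base}\n\n# Recalled memories\n{}", recalled.join("\n"))
}

fn main() {
    let prompt = build_system_prompt(
        "You are Ern-OS.",
        &["user prefers Rust", "cats are fluffy"],
        "Rust question",
        1,
    );
    assert!(prompt.contains("user prefers Rust"));
}
```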
Every Layer 2 reply passes through the Observer before reaching the user. The Observer is a separate inference call that audits for:
- Hallucination — claims not supported by tool results
- Sycophancy — agreeing with the user when evidence says otherwise
- Laziness — incomplete, vague, or placeholder responses
- Tool ignorance — describing what it would do instead of using tools
If the Observer rejects a reply, the model gets structured feedback and tries again. This is not a filter — it's a quality loop.
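That generate-audit-retry loop can be sketched as follows; `Verdict`, `quality_loop`, and the attempt cap are illustrative assumptions:

```rust
// Sketch of the Observer quality loop: generate a reply, audit it,
// and on rejection retry with the Observer's structured feedback.

enum Verdict {
    Approved,
    Rejected(String), // carries feedback for the next attempt
}

fn quality_loop<G, A>(mut generate: G, audit: A, max_attempts: u32) -> Option<String>
where
    G: FnMut(Option<&str>) -> String, // receives feedback from the last rejection
    A: Fn(&str) -> Verdict,
{
    let mut feedback: Option<String> = None;
    for _ in 0..max_attempts {
        let reply = generate(feedback.as_deref());
        match audit(&reply) {
            Verdict::Approved => return Some(reply),
            Verdict::Rejected(why) => feedback = Some(why),
        }
    }
    None // still rejected after max_attempts
}

fn main() {
    // Mock model: produces a placeholder first, then fixes it once
    // the auditor's feedback arrives.
    let generate = |feedback: Option<&str>| {
        if feedback.is_some() {
            "a complete answer".to_string()
        } else {
            "TODO: placeholder".to_string()
        }
    };
    let audit = |reply: &str| {
        if reply.contains("TODO") {
            Verdict::Rejected("placeholder detected".to_string())
        } else {
            Verdict::Approved
        }
    };
    assert_eq!(quality_loop(generate, audit, 3), Some("a complete answer".to_string()));
}
```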
12 tabs accessible from localhost:3000:
| Tab | What's There |
|---|---|
| Chat | Streaming chat with thinking blocks, tool execution cards, artifacts |
| Memory | Browse and search all 7 memory tiers |
| Tools | Live tool execution log with timing |
| Training | Golden/rejection buffer stats, trigger LoRA training |
| Interpretability | SAE feature analysis, activation heatmaps |
| Steering | Apply runtime steering vectors |
| Logs | Live system logs with filtering |
| Identity | View and edit the agent's persona |
| Agents | Manage sub-agent configurations |
| Scheduler | Cron-like job scheduling (health checks, consolidation, learning) |
| Codes | Embedded VS Code IDE (via code-server) |
| Settings | Platform adapters, provider config, system controls |
Ern-OS has a built-in learning pipeline for continuous self-improvement:
- Golden Buffer — captures high-quality interaction pairs for SFT fine-tuning
- Rejection Buffer — captures Observer-rejected responses for preference training (DPO/GRPO)
- Sleep Consolidation — scheduled memory compression, lesson extraction, and skill synthesis
- LoRA Training — native Candle-based LoRA on Apple Silicon (Metal-accelerated)
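The two buffers can be pictured as a single sink keyed on whether the Observer attached feedback; struct and field names below are assumptions for illustration:

```rust
// Hedged sketch of the golden/rejection split: approved interactions
// feed the SFT buffer, Observer-rejected ones become preference data.

#[derive(Default)]
struct TrainingBuffers {
    golden: Vec<(String, String)>,           // (prompt, good reply) for SFT
    rejected: Vec<(String, String, String)>, // (prompt, bad reply, feedback) for DPO/GRPO
}

impl TrainingBuffers {
    fn record(&mut self, prompt: &str, reply: &str, feedback: Option<&str>) {
        match feedback {
            None => self.golden.push((prompt.into(), reply.into())),
            Some(why) => self.rejected.push((prompt.into(), reply.into(), why.into())),
        }
    }
}

fn main() {
    let mut buffers = TrainingBuffers::default();
    buffers.record("question", "good answer", None);
    buffers.record("question", "bad answer", Some("hallucinated a citation"));
    assert_eq!(buffers.golden.len(), 1);
    assert_eq!(buffers.rejected.len(), 1);
}
```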
Create `data/prompts/identity.md` to give your agent a custom personality. If absent, a default Ern-OS persona is used. The identity file supports full markdown and is injected into the system prompt at inference time.
All configuration lives in `ern-os.toml`:

```toml
[general]
active_provider = "llamacpp"
data_dir = "data"

[llamacpp]
server_binary = "/opt/homebrew/bin/llama-server"
port = 8080
model_path = "./models/your-model.gguf"
n_gpu_layers = 999

[observer]
enabled = true

[web]
port = 3000
open_browser = true

[prompt]
thinking_enabled = true
```

See docs/configuration.md for the full reference.
| Metric | Value |
|---|---|
| Language | Rust (Edition 2021) |
| Source files | 173 .rs files |
| Lines of code | ~26,000 |
| Tests | 454 passing (378 lib + 76 e2e) |
| Test failures | 0 |
| Compiler warnings | 0 |
| Tools | 29 (20 in L1, 27 in L2) |
| API endpoints | 80 REST + 3 WebSocket (chat, voice, video) |
| Dashboard tabs | 12 |
| Memory tiers | 7 |
| Providers | 3 (llamacpp, ollama, openai-compatible) |
| Auto-launching services | 4 (WebUI, Kokoro TTS, Flux image gen, code-server) |
| Document | Description |
|---|---|
| Architecture | System design, data flow, module responsibilities |
| Configuration | All config options with types and defaults |
| Memory System | 7-tier memory architecture and consolidation |
| Inference Pipeline | Dual-layer engine, ReAct loop, observer audit |
| Learning Pipeline | LoRA, GRPO, sleep consolidation, preference training |
| Tools | 29-tool registry with schemas and parallel execution |
| Interpretability | SAE, feature analysis, steering vectors |
| Provider Interface | Provider trait, implementations, model neutrality |
| Testing | Test structure, coverage, running tests |
MIT — do whatever you want with it.