A production-grade starter kit for LLM observability, agent monitoring, and evaluation.
Last Updated: March 2026
- Awesome Lists
- Project Structure
- Quick Decision Tree
- Tool Comparison
- Quick Start
- Resources & References
## Awesome Lists

### Tracing & Agent Observability
- AbideX - Zero-code OpenTelemetry agent tracing (CrewAI, LangGraph, LlamaIndex)
- OpenLLMetry - OpenTelemetry SDK for LLMs & agents
- LangSmith - Deep LangChain/LangGraph debugging
- Traceloop - Lightweight OTel for agents
### Agent Frameworks
- CrewAI - Multi-agent orchestration framework
- LangGraph - Graph-based agent workflows
- LlamaIndex - RAG + agent pipelines
- Pydantic AI - Lightweight agent framework
### OTLP Backends
- SigNoz - Open-source OTLP backend (self-hosted)
- Uptrace - Managed OTLP SaaS
- Jaeger - Distributed tracing & span visualization
- Grafana Tempo - OSS trace backend
### All-in-One LLM Platforms
- Langfuse - Prompt management + evals + sessions
- Arize Phoenix - Evaluation + drift detection
- Helicone - Cost proxy for OpenAI/Anthropic
- Braintrust - Evals + experiments + collaboration
### Evaluation Frameworks
- DeepEval - 50+ metrics (hallucination, bias, faithfulness)
- Ragas - RAG evaluation metrics
- TruLens - Feedback functions & explainability
- ARES - Automated RAG evaluation framework
### Cost Tracking
- Helicone - Cost tracking (proxy-based)
- Token Counter - Token counting
### Metrics & Dashboards
- Prometheus - Metrics collection
- Grafana - Visualization & alerting
### Drift & Data Quality
- Arize Phoenix - Embedding drift visualization
- WhyLabs - Data quality monitoring
- Evidently AI - Drift detection
### Guides & Articles
- AbideX Documentation
- OpenTelemetry for Python
- SigNoz LLM Monitoring Guide
- Langfuse LLM Observability Guide
- Arize "What is LLM Observability?"
- Confident AI Top Eval Tools
- LangChain Agent Debugging
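Drift monitors like Phoenix, WhyLabs, and Evidently listed above all reduce to the same idea: compare a reference (baseline) distribution against production traffic and alert when the gap exceeds a threshold. A deliberately tiny sketch of that idea, using cosine distance between embedding centroids — illustrative only, not any of these tools' actual APIs:

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def embedding_drift(reference, production, threshold=0.1):
    """Flag drift when the production centroid moves away from the baseline.
    Real tools use richer statistics (PSI, KS tests, UMAP views)."""
    dist = cosine_distance(centroid(reference), centroid(production))
    return dist, dist > threshold

# Baseline embeddings vs. a shifted production batch
ref = [[1.0, 0.0], [0.9, 0.1]]
prod = [[0.0, 1.0], [0.1, 0.9]]
dist, drifted = embedding_drift(ref, prod)
```

Production tools track this continuously per embedding model and surface the drift in dashboards; the threshold here is arbitrary and would be tuned per use case.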
## Project Structure

```text
awesome-observability/
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── Makefile                   # Build & development commands
├── LICENSE                    # MIT License
│
├── src/                       # Core Python modules (production-ready)
│   ├── observability.py       # Langfuse, Phoenix, Helicone, OTLP integrations
│   └── eval_utils.py          # DeepEval, Ragas, TruLens wrappers
│
├── docs/                      # Documentation & guides
│   ├── QUICKSTART.md          # Get started in 5 minutes
│   ├── SETUP.md               # Detailed setup & configuration
│   └── DEPLOYMENT.md          # Production deployment guide
│
├── deploy/                    # Deployment & infrastructure configs
│   ├── docker-compose.yml     # Full stack (Langfuse, SigNoz, Prometheus, Grafana)
│   ├── Dockerfile             # Container image for FastAPI example
│   ├── prometheus.yml         # Prometheus metrics collection
│   ├── alerts.yml             # Alert rules (hallucination, cost, latency)
│   └── signoz-otel-config.yaml  # OpenTelemetry collector config for SigNoz
│
├── examples/                  # Ready-to-run examples (1,026 lines, production-ready)
│   ├── 01_fastapi_rag.py      # FastAPI RAG with tracing & evaluation
│   ├── 02_langchain_agent.py  # LangChain agent with cost tracking
│   ├── 03_llamaindex_phoenix.py  # LlamaIndex + Arize Phoenix integration
│   ├── 04_cost_monitoring.py  # Real-time LLM cost tracking
│   ├── 05_crewai_with_abidex.py  # CrewAI agent with AbideX tracing
│   └── 06_langgraph_with_abidex.py  # LangGraph workflow with zero-code tracing
│
├── configs/                   # Configuration templates
│   ├── .env.template          # Environment variables template
│   └── grafana-dashboard.json # Pre-built Grafana dashboard
│
└── tests/                     # Test suite (if included)
```
## Quick Decision Tree

Choose your observability stack based on your needs:
```text
┌─ Start here: What's your primary use case?
│
├─ Agent-heavy (CrewAI, LangGraph, LlamaIndex) + want ZERO-CODE auto-tracing
│   └─→ AbideX (lead) + OTLP backend (SigNoz/Uptrace/Jaeger)
│       Optional: + Langfuse (prompts/evals) or Phoenix (drift)
│       (Best: zero code changes, auto GenAI attributes, OpenTelemetry native)
│
├─ Self-hosted + full control + open-source + agents
│   └─→ AbideX + SigNoz (OTLP) + Langfuse + DeepEval
│       (Best: cost-effective, no lock-in, rich agent visibility)
│
├─ Heavy LangChain/LangGraph + full evals + dashboards
│   └─→ AbideX (tracing) + LangSmith (optional, if LangChain-only)
│       + Langfuse (prompts/sessions) + DeepEval (quality)
│       (Best: deep debugging, agent steps, quality gates)
│
├─ Cost-sensitive: just track costs/latency + minimal tracing
│   └─→ Helicone (proxy) + AbideX (light tracing)
│       (Best: lightweight, transparent, low overhead)
│
├─ Evaluation-first: hallucination/bias checks on agent outputs
│   └─→ AbideX (tracing) + DeepEval (evals + dashboards)
│       (Best: 50+ metrics, auto quality gates, monitoring)
│
├─ Team collab + experiments + multi-agent coordination
│   └─→ AbideX (tracing) + Braintrust (experiments) + Langfuse (optional)
│       (Best: tracing + experiment tracking + eval collab)
│
└─ RAG/embeddings: agent chains with drift + retrieval quality
    └─→ AbideX (agent tracing) + Phoenix (drift) + Ragas/DeepEval
```
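Purely as an illustration (this helper is not part of the repo's code), the tree above boils down to a small lookup from use case to suggested toolset:

```python
def recommend_stack(use_case: str) -> list:
    """Map the decision-tree branches above to a suggested toolset.
    Hypothetical helper for illustration -- tune to your own constraints."""
    stacks = {
        "agents-zero-code": ["AbideX", "SigNoz/Uptrace/Jaeger (OTLP)"],
        "self-hosted": ["AbideX", "SigNoz", "Langfuse", "DeepEval"],
        "langchain-heavy": ["AbideX", "LangSmith", "Langfuse", "DeepEval"],
        "cost-sensitive": ["Helicone", "AbideX"],
        "evaluation-first": ["AbideX", "DeepEval"],
        "team-collab": ["AbideX", "Braintrust", "Langfuse"],
        "rag-drift": ["AbideX", "Phoenix", "Ragas/DeepEval"],
    }
    # Default to the lightest agent-ready combination from the table below
    return stacks.get(use_case, ["AbideX", "SigNoz"])
```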
## Tool Comparison

### Recommended Stacks by Scenario

| Scenario | Stack | Setup | Cost | Agent-Ready |
|---|---|---|---|---|
| Agent MVP | AbideX + local SigNoz | Low | Free | Yes |
| Startup with agents | AbideX + Langfuse + DeepEval | Low | Low | Yes |
| Scale-up (multi-agent) | AbideX + SigNoz/Uptrace + Langfuse + DeepEval | Medium | Medium | Yes |
| Enterprise agents | AbideX + LangSmith + Langfuse + DeepEval | High | High | Yes |
| Cost-sensitive | AbideX + Helicone + Langfuse | Low | Low | Yes |
| Eval-heavy (RAG) | AbideX + Phoenix + Ragas + DeepEval | Medium | Medium | Yes |
| Max compliance | AbideX + SigNoz + DeepEval + Prometheus | Medium | Medium | Yes |
### Feature Matrix

| Feature | AbideX | Langfuse | Arize Phoenix | Helicone | LangSmith | DeepEval | Braintrust | OpenLLMetry |
|---|---|---|---|---|---|---|---|---|
| Type | Zero-code agent tracer | All-in-one | Evaluation-first | Cost proxy | Agent tracing | Eval metrics | Eval + collab | OTel-based |
| OSS | Yes | Yes | Yes | No | No | Yes | No | Yes |
| Self-hosted | Yes (local spans) | Yes | Yes | No | No | Yes | No | Yes (local) |
| Auto Agent Tracing | Full | Basic | Basic | No | Yes | No | No | No |
| GenAI Attributes | Full (role/goal/task) | Yes | No | No | Yes | No | No | No |
| OTel Native | Yes (spans only) | No | Yes | No | No | No | No | Yes |
| OTLP Export | Yes | No | No | No | No | No | No | Yes |
| Prompt Mgmt | No | Full | No | No | Full | No | No | No |
| Evals (50+ metrics) | No | Yes | Full | No | Yes | Full | Full | No |
| Drift Detection | No | Yes | Full | No | No | Yes | No | No |
| CrewAI/LangGraph support | Full | Yes | Basic | No | Yes | No | No | Basic |
| Code changes needed | No (ZERO) | Some | Some | None (proxy) | Some | Some | Some | Some |
| Setup time | <1 min | 5 min | 5 min | 1 min | 5 min | 5 min | 5 min | 10 min |
| Pricing | Free (OSS) | Free (OSS) + Cloud | Cheap + Cloud | $ | $$ | Free + Cloud | $$$ | Free (OSS) |
| Learning Curve | Low | Medium | Low | Low | Medium | Low | High | – |
### Evaluation Framework Coverage

| Framework | Hallucinations | Bias | RAG Metrics | Faithfulness | Context | License |
|---|---|---|---|---|---|---|
| DeepEval | Full | Yes | Yes | Full | Yes | MIT |
| Ragas | Yes | No | Full | Yes | Yes | Apache 2.0 |
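Faithfulness metrics in DeepEval and Ragas extract claims from the answer and use an LLM judge to check each against the retrieved context. A much cruder token-overlap heuristic conveys the shape of the metric — illustrative only, not either library's implementation:

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.
    Real metrics (DeepEval, Ragas) use claim extraction + LLM-as-judge;
    this only sketches the input/output contract."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 1.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def quality_gate(answer: str, context: str, threshold: float = 0.5) -> bool:
    """Gate a response (or a CI run) on a minimum grounding score."""
    return faithfulness_score(answer, context) >= threshold
```

The same gate pattern is what "auto quality gates" refers to in the decision tree: run the metric on every traced response and block or flag outputs below the threshold.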
## Quick Start

Explore the ready-to-run examples in the `examples/` directory:
- FastAPI RAG - FastAPI with tracing & evaluation
- LangChain Agent - LangChain with cost tracking
- LlamaIndex + Phoenix - LlamaIndex RAG + drift detection
- Cost Monitoring - Real-time LLM cost tracking
- CrewAI with AbideX - CrewAI multi-agent
- LangGraph with AbideX - LangGraph workflows
Detailed setup instructions are in `docs/SETUP.md` and `docs/QUICKSTART.md`.
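The cost-monitoring example above comes down to simple bookkeeping: tokens × price, accumulated per model. A minimal sketch with hypothetical per-1K-token prices (the model names and rates below are placeholders — check your provider's current pricing):

```python
# Hypothetical per-1K-token prices in USD -- replace with real rates.
PRICES = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "claude-sonnet": {"input": 0.003, "output": 0.015},
}

class CostTracker:
    """Accumulate per-model LLM spend from token usage counts."""

    def __init__(self):
        self.total_usd = 0.0
        self.by_model = {}

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Compute the cost of one call and add it to the running totals."""
        p = PRICES[model]
        cost = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
        self.total_usd += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        return cost

tracker = CostTracker()
tracker.record("gpt-4o-mini", input_tokens=2000, output_tokens=500)
```

Proxy-based tools like Helicone do this transparently by sitting between your app and the provider; the tracker above is the in-process equivalent, fed from the `usage` field of each API response.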
## Resources & References

### AbideX & Agent Frameworks
- AbideX GitHub - Main repository & documentation
- AbideX PyPI - Package & installation
- CrewAI Docs - Multi-agent orchestration framework
- LangGraph Documentation - Graph-based agents
- LlamaIndex Agent Guide - RAG + agents
- Pydantic AI Documentation - Lightweight agent builder
### Backends & OpenTelemetry
- SigNoz Setup Guide - Docker deployment
- Uptrace Getting Started - Managed OTLP backend
- Jaeger Installation - Distributed tracing
- OpenTelemetry Python Docs - OTel SDK
### Evaluation
- DeepEval Docs - 50+ metrics & LLM-as-judge
- Ragas Documentation - RAG evaluation metrics
- TruLens Docs - Feedback functions
- Langfuse Evals - Prompt management + evals
### Examples & Cookbooks
- AbideX Examples - CrewAI, LangGraph samples
- SigNoz Agent Monitoring Setup - Docker examples
- DeepEval Cookbook - Eval patterns
- Langfuse SDK Examples
- LlamaIndex Observability Examples
## Contributing

Have a tool, pattern, or best practice to add? Submit a PR!
- Add tools in alphabetical order within sections
- Include GitHub URL, key features, and licensing info
- For code examples, ensure they follow patterns in this repo
- Test Docker Compose setup before submitting
## License

MIT License - feel free to use it in your projects!
Follow for updates: Watch on GitHub