Awesome LLM Observability & Agent Monitoring 2026

Awesome

A production-grade starter kit for LLM observability, agent monitoring, and evaluation.

Last Updated: March 2026

Table of Contents

  1. Awesome Lists
  2. Project Structure
  3. Quick Decision Tree
  4. Tool Comparison
  5. Quick Start
  6. Resources & References

📚 Awesome Lists

Agent Monitoring (Primary)

Zero-Code Agent Tracing

  • AbideX - Zero-code OpenTelemetry agent tracing (CrewAI, LangGraph, LlamaIndex)
  • OpenLLMetry - OpenTelemetry SDK for LLMs & agents
  • LangSmith - Deep LangChain/LangGraph debugging
  • Traceloop - Lightweight OTel instrumentation for agents
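The shared idea behind these "zero-code" tracers is patching framework entry points once, so spans are recorded without any change to application code. AbideX's actual API is not shown in this repo's README, so the sketch below is stdlib-only and every name in it (`traced`, `Agent`, `RECORDED_SPANS`) is hypothetical:

```python
import functools
import time
import uuid

RECORDED_SPANS = []  # stand-in for an OpenTelemetry span exporter

def traced(span_name):
    """Wrap a function so every call records a span; callers are untouched."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"span_id": uuid.uuid4().hex[:16],
                    "name": span_name,
                    "start_ns": time.time_ns()}
            try:
                return fn(*args, **kwargs)
            finally:
                span["end_ns"] = time.time_ns()
                RECORDED_SPANS.append(span)
        return wrapper
    return decorator

class Agent:
    """Hypothetical stand-in for a framework class (e.g. a CrewAI agent)."""
    def run(self, task):
        return f"done: {task}"

# "Zero-code" instrumentation: patch the framework class once at import time.
# Application code that calls Agent().run() is never edited.
Agent.run = traced("agent.run")(Agent.run)

result = Agent().run("summarize report")
```

Real tracers do the same thing against CrewAI/LangGraph/LlamaIndex internals and export real OTel spans instead of appending to a list.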

Agent Frameworks

OTLP Backends (Trace Storage & Visualization)

  • SigNoz - Open-source OTLP backend (self-hosted)
  • Uptrace - Managed OTLP SaaS
  • Jaeger - Distributed tracing & span visualization
  • Grafana Tempo - OSS trace backend
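All four backends ingest the same OTLP wire format, which is why the tracer and the backend can be chosen independently. As a rough illustration of what actually crosses the wire, here is a minimal OTLP/HTTP JSON trace payload built with the stdlib; field names follow my understanding of the OTLP JSON encoding (the default OTLP/HTTP endpoint is `/v1/traces` on port 4318, but verify against your backend's docs):

```python
import json
import os
import time

def otlp_trace_payload(service_name, span_name, start_ns, end_ns):
    """Build a minimal OTLP/HTTP JSON trace body for a single span."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name",
                 "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex chars
                    "spanId": os.urandom(8).hex(),    # 16 hex chars
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }

now = time.time_ns()
payload = otlp_trace_payload("agent-demo", "llm.call", now, now + 5_000_000)
body = json.dumps(payload)  # POST this to http://<backend>:4318/v1/traces
```

In practice you would let the OpenTelemetry SDK or collector emit this rather than hand-building it.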

Traditional Observability (Complementary)

Tracing & Observability

Evaluation Frameworks

  • DeepEval - 50+ metrics (hallucination, bias, faithfulness)
  • Ragas - RAG evaluation metrics
  • TruLens - Feedback functions & explainability
  • ARES - Automated evaluation framework for RAG systems
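Faithfulness-style metrics in these frameworks score whether claims in an answer are supported by the retrieved context, typically via an LLM judge. A deliberately naive stdlib stand-in (token overlap, not a judge) shows only the shape of such a metric, a score in [0, 1] gating a quality check:

```python
def naive_faithfulness(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the context.

    A crude proxy for what DeepEval/Ragas faithfulness measures with an
    LLM judge; useful only to illustrate the 0..1 score a quality gate
    would threshold on.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 1.0  # empty answer: nothing unsupported
    supported = answer_tokens & context_tokens
    return len(supported) / len(answer_tokens)

score = naive_faithfulness(
    answer="paris is the capital",
    context="paris is the capital of france",
)
# A monitoring pipeline would flag the trace when score < threshold.
```

The real frameworks also decompose answers into individual claims before scoring, which token overlap cannot do.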

Cost & Metrics

Embeddings & Drift

Best Practices & Research


Project Structure

awesome-observability/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── Makefile                     # Build & development commands
├── LICENSE                      # MIT License
│
├── src/                         # Core Python modules (production-ready)
│   ├── observability.py        # Langfuse, Phoenix, Helicone, OTLP integrations
│   └── eval_utils.py           # DeepEval, Ragas, TruLens wrappers
│
├── docs/                        # Documentation & guides
│   ├── QUICKSTART.md           # Get started in 5 minutes
│   ├── SETUP.md                # Detailed setup & configuration
│   └── DEPLOYMENT.md           # Production deployment guide
│
├── deploy/                      # Deployment & infrastructure configs
│   ├── docker-compose.yml      # Full stack (Langfuse, SigNoz, Prometheus, Grafana)
│   ├── Dockerfile              # Container image for FastAPI example
│   ├── prometheus.yml          # Prometheus metrics collection
│   ├── alerts.yml              # Alert rules (hallucination, cost, latency)
│   └── signoz-otel-config.yaml # OpenTelemetry collector config for SigNoz
│
├── examples/                    # Ready-to-run examples (1,026 lines, production-ready)
│   ├── 01_fastapi_rag.py       # FastAPI RAG with tracing & evaluation
│   ├── 02_langchain_agent.py   # LangChain agent with cost tracking
│   ├── 03_llamaindex_phoenix.py # LlamaIndex + Arize Phoenix integration
│   ├── 04_cost_monitoring.py   # Real-time LLM cost tracking
│   ├── 05_crewai_with_abidex.py # CrewAI agent with AbideX tracing
│   └── 06_langgraph_with_abidex.py # LangGraph workflow with zero-code tracing
│
├── configs/                     # Configuration templates
│   ├── .env.template           # Environment variables template
│   └── grafana-dashboard.json  # Pre-built Grafana dashboard
│
└── tests/                       # Test suite (if included)

Quick Decision Tree

Choose your observability stack based on your needs:

┌─ Start here: What's your primary use case?
│
├─ Agent-heavy (CrewAI, LangGraph, LlamaIndex) + want ZERO-CODE auto-tracing
│  └─→ AbideX (lead) + OTLP backend (SigNoz/Uptrace/Jaeger)
│      Optional: + Langfuse (prompts/evals) or Phoenix (drift)
│      (Best: zero code changes, auto GenAI attributes, OpenTelemetry native)
│
├─ Self-hosted + full control + open-source + agents
│  └─→ AbideX + SigNoz (OTLP) + Langfuse + DeepEval
│      (Best: cost-effective, no lock-in, rich agent visibility)
│
├─ Heavy LangChain/LangGraph + full evals + dashboards
│  └─→ AbideX (tracing) + LangSmith (optional, if LangChain-only)
│      + Langfuse (prompts/sessions) + DeepEval (quality)
│      (Best: deep debugging, agent steps, quality gates)
│
├─ Cost-sensitive: Just track costs/latency + minimal tracing
│  └─→ Helicone (proxy) + AbideX (light tracing)
│      (Best: lightweight, transparent, low overhead)
│
├─ Evaluation-first: hallucination/bias checks on agent outputs
│  └─→ AbideX (tracing) + DeepEval (evals) + DeepEval dashboards
│      (Best: 50+ metrics, auto quality gates, monitoring)
│
├─ Team collab + experiments + multi-agent coordination
│  └─→ AbideX (tracing) + Braintrust (experiments) + Langfuse (optional)
│      (Best: tracing + experiment tracking + eval collab)
│
└─ RAG/embeddings: agent chains with drift + retrieval quality
   └─→ AbideX (agent tracing) + Phoenix (drift) + Ragas/DeepEval
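The tree above can be collapsed into a simple lookup. The scenario keys below are this sketch's own naming (not an API from any of the listed tools); the stack strings mirror the branches:

```python
def recommend_stack(scenario: str) -> str:
    """Map a primary use case to the stack suggested by the decision tree.

    Scenario keys are invented for this sketch; adjust to taste.
    """
    stacks = {
        "zero-code-agents": "AbideX + OTLP backend (SigNoz/Uptrace/Jaeger)",
        "self-hosted":      "AbideX + SigNoz (OTLP) + Langfuse + DeepEval",
        "langchain-heavy":  "AbideX + LangSmith + Langfuse + DeepEval",
        "cost-sensitive":   "Helicone (proxy) + AbideX (light tracing)",
        "evaluation-first": "AbideX + DeepEval",
        "team-collab":      "AbideX + Braintrust + Langfuse",
        "rag-drift":        "AbideX + Phoenix (drift) + Ragas/DeepEval",
    }
    # Fall back to the first branch, the broadest starting point.
    return stacks.get(scenario, "AbideX + OTLP backend (SigNoz/Uptrace/Jaeger)")

choice = recommend_stack("cost-sensitive")
```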

Recommended Combinations

| Scenario | Stack | Setup | Cost | Agent-Ready |
|---|---|---|---|---|
| Agent MVP | AbideX + local SigNoz | Low | Free | Yes |
| Startup with agents | AbideX + Langfuse + DeepEval | Low | Low | Yes |
| Scale-up (multi-agent) | AbideX + SigNoz/Uptrace + Langfuse + DeepEval | Medium | Medium | Yes |
| Enterprise agents | AbideX + LangSmith + Langfuse + DeepEval | High | High | Yes |
| Cost-sensitive | AbideX + Helicone + Langfuse | Low | Low | Yes |
| Eval-heavy (RAG) | AbideX + Phoenix + Ragas + DeepEval | Medium | Medium | Yes |
| Max compliance | AbideX + SigNoz + DeepEval + Prometheus | Medium | Medium | Yes |

Tool Comparison

| Feature | AbideX | Langfuse | Arize Phoenix | Helicone | LangSmith | DeepEval | Braintrust | OpenLLMetry |
|---|---|---|---|---|---|---|---|---|
| Type | Zero-code agent tracer | All-in-one | Evaluation-first | Cost proxy | Agent tracing | Eval metrics | Eval + collab | OTel-based |
| OSS | Yes | Yes | Yes | No | No | Yes | No | Yes |
| Self-hosted | Yes (local spans) | Yes | Yes | No | No | Yes | No | Yes (local) |
| Auto agent tracing | Full | Basic | Basic | No | Yes | No | No | No |
| GenAI attributes | Full (role/goal/task) | Yes | No | No | Yes | No | No | No |
| OTel native | Yes (spans only) | No | Yes | No | No | No | No | Yes |
| OTLP export | Yes | No | No | No | No | No | No | Yes |
| Prompt mgmt | No | Full | No | No | Full | No | No | No |
| Evals (50+ metrics) | No | Yes | Full | No | Yes | Full | Full | No |
| Drift detection | No | Yes | Full | No | No | Yes | No | No |
| CrewAI/LangGraph support | Full | Yes | Basic | No | Yes | No | No | Basic |
| Code changes needed | No (zero) | Some | Some | None (proxy) | Some | Some | Some | Some |
| Setup time | <1 min | 5 min | 5 min | 1 min | 5 min | 5 min | 5 min | 10 min |
| Pricing | Free (OSS) | Free (OSS) + Cloud | Cheap + Cloud | $ | $$ | Free + Cloud | $$$ | Free (OSS) |
| Learning curve | Low | Medium | Low | Low | Medium | Low | High | — |

Open-Source Evaluation Frameworks

| Framework | Hallucinations | Bias | RAG Metrics | Faithfulness | Context | License |
|---|---|---|---|---|---|---|
| DeepEval | Full | Yes | Yes | Full | Yes | MIT |
| Ragas | Yes | No | Full | Yes | Yes | Apache 2.0 |

Quick Start

Explore the ready-to-run examples in the examples/ directory.

Detailed setup instructions are in docs/SETUP.md and docs/QUICKSTART.md.
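The repo's 04_cost_monitoring.py example is not reproduced here, but per-call cost tracking reduces to tokens times price. A self-contained sketch; the model names and per-1K-token prices below are made up for illustration, so substitute your provider's current rates:

```python
# Hypothetical per-1K-token prices in USD; real rates change frequently.
PRICES = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.0100, "output": 0.0300},
}

class CostTracker:
    """Accumulate LLM spend, in total and per model, from token counts."""

    def __init__(self):
        self.total_usd = 0.0
        self.by_model = {}

    def record(self, model, input_tokens, output_tokens):
        p = PRICES[model]
        cost = (input_tokens / 1000) * p["input"] \
             + (output_tokens / 1000) * p["output"]
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        self.total_usd += cost
        return cost

tracker = CostTracker()
tracker.record("small-model", input_tokens=1000, output_tokens=1000)
tracker.record("large-model", input_tokens=2000, output_tokens=500)
```

A proxy like Helicone does this transparently per request; the point here is only the arithmetic behind the dashboards.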


Resources & References

AbideX (Zero-Code Agent Tracing)

Agent Framework Documentation

OTLP Backends & OpenTelemetry

Evaluation & Quality Frameworks

Community & Support

Code Examples & Repositories


Contributing

Have a tool, pattern, or best practice to add? Submit a PR!

Guidelines

  1. Add tools in alphabetical order within sections
  2. Include GitHub URL, key features, and licensing info
  3. For code examples, ensure they follow patterns in this repo
  4. Test Docker Compose setup before submitting

License

MIT License - Feel free to use in your projects!


