A production-grade starter kit for LLM observability, agent monitoring, and evaluation.
Last Updated: March 2026
- Awesome Lists
- Project Structure
- Quick Decision Tree
- Tool Comparison
- Quick Start
- Resources & References
## Awesome Lists

### Tracing & Agent Observability
- AbideX - Zero-code OpenTelemetry agent tracing (CrewAI, LangGraph, LlamaIndex)
- OpenLLMetry - OpenTelemetry SDK for LLMs & agents
- LangSmith - Deep LangChain/LangGraph debugging
- Traceloop - Lightweight OTel for agents
### Agent Frameworks
- CrewAI - Multi-agent orchestration framework
- LangGraph - Graph-based agent workflows
- LlamaIndex - RAG + agent pipelines
- Pydantic AI - Lightweight agent framework
### OTLP Backends
- SigNoz - Open-source OTLP backend (self-hosted)
- Uptrace - Managed OTLP SaaS
- Jaeger - Distributed tracing & span visualization
- Grafana Tempo - OSS trace backend
### All-in-One LLM Platforms
- Langfuse - Prompt management + evals + sessions
- Arize Phoenix - Evaluation + drift detection
- Helicone - Cost proxy for OpenAI/Anthropic
- Braintrust - Evals + experiments + collaboration
### Evaluation Frameworks
- DeepEval - 50+ metrics (hallucination, bias, faithfulness)
- Ragas - RAG evaluation metrics
- TruLens - Feedback functions & explainability
- ARES - Automated RAG evaluation framework
### Cost Tracking
- Helicone - Cost tracking (proxy-based)
- Token Counter - Token counting
### Metrics & Dashboards
- Prometheus - Metrics collection
- Grafana - Visualization & alerting
### Drift & Data Quality
- Arize Phoenix - Embedding drift visualization
- WhyLabs - Data quality monitoring
- Evidently AI - Drift detection
### Guides & Articles
- AbideX Documentation
- OpenTelemetry for Python
- SigNoz LLM Monitoring Guide
- Langfuse LLM Observability Guide
- Arize "What is LLM Observability?"
- Confident AI Top Eval Tools
- LangChain Agent Debugging
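Drift monitors like Phoenix, WhyLabs, and Evidently listed above all reduce to the same idea: compare a reference (baseline) distribution against production traffic and alert when the gap exceeds a threshold. A deliberately tiny sketch of that idea, using cosine distance between embedding centroids — illustrative only, not any of these tools' actual APIs:

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def embedding_drift(reference, production, threshold=0.1):
    """Flag drift when the production centroid moves away from the baseline.
    Real tools use richer statistics (PSI, KS tests, UMAP views)."""
    dist = cosine_distance(centroid(reference), centroid(production))
    return dist, dist > threshold

# Baseline embeddings vs. a shifted production batch
ref = [[1.0, 0.0], [0.9, 0.1]]
prod = [[0.0, 1.0], [0.1, 0.9]]
dist, drifted = embedding_drift(ref, prod)
```

Production tools track this continuously per embedding model and surface the drift in dashboards; the threshold here is arbitrary and would be tuned per use case.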
## Project Structure

```text
awesome-observability/
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── Makefile                   # Build & development commands
├── LICENSE                    # MIT License
│
├── src/                       # Core Python modules (production-ready)
│   ├── observability.py       # Langfuse, Phoenix, Helicone, OTLP integrations
│   └── eval_utils.py          # DeepEval, Ragas, TruLens wrappers
│
├── docs/                      # Documentation & guides
│   ├── QUICKSTART.md          # Get started in 5 minutes
│   ├── SETUP.md               # Detailed setup & configuration
│   └── DEPLOYMENT.md          # Production deployment guide
│
├── deploy/                    # Deployment & infrastructure configs
│   ├── docker-compose.yml     # Full stack (Langfuse, SigNoz, Prometheus, Grafana)
│   ├── Dockerfile             # Container image for FastAPI example
│   ├── prometheus.yml         # Prometheus metrics collection
│   ├── alerts.yml             # Alert rules (hallucination, cost, latency)
│   └── signoz-otel-config.yaml  # OpenTelemetry collector config for SigNoz
│
├── examples/                  # Ready-to-run examples (1,026 lines, production-ready)
│   ├── 01_fastapi_rag.py      # FastAPI RAG with tracing & evaluation
│   ├── 02_langchain_agent.py  # LangChain agent with cost tracking
│   ├── 03_llamaindex_phoenix.py  # LlamaIndex + Arize Phoenix integration
│   ├── 04_cost_monitoring.py  # Real-time LLM cost tracking
│   ├── 05_crewai_with_abidex.py  # CrewAI agent with AbideX tracing
│   └── 06_langgraph_with_abidex.py  # LangGraph workflow with zero-code tracing
│
├── configs/                   # Configuration templates
│   ├── .env.template          # Environment variables template
│   └── grafana-dashboard.json # Pre-built Grafana dashboard
│
└── tests/                     # Test suite (if included)
```
## Quick Decision Tree

Choose your observability stack based on your needs:
```text
┌─ Start here: What's your primary use case?
│
├─ Agent-heavy (CrewAI, LangGraph, LlamaIndex) + want ZERO-CODE auto-tracing
│   └─→ AbideX (lead) + OTLP backend (SigNoz/Uptrace/Jaeger)
│       Optional: + Langfuse (prompts/evals) or Phoenix (drift)
│       (Best: zero code changes, auto GenAI attributes, OpenTelemetry native)
│
├─ Self-hosted + full control + open-source + agents
│   └─→ AbideX + SigNoz (OTLP) + Langfuse + DeepEval
│       (Best: cost-effective, no lock-in, rich agent visibility)
│
├─ Heavy LangChain/LangGraph + full evals + dashboards
│   └─→ AbideX (tracing) + LangSmith (optional, if LangChain-only)
│       + Langfuse (prompts/sessions) + DeepEval (quality)
│       (Best: deep debugging, agent steps, quality gates)
│
├─ Cost-sensitive: just track costs/latency + minimal tracing
│   └─→ Helicone (proxy) + AbideX (light tracing)
│       (Best: lightweight, transparent, low overhead)
│
├─ Evaluation-first: hallucination/bias checks on agent outputs
│   └─→ AbideX (tracing) + DeepEval (evals + dashboards)
│       (Best: 50+ metrics, auto quality gates, monitoring)
│
├─ Team collab + experiments + multi-agent coordination
│   └─→ AbideX (tracing) + Braintrust (experiments) + Langfuse (optional)
│       (Best: tracing + experiment tracking + eval collab)
│
└─ RAG/embeddings: agent chains with drift + retrieval quality
    └─→ AbideX (agent tracing) + Phoenix (drift) + Ragas/DeepEval
```
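Purely as an illustration (this helper is not part of the repo's code), the tree above boils down to a small lookup from use case to suggested toolset:

```python
def recommend_stack(use_case: str) -> list:
    """Map the decision-tree branches above to a suggested toolset.
    Hypothetical helper for illustration -- tune to your own constraints."""
    stacks = {
        "agents-zero-code": ["AbideX", "SigNoz/Uptrace/Jaeger (OTLP)"],
        "self-hosted": ["AbideX", "SigNoz", "Langfuse", "DeepEval"],
        "langchain-heavy": ["AbideX", "LangSmith", "Langfuse", "DeepEval"],
        "cost-sensitive": ["Helicone", "AbideX"],
        "evaluation-first": ["AbideX", "DeepEval"],
        "team-collab": ["AbideX", "Braintrust", "Langfuse"],
        "rag-drift": ["AbideX", "Phoenix", "Ragas/DeepEval"],
    }
    # Default to the lightest agent-ready combination from the table below
    return stacks.get(use_case, ["AbideX", "SigNoz"])
```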
## Tool Comparison

### Recommended Stacks by Scenario

| Scenario | Stack | Setup | Cost | Agent-Ready |
|---|---|---|---|---|
| Agent MVP | AbideX + local SigNoz | Low | Free | Yes |
| Startup with agents | AbideX + Langfuse + DeepEval | Low | Low | Yes |
| Scale-up (multi-agent) | AbideX + SigNoz/Uptrace + Langfuse + DeepEval | Medium | Medium | Yes |
| Enterprise agents | AbideX + LangSmith + Langfuse + DeepEval | High | High | Yes |
| Cost-sensitive | AbideX + Helicone + Langfuse | Low | Low | Yes |
| Eval-heavy (RAG) | AbideX + Phoenix + Ragas + DeepEval | Medium | Medium | Yes |
| Max compliance | AbideX + SigNoz + DeepEval + Prometheus | Medium | Medium | Yes |
### Feature Matrix

| Feature | AbideX | Langfuse | Arize Phoenix | Helicone | LangSmith | DeepEval | Braintrust | OpenLLMetry |
|---|---|---|---|---|---|---|---|---|
| Type | Zero-code agent tracer | All-in-one | Evaluation-first | Cost proxy | Agent tracing | Eval metrics | Eval + collab | OTel-based |
| OSS | Yes | Yes | Yes | No | No | Yes | No | Yes |
| Self-hosted | Yes (local spans) | Yes | Yes | No | No | Yes | No | Yes (local) |
| Auto Agent Tracing | Full | Basic | Basic | No | Yes | No | No | No |
| GenAI Attributes | Full (role/goal/task) | Yes | No | No | Yes | No | No | No |
| OTel Native | Yes (spans only) | No | Yes | No | No | No | No | Yes |
| OTLP Export | Yes | No | No | No | No | No | No | Yes |
| Prompt Mgmt | No | Full | No | No | Full | No | No | No |
| Evals (50+ metrics) | No | Yes | Full | No | Yes | Full | Full | No |
| Drift Detection | No | Yes | Full | No | No | Yes | No | No |
| CrewAI/LangGraph support | Full | Yes | Basic | No | Yes | No | No | Basic |
| Code changes needed | No (ZERO) | Some | Some | None (proxy) | Some | Some | Some | Some |
| Setup time | <1 min | 5 min | 5 min | 1 min | 5 min | 5 min | 5 min | 10 min |
| Pricing | Free (OSS) | Free (OSS) + Cloud | Cheap + Cloud | $ | $$ | Free + Cloud | $$$ | Free (OSS) |
| Learning Curve | Low | Medium | Low | Low | Medium | Low | High | – |
### Evaluation Framework Coverage

| Framework | Hallucinations | Bias | RAG Metrics | Faithfulness | Context | License |
|---|---|---|---|---|---|---|
| DeepEval | Full | Yes | Yes | Full | Yes | MIT |
| Ragas | Yes | No | Full | Yes | Yes | Apache 2.0 |
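Faithfulness metrics in DeepEval and Ragas extract claims from the answer and use an LLM judge to check each against the retrieved context. A much cruder token-overlap heuristic conveys the shape of the metric — illustrative only, not either library's implementation:

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.
    Real metrics (DeepEval, Ragas) use claim extraction + LLM-as-judge;
    this only sketches the input/output contract."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 1.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def quality_gate(answer: str, context: str, threshold: float = 0.5) -> bool:
    """Gate a response (or a CI run) on a minimum grounding score."""
    return faithfulness_score(answer, context) >= threshold
```

The same gate pattern is what "auto quality gates" refers to in the decision tree: run the metric on every traced response and block or flag outputs below the threshold.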
## Quick Start

Explore the ready-to-run examples in the `examples/` directory:
- FastAPI RAG - FastAPI with tracing & evaluation
- LangChain Agent - LangChain with cost tracking
- LlamaIndex + Phoenix - LlamaIndex RAG + drift detection
- Cost Monitoring - Real-time LLM cost tracking
- CrewAI with AbideX - CrewAI multi-agent
- LangGraph with AbideX - LangGraph workflows
Detailed setup instructions are in `docs/SETUP.md` and `docs/QUICKSTART.md`.
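The cost-monitoring example above comes down to simple bookkeeping: tokens × price, accumulated per model. A minimal sketch with hypothetical per-1K-token prices (the model names and rates below are placeholders — check your provider's current pricing):

```python
# Hypothetical per-1K-token prices in USD -- replace with real rates.
PRICES = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "claude-sonnet": {"input": 0.003, "output": 0.015},
}

class CostTracker:
    """Accumulate per-model LLM spend from token usage counts."""

    def __init__(self):
        self.total_usd = 0.0
        self.by_model = {}

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Compute the cost of one call and add it to the running totals."""
        p = PRICES[model]
        cost = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
        self.total_usd += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        return cost

tracker = CostTracker()
tracker.record("gpt-4o-mini", input_tokens=2000, output_tokens=500)
```

Proxy-based tools like Helicone do this transparently by sitting between your app and the provider; the tracker above is the in-process equivalent, fed from the `usage` field of each API response.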
## Resources & References

### AbideX & Agent Frameworks
- AbideX GitHub - Main repository & documentation
- AbideX PyPI - Package & installation
- CrewAI Docs - Multi-agent orchestration framework
- LangGraph Documentation - Graph-based agents
- LlamaIndex Agent Guide - RAG + agents
- Pydantic AI Documentation - Lightweight agent builder
### Backends & OpenTelemetry
- SigNoz Setup Guide - Docker deployment
- Uptrace Getting Started - Managed OTLP backend
- Jaeger Installation - Distributed tracing
- OpenTelemetry Python Docs - OTel SDK
### Evaluation
- DeepEval Docs - 50+ metrics & LLM-as-judge
- Ragas Documentation - RAG evaluation metrics
- TruLens Docs - Feedback functions
- Langfuse Evals - Prompt management + evals
### Examples & Cookbooks
- AbideX Examples - CrewAI, LangGraph samples
- SigNoz Agent Monitoring Setup - Docker examples
- DeepEval Cookbook - Eval patterns
- Langfuse SDK Examples
- LlamaIndex Observability Examples
## Contributing

Have a tool, pattern, or best practice to add? Submit a PR!
- Add tools in alphabetical order within sections
- Include GitHub URL, key features, and licensing info
- For code examples, ensure they follow patterns in this repo
- Test Docker Compose setup before submitting
## License

MIT License - feel free to use it in your projects!
Follow for updates: Watch on GitHub