Awesome observability page

636 GitHub Stars
288 Curated Resources
15 Categories
Last refreshed: 6 hours ago
1. Best Practices
3. Collect
4. Load Generators and Synthetic Traffic
5. Transport
6. Collector
7. Storage
8. Visualization
9. Processing and Analyze and Act
10. LLM & AI Observability
11. GPU Observability
12. Application Performance Monitoring Solutions (APM)
13. Service Mesh
14. Observability as a Service
15. Examples and Sandboxes
16. References

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me platforms resources from awesome-observability"

What's inside

5. Transport

  • ActiveMQ

    Powerful open source messaging and integration patterns server.

  • Aeron

    Efficient reliable UDP unicast, UDP multicast, and IPC message transport.

  • Apache Kafka

    Publish-subscribe messaging rethought as a distributed commit log.

  • Apollo

    Faster, more reliable, easier to maintain messaging broker built from the foundations of the original ActiveMQ.

  • Ascoltatori

    Pub/sub library for Node.

  • Beanstalk

    Simple, fast work queue.

10. LLM & AI Observability

  • Agenta [Platforms]

    Open-source LLMOps platform for prompt playground, prompt management, LLM evaluation, and observability.

  • agenttrace [Cost & Usage Tracking]

    TUI observability for AI coding agents. Tracks cost, tokens, tool failures, anomalies, health, and CI gates across Claude Code, Codex, Gemini CLI, Aider, and Cursor exports.

  • Arize Phoenix [Platforms]

    Open-source AI observability platform for tracing, evaluation, datasets, experiments, prompt management and playground. Built on OpenTelemetry with Python and TypeScript support.

  • burn0 [Cost & Usage Tracking]

    Open-source Node.js cost observability with one import. Auto-detects and tracks per-request costs for 50+ services (LLMs, SaaS, databases) via HTTP interception. Sub-millisecond overhead, local-first with optional cloud dashboard.

  • Burnd [Cost & Usage Tracking]

    Local-first CLI that reads Claude Code JSONL session files and surfaces cost leaks via pattern detectors (retry storms, tool overuse, repeated reads, thrash, etc.). npx-installable, MIT, zero telemetry; findings export to a shareable report URL.

  • ClevAgent [Platforms]

    Runtime monitoring for AI agents — heartbeat watchdog, loop detection, cost tracking, auto-restart.

9. Processing and Analyze and Act

  • Alerta [Alerts]

    Tool used to consolidate and de-duplicate alerts from multiple sources for quick "at-a-glance" visualisation.

  • Anomaly Detection in Prometheus Metrics [Anomalies Detection]

    Prototype for a Prometheus Anomaly Detector (PAD) that can be deployed on OpenShift. The PAD is a framework for deploying a metric prediction model to detect anomalies in Prometheus metrics.

  • Anomaly Detection Toolkit (ADTK) [Anomalies Detection]

    Python package for unsupervised / rule-based time series anomaly detection.

  • Banshee [Anomalies Detection]

    Real-time anomaly (outlier) detection system for periodic metrics.

  • Bosun [Alerts]

    Time Series Alerting Framework.

  • Cabot [Alerts]

    Get alerted when services go down or metrics go crazy.

14. Observability as a Service

  • Alibaba Cloud Logs Service

    Complete real-time data logging service developed by Alibaba Group.

  • Azure Monitor

    Full observability into your applications, infrastructure, and network.

  • CloudWatch

    Observability of your AWS resources and applications on AWS and on-premises.

  • Dash0

    Modern OpenTelemetry-native observability built on CNCF open standards such as PromQL, Perses, and OTLP, with full cost control. Supports metrics, logs, and traces, with dashboarding and alerting capabilities.

  • Epsagon

    Application Monitoring Built for Containers and Serverless.

  • Geneos

    Real-time monitoring for all your environments in one platform.

7. Storage

  • Apache Cassandra [NoSQL Database (The Others :-P)]

    High availability, linear scalability, and proven fault tolerance on commodity hardware or cloud infrastructure.

  • Apache HBase ["Meta Projects" (data storage, multi-tenant, aggregation, high availability, etc)]

    The Hadoop database, a distributed, scalable, big data store.

  • Apache Lucene [Search Engine]

    Java library providing powerful indexing and search features.

  • Apache Solr [Search Engine]

    Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.

  • ArangoDB [Graph Database]

    Natively store data for graph, document and search needs.

  • ClickHouse [SQL Database]

    Fast open-source OLAP database management system.

8. Visualization

  • API Status Check [Uptime]

    Free real-time status monitoring for 120+ developer APIs including AWS, Stripe, GitHub, OpenAI, and more. Track third-party API availability with alerts and status pages.

  • BlueWave Uptime [Uptime]

    Open-source, self-hosted monitoring tool built with React.js, Node.js, and MongoDB, designed to track server uptime, response times, and incidents in real-time with beautiful visualizations.

  • Chronograf [Dashboarding]

    User interface and administrative component of the InfluxDB platform.

  • ExplorViz [General & Tools]

    Live trace visualization for large software landscapes.

  • Flame Graph [General & Tools]

    Visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.

  • Flame Scope [General & Tools]

    FlameScope is a visualization tool for exploring different time ranges as Flame Graphs.

12. Application Performance Monitoring Solutions (APM)

  • Apitally

    API monitoring, analytics, and request logging for REST APIs, with lightweight open-source SDKs for Python, Node.js, Go, .NET, and Java.

  • AppDynamics

    Business and application performance monitoring.

  • AppOptics

    Continuous monitoring built to scale with your applications for less downtime and lower resource usage.

  • Aspecto

    Troubleshoot performance bottlenecks and errors within your microservices.

  • Aternity

    Simplified high-definition APM visibility leveraging Real User Monitoring, Synthetic Monitoring, and OpenTelemetry, that is scalable, easy to use and deploy, and unifies insights across end users, applications, networks, and the cloud-native ecosystem.

  • Blue Matador

    Easiest and fastest way to monitor your cloud environments on the market. Just provide your read-only credentials and start getting insights in minutes.

6. Collector

  • Augur [Configuration & Linters]

    Static analysis linter for OpenTelemetry Collector configurations. Detects misconfigurations, hardcoded credentials, and missing critical components (memory limiters, batch processors) before deployment. Built on OPA/Rego with customizable policies and CI/CD integration.

  • Brubeck [Logging]

    Statsd-compatible stats aggregator written in C.

  • GoAccess [Logging]

    Open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators who need a visual server report on the fly.

  • Grafana Mimir [Metrics]

    Mimir is an open source, horizontally scalable, highly available, multi-tenant TSDB for long-term storage for Prometheus.

  • Last9 [Logging]

    Unified Logs Explorer with search, filters, SQL query support, and OpenTelemetry-native ingestion.

  • Logbook [Logging]

    Extensible Java library to enable complete request and response logging for different client- and server-side technologies.

Showing a sample of 288 resources. View the full list on GitHub →