
🤖 A curated list of resources for testing AI agents: frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems.

38 GitHub stars · 155 curated resources · 11 categories
  • Foundations
  • AI Agent Categories
  • Testing Frameworks
  • Chaos Engineering and Fault Injection
  • Benchmarks and Evaluation
  • Simulation Environments
  • Safety and Security Testing
  • Performance Testing
  • Practical Resources
  • Industry Applications
  • Observability and Monitoring

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me video and course resources from awesome-ai-agent-testing"


What's inside

Testing Frameworks

  • Agent-Testing-Library (Language-Specific Tools)

    Testing utilities for JavaScript agents.

  • AgentTestKit (Language-Specific Tools)

    Comprehensive testing toolkit for Java agents.

  • AgentVerse (Open Source Frameworks)

    Framework for building and testing multi-agent systems.

  • API-Bank (Category-Specific Testing Tools)

    Benchmark for evaluating tool-augmented LLMs.

  • Athina AI (Commercial Solutions)

    Specialized platform for LLM and agent evaluation.

  • AutoGen (Open Source Frameworks)

    Microsoft's framework for building conversational agents with comprehensive testing tools.

Simulation Environments

  • AI2-THOR (Virtual Worlds)

    Interactive 3D indoor environments for embodied agents.

  • CARLA (Virtual Worlds)

    Simulator for autonomous driving research.

  • Dota 2 Bot API (Game-Based Environments)

    Complex multi-agent environment.

  • Habitat (Virtual Worlds)

    Platform for embodied AI research.

  • Meta-World (Dynamic Testing Environments)

    Benchmark for multi-task and meta reinforcement learning.

  • MineDojo (Virtual Worlds)

    Minecraft-based agent environment.

Performance Testing

  • Apache JMeter (Load Testing)

    Open-source load and performance testing tool.

  • Jaeger (Latency Analysis)

    Distributed tracing system.

  • K6 (Load Testing)

    Load testing tool with tests scripted in JavaScript.

  • Kubernetes (Scalability Testing)

    Container orchestration platform.

  • Locust (Load Testing)

    Scalable, Python-based load testing framework.

  • OpenTelemetry (Latency Analysis)

    Vendor-neutral observability framework for traces, metrics, and logs.

Observability and Monitoring

  • Arize AI (Production Monitoring Platforms)

    ML observability platform with LLM support.

  • Galileo (Production Monitoring Platforms)

    LLM observability and evaluation.

  • LangFuse (Production Monitoring Platforms)

    Open-source LLM observability platform.

  • OpenTelemetry GenAI Convention (Logging Standards)

    Emerging semantic conventions for instrumenting and observing generative AI systems.

  • WhyLabs (Production Monitoring Platforms)

    AI observability platform.

Showing a sample of the 155 resources; the full list is available on GitHub.