awesome-foundation-model-leaderboards

A curated list of awesome leaderboard-oriented resources for the AI domain

GitHub Stars: 375
Curated Resources: 558

3D Arena (3D)
3D Arena hosts a 3D generation arena, where various 3D generative models compete based on their performance in generating 3D models.
3DGen Arena (3D)
3DGen Arena hosts a 3D generation arena, where various 3D generative models compete based on their performance in generating 3D models.
3D-POPE (3D)
3D-POPE is a benchmark to evaluate object hallucination in 3D generative models.
Abel (Math)
Abel is a platform to evaluate the mathematical capabilities of LLMs.
Abstract Image
Abstract Image is a benchmark to evaluate multimodal LLMs (MLLMs) in understanding and visually reasoning about abstract images, such as maps, charts, and layouts.
AesBench
AesBench is a benchmark to evaluate MLLMs on image aesthetics perception.

ACLUE (Text)
ACLUE is an evaluation benchmark for ancient Chinese language comprehension.
African Languages LLM Eval Leaderboard (Text)
African Languages LLM Eval Leaderboard tracks progress and ranks performance of LLMs on African languages.
AGIEval (Text)
AGIEval is a human-centric benchmark to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving.
AI Benchmarking Hub (Comprehensive)
AI Benchmarking Hub tracks and compares AI model performance in reasoning, coding, and knowledge tasks.
Aider LLM Leaderboards (Code)
Aider LLM Leaderboards evaluate LLMs' ability to follow system prompts to edit code.
AI Energy Score Leaderboard (Text)
AI Energy Score Leaderboard tracks and compares different models in energy efficiency.

ai-benchmarks
ai-benchmarks contains a handful of evaluation results for the response latency of popular AI services.
Artificial Analysis
Artificial Analysis is a platform to help users make informed decisions on AI model selection and hosting providers.

AIcrowd
AIcrowd hosts machine learning challenges and competitions across domains such as computer vision, NLP, and reinforcement learning, aimed at both researchers and practitioners.
AI Hub
AI Hub offers a variety of competitions to encourage AI solutions to real-world problems, with a focus on innovation and collaboration.
AI Studio
AI Studio offers AI competitions mainly for computer vision, NLP, and other data-driven tasks, allowing users to develop and showcase their AI skills.
Allen Institute for AI
The Allen Institute for AI provides leaderboards and benchmarks on tasks in natural language understanding, commonsense reasoning, and other areas in AI research.
Codabench
Codabench is an open-source platform for benchmarking AI models, enabling customizable, user-driven challenges across various AI domains.
DataFountain
DataFountain is a Chinese AI competition platform featuring challenges in finance, healthcare, and smart cities, encouraging solutions for industry-related problems.

AlignScore
AlignScore evaluates the performance of different metrics in assessing factual consistency.

ATOM Relative Adoption Metric
ATOM Relative Adoption Metric tracks and evaluates the relative adoption and usage of AI models.

DataComp
DataComp is a benchmark to evaluate the performance of various datasets with a fixed model architecture.

This is a sample of the 558 resources; the full list is on GitHub.