awesome-foundation-model-leaderboards
github.com/sailresearch/awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for the AI domain
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me 3d resources from awesome-foundation-model-leaderboards"
Installation instructions →
What's inside
Image
- 3D Arena (3D)
3D Arena hosts a 3D generation arena, where various 3D generative models compete based on their performance in generating 3D models.
- 3DGen Arena (3D)
3DGen Arena hosts the 3D generation arena, where various 3D generative models compete based on their performance in generating 3D models.
- 3D-POPE (3D)
3D-POPE is a benchmark to evaluate object hallucination in 3D generative models.
- Abel (Math)
Abel is a platform to evaluate the mathematical capabilities of LLMs.
- Abstract Image
Abstract Image is a benchmark to evaluate multimodal LLMs (MLLMs) on understanding and visual reasoning over abstract images, such as maps, charts, and layouts.
- AesBench
AesBench is a benchmark to evaluate MLLMs on image aesthetics perception.
Model Ranking
- ACLUE (Text)
ACLUE is an evaluation benchmark for ancient Chinese language comprehension.
- African Languages LLM Eval Leaderboard (Text)
African Languages LLM Eval Leaderboard tracks progress and ranks performance of LLMs on African languages.
- AGIEval (Text)
AGIEval is a human-centric benchmark to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving.
- AI Benchmarking Hub (Comprehensive)
AI Benchmarking Hub tracks and compares AI model performance in reasoning, coding, and knowledge tasks.
- ai-benchmarks (Text)
ai-benchmarks contains a handful of evaluation results for the response latency of popular AI services.
- Aider LLM Leaderboards (Code)
Aider LLM Leaderboards evaluate LLMs' ability to follow system prompts to edit code.
Resources
- AIcrowd
AIcrowd hosts machine learning challenges and competitions across domains such as computer vision, NLP, and reinforcement learning, aimed at both researchers and practitioners.
- AI Hub
AI Hub offers a variety of competitions to encourage AI solutions to real-world problems, with a focus on innovation and collaboration.
- AI Studio
AI Studio offers AI competitions mainly for computer vision, NLP, and other data-driven tasks, allowing users to develop and showcase their AI skills.
- Allen Institute for AI
The Allen Institute for AI provides leaderboards and benchmarks on tasks in natural language understanding, commonsense reasoning, and other areas in AI research.
- Codabench
Codabench is an open-source platform for benchmarking AI models, enabling customizable, user-driven challenges across various AI domains.
- DataFountain
DataFountain is a Chinese AI competition platform featuring challenges in finance, healthcare, and smart cities, encouraging solutions for industry-related problems.
Metric Ranking
- AlignScore
AlignScore evaluates the performance of different metrics in assessing factual consistency.
Usage Ranking
- ATOM Relative Adoption Metric
ATOM Relative Adoption Metric tracks and evaluates the relative adoption and usage of AI models.
Dataset Ranking
- DataComp
DataComp is a benchmark to evaluate the performance of various datasets with a fixed model architecture.
Showing a sample of 547 resources. View the full list on GitHub →