Context Awesome

awesome-llm-judges

github.com/haizelabs/awesome-llm-judges

⚖️ Awesome LLM Judges ⚖️

GitHub Stars: 202
Curated Resources: 38
Categories: 6
Last Refreshed: 42 min ago

🌱 Starter | 🎭 Multi-Judge | 🎯 Finetuned Models | 🛡️ Safety | 👨‍⚖️ Judging the Judges: Meta-Evaluation | ✨ Contributing

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me 🛑 content moderation resources from awesome-llm-judges"

Installation instructions →

What's inside

🛡️ Safety

A STRONGREJECT for Empty Jailbreaks (Sections C.4 & C.5) [🛑 Content Moderation]
Debate Helps Supervise Unreliable Experts [🔍 Scalable Oversight]
Great Models Think Alike and this Undermines AI Oversight [🔍 Scalable Oversight]
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations [🛑 Content Moderation]
LLM Critics Help Catch LLM Bugs [🔍 Scalable Oversight]
On Scalable Oversight with Weak LLMs Judging Strong LLMs [🔍 Scalable Oversight]

🌱 Starter

🎭 Multi-Judge

🎯 Finetuned Models

Critique-out-Loud Reward Models [🏆 Generative Reward Models]
Generative Verifiers: Reward Modeling as Next-Token Prediction [🏆 Generative Reward Models]
HALU-J: Critique-Based Hallucination Judge [🌀 Hallucination]
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lynx: An Open Source Hallucination Evaluation Model [🌀 Hallucination]
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [🌀 Hallucination]

👨‍⚖️ Judging the Judges: Meta-Evaluation

✨ Contributing

leonard@haizelabs.com

Showing a sample of the 38 resources. View the full list on GitHub: github.com/haizelabs/awesome-llm-judges