
A curated list of resources on reinforcement learning with verifiable rewards (RLVR), continually updated

190 GitHub Stars | 271 Curated Resources | 4 Categories | Last Refreshed 46 min ago
Surveys & Tutorials | Codebases | Papers | Other Awesome Lists

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me 2026 resources from awesome-rlvr"

Installation instructions →

What's inside

Codebases

  • AReaL

    Ant Reasoning Reinforcement Learning for LLMs

  • Nemo-Aligner

    Scalable toolkit for efficient model alignment

  • open-r1

    Fully open reproduction of the DeepSeek-R1 pipeline (SFT, distillation, GRPO, evaluation)

  • Open-Reasoner-Zero

    An open-source implementation of large-scale, reasoning-oriented RL training, focusing on scalability, simplicity, and accessibility

  • OpenRLHF

    An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

  • PRIME

    PRIME (Process Reinforcement through IMplicit REwards), an open-source solution for online RL with process rewards

Showing a sample of 271 resources. View the full list on GitHub →