awesome-rlvr
github.com/opendilab/awesome-rlvr
A curated list of reinforcement learning with verifiable rewards (continually updated)
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me 2026 resources from awesome-rlvr"
Installation instructions →
What's inside
Papers
- 1229095296/ResRL (2026)
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data (2025)
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning (2025)
- Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation (2026)
- Agentic Reinforced Policy Optimization (2025)
- Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving (2025)
Surveys & Tutorials
- A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
- Agentic Large Language Models, A Survey
- An Illusion of Progress? Assessing the Current State of Web Agents
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
- A Visual Guide to Reasoning LLMs
- Can LLMs Reason & Plan?
Codebases
- AReaL
Ant Reasoning Reinforcement Learning for LLMs
- Nemo-Aligner
Scalable toolkit for efficient model alignment
- open-r1
Fully open reproduction of the DeepSeek-R1 pipeline (SFT, distillation, GRPO, evaluation)
- Open-Reasoner-Zero
An open-source implementation of large-scale reasoning-oriented RL training focusing on scalability, simplicity, and accessibility
- OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
- PRIME
PRIME (Process Reinforcement through IMplicit REwards), an open-source solution for online RL with process rewards
Showing a sample of 271 resources. View the full list on GitHub →