
Awesome Reasoning LLM Tutorial/Survey/Guide

2.4k GitHub Stars · 187 Curated Resources · 14 Categories
Categories:

  • 🔍 Survey
  • 🤖 LLMs-in-RL
  • 🏆 Reward Learning (Process Reward Models)
  • Policy Optimization
  • MCTS/Tree Search
  • Explainability
  • Multimodal Agent related Slow-Fast System
  • Benchmark and Datasets
  • Reasoning and Safety
  • 🚀 RL & LLM Fine-Tuning Repositories
  • ⚡ Applications & Benchmarks
  • 📚 Tutorials & Courses
  • 🛠️ Libraries & Implementations
  • 🔗 Other Resources


What's inside

🚀 RL & LLM Fine-Tuning Repositories

  • 1

    Offers code for fine-tuning large vision-language models as decision-making agents via RL. Includes implementations for training models with task-specific rewards and evaluating them in various environments.

  • 10

    A high-throughput, distributed architecture for integrating LLMs into interactive environments. It is not specialized for RL or RLHF out of the box, but it supports custom implementations, which makes it a good fit for users who need maximum flexibility.

  • 11

    Implements the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning". Focuses on improving LLM reasoning with a reverse-curriculum RL approach.

  • 12

    A flexible, efficient, and production-ready RL training library for large language models (LLMs). It is the open-source implementation of the HybridFlow framework and supports RL algorithms such as PPO and GRPO, efficient resource utilization, and scaling to 70B-parameter models across hundreds of GPUs. It integrates with Hugging Face models and supports supervised fine-tuning as well as RLHF with multiple reward types.

  • 13

    A distributed training framework for fine-tuning large language models (LLMs) with reinforcement learning. It supports both Accelerate and NVIDIA NeMo backends, allowing training of models with 20B parameters and beyond. It implements PPO and ILQL and integrates with CHEESE for human-in-the-loop data collection.

  • 14

    A framework for instruction tuning of LLMs with RLHF, supporting 26 languages. It provides multilingual resources such as ChatGPT prompts, instruction datasets, and response-ranking data, along with BLOOM-based and LLaMA-based models and evaluation benchmarks.
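
Several repositories above list PPO and GRPO among their supported algorithms. As a rough illustration of how the two relate, the sketch below computes GRPO-style group-relative advantages (each sampled response's reward normalized by the mean and standard deviation of the rewards in its group) and feeds them into a PPO-style clipped surrogate objective. It is a minimal, framework-free sketch under simplifying assumptions: one log-probability per whole response instead of per token, no KL penalty term, and the population standard deviation (some implementations use the sample standard deviation). The function names are hypothetical and are not taken from any of the listed libraries.

```python
import math
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize rewards within one group of
    responses sampled for the same prompt (assumption: population std)."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped objective (to be maximized), averaged over the group.

    Simplification: one log-probability per response, no KL penalty.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Four responses to one prompt, rewarded 1 (correct) or 0 (incorrect).
    adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print([round(a, 3) for a in adv])  # → [1.0, -1.0, 1.0, -1.0]
    # With an unchanged policy every ratio is 1, so the objective is the mean advantage.
    print(round(clipped_surrogate([0.0] * 4, [0.0] * 4, adv), 6))
```

Normalizing within the group removes the need for a learned value baseline, which is the main design difference from standard PPO; the clipping step is what keeps each update close to the policy that generated the samples.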


This page shows a sample of the 187 resources; the full list is in the awesome-llm-post-training repository on GitHub.