awesome-rlhf
github.com/louieworth/awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
GitHub Stars: 91
Curated Resources: 104
Categories: 3
Last Refreshed: 3 hours ago
Papers | Blogs/Talks/Reports | Open Source Software/Implementations
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me rlhf for llms: theory / methods resources from awesome-rlhf"
Installation instructions →
What's inside
Papers
- Adversarial Preference Optimization [RLHF for LLMs: Theory / Methods]
- A General Theoretical Paradigm to Understand Learning from Human Preferences [RLHF for LLMs: Theory / Methods]
- AI Alignment: A Comprehensive Survey [Review/Survey]
- Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction [RLHF for LLMs: Theory / Methods]
- Aligning Language Models with Offline Reinforcement Learning from Human Feedback [RLHF for LLMs: Theory / Methods]
Blogs/Talks/Reports
- GPT-4 Technical Report [Reports]
- Illustrating Reinforcement Learning from Human Feedback (RLHF) [Blogs]
- Llama 2: Open Foundation and Fine-Tuned Chat Models [Reports]
- Reinforcement Learning for Language Models [Blogs]
- Reinforcement Learning from Human Feedback: From Zero to chatGPT [Talks]
- Reinforcement Learning from Human Feedback: Progress and Challenges [Talks]
Resources
Showing a sample of 104 resources. View the full list on GitHub →