Context Awesome

awesome-llm-synthetic-data

github.com/wasiahmad/awesome-llm-synthetic-data ↗

A reading list on LLM-based Synthetic Data Generation 🔥

GitHub Stars: 1.5k

Curated Resources: 85

Categories: 6

Last Refreshed: 4 hours ago

1. Surveys
2. Methods
3. Application Areas
4. Datasets
5. Tools
6. Blogs

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me 3.1. mathematical reasoning resources from awesome-llm-synthetic-data"

Installation instructions →

What's inside

5. Tools

1. Surveys

3. Application Areas

Augmenting Math Word Problems via Iterative Question Composing (3.1. Mathematical Reasoning)
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct (3.2. Code Generation)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (3.2. Code Generation)
Constitutional AI: Harmlessness from AI Feedback (3.4. Alignment)
Distilling LLMs' Decomposition Abilities into Compact Language Models (3.1. Mathematical Reasoning)
DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination (3.2. Code Generation)

2. Methods

Automatic Instruction Evolving for Large Language Models (2.1. Techniques)
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society (2.1. Techniques)
CodecLM: Aligning Language Models with Tailored Synthetic Data (2.2. Instruction Generation with High Quality/Complexity)
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding (2.1. Techniques)
Instruction Pre-Training: Language Models are Supervised Multitask Learners (2.1. Techniques)
Large Language Models Can Self-Improve (2.1. Techniques)

4. Datasets

6. Blogs

Showing a sample of 85 resources. View the full list on GitHub →