awesome-synthetic-datasets
github.com/davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
GitHub Stars: 332
Curated Resources: 19
Categories: 2
Last Refreshed: 23 hours ago
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me important papers resources from awesome-synthetic-datasets"
Installation instructions →
What's inside
Important techniques
- Best Practices and Lessons Learned on Synthetic Data for Language Models (Important Papers)
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment (Important Papers)
- Generating custom sentence similarity datasets
- Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science (Important Papers)
- Improving Text Embeddings with Large Language Models (Important Papers)
- Instruction Pre-Training: Language Models are Supervised Multitask Learners (Important Papers)
Tutorials, guides and educational blog posts
Showing a sample of 19 resources. View the full list on GitHub →