awesome-llms-datasets
github.com/lmmlzn/awesome-llms-datasets
Summarizes existing representative text datasets for LLMs.
GitHub Stars: 1.5k
Curated Resources: 542
Categories: 8
Last Refreshed: 5 hours ago
- Changelog
- Pre-training Corpora
- Instruction Fine-tuning Datasets
- Preference Datasets
- Evaluation Datasets
- Traditional NLP Datasets
- Multi-modal Large Language Models (MLLMs) Datasets
- Retrieval Augmented Generation (RAG) Datasets
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me evaluation platform resources from awesome-llms-datasets"
Installation instructions →
What's inside
Evaluation Datasets
- CLUE Benchmark Series (Evaluation Platform)
- C-MTEB Leaderboard (Evaluation Platform)
- Dataset (Subject)
- Dataset (Multilingual)
- Github (Medical)
- Github (Medical)
Instruction Fine-tuning Datasets
Pre-training Corpora
Traditional NLP Datasets
Preference Datasets
Showing a sample of 542 resources. View the full list on GitHub →