
A summary of existing representative LLM text datasets.

1.5k GitHub Stars
542 Curated Resources
8 Categories
Last refreshed 5 hours ago
Changelog | Pre-training Corpora | Instruction Fine-tuning Datasets | Preference Datasets | Evaluation Datasets | Traditional NLP Datasets | Multi-modal Large Language Models (MLLMs) Datasets | Retrieval Augmented Generation (RAG) Datasets

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me evaluation platform resources from awesome-llms-datasets"


What's inside

Evaluation Datasets

Instruction Fine-tuning Datasets

  • Dataset: General Instruction Fine-tuning Datasets (6 entries)

Pre-training Corpora

  • Dataset: General Pre-training Corpora (6 entries)

Traditional NLP Datasets

Retrieval Augmented Generation (RAG) Datasets

Preference Datasets

  • Dataset: Preference Evaluation Methods (3 entries)

  • GitHub: Preference Evaluation Methods (3 entries)

Multi-modal Large Language Models (MLLMs) Datasets

  • Paper: Instruction Fine-tuning Datasets (2 entries)

  • Paper: Evaluation Datasets (3 entries)

Showing a sample of 542 resources. View the full list on GitHub →