
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

GitHub Stars: 1.1k
Curated Resources: 52
Categories: 23
Last Refreshed: 5 hours ago
| Dataset | Size | Language | Task | Generation |
|---|---|---|---|---|
| Vision-CAIR/MiniGPT-4 | 5K | EN | MT | MIX |
| haotian-liu/LLaVA | 150K | EN | MT | MIX |
| sunrainyg/InstructCV (https://github.com/AlaaLab/InstructCV) | | EN | MT | MIX |
| tatsu-lab/Alpaca | 52K | EN | MT | SI |
| Hello-SimpleAI/HC3 | 24K | EN | MT | MIX |
| Hello-SimpleAI/HC3-Chinese | 13K | CN | MT | MIX |
| allenai/prosocial-dialog | 58K | EN | MT | MIX |
| allenai/natural-instructions | 1.6K | ML | MT | HG |
| bigscience/xP3 | N/A | ML | MT | MIX |
| PhoebusSi/Alpaca-CoT | 500k | ML | MT | COL |
| nomic-ai/gpt4all | 437k | EN | MT | COL |
| google-research/FLAN | N/A | EN | MT | MIX |
| cascip/ChatAlpaca | 10k | EN | MT | MIX |
| orhonovich/unnatural-instructions | 240K | EN | MT | MIX |
| Instruction-Tuning-with-GPT-4/GPT-4-LLM | 52K | EN, CN | MT | SI |
| databrickslabs/dolly | 15K | EN | MT | HG |
| OpenAssistant/oasst1 | 161K | ML | MT | HG |
| zjunlp/Mol-Instructions | 2043K | ML | MT | MIX |
| Anthropic/hh-rlhf | 22k | EN | MT | MIX |
| thu-coai/Safety-Prompts | 100k | CN | MT | MIX |
| HuggingFaceH4/stack-exchange-preferences | 10741k | EN | TS | HG |
| Instruction-Tuning-with-GPT-4/GPT-4-LLM | 52K | EN | MT | MIX |
| Reddit/eli5 | 500k | EN | MT | HG |
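
Each entry above pairs a repository with a size and a few short tags, separated by pipes. For readers who want to filter the list programmatically, here is a minimal Python sketch of parsing such rows. The field names (`repo`, `size`, `languages`, `task`, `generation`) and the helpers `parse_entry` and `filter_by_generation` are hypothetical. The column meanings are inferred from the rows, not stated by the list itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    # Field names are assumptions inferred from the row layout.
    repo: str
    size: str
    languages: list[str]
    task: str
    generation: str


def parse_entry(row: str) -> Entry:
    """Parse one pipe-delimited row such as '(tatsu-lab/Alpaca)|52K|EN|MT|SI'."""
    fields = [f.strip() for f in row.strip().strip("|").split("|")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {row!r}")
    # The last two fields are taken to be the task and generation tags;
    # everything between the size and those tags is treated as language tags.
    languages = [
        lang.strip()
        for field in fields[2:-2]
        for lang in field.split(",")
        if lang.strip()
    ]
    return Entry(
        repo=fields[0].strip("()"),
        size=fields[1],
        languages=languages,
        task=fields[-2],
        generation=fields[-1],
    )


def filter_by_generation(entries: list[Entry], tag: str) -> list[Entry]:
    """Return the entries whose generation tag equals `tag` (for example 'HG')."""
    return [e for e in entries if e.generation == tag]


# Sample rows copied from the list above.
ROWS = [
    "(tatsu-lab/Alpaca)|52K|EN|MT|SI",
    "(databrickslabs/dolly)|15K|EN|MT|HG",
    "(Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|CN|MT|SI",
]

entries = [parse_entry(row) for row in ROWS]
for entry in filter_by_generation(entries, "HG"):
    print(entry.repo, entry.size)
```

Sizes are kept as strings because the list mixes values such as `52K`, `500k`, and `N/A`. Converting them to numbers would need an extra normalization step.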

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me HuggingFaceH4/stack-exchange-preferences resources from awesome-instruction-dataset"

Installation instructions →

What's inside

Showing a sample of 52 resources. View the full list on GitHub →