awesome-instruction-dataset
github.com/yaodongc/awesome-instruction-dataset
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
GitHub Stars: 1.1k
Curated Resources: 52
Categories: 23
Last Refreshed: 5 hours ago
- (Vision-CAIR/MiniGPT-4)|5K|EN|MT|MIX
- (haotian-liu/LLaVA)|150K|EN|MT|MIX
- (sunrainyg/InstructCV)|EN|MT|MIX (https://github.com/AlaaLab/InstructCV)
- (tatsu-lab/Alpaca)|52K|EN|MT|SI
- (Hello-SimpleAI/HC3)|24K|EN|MT|MIX
- (Hello-SimpleAI/HC3-Chinese)|13K|CN|MT|MIX
- (allenai/prosocial-dialog)|58K|EN|MT|MIX
- (allenai/natural-instructions)|1.6K|ML|MT|HG
- (bigscience/xP3)|N/A|ML|MT|MIX
- (PhoebusSi/Alpaca-CoT)|500k|ML|MT|COL
- (nomic-ai/gpt4all)|437k|EN|MT|COL
- (google-research/FLAN)|N/A|EN|MT|MIX
- (cascip/ChatAlpaca)|10k|EN|MT|MIX
- (orhonovich/unnatural-instructions)|240K|EN|MT|MIX
- (Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|CN|MT|SI
- (databrickslabs/dolly)|15K|EN|MT|HG
- (OpenAssistant/oasst1)|161K|ML|MT|HG
- (zjunlp/Mol-Instructions)|2043K|ML|MT|MIX
- (Anthropic/hh-rlhf)|22k|EN|MT|MIX
- (thu-coai/Safety-Prompts)|100k|CN|MT|MIX
- (HuggingFaceH4/stack-exchange-preferences)|10741k|EN|TS|HG
- (Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|MT|MIX
- (Reddit/eli5)|500k|EN|MT|HG
What's inside
(HuggingFaceH4/stack-exchange-preferences)|10741k|EN|TS|HG
Resources
- (allenai/natural-instructions)|1.6K|ML|MT|HG
- (allenai/prosocial-dialog)|58K|EN|MT|MIX
- (Anthropic/hh-rlhf)|22k|EN|MT|MIX
- (bigscience/xP3)|N/A|ML|MT|MIX
- (cascip/ChatAlpaca)|10k|EN|MT|MIX
- (databrickslabs/dolly)|15K|EN|MT|HG
(tatsu-lab/Alpaca)|52K|EN|MT|SI
(bigscience/xP3)|N/A|ML|MT|MIX
(Reddit/eli5)|500k|EN|MT|HG
(databrickslabs/dolly)|15K|EN|MT|HG
(PhoebusSi/Alpaca-CoT)|500k|ML|MT|COL
(nomic-ai/gpt4all)|437k|EN|MT|COL
Showing a sample of 52 resources. View the full list on GitHub.