A collection of prompt and instruction datasets (awesome-prompt-datasets, awesome-instruction-dataset) for training chat LLMs such as ChatGPT. It gathers a wide variety of instruction datasets for training ChatLLM models.

GitHub Stars: 733
Curated Resources: 99
Categories: 36
Last Refreshed: 6 hours ago
  • Statistics
  • Alpaca -Stanford
  • Instruction in the Wild
  • Stanford Human Preferences Dataset (SHP)
  • Hello-SimpleAI/HC3
  • Hello-SimpleAI/HC3-Chinese
  • allenai/prosocial-dialog
  • allenai/natural-instructions
  • PhoebusSi/Alpaca-CoT
  • nomic-ai/gpt4all
  • bigscience/xP3
  • cascip/ChatAlpaca
  • orhonovich/unnatural-instructions
  • Instruction-Tuning-with-GPT-4/GPT-4-LLM
  • databrickslabs/dolly
  • OpenAssistant/oasst1
  • BELLE/data/1.5M
  • alpaca_chinese_dataset
  • Med-ChatGLM/data
  • pCLUE
  • COIG
  • Anthropic/hh-rlhf
  • HuggingFaceH4/stack-exchange-preferences
  • Natural Instruction / Super-Natural Instruction
  • BigScience/P3
  • xMTF - BigScience
  • HH-RLHF - Anthropic
  • Unnatural Instruction
  • Self-Instruct
  • UnifiedSKG - HKU
  • Google/Flan Collection
  • InstructDial
  • Open Instruction Generalist (OIG).
  • OpenAI WebGPT.
  • OpenAI Summarization.
  • Reference

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me huggingfaceh4/stack-exchange-preferences resources from awesome-instruction-datasets"

Installation instructions →

What's inside

HuggingFaceH4/stack-exchange-preferences

Statistics

  • akoksal/LongForm

    akoksal/LongForm

  • AlpacaDataCleaned

    yahma/alpaca-cleaned

  • Auto CoT

    kojima-takeshi188/zero_shot_cot/dataset | kojima-takeshi188/zero_shot_cot/log

  • baize

    alpaca_chat_data.json | medical_chat_data.json | quora_chat_data.json | stackoverflow_chat_data.json

  • belle_cn

    BelleGroup/train_1M_CN | BelleGroup/train_0.5M_CN

  • camel

    camel-ai/code | camel-ai/biology | camel-ai/physics | camel-ai/chemistry | camel-ai/math

Stanford Human Preferences Dataset (SHP)

Natural Instruction / Super-Natural Instruction

Instruction in the Wild

xMTF - BigScience

Showing a sample of 99 resources. View the full list on GitHub →