awesome-datasets
github.com/kili-technology/awesome-datasets
A comprehensive list of annotated training datasets, classified by use case.
39 GitHub stars, 80 curated resources, 8 categories
Categories: Speech Recognition, Document Classification, Key Information Extraction, Optical Character Recognition, Document Layout Analysis, Document Question Answering, Instance Segmentation, Named-Entity Recognition
What's inside
Instance Segmentation
- A Dataset for Detecting Flying Airplanes on Satellite Images (Defense)
- Airbus Aircraft Detection (Defense)
- BRATS2016 (Medical)
- Casting product image data for quality inspection (Manufacturing)
- CheXpert (Medical)
- CT Medical Images (Medical)
Document Question Answering
- AmbigQA (English)
- chatterbot/english (English)
- Coached Conversational Preference Elicitation (English)
- ConvAI2 dataset (English)
- Customer Support on Twitter (English)
- EXCITEMENTS datasets (Multilingual)
Named-Entity Recognition
- AnEM (English)
- BBN (English)
- BTC (English)
- CADEC (English)
- CCCS-CIC-AndMal-2020 (English)
- CoNLL 2003 (English)
Key Information Extraction
- CORD (English)
- GHEGA (Multilingual)
- NIST (English)
- The Kleister Charity dataset (English)
- The Kleister NDA dataset (English)
- XFUND (Multilingual)
Speech Recognition
- CREMA-D (English)
- M-AILABS Speech Dataset (English)
Document Layout Analysis
- DocBank (English)
- Layout Analysis Dataset (English)
- PubLayNet (English)
- TableBank (English)
Optical Character Recognition
- FUNSD (English)
- RDCL2019 (English)
- SROIE (English)
- Synth90k (English)
- Total Text Dataset (English)
Document Classification
- RVL-CDIP Dataset (English)
- Top Streamers on Twitch (English)
This is a sample of the 80 resources. The full list is on GitHub.