A comprehensive list of annotated training datasets classified by use case.

GitHub Stars: 39
Curated Resources: 80
Categories: 8
Last Refreshed: 20 hours ago
Speech Recognition, Document Classification, Key Information Extraction, Optical Character Recognition, Document Layout Analysis, Document Question Answering, Instance Segmentation, Named-Entity Recognition

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me defense resources from awesome-datasets"

Installation instructions →

What's inside

Named-Entity Recognition

Key Information Extraction

Speech Recognition

Document Layout Analysis

Optical Character Recognition

Document Classification

Showing a sample of 80 resources. View the full list on GitHub →