Links to awesome OCR projects

3.1k GitHub Stars
200 Curated Resources
3 Categories
Last Refreshed 1 hour ago
Software · Datasets · Literature

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me blog posts and tutorials resources from awesome-ocr"

What's inside

Software

Datasets

  • archiscribe-corpus (Ground Truth)

    >4,200 lines transcribed from 19th Century German prints via

  • CIS OCR Test Set (Ground Truth)

    2 example documents each in German/Latin/Greek with ground truth for

  • CLTK (Ground Truth)

    Corpora from

  • DIVA-HisDB (Ground Truth)

    150 pages

  • EarlyPrintedBooks (Ground Truth)

    ~8,800 lines from several early printed books

  • ECCO-TCP (Ground Truth)

    2,188 ECCO documents transcribed by

Showing a sample of 200 resources. View the full list on GitHub →