awesome-ocr

Links to awesome OCR projects

3.1k GitHub stars, 200 curated resources

abby2hocr.xslt XSLT script (OCR file formats)
ABBYY Cloud OCR SDK Code samples (OCR as a Service)
Code samples for using the proprietary commercial ABBYY OCR API.
AbbyyToAlto (OCR file formats)
PHP script converting from Abbyy 6 to ALTO XML
alto-tools (OCR file formats)
Various Python tools for working with ALTO files
ALTO XML Documentation (OCR file formats)
Documentation and use cases for ALTO
ALTO XML Schema (OCR file formats)
XML Schema and development of the ALTO XML format

archiscribe-corpus (Ground Truth)
>4,200 lines transcribed from 19th Century German prints via
CIS OCR Test Set (Ground Truth)
2 example documents each in German/Latin/Greek with ground truth for
CLTK (Ground Truth)
Corpora from
DIVA-HisDB (Ground Truth)
150 pages
EarlyPrintedBooks (Ground Truth)
~8,800 lines from several early printed books
ECCO-TCP (Ground Truth)
2,188 ECCO documents transcribed by
