awesome-ocr
github.com/kba/awesome-ocr
Links to awesome OCR projects
What's inside
Literature
- 10 Tips for making your OCR project succeed (Blog Posts and Tutorials)
- abbyy-finereader-ocr-senate (OCR Showcases)
Using OCR to parse scanned Senate Financial Disclosure forms.
- Adaptive degraded document image binarization (Academic articles)
- A gentle introduction to OCR (Blog Posts and Tutorials)
- A Segmentation-Free Approach for Printed Devanagari Script Recognition (Academic articles)
- A Sequence Learning Approach for Multiple Script Identification (Academic articles)
Software
- abby2hocr.xslt XSLT script (OCR file formats)
- ABBYY Cloud OCR SDK Code samples (OCR as a Service)
Code samples for using the proprietary commercial ABBYY OCR API.
- AbbyyToAlto (OCR file formats)
PHP script converting from Abbyy 6 to ALTO XML
- alto-tools (OCR file formats)
Various tools for working with ALTO files, written in Python
- ALTO XML Documentation (OCR file formats)
Documentation and use cases for ALTO
- ALTO XML Schema (OCR file formats)
XML Schema and development of the ALTO XML format
Datasets
- archiscribe-corpus (Ground Truth)
>4,200 lines transcribed from 19th Century German prints via
- CIS OCR Test Set (Ground Truth)
2 example documents each in German/Latin/Greek with ground truth for
- CLTK (Ground Truth)
Corpora from
- DIVA-HisDB (Ground Truth)
150 pages
- EarlyPrintedBooks (Ground Truth)
~8,800 lines from several early printed books
- ECCO-TCP (Ground Truth)
2,188 ECCO documents transcribed by
Showing a sample of 200 resources; the full list is on GitHub.