
A curated list of resources on Document Understanding (DU)

1.5k GitHub stars | 95 curated resources | 6 categories | last refreshed 5 hours ago
Papers | Resources | Conferences, workshops | Blogs | Solutions | Document Question Answering

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me the resources under Conferences, workshops from awesome-document-understanding"


What's inside

Conferences, workshops

Resources

  • borb

    is a pure Python library for reading, writing and manipulating PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.).

  • Born digital pdf scanner

    checks whether a PDF is born-digital

  • Color Document Dataset

    from the Intelligent Sensory Information Systems group, University of Amsterdam

  • deepdoctection

  • Layout Parser

    Layout Parser is a deep-learning-based tool for document image layout analysis tasks

  • OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files so that they can be searched and their text can be copied
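
As a rough illustration of the OCRmyPDF entry above, the sketch below assembles an `ocrmypdf` command line and runs it only if the executable is found on the PATH. The helper names `build_ocrmypdf_command` and `add_text_layer` and the file names are hypothetical, chosen for this example; the `-l` (language) and `--skip-text` options are taken from the OCRmyPDF command-line interface. It is a minimal sketch under those assumptions, not a recommended integration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(src, dst, language="eng", skip_text=True):
    """Return the argument list for one OCRmyPDF run (hypothetical helper)."""
    cmd = ["ocrmypdf", "-l", language]
    if skip_text:
        # --skip-text skips OCR on pages that already contain text,
        # which is useful when born-digital and scanned pages are mixed.
        cmd.append("--skip-text")
    cmd += [str(Path(src)), str(Path(dst))]
    return cmd


def add_text_layer(src, dst, **options):
    """Run OCRmyPDF if it is installed; return True only on success."""
    if shutil.which("ocrmypdf") is None:
        return False  # executable not on PATH, so nothing is run
    result = subprocess.run(build_ocrmypdf_command(src, dst, **options))
    return result.returncode == 0


if __name__ == "__main__":
    # Illustrative file names only.
    print(build_ocrmypdf_command("scan.pdf", "scan_searchable.pdf"))
```

Keeping command construction separate from execution means the arguments can be inspected or logged without OCRmyPDF being installed.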

Showing a sample of the 95 resources. The full list is on GitHub.