
A curated list of papers about key information extraction.

108 GitHub stars, 61 curated resources, 5 categories. Last refreshed 3 hours ago.

Categories: Datasets, Survey, Toolkits, Models, Related Repositories

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me transformer-based resources from awesome-key-information-extraction"


What's inside

Toolkits

  • 2020

    PP-OCR: A Practical Ultra Lightweight OCR System

  • 2021

    MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

  • 2022

    DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

  • 2024

    ANLS* -- A Universal Document Processing Metric for Generative Large Language Models

Survey

  • 2021

    Document AI: Benchmarks, Models and Applications

  • 2023

    On the Hidden Mystery of OCR in Large Multimodal Models

Models

Datasets

  • CORD

    CORD: A Consolidated Receipt Dataset for Post-OCR Parsing

  • DUE

    DUE: End-to-End Document Understanding Benchmark

  • EATEN

    EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

  • EPHOIE

    Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

  • FUNSD

    FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

  • POIE

    Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

Showing a sample of the 61 resources. The full list is on GitHub.