
A curated list of resources dedicated to table recognition

404 GitHub stars, 16 curated resources, 2 categories


What's inside

2. Datasets

  • FinTabNet (2.1 Introduction)

    Language: English. This dataset contains complex tables from the annual reports of S&P 500 companies, with detailed table structure annotations for training and evaluating table structure recognition models.

  • PubTables-1M (2.1 Introduction)

    Language: English. A large, detailed, high-quality dataset for training and evaluating a wide variety of models on the tasks of table detection, table structure recognition, and functional analysis.

  • PubTabNet (2.1 Introduction)

    Language: English. PubTabNet is a large dataset for image-based table recognition, containing 568k+ images of tabular data annotated with the corresponding HTML representation of each table. It provides ground truth for cell topology, cell content, and the locations of non-blank cells.

  • SciTSR (2.1 Introduction)

    Language: English. SciTSR is a large-scale table structure recognition dataset containing 15,000 tables in PDF format, with structure labels obtained from their LaTeX source files. It provides ground truth for cell topology and cell content.

  • SynthTabNet (2.1 Introduction)

    Language: English. SynthTabNet is a synthetically generated dataset of annotated images of data in tabular layouts. It contains 600k images, divided into train, test, and val splits (80%, 10%, 10%), and provides ground truth for cell topology, cell content, and the locations of all cells.

  • TableBank (2.1 Introduction)

    Language: English. TableBank is an image-based table detection and recognition dataset built with weak supervision from Word and LaTeX documents on the internet; it contains 417K high-quality labeled tables. It provides ground truth for cell topology only.
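PubTabNet pairs a sequence of HTML structure tokens (the cell topology) with a per-cell list of content tokens and, for non-blank cells, a bounding box. The following minimal sketch rebuilds the HTML for one record, assuming the JSONL field layout as we understand it (`html.structure.tokens`, `html.cells[*].tokens`, and an optional `html.cells[*].bbox`). The function names `pubtabnet_to_html` and `nonblank_boxes` and the toy record are hypothetical, so check the field names against the release you download.

```python
import html
from typing import Any

# Hypothetical helpers. The field layout is an assumption about PubTabNet's
# JSONL records, not a documented API; verify it against the actual data.


def pubtabnet_to_html(annotation: dict[str, Any]) -> str:
    """Rebuild an HTML table from structure tokens plus per-cell content."""
    structure: list[str] = annotation["html"]["structure"]["tokens"]
    cells = iter(annotation["html"]["cells"])
    parts: list[str] = []
    for token in structure:
        if token == "</td>":
            # Each "</td>" closes exactly one cell, so cells are consumed in
            # reading order. Assumes one cell entry per "</td>" token.
            cell = next(cells)
            # Single-character tokens are text and get escaped; longer tokens
            # are treated as inline markup such as "<b>".
            parts.append("".join(
                html.escape(tok) if len(tok) == 1 else tok for tok in cell["tokens"]
            ))
        parts.append(token)
    # The structure tokens are assumed not to include the <table> wrapper.
    return "<table>" + "".join(parts) + "</table>"


def nonblank_boxes(annotation: dict[str, Any]) -> list[list[int]]:
    """Return bounding boxes; only non-blank cells are assumed to carry 'bbox'."""
    return [cell["bbox"] for cell in annotation["html"]["cells"] if "bbox" in cell]


# Toy record made up for illustration: a blank corner cell, a header cell
# spanning two columns, and one body row of three cells.
example = {
    "html": {
        "structure": {"tokens": [
            "<thead>", "<tr>", "<td>", "</td>", "<td", ' colspan="2"', ">", "</td>", "</tr>", "</thead>",
            "<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>",
        ]},
        "cells": [
            {"tokens": []},  # blank cell: no bbox
            {"tokens": ["A", "&", "B"], "bbox": [40, 2, 90, 12]},
            {"tokens": ["x"], "bbox": [2, 20, 12, 30]},
            {"tokens": ["1"], "bbox": [40, 20, 50, 30]},
            {"tokens": ["2"], "bbox": [70, 20, 80, 30]},
        ],
    }
}

print(pubtabnet_to_html(example))
# <table><thead><tr><td></td><td colspan="2">A&amp;B</td></tr></thead><tbody><tr><td>x</td><td>1</td><td>2</td></tr></tbody></table>
print(len(nonblank_boxes(example)))  # 4
```

Keying content insertion on `</td>` rather than on the opening token avoids special-casing spanning cells, whose opening is split across `<td`, the span attribute, and `>`.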

3. Other technical solutions

This page shows a sample of the list's 16 resources; the full list is available on GitHub.