
An awesome list of software tools related to open data: data catalogs, ingestion tools, data preparation tools, and so on.

36 GitHub stars · 141 curated resources · 3 categories · Last refreshed 4 hours ago
Categories: Data catalogs, Standards, Tools

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me open data portals resources from awesome-opendata-software"


What's inside

Data catalogs

  • Aleph (Open data portals)

    Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured (CSV, XLS, SQL) data for easy browsing and search.

  • ArcGIS Hub (Geodata catalogs)

  • ArcGIS Server (Geodata catalogs)

    ArcGIS Server is the server software component in ArcGIS Enterprise that makes your geographic information available to other users in your organization, and optionally to any Internet user.

  • Carto (Geodata catalogs)

    SaaS mapping service that can also be used to create geodata portals.

  • CKAN (Open data portals)

    CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers hundreds of data portals worldwide.

  • Colectica (Microdata catalogs)

    Colectica is the fastest way to design, document, and publish your statistical data and survey research using open data standards.

Standards

  • Apache Parquet (Common data standards)

    Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet is available in multiple languages, including Java, C++, and Python. It is still uncommon on open data portals but common in public ML data catalogs.

  • Arrow Columnar Format (Common data standards)

    The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport.

  • Asset Description Metadata Schema, ADMS (Metadata standards)

    A metadata schema for describing semantic assets (metadata or reference data). It is aimed at those who handle metadata management for a European public administration or service and want to explore, (re-)use, or share such assets.

  • BagIt (Data containers)

    BagIt is a set of hierarchical file layout conventions designed to support storage and transfer of arbitrary digital content. A "bag" consists of a directory containing the payload files and other accompanying metadata files known as "tag" files.

  • BioCompute Objects (Data containers)

    BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to JSON Schema draft-07. The JSON format was chosen because it is both human and machine readable/writable.

  • CDF (Common data standards)

    CDF is a conceptual data abstraction for storing, manipulating, and accessing multidimensional data sets. The basic component of CDF is a software programming interface that is a device-independent view of the CDF data model. Common for scientific data.
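
The BagIt entry above describes a bag as a directory holding payload files plus "tag" files. As a rough illustration only, the sketch below builds a minimal bag with the Python standard library and then re-checks its checksums. The function names `make_minimal_bag` and `verify_bag` and the sample file are hypothetical, and the layout (a `data/` payload directory, a `bagit.txt` declaration, and a `manifest-sha256.txt` payload manifest) follows the published BagIt 1.0 specification rather than anything stated on this page. Tools such as bdbag do considerably more than this.

```python
import hashlib
import tempfile
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under data/ plus the bag declaration and a manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Each manifest line pairs a checksum with a path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    # Tag file 1: the bag declaration.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Tag file 2: the payload manifest.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Return True if every payload file still matches its manifest checksum."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        make_minimal_bag(bag, {"table.csv": b"id,value\n1,42\n"})
        print(sorted(p.name for p in bag.iterdir()))
        print("valid:", verify_bag(bag))
```

The manifest is what makes a bag useful for transfer: a recipient can recompute the checksums and detect a corrupted or altered payload without any other context.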

Tools

  • bdbag (Data packaging)

    The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and BagIt/RO profiles.

  • datalad (Data packaging)

    DataLad makes data management and data distribution more accessible. To do that, it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange.

  • Datasette (Data publishing)

    An open source multi-tool for exploring and publishing data

  • Frictionless Framework (Data packaging)

    Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data

  • OpenRefine (Data refining)

    OpenRefine is a free, open source power tool for working with messy data and improving it

  • RSDMX (Statistics tools)

    Tools for reading SDMX data and metadata in R

Showing a sample of 141 resources. View the full list on GitHub →