awesome-opendata-software
github.com/commondataio/awesome-opendata-software
An awesome list of software tools related to open data: data catalogs, ingestion tools, data prep tools, and so on.
What's inside
Data catalogs
- Aleph (Open data portals)
Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured (CSV, XLS, SQL) data for easy browsing and search.
- ArcGIS Hub (Geodata catalogs)
- ArcGIS Server (Geodata catalogs)
ArcGIS Server is the server software component in ArcGIS Enterprise that makes your geographic information available to other users in your organization, and optionally to any Internet user.
- Carto (Geodata catalogs)
SaaS mapping service that can also be used to create geodata portals.
- CKAN (Open data portals)
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers hundreds of data portals worldwide.
- Colectica (Microdata catalogs)
Colectica is the fastest way to design, document, and publish your statistical data and survey research using open data standards.
Standards
- Apache Parquet (Common data standards)
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet is available in multiple languages, including Java, C++, and Python. It is still uncommon on open data portals but common in public ML data catalogs.
- Arrow Columnar Format (Common data standards)
The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport.
- Asset Description Metadata Schema, ADMS (Metadata standards)
ADMS is a metadata schema for describing semantic assets (metadata or reference data). It is aimed at those who manage metadata for a European public administration or service and want to explore, (re-)use or share such assets.
- BagIt (Data containers)
BagIt is a set of hierarchical file layout conventions designed to support storage and transfer of arbitrary digital content. A "bag" consists of a directory containing the payload files and other accompanying metadata files known as "tag" files.
- BioCompute Objects (Data containers)
BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to JSON schema draft-07. The JSON format was chosen because it is both human and machine readable/writable. For a detailed description of JSON see
- CDF (Common data standards)
CDF is a conceptual data abstraction for storing, manipulating, and accessing multidimensional data sets. The basic component of CDF is a software programming interface that is a device-independent view of the CDF data model. Common for scientific data.
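The BagIt entry above describes a bag as a directory holding payload files plus accompanying "tag" files. A minimal sketch of that layout, using only the Python standard library, might look like the following. It assumes BagIt version 1.0 with a SHA-256 payload manifest; the function name `make_minimal_bag` and the sample file names are hypothetical, and this is not the API of bdbag or any other tool listed here.

```python
import hashlib
import tempfile
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Hypothetical helper: write payload files under data/ plus two tag files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        rel = target.relative_to(bag_dir).as_posix()
        manifest_lines.append(f"{digest}  {rel}")

    # Bag declaration: the tag file that marks this directory as a bag.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Payload manifest: one "<checksum>  <path>" line per payload file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_minimal_bag(Path(tmp) / "example_bag", {"hello.txt": b"hello\n"})
        # List everything the bag contains, relative to the bag root.
        for path in sorted(bag.rglob("*")):
            print(path.relative_to(bag).as_posix())
```

Real bags often carry more tag files (for example `bag-info.txt` and tag manifests), and this sketch does no validation; a dedicated BagIt tool is the better choice for production use.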
Tools
- bdbag (Data packaging)
The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and Bagit/RO profiles.
- datalad (Data packaging)
DataLad makes data management and data distribution more accessible. To do that, it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange.
- Datasette (Data publishing)
An open source multi-tool for exploring and publishing data.
- Frictionless Framework (Data packaging)
Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data.
- OpenRefine (Data refining)
OpenRefine is a free, open source power tool for working with messy data and improving it.
- RSDMX (Statistics tools)
Tools for reading SDMX data and metadata in R.
Showing a sample of 141 resources. View the full list on GitHub.