awesome-parquet

Useful resources for using the Parquet format


Curated Resources

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me c++ resources from awesome-parquet"

Apache Arrow C++ (C++)
A library with support for reading and writing Parquet files.
arrow (R)
The
Arrow GLib (C GLib)
A wrapper library for Arrow C++.
cudf (Java)
Java bindings for cudf, to be able to process large amounts of data on a GPU.
datafusion (Rust)
An extensible query engine written in Rust that can read/write Parquet files using SQL or a DataFrame API.
DuckDB (Julia)
Official DuckDB Julia package.

Apache Parquet Documentation (Documentation)
The official documentation for Apache Parquet.
Best Practices for Distributing GeoParquet (Parquet engineering)
Best practices for making 'good' GeoParquet files, especially for distribution of data.
Column Storage for the AI Era (Blogs)
A proposal by the creator of Parquet to better support AI workloads by adding encodings and metadata.
Handling Parquet Files (Parquet engineering)
Recommendations about the row group size and the Parquet file sizes.
Hyparquet: The Quest for Instant Data (Blogs)
6 optimization tricks to read Parquet files faster in the browser.
icem7 (Blogs)
A blog about data science tools, with in-depth articles on Parquet.

ChatDB (Web)
Online tools for viewing and converting from and to Parquet files.
DataConverter.io (Web)
Online tools for viewing, converting, and transforming Parquet files.
DataFusion CLI (Command-line)
A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
Datanomy (Terminal UI)
A terminal-based tool for visualizing a Parquet file's metadata and structure.
Datasette (Web)
A tool to explore datasets, with support for reading Parquet files.
DataStudio (Web)
Explore and visualize data, entirely in your browser.

F3
A data file format that is designed with efficiency, interoperability, and extensibility in mind.
GeoParquet
Specification for storing geospatial vector data (point, line, polygon) in Parquet.
Iceberg
A high-performance format for huge analytic tables that supports Parquet as one of its storage formats.
Lance
Modern columnar data format for ML and LLMs.
Nimble
File format for storage of large columnar datasets.
ORC
Self-describing type-aware columnar file format designed for Hadoop workloads.

Showing a sample of 88 resources. View the full list on GitHub →