awesome-parquet
github.com/severo/awesome-parquet
Useful resources for using the Parquet format.
Libraries
- Apache Arrow C++ (C++)
A library with support for reading and writing Parquet files.
- arrow (R)
The
- Arrow GLib (C GLib)
A wrapper library for Arrow C++.
- cudf (Java)
Java bindings for cudf, for processing large amounts of data on a GPU.
- datafusion (Rust)
An extensible query engine written in Rust that can read and write Parquet files using SQL or a DataFrame API.
- DuckDB (C GLib)
An in-process database library that supports reading and writing Parquet files.
Resources
- Apache Parquet Documentation (Documentation)
The official documentation for Apache Parquet.
- Best Practices for Distributing GeoParquet (Parquet engineering)
Best practices for making 'good' GeoParquet files, especially for distribution of data.
- Column Storage for the AI Era (Blogs)
A proposal by the creator of Parquet to better support AI workloads by adding encodings and metadata.
- Handling Parquet Files (Parquet engineering)
Recommendations on row group size and Parquet file size.
- Hyparquet: The Quest for Instant Data (Blogs)
Six optimization tricks for reading Parquet files faster in the browser.
- icem7 (Blogs)
A blog about data science tools, with in-depth articles on Parquet.
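The row group and file size recommendations above are about how rows are split: a Parquet file is divided into row groups, and each row group holds a contiguous slice of rows for every column. The sketch below shows that split using only the Python standard library. It assumes a hypothetical helper `plan_layout` with hypothetical limits `max_rows_per_group` and `max_groups_per_file`. The limits are expressed in rows for simplicity, whereas real guidance is often stated in bytes. The actual recommended values are in the linked article and are not reproduced here.

```python
def plan_layout(total_rows: int, max_rows_per_group: int,
                max_groups_per_file: int) -> list[list[int]]:
    """Hypothetical helper: return, per output file, the row counts of its row groups."""
    if max_rows_per_group <= 0 or max_groups_per_file <= 0:
        raise ValueError("limits must be positive")
    # Cut the rows into row groups of at most max_rows_per_group rows each.
    groups = []
    remaining = total_rows
    while remaining > 0:
        size = min(max_rows_per_group, remaining)
        groups.append(size)
        remaining -= size
    # Pack consecutive row groups into files of at most max_groups_per_file groups.
    return [groups[i:i + max_groups_per_file]
            for i in range(0, len(groups), max_groups_per_file)]

print(plan_layout(2_500_000, 1_000_000, 2))
# → [[1000000, 1000000], [500000]]
```

Larger row groups mean fewer, bigger column chunks per file. Smaller ones give readers finer-grained units to skip or fetch. This trade-off is what the sizing recommendations balance.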
Tools
- ChatDB (Web)
Online tools for viewing Parquet files and converting to and from them.
- DataConverter.io (Web)
Online tools for viewing, converting, and transforming Parquet files.
- DataFusion CLI (Command-line)
A single, dependency-free executable that can read and write Parquet files, with a SQL interface.
- Datanomy (Terminal UI)
A terminal-based tool for visualizing a Parquet file's metadata and structure.
- Datasette (Web)
A tool to explore datasets, with support for reading Parquet files.
- DataStudio (Web)
Explore and visualize data, entirely in your browser.
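Any tool that inspects a Parquet file's metadata and structure has to find the footer first. The Parquet format places the Thrift-encoded file metadata at the end of the file. That metadata is followed by its length as a 4-byte little-endian integer and then the magic bytes `PAR1`, which also open the file. Because of this layout, a reader can fetch the tail of a remote file before touching any column data. The sketch below locates the footer using only the Python standard library. `locate_footer` is a hypothetical helper. It does not decode the metadata, and it does not handle encrypted-footer files.

```python
import struct

MAGIC = b"PAR1"

def locate_footer(data: bytes) -> tuple[int, int]:
    """Hypothetical helper: return (offset, length) of the FileMetaData block.

    Assumed layout: "PAR1" | column data | FileMetaData | 4-byte LE length | "PAR1".
    Files with an encrypted footer (ending in "PARE") are not handled.
    """
    # Smallest possible buffer: leading magic + length field + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a plain Parquet file")
    # The metadata length sits just before the trailing magic.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    meta_start = len(data) - 8 - meta_len
    if meta_start < 4:
        raise ValueError("footer length exceeds file size")
    return meta_start, meta_len

# Synthetic buffer, not a real Parquet file: magic, 2 bytes of "data", 4 bytes of "metadata".
fake = MAGIC + b"xx" + b"META" + struct.pack("<I", 4) + MAGIC
start, length = locate_footer(fake)
print(start, length, fake[start:start + length])
# → 6 4 b'META'
```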
Related formats
- F3
A data file format that is designed with efficiency, interoperability, and extensibility in mind.
- GeoParquet
Specification for storing geospatial vector data (point, line, polygon) in Parquet.
- Iceberg
A high-performance format for huge analytic tables that supports Parquet as one of its storage formats.
- Lance
Modern columnar data format for ML and LLMs.
- Nimble
File format for storage of large columnar datasets.
- ORC
Self-describing type-aware columnar file format designed for Hadoop workloads.
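GeoParquet stores geometries in a Parquet column. The baseline encoding in its specification is Well-Known Binary (WKB). The sketch below encodes and decodes a 2D point as little-endian WKB using only the Python standard library. `point_to_wkb` and `wkb_to_point` are hypothetical helpers. The sketch covers only the Point type. It omits the file-level metadata that GeoParquet also requires.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little-endian (NDR)
WKB_POINT = 1          # WKB geometry type code for a 2D Point

def point_to_wkb(x: float, y: float) -> bytes:
    # '<' = little-endian, no padding: 1-byte order flag, uint32 type, two float64 coordinates.
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)

def wkb_to_point(blob: bytes) -> tuple[float, float]:
    if blob[0] != WKB_LITTLE_ENDIAN:
        raise ValueError("sketch only handles little-endian WKB")
    _, geom_type, x, y = struct.unpack("<BIdd", blob)
    if geom_type != WKB_POINT:
        raise ValueError("sketch only handles 2D points")
    return x, y

wkb = point_to_wkb(2.35, 48.85)
print(len(wkb), wkb_to_point(wkb))
# → 21 (2.35, 48.85)
```

In practice a GeoParquet writer stores one such WKB blob per row in a binary column.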
This is a sample of the 86 resources; the full list is on GitHub.