Useful resources for using the Parquet format

GitHub stars: 53 · Curated resources: 86 · Categories: 4 · Last refreshed: 58 min ago
Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me c++ resources from awesome-parquet"

What's inside

Libraries

  • Apache Arrow C++ (C++)

    A library with support for reading and writing Parquet files.

  • arrow (R)

    The

  • Arrow GLib (C GLib)

    A wrapper library for Arrow C++.

  • cudf (Java)

    Java bindings for cudf, for processing large amounts of data on a GPU.

  • datafusion (Rust)

    An extensible query engine written in Rust that can read/write Parquet files using SQL or a DataFrame API.

  • DuckDB (C++)

    An in-process database library that supports reading and writing Parquet files.

Resources

Tools

  • ChatDB (Web)

    Online tools for viewing Parquet files and converting to and from the format.

  • DataConverter.io (Web)

    Online tools for viewing, converting, and transforming Parquet files.

  • DataFusion CLI (Command-line)

    A single, dependency-free executable that can read and write Parquet files, with a SQL interface.

  • Datanomy (Terminal UI)

    A terminal-based tool for visualizing a Parquet file's metadata and structure.

  • Datasette (Web)

    A tool to explore datasets, with support for reading Parquet files.

  • DataStudio (Web)

    Explore and visualize data, entirely in your browser.
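
Several of the tools above inspect a Parquet file's metadata and structure. As a rough illustration of the first thing such a tool might check, the sketch below uses only the Python standard library to verify the 4-byte "PAR1" magic at both ends of a file image and read the little-endian footer length stored just before the trailing magic. The function name `parquet_footer_length` is hypothetical and not taken from any listed tool, and the sketch assumes the common layout with an unencrypted footer; it does not decode the footer metadata itself.

```python
import struct

MAGIC = b"PAR1"  # magic bytes at the start and end of a Parquet file


def parquet_footer_length(data: bytes) -> int:
    """Return the footer metadata length declared in a Parquet file image.

    Hypothetical helper for illustration: it only validates the framing
    (leading magic, trailing magic, 4-byte footer length) and does not
    parse the footer contents.
    """
    # Smallest possible framing: 4 (magic) + 4 (length) + 4 (magic)
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The footer length is a little-endian uint32 just before the final magic
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("declared footer length exceeds file size")
    return footer_len


# Synthetic bytes that mimic the framing only; this is NOT a valid Parquet file.
fake = MAGIC + b"\x00" * 10 + struct.pack("<I", 10) + MAGIC
print(parquet_footer_length(fake))
```

For a real file, the same function could be applied to the bytes read with `open(path, "rb").read()`; the libraries listed above are the appropriate choice for decoding the schema and row-group metadata that the footer contains.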

Related formats

  • F3

    A data file format that is designed with efficiency, interoperability, and extensibility in mind.

  • GeoParquet

    Specification for storing geospatial vector data (point, line, polygon) in Parquet.

  • Iceberg

    A high-performance format for huge analytic tables that supports Parquet as one of its storage formats.

  • Lance

    Modern columnar data format for ML and LLMs.

  • Nimble

    File format for storage of large columnar datasets.

  • ORC

    Self-describing type-aware columnar file format designed for Hadoop workloads.

Showing a sample of 86 resources. View the full list on GitHub →