A curated list of awesome big data frameworks, resources and other awesomeness.

3
GitHub Stars
584
Curated Resources
33
Categories
23 hours ago
Last Refreshed
RDBMS, Frameworks, Distributed Programming, Distributed Filesystem, Distributed Index, Document Data Model, Key Map Data Model, Key-value Data Model, Graph Data Model, Columnar Databases, NewSQL Databases, Time-Series Databases, SQL-like processing, Data Ingestion, Service Programming, Scheduling, Machine Learning, Benchmarking, Security, System Deployment, Applications, Search engine and framework, MySQL forks and evolutions, PostgreSQL forks and evolutions, Memcached forks and evolutions, Embedded Databases, Business Intelligence, Data Visualization, Internet of things and sensor data, Interesting Readings, Interesting Papers, Videos, Books

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me 2001 - 2010 resources from awesome-bigdata"

Installation instructions →

What's inside

Interesting Papers

  • 2003 (2001 - 2010)

    The Google File System.

  • 2004 (2001 - 2010)

    MapReduce: Simplified Data Processing on Large Clusters.

  • 2006 (2001 - 2010)

    The Chubby lock service for loosely-coupled distributed systems.

  • 2006 (2001 - 2010)

    Bigtable: A Distributed Storage System for Structured Data.

  • 2007 (2001 - 2010)

    Dynamo: Amazon’s Highly Available Key-value Store.

  • 2008 (2001 - 2010)

    Chukwa: A large-scale monitoring system.

Internet of things and sensor data

  • 2lemetry

    Platform for Internet of things.

  • Apache Edgent (Incubating)

    a programming model and micro-kernel style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on the edge devices.

  • Azure IoT Hub

    Cloud-based bi-directional monitoring and messaging hub.

Applications

  • 411

    a web application for managing alerts generated by scheduled searches in Elasticsearch.

  • Adobe Spindle

    Next-generation web analytics processing with Scala, Spark, and Parquet.

  • Apache Kiji

    framework to collect and analyze data in real-time, based on HBase.

  • Apache Metron

    a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.

  • Apache Nutch

    open source web crawler.

  • Apache OODT

    capturing, processing and sharing of data for NASA's scientific archives.

Embedded Databases

  • Actian PSQL

    ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.

  • BerkeleyDB

    a software library that provides a high-performance embedded database for key/value data.

SQL-like processing

Columnar Databases

  • Actian Vector

    column-oriented analytic database.

  • Amazon Redshift

    Amazon's cloud offering, also based on a columnar datastore backend.

  • ClickHouse

    an open-source column-oriented database management system that allows generating analytical data reports in real time.

  • Columnar Storage

    an explanation of what columnar storage is and when you might want it.

Document Data Model

  • Actian Versant

    commercial object-oriented database management systems.

NewSQL Databases

  • ActorDB

    a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database.

  • Amazon Redshift

    data warehouse service, based on PostgreSQL.

  • BayesDB

    statistics-oriented SQL database.

  • Bedrock

    a simple, modular, networked and distributed transaction layer built atop SQLite.

  • CitusDB

    scales out PostgreSQL through sharding and replication.

  • Cockroach

    Scalable, Geo-Replicated, Transactional Datastore.

Showing a sample of 584 resources. View the full list on GitHub →