A curated list of awesome big data frameworks, resources and other awesomeness, with repository stars ⭐ and forks 🍴.

GitHub Stars: 13
Curated Resources: 657
Categories: 36
Last Refreshed: 1 hour ago
  • RDBMS
  • Frameworks
  • Distributed Programming
  • Distributed Filesystem
  • Distributed Index
  • Document Data Model
  • Key Map Data Model
  • Key-value Data Model
  • Graph Data Model
  • Columnar Databases
  • NewSQL Databases
  • Time-Series Databases
  • Lakehouse Table Formats
  • SQL-like processing
  • Vector Databases
  • Data Ingestion
  • Data Quality and Observability
  • Service Programming
  • Scheduling
  • Machine Learning
  • Benchmarking
  • Security
  • System Deployment
  • Applications
  • Search engine and framework
  • MySQL forks and evolutions
  • PostgreSQL forks and evolutions
  • Memcached forks and evolutions
  • Embedded Databases
  • Business Intelligence
  • Data Visualization
  • Internet of things and sensor data
  • Interesting Readings
  • Interesting Papers
  • Videos
  • Books

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me 2001 - 2010 resources from fucking-awesome-bigdata"

Installation instructions →

What's inside

Interesting Papers

  • 2003 (2001 - 2010)

    The Google File System.

  • 2004 (2001 - 2010)

    MapReduce: Simplified Data Processing on Large Clusters.

  • 2006 (2001 - 2010)

    Bigtable: A Distributed Storage System for Structured Data.

  • 2006 (2001 - 2010)

    The Chubby lock service for loosely-coupled distributed systems.

  • 2007 (2001 - 2010)

    Dynamo: Amazon’s Highly Available Key-value Store.

  • 2008 (2001 - 2010)

    Chukwa: A large-scale monitoring system.

Internet of things and sensor data

  • 2lemetry

    Platform for Internet of things.

  • Ably

    Pub/sub messaging platform for IoT.

  • Apache Edgent (Incubating)

    a programming model and micro-kernel style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on those devices.

  • Azure IoT Hub

    Cloud-based bi-directional monitoring and messaging hub.

Applications

  • 411

    a web application for managing alerts generated by scheduled searches into Elasticsearch.

  • Adobe spindle

    Next-generation web analytics processing with Scala, Spark, and Parquet.

  • Apache Metron

    a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.

  • Apache Nutch

    open source web crawler.

  • Apache OODT

    capturing, processing and sharing of data for NASA's scientific archives.

  • Apache Tika

    content analysis toolkit.

Embedded Databases

  • Actian PSQL

    ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.

  • BerkeleyDB

    a software library that provides a high-performance embedded database for key/value data.

SQL-like processing

  • Actian SQL for Hadoop

    high performance interactive SQL access to all Hadoop data.

  • Apache Calcite

    framework that allows efficient translation of queries involving heterogeneous and federated data.

  • Apache Doris

    real-time analytical database for high-concurrency SQL analytics, search, and warehousing.

  • Apache Drill

    framework for interactive analysis, inspired by Dremel.

  • Apache HCatalog

    table and storage management layer for Hadoop.

  • Apache Hive

    SQL-like data warehouse system for Hadoop.

Columnar Databases

Document Data Model

  • Actian Versant

    commercial object-oriented database management system.

Graph Data Model

  • Actionbase

    a database for user interactions (likes, views, follows) with precomputed reads; supports HBase.

  • AgensGraph

    transactional graph database based on PostgreSQL.

  • Apache Spark Bagel

    implementation of Pregel, part of Spark.

  • ArangoDB

    multi-model distributed database.

  • ArcadeDB

    multi-model database with graph, document, key-value, time-series and vector support.

Showing a sample of 657 resources. View the full list on GitHub →