
A collection of awesome software, libraries, learning tutorials, documents, books, resources, and interesting stuff about Big Data Science & Engineering.

13 GitHub stars · 637 curated resources · 25 categories · last refreshed 18 hours ago
Categories: Key-value Data Model, Graph Data Model, Databases, Time-Series Databases, SQL-like processing, Data Ingestion, Service Programming, Scheduling, Machine Learning, Benchmarking, Security, System Deployment, Applications, Search engine and framework, MySQL forks and evolutions, PostgreSQL forks and evolutions, Memcached forks and evolutions, Embedded Databases, Business Intelligence, Data Visualization, Internet of things and sensor data, Interesting Readings, Interesting Papers, Videos, Books


What's inside

Interesting Papers

  • 2003 (2001 - 2010)

    The Google File System.

  • 2004 (2001 - 2010)

    MapReduce: Simplified Data Processing on Large Clusters.

  • 2006 (2001 - 2010)

    Bigtable: A Distributed Storage System for Structured Data.

  • 2006 (2001 - 2010)

    The Chubby lock service for loosely-coupled distributed systems.

  • 2007 (2001 - 2010)

    Dynamo: Amazon’s Highly Available Key-value Store.

  • 2008 (2001 - 2010)

    Chukwa: A large-scale monitoring system.
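The MapReduce paper listed above introduces its programming model with a word-count example. Below is a minimal single-process sketch of that model in Python, assuming a small in-memory list of documents. The names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical, and the sketch leaves out everything the paper's runtime adds on a real cluster (input splitting, partitioning, distributed workers, fault tolerance).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one document.
    for word in doc.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce: combine all intermediate values that share the same key.
    return word, sum(counts)


def map_reduce(docs: Iterable[str]) -> Dict[str, int]:
    # Shuffle: group intermediate pairs by key. On a cluster the framework
    # does this across machines; here it is a single in-memory dictionary.
    groups: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

The user only writes the map and reduce functions; grouping by key sits between them, which is the part a real framework parallelizes and makes fault tolerant.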

Internet of things and sensor data

  • 2lemetry

    Platform for the Internet of Things.

  • Ably

    Pub/sub messaging platform for IoT

  • Apache Edgent (Incubating)

    a programming model and micro-kernel style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on the edge devices.

  • Azure IoT Hub

    Cloud-based bi-directional monitoring and messaging hub
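The topic-based publish/subscribe pattern mentioned in the Ably entry can be illustrated with a small in-process sketch. `Broker`, `subscribe`, and `publish` are hypothetical names; this is not the API of Ably or Azure IoT Hub, and it ignores networking, persistence, and delivery guarantees that hosted services provide.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber callback receives the topic and the raw message payload.
Handler = Callable[[str, bytes], None]


class Broker:
    """Hypothetical in-process broker mapping each topic to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register interest in a topic; publishers never reference subscribers directly.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        # Fan the message out to every current subscriber of the topic and
        # report how many handlers received it (0 if nobody is listening).
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)
```

For example, a sensor could publish readings to a topic such as `sensors/room1/temp` while a dashboard subscribes to it; neither side needs to know the other exists, which is what makes the pattern a good fit for large fleets of devices.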

Applications

  • 411

    a web application for managing alerts generated by scheduled searches against Elasticsearch.

  • Adobe spindle

    Next-generation web analytics processing with Scala, Spark, and Parquet.

  • Apache Metron

    a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.

  • Apache Nutch

    open source web crawler.

  • Apache OODT

    a framework for capturing, processing, and sharing data for NASA's scientific archives.

  • Apache Tika

    content analysis toolkit.

Embedded Databases

  • Actian PSQL

    ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.

  • BerkeleyDB

    a software library that provides a high-performance embedded database for key/value data.
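Embedded databases such as BerkeleyDB run as a library inside the application process rather than as a separate server. As a stand-in to show the pattern, the sketch below uses Python's standard-library `dbm.dumb` module, a simple file-backed key/value store; this is not BerkeleyDB's API, and `dbm.dumb` is a fallback implementation not written for speed. The helper `put_then_reopen` is a hypothetical name.

```python
import dbm.dumb
import os
import tempfile


def put_then_reopen(path: str, key: bytes, value: bytes) -> bytes:
    """Write one pair, close the store, reopen it, and read the pair back."""
    # "c" opens the store for reading and writing, creating its files if needed.
    # The database code runs inside this process; no server is involved.
    with dbm.dumb.open(path, "c") as db:
        db[key] = value
    # Reopening read-only shows that the pair was persisted to disk on close.
    with dbm.dumb.open(path, "r") as db:
        return db[key]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(put_then_reopen(os.path.join(tmp, "store"), b"user:42", b"Ada"))
```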

SQL-like processing


  • Actian Vector

    column-oriented analytic database.

  • Actian Versant

    commercial object-oriented database management system.

  • ActorDB

    a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database.

  • AddThis Hydra

    distributed data processing and storage system originally developed at AddThis.

  • Alluxio

    reliable file sharing at memory speed across cluster frameworks.

  • Amazon Redshift

    Amazon's cloud data warehouse, based on a columnar datastore backend.
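Several entries above (Actian Vector, Amazon Redshift) are column-oriented. The toy sketch below, with made-up sample data, contrasts a row layout with a column layout and shows why analytic aggregates and compression favor columns. It is only an illustration of the layout idea; real engines add on-disk storage formats and far more sophisticated compression, and `to_columns` and `run_length_encode` are hypothetical helpers.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

# Row-oriented layout: each record's fields are stored together.
ROWS: List[Tuple[str, str, float]] = [
    ("2024-01-01", "eu", 120.0),
    ("2024-01-01", "us", 340.0),
    ("2024-01-02", "eu", 95.5),
]
COLUMN_NAMES = ("day", "region", "revenue")


def to_columns(rows: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values: Sequence) -> List[Tuple[object, int]]:
    """Collapse runs of repeated values, which are common within a sorted column."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(ROWS, COLUMN_NAMES)

# An aggregate over one attribute only has to scan that attribute's list...
total_revenue = sum(columns["revenue"])
# ...whereas the row layout walks every full record to reach the same field.
total_revenue_from_rows = sum(row[2] for row in ROWS)
```

Both totals are equal; the difference is how much unrelated data has to be read to compute them, which is what matters once tables have many wide columns.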

Key-value Data Model

  • Aerospike

    flash-optimized, in-memory NoSQL database. Open source, with "Server code in 'C' (not Java or Erlang) precisely tuned to avoid context switching and memory copies."

  • Bolt

    an embedded key-value database for Go.

  • BTDB

    Key-value database in .NET with an object DB layer, RPC, dynamic IL, and more.

  • BuntDB

    a fast, embeddable, in-memory key/value database for Go with custom indexing and geospatial support.
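BuntDB's description mentions custom indexing. The general idea, a secondary index kept in sorted order alongside the primary key/value map and updated on every write, can be sketched as follows. `IndexedKV` and its methods are hypothetical names, not BuntDB's Go API, and the sketch is single-threaded with no persistence or transactions.

```python
import bisect
from typing import Any, Callable, Dict, List, Tuple


class IndexedKV:
    """Hypothetical in-memory key/value store with one custom secondary index."""

    def __init__(self, index_key: Callable[[str], Any]) -> None:
        self._data: Dict[str, str] = {}
        # User-supplied function deriving the sort key from a stored value.
        self._index_key = index_key
        # Sorted list of (index value, primary key) pairs, kept in order on every write.
        self._index: List[Tuple[Any, str]] = []

    def set(self, key: str, value: str) -> None:
        # Replacing a value must also remove its old index entry.
        if key in self._data:
            self.delete(key)
        self._data[key] = value
        bisect.insort(self._index, (self._index_key(value), key))

    def delete(self, key: str) -> None:
        value = self._data.pop(key)
        entry = (self._index_key(value), key)
        # The exact entry is present, so bisect_left lands on it.
        del self._index[bisect.bisect_left(self._index, entry)]

    def get(self, key: str) -> str:
        return self._data[key]

    def ascend(self) -> List[Tuple[str, str]]:
        """Return (key, value) pairs ordered by the secondary index."""
        return [(key, self._data[key]) for _, key in self._index]
```

For instance, with values like `"Ada,36"` and an index function such as `lambda v: int(v.split(",")[1])`, `ascend()` lists records by the numeric field without scanning and sorting the whole map on each query.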

Graph Data Model

  • AgensGraph

    a new generation multi-model graph database for the modern complex data environment.

  • Apache Giraph

    implementation of Pregel, based on Hadoop.

  • Apache Spark Bagel

    implementation of Pregel, part of Spark.
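Apache Giraph and Spark Bagel both implement Pregel's vertex-centric model: computation runs in synchronized supersteps, each vertex processes the messages sent to it in the previous superstep, may update its value, and may message its neighbors. The minimal single-process sketch below uses maximum-value propagation as the vertex program; here a vertex that receives no messages is treated as halted, so the loop stops once no messages remain. `pregel_max` is a hypothetical helper, not the Giraph or Spark API, and every neighbor is assumed to appear as a key in `graph`.

```python
from typing import Dict, List


def pregel_max(graph: Dict[str, List[str]], values: Dict[str, int]) -> Dict[str, int]:
    """Propagate the maximum vertex value along edges in Pregel-style supersteps."""
    values = dict(values)  # do not mutate the caller's dictionary
    # Superstep 0: every vertex is active and sends its value to its out-neighbors.
    inbox: Dict[str, List[int]] = {v: [] for v in graph}
    for v, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(values[v])
    # Keep running supersteps while any message is still in flight.
    while any(inbox.values()):
        outbox: Dict[str, List[int]] = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue  # no incoming messages: the vertex stays halted
            best = max(messages)
            if best > values[v]:
                # Only vertices whose value changed notify their neighbors.
                values[v] = best
                for n in graph[v]:
                    outbox[n].append(best)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
    return values
```

On a small undirected graph such as `{"a": ["b"], "b": ["a", "c"], "c": ["b"]}` with values 3, 6, and 2, every vertex ends with 6 after two supersteps beyond the initial one.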

Showing a sample of 637 resources. The full list is available on GitHub.