awesome-bigdata
github.com/oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
What's inside
Interesting Papers
2001 - 2010
- 2003
The Google File System.
- 2004
MapReduce: Simplified Data Processing on Large Clusters.
- 2006
The Chubby lock service for loosely-coupled distributed systems.
- 2006
Bigtable: A Distributed Storage System for Structured Data.
- 2007
Dynamo: Amazon’s Highly Available Key-value Store.
- 2008
Chukwa: A large-scale monitoring system.
Internet of things and sensor data
- 2lemetry
Platform for Internet of things.
- Ably
Pub/sub messaging platform for IoT
- Apache Edgent (Incubating)
a programming model and micro-kernel style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on those devices.
- Azure IoT Hub
Cloud-based bi-directional monitoring and messaging hub
Applications
- 411
a web application for managing alerts generated by scheduled searches against Elasticsearch.
- Adobe spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
- Apache Metron
a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.
- Apache Nutch
open source web crawler.
- Apache OODT
capturing, processing and sharing of data for NASA's scientific archives.
- Apache Tika
content analysis toolkit.
Embedded Databases
- Actian PSQL
ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
- BerkeleyDB
a software library that provides a high-performance embedded database for key/value data.
SQL-like processing
- Actian SQL for Hadoop
high performance interactive SQL access to all Hadoop data.
- Apache Calcite
framework that allows efficient translation of queries involving heterogeneous and federated data.
- Apache Doris
real-time analytical database for high-concurrency SQL analytics, search, and warehousing.
- Apache Drill
framework for interactive analysis, inspired by Dremel.
- Apache HCatalog
table and storage management layer for Hadoop.
- Apache Hive
SQL-like data warehouse system for Hadoop.
Columnar Databases
- Actian Vector
column-oriented analytic database.
- Amazon Redshift
Amazon's cloud offering, also based on a columnar datastore backend.
Document Data Model
- Actian Versant
commercial object-oriented database management system.
Graph Data Model
- Actionbase
a database for user interactions (likes, views, follows) with precomputed reads; supports HBase.
- AgensGraph
transactional graph database based on PostgreSQL.
- Apache Spark Bagel
implementation of Pregel, part of Spark.
- ArangoDB
multi-model distributed database.
- ArcadeDB
multi-model database with graph, document, key-value, time-series and vector support.
Showing a sample of 657 resources. View the full list on GitHub →