awesome-bigdata
github.com/eric-erki/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
What's inside
Interesting Papers
- 2003 (2001 - 2010)
The Google File System.
- 2004 (2001 - 2010)
MapReduce: Simplified Data Processing on Large Clusters.
- 2006 (2001 - 2010)
The Chubby lock service for loosely-coupled distributed systems.
- 2006 (2001 - 2010)
Bigtable: A Distributed Storage System for Structured Data.
- 2007 (2001 - 2010)
Dynamo: Amazon’s Highly Available Key-value Store.
- 2008 (2001 - 2010)
Chukwa: A large-scale monitoring system.
Internet of things and sensor data
- 2lemetry
Platform for Internet of things.
- Apache Edgent (Incubating)
a programming model and micro-kernel-style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on those devices.
- Azure IoT Hub
Cloud-based bi-directional monitoring and messaging hub
Applications
- 411
a web application for managing alerts generated by scheduled searches against Elasticsearch.
- Adobe spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
- Apache Kiji
framework to collect and analyze data in real-time, based on HBase.
- Apache Metron
a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.
- Apache Nutch
open source web crawler.
- Apache OODT
capturing, processing and sharing of data for NASA's scientific archives.
Embedded Databases
- Actian PSQL
ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
- BerkeleyDB
a software library that provides a high-performance embedded database for key/value data.
SQL-like processing
- Actian SQL for Hadoop
high performance interactive SQL access to all Hadoop data.
- Apache Calcite
framework that allows efficient translation of queries involving heterogeneous and federated data.
- Apache Drill
framework for interactive analysis, inspired by Dremel.
- Apache HCatalog
table and storage management layer for Hadoop.
- Apache Hive
SQL-like data warehouse system for Hadoop.
- Apache Phoenix
SQL skin over HBase.
Columnar Databases
- Actian Vector
column-oriented analytic database.
- Amazon Redshift
Amazon's cloud offering, also based on a columnar datastore backend.
- ClickHouse
an open-source column-oriented database management system that allows generating analytical data reports in real time.
- Columnar Storage
an explanation of what columnar storage is and when you might want it.
Document Data Model
- Actian Versant
commercial object-oriented database management system.
NewSQL Databases
- ActorDB
a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database.
- Amazon Redshift
data warehouse service, based on PostgreSQL.
- BayesDB
statistics-oriented SQL database.
- Bedrock
a simple, modular, networked and distributed transaction layer built atop SQLite.
- CitusDB
scales out PostgreSQL through sharding and replication.
- Cockroach
Scalable, Geo-Replicated, Transactional Datastore.
Showing a sample of 584 resources; the full list is on GitHub.