awesome-bigdata
github.com/bbauska/awesome-bigdata
Just big data
What's inside
Interesting Papers
- 2003 (2001 - 2010)
The Google File System.
- 2004 (2001 - 2010)
MapReduce: Simplified Data Processing on Large Clusters.
- 2006 (2001 - 2010)
Bigtable: A Distributed Storage System for Structured Data.
- 2006 (2001 - 2010)
The Chubby lock service for loosely-coupled distributed systems.
- 2007 (2001 - 2010)
Dynamo: Amazon’s Highly Available Key-value Store.
- 2008 (2001 - 2010)
Chukwa: A large-scale monitoring system.
Internet of things and sensor data
- 2lemetry
Platform for Internet of things.
- Ably
Pub/sub messaging platform for IoT.
- Apache Edgent (Incubating)
a programming model and micro-kernel style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on the edge devices.
- Azure IoT Hub
Cloud-based bi-directional monitoring and messaging hub.
Applications
- 411
a web application for alert management resulting from scheduled searches into Elasticsearch.
- Adobe spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
- Apache Metron
a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.
- Apache Nutch
open source web crawler.
- Apache OODT
capturing, processing and sharing of data for NASA's scientific archives.
- Apache Tika
content analysis toolkit.
Embedded Databases
- Actian PSQL
ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
- BerkeleyDB
a software library that provides a high-performance embedded database for key/value data.
SQL-like processing
- Actian SQL for Hadoop
high performance interactive SQL access to all Hadoop data.
- Apache Calcite
framework that allows efficient translation of queries involving heterogeneous and federated data.
- Apache Drill
framework for interactive analysis, inspired by Dremel.
- Apache HCatalog
table and storage management layer for Hadoop.
- Apache Hive
SQL-like data warehouse system for Hadoop.
- Apache Phoenix
SQL skin over HBase.
Columnar Databases
- Actian Vector
column-oriented analytic database.
- Amazon Redshift
Amazon's cloud offering, also based on a columnar datastore backend.
- ClickHouse
an open-source column-oriented database management system that allows generating analytical data reports in real time.
- Columnar Storage
an explanation of what columnar storage is and when you might want it.
Document Data Model
- Actian Versant
commercial object-oriented database management system.
NewSQL Databases
- ActorDB
a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database.
- Amazon Redshift
data warehouse service, based on PostgreSQL.
- BayesDB
statistics-oriented SQL database.
- Bedrock
a simple, modular, networked and distributed transaction layer built atop SQLite.
- CitusDB
scales out PostgreSQL through sharding and replication.
- Cockroach
Scalable, Geo-Replicated, Transactional Datastore.
Showing a sample of 607 resources. View the full list on GitHub.