awesome-streaming
github.com/eric-erki/awesome-streaming
A curated list of awesome streaming frameworks, applications, etc.
What's inside
Toolkit
- aeron
efficient reliable unicast and multicast message transport.
- akka
toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
- pulsar
actor-based, event-driven concurrent framework for Python.
- samza-luwak
uses Luwak, a stored-query engine built on Lucene, to implement full-text search on streams.
- StreamFlow
stream processing tool designed to help build and monitor processing workflows.
- Turbine
tool for aggregating streams of Server-Sent Event (SSE) JSON data into a single stream.
Closed Source
- Amazon Kinesis Streams
real-time, fully managed and scalable data stream engine provided by AWS.
- Azure Stream Analytics
- Cloud Dataflow
Google's managed stream and batch data processing engine. Supports running Beam pipelines.
- concord
a distributed stream processing framework built in C++ on top of Apache Mesos.
- IBM Streams
platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of the box.
- jubatus
distributed processing framework and streaming machine learning library.
Streaming Engine
- Apache Apex
unified platform for big data stream and batch processing.
- Apache Flink
system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.
- Apache Heron (incubating)
a realtime, distributed, fault-tolerant stream processing engine from Twitter.
- Apache Samza
distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
- Apache Spark Streaming
makes it easy to build scalable fault-tolerant streaming applications.
- Apache Storm
distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
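Several of the engines above mention windowing over streams. As a rough illustration only, and not the API of Flink, Storm, or any other engine listed here, the sketch below groups timestamped events into fixed-size, non-overlapping (tumbling) windows in plain Python. The function name `tumbling_window_counts` and the sample events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping windows.

    events: iterable of (timestamp, key) pairs with numeric timestamps.
    window_size: window length, in the same unit as the timestamps.
    Returns {window_start: {key: count}}, ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Each event belongs to exactly one window, identified by its start.
        window_start = (timestamp // window_size) * window_size
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample data: (timestamp, key) pairs.
events = [(1, "a"), (4, "b"), (7, "a"), (12, "a"), (13, "b"), (19, "b")]
result = tumbling_window_counts(events, window_size=10)
```

A real engine also has to deal with unbounded input, out-of-order events, and fault-tolerant state, none of which this sketch attempts.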
DSL
- Apache Beam
unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.
- coast
a DSL that builds DAGs on top of Samza and provides exactly-once semantics.
- Esper
component for complex event processing (CEP) and event series analysis.
- Streamparse
lets you run Python code against real-time streams of data via Apache Storm.
- summingbird
library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.
IoT
- Apache Edgent
a programming model and runtime that enables continuous streaming analytics on gateways and edge devices which can work with centralized systems to provide efficient and timely analytics across the whole IoT ecosystem: from the center to the edge, open sourced by IBM.
- Apache StreamPipes
a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
- sensorbee
lightweight stream processing engine for IoT.
Data Pipeline
- Apache Kafka
distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.
- Apache Pulsar
distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
- brooklin
a distributed system intended for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale, from LinkedIn (replaced databus).
- camus
LinkedIn's Kafka -> HDFS pipeline.
- databus
LinkedIn's source-agnostic distributed change data capture system.
- flume
distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
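The "partitioned commit log" idea in the Kafka entry above can be pictured with a toy in-memory model. This is a minimal sketch under stated assumptions, not Kafka's implementation or client API: the class `PartitionedLog` and its methods are hypothetical, and CRC32 stands in for whatever partitioning function a real system uses.

```python
import zlib


class PartitionedLog:
    """Toy append-only log split into partitions (illustrative only)."""

    def __init__(self, num_partitions):
        # One append-only list of records per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a record and return its (partition, offset)."""
        # Assumption: hash the key so equal keys share a partition,
        # which preserves per-key ordering.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset, max_records=10):
        """Read up to max_records records starting at offset."""
        return self.partitions[partition][offset:offset + max_records]


log = PartitionedLog(num_partitions=4)
p1, o1 = log.append("user-1", "login")
p2, o2 = log.append("user-1", "click")
records = log.read(p1, 0)
```

Readers track their own offset and can re-read from any earlier position, which is what distinguishes a log from a queue that deletes messages on delivery.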
Online Machine Learning
- Apache Samoa
distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
- DataSketches
sketches library from Yahoo!.
- StormCV
enables the use of Apache Storm for video processing by adding computer vision (CV) specific operations and data model.
- streamDM
mining Big Data streams using Spark Streaming from Huawei.
- StreamingBandit
provides a webserver to quickly set up and evaluate possible solutions to contextual multi-armed bandit (cMAB) problems.
- trident-ml
realtime online machine learning library based on Trident.
Streaming Library
- Benthos
high-performance and resilient message streaming service, able to connect various sources and sinks and perform arbitrary actions, transformations and filters on payloads.
- FS2 (prev. 'Scalaz-Stream')
Compositional, streaming I/O library for Scala.
- monix
high-performance Scala / Scala.js library for composing asynchronous and event-based programs.
- StreamAlert
Airbnb's Real-time Data Analysis and Alerting.
- Streamline
stream analytics framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. It aims to let users drag and drop streaming components so they can focus on business logic.
- Stream Ops
A fully embeddable data streaming engine and stream processing API for Java.
Showing a sample of 82 resources; the full list is on GitHub.