A curated list of awesome streaming frameworks, applications, etc.

GitHub Stars: 3k
Curated Resources: 132
Categories: 12
Last Refreshed: 3 hours ago
Streaming Engine, Streaming Library, Streaming Application, IoT, DSL, Data Pipeline, Online Machine Learning, Streaming SQL, Benchmark, Toolkit, Closed Source, Readings

What's inside

Toolkit

  • aeron

    efficient reliable unicast and multicast message transport.

  • akka

    toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.

  • Apache Pekko

    Fork of Akka 2.6.x, prior to the Akka project's adoption of the Business Source License.

  • Eventum

    Data generation platform for producing synthetic event streams based on templates, scripts or log samples.

  • Nussknacker

    A visual tool to define and run real-time decision algorithms.

  • pulsar

    Actor based event driven concurrent framework for Python.

Closed Source

  • Amazon Kinesis Streams

    real-time, fully managed and scalable data stream engine provided by AWS.

  • Azure Stream Analytics

  • Cloud Dataflow

    Google's managed stream and batch data processing engine. Supports running Beam pipelines.

  • concord

    a distributed stream processing framework built in C++ on top of Apache.

  • IBM Streams

    platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of the box.

  • jubatus

    distributed processing framework and streaming machine learning library.

Streaming Engine

  • Apache Apex

    unified platform for big data stream and batch processing.

  • Apache Ballista

    distributed compute platform powered by Apache Arrow.

  • Apache Flink

    system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.

  • Apache Heron (incubating)

    a realtime, distributed, fault-tolerant stream processing engine from Twitter.

  • Apache Samza

    distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).

  • Apache Spark Streaming

    makes it easy to build scalable fault-tolerant streaming applications.

DSL

  • Apache Beam

    unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.

  • coast

    a DSL that builds DAGs on top of Samza and provides exactly-once semantics.

  • Esper

    component for complex event processing (CEP) and event series analysis.

  • Streamparse

    lets you run Python code against real-time streams of data via Apache Storm.

  • summingbird

    library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.

IoT

  • Apache Edgent

    a programming model and runtime that enables continuous streaming analytics on gateways and edge devices, which can work with centralized systems to provide efficient and timely analytics across the whole IoT ecosystem, from the center to the edge. Open sourced by IBM.

  • Apache StreamPipes

    a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.

  • sensorbee

    lightweight stream processing engine for IoT.

Data Pipeline

  • Apache Kafka

    distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.

  • Apache Pulsar

    distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.

  • Apache RocketMQ

    distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

  • AutoMQ

    cloud-first alternative to Kafka that decouples durability to S3 and EBS. 100% Kafka compatible. 10x cost-effective. Autoscale in seconds. Single-digit ms latency.

  • brooklin

    a distributed system from LinkedIn intended for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale (replaced databus).

  • Bruin

    End-to-end data pipeline tool combining ingestion from 50+ sources, SQL/Python transformations, and built-in data quality checks in a single CLI.

Online Machine Learning

  • Apache Samoa

    distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.

  • DataSketches

    sketches library from Yahoo!.

  • https://github.com/numaproj/numalogic

    Collection of ML models and libraries for real-time anomaly detection and forecasting on time series data. Built on Numaflow, a Kubernetes-native stream processing platform.

  • River

    online machine learning library.

  • StormCV

    enables the use of Apache Storm for video processing by adding computer vision (CV) specific operations and data model.

  • streamDM

    mining Big Data streams using Spark Streaming from Huawei.

Streaming Library

  • Benthos

    high-performance and resilient message streaming service, able to connect various sources and sinks and perform arbitrary actions, transformations and filters on payloads.

  • Daggy

    real-time streams aggregation and catching.

  • FastStream

    powerful and easy-to-use Python library simplifying the process of writing producers and consumers for message queues, handling all the parsing, networking and documentation generation automatically. Supports multiple protocols such as Apache Kafka, RabbitMQ and alike.

  • FS2 (prev. 'Scalaz-Stream')

    Compositional, streaming I/O library for Scala.

  • Kzmlabs StateFun Actors

    Stateful actors on Apache Flink 2.x with durable per-key state, exactly-once messaging, and Kafka/Kinesis I/O. Continuation of Apache Stateful Functions on Flink 2.2 + Java 21.

  • Mediapipe

    Cross-platform, customizable ML solutions for live and streaming media.

Showing a sample of 132 resources. View the full list on GitHub →