A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

GitHub Stars: 6
Curated Resources: 232
Categories: 20
Last Refreshed: 23 hours ago
  • Explaining Black Box Models and Datasets
  • Privacy Preserving Machine Learning
  • Model and Data Versioning
  • Model Deployment and Orchestration Frameworks
  • Adversarial Robustness Libraries
  • Neural Architecture Search
  • Data Science Notebook Frameworks
  • Industrial Strength Visualisation libraries
  • Industrial Strength NLP
  • Data Pipeline ETL Frameworks
  • Data Labelling Tools and Frameworks
  • Data Storage Optimisation
  • Function as a Service Frameworks
  • Computation load distribution frameworks
  • Model serialisation formats
  • Compiler optimisation frameworks
  • Data Stream Processing
  • Feature Engineering Automation
  • Feature Stores
  • Commercial Platforms

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me adversarial robustness libraries resources from awesome-production-machine-learning"

Installation instructions →

What's inside

Adversarial Robustness Libraries

  • AdvBox

    generate adversarial examples from the command line with zero coding using PaddlePaddle, PyTorch, Caffe2, MXNet, Keras, and TensorFlow. Includes 10 attacks and 6 defenses. Used to implement

  • Adversarial DNN Playground

    the attack library is limited in size, but it has a nice front-end to it with buttons you can press!

  • AdverTorch

    library for adversarial attacks / defenses specifically for PyTorch.

  • Alibi Detect

    alibi-detect is a Python package focused on outlier, adversarial and concept drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. The outlier detection methods should allow the user to identify global, contextual and collective outliers.

  • Artificial Adversary

  • CleverHans

    library for testing adversarial attacks / defenses maintained by some of the most important names in adversarial ML, namely Ian Goodfellow (ex-Google Brain, now Apple) and Nicolas Papernot (Google Brain). Comes with some nice tutorials!

Explaining Black Box Models and Datasets

  • Aequitas

    An open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive risk-assessment tools.

  • Alibi

    Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations.

  • anchor

    Code for the paper

  • captum

    model interpretability and understanding library for PyTorch developed by Facebook. It contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models.

  • casme

    Example of using classifier-agnostic saliency map extraction on ImageNet, presented in the paper

  • ContrastiveExplanation (Foil Trees)

    Python script for model agnostic contrastive/counterfactual explanations for machine learning. Accompanying code for the paper

Commercial Platforms

  • Algorithmia

    Cloud platform to build, deploy and serve machine learning models

  • Amazon SageMaker

    End-to-end machine learning development and deployment interface where you are able to build notebooks that use EC2 instances as backend, and then can host models exposed on an API

  • cnvrg.io

    An end-to-end platform to manage, build and automate machine learning

  • Comet.ml

    Machine learning experiment management. Free for open source and students

  • Dataiku

    Collaborative data science platform powering both self-service analytics and the operationalization of machine learning models in production.

  • DataRobot

    Automated machine learning platform which enables users to build and deploy machine learning models.

Data Storage Optimisation

  • Alluxio

    A virtual distributed storage system that bridges the gap between computation frameworks and storage systems.

  • Apache Arrow

    In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc

  • Apache Kafka

    Distributed streaming platform framework

  • Apache Parquet

    On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc

  • BayesDB

    Database that allows for built-in non-parametric Bayesian model discovery and querying of data through a database-like interface

  • ClickHouse

    ClickHouse is an open source column-oriented database management system supported by Yandex
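
Several entries above (Apache Arrow, Apache Parquet, ClickHouse) describe columnar or column-oriented data. As a rough illustration of that idea only, the sketch below pivots row records into per-column lists using plain Python. It does not use the Arrow, Parquet or ClickHouse APIs, and the field names and the helper `to_columns` are made up for the example.

```python
# Hypothetical row-oriented records; the field names are illustrative only.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.0},
    {"id": 3, "price": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to values."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field now reads a single contiguous list
# instead of visiting every field of every row.
total_price = sum(columns["price"])
print(columns)
print(total_price)
```

Real columnar formats add typed buffers, compression and metadata on top of this layout; the sketch only shows why single-column scans become cheap.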

Data Pipeline ETL Frameworks

  • Apache Airflow

    Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation

  • Apache NiFi

    Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic.

  • Azkaban

    Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.

  • Genie

    Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems

  • Luigi

    Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc

  • Neuraxle

    A framework for building neat pipelines, providing the right abstractions to chain your data transformation and prediction steps with data streaming, as well as doing hyperparameter searches (AutoML).
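
The Azkaban and Luigi entries above both mention resolving job order from dependencies. A minimal sketch of that general technique, a topological sort over a dependency graph, is shown below using Python's standard-library `graphlib` (Python 3.9+). It is not how any of the listed frameworks is implemented, and the job names and the helper `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs; each job maps to the set of jobs it depends on.
jobs = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean", "train"},
}


def execution_order(dependencies):
    """Return a run order in which every job comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(jobs)
print(order)
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which is roughly the point at which a scheduler would reject the workflow definition.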

Data Stream Processing

  • Apache Flink

    Open source stream processing framework with powerful stream and batch processing capabilities.

  • Apache Samza

    Distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

  • Brooklin

    Distributed service for near real-time data streaming at scale.

  • Faust

    Stream processing library built on top of Python's asyncio library, using an async Kafka client and inspired by the Kafka Streams library.

  • Kafka Streams

    Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters

Model and Data Versioning

  • Apache Marvin

  • Catalyst

    High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/ideas reusing.

  • D6tflow

    A Python library for building complex data science workflows.

  • DAGsHub

    The home for data science collaboration. A platform, based on DVC, for data science project management and collaboration.

  • Data Version Control (DVC)

    A tool that works alongside Git to allow for version management of models and data

  • FGLab

    Machine learning dashboard, designed to make prototyping experiments easier.

Function as a Service Frameworks

  • Apache OpenWhisk

    Open source, distributed serverless platform that executes functions in response to events at any scale.

  • Fission

    (Early Alpha) Serverless functions as a service framework on Kubernetes

  • Hydrosphere Mist

    Serverless proxy for Apache Spark clusters

  • Hydrosphere ML Lambda

    Open source model management cluster for deploying, serving and monitoring machine learning models and ad-hoc algorithms with a FaaS architecture

  • Knative Serving

    Kubernetes-based serverless microservices with "scale-to-zero" functionality.

  • OpenFaaS

    Serverless functions framework with RESTful API on Kubernetes

Showing a sample of 232 resources. View the full list on GitHub →