A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

21k GitHub Stars
538 Curated Resources
24 Categories
Last refreshed 17 hours ago
  • AutoML
  • Computation and Communication Optimisation
  • Data Annotation and Synthesis
  • Data Pipeline
  • Data Science Notebook
  • Data Storage Optimisation
  • Data Stream Processing
  • Deployment and Serving
  • Evaluation and Monitoring
  • Explainability and Fairness
  • Feature Store
  • Industry-strength Anomaly Detection
  • Industry Strength Computer Vision
  • Industry Strength Information Retrieval
  • Industry Strength Natural Language Processing
  • Industry Strength Recommender System
  • Industry Strength Reinforcement Learning
  • Industry Strength Robotics
  • Industry Strength Visualisation
  • Metadata Management
  • Model, Data and Experiment Management
  • Model Training and Orchestration
  • Model Storage Optimisation
  • Privacy and Safety

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me computation and communication optimisation resources from awesome-production-machine-learning"

Installation instructions →

What's inside

Computation and Communication Optimisation

  • Accelerate

    Accelerate abstracts exactly and only the boilerplate code related to multi-GPU/TPU/mixed-precision and leaves the rest of your code unchanged.

  • Adapters

    Adapters is a unified library for parameter-efficient and modular transfer learning.

  • BitBLAS

    BitBLAS is a library to support mixed-precision BLAS operations on GPUs.

  • bitsandbytes

    The bitsandbytes library is a lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and 8-bit and 4-bit quantization functions.

  • Cache-DiT

    Cache-DiT is built on top of Diffusers and supports nearly all DiTs, providing hybrid cache acceleration (DBCache, TaylorSeer, SCM, etc.) and comprehensive parallelism optimizations including Context Parallelism, Tensor Parallelism, and hybrid 2D/3D parallelism, with compatibility for compilation, CPU offloading, and quantization.

  • Colossal-AI

    A unified deep learning system for the big-model era that helps users deploy large AI model training and inference efficiently and quickly.

Industry Strength Reinforcement Learning

  • Acme

    Acme is a library of reinforcement learning (RL) building blocks that strives to expose simple, efficient, and readable agents.

  • AReaL

    AReaL is a reinforcement learning library.

  • ChatLearn

    ChatLearn is a flexible and efficient reinforcement learning training framework for large language models, supporting distributed training engines (FSDP2, Megatron) and inference engines (vLLM, SGLang) with modern RL algorithms such as GRPO and GSPO.

  • CleanRL

    CleanRL is a deep reinforcement learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they can scale to run thousands of experiments using AWS Batch.

  • CompilerGym

    CompilerGym is a library of easy-to-use and performant reinforcement learning environments for compiler tasks.

  • d3rlpy

    d3rlpy is an offline deep reinforcement learning library for practitioners and researchers.

Explainability and Fairness

  • Aequitas

    An open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive risk-assessment tools.

  • AI Explainability 360

    Interpretability and explainability of data and machine learning models including a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.

  • AI Fairness 360

    A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

  • Alibi

    Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations.

  • captum

    A model interpretability and understanding library for PyTorch developed by Facebook. It contains general-purpose implementations of integrated gradients, saliency maps, SmoothGrad, VarGrad and others for PyTorch models.

Deployment and Serving

  • Agenta

    Agenta provides end-to-end tools for the entire LLMOps workflow: building (LLM playground, evaluation), deploying (prompt and configuration management), and monitoring (LLM observability and tracing).

  • AirLLM

    AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation or pruning.

  • AITemplate

    AITemplate (AIT) is a Python framework that transforms deep neural networks into CUDA (NVIDIA GPU) / HIP (AMD GPU) C++ code for lightning-fast inference serving.

  • BentoML

    BentoML is an open source framework for high performance ML model serving.

  • BISHENG

    BISHENG is an open LLM application devops platform, focusing on enterprise scenarios.

  • DeepDetect

    Machine learning production server for TensorFlow, XGBoost and Caffe models, written in C++ and maintained by Jolibrain.

Industry Strength Robotics

  • AI2-THOR

    AI2-THOR is a near photo-realistic interactable framework for AI agents.

AutoML

  • AIDE

    AIDE is an open-source ML engineering agent that uses a tree search algorithm to autonomously explore, implement, and evaluate solution strategies for machine learning tasks.

  • AutoGluon

    Automated feature, model, and hyperparameter selection for tabular, image, and text data on top of popular machine learning libraries (Scikit-Learn, LightGBM, CatBoost, PyTorch, MXNet).

  • Autokeras

    AutoML library for Keras based on

  • auto-sklearn

    Framework to automate algorithm and hyperparameter tuning for sklearn.

  • Ax

    Ax is an accessible, general-purpose platform for understanding, managing, deploying, and automating adaptive experiments.

  • BoTorch

    BoTorch is a library for Bayesian Optimization built on PyTorch.

Privacy and Safety

  • AI Gateway

    The AI Gateway is a blazing fast AI Gateway with integrated guardrails.

  • AI Job Displacement Tracker

    Structured, source-backed dataset tracking 96 AI-attributed workforce reductions (457K workers affected, 13 countries, 13 sectors). Every entry includes source URLs, attribution tier, and job functions.

  • ART

    ART (Adversarial Robustness Toolbox) provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference.

  • Awesome AI Regulation

    Covers governance, compliance, and regulatory frameworks essential for responsible ML system deployment across different jurisdictions.

  • Awesome Production GenAI

    Focuses specifically on generative AI deployment, including LLM operations, prompt engineering, and GenAI-specific monitoring and safety tools.

  • Awesome RAG Production

    Curated list of production-grade tools and best practices for building scalable RAG systems.

Model, Data and Experiment Management

  • Aim

    A super-easy way to record, search and compare AI experiments.

  • ClearML

    Auto-Magical Experiment Manager & Version Control for AI (previously Trains).

  • DataHub

    DataHub is an open-source data catalog for the modern data stack.

  • Dolt

    Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.

  • DVC

    DVC (Data Version Control) is a Git-like version control tool that works alongside Git to manage versions of data and models.

Showing a sample of 538 resources. View the full list on GitHub →