awesome-auto-parallelism
github.com/connollyleon/awesome-auto-parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
What's inside
Data Parallelism + Intra-layer Model Parallelism (or Tensor Parallelism):
- AccPar
Tensor partitioning for heterogeneous deep learning accelerators.
- Double Recursive
A double recursive algorithm for searching parallelization strategies
- FlexFlow
A deep learning framework that accelerates distributed DNN training by automatically searching for efficient parallelization strategies
- PaSE
PaSE uses a dynamic-programming-based approach to find an efficient strategy within a reasonable time.
- ROC
Another paper from Zhihao Jia. Designed for GNNs
- TensorOpt
Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism
Data Parallelism + Model Parallelism (or Tensor Parallelism) + Pipeline Parallelism:
- Alpa
Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
- DistIR
Horizontal TP. An intermediate representation and simulator for efficient neural network distribution
- GSPMD
A system that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way
- Piper
This code package contains algorithms (proof-of-concept implementation) and input files (profiled DNN models / workloads) from the paper "Piper: Multidimensional Planner for DNN Parallelization", published at NeurIPS 2021. An extension of DNN-partitioning
Data Parallelism + Pipeline Parallelism (or Inter-layer Model Parallelism):
- Chimera
Efficiently training large-scale neural networks with bidirectional pipelines
- DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Models. A successor to GPipe
- DNN-partitioning
Published at NeurIPS 2020.
- FTPipe
FTPipe can automatically transform a sequential implementation into a multi-GPU one.
- HeterPS
Distributed deep learning with RL-based scheduling in heterogeneous environments.
- PipeDream
This repository contains the source code implementation of PipeDream and PipeDream-2BW
Other interesting automatic work:
- TASO
Automatically optimizes DNN computation with graph substitutions
Pipeline Parallelism or Inter-layer Model Parallelism only:
- torchgpipe
A GPipe implementation in PyTorch
- vPipe
A pipeline-only system designed for NAS networks. Complementary to hybrid parallelism