
A curated list of Python libraries used for data science.

96 GitHub stars, 321 curated resources, 16 categories.
Categories: Machine Learning Frameworks, Scientific, Outlier Detection, Deep Learning Frameworks, Deep Learning Tools, Deep Learning Projects, Visualization, AutoML, Exploration, Feature Extraction, Trading, Misc, Deployment, Profiling, Python Tools, Data Gathering.

What's inside

Deployment

  • airflow

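    Workflow orchestration for ETL pipelines: author, schedule, and monitor tasks as directed acyclic graphs (DAGs).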

  • evidently

    Evidently helps evaluate machine learning models during validation and monitor them in production.

  • kubeflow

    Machine Learning Toolkit for Kubernetes.

  • lore

    Lore makes machine learning approachable for Software Engineers and maintainable for Machine Learning Researchers.

  • mlflow

    Open source platform for the complete machine learning lifecycle.

  • onnx

    Open Neural Network Exchange.
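
Orchestrators such as airflow model a pipeline as a DAG of tasks, each of which runs only after the tasks it depends on. The sketch below shows just that ordering idea with Python's standard-library graphlib; it does not use Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
}

def run_order(dag):
    """Return one valid execution order in which every task follows its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())

order = run_order(pipeline)
print(order)  # both extract tasks first (in either order), then transform, then load
```

A real orchestrator adds scheduling, retries, and parallel execution of independent tasks (here, the two extract steps) on top of this ordering.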

Feature Extraction

  • albumentations (Images and Video)

    Fast image augmentation library.

  • Augmentor (Images and Video)

    Image augmentation library.

  • BERT-pytorch (Text/NLP)

    PyTorch implementation of Google AI's 2018 BERT model.

  • BlingFire (Text/NLP)

    A lightning fast Finite State machine and REgular expression manipulation library.

  • categorical-encoding (General Feature Extraction)

    scikit-learn-compatible categorical variable encoders.

  • Causality (Time Series)

    Causal analysis.
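
Encoders in the style of categorical-encoding follow scikit-learn's fit/transform convention: learn the categories once, then map any input onto them. A minimal one-hot encoder in plain Python illustrates that convention; the class name OneHotSketch and its treatment of unseen categories (an all-zero row) are assumptions for illustration, not the package's API.

```python
class OneHotSketch:
    """Hypothetical one-hot encoder with a scikit-learn-style fit/transform interface."""

    def fit(self, values):
        # Learn the sorted set of categories seen during training.
        self.categories_ = sorted(set(values))
        self._index = {cat: i for i, cat in enumerate(self.categories_)}
        return self

    def transform(self, values):
        rows = []
        for v in values:
            row = [0] * len(self.categories_)
            # Assumption: categories not seen in fit() become an all-zero row.
            if v in self._index:
                row[self._index[v]] = 1
            rows.append(row)
        return rows

    def fit_transform(self, values):
        return self.fit(values).transform(values)

enc = OneHotSketch()
print(enc.fit_transform(["red", "green", "red"]))  # [[0, 1], [1, 0], [0, 1]]
print(enc.transform(["blue"]))                      # [[0, 0]]
```

Separating fit from transform is what lets the same learned mapping be applied consistently to training, validation, and production data.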

Exploration

  • alibi

    Algorithms for monitoring and explaining machine learning models.

  • alibi-detect

    Open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series.

  • cleanlab

    Finding label errors in datasets and learning with noisy labels.

  • dabl

    Data Analysis Baseline Library.

  • Dora

    Exploratory data analysis.

  • dtale

    Flask/React client for visualizing pandas data structures.
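
Drift detectors such as those in alibi-detect compare a reference sample with newly arriving data. One common test for a single numeric feature is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. The pure-Python sketch below computes only that statistic; the function name and example data are hypothetical, and it omits the p-values and multi-feature corrections a real detector would apply.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: max over x of |F_ref(x) - F_cur(x)|."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed values,
    # so the maximum gap is attained at one of the pooled observations.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n  # fraction of reference values <= x
        f_cur = bisect_right(cur, x) / m  # fraction of current values <= x
        d = max(d, abs(f_ref - f_cur))
    return d

baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
print(ks_statistic(baseline, baseline))                    # 0.0
print(ks_statistic(baseline, [1.1, 1.2, 1.3, 1.4, 1.5]))  # 1.0
```

A value near 0 means the two samples look alike; a value near 1 means they barely overlap, which a monitor would flag as drift once it exceeds a chosen threshold.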

Deep Learning Tools

  • allennlp

    NLP research library.

  • DLTK

    Deep Learning Toolkit for Medical Image Analysis.

  • Edward

    Probabilistic programming language in TensorFlow.

  • einops

    Readable, flexible tensor operations (rearrange, reduce, repeat) for deep learning frameworks.

  • foolbox

    Python toolbox to create adversarial examples that fool neural networks.

  • gluon-nlp

    NLP made easy.

Visualization

  • altair

    Declarative statistical visualization.

  • bokeh

    Interactive web plotting.

  • bqplot

    Plotting library for IPython/Jupyter Notebooks.

  • cufflinks

    Productivity Tools for Plotly + Pandas.

  • dash

    Interactive Web plotting.

  • datashader

    Graphics pipeline system for rendering large datasets quickly.

Misc

  • annoy (General Feature Extraction)

    Approximate Nearest Neighbors.

  • crayon

    A language-agnostic interface to TensorBoard.

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

  • fbpca

    Fast Randomized PCA/SVD.

  • mmh3

    MurmurHash3, a set of fast and robust hash functions.

  • pipeline

    Standard runtime for real-time machine learning.
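
mmh3 wraps the MurmurHash3 family of non-cryptographic hash functions. To show what the 32-bit variant (MurmurHash3_x86_32) computes, here is a pure-Python sketch. It is much slower than the mmh3 C extension and returns an unsigned 32-bit integer, whereas mmh3.hash returns a signed value by default.

```python
MASK32 = 0xFFFFFFFF

def _rotl32(x, r):
    # 32-bit left rotation.
    return ((x << r) | (x >> (32 - r))) & MASK32

def _fmix32(h):
    # Final avalanche step: spread every input bit across the output.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h

def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32 of `data`, returned as an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix in the remaining 0-3 bytes.
    tail = data[4 * n_blocks :]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length, then avalanche.
    h ^= len(data)
    return _fmix32(h)

print(murmur3_32(b""))          # 0
print(hex(murmur3_32(b"", 1)))  # 0x514e28b7
```

To compare against mmh3's signed output, subtract 2**32 from any result of 2**31 or more.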

Scientific

  • astropy

    Astronomy and astrophysics.

  • Biopython

    Tools for biological computation.

  • blaze

    NumPy and Pandas for databases.

  • cvxpy

    Python-embedded modeling language for convex optimization problems.

  • dask

    Parallel computing with task scheduling.

  • nilearn

    Statistical analysis and machine learning for neuroimaging.

AutoML

  • autokeras

    Automated machine learning in Keras.

  • auto_ml

    Automated machine learning.

  • auto-sklearn

    Automated machine learning built on scikit-learn.

  • devol

    Automated deep neural network design via genetic programming.

  • featuretools

    Automated feature engineering.

  • MLBox

    Automated machine learning Python library.
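
featuretools generates features by applying aggregation primitives (count, sum, mean, and similar) across related tables. A one-level version of that idea in plain Python is sketched below; the table layout, column names, the helper aggregate_features, and the "PRIMITIVE(column)" naming are hypothetical and do not reflect featuretools' API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: one row per transaction, linked to a customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]

# Aggregation primitives applied to each customer's group of values.
PRIMITIVES = {"count": len, "sum": sum, "mean": mean, "max": max}

def aggregate_features(rows, key, column, primitives=PRIMITIVES):
    """Return {key_value: {"PRIMITIVE(column)": value}} for each parent entity."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {
        k: {f"{name.upper()}({column})": fn(vals) for name, fn in primitives.items()}
        for k, vals in groups.items()
    }

features = aggregate_features(transactions, "customer_id", "amount")
print(features[1])  # {'COUNT(amount)': 2, 'SUM(amount)': 55.0, 'MEAN(amount)': 27.5, 'MAX(amount)': 35.0}
```

Automated feature engineering scales this by enumerating many primitives and stacking them across several levels of related tables.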

This page shows a sample of the 321 resources; the full list is on GitHub.