awesome-data-science-and-engineering
github.com/dineshkarthik/awesome-data-science-and-engineering
A curated list of Data Science and Engineering frameworks, tools, libraries, and related tutorials.
What's inside
Libraries
- Aggregation and grouping
aggregation (such as min, max, sum, count, etc.) and grouping.
- Alembic and PostgreSQL
Running simple migrations on PostgreSQL using Alembic.
- Array manipulation
changing shapes, transpose-like operations, changing the number of dimensions, changing the kind of array, joining, splitting, and tiling arrays, adding, removing, and rearranging elements
- Arrays and Vectorized Computation
ndarrays, array & scalar operations, transposing arrays and swapping axes
- Auto-generate migrations
Alembic can view the status of the database and compare against the table metadata in the application, generating the “obvious” migrations based on a comparison.
- Basics
Reading files into data frames and selecting.
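The array-oriented entries above (array manipulation, vectorized computation) can be illustrated with a short sketch. It assumes the linked tutorials use NumPy, which this excerpt does not state outright; the variable names are illustrative only.

```python
import numpy as np  # assumption: the array tutorials above target NumPy

a = np.arange(6)                     # 1-D array: [0 1 2 3 4 5]

# Changing shape and number of dimensions
m = a.reshape(2, 3)                  # 2 rows x 3 columns
col = a[:, np.newaxis]               # add an axis: shape (6, 1)

# Transpose-like operations
t = m.T                              # shape (3, 2)

# Joining, splitting, and tiling
joined = np.concatenate([m, m], axis=0)      # stack rows: shape (4, 3)
top, bottom = np.split(joined, 2, axis=0)    # two (2, 3) halves
tiled = np.tile(a[:2], 3)                    # repeat [0 1] three times

# Vectorized array and scalar operations (no explicit Python loop)
scaled = m * 2 + 1

# Simple aggregation along an axis
col_sums = m.sum(axis=0)             # one sum per column
```

Each call returns a new array (or a view) rather than modifying `a` in place, which is why the intermediate results are bound to separate names.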
Tools
- Basic Features
- Documentation
- Getting started
a small dive into basic features
- GitHub Repo
- GPU Notebooks
- Importing Libraries
Frameworks
- Custom Operator and Sensor
write your own operator and sensor.
- Deep Dive into Airflow on Kubernetes executor
Running Airflow reliably with Kubernetes.
- Executors
explanation of the different types of executors
- Getting started
simple Airflow setup and running DAGs.
- Homepage
- Installation
Big Data
- Documentation
- Example scripts
set of example scripts from the Apache Spark GitHub repo.
- Getting Started
overview of SparkContext, SQLContext, Spark ML, basic operations, data processing.
- Homepage
- Installation
- Introduction to Dataframes in PySpark
creating dataframes, commonly used functions on dataframes.
Showing a sample of 67 resources. View the full list on GitHub →