
📝 An awesome Data Science repository to learn and apply for real world problems.

GitHub Stars: 29k
Curated Resources: 904
Categories: 8
Last Refreshed: 1 hour ago
What is Data Science? · Agents · Training Resources · The Data Science Toolbox · Literature and Media · Socialize · Fun · Other Awesome Lists

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me tutorials resources from awesome-datascience"


What's inside

Other Awesome Lists

Fun

Literature and Media

The Data Science Toolbox

  • AdaBoost (Comparison)

  • Adaptive resonance theory (Comparison)

  • Aerosolve (Miscellaneous Tools)

    A machine learning package built for humans.

  • AI for Database (Miscellaneous Tools)

    Chat with your database in natural language without writing SQL. Get instant insights, build self-refreshing dashboards, and trigger automated workflows based on database changes.

  • Albumentations (Miscellaneous Tools)

    A fast, framework-agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, and detection out of the box. It has been used to win a number of deep learning competitions at Kaggle, Topcoder, and CVPR workshops.

  • altair (Deep Learning Packages)
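To make the Albumentations entry concrete, here is a minimal, library-free sketch of the core idea behind a segmentation-ready augmentation pipeline. It is not Albumentations' API: each transform fires with a given probability, and the same geometric transform is applied to both the image and its mask so the labels stay aligned. Images are assumed to be plain nested lists of pixel values, and `hflip`, `vflip`, and `make_pipeline` are hypothetical helpers.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: a grayscale image or mask as rows of ints.
Image = List[List[int]]
Transform = Callable[[Image], Image]


def hflip(img: Image) -> Image:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Reverse the order of rows (top-to-bottom)."""
    return img[::-1]


def make_pipeline(transforms: List[Tuple[Transform, float]], seed: int = 0):
    """Build an augmenter that applies each (transform, probability) pair in order."""
    rng = random.Random(seed)  # seeded so runs are reproducible

    def apply(image: Image, mask: Image) -> Tuple[Image, Image]:
        for fn, p in transforms:
            if rng.random() < p:
                # Apply the same geometric change to image and mask together,
                # otherwise the segmentation labels would no longer match the pixels.
                image, mask = fn(image), fn(mask)
        return image, mask

    return apply
```

With probabilities of 1.0 and 0.0, `make_pipeline([(hflip, 1.0), (vflip, 0.0)])` always flips horizontally and never vertically, which makes the behavior easy to check; real pipelines use intermediate probabilities so each epoch sees different variants.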

Agents

  • ADK-Rust (Frameworks)

    Production-ready AI agent development kit for Rust with model-agnostic design (Gemini, OpenAI, Anthropic), multiple agent types (LLM, Graph, Workflow), MCP support, and built-in telemetry.

  • ai-evaluation (Tools)

    Open-source LLM and agent evaluation framework with 50+ metrics, LLM-as-Judge augmentation, and guardrail scanners (jailbreak, PII, prompt-injection). Useful for scoring RAG outputs, agent trajectories, and function-calling behavior in data-science workflows.

  • Arch Tools (Tools)

    61 production-ready AI API tools for data science workflows: code analysis, web scraping, NLP, image generation, crypto data, and search. Supports both a REST API and MCP.

  • BGPT MCP (Research & Knowledge Retrieval)

    MCP server that gives AI agents access to a database of scientific papers built from raw experimental data extracted from full-text studies. Returns 25+ structured fields per paper including methods, results, sample sizes, and quality scores.

  • CAJAL (Tools)

    Local AI agent for generating publication-ready scientific papers with real arXiv citations, IMRaD structure, and tribunal scoring. Runs 100% offline via Ollama with 4B-9B models. MIT licensed.

  • Chunk Tuner (Research & Knowledge Retrieval)

    Open-source Python library and MCP server to benchmark document chunking strategies for RAG, score retrieval quality, and recommend configurations for a corpus.
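As a rough illustration of what "benchmarking a chunking strategy" for RAG can mean, here is a hypothetical, stdlib-only sketch; it is not Chunk Tuner's API. It splits a document into fixed-size word windows with overlap, retrieves the chunk that shares the most words with a query, and scores the strategy by hit rate (the fraction of queries whose expected answer text lands in the retrieved chunk). The function names, the word-overlap retriever, and the hit-rate metric are all assumptions chosen for simplicity.

```python
from typing import List, Tuple


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end of the text
            break
    return chunks


def best_chunk(chunks: List[str], query: str) -> str:
    """Toy retriever: pick the chunk with the most words in common with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))


def hit_rate(chunks: List[str], qa_pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (query, answer) pairs whose answer appears in the retrieved chunk."""
    hits = sum(1 for q, a in qa_pairs if a in best_chunk(chunks, q))
    return hits / len(qa_pairs)
```

Running `hit_rate` for several `(size, overlap)` settings on the same question set gives a simple way to compare configurations; a real benchmark would swap the word-overlap retriever for embeddings or BM25.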

Socialize

What is Data Science?

  • a very short history of #datascience

    The story of how data scientists became sexy is mostly the story of the coupling of the mature discipline of statistics with a very young one: computer science. The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists, and others for years. The following timeline traces the evolution of the term “Data Science” and its use, attempts to define it, and related terms.

Showing a sample of the 904 resources. The full list is available on GitHub.