A curated list of resources for testing, monitoring, and improving data quality across various data environments.

GitHub Stars: 1
Curated Resources: 46
Categories: 4
Last Refreshed: 17 hours ago
Frameworks and Libraries | Books and Methodologies | Tools | Articles and Guides

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me open source resources from awesome-data-quality"

What's inside

Articles and Guides

Books and Methodologies

  • Book

    By David Loshin.

  • Book

    By Carlo Batini and Monica Scannapieco.

  • Book

    By Arkady Maydanchik.

  • Book

    By Larry English.

  • Book

    By Danette McGilvray.

  • Resource

    From the Canadian Institute for Health Information.

Frameworks and Libraries

  • GitHub | Open Source

    Another data quality monitoring tool implemented using Spark.

  • GitHub | Open Source

    Enables data testing through extended SQL queries.

  • GitHub | Open Source

    Python library for assessing data quality at each stage of data pipeline development.

  • GitHub | Open Source

    Data monitoring and observability tailored to dbt.

  • GitHub | Commercial

    Metadata service for collecting, aggregating, and visualizing a data ecosystem's metadata.

  • GitHub | Open Source

    Data quality solution for distributed data systems at any scale, in both streaming and batch contexts.

Tools

  • GitHub | Open Source Tools

    Behavior-driven development tool for data quality testing.

  • GitHub | Open Source Tools

    Python library for data reliability.

  • GitHub | Open Source Tools

    Data transformation tool with built-in testing capabilities.

  • GitHub | Open Source Tools

    For defining unit tests for data.

  • GitHub | Open Source Tools

    Automates data quality checks.

  • GitHub | Open Source Tools

    Data validation and profiling.

Showing a sample of 46 resources. View the full list on GitHub →