
A curated list of awesome resources for creating synthetic data

GitHub Stars: 45
Curated Resources: 74
Categories: 3
Last Refreshed: 21 hours ago
Data-driven methods · Process-driven methods · Metrics and dataset evaluation

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me tabular resources from awesome-data-synthesis"


What's inside

Process-driven methods

Data-driven methods

  • bayesian-synthetic-generator (Tabular)

    Repository of a software system for generating synthetic personal data based on the Bayesian network block structure.

  • Bn-learn Latent Model (Tabular)

    Generating High-Fidelity Synthetic Patient Data for Assessing Machine Learning Healthcare Software

  • bnomics (Tabular)

    Synthetic data generation with probabilistic Bayesian networks.

  • CLGP (Tabular)

    The categorical latent Gaussian process (CLGP) is a generative model for multivariate categorical data.

  • COR-GAN (Tabular)

    Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records

  • CTGAN (Tabular)

    CTGAN is a GAN-based data synthesizer that can generate synthetic tabular data with high fidelity.
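
Several of the entries above generate tabular records from a Bayesian network. As a rough illustration of the general idea only (not the code of any listed project), the sketch below draws synthetic rows by ancestral sampling: each parent variable is sampled before its children. The two-node network, the column names, and every probability are hypothetical values chosen for the example.

```python
import random

# Hypothetical two-node network: age_group -> has_condition.
# All probabilities below are made up for illustration.
P_AGE = {"young": 0.5, "middle": 0.3, "old": 0.2}
P_CONDITION_GIVEN_AGE = {
    "young": {"yes": 0.05, "no": 0.95},
    "middle": {"yes": 0.15, "no": 0.85},
    "old": {"yes": 0.40, "no": 0.60},
}


def draw(distribution, rng):
    """Draw one category from a {value: probability} mapping."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_rows(n, seed=0):
    """Ancestral sampling: sample each parent before its children."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = draw(P_AGE, rng)
        condition = draw(P_CONDITION_GIVEN_AGE[age], rng)
        rows.append({"age_group": age, "has_condition": condition})
    return rows


if __name__ == "__main__":
    for row in sample_rows(5):
        print(row)
```

In a real generator the network structure and the conditional probability tables would be learned from the source data rather than written by hand.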

Metrics and dataset evaluation

  • datagene

  • SDGym

    Synthetic Data Gym (SDGym) is a framework to benchmark the performance of synthetic data generators for tabular data. SDGym is a project of the Data to AI Laboratory at MIT.

  • SDMetrics

  • SDV evaluation functions (Multiple formats)

  • Statistical-Similarity-Measurement

    A methodology designed to validate the statistical similarity of synthetic data generated by GAN models. The metrics include autoencoder, PCA, t-SNE, KL divergence, clustering, and cosine similarity.

  • table-evaluator
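
Two of the similarity measures named above, KL divergence and cosine similarity, can be sketched for a single categorical column as follows. This is a minimal standard-library illustration, not the implementation used by any listed tool; the function names and the smoothing constant are assumptions made for the example.

```python
import math
from collections import Counter


def category_distribution(values, categories, smoothing=1e-9):
    """Relative frequency of each category, lightly smoothed to avoid zeros."""
    counts = Counter(values)
    total = len(values) + smoothing * len(categories)
    return [(counts[c] + smoothing) / total for c in categories]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Toy columns; the values are hypothetical.
    real = ["a", "a", "b", "c"]
    synthetic = ["a", "b", "b", "c"]
    categories = sorted(set(real) | set(synthetic))
    p = category_distribution(real, categories)
    q = category_distribution(synthetic, categories)
    print("KL divergence:", kl_divergence(p, q))
    print("Cosine similarity:", cosine_similarity(p, q))
```

A KL divergence near zero and a cosine similarity near one suggest the synthetic column reproduces the real column's category frequencies. Neither score says anything about relationships between columns.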

Showing a sample of 74 resources. View the full list on GitHub →