awesome-data-synthesis
github.com/joofio/awesome-data-synthesis
A curated list of awesome resources for creating synthetic data
What's inside
Process-driven methods
- BadMedicine (Tabular)
- bindata (Tabular)
- charlatan (Tabular)
- conjurer (Tabular)
R Package to generate synthetic data.
- datasynthR (Tabular)
- fabricatr (Tabular)
Data-driven methods
- bayesian-synthetic-generator (Tabular)
Repository of a software system for generating synthetic personal data based on the Bayesian network block structure
- Bn-learn Latent Model (Tabular)
Generating High-Fidelity Synthetic Patient Data for Assessing Machine Learning Healthcare Software
- bnomics (Tabular)
Synthetic data generation with probabilistic Bayesian Networks
- CLGP (Tabular)
The categorical latent Gaussian process (CLGP) is a generative model for multivariate categorical data.
- COR-GAN (Tabular)
Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records
- CTGAN (Tabular)
CTGAN is a GAN-based data synthesizer that can generate synthetic tabular data with high fidelity.
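
Several of the entries above build on Bayesian networks. As a rough illustration of the idea only (not the implementation of any listed tool), the sketch below fits a two-variable network A -> B by counting and then draws synthetic rows by ancestral sampling. The function names `fit_chain` and `sample_chain` and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_chain(rows):
    """Estimate P(A) and P(B | A) from (a, b) pairs by simple counting."""
    marginal = Counter(a for a, _ in rows)
    conditional = defaultdict(Counter)
    for a, b in rows:
        conditional[a][b] += 1
    return marginal, conditional


def sample_chain(marginal, conditional, n, seed=0):
    """Ancestral sampling: draw the parent A first, then B given A."""
    rng = random.Random(seed)
    a_values, a_weights = zip(*marginal.items())
    out = []
    for _ in range(n):
        a = rng.choices(a_values, weights=a_weights)[0]
        b_values, b_weights = zip(*conditional[a].items())
        b = rng.choices(b_values, weights=b_weights)[0]
        out.append((a, b))
    return out


# Hypothetical toy records: (smoker, diagnosis). Not real data.
real = (
    [("yes", "copd")] * 30
    + [("yes", "healthy")] * 20
    + [("no", "copd")] * 5
    + [("no", "healthy")] * 45
)

marginal, conditional = fit_chain(real)
synthetic = sample_chain(marginal, conditional, n=1000)
```

Real tools learn the network structure as well as the probability tables, and handle many columns and continuous values; this sketch fixes the structure by hand to keep the sampling step visible.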
Metrics and dataset evaluation
- datagene
- SDGym
Synthetic Data Gym (SDGym) is a framework to benchmark the performance of synthetic data generators for tabular data. SDGym is a project of the Data to AI Laboratory at MIT.
- SDMetrics
- SDV evaluation functions (Multiple formats)
- Statistical-Similarity-Measurement
A methodology designed to validate the statistical similarity of synthetic data generated by GAN models. The metrics include autoencoder, PCA, t-SNE, KL divergence, clustering, and cosine similarity.
- table-evaluator
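
Two of the similarity metrics named above, KL divergence and cosine similarity, can be sketched for a single categorical column as follows. This is a minimal standard-library illustration, not the code of any listed package; the helper names, the smoothing constant `eps`, and the toy columns are assumptions made for the example.

```python
import math
from collections import Counter


def distribution(values, categories, eps=1e-9):
    """Relative frequency of each category, lightly smoothed to avoid zeros."""
    counts = Counter(values)
    total = len(values) + eps * len(categories)
    return [(counts[c] + eps) / total for c in categories]


def kl_divergence(p, q):
    """KL(p || q) in nats; 0 when the two distributions are identical."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors; 1 means same direction."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


# Hypothetical toy columns: one real, one synthetic. Not real data.
real = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
synthetic = ["a"] * 45 + ["b"] * 35 + ["c"] * 20

categories = sorted(set(real) | set(synthetic))
p = distribution(real, categories)
q = distribution(synthetic, categories)

kl = kl_divergence(p, q)
cos = cosine_similarity(p, q)
print(f"KL divergence: {kl:.4f}, cosine similarity: {cos:.4f}")
```

A lower KL divergence and a cosine similarity closer to 1 both suggest the synthetic column tracks the real one more closely. Per-column checks like these do not capture relationships between columns, which is why the listed tools combine several metrics.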
Showing a sample of 74 resources. View the full list on GitHub →