awesome-synthetic-data
github.com/gretelai/awesome-synthetic-data
📖 A curated list of resources dedicated to synthetic data
What's inside
Libraries
- AirSim (Simulation)
AirSim is a simulator for drones, cars and more, built on Unreal and Unity engines.
- Contrastive Unpaired Translation (Image)
Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN.
- Denoising Diffusion Pytorch (Image)
Implementation of Denoising Diffusion Probabilistic Models (DDPM) in PyTorch.
- gretel-synthetics (Text, Tabular and Time-Series)
Generative models for structured and unstructured text, tabular, and multivariate time-series data, featuring differentially private learning.
- Jukebox (Audio)
OpenAI's Jukebox, a generative model for music.
- Nvidia Dataset Synthesizer (Simulation)
NDDS is a UE4 plugin from NVIDIA to empower computer vision researchers to export high-quality synthetic images with metadata.
Tutorials
- Annotated Diffusion (Reading Content)
Tutorial on the original diffusion model paper, with code.
- Learning to Generate Data by Estimating Gradients of the Data Distribution (Diffusion Models)
Video by Yang Song from Stanford. Excellent theory and interesting applications.
- The Unreasonable Effectiveness of Recurrent Neural Networks (Reading Content)
Andrej Karpathy's intro to RNNs.
Datasets
- Awesome Public Datasets
Topic-centric, high-quality public data sources.
- Data.gov
U.S. Government's open data
- Google Cloud Public Datasets
Publicly available and free machine learning and analytics datasets.
- Google Research Dataset Search
Discover datasets hosted in thousands of repositories across the web
- HuggingFace Datasets
Library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks.
- Kaggle Datasets
Data science and machine learning datasets.
Academic Papers
Services
- List of Synthetic Data Startups in 2021
Not all of these necessarily have APIs.