awesome-health-datasets
github.com/fabianofilho/awesome-health-datasets
Open datasets in healthcare
What's inside
Genomics & Bioinformatics
- 1000 Genomes Project
Whole-genome sequencing for population genetics and variant analysis.
- gnomAD
Large-scale variant database with 140,000+ exomes & genomes.
- The Cancer Genome Atlas (TCGA)
Genomic, epigenomic, transcriptomic, and proteomic data across 33 cancer types.
- UK Biobank
In-depth genetic and health information from ~500,000 volunteer participants.
Medical Imaging & Radiology
- ADNI (Alzheimer's Disease Neuroimaging Initiative) [Brain & Neurology]
Longitudinal multisite study for early detection of Alzheimer's.
- CheXpert (Stanford) [Chest & Lungs]
~224,000 chest X-rays for weak supervision and uncertainty-aware classification.
- COVID-19 Radiography Database [Chest & Lungs]
Chest X-ray images for COVID-19 and pneumonia detection.
- HAM10000 [Oncology & Dermatology]
~10,000 dermoscopy images for melanoma detection and skin lesion classification.
- Human Connectome Project (HCP) [Brain & Neurology]
High-resolution neuroimaging datasets to map human brain connections.
- LIDC-IDRI [Chest & Lungs]
Lung CT scans focusing on pulmonary nodules.
Clinical & Electronic Health Records (EHR)
- All of Us (NIH)
Diverse EHR & genomics dataset focusing on underrepresented populations.
- eICU Collaborative Research Database
Multi-center critical care dataset of clinical data from over 200,000 ICU stays.
- HCUP (U.S. Hospitalization Data)
Nationwide inpatient & emergency data for healthcare utilization and cost analysis.
- MIMIC-III-Ext-Notes
Extended clinical notes from MIMIC-III critical care database. v1.0.0 (2026).
- MIMIC-III-Ext-PPG
PPG (photoplethysmography) benchmark dataset for cardiorespiratory analysis from MIMIC-III. v1.0.0 (2026).
- MIMIC-IV
Comprehensive critical care dataset containing de-identified health records of ICU patients (2008–2019).
Medical NLP & Text
- Augmented Clinical Notes (Asclepius)
167k synthetic clinical notes with discharge summaries and comprehensive medical histories (HuggingFace, 2026).
- BioASQ
Biomedical question answering and semantic indexing dataset.
- CORD-19
Scientific literature corpus with 1M+ COVID-19 research papers.
- Med_Dataset
100k real doctor-patient interactions across medical specialties, including diagnoses and treatment recommendations (HuggingFace, 2026).
- MedFit Dataset
6,444 healthcare Q&A pairs designed for fine-tuning medical chatbot language models (HuggingFace, 2026).
- Medical Medicine Dataset
Comprehensive drug information: 700 medications with therapeutic uses, side effects, and descriptions for medical chatbots and clinical decision support (HuggingFace, 2026).
Physiological Signals & Wearables
- Bridge2AI-Voice Pediatric Dataset
Pediatric voice recordings for speech analysis and voice disorder detection (v1.0.0, 2026).
- Endometriosis Symptoms Monitoring Database
Daily symptom tracking from 34 endometriosis patients over 1 to 10 months, including symptom frequency, intensity, and MedDRA coding (2026).
- GRABMyoFlow
sEMG (surface electromyography) dataset with 63 subjects and dynamic transitions for hand gesture recognition and biometric authentication (2026).
- MIT-BIH Arrhythmia Database
Standard test material for evaluation of arrhythmia detectors.
- NSRR Sleep Datasets
Polysomnography & sleep signals for sleep disorder detection.
- PhysioNet
Renowned repository of physiological signals and open medical data.
Clinical Cases & Multimodal Data
- CNTXTAI Medical Case Studies
Diverse clinical cases from chronic diseases (MS, diabetes complications) to acute conditions (heart disease, respiratory failure) from academic publications (HuggingFace, 2026).
Brazilian Public Health Data
- DATASUS / TABNET
Official health information system of the Brazilian Ministry of Health.
- Portal de Dados Abertos do SUS
Open data portal for the Brazilian Unified Health System (SUS).
Epidemiology & General Health
- Global Health Data Exchange (GHDx)
Comprehensive catalog of surveys, censuses, vital statistics, and other health-related data.
- Global Health Observatory (WHO)
WHO's gateway to health-related statistics for its 194 Member States.
- Pima Indians Diabetes Dataset
Tabular clinical data for diabetes risk prediction.
Showing a sample of 49 resources. View the full list on GitHub.