Open datasets in Healthcare

9 GitHub Stars · 49 Curated Resources · 10 Categories · Last refreshed 23 hours ago
  • 🏥 Clinical & Electronic Health Records (EHR)
  • 🩻 Medical Imaging & Radiology
  • 🧬 Genomics & Bioinformatics
  • 📈 Physiological Signals & Wearables
  • 📝 Medical NLP & Text
  • 📋 Clinical Cases & Multimodal Data
  • 🚨 Emergency & Synthetic Healthcare Data
  • 🇧🇷 Brazilian Public Health Data
  • 📊 Epidemiology & General Health
  • 📚 Dataset Platforms & Repositories

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me brain & neurology resources from awesome-health-datasets"

Installation instructions →

What's inside

🧬 Genomics & Bioinformatics

  • 1000 Genomes Project

    Whole-genome sequencing for population genetics and variant analysis.

  • gnomAD

    Large-scale variant database with 140,000+ exomes & genomes.

  • The Cancer Genome Atlas (TCGA)

    Genomic, epigenomic, transcriptomic, and proteomic data across 33 cancer types.

  • UK Biobank

    In-depth genetic and health information from ~500,000 volunteer participants.

🩻 Medical Imaging & Radiology

🏥 Clinical & Electronic Health Records (EHR)

  • All of Us (NIH)

    Diverse EHR & genomics dataset focusing on underrepresented populations.

  • eICU Collaborative Research Database

    Multi-center critical care dataset of clinical data from over 200,000 ICU stays.

  • HCUP (U.S. Hospitalization Data)

    Nationwide inpatient & emergency data for healthcare utilization and cost analysis.

  • MIMIC-III-Ext-Notes

    Extended clinical notes from MIMIC-III critical care database. v1.0.0 (2026).

  • MIMIC-III-Ext-PPG

    PPG (photoplethysmography) benchmark dataset for cardiorespiratory analysis from MIMIC-III. v1.0.0 (2026).

  • MIMIC-IV

    Comprehensive critical care dataset containing de-identified health records of ICU patients (2008–2019).

📝 Medical NLP & Text

  • Augmented Clinical Notes (Asclepius)

    167k synthetic clinical notes with discharge summaries and comprehensive medical histories (HuggingFace, 2026).

  • BioASQ

    Biomedical question answering and semantic indexing dataset.

  • CORD-19

    Scientific literature corpus with 1M+ COVID-19 research papers.

  • Med_Dataset

    100k real doctor-patient interactions across medical specialties, including diagnoses and treatment recommendations (HuggingFace, 2026).

  • MedFit Dataset

    6,444 healthcare Q&A pairs designed for fine-tuning medical chatbot language models (HuggingFace, 2026).

  • Medical Medicine Dataset

    Comprehensive drug information: 700 medications with therapeutic uses, side effects, and descriptions for medical chatbots and clinical decision support (HuggingFace, 2026).

📈 Physiological Signals & Wearables

  • Bridge2AI-Voice Pediatric Dataset

    Pediatric voice recordings for speech analysis and voice disorder detection (v1.0.0, 2026).

  • Endometriosis Symptoms Monitoring Database

    Daily symptom tracking from 34 endometriosis patients over 1–10 months, including symptom frequency, intensity, and MedDRA coding (2026).

  • GRABMyoFlow

    sEMG (surface electromyography) dataset with 63 subjects and dynamic transitions for hand gesture recognition and biometric authentication (2026).

  • MIT-BIH Arrhythmia Database

    Standard test material for evaluation of arrhythmia detectors.

  • NSRR Sleep Datasets

    Polysomnography & sleep signals for sleep disorder detection.

  • PhysioNet

    Renowned repository of physiological signals and open medical data.

📋 Clinical Cases & Multimodal Data

  • CNTXTAI Medical Case Studies

    Diverse clinical cases from chronic diseases (MS, diabetes complications) to acute conditions (heart disease, respiratory failure) from academic publications (HuggingFace, 2026).

🇧🇷 Brazilian Public Health Data

📊 Epidemiology & General Health

Showing a sample of the 49 resources. View the full list on GitHub →