Open-Source Healthcare AI

A Python toolkit for clinical deep learning — unifying datasets, tasks, and models across electronic health records, physiological signals, and medical imaging.

pip install pyhealth

20+ Clinical Datasets
26+ Clinical Tasks
33+ ML Models
400+ Discord Members
10+ Active Researchers
39× Faster than pandas
20× Less memory than pandarallel

5-Stage Pipeline in <15 Lines

The same pattern works for any task — swap the dataset and task class to move between mortality, readmission, drug recommendation, or imaging.

Mortality prediction on MIMIC-III

from pyhealth.datasets import MIMIC3Dataset
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.datasets import split_by_patient, get_dataloader
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer

if __name__ == "__main__":
    # 1. Load data
    dataset = MIMIC3Dataset(root="data/", tables=["DIAGNOSES_ICD", "PROCEDURES_ICD"])
    samples = dataset.set_task(MortalityPredictionMIMIC3())

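    # 2. Split by patient and build one loader per split
    train_ds, val_ds, test_ds = split_by_patient(samples, [0.8, 0.1, 0.1])
    train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
    val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
    test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)

    # 3. Train, monitoring PR-AUC on the validation split, then evaluate on the test split
    model = Transformer(dataset=samples)
    trainer = Trainer(model=model)
    trainer.train(train_loader, val_loader, epochs=50, monitor="pr_auc")
    trainer.evaluate(test_loader)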
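
Splitting by patient rather than by sample matters because one patient can contribute several samples, and letting them land in both the training and test splits leaks information. The sketch below illustrates that idea with the standard library only. It is not PyHealth's implementation: the helper name `split_patient_ids`, the `patient_id` key, and the toy samples are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_patient_ids(samples, ratios, seed=0):
    """Partition sample indices so each patient falls into exactly one split.

    Hypothetical helper for illustration; not part of PyHealth.
    """
    # Group sample indices by patient.
    by_patient = defaultdict(list)
    for idx, sample in enumerate(samples):
        by_patient[sample["patient_id"]].append(idx)

    # Shuffle patients (not samples) deterministically.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Cut the shuffled patient list at the train and validation boundaries;
    # whatever remains becomes the test split.
    n = len(patient_ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    groups = (
        patient_ids[:n_train],
        patient_ids[n_train:n_train + n_val],
        patient_ids[n_train + n_val:],
    )

    # Expand each patient group back into sample indices.
    return [[i for pid in group for i in by_patient[pid]] for group in groups]


# Toy data: 10 patients with 3 visits each (made-up values).
toy_samples = [
    {"patient_id": f"p{p}", "visit": v} for p in range(10) for v in range(3)
]
train_idx, val_idx, test_idx = split_patient_ids(toy_samples, [0.8, 0.1, 0.1])
```

Because the shuffle operates on patient IDs, all visits of a given patient always end up in the same split, whatever the seed.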

Built for Scale — Efficient by Design

Benchmarked on MIMIC-IV with 4 parallel workers. PyHealth 2.0 uses a memory-mapped architecture that adapts to the available hardware, from 2 cores to 64, so it scales without manual memory management and you can focus on the ML rather than the infrastructure. Pandas values (†) are shown at 1 worker only, because it could not scale beyond single-threaded execution.
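
For readers unfamiliar with the term, memory mapping generally means that a file on disk is exposed as if it were in memory, and the operating system pages in only the parts that are actually touched. The sketch below shows the general technique with Python's standard `mmap` module. It is not a description of PyHealth's internals: the file name, the float64 layout, and the `read_rows` helper are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

# Write 1,000 float64 values to a scratch file (made-up data).
values = [float(i) for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "features.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(values)}d", *values))

ITEM_SIZE = struct.calcsize("<d")  # 8 bytes per float64


def read_rows(path, start, stop):
    """Return values[start:stop] through a memory map (hypothetical helper).

    Only the pages covering the requested byte range need to be loaded.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            chunk = mm[start * ITEM_SIZE:stop * ITEM_SIZE]
    return list(struct.unpack(f"<{stop - start}d", chunk))


window = read_rows(path, 500, 505)
```

The same access pattern applies when the file is far larger than RAM, which is why memory-mapped designs tend to keep peak memory roughly independent of dataset size.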

Drug Recommendation

Method          Wall time (h)   Peak memory (GB)
PyHealth 2.0    0.23            8.9
MEDS            2.91            61.7
Pandas †        1.31            10.4
PyHealth 1.16   0.27            83.2

Lower is better for both columns.

Length of Stay Prediction

Method          Wall time (h)   Peak memory (GB)
PyHealth 2.0    0.69            8.5
MEDS            3.32            61.8
Pandas †        2.85            11.4
PyHealth 1.16   0.64            80.9

Lower is better for both columns.

In-Hospital Mortality

Method          Wall time (h)   Peak memory (GB)
PyHealth 2.0    1.28            23.4
MEDS            2.96            60.0
Pandas †        26.03           49.2
PyHealth 1.16   0.88            187.2

Lower is better for both columns.

† Pandas measured at 1 worker; failed beyond single-threaded execution. All other methods at 4 workers on MIMIC-IV. Full benchmark in the PyHealth 2.0 paper.
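
To make the comparison easier to read, the figures above can be turned into ratios relative to PyHealth 2.0. The snippet below is a small convenience sketch: the numbers are copied from the benchmark on this page, while the `BENCHMARKS` dictionary and the `ratios_vs_baseline` helper are names introduced here for illustration. Keep in mind that the Pandas rows were measured at 1 worker, so those ratios are not a like-for-like comparison.

```python
# (wall time in hours, peak memory in GB), copied from the benchmark above.
BENCHMARKS = {
    "Drug Recommendation": {
        "PyHealth 2.0": (0.23, 8.9),
        "MEDS": (2.91, 61.7),
        "Pandas (1 worker)": (1.31, 10.4),
        "PyHealth 1.16": (0.27, 83.2),
    },
    "Length of Stay Prediction": {
        "PyHealth 2.0": (0.69, 8.5),
        "MEDS": (3.32, 61.8),
        "Pandas (1 worker)": (2.85, 11.4),
        "PyHealth 1.16": (0.64, 80.9),
    },
    "In-Hospital Mortality": {
        "PyHealth 2.0": (1.28, 23.4),
        "MEDS": (2.96, 60.0),
        "Pandas (1 worker)": (26.03, 49.2),
        "PyHealth 1.16": (0.88, 187.2),
    },
}


def ratios_vs_baseline(results, baseline="PyHealth 2.0"):
    """Return {method: (time ratio, memory ratio)} relative to the baseline."""
    base_time, base_mem = results[baseline]
    return {
        name: (round(t / base_time, 1), round(m / base_mem, 1))
        for name, (t, m) in results.items()
        if name != baseline
    }


for task, results in BENCHMARKS.items():
    print(task)
    for name, (time_x, mem_x) in ratios_vs_baseline(results).items():
        print(f"  {name}: {time_x}x wall time, {mem_x}x peak memory")
```

A ratio above 1 means the method used more time or memory than PyHealth 2.0; a ratio below 1 means it used less, as happens for PyHealth 1.16 wall time on two of the three tasks.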

How PyHealth Standardizes Clinical AI Development

A unified API from raw clinical data to trustworthy, interpretable models — follow the pipeline step by step.

Resources

Everything you need to get started or get involved.