LATTE: Label-efficient incident phenotyping from longitudinal electronic health records

Jun Wen; Jue Hou; Clara Lea Bonzel; Yihan Zhao; Victor M. Castro; Vivian S. Gainer; Dana Weisenfeld; Tianrun Cai; Yuk Lam Ho; Vidul A. Panickan; Lauren Costa; Chuan Hong; J. Michael Gaziano; Katherine P. Liao; Junwei Lu; Kelly Cho; Tianxi Cai

doi:10.1016/j.patter.2023.100906

Biostatistics

Research output: Contribution to journal › Article › peer-review

Abstract

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.

Original language	English (US)
Article number	100906
Journal	Patterns
Volume	5
Issue number	1
DOIs	https://doi.org/10.1016/j.patter.2023.100906
State	Published - Jan 12 2024

Bibliographical note

Publisher Copyright:
© 2023 The Authors

Keywords

DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem

Cite this

Wen, J., Hou, J., Bonzel, C. L., Zhao, Y., Castro, V. M., Gainer, V. S., Weisenfeld, D., Cai, T., Ho, Y. L., Panickan, V. A., Costa, L., Hong, C., Gaziano, J. M., Liao, K. P., Lu, J., Cho, K., & Cai, T. (2024). LATTE: Label-efficient incident phenotyping from longitudinal electronic health records. Patterns, 5(1), Article 100906. https://doi.org/10.1016/j.patter.2023.100906

@article{726cd6f8a20245bda772eb1c450e903b,
  title = "LATTE: Label-efficient incident phenotyping from longitudinal electronic health records",
  abstract = "Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.",
  keywords = "DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem",
  author = "Jun Wen and Jue Hou and Bonzel, {Clara Lea} and Yihan Zhao and Castro, {Victor M.} and Gainer, {Vivian S.} and Dana Weisenfeld and Tianrun Cai and Ho, {Yuk Lam} and Panickan, {Vidul A.} and Lauren Costa and Chuan Hong and Gaziano, {J. Michael} and Liao, {Katherine P.} and Junwei Lu and Kelly Cho and Tianxi Cai",
  note = "Publisher Copyright: {\textcopyright} 2023 The Authors",
  year = "2024",
  month = jan,
  day = "12",
  doi = "10.1016/j.patter.2023.100906",
  language = "English (US)",
  volume = "5",
  journal = "Patterns",
  issn = "2666-3899",
  publisher = "Cell Press",
  number = "1",
}

TY  - JOUR
T1  - LATTE
T2  - Label-efficient incident phenotyping from longitudinal electronic health records
AU  - Wen, Jun
AU  - Hou, Jue
AU  - Bonzel, Clara Lea
AU  - Zhao, Yihan
AU  - Castro, Victor M.
AU  - Gainer, Vivian S.
AU  - Weisenfeld, Dana
AU  - Cai, Tianrun
AU  - Ho, Yuk Lam
AU  - Panickan, Vidul A.
AU  - Costa, Lauren
AU  - Hong, Chuan
AU  - Gaziano, J. Michael
AU  - Liao, Katherine P.
AU  - Lu, Junwei
AU  - Cho, Kelly
AU  - Cai, Tianxi
PY  - 2024/1/12
Y1  - 2024/1/12
N2  - Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.
AB  - Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.
KW  - DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem
UR  - http://www.scopus.com/inward/record.url?scp=85181850356&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85181850356&partnerID=8YFLogxK
U2  - 10.1016/j.patter.2023.100906
DO  - 10.1016/j.patter.2023.100906
M3  - Article
C2  - 38264714
AN  - SCOPUS:85181850356
SN  - 2666-3899
VL  - 5
JO  - Patterns
JF  - Patterns
IS  - 1
M1  - 100906
ER  -
