TY - GEN
T1 - Strategies for Handling Missing Data in Detecting Postoperative Surgical Site Infections
AU - Hu, Zhen
AU - Melton, Genevieve B.
AU - Simon, Gyorgy J.
PY - 2015/12/8
Y1 - 2015/12/8
N2 - Researchers are increasingly interested in the secondary use of EHR data to detect specific outcomes and adverse conditions. Our aim is to develop valid, robust, and practical EHR-derived models for identifying postoperative surgical site infections (SSIs). SSIs can be classified as superficial, deep, or organ/space, and they are costly and associated with significant morbidity. Previous research relied heavily on administrative/claims data. EHR data have the potential to support more informative SSI detection models. Unfortunately, secondary use of EHR data can be challenging because the data are often incomplete: some tests are ordered for only a subset of patients (e.g., 52% of the surgical patients in our cohort have no white blood cell count data within 30 days after the operation). Most researchers sidestep this problem by excluding cases or individual variables with missing data, or by imputing values for variables with only a small amount of missing data. However, because the missingness rate in our data is high, simply discarding incomplete cases may lose important indicators of SSI. In our previous work, we used only complete cases to detect SSIs within 30 days after surgery, with the gold-standard outcome taken from a validated national surgical registry, the National Surgical Quality Improvement Program (NSQIP) [1]. In the current study, we explore several popular treatments of missing data. We compare the performance of the model under each treatment with that of the reference model built on the complete cases.
AB - Researchers are increasingly interested in the secondary use of EHR data to detect specific outcomes and adverse conditions. Our aim is to develop valid, robust, and practical EHR-derived models for identifying postoperative surgical site infections (SSIs). SSIs can be classified as superficial, deep, or organ/space, and they are costly and associated with significant morbidity. Previous research relied heavily on administrative/claims data. EHR data have the potential to support more informative SSI detection models. Unfortunately, secondary use of EHR data can be challenging because the data are often incomplete: some tests are ordered for only a subset of patients (e.g., 52% of the surgical patients in our cohort have no white blood cell count data within 30 days after the operation). Most researchers sidestep this problem by excluding cases or individual variables with missing data, or by imputing values for variables with only a small amount of missing data. However, because the missingness rate in our data is high, simply discarding incomplete cases may lose important indicators of SSI. In our previous work, we used only complete cases to detect SSIs within 30 days after surgery, with the gold-standard outcome taken from a validated national surgical registry, the National Surgical Quality Improvement Program (NSQIP) [1]. In the current study, we explore several popular treatments of missing data. We compare the performance of the model under each treatment with that of the reference model built on the complete cases.
UR - http://www.scopus.com/inward/record.url?scp=84966424762&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966424762&partnerID=8YFLogxK
U2 - 10.1109/ICHI.2015.89
DO - 10.1109/ICHI.2015.89
M3 - Conference contribution
AN - SCOPUS:84966424762
T3 - Proceedings - 2015 IEEE International Conference on Healthcare Informatics, ICHI 2015
BT - Proceedings - 2015 IEEE International Conference on Healthcare Informatics, ICHI 2015
A2 - Fu, Wai-Tat
A2 - Balakrishnan, Prabhakaran
A2 - Harabagiu, Sanda
A2 - Wang, Fei
A2 - Srivastava, Jaideep
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Healthcare Informatics, ICHI 2015
Y2 - 21 October 2015 through 23 October 2015
ER -