Limited clinical utility of a machine learning revision prediction model based on a national hip arthroscopy registry

R. Kyle Martin; Solvejg Wastvedt; Jeppe Lange; Ayoosh Pareek; Julian Wolfson; Bent Lund

doi:10.1007/s00167-022-07054-8

Biostatistics

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Purpose: Accurate prediction of outcome following hip arthroscopy is challenging and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can develop a clinically meaningful calculator for predicting the probability of a patient undergoing subsequent revision surgery following primary hip arthroscopy.

Methods: Machine learning analysis was performed on the DHAR. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for these types of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was performed using only variables available in the pre-operative clinical setting and then repeated to compare model performance using all variables available in the registry.

Results: In total, 5581 patients were included for analysis. Average follow-up time or time-to-revision was 4.25 (± 2.51) years and the overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range when restricted to only pre-operative variables (0.62–0.67), and when considering all variables available in the registry (0.63–0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models.

Conclusion: The association between pre-surgical factors and outcome following hip arthroscopy is complex. Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population.

Level of evidence: Level III.
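
The abstract reports model concordance (0.62 to 0.67) on a 25% held-out test set. As a rough illustration of what that number measures, the following is a minimal sketch of Harrell's concordance index for right-censored data, together with a random 75/25 split. The abstract does not state which software the authors used, so this is not their implementation; the function names `concordance_index` and `train_test_split` and the toy data are hypothetical and are not drawn from the registry.

```python
import random


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- follow-up or time-to-revision for each patient
    events      -- 1 if revision was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly split rows into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data (illustrative only): the third patient is censored at 3 years.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.8, 0.3, 0.1])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
train, test = train_test_split(list(range(8)))
```

On this scale, 0.5 corresponds to a model that ranks patients no better than chance and 1.0 to perfect ranking, which is why values in the 0.62 to 0.67 range with confidence intervals reaching down to 0.53 are described as moderate and uncertain.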

Original language	English (US)
Pages (from-to)	2079-2089
Number of pages	11
Journal	Knee Surgery, Sports Traumatology, Arthroscopy
Volume	31
Issue number	6
DOIs	https://doi.org/10.1007/s00167-022-07054-8
State	Published - Jun 2023

Bibliographical note

Funding Information:
This study was funded by a Norwegian Centennial Chair Seed Grant.

Publisher Copyright:
© 2022, The Author(s).

Keywords

Femoroacetabular impingement
Hip arthroscopy
Machine learning
Outcome prediction
Revision surgery

PubMed: MeSH publication types

Journal Article

Cite this

@article{189f9d5f82904710b09d9a694cd1b302,

title = "Limited clinical utility of a machine learning revision prediction model based on a national hip arthroscopy registry",

abstract = "Purpose: Accurate prediction of outcome following hip arthroscopy is challenging and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can develop a clinically meaningful calculator for predicting the probability of a patient undergoing subsequent revision surgery following primary hip arthroscopy. Methods: Machine learning analysis was performed on the DHAR. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for these types of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was performed using only variables available in the pre-operative clinical setting and then repeated to compare model performance using all variables available in the registry. Results: In total, 5581 patients were included for analysis. Average follow-up time or time-to-revision was 4.25 (± 2.51) years and the overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range when restricted to only pre-operative variables (0.62–0.67), and when considering all variables available in the registry (0.63–0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models. Conclusion: The association between pre-surgical factors and outcome following hip arthroscopy is complex. Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population. Level of evidence: Level III.",

keywords = "Femoroacetabular impingement, Hip arthroscopy, Machine learning, Outcome prediction, Revision surgery",

author = "Martin, {R. Kyle} and Solvejg Wastvedt and Jeppe Lange and Ayoosh Pareek and Julian Wolfson and Bent Lund",

note = "Funding Information: This study was funded by a Norwegian Centennial Chair Seed Grant. Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2023",

month = jun,

doi = "10.1007/s00167-022-07054-8",

language = "English (US)",

volume = "31",

pages = "2079--2089",

journal = "Knee Surgery, Sports Traumatology, Arthroscopy",

issn = "0942-2056",

publisher = "Springer Verlag",

number = "6",

}

TY - JOUR

T1 - Limited clinical utility of a machine learning revision prediction model based on a national hip arthroscopy registry

AU - Martin, R. Kyle

AU - Wastvedt, Solvejg

AU - Lange, Jeppe

AU - Pareek, Ayoosh

AU - Wolfson, Julian

AU - Lund, Bent

PY - 2023/6

Y1 - 2023/6

AB - Purpose: Accurate prediction of outcome following hip arthroscopy is challenging and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can develop a clinically meaningful calculator for predicting the probability of a patient undergoing subsequent revision surgery following primary hip arthroscopy. Methods: Machine learning analysis was performed on the DHAR. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for these types of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was performed using only variables available in the pre-operative clinical setting and then repeated to compare model performance using all variables available in the registry. Results: In total, 5581 patients were included for analysis. Average follow-up time or time-to-revision was 4.25 (± 2.51) years and the overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range when restricted to only pre-operative variables (0.62–0.67), and when considering all variables available in the registry (0.63–0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models. Conclusion: The association between pre-surgical factors and outcome following hip arthroscopy is complex. Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population. Level of evidence: Level III.

KW - Femoroacetabular impingement

KW - Hip arthroscopy

KW - Machine learning

KW - Outcome prediction

KW - Revision surgery

UR - http://www.scopus.com/inward/record.url?scp=85136933406&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85136933406&partnerID=8YFLogxK

U2 - 10.1007/s00167-022-07054-8

DO - 10.1007/s00167-022-07054-8

M3 - Article

C2 - 35947158

AN - SCOPUS:85136933406

SN - 0942-2056

VL - 31

SP - 2079

EP - 2089

JO - Knee Surgery, Sports Traumatology, Arthroscopy

JF - Knee Surgery, Sports Traumatology, Arthroscopy

IS - 6

ER -
