Spine surgeon versus AI algorithm full-length radiographic measurements: a validation study of complex adult spinal deformity patients

Jason J. Haselhuhn; Paul Brian O. Soriano; Priyanka Grover; Marcel Dreischarf; Kari Odland; Nathan R. Hendrickson; Kristen E. Jones; Christopher T. Martin; Jonathan N. Sembrano; David W. Polly

doi:10.1007/s43390-024-00825-y

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Spinal measurements play an integral role in surgical planning for a variety of spine procedures. Full-length imaging eliminates distortions that can occur with stitched images. However, these images take radiologists significantly longer to read than conventional radiographs. Artificial intelligence (AI) image analysis software that can make such measurements quickly and reliably would be advantageous to surgeons, radiologists, and the entire health system.
Materials and methods: Institutional Review Board approval was obtained for this study. Preoperative full-length standing anterior–posterior and lateral radiographs were obtained for patients whose images had previously been measured by fellowship-trained spine surgeons at our institution. The measurements included lumbar lordosis (LL), greatest coronal Cobb angle (GCC), pelvic incidence (PI), coronal balance (CB), and T1-pelvic angle (T1PA). Inter-rater intra-class correlation (ICC) values were calculated based on an overlapping sample of 10 patients measured by surgeons. Full-length standing radiographs of an additional 100 patients were provided for AI software training. The AI algorithm then measured the radiographs, and ICC values comparing the AI with the surgeons were calculated.
Results: Inter-rater reliability between surgeons was excellent, with ICC values of 0.97 for LL (95% CI 0.88–0.99), 0.78 for GCC (0.33–0.94), 0.86 for PI (0.55–0.96), 0.99 for CB (0.93–0.99), and 0.95 for T1PA (0.82–0.99). The algorithm computed the five selected parameters with ICC values between 0.70 and 0.94, indicating good to excellent reliability. For the comparison of AI and surgeons, the ICC was 0.88 for LL (95% CI 0.83–0.92), 0.93 for CB (0.90–0.95), 0.81 for GCC (0.69–0.87), 0.70 for PI (0.60–0.78), and 0.94 for T1PA (0.91–0.96).
Conclusions: The AI algorithm presented here demonstrates excellent reliability for most of the parameters and good reliability for PI, with ICC values corresponding to measurements conducted by experienced surgeons. In the future, it may facilitate the analysis of large data sets and aid physicians in diagnostics, pre-operative planning, and post-operative quality control.
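
The abstract reports ICC values but does not state which ICC form was used. As an illustration only, the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater) and assumes that the "good" and "excellent" labels follow the commonly used Cicchetti bands (below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, 0.75 and above excellent), which are consistent with the labels given for 0.70 and 0.81. The function names and the sample angles are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per patient, one column per rater.
    The choice of ICC(2,1) is an assumption; the abstract does not name the form.
    """
    n = len(ratings)       # number of patients (rows)
    k = len(ratings[0])    # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-patient mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reliability_label(icc):
    """Map an ICC to a qualitative band (Cicchetti bands, assumed here)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Hypothetical lumbar lordosis angles (degrees): [surgeon, algorithm]
    demo = [[52, 50], [38, 41], [61, 60], [45, 47], [30, 29]]
    value = icc_2_1(demo)
    print(round(value, 3), reliability_label(value))

    # Labels for two ICC values reported in the abstract
    print(reliability_label(0.70))  # "good" (PI)
    print(reliability_label(0.81))  # "excellent" (GCC)
```

Under these assumed bands, the reported PI value of 0.70 falls in the "good" range and the other four parameters fall in the "excellent" range, which matches the wording of the conclusions.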

Original language	English (US)
Journal	Spine Deformity
DOIs	https://doi.org/10.1007/s43390-024-00825-y
State	Accepted/In press - 2024

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Scoliosis Research Society 2024.

Keywords

Adult spinal deformity
Artificial intelligence
Inter-rater reliability
Spinopelvic measurements

PubMed: MeSH publication types

Journal Article

Access

10.1007/s43390-024-00825-y

Cite this

Haselhuhn, J. J., Soriano, P. B. O., Grover, P., Dreischarf, M., Odland, K., Hendrickson, N. R., Jones, K. E., Martin, C. T., Sembrano, J. N., & Polly, D. W. (Accepted/In press). Spine surgeon versus AI algorithm full-length radiographic measurements: a validation study of complex adult spinal deformity patients. Spine Deformity. https://doi.org/10.1007/s43390-024-00825-y

@article{1d9433aeaca3450dbca07054489c6cba,

title = "Spine surgeon versus AI algorithm full-length radiographic measurements: a validation study of complex adult spinal deformity patients",

abstract = "Introduction: Spinal measurements play an integral role in surgical planning for a variety of spine procedures. Full-length imaging eliminates distortions that can occur with stitched images. However, these images take radiologists significantly longer to read than conventional radiographs. Artificial intelligence (AI) image analysis software that can make such measurements quickly and reliably would be advantageous to surgeons, radiologists, and the entire health system. Materials and methods: Institutional Review Board approval was obtained for this study. Preoperative full-length standing anterior–posterior and lateral radiographs of patients that were previously measured by fellowship-trained spine surgeons at our institution were obtained. The measurements included lumbar lordosis (LL), greatest coronal Cobb angle (GCC), pelvic incidence (PI), coronal balance (CB), and T1-pelvic angle (T1PA). Inter-rater intra-class correlation (ICC) values were calculated based on an overlapping sample of 10 patients measured by surgeons. Full-length standing radiographs of an additional 100 patients were provided for AI software training. The AI algorithm then measured the radiographs and ICC values were calculated. Results: ICC values for inter-rater reliability between surgeons were excellent and calculated to 0.97 for LL (95% CI 0.88–0.99), 0.78 (0.33–0.94) for GCC, 0.86 (0.55–0.96) for PI, 0.99 for CB (0.93–0.99), and 0.95 for T1PA (0.82–0.99). The algorithm computed the five selected parameters with ICC values between 0.70 and 0.94, indicating excellent reliability. Exemplary for the comparison of AI and surgeons, the ICC for LL was 0.88 (95% CI 0.83–0.92) and 0.93 for CB (0.90–0.95). GCC, PI, and T1PA could be determined with ICC values of 0.81 (0.69–0.87), 0.70 (0.60–0.78), and 0.94 (0.91–0.96) respectively. 
Conclusions: The AI algorithm presented here demonstrates excellent reliability for most of the parameters and good reliability for PI, with ICC values corresponding to measurements conducted by experienced surgeons. In future, it may facilitate the analysis of large data sets and aid physicians in diagnostics, pre-operative planning, and post-operative quality control.",

keywords = "Adult spinal deformity, Artificial intelligence, Inter-rater reliability, Spinopelvic measurements",

author = "Haselhuhn, {Jason J.} and Soriano, {Paul Brian O.} and Priyanka Grover and Marcel Dreischarf and Kari Odland and Hendrickson, {Nathan R.} and Jones, {Kristen E.} and Martin, {Christopher T.} and Sembrano, {Jonathan N.} and Polly, {David W.}",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Scoliosis Research Society 2024.",

year = "2024",

doi = "10.1007/s43390-024-00825-y",

language = "English (US)",

journal = "Spine Deformity",

issn = "2212-134X",

publisher = "Springer",

}

TY - JOUR

T1 - Spine surgeon versus AI algorithm full-length radiographic measurements

T2 - a validation study of complex adult spinal deformity patients

AU - Haselhuhn, Jason J.

AU - Soriano, Paul Brian O.

AU - Grover, Priyanka

AU - Dreischarf, Marcel

AU - Odland, Kari

AU - Hendrickson, Nathan R.

AU - Jones, Kristen E.

AU - Martin, Christopher T.

AU - Sembrano, Jonathan N.

AU - Polly, David W.

PY - 2024

Y1 - 2024

N2 - Introduction: Spinal measurements play an integral role in surgical planning for a variety of spine procedures. Full-length imaging eliminates distortions that can occur with stitched images. However, these images take radiologists significantly longer to read than conventional radiographs. Artificial intelligence (AI) image analysis software that can make such measurements quickly and reliably would be advantageous to surgeons, radiologists, and the entire health system. Materials and methods: Institutional Review Board approval was obtained for this study. Preoperative full-length standing anterior–posterior and lateral radiographs of patients that were previously measured by fellowship-trained spine surgeons at our institution were obtained. The measurements included lumbar lordosis (LL), greatest coronal Cobb angle (GCC), pelvic incidence (PI), coronal balance (CB), and T1-pelvic angle (T1PA). Inter-rater intra-class correlation (ICC) values were calculated based on an overlapping sample of 10 patients measured by surgeons. Full-length standing radiographs of an additional 100 patients were provided for AI software training. The AI algorithm then measured the radiographs and ICC values were calculated. Results: ICC values for inter-rater reliability between surgeons were excellent and calculated to 0.97 for LL (95% CI 0.88–0.99), 0.78 (0.33–0.94) for GCC, 0.86 (0.55–0.96) for PI, 0.99 for CB (0.93–0.99), and 0.95 for T1PA (0.82–0.99). The algorithm computed the five selected parameters with ICC values between 0.70 and 0.94, indicating excellent reliability. Exemplary for the comparison of AI and surgeons, the ICC for LL was 0.88 (95% CI 0.83–0.92) and 0.93 for CB (0.90–0.95). GCC, PI, and T1PA could be determined with ICC values of 0.81 (0.69–0.87), 0.70 (0.60–0.78), and 0.94 (0.91–0.96) respectively. 
Conclusions: The AI algorithm presented here demonstrates excellent reliability for most of the parameters and good reliability for PI, with ICC values corresponding to measurements conducted by experienced surgeons. In future, it may facilitate the analysis of large data sets and aid physicians in diagnostics, pre-operative planning, and post-operative quality control.

KW - Adult spinal deformity

KW - Artificial intelligence

KW - Inter-rater reliability

KW - Spinopelvic measurements

UR - http://www.scopus.com/inward/record.url?scp=85184886453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85184886453&partnerID=8YFLogxK

U2 - 10.1007/s43390-024-00825-y

DO - 10.1007/s43390-024-00825-y

M3 - Article

C2 - 38336942

AN - SCOPUS:85184886453

SN - 2212-134X

JO - Spine Deformity

JF - Spine Deformity

ER -
