TY - JOUR
T1 - Spine surgeon versus AI algorithm full-length radiographic measurements
T2 - a validation study of complex adult spinal deformity patients
AU - Haselhuhn, Jason J.
AU - Soriano, Paul Brian O.
AU - Grover, Priyanka
AU - Dreischarf, Marcel
AU - Odland, Kari
AU - Hendrickson, Nathan R.
AU - Jones, Kristen E.
AU - Martin, Christopher T.
AU - Sembrano, Jonathan N.
AU - Polly, David W.
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Scoliosis Research Society 2024.
PY - 2024
Y1 - 2024
N2 - Introduction: Spinal measurements play an integral role in surgical planning for a variety of spine procedures. Full-length imaging eliminates distortions that can occur with stitched images. However, these images take radiologists significantly longer to read than conventional radiographs. Artificial intelligence (AI) image analysis software that can make such measurements quickly and reliably would be advantageous to surgeons, radiologists, and the entire health system. Materials and methods: Institutional Review Board approval was obtained for this study. Preoperative full-length standing anterior–posterior and lateral radiographs of patients, previously measured by fellowship-trained spine surgeons at our institution, were obtained. The measurements included lumbar lordosis (LL), greatest coronal Cobb angle (GCC), pelvic incidence (PI), coronal balance (CB), and T1-pelvic angle (T1PA). Inter-rater intra-class correlation (ICC) values were calculated based on an overlapping sample of 10 patients measured by surgeons. Full-length standing radiographs of an additional 100 patients were provided for AI software training. The AI algorithm then measured the radiographs and ICC values were calculated. Results: Inter-rater reliability between surgeons was excellent, with ICC values of 0.97 for LL (95% CI 0.88–0.99), 0.78 for GCC (0.33–0.94), 0.86 for PI (0.55–0.96), 0.99 for CB (0.93–0.99), and 0.95 for T1PA (0.82–0.99). The algorithm computed the five selected parameters with ICC values between 0.70 and 0.94, indicating good to excellent reliability. In the comparison of AI and surgeons, for example, the ICC was 0.88 for LL (95% CI 0.83–0.92) and 0.93 for CB (0.90–0.95). GCC, PI, and T1PA could be determined with ICC values of 0.81 (0.69–0.87), 0.70 (0.60–0.78), and 0.94 (0.91–0.96), respectively.
Conclusions: The AI algorithm presented here demonstrates excellent reliability for most of the parameters and good reliability for PI, with ICC values comparable to those of measurements made by experienced surgeons. In the future, it may facilitate the analysis of large data sets and aid physicians in diagnostics, pre-operative planning, and post-operative quality control.
KW - Adult spinal deformity
KW - Artificial intelligence
KW - Inter-rater reliability
KW - Spinopelvic measurements
UR - http://www.scopus.com/inward/record.url?scp=85184886453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184886453&partnerID=8YFLogxK
U2 - 10.1007/s43390-024-00825-y
DO - 10.1007/s43390-024-00825-y
M3 - Article
C2 - 38336942
AN - SCOPUS:85184886453
SN - 2212-134X
JO - Spine Deformity
JF - Spine Deformity
ER -