Spine surgeon versus AI algorithm full-length radiographic measurements: a validation study of complex adult spinal deformity patients

Jason J. Haselhuhn, Paul Brian O. Soriano, Priyanka Grover, Marcel Dreischarf, Kari Odland, Nathan R. Hendrickson, Kristen E. Jones, Christopher T. Martin, Jonathan N. Sembrano, David W. Polly

Research output: Contribution to journal › Article › Peer-review

Abstract

Introduction: Spinal measurements play an integral role in surgical planning for a variety of spine procedures. Full-length imaging eliminates the distortions that can occur with stitched images; however, these images take radiologists significantly longer to read than conventional radiographs. Artificial intelligence (AI) image analysis software that can make such measurements quickly and reliably would benefit surgeons, radiologists, and the health system as a whole.
Materials and methods: Institutional Review Board approval was obtained for this study. Preoperative full-length standing anterior–posterior and lateral radiographs were obtained for patients whose images had previously been measured by fellowship-trained spine surgeons at our institution. The measurements included lumbar lordosis (LL), greatest coronal Cobb angle (GCC), pelvic incidence (PI), coronal balance (CB), and T1-pelvic angle (T1PA). Inter-rater intraclass correlation coefficients (ICCs) were calculated from an overlapping sample of 10 patients measured by the surgeons. Full-length standing radiographs of an additional 100 patients were provided for AI software training. The AI algorithm then measured the radiographs, and ICCs between the algorithm's and the surgeons' measurements were calculated.
Results: Inter-rater reliability between surgeons was excellent, with ICCs of 0.97 (95% CI 0.88–0.99) for LL, 0.78 (0.33–0.94) for GCC, 0.86 (0.55–0.96) for PI, 0.99 (0.93–0.99) for CB, and 0.95 (0.82–0.99) for T1PA. The algorithm computed the five selected parameters with ICC values between 0.70 and 0.94, indicating good to excellent reliability. For the comparison of AI and surgeons, the ICC was 0.88 (95% CI 0.83–0.92) for LL and 0.93 (0.90–0.95) for CB. GCC, PI, and T1PA were determined with ICC values of 0.81 (0.69–0.87), 0.70 (0.60–0.78), and 0.94 (0.91–0.96), respectively.
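The five parameters compared here are geometric quantities derived from radiographic landmarks. The abstract does not describe how the AI software computes them, so the following is only a minimal sketch of the standard textbook definitions, assuming the landmarks (endplate corners, femoral head axis, vertebral centroids) have already been located. All function and argument names are hypothetical; the assumption that both endplates are digitized in the same direction, the choice of L1 and S1 superior endplates for LL, and the sign convention for CB are illustrative choices not confirmed by the source.

```python
import math

Point = tuple[float, float]  # (x, y) landmark coordinates, e.g. calibrated mm


def _vec(a: Point, b: Point) -> Point:
    """Vector from point a to point b."""
    return (b[0] - a[0], b[1] - a[1])


def _mid(a: Point, b: Point) -> Point:
    """Midpoint of segment ab."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def _angle_deg(u: Point, v: Point) -> float:
    """Unsigned angle between two vectors, in degrees (0 to 180)."""
    cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def cobb_angle(upper_a: Point, upper_b: Point, lower_a: Point, lower_b: Point) -> float:
    """Angle between two endplate lines (GCC; LL if the endplates are, e.g., L1 and S1).

    Assumes both endplates are digitized in the same direction (e.g. left to
    right), so curves larger than 90 degrees are reported correctly.
    """
    return _angle_deg(_vec(upper_a, upper_b), _vec(lower_a, lower_b))


def pelvic_incidence(s1_ant: Point, s1_post: Point, hip_axis: Point) -> float:
    """Angle between the perpendicular to the S1 endplate at its midpoint and
    the line from that midpoint to the femoral head axis."""
    mid = _mid(s1_ant, s1_post)
    alpha = _angle_deg(_vec(s1_post, s1_ant), _vec(mid, hip_axis))
    return abs(90.0 - alpha)  # angle to the endplate normal, not the endplate


def t1_pelvic_angle(t1_centroid: Point, s1_ant: Point, s1_post: Point,
                    hip_axis: Point) -> float:
    """Angle at the femoral head axis between the line to the T1 centroid
    and the line to the S1 endplate midpoint."""
    return _angle_deg(_vec(hip_axis, t1_centroid),
                      _vec(hip_axis, _mid(s1_ant, s1_post)))


def coronal_balance(c7_centroid: Point, s1_left: Point, s1_right: Point) -> float:
    """Signed horizontal offset of the C7 plumb line from the central sacral
    vertical line (CSVL), taken here through the S1 endplate midpoint."""
    return c7_centroid[0] - _mid(s1_left, s1_right)[0]


if __name__ == "__main__":
    # Synthetic landmarks, not patient data.
    print(round(cobb_angle((0, 0), (1, 0), (0, 0), (1, 1)), 1))  # 45.0
    print(coronal_balance((12, 0), (0, 100), (20, 100)))         # 2.0
```

Because each parameter is a deterministic function of a handful of landmarks, disagreement between raters (human or algorithmic) comes almost entirely from where those landmarks are placed, which is what the ICCs above quantify.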
Conclusions: Compared with measurements made by experienced surgeons, the AI algorithm presented here demonstrates excellent reliability for most parameters and good reliability for PI. In the future, it may facilitate the analysis of large data sets and aid physicians in diagnostics, preoperative planning, and postoperative quality control.
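The abstract reports ICCs with qualitative labels but does not state which ICC form or interpretation scale was used. As a minimal sketch, assuming the common two-way random-effects, absolute-agreement, single-rater form ICC(2,1) of Shrout and Fleiss, the coefficient can be computed from two-way ANOVA mean squares as shown below. The labeling function assumes Cicchetti's cut-offs (poor < 0.40, fair 0.40–0.59, good 0.60–0.74, excellent ≥ 0.75); these are consistent with the abstract calling 0.78 excellent and 0.70 good, but the source does not confirm them. The names `icc_2_1` and `reliability_label` are hypothetical, and the confidence intervals reported in the abstract are not computed here.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per patient and one column per rater (for example
    [surgeon, algorithm]); missing values are not supported.
    """
    n = len(ratings)     # number of patients (rows)
    k = len(ratings[0])  # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: patients, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater bias (ms_cols) counts against agreement, hence "absolute agreement".
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliability_label(icc: float) -> str:
    """Map an ICC to Cicchetti-style qualitative categories (assumed scale)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Worked example from Shrout & Fleiss (1979): 6 targets rated by 4 judges.
    example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(example), 2))  # 0.29
    print(reliability_label(0.70))     # good
```

Applied to one parameter at a time, with one row per patient and one column per rater, this yields a single coefficient per parameter, mirroring the per-parameter ICCs reported in the abstract. Other ICC forms (e.g. consistency rather than absolute agreement) would give different values for the same data.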

Original language: English (US)
Journal: Spine Deformity
State: Accepted/In press, 2024

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Scoliosis Research Society 2024.

Keywords

  • Adult spinal deformity
  • Artificial intelligence
  • Inter-rater reliability
  • Spinopelvic measurements

PubMed: MeSH publication types

  • Journal Article
