Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures

Vladimir Belov; Tracy Erwin-Grabner; Moji Aghajani; Andre Aleman; Alyssa R. Amod; Zeynep Basgoze; Francesco Benedetti; Bianca Besteher; Robin Bülow; Christopher R.K. Ching; Colm G. Connolly; Kathryn Cullen; Christopher G. Davey; Danai Dima; Annemiek Dols; Jennifer W. Evans; Cynthia H.Y. Fu; Ali Saffet Gonul; Ian H. Gotlib; Hans J. Grabe; Nynke Groenewold; J. Paul Hamilton; Ben J. Harrison; Tiffany C. Ho; Benson Mwangi; Natalia Jaworska; Neda Jahanshad; Bonnie Klimes-Dougan; Sheri Michelle Koopowitz; Thomas Lancaster; Meng Li; David E.J. Linden; Frank P. MacMaster; David M.A. Mehler; Elisa Melloni; Bryon A. Mueller; Amar Ojha; Mardien L. Oudega; Brenda W.J.H. Penninx; Sara Poletti; Edith Pomarol-Clotet; Maria J. Portella; Elena Pozzi; Liesbeth Reneman; Matthew D. Sacchet; Philipp G. Sämann; Anouk Schrantee; Kang Sim; Jair C. Soares; Dan J. Stein; Sophia I. Thomopoulos; Aslihan Uyar-Demir; Nic J.A. van der Wee; Steven J.A. van der Werff; Henry Völzke; Sarah Whittle; Katharina Wittfeld; Margaret J. Wright; Mon Ju Wu; Tony T. Yang; Carlos Zarate; Dick J. Veltman; Lianne Schmaal; Paul M. Thompson; Roberto Goya-Maldonado

doi:10.1038/s41598-023-47934-8

Research output: Contribution to journal › Article › peer-review

Abstract

Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (N = 5365) to provide a generalizable ML classification benchmark of major depressive disorder (MDD) using shallow linear and non-linear models. Leveraging brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD versus healthy controls (HC) with a balanced accuracy of around 62%. But after harmonizing the data, e.g., using ComBat, the balanced accuracy dropped to approximately 52%. Accuracy results close to random chance levels were also observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may yield more encouraging prospects.
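The abstract reports its results as balanced accuracy and compares them with chance level. The following is a minimal illustrative sketch of that metric, not the authors' code: the function names and the toy labels are assumptions made up for this example. It shows why balanced accuracy, the mean of sensitivity and specificity, sits at 50% for a trivial classifier even when the two classes are unequal in size, whereas plain accuracy does not.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = MDD, 0 = HC)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / pos  # fraction of patients correctly labelled
    specificity = tn / neg  # fraction of controls correctly labelled
    return (sensitivity + specificity) / 2


def plain_accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (hypothetical, not study data): 6 patients and 4 controls.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# A trivial classifier that labels everyone as a patient.
always_mdd = [1] * len(y_true)

print(plain_accuracy(y_true, always_mdd))     # 0.6
print(balanced_accuracy(y_true, always_mdd))  # 0.5
```

On this reading, a balanced accuracy near 52% indicates performance only slightly above what a classifier with no diagnostic information would achieve.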

Original language	English (US)
Article number	1084
Journal	Scientific reports
Volume	14
Issue number	1
DOIs	https://doi.org/10.1038/s41598-023-47934-8
State	Published - Dec 2024

Bibliographical note

Publisher Copyright:
© 2024, The Author(s).

Cite this

Belov, V., Erwin-Grabner, T., Aghajani, M., Aleman, A., Amod, A. R., Basgoze, Z., Benedetti, F., Besteher, B., Bülow, R., Ching, C. R. K., Connolly, C. G., Cullen, K., Davey, C. G., Dima, D., Dols, A., Evans, J. W., Fu, C. H. Y., Gonul, A. S., Gotlib, I. H., ... Goya-Maldonado, R. (2024). Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures. Scientific reports, 14(1), Article 1084. https://doi.org/10.1038/s41598-023-47934-8

Belov, V, Erwin-Grabner, T, Aghajani, M, Aleman, A, Amod, AR, Basgoze, Z, Benedetti, F, Besteher, B, Bülow, R, Ching, CRK, Connolly, CG, Cullen, K, Davey, CG, Dima, D, Dols, A, Evans, JW, Fu, CHY, Gonul, AS, Gotlib, IH, Grabe, HJ, Groenewold, N, Hamilton, JP, Harrison, BJ, Ho, TC, Mwangi, B, Jaworska, N, Jahanshad, N, Klimes-Dougan, B, Koopowitz, SM, Lancaster, T, Li, M, Linden, DEJ, MacMaster, FP, Mehler, DMA, Melloni, E, Mueller, BA, Ojha, A, Oudega, ML, Penninx, BWJH, Poletti, S, Pomarol-Clotet, E, Portella, MJ, Pozzi, E, Reneman, L, Sacchet, MD, Sämann, PG, Schrantee, A, Sim, K, Soares, JC, Stein, DJ, Thomopoulos, SI, Uyar-Demir, A, van der Wee, NJA, van der Werff, SJA, Völzke, H, Whittle, S, Wittfeld, K, Wright, MJ, Wu, MJ, Yang, TT, Zarate, C, Veltman, DJ, Schmaal, L, Thompson, PM & Goya-Maldonado, R 2024, 'Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures', Scientific reports, vol. 14, no. 1, 1084. https://doi.org/10.1038/s41598-023-47934-8

@article{2feb83778e6341d1b0f5e9b4321ba553,
title = "Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures",
abstract = "Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (N = 5365) to provide a generalizable ML classification benchmark of major depressive disorder (MDD) using shallow linear and non-linear models. Leveraging brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD versus healthy controls (HC) with a balanced accuracy of around 62%. But after harmonizing the data, e.g., using ComBat, the balanced accuracy dropped to approximately 52%. Accuracy results close to random chance levels were also observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may yield more encouraging prospects.",
author = "Vladimir Belov and Tracy Erwin-Grabner and Moji Aghajani and Andre Aleman and Amod, {Alyssa R.} and Zeynep Basgoze and Francesco Benedetti and Bianca Besteher and Robin B{\"u}low and Ching, {Christopher R.K.} and Connolly, {Colm G.} and Kathryn Cullen and Davey, {Christopher G.} and Danai Dima and Annemiek Dols and Evans, {Jennifer W.} and Fu, {Cynthia H.Y.} and Gonul, {Ali Saffet} and Gotlib, {Ian H.} and Grabe, {Hans J.} and Nynke Groenewold and Hamilton, {J. Paul} and Harrison, {Ben J.} and Ho, {Tiffany C.} and Benson Mwangi and Natalia Jaworska and Neda Jahanshad and Bonnie Klimes-Dougan and Koopowitz, {Sheri Michelle} and Thomas Lancaster and Meng Li and Linden, {David E.J.} and MacMaster, {Frank P.} and Mehler, {David M.A.} and Elisa Melloni and Mueller, {Bryon A.} and Amar Ojha and Oudega, {Mardien L.} and Penninx, {Brenda W.J.H.} and Sara Poletti and Edith Pomarol-Clotet and Portella, {Maria J.} and Elena Pozzi and Liesbeth Reneman and Sacchet, {Matthew D.} and S{\"a}mann, {Philipp G.} and Anouk Schrantee and Kang Sim and Soares, {Jair C.} and Stein, {Dan J.} and Thomopoulos, {Sophia I.} and Aslihan Uyar-Demir and {van der Wee}, {Nic J.A.} and {van der Werff}, {Steven J.A.} and Henry V{\"o}lzke and Sarah Whittle and Katharina Wittfeld and Wright, {Margaret J.} and Wu, {Mon Ju} and Yang, {Tony T.} and Carlos Zarate and Veltman, {Dick J.} and Lianne Schmaal and Thompson, {Paul M.} and Roberto Goya-Maldonado",
note = "Publisher Copyright: {\textcopyright} 2024, The Author(s).",
year = "2024",
month = dec,
doi = "10.1038/s41598-023-47934-8",
language = "English (US)",
volume = "14",
journal = "Scientific reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",
}

TY - JOUR
T1 - Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures
AU - Belov, Vladimir
AU - Erwin-Grabner, Tracy
AU - Aghajani, Moji
AU - Aleman, Andre
AU - Amod, Alyssa R.
AU - Basgoze, Zeynep
AU - Benedetti, Francesco
AU - Besteher, Bianca
AU - Bülow, Robin
AU - Ching, Christopher R.K.
AU - Connolly, Colm G.
AU - Cullen, Kathryn
AU - Davey, Christopher G.
AU - Dima, Danai
AU - Dols, Annemiek
AU - Evans, Jennifer W.
AU - Fu, Cynthia H.Y.
AU - Gonul, Ali Saffet
AU - Gotlib, Ian H.
AU - Grabe, Hans J.
AU - Groenewold, Nynke
AU - Hamilton, J. Paul
AU - Harrison, Ben J.
AU - Ho, Tiffany C.
AU - Mwangi, Benson
AU - Jaworska, Natalia
AU - Jahanshad, Neda
AU - Klimes-Dougan, Bonnie
AU - Koopowitz, Sheri Michelle
AU - Lancaster, Thomas
AU - Li, Meng
AU - Linden, David E.J.
AU - MacMaster, Frank P.
AU - Mehler, David M.A.
AU - Melloni, Elisa
AU - Mueller, Bryon A.
AU - Ojha, Amar
AU - Oudega, Mardien L.
AU - Penninx, Brenda W.J.H.
AU - Poletti, Sara
AU - Pomarol-Clotet, Edith
AU - Portella, Maria J.
AU - Pozzi, Elena
AU - Reneman, Liesbeth
AU - Sacchet, Matthew D.
AU - Sämann, Philipp G.
AU - Schrantee, Anouk
AU - Sim, Kang
AU - Soares, Jair C.
AU - Stein, Dan J.
AU - Thomopoulos, Sophia I.
AU - Uyar-Demir, Aslihan
AU - van der Wee, Nic J.A.
AU - van der Werff, Steven J.A.
AU - Völzke, Henry
AU - Whittle, Sarah
AU - Wittfeld, Katharina
AU - Wright, Margaret J.
AU - Wu, Mon Ju
AU - Yang, Tony T.
AU - Zarate, Carlos
AU - Veltman, Dick J.
AU - Schmaal, Lianne
AU - Thompson, Paul M.
AU - Goya-Maldonado, Roberto
PY - 2024/12
Y1 - 2024/12
AB - Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (N = 5365) to provide a generalizable ML classification benchmark of major depressive disorder (MDD) using shallow linear and non-linear models. Leveraging brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD versus healthy controls (HC) with a balanced accuracy of around 62%. But after harmonizing the data, e.g., using ComBat, the balanced accuracy dropped to approximately 52%. Accuracy results close to random chance levels were also observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may yield more encouraging prospects.

UR - http://www.scopus.com/inward/record.url?scp=85182306914&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182306914&partnerID=8YFLogxK
U2 - 10.1038/s41598-023-47934-8
DO - 10.1038/s41598-023-47934-8
M3 - Article
C2 - 38212349
AN - SCOPUS:85182306914
SN - 2045-2322
VL - 14
JO - Scientific reports
JF - Scientific reports
IS - 1
M1 - 1084
ER -
