Contextual Gaps in Machine Learning for Mental Illness Prediction: The Case of Diagnostic Disclosures

Stevie Chancellor, Jessica L. Feuston, Jayhyun Chang

Research output: Contribution to journal › Article › peer-review

Abstract

Obtaining training data for machine learning (ML) prediction of mental illness from social media is labor intensive. To work around this, ML teams extrapolate proxy signals, alternative signs in the data used to infer illness status, and use them to create training datasets. However, it has not been established whether these signals are valid, whether they align with important contextual factors, or how proxy quality impacts the integrity of downstream models. We use ML and qualitative methods to evaluate whether a popular proxy signal, diagnostic self-disclosure, produces a conceptually sound ML model of mental illness. Our findings identify major conceptual errors visible only through qualitative investigation - training data built from diagnostic disclosures encodes a narrow vision of diagnosis experiences that propagates into paradoxes in the downstream ML model. This gap is obscured by the strong performance of the ML classifier (F1 = 0.91). We discuss the implications of conceptual gaps in creating training data for human-centered models, and make suggestions for improving research methods.
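For context on the F1 score reported above, the following is a minimal sketch of how such a score is typically computed from precision and recall. The labels and predictions are hypothetical and illustrative only; they are not drawn from the paper's data, and the sketch assumes scikit-learn is available.

    from sklearn.metrics import f1_score, precision_score, recall_score

    # Hypothetical binary labels: 1 = positive class (e.g., labeled as disclosing
    # a diagnosis), 0 = negative. Values are illustrative, not from the paper.
    y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]

    precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
    recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9
    print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")

A high F1 like the 0.91 reported can coexist with the conceptual gaps the paper describes, since the metric only measures agreement with the proxy labels, not the validity of those labels.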

Original language: English (US)
Article number: 3610181
Journal: Proceedings of the ACM on Human-Computer Interaction
Volume: 7
Issue number: CSCW2
DOIs
State: Published - Oct 4 2023

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Keywords

  • Reddit
  • error analysis
  • mental health
  • social media
  • validity
