Contextual Gaps in Machine Learning for Mental Illness Prediction: The Case of Diagnostic Disclosures

Stevie Chancellor, Jessica L. Feuston, Jayhyun Chang

Research output: Contribution to journal › Article › peer-review

Abstract

Obtaining training data for machine learning (ML) prediction of mental illness from social media is labor intensive. To work around this, ML teams extrapolate proxy signals, alternative signs in the data used to infer illness status, and use them to create training datasets. However, it has not been established whether these signals are valid, whether they align with important contextual factors, or how proxy quality impacts the integrity of downstream models. We use ML and qualitative methods to evaluate whether a popular proxy signal, diagnostic self-disclosure, produces a conceptually sound ML model of mental illness. Our findings identify major conceptual errors visible only through qualitative investigation - training data built from diagnostic disclosures encodes a narrow vision of diagnosis experiences that propagates into paradoxes in the downstream ML model. This gap is obscured by the strong performance of the ML classifier (F1 = 0.91). We discuss the implications of conceptual gaps in creating training data for human-centered models, and make suggestions for improving research methods.
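For context on the F1 score reported above, the following is a minimal sketch of how such a score is typically computed from precision and recall. The labels and predictions are hypothetical and illustrative only; they are not drawn from the paper's data, and the sketch assumes scikit-learn is available.

    from sklearn.metrics import f1_score, precision_score, recall_score

    # Hypothetical binary labels: 1 = positive class (e.g., labeled as disclosing
    # a diagnosis), 0 = negative. Values are illustrative, not from the paper.
    y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]

    precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
    recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9
    print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")

A high F1 like the 0.91 reported can coexist with the conceptual gaps the paper describes, since the metric only measures agreement with the proxy labels, not the validity of those labels.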

Original language: English (US)
Article number: 3610181
Journal: Proceedings of the ACM on Human-Computer Interaction
Volume: 7
Issue number: CSCW2
DOIs
State: Published - Oct 4 2023

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Keywords

  • Reddit
  • error analysis
  • mental health
  • social media
  • validity
