Supervised machine learning for exploratory analysis in family research

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Objective: This article introduces supervised machine learning (ML) for conducting exploratory, discovery-oriented family research in a transparent and systematic way. Background: Supervised ML can examine large numbers of variables simultaneously, identify key predictors, and explore patterns among predictors. This approach may help address concerns in family research about a lack of theoretical specificity and the prevalence of unguided exploratory analysis. Method: Following an overview of supervised ML, example analyses drew on the National Longitudinal Study of Adolescent Health (Add Health) dataset across Waves I–IV (N = 5114 adolescents, 50.53% female, mean age = 15.94, SD = 1.77 at Wave I). From 143 articles using Add Health data from Waves I through IV, 62 adolescent family variables from eight domains (e.g., socioeconomics, parenting, health) were identified as predictors of young adult (ages 24–32) educational attainment. Following benchmark regression models, ML models were trained using Lasso regression, decision tree, random forest, and extreme gradient boosting; the models were evaluated on test data held out from the training data and interpreted through SHapley Additive exPlanations (SHAP). Results: The random forest model performed best (R² = .382 for the model with all predictors), and 14 variables were identified as key predictors of educational attainment. Patterns among these predictors emerged, including directionality, nonlinearity, and interactions. Conclusions: Supervised ML research can be used to inform further confirmatory analyses and advance theory.
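The abstract names SHapley Additive exPlanations as the interpretation step. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a made-up three-feature prediction function. It is not the authors' code and does not use the SHAP library. The function `toy_model`, the helper `shapley_values`, the input values, and the all-zero baseline are all hypothetical. "Removing" a feature by setting it to a fixed baseline is a simplifying assumption; SHAP implementations typically average over background data instead.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical "fitted" model: one additive effect plus one interaction.
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features outside `subset` are replaced by their baseline value
        # (assumption: a fixed reference point stands in for "feature absent").
        masked = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


x = [1.0, 2.0, 3.0]          # illustrative observation
baseline = [0.0, 0.0, 0.0]   # illustrative reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the attributions come out to approximately 2, 3, and 3. The additive feature receives its full effect, the two interacting features split their joint contribution equally, and the three values sum to the difference between the prediction at `x` and at the baseline. Exact enumeration grows exponentially with the number of features, so it is practical only for very small examples like this one.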

Original language: English (US)
Journal: Journal of Marriage and Family
State: Accepted/In press - 2024

Bibliographical note

Publisher Copyright:
© 2024 The Authors. Journal of Marriage and Family published by Wiley Periodicals LLC on behalf of National Council on Family Relations.

Keywords

  • adolescence
  • education
  • emerging adulthood
  • family systems
  • longitudinal research
  • quantitative methodology
