A new strategy for linking U.S. historical censuses: A case study for the IPUMS multigenerational longitudinal panel

Jonas Helgertz, Joseph Price, Jacob Wellington, Kelly J. Thompson, Steven Ruggles, Catherine A. Fitch

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

This paper presents a probabilistic method of record linkage, developed using the U.S. full-count censuses of 1900 and 1910 but applicable to many sources of digitized historical records. The method links records in two steps. The first step establishes high-confidence matches among men by exploiting a comprehensive set of individual and contextual characteristics. The second step links both men and women by leveraging the links between households established in the first step. Although only the first-stage links are directly comparable to those produced by other popular methods in research on the U.S., our method yields both considerably higher linkage rates and greater accuracy, while performing only negligibly worse than other algorithms in resembling the target population.
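The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Person` fields, the integer scoring in `similarity`, the thresholds, and the function names are all hypothetical stand-ins, and a simple point score replaces the probabilistic model described in the paper. The sketch only shows how household links from a first pass over men can narrow the candidate pool for a second pass over everyone else.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str         # record identifier within one census (hypothetical)
    household: str   # household identifier within one census (hypothetical)
    first: str
    last: str
    sex: str         # "M" or "F"
    birth_year: int
    birthplace: str


def similarity(a: Person, b: Person) -> int:
    """Toy agreement score from 0 to 10; a stand-in for a trained model."""
    score = 0
    score += 3 if a.first == b.first else 0
    score += 3 if a.last == b.last else 0
    score += 2 if a.birthplace == b.birthplace else 0
    score += 2 if abs(a.birth_year - b.birth_year) <= 1 else 0
    return score


def best_unique_match(person, candidates, threshold):
    """Return the pid of the single best candidate, or None if weak or tied."""
    scored = sorted(((similarity(person, c), c.pid) for c in candidates), reverse=True)
    if not scored or scored[0][0] < threshold:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates share the top score
    return scored[0][1]


def link_step_one(census_a, census_b, threshold=10):
    """Step 1: strict, one-to-one links among men only."""
    men_b = [p for p in census_b if p.sex == "M"]
    links = {}
    for p in census_a:
        if p.sex == "M":
            match = best_unique_match(p, men_b, threshold)
            if match is not None:
                links[p.pid] = match
    claimed = Counter(links.values())
    return {a: b for a, b in links.items() if claimed[b] == 1}


def link_step_two(census_a, census_b, step_one, threshold=8):
    """Step 2: link remaining members of households already tied by step 1."""
    by_a = {p.pid: p for p in census_a}
    by_b = {p.pid: p for p in census_b}
    household_links = {(by_a[a].household, by_b[b].household) for a, b in step_one.items()}
    links = dict(step_one)
    taken_b = set(step_one.values())
    for hh_a, hh_b in sorted(household_links):
        for p in census_a:
            if p.household != hh_a or p.pid in links:
                continue
            candidates = [c for c in census_b
                          if c.household == hh_b and c.sex == p.sex and c.pid not in taken_b]
            match = best_unique_match(p, candidates, threshold)
            if match is not None:
                links[p.pid] = match
                taken_b.add(match)
    return links


# Invented example records; "b4" is a second Mary Smith in another household.
census_1900 = [
    Person("a1", "h1", "John", "Smith", "M", 1870, "Ohio"),
    Person("a2", "h1", "Mary", "Smith", "F", 1872, "Ohio"),
    Person("a3", "h1", "Anna", "Smith", "F", 1895, "Ohio"),
]
census_1910 = [
    Person("b1", "k1", "John", "Smith", "M", 1870, "Ohio"),
    Person("b2", "k1", "Mary", "Smith", "F", 1873, "Ohio"),
    Person("b3", "k1", "Anna", "Smith", "F", 1895, "Ohio"),
    Person("b4", "k9", "Mary", "Smith", "F", 1872, "Ohio"),
]

step_one = link_step_one(census_1900, census_1910)
all_links = link_step_two(census_1900, census_1910, step_one)
```

In this invented data, a search over the whole 1910 file would find two equally plausible candidates for Mary Smith, whereas restricting the search to the household already linked through John Smith leaves only one.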

Original language: English (US)
Pages (from-to): 12-29
Number of pages: 18
Journal: Historical Methods
Volume: 55
Issue number: 1
State: Published - 2022

Bibliographical note

Funding Information:
This work was funded by the research project “A Multigenerational Longitudinal Panel for Aging Research” (NIH/NIA, R01AG057679). Financial support from the Minnesota Population Center is also acknowledged, through core funding (P2C HD041023) from the Eunice Kennedy Shriver National Institute for Child Health and Human Development (NICHD). The authors acknowledge Ancestry.com for providing the underlying data making this research possible. The authors also gratefully acknowledge support from the National Institute on Aging (R01AG057679) and from the Minnesota Population Center (P2C HD041023) funded through a grant from the Eunice Kennedy Shriver National Institute for Child Health and Human Development (NICHD). We thank the editor, anonymous reviewers and participants at the 2019 Economic History conference on record linkage at Northwestern University for helpful comments and suggestions. We are also grateful for feedback from Ran Abramitzky, Leah Boustan and James Feigenbaum, as well as to Megan Moland, Jacob Van Leeuwen, Daniel Sabey and Tom Bryan for their excellent research assistance.

Publisher Copyright:
© 2021 The Author(s). Published with license by Taylor & Francis Group, LLC.

Keywords

  • Record linkage
  • United States of America
  • census data
  • machine learning

PubMed: MeSH publication types

  • Journal Article
