Big Data for Population Research


Project Details

Description

DESCRIPTION (provided by applicant): This proposal seeks funding to expand the Integrated Public Use Microdata Series (IPUMS) by adding demographic and geographic data describing the entire enumerated population of the U.S. from 1790 to 1930. The project will provide data on the characteristics of over 600 million persons, quadrupling the quantity of U.S. census microdata available for scientific research. The data will cover entire populations with full geographic detail, providing contextual information on neighborhood characteristics, including ethnic composition, demographic behavior, and population mobility. These data offer the earliest information available on key social and economic characteristics, and they will provide invaluable insight into processes of long-run demographic and economic change. The data will make a permanent and substantial addition to the nation's statistical infrastructure and will have far-reaching implications for research across the social and behavioral sciences. The project is made possible by the donation of a massive, high-quality, verified transcription of information in the U.S. censuses, prepared by two major genealogical organizations.
Converting this immense body of raw data into a format suitable for scientific analysis will require the following tasks: (1) classify and code geographic locations to be compatible with categories used in the published census returns; (2) assess completeness and accuracy of the data transcription; (3) convert alphabetic string data into numeric categories that are comparable over time; (4) employ new data cleaning software to identify and correct common enumeration and transcription errors; (5) develop documentation, including full descriptions of data processing methods, detailed analysis of comparability issues, and comprehensive machine-processable metadata; (6) incorporate the data into the IPUMS data access system for free dissemination to the scientific community; and (7) implement secure data protection and preservation policies. The project will be executed by a team of highly experienced researchers with exceptional proficiency in large-scale data creation, integration, and dissemination, and will leverage cutting-edge technology to process an unprecedented volume of data at reasonable cost. The project is a collaboration of the Minnesota Population Center with the world's largest producers of genealogical data, allowing cost-effective use of scarce resources to develop shared infrastructure for population and health research.
Status: Finished
Effective start/end date: 9/21/13 to 5/31/19

Funding

  • National Institute of Child Health and Human Development: $638,712.00
  • National Institute of Child Health and Human Development: $622,521.00
  • National Institute of Child Health and Human Development: $635,072.00
  • National Institute of Child Health and Human Development: $623,527.00
  • National Institute of Child Health and Human Development: $625,449.00
