Challenges of Large-Scale Data Processing in the 1990s: The IPUMS Experience

Diana L. Magnuson; Steven Ruggles

doi:10.1109/MAHC.2022.3214736

History (Twin Cities)

Research output: Contribution to journal › Article › peer-review

Abstract

When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable and much data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstandardized, incomplete, and inadequate for automated processing. With insufficient attention to preservation, valuable scientific data were disappearing (see Bogue et al., 1976). IPUMS was established to address these critical issues. At the outset, IPUMS faced daunting barriers of inadequate data processing, storage, and network capacity. This anecdote describes the improvised computational infrastructure developed in the decade from 1989 to 1999 to process, manage, and disseminate the world's largest population datasets. We use a combination of archival sources, interviews, and our own memories to trace the development of the IPUMS computing environment during a period of explosive technical innovation. The development of IPUMS is part of a larger story of the development of social science infrastructure in the late 20th century and its contribution to democratizing data access.

Original language	English (US)
Pages (from-to)	71-83
Number of pages	13
Journal	IEEE Annals of the History of Computing
Volume	44
Issue number	4
DOIs	https://doi.org/10.1109/MAHC.2022.3214736
State	Published - Oct 1 2022

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Cite this

@article{5cf4f94e40024eb2869e3412e2ff2a30,

title = "Challenges of Large-Scale Data Processing in the 1990s: The IPUMS Experience",

abstract = "When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable and much data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstandardized, incomplete, and inadequate for automated processing. With insufficient attention to preservation, valuable scientific data were disappearing (see Bogue et al., 1976). IPUMS was established to address these critical issues. At the outset, IPUMS faced daunting barriers of inadequate data processing, storage, and network capacity. This anecdote describes the improvised computational infrastructure developed in the decade from 1989 to 1999 to process, manage, and disseminate the world's largest population datasets. We use a combination of archival sources, interviews, and our own memories to trace the development of the IPUMS computing environment during a period of explosive technical innovation. The development of IPUMS is part of a larger story of the development of social science infrastructure in the late 20th century and its contribution to democratizing data access.",

author = "Magnuson, {Diana L.} and Ruggles, Steven",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.",

year = "2022",

month = oct,

day = "1",

doi = "10.1109/MAHC.2022.3214736",

language = "English (US)",

volume = "44",

pages = "71--83",

journal = "IEEE Annals of the History of Computing",

issn = "1058-6180",

publisher = "IEEE Computer Society",

number = "4",

}

TY - JOUR

T1 - Challenges of Large-Scale Data Processing in the 1990s

T2 - The IPUMS Experience

AU - Magnuson, Diana L.

AU - Ruggles, Steven

PY - 2022/10/1

Y1 - 2022/10/1

AB - When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable and much data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstandardized, incomplete, and inadequate for automated processing. With insufficient attention to preservation, valuable scientific data were disappearing (see Bogue et al., 1976). IPUMS was established to address these critical issues. At the outset, IPUMS faced daunting barriers of inadequate data processing, storage, and network capacity. This anecdote describes the improvised computational infrastructure developed in the decade from 1989 to 1999 to process, manage, and disseminate the world's largest population datasets. We use a combination of archival sources, interviews, and our own memories to trace the development of the IPUMS computing environment during a period of explosive technical innovation. The development of IPUMS is part of a larger story of the development of social science infrastructure in the late 20th century and its contribution to democratizing data access.

UR - http://www.scopus.com/inward/record.url?scp=85145019089&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85145019089&partnerID=8YFLogxK

U2 - 10.1109/MAHC.2022.3214736

DO - 10.1109/MAHC.2022.3214736

M3 - Article

AN - SCOPUS:85145019089

SN - 1058-6180

VL - 44

SP - 71

EP - 83

JO - IEEE Annals of the History of Computing

JF - IEEE Annals of the History of Computing

IS - 4

ER -
