CHS: Small: Collaborative Research: Structured Data Peer Production: Addressing Challenges and Leveraging Opportunities

Project: Research project

Project Details

Description

This project will illuminate and advance the critical new phenomenon of peer-produced structured data. In the past two decades, peer production has emerged as a new means of knowledge production, with Wikipedia as the prime success story. Recently, peer production communities have begun producing a new kind of data that is structured and machine-readable, with Wikidata and OpenStreetMap being two major examples. These structured data peer production communities differ from traditional peer production in key ways: (1) For the data produced by these communities to be machine-readable, it must follow strict syntactic and semantic rules; this contrasts with traditional peer production communities' fundamental ethos of contributor freedom; (2) The data produced by these communities is intended primarily to be consumed and processed by algorithms before becoming visible to end users and, thus, the ultimate consequences of contributors' edits are mostly invisible to them; and (3) Instead of offering different language versions, Wikidata and OpenStreetMap both opted to produce a single, worldwide dataset intended for use across languages and cultures. This research will address numerous issues of significant social and economic impact, notably helping to improve the transparency, accuracy, and fairness of algorithms that affect millions of people's lives on a daily basis, and the relevance and utility of structured data across multiple applications and linguistic and cultural barriers. It will quantify the value important algorithms derive from the volunteer effort of people distributed across the globe. It will show the extent to which peer-produced structured data serves as a global lingua franca or whether, on the other hand, it embodies culturally specific assumptions that limit its applicability. It will lead to the creation of software tools that improve data production processes and thus the data produced.

This research seeks to produce this understanding by focusing on three themes related to the structured data peer production process and data products: (1) Motivation of members of the communities. The ability to edit what one wants is a powerful motivating factor for peer production contributors. This research will investigate how contributor behavior changes when this motivating factor is blunted by the stringent rules necessary for the creation of structured, machine-readable data. (2) Value of data produced by the communities. There are several ways to estimate the value of contributions in Wikipedia. However, Wikidata and OpenStreetMap data typically are processed by algorithms (perhaps even by multiple algorithms for different purposes) before being delivered to humans. This makes it difficult to assess the value of these data. The research will investigate the value of structured data by identifying where it is used and how it contributes to algorithmic outputs; it also will create tools to make this information visible to editors and the community as a whole. (3) Data production in a language-independent context. The research will analyze the consequences of the ambitious decision by both Wikidata and OpenStreetMap to do their work in a language-independent way. Specifically, it will examine the extent to which this decision results in undesired effects in the data thus generated.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 9/1/18 to 8/31/22

Funding

  • National Science Foundation: $249,738.00
