Bot detection in Wikidata using behavioral and other informal cues

Andrew Hall, Loren Terveen, Aaron Halfaker

Research output: Contribution to journal › Article › peer-review

Abstract

Bots have been important to peer production’s success. Wikipedia, OpenStreetMap, and Wikidata all have taken advantage of automation to perform work at a rate and scale exceeding that of human contributors. Understanding the ways in which humans and bots behave in these communities is an important topic, and one that relies on accurate bot recognition. Yet, in many cases, bot activities are not explicitly flagged and could be mistaken for human contributions. We develop a machine classifier to detect previously unidentified bots using implicit behavioral and other informal editing characteristics. We show that this method yields a high level of fitness under both formal evaluation (PR-AUC: 0.845, ROC-AUC: 0.985) and a qualitative analysis of “anonymous” contributor edit sessions. We also show that, in some cases, unflagged bot activities can significantly misrepresent human behavior in analyses. Our model has the potential to support future research and community patrolling activities.
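As an illustration of the evaluation described in the abstract, the sketch below trains a classifier on hypothetical per-session behavioral features and reports PR-AUC and ROC-AUC with scikit-learn. The feature names, model choice, and synthetic data are assumptions made for illustration only; they are not the authors' actual feature set or pipeline.

```python
# Minimal, illustrative sketch (not the authors' pipeline): fit a classifier on
# hypothetical per-session behavioral cues and report the two fitness metrics
# cited in the abstract (PR-AUC and ROC-AUC). All features and labels here are
# synthetic assumptions for demonstration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical behavioral/informal cues per edit session (assumed, not from the paper):
# edit rate, mean time between edits, fraction of comment-less edits, session size.
X = np.column_stack([
    rng.gamma(2.0, 2.0, n),    # edits_per_minute
    rng.gamma(2.0, 30.0, n),   # mean_inter_edit_seconds
    rng.uniform(0.0, 1.0, n),  # frac_no_comment
    rng.poisson(20, n),        # session_edit_count
])
y = rng.integers(0, 2, n)      # 1 = bot-like session, 0 = human (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = GradientBoostingClassifier().fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

print("PR-AUC: ", average_precision_score(y_test, scores))
print("ROC-AUC:", roc_auc_score(y_test, scores))
```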

Original language: English (US)
Article number: 64
Journal: Proceedings of the ACM on Human-Computer Interaction
Volume: 2
Issue number: CSCW
DOIs
State: Published - Nov 2018

Bibliographical note

Publisher Copyright:
Copyright 2018 held by Owner/Author(s).

Keywords

  • Automation
  • Bots
  • Machine learning
  • Peer production
  • Structured data
  • Wikidata
