Ease of adoption of clinical natural language processing software: An evaluation of five systems

Kai Zheng; V. G. Vinod Vydiswaran; Yang Liu; Yue Wang; Amber Stubbs; Özlem Uzuner; Anupama E. Gururaj; Samuel Bayer; John Aberdeen; Anna Rumshisky; Serguei Pakhomov; Hongfang Liu; Hua Xu

doi:10.1016/j.jbi.2015.07.008

Pharmaceutical Care and Health Systems

Research output: Contribution to journal › Article › peer-review

29 Scopus citations

Abstract

Objective: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with state-of-the-art clinical NLP systems. This paper reports the ease of adoption assessment methods we developed for this track, and the results of evaluating five clinical NLP system submissions.

Materials and methods: A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science, and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along the following three dimensions: communication effectiveness (i.e., how effective a system is in communicating its designed objectives to its intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues.

Results: Overall, the ease of adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that they also experienced considerable struggles.

Discussion: The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has a great margin for improvement. Remedy strategies suggested by the evaluators included (1) more detailed and operating system-specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging prerequisites required within software distributions so that prospective adopters of the software do not have to obtain each of the third-party components on their own.

Original language	English (US)
Pages (from-to)	S189-S196
Journal	Journal of Biomedical Informatics
Volume	58
DOIs	https://doi.org/10.1016/j.jbi.2015.07.008
State	Published - Dec 1 2015

Bibliographical note

Publisher Copyright:
© 2015 Elsevier Inc.

Keywords

Human-computer interaction
Natural language processing [L01.224.065.580]
Software design [L01.224.900.820]
Software validation [L01.224.900.868]
Usability
User-computer interface [L01.224.900.910]

Cite this

@article{cc5dd9821c7d4a99a52052ad0ff2f390,

title = "Ease of adoption of clinical natural language processing software: An evaluation of five systems",

abstract = "Objective: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with state-of-the-art clinical NLP systems. This paper reports the ease of adoption assessment methods we developed for this track, and the results of evaluating five clinical NLP system submissions. Materials and methods: A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four {"}expert evaluators{"} with training in computer science, and eight {"}end user evaluators{"} with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along the following three dimensions: communication effectiveness (i.e., how effective a system is in communicating its designed objectives to its intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. Results: Overall, the ease of adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that they also experienced considerable struggles. Discussion: The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has a great margin for improvement. Remedy strategies suggested by the evaluators included (1) more detailed and operating system-specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging prerequisites required within software distributions so that prospective adopters of the software do not have to obtain each of the third-party components on their own.",

keywords = "Human-computer interaction, Natural language processing [L01.224.065.580], Software design [L01.224.900.820], Software validation [L01.224.900.868], Usability, User-computer interface [L01.224.900.910]",

author = "Kai Zheng and Vydiswaran, {V. G. Vinod} and Yang Liu and Yue Wang and Amber Stubbs and {\"O}zlem Uzuner and Gururaj, {Anupama E.} and Samuel Bayer and John Aberdeen and Anna Rumshisky and Serguei Pakhomov and Hongfang Liu and Hua Xu",

note = "Publisher Copyright: {\textcopyright} 2015 Elsevier Inc.",

year = "2015",

month = dec,

day = "1",

doi = "10.1016/j.jbi.2015.07.008",

language = "English (US)",

volume = "58",

pages = "S189--S196",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Ease of adoption of clinical natural language processing software

T2 - An evaluation of five systems

AU - Zheng, Kai

AU - Vydiswaran, V. G. Vinod

AU - Liu, Yang

AU - Wang, Yue

AU - Stubbs, Amber

AU - Uzuner, Özlem

AU - Gururaj, Anupama E.

AU - Bayer, Samuel

AU - Aberdeen, John

AU - Rumshisky, Anna

AU - Pakhomov, Serguei

AU - Liu, Hongfang

AU - Xu, Hua

PY - 2015/12/1

Y1 - 2015/12/1

AB - Objective: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with state-of-the-art clinical NLP systems. This paper reports the ease of adoption assessment methods we developed for this track, and the results of evaluating five clinical NLP system submissions. Materials and methods: A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science, and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along the following three dimensions: communication effectiveness (i.e., how effective a system is in communicating its designed objectives to its intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. Results: Overall, the ease of adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that they also experienced considerable struggles. Discussion: The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has a great margin for improvement. Remedy strategies suggested by the evaluators included (1) more detailed and operating system-specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging prerequisites required within software distributions so that prospective adopters of the software do not have to obtain each of the third-party components on their own.

KW - Human-computer interaction

KW - Natural language processing [L01.224.065.580]

KW - Software design [L01.224.900.820]

KW - Software validation [L01.224.900.868]

KW - Usability

KW - User-computer interface [L01.224.900.910]

UR - http://www.scopus.com/inward/record.url?scp=84938262566&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938262566&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2015.07.008

DO - 10.1016/j.jbi.2015.07.008

M3 - Article

C2 - 26210361

AN - SCOPUS:84938262566

SN - 1532-0464

VL - 58

SP - S189

EP - S196

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

ER -
