TY - JOUR
T1 - Using Natural Language Processing to Automatically Assess Feedback Quality
T2 - Findings From 3 Surgical Residencies
AU - Ötleş, Erkin
AU - Kendrick, Daniel E.
AU - Solano, Quintin P.
AU - Schuller, Mary
AU - Ahle, Samantha L.
AU - Eskender, Mickyas H.
AU - Carnes, Emily
AU - George, Brian C.
N1 - Publisher Copyright:
© 2021 Lippincott Williams and Wilkins. All rights reserved.
PY - 2021/10/1
Y1 - 2021/10/1
N2 - Purpose Learning is markedly improved with high-quality feedback, yet assuring the quality of feedback is difficult to achieve at scale. Natural language processing (NLP) algorithms may be useful in this context as they can automatically classify large volumes of narrative data. However, it is unknown if NLP models can accurately evaluate surgical trainee feedback. This study evaluated which NLP techniques best classify the quality of surgical trainee formative feedback recorded as part of a workplace assessment. Method During the 2016-2017 academic year, the SIMPL (Society for Improving Medical Professional Learning) app was used to record operative performance narrative feedback for residents at 3 university-based general surgery residency training programs. Feedback comments were collected for a sample of residents representing all 5 postgraduate year levels and coded for quality. In May 2019, the coded comments were then used to train NLP models to automatically classify the quality of feedback across 4 categories (effective, mediocre, ineffective, or other). Models included support vector machines (SVM), logistic regression, gradient boosted trees, naive Bayes, and random forests. The primary outcome was mean classification accuracy. Results The authors manually coded the quality of 600 recorded feedback comments. Those data were used to train NLP models to automatically classify the quality of feedback across 4 categories. The NLP model using an SVM algorithm yielded a maximum mean accuracy of 0.64 (standard deviation, 0.01). When the classification task was modified to distinguish only high-quality vs low-quality feedback, maximum mean accuracy was 0.83, again with SVM. Conclusions To the authors' knowledge, this is the first study to examine the use of NLP for classifying feedback quality. SVM NLP models demonstrated the ability to automatically classify the quality of surgical trainee evaluations. Larger training datasets would likely further increase accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85111502172&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111502172&partnerID=8YFLogxK
U2 - 10.1097/ACM.0000000000004153
DO - 10.1097/ACM.0000000000004153
M3 - Article
C2 - 33951682
AN - SCOPUS:85111502172
SN - 1040-2446
VL - 96
SP - 1457
EP - 1460
JO - Academic Medicine
JF - Academic Medicine
IS - 10
ER -