DuluthNLP at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset

Samuel Akrah; Ted Pedersen

Computer Science (Duluth)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes the DuluthNLP system that participated in Task 12 of SemEval-2023 on AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset. Given a set of tweets, the task requires participating systems to classify each tweet as negative, positive, or neutral. We evaluate a range of monolingual and multilingual pre-trained models on the Twi language dataset, one of the 14 African languages included in the SemEval task. We introduce TwiBERT, a new pre-trained model trained from scratch. We show that TwiBERT, along with mBERT, generally performs best when trained on the Twi dataset, achieving an F1 score of 64.29% on the official evaluation test data, which ranks 14th out of 30 total submissions for Track 10. The TwiBERT model is released at https://huggingface.co/sakrah/TwiBERT.
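As a rough illustration of how an F1 score for this three-way classification could be computed, the sketch below scores predicted labels against gold labels. The abstract does not state which averaging scheme underlies the 64.29% figure, so the support-weighted average used here is an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("negative", "neutral", "positive")


def per_class_f1(gold, pred, label):
    """F1 for a single class, treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(gold, pred, labels=LABELS):
    """Average of per-class F1, weighted by each class's gold support.

    Assumption: weighted averaging is chosen for illustration only.
    """
    support = Counter(gold)
    total = len(gold)
    return sum(per_class_f1(gold, pred, lab) * support[lab] for lab in labels) / total


if __name__ == "__main__":
    # Toy data, not taken from the Twi dataset.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "negative", "positive", "positive"]
    print(f"weighted F1 = {weighted_f1(gold, pred):.4f}")
```

Weighting by support keeps frequent classes from being drowned out by rare ones; an unweighted (macro) average would instead treat all three classes equally.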

Original language	English (US)
Title of host publication	17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop
Editors	Atul Kr. Ojha, A. Seza Dogruoz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Publisher	Association for Computational Linguistics
Pages	1697-1701
Number of pages	5
ISBN (Electronic)	9781959429999
State	Published - 2023
Event	17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Hybrid, Toronto, Canada
Duration	Jul 13 2023 → Jul 14 2023

Publication series

Name	17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop

Conference

Conference	17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory	Canada
City	Hybrid, Toronto
Period	7/13/23 → 7/14/23

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Cite this

Akrah, S., & Pedersen, T. (2023). DuluthNLP at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset. In A. K. Ojha, A. S. Dogruoz, G. Da San Martino, H. T. Madabushi, R. Kumar, & E. Sartori (Eds.), 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (pp. 1697-1701). (17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop). Association for Computational Linguistics.

@inproceedings{b0406f0252524e3b80083f276dc4bbb0,
title = "DuluthNLP at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset",
abstract = "This paper describes the DuluthNLP system that participated in Task 12 of SemEval-2023 on AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset. Given a set of tweets, the task requires participating systems to classify each tweet as negative, positive or neutral. We evaluate a range of monolingual and multilingual pre-trained models on the Twi language dataset, one among the 14 African languages included in the SemEval task. We introduce TwiBERT, a new pretrained model trained from scratch. We show that TwiBERT, along with mBERT, generally perform best when trained on the Twi dataset, achieving an F1 score of 64.29% on the official evaluation test data, which ranks 14 out of 30 of the total submissions for Track 10. The TwiBERT model is released at https://huggingface.co/sakrah/TwiBERT.",
author = "Samuel Akrah and Ted Pedersen",
note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 ; Conference date: 13-07-2023 Through 14-07-2023",
year = "2023",
language = "English (US)",
series = "17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop",
publisher = "Association for Computational Linguistics",
pages = "1697--1701",
editor = "Ojha, {Atul Kr.} and Dogruoz, {A. Seza} and {Da San Martino}, Giovanni and Madabushi, {Harish Tayyar} and Ritesh Kumar and Elisa Sartori",
booktitle = "17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop",
}

TY - GEN
T1 - DuluthNLP at SemEval-2023 Task 12
T2 - 17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Akrah, Samuel
AU - Pedersen, Ted
PY - 2023
Y1 - 2023
AB - This paper describes the DuluthNLP system that participated in Task 12 of SemEval-2023 on AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset. Given a set of tweets, the task requires participating systems to classify each tweet as negative, positive or neutral. We evaluate a range of monolingual and multilingual pre-trained models on the Twi language dataset, one among the 14 African languages included in the SemEval task. We introduce TwiBERT, a new pretrained model trained from scratch. We show that TwiBERT, along with mBERT, generally perform best when trained on the Twi dataset, achieving an F1 score of 64.29% on the official evaluation test data, which ranks 14 out of 30 of the total submissions for Track 10. The TwiBERT model is released at https://huggingface.co/sakrah/TwiBERT.
UR - http://www.scopus.com/inward/record.url?scp=85161026453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161026453&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85161026453
T3 - 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop
SP - 1697
EP - 1701
BT - 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop
A2 - Ojha, Atul Kr.
A2 - Dogruoz, A. Seza
A2 - Da San Martino, Giovanni
A2 - Madabushi, Harish Tayyar
A2 - Kumar, Ritesh
A2 - Sartori, Elisa
PB - Association for Computational Linguistics
Y2 - 13 July 2023 through 14 July 2023
ER -
