Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model

Social media is increasingly being used to express opinions and attitudes toward vaccines. The vaccine stance of social media posts can be classified in almost real-time using machine learning. We describe the use of a Transformer-based machine learning model for analyzing vaccine stance of Italian...

Bibliographic Details
Main Authors:	Susan Cheatham (Author), Per E. Kummervold (Author), Lorenza Parisi (Author), Barbara Lanfranchi (Author), Ileana Croci (Author), Francesca Comunello (Author), Maria Cristina Rota (Author), Antonietta Filia (Author), Alberto Eugenio Tozzi (Author), Caterina Rizzo (Author), Francesco Gesualdo (Author)
Format:	Book
Published:	Frontiers Media S.A., 2022-07-01T00:00:00Z.
Subjects:	article
Online Access:	Connect to this object online.

MARC


LEADER	00000 am a22000003u 4500
001	doaj_65b8f74d7e2e4b22a7d69badfe5d651b
042			\|a dc
100	1	0	\|a Susan Cheatham \|e author
700	1	0	\|a Per E. Kummervold \|e author
700	1	0	\|a Lorenza Parisi \|e author
700	1	0	\|a Barbara Lanfranchi \|e author
700	1	0	\|a Ileana Croci \|e author
700	1	0	\|a Francesca Comunello \|e author
700	1	0	\|a Maria Cristina Rota \|e author
700	1	0	\|a Antonietta Filia \|e author
700	1	0	\|a Alberto Eugenio Tozzi \|e author
700	1	0	\|a Caterina Rizzo \|e author
700	1	0	\|a Francesco Gesualdo \|e author
245	0	0	\|a Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model
260			\|b Frontiers Media S.A., \|c 2022-07-01T00:00:00Z.
500			\|a 2296-2565
500			\|a 10.3389/fpubh.2022.948880
520			\|a Social media is increasingly being used to express opinions and attitudes toward vaccines. The vaccine stance of social media posts can be classified in almost real-time using machine learning. We describe the use of a Transformer-based machine learning model for analyzing vaccine stance of Italian tweets, and demonstrate the need to address changes over time in vaccine-related language, through periodic model retraining. Vaccine-related tweets were collected through a platform developed for the European Joint Action on Vaccination. Two datasets were collected, the first between November 2019 and June 2020, the second from April to September 2021. The tweets were manually categorized by three independent annotators. After cleaning, the total dataset consisted of 1,736 tweets with 3 categories (promotional, neutral, and discouraging). The manually classified tweets were used to train and test various machine learning models. The model that classified the data most similarly to humans was XLM-Roberta-large, a multilingual version of the Transformer-based model RoBERTa. The model hyper-parameters were tuned and then the model ran five times. The fine-tuned model with the best F-score over the validation dataset was selected. Running the selected fine-tuned model on just the first test dataset resulted in an accuracy of 72.8% (F-score 0.713). Using this model on the second test dataset resulted in a 10% drop in accuracy to 62.1% (F-score 0.617), indicating that the model recognized a difference in language between the datasets. On the combined test datasets the accuracy was 70.1% (F-score 0.689). Retraining the model using data from the first and second datasets increased the accuracy over the second test dataset to 71.3% (F-score 0.713), a 9% improvement from when using just the first dataset for training. The accuracy over the first test dataset remained the same at 72.8% (F-score 0.721). The accuracy over the combined test datasets was then 72.4% (F-score 0.720), a 2% improvement. Through fine-tuning a machine-learning model on task-specific data, the accuracy achieved in categorizing tweets was close to that expected by a single human annotator. Regular training of machine-learning models with recent data is advisable to maximize accuracy.
546			\|a EN
690			\|a vaccines
690			\|a machine learning
690			\|a artificial intelligence
690			\|a vaccination hesitancy
690			\|a Transformer model
690			\|a Public aspects of medicine
690			\|a RA1-1270
655	7		\|a article \|2 local
786	0		\|n Frontiers in Public Health, Vol 10 (2022)
787	0		\|n https://www.frontiersin.org/articles/10.3389/fpubh.2022.948880/full
787	0		\|n https://doaj.org/toc/2296-2565
856	4	1	\|u https://doaj.org/article/65b8f74d7e2e4b22a7d69badfe5d651b \|z Connect to this object online.

