Application of ChatGPT as a content generation tool in continuing medical education: acne as a test topic

Bibliographic Details
Main Authors: Luigi Naldi (Author), Vincenzo Bettoli (Author), Eugenio Santoro (Author), Maria Rosa Valetto (Author), Anna Bolzon (Author), Fortunato Cassalia (Author), Simone Cazzaniga (Author), Sergio Cima (Author), Andrea Danese (Author), Silvia Emendi (Author), Monica Ponzano (Author), Nicoletta Scarpa (Author), Pietro Dri (Author)
Format: Article
Published: PAGEPress Publications, 2024-11-01.
Description
Summary: The large language model (LLM) ChatGPT can answer open-ended and complex questions, but its accuracy in providing reliable medical information requires careful assessment. As part of the AICHECK (Artificial Intelligence for CME Health E-learning Contents and Knowledge) Study, which evaluates the potential of ChatGPT in continuing medical education (CME), we compared ChatGPT-generated educational content with the recommendations of the National Institute for Health and Care Excellence (NICE) guidelines on acne vulgaris. ChatGPT version 4 was given a 23-item questionnaire developed by an experienced dermatologist. A panel of five dermatologists rated the answers positively for "quality" (87.8%), "readability" (94.8%), "accuracy" (75.7%), "thoroughness" (85.2%), and "consistency" with the guidelines (76.8%). The references provided by ChatGPT received positive ratings for "pertinence" (94.6%), "relevance" (91.2%), and "update" (62.3%). Internal reproducibility was adequate for both answers (93.5%) and references (67.4%). Answers touching on issues that are uncertain or controversial in the scientific community scored lowest. The study underscores the need to develop rigorous evaluation criteria for AI-generated medical content and for expert oversight to ensure accuracy and guideline adherence.
Item Description: DOI 10.4081/dr.2024.10138
ISSN: 2036-7392, 2036-7406
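
The percentages in the summary are most naturally read as the share of positive panel ratings per evaluation dimension (e.g., 87.8% of all quality ratings across five raters and 23 answers were positive). Below is a minimal sketch of that aggregation in Python, using hypothetical rating data; the actual rating scale, coding, and analysis of the AICHECK Study are not specified in this record.

```python
from collections import defaultdict

# Hypothetical ratings: one entry per (rater, question, dimension).
# In the study, 5 dermatologists rated 23 answers on dimensions such as
# "quality", "readability", "accuracy", "thoroughness", and "consistency".
ratings = [
    # (rater, question_id, dimension, is_positive)
    ("rater1", 1, "quality", True),
    ("rater1", 1, "accuracy", False),
    ("rater2", 1, "quality", True),
    # ... one entry per rater x question x dimension
]

def positive_share(ratings):
    """Return, per dimension, the percentage of ratings that were positive."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for _rater, _qid, dimension, is_positive in ratings:
        totals[dimension] += 1
        positives[dimension] += int(is_positive)
    return {d: round(100.0 * positives[d] / totals[d], 1) for d in totals}

print(positive_share(ratings))  # e.g. {'quality': 100.0, 'accuracy': 0.0}
```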