Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study


Bibliographic Details
Main Authors: Rui Yang (Author), Qingcheng Zeng (Author), Keen You (Author), Yujie Qiao (Author), Lucas Huang (Author), Chia-Chun Hsieh (Author), Benjamin Rosand (Author), Jeremy Goldwasser (Author), Amisha Dave (Author), Tiarnan Keenan (Author), Yuhe Ke (Author), Chuan Hong (Author), Nan Liu (Author), Emily Chew (Author), Dragomir Radev (Author), Zhiyong Lu (Author), Hua Xu (Author), Qingyu Chen (Author), Irene Li (Author)
Format: Article
Published: JMIR Publications, 2024-10-01.
Subjects: Computer applications to medicine. Medical informatics; Public aspects of medicine
Online Access: https://doaj.org/article/636cfe56b72c41f0aa9d2e22d25cb671

MARC

LEADER 00000 am a22000003u 4500
001 doaj_636cfe56b72c41f0aa9d2e22d25cb671
042 |a dc 
100 1 0 |a Rui Yang  |e author 
700 1 0 |a Qingcheng Zeng  |e author 
700 1 0 |a Keen You  |e author 
700 1 0 |a Yujie Qiao  |e author 
700 1 0 |a Lucas Huang  |e author 
700 1 0 |a Chia-Chun Hsieh  |e author 
700 1 0 |a Benjamin Rosand  |e author 
700 1 0 |a Jeremy Goldwasser  |e author 
700 1 0 |a Amisha Dave  |e author 
700 1 0 |a Tiarnan Keenan  |e author 
700 1 0 |a Yuhe Ke  |e author 
700 1 0 |a Chuan Hong  |e author 
700 1 0 |a Nan Liu  |e author 
700 1 0 |a Emily Chew  |e author 
700 1 0 |a Dragomir Radev  |e author 
700 1 0 |a Zhiyong Lu  |e author 
700 1 0 |a Hua Xu  |e author 
700 1 0 |a Qingyu Chen  |e author 
700 1 0 |a Irene Li  |e author 
245 0 0 |a Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study 
260 |b JMIR Publications,   |c 2024-10-01T00:00:00Z. 
500 |a 1438-8871 
500 |a 10.2196/60601 
520 |a Background: Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist, which have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offer generation capabilities, leaving a significant gap in the current offerings. Objective: This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff, offering an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. Methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. For the question-answering task, we also developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. Finally, we conducted a physician validation to assess the quality of generated content beyond automated metrics. Results: The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the BLEU score on the machine translation task by 20.27. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5). Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face. 
546 |a EN 
690 |a Computer applications to medicine. Medical informatics 
690 |a R858-859.7 
690 |a Public aspects of medicine 
690 |a RA1-1270 
655 7 |a article  |2 local 
786 0 |n Journal of Medical Internet Research, Vol 26, p e60601 (2024) 
787 0 |n https://www.jmir.org/2024/1/e60601 
787 0 |n https://doaj.org/toc/1438-8871 
856 4 1 |u https://doaj.org/article/636cfe56b72c41f0aa9d2e22d25cb671  |z Connect to this object online.