Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study

Bibliographic Details
Main Authors: Ori Ben Yehuda (Author), Edward Itelman (Author), Adva Vaisman (Author), Gad Segal (Author), Boaz Lerner (Author)
Format: Article
Published: JMIR Publications, 2024-07-01.
Subjects: Computer applications to medicine. Medical informatics; Public aspects of medicine
Online Access: https://doaj.org/article/ea8e7b736ddf47da87b12c2a93a1c8ab

MARC

LEADER 00000 am a22000003u 4500
001 doaj_ea8e7b736ddf47da87b12c2a93a1c8ab
042 |a dc 
100 1 0 |a Ori Ben Yehuda  |e author 
700 1 0 |a Edward Itelman  |e author 
700 1 0 |a Adva Vaisman  |e author 
700 1 0 |a Gad Segal  |e author 
700 1 0 |a Boaz Lerner  |e author 
245 0 0 |a Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study 
260 |b JMIR Publications,   |c 2024-07-01T00:00:00Z. 
500 |a 1438-8871 
500 |a 10.2196/48595 
520 |a Background: Under- or late identification of pulmonary embolism (PE), a thrombosis of 1 or more pulmonary arteries that seriously threatens patients' lives, is a major challenge confronting modern medicine. Objective: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records. Methods: We collected demographic, comorbidity, and medication data for 2568 patients with PE and 52,598 control patients. We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained a random forest ML algorithm to detect PE at the earliest possible time during a patient's hospitalization: at the time of admission. We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which can lead to misdiagnosis of PE. Results: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and anticoagulant use, achieving a geometric mean of 80% for the PE and non-PE classification accuracies. Although only 4% (1942/46,639) of the patients had a diagnosis of PE on hospital admission, we identified 2 clustering schemes comprising subgroups in which 61% to 63% of patients were positive for PE (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2). One subgroup in the first clustering scheme included 36% (705/1942) of all patients with PE; these patients were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia compared with patients in the other subgroups of this scheme. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only patients with a past diagnosis of pneumonia. Conclusions: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. Despite the high class imbalance, which undermines accurate PE prediction, and despite relying only on information from the patients' medical histories, our models were both accurate and informative, identifying patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. Because we did not restrict our patients to those at high risk for PE according to previously published scales (eg, the Wells or revised Geneva scores), we could accurately assess the application of ML to raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations.
546 |a EN 
690 |a Computer applications to medicine. Medical informatics 
690 |a R858-859.7 
690 |a Public aspects of medicine 
690 |a RA1-1270 
655 7 |a article  |2 local 
786 0 |n Journal of Medical Internet Research, Vol 26, p e48595 (2024) 
787 0 |n https://www.jmir.org/2024/1/e48595 
787 0 |n https://doaj.org/toc/1438-8871 
856 4 1 |u https://doaj.org/article/ea8e7b736ddf47da87b12c2a93a1c8ab  |z Connect to this object online.
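The abstract reports performance as an 80% geometric mean of the PE and non-PE classification accuracies rather than as overall accuracy. The Python below is a minimal sketch written for this record, not taken from the authors' code. It shows why that metric suits a data set in which only about 4% of admitted patients have PE: a hypothetical model that labels everyone non-PE scores about 96% accuracy but a geometric mean of 0. It also compares the PE proportion in each reported cluster subgroup with the admission base rate. Only the patient counts come from the abstract. The function names, the all-negative baseline, and the subgroup labels are illustrative assumptions. The sketch does not reconstruct the authors' confusion matrix or their 2 imbalance-handling methods.

```python
import math


def geometric_mean_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of the two per-class accuracies (illustrative helper).

    PE-class accuracy     = sensitivity = TP / (TP + FN)
    non-PE-class accuracy = specificity = TN / (TN + FP)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def overall_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy: fraction of all patients classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Counts from the abstract: 1942 of 46,639 patients had PE on admission.
N_PE = 1942
N_TOTAL = 46_639
N_NON_PE = N_TOTAL - N_PE

# Hypothetical baseline that labels every patient non-PE.
trivial = dict(tp=0, fn=N_PE, tn=N_NON_PE, fp=0)
trivial_acc = overall_accuracy(**trivial)           # high, about 0.958
trivial_gmean = geometric_mean_accuracy(**trivial)  # 0.0: no PE case is detected

# PE proportion in each reported subgroup versus the admission base rate.
# Subgroup labels are hypothetical; the (positive, size) counts are from the abstract.
base_rate = N_PE / N_TOTAL
subgroups = {
    "scheme 1": (705, 1120),
    "scheme 2, subgroup a": (427, 701),
    "scheme 2, subgroup b": (340, 549),
}
for name, (pos, size) in subgroups.items():
    rate = pos / size
    print(f"{name}: {rate:.1%} PE, {rate / base_rate:.1f}x the base rate of {base_rate:.1%}")
```

Each subgroup's PE proportion comes out at roughly 15 times the admission base rate, which is the sense in which these clusters isolate high-risk patients.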