USCT-UNet: Rethinking the Semantic Gap in U-Net Network From U-Shaped Skip Connections With Multichannel Fusion Transformer

Medical image segmentation is a crucial component of computer-aided clinical diagnosis, with state-of-the-art models often being variants of U-Net. Despite their success, these models’ skip connections introduce an unnecessary semantic gap between the encoder and decoder, which hinders th...

Bibliographic Details
Main Authors: Xiaoshan Xie (Author), Min Yang (Author)
Format: Article
Published: IEEE, 2024.
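Subjects: Medical image segmentation; semantic gap; U-shaped skip connection; multichannel fusion transformer; spatial channel cross-attention; Medical technology; Therapeutics. Pharmacology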
Online Access: https://doaj.org/article/9148946f0dda44888f82aa0a754da99b

MARC

LEADER 00000 am a22000003u 4500
001 doaj_9148946f0dda44888f82aa0a754da99b
042 |a dc 
100 1 0 |a Xiaoshan Xie  |e author 
700 1 0 |a Min Yang  |e author 
245 0 0 |a USCT-UNet: Rethinking the Semantic Gap in U-Net Network From U-Shaped Skip Connections With Multichannel Fusion Transformer 
260 |b IEEE,   |c 2024-01-01T00:00:00Z. 
500 |a 1534-4320 
500 |a 1558-0210 
500 |a 10.1109/TNSRE.2024.3468339 
520 |a Medical image segmentation is a crucial component of computer-aided clinical diagnosis, with state-of-the-art models often being variants of U-Net. Despite their success, these models’ skip connections introduce an unnecessary semantic gap between the encoder and decoder, which hinders their ability to achieve the high precision required for clinical applications. Awareness of this semantic gap and its detrimental influences has increased over time. However, a quantitative understanding of how this semantic gap compromises accuracy and reliability remains lacking, emphasizing the need for effective mitigation strategies. In response, we present the first quantitative evaluation of the semantic gap between corresponding layers of U-Net and identify two key characteristics: 1) The direct skip connection (DSC) exhibits a semantic gap that negatively impacts models’ performance; 2) The magnitude of the semantic gap varies across different layers. Based on these findings, we re-examine this issue through the lens of skip connections. We introduce a Multichannel Fusion Transformer (MCFT) and propose a novel USCT-UNet architecture, which incorporates U-shaped skip connections (USC) to replace DSC, allocates varying numbers of MCFT blocks based on the semantic gap magnitude at different layers, and employs a spatial channel cross-attention (SCCA) module to facilitate the fusion of features between the decoder and USC. We evaluate USCT-UNet on four challenging datasets, and the results demonstrate that it effectively eliminates the semantic gap. Compared to using DSC, our USC and SCCA strategies achieve maximum improvements of 4.79% in the Dice coefficient, 5.70% in mean intersection over union (MIoU), and 3.26 in Hausdorff distance. 
546 |a EN 
690 |a Medical image segmentation 
690 |a semantic gap 
690 |a U-shaped skip connection 
690 |a multichannel fusion transformer 
690 |a spatial channel cross-attention 
690 |a Medical technology 
690 |a R855-855.5 
690 |a Therapeutics. Pharmacology 
690 |a RM1-950 
655 7 |a article  |2 local 
786 0 |n IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol 32, Pp 3782-3793 (2024) 
787 0 |n https://ieeexplore.ieee.org/document/10695464/ 
787 0 |n https://doaj.org/toc/1534-4320 
787 0 |n https://doaj.org/toc/1558-0210 
856 4 1 |u https://doaj.org/article/9148946f0dda44888f82aa0a754da99b  |z Connect to this object online.