Keyword search (4,163 papers available)

"vision transformer" Keyword-tagged Publications:

#   Title | Authors | PubMed ID | Dept
1   Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification | Ranipa K; Zhu WP; Swamy MNS | 41155032 | ENCS
2   Lung Nodule Malignancy Classification Integrating Deep and Radiomic Features in a Three-Way Attention-Based Fusion Module | Khademi S; Heidarian S; Afshar P; Mohammadi A; Sidiqi A; Nguyen ET; Ganeshan B; Oikonomou A | 41150036 | ENCS
3   CosSIF: Cosine similarity-based image filtering to overcome low inter-class variation in synthetic medical image datasets | Islam M; Zunair H; Mohammed N | 38492455 | ENCS

 

Title: Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification
Authors: Ranipa K; Zhu WP; Swamy MNS
Link: https://pubmed.ncbi.nlm.nih.gov/41155032/
DOI: 10.3390/bioengineering12101033
Publication: Bioengineering (Basel, Switzerland)
Keywords: attention fusion; deep learning; heart sound classification; vision transformer
PMID: 41155032 Category: Date Added: 2025-10-29
Dept Affiliation: ENCS
1 Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Description:

Vision Transformers (ViTs), inspired by their success in natural language processing, have recently gained attention for heart sound classification (HSC). However, most of the existing studies on HSC rely on single-stream architectures, overlooking the advantages of multi-resolution features. While multi-stream architectures employing early or late fusion strategies have been proposed, they often fall short of effectively capturing cross-modal feature interactions. Additionally, conventional fusion methods, such as concatenation, averaging, or max pooling, frequently result in information loss. To address these limitations, this paper presents a novel attention fusion-based two-stream Vision Transformer (AFTViT) architecture for HSC that leverages two-dimensional mel-cepstral domain features. The proposed method employs a ViT-based encoder to capture long-range dependencies and diverse contextual information at multiple scales. A novel attention block is then used to integrate cross-context features at the feature level, enhancing the overall feature representation. Experiments conducted on the PhysioNet2016 and PhysioNet2022 datasets demonstrate that the AFTViT outperforms state-of-the-art CNN-based methods in terms of accuracy. These results highlight the potential of the AFTViT framework for early diagnosis of cardiovascular diseases, offering a valuable tool for cardiologists and researchers in developing advanced HSC techniques.
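The feature-level attention fusion described in the abstract can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the paper's AFTViT implementation: the token counts, feature dimension, residual combination, and the absence of learned projection matrices are all simplifying assumptions made for illustration. One stream's tokens act as queries attending over the other stream's tokens, so the fused representation carries cross-context information rather than a plain concatenation or average.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(a, b):
    """Fuse two feature streams: tokens in `a` (queries) attend over
    tokens in `b` (keys/values). Single head, no learned projections;
    a residual connection keeps stream `a`'s own features."""
    d = a.shape[-1]
    scores = a @ b.T / np.sqrt(d)        # (Na, Nb) scaled dot-product scores
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    attended = weights @ b               # (Na, d) context gathered from b
    return a + attended                  # residual fusion

rng = np.random.default_rng(0)
stream_a = rng.standard_normal((16, 64))  # e.g. fine-resolution mel-cepstral tokens
stream_b = rng.standard_normal((16, 64))  # e.g. coarse-resolution tokens
fused = cross_attention_fusion(stream_a, stream_b)
print(fused.shape)  # (16, 64)
```

Because attention re-weights every token of the second stream per query instead of pooling, no fixed subset of features is discarded, which is the information-loss issue the abstract attributes to concatenation, averaging, or max pooling.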





BookR developed by Sriram Narayanan
for the Concordia University School of Health
Copyright © 2011-2026