Keyword search (4,163 papers available)

"vision transformer" Keyword-tagged Publications:

#   Title | Authors | PubMed ID | Dept
1   Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification | Ranipa K; Zhu WP; Swamy MNS | 41155032 | ENCS
2   Lung Nodule Malignancy Classification Integrating Deep and Radiomic Features in a Three-Way Attention-Based Fusion Module | Khademi S; Heidarian S; Afshar P; Mohammadi A; Sidiqi A; Nguyen ET; Ganeshan B; Oikonomou A | 41150036 | ENCS
3   CosSIF: Cosine similarity-based image filtering to overcome low inter-class variation in synthetic medical image datasets | Islam M; Zunair H; Mohammed N | 38492455 | ENCS

 

Title: Attention-Fusion-Based Two-Stream Vision Transformer for Heart Sound Classification
Authors: Ranipa K; Zhu WP; Swamy MNS
Link: https://pubmed.ncbi.nlm.nih.gov/41155032/
DOI: 10.3390/bioengineering12101033
Publication: Bioengineering (Basel, Switzerland)
Keywords: attention fusion; deep learning; heart sound classification; vision transformer
PMID: 41155032 Category: Date Added: 2025-10-29
Dept Affiliation: ENCS
1 Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Description:

Vision Transformers (ViTs), inspired by their success in natural language processing, have recently gained attention for heart sound classification (HSC). However, most of the existing studies on HSC rely on single-stream architectures, overlooking the advantages of multi-resolution features. While multi-stream architectures employing early or late fusion strategies have been proposed, they often fall short of effectively capturing cross-modal feature interactions. Additionally, conventional fusion methods, such as concatenation, averaging, or max pooling, frequently result in information loss. To address these limitations, this paper presents a novel attention fusion-based two-stream Vision Transformer (AFTViT) architecture for HSC that leverages two-dimensional mel-cepstral domain features. The proposed method employs a ViT-based encoder to capture long-range dependencies and diverse contextual information at multiple scales. A novel attention block is then used to integrate cross-context features at the feature level, enhancing the overall feature representation. Experiments conducted on the PhysioNet2016 and PhysioNet2022 datasets demonstrate that the AFTViT outperforms state-of-the-art CNN-based methods in terms of accuracy. These results highlight the potential of the AFTViT framework for early diagnosis of cardiovascular diseases, offering a valuable tool for cardiologists and researchers in developing advanced HSC techniques.
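The feature-level attention fusion described in the abstract can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the paper's AFTViT implementation: the token counts, feature dimension, residual combination, and the absence of learned projection matrices are all simplifying assumptions made for illustration. One stream's tokens act as queries attending over the other stream's tokens, so the fused representation carries cross-context information rather than a plain concatenation or average.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(a, b):
    """Fuse two feature streams: tokens in `a` (queries) attend over
    tokens in `b` (keys/values). Single head, no learned projections;
    a residual connection keeps stream `a`'s own features."""
    d = a.shape[-1]
    scores = a @ b.T / np.sqrt(d)        # (Na, Nb) scaled dot-product scores
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    attended = weights @ b               # (Na, d) context gathered from b
    return a + attended                  # residual fusion

rng = np.random.default_rng(0)
stream_a = rng.standard_normal((16, 64))  # e.g. fine-resolution mel-cepstral tokens
stream_b = rng.standard_normal((16, 64))  # e.g. coarse-resolution tokens
fused = cross_attention_fusion(stream_a, stream_b)
print(fused.shape)  # (16, 64)
```

Because attention re-weights every token of the second stream per query instead of pooling, no fixed subset of features is discarded, which is the information-loss issue the abstract attributes to concatenation, averaging, or max pooling.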





BookR developed by Sriram Narayanan
for the Concordia University School of Health
Copyright © 2011-2026