⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never rely on these for serious academic work; they are only meant as a first-pass screen before actually reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-16
NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning
Authors:Jiyuan Wang, Li Zhang, Haipeng Lin, Qile Liu, Gan Huang, Ziyu Li, Zhen Liang, Xia Wu
Recent advances in brain-inspired artificial intelligence have sought to align neural signals with visual semantics using multimodal models such as CLIP. However, existing methods often treat CLIP as a static feature extractor, overlooking its adaptability to neural representations and the inherent physiological-symbolic gap in EEG-image alignment. To address these challenges, we present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. Our approach introduces three core innovations: (1) We design a dual-stream visual embedding pipeline that combines dynamic filtering and token-level fusion to generate instance-level adaptive prompts, which guide the adjustment of patch embedding tokens based on image content, thereby enabling fine-grained modulation of visual representations under neural constraints; (2) We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts that work in conjunction with instance-level adjustments. These visual prompt tokens are inserted into the Transformer architecture to facilitate neural-aware adaptation and parameter optimization at a global level; (3) Inspired by neuroscientific principles of human visual encoding, we propose a refined contrastive loss that better models the semantic ambiguity and cross-modal noise present in EEG signals. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval, surpassing the previous best method by +12.3%, and demonstrates strong generalization under inter-subject conditions (+4.6% Top-1), highlighting the potential of physiology-aware prompt tuning for bridging brain signals and visual semantics.
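To make the alignment objective concrete, below is a minimal sketch of a CLIP-style EEG-image contrastive step with a handful of learnable visual prompt tokens prepended to the patch tokens. The module names, token counts, temperature, and the simple mean pooling are illustrative assumptions, not the authors' implementation; the stub stands in for the CLIP vision tower and the EEG encoder.

```python
# Minimal sketch: CLIP-style EEG-image contrastive alignment with learnable
# visual prompt tokens. Shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedVisualStub(nn.Module):
    """Stand-in for a ViT whose patch tokens are concatenated with
    learnable prompt tokens before pooling (an assumption, not the paper's code)."""
    def __init__(self, n_patches=196, dim=512, n_prompt_tokens=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompt_tokens, dim) * 0.02)
        self.mixer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=1,
        )

    def forward(self, patch_tokens):             # (B, n_patches, dim)
        b = patch_tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        tokens = torch.cat([prompts, patch_tokens], dim=1)
        return self.mixer(tokens).mean(dim=1)     # pooled image embedding (B, dim)

def info_nce(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched EEG/image pairs."""
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = eeg_emb @ img_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random tensors standing in for encoder outputs.
vision = PromptedVisualStub()
patches = torch.randn(4, 196, 512)   # fake ViT patch tokens
eeg = torch.randn(4, 512)            # fake EEG encoder output
loss = info_nce(eeg, vision(patches))
```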
Paper and Project Links
Summary
This paper introduces NeuroCLIP, a prompt tuning framework for EEG-to-image contrastive learning. It designs a dual-stream visual embedding pipeline that combines dynamic filtering and token-level fusion to generate instance-level adaptive prompts, which guide the adjustment of patch embedding tokens based on image content. In addition, it is the first to introduce visual prompt tokens into EEG-image alignment, inserting them into the Transformer architecture to enable neural-aware adaptation and global parameter optimization. Finally, inspired by neuroscientific principles, it proposes a refined contrastive loss that better models the semantic ambiguity and cross-modal noise in EEG signals. On the THINGS-EEG2 dataset, NeuroCLIP achieves 63.2% Top-1 accuracy in zero-shot image retrieval, surpassing the previous best method by +12.3%, and generalizes well under inter-subject conditions (+4.6% Top-1), demonstrating the potential of physiology-aware prompt tuning for bridging brain signals and visual semantics.
Key Takeaways
- NeuroCLIP uses a dual-stream visual embedding pipeline with dynamic filtering and token-level fusion to generate adaptive prompts that guide the adjustment of patch embedding tokens based on image content.
- It is the first to introduce visual prompt tokens into EEG-image alignment, enabling neural-aware adaptation and global parameter optimization.
- A refined contrastive loss better models the semantic ambiguity and cross-modal noise in EEG signals.
- NeuroCLIP achieves high zero-shot image retrieval accuracy on the THINGS-EEG2 dataset.
- The method surpasses existing approaches and generalizes well under inter-subject conditions.
- NeuroCLIP helps bridge brain signals and visual semantics, showing the potential of physiology-aware prompt tuning in this area.
Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
Authors:Cheng Wang, Shuisheng Zhou, Fengjiao Peng, Jin Sheng, Feng Ye, Yinli Dong
In the field of image clustering, the widely used contrastive learning networks improve clustering performance by maximizing the similarity between positive pairs and the dissimilarity of negative pairs of the inputs. Extant contrastive learning networks, whose two encoders often implicitly interact with each other by parameter sharing or momentum updating, may not fully exploit the complementarity and similarity of the positive pairs to extract clustering features from input data. To explicitly fuse the learned features of positive pairs, we design novel multiple fusing-augmenting ViT blocks (MFAVBs) based on the excellent feature learning ability of Vision Transformers (ViT). Firstly, two preprocessed augmentations as positive pairs are separately fed into two shared-weight ViTs, then their output features are fused and fed into a larger ViT. Secondly, the learned features are split into a pair of new augmented positive samples and passed to the next FAVBs, enabling multiple rounds of fusion and augmentation through MFAVBs operations. Finally, the learned features are projected into both instance-level and clustering-level spaces to calculate the cross-entropy loss, followed by parameter updates by backpropagation to finalize the training process. To further enhance the ability of the model to distinguish between similar images, the input data for our proposed network are preprocessed augmentations with features extracted from the CLIP pretrained model. Our experiments on seven public datasets demonstrate that MFAVBs, serving as the backbone for contrastive clustering, outperforms the state-of-the-art techniques in terms of clustering performance.
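A minimal sketch of the fuse-then-split block chain described above is shown below, assuming token-level features for the two augmented views; the layer widths, depths, pooling, and the two projection heads are illustrative choices rather than the authors' architecture.

```python
# Minimal sketch of a fusing-augmenting block: two views pass through a shared
# encoder, are fused by a larger encoder, then split back into a new pair of
# views. Dimensions and depths are illustrative assumptions.
import torch
import torch.nn as nn

def encoder(dim, depth, nhead=8):
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class FAVB(nn.Module):
    def __init__(self, dim=256, depth=1):
        super().__init__()
        self.shared = encoder(dim, depth)      # weight-shared branch for both views
        self.fusion = encoder(dim, depth + 1)  # "larger" ViT acting on fused tokens

    def forward(self, view_a, view_b):         # (B, N, dim) each
        a, b = self.shared(view_a), self.shared(view_b)
        fused = self.fusion(torch.cat([a, b], dim=1))
        # Split the fused token sequence into a new pair of augmented views.
        return fused.chunk(2, dim=1)

class MFAVBs(nn.Module):
    def __init__(self, dim=256, n_blocks=3, n_clusters=10):
        super().__init__()
        self.blocks = nn.ModuleList([FAVB(dim) for _ in range(n_blocks)])
        self.instance_head = nn.Linear(dim, 128)        # instance-level projection
        self.cluster_head = nn.Linear(dim, n_clusters)  # clustering-level projection

    def forward(self, view_a, view_b):
        for blk in self.blocks:
            view_a, view_b = blk(view_a, view_b)
        z = view_a.mean(dim=1)                  # pool one view for the two heads
        return self.instance_head(z), self.cluster_head(z).softmax(dim=-1)

# Toy usage: token features standing in for two CLIP-preprocessed augmentations.
feats_a = torch.randn(8, 32, 256)
feats_b = torch.randn(8, 32, 256)
inst, clus = MFAVBs()(feats_a, feats_b)
```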
Paper and Project Links
Summary
Contrastive learning networks improve image clustering by maximizing the similarity of positive pairs and the dissimilarity of negative pairs. To extract clustering features more fully, this work designs multiple fusing-augmenting ViT blocks (MFAVBs) built on Vision Transformers (ViT). Two preprocessed augmentations forming a positive pair are fed into two shared-weight ViTs, their features are fused and passed to a larger ViT, and the learned features are then split into a new pair of augmented positive samples for repeated fusion and augmentation. Finally, the features are projected into instance-level and clustering-level spaces to compute a cross-entropy loss, and parameters are updated by backpropagation. The network's inputs are preprocessed augmentations with features extracted from the CLIP pretrained model, which further improves the model's ability to distinguish similar images. Experiments on seven public datasets show that MFAVBs, serving as the backbone for contrastive clustering, outperforms state-of-the-art techniques.
Key Takeaways
- Contrastive learning networks improve image clustering by maximizing the similarity of positive pairs and the dissimilarity of negative pairs.
- MFAVBs are built on Vision Transformers (ViT) and extract clustering features more fully.
- Explicitly fusing the preprocessed augmented positive pairs improves what the model learns.
- Repeatedly fusing and augmenting the learned features further improves performance.
- Features are projected into instance-level and clustering-level spaces, where a cross-entropy loss is computed to optimize the model.
- Input data are preprocessed augmentations built from features extracted from the CLIP pretrained model, strengthening the model's ability to distinguish similar images.
DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model
Authors:Zhongle Ren, Hui Ding, Kai Wang, Biao Hou, Xingyu Luo, Weibin Li, Licheng Jiao
Although significant advances have been achieved in SAR land-cover classification, recent methods remain predominantly focused on supervised learning, which relies heavily on extensive labeled datasets. This dependency not only limits scalability and generalization but also restricts adaptability to diverse application scenarios. In this paper, a general-purpose foundation model for SAR land-cover classification is developed, serving as a robust cornerstone to accelerate the development and deployment of various downstream models. Specifically, a Dynamic Instance and Contour Consistency Contrastive Learning (DI3CL) pre-training framework is presented, which incorporates a Dynamic Instance (DI) module and a Contour Consistency (CC) module. DI module enhances global contextual awareness by enforcing local consistency across different views of the same region. CC module leverages shallow feature maps to guide the model to focus on the geometric contours of SAR land-cover objects, thereby improving structural discrimination. Additionally, to enhance robustness and generalization during pre-training, a large-scale and diverse dataset named SARSense, comprising 460,532 SAR images, is constructed to enable the model to capture comprehensive and representative features. To evaluate the generalization capability of our foundation model, we conducted extensive experiments across a variety of SAR land-cover classification tasks, including SAR land-cover mapping, water body detection, and road extraction. The results consistently demonstrate that the proposed DI3CL outperforms existing methods. Our code and pre-trained weights are publicly available at: https://github.com/SARpre-train/DI3CL.
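The two pre-training signals can be illustrated with a toy sketch: an instance-level consistency term between two views of the same region and a contour-style term computed from shallow feature maps. The tiny backbone, the Sobel-based contour proxy, and the equal loss weighting are assumptions made for illustration only; the released code in the repository above is the authoritative implementation.

```python
# Toy sketch of the two DI3CL ideas: (1) instance-level consistency between two
# views of the same SAR region and (2) a contour-style consistency term on
# shallow feature maps. The edge operator and loss weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Small CNN standing in for the SAR encoder; returns shallow and deep features."""
    def __init__(self):
        super().__init__()
        self.shallow = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.deep = nn.Sequential(nn.Conv2d(16, 64, 3, stride=2, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):
        s = self.shallow(x)
        return s, self.deep(s)

def sobel_edges(fmap):
    """Rough contour proxy: gradient magnitude of the channel-averaged shallow map."""
    g = fmap.mean(dim=1, keepdim=True)
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]], device=g.device)
    ky = kx.transpose(-1, -2)
    gx, gy = F.conv2d(g, kx, padding=1), F.conv2d(g, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def di3cl_style_loss(backbone, view1, view2, temperature=0.1):
    s1, z1 = backbone(view1)
    s2, z2 = backbone(view2)
    # Dynamic-instance style term: the two views of one region should match.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    instance_term = F.cross_entropy(logits, targets)
    # Contour-consistency style term on the shallow feature maps.
    contour_term = F.l1_loss(sobel_edges(s1), sobel_edges(s2))
    return instance_term + contour_term

# Toy usage with random single-channel patches standing in for two SAR views.
loss = di3cl_style_loss(TinyBackbone(), torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
```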
Paper and Project Links
PDF 18 pages, 10 figures; Submitted to IEEE Transactions on Image Processing (TIP); In peer review
Summary
This paper develops a general-purpose foundation model for SAR land-cover classification, pre-trained with a Dynamic Instance and Contour Consistency Contrastive Learning (DI3CL) framework. The framework comprises a Dynamic Instance (DI) module and a Contour Consistency (CC) module, which strengthen global contextual awareness and structural discrimination. To improve robustness and generalization during pre-training, a large-scale and diverse dataset named SARSense is constructed. Experimental results show that the proposed DI3CL outperforms existing methods on SAR land-cover classification tasks.
Key Takeaways
- A foundation model for SAR land-cover classification is proposed, combining a Dynamic Instance (DI) module and a Contour Consistency (CC) module to improve global contextual awareness and structural discrimination.
- A large-scale, diverse dataset named SARSense is constructed to enhance the model's robustness and generalization.
- A contrastive learning pre-training strategy enables the model to capture comprehensive and representative features.
- The approach is validated with extensive experiments on a variety of SAR land-cover classification tasks, including SAR land-cover mapping, water body detection, and road extraction.
- The results show that the proposed DI3CL framework outperforms existing methods on SAR land-cover classification.
- The code and pre-trained weights are publicly available on GitHub.
Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval
Authors:Likang Peng, Chao Su, Wenyuan Wu, Yuan Sun, Dezhong Peng, Xi Peng, Xu Wang
Cross-modal hashing (CMH) facilitates efficient retrieval across different modalities (e.g., image and text) by encoding data into compact binary representations. While recent methods have achieved remarkable performance, they often rely heavily on fully annotated datasets, which are costly and labor-intensive to obtain. In real-world scenarios, particularly in multi-label datasets, label noise is prevalent and severely degrades retrieval performance. Moreover, existing CMH approaches typically overlook the partial semantic overlaps inherent in multi-label data, limiting their robustness and generalization. To tackle these challenges, we propose a novel framework named Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH). The framework comprises two complementary modules: (1) Cross-modal Semantic-Consistent Classification (CSCC), which leverages cross-modal semantic consistency to estimate sample reliability and reduce the impact of noisy labels; (2) Bidirectional Soft Contrastive Hashing (BSCH), which dynamically generates soft contrastive sample pairs based on multi-label semantic overlap, enabling adaptive contrastive learning between semantically similar and dissimilar samples across modalities. Extensive experiments on four widely-used cross-modal retrieval benchmarks validate the effectiveness and robustness of our method, consistently outperforming state-of-the-art approaches under noisy multi-label conditions.
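The idea of soft contrastive pairs driven by multi-label overlap can be sketched as follows, using the Jaccard overlap of label vectors as the soft similarity target and a BCE-style bidirectional objective; both choices are illustrative assumptions rather than the exact BSCH formulation.

```python
# Minimal sketch of a "soft" cross-modal contrastive term driven by multi-label
# overlap: the target similarity of an image-text pair is the Jaccard overlap of
# their label sets rather than a hard 0/1.
import torch
import torch.nn.functional as F

def label_overlap(labels):                       # (B, n_labels) multi-hot
    inter = labels.float() @ labels.float().t()
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    return inter / union.clamp(min=1)            # (B, B) soft targets in [0, 1]

def bidirectional_soft_contrastive(img_emb, txt_emb, labels, temperature=0.2):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    targets = label_overlap(labels)
    sim_i2t = torch.sigmoid(img_emb @ txt_emb.t() / temperature)
    sim_t2i = torch.sigmoid(txt_emb @ img_emb.t() / temperature)
    return 0.5 * (F.binary_cross_entropy(sim_i2t, targets) +
                  F.binary_cross_entropy(sim_t2i, targets))

# Toy usage: 6 samples, 5 possible labels, 64-d embeddings from the two modalities.
labels = (torch.rand(6, 5) > 0.6).float()
loss = bidirectional_soft_contrastive(torch.randn(6, 64), torch.randn(6, 64), labels)
```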
Paper and Project Links
Summary
This paper addresses cross-modal hashing (CMH) for retrieval across modalities such as image and text. To tackle the reliance of existing methods on fully annotated datasets, as well as label noise and partial semantic overlap in real-world multi-label data, it proposes a new framework, Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH). The framework has two complementary modules: Cross-modal Semantic-Consistent Classification (CSCC), which uses cross-modal semantic consistency to estimate sample reliability and reduce the impact of noisy labels, and Bidirectional Soft Contrastive Hashing (BSCH), which dynamically generates soft contrastive sample pairs based on multi-label semantic overlap. Experiments show that the method is effective and robust on cross-modal retrieval tasks, outperforming existing approaches especially under noisy multi-label conditions.
Key Takeaways
- Cross-modal hashing (CMH) encodes data from different modalities into compact binary representations, enabling efficient retrieval.
- Existing methods rely heavily on fully annotated datasets, which are costly and labor-intensive to obtain.
- Label noise is prevalent in real-world scenarios and severely degrades retrieval performance.
- Partial semantic overlaps in multi-label data are overlooked by existing CMH methods, limiting their robustness and generalization.
- A new framework, SCBCH, is proposed, comprising Cross-modal Semantic-Consistent Classification (CSCC) and Bidirectional Soft Contrastive Hashing (BSCH).
- CSCC uses cross-modal semantic consistency to estimate sample reliability and reduce the impact of noisy labels.
- BSCH dynamically generates soft contrastive sample pairs based on multi-label semantic overlap, enabling adaptive contrastive learning.
Adapted Foundation Models for Breast MRI Triaging in Contrast-Enhanced and Non-Contrast Enhanced Protocols
Authors:Tri-Thien Nguyen, Lorenz A. Kapsner, Tobias Hepp, Shirin Heidarikahkesh, Hannes Schreiter, Luise Brock, Dominika Skwierawska, Dominique Hadler, Julian Hossbach, Evelyn Wenkel, Sabine Ohlmeyer, Frederik B. Laun, Andrzej Liebert, Andreas Maier, Michael Uder, Sebastian Bickelhaupt
Background: Magnetic resonance imaging (MRI) has high sensitivity for breast cancer detection, but interpretation is time-consuming. Artificial intelligence may aid in pre-screening. Purpose: To evaluate the DINOv2-based Medical Slice Transformer (MST) for ruling out significant findings (Breast Imaging Reporting and Data System [BI-RADS] >=4) in contrast-enhanced and non-contrast-enhanced abbreviated breast MRI. Materials and Methods: This institutional review board approved retrospective study included 1,847 single-breast MRI examinations (377 BI-RADS >=4) from an in-house dataset and 924 from an external validation dataset (Duke). Four abbreviated protocols were tested: T1-weighted early subtraction (T1sub), diffusion-weighted imaging with b=1500 s/mm2 (DWI1500), DWI1500+T2-weighted (T2w), and T1sub+T2w. Performance was assessed at 90%, 95%, and 97.5% sensitivity using five-fold cross-validation and area under the receiver operating characteristic curve (AUC) analysis. AUC differences were compared with the DeLong test. False negatives were characterized, and attention maps of true positives were rated in the external dataset. Results: A total of 1,448 female patients (mean age, 49 +/- 12 years) were included. T1sub+T2w achieved an AUC of 0.77 +/- 0.04; DWI1500+T2w, 0.74 +/- 0.04 (p=0.15). At 97.5% sensitivity, T1sub+T2w had the highest specificity (19% +/- 7%), followed by DWI1500+T2w (17% +/- 11%). Missed lesions had a mean diameter <10 mm at 95% and 97.5% thresholds for both T1sub and DWI1500, predominantly non-mass enhancements. External validation yielded an AUC of 0.77, with 88% of attention maps rated good or moderate. Conclusion: At 97.5% sensitivity, the MST framework correctly triaged cases without BI-RADS >=4, achieving 19% specificity for contrast-enhanced and 17% for non-contrast-enhanced MRI. Further research is warranted before clinical implementation.
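The triage evaluation comes down to picking a score threshold that preserves a fixed sensitivity and then reading off the specificity at that threshold. A minimal sketch with synthetic scores is shown below; the threshold rule and the toy data are illustrative and not derived from the study.

```python
# Minimal sketch of a triage-style operating point: choose the score threshold
# that keeps sensitivity at a fixed level (e.g. 97.5%) and report the specificity
# achieved there. Scores below are random stand-ins, not study data.
import numpy as np

def specificity_at_sensitivity(scores, labels, target_sensitivity=0.975):
    """labels: 1 = significant finding (BI-RADS >= 4), 0 = no significant finding."""
    pos_scores = np.sort(scores[labels == 1])
    # Largest threshold (among positive scores) that still keeps at least the
    # target share of positives at or above it.
    k = int(np.floor((1.0 - target_sensitivity) * len(pos_scores)))
    threshold = pos_scores[k]
    sensitivity = np.mean(scores[labels == 1] >= threshold)
    specificity = np.mean(scores[labels == 0] < threshold)   # negatives triaged out
    return threshold, sensitivity, specificity

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 1.0, 300),    # positives score higher on average
                         rng.normal(0.0, 1.0, 1200)])  # negatives
labels = np.concatenate([np.ones(300), np.zeros(1200)])
print(specificity_at_sensitivity(scores, labels, 0.975))
```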
Paper and Project Links
PDF 23 pages, 6 figures, 4 tables. Originally submitted to Radiology (RAD-25-2541); under consideration for transfer to Radiology: Artificial Intelligence (RSNA Portfolio Journal)
Summary
This study evaluates the DINOv2-based Medical Slice Transformer (MST) for abbreviated breast MRI, aiming to rule out significant findings (BI-RADS ≥ 4). At a 97.5% sensitivity operating point, the MST framework correctly triaged cases in both contrast-enhanced and non-contrast-enhanced abbreviated protocols, reaching specificities of 19% and 17%, respectively. The work provides new evidence for applying artificial intelligence to breast MRI pre-screening.
Key Takeaways
- The MST framework is applied to breast MRI to evaluate how well it rules out significant findings (BI-RADS ≥ 4).
- Four abbreviated protocols were tested, including T1-weighted early subtraction (T1sub), diffusion-weighted imaging (DWI1500), and T2-weighted combinations.
- At 97.5% sensitivity, the MST framework reached specificities of 19% for contrast-enhanced and 17% for non-contrast-enhanced MRI.
- Missed lesions had a mean diameter of less than 10 mm and were predominantly non-mass enhancements.
- External validation yielded an AUC of 0.77, with 88% of attention maps rated good or moderate.
- The MST framework shows value for breast MRI triaging, but further research is warranted before clinical implementation.
C3-Diff: Super-resolving Spatial Transcriptomics via Cross-modal Cross-content Contrastive Diffusion Modelling
Authors:Xiaofei Wang, Stephen Price, Chao Li
The rapid advancement of spatial transcriptomics (ST), i.e., spatial gene expressions, has made it possible to measure gene expression within original tissue, enabling us to discover molecular mechanisms. However, current ST platforms frequently suffer from low resolution, limiting the in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, it remains a challenge to model the interactions between histology images and gene expressions for effective ST enhancement. This study presents a cross-modal cross-content contrastive diffusion framework, called C3-Diff, for ST enhancement with histology images as guidance. In C3-Diff, we firstly analyze the deficiency of the traditional contrastive learning paradigm, which is then refined to extract both modal-invariant and content-invariant features of ST maps and histology images. Further, to overcome the problem of low sequencing sensitivity in ST maps, we perform noising-based information augmentation on the surface of the feature unit hypersphere. Finally, we propose a dynamic cross-modal imputation-based training strategy to mitigate ST data scarcity. We tested C3-Diff by benchmarking its performance on four public datasets, where it achieves significant improvements over competing methods. Moreover, we evaluate C3-Diff on downstream tasks of cell type localization, gene expression correlation and single-cell-level gene expression prediction, promoting AI-enhanced biotechnology for biomedical research and clinical applications. Codes are available at https://github.com/XiaofeiWang2018/C3-Diff.
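A toy sketch of the hypersphere noising idea combined with a plain cross-modal InfoNCE alignment between histology and gene-expression features is shown below; the noise scale and the loss form are assumptions made for illustration, not the released C3-Diff training code.

```python
# Toy sketch: noising-based feature augmentation on the unit hypersphere plus a
# cross-modal contrastive alignment between histology and gene-expression features.
import torch
import torch.nn.functional as F

def hypersphere_noise(z, sigma=0.1):
    """Perturb unit-norm features with Gaussian noise, then project back to the sphere."""
    z = F.normalize(z, dim=-1)
    return F.normalize(z + sigma * torch.randn_like(z), dim=-1)

def cross_modal_nce(hist_feat, gene_feat, temperature=0.1):
    h = hypersphere_noise(hist_feat)             # augmented histology features
    g = hypersphere_noise(gene_feat)             # augmented gene-expression features
    logits = h @ g.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: 16 paired spots with 128-d features from each modality.
loss = cross_modal_nce(torch.randn(16, 128), torch.randn(16, 128))
```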
Paper and Project Links
Summary
This study proposes C3-Diff, a cross-modal cross-content contrastive diffusion framework for enhancing spatial transcriptomics (ST) with histology images as guidance. Cross-modal cross-content contrastive learning is combined with super-resolution to address the low resolution of ST maps, while noising-based information augmentation and a dynamic cross-modal training strategy mitigate ST data scarcity. Experiments on four public datasets show that C3-Diff achieves significant improvements over competing methods.
Key Takeaways
- Spatial transcriptomics (ST) measures gene expression within original tissue, helping uncover molecular mechanisms.
- Current ST platforms suffer from low resolution, limiting the understanding of spatial gene expression.
- The C3-Diff framework enhances the resolution of ST maps by combining histology images with gene expression data.
- C3-Diff analyzes the deficiencies of the traditional contrastive learning paradigm and extracts modal-invariant and content-invariant features.
- Noising-based information augmentation addresses the low sequencing sensitivity of ST maps.
- C3-Diff uses a dynamic cross-modal imputation-based training strategy to mitigate ST data scarcity.
Graph Contrastive Learning for Connectome Classification
Authors:Martín Schmidt, Sara Silva, Federico Larroca, Gonzalo Mateos, Pablo Musé
With recent advancements in non-invasive techniques for measuring brain activity, such as magnetic resonance imaging (MRI), the study of structural and functional brain networks through graph signal processing (GSP) has gained notable prominence. GSP stands as a key tool in unraveling the interplay between the brain’s function and structure, enabling the analysis of graphs defined by the connections between regions of interest – referred to as connectomes in this context. Our work represents a further step in this direction by exploring supervised contrastive learning methods within the realm of graph representation learning. The main objective of this approach is to generate subject-level (i.e., graph-level) vector representations that bring together subjects sharing the same label while separating those with different labels. These connectome embeddings are derived from a graph neural network Encoder-Decoder architecture, which jointly considers structural and functional connectivity. By leveraging data augmentation techniques, the proposed framework achieves state-of-the-art performance in a gender classification task using Human Connectome Project data. More broadly, our connectome-centric methodological advances support the promising prospect of using GSP to discover more about brain function, with potential impact to understanding heterogeneity in the neurodegeneration for precision medicine and diagnosis.
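The subject-level objective is essentially a supervised contrastive loss over graph embeddings. A generic SupCon-style sketch is given below with the graph neural network encoder omitted; the exact formulation in the paper may differ.

```python
# Generic supervised contrastive (SupCon-style) loss over subject-level
# connectome embeddings: subjects sharing a label are pulled together, others
# pushed apart. The GNN Encoder-Decoder producing the embeddings is omitted.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=-1)                 # (B, d) graph-level embeddings
    sim = z @ z.t() / temperature
    off_diag = ~torch.eye(len(z), device=z.device).bool()
    sim = sim.masked_fill(~off_diag, -1e9)              # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & off_diag
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(log_prob * pos_mask).sum(1).div(pos_count).mean()

# Toy usage: 8 subject embeddings with binary labels (e.g. the gender classification task).
loss = supervised_contrastive_loss(torch.randn(8, 64), torch.randint(0, 2, (8,)))
```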
Paper and Project Links
PDF Presented at Asilomar Conference on Signals, Systems, and Computers 2025
Summary
Advances in non-invasive techniques such as magnetic resonance imaging (MRI) have made graph signal processing (GSP) a prominent tool for studying structural and functional brain networks. This work explores supervised contrastive learning for graph representation learning, generating subject-level (graph-level) vector representations that bring together subjects sharing the same label while separating those with different labels. By leveraging data augmentation, the proposed framework achieves state-of-the-art performance on a gender classification task using Human Connectome Project data. The study supports the prospect of using GSP to learn more about brain function, with potential impact on understanding heterogeneity in neurodegeneration for precision medicine and diagnosis.
Key Takeaways
- Advanced non-invasive techniques such as MRI have driven the study of brain networks and function.
- Graph signal processing (GSP) is a key tool for unraveling the interplay between brain function and structure.
- Supervised contrastive learning is used to generate subject-level vector representations through graph representation learning.
- With data augmentation, the method achieves state-of-the-art performance on a gender classification task.
- The work supports the prospect of using GSP to learn more about brain function.
- It has potential impact on understanding heterogeneity in neurodegeneration for precision medicine and diagnosis.
CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors
Authors:Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Hui Lu, Shiyi Guo, Da Cai, Dongyue Chen
Prohibited item detection based on X-ray images is one of the most effective security inspection methods. However, the foreground-background feature coupling caused by the overlapping phenomenon specific to X-ray images makes general detectors designed for natural images perform poorly. To address this issue, we propose a Category Semantic Prior Contrastive Learning (CSPCL) mechanism, which aligns the class prototypes perceived by the classifier with the content queries to correct and supplement the missing semantic information responsible for classification, thereby enhancing the model sensitivity to foreground features. To achieve this alignment, we design a specific contrastive loss, CSP loss, which comprises the Intra-Class Truncated Attraction (ITA) loss and the Inter-Class Adaptive Repulsion (IAR) loss, and outperforms classic contrastive losses. Specifically, the ITA loss leverages class prototypes to attract intra-class content queries and preserves essential intra-class diversity via a gradient truncation function. The IAR loss employs class prototypes to adaptively repel inter-class content queries, with the repulsion strength scaled by prototype-prototype similarity, thereby improving inter-class discriminability, especially among similar categories. CSPCL is general and can be easily integrated into Deformable DETR-based models. Extensive experiments on the PIXray, OPIXray, PIDray, and CLCXray datasets demonstrate that CSPCL significantly enhances the performance of various state-of-the-art models without increasing inference complexity. The code is publicly available at https://github.com/Limingyuan001/CSPCL.
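A toy sketch of a CSP-style loss is given below: an intra-class term that stops pulling a content query once it is close enough to its class prototype (a crude stand-in for the gradient-truncation idea) and an inter-class term whose repulsion strength scales with prototype-prototype similarity. The margin, the scaling, and the hinge forms are illustrative assumptions; the public code linked above is the reference.

```python
# Toy CSP-style loss: truncated intra-class attraction plus adaptive inter-class
# repulsion weighted by prototype-prototype similarity. Forms are assumptions.
import torch
import torch.nn.functional as F

def csp_style_loss(queries, labels, prototypes, margin=0.2):
    q = F.normalize(queries, dim=-1)              # (B, d) content queries
    p = F.normalize(prototypes, dim=-1)           # (C, d) class prototypes
    sim_qp = q @ p.t()                            # (B, C) query-to-prototype similarity
    sim_pp = p @ p.t()                            # (C, C) prototype-to-prototype similarity

    # Intra-class truncated attraction: stop pulling once similarity exceeds 1 - margin.
    own = sim_qp.gather(1, labels.view(-1, 1)).squeeze(1)
    attraction = F.relu((1.0 - margin) - own).mean()

    # Inter-class adaptive repulsion: penalise similarity to the other prototypes,
    # weighted by how close each of them is to the query's own class prototype.
    weights = sim_pp[labels].clamp(min=0)         # (B, C)
    other = F.one_hot(labels, num_classes=p.size(0)) == 0
    repulsion = (weights * F.relu(sim_qp) * other).sum(1).mean()
    return attraction + repulsion

# Toy usage: 16 content queries, 5 classes, 128-d features and prototypes.
labels = torch.randint(0, 5, (16,))
loss = csp_style_loss(torch.randn(16, 128), labels, torch.randn(5, 128))
```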
Paper and Project Links
PDF 22 pages, 5 figures
Summary
For prohibited item detection in X-ray images, the foreground-background feature coupling caused by overlapping items makes general detectors perform poorly. To address this, a Category Semantic Prior Contrastive Learning (CSPCL) mechanism is proposed: it aligns the class prototypes perceived by the classifier with the content queries to supplement missing semantic information and increase the model's sensitivity to foreground features. A specific contrastive loss, the CSP loss, is designed, comprising a truncated attraction loss and an adaptive repulsion loss, and experiments on multiple datasets show that it significantly improves model performance. The code is publicly available.
Key Takeaways
- Prohibited item detection based on X-ray images is an effective security inspection method.
- Feature coupling caused by overlapping items makes general detectors perform poorly on X-ray images.
- A Category Semantic Prior Contrastive Learning (CSPCL) mechanism is proposed to enhance the model's sensitivity to foreground features.
- The CSP loss is designed, comprising an intra-class truncated attraction loss and an inter-class adaptive repulsion loss.
- CSPCL can be easily integrated into Deformable DETR-based models.
- Experiments on multiple datasets show that CSPCL significantly improves model performance without increasing inference complexity.
Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning
Authors:Jun-En Ding, Chien-Chin Hsu, Chi-Hsiang Chu, Shuqiang Wang, Feng Liu
The classification of medical images is a pivotal aspect of disease diagnosis, often enhanced by deep learning techniques. However, traditional approaches typically focus on unimodal medical image data, neglecting the integration of diverse non-image patient data. This paper proposes a novel Cross-Graph Modal Contrastive Learning (CGMCL) framework for multimodal structured data from different data domains to improve medical image classification. The model effectively integrates both image and non-image data by constructing cross-modality graphs and leveraging contrastive learning to align multimodal features in a shared latent space. An inter-modality feature scaling module further optimizes the representation learning process by reducing the gap between heterogeneous modalities. The proposed approach is evaluated on two datasets: a Parkinson’s disease (PD) dataset and a public melanoma dataset. Results demonstrate that CGMCL outperforms conventional unimodal methods in accuracy, interpretability, and early disease prediction. Additionally, the method shows superior performance in multi-class melanoma classification. The CGMCL framework provides valuable insights into medical image classification while offering improved disease interpretability and predictive capabilities.
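A minimal sketch of the shared-latent-space alignment with a learnable inter-modality scaling step is shown below; the MLP projectors, the affine scaler, and the InfoNCE objective are simplified assumptions standing in for the graph-based encoders described in the paper.

```python
# Minimal sketch: align image-derived and non-image (clinical) features in a
# shared latent space with a learnable inter-modality scaling step followed by
# a contrastive alignment. Encoders and graph construction are simplified away.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityScaler(nn.Module):
    """Per-dimension affine scaling meant to shrink the gap between modalities."""
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.gamma * x + self.beta

class CrossModalAligner(nn.Module):
    def __init__(self, img_dim=512, tab_dim=32, latent_dim=128):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, latent_dim), nn.ReLU(),
                                      nn.Linear(latent_dim, latent_dim))
        self.tab_proj = nn.Sequential(nn.Linear(tab_dim, latent_dim), nn.ReLU(),
                                      nn.Linear(latent_dim, latent_dim))
        self.scale = ModalityScaler(latent_dim)

    def forward(self, img_feat, tab_feat, temperature=0.1):
        zi = F.normalize(self.scale(self.img_proj(img_feat)), dim=-1)
        zt = F.normalize(self.tab_proj(tab_feat), dim=-1)
        logits = zi @ zt.t() / temperature
        targets = torch.arange(len(zi), device=zi.device)   # matched patient pairs
        loss = 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
        return loss, zi, zt

# Toy usage: 8 patients with 512-d image features and 32-d non-image features.
loss, zi, zt = CrossModalAligner()(torch.randn(8, 512), torch.randn(8, 32))
```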
Paper and Project Links
Summary
This paper proposes a novel Cross-Graph Modal Contrastive Learning (CGMCL) framework for medical image classification with multimodal structured data. The framework effectively integrates image and non-image data by constructing cross-modality graphs and using contrastive learning to align multimodal features in a shared latent space. An inter-modality feature scaling module further optimizes representation learning by reducing the gap between heterogeneous modalities. Evaluations on a Parkinson's disease dataset and a public melanoma dataset show that CGMCL outperforms conventional unimodal methods in accuracy, interpretability, and early disease prediction, and performs especially well on multi-class melanoma classification.
Key Takeaways
- The CGMCL framework performs medical image classification with multimodal structured data.
- Cross-modality graphs are constructed to integrate image and non-image data.
- Contrastive learning aligns multimodal features in a shared latent space.
- An inter-modality feature scaling module optimizes representation learning.
- CGMCL outperforms conventional unimodal methods on Parkinson's disease and melanoma datasets.
- CGMCL improves the accuracy and interpretability of medical image classification.