Published: 2025-09-28

Updated: 2025-11-27

Word count: 5.8k

Reading time: 23 min

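⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings. They are intended only for initial screening before reading the papers.
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace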

2025-09-28 Update

Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning

Authors:Thanh Binh Le, Hoang Nhat Khang Vo, Tan-Ha Mai, Trong Nhan Phan

Low back pain affects millions worldwide, driving the need for robust diagnostic models that can jointly analyze complex medical images and accompanying text reports. We present LumbarCLIP, a novel multimodal framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions. Built upon a curated dataset containing axial MRI views paired with expert-written reports, LumbarCLIP integrates vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder to extract dense representations. These are projected into a shared embedding space via learnable projection heads, configurable as linear or non-linear, and normalized to facilitate stable contrastive training using a soft CLIP loss. Our model achieves state-of-the-art performance on downstream classification, reaching up to 95.00% accuracy and 94.75% F1-score on the test set, despite inherent class imbalance. Extensive ablation studies demonstrate that linear projection heads yield more effective cross-modal alignment than non-linear variants. LumbarCLIP offers a promising foundation for automated musculoskeletal diagnosis and clinical decision support.

Paper and project links

PDF 12 pages, 4 figures

Summary

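This paper presents LumbarCLIP, a multimodal diagnostic framework that uses contrastive language-image pretraining to align lumbar spine MRI scans with their radiological descriptions. LumbarCLIP pairs a vision encoder (ResNet-50, Vision Transformer, or Swin Transformer) with a BERT-based text encoder to extract dense representations. Learnable projection heads, linear or non-linear, map these representations into a shared embedding space, where they are normalized to stabilize contrastive training with a soft CLIP loss. On downstream classification, the model reaches up to 95.00% accuracy and 94.75% F1-score on the test set.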

Key Takeaways

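LumbarCLIP is a multimodal diagnostic framework that jointly analyzes lumbar spine MRI scans and their accompanying text reports.
It uses contrastive language-image pretraining to align MRI scans with radiological descriptions.
The framework combines vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder.
Learnable projection heads, configurable as linear or non-linear, map both modalities into a shared embedding space.
The projected embeddings are normalized so that contrastive training with a soft CLIP loss stays stable.
On downstream classification, LumbarCLIP reaches up to 95.00% accuracy and 94.75% F1-score on the test set despite class imbalance.
In ablation studies, linear projection heads yield more effective cross-modal alignment than non-linear variants.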
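
The projection, normalization, and contrastive steps described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it uses the standard hard-label symmetric CLIP loss rather than the paper's soft variant, whose exact form the abstract does not give. The names `clip_loss`, `w_img`, and `w_txt`, the feature sizes, and the temperature of 0.07 are all assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def clip_loss(image_feats, text_feats, w_img, w_txt, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text features."""
    # Linear projection heads (hypothetical weights) map both modalities
    # into one shared space, followed by L2 normalization.
    z_img = l2_normalize(image_feats @ w_img)
    z_txt = l2_normalize(text_feats @ w_txt)
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = z_img @ z_txt.T / temperature
    # Matching pairs sit on the diagonal, so the target for row i is column i.
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits)[idx, idx].mean()
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(4, 16))  # stand-in for vision encoder outputs
text_feats = rng.normal(size=(4, 12))   # stand-in for text encoder outputs
w_img = rng.normal(size=(16, 8))
w_txt = rng.normal(size=(12, 8))
loss = clip_loss(image_feats, text_feats, w_img, w_txt)
```

A non-linear head would replace each matrix product with a small MLP; the rest of the computation stays the same.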

A Contrastive Learning Framework for Breast Cancer Detection

Authors:Samia Saeed, Khuram Naveed

Breast cancer, the second leading cause of cancer-related deaths globally, accounts for a quarter of all cancer cases [1]. To lower this death rate, it is crucial to detect tumors early, as early-stage detection significantly improves treatment outcomes. Advances in non-invasive imaging techniques have made early detection possible through computer-aided detection (CAD) systems which rely on traditional image analysis to identify malignancies. However, there is a growing shift towards deep learning methods due to their superior effectiveness. Despite their potential, deep learning methods often struggle with accuracy due to the limited availability of large-labeled datasets for training. To address this issue, our study introduces a Contrastive Learning (CL) framework, which excels with smaller labeled datasets. In this regard, we train Resnet-50 in semi supervised CL approach using similarity index on a large amount of unlabeled mammogram data. In this regard, we use various augmentation and transformations which help improve the performance of our approach. Finally, we tune our model on a small set of labelled data that outperforms the existing state of the art. Specifically, we observed a 96.7% accuracy in detecting breast cancer on benchmark datasets INbreast and MIAS.

Paper and project links

PDF

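Summary

Breast cancer is a leading cause of cancer-related death worldwide, and early detection is crucial for lowering the death rate. This study introduces a contrastive learning (CL) framework that trains ResNet-50 in a semi-supervised manner, using a similarity index on a large amount of unlabeled mammogram data, with augmentations and transformations that improve performance. After fine-tuning on a small labeled set, the model reaches 96.7% accuracy in detecting breast cancer on the INbreast and MIAS benchmark datasets.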

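Key Takeaways

Breast cancer is a leading cause of cancer-related death worldwide, and early detection significantly improves treatment outcomes.
Computer-aided detection (CAD) systems built on non-invasive imaging enable early detection, but deep learning methods are limited in accuracy by the scarcity of large labeled datasets.
Contrastive learning (CL) addresses this limitation because it works well with small labeled datasets.
ResNet-50 is trained with a semi-supervised CL approach, using a similarity index on a large amount of unlabeled mammogram data.
Various augmentations and transformations improve the performance of the approach.
After fine-tuning on a small labeled set, the model reaches 96.7% accuracy on the INbreast and MIAS benchmark datasets.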
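
The abstract does not say which contrastive objective or similarity index is used. As a rough illustration of how two augmented views of unlabeled images can be pulled together, the sketch below implements the NT-Xent loss from SimCLR with cosine similarity. The choice of NT-Xent, the function name `nt_xent`, the temperature of 0.5, and the noisy vectors standing in for augmented mammogram embeddings are all assumptions.

```python
import numpy as np

def nt_xent(z_a, z_b, temperature=0.5):
    """NT-Xent loss for two views; row i of z_a is the positive of row i of z_b."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # scaled cosine similarity, (2n, 2n)
    np.fill_diagonal(sim, -np.inf)       # a sample never competes with itself
    # The positive of sample i is i + n, and the positive of i + n is i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32))                      # stand-in embeddings
view_a = base + 0.05 * rng.normal(size=base.shape)   # "augmentation" 1
view_b = base + 0.05 * rng.normal(size=base.shape)   # "augmentation" 2
loss_matched = nt_xent(view_a, view_b)
loss_random = nt_xent(view_a, rng.normal(size=base.shape))
```

In this toy setup the loss is lower when the second view really comes from the same underlying sample, which is the signal that lets an encoder learn from unlabeled data before fine-tuning on a small labeled set.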

One-shot Embroidery Customization via Contrastive LoRA Modulation

Authors:Jun Ma, Qian He, Gaofeng He, Huang Chen, Chen Liu, Xiaogang Jin, Huamin Wang

Diffusion models have significantly advanced image manipulation techniques, and their ability to generate photorealistic images is beginning to transform retail workflows, particularly in presale visualization. Beyond artistic style transfer, the capability to perform fine-grained visual feature transfer is becoming increasingly important. Embroidery is a textile art form characterized by intricate interplay of diverse stitch patterns and material properties, which poses unique challenges for existing style transfer methods. To explore the customization for such fine-grained features, we propose a novel contrastive learning framework that disentangles fine-grained style and content features with a single reference image, building on the classic concept of image analogy. We first construct an image pair to define the target style, and then adopt a similarity metric based on the decoupled representations of pretrained diffusion models for style-content separation. Subsequently, we propose a two-stage contrastive LoRA modulation technique to capture fine-grained style features. In the first stage, we iteratively update the whole LoRA and the selected style blocks to initially separate style from content. In the second stage, we design a contrastive learning strategy to further decouple style and content through self-knowledge distillation. Finally, we build an inference pipeline to handle image or text inputs with only the style blocks. To evaluate our method on fine-grained style transfer, we build a benchmark for embroidery customization. Our approach surpasses prior methods on this task and further demonstrates strong generalization to three additional domains: artistic style transfer, sketch colorization, and appearance transfer.

Paper and project links

PDF Accepted to ACM Transactions on Graphics (TOG), SIGGRAPH Asia 2025

Summary

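This paper notes that diffusion models have significantly advanced image manipulation and are beginning to transform retail workflows, particularly presale visualization. It proposes a contrastive learning framework for customizing fine-grained features such as embroidery: building on the classic concept of image analogy, the framework disentangles fine-grained style and content features from a single reference image. A similarity metric based on the decoupled representations of pretrained diffusion models separates style from content, and a two-stage contrastive LoRA modulation technique captures fine-grained style features. The resulting inference pipeline handles image or text inputs using only the style blocks. On a newly built embroidery customization benchmark, the method surpasses prior methods, and it generalizes well to three additional domains: artistic style transfer, sketch colorization, and appearance transfer.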

Key Takeaways

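Diffusion models have significantly advanced image manipulation and are beginning to transform retail workflows, particularly presale visualization.
A contrastive learning framework disentangles fine-grained style and content features from a single reference image.
An image pair is first constructed to define the target style.
A similarity metric based on the decoupled representations of pretrained diffusion models separates style from content.
A two-stage contrastive LoRA modulation technique captures fine-grained style features: the first stage iteratively updates the whole LoRA and the selected style blocks, and the second stage further decouples style and content through contrastive learning with self-knowledge distillation.
The inference pipeline handles image or text inputs using only the style blocks.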
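
For readers unfamiliar with LoRA, the sketch below shows the generic low-rank update that such modulation builds on: a frozen weight `W` plus a trainable product `B @ A` scaled by `alpha / rank`. It illustrates standard LoRA only, not the paper's two-stage contrastive scheme or its style-block selection. The class name `LoRALinear`, the layer sizes, and the rank and alpha values are assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen base weight
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, in_dim): base path plus low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 128))
x = rng.normal(size=(2, 128))
layer = LoRALinear(W, rank=4)
same_at_init = np.allclose(layer(x), x @ W.T)   # B is zero, so nothing changes yet
layer.B = rng.normal(size=layer.B.shape)        # stand-in for a training update
changed_after = not np.allclose(layer(x), x @ W.T)
trainable = layer.A.size + layer.B.size         # 4 * (128 + 64) parameters
frozen = W.size                                 # 64 * 128 parameters
```

Because only `A` and `B` are trained, the adapter here holds 768 parameters against 8192 in the frozen weight, which is what makes it practical to keep separate adapters, or separate blocks within one adapter, for different roles.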

SSCM: A Spatial-Semantic Consistent Model for Multi-Contrast MRI Super-Resolution

Authors:Xiaoman Wu, Lubin Gan, Siying Wu, Jing Zhang, Yunwei Ou, Xiaoyan Sun

Multi-contrast Magnetic Resonance Imaging super-resolution (MC-MRI SR) aims to enhance low-resolution (LR) contrasts leveraging high-resolution (HR) references, shortening acquisition time and improving imaging efficiency while preserving anatomical details. The main challenge lies in maintaining spatial-semantic consistency, ensuring anatomical structures remain well-aligned and coherent despite structural discrepancies and motion between the target and reference images. Conventional methods insufficiently model spatial-semantic consistency and underuse frequency-domain information, which leads to poor fine-grained alignment and inadequate recovery of high-frequency details. In this paper, we propose the Spatial-Semantic Consistent Model (SSCM), which integrates a Dynamic Spatial Warping Module for inter-contrast spatial alignment, a Semantic-Aware Token Aggregation Block for long-range semantic consistency, and a Spatial-Frequency Fusion Block for fine structure restoration. Experiments on public and private datasets show that SSCM achieves state-of-the-art performance with fewer parameters while ensuring spatially and semantically consistent reconstructions.

Paper and project links

PDF

Summary

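This paper addresses multi-contrast MRI super-resolution (MC-MRI SR), which enhances low-resolution (LR) contrasts using high-resolution (HR) references, shortening acquisition time and improving imaging efficiency while preserving anatomical details. The central challenge is maintaining spatial-semantic consistency, so that anatomical structures stay aligned and coherent despite structural discrepancies and motion between the target and reference images. The proposed Spatial-Semantic Consistent Model (SSCM) integrates a Dynamic Spatial Warping Module for inter-contrast spatial alignment, a Semantic-Aware Token Aggregation Block for long-range semantic consistency, and a Spatial-Frequency Fusion Block for fine structure restoration. Experiments on public and private datasets show that SSCM achieves state-of-the-art performance with fewer parameters while producing spatially and semantically consistent reconstructions.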

Key Takeaways

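MC-MRI SR enhances low-resolution MRI contrasts by leveraging high-resolution reference images.
The main challenge is maintaining spatial-semantic consistency, keeping anatomical structures aligned and coherent between the target and reference images.
Conventional methods insufficiently model spatial-semantic consistency and underuse frequency-domain information.
SSCM integrates a Dynamic Spatial Warping Module, a Semantic-Aware Token Aggregation Block, and a Spatial-Frequency Fusion Block.
The Dynamic Spatial Warping Module performs inter-contrast spatial alignment.
The Semantic-Aware Token Aggregation Block maintains long-range semantic consistency.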
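
To make the idea of frequency-domain information concrete, the sketch below splits a 2D image into low- and high-frequency parts with a circular mask on its centred Fourier spectrum. It is a generic illustration, not the paper's Spatial-Frequency Fusion Block, whose design the abstract does not describe. The function name `split_frequencies`, the mask shape, and the radius are assumptions.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2D image into low- and high-frequency parts with a circular FFT mask."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # zero frequency moved to centre
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)        # distance from the centre bin
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))   # stand-in for an MRI slice
low, high = split_frequencies(image, radius=4)
```

The two parts sum back to the original image. Edges and fine texture live mostly in `high`, which is the kind of detail a super-resolution model has to recover.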

CLIPin: A Non-contrastive Plug-in to CLIP for Multimodal Semantic Alignment

Authors:Shengzhu Yang, Jiawei Du, Shuai Lu, Weihang Zhang, Ningli Wang, Huiqi Li

Large-scale natural image-text datasets, especially those automatically collected from the web, often suffer from loose semantic alignment due to weak supervision, while medical datasets tend to have high cross-modal correlation but low content diversity. These properties pose a common challenge for contrastive language-image pretraining (CLIP): they hinder the model’s ability to learn robust and generalizable representations. In this work, we propose CLIPin, a unified non-contrastive plug-in that can be seamlessly integrated into CLIP-style architectures to improve multimodal semantic alignment, providing stronger supervision and enhancing alignment robustness. Furthermore, two shared pre-projectors are designed for image and text modalities respectively to facilitate the integration of contrastive and non-contrastive learning in a parameter-compromise manner. Extensive experiments on diverse downstream tasks demonstrate the effectiveness and generality of CLIPin as a plug-and-play component compatible with various contrastive frameworks. Code is available at https://github.com/T6Yang/CLIPin.

Paper and project links

PDF

Summary

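This paper proposes CLIPin, a unified non-contrastive plug-in that integrates seamlessly into CLIP-style architectures to improve multimodal semantic alignment. It targets two data problems that hinder CLIP from learning robust, generalizable representations: loose semantic alignment in large web-collected natural image-text datasets, caused by weak supervision, and low content diversity in medical datasets. CLIPin provides stronger supervision and improves alignment robustness. Two shared pre-projectors, one for the image modality and one for the text modality, allow contrastive and non-contrastive learning to be combined in a parameter-compromise manner. Experiments on diverse downstream tasks demonstrate the effectiveness and generality of CLIPin as a plug-and-play component compatible with various contrastive frameworks.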

Key Takeaways

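CLIPin is a non-contrastive plug-in that improves multimodal semantic alignment and integrates seamlessly into CLIP-style architectures.
It addresses the loose semantic alignment that weak supervision causes in large web-collected natural image-text datasets.
It also targets medical datasets, which have high cross-modal correlation but low content diversity, by providing stronger supervision and more robust alignment.
Two shared pre-projectors, one for the image modality and one for the text modality, facilitate the integration of contrastive and non-contrastive learning.
This integration is done in a parameter-compromise manner.
Experiments on diverse downstream tasks demonstrate the effectiveness and generality of CLIPin.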
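
The abstract does not spell out CLIPin's objective. To show what "non-contrastive" means in general, the sketch below computes a loss from positive pairs only, as the negative cosine similarity between paired embeddings, in the style of SimSiam or BYOL. In those methods the target branch is additionally detached from the gradient, which plain NumPy cannot express. The function name `negative_cosine` and the toy embeddings are assumptions.

```python
import numpy as np

def negative_cosine(pred, target):
    """Mean negative cosine similarity between paired rows; no negatives are used."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    return -(pred * target).sum(axis=1).mean()

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))                  # stand-in image embeddings
txt = img + 0.1 * rng.normal(size=(4, 8))      # stand-in paired text embeddings
loss_close = negative_cosine(img, txt)
loss_unrelated = negative_cosine(img, rng.normal(size=(4, 8)))
```

Unlike a contrastive loss, this quantity never compares a sample against the other items in the batch, so it does not treat semantically similar pairs as negatives. That property is relevant when a dataset has low content diversity.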

CellCLIP – Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning

Authors:Mingyu Lu, Ethan Weinberger, Chanwoo Kim, Su-In Lee

High-content screening (HCS) assays based on high-throughput microscopy techniques such as Cell Painting have enabled the interrogation of cells’ morphological responses to perturbations at an unprecedented scale. The collection of such data promises to facilitate a better understanding of the relationships between different perturbations and their effects on cellular state. Towards achieving this goal, recent advances in cross-modal contrastive learning could, in theory, be leveraged to learn a unified latent space that aligns perturbations with their corresponding morphological effects. However, the application of such methods to HCS data is not straightforward due to substantial differences in the semantics of Cell Painting images compared to natural images, and the difficulty of representing different classes of perturbations (e.g., small molecule vs CRISPR gene knockout) in a single latent space. In response to these challenges, here we introduce CellCLIP, a cross-modal contrastive learning framework for HCS data. CellCLIP leverages pre-trained image encoders coupled with a novel channel encoding scheme to better capture relationships between different microscopy channels in image embeddings, along with natural language encoders for representing perturbations. Our framework outperforms current open-source models, demonstrating the best performance in both cross-modal retrieval and biologically meaningful downstream tasks while also achieving significant reductions in computation time.

Paper and project links

PDF

Summary

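High-content screening (HCS) assays based on high-throughput microscopy techniques such as Cell Painting make it possible to study cells' morphological responses to perturbations at an unprecedented scale. Cross-modal contrastive learning could in theory learn a unified latent space that aligns perturbations with their morphological effects, but applying it to HCS data is not straightforward: Cell Painting images differ substantially in semantics from natural images, and different classes of perturbations (e.g., small molecules vs. CRISPR gene knockouts) are hard to represent in a single latent space. CellCLIP addresses these challenges by combining pre-trained image encoders, a novel channel encoding scheme that better captures relationships between microscopy channels in image embeddings, and natural language encoders that represent perturbations. It outperforms current open-source models in both cross-modal retrieval and biologically meaningful downstream tasks while significantly reducing computation time.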

Key Takeaways

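HCS assays make it possible to study cells' morphological responses to perturbations at an unprecedented scale.
Cross-modal contrastive learning could, in theory, learn a unified latent space that aligns perturbations with their morphological effects.
Applying it to HCS data is hard because Cell Painting images differ semantically from natural images and different classes of perturbations are difficult to represent in a single latent space.
CellCLIP combines pre-trained image encoders, a novel channel encoding scheme, and natural language encoders that represent perturbations.
CellCLIP outperforms current open-source models in cross-modal retrieval and biologically meaningful downstream tasks, with significantly less computation time.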
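
Cross-modal retrieval of this kind is commonly scored with Recall@k: for each query embedding, check whether its true partner appears among the k most similar items of the other modality. The sketch below is a generic version of that metric, not the paper's evaluation code. The function name `recall_at_k`, the convention that matching pairs share a row index, and the toy embeddings are assumptions.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k):
    """Fraction of queries whose true match (same row index) ranks in the top k."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                                   # cosine similarity matrix
    top_k = np.argsort(-sim, axis=1)[:, :k]         # best k gallery items per query
    hits = (top_k == np.arange(len(q))[:, None]).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(20, 16))                      # stand-in image embeddings
pert_emb = image_emb + 0.5 * rng.normal(size=(20, 16))     # stand-in perturbation embeddings
r1 = recall_at_k(pert_emb, image_emb, k=1)
r5 = recall_at_k(pert_emb, image_emb, k=5)
```

Swapping the two arguments gives the opposite retrieval direction, from image to perturbation, and the two directions are usually reported separately.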

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-28/%E6%97%A0%E7%9B%91%E7%9D%A3_%E5%8D%8A%E7%9B%91%E7%9D%A3_%E5%AF%B9%E6%AF%94%E5%AD%A6%E4%B9%A0/

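Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!

Unsupervised / Semi-supervised / Contrastive Learning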
