
Unsupervised / Semi-supervised / Contrastive Learning


⚠️ All of the summaries below are generated by large language models and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never rely on them in serious academic settings; they are only meant for a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-26

VeCoR - Velocity Contrastive Regularization for Flow Matching

Authors:Zong-Wei Hong, Jing-lun Li, Lin-Ze Li, Shen Zhang, Yao Tang

Flow Matching (FM) has recently emerged as a principled and efficient alternative to diffusion models. Standard FM encourages the learned velocity field to follow a target direction; however, it may accumulate errors along the trajectory and drive samples off the data manifold, leading to perceptual degradation, especially in lightweight or low-step configurations. To enhance stability and generalization, we extend FM into a balanced attract-repel scheme that provides explicit guidance on both “where to go” and “where not to go.” To be formal, we propose Velocity Contrastive Regularization (VeCoR), a complementary training scheme for flow-based generative modeling that augments the standard FM objective with contrastive, two-sided supervision. VeCoR not only aligns the predicted velocity with a stable reference direction (positive supervision) but also pushes it away from inconsistent, off-manifold directions (negative supervision). This contrastive formulation transforms FM from a purely attractive, one-sided objective into a two-sided training signal, regularizing trajectory evolution and improving perceptual fidelity across datasets and backbones. On ImageNet-1K 256×256, VeCoR yields 22% and 35% relative FID reductions on SiT-XL/2 and REPA-SiT-XL/2 backbones, respectively, and achieves further FID gains (32% relative) on MS-COCO text-to-image generation, demonstrating consistent improvements in stability, convergence, and image quality, particularly in low-step and lightweight settings. Project page: https://p458732.github.io/VeCoR_Project_Page/
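As a rough sketch of the attract-repel idea described above, the snippet below augments a linear-interpolation flow-matching loss with a repulsive term. The negative direction `v_neg`, the weight `lam`, and the `model(xt, t)` signature are illustrative assumptions; the paper's exact negative construction and loss weighting may differ.

```python
import torch
import torch.nn.functional as F

def vecor_style_loss(model, x0, x1, t, lam=0.1):
    """Two-sided FM loss sketch: attract toward the target velocity, repel from a negative.

    Assumes a linear interpolation path and t broadcastable to x0's shape,
    e.g. t of shape (B, 1, 1, 1) for image batches.
    """
    xt = (1 - t) * x0 + t * x1                  # point on the FM trajectory
    v_pred = model(xt, t)                       # hypothetical model signature
    v_pos = x1 - x0                             # target velocity: "where to go"
    # Illustrative negative direction ("where not to go"); the paper's actual
    # construction of off-manifold negatives is not reproduced here.
    v_neg = v_pos + torch.randn_like(v_pos)
    attract = F.mse_loss(v_pred, v_pos)         # standard FM term
    repel = F.cosine_similarity(v_pred.flatten(1), v_neg.flatten(1), dim=1).mean()
    return attract + lam * repel                # lam is an assumed weighting
```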


Paper and Project Links

PDF

Summary

This extension of Flow Matching (FM) improves stability and generalization by introducing a balanced attract-repel scheme. To this end, the authors propose Velocity Contrastive Regularization (VeCoR), a training scheme that augments the standard FM objective with two-sided contrastive supervision. VeCoR not only aligns the predicted velocity with a stable reference direction (positive supervision) but also pushes it away from inconsistent, off-manifold directions (negative supervision). This turns FM from a purely attractive, one-sided objective into a two-sided training signal, regularizing trajectory evolution and improving perceptual fidelity. On ImageNet-1K 256×256, VeCoR yields 22% and 35% relative FID reductions on the SiT-XL/2 and REPA-SiT-XL/2 backbones, respectively, and achieves a further FID gain (32% relative) on MS-COCO text-to-image generation, demonstrating consistent improvements in stability, convergence, and image quality, particularly in low-step and lightweight settings.

Key Takeaways

  1. Flow Matching (FM) is an effective approach to generative modeling, but it can accumulate errors along the trajectory and drive samples off the data manifold, causing perceptual degradation.
  2. A balanced attract-repel scheme is introduced to improve FM's stability and generalization.
  3. Velocity Contrastive Regularization (VeCoR) is a complementary training scheme that augments FM with two-sided contrastive supervision: it aligns the predicted velocity with a stable reference direction while repelling it from inconsistent, off-manifold directions.
  4. VeCoR turns FM from a purely attractive objective into a two-sided training signal, improving trajectory evolution and perceptual fidelity.
  5. VeCoR delivers significant gains across datasets and backbones, especially in low-step and lightweight model settings.
  6. Experiments on ImageNet-1K and MS-COCO show that VeCoR markedly improves stability, convergence, and image quality in image generation.

Cool Papers

Click here to view paper screenshots

Dual-Path Knowledge-Augmented Contrastive Alignment Network for Spatially Resolved Transcriptomics

Authors:Wei Zhang, Jiajun Chu, Xinci Liu, Chen Tong, Xinyue Li

Spatial Transcriptomics (ST) is a technology that measures gene expression profiles within tissue sections while retaining spatial context. It reveals localized gene expression patterns and tissue heterogeneity, both of which are essential for understanding disease etiology. However, its high cost has driven efforts to predict spatial gene expression from whole slide images. Despite recent advancements, current methods still face significant limitations, such as under-exploitation of high-level biological context, over-reliance on exemplar retrievals, and inadequate alignment of heterogeneous modalities. To address these challenges, we propose DKAN, a novel Dual-path Knowledge-Augmented contrastive alignment Network that predicts spatially resolved gene expression by integrating histopathological images and gene expression profiles through a biologically informed approach. Specifically, we introduce an effective gene semantic representation module that leverages the external gene database to provide additional biological insights, thereby enhancing gene expression prediction. Further, we adopt a unified, one-stage contrastive learning paradigm, seamlessly combining contrastive learning and supervised learning to eliminate reliance on exemplars, complemented with an adaptive weighting mechanism. Additionally, we propose a dual-path contrastive alignment module that employs gene semantic features as dynamic cross-modal coordinators to enable effective heterogeneous feature integration. Through extensive experiments across three public ST datasets, DKAN demonstrates superior performance over state-of-the-art models, establishing a new benchmark for spatial gene expression prediction and offering a powerful tool for advancing biological and clinical research.
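For orientation, here is a minimal sketch of the kind of one-stage cross-modal contrastive alignment the abstract describes, assuming batch-aligned image and gene-expression embeddings. This is a generic symmetric InfoNCE; DKAN's dual-path module, gene semantic coordinators, and adaptive weighting are not reproduced.

```python
import torch
import torch.nn.functional as F

def cross_modal_infonce(img_feat, gene_feat, temperature=0.07):
    """Symmetric InfoNCE over batch-aligned image / gene-expression embeddings.

    img_feat, gene_feat: (B, D) tensors where row i of each comes from the same spot.
    """
    img = F.normalize(img_feat, dim=-1)
    gene = F.normalize(gene_feat, dim=-1)
    logits = img @ gene.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Matching pairs sit on the diagonal; both retrieval directions are supervised.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```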


Paper and Project Links

PDF AAAI 2026 Oral, extended version

Summary

This paper introduces Spatial Transcriptomics (ST) and its role in understanding disease etiology. Because ST is costly, the authors propose DKAN, a network that predicts spatial gene expression from histopathological images by integrating them with gene expression profiles through a dual-path knowledge-augmented contrastive alignment design. DKAN leverages an external gene database to provide biological insights, adopts a unified one-stage contrastive learning paradigm that removes the reliance on exemplars, and uses gene semantic features as dynamic cross-modal coordinators for effective heterogeneous feature integration. Experiments on three public ST datasets show that DKAN outperforms state-of-the-art models in gene expression prediction, providing a powerful tool for biological and clinical research.

Key Takeaways

  1. Spatial Transcriptomics is essential for understanding disease etiology because it reveals localized gene expression patterns and tissue heterogeneity.
  2. Current methods for predicting spatial gene expression face challenges such as under-exploitation of high-level biological context, over-reliance on exemplar retrieval, and inadequate alignment of heterogeneous modalities.
  3. DKAN predicts spatially resolved gene expression by integrating histopathological images and gene expression profiles.
  4. DKAN leverages an external gene database to provide biological insights, enhancing gene expression prediction.
  5. DKAN adopts a unified one-stage contrastive learning paradigm that eliminates the reliance on exemplars, complemented by an adaptive weighting mechanism.
  6. DKAN proposes a dual-path contrastive alignment module that uses gene semantic features as dynamic cross-modal coordinators for effective heterogeneous feature integration.

Cool Papers

Click here to view paper screenshots

Comparative Study of UNet-based Architectures for Liver Tumor Segmentation in Multi-Phase Contrast-Enhanced Computed Tomography

Authors:Doan-Van-Anh Ly, Thi-Thu-Hien Pham, Thanh-Hai Le

Segmentation of liver structures in multi-phase contrast-enhanced computed tomography (CECT) plays a crucial role in computer-aided diagnosis and treatment planning for liver diseases, including tumor detection. In this study, we investigate the performance of UNet-based architectures for liver tumor segmentation, starting from the original UNet and extending to UNet3+ with various backbone networks. We evaluate ResNet, Transformer-based, and State-space (Mamba) backbones, all initialized with pretrained weights. Surprisingly, despite the advances in modern architectures, ResNet-based models consistently outperform Transformer- and Mamba-based alternatives across multiple evaluation metrics. To further improve segmentation quality, we introduce attention mechanisms into the backbone and observe that incorporating the Convolutional Block Attention Module (CBAM) yields the best performance. ResNetUNet3+ with the CBAM module not only produced the best overlap metrics with a Dice score of 0.755 and IoU of 0.662, but also achieved the most precise boundary delineation, evidenced by the lowest HD95 distance of 77.911. The model’s superiority was further cemented by its leading overall accuracy of 0.925 and specificity of 0.926, showcasing its robust capability in accurately identifying both lesion and healthy tissue. To further enhance interpretability, Grad-CAM visualizations were employed to highlight the regions most influential to its predictions, providing insights into its decision-making process. These findings demonstrate that the classical ResNet architecture, when combined with modern attention modules, remains highly competitive for medical image segmentation tasks, offering a promising direction for liver tumor detection in clinical practice.
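For reference, below is a minimal sketch of the standard CBAM formulation cited above (channel attention followed by spatial attention). How the paper wires it into the ResNetUNet3+ backbone is not shown, and the `reduction` and `kernel_size` defaults are the common choices, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from global average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```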


Paper and Project Links

PDF 15 pages, 9 figures

Summary

This study investigates UNet-based architectures for liver tumor segmentation, from the original UNet to UNet3+ with various backbones. ResNet, Transformer-based, and Mamba-based backbones are evaluated, all initialized with pretrained weights. Despite advances in modern architectures, the ResNet-based models consistently outperform the Transformer- and Mamba-based alternatives across multiple evaluation metrics. To further improve segmentation quality, attention mechanisms are introduced into the backbone, with the Convolutional Block Attention Module (CBAM) performing best. ResNetUNet3+ with CBAM not only achieves the best overlap metrics, with a Dice score of 0.755 and an IoU of 0.662, but also the most precise boundary delineation, evidenced by the lowest HD95 distance of 77.911. The model also attains an overall accuracy of 0.925 and a specificity of 0.926, demonstrating robust identification of both lesion and healthy tissue. To improve interpretability, Grad-CAM visualizations highlight the regions most influential to predictions, offering insight into the model's decision-making process. The results show that the classical ResNet architecture, combined with modern attention modules, remains highly competitive for medical image segmentation and offers a promising direction for clinical liver tumor detection.

Key Takeaways

  1. The study evaluates UNet architectures with different backbones for liver tumor segmentation.
  2. ResNet-based models show a clear advantage over the other models across multiple evaluation metrics.
  3. Introducing attention mechanisms improves segmentation quality, with the CBAM module performing best.
  4. ResNetUNet3+ with the CBAM module achieves the best performance in overlap metrics, boundary delineation, overall accuracy, and specificity.
  5. Grad-CAM visualizations enhance interpretability and help explain the model's decision-making process.
  6. The results highlight the potential of combining classical and modern techniques, especially for medical image segmentation tasks.

Cool Papers

Click here to view paper screenshots

VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation

Authors:Feng Han, Chao Gong, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang

Recently, autoregressive image generation models have wowed audiences with their remarkable capability in creating surprisingly realistic images. Models such as GPT-4o and LlamaGen can not only produce images that faithfully mimic renowned artistic styles like Ghibli, Van Gogh, or Picasso, but also potentially generate Not-Safe-For-Work (NSFW) content, raising significant concerns regarding copyright infringement and ethical use. Despite these concerns, methods to safeguard autoregressive text-to-image models remain underexplored. Previous concept erasure methods, primarily designed for diffusion models that operate in denoising latent space, are not directly applicable to autoregressive models that generate images token by token. To address this critical gap, we propose Visual Contrast Exploitation (VCE), a novel framework comprising: (1) an innovative contrastive image pair construction paradigm that precisely decouples unsafe concepts from their associated content semantics, and (2) a sophisticated DPO-based training approach that enhances the model’s ability to identify and leverage visual contrastive features from image pairs, enabling precise concept erasure. Our comprehensive experiments across three challenging tasks-artist style erasure, explicit content erasure, and object removal-demonstrate that our method effectively secures the model, achieving state-of-the-art results while erasing unsafe concepts and maintaining the integrity of unrelated safe concepts. The code and models are available at https://github.com/Maplebb/VCE.
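As a sketch of the DPO-style preference objective the abstract refers to, the snippet below assumes summed token log-likelihoods of each image in a contrastive pair under the trained policy and a frozen reference model; the pair-construction paradigm itself is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def dpo_pair_loss(logp_safe, logp_unsafe, ref_logp_safe, ref_logp_unsafe, beta=0.1):
    """Standard DPO objective over a contrastive image pair.

    Each argument is the summed token log-likelihood of an image sequence,
    under the trainable policy (logp_*) or a frozen reference model (ref_logp_*).
    """
    # Implicit reward margin between the preferred (safe) and rejected (unsafe) image
    margin = (logp_safe - ref_logp_safe) - (logp_unsafe - ref_logp_unsafe)
    return -F.logsigmoid(beta * margin).mean()
```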


Paper and Project Links

PDF

Summary

This paper introduces VCE, a new framework for safeguarding autoregressive text-to-image generation models. It precisely decouples unsafe concepts from their associated content semantics and erases them while preserving the integrity of unrelated safe concepts. Through an innovative contrastive image-pair construction paradigm and a DPO-based training approach, the framework achieves state-of-the-art performance on artist style erasure, explicit content erasure, and object removal.

Key Takeaways

  1. The VCE framework addresses copyright and ethical concerns in text-to-image generation models.
  2. VCE uses a contrastive image-pair construction paradigm to precisely decouple unsafe concepts from content semantics.
  3. A novel DPO-based training approach enhances the model's ability to identify and exploit visual contrastive features.
  4. VCE achieves state-of-the-art results on artist style erasure, explicit content erasure, and object removal.
  5. VCE effectively erases unsafe concepts while maintaining the integrity of unrelated safe concepts.
  6. The framework is applicable to a range of image generation models and has broad application prospects.

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!