Published: 2025-11-25

Updated: 2025-11-27


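⚠️ All summaries below were generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: do not rely on them in serious academic settings. They are intended only for initial screening before reading a paper.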

2025-11-25 Update

Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models

Authors: Dailan He, Guanlin Feng, Xingtong Ge, Yazhe Niu, Yi Zhang, Bingqi Ma, Guanglu Song, Yu Liu, Hongsheng Li

Group Relative Policy Optimization (GRPO) has shown promise in aligning image and video generative models with human preferences. However, applying it to modern flow matching models is challenging because of its deterministic sampling paradigm. Current methods address this issue by converting Ordinary Differential Equations (ODEs) to Stochastic Differential Equations (SDEs), which introduce stochasticity. However, this SDE-based GRPO suffers from issues of inefficient credit assignment and incompatibility with high-order solvers for fewer-step sampling. In this paper, we first reinterpret existing SDE-based GRPO methods from a distance optimization perspective, revealing their underlying mechanism as a form of contrastive learning. Based on this insight, we propose Neighbor GRPO, a novel alignment algorithm that completely bypasses the need for SDEs. Neighbor GRPO generates a diverse set of candidate trajectories by perturbing the initial noise conditions of the ODE and optimizes the model using a softmax distance-based surrogate leaping policy. We establish a theoretical connection between this distance-based objective and policy gradient optimization, rigorously integrating our approach into the GRPO framework. Our method fully preserves the advantages of deterministic ODE sampling, including efficiency and compatibility with high-order solvers. We further introduce symmetric anchor sampling for computational efficiency and group-wise quasi-norm reweighting to address reward flattening. Extensive experiments demonstrate that Neighbor GRPO significantly outperforms SDE-based counterparts in terms of training cost, convergence speed, and generation quality.


Paper and project links

PDF

Summary
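This paper proposes Neighbor GRPO, a contrastive-learning-based method for aligning flow matching generative models within the Group Relative Policy Optimization (GRPO) framework. It reinterprets existing SDE-based GRPO methods from a distance optimization perspective and shows that they act as a form of contrastive learning. Neighbor GRPO generates diverse candidate trajectories by perturbing the initial noise conditions of the ODE, then optimizes the model with a softmax distance-based surrogate leaping policy. While preserving the advantages of deterministic ODE sampling, it adds symmetric anchor sampling for computational efficiency and group-wise quasi-norm reweighting to address reward flattening. Experiments show that Neighbor GRPO significantly outperforms SDE-based methods in training cost, convergence speed, and generation quality.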
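
The two mechanisms named in the summary (perturbing the initial noise, then scoring candidates with a softmax over distances) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_ode_sample`, the Gaussian jitter scale `sigma`, the squared Euclidean distance, and the `temperature` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def toy_ode_sample(noise):
    """Hypothetical stand-in for a deterministic ODE sampler.

    The same initial noise always maps to the same output.
    """
    return [math.tanh(2.0 * z) for z in noise]


def perturb(anchor, sigma, rng):
    """Draw a neighbor of the anchor noise by adding Gaussian jitter (assumed scheme)."""
    return [z + sigma * rng.gauss(0.0, 1.0) for z in anchor]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax_distance_policy(output, candidates, temperature=1.0):
    """Softmax over negative squared distances: closer candidates get more mass."""
    logits = [-squared_distance(output, c) / temperature for c in candidates]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
anchor_noise = [rng.gauss(0.0, 1.0) for _ in range(4)]

# A group of candidates obtained only by perturbing the initial noise;
# no stochasticity is injected into the sampler itself.
neighbor_noises = [perturb(anchor_noise, 0.3, rng) for _ in range(5)]
candidates = [toy_ode_sample(n) for n in neighbor_noises]

current = toy_ode_sample(anchor_noise)
probs = softmax_distance_policy(current, candidates)
print(probs)
```

In this toy setup the candidate whose output lies closest to the current output receives the highest probability, which is the contrastive flavor the summary describes: increasing a candidate's probability amounts to pulling the model output toward it.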

Key Takeaways

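1. Group Relative Policy Optimization (GRPO) has shown promise for aligning image and video generative models with human preferences, but applying it to modern flow matching models is challenging because their sampling is deterministic.
2. Existing methods address this by converting ordinary differential equations (ODEs) into stochastic differential equations (SDEs), but this approach suffers from inefficient credit assignment and is incompatible with high-order solvers.
3. The paper reinterprets existing SDE-based GRPO methods from a distance optimization perspective and shows that they are essentially a form of contrastive learning.
4. Neighbor GRPO generates candidate trajectories by perturbing the initial noise conditions of the ODE and optimizes the model with a softmax distance-based surrogate leaping policy, bypassing the need for SDEs.
5. Neighbor GRPO preserves the advantages of deterministic ODE sampling, including efficiency and compatibility with high-order solvers.
6. Symmetric anchor sampling is introduced to improve computational efficiency, and group-wise quasi-norm reweighting is introduced to address reward flattening.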
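
To make the "group relative" part of the takeaways concrete, here is a minimal sketch of how GRPO-style methods typically turn a group of rewards into advantages (standardizing within the group) and weight candidate log-probabilities with them. The reward values, the candidate probabilities, and the plain weighted log-likelihood loss are illustrative assumptions; the paper's clipping, symmetric anchor sampling, and quasi-norm reweighting are not reproduced here.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def surrogate_loss(probs, advantages):
    """Policy-gradient style loss: minimizing it raises the log-probability
    of above-average candidates and lowers that of below-average ones."""
    return -sum(a * math.log(p) for a, p in zip(advantages, probs)) / len(probs)


# Hypothetical rewards for four candidates in one group.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# Two hypothetical candidate distributions: uniform, and one that favors
# the highest-reward candidate (index 2).
uniform_probs = [0.25, 0.25, 0.25, 0.25]
shifted_probs = [0.1, 0.2, 0.6, 0.1]

loss_uniform = surrogate_loss(uniform_probs, advantages)
loss_shifted = surrogate_loss(shifted_probs, advantages)
print(loss_uniform, loss_shifted)
```

Because the advantages sum to zero within a group, a uniform distribution yields a loss of approximately zero, while shifting probability toward the best-rewarded candidate lowers the loss.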


Comparative Study of UNet-based Architectures for Liver Tumor Segmentation in Multi-Phase Contrast-Enhanced Computed Tomography

Authors: Doan-Van-Anh Ly, Thi-Thu-Hien Pham, Thanh-Hai Le

Segmentation of liver structures in multi-phase contrast-enhanced computed tomography (CECT) plays a crucial role in computer-aided diagnosis and treatment planning for liver diseases, including tumor detection. In this study, we investigate the performance of UNet-based architectures for liver tumor segmentation, starting from the original UNet and extending to UNet3+ with various backbone networks. We evaluate ResNet, Transformer-based, and State-space (Mamba) backbones, all initialized with pretrained weights. Surprisingly, despite the advances in modern architecture, ResNet-based models consistently outperform Transformer- and Mamba-based alternatives across multiple evaluation metrics. To further improve segmentation quality, we introduce attention mechanisms into the backbone and observe that incorporating the Convolutional Block Attention Module (CBAM) yields the best performance. ResNetUNet3+ with CBAM module not only produced the best overlap metrics with a Dice score of 0.755 and IoU of 0.662, but also achieved the most precise boundary delineation, evidenced by the lowest HD95 distance of 77.911. The model’s superiority was further cemented by its leading overall accuracy of 0.925 and specificity of 0.926, showcasing its robust capability in accurately identifying both lesion and healthy tissue. To further enhance interpretability, Grad-CAM visualizations were employed to highlight the region’s most influential predictions, providing insights into its decision-making process. These findings demonstrate that classical ResNet architecture, when combined with modern attention modules, remain highly competitive for medical image segmentation tasks, offering a promising direction for liver tumor detection in clinical practice.
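
For readers unfamiliar with the overlap and classification metrics quoted in the abstract, the following sketch computes Dice, IoU, accuracy, and specificity from binary masks using their standard confusion-matrix definitions. The flattened toy masks are made up for illustration, and how the paper aggregates these metrics across cases is not stated in the abstract.

```python
def confusion_counts(pred, target):
    """Count true/false positives and negatives for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is empty."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, target):
    """Standard definitions of Dice, IoU, accuracy, and specificity."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    return {
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "specificity": safe_div(tn, tn + fp),
    }


# Toy flattened masks (1 = tumor, 0 = background), invented for this example.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]

metrics = segmentation_metrics(pred, target)
print(metrics)
```

On a single mask pair, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU). Scores averaged over many cases need not satisfy this identity, so reported dataset-level values can deviate from it.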


Paper and project links

PDF 16 pages, 9 figures

Summary
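This study examines UNet-based architectures for liver tumor segmentation, from the original UNet to UNet3+, with several backbone networks: ResNet, Transformer-based, and state-space (Mamba) backbones. Unexpectedly, ResNet-based models consistently outperform the alternatives across multiple evaluation metrics. After attention mechanisms are added, ResNetUNet3+ with the CBAM module performs best, with the highest Dice score (0.755) and IoU (0.662), the most precise boundary delineation (lowest HD95, 77.911), and the highest overall accuracy (0.925) and specificity (0.926). The study also uses Grad-CAM visualizations to improve interpretability. The results suggest that a classical ResNet architecture combined with modern attention modules remains highly competitive for medical image segmentation and offers a promising direction for liver tumor detection in clinical practice.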
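
The boundary metric HD95 mentioned above is the 95th-percentile Hausdorff distance. The sketch below uses one common definition (the larger of the two directed 95th-percentile nearest-point distances) with a nearest-rank percentile; the paper's exact variant and distance units are not given in the abstract, and the boundary point sets here are invented for illustration.

```python
import math


def directed_distances(a_points, b_points):
    """For each point in a_points, distance to its nearest point in b_points."""
    return [min(math.dist(a, b) for b in b_points) for a in a_points]


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile for an integer q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def hausdorff_percentile(a_points, b_points, q=95):
    """Symmetric percentile Hausdorff distance (q=100 gives the classic maximum)."""
    forward = percentile_nearest_rank(directed_distances(a_points, b_points), q)
    backward = percentile_nearest_rank(directed_distances(b_points, a_points), q)
    return max(forward, backward)


# Toy boundaries: the prediction tracks the ground truth one pixel away,
# except for a single stray point far from the true contour.
true_boundary = [(1, i) for i in range(20)]
pred_boundary = [(0, i) for i in range(19)] + [(10, 0)]

hd95 = hausdorff_percentile(pred_boundary, true_boundary, q=95)
hd100 = hausdorff_percentile(pred_boundary, true_boundary, q=100)
print(hd95, hd100)
```

The single stray point dominates the classic Hausdorff distance but is ignored at the 95th percentile, which is why HD95 is often preferred as a more outlier-tolerant measure of boundary agreement.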

Key Takeaways