Published: 2025-09-12

Updated: 2025-10-07

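⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are suitable only for initial screening before reading a paper.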

2025-09-12 Update

Vision-Language Semantic Aggregation Leveraging Foundation Model for Generalizable Medical Image Segmentation

Authors: Wenjun Yu, Yinchen Zhou, Jia-Xuan Jiang, Shubin Zeng, Yuee Li, Zhong Wang

Multimodal models have achieved remarkable success in natural image segmentation, yet they often underperform when applied to the medical domain. Through extensive study, we attribute this performance gap to the challenges of multimodal fusion, primarily the significant semantic gap between abstract textual prompts and fine-grained medical visual features, as well as the resulting feature dispersion. To address these issues, we revisit the problem from the perspective of semantic aggregation. Specifically, we propose an Expectation-Maximization (EM) Aggregation mechanism and a Text-Guided Pixel Decoder. The former mitigates feature dispersion by dynamically clustering features into compact semantic centers to enhance cross-modal correspondence. The latter is designed to bridge the semantic gap by leveraging domain-invariant textual knowledge to effectively guide deep visual representations. The synergy between these two mechanisms significantly improves the model’s generalization ability. Extensive experiments on public cardiac and fundus datasets demonstrate that our method consistently outperforms existing SOTA approaches across multiple domain generalization benchmarks.

Paper and project links

PDF 29 pages and 8 figures

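Summary: Multimodal models succeed at natural image segmentation but often underperform in the medical domain. To address this, the paper proposes an Expectation-Maximization (EM) Aggregation mechanism and a Text-Guided Pixel Decoder. By dynamically clustering features and exploiting domain-invariant textual knowledge, the method mitigates feature dispersion and narrows the semantic gap, which improves generalization. Experiments on public cardiac and fundus datasets support its effectiveness.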

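Key Takeaways:

Multimodal models struggle in the medical domain mainly because of multimodal fusion problems: the semantic gap between text prompts and visual features, and the resulting feature dispersion.
The paper revisits the problem from the perspective of semantic aggregation.
It proposes an Expectation-Maximization (EM) Aggregation mechanism that dynamically clusters features into compact semantic centers to strengthen cross-modal correspondence.
It designs a Text-Guided Pixel Decoder that uses domain-invariant textual knowledge to guide deep visual representations and narrow the semantic gap.
The two mechanisms work together to significantly improve the model's generalization ability.
Experiments on public cardiac and fundus datasets show that the method consistently outperforms existing state-of-the-art approaches across multiple domain generalization benchmarks.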
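
The idea of clustering features into compact semantic centers can be illustrated with a generic soft EM loop. This is a minimal NumPy sketch, not the authors' implementation: the function name `em_aggregate`, the dot-product similarity, the random initialisation, and every parameter are assumptions made for illustration.

```python
import numpy as np

def em_aggregate(features, num_centers=4, num_iters=5, temperature=1.0, seed=0):
    """Softly cluster N feature vectors of shape (N, D) into K centers.

    Hypothetical helper: returns (centers, responsibilities, reconstructed).
    """
    rng = np.random.default_rng(seed)
    n, _ = features.shape
    # Assumption: initialise centers from randomly chosen feature vectors.
    centers = features[rng.choice(n, size=num_centers, replace=False)].copy()

    for _ in range(num_iters):
        # E-step: soft-assign each feature to each center with a softmax
        # over dot-product similarity.
        logits = features @ centers.T / temperature           # (N, K)
        logits -= logits.max(axis=1, keepdims=True)           # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)               # rows sum to 1

        # M-step: each center becomes the responsibility-weighted mean
        # of all features.
        weights = resp / (resp.sum(axis=0, keepdims=True) + 1e-8)
        centers = weights.T @ features                        # (K, D)

    # Re-express every feature as a mixture of the compact centers.
    reconstructed = resp @ centers                            # (N, D)
    return centers, resp, reconstructed
```

For a feature map of shape `(H, W, D)`, one would call `em_aggregate(feature_map.reshape(-1, D))` and reshape `reconstructed` back. The reconstructed features lie in the span of only `num_centers` vectors, which is one way to read "compact semantic centers".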

Hyperspectral Mamba for Hyperspectral Object Tracking

Authors: Long Gao, Yunhe Zhang, Yan Jiang, Weiying Xie, Yunsong Li

Hyperspectral object tracking holds great promise due to the rich spectral information and fine-grained material distinctions in hyperspectral images, which are beneficial in challenging scenarios. While existing hyperspectral trackers have made progress by either transforming hyperspectral data into false-color images or incorporating modality fusion strategies, they often fail to capture the intrinsic spectral information, temporal dependencies, and cross-depth interactions. To address these limitations, a new hyperspectral object tracking network equipped with Mamba (HyMamba), is proposed. It unifies spectral, cross-depth, and temporal modeling through state space modules (SSMs). The core of HyMamba lies in the Spectral State Integration (SSI) module, which enables progressive refinement and propagation of spectral features with cross-depth and temporal spectral information. Embedded within each SSI, the Hyperspectral Mamba (HSM) module is introduced to learn spatial and spectral information synchronously via three directional scanning SSMs. Based on SSI and HSM, HyMamba constructs joint features from false-color and hyperspectral inputs, and enhances them through interaction with original spectral features extracted from raw hyperspectral images. Extensive experiments conducted on seven benchmark datasets demonstrate that HyMamba achieves state-of-the-art performance. For instance, it achieves 73.0% of the AUC score and 96.3% of the DP@20 score on the HOTC2020 dataset. The code will be released at https://github.com/lgao001/HyMamba.

Paper and project links

PDF

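Summary:
This paper presents HyMamba, a new hyperspectral object tracking network aimed at challenging scenarios. It unifies spectral, cross-depth, and temporal modeling through state space modules (SSMs) to address the limitations of existing hyperspectral trackers. Its core is the Spectral State Integration (SSI) module, which progressively refines and propagates spectral features using cross-depth and temporal spectral information. Embedded in each SSI, the Hyperspectral Mamba (HSM) module learns spatial and spectral information synchronously via three directional scanning SSMs. Experiments on seven benchmark datasets show state-of-the-art performance; for example, HyMamba reaches an AUC score of 73.0% and a DP@20 score of 96.3% on the HOTC2020 dataset. The authors state that the code will be released publicly.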
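
The two reported scores can be made concrete with a short sketch. It assumes the conventions common in tracking benchmarks: DP@20 is the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center, and AUC is the mean success rate over IoU thresholds from 0 to 1. The function names, the `(x, y, w, h)` box layout, and the 21 thresholds are assumptions; the HOTC2020 protocol may differ in detail.

```python
import numpy as np

def center_errors(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_c - gt_c, axis=1)

def iou(pred, gt):
    """Per-frame intersection over union for (x, y, w, h) boxes."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def dp_at(pred, gt, threshold=20.0):
    """Distance precision: share of frames with center error <= threshold."""
    return float(np.mean(center_errors(pred, gt) <= threshold))

def success_auc(pred, gt, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    overlaps = iou(pred, gt)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))
```

Under these conventions, a DP@20 of 96.3% would mean that in roughly 96 of every 100 frames the predicted center falls within 20 pixels of the annotated one.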

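Key Takeaways:

Hyperspectral object tracking holds promise in challenging scenarios thanks to rich spectral information and fine-grained material distinctions.
Existing hyperspectral trackers often fail to capture intrinsic spectral information, temporal dependencies, and cross-depth interactions.
The proposed HyMamba network unifies spectral, cross-depth, and temporal modeling to overcome these limitations.
The core of HyMamba is the Spectral State Integration (SSI) module, which progressively refines and propagates spectral features using cross-depth and temporal information.
Within each SSI, the Hyperspectral Mamba (HSM) module learns spatial and spectral information synchronously via three directional scanning SSMs.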
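
To give a feel for directional scanning with a state space model, here is a toy sketch of a linear recurrence run along each axis of a hyperspectral patch. It is not the HSM module: the choice of the three scan axes (height, width, spectral band), the fixed scalar parameters `a`, `b`, `c`, the averaging of the three outputs, and the function names are all assumptions. Mamba-style models use learned, input-dependent parameters instead.

```python
import numpy as np

def ssm_scan(seq, a=0.9, b=0.1, c=1.0):
    """Linear state space recurrence along axis 0.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t, with h_0 = 0.
    """
    seq = np.asarray(seq, dtype=float)
    h = np.zeros(seq.shape[1:], dtype=float)
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        h = a * h + b * seq[t]
        out[t] = c * h
    return out

def three_direction_scan(cube):
    """Scan an (H, W, C) patch along height, width, and band axes, then average."""
    cube = np.asarray(cube, dtype=float)
    outputs = []
    for axis in range(3):
        moved = np.moveaxis(cube, axis, 0)           # bring the scan axis to the front
        scanned = ssm_scan(moved)
        outputs.append(np.moveaxis(scanned, 0, axis))  # restore the original layout
    return np.mean(outputs, axis=0)
```

Each scan lets a position accumulate context from everything before it along one axis, so combining a spatial scan with a spectral scan mixes both kinds of information in a single output.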

JWST detection of a carbon dioxide dominated gas coma surrounding interstellar object 3I/ATLAS

Authors: Martin A. Cordiner, Nathaniel X. Roth, Michael S. P. Kelley, Dennis Bodewits, Steven B. Charnley, Maria N. Drozdovskaya, Davide Farnocchia, Marco Micheli, Stefanie N. Milam, Cyrielle Opitom, Megan E. Schwamb, Cristina A. Thomas, Stefano Bagnulo

3I/ATLAS is the third confirmed interstellar object to visit our Solar System, and only the second to display a clear coma. Infrared spectroscopy with the James Webb Space Telescope (JWST) provides the opportunity to measure its coma composition and determine the primary activity drivers. We report the first results from our JWST NIRSpec campaign for 3I/ATLAS, at an inbound heliocentric distance of $r_H=3.32$ au. The spectral images (spanning 0.6-5.3 $\mu$m) reveal a CO2 dominated coma, with enhanced outgassing in the sunward direction, and the presence of H2O, CO, OCS, water ice and dust. The coma CO2/H2O mixing ratio of $7.6\pm0.3$ is among the highest ever observed in a comet, and is 4.5-sigma above the trend as a function of heliocentric distance for long-period and Jupiter-family comets (excluding the outlier C/2016 R2). Our observations are compatible with an intrinsically CO2-rich nucleus, which may indicate that 3I/ATLAS contains ices exposed to higher levels of radiation than Solar System comets, or that it formed close to the CO2 ice line in its parent protoplanetary disk. A low coma H2O gas abundance may also be implied, for example, due to inhibited heat penetration into the nucleus, which could suppress the H2O sublimation rate relative to CO2 and CO.

Paper and project links

PDF Accepted at ApJ Letters 2025-09-10

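Summary:

Infrared spectroscopy with the James Webb Space Telescope (JWST) characterizes the coma of 3I/ATLAS, the third confirmed interstellar object to visit the Solar System. The observations reveal a CO2-dominated coma with enhanced outgassing in the sunward direction, along with H2O, CO, OCS, water ice, and dust. The coma CO2/H2O mixing ratio of 7.6±0.3 is among the highest ever observed in a comet and lies 4.5 sigma above the heliocentric-distance trend for long-period and Jupiter-family comets. This is compatible with an intrinsically CO2-rich nucleus, which may contain ices exposed to higher levels of radiation than Solar System comets, or may have formed close to the CO2 ice line in its parent protoplanetary disk. A low coma H2O abundance may also be implied, for example if inhibited heat penetration into the nucleus suppresses the H2O sublimation rate relative to CO2 and CO.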

Key Takeaways: