⚠️ All of the summaries below are generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Please note: never use these summaries in serious academic settings; they are intended only as a first-pass screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-09-17
BREA-Depth: Bronchoscopy Realistic Airway-geometric Depth Estimation
Authors:Francis Xiatian Zhang, Emile Mackute, Mohammadreza Kasaei, Kevin Dhaliwal, Robert Thomson, Mohsen Khadem
Monocular depth estimation in bronchoscopy can significantly improve real-time navigation accuracy and enhance the safety of interventions in complex, branching airways. Recent advances in depth foundation models have shown promise for endoscopic scenarios, yet these models often lack anatomical awareness in bronchoscopy, overfitting to local textures rather than capturing the global airway structure, particularly under ambiguous depth cues and poor lighting. To address this, we propose Brea-Depth, a novel framework that integrates airway-specific geometric priors into foundation model adaptation for bronchoscopic depth estimation. Our method introduces a depth-aware CycleGAN, refining the translation between real bronchoscopic images and airway geometries from anatomical data, effectively bridging the domain gap. In addition, we introduce an airway structure awareness loss to enforce depth consistency within the airway lumen while preserving smooth transitions and structural integrity. By incorporating anatomical priors, Brea-Depth enhances model generalization and yields more robust, accurate 3D airway reconstructions. To assess anatomical realism, we introduce Airway Depth Structure Evaluation, a new metric for structural consistency. We validate BREA-Depth on a collected ex vivo human lung dataset and an open bronchoscopic dataset, where it outperforms existing methods in anatomical depth preservation.
Paper and Project Links
PDF The paper has been accepted to MICCAI 2025
Summary
This paper proposes Brea-Depth, a framework that integrates airway-specific geometric priors into foundation-model adaptation for bronchoscopic depth estimation, improving real-time navigation accuracy and the safety of interventions. Brea-Depth uses a depth-aware CycleGAN to bridge the translation between real bronchoscopic images and airway geometries, and introduces an airway structure awareness loss to enforce depth consistency within the airway lumen. By incorporating anatomical priors, Brea-Depth improves model generalization and yields more robust and accurate 3D airway reconstructions. The paper also introduces Airway Depth Structure Evaluation, a new metric for structural consistency, and validates Brea-Depth on a collected ex vivo human lung dataset and an open bronchoscopic dataset.
Key Takeaways
- Brea-Depth improves the real-time navigation accuracy of bronchoscopic depth estimation.
- Airway-specific geometric priors are introduced to enhance model generalization.
- A depth-aware CycleGAN translates between real bronchoscopic images and airway geometries.
- An airway structure awareness loss maintains depth consistency within the airway lumen (a minimal loss sketch follows after this list).
- Incorporating anatomical priors yields more robust and accurate 3D airway reconstructions.
- Airway Depth Structure Evaluation is proposed as a new metric for structural consistency.
- Brea-Depth is validated on an ex vivo human lung dataset and an open bronchoscopic dataset.
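The airway structure awareness loss is only described at a high level here. Below is a minimal, hypothetical PyTorch sketch of a loss with the two stated goals (depth consistency inside the airway lumen and smooth transitions), under the assumption that a binary lumen mask is available (for example, rendered from the anatomical airway geometry); the function name, mask source, and weights are illustrative, not the paper's formulation.

```python
import torch

def airway_structure_loss(depth, lumen_mask, w_lumen=1.0, w_smooth=0.1):
    """Minimal, hypothetical sketch; not the paper's exact formulation.

    depth:      (B, 1, H, W) predicted depth map
    lumen_mask: (B, 1, H, W) binary airway-lumen mask (assumed available)
    """
    lumen_mask = lumen_mask.float()

    # Horizontal and vertical depth gradients.
    dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()

    # "Depth consistency within the lumen": discourage abrupt depth jumps
    # between neighbouring pixels that both lie inside the lumen.
    mx = lumen_mask[..., :, 1:] * lumen_mask[..., :, :-1]
    my = lumen_mask[..., 1:, :] * lumen_mask[..., :-1, :]
    lumen_term = (dx * mx).sum() / mx.sum().clamp(min=1.0) \
               + (dy * my).sum() / my.sum().clamp(min=1.0)

    # "Smooth transitions": a global first-order smoothness prior.
    smooth_term = dx.mean() + dy.mean()

    return w_lumen * lumen_term + w_smooth * smooth_term
```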
Click here to view paper screenshots


OSDM-MReg: Multimodal Image Registration based One Step Diffusion Model
Authors:Xiaochen Wei, Weiwei Guo, Wenxian Yu, Feiming Wei, Dongying Li
Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis. However, existing methods often struggle to extract modality-invariant features when faced with large nonlinear radiometric differences, such as those between SAR and optical images. To address these challenges, we propose OSDM-MReg, a novel multimodal image registration framework that bridges the modality gap through image-to-image translation. Specifically, we introduce a one-step unaligned target-guided conditional diffusion model (UTGOS-CDM) to translate source and target images into a unified representation domain. Unlike traditional conditional DDPM that require hundreds of iterative steps for inference, our model incorporates a novel inverse translation objective during training to enable direct prediction of the translated image in a single step at test time, significantly accelerating the registration process. After translation, we design a multimodal multiscale registration network (MM-Reg) that extracts and fuses both unimodal and translated multimodal images using the proposed multimodal fusion strategy, enhancing the robustness and precision of alignment across scales and modalities. Extensive experiments on the OSdataset demonstrate that OSDM-MReg achieves superior registration accuracy compared to state-of-the-art methods.
Paper and Project Links
PDF This version updates our previous submission. After rerunning the experiments, we found that the proposed high-frequency perceptual loss did not improve the overall performance of the model. Therefore, we removed this component, revised the corresponding ablation studies, and updated the contributions accordingly. This work has been submitted to the IEEE for possible publication
Summary
Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis, but existing methods struggle to extract modality-invariant features under large nonlinear radiometric differences such as those between SAR and optical images. The authors propose OSDM-MReg, a multimodal registration framework that bridges the modality gap through image-to-image translation. A one-step unaligned target-guided conditional diffusion model (UTGOS-CDM) translates the source and target images into a unified representation domain; unlike conventional conditional DDPMs that require hundreds of iterative inference steps, an inverse translation objective used during training lets the model predict the translated image in a single step at test time, greatly accelerating registration. A multimodal multiscale registration network (MM-Reg) then extracts and fuses unimodal and translated multimodal images with the proposed fusion strategy, improving the robustness and precision of alignment across scales and modalities. Extensive experiments on OSdataset show that OSDM-MReg outperforms state-of-the-art methods in registration accuracy.
Key Takeaways
- Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis.
- Existing methods struggle to extract modality-invariant features under large nonlinear radiometric differences.
- The OSDM-MReg framework bridges the modality gap through image-to-image translation.
- The UTGOS-CDM model performs one-step image translation, significantly accelerating registration (see the single-step inference sketch after this list).
- The MM-Reg network extracts and fuses unimodal and translated multimodal images.
- OSDM-MReg achieves higher registration accuracy on OSdataset than existing state-of-the-art methods.
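For intuition about the "one step" claim, here is a minimal sketch of what single-step inference could look like, assuming a conditional diffusion model trained (via the inverse translation objective) to predict the clean translated image directly; the `model` interface, the conditioning on the unaligned target image, and the timestep handling are assumptions, not the published UTGOS-CDM design.

```python
import torch

@torch.no_grad()
def one_step_translate(model, source_img, target_img, num_timesteps=1000):
    """Hypothetical single-step inference sketch in the spirit of UTGOS-CDM."""
    noise = torch.randn_like(source_img)                        # start from pure noise
    t = torch.full((source_img.shape[0],), num_timesteps - 1,
                   dtype=torch.long, device=source_img.device)  # final timestep index
    cond = torch.cat([source_img, target_img], dim=1)           # unaligned target as guidance
    return model(noise, t, cond)                                # one forward pass -> translated image
```

A conventional conditional DDPM would instead loop this denoising call hundreds of times, which is exactly what the single-step formulation avoids.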
Click here to view paper screenshots


A framework for rapid, reproducible, and high-fidelity whole-brain multi-pool CEST imaging at 3T
Authors:Yupeng Wu, Siyuan Fang, Siyuan Wang, Caixia Fu, Jianqi Li
Purpose: To develop and validate a framework for rapid, accurate, and reproducible whole-brain, multi-pool chemical exchange saturation transfer (CEST) imaging at 3T, addressing challenges of long acquisition times and confounding factors. Methods: A single-shot 3D true fast imaging with steady-state precession (True FISP) sequence was optimized for whole-brain multi-pool CEST. Rapid B0, B1, and T1 mapping was performed using a dual-echo modified four-angle method. A feed-forward neural network was developed for rapid B1 correction, trained against the conventional multi-power method. The apparent exchange-dependent relaxation (AREX) metric was used to correct for T1 and magnetization transfer (MT) effects. The framework was validated in phantoms and healthy human subjects (N=8), including a test-retest reproducibility assessment. Results: The True FISP sequence yielded high-quality, whole-brain images with minimal artifacts and distortion in a clinically feasible scan time (~9 minutes). Phantom studies confirmed the effectiveness of B1 correction (coefficient of variation [CV] for MT_MTRLD decreased from 22.49% to 4.61%) and AREX-based confounder correction (CV for APT_AREX reduced from 33.6% to 6.9%). The neural network B1 correction showed excellent agreement with the conventional multi-power method in vivo (ICC > 0.97). High test-retest reproducibility was demonstrated across 96 brain regions, with the average CV for APT_AREX under 10% for over 95% of regions. Conclusion: A rapid and robust framework for whole-brain quantitative multi-pool CEST imaging was successfully developed and validated. By integrating an efficient acquisition sequence with a streamlined correction pipeline, this approach overcomes key barriers to clinical translation, enabling reliable metabolic imaging for widespread brain pathologies.
Paper and Project Links
PDF KEYWORDS: chemical exchange saturation transfer (CEST), amide proton transfer (APT), whole-brain, multi-pool, true fast imaging with steady-state precession (True FISP), balanced steady state free precession (bSSFP)
Summary
To address long acquisition times and confounding factors, the research team developed and validated a framework for rapid, accurate, and reproducible whole-brain multi-pool chemical exchange saturation transfer (CEST) imaging. The framework uses an optimized single-shot 3D true fast imaging with steady-state precession (True FISP) sequence, together with rapid B0, B1, and T1 mapping for correction. A feed-forward neural network performs fast B1 correction, and the apparent exchange-dependent relaxation (AREX) metric corrects for T1 and magnetization transfer (MT) effects. The framework was validated in phantom studies and healthy subjects, showing high test-retest reproducibility. Overall, the work supports fast and reliable whole-brain multi-pool CEST imaging for clinical use.
Key Takeaways
- The work addresses long acquisition times and confounding factors in whole-brain multi-pool CEST imaging.
- An optimized single-shot 3D True FISP sequence enables rapid imaging.
- Rapid B0, B1, and T1 mapping is used to correct the images.
- A feed-forward neural network performs fast B1 correction (a hypothetical network sketch follows after this list).
- The AREX metric is used to correct for T1 and MT effects.
- The framework is validated in phantoms and healthy human subjects.
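In the CEST literature, AREX is commonly computed from the inverse Z-spectrum difference scaled by the observed longitudinal relaxation rate, roughly AREX = R1obs * (1/Z_label - 1/Z_reference), which removes T1 scaling and spillover/MT dilution. For the learned B1 correction, the sketch below shows one plausible per-voxel feed-forward network trained against the conventional multi-power method; the input layout, layer sizes, and number of frequency offsets are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class B1CorrectionNet(nn.Module):
    """Hypothetical per-voxel feed-forward B1-correction network."""

    def __init__(self, n_offsets=54, hidden=128):
        super().__init__()
        # Input: one voxel's measured Z-spectrum plus its relative B1 value.
        self.net = nn.Sequential(
            nn.Linear(n_offsets + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_offsets),   # output: B1-corrected Z-spectrum
        )

    def forward(self, z_spectrum, rel_b1):
        x = torch.cat([z_spectrum, rel_b1.unsqueeze(-1)], dim=-1)
        return self.net(x)

# Training sketch: regress toward spectra corrected by the multi-power method, e.g.
#   loss = nn.functional.mse_loss(model(z_measured, b1_relative), z_multi_power)
```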
Click here to view paper screenshots

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation
Authors:Youngmin Kim, Saejin Kim, Hoyeon Moon, Youngjae Yu, Junhyug Noh
Scalp disorders are highly prevalent worldwide, yet remain underdiagnosed due to limited access to expert evaluation and the high cost of annotation. Although AI-based approaches hold great promise, their practical deployment is hindered by challenges such as severe data imbalance and the absence of pixel-level segmentation labels. To address these issues, we propose ScalpVision, an AI-driven system for the holistic diagnosis of scalp diseases. In ScalpVision, effective hair segmentation is achieved using pseudo image-label pairs and an innovative prompting method in the absence of traditional hair masking labels. Additionally, ScalpVision introduces DiffuseIT-M, a generative model adopted for dataset augmentation while maintaining hair information, facilitating improved predictions of scalp disease severity. Our experimental results affirm ScalpVision’s efficiency in diagnosing a variety of scalp conditions, showcasing its potential as a valuable tool in dermatological care. Our code is available at https://github.com/winston1214/ScalpVision.
Paper and Project Links
PDF Accepted to MICCAI 2025(https://papers.miccai.org/miccai-2025/0806-Paper5080.html), Project page: https://0110tpwls.github.io/scalpvision25/
Summary
This paper presents ScalpVision, an AI-driven system for the holistic diagnosis of scalp diseases. To address the difficulty of diagnosing widespread scalp conditions, the system achieves effective hair segmentation using pseudo image-label pairs and an innovative prompting method, and adopts the DiffuseIT-M generative model for data augmentation while preserving hair information, improving the prediction of scalp disease severity. Experiments confirm ScalpVision's effectiveness across a variety of scalp conditions, highlighting its potential value in dermatological care.
Key Takeaways
- Scalp disorders are prevalent yet hard to diagnose; limited access to expert evaluation and high annotation cost are the main obstacles.
- ScalpVision is proposed as an AI-driven system for the holistic diagnosis of scalp diseases.
- Hair segmentation is achieved with pseudo image-label pairs and an innovative prompting method, without traditional hair masking labels (a toy pseudo-pair sketch follows after this list).
- The DiffuseIT-M generative model augments the dataset while preserving hair information, improving disease-severity prediction.
- Experiments confirm ScalpVision's effectiveness in diagnosing a variety of scalp conditions.
- ScalpVision shows potential as a valuable tool for dermatological care.
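As a toy illustration of the pseudo image-label idea (not ScalpVision's actual procedure), the sketch below draws random dark polylines onto a background patch as stand-in hair and keeps the drawn pixels as the segmentation label, so a hair segmenter could be trained without manually annotated masks; the helper name and drawing parameters are hypothetical.

```python
import numpy as np
import cv2

def make_pseudo_hair_pair(bg, n_strands=40):
    """Toy pseudo image-label pair generator; `bg` is an HxWx3 uint8 patch."""
    img = bg.copy()
    mask = np.zeros(bg.shape[:2], dtype=np.uint8)
    h, w = mask.shape
    for _ in range(n_strands):
        # A jagged random walk approximates one hair strand.
        steps = np.cumsum(np.random.randint(-6, 7, size=(20, 2)), axis=0)
        start = np.array([np.random.randint(0, w), np.random.randint(0, h)])
        pts = np.clip(steps + start, 0, [w - 1, h - 1]).astype(np.int32)
        color = tuple(int(c) for c in np.random.randint(0, 60, size=3))
        cv2.polylines(img, [pts.reshape(-1, 1, 2)], False, color, 1)
        cv2.polylines(mask, [pts.reshape(-1, 1, 2)], False, 255, 1)
    return img, mask
```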
Click here to view paper screenshots


Multilingual Diversity Improves Vision-Language Representations
Authors:Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna
Massive web-crawled image-text datasets lay the foundation for recent progress in multimodal learning. These datasets are designed with the goal of training a model to do well on standard computer vision benchmarks, many of which, however, have been shown to be English-centric (e.g., ImageNet). Consequently, existing data curation techniques gravitate towards using predominantly English image-text pairs and discard many potentially useful non-English samples. Our work questions this practice. Multilingual data is inherently enriching not only because it provides a gateway to learn about culturally salient concepts, but also because it depicts common concepts differently from monolingual data. We thus conduct a systematic study to explore the performance benefits of using more samples of non-English origins with respect to English vision tasks. By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set. Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet, ImageNet distribution shifts, image-English-text retrieval and on average across 38 tasks from the DataComp benchmark. On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa. In addition, we quantitatively show that English and non-English data are significantly different in both image and (translated) text space. We hope that our findings motivate future work to be more intentional about including multicultural and multilingual data, not just when non-English or geographically diverse tasks are involved, but to enhance model capabilities at large. All translated captions and metadata (language, CLIP score, etc.) are available on HuggingFace.
Paper and Project Links
PDF NeurIPS 2024 Spotlight paper
Summary
Multimodal learning benefits from massive web-crawled image-text datasets, but these datasets are curated with an English-centric bias. This work studies the benefits of non-English samples: by translating multilingual image-text pairs from a raw web crawl into English and re-filtering them, pre-training on the resulting dataset outperforms English-only or English-dominated datasets across many tasks. The findings encourage future work to include multicultural and multilingual data more deliberately to improve model capabilities at large.
Key Takeaways
- Multimodal learning benefits from massive web-crawled image-text datasets.
- Current datasets are English-centric and overlook the potential value of non-English samples.
- The study translates multilingual web-crawled image-text pairs into English and re-filters them to increase the use of non-English data (a curation sketch follows after this list).
- Pre-training on this dataset outperforms English-only or English-dominated datasets across many tasks.
- On geographically diverse tasks, the improvements from non-English data are especially pronounced, with the largest gain in Africa.
- English and non-English data differ significantly in both image and (translated) text space.
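A minimal sketch of the translate-then-refilter curation step described above, assuming caller-supplied translation and CLIP-scoring callables; the function name and threshold value are illustrative and not taken from the paper.

```python
def refilter_multilingual(pairs, translate_to_english, clip_score, threshold=0.28):
    """Keep (image, English caption) pairs whose CLIP similarity clears a threshold.

    `pairs` yields (image, caption, language) tuples; `translate_to_english`
    and `clip_score` are caller-supplied callables (e.g. a machine-translation
    model and a CLIP image-text similarity function).
    """
    kept = []
    for image, caption, lang in pairs:
        text = caption if lang == "en" else translate_to_english(caption)
        if clip_score(image, text) >= threshold:   # keep high-alignment pairs
            kept.append((image, text))
    return kept
```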
Click here to view paper screenshots
