
Medical Imaging


⚠️ All of the summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never use these summaries in serious academic settings; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-27

MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities

Authors:Tooba Tehreem Sheikh, Jean Lahoud, Rao Muhammad Anwer, Fahad Shahbaz Khan, Salman Khan, Hisham Cholakkal

Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first Real-time Open Vocabulary detection model for medical imaging. To enable open-vocabulary learning, we curate a large-scale dataset, Omnis, with 600K detection samples across nine imaging modalities and introduce a pseudo-labeling strategy to handle missing annotations from multi-source datasets. Additionally, we enhance generalization by incorporating knowledge from a large pre-trained foundation model. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures. Experimental results demonstrate that MedROV outperforms the previous state-of-the-art foundation model for medical image detection with an average absolute improvement of 40 mAP50, and surpasses closed-set detectors by more than 3 mAP50, while running at 70 FPS, setting a new benchmark in medical detection. Our source code, dataset, and trained model are available at https://github.com/toobatehreem/MedROV.


Paper & Project Links

PDF

Summary
Traditional object detection models in medical imaging operate in a closed-set paradigm and cannot detect objects with novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To close this gap, the authors introduce MedROV, the first real-time open-vocabulary detection model for medical imaging. Open-vocabulary learning is enabled by the large-scale Omnis dataset and a pseudo-labeling strategy for missing annotations in multi-source data, and generalization is further improved by incorporating knowledge from a large pre-trained foundation model. Leveraging contrastive learning and cross-modal representations, MedROV detects both known and novel structures. Experiments show that it outperforms the previous state-of-the-art foundation model for medical image detection by an average of 40 mAP50 and surpasses closed-set detectors by more than 3 mAP50 while running at 70 FPS, setting a new benchmark for medical detection.

Key Takeaways

  1. Traditional medical-imaging detectors are confined to a closed-set paradigm and cannot detect objects with novel labels.
  2. Open-vocabulary object detection (OVOD) addresses this, but its use in medical imaging has been limited.
  3. MedROV is the first real-time open-vocabulary detection model for medical imaging, designed to bridge this gap.
  4. Open-vocabulary learning is enabled by the large-scale Omnis dataset and a pseudo-labeling strategy for missing annotations in multi-source data.
  5. Knowledge from a large pre-trained foundation model improves generalization.
  6. Using contrastive learning and cross-modal representations, MedROV detects both known and novel structures (see the sketch below).
  7. MedROV outperforms the previous state-of-the-art foundation model and closed-set detectors on medical image detection while running in real time.
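To make takeaway 6 concrete, here is a minimal sketch of the scoring step that open-vocabulary detectors generally share: embeddings of detected regions are matched against text embeddings of arbitrary class names by cosine similarity. All shapes, names, and the temperature value are hypothetical; this illustrates the generic OVOD mechanism, not MedROV's actual implementation.

```python
# Generic open-vocabulary region classification via text embeddings.
# Hypothetical shapes and temperature; not MedROV's actual code.
import torch
import torch.nn.functional as F

def classify_regions(region_feats: torch.Tensor,
                     text_embeds: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """region_feats: (N, D) visual embeddings of N detected boxes.
    text_embeds: (C, D) embeddings of C class-name prompts (known or novel).
    Returns per-region class probabilities of shape (N, C)."""
    region_feats = F.normalize(region_feats, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = region_feats @ text_embeds.T / temperature  # cosine similarities
    return logits.softmax(dim=-1)

# Toy usage: 4 boxes scored against 3 free-text labels (e.g., novel findings).
probs = classify_regions(torch.randn(4, 256), torch.randn(3, 256))
print(probs.shape)  # torch.Size([4, 3])
```

Because the label set enters only through text embeddings, novel class names can be scored at inference time without retraining the detector.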

Cool Papers

Click here to view paper screenshots

PixelDiT: Pixel Diffusion Transformers for Image Generation

Authors:Yongsheng Yu, Wei Xiong, Weili Nie, Yichen Sheng, Shiqiu Liu, Jiebo Luo

Latent-space modeling has been the standard for Diffusion Transformers (DiTs). However, it relies on a two-stage pipeline where the pretrained autoencoder introduces lossy reconstruction, leading to error accumulation while hindering joint optimization. To address these issues, we propose PixelDiT, a single-stage, end-to-end model that eliminates the need for the autoencoder and learns the diffusion process directly in the pixel space. PixelDiT adopts a fully transformer-based architecture shaped by a dual-level design: a patch-level DiT that captures global semantics and a pixel-level DiT that refines texture details, enabling efficient training of a pixel-space diffusion model while preserving fine details. Our analysis reveals that effective pixel-level token modeling is essential to the success of pixel diffusion. PixelDiT achieves 1.61 FID on ImageNet 256x256, surpassing existing pixel generative models by a large margin. We further extend PixelDiT to text-to-image generation and pretrain it at the 1024x1024 resolution in pixel space. It achieves 0.74 on GenEval and 83.5 on DPG-bench, approaching the best latent diffusion models.


Paper & Project Links

PDF

Summary

Diffusion Transformers (DiTs) conventionally model in latent space, but this relies on a two-stage pipeline in which the pretrained autoencoder introduces lossy reconstruction, causing error accumulation and preventing joint optimization. PixelDiT is a single-stage, end-to-end model that removes the autoencoder and learns the diffusion process directly in pixel space. It adopts a fully transformer-based dual-level design: a patch-level DiT that captures global semantics and a pixel-level DiT that refines texture details. The analysis shows that effective pixel-level token modeling is key to pixel diffusion. PixelDiT achieves an FID of 1.61 on ImageNet 256x256, far surpassing existing pixel generative models. Extended to text-to-image generation and pretrained at 1024x1024 resolution in pixel space, it scores 0.74 on GenEval and 83.5 on DPG-bench, approaching the best latent diffusion models.

Key Takeaways

  1. DiTs usually model in latent space, but the two-stage pipeline and the autoencoder's lossy reconstruction cause error accumulation.
  2. PixelDiT is a single-stage, end-to-end model that learns the diffusion process directly in pixel space, resolving these issues.
  3. PixelDiT uses a dual-level design: a patch-level DiT captures global semantics and a pixel-level DiT refines texture details (a tokenization sketch follows below).
  4. Effective pixel-level token modeling is key to PixelDiT's success.
  5. PixelDiT achieves an FID of 1.61 on ImageNet 256x256, significantly outperforming existing pixel generative models.
  6. PixelDiT extends to text-to-image generation with high-resolution pixel-space pretraining.
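Takeaway 3's dual-level design can be pictured as two tokenizations of the same image: coarse patches for the semantic branch and individual pixels for the texture branch. The sketch below shows only this tokenization step with hypothetical sizes; it is not the paper's model code.

```python
# Two tokenizations of one image: patch-level vs. pixel-level tokens.
import torch

def patchify(img: torch.Tensor, p: int) -> torch.Tensor:
    """img: (B, C, H, W) -> tokens (B, (H//p)*(W//p), C*p*p)."""
    B, C, H, W = img.shape
    x = img.reshape(B, C, H // p, p, W // p, p)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(B, (H // p) * (W // p), C * p * p)

img = torch.randn(2, 3, 256, 256)
patch_tokens = patchify(img, p=16)  # coarse tokens for global semantics
pixel_tokens = patchify(img, p=1)   # per-pixel tokens for texture detail
print(patch_tokens.shape, pixel_tokens.shape)
# torch.Size([2, 256, 768]) torch.Size([2, 65536, 3])
```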

Cool Papers

Click here to view paper screenshots

Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI

Authors:Xinhao Liu, Jiaqi Li, Youming Deng, Ruxin Chen, Yingjia Zhang, Yifei Ma, Li Guo, Yiming Li, Jing Zhang, Chen Feng

Reproducible closed-loop evaluation remains a major bottleneck in Embodied AI such as visual navigation. A promising path forward is high-fidelity simulation that combines photorealistic sensor rendering with geometrically grounded interaction in complex, open-world urban environments. Although recent video-3DGS methods ease open-world scene capturing, they are still unsuitable for benchmarking due to large visual and geometric sim-to-real gaps. To address these challenges, we introduce Wanderland, a real-to-sim framework that features multi-sensor capture, reliable reconstruction, accurate geometry, and robust view synthesis. Using this pipeline, we curate a diverse dataset of indoor-outdoor urban scenes and systematically demonstrate how image-only pipelines scale poorly, how geometry quality impacts novel view synthesis, and how all of these adversely affect navigation policy learning and evaluation reliability. Beyond serving as a trusted testbed for embodied navigation, Wanderland’s rich raw sensor data further allows benchmarking of 3D reconstruction and novel view synthesis models. Our work establishes a new foundation for reproducible research in open-world embodied AI. Project website is at https://ai4ce.github.io/wanderland/.


Paper & Project Links

PDF

Summary

Reproducible closed-loop evaluation remains a major bottleneck in embodied AI such as visual navigation. A promising remedy is high-fidelity simulation that combines photorealistic sensor rendering with geometrically grounded interaction in complex, open-world urban environments. Although recent video-3DGS methods ease open-world scene capture, large visual and geometric sim-to-real gaps still make them unsuitable for benchmarking. Wanderland is a real-to-sim framework featuring multi-sensor capture, reliable reconstruction, accurate geometry, and robust view synthesis. With this pipeline, the authors curate a diverse indoor-outdoor urban scene dataset and systematically show how image-only pipelines scale poorly, how geometry quality affects novel view synthesis, and how both degrade navigation policy learning and evaluation reliability. Beyond serving as a trusted testbed for embodied navigation, Wanderland's rich raw sensor data also supports benchmarking of 3D reconstruction and novel view synthesis models, establishing a new foundation for reproducible research in open-world embodied AI.

Key Takeaways

  1. Reproducible closed-loop evaluation is a major bottleneck in embodied AI.
  2. High-fidelity simulation combining photorealistic sensor rendering with geometrically grounded interaction in complex urban environments is a promising remedy.
  3. Although video-3DGS methods ease open-world scene capture, large visual and geometric sim-to-real gaps keep them unsuitable for benchmarking.
  4. The Wanderland framework addresses these challenges with multi-sensor capture, reliable reconstruction, accurate geometry, and robust view synthesis.
  5. Its dataset demonstrates the poor scaling of image-only pipelines and the impact of geometry quality on novel view synthesis.
  6. These factors in turn affect navigation policy learning and evaluation reliability.

Cool Papers

Click here to view paper screenshots

PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding

Authors:Haoze Zhang, Tianyu Huang, Zichen Wan, Xiaowei Jin, Hongzhi Zhang, Hui Li, Wangmeng Zuo

While recent video generation models have achieved significant visual fidelity, they often suffer from the lack of explicit physical controllability and plausibility. To address this, some recent studies attempted to guide the video generation with physics-based rendering. However, these methods face inherent challenges in accurately modeling complex physical properties and effectively controlling the resulting physical behavior over extended temporal sequences. In this work, we introduce PhysChoreo, a novel framework that can generate videos with diverse controllability and physical realism from a single image. Our method consists of two stages: first, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction. Then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism. Experimental results show that PhysChoreo can generate videos with rich behaviors and physical realism, outperforming state-of-the-art methods on multiple evaluation metrics.


Paper & Project Links

PDF

Summary

This paper proposes PhysChoreo, a novel video generation framework that produces videos with diverse controllability and physical realism from a single image. The method works in two stages: it first estimates the static initial physical properties of all objects in the image, then synthesizes high-quality videos with rich dynamic behaviors and physical realism through temporally instructed, physically editable simulation. Experiments show PhysChoreo outperforms existing methods on multiple evaluation metrics.

Key Takeaways

  1. Recent video generation models achieve high visual fidelity but lack explicit physical controllability and plausibility.
  2. Some studies guide video generation with physics-based rendering, but accurately modeling complex physical properties and controlling physical behavior over long temporal sequences remain challenging.
  3. The PhysChoreo framework generates videos with diverse controllability and physical realism from a single image.
  4. It operates in two stages: part-aware estimation of static initial physical properties, followed by temporally instructed, physically editable simulation.
  5. PhysChoreo produces videos with rich behaviors and physical realism.
  6. It outperforms state-of-the-art methods on multiple evaluation metrics.

Cool Papers

Click here to view paper screenshots

Time-Domain Linear Model-based Framework for Passive Acoustic Mapping of Cavitation Activity

Authors:Tatiana Gelvez-Barrera, Barbara Nicolas, Denis Kouamé, Bruno Gilles, Adrian Basarab

Passive acoustic mapping enables the spatial mapping and temporal monitoring of cavitation activity, playing a crucial role in therapeutic ultrasound applications. Most conventional beamforming methods, whether implemented in the time or frequency domains, suffer from limited axial resolution due to the absence of a reference emission onset time. While frequency-domain methods, the most efficient of which are based on the cross-spectral matrix, require long signals for accurate estimation, time-domain methods typically achieve lower spatial resolution. To address these limitations, we propose a linear model-based beamforming framework fully formulated in the time domain. The linear forward model relates a discretized spatiotemporal distribution of cavitation activity to the temporal signals recorded by a probe, explicitly accounting for time-of-flight delays dictated by the acquisition geometry. This model is then inverted using regularization techniques that exploit prior knowledge of cavitation activity in both spatial and temporal domains. Experimental results show that the proposed framework achieves enhanced or competitive cavitation map quality while using only 20% of the data typically required by frequency-domain methods. This highlights the substantial gain in data efficiency and the flexibility of our spatiotemporal regularization to adapt to diverse passive cavitation scenarios, outperforming state-of-the-art techniques.


Paper & Project Links

PDF

Summary

This paper introduces a linear model-based, time-domain beamforming framework for the spatial mapping and temporal monitoring of cavitation activity in passive acoustic mapping. The framework overcomes the axial-resolution limitations of conventional beamformers and, through spatiotemporal regularization, uses data efficiently enough to adapt to diverse passive cavitation scenarios.

Key Takeaways

  1. Passive acoustic mapping enables spatial mapping and temporal monitoring of cavitation activity and is crucial for therapeutic ultrasound applications.
  2. Conventional beamforming methods, in either the time or frequency domain, suffer from limited axial resolution due to the absence of a reference emission onset time.
  3. Frequency-domain methods need long signals for accurate estimation, while time-domain methods typically achieve lower spatial resolution.
  4. The proposed linear model-based beamforming framework is formulated entirely in the time domain to address these limitations.
  5. Its linear forward model relates the spatiotemporal distribution of cavitation activity to the temporal signals recorded by the probe, explicitly accounting for time-of-flight delays dictated by the acquisition geometry.
  6. The model is inverted with regularization techniques that exploit prior knowledge of cavitation activity in both space and time (see the toy sketch below).
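As a rough illustration of takeaways 5 and 6, the sketch below builds a toy time-domain forward operator from time-of-flight delays on a 1-D pixel grid and inverts it with plain Tikhonov regularization, a simple stand-in for the paper's spatiotemporal priors. All geometry and constants are invented toy values.

```python
# Toy time-domain forward model and Tikhonov inversion for passive
# acoustic mapping. All geometry and constants are invented toy values.
import numpy as np

rng = np.random.default_rng(0)
c, fs, T = 1540.0, 5e6, 64             # sound speed (m/s), sample rate (Hz), samples
sensors = np.linspace(-0.02, 0.02, 8)  # 8-element linear probe (m)
pixels = np.linspace(-0.01, 0.01, 21)  # 1-D image grid at fixed depth (m)
depth = 0.03

# Column j of A is the multi-channel response to a unit impulse emitted at
# pixel j; delays are taken relative to the earliest possible arrival,
# since no absolute emission onset time is available in passive mapping.
t0 = depth / c
A = np.zeros((len(sensors) * T, len(pixels)))
for j, px in enumerate(pixels):
    for i, sx in enumerate(sensors):
        k = int(round((np.hypot(px - sx, depth) / c - t0) * fs))
        if k < T:
            A[i * T + k, j] = 1.0

x_true = np.zeros(len(pixels))
x_true[10] = 1.0                                         # one cavitation source
y = A @ x_true + 0.05 * rng.standard_normal(A.shape[0])  # noisy channel data

# Tikhonov-regularized inversion: x = argmin ||Ax - y||^2 + lam * ||x||^2
lam = 1e-2
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(len(pixels)), A.T @ y)
print(int(np.argmax(x_hat)))  # typically recovers the true source index, 10
```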

Cool Papers

Click here to view paper screenshots

A Physics-Informed Loss Function for Boundary-Consistent and Robust Artery Segmentation in DSA Sequences

Authors:Muhammad Irfan, Nasir Rahim, Khalid Mahmood Malik

Accurate extraction and segmentation of the cerebral arteries from digital subtraction angiography (DSA) sequences is essential for developing reliable clinical management models of complex cerebrovascular diseases. Conventional loss functions often rely solely on pixel-wise overlap, overlooking the geometric and physical consistency of vascular boundaries, which can lead to fragmented or unstable vessel predictions. To overcome this limitation, we propose a novel \textit{Physics-Informed Loss} (PIL) that models the interaction between the predicted and ground-truth boundaries as an elastic process inspired by dislocation theory in materials physics. This formulation introduces a physics-based regularization term that enforces smooth contour evolution and structural consistency, allowing the network to better capture fine vascular geometry. The proposed loss is integrated into several segmentation architectures, including U-Net, U-Net++, SegFormer, and MedFormer, and evaluated on two public benchmarks: DIAS and DSCA. Experimental results demonstrate that PIL consistently outperforms conventional loss functions such as Cross-Entropy, Dice, Active Contour, and Surface losses, achieving superior sensitivity, F1 score, and boundary coherence. These findings confirm that the incorporation of physics-based boundary interactions into deep neural networks improves both the precision and robustness of vascular segmentation in dynamic angiographic imaging. The implementation of the proposed method is publicly available at https://github.com/irfantahir301/Physicsis_loss.


Paper & Project Links

PDF

Summary

This paper proposes a novel Physics-Informed Loss (PIL) for accurately extracting and segmenting cerebral arteries in digital subtraction angiography (DSA) sequences. Inspired by dislocation theory in materials physics, PIL models the interaction between predicted and ground-truth boundaries as an elastic process and introduces a physics-based regularization term, improving both the precision and robustness of vascular segmentation. Experiments show PIL outperforms conventional loss functions for vessel segmentation in dynamic angiographic imaging.

Key Takeaways

  1. Accurate extraction and segmentation of the cerebral arteries is essential for developing reliable clinical management models.
  2. Conventional loss functions rely solely on pixel-wise overlap and ignore the geometric and physical consistency of vascular boundaries, which can lead to fragmented or unstable predictions.
  3. The novel Physics-Informed Loss (PIL) is proposed to address this.
  4. PIL models the interaction between predicted and ground-truth boundaries, inspired by dislocation theory in materials physics.
  5. PIL introduces a physics-based regularization term enforcing smooth contour evolution and structural consistency (a simplified sketch follows below).
  6. Experiments show PIL achieves higher sensitivity, F1 score, and boundary coherence than several conventional loss functions.
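Takeaway 5's general pattern, an overlap loss plus a boundary regularizer, can be sketched as below. The paper's elastic, dislocation-inspired term is replaced here with a simple total-variation contour-smoothness penalty purely for illustration; this is not the authors' formulation.

```python
# Sketch: overlap loss plus a boundary-smoothness regularizer.
# The physics term here is a plain total-variation penalty, used only to
# illustrate the structure of such a composite loss.
import torch

def dice_loss(prob: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def boundary_smoothness(prob: torch.Tensor):
    """Penalize jagged contours via total variation of the probability map."""
    dy = (prob[:, :, 1:, :] - prob[:, :, :-1, :]).abs().mean()
    dx = (prob[:, :, :, 1:] - prob[:, :, :, :-1]).abs().mean()
    return dx + dy

def pil_style_loss(logits, target, w_boundary: float = 0.1):
    prob = torch.sigmoid(logits)
    return dice_loss(prob, target) + w_boundary * boundary_smoothness(prob)

logits = torch.randn(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = pil_style_loss(logits, target)
loss.backward()
print(float(loss))
```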

Cool Papers

Click here to view paper screenshots

A meshless data-tailored approach to compute statistics from scattered data with adaptive radial basis functions

Authors:Damien Rigutto, Manuel Ratz, Miguel A. Mendez

Constrained radial basis function (RBF) regression has recently emerged as a powerful meshless tool for reconstructing continuous velocity fields from scattered flow measurements, particularly in image-based velocimetry. However, existing formulations based on isotropic kernels often suffer from spurious oscillations in regions with sharp gradients or strong flow anisotropy. This work introduces an anisotropic, gradient-informed, and adaptively sampled extension of the constrained RBF framework for regression of scattered data. Gradient information is estimated via local polynomial regression at collocation points, smoothed, and used to (1) re-sample data, maximizing sampling density near steep gradients while downsampling in smooth regions, and (2) construct a local anisotropic metric that shapes each basis function according to the flow directionality. In addition, a gradient-informed regularization is introduced by embedding observed gradients into the least-squares system as weighted soft constraints. The resulting formulation is fully meshless, linear, and computationally efficient, while significantly improving reconstruction quality in challenging regions. The method is evaluated on both synthetic and experimental datasets, including direct numerical simulation (DNS) data of a turbulent channel and time-resolved particle tracking velocimetry of a turbulent jet. Results show that the proposed approach outperforms isotropic and gradient-free RBF formulations in accuracy, smoothness, and physical consistency – particularly near shear layers and boundaries – while reducing the number of bases by an order of magnitude. To support the application, we have created a repository (https://github.com/mendezVKI/SPICY_VKI) that provides access to the investigated datasets.


Paper & Project Links

PDF Submitted to Experiments in Fluids

Summary

This paper extends constrained radial basis function (RBF) regression, a powerful meshless tool for reconstructing continuous velocity fields from scattered flow measurements, particularly in image-based velocimetry. To combat the spurious oscillations that isotropic-kernel formulations exhibit near sharp gradients or strong flow anisotropy, the authors introduce an anisotropic, gradient-informed, and adaptively sampled formulation. Gradient information is estimated via local polynomial regression at collocation points, smoothed, and used both to re-sample the data (densely near steep gradients, sparsely in smooth regions) and to build a local anisotropic metric that shapes each basis function according to the flow directionality. A gradient-informed regularization also embeds observed gradients into the least-squares system as weighted soft constraints. Evaluated on synthetic and experimental datasets, the method outperforms isotropic and gradient-free RBF formulations in accuracy, smoothness, and physical consistency, particularly near shear layers and boundaries.

Key Takeaways

  1. An improved constrained radial basis function (RBF) regression method is proposed for reconstructing continuous velocity fields from scattered data.
  2. Existing isotropic formulations suffer spurious oscillations near sharp gradients or strong flow anisotropy; the new method combines gradient information with local polynomial regression to address this.
  3. By estimating and smoothing gradient information, the method adaptively re-samples the data, increasing sampling density near steep gradients and reducing it in smooth regions.
  4. A local anisotropic metric shapes each basis function according to the flow directionality (see the sketch below).
  5. A gradient-informed regularization improves model performance.
  6. Evaluation on synthetic and experimental datasets shows the new method outperforms existing formulations in accuracy and physical consistency.
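Takeaway 4 can be sketched in a few lines: a Gaussian basis whose metric tensor makes it narrow along the local gradient direction and elongated across it. The length scales and 2-D setting below are made-up toy values, not the paper's estimator.

```python
# Anisotropic Gaussian RBF shaped by a gradient-informed metric tensor.
import numpy as np

def aniso_rbf(x, center, grad, eps_grad=20.0, eps_perp=2.0):
    """Gaussian basis narrowed along the local gradient direction `grad`
    (large eps_grad) and elongated across it (small eps_perp)."""
    g = grad / np.linalg.norm(grad)
    P = np.outer(g, g)                                    # projector onto grad
    M = eps_grad**2 * P + eps_perp**2 * (np.eye(2) - P)   # local metric tensor
    d = x - center
    return np.exp(-d @ M @ d)

x = np.array([0.1, 0.0])
print(aniso_rbf(x, np.zeros(2), grad=np.array([1.0, 0.0])))  # ~0.018: narrow
print(aniso_rbf(x, np.zeros(2), grad=np.array([0.0, 1.0])))  # ~0.96: wide
```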

Cool Papers

Click here to view paper screenshots

VGGTFace: Topologically Consistent Facial Geometry Reconstruction in the Wild

Authors:Xin Ming, Yuxuan Han, Tianyu Huang, Feng Xu

Reconstructing topologically consistent facial geometry is crucial for the digital avatar creation pipelines. Existing methods either require tedious manual efforts, lack generalization to in-the-wild data, or are constrained by the limited expressiveness of 3D Morphable Models. To address these limitations, we propose VGGTFace, an automatic approach that innovatively applies the 3D foundation model, \emph{i.e.} VGGT, for topologically consistent facial geometry reconstruction from in-the-wild multi-view images captured by everyday users. Our key insight is that, by leveraging VGGT, our method naturally inherits strong generalization ability and expressive power from its large-scale training and point map representation. However, it is unclear how to reconstruct a topologically consistent mesh from VGGT, as the topology information is missing in its prediction. To this end, we augment VGGT with Pixel3DMM for injecting topology information via pixel-aligned UV values. In this manner, we convert the pixel-aligned point map of VGGT to a point cloud with topology. Tailored to this point cloud with known topology, we propose a novel Topology-Aware Bundle Adjustment strategy to fuse them, where we construct a Laplacian energy for the Bundle Adjustment objective. Our method achieves high-quality reconstruction in 10 seconds for 16 views on a single NVIDIA RTX 4090. Experiments demonstrate state-of-the-art results on benchmarks and impressive generalization to in-the-wild data. Code is available at https://github.com/grignarder/vggtface.


Paper & Project Links

PDF

Summary

This paper proposes a facial geometry reconstruction method based on VGGT (a 3D foundation model) and Pixel3DMM. By combining the two, it automatically reconstructs facial geometry from multi-view photos captured by everyday users, with strong generalization ability and expressive power. To address VGGT's missing topology information, Pixel3DMM is introduced to inject it, and a Topology-Aware Bundle Adjustment strategy fuses the point cloud data for fast, high-quality reconstruction.

Key Takeaways

  1. A facial geometry reconstruction method based on VGGT and Pixel3DMM automatically rebuilds facial geometry from multi-view photos.
  2. The method has strong generalization and expressiveness, suiting diverse in-the-wild facial images.
  3. Pixel3DMM injects topology information, compensating for its absence in VGGT's predictions.
  4. A topology-aware bundle adjustment strategy fuses the point cloud data, improving reconstruction quality (a toy Laplacian-energy sketch follows below).
  5. The method is fast, completing high-quality reconstruction from 16 views in 10 seconds.
  6. It achieves state-of-the-art results on benchmarks and generalizes well to real-world data.
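The Laplacian energy in the bundle adjustment objective (takeaway 4) has a simple discrete form: each vertex is penalized for deviating from the centroid of its topological neighbors. Below is a toy version on a four-vertex mesh with hypothetical connectivity, not the paper's implementation.

```python
# Uniform Laplacian energy on a tiny mesh with known (hypothetical) topology.
import numpy as np

def laplacian_energy(V: np.ndarray, neighbors: list) -> float:
    """V: (N, 3) vertex positions; neighbors[i]: indices adjacent to vertex i.
    Each vertex is pulled toward the centroid of its neighbors."""
    e = 0.0
    for i, nbrs in enumerate(neighbors):
        centroid = V[nbrs].mean(axis=0)
        e += float(np.sum((V[i] - centroid) ** 2))
    return e

V = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [1, 1, 0]], dtype=float)
neighbors = [[1], [0, 2, 3], [1], [1]]   # toy connectivity
print(laplacian_energy(V, neighbors))    # ~3.11 for this configuration
```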

Cool Papers

Click here to view paper screenshots

From Passive Perception to Active Memory: A Weakly Supervised Image Manipulation Localization Framework Driven by Coarse-Grained Annotations

Authors:Zhiqing Guo, Dongdong Xi, Songlin Li, Gaobo Yang

Image manipulation localization (IML) faces a fundamental trade-off between minimizing annotation cost and achieving fine-grained localization accuracy. Existing fully-supervised IML methods depend heavily on dense pixel-level mask annotations, which limits scalability to large datasets or real-world deployment.In contrast, the majority of existing weakly-supervised IML approaches are based on image-level labels, which greatly reduce annotation effort but typically lack precise spatial localization. To address this dilemma, we propose BoxPromptIML, a novel weakly-supervised IML framework that effectively balances annotation cost and localization performance. Specifically, we propose a coarse region annotation strategy, which can generate relatively accurate manipulation masks at lower cost. To improve model efficiency and facilitate deployment, we further design an efficient lightweight student model, which learns to perform fine-grained localization through knowledge distillation from a fixed teacher model based on the Segment Anything Model (SAM). Moreover, inspired by the human subconscious memory mechanism, our feature fusion module employs a dual-guidance strategy that actively contextualizes recalled prototypical patterns with real-time observational cues derived from the input. Instead of passive feature extraction, this strategy enables a dynamic process of knowledge recollection, where long-term memory is adapted to the specific context of the current image, significantly enhancing localization accuracy and robustness. Extensive experiments across both in-distribution and out-of-distribution datasets show that BoxPromptIML outperforms or rivals fully-supervised models, while maintaining strong generalization, low annotation cost, and efficient deployment characteristics.


Paper & Project Links

PDF Accepted by AAAI 2026

Summary
The BoxPromptIML framework resolves the fundamental trade-off in image manipulation localization (IML) between minimizing annotation cost and achieving fine-grained localization accuracy. Working in a weakly-supervised setting, it proposes a coarse region annotation strategy that generates relatively accurate manipulation masks at low cost. An efficient, lightweight student model learns fine-grained localization by knowledge distillation from a fixed teacher based on the Segment Anything Model (SAM). Inspired by the human subconscious memory mechanism, the feature fusion module adopts a dual-guidance strategy that combines recalled prototypical patterns with real-time observational cues, improving localization accuracy and robustness. Experiments show that BoxPromptIML outperforms or matches fully-supervised models on both in-distribution and out-of-distribution datasets while retaining strong generalization, low annotation cost, and efficient deployment.

Key Takeaways

  1. BoxPromptIML addresses the trade-off between annotation cost and localization accuracy in image manipulation localization (IML).
  2. The framework is weakly supervised, with a coarse region annotation strategy that lowers annotation cost while producing relatively accurate manipulation masks.
  3. An efficient lightweight student model learns fine-grained localization via knowledge distillation from a SAM-based teacher (see the sketch below).
  4. The feature fusion module's dual-guidance strategy combines recalled prototypical patterns with real-time observational cues, improving accuracy and robustness.
  5. BoxPromptIML performs on par with or better than fully-supervised models on in-distribution and out-of-distribution datasets.
  6. The framework retains strong generalization ability.
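Takeaway 3 follows the standard teacher-student distillation recipe. The sketch below shows one training step with toy stand-in modules; in the paper the frozen teacher would be a SAM-based promptable segmenter fed coarse box prompts, not a single convolution.

```python
# One distillation step from a frozen teacher to a lightweight student.
# Both networks are toy stand-ins, not the paper's architectures.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Conv2d(3, 1, 3, padding=1)  # stand-in for a SAM-based teacher
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 1, 3, padding=1))
for p in teacher.parameters():
    p.requires_grad_(False)              # the teacher stays fixed

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
img = torch.randn(4, 3, 64, 64)
with torch.no_grad():
    soft_mask = torch.sigmoid(teacher(img))  # pseudo mask from coarse prompt

loss = F.binary_cross_entropy_with_logits(student(img), soft_mask)
loss.backward()
opt.step()
print(float(loss))
```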

Cool Papers

Click here to view paper screenshots

Impact of Spectral Coverage on Parameter recovery in Blazar Modeling

Authors:N. Sahakyan, D. Bégué, P. Giommi, H. Dereli-Bégué, Asaf Pe’er

Understanding the impact of spectral coverage on parameter recovery is critical for accurate interpretation of blazar spectra. In this study, we examine how the data coverage influences the reliability of parameter estimation within the one-zone synchrotron self-Compton (SSC) framework. Using OJ 287, TXS 0506+056, and Mrk 421 as representative of the low-, intermediate- and high-synchrotron-peak classes (LSP, ISP and HSP), respectively, we generate synthetic SEDs based on their best-fit models and perform 1,000 fits for each of the 21 observational configurations per source type. Our analysis quantifies the coverage probability for all model parameters, such as the magnetic field strength and the electron luminosity, and reveals that different blazar subclasses exhibit distinct sensitivities to spectral gaps. For LSPs, a minimal dataset comprising optical/UV, X-ray, and GeV γ-ray bands is sufficient for robust parameter inference. In contrast, ISPs and HSPs require broader spectral coverage to constrain the physical parameters. For ISPs, we find that reliable parameter recovery can be achieved with two different minimal band combinations: (i) X-ray, high-energy γ-ray, and very-high-energy γ-ray data, or (ii) optical/UV, X-ray, and high-energy γ-ray data. For HSPs, the minimal configuration enabling reliable parameter recovery includes the optical/UV, X-ray, and very-high-energy γ-ray bands. We discuss the role of very-high-energy γ-ray observations, showing that they significantly enhance parameter recovery for HSPs. Our results provide practical guidelines for designing optimized multi-wavelength observation campaigns and for assessing the robustness of SSC model inferences under incomplete spectral coverage.


Paper & Project Links

PDF Accepted for publication in ApJ

Summary

This study examines how spectral coverage affects the reliability of parameter recovery within the one-zone synchrotron self-Compton (SSC) framework. Analyzing OJ 287, TXS 0506+056, and Mrk 421 as representatives of the low-, intermediate-, and high-synchrotron-peak blazar classes shows that the subclasses differ in their sensitivity to spectral gaps. For LSPs, a minimal dataset of optical/UV, X-ray, and GeV γ-ray bands suffices for robust parameter inference, whereas ISPs and HSPs require broader spectral coverage to constrain the physical parameters. The results offer practical guidance for designing optimized multi-wavelength campaigns and for assessing the robustness of SSC model inferences under incomplete spectral coverage.

Key Takeaways

  1. The study quantifies how spectral coverage affects the reliability of parameter recovery, which is critical for accurately interpreting blazar spectra (a toy coverage-probability example follows below).
  2. Analysis of three blazar types (low-, intermediate-, and high-synchrotron-peak) shows that the subclasses differ in their sensitivity to spectral coverage during parameter recovery.
  3. For low-synchrotron-peak blazars, a minimal dataset of optical/UV, X-ray, and GeV γ-ray bands is sufficient for robust parameter inference.
  4. Intermediate- and high-synchrotron-peak blazars require broader spectral coverage to constrain the physical parameters.
  5. For intermediate-synchrotron-peak blazars, two different minimal band combinations enable reliable parameter recovery.
  6. For high-synchrotron-peak blazars, the minimal configuration comprises the optical/UV, X-ray, and very-high-energy γ-ray bands.
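The paper's central quantity, the coverage probability, is the fraction of repeated fits whose confidence interval contains the true parameter value. The toy Monte Carlo below illustrates the concept for a Gaussian mean; the paper computes the analogous statistic for each SSC parameter under each band configuration.

```python
# Coverage probability by Monte Carlo: fraction of repeated fits whose
# ~68% interval contains the true value. Toy Gaussian-mean example.
import numpy as np

rng = np.random.default_rng(1)
true_mu, sigma, n, trials = 1.0, 0.5, 30, 1000
hits = 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, n)
    se = sample.std(ddof=1) / np.sqrt(n)        # standard error of the mean
    if abs(sample.mean() - true_mu) <= se:      # +/- 1 SE interval
        hits += 1
print(hits / trials)  # close to 0.68 when the model is well specified
```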

Cool Papers

Click here to view paper screenshots

TaCo: Capturing Spatio-Temporal Semantic Consistency in Remote Sensing Change Detection

Authors:Han Guo, Chenyang Liu, Haotian Zhang, Bowen Chen, Zhengxia Zou, Zhenwei Shi

Remote sensing change detection (RSCD) aims to identify surface changes across bi-temporal satellite images. Most previous methods rely solely on mask supervision, which effectively guides spatial localization but provides limited constraints on the temporal semantic transitions. Consequently, they often produce spatially coherent predictions while still suffering from unresolved semantic inconsistencies. To address this limitation, we propose TaCo, a spatio-temporal semantic consistent network, which enriches the existing mask-supervised framework with a spatio-temporal semantic joint constraint. TaCo conceptualizes change as a semantic transition between bi-temporal states, in which one temporal feature representation can be derived from the other via dedicated transition features. To realize this, we introduce a Text-guided Transition Generator that integrates textual semantics with bi-temporal visual features to construct the cross-temporal transition features. In addition, we propose a spatio-temporal semantic joint constraint consisting of bi-temporal reconstruct constraints and a transition constraint: the former enforces alignment between reconstructed and original features, while the latter enhances discrimination for changes. This design can yield substantial performance gains without introducing any additional computational overhead during inference. Extensive experiments on six public datasets, spanning both binary and semantic change detection tasks, demonstrate that TaCo consistently achieves SOTA performance.


Paper & Project Links

PDF

Summary

This paper introduces TaCo, a spatio-temporal semantic consistency network for remote sensing change detection. The network enriches the existing mask-supervised framework with a spatio-temporal semantic joint constraint. TaCo conceptualizes change as a semantic transition between bi-temporal states, in which one temporal feature representation can be derived from the other via dedicated transition features. To this end, a Text-guided Transition Generator integrates textual semantics with bi-temporal visual features to construct cross-temporal transition features.

Key Takeaways

  1. TaCo is a spatio-temporal semantic consistency network that addresses semantic inconsistencies in remote sensing change detection.
  2. Prior methods rely mainly on mask supervision, which poorly constrains temporal semantic transitions.
  3. TaCo conceptualizes change as a semantic transition between bi-temporal states and introduces a spatio-temporal semantic joint constraint.
  4. A Text-guided Transition Generator combines textual semantics with bi-temporal visual features to build cross-temporal transition features.
  5. The joint constraint consists of bi-temporal reconstruction constraints and a transition constraint (a simplified sketch follows below).
  6. The reconstruction constraints enforce alignment between reconstructed and original features.
  7. The transition constraint enhances discrimination of changes, with no extra computational overhead at inference.
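Takeaways 5 to 7 can be summarized as: a transition feature maps one temporal representation to the other, and reconstruction losses pin down the round trip. The additive transition below is a deliberately simplified stand-in for the paper's text-guided generator; all modules are toy placeholders.

```python
# Bi-temporal reconstruction via a transition feature (toy stand-ins).
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Conv2d(3, 16, 3, padding=1)             # shared image encoder
to_transition = nn.Conv2d(32, 16, 1)             # builds the transition feature

x1, x2 = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
f1, f2 = enc(x1), enc(x2)
t12 = to_transition(torch.cat([f1, f2], dim=1))  # cross-temporal transition

recon_f2 = f1 + t12                              # derive t2 features from t1
recon_f1 = f2 - t12                              # and vice versa
loss_recon = F.mse_loss(recon_f2, f2) + F.mse_loss(recon_f1, f1)
print(float(loss_recon))
```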

Cool Papers

Click here to view paper screenshots

Prompting Lipschitz-constrained network for multiple-in-one sparse-view CT reconstruction

Authors:Baoshun Shi, Ke Jiang, Qiusheng Lian, Xinran Yu, Huazhu Fu

Despite significant advancements in deep learning-based sparse-view computed tomography (SVCT) reconstruction algorithms, these methods still encounter two primary limitations: (i) It is challenging to explicitly prove that the prior networks of deep unfolding algorithms satisfy Lipschitz constraints due to their empirically designed nature. (ii) The substantial storage costs of training a separate model for each setting in the case of multiple views hinder practical clinical applications. To address these issues, we elaborate an explicitly provable Lipschitz-constrained network, dubbed LipNet, and integrate an explicit prompt module to provide discriminative knowledge of different sparse sampling settings, enabling the treatment of multiple sparse view configurations within a single model. Furthermore, we develop a storage-saving deep unfolding framework for multiple-in-one SVCT reconstruction, termed PromptCT, which embeds LipNet as its prior network to ensure the convergence of its corresponding iterative algorithm. In simulated and real data experiments, PromptCT outperforms benchmark reconstruction algorithms in multiple-in-one SVCT reconstruction, achieving higher-quality reconstructions with lower storage costs. On the theoretical side, we explicitly demonstrate that LipNet satisfies boundary property, further proving its Lipschitz continuity and subsequently analyzing the convergence of the proposed iterative algorithms. The data and code are publicly available at https://github.com/shibaoshun/PromptCT.


Paper & Project Links

PDF

Summary

This paper addresses two main limitations of deep learning-based sparse-view CT (SVCT) reconstruction. First, it proposes LipNet, an explicitly provable Lipschitz-constrained network, resolving the difficulty of proving that the empirically designed prior networks of deep unfolding algorithms satisfy Lipschitz constraints. Second, it develops PromptCT, a storage-saving deep unfolding framework that handles multiple sparse-view configurations within a single model, with an explicit prompt module providing discriminative knowledge of the different sparse sampling settings. Experiments show that PromptCT outperforms benchmark reconstruction algorithms in multiple-in-one SVCT reconstruction, achieving higher-quality reconstructions at lower storage cost. The data and code are publicly available.

Key Takeaways

Key insights at a glance:

  • LipNet, an explicitly provable Lipschitz-constrained network, addresses a core challenge of deep learning-based sparse-view CT (SVCT) reconstruction.
  • LipNet resolves the difficulty of proving that the prior networks of existing unfolding algorithms satisfy Lipschitz constraints (a generic spectral-normalization sketch follows below).
  • The PromptCT framework handles multiple sparse-view configurations within a single model, with an explicit prompt module supplying knowledge of each sparse sampling setting.
  • PromptCT outperforms benchmark reconstruction algorithms in simulated and real-data experiments, achieving higher-quality reconstructions at lower storage cost.
  • The data and code are publicly available for use and further research.
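For orientation, a common off-the-shelf way to bound a network's Lipschitz constant is spectral normalization, sketched below. LipNet's explicitly provable construction is different; this generic mechanism is shown only to make the notion of a Lipschitz-constrained prior network concrete.

```python
# Generic Lipschitz control via spectral normalization (illustration only;
# LipNet's provable construction differs).
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

prior_net = nn.Sequential(
    # spectral_norm rescales the reshaped conv weight to unit spectral norm,
    # which approximately bounds each layer's Lipschitz constant by 1
    spectral_norm(nn.Conv2d(1, 16, 3, padding=1)),
    nn.ReLU(),                       # ReLU itself is 1-Lipschitz
    spectral_norm(nn.Conv2d(16, 1, 3, padding=1)),
)

x = torch.randn(1, 1, 64, 64)
print(prior_net(x).shape)  # torch.Size([1, 1, 64, 64])
```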

Cool Papers

Click here to view paper screenshots

Text-guided Controllable Diffusion for Realistic Camouflage Images Generation

Authors:Yuhang Qian, Haiyan Chen, Wentong Li, Ningzhong Liu, Jie Qin

Camouflage Images Generation (CIG) is an emerging research area that focuses on synthesizing images in which objects are harmoniously blended and exhibit high visual consistency with their surroundings. Existing methods perform CIG by either fusing objects into specific backgrounds or outpainting the surroundings via foreground object-guided diffusion. However, they often fail to obtain natural results because they overlook the logical relationship between camouflaged objects and background environments. To address this issue, we propose CT-CIG, a Controllable Text-guided Camouflage Images Generation method that produces realistic and logically plausible camouflage images. Leveraging Large Visual Language Models (VLM), we design a Camouflage-Revealing Dialogue Mechanism (CRDM) to annotate existing camouflage datasets with high-quality text prompts. Subsequently, the constructed image-prompt pairs are utilized to finetune Stable Diffusion, incorporating a lightweight controller to guide the location and shape of camouflaged objects for enhanced camouflage scene fitness. Moreover, we design a Frequency Interaction Refinement Module (FIRM) to capture high-frequency texture features, facilitating the learning of complex camouflage patterns. Extensive experiments, including CLIPScore evaluation and camouflage effectiveness assessment, demonstrate the semantic alignment of our generated text prompts and CT-CIG’s ability to produce photorealistic camouflage images.


Paper & Project Links

PDF Accepted by AAAI 2026

Summary

This paper presents CT-CIG, a controllable text-guided camouflage image generation method. Leveraging large vision-language models (VLMs) and a Camouflage-Revealing Dialogue Mechanism (CRDM), it fine-tunes the image generation model to improve the naturalness and logical plausibility of camouflage images, and a Frequency Interaction Refinement Module (FIRM) captures high-frequency texture features to strengthen the learning of complex camouflage patterns.

Key Takeaways

  1. Camouflage Images Generation (CIG) is an emerging research area that synthesizes images in which objects blend harmoniously and stay visually consistent with their surroundings.
  2. Existing methods often overlook the logical relationship between camouflaged objects and the background environment, yielding unnatural results.
  3. CT-CIG, a controllable text-guided camouflage image generation method, uses large vision-language models (VLMs) and a Camouflage-Revealing Dialogue Mechanism (CRDM) to improve camouflage image quality.
  4. Fine-tuning on image-prompt pairs, with a lightweight controller guiding object location and shape, enhances camouflage scene fitness.
  5. A Frequency Interaction Refinement Module (FIRM) captures high-frequency texture features, aiding the learning of complex camouflage patterns.
  6. Experiments show CT-CIG generates semantically aligned, photorealistic camouflage images.

Cool Papers

Click here to view paper screenshots

Robust 3D Brain MRI Inpainting with Random Masking Augmentation

Authors:Juexin Zhang, Ying Weng, Ke Chen

The ASNR-MICCAI BraTS-Inpainting Challenge was established to mitigate dataset biases that limit deep learning models in the quantitative analysis of brain tumor MRI. This paper details our submission to the 2025 challenge, a novel deep learning framework for synthesizing healthy tissue in 3D scans. The core of our method is a U-Net architecture trained to inpaint synthetically corrupted regions, enhanced with a random masking augmentation strategy to improve generalization. Quantitative evaluation confirmed the efficacy of our approach, yielding an SSIM of 0.873$\pm$0.004, a PSNR of 24.996$\pm$4.694, and an MSE of 0.005$\pm$0.087 on the validation set. On the final online test set, our method achieved an SSIM of 0.919$\pm$0.088, a PSNR of 26.932$\pm$5.057, and an RMSE of 0.052$\pm$0.026. This performance secured first place in the BraTS-Inpainting 2025 challenge and surpassed the winning solutions from the 2023 and 2024 competitions on the official leaderboard.


Paper & Project Links

PDF Accepted by the International Brain Tumor Segmentation (BraTS) challenge organized at MICCAI 2025 conference

Summary
This paper describes a submission to the ASNR-MICCAI BraTS-Inpainting 2025 challenge: a novel deep learning framework for synthesizing healthy tissue in 3D brain scans. A U-Net is trained to inpaint synthetically corrupted regions, with a random masking augmentation strategy to improve generalization. Quantitative evaluation confirms the approach's efficacy, and it achieved first place in the BraTS-Inpainting 2025 challenge with strong results on both the validation and final test sets.

Key Takeaways

  1. The ASNR-MICCAI BraTS-Inpainting challenge targets dataset biases that limit deep learning models in quantitative brain tumor MRI analysis.
  2. The paper proposes a novel deep learning framework for synthesizing healthy tissue in 3D scans.
  3. Its core is a U-Net architecture trained to inpaint synthetically corrupted regions.
  4. A random masking augmentation strategy improves the model's generalization (see the sketch below).
  5. Quantitative evaluation shows strong validation-set metrics, including SSIM, PSNR, and MSE.
  6. On the final test set the method performed best, winning the BraTS-Inpainting 2025 challenge.
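The random masking augmentation of takeaway 4 can be as simple as zeroing out a random sub-volume and training the network to restore it. Below is a minimal sketch with hypothetical shapes and mask sizes, not the submission's exact augmentation.

```python
# Random 3D box masking for inpainting training (hypothetical shapes).
import torch

def random_box_mask(vol: torch.Tensor, max_frac: float = 0.3):
    """vol: (D, H, W). Returns the corrupted volume and the binary mask."""
    D, H, W = vol.shape
    d, h, w = [max(1, int(s * torch.rand(1).item() * max_frac))
               for s in (D, H, W)]
    z = torch.randint(0, D - d + 1, (1,)).item()
    y = torch.randint(0, H - h + 1, (1,)).item()
    x = torch.randint(0, W - w + 1, (1,)).item()
    mask = torch.zeros_like(vol)
    mask[z:z + d, y:y + h, x:x + w] = 1.0
    return vol * (1 - mask), mask

vol = torch.rand(64, 64, 64)            # stand-in for a brain MRI volume
corrupted, mask = random_box_mask(vol)
# A U-Net would be trained to predict `vol` from `corrupted` (and the mask).
print(corrupted.shape, int(mask.sum()) > 0)
```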

Cool Papers

Click here to view paper screenshots

Fusion of Simulation and Experiment Data for Hypersonic Flow Field Prediction via Pre-Training and Fine-Tuning

Authors:Yuan Jia, Guoqin Zhao, Hao Ma, Xin Li, Chi Zhang, Chih-Yung Wen

Accurate prediction of hypersonic flow fields over a compression ramp is critical for aerodynamic design but remains challenging due to the scarcity of experimental measurements such as velocity. This study systematically develops a data fusion framework to address this issue. In the first phase, a model trained solely on Computational Fluid Dynamics (CFD) data establishes a baseline for flow field prediction. The second phase demonstrates that enriching the training with both CFD and experimental data significantly enhances predictive accuracy: errors in pressure and density are reduced to 12.6% and 7.4%, respectively. This model also captures key flow features such as separation and reattachment shocks more distinctly. Physical analyses based on this improved model, including investigations into ramp angle effects and global stability analysis, confirm its utility for efficient design applications. In the third phase, a pre-trained model (using only CFD data) is successfully fine-tuned with experimental schlieren images, effectively reconstructing velocity fields and validating the transferability of the approach. This step-wise methodology demonstrates the effectiveness of combining simulation and experiment by pre-training and fine-tuning, offering a robust and efficient pathway for hypersonic flow modeling in real-world.


Paper & Project Links

PDF

Summary
This study develops a data fusion framework for predicting hypersonic flow fields over a compression ramp. In three phases it shows how simulation and experiment can be combined via pre-training and fine-tuning: enriching the training with both CFD and experimental data improves predictive accuracy, and a CFD-pretrained model fine-tuned on experimental schlieren images successfully reconstructs velocity fields. The improved model provides a practical tool for efficient design applications.

Key Takeaways

  1. A data fusion framework is developed for predicting hypersonic flow fields over a compression ramp.
  2. A model trained solely on Computational Fluid Dynamics (CFD) data establishes a baseline for flow field prediction.
  3. Enriching the training with both CFD and experimental data significantly improves predictive accuracy.
  4. The improved model captures key flow features, such as separation and reattachment shocks, more distinctly.
  5. Physical analyses based on the improved model, including ramp-angle effects and global stability analysis, confirm its utility for efficient design applications.
  6. A model pre-trained on CFD data alone can be fine-tuned with experimental schlieren images to successfully reconstruct velocity fields (see the sketch below).
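Takeaway 6's pre-train/fine-tune recipe follows a standard transfer-learning pattern: freeze what was learned from dense CFD data and adapt only a small head to sparse experimental data. The sketch below uses toy MLPs and fabricated targets purely to show the mechanics, not the study's actual networks.

```python
# Freeze-encoder fine-tuning on sparse experimental data (toy stand-ins).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(2, 64), nn.Tanh())  # (x, y) -> features
head = nn.Linear(64, 3)                               # -> (p, rho, |u|)
model = nn.Sequential(encoder, head)

# ... pretraining on dense CFD fields would happen here ...

for p in encoder.parameters():        # fine-tuning phase:
    p.requires_grad_(False)           # keep the CFD-learned representation
opt = torch.optim.Adam(head.parameters(), lr=1e-4)

coords = torch.rand(128, 2)           # points sampled from schlieren images
targets = torch.randn(128, 3)         # hypothetical experimental labels
loss = F.mse_loss(model(coords), targets)
loss.backward()
opt.step()
print(float(loss))
```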

Cool Papers

Click here to view paper screenshots

Vision-Language Models for Automated 3D PET/CT Report Generation

Authors:Wenpei Jiao, Kun Shang, Hui Li, Ke Yan, Jiajin Zhang, Guangjie Yang, Lijuan Guo, Yan Wan, Xing Yang, Dakai Jin, Zhaoheng Xie

Positron emission tomography/computed tomography (PET/CT) is essential in oncology, yet the rapid expansion of scanners has outpaced the availability of trained specialists, making automated PET/CT report generation (PETRG) increasingly important for reducing clinical workload. Compared with structural imaging (e.g., X-ray, CT, and MRI), functional PET poses distinct challenges: metabolic patterns vary with tracer physiology, and whole-body 3D contextual information is required rather than local-region interpretation. To advance PETRG, we propose PETRG-3D, an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts to mitigate inter-hospital variability in reporting practices. We construct PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals (824 reports w/ 245,509 paired PET/CT slices), and construct AutoPET-RG-Lym, a publicly accessible PETRG benchmark derived from open imaging data but equipped with new expert-written, clinically validated reports (135 cases). To assess clinical utility, we introduce PETRG-Score, a lymphoma-specific evaluation protocol that jointly measures metabolic and structural findings across curated anatomical regions. Experiments show that PETRG-3D substantially outperforms existing methods on both natural language metrics (e.g., +31.49% ROUGE-L) and clinical efficacy metrics (e.g., +8.18% PET-All), highlighting the benefits of volumetric dual-modality modeling and style-aware prompting. Overall, this work establishes a foundation for future PET/CT-specific models emphasizing disease-aware reasoning and clinically reliable evaluation. Codes, models, and AutoPET-RG-Lym will be released.


Paper & Project Links

PDF

Summary
This paper presents PETRG-3D, a deep learning-based system for automated PET/CT report generation. Built on the multi-center lymphoma dataset PETRG-Lym and the publicly accessible benchmark AutoPET-RG-Lym, the system uses an end-to-end 3D dual-branch framework that encodes PET and CT volumes separately, with style-adaptive prompts to mitigate inter-hospital differences in reporting practices. Experiments show PETRG-3D substantially outperforms existing methods on both natural language and clinical efficacy metrics, demonstrating its clinical utility and laying a foundation for future PET/CT-specific models emphasizing disease-aware reasoning and clinically reliable evaluation.

Key Takeaways

  1. PET/CT is essential in oncology, but the shortage of trained specialists makes automated PET/CT report generation increasingly necessary and urgent.
  2. Functional PET poses distinct challenges compared with structural imaging, such as tracer-dependent metabolic patterns and the need for whole-body 3D context rather than local-region interpretation.
  3. PETRG-3D separately encodes PET and CT volumes and uses style-adaptive prompts to handle inter-hospital variability in reporting practices.
  4. The multi-center lymphoma dataset PETRG-Lym and the public benchmark AutoPET-RG-Lym are constructed to advance PETRG.
  5. PETRG-Score, a lymphoma-specific evaluation protocol, jointly measures metabolic and structural findings across curated anatomical regions.
  6. Experiments demonstrate PETRG-3D's clear advantages on both clinical and natural-language metrics, confirming its value for automated report generation.

Cool Papers

Click here to view paper screenshots

A laboratory plasma experiment for X-ray astronomy using a compact electron beam ion trap (EBIT)

Authors:Yuki Amano, Leo Hirata, Moto Togawa, Hiromasa Suzuki, Hiroyuki A. Sakaue, Naoki Kimura, Nobuyuki Nakamura, Makoto Sawada, Masaki Oura, Jonas Danisch, Joschka Goes, Marc Botz, José R. Crespo López-urrutia, Hiroya Yamaguchi

We present the basic performance and experimental results of an electron beam ion trap (JAXA-EBIT), newly introduced to the Japanese astronomical community. Accurate atomic data are indispensable for the reliable interpretation of high-resolution X-ray spectra of astrophysical plasmas. The JAXA-EBIT generates highly charged ions under well-controlled laboratory conditions, providing experimental benchmarks for atomic data. The JAXA-EBIT shows performance comparable to the Heidelberg compact EBIT through dielectronic recombination measurements of highly charged Ar ions. Furthermore, we conducted resonant photoexcitation spectroscopy of highly charged ions using the soft X-ray beamline BL17SU at the synchrotron radiation facility SPring-8. As a result, we successfully detected resonance transitions of He-like O$^{6+}$ and Ne-like Fe$^{16+}$. These results demonstrate the capability of the JAXA-EBIT for precise measurement of atomic data and show that it serves as a powerful tool for advancing astrophysical research.


Paper & Project Links

PDF 8 pages, 4 figures, accepted for publication in Plasma and Fusion Research

Summary
An electron beam ion trap (JAXA-EBIT) has been introduced to the Japanese astronomical community to generate highly charged ions under controlled laboratory conditions, providing experimental benchmarks for atomic data. Its performance is comparable to the Heidelberg compact EBIT, and resonant photoexcitation spectroscopy successfully detected resonance transitions of He-like O$^{6+}$ and Ne-like Fe$^{16+}$. This makes it a powerful tool for the precise atomic data measurements that advance astrophysical research.

Key Takeaways

  1. JAXA-EBIT has been introduced to the Japanese astronomical community to generate highly charged ions.
  2. Ion production under well-controlled laboratory conditions provides experimental benchmarks for atomic data.
  3. JAXA-EBIT's performance is comparable to the Heidelberg compact EBIT, as shown by dielectronic recombination measurements of highly charged Ar ions.
  4. Resonant photoexcitation spectroscopy successfully detected resonance transitions of highly charged ions.
  5. Resonance transitions of He-like O$^{6+}$ and Ne-like Fe$^{16+}$ were detected.
  6. These results demonstrate JAXA-EBIT's capability for precise atomic data measurement.

Cool Papers

Click here to view paper screenshots

LungEvaty: A Scalable, Open-Source Transformer-based Deep Learning Model for Lung Cancer Risk Prediction in LDCT Screening

Authors:Johannes Brandt, Maulik Chevli, Rickmer Braren, Georgios Kaissis, Philip Müller, Daniel Rueckert

Lung cancer risk estimation is gaining increasing importance as more countries introduce population-wide screening programs using low-dose CT (LDCT). As imaging volumes grow, scalable methods that can process entire lung volumes efficiently are essential to tap into the full potential of these large screening datasets. Existing approaches either over-rely on pixel-level annotations, limiting scalability, or analyze the lung in fragments, weakening performance. We present LungEvaty, a fully transformer-based framework for predicting 1-6 year lung cancer risk from a single LDCT scan. The model operates on whole-lung inputs, learning directly from large-scale screening data to capture comprehensive anatomical and pathological cues relevant for malignancy risk. Using only imaging data and no region supervision, LungEvaty matches state-of-the-art performance, refinable by an optional Anatomically Informed Attention Guidance (AIAG) loss that encourages anatomically focused attention. In total, LungEvaty was trained on more than 90,000 CT scans, including over 28,000 for fine-tuning and 6,000 for evaluation. The framework offers a simple, data-efficient, and fully open-source solution that provides an extensible foundation for future research in longitudinal and multimodal lung cancer risk prediction.


Paper & Project Links

PDF

Summary

Lung cancer risk estimation is gaining importance as more countries adopt population-wide low-dose CT (LDCT) screening, and growing imaging volumes call for scalable methods that process entire lung volumes efficiently. Existing approaches either over-rely on pixel-level annotations, limiting scalability, or analyze the lung in fragments, weakening performance. LungEvaty is a fully transformer-based framework that predicts 1-6 year lung cancer risk from a single LDCT scan. Operating on whole-lung inputs and learning directly from large-scale screening data, it captures the anatomical and pathological cues relevant to malignancy risk. Using only imaging data and no region supervision, it matches state-of-the-art performance and can be refined with an optional Anatomically Informed Attention Guidance (AIAG) loss that encourages anatomically focused attention. Trained on more than 90,000 CT scans, including over 28,000 for fine-tuning and 6,000 for evaluation, the framework offers a simple, data-efficient, fully open-source foundation for future longitudinal and multimodal lung cancer risk prediction research.

Key Takeaways

  1. Lung cancer risk assessment is becoming increasingly important with the adoption of LDCT screening programs.
  2. As imaging volumes grow, there is a need for scalable methods to process entire lung volumes efficiently.
  3. Existing approaches to lung cancer risk prediction have limitations, such as over-reliance on pixel-level annotations or analyzing the lung in fragments.
  4. LungEvaty is a fully transformer-based framework for predicting lung cancer risk from a single LDCT scan.
  5. LungEvaty operates on whole-lung inputs and learns directly from large-scale screening data.
  6. LungEvaty achieves state-of-the-art performance without requiring region-specific supervision.

Cool Papers

Click here to view paper screenshots

Spatially Resolved Plasma Diagnostics of the Supernova Remnant DEM L71 using the Reflection Grating Spectrometer

Authors:Yuki Amano, Yuken Ohshiro, Hiromasa Suzuki, Kotaro Fukushima, Hiroya Yamaguchi

We present a spatially resolved high-resolution X-ray spectroscopy of the supernova remnant DEM L71 using the Reflection Grating Spectrometer (RGS) aboard XMM-Newton. Because of the large dispersion angle of the RGS, we are able to resolve individual emission lines and examine their spatial distributions within this moderately extended remnant. We derive line fluxes across different regions of DEM L71 and perform quantitative plasma diagnostics. Our analysis reveals that some regions have high forbidden-to-resonance ratios of O VII Heα lines, suggesting a non-negligible contribution from additional physical processes, such as charge exchange and/or resonance scattering. Our results demonstrate that the RGS has potential to serve as an outstanding X-ray imaging spectrometer for moderately diffuse objects.


Paper & Project Links

PDF 12 pages, 10 figures, 1 table, accepted for publication in PASJ

Summary

Spatially resolved high-resolution X-ray spectroscopy of the supernova remnant DEM L71 was performed with the Reflection Grating Spectrometer (RGS) aboard XMM-Newton. Line fluxes derived across different regions enable quantitative plasma diagnostics, revealing that some regions show high forbidden-to-resonance ratios of O VII Heα lines, which suggests contributions from additional physical processes such as charge exchange and/or resonance scattering. The RGS thus shows promise as an outstanding X-ray imaging spectrometer for moderately diffuse objects.

Key Takeaways

  1. High-resolution X-ray spectroscopy of DEM L71 was performed with the Reflection Grating Spectrometer (RGS).
  2. The RGS's large dispersion angle resolves individual emission lines and their spatial distributions within DEM L71.
  3. Line fluxes were derived across different regions of DEM L71 for plasma diagnostics.
  4. Some regions show high forbidden-to-resonance ratios of O VII Heα lines, suggesting contributions from additional physical processes.
  5. These additional processes may include charge exchange and/or resonance scattering.
  6. The analysis demonstrates the RGS's potential as an excellent X-ray imaging spectrometer for moderately diffuse objects.

Cool Papers

Click here to view paper screenshots

Multi Head Attention Enhanced Inception v3 for Cardiomegaly Detection

Authors:Abishek Karthik, Pandiyaraju V

The healthcare industry has been revolutionized significantly by novel imaging technologies, not just in the diagnosis of cardiovascular diseases but also by the visualization of structural abnormalities like cardiomegaly. This article explains an integrated approach to the use of deep learning tools and attention mechanisms for automatic detection of cardiomegaly using X-ray images. The initiation of the project is grounded on a strong Data Collection phase and gathering the data of annotated X-ray images of various types. Then, while the Preprocessing module fine-tunes image quality, it is feasible to utilize the best out of the data quality in the proposed system. In our proposed system, the process is a CNN configuration leveraging the inception V3 model as one of the key blocks. Besides, we also employ a multilayer attention mechanism to enhance the strength. The most important feature of the method is the multi-head attention mechanism that can learn features automatically. By exact selective focusing on only some regions of input, the model can thus identify cardiomegaly in a sensitive manner. Attention rating is calculated, duplicated, and applied to enhance representation of main data, and therefore there is a successful diagnosis. The Evaluation stage will be extremely strict and it will thoroughly evaluate the model based on such measures as accuracy and precision. This will validate that the model can identify cardiomegaly and will also show the clinical significance of this method. The model has accuracy of 95.6, precision of 95.2, recall of 96.2, sensitivity of 95.7, specificity of 96.1 and an Area Under Curve(AUC) of 96.0 and their respective graphs are plotted for visualisation.


Paper & Project Links

PDF

Summary
This paper presents an integrated approach using deep learning tools and attention mechanisms for automatic detection of cardiomegaly from X-ray images. Built on a strong data collection phase and a preprocessing module that fine-tunes image quality, the system uses a CNN configuration with Inception V3 as a key block plus a multilayer attention mechanism. Its most important feature is multi-head attention, which learns features automatically and focuses selectively on relevant input regions for sensitive identification of cardiomegaly. The model reaches an accuracy of 95.6, precision of 95.2, recall of 96.2, sensitivity of 95.7, specificity of 96.1, and an AUC of 96.0, with the corresponding curves plotted for visualization.

Key Takeaways

  1. Novel imaging technologies have transformed healthcare, aiding not only cardiovascular diagnosis but also visualization of structural abnormalities such as cardiomegaly.
  2. The article presents an integrated approach combining deep learning tools and attention mechanisms for automatic cardiomegaly detection from X-ray images.
  3. The project starts from a strong data collection phase gathering annotated X-ray images of various types.
  4. A preprocessing module fine-tunes image quality so the system makes the best use of the available data.
  5. The CNN configuration uses the Inception V3 model as a key block, with a multilayer attention mechanism to boost performance.
  6. The multi-head attention mechanism, the method's most important feature, learns features automatically and selectively focuses on input regions to identify cardiomegaly (see the sketch below).
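Takeaways 5 and 6 correspond to a common pattern: flatten CNN feature maps into spatial tokens and run multi-head self-attention over them before classification. The sketch below substitutes a small CNN for the Inception V3 backbone to stay self-contained; all sizes are hypothetical, not the paper's configuration.

```python
# CNN features + multi-head self-attention for image-level classification.
# A small CNN stands in for the Inception v3 backbone; sizes are hypothetical.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
classifier = nn.Linear(64, 1)

x = torch.randn(2, 1, 64, 64)                     # downsampled X-ray batch
feats = backbone(x)                               # (B, 64, 16, 16)
tokens = feats.flatten(2).transpose(1, 2)         # (B, 256, 64) spatial tokens
ctx, attn_weights = attn(tokens, tokens, tokens)  # attend over image regions
logits = classifier(ctx.mean(dim=1))              # pooled -> cardiomegaly score
print(torch.sigmoid(logits).shape)                # torch.Size([2, 1])
```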

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!