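⚠️ All summaries below are generated by large language models and may contain errors. They are for reference only; use them with caution.
🔴 Note: never use them in serious academic settings. They are only meant for initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace
Updated 2025-09-14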
On the Encapsulation of Medical Imaging AI Algorithms
Authors:Hans Meine, Yongli Mou, Guido Prause, Horst Hahn
In the context of collaborative AI research and development projects, it would be ideal to have self-contained, encapsulated algorithms that can be easily shared between different parties, executed and validated on data at different sites, or trained in a federated manner. In practice, all of this is possible but greatly complicated, because human supervision and expert knowledge are needed to set up the execution of algorithms based on their documentation, possibly implicit assumptions, and knowledge about the execution environment and data involved. We derive and formulate a range of detailed requirements from the above goal and from specific use cases, focusing on medical imaging AI algorithms. Furthermore, we refer to a number of existing APIs and implementations and review which aspects each of them addresses, which problems are still open, and which public standards and ontologies may be relevant. Our contribution is a comprehensive collection of aspects that have not yet been addressed in their entirety by any single solution. Working towards the formulated goals should lead to more sustainable algorithm ecosystems and relates to the FAIR principles for research data, with this paper focusing on the interoperability and (re)usability of medical imaging AI algorithms.
Paper and project links
PDF v2: mention FAIR4ML, MLentory, some minor elaboration and spelling fixes
Summary
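In collaborative AI research and development projects, the ideal is to have self-contained, encapsulated algorithms. These could be shared easily between parties, executed and validated on data at different sites, or trained in a federated manner. In practice this is very complicated. Human supervision and expert knowledge are needed to set up algorithm execution based on documentation, implicit assumptions, the execution environment, and the data involved. Starting from this goal and from specific use cases, the paper derives detailed requirements, focusing on medical imaging AI algorithms. It also reviews which aspects existing APIs and implementations address. It points out open problems and identifies public standards and ontologies that may be relevant. Its contribution is a comprehensive collection of aspects that no single existing solution yet addresses in full. Working toward these goals should make algorithm ecosystems more sustainable. The work also relates to the FAIR principles for research data, with a focus on the interoperability and (re)usability of medical imaging AI algorithms.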
Key Takeaways
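- The ideal is self-contained, encapsulated algorithms that are easy to share, execute, and validate in collaborative AI projects.
- Today, setting up algorithm execution correctly still requires human supervision and expert knowledge.
- Starting from this goal and from specific use cases, the paper derives detailed requirements for medical imaging AI algorithms.
- It reviews which aspects existing APIs and implementations address.
- It identifies open problems and potentially relevant public standards and ontologies.
- It provides a comprehensive collection of aspects for medical imaging AI algorithms that no single solution yet addresses in full.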
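The paper's central point is that running an algorithm currently depends on documentation and implicit assumptions. The Python sketch below shows how such assumptions could instead be made machine-checkable, so that a runtime rejects incompatible data before execution. The classes `InputSpec` and `AlgorithmDescriptor` and all their fields are hypothetical. They are not taken from any API, standard, or ontology reviewed in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InputSpec:
    """Hypothetical description of one algorithm input."""
    name: str
    modality: str  # e.g. "CT" or "MR"
    ndim: int      # expected number of spatial dimensions


@dataclass
class AlgorithmDescriptor:
    """Hypothetical machine-readable descriptor shipped with an encapsulated algorithm."""
    name: str
    version: str
    inputs: List[InputSpec] = field(default_factory=list)

    def check_inputs(self, provided: Dict[str, dict]) -> List[str]:
        """Return human-readable problems; an empty list means the data looks compatible."""
        problems = []
        for spec in self.inputs:
            item = provided.get(spec.name)
            if item is None:
                problems.append(f"missing input '{spec.name}'")
                continue
            if item.get("modality") != spec.modality:
                problems.append(
                    f"'{spec.name}': expected modality {spec.modality}, got {item.get('modality')}"
                )
            if item.get("ndim") != spec.ndim:
                problems.append(f"'{spec.name}': expected {spec.ndim}D data, got {item.get('ndim')}D")
        return problems


# Usage: a hypothetical 3D CT segmentation algorithm checked against candidate inputs.
descriptor = AlgorithmDescriptor("liver-seg", "1.0", [InputSpec("image", "CT", 3)])
print(descriptor.check_inputs({"image": {"modality": "CT", "ndim": 3}}))  # []
print(descriptor.check_inputs({"image": {"modality": "MR", "ndim": 3}}))
```

A real descriptor would also need to cover outputs, units, orientation conventions, and environment requirements, which is the kind of requirement list the paper compiles.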


Bringing Attention to CAD: Boundary Representation Learning via Transformer
Authors:Qiang Zou, Lizhen Zhu
The recent rise of generative artificial intelligence (AI), powered by Transformer networks, has achieved remarkable success in natural language processing, computer vision, and graphics. However, the application of Transformers in computer-aided design (CAD), particularly for processing boundary representation (B-rep) models, remains largely unexplored. To bridge this gap, we propose a novel approach for adapting Transformers to B-rep learning, called the Boundary Representation Transformer (BRT). B-rep models pose unique challenges due to their irregular topology and continuous geometric definitions, which are fundamentally different from the structured and discrete data Transformers are designed for. To address this, BRT proposes a continuous geometric embedding method that encodes B-rep surfaces (trimmed and untrimmed) into Bezier triangles, preserving their shape and continuity without discretization. Additionally, BRT employs a topology-aware embedding method that organizes these geometric embeddings into a sequence of discrete tokens suitable for Transformers, capturing both geometric and topological characteristics within B-rep models. This enables the Transformer’s attention mechanism to effectively learn shape patterns and contextual semantics of boundary elements in a B-rep model. Extensive experiments demonstrate that BRT achieves state-of-the-art performance in part classification and feature recognition tasks.
Paper and project links
Summary
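Generative AI built on Transformer networks has been highly successful in natural language processing, computer vision, and graphics. Its use in computer-aided design (CAD), especially for processing boundary representation (B-rep) models, remains largely unexplored. To fill this gap, the paper proposes the Boundary Representation Transformer (BRT), a new approach for adapting Transformers to B-rep learning. BRT uses a continuous geometric embedding that encodes B-rep surfaces (trimmed and untrimmed) as Bezier triangles, preserving their shape and continuity without discretization. It also uses a topology-aware embedding that organizes these geometric embeddings into a sequence of discrete tokens suitable for Transformers. This captures both the geometric and topological characteristics of B-rep models. As a result, the Transformer's attention mechanism can effectively learn shape patterns and contextual semantics of the boundary elements in a B-rep model. Experiments show that BRT achieves state-of-the-art performance in part classification and feature recognition.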
Key Takeaways
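- Generative AI has been highly successful in many fields, but processing B-rep models in CAD remains largely unexplored.
- The Boundary Representation Transformer (BRT) is a new approach for adapting Transformers to B-rep learning.
- BRT encodes B-rep surfaces with a continuous geometric embedding (Bezier triangles), preserving shape and continuity without discretization.
- BRT uses a topology-aware embedding to organize the geometric embeddings into a sequence of discrete tokens suitable for Transformers.
- BRT captures both the geometric and topological characteristics of B-rep models.
- The Transformer's attention mechanism can effectively learn shape patterns and contextual semantics of boundary elements in a B-rep model.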
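The abstract does not say how BRT fits Bezier triangles to trimmed surfaces or how it turns them into embeddings. The Python sketch below therefore shows only the standard building block underneath: evaluating a degree-`n` triangular Bezier patch from its control net. It uses Bernstein polynomials in barycentric coordinates, B(u, v) = sum over i+j+k=n of n!/(i! j! k!) · u^i v^j w^k · P_ijk, with w = 1 - u - v. It then samples the patch on a regular barycentric grid. The dictionary layout of the control net and the sampling resolution are illustrative assumptions.

```python
from math import factorial

import numpy as np


def bezier_triangle_point(control, n, u, v):
    """Evaluate a degree-n triangular Bezier patch at barycentric coordinates (u, v, 1-u-v).

    `control` maps index triples (i, j, k) with i + j + k == n to 3D control points.
    """
    w = 1.0 - u - v
    point = np.zeros(3)
    for (i, j, k), p in control.items():
        bernstein = factorial(n) / (factorial(i) * factorial(j) * factorial(k)) * u**i * v**j * w**k
        point += bernstein * np.asarray(p, dtype=float)
    return point


def sample_bezier_triangle(control, n, res):
    """Sample the patch on a regular barycentric grid: (res + 1)(res + 2) / 2 points."""
    samples = [
        bezier_triangle_point(control, n, a / res, b / res)
        for a in range(res + 1)
        for b in range(res + 1 - a)
    ]
    return np.stack(samples)


# Usage: a quadratic patch over a unit right triangle with one raised edge control point.
A, B, C = np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 0.0])
quad = {
    (2, 0, 0): A, (0, 2, 0): B, (0, 0, 2): C,
    (1, 1, 0): (A + B) / 2 + np.array([0, 0, 0.5]),  # lifts the A-B edge out of the plane
    (1, 0, 1): (A + C) / 2, (0, 1, 1): (B + C) / 2,
}
points = sample_bezier_triangle(quad, 2, res=4)
print(points.shape)  # (15, 3)
```

The patch interpolates its three corner control points. Everything in between depends smoothly on the control net, which is why a fixed-size control net can represent a curved face without meshing it.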





Core-Excited States of Linear and Bent Uranyl Complexes: Insights from High-Energy Resolution X-ray Spectroscopy and Relativistic Quantum Chemistry
Authors:Wilken Aldair Misael, Lucia Amidani, Juliane März, Elena F. Bazarkina, Kristina O. Kvashnina, Valérie Vallet, André Severo Pereira Gomes
Advanced X-ray spectroscopic techniques are widely recognized as state-of-the-art tools for probing the electronic structure, bonding, and chemical environments of the heaviest elements in the periodic table. In this study, we employ X-ray absorption near-edge structure measurements in high-energy resolution fluorescence detection (HERFD-XANES) mode to investigate the core states arising from excitations out of the U 3d$_{3/2}$ (M$_4$ edge) levels. We study molecular complexes in which the uranyl moiety deviates from linearity to varying degrees, in particular systems containing the UO$_2$Cl$_2$ group such as UO$_2$Cl$_2$.n(H$_2$O) and UO$_2$Cl$_2$(phen)$_2$, the latter of which exhibits a pronounced O-U-O bending angle. These U M$_4$ edge HERFD-XANES spectra are compared to those of other uranyl complexes reported in the literature. This evaluation is complemented by \textit{ab initio} relativistic quantum chemistry simulations on the [UO$_2$(NO$_3$)$_2$.n(H$_2$O)], UO$_2$Cl$_2$.n(H$_2$O) and UO$_2$Cl$_2$(phen)$_2$ systems, using 2-component Time-Dependent Density Functional Theory (TD-DFT) with the CAM-B3LYP functional, employing the Tamm-Dancoff approximation (2c-TDA). Our 2c-TDA simulations show modest deviations from the HERFD-XANES data, with peak splittings differing by less than 1 eV from experimental values. These core-excited states were further characterized by Natural Transition Orbital (NTO) analysis. Overall, our results highlight the influence of equatorial ligands on the spectroscopic signatures. The effect is particularly pronounced in UO$_2$Cl$_2$(phen)$_2$, where the U 3d$_{3/2}$ $\rightarrow$ 5f$\sigma_u^*$ satellite transition appears at lower energies than in the other systems studied.
Paper and project links
PDF 55 pages, 10 figures, 5 tables
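Summary
This study uses X-ray absorption near-edge structure measurements in high-energy-resolution fluorescence detection mode (HERFD-XANES) to investigate the core states of molecular uranyl complexes whose uranyl group deviates from linearity to varying degrees. It focuses in particular on systems containing the UO2Cl2 group, such as UO2Cl2·n(H2O) and UO2Cl2(phen)2. The U M4-edge HERFD-XANES spectra of these systems are evaluated by comparison with those of other uranyl complexes reported in the literature. The [UO2(NO3)2·n(H2O)], UO2Cl2·n(H2O) and UO2Cl2(phen)2 systems are also studied with ab initio relativistic quantum chemistry simulations based on two-component TD-DFT. The simulations deviate only modestly from the experimental data, with peak splittings within 1 eV of the experimental values. The core-excited states are further characterized by natural transition orbital (NTO) analysis. The results show a marked influence of the equatorial ligands on the spectral signatures. This is especially clear in UO2Cl2(phen)2, where the U 3d_{3/2} → 5fσu* satellite transition appears at lower energy than in the other systems.
Key Takeaways
- The study uses HERFD-XANES to probe the core states of molecular uranyl complexes whose uranyl group deviates from linearity to varying degrees.
- The systems studied include those containing the UO2Cl2 group, such as UO2Cl2·n(H2O) and UO2Cl2(phen)2.
- The U M4-edge HERFD-XANES spectra of these systems are evaluated against those of other uranyl complexes in the literature.
- The measurements are complemented by ab initio relativistic quantum chemistry simulations using two-component TD-DFT with the CAM-B3LYP functional.
- The simulations deviate only modestly from experiment, with peak splittings within 1 eV of the measured values.
- Equatorial ligands markedly influence the spectral signatures, especially in UO2Cl2(phen)2.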

In-Context Reverse Classification Accuracy: Efficient Estimation of Segmentation Quality without Ground-Truth
Authors:Matias Cosarinsky, Ramiro Billot, Lucas Mansilla, Gabriel Jimenez, Nicolas Gaggión, Guanghui Fu, Enzo Ferrante
Assessing the quality of automatic image segmentation is crucial in clinical practice, but often very challenging due to the limited availability of ground truth annotations. In this paper, we introduce In-Context Reverse Classification Accuracy (In-Context RCA), a novel framework for automatically estimating segmentation quality in the absence of ground-truth annotations. By leveraging recent in-context learning segmentation models and incorporating retrieval-augmentation techniques to select the most relevant reference images, our approach enables efficient quality estimation with minimal reference data. Validated across diverse medical imaging modalities, our method demonstrates robust performance and computational efficiency, offering a promising solution for automated quality control in clinical workflows, where fast and reliable segmentation assessment is essential. The code is available at https://github.com/mcosarinsky/In-Context-RCA.
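Paper and project links
Summary
Assessing the quality of automatic medical image segmentation is crucial in clinical practice, but the scarcity of ground-truth annotations makes it challenging. This paper proposes In-Context Reverse Classification Accuracy (In-Context RCA), a new framework for automatically estimating segmentation quality without ground-truth annotations. It combines recent in-context learning segmentation models with retrieval augmentation, which selects the most relevant reference images. This enables efficient quality estimation with little reference data. Validation across diverse medical imaging modalities shows robust performance and computational efficiency. This makes the method a promising solution for automated quality control in clinical workflows, especially where fast and reliable segmentation assessment is essential. The code is available at https://github.com/mcosarinsky/In-Context-RCA.
Key Takeaways
- Automatic assessment of segmentation quality is important in medical image analysis but is hampered by the scarcity of ground-truth annotations.
- A new framework, In-Context RCA, estimates segmentation quality without ground-truth annotations.
- It combines recent in-context learning segmentation models with retrieval augmentation to enable efficient quality estimation with minimal reference data.
- The method performs robustly across several medical imaging modalities and is computationally efficient.
- It offers a promising solution for automated quality control in clinical workflows.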
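The abstract does not spell out the estimation steps. The sketch below follows the general reverse classification accuracy idea. The prediction for a new image is treated as pseudo ground truth and used as the only in-context example to segment annotated reference images. The best Dice against their ground truth then serves as a proxy for the prediction's quality. `segment_in_context` is a hypothetical stand-in for an in-context segmentation model, and `toy_segmenter` is a deliberately simple toy. The cosine-similarity `retrieve` step assumes some image embedding that the abstract does not specify.

```python
import numpy as np


def dice(a, b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def retrieve(query_vec, ref_vecs, k):
    """Indices of the k reference embeddings most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    r = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    return np.argsort(-(r @ q))[:k]


def rca_estimate(test_image, predicted_mask, references, segment_in_context):
    """Estimate the Dice of `predicted_mask` without ground truth for `test_image`.

    `references` is a list of (image, ground_truth_mask) pairs, e.g. chosen by `retrieve`.
    `segment_in_context(context_images, context_masks, target_image)` is hypothetical.
    """
    scores = [
        dice(segment_in_context([test_image], [predicted_mask], ref_img), ref_gt)
        for ref_img, ref_gt in references
    ]
    return max(scores)


def toy_segmenter(context_images, context_masks, target):
    """Toy in-context model: label target pixels closer to the context foreground intensity."""
    img, mask = context_images[0], context_masks[0].astype(bool)
    mu_in, mu_out = img[mask].mean(), img[~mask].mean()
    return np.abs(target - mu_in) < np.abs(target - mu_out)


image = np.zeros((8, 8)); image[2:5, 2:5] = 1.0
truth = image > 0.5
refs = [(image.copy(), truth.copy())]
print(rca_estimate(image, truth, refs, toy_segmenter))   # 1.0 (good prediction)
print(rca_estimate(image, ~truth, refs, toy_segmenter))  # 0.0 (inverted prediction)
```

Taking the maximum over references reflects the assumption that a good prediction should transfer well to at least one similar annotated case. This is why retrieving relevant references matters.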



Optimizing normal tissue sparing via spatiotemporal optimization under equivalent tumor-radical efficacy
Authors:Nimita Shinde, Wangyao Li, Ronald C Chen, Hao Gao
Objective: Spatiotemporal optimization in radiation therapy involves determining the optimal number of dose delivery fractions (temporal) and the optimal dose per fraction (spatial). Traditional approaches focus on maximizing the biologically effective dose (BED) to the target while constraining BED to organs-at-risk (OAR), which may lead to insufficient BED for complete tumor cell kill. This work proposes a formulation that ensures adequate BED delivery to the target while minimizing BED to the OAR. Approach: A spatiotemporal optimization model is developed that incorporates an inequality constraint to guarantee sufficient BED for tumor cell kill while minimizing BED to the OAR. The model accounts for tumor proliferation dynamics, including lag time (delay before proliferation begins) and doubling time (time for tumor volume to double), to optimize dose fractionation. Results: The performance of our formulation is evaluated for varying lag and doubling times. The results show that mean BED to the target consistently meets the minimum requirement for tumor cell kill. Additionally, the mean BED to OAR varies based on tumor proliferation dynamics. In the prostate case with lag time of 7 days and doubling time of 2 days, it is observed that mean BED delivered to femoral head is lowest at around 20 fractions, making this an optimal choice. While in the head-and-neck case, mean BED to OAR decreases as the number of fractions increases, suggesting that a higher number of fractions is optimal. Significance: A spatiotemporal optimization model is presented that minimizes BED to the OAR while ensuring sufficient BED for tumor cell kill. By incorporating tumor lag and doubling time, the approach identifies optimal number of fractions. This model can be extended to support hyperfractionation or accelerated fractionation strategies, offering a versatile tool for clinical treatment planning.
Paper and project links
摘要
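This paper studies spatiotemporal optimization in radiation therapy. The goal is to determine the optimal number of dose-delivery fractions (temporal) and the optimal dose per fraction (spatial). Traditional approaches maximize the biologically effective dose (BED) to the target while constraining the BED to organs at risk (OAR). This can leave the target with too little BED to kill all tumor cells. The paper instead proposes a spatiotemporal optimization model that guarantees sufficient BED to the target while minimizing the BED to the OAR. The model accounts for tumor proliferation dynamics, including the lag time (delay before proliferation begins) and the doubling time (time for the tumor volume to double), to optimize dose fractionation. The model was evaluated over a range of lag and doubling times. The mean BED to the target always met the minimum requirement for tumor cell kill, while the mean BED to the OAR varied with the proliferation dynamics. In the prostate case with a lag time of 7 days and a doubling time of 2 days, the mean BED to the femoral head was lowest at around 20 fractions, making this the optimal choice. In the head-and-neck case, the mean BED to the OAR decreased as the number of fractions increased, suggesting that more fractions are preferable. By incorporating tumor lag and doubling times, the model identifies the optimal number of fractions. It does so while minimizing the BED to the OAR and ensuring sufficient BED for tumor cell kill. The model can be extended to hyperfractionation or accelerated fractionation strategies, making it a flexible tool for clinical treatment planning.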
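Key Takeaways
- Spatiotemporal optimization in radiation therapy determines the optimal number of fractions and the optimal dose per fraction.
- Traditional approaches may not deliver enough BED to the target for complete tumor cell kill.
- The proposed model guarantees sufficient BED to the target while minimizing BED to the organs at risk.
- The model accounts for tumor proliferation dynamics, including lag time and doubling time.
- The optimal fractionation differs between cases: around 20 fractions in the prostate case studied, and more fractions in the head-and-neck case.
- The model can be extended to other fractionation strategies, giving clinical treatment planning more flexibility.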
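The abstract does not give the BED model. A common choice, assumed here, is the linear-quadratic BED with a repopulation term that starts after the lag time: BED = n·d·(1 + d/(α/β)) − ln2·max(0, T − T_lag)/(α·T_d). Here n is the number of fractions of d Gy, T the overall treatment time, and T_d the doubling time. The sketch enforces the tumor constraint with equality, i.e. the smallest dose per fraction that reaches the required BED. It then scans the number of fractions for the lowest OAR BED. The α and α/β values, the one-fraction-per-day schedule, and the single `sparing` factor that scales the OAR dose are all illustrative assumptions. The paper optimizes full spatial dose distributions instead.

```python
import math


def bed(n, d, alpha_beta, alpha=None, lag_days=None, doubling_days=None, interval_days=1.0):
    """LQ biologically effective dose (Gy) of n fractions of d Gy.

    If alpha, lag_days and doubling_days are all given, subtract a repopulation
    term that only starts once the lag time has elapsed.
    """
    value = n * d * (1.0 + d / alpha_beta)
    if alpha is not None and lag_days is not None and doubling_days is not None:
        treatment_days = (n - 1) * interval_days  # assumed: one fraction every interval_days
        value -= math.log(2.0) * max(0.0, treatment_days - lag_days) / (alpha * doubling_days)
    return value


def min_dose_per_fraction(n, required_bed, alpha_beta, alpha, lag_days, doubling_days, interval_days=1.0):
    """Smallest d such that the tumor BED (with repopulation) equals required_bed."""
    treatment_days = (n - 1) * interval_days
    repopulation = math.log(2.0) * max(0.0, treatment_days - lag_days) / (alpha * doubling_days)
    total = required_bed + repopulation  # LQ term n*d*(1 + d/ab) must reach this value
    # Positive root of d^2 + ab*d - ab*total/n = 0
    return 0.5 * alpha_beta * (-1.0 + math.sqrt(1.0 + 4.0 * total / (n * alpha_beta)))


def best_fraction_count(n_values, required_bed, tumor_ab, oar_ab, sparing, alpha, lag_days, doubling_days):
    """Return (n, oar_bed) with the lowest OAR BED while the tumor constraint holds."""
    best = None
    for n in n_values:
        d = min_dose_per_fraction(n, required_bed, tumor_ab, alpha, lag_days, doubling_days)
        oar_bed = bed(n, sparing * d, oar_ab)  # assumed: OAR receives sparing * d per fraction
        if best is None or oar_bed < best[1]:
            best = (n, oar_bed)
    return best


n_best, oar_best = best_fraction_count(
    range(5, 41), required_bed=70.0, tumor_ab=10.0, oar_ab=3.0,
    sparing=0.5, alpha=0.3, lag_days=7.0, doubling_days=2.0,
)
print(n_best, round(oar_best, 1))
```

Meeting the tumor constraint with equality is enough in this toy setting. With everything else fixed, OAR BED only grows with d, so any dose beyond the minimum would only hurt the OAR. The toy numbers are not expected to reproduce the paper's clinical results.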

Frequency Domain Enhanced U-Net for Low-Frequency Information-Rich Image Segmentation in Surgical and Deep-Sea Exploration Robots
Authors:Guohao Huo, Ruiting Dai, Jinliang Liu, Ling Shao, Hao Tang
In deep-sea exploration and surgical robotics scenarios, environmental lighting and device resolution limitations often cause high-frequency feature attenuation. Addressing the differences in frequency band sensitivity between CNNs and the human visual system (mid-frequency sensitivity with low-frequency sensitivity surpassing high-frequency), we experimentally quantified the CNN contrast sensitivity function and proposed a wavelet adaptive spectrum fusion (WASF) method inspired by biological vision mechanisms to balance cross-frequency image features. Furthermore, we designed a perception frequency block (PFB) that integrates WASF to enhance frequency-domain feature extraction. Based on this, we developed the FE-UNet model, which employs a SAM2 backbone network and incorporates fine-tuned Hiera-Large modules to ensure segmentation accuracy while improving generalization capability. Experiments demonstrate that FE-UNet achieves state-of-the-art performance in cross-domain tasks such as marine organism segmentation and polyp segmentation, showcasing robust adaptability and significant application potential. The code will be released soon.
Paper and project links
Summary
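In deep-sea exploration and surgical robotics, environmental lighting and device resolution limits often attenuate high-frequency image features. CNNs and the human visual system differ in how sensitive they are to different frequency bands. Motivated by this, the authors experimentally quantify the CNN contrast sensitivity function. They then propose a wavelet adaptive spectrum fusion (WASF) method, inspired by biological vision, to balance cross-frequency image features. A perception frequency block (PFB) integrates WASF to enhance frequency-domain feature extraction. The resulting FE-UNet model uses a SAM2 backbone with fine-tuned Hiera-Large modules to maintain segmentation accuracy while improving generalization. It achieves state-of-the-art performance on cross-domain tasks such as marine organism segmentation and polyp segmentation, showing strong adaptability and application potential.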
Key Takeaways
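- In deep-sea exploration and surgical robotics, environmental lighting and device resolution limits cause high-frequency feature attenuation.
- The authors experimentally quantify the CNN contrast sensitivity function. They do so to address differences in frequency-band sensitivity between CNNs and the human visual system, which is most sensitive to mid frequencies and more sensitive to low than to high frequencies.
- A wavelet adaptive spectrum fusion (WASF) method inspired by biological vision balances cross-frequency image features.
- A perception frequency block (PFB) that integrates WASF enhances frequency-domain feature extraction.
- The FE-UNet model combines a SAM2 backbone with fine-tuned Hiera-Large modules to ensure segmentation accuracy and improve generalization.
- FE-UNet achieves state-of-the-art performance on cross-domain tasks such as marine organism segmentation and polyp segmentation.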
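The abstract does not explain how WASF derives its fusion weights, so the following is not WASF. It is a minimal numpy sketch of the wavelet-domain mechanics such a method builds on. An image is split into one low-frequency and three high-frequency Haar sub-bands, each band is rescaled with a gain, and the image is reconstructed. The gains here are fixed by hand (an assumption). An adaptive method would compute them from the data.

```python
import numpy as np


def haar2d(x):
    """Single-level orthonormal 2D Haar transform; height and width must be even."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # differences between neighboring columns
    hl = (a + b - c - d) / 2.0  # differences between neighboring rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def reweight_bands(x, gains):
    """Rescale each sub-band by a gain (keys 'll', 'lh', 'hl', 'hh'; default 1.0) and reconstruct."""
    bands = dict(zip(("ll", "lh", "hl", "hh"), haar2d(x)))
    return ihaar2d(*(gains.get(name, 1.0) * band for name, band in bands.items()))


rng = np.random.default_rng(0)
image = rng.random((8, 8))
boosted = reweight_bands(image, {"lh": 1.5, "hl": 1.5, "hh": 1.5})  # amplify detail bands
```

Because the transform is orthonormal and exactly invertible, unit gains leave the image untouched. Any change comes only from the chosen band weights.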





Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models
Authors:Zeineb Haouari, Jonas Weidner, Yeray Martin-Ruisanchez, Ivan Ezhov, Aswathi Varma, Daniel Rueckert, Bjoern Menze, Benedikt Wiestler
Glioblastoma, a highly aggressive brain tumor, poses major challenges due to its poor prognosis and high morbidity rates. Partial differential equation-based models offer promising potential to enhance therapeutic outcomes by simulating patient-specific tumor behavior for improved radiotherapy planning. However, model calibration remains a bottleneck due to the high computational demands of optimization methods like Monte Carlo sampling and evolutionary algorithms. To address this, we recently introduced an approach leveraging a neural forward solver with gradient-based optimization to significantly reduce calibration time. This approach requires a highly accurate and fully differentiable forward model. We investigate multiple architectures, including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a 3D Vision Transformer (ViT). The nnU-Net achieved the best overall results, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration. It yielded the lowest MSE in tumor cell concentration compared to ground truth numerical simulation and the highest Dice score across all tumor cell concentration thresholds. Our study demonstrates significant enhancement in forward solver performance and outlines important future research directions.
Paper and project links
Summary
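Glioblastoma, a highly aggressive brain tumor with poor prognosis and high morbidity, poses major challenges. Simulations based on partial differential equation models show potential for improving radiotherapy planning. However, model calibration is a bottleneck because of its high computational cost. The authors' recent approach uses a neural forward solver with gradient-based optimization to greatly reduce calibration time. Several architectures were investigated. nnU-Net performed best, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration.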
Key Takeaways
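- Glioblastoma is a highly aggressive brain tumor with poor prognosis and high morbidity, which makes treatment very challenging.
- Simulations based on partial differential equation models show potential for improving radiotherapy planning.
- Model calibration is a bottleneck because it requires substantial computational resources.
- A neural forward solver combined with gradient-based optimization greatly reduces calibration time.
- The architectures investigated were an enhanced TumorSurrogate, a modified nnU-Net, and a 3D Vision Transformer (ViT).
- nnU-Net performed best in both tumor outline matching and voxel-level prediction of tumor cell concentration.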
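PDE-based glioma growth models are commonly of reaction-diffusion (Fisher-Kolmogorov) type. The abstract does not name the exact model, so treat that as an assumption. The sketch below shows a minimal 1D explicit finite-difference solver of the kind a neural forward solver would be trained to imitate. It also shows the two comparisons the abstract reports: MSE of tumor cell concentration, and Dice at several concentration thresholds. The "surrogate" here is just the same solver with a slightly wrong diffusivity, a hypothetical stand-in for a learned model. The parameter values are illustrative.

```python
import numpy as np


def fisher_kpp_1d(u0, diffusion, rho, dx, dt, steps):
    """Explicit finite-difference solver for du/dt = D u_xx + rho u (1 - u) in 1D.

    Zero-flux (Neumann) boundaries via edge padding; u is a normalized tumor cell density.
    """
    if diffusion * dt / dx**2 > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx^2 <= 0.5")
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        padded = np.pad(u, 1, mode="edge")
        laplacian = (padded[2:] - 2.0 * u + padded[:-2]) / dx**2
        u = u + dt * (diffusion * laplacian + rho * u * (1.0 - u))
    return u


def dice_at_thresholds(pred, ref, thresholds):
    """Dice between pred >= t and ref >= t for each concentration threshold t."""
    scores = {}
    for t in thresholds:
        p, r = pred >= t, ref >= t
        denom = p.sum() + r.sum()
        scores[t] = 1.0 if denom == 0 else 2.0 * np.logical_and(p, r).sum() / denom
    return scores


seed = np.zeros(101); seed[48:53] = 0.8  # small initial tumor in a 1D domain
reference = fisher_kpp_1d(seed, diffusion=0.1, rho=0.5, dx=1.0, dt=0.1, steps=300)
# Stand-in for a learned surrogate: the same solver with a slightly wrong diffusivity.
surrogate = fisher_kpp_1d(seed, diffusion=0.12, rho=0.5, dx=1.0, dt=0.1, steps=300)
print("MSE:", float(np.mean((surrogate - reference) ** 2)))
print(dice_at_thresholds(surrogate, reference, (0.25, 0.5, 0.75)))
```

The explicit solver must take many small time steps to stay stable, and it is not differentiable without extra machinery. This motivates replacing it with a fast, fully differentiable neural surrogate during calibration.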





CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Authors:Dimitrios Mallis, Ahmet Serdar Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CAD-specific tools. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via its Python API. Our framework is able to assess the impact of generated CAD commands on geometry and adapts subsequent actions based on the evolving state of the CAD design. We consider a wide range of CAD-specific tools including a sketch image parameterizer, rendering modules, a 2D cross-section generator, and other specialized routines. CAD-Assistant is evaluated on multiple CAD benchmarks, where it outperforms VLLM baselines and supervised task-specific methods. Beyond existing benchmarks, we qualitatively demonstrate the potential of tool-augmented VLLMs as general-purpose CAD solvers across diverse workflows.
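Paper and project links
Summary
This paper presents CAD-Assistant, a general-purpose CAD agent that uses a powerful vision and large language model (VLLM) as its planner. It follows a tool-augmentation paradigm with CAD-specific tools and handles multimodal user queries. To do so, it generates actions that are executed iteratively in a Python interpreter equipped with the FreeCAD software, accessed through its Python API. The framework assesses the effect of the generated CAD commands on the geometry and adapts subsequent actions to the evolving state of the design. It uses a wide range of CAD-specific tools, including a sketch image parameterizer, rendering modules, a 2D cross-section generator, and other specialized routines. On several CAD benchmarks, CAD-Assistant outperforms VLLM baselines and supervised task-specific methods. Qualitative demonstrations further show the potential of tool-augmented VLLMs as general-purpose CAD solvers across diverse workflows.
Key Takeaways
- CAD-Assistant is a general-purpose CAD agent for AI-assisted design, built on a powerful vision and large language model (VLLM).
- It follows a tool-augmentation paradigm, using CAD-specific tools to handle multimodal user queries.
- It integrates with FreeCAD through its Python API and generates actions that are executed iteratively.
- It assesses the effect of commands on the geometry and adapts subsequent actions as the design state evolves.
- The framework supports a range of CAD-specific tools, including a sketch image parameterizer and rendering modules.
- On several CAD benchmarks, CAD-Assistant outperforms VLLM baselines and supervised task-specific methods.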
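Below is a minimal sketch of the plan-execute-observe loop described above. The planner is a scripted stand-in for the VLLM, and a plain dictionary stands in for the FreeCAD document, so nothing here uses the FreeCAD API. All names are hypothetical. The point is the control flow. Each generated action runs in a persistent interpreter namespace, and its result or error is fed back so the next action can adapt.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (action source code, observation)


def run_agent(planner: Callable[[str, History], Optional[str]], query: str, max_steps: int = 10) -> dict:
    """Plan-execute-observe loop: run each generated action and feed back the result."""
    namespace = {"model": {}}  # persistent interpreter state; stands in for a CAD document
    history: History = []
    for _ in range(max_steps):
        action = planner(query, history)
        if action is None:  # planner decides the task is done
            break
        try:
            # Executing model-generated code should happen in a sandbox in practice.
            exec(action, namespace)
            observation = f"ok: model={namespace['model']}"
        except Exception as exc:
            observation = f"error: {exc!r}"
        history.append((action, observation))
    return namespace["model"]


def scripted_planner(query: str, history: History) -> Optional[str]:
    """Hypothetical stand-in for the VLLM planner; it reacts to the last observation."""
    if not history:
        return "model['width'] = 10"
    last_action, last_observation = history[-1]
    if last_observation.startswith("error"):
        return "model['height'] = 5"  # repair after seeing the error
    if "height" not in last_action:
        return "model['height'] = height_mm"  # deliberately buggy: undefined name
    return None


print(run_agent(scripted_planner, "make a 10 x 5 plate"))  # {'width': 10, 'height': 5}
```

Feeding errors back as observations is what lets the agent recover from a bad command, which is the behavior the abstract describes as adapting to the evolving design state.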




PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation
Authors:Daniel C. Castro, Aurelia Bustos, Shruthi Bannur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores Sánchez-Valverde, Lara Jaques-Pérez, Lourdes Pérez-Rodríguez, Kenji Takeda, José María Salinas, Javier Alvarez-Valle, Joaquín Galant Herrero, Antonio Pertusa
Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Grounded radiology report generation (GRRG) extends RRG by including the localisation of individual findings on the image. Currently, there are no manually annotated chest X-ray (CXR) datasets to train GRRG models. In this work, we present a dataset called PadChest-GR (Grounded-Reporting) derived from PadChest aimed at training GRRG models for CXR images. We curate a public bi-lingual dataset of 4,555 CXR studies with grounded reports (3,099 abnormal and 1,456 normal), each containing complete lists of sentences describing individual present (positive) and absent (negative) findings in English and Spanish. In total, PadChest-GR contains 7,037 positive and 3,422 negative finding sentences. Every positive finding sentence is associated with up to two independent sets of bounding boxes labelled by different readers and has categorical labels for finding type, locations, and progression. To the best of our knowledge, PadChest-GR is the first manually curated dataset designed to train GRRG models for understanding and interpreting radiological images and generated text. By including detailed localization and comprehensive annotations of all clinically relevant findings, it provides a valuable resource for developing and evaluating GRRG models from CXR images. PadChest-GR can be downloaded under request from https://bimcv.cipf.es/bimcv-projects/padchest-gr/
Paper and project links
Summary
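This paper introduces PadChest-GR, a dataset derived from PadChest for training grounded radiology report generation (GRRG) models on chest X-ray (CXR) images. It contains 4,555 CXR studies with grounded reports (3,099 abnormal and 1,456 normal). Each study includes complete lists of sentences in English and Spanish describing individual present (positive) and absent (negative) findings. The dataset provides detailed localization and comprehensive annotations of all clinically relevant findings. It is the first manually curated dataset designed for training GRRG models to understand and interpret radiological images and generated text.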
Key Takeaways
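- Radiology report generation (RRG) aims to create free-text radiology reports from clinical images. Grounded RRG (GRRG) extends it by also localizing individual findings on the image.
- Before this work, there was no manually annotated chest X-ray (CXR) dataset for training GRRG models.
- PadChest-GR is a bilingual dataset derived from PadChest, with 4,555 CXR studies with grounded reports: 3,099 abnormal and 1,456 normal.
- Each positive finding sentence is linked to up to two independent sets of bounding boxes labelled by different readers. It also carries categorical labels for finding type, location, and progression.
- PadChest-GR is the first manually curated dataset designed for training GRRG models to understand and interpret radiological images and generated text.
- Its detailed localization and comprehensive annotations of all clinically relevant findings make it a valuable resource for developing and evaluating GRRG models.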
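Below is a hypothetical record layout for one finding sentence, plus a simple inter-reader agreement measure on the bounding boxes. It is meant to show how sentence-level grounding with up to two readers can be represented and compared. The field names and the (x_min, y_min, x_max, y_max) box convention are assumptions for illustration, not the dataset's published file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x_min, y_min, x_max, y_max) in pixels


@dataclass
class FindingSentence:
    """Hypothetical record for one finding sentence in a grounded report."""
    text_en: str
    text_es: str
    positive: bool  # present (True) or absent (False) finding
    labels: Dict[str, str] = field(default_factory=dict)  # e.g. finding type, location, progression
    boxes_reader1: List[Box] = field(default_factory=list)
    boxes_reader2: List[Box] = field(default_factory=list)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def reader_agreement(finding: FindingSentence) -> Optional[float]:
    """Mean best-match IoU of reader-1 boxes against reader-2 boxes (None if either is missing)."""
    if not finding.boxes_reader1 or not finding.boxes_reader2:
        return None
    best = [max(box_iou(a, b) for b in finding.boxes_reader2) for a in finding.boxes_reader1]
    return sum(best) / len(best)


finding = FindingSentence(
    text_en="Cardiomegaly.", text_es="Cardiomegalia.", positive=True,
    labels={"type": "cardiomegaly"},
    boxes_reader1=[(100, 200, 300, 400)], boxes_reader2=[(110, 210, 300, 400)],
)
print(reader_agreement(finding))
```

Negative findings simply have no boxes, so `reader_agreement` returns `None` for them rather than a misleading score.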




Deep Learning-based Cross-modal Reconstruction of Vehicle Target from Sparse 3D SAR Image
Authors:Da Li, Guoqiang Zhao, Chen Yao, Kaiqiang Zhu, Houjun Sun, Jiacheng Bao, Maokun Li
Three-dimensional synthetic aperture radar (3D SAR) is an advanced active microwave imaging technology widely utilized in remote sensing. To achieve high-resolution 3D imaging, 3D SAR requires observations from multiple aspects and altitude baselines surrounding the target. However, constrained flight trajectories often lead to sparse observations, which degrade imaging quality, particularly for anisotropic man-made small targets such as vehicles and aircraft. In the past, compressive sensing (CS) was the mainstream approach for sparse 3D SAR image reconstruction. More recently, deep learning (DL) has emerged as a powerful alternative, markedly boosting reconstruction quality and efficiency. However, existing DL-based methods typically rely solely on high-quality 3D SAR images as supervisory signals to train deep neural networks (DNNs). This unimodal learning paradigm prevents the integration of complementary information from other data modalities. Because of the inherent constraints of electromagnetic scattering, this limits reconstruction performance and reduces target discriminability. In this paper, we introduce cross-modal learning and propose a Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net) that enhances sparse 3D SAR images of vehicle targets by fusing optical information. CMAR-Net leverages cross-modal supervision from 2D optical images and error propagation guaranteed by differentiable rendering. This lets it train efficiently and reconstruct sparse 3D SAR images, derived from highly sparse-aspect observations, into visually structured 3D vehicle images. Trained exclusively on simulated data, CMAR-Net exhibits robust generalization to real-world data. In a large-scale parking lot experiment involving numerous civilian vehicles, it outperforms state-of-the-art CS and DL methods in structural accuracy, demonstrating strong practical applicability.
Paper and project links
PDF This work has been submitted to the IEEE for possible publication
Summary
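This paper presents a cross-modal learning approach to three-dimensional synthetic aperture radar (3D SAR) image reconstruction. It proposes the Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net), which fuses optical information to enhance sparse 3D SAR images of vehicle targets. CMAR-Net uses cross-modal supervision from 2D optical images and error propagation guaranteed by differentiable rendering. This lets it train efficiently and reconstruct sparse 3D SAR images from highly sparse observations into visually structured 3D vehicle images. Although trained only on simulated data, CMAR-Net generalizes robustly to real data. In a large-scale parking lot experiment, it outperformed existing compressive sensing and deep learning methods in structural accuracy, demonstrating strong practical applicability.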
Key Takeaways
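- Three-dimensional synthetic aperture radar (3D SAR) is an advanced active microwave imaging technology widely used in remote sensing.
- High-resolution 3D SAR imaging requires observations from multiple aspects and altitude baselines around the target. Constrained flight trajectories lead to sparse observations, which degrade imaging quality.
- Compressive sensing (CS) used to be the mainstream approach for sparse 3D SAR image reconstruction. Deep learning (DL) has recently become a powerful alternative that markedly improves reconstruction quality and efficiency.
- Existing DL methods typically use only high-quality 3D SAR images as supervision for training deep neural networks (DNNs). This prevents integrating complementary information from other data modalities, which limits reconstruction performance and target discriminability.
- The paper introduces cross-modal learning and proposes CMAR-Net, which fuses optical information to enhance sparse 3D SAR images of vehicle targets.
- CMAR-Net uses cross-modal supervision from 2D optical images and error propagation guaranteed by differentiable rendering to train efficiently and reconstruct sparse 3D SAR images.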
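The abstract says that optical images supervise the 3D reconstruction through differentiable rendering, but it does not describe the renderer. As an assumption-laden illustration, one simple differentiable way to compare a voxel occupancy grid with a 2D optical silhouette is an orthographic "soft OR" projection along one grid axis, scored with binary cross-entropy. This numpy version only computes the loss. Autodiff is not shown, and a real pipeline would also need camera poses and perspective projection.

```python
import numpy as np


def soft_silhouette(occupancy, axis=0):
    """Orthographic 'soft OR' projection of a 3D occupancy grid with values in [0, 1].

    A pixel stays empty only if every voxel on its ray is empty: s = 1 - prod(1 - o).
    Every step is smooth in o, so the same expression in an autodiff framework
    would pass gradients from 2D pixels back to the voxels.
    """
    occ = np.clip(occupancy, 0.0, 1.0)
    return 1.0 - np.prod(1.0 - occ, axis=axis)


def silhouette_bce(occupancy, optical_mask, axis=0, eps=1e-7):
    """Binary cross-entropy between the projected silhouette and a 2D optical mask."""
    s = np.clip(soft_silhouette(occupancy, axis), eps, 1.0 - eps)
    m = optical_mask.astype(float)
    return float(-np.mean(m * np.log(s) + (1.0 - m) * np.log(1.0 - s)))


occ = np.zeros((4, 4, 4)); occ[1:3, 1:3, 1:3] = 1.0  # toy voxel "vehicle"
mask = occ.max(axis=0) > 0                             # matching 2D silhouette
print(silhouette_bce(occ, mask), silhouette_bce(occ, ~mask))
```

The product form, rather than a hard maximum, is chosen so that every voxel on a ray receives some gradient. A max projection would only update the single largest voxel.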






Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scans
Authors:Sébastien Quetin, Andrew Heschl, Mauricio Murillo, Rohit Murali, Piotr Pater, George Shenouda, Shirin A. Enger, Farhad Maleki
Purpose: To present a high-performing, robust, and flexible deep learning pipeline for automatic segmentation of 30 organs-at-risk (OARs) in head and neck (H&N) cancer patients, using MRI, CT, or both. Method: We trained a segmentation pipeline on paired CT and MRI-T1 scans from 296 patients. We combined data from the H&N OARs CT and MR segmentation (HaN-Seg) challenge and the Burdenko and GLIS-RT datasets from the Cancer Imaging Archive (TCIA). MRI was rigidly registered to CT, and both were stacked as input to an nnU-Net pipeline. Left and right OARs were merged into single classes during training and separated at inference time based on anatomical position. Modality Dropout was applied during the training, ensuring the model would learn from both modalities and robustly handle missing modalities during inference. The trained model was evaluated on the HaN-Seg test set and three TCIA datasets. Predictions were also compared with Limbus AI software. Dice Score (DS) and Hausdorff Distance (HD) were used as evaluation metrics. Results: The pipeline achieved state-of-the-art performance on the HaN-Seg challenge with a mean DS of 78.12% and HD of 3.42 mm. On TCIA datasets, the model maintained strong agreement with Limbus AI software (DS: 77.43% , HD: 3.27 mm), while also flagging low-quality contours. The pipeline can segment seamlessly from the CT, the MRI scan, or both. Conclusion: The proposed pipeline achieved the best DS and HD scores among all HaN-Seg challenge participants and establishes a new state-of-the-art for fully automated, multi-modal segmentation of H&N OARs.
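The two evaluation metrics named above can be sketched in a few lines of numpy. This version computes the symmetric Hausdorff distance by brute force over all foreground voxel centres, scaled by the voxel spacing in mm. Challenge evaluations often use boundary voxels only, or a 95th-percentile variant, and the abstract does not state which variant was used. Treat this as an illustration rather than the HaN-Seg evaluation code. The pairwise distance matrix is only practical for small masks.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom


def hausdorff_distance(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance (in the units of `spacing`) between foreground voxel sets.

    Brute force over all voxel pairs, so only suitable for small masks.
    """
    a = np.argwhere(pred) * np.asarray(spacing, dtype=float)
    b = np.argwhere(gt) * np.asarray(spacing, dtype=float)
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    dist = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    # Farthest distance from a point in one set to the nearest point of the other set
    return float(max(dist.min(axis=1).max(), dist.min(axis=0).max()))


pred = np.zeros((5, 5, 5), dtype=bool); pred[0, 0, 0] = True
gt = np.zeros((5, 5, 5), dtype=bool); gt[2, 0, 0] = True
print(dice_score(pred, gt), hausdorff_distance(pred, gt, spacing=(3.0, 1.0, 1.0)))  # 0.0 6.0
```

Dice rewards volumetric overlap, while the Hausdorff distance penalizes the single worst boundary error. This is why segmentation papers usually report both.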
Paper and project links
Summary
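This paper presents a high-performing, robust, and flexible deep learning pipeline that automatically segments 30 organs at risk (OARs) in head and neck (H&N) cancer patients from MRI, CT, or both. The pipeline was trained and evaluated on HaN-Seg challenge data and TCIA datasets. It achieved the best performance in the HaN-Seg challenge and sets a new state of the art for fully automated, multi-modal segmentation of H&N OARs.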
Key Takeaways
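- A deep learning pipeline automatically segments organs at risk in head and neck cancer patients.
- The pipeline combines MRI and CT scans and is trained with nnU-Net.
- Left and right organs at risk are merged into single classes during training and separated at inference time based on anatomical position.
- Modality Dropout is applied during training, so the model can handle a missing modality at inference.
- Evaluated on the HaN-Seg challenge and TCIA datasets, it achieved state-of-the-art performance.
- The model showed strong agreement with Limbus AI software and can also flag low-quality contours.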
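Two of the training tricks listed above can be sketched in numpy under stated assumptions. Modality dropout zeroes out either the CT or the MRI channel of the stacked input with probability `p_drop`. Merged left/right labels are split at inference according to which side of a midline plane each voxel lies on. The drop probability, the use of a fixed midline index, and the axis convention are assumptions. Which output corresponds to the patient's left depends on the image orientation (for example RAS vs LPS), and the abstract does not give the paper's exact splitting rule.

```python
import numpy as np


def modality_dropout(ct, mri, rng, p_drop=0.5):
    """Stack registered CT and MRI as channels; with probability p_drop, zero one of them.

    Never drops both, so every training sample keeps at least one modality.
    """
    x = np.stack([ct, mri]).astype(float)
    if rng.random() < p_drop:
        x[rng.integers(0, 2)] = 0.0  # 0 -> drop CT, 1 -> drop MRI
    return x


def split_left_right(merged_mask, midline_index, axis=-1):
    """Split a merged bilateral organ mask at a midline plane along `axis`.

    Returns (low_side, high_side) by voxel index; which one is the patient's left
    depends on the image orientation convention.
    """
    coords = np.indices(merged_mask.shape)[axis]
    low_side = merged_mask & (coords < midline_index)
    high_side = merged_mask & (coords >= midline_index)
    return low_side, high_side


rng = np.random.default_rng(0)
ct, mri = np.ones((4, 4)), 2.0 * np.ones((4, 4))
x = modality_dropout(ct, mri, rng, p_drop=1.0)  # always drops exactly one channel here

eyes = np.zeros((2, 8), dtype=bool); eyes[0, 1] = eyes[0, 6] = True  # toy merged bilateral label
low, high = split_left_right(eyes, midline_index=eyes.shape[-1] // 2)
```

Merging bilateral organs halves the number of classes the network must tell apart by position alone. The geometric split afterwards is a simple, deterministic rule.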

