⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them for serious academic work; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-21
Lightweight Data-Free Denoising for Detail-Preserving Biomedical Image Restoration
Authors:Tomáš Chobola, Julia A. Schnabel, Tingying Peng
Current self-supervised denoising techniques achieve impressive results, yet their real-world application is frequently constrained by substantial computational and memory demands, necessitating a compromise between inference speed and reconstruction quality. In this paper, we present an ultra-lightweight model that addresses this challenge, achieving both fast denoising and high-quality image restoration. Built upon the Noise2Noise training framework, which removes the reliance on clean reference images or explicit noise modeling, we introduce an innovative multistage denoising pipeline named Noise2Detail (N2D). During inference, this approach disrupts the spatial correlations of noise patterns to produce intermediate smooth structures, which are subsequently refined to recapture fine details directly from the noisy input. Extensive testing reveals that Noise2Detail surpasses existing dataset-free techniques in performance, while requiring only a fraction of the computational resources. This combination of efficiency, low computational cost, and a data-free approach makes it a valuable tool for biomedical imaging, overcoming the challenge of scarce clean training data caused by rare and complex imaging modalities, while enabling fast inference for practical use.
Paper and Project Links
PDF 10 pages, MICCAI 2025
Summary
This paper presents an ultra-lightweight model built on the Noise2Noise training framework that achieves both fast denoising and high-quality image restoration. The new method, Noise2Detail (N2D), disrupts the spatial correlations of noise patterns during inference to produce intermediate smooth structures, which are then refined directly against the noisy input to recover fine details. The method performs strongly without any training dataset while consuming little compute, making it especially suitable for biomedical imaging.
Key Takeaways
- Current self-supervised denoising techniques achieve impressive results, but their practical use is limited by heavy compute and memory demands, forcing a trade-off between inference speed and reconstruction quality.
- The paper proposes an ultra-lightweight model built on the Noise2Noise training framework that achieves fast denoising and high-quality restoration.
- A multistage denoising pipeline named Noise2Detail (N2D) produces intermediate smooth structures by disrupting the spatial correlations of noise patterns (see the sketch below).
- Noise2Detail outperforms existing dataset-free techniques without requiring any training dataset.
- The method has a low computational footprint, making it well suited to biomedical imaging.
- It overcomes the scarcity of clean training data caused by rare and complex imaging modalities while enabling fast inference in practice.
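To make the idea concrete, here is a minimal sketch, assuming a PyTorch setting, of one standard way to break spatial noise correlations via pixel-shuffle downsampling, as used in several dataset-free denoisers; the function name and the Noise2Noise-style pairing are illustrative, not the authors' actual N2D operators.

```python
import torch
import torch.nn.functional as F

def pixel_shuffle_downsample(img: torch.Tensor, factor: int = 2):
    """Split an image into spatially subsampled views.

    Neighbouring pixels end up in different views, which breaks the spatial
    correlation of structured noise (illustrative of the idea described in
    the N2D abstract, not the authors' exact operator).
    img: (B, C, H, W) with H and W divisible by factor.
    """
    b, c, h, w = img.shape
    views = img.reshape(b, c, h // factor, factor, w // factor, factor)
    views = views.permute(0, 3, 5, 1, 2, 4)          # (B, f, f, C, H/f, W/f)
    return views.reshape(b, factor * factor, c, h // factor, w // factor)

noisy = torch.randn(1, 1, 64, 64)                    # stand-in noisy input
views = pixel_shuffle_downsample(noisy)
# A Noise2Noise-style objective can then pair two views of the same image:
loss = F.mse_loss(views[:, 0], views[:, 1])
```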



Multiscale X-ray computed tomography of standard optical fibers
Authors:Maria Caterina Crocco, Flavio Cognigni, Alessia Sanna, Raffaele Filosa, Svetlana Siprova, Riccardo C. Barberi, Raffaele G. Agostino, Stefan Wabnitz, Antonio D’Alessandro, Sylvie Lebrun, Marco Rossi, Vincenzo Formoso, Roberto Termine, Alberto Bravin, Mario Ferraro
Optical fiber technologies enable high-speed communication, medical imaging, and advanced sensing. Among the techniques for the characterization of optical fibers, X-ray computed tomography has recently emerged as a versatile non-destructive tool for mapping their refractive index variations in 3D. In this study, we present a multiscale characterization of standard optical fibers. We carry out an intercomparison of three tomography setups: classical computed microtomography, X-ray microscopy, and nanotomography. In each method, our analysis highlights the trade-offs between resolution, field of view, and segmentation efficiency. Additionally, we integrate deep learning segmentation thresholding to improve the image analysis process. Thanks to its large field of view, microtomography with classical sources is ideal for the analysis of relatively long fiber spans, where a low spatial resolution is acceptable. Conversely, nanotomography has the highest spatial resolution, but it is limited to very small fiber samples, e.g., fiber tapers and nanofibers, which have diameters of the order of a few microns. Finally, X-ray microscopy provides a good compromise between the sample size fitting the device’s field of view and the spatial resolution needed for properly imaging the inner features of the fiber. Specifically, thanks to its practicality in terms of costs and cumbersomeness, we foresee that the latter will provide the most suitable choice for the quality control of fiber drawing in real-time, e.g., using the “One-Minute Tomographies with Fast Acquisition Scanning Technology” developed by Zeiss. In this regard, the combination of X-ray computed tomography and artificial intelligence-driven enhancements is poised to revolutionize fiber characterization, by enabling precise monitoring and adaptive control in fiber manufacturing.
Paper and Project Links
Summary
This paper surveys characterization techniques for optical fibers, focusing on X-ray computed tomography as a non-destructive tool. Three imaging setups are compared: microtomography, X-ray microscopy, and nanotomography, each with its own strengths and weaknesses in resolution, field of view, and segmentation efficiency. Deep learning segmentation thresholding is also integrated to improve the image analysis process. Each of the three techniques suits different fiber types, and combining X-ray computed tomography with AI-driven enhancements could revolutionize fiber characterization by enabling precise monitoring and adaptive control of fiber manufacturing. X-ray microscopy, building on Zeiss's fast acquisition scanning technology, is expected to provide the best choice for real-time fiber quality control.
Key Takeaways
- Optical fiber applications in high-speed communication, medical imaging, and advanced sensing rely on accurate fiber characterization.
- X-ray computed tomography is an effective non-destructive tool for mapping refractive index variations in fibers.
- Three tomography setups are compared, classical microtomography, X-ray microscopy, and nanotomography, each with its own trade-offs.
- Deep learning thresholding improves the segmentation efficiency of the image analysis process.
- Microtomography suits relatively long fiber spans, whereas nanotomography suits high-resolution analysis of very small samples such as fiber tapers and nanofibers.
- X-ray microscopy offers a good balance between field of view and the resolution needed to image inner fiber features, and is expected to become the preferred technique for real-time fiber quality control.






Diffusion Bridge Networks Simulate Clinical-grade PET from MRI for Dementia Diagnostics
Authors:Yitong Li, Ralph Buchert, Benita Schmitz-Koep, Timo Grimmer, Björn Ommer, Dennis M. Hedderich, Igor Yakushev, Christian Wachinger
Positron emission tomography (PET) with 18F-Fluorodeoxyglucose (FDG) is an established tool in the diagnostic workup of patients with suspected dementing disorders. However, compared to the routinely available magnetic resonance imaging (MRI), FDG-PET remains significantly less accessible and substantially more expensive. Here, we present SiM2P, a 3D diffusion bridge-based framework that learns a probabilistic mapping from MRI and auxiliary patient information to simulate FDG-PET images of diagnostic quality. In a blinded clinical reader study, two neuroradiologists and two nuclear medicine physicians rated the original MRI and SiM2P-simulated PET images of patients with Alzheimer’s disease, behavioral-variant frontotemporal dementia, and cognitively healthy controls. SiM2P significantly improved the overall diagnostic accuracy of differentiating between three groups from 75.0% to 84.7% (p<0.05). Notably, the simulated PET images received higher diagnostic certainty ratings and achieved superior interrater agreement compared to the MRI images. Finally, we developed a practical workflow for local deployment of the SiM2P framework. It requires as few as 20 site-specific cases and only basic demographic information. This approach makes the established diagnostic benefits of FDG-PET imaging more accessible to patients with suspected dementing disorders, potentially improving early detection and differential diagnosis in resource-limited settings. Our code is available at https://github.com/Yiiitong/SiM2P.
Paper and Project Links
Summary
SiM2P is a 3D diffusion bridge-based framework that simulates diagnostic-quality FDG-PET images from MRI and auxiliary patient information. It raised the overall diagnostic accuracy for differentiating Alzheimer's disease, behavioral-variant frontotemporal dementia, and cognitively healthy controls from 75.0% to 84.7% (p<0.05). The simulated PET images received higher diagnostic certainty ratings and better interrater agreement than the MRI images. A practical local-deployment workflow requires as few as 20 site-specific cases and only basic demographic information. This makes the established diagnostic benefits of FDG-PET more accessible to patients with suspected dementing disorders and could improve early detection and differential diagnosis in resource-limited settings. Code: https://github.com/Yiiitong/SiM2P.
Key Takeaways
- SiM2P simulates diagnostic-quality FDG-PET images from MRI and auxiliary patient information via a 3D diffusion bridge framework (a toy bridge construction is sketched below).
- Compared with MRI alone, SiM2P improved the differentiation of dementia subtypes, raising overall accuracy in a blinded clinical reader study from 75.0% to 84.7% and giving neuroradiologists and nuclear medicine physicians a more accurate diagnostic tool.
- The simulated PET images received higher diagnostic certainty ratings and better interrater agreement than the MRI images, meaning readers were more confident and more consistent.
- The practical deployment workflow needs as few as 20 site-specific cases and basic demographic information, making the technology easy to implement widely.
- The approach makes advanced diagnostics such as FDG-PET more accessible in resource-limited settings, aiding early detection and differential diagnosis for suspected dementing disorders.
- The clear evaluation of usability suggests high applicability in practice and may spur further innovation in medical image simulation.
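As a loose illustration of the diffusion-bridge idea, the toy sketch below constructs a textbook Brownian-bridge state between paired volumes; SiM2P's actual parameterisation, conditioning, and sampler live in the linked repository, and every name here is a placeholder.

```python
import torch

def brownian_bridge_sample(x_mri, x_pet, t, sigma=1.0):
    """Sample an intermediate state of a diffusion bridge between a source
    image (MRI) and a target image (PET).

    Textbook Brownian-bridge interpolation, shown only to illustrate the
    bridge idea from the abstract; SiM2P's actual parameterisation is
    defined in the paper and code, not here.
    """
    noise = torch.randn_like(x_mri)
    mean = (1.0 - t) * x_mri + t * x_pet
    std = sigma * torch.sqrt(t * (1.0 - t))
    return mean + std * noise

x_mri = torch.randn(1, 1, 32, 32, 32)   # toy 3D volumes
x_pet = torch.randn(1, 1, 32, 32, 32)
t = torch.rand(())                       # random time in (0, 1)
x_t = brownian_bridge_sample(x_mri, x_pet, t)
# A network would be trained to recover x_pet from (x_t, t, demographics).
```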




Rethinking Convergence in Deep Learning: The Predictive-Corrective Paradigm for Anatomy-Informed Brain MRI Segmentation
Authors:Feifei Zhang, Zhenhong Jia, Sensen Song, Fei Shi, Dayong Ren
Despite the remarkable success of the end-to-end paradigm in deep learning, it often suffers from slow convergence and heavy reliance on large-scale datasets, which fundamentally limits its efficiency and applicability in data-scarce domains such as medical imaging. In this work, we introduce the Predictive-Corrective (PC) paradigm, a framework that decouples the modeling task to fundamentally accelerate learning. Building upon this paradigm, we propose a novel network, termed PCMambaNet. PCMambaNet is composed of two synergistic modules. First, the Predictive Prior Module (PPM) generates a coarse approximation at low computational cost, thereby anchoring the search space. Specifically, the PPM leverages anatomical knowledge (bilateral symmetry) to predict a ‘focus map’ of diagnostically relevant asymmetric regions. Next, the Corrective Residual Network (CRN) learns to model the residual error, focusing the network’s full capacity on refining these challenging regions and delineating precise pathological boundaries. Extensive experiments on high-resolution brain MRI segmentation demonstrate that PCMambaNet achieves state-of-the-art accuracy while converging within only 1-5 epochs, a performance unattainable by conventional end-to-end models. This dramatic acceleration highlights that by explicitly incorporating domain knowledge to simplify the learning objective, PCMambaNet effectively mitigates data inefficiency and overfitting.
Paper and Project Links
Summary
The Predictive-Corrective (PC) paradigm is introduced to accelerate learning in deep models. Built on it, a new network, PCMambaNet, combines a Predictive Prior Module (PPM) and a Corrective Residual Network (CRN). The PPM uses anatomical knowledge to predict a 'focus map' of diagnostically relevant asymmetric regions, and the CRN models the residual error, improving both the accuracy and the efficiency of medical image segmentation.
Key Takeaways
- The end-to-end paradigm in deep learning, despite its success, suffers from slow convergence and reliance on large-scale datasets, limiting its efficiency and applicability in data-scarce domains such as medical imaging.
- The Predictive-Corrective (PC) paradigm decouples the modeling task to fundamentally accelerate learning.
- PCMambaNet consists of two synergistic modules: the Predictive Prior Module (PPM) and the Corrective Residual Network (CRN).
- The PPM uses anatomical knowledge (bilateral symmetry) to predict a 'focus map' of diagnostically relevant asymmetric regions at low computational cost, anchoring the search space (see the sketch below).
- The CRN learns to model the residual error, concentrating capacity on refining challenging regions and delineating precise pathological boundaries.
- On high-resolution brain MRI segmentation, PCMambaNet reaches state-of-the-art accuracy and converges within only 1-5 epochs, which conventional end-to-end models cannot match.
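A toy version of the bilateral-symmetry prior can be written in a few lines: mirror the slice, take the absolute difference, and smooth. This is only a hand-crafted stand-in for the learned Predictive Prior Module; the midline assumption and all parameters are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def asymmetry_focus_map(slice_2d: np.ndarray, smooth_sigma: float = 2.0):
    """Coarse 'focus map' from bilateral symmetry, as a toy version of the
    anatomical prior described for the PPM.

    Assumes the mid-sagittal plane is the vertical image midline, which is
    a simplification; the real PPM is a learned module.
    """
    mirrored = slice_2d[:, ::-1]                    # left-right flip
    asym = np.abs(slice_2d - mirrored)              # asymmetric regions light up
    asym = gaussian_filter(asym, smooth_sigma)      # suppress pixel noise
    return asym / (asym.max() + 1e-8)               # normalise to [0, 1]

brain = np.random.rand(128, 128).astype(np.float32)  # stand-in MRI slice
focus = asymmetry_focus_map(brain)
```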




Robust High-Resolution Multi-Organ Diffusion MRI Using Synthetic-Data-Tuned Prompt Learning
Authors:Chen Qian, Haoyu Zhang, Junnan Ma, Liuhong Zhu, Qingrui Cai, Yu Wang, Ruibo Song, Lv Li, Lin Mei, Xianwang Jiang, Qin Xu, Boyu Jiang, Ran Tao, Chunmiao Chen, Shufang Chen, Dongyun Liang, Qiu Guo, Jianzhong Lin, Taishan Kang, Mengtian Lu, Liyuan Fu, Ruibin Huang, Huijuan Wan, Xu Huang, Jianhua Wang, Di Guo, Hai Zhong, Jianjun Zhou, Xiaobo Qu
Clinical adoption of multi-shot diffusion-weighted magnetic resonance imaging (multi-shot DWI) for body-wide tumor diagnostics is limited by severe motion-induced phase artifacts from respiration, peristalsis, and so on, compounded by multi-organ, multi-slice, multi-direction and multi-b-value complexities. Here, we introduce a reconstruction framework, LoSP-Prompt, that overcomes these challenges through physics-informed modeling and synthetic-data-driven prompt learning. We model inter-shot phase variations as a high-order Locally Smooth Phase (LoSP), integrated into a low-rank Hankel matrix reconstruction. Crucially, the algorithm’s rank parameter is automatically set via prompt learning trained exclusively on synthetic abdominal DWI data emulating physiological motion. Validated across 10,000+ clinical images (43 subjects, 4 scanner models, 5 centers), LoSP-Prompt: (1) Achieved twice the spatial resolution of clinical single-shot DWI, enhancing liver lesion conspicuity; (2) Generalized to seven diverse anatomical regions (liver, kidney, sacroiliac, pelvis, knee, spinal cord, brain) with a single model; (3) Outperformed state-of-the-art methods in image quality, artifact suppression, and noise reduction (11 radiologists’ evaluations on a 5-point scale, $p<0.05$), achieving 4-5 points (excellent) on kidney DWI, 4 points (good to excellent) on liver, sacroiliac and spinal cord DWI, and 3-4 points (good) on knee and tumor brain. The approach eliminates navigator signals and realistic data supervision, providing an interpretable, robust solution for high-resolution multi-organ multi-shot DWI. Its scanner-agnostic performance signifies transformative potential for precision oncology.
Paper and Project Links
PDF 43 pages, 27 figures
Summary
Multi-shot diffusion-weighted MRI (multi-shot DWI) for body-wide tumor diagnostics is limited in the clinic by severe motion-induced phase artifacts. To address this, the paper proposes LoSP-Prompt, a reconstruction framework combining physics-informed modeling with synthetic-data-driven prompt learning. It achieves twice the spatial resolution of clinical single-shot DWI, improves liver lesion conspicuity, and generalizes to multiple anatomical regions with a single model. It also outperforms state-of-the-art methods in image quality, artifact suppression, and noise reduction, scoring highest on kidney DWI. The approach needs no navigator signals or real-data supervision, providing an interpretable, robust solution for high-resolution multi-organ multi-shot DWI with transformative potential for precision oncology.
Key Takeaways
- Clinical adoption of multi-shot diffusion-weighted MRI (multi-shot DWI) is limited mainly by motion-induced phase artifacts.
- LoSP-Prompt is a reconstruction framework combining physics-informed modeling with synthetic-data-driven prompt learning; inter-shot phase variations are modeled as a high-order Locally Smooth Phase inside a low-rank Hankel matrix reconstruction (a toy version is sketched below).
- LoSP-Prompt achieves twice the spatial resolution of clinical single-shot DWI and improves liver lesion conspicuity.
- A single model generalizes across anatomical regions including the liver, kidney, sacroiliac joints, pelvis, knee, spinal cord, and brain.
- LoSP-Prompt outperforms state-of-the-art methods in image quality, artifact suppression, and noise reduction.
- The method requires no navigator signals or real-data supervision, making it interpretable and robust.
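For intuition about the low-rank Hankel machinery, here is a 1D toy sketch, assuming numpy: lift a signal into a Hankel matrix, truncate its rank with an SVD, and average anti-diagonals back to a signal. The paper applies this to multi-shot k-space data with a prompt-learned rank parameter, none of which is reproduced here.

```python
import numpy as np

def hankel(signal: np.ndarray, window: int) -> np.ndarray:
    """Lift a 1D signal into a Hankel matrix (each column is a shifted window)."""
    n = signal.size - window + 1
    return np.stack([signal[i:i + window] for i in range(n)], axis=1)

def lowrank_hankel_project(signal: np.ndarray, window: int, rank: int):
    """Project a signal toward low-rank Hankel structure via truncated SVD,
    then average anti-diagonals back to a signal.

    Toy 1D stand-in for the low-rank Hankel reconstruction mentioned in the
    abstract; the paper operates on multi-shot k-space data, not 1D arrays.
    """
    H = hankel(signal, window)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]     # rank-truncated matrix
    out = np.zeros_like(signal)
    counts = np.zeros(signal.size)
    for i in range(H_low.shape[0]):                  # anti-diagonal averaging
        for j in range(H_low.shape[1]):
            out[i + j] += H_low[i, j]
            counts[i + j] += 1
    return out / counts

t = np.linspace(0, 1, 256)
noisy = np.exp(2j * np.pi * 8 * t) + 0.3 * (np.random.randn(256) + 1j * np.random.randn(256))
denoised = lowrank_hankel_project(noisy, window=32, rank=1)
```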







Post-Processing Methods for Improving Accuracy in MRI Inpainting
Authors:Nishad Kulkarni, Krithika Iyer, Austin Tapp, Abhijeet Parida, Daniel Capellán-Martín, Zhifan Jiang, María J. Ledesma-Carbayo, Syed Muhammad Anwar, Marius George Linguraru
Magnetic Resonance Imaging (MRI) is the primary imaging modality used in the diagnosis, assessment, and treatment planning for brain pathologies. However, most automated MRI analysis tools, such as segmentation and registration pipelines, are optimized for healthy anatomies and often fail when confronted with large lesions such as tumors. To overcome this, image inpainting techniques aim to locally synthesize healthy brain tissues in tumor regions, enabling the reliable application of general-purpose tools. In this work, we systematically evaluate state-of-the-art inpainting models and observe a saturation in their standalone performance. In response, we introduce a methodology combining model ensembling with efficient post-processing strategies such as median filtering, histogram matching, and pixel averaging. Further anatomical refinement is achieved via a lightweight U-Net enhancement stage. Comprehensive evaluation demonstrates that our proposed pipeline improves the anatomical plausibility and visual fidelity of inpainted regions, yielding higher accuracy and more robust outcomes than individual baseline models. By combining established models with targeted post-processing, we achieve improved and more accessible inpainting outcomes, supporting broader clinical deployment and sustainable, resource-conscious research. Our 2025 BraTS inpainting docker is available at https://hub.docker.com/layers/aparida12/brats2025/inpt.
Paper and Project Links
Summary
Addressing the central role of MRI in diagnosing, assessing, and planning treatment for brain pathologies, this study proposes an image inpainting method that combines model ensembling with efficient post-processing strategies. By combining established models with an optimized post-processing pipeline, it improves the anatomical plausibility and visual fidelity of inpainted regions, raises their accuracy, and supports broader clinical deployment and resource-conscious research.
Key Takeaways
- MRI is the primary imaging modality for diagnosing, assessing, and treating brain pathologies.
- Automated MRI analysis tools often fail when confronted with large lesions such as tumors.
- Image inpainting synthesizes healthy brain tissue in lesion regions so that general-purpose tools can be applied reliably.
- The standalone performance of current inpainting models has saturated.
- A methodology combining model ensembling with efficient post-processing strategies, including median filtering, histogram matching, and pixel averaging, is proposed (see the sketch below).
- A lightweight U-Net enhancement stage provides further anatomical refinement.
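The post-processing steps named above map directly onto standard scientific-Python operations. The sketch below is a minimal recombination, assuming scipy and scikit-image; it omits the paper's U-Net refinement stage, and the array names and 3-model ensemble are illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.exposure import match_histograms

def fuse_inpaintings(predictions, reference, mask):
    """Ensemble + post-processing in the spirit of the pipeline above.

    predictions: list of 2D arrays, one inpainted result per base model.
    reference:   the original image, used outside the tumour mask and as
                 the histogram reference.
    mask:        boolean array, True inside the inpainted region.
    """
    fused = np.mean(np.stack(predictions), axis=0)   # pixel averaging
    fused = median_filter(fused, size=3)             # remove outlier pixels
    fused = match_histograms(fused, reference)       # align intensity statistics
    return np.where(mask, fused, reference)          # keep healthy tissue untouched

ref = np.random.rand(128, 128)
msk = np.zeros((128, 128), dtype=bool); msk[40:80, 40:80] = True
preds = [ref + 0.05 * np.random.randn(128, 128) for _ in range(3)]
result = fuse_inpaintings(preds, ref, msk)
```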


CARDIUM: Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records
Authors:Daniela Vega, Hannah V. Ceballos, Javier S. Vera, Santiago Rodriguez, Alejandra Perez, Angela Castillo, Maria Escobar, Dario Londoño, Luis A. Sarmiento, Camila I. Castro, Nadiezhda Rodriguez, Juan C. Briceño, Pablo Arbeláez
Prenatal diagnosis of Congenital Heart Diseases (CHDs) holds great potential for Artificial Intelligence (AI)-driven solutions. However, collecting high-quality diagnostic data remains difficult due to the rarity of these conditions, resulting in imbalanced and low-quality datasets that hinder model performance. Moreover, no public efforts have been made to integrate multiple sources of information, such as imaging and clinical data, further limiting the ability of AI models to support and enhance clinical decision-making. To overcome these challenges, we introduce the Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records (CARDIUM) dataset, the first publicly available multimodal dataset consolidating fetal ultrasound and echocardiographic images along with maternal clinical records for prenatal CHD detection. Furthermore, we propose a robust multimodal transformer architecture that incorporates a cross-attention mechanism to fuse feature representations from image and tabular data, improving CHD detection by 11% and 50% over image and tabular single-modality approaches, respectively, and achieving an F1 score of 79.8 $\pm$ 4.8% in the CARDIUM dataset. We will publicly release our dataset and code to encourage further research on this unexplored field. Our dataset and code are available at https://github.com/BCVUniandes/Cardium, and at the project website https://bcv-uniandes.github.io/CardiumPage/
Paper and Project Links
PDF Accepted to CVAMD Workshop, ICCV 2025
Summary
This work addresses the challenges facing AI in the prenatal diagnosis of congenital heart diseases: high-quality datasets are hard to collect, and no public effort had fused imaging with clinical data. It introduces the CARDIUM dataset and a multimodal transformer architecture that fuses fetal ultrasound and echocardiographic images with maternal clinical records, improving CHD detection. The dataset and code are publicly released to foster research in this field.
Key Takeaways
- AI-based prenatal diagnosis of congenital heart diseases (CHDs) is hindered by the difficulty of collecting diagnostic data.
- Existing datasets are imbalanced and of low quality, which hurts model performance.
- No public effort previously integrated multiple information sources such as imaging and clinical data, limiting how well AI can support clinical decision-making.
- CARDIUM is the first publicly available multimodal dataset consolidating fetal ultrasound and echocardiographic images with maternal clinical records for prenatal CHD detection.
- A robust multimodal transformer fuses image and tabular feature representations through a cross-attention mechanism (see the sketch below).
- The architecture improves CHD detection by 11% and 50% over image-only and tabular-only approaches, respectively.
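A minimal cross-attention fusion of image tokens and a tabular embedding might look like the following PyTorch sketch; the dimensions, head count, and pooling are arbitrary choices for illustration, not the CARDIUM architecture itself.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse image tokens with a tabular (clinical-record) embedding via
    cross-attention, illustrating the fusion mechanism described above."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)                 # binary CHD logit

    def forward(self, image_tokens, tabular_embed):
        # Queries come from the tabular record, keys/values from the image,
        # so clinical variables decide which image regions to attend to.
        fused, _ = self.attn(tabular_embed, image_tokens, image_tokens)
        return self.head(fused.mean(dim=1))

model = CrossAttentionFusion()
img = torch.randn(2, 196, 64)      # e.g. ViT patch tokens
tab = torch.randn(2, 1, 64)        # one embedded clinical record per case
logit = model(img, tab)            # (2, 1)
```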




HyperAIRI: a plug-and-play algorithm for precise hyperspectral image reconstruction in radio interferometry
Authors:Chao Tang, Arwa Dabbech, Adrian Jackson, Yves Wiaux
The next-generation radio-interferometric (RI) telescopes require imaging algorithms capable of forming high-resolution high-dynamic-range images from large data volumes spanning wide frequency bands. Recently, AIRI, a plug-and-play (PnP) approach taking the forward-backward algorithmic structure (FB), has demonstrated state-of-the-art performance in monochromatic RI imaging by alternating a data-fidelity step with a regularisation step via learned denoisers. In this work, we introduce HyperAIRI, its hyperspectral extension, underpinned by learned hyperspectral denoisers enforcing a power-law spectral model. For each spectral channel, the HyperAIRI denoiser takes as input its current image estimate, alongside estimates of its two immediate neighbouring channels and the spectral index map, and provides as output its associated denoised image. To ensure convergence of HyperAIRI, the denoisers are trained with a Jacobian regularisation enforcing non-expansiveness. To accommodate varying dynamic ranges, we assemble a shelf of pre-trained denoisers, each tailored to a specific dynamic range. At each HyperAIRI iteration, the spectral channels of the target image cube are updated in parallel using dynamic-range-matched denoisers from the pre-trained shelf. The denoisers are also endowed with a spatial image faceting functionality, enabling scalability to varied image sizes. Additionally, we formally introduce Hyper-uSARA, a variant of the optimisation-based algorithm HyperSARA, promoting joint sparsity across spectral channels via the l2,1-norm, also adopting FB. We evaluate HyperAIRI’s performance on simulated and real observations. We showcase its superior performance compared to its optimisation-based counterpart Hyper-uSARA, CLEAN’s hyperspectral variant in WSClean, and the monochromatic imaging algorithms AIRI and uSARA.
Paper and Project Links
PDF 18 pages, 10 figures, submitted to MNRAS
Summary
Next-generation radio-interferometric telescopes need imaging algorithms that form high-resolution, high-dynamic-range images from large data volumes. AIRI, a plug-and-play method with the forward-backward (FB) algorithmic structure, alternates a data-fidelity step with a regularisation step via learned denoisers and has shown state-of-the-art monochromatic RI imaging. This work introduces HyperAIRI, its hyperspectral extension, underpinned by learned hyperspectral denoisers enforcing a power-law spectral model. For each spectral channel, the denoiser takes the current image estimate, the estimates of its two neighbouring channels, and the spectral index map, and outputs the denoised image. The denoisers are trained with a Jacobian regularisation enforcing non-expansiveness, and a shelf of pre-trained denoisers covers different dynamic ranges; at each iteration, the channels are updated in parallel with dynamic-range-matched denoisers. The work also introduces Hyper-uSARA, an optimisation-based variant promoting joint sparsity across channels via the l2,1-norm. On simulated and real observations, HyperAIRI outperforms Hyper-uSARA, the hyperspectral CLEAN variant in WSClean, and the monochromatic algorithms AIRI and uSARA.
Key Takeaways
- Next-generation radio-interferometric telescopes must form high-resolution, high-dynamic-range images from large-scale data.
- AIRI, built on the forward-backward (FB) algorithmic structure, has demonstrated state-of-the-art monochromatic RI imaging (a generic FB plug-and-play loop is sketched below).
- HyperAIRI is AIRI's hyperspectral extension, using learned hyperspectral denoisers that enforce a power-law spectral model.
- For each spectral channel, the HyperAIRI denoiser uses the current image estimate, the two neighbouring channel estimates, and the spectral index map to produce the denoised image.
- Convergence is ensured by training the denoisers with a Jacobian regularisation that enforces non-expansiveness.
- To handle varying dynamic ranges, a shelf of pre-trained denoisers is used, each tailored to a specific dynamic range.
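The forward-backward PnP structure referenced above is compact enough to sketch generically: a gradient step on the data fidelity followed by a denoiser standing in for the proximal operator. The toy blur operator and Gaussian "denoiser" below are stand-ins for the radio-interferometric measurement operator and the learned non-expansive denoisers.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_forward_backward(y, A, At, denoise, gamma, n_iter=50):
    """Generic plug-and-play forward-backward loop: a gradient step on the
    data-fidelity term, then a denoiser in place of a proximal map.

    This is the FB structure the abstract refers to; AIRI/HyperAIRI use
    learned non-expansive denoisers, whereas `denoise` here is a stand-in.
    """
    x = At(y)
    for _ in range(n_iter):
        grad = At(A(x) - y)              # gradient of 0.5 * ||Ax - y||^2
        x = denoise(x - gamma * grad)    # backward (denoising) step
    return x

blur = lambda x: gaussian_filter(x, 2.0)       # toy self-adjoint operator
x_true = np.zeros((64, 64)); x_true[30:34, 30:34] = 1.0
y = blur(x_true) + 0.01 * np.random.randn(64, 64)
x_hat = pnp_forward_backward(y, blur, blur, lambda x: gaussian_filter(x, 0.5), gamma=1.0)
```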



Comprehensive language-image pre-training for 3D medical image understanding
Authors:Tassilo Wald, Ibrahim Ethem Hamamci, Yuan Gao, Sam Bond-Taylor, Harshita Sharma, Maximilian Ilse, Cynthia Lo, Olesya Melnichenko, Noel C. F. Codella, Maria Teodora Wetscherek, Klaus H. Maier-Hein, Panagiotis Korfiatis, Valentina Salvatelli, Javier Alvarez-Valle, Fernando Pérez-García
Vision-language pre-training, i.e., aligning images with paired text, is a powerful paradigm to create encoders that can be directly used for tasks such as classification and retrieval, and for downstream tasks such as segmentation and report generation. In the 3D medical image domain, these capabilities allow vision-language encoders (VLEs) to support radiologists by retrieving patients with similar abnormalities or predicting likelihoods of abnormality. While the methodology holds promise, data availability limits the capabilities of current 3D VLEs. In this paper, we alleviate the lack of data by injecting additional inductive biases: introducing a report generation objective and pairing vision-language pre-training with vision-only pre-training. This allows us to leverage both image-only and paired image-text 3D datasets, increasing the total amount of data to which our model is exposed. Through these additional inductive biases, paired with best practices of the 3D medical imaging domain, we develop the Comprehensive Language-image Pre-training (COLIPRI) encoder family. Our COLIPRI encoders achieve state-of-the-art performance in report generation, classification probing, and zero-shot classification, and remain competitive for semantic segmentation.
Paper and Project Links
Summary
This paper applies vision-language pre-training to 3D medical imaging. To mitigate limited data availability, it adds a report-generation objective and pairs vision-language pre-training with vision-only pre-training. With these extra inductive biases and domain best practices, the COLIPRI encoder family achieves state-of-the-art performance in report generation, classification probing, and zero-shot classification, while remaining competitive for semantic segmentation.
Key Takeaways
- Vision-language pre-training (VLP) creates encoders usable directly for classification and retrieval, and for downstream tasks such as segmentation and report generation (a generic contrastive objective is sketched below).
- In 3D medical imaging, vision-language encoders can support radiologists by retrieving patients with similar abnormalities or predicting abnormality likelihoods.
- Current 3D vision-language encoders are limited by data availability.
- A report-generation objective, plus pairing vision-language with vision-only pre-training, alleviates the data problem by exploiting both image-only and paired image-text 3D datasets.
- With these additional inductive biases and best practices, the COLIPRI encoder family is developed.
- COLIPRI encoders achieve state-of-the-art performance in report generation, classification probing, and zero-shot classification.
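The contrastive alignment at the heart of vision-language pre-training is the standard symmetric InfoNCE objective; a minimal sketch follows, noting that COLIPRI combines this kind of objective with report generation and vision-only pre-training rather than using it alone.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss used in vision-language pre-training: matched
    image-report pairs sit on the diagonal of the similarity matrix."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature           # (B, B) similarities
    targets = torch.arange(img.size(0))            # i-th image matches i-th text
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = clip_style_loss(torch.randn(8, 128), torch.randn(8, 128))
```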





DCMIL: A Progressive Representation Learning of Whole Slide Images for Cancer Prognosis Analysis
Authors:Chao Tu, Kun Huang, Jie Zhang, Qianjin Feng, Yu Zhang, Zhenyuan Ning
The burgeoning discipline of computational pathology shows promise in harnessing whole slide images (WSIs) to quantify morphological heterogeneity and develop objective prognostic modes for human cancers. However, progress is impeded by the computational bottleneck of gigapixel-size inputs and the scarcity of dense manual annotations. Current methods often overlook fine-grained information across multi-magnification WSIs and variations in tumor microenvironments. Here, we propose an easy-to-hard progressive representation learning, termed dual-curriculum contrastive multi-instance learning (DCMIL), to efficiently process WSIs for cancer prognosis. The model does not rely on dense annotations and enables the direct transformation of gigapixel-size WSIs into outcome predictions. Extensive experiments on twelve cancer types (5,954 patients, 12.54 million tiles) demonstrate that DCMIL outperforms standard WSI-based prognostic models. Additionally, DCMIL identifies fine-grained prognosis-salient regions, provides robust instance uncertainty estimation, and captures morphological differences between normal and tumor tissues, with the potential to generate new biological insights. All codes have been made publicly accessible at https://github.com/tuuuc/DCMIL.
Paper and Project Links
Summary
Computational pathology shows promise in using whole slide images (WSIs) to quantify morphological heterogeneity and build objective prognostic models for human cancers. To bypass the computational bottleneck of gigapixel inputs and the scarcity of dense annotations, this work proposes an easy-to-hard progressive representation learning scheme, dual-curriculum contrastive multi-instance learning (DCMIL). Without dense annotations, it transforms gigapixel WSIs directly into outcome predictions. Across twelve cancer types, DCMIL outperforms standard WSI-based prognostic models, identifies fine-grained prognosis-salient regions, provides robust instance uncertainty estimation, and captures morphological differences between normal and tumor tissues, with the potential to yield new biological insights. The code has been shared publicly on GitHub.
Key Takeaways
- Computational pathology can exploit whole slide images (WSIs) for objective cancer prognosis.
- The computational bottleneck of gigapixel inputs and the scarcity of dense manual annotations are the main obstacles.
- Dual-curriculum contrastive multi-instance learning (DCMIL) processes WSIs efficiently without dense annotations (a minimal MIL pooling baseline is sketched below).
- DCMIL transforms gigapixel WSIs directly into outcome predictions and outperforms standard WSI-based prognostic models.
- DCMIL identifies fine-grained prognosis-salient regions and provides robust instance uncertainty estimation.
- DCMIL captures morphological differences between normal and tumor tissues, helping generate new biological insights.
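As a minimal baseline for the MIL setting described above, the sketch below shows standard attention-based pooling over tile embeddings (in the style of Ilse et al.); DCMIL's dual-curriculum contrastive training is not reproduced, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention pooling over tile embeddings: a minimal multi-instance
    baseline for WSI-level prediction. The attention weights double as a
    tile-saliency map, echoing the 'prognosis-salient regions' idea."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.head = nn.Linear(dim, 1)                 # e.g. prognosis risk score

    def forward(self, tiles):                         # tiles: (n_tiles, dim)
        w = torch.softmax(self.score(tiles), dim=0)   # per-tile attention
        slide = (w * tiles).sum(dim=0)                # weighted slide embedding
        return self.head(slide), w.squeeze(-1)        # risk + tile saliency

model = AttentionMIL()
risk, saliency = model(torch.randn(1000, 256))        # 1000 tiles from one WSI
```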




JEDA: Query-Free Clinical Order Search from Ambient Dialogues
Authors:Praphul Singh, Corey Barrett, Sumana Srivasta, Amitabh Saikia, Irfan Bulu, Sri Gadde, Krishnaram Kenthapadi
Clinical conversations mix explicit directives (order a chest X-ray) with implicit reasoning (the cough worsened overnight, we should check for pneumonia). Many systems rely on LLM rewriting, adding latency, instability, and opacity that hinder real-time ordering. We present JEDA (Joint Embedding for Direct and Ambient clinical orders), a domain-initialized bi-encoder that retrieves canonical orders directly and, in a query-free mode, encodes a short rolling window of ambient dialogue to trigger retrieval. Initialized from PubMedBERT and fine-tuned with a duplicate-safe contrastive objective, JEDA aligns heterogeneous expressions of intent to shared order concepts. Training uses constrained LLM guidance to tie each signed order to complementary formulations (command only, context only, command+context, context+reasoning), producing clearer inter-order separation, tighter query-order coupling, and stronger generalization. The query-free mode is noise-resilient, reducing sensitivity to disfluencies and ASR errors by conditioning on a short window rather than a single utterance. Deployed in practice, JEDA yields large gains and substantially outperforms its base encoder and recent open embedders (Linq Embed Mistral, SFR Embedding, GTE Qwen, BGE large, Embedding Gemma). The result is a fast, interpretable, LLM-free retrieval layer that links ambient context to actionable clinical orders in real time.
Paper and Project Links
Summary
This work introduces JEDA, a joint embedding technique for direct and ambient clinical orders. By combining a medical-domain language model initialization with a dedicated contrastive training procedure, it builds an efficient automatic ordering system that links real-time ambient context to actionable clinical orders with lower latency, better stability, and greater transparency than LLM rewriting. Deployed in practice, JEDA yields large gains, substantially outperforming its base encoder and recent open embedders.
Key Takeaways
- JEDA is a domain-initialized bi-encoder supporting both direct and ambient clinical order retrieval (a toy retrieval step is sketched below).
- It is initialized from PubMedBERT and fine-tuned with a duplicate-safe contrastive objective.
- JEDA aligns heterogeneous expressions of intent (command only, context only, command+context, context+reasoning) to shared order concepts.
- Its query-free mode conditions on a short rolling window rather than a single utterance, reducing sensitivity to disfluencies and ASR errors.
- In deployment, JEDA substantially outperforms its base encoder and several recent open embedders.
- JEDA provides a fast, interpretable, LLM-free retrieval layer that links ambient context to actionable clinical orders in real time.
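The retrieval step itself reduces to cosine similarity between a window embedding and precomputed order embeddings. The sketch below assumes numpy and uses a hypothetical `encode` callable in place of the fine-tuned bi-encoder; the order names and embeddings are random placeholders.

```python
import numpy as np

def retrieve_orders(window_text, encode, order_embs, order_names, k=3):
    """Query-free retrieval step: embed a short rolling window of ambient
    dialogue and rank canonical orders by cosine similarity.

    `encode` stands in for the fine-tuned bi-encoder (e.g. a PubMedBERT
    sentence encoder); it is a hypothetical callable here.
    """
    q = encode(window_text)
    q = q / (np.linalg.norm(q) + 1e-8)
    sims = order_embs @ q                       # order_embs rows are unit-norm
    top = np.argsort(-sims)[:k]
    return [(order_names[i], float(sims[i])) for i in top]

# Toy demonstration with random embeddings in place of the real encoder.
rng = np.random.default_rng(0)
names = ["chest x-ray", "cbc panel", "head ct"]
embs = rng.normal(size=(3, 32)); embs /= np.linalg.norm(embs, axis=1, keepdims=True)
fake_encode = lambda text: rng.normal(size=32)
print(retrieve_orders("the cough worsened overnight ...", fake_encode, embs, names))
```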







Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Authors:Giusy Spacone, Sebastian Frey, Mattia Orlandi, Pierangelo Maria Rapa, Victor Kartsch, Simone Benatti, Luca Benini, Andrea Cossettini
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and a R$^2$ score of $0.61\pm0.1$, with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
Paper and Project Links
PDF 5 pages, 3 figures
Summary
Hand gesture recognition from biosignals shows strong potential for intuitive human-machine interaction that closely mimics natural behavior. Sensor fusion combines complementary information and overcomes the limits of individual modalities; the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is especially promising. Prior solutions, however, rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. This work presents an ultra-low-power (sub-50 mW) system that concurrently acquires 8-channel EMG and 4-channel A-mode US, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. A framework continuously tracks 23 degrees of freedom (20 for the hand, 3 for the wrist), with a kinematic glove providing ground-truth labels. Lightweight encoder-decoder architectures with multi-task learning estimate hand and wrist joint angles simultaneously. Under realistic sensor repositioning, EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, versus $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and an R$^2$ score of $0.61\pm0.1$.
Key Takeaways
- Biosignal-based hand gesture recognition can enable intuitive human-machine interaction that mimics natural behavior.
- Sensor fusion combines complementary information, improving the robustness and reliability of the system.
- Fusing EMG with A-mode ultrasound (US) is particularly advantageous for gesture tracking.
- Existing solutions are limited by power consumption, unsuitable for multi-day use, and mostly restricted to discrete gesture classification.
- An ultra-low-power system acquires EMG and US signals concurrently in a fully wearable, dry-contact armband.
- The system continuously tracks 23 degrees of freedom (DoFs) of the hand and wrist via multi-task learning (a schematic multi-task head is sketched below).
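A schematic multi-task regression head for the 20 hand plus 3 wrist degrees of freedom could look like the PyTorch sketch below; the feature size, loss weighting, and data are placeholders, not the paper's encoder-decoder.

```python
import torch
import torch.nn as nn

class JointAngleHead(nn.Module):
    """Multi-task head regressing 20 hand DoFs and 3 wrist DoFs from a fused
    EMG+US feature vector; a stand-in with arbitrary sizes."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.hand = nn.Linear(feat_dim, 20)    # 20 hand joint angles (degrees)
        self.wrist = nn.Linear(feat_dim, 3)    # 3 wrist joint angles (degrees)

    def forward(self, feats):
        return self.hand(feats), self.wrist(feats)

head = JointAngleHead()
feats = torch.randn(16, 128)                   # fused EMG+US features per frame
hand_gt, wrist_gt = torch.randn(16, 20), torch.randn(16, 3)
hand_pred, wrist_pred = head(feats)
loss = nn.functional.mse_loss(hand_pred, hand_gt) + nn.functional.mse_loss(wrist_pred, wrist_gt)
rmse = torch.sqrt(nn.functional.mse_loss(hand_pred, hand_gt))  # metric reported above
```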





RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement
Authors:Libo Zhu, Zihan Zhou, Xiaoyang Liu, Weihang Zhang, Keyu Shi, Yifan Fu, Yulun Zhang
Capturing screens is now routine in our everyday lives. But the photographs of emissive displays are often influenced by the flicker-banding (FB), which is alternating bright-dark stripes that arise from temporal aliasing between a camera’s rolling-shutter readout and the display’s brightness modulation. Unlike moiré degradation, which has been extensively studied, the FB remains underexplored despite its frequent and severe impact on readability and perceived quality. We formulate FB removal as a dedicated restoration task and introduce Removal of Image Flicker-Banding via Latent Diffusion Enhancement, RIFLE, a diffusion-based framework designed to remove FB while preserving fine details. We propose the flicker-banding prior estimator (FPE) that predicts key banding attributes and injects it into the restoration network. Additionally, Masked Loss (ML) is proposed to concentrate supervision on banded regions without sacrificing global fidelity. To overcome data scarcity, we provide a simulation pipeline that synthesizes FB in the luminance domain with stochastic jitter in banding angle, banding spacing, and banding width. Feathered boundaries and sensor noise are also applied for a more realistic simulation. For evaluation, we collect a paired real-world FB dataset with pixel-aligned banding-free references captured via long exposure. Across quantitative metrics and visual comparisons on our real-world dataset, RIFLE consistently outperforms recent image reconstruction baselines from mild to severe flicker-banding. To the best of our knowledge, it is the first work to research the simulation and removal of FB. Our work establishes a great foundation for subsequent research in both the dataset construction and the removal model design. Our dataset and code will be released soon.
Paper and Project Links
Summary
This paper studies the flicker-banding (FB) that contaminates photographs of emissive displays and formulates its removal as a dedicated restoration task. It proposes RIFLE, a diffusion-based framework that removes FB while preserving fine details. A flicker-banding prior estimator (FPE) predicts key banding attributes and injects them into the restoration network, and a Masked Loss (ML) concentrates supervision on banded regions without sacrificing global fidelity. To overcome data scarcity, a simulation pipeline synthesizes FB in the luminance domain, with feathered boundaries and sensor noise for realism. On a paired real-world FB dataset, RIFLE outperforms recent image reconstruction baselines from mild to severe banding in both quantitative metrics and visual comparisons. It is the first work to study the simulation and removal of FB, laying a foundation for future dataset construction and removal model design.
Key Takeaways
- Flicker-banding (FB) is a common artifact in screen photographs that severely degrades readability and perceived quality.
- RIFLE, a diffusion-based framework, removes FB while preserving fine details.
- A flicker-banding prior estimator (FPE) predicts key banding attributes for the restoration network.
- Masked Loss (ML) concentrates supervision on banded regions while preserving global fidelity.
- To address data scarcity, a simulation pipeline synthesizes FB in the luminance domain, with stochastic jitter in banding angle, spacing, and width (a toy version is sketched below).
- Evaluation on a paired real-world dataset shows RIFLE outperforms existing image reconstruction baselines at FB removal.
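A toy version of the luminance-domain banding synthesis might look like the numpy sketch below; the parameter ranges for angle, spacing, duty cycle, feathering, and noise are guesses for illustration, not the paper's calibrated pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def add_flicker_banding(img, rng, angle_deg=None, spacing=None, width=0.5,
                        depth=0.3, feather=2.0, noise_std=0.01):
    """Synthesise flicker-banding in the luminance domain with stochastic
    jitter in angle, spacing, and duty cycle, loosely following the recipe
    summarised above (parameter ranges are guesses, not the paper's)."""
    h, w = img.shape
    angle = np.deg2rad(angle_deg if angle_deg is not None else rng.uniform(-5, 5))
    spacing = spacing if spacing is not None else rng.uniform(20, 60)   # pixels
    yy, xx = np.mgrid[0:h, 0:w]
    phase = (xx * np.sin(angle) + yy * np.cos(angle)) / spacing
    stripes = (np.mod(phase, 1.0) < width).astype(np.float32)  # dark bands
    stripes = gaussian_filter(stripes, feather)                # feathered edges
    banded = img * (1.0 - depth * stripes)                     # luminance dip
    return np.clip(banded + rng.normal(0, noise_std, img.shape), 0, 1)

rng = np.random.default_rng(0)
clean = np.full((256, 256), 0.8)        # stand-in clean screen photo
banded = add_flicker_banding(clean, rng)
```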





CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
Authors:Xi Zhang, Zaiqiao Meng, Jake Lever, Edmond S. L. Ho
Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.
Paper and Project Links
PDF Preprint, 27 pages, 3 figures
Summary
Multimodal large language models (MLLMs) combine visual perception with natural language understanding and have advanced radiology, but they often produce clinically unsupported descriptions, known as medical hallucinations, a serious risk for applications that demand accurate, image-grounded outputs. The paper proposes Clinical Contrastive Decoding (CCD), a training-free, retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. Applied to state-of-the-art radiology report generation (RRG) models, CCD yields up to a 17% RadGraph-F1 improvement on MIMIC-CXR, offering a lightweight, generalisable way to mitigate medical hallucinations and bridge expert models with MLLMs.
Key Takeaways
- Multimodal large language models (MLLMs) have made notable progress in radiology by combining visual perception with language understanding.
- MLLMs produce medical hallucinations, clinically unsupported descriptions that pose risks in accuracy-critical, image-grounded applications.
- Prompt-induced hallucinations are largely driven by over-sensitivity to clinical sections.
- Clinical Contrastive Decoding (CCD) is a training-free, retrieval-free inference framework addressing this (the core logit arithmetic is sketched below).
- CCD integrates structured clinical signals from task-specific radiology expert models via a dual-stage contrastive mechanism that refines token-level logits.
- CCD consistently improves radiology report generation, with up to a 17% RadGraph-F1 gain on MIMIC-CXR.
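The token-level arithmetic of contrastive decoding is simple to write down; the sketch below shows the generic form, while CCD's dual-stage mechanism and the construction of the expert-informed logits are specific to the paper and not reproduced here.

```python
import torch

def clinical_contrastive_logits(base_logits, expert_logits, alpha=0.5):
    """Token-level contrastive adjustment: push the next-token distribution
    toward what the clinically informed scorer prefers and away from the
    base model's unsupported tendencies.

    Generic contrastive-decoding arithmetic for illustration only.
    """
    return (1.0 + alpha) * expert_logits - alpha * base_logits

base = torch.randn(1, 32000)       # base MLLM next-token logits
expert = torch.randn(1, 32000)     # logits re-scored with expert signals
next_token = torch.argmax(clinical_contrastive_logits(base, expert), dim=-1)
```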




Refer to Any Segmentation Mask Group With Vision-Language Prompts
Authors:Shengcao Cao, Zijun Wei, Jason Kuen, Kangning Liu, Lingzhi Zhang, Jiuxiang Gu, HyunJoon Jung, Liang-Yan Gui, Yu-Xiong Wang
Recent image segmentation models have advanced to segment images into high-quality masks for visual entities, and yet they cannot provide comprehensive semantic understanding for complex queries based on both language and vision. This limitation reduces their effectiveness in applications that require user-friendly interactions driven by vision-language prompts. To bridge this gap, we introduce a novel task of omnimodal referring expression segmentation (ORES). In this task, a model produces a group of masks based on arbitrary prompts specified by text only or text plus reference visual entities. To address this new challenge, we propose a novel framework to “Refer to Any Segmentation Mask Group” (RAS), which augments segmentation models with complex multimodal interactions and comprehension via a mask-centric large multimodal model. For training and benchmarking ORES models, we create datasets MaskGroups-2M and MaskGroups-HQ to include diverse mask groups specified by text and reference entities. Through extensive evaluation, we demonstrate superior performance of RAS on our new ORES task, as well as classic referring expression segmentation (RES) and generalized referring expression segmentation (GRES) tasks. Project page: https://Ref2Any.github.io.
Paper and Project Links
PDF ICCV 2025
Summary
A new task, omnimodal referring expression segmentation (ORES), is proposed to address the inability of current segmentation models to provide comprehensive semantic understanding for complex queries that combine language and vision. The RAS ("Refer to Any Segmentation Mask Group") framework augments segmentation models with complex multimodal interaction and mask-centric comprehension. The MaskGroups-2M and MaskGroups-HQ datasets are created for training and benchmarking. Evaluations show RAS performs strongly on ORES as well as on classic referring expression segmentation (RES) and generalized referring expression segmentation (GRES).
Key Takeaways
- Existing segmentation models produce high-quality entity masks but cannot comprehensively understand complex queries that combine language and vision.
- The new task of omnimodal referring expression segmentation (ORES) is introduced to bridge this gap.
- In ORES, a model produces a group of masks from prompts given as text alone or text plus reference visual entities.
- The RAS ("Refer to Any Segmentation Mask Group") framework augments segmentation models with multimodal interaction and mask-centric comprehension via a large multimodal model.
- Two datasets, MaskGroups-2M and MaskGroups-HQ, cover diverse mask groups specified by text and reference entities for training and benchmarking.
- RAS achieves superior performance on ORES and on the classic RES and GRES tasks.





Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
Authors:Xingguang Wei, Haomin Wang, Shenglong Ye, Ruifeng Luo, Yanting Zhang, Lixin Gu, Jifeng Dai, Yu Qiao, Wenhai Wang, Hongjie Zhang
We study the task of panoptic symbol spotting, which involves identifying both individual instances of countable things and the semantic regions of uncountable stuff in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representation, but these approaches often suffer from high computational costs, limited generality, and loss of geometric structural information. In this paper, we propose VecFormer, a novel method that addresses these challenges through line-based representation of primitives. This design preserves the geometric continuity of the original primitive, enabling more accurate shape representation while maintaining a computation-friendly structure, making it well-suited for vector graphic understanding tasks. To further enhance prediction reliability, we introduce a Branch Fusion Refinement module that effectively integrates instance and semantic predictions, resolving their inconsistencies for more coherent panoptic outputs. Extensive experiments demonstrate that our method establishes a new state-of-the-art, achieving 91.1 PQ, with Stuff-PQ improved by 9.6 and 21.2 points over the second-best results under settings with and without prior information, respectively, highlighting the strong potential of line-based representation as a foundation for vector graphic understanding.
Paper and Project Links
Summary
This work studies panoptic symbol spotting in computer-aided design (CAD) drawings: identifying individual instances of countable things and semantic regions of uncountable stuff among vector graphical primitives. Existing approaches based on image rasterization, graph construction, or point-based representation suffer from high computational cost, limited generality, and loss of geometric structure. VecFormer addresses these issues with a line-based representation of primitives that preserves geometric continuity, enabling accurate shape representation within a computation-friendly structure suited to vector graphic understanding. A Branch Fusion Refinement module integrates instance and semantic predictions, resolving their inconsistencies for more coherent panoptic outputs. Experiments set a new state of the art at 91.1 PQ, with Stuff-PQ improved by 9.6 and 21.2 points over the second-best results with and without prior information, respectively, underscoring the potential of line-based representation for vector graphic understanding.
Key Takeaways
- VecFormer tackles panoptic symbol spotting in CAD drawings, covering both instances of countable things and regions of uncountable stuff.
- Existing approaches based on rasterization, graph construction, or point representations suffer from high computational cost, limited generality, and loss of geometric structure.
- VecFormer's line-based representation preserves the geometric continuity of primitives and yields accurate shape representation.
- The design keeps a computation-friendly structure, well suited to vector graphic understanding tasks.
- A Branch Fusion Refinement module integrates instance and semantic predictions, resolving their inconsistencies.
- Experiments show VecFormer sets a new state of the art, with notable gains in PQ and especially Stuff-PQ.



UNet with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning for Medical Image Segmentation
Authors:Saqib Qamar, Mohd Fazil, Parvez Ahmad, Shakir Khan, Abu Taha Zamani
Medical image segmentation plays an important role in various clinical applications; however, existing deep learning models face trade-offs between efficiency and accuracy. Convolutional Neural Networks (CNNs) capture local details well but miss the global context, whereas transformers handle the global context but at a high computational cost. Recently, State Space Sequence Models (SSMs) have shown potential for capturing long-range dependencies with linear complexity; however, their direct use in medical image segmentation remains limited due to incompatibility with image structures and autoregressive assumptions. To overcome these challenges, we propose SAMA-UNet, a novel U-shaped architecture that introduces two key innovations. First, the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block adaptively integrates local and global features through dynamic attention weighting, enabling an efficient representation of complex anatomical patterns. Second, the causal resonance multi-scale module (CR-MSM) improves encoder-decoder interactions by adjusting feature resolution and causal dependencies across scales, enhancing the semantic alignment between low- and high-level features. Extensive experiments on MRI, CT, and endoscopy datasets demonstrate that SAMA-UNet consistently outperforms CNN, Transformer, and Mamba-based methods. It achieves 85.38% DSC and 87.82% NSD on BTCV, 92.16% and 96.54% on ACDC, 67.14% and 68.70% on EndoVis17, and 84.06% and 88.47% on ATLAS23, establishing new benchmarks across modalities. These results confirm the effectiveness of SAMA-UNet in combining efficiency and accuracy, making it a promising solution for real-world clinical segmentation tasks. The source code is available on GitHub.
Paper and Project Links
Summary
Medical image segmentation is important in clinical practice, but existing deep models trade efficiency against accuracy. CNNs capture local detail but miss global context; transformers handle global context at high computational cost. State Space Sequence Models (SSMs) capture long-range dependencies with linear complexity, yet their direct use in medical image segmentation is limited by incompatibility with image structure and autoregressive assumptions. SAMA-UNet, a new U-shaped architecture, introduces the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which dynamically integrates local and global features, and the causal resonance multi-scale module (CR-MSM), which adjusts feature resolution and causal dependencies across scales to improve encoder-decoder interaction. Experiments on MRI, CT, and endoscopy datasets show SAMA-UNet combines efficiency and accuracy, setting new benchmarks across modalities.
Key Takeaways
- Medical image segmentation is central to many clinical applications, but deep models must balance efficiency against accuracy.
- CNNs and transformers have complementary weaknesses: CNNs capture local detail but miss global context, while transformers model global context at high computational cost.
- State Space Sequence Models (SSMs) can capture long-range dependencies with linear complexity, but their direct use in medical image segmentation remains limited.
- SAMA-UNet is a new U-shaped architecture whose SAMA and CR-MSM modules deliver both efficiency and accuracy.
- The SAMA block adaptively integrates local and global features through dynamic attention weighting (a minimal gated-fusion sketch follows below).
- The CR-MSM module adjusts feature resolution and causal dependencies across scales, improving encoder-decoder interaction and semantic alignment between low- and high-level features.
- Experiments on multiple datasets show SAMA-UNet sets new benchmarks for medical image segmentation across modalities.
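As a rough illustration of dynamically weighting local against global features, the PyTorch sketch below gates a convolutional branch against a pooled global branch; this is a generic pattern for illustration, not the SAMA block or anything Mamba-specific.

```python
import torch
import torch.nn as nn

class LocalGlobalGate(nn.Module):
    """Minimal dynamic weighting between a local (conv) branch and a global
    (pooled) branch, illustrating the idea of adaptively mixing local detail
    with global context; not the SAMA block itself."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, 3, padding=1)
        self.global_fc = nn.Linear(ch, ch)
        self.gate = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):                              # x: (B, C, H, W)
        local = self.local(x)
        g = self.global_fc(x.mean(dim=(2, 3)))         # global context vector
        glob = g[:, :, None, None].expand_as(x)        # broadcast to the grid
        w = torch.sigmoid(self.gate(torch.cat([local, glob], dim=1)))
        return w * local + (1.0 - w) * glob            # dynamic mixture

out = LocalGlobalGate()(torch.randn(2, 32, 64, 64))
```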

A Multimodal Deep Learning Approach for White Matter Shape Prediction in Diffusion MRI Tractography
Authors:Yui Lo, Yuqian Chen, Dongnan Liu, Leo Zekelman, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Fan Zhang, Weidong Cai, Lauren J. O’Donnell
Shape measures have emerged as promising descriptors of white matter tractography, offering complementary insights into anatomical variability and associations with cognitive and clinical phenotypes. However, conventional methods for computing shape measures are computationally expensive and time-consuming for large-scale datasets due to reliance on voxel-based representations. We propose Tract2Shape, a novel multimodal deep learning framework that leverages geometric (point cloud) and scalar (tabular) features to predict ten white matter tractography shape measures. To enhance model efficiency, we utilize a dimensionality reduction algorithm for the model to predict five primary shape components. The model is trained and evaluated on two independently acquired datasets, the HCP-YA dataset, and the PPMI dataset. We evaluate the performance of Tract2Shape by training and testing it on the HCP-YA dataset and comparing the results with state-of-the-art models. To further assess its robustness and generalization ability, we also test Tract2Shape on the unseen PPMI dataset. Tract2Shape outperforms SOTA deep learning models across all ten shape measures, achieving the highest average Pearson’s r and the lowest nMSE on the HCP-YA dataset. The ablation study shows that both multimodal input and PCA contribute to performance gains. On the unseen testing PPMI dataset, Tract2Shape maintains a high Pearson’s r and low nMSE, demonstrating strong generalizability in cross-dataset evaluation. Tract2Shape enables fast, accurate, and generalizable prediction of white matter shape measures from tractography data, supporting scalable analysis across datasets. This framework lays a promising foundation for future large-scale white matter shape analysis.
Paper and Project Links
PDF 25 pages, 3 figures, 8 tables
Summary
Shape measures are promising descriptors of white matter tractography, offering complementary insights into anatomical variability and associations with cognitive and clinical phenotypes, but conventional voxel-based computation is expensive and slow at scale. Tract2Shape is a multimodal deep learning framework using geometric (point cloud) and scalar (tabular) features to predict ten tractography shape measures; for efficiency, a dimensionality reduction step has the model predict five primary shape components. Trained and evaluated on the HCP-YA and PPMI datasets, Tract2Shape outperforms state-of-the-art models on HCP-YA and generalizes well to the unseen PPMI dataset, enabling fast, accurate, and generalizable prediction of white matter shape measures and laying groundwork for large-scale white matter shape analysis.
Key Takeaways
- Shape measures of white matter tractography provide complementary information for cognitive and clinical phenotype research.
- Conventional voxel-based computation of shape measures is expensive and time-consuming on large datasets.
- Tract2Shape applies multimodal deep learning to point cloud and tabular features to predict ten tractography shape measures efficiently and accurately.
- A dimensionality reduction step predicts five primary shape components to boost efficiency (a toy PCA pipeline is sketched below).
- Experiments on two independent datasets show Tract2Shape outperforms state-of-the-art deep learning models.
- Tract2Shape generalizes well to the unseen PPMI dataset in cross-dataset evaluation.
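The component-prediction trick is easy to emulate with scikit-learn: fit PCA on the ten shape measures, regress the five components, and inverse-transform predictions. The regressor and random data below are placeholders for Tract2Shape's multimodal network and real tract features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# Toy illustration of the dimensionality-reduction trick described above:
# regress five principal components of the ten shape measures, then map
# predictions back to the full measure space.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))                 # stand-in tract features
Y = rng.normal(size=(500, 10))                 # ten shape measures

pca = PCA(n_components=5).fit(Y)
Z = pca.transform(Y)                           # five primary shape components
reg = Ridge().fit(X[:400], Z[:400])            # train on component targets
Y_pred = pca.inverse_transform(reg.predict(X[400:]))   # back to 10 measures
print(Y_pred.shape)                            # (100, 10)
```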


X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Authors:Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan
Four-dimensional computed tomography (4D CT) reconstruction is crucial for capturing dynamic anatomical changes but faces inherent limitations from conventional phase-binning workflows. Current methods discretize temporal resolution into fixed phases with respiratory gating devices, introducing motion misalignment and restricting clinical practicality. In this paper, We propose X$^2$-Gaussian, a novel framework that enables continuous-time 4D-CT reconstruction by integrating dynamic radiative Gaussian splatting with self-supervised respiratory motion learning. Our approach models anatomical dynamics through a spatiotemporal encoder-decoder architecture that predicts time-varying Gaussian deformations, eliminating phase discretization. To remove dependency on external gating devices, we introduce a physiology-driven periodic consistency loss that learns patient-specific breathing cycles directly from projections via differentiable optimization. Extensive experiments demonstrate state-of-the-art performance, achieving a 9.93 dB PSNR gain over traditional methods and 2.25 dB improvement against prior Gaussian splatting techniques. By unifying continuous motion modeling with hardware-free period learning, X$^2$-Gaussian advances high-fidelity 4D CT reconstruction for dynamic clinical imaging. Code is publicly available at: https://x2-gaussian.github.io/.
Paper and Project Links
PDF Project Page: https://x2-gaussian.github.io/
Summary
X$^2$-Gaussian is a framework for continuous-time 4D CT reconstruction that integrates dynamic radiative Gaussian splatting with self-supervised respiratory motion learning. A spatiotemporal encoder-decoder predicts time-varying Gaussian deformations, eliminating phase discretization and the dependence on respiratory gating devices. A physiology-driven periodic consistency loss learns patient-specific breathing cycles directly from projections. Experiments show a 9.93 dB PSNR gain over traditional methods and a 2.25 dB gain over prior Gaussian splatting techniques. Code: https://x2-gaussian.github.io/.
Key Takeaways
- X$^2$-Gaussian enables continuous-time 4D CT reconstruction, overcoming the inherent limits of phase-binning workflows.
- A spatiotemporal encoder-decoder models anatomical dynamics by predicting time-varying Gaussian deformations.
- Integrating dynamic radiative Gaussian splatting with self-supervised respiratory motion learning removes the need for respiratory gating devices.
- A physiology-driven periodic consistency loss learns patient-specific breathing cycles directly from projections via differentiable optimization (a toy version is sketched below).
- X$^2$-Gaussian achieves significant PSNR gains over both traditional methods and earlier Gaussian splatting techniques.
- The framework unifies continuous motion modeling with hardware-free period learning for high-fidelity 4D CT in dynamic clinical imaging.
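The periodic consistency idea can be demonstrated on a toy signal: because the loss is differentiable in the period, gradient descent recovers the breathing cycle from samples alone. The sketch below applies it to a synthetic sine motion rather than projection data, so everything here is a stand-in.

```python
import torch

def periodic_consistency_loss(model, t, period):
    """Self-supervised period learning: a motion model evaluated one breathing
    period apart should agree, and `period` can be optimised by gradient
    descent because the loss is differentiable in it."""
    return ((model(t) - model(t + period)) ** 2).mean()

# Toy motion signal with true period 4.0; recover it from a bad initial guess.
motion = lambda t: torch.sin(2 * torch.pi * t / 4.0)
period = torch.tensor(3.0, requires_grad=True)
opt = torch.optim.Adam([period], lr=0.05)
for _ in range(500):
    t = torch.rand(256) * 20.0
    loss = periodic_consistency_loss(motion, t, period)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(period))    # should approach a multiple of 4.0
```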






Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors
Authors:Barbara Bodinier, Gaetan Dissez, Lucile Ter-Minassian, Linus Bleistein, Roberta Codato, John Klein, Eric Durand, Antonin Dauvin
High-throughput preclinical perturbation screens, where the effects of genetic, chemical, or environmental perturbations are systematically tested on disease models, hold significant promise for machine learning-enhanced drug discovery due to their scale and causal nature. Predictive models trained on such datasets can be used to (i) infer perturbation response for previously untested disease models, and (ii) characterise the biological context that affects perturbation response. Existing predictive models suffer from limited reproducibility, generalisability and interpretability. To address these issues, we introduce a framework of Layered Ensemble of Autoencoders and Predictors (LEAP), a general and flexible ensemble strategy to aggregate predictions from multiple regressors trained using diverse gene expression representation models. LEAP consistently improves prediction performances in unscreened cell lines across modelling strategies. In particular, LEAP applied to perturbation-specific LASSO regressors (PS-LASSO) provides a favorable balance between near state-of-the-art performance and low computation time. We also propose an interpretability approach combining model distillation and stability selection to identify important biological pathways for perturbation response prediction in LEAP. Our models have the potential to accelerate the drug discovery pipeline by guiding the prioritisation of preclinical experiments and providing insights into the biological mechanisms involved in perturbation response. The code and datasets used in this work are publicly available.
Paper and Project Links
Summary
High-throughput preclinical perturbation screens, which systematically test genetic, chemical, or environmental perturbations on disease models, hold great promise for machine learning-enhanced drug discovery. To improve on the limited reproducibility, generalisability, and interpretability of existing models, this work introduces LEAP (Layered Ensemble of Autoencoders and Predictors), a flexible ensemble strategy that aggregates predictions from multiple regressors trained on diverse gene expression representations. LEAP consistently improves prediction in unscreened cell lines. An interpretability approach combining model distillation and stability selection identifies biological pathways important for perturbation response prediction. The models can accelerate drug discovery by guiding the prioritisation of preclinical experiments and illuminating the biology of perturbation response.
Key Takeaways
- High-throughput preclinical perturbation screens are highly valuable for machine learning-assisted drug discovery.
- Existing predictive models suffer from limited reproducibility, generalisability, and interpretability.
- The LEAP framework aggregates predictions from multiple regressors trained on diverse gene expression representations, improving performance.
- LEAP applied to perturbation-specific LASSO regressors (PS-LASSO) balances near state-of-the-art performance with low computation time (a minimal per-perturbation sketch follows below).
- An interpretability approach combining model distillation and stability selection identifies important biological pathways.
- The models can accelerate the drug discovery pipeline by guiding the prioritisation of preclinical experiments.
- The code and datasets used in this work are publicly available.
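A minimal per-perturbation LASSO layer in the spirit of PS-LASSO can be sketched with scikit-learn; the representation-learning (autoencoder) layer and the ensembling across representations are omitted, and the data below are random placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso

class PerturbationLasso:
    """One sparse linear model per perturbation on top of a shared gene
    expression representation, in the spirit of PS-LASSO."""
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha
        self.models = {}

    def fit(self, X, responses):                 # responses: {pert: y vector}
        for pert, y in responses.items():
            self.models[pert] = Lasso(alpha=self.alpha).fit(X, y)
        return self

    def predict(self, X):
        return {p: m.predict(X) for p, m in self.models.items()}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                   # cell-line representations
resp = {"gene_A_ko": rng.normal(size=200), "drug_B": rng.normal(size=200)}
model = PerturbationLasso().fit(X[:150], {k: v[:150] for k, v in resp.items()})
preds = model.predict(X[150:])                   # responses for unscreened lines
```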

