
Medical Images


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only — use with caution.
🔴 Note: never use these summaries in serious academic settings; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-07

Model order reduction via Lie groups

Authors:Yannik P. Wotte, Patrick Buchfink, Silke Glas, Federico Califano, Stefano Stramigioli

Lie groups and their actions are ubiquitous in the description of physical systems, and we explore the implications in the setting of model order reduction (MOR). We present a novel framework of MOR via Lie groups, called MORLie, in which high-dimensional dynamical systems on manifolds are approximated by low-dimensional dynamical systems on Lie groups. In comparison to other Lie group methods we are able to attack non-equivariant dynamics, which are frequent in practical applications, and we provide new non-intrusive MOR methods based on the presented geometric formulation. We also highlight numerically that MORLie has a lower error bound than the Kolmogorov $N$-width, which limits linear-subspace methods. The method is applied to various examples: 1. MOR of a simplified deforming body modeled by noisy point cloud data following a shearing motion, where MORLie outperforms a naive POD approach in terms of accuracy and dimensionality reduction. 2. Reconstructing liver motion during respiration with data from edge detection in ultrasound scans, where MORLie reaches performance approaching the state of the art, while reducing the training time from hours on a computing cluster to minutes on a mobile workstation. 3. An analytic example in which the method of freezing is recovered as a special case, demonstrating the generality of the geometric framework.


Paper & Project Links

PDF 22 pages, 21 figures

Summary

This paper presents MORLie, a framework for model order reduction (MOR) via Lie groups, in which high-dimensional dynamical systems on manifolds are approximated by low-dimensional dynamical systems on Lie groups. Unlike other Lie group methods, MORLie can handle the non-equivariant dynamics common in practical applications. The paper also derives non-intrusive MOR methods from the geometric formulation and shows that MORLie admits an error bound below the Kolmogorov N-width. The method is applied to several examples, including MOR of a simplified deforming body, liver-motion reconstruction, and an analytic example, demonstrating its accuracy and efficiency.

Key Takeaways

  1. The MORLie framework introduces Lie groups into model order reduction (MOR) to approximate high-dimensional dynamical systems on manifolds.
  2. Compared with other Lie group methods, MORLie can handle non-equivariant dynamics, which are common in practice.
  3. New non-intrusive MOR methods are derived from the geometric formulation.
  4. MORLie admits an error bound below the Kolmogorov N-width.
  5. On the simplified deforming-body example, MORLie outperforms a naive POD approach in both accuracy and dimensionality reduction.
  6. On liver-motion reconstruction, MORLie approaches state-of-the-art performance while cutting training time from hours on a computing cluster to minutes on a mobile workstation.
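
The takeaways above describe approximating high-dimensional dynamics by low-dimensional dynamics on a Lie group. As a much simpler illustration of that flavor (not the paper's MORLie algorithm), the sketch below reduces a rigid point-cloud snapshot to a single SE(3) element via the Kabsch algorithm; the toy data and function names are hypothetical:

```python
import numpy as np

def fit_rigid_motion(X, Y):
    """Kabsch algorithm: best rotation R and translation t with Y ~ X @ R.T + t."""
    cx, cy = X.mean(0), Y.mean(0)
    H = (X - cx).T @ (Y - cy)               # cross-covariance of centered clouds
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    D = np.diag([1.0] * (X.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = cy - R @ cx
    return R, t

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # reference point cloud
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Y = X @ R_true.T + np.array([1.0, -2.0, 0.5])
R_est, t_est = fit_rigid_motion(X, Y)
err = np.linalg.norm(R_est - R_true)
```

Here 200 x 3 coordinates collapse to 6 degrees of freedom (one rotation plus one translation), which is the kind of group-valued compression MORLie generalizes to dynamics.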


Measuring accretion disc properties in the transitional millisecond pulsar PSR J1023+0038 using XMM-Newton, NuSTAR, NICER and Chandra

Authors:Vishal Jadoliya, Mayukh Pahari, Sudip Bhattacharyya, Shaswat Suresh Nair

Whether the accretion disc in the X-ray high-mode of transitional millisecond pulsars (tMSPs) reaches near the neutron star surface by penetrating the magnetosphere is a crucial question with many implications, including for continuous gravitational wave emission from the pulsar. We attempt to answer this question for the tMSP PSR J1023+0038 by segregating high-mode data and performing detailed spectral analysis using XMM-Newton EPIC-PN+MOS1+MOS2 joint observations, XMM-Newton+NuSTAR joint observations, and individual NICER and Chandra observations during different epochs. With the sum of the longest exposures ($\sim$202 ksec of high-mode data from $\sim$364 ksec of total exposure), we performed a self-consistent spectral analysis and constrain the inner disc radius to 16.8 $\pm$ 3.8 km with at least 3$\sigma$ significance. This measurement is consistent, within 3$\sigma$ limits, with the best-fit inner disc radii from other observatories like NICER and from the joint XMM-Newton and NuSTAR observations. We also detect an Fe emission line at 6.45 keV, for the first time from a tMSP, in the Chandra spectrum with 99% significance and an upper limit on the inner disc radius of 21 R$_g$, independently supporting an inner disc that extends into the neutron star's magnetosphere during the high mode. All results from our analysis imply that the accretion disc is significantly present and extends well within the corotation radius of the neutron star during the X-ray high-mode of the tMSP PSR J1023+0038. The measured range of inner disc radius is fully consistent with an independent analysis by Bhattacharyya (2020), which suggests continuous gravitational wave emission from this neutron star, and with the standard model of X-ray pulsations in accreting MSPs.


Paper & Project Links

PDF 18 pages, 3 tables, 12 figures, Accepted for publication in the Journal of High Energy Astrophysics

Summary

Whether the accretion disc of the transitional millisecond pulsar PSR J1023+0038 penetrates the magnetosphere and reaches near the neutron star surface during the X-ray high mode is a key question, with implications for continuous gravitational wave emission from the pulsar. By segregating high-mode data and performing detailed spectral analysis, the authors constrain the inner disc radius to 16.8 ± 3.8 km, consistent with NICER and other observations. A 6.45 keV Fe emission line, detected for the first time from a tMSP in the Chandra spectrum, independently supports an inner disc extending into the neutron star's magnetosphere during the high mode. All results indicate that the accretion disc extends well within the corotation radius of the neutron star during the X-ray high mode.

Key Takeaways

  1. The study addresses whether the accretion disc of the tMSP PSR J1023+0038 reaches near the neutron star surface during the X-ray high mode.
  2. Spectral analysis constrains the inner disc radius to about 16.8 km, consistent with NICER and other observations.
  3. An Fe emission line from a tMSP is detected for the first time in the Chandra spectrum, further supporting an inner disc that extends into the magnetosphere during the high mode.
  4. All results indicate the accretion disc lies well within the neutron star's corotation radius during the X-ray high mode, with implications for continuous gravitational wave emission.
  5. The results agree with the independent analysis of Bhattacharyya (2020), which suggests continuous gravitational wave emission from this neutron star.
  6. The findings are consistent with the standard model of X-ray pulsations in accreting millisecond pulsars.
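
For intuition on the 21 R$_g$ upper limit quoted above, the gravitational radius $R_g = GM/c^2$ can be converted to kilometres; the 1.4 solar-mass value below is an illustrative assumption, not a figure taken from the paper:

```python
# Gravitational radius R_g = G*M/c^2 for an *assumed* 1.4 solar-mass neutron star
G = 6.674e-11        # m^3 kg^-1 s^-2
c = 2.998e8          # m/s
M_sun = 1.989e30     # kg
M = 1.4 * M_sun
R_g_km = G * M / c**2 / 1e3   # ~2.07 km for this assumed mass
upper_limit_km = 21 * R_g_km  # the 21 R_g upper limit expressed in km
```

Under this assumed mass, 21 R$_g$ corresponds to roughly 43 km, i.e. the same order as the 16.8 ± 3.8 km best-fit radius, which is why the two constraints are mutually compatible.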


Robust Alignment of the Human Embryo in 3D Ultrasound using PCA and an Ensemble of Heuristic, Atlas-based and Learning-based Classifiers Evaluated on the Rotterdam Periconceptional Cohort

Authors:Nikolai Herrmann, Marcella C. Zijta, Stefan Klein, Régine P. M. Steegers-Theunissen, Rene M. H. Wijnen, Bernadette S. de Bakker, Melek Rousian, Wietske A. P. Bastiaansen

Standardized alignment of the embryo in three-dimensional (3D) ultrasound images aids prenatal growth monitoring by facilitating standard plane detection, improving visualization of landmarks and accentuating differences between scans. In this work, we propose an automated method for standardizing this alignment. Given a segmentation mask of the embryo, Principal Component Analysis (PCA) is applied to the mask to extract the embryo's principal axes, from which four candidate orientations are derived. The candidate in standard orientation is selected using one of three strategies: a heuristic based on Pearson's correlation assessing shape, image matching to an atlas through normalized cross-correlation, and a Random Forest classifier. We tested our method on 2166 longitudinally acquired 3D ultrasound scans from 1043 pregnancies in the Rotterdam Periconceptional Cohort, ranging from 7+0 to 12+6 weeks of gestational age. In 99.0% of images, PCA correctly extracted the principal axes of the embryo. The correct candidate was selected by the Pearson heuristic, the atlas-based method and the Random Forest in 97.4%, 95.8%, and 98.4% of images, respectively. A majority vote over these selection methods resulted in an accuracy of 98.5%. The high accuracy of this pipeline enables consistent embryonic alignment in the first trimester, enabling scalable analysis in both clinical and research settings. The code is publicly available at: https://gitlab.com/radiology/prenatal-image-analysis/pca-3d-alignment.


Paper & Project Links

PDF Submitted version of paper accepted at International Workshop on Preterm, Perinatal and Paediatric Image Analysis 2025

Summary

Standardized alignment of the embryo in 3D ultrasound images aids prenatal growth monitoring through better standard-plane detection, landmark visualization, and comparison between scans. This work proposes an automated method: PCA is applied to the embryo's segmentation mask to extract its principal axes, from which four candidate orientations are derived. The standard orientation is selected by one of three strategies: a Pearson-correlation shape heuristic, atlas matching via normalized cross-correlation, or a Random Forest classifier. Tested on 2166 longitudinally acquired 3D ultrasound scans from 1043 pregnancies in the Rotterdam Periconceptional Cohort, PCA extracted the correct principal axes in 99.0% of images; the Pearson heuristic, atlas-based method, and Random Forest chose the correct candidate in 97.4%, 95.8%, and 98.4% of images, and a majority vote reached 98.5%. The pipeline enables consistent first-trimester embryonic alignment and scalable analysis in clinical and research settings. Code: https://gitlab.com/radiology/prenatal-image-analysis/pca-3d-alignment.

Key Takeaways

  1. Standardized embryo alignment in 3D ultrasound aids prenatal growth monitoring.
  2. An automated method for standardizing this alignment is proposed.
  3. PCA extracts the embryo's principal axes, from which four candidate orientations are derived.
  4. The standard orientation is selected by a Pearson-correlation heuristic, atlas matching, or a Random Forest classifier.
  5. All components achieved high accuracy on a large longitudinal test set.
  6. The pipeline's high accuracy enables consistent embryonic alignment in the first trimester.
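
The first two steps above (PCA on the segmentation mask, four candidate orientations) can be sketched as follows; the dummy mask and the particular sign-flip convention are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def candidate_orientations(mask):
    """PCA on voxel coordinates of a binary mask: principal axes plus the four
    right-handed sign combinations, mirroring the four orientation candidates."""
    pts = np.argwhere(mask).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    vecs = vecs[:, ::-1]                  # columns sorted by decreasing variance
    if np.linalg.det(vecs) < 0:
        vecs[:, 2] *= -1                  # enforce a right-handed frame
    # Sign flips with determinant +1 give the four right-handed candidates.
    flips = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    return centroid, [vecs @ np.diag(s) for s in flips]

mask = np.zeros((40, 20, 12), dtype=bool)
mask[5:35, 5:15, 3:9] = True              # elongated dummy "embryo" along axis 0
centroid, candidates = candidate_orientations(mask)
```

A downstream selector (heuristic, atlas, or classifier) would then score these four rotation matrices and keep one.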


Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas

Authors:Syed Muqeem Mahmood, Hassan Mohy-ud-Din

We present a framework that combines Large Language Models with computational image analytics for non-invasive, zero-shot prediction of IDH mutation status in brain gliomas. For each subject, coregistered multi-parametric MRI scans and multi-class tumor segmentation maps were processed to extract interpretable semantic (visual) attributes and quantitative features, serialized in a standardized JSON file, and used to query GPT 4o and GPT 5 without fine-tuning. We evaluated this framework on six publicly available datasets (N = 1427) and results showcased high accuracy and balanced classification performance across heterogeneous cohorts, even in the absence of manual annotations. GPT 5 outperformed GPT 4o in context-driven phenotype interpretation. Volumetric features emerged as the most important predictors, supplemented by subtype-specific imaging markers and clinical information. Our results demonstrate the potential of integrating LLM-based reasoning with computational image analytics for precise, non-invasive tumor genotyping, advancing diagnostic strategies in neuro-oncology. The code is available at https://github.com/ATPLab-LUMS/CIM-LLM.


Paper & Project Links

PDF 5 pages, 1 figure, 3 tables

Summary

This study proposes a framework combining Large Language Models with computational image analytics for non-invasive, zero-shot prediction of IDH mutation status in brain gliomas. Coregistered multi-parametric MRI scans and multi-class tumor segmentation maps are processed to extract interpretable semantic (visual) attributes and quantitative features, serialized into a standardized JSON file, and used to query GPT-4o and GPT-5 without fine-tuning. Evaluated on six public datasets (N = 1427), the framework achieved high accuracy and balanced classification across heterogeneous cohorts, even without manual annotations. GPT-5 outperformed GPT-4o in context-driven phenotype interpretation, and volumetric features were the most important predictors, supplemented by subtype-specific imaging markers and clinical information. Code: https://github.com/ATPLab-LUMS/CIM-LLM.

Key Takeaways

  1. A framework combining LLMs with computational image analytics is proposed for non-invasive prediction of IDH mutation status in brain gliomas.
  2. Features extracted from multi-parametric MRI and tumor segmentation maps are standardized and used to query the language models.
  3. Evaluation on multiple public datasets shows high accuracy and balanced classification across cohorts without manual annotation.
  4. GPT-5 outperforms GPT-4o in context-driven phenotype interpretation.
  5. Volumetric features are the most important predictors of IDH mutation status.
  6. Subtype-specific imaging markers and clinical information further improve prediction.
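
The feature-serialization step above (a standardized JSON file used to query the model) might look like the following sketch; the schema, field names, and prompt wording are hypothetical, not the paper's actual format:

```python
import json

# Hypothetical per-subject record: semantic attributes plus quantitative features
subject = {
    "subject_id": "case_001",
    "semantic_attributes": {"enhancement": "rim", "necrosis": "present"},
    "quantitative_features": {"tumor_volume_ml": 42.7, "edema_volume_ml": 18.3},
}
payload = json.dumps(subject, indent=2, sort_keys=True)
prompt = (
    "Given the following MRI-derived features, predict IDH mutation status "
    "(mutant vs. wildtype) and justify briefly:\n" + payload
)
```

The prompt string would then be sent to the LLM as-is; because no fine-tuning is involved, the entire "model" is this serialization plus the query.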


Morpho-Genomic Deep Learning for Ovarian Cancer Subtype and Gene Mutation Prediction from Histopathology

Authors:Gabriela Fernandes

Ovarian cancer remains one of the most lethal gynecological malignancies, largely due to late diagnosis and extensive heterogeneity across subtypes. Current diagnostic methods are limited in their ability to reveal underlying genomic variations essential for precision oncology. This study introduces a novel hybrid deep learning pipeline that integrates quantitative nuclear morphometry with deep convolutional image features to perform ovarian cancer subtype classification and gene mutation inference directly from Hematoxylin and Eosin (H&E) histopathological images. Using $\sim45,000$ image patches sourced from The Cancer Genome Atlas (TCGA) and public datasets, a fusion model combining a ResNet-50 Convolutional Neural Network (CNN) encoder and a Vision Transformer (ViT) was developed. This model successfully captured both local morphological texture and global tissue context. The pipeline achieved a robust overall subtype classification accuracy of $84.2\%$ (Macro AUC of $0.87 \pm 0.03$). Crucially, the model demonstrated the capacity for gene mutation inference with moderate-to-high accuracy: $AUC_{TP53} = 0.82 \pm 0.02$, $AUC_{BRCA1} = 0.76 \pm 0.04$, and $AUC_{ARID1A} = 0.73 \pm 0.05$. Feature importance analysis established direct quantitative links, revealing that nuclear solidity and eccentricity were the dominant predictors for TP53 mutation. These findings validate that quantifiable histological phenotypes encode measurable genomic signals, paving the way for cost-effective, precision histopathology in ovarian cancer triage and diagnosis.


Paper & Project Links

PDF

Summary

This study introduces a hybrid deep learning pipeline that combines quantitative nuclear morphometry with deep convolutional image features for ovarian cancer subtype classification and gene mutation inference directly from H&E histopathology images. The model captures both local morphological texture and global tissue context, achieving robust subtype classification (macro AUC 0.87 ± 0.03) and moderate-to-high accuracy in gene mutation inference. Feature-importance analysis reveals nuclear solidity and eccentricity as the dominant predictors of TP53 mutation, paving the way for cost-effective precision histopathology in ovarian cancer triage and diagnosis.

Key Takeaways

  • Ovarian cancer is a highly lethal gynecological malignancy, and current diagnostics struggle to reveal the underlying genomic variations.
  • A hybrid deep learning pipeline integrating quantitative nuclear morphometry with deep convolutional image features is proposed.
  • The model performs ovarian cancer subtype classification and gene mutation inference directly from H&E histopathology images.
  • It achieves robust subtype classification (macro AUC 0.87 ± 0.03) and moderate-to-high accuracy in mutation inference.
  • Feature-importance analysis identifies nuclear solidity and eccentricity as the key predictors of TP53 mutation.
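
Eccentricity, one of the two dominant TP53 predictors named above, can be computed from the second-order moments of a nuclear mask; this numpy-only sketch is illustrative and not the paper's feature-extraction code:

```python
import numpy as np

def nuclear_eccentricity(mask):
    """Eccentricity of the moment-equivalent ellipse of a 2D binary mask:
    sqrt(1 - minor_variance / major_variance), 0 for a circle, near 1 for a line."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1).astype(float)
    cov = np.cov((pts - pts.mean(axis=0)).T)
    l_minor, l_major = np.sort(np.linalg.eigvalsh(cov))
    return np.sqrt(max(0.0, 1.0 - l_minor / l_major))

# Elongated "nucleus" (rectangle) vs. a nearly round one (disc)
rect = np.zeros((40, 40), dtype=bool)
rect[10:30, 18:22] = True
yy, xx = np.mgrid[:40, :40]
disc = (yy - 20) ** 2 + (xx - 20) ** 2 <= 100
```

Libraries such as scikit-image expose the same quantity (with solidity and more) via `regionprops`, so in practice one would rarely hand-roll this.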


The nexus between negative charge-transfer and reduced on-site Coulomb energy in correlated topological metals

Authors:A. R. Shelke, C. -W. Chuang, S. Hamamoto, M. Oura, M. Yoshimura, N. Hiraoka, C. -N. Kuo, C. -S. Lue, A. Fujimori, A. Chainani

The layered $3d$ transition metal dichalcogenides (TMDs) CoTe$_2$ and NiTe$_2$ are topological Dirac Type-II metals. Their $d$-bands do not exhibit the expected correlation-induced band narrowing seen in CoO and NiO. We address this conundrum by quantifying the on-site Coulomb energy $U_{dd}$ via the single-particle partial density of states and the two-hole correlation satellite using valence band resonant photoemission spectroscopy (PES), and obtain $U_{dd}$ = 3.0 eV/3.7 eV for CoTe$_2$/NiTe$_2$. Charge-transfer (CT) cluster model simulations of the measured core-level PES and x-ray absorption spectra of CoTe$_2$ and CoO validate their contrasting electronic parameters: $U_{dd}$ and CT energy $\Delta$ are (3.0 eV, -2.0 eV) for CoTe$_2$, and (5.0 eV, 4.0 eV) for CoO, respectively. The $d$-$p$ hybridization strength $T_{eg}$ for CoTe$_2$ is smaller than for CoO, indicating that the reduced $U_{dd}$ in CoTe$_2$ is not due to $T_{eg}$. The increase in $d^n$-count $\sim$1 by CT from ligand to Co site in CoTe$_2$ is due to a negative $\Delta$ and reduced $U_{dd}$. Yet, only because $U_{dd} > |\Delta|$, CoTe$_2$ becomes a topological metal with $p \rightarrow p$ type lowest energy excitations. Similarly, we obtain a negative $\Delta$ and reduced $U_{dd}$ in NiTe$_2$ compared to NiO. The study reveals the nexus between negative $\Delta$ and reduced $U_{dd}$ required for setting up the electronic structure framework for achieving topological behavior via band inversion in correlated metals.


Paper & Project Links

PDF 8 pages + 5 figures(main) and 10 pages + 9 figures (SM) (submitted to PRB)

Summary

The layered transition metal dichalcogenides CoTe$_2$ and NiTe$_2$ are topological Dirac Type-II metals whose $d$-bands do not show the correlation-induced narrowing seen in CoO and NiO. By quantifying the on-site Coulomb energy $U_{dd}$ via the single-particle partial density of states and the two-hole correlation satellite in resonant photoemission, the authors obtain $U_{dd}$ = 3.0 eV for CoTe$_2$ and 3.7 eV for NiTe$_2$. Charge-transfer cluster-model simulations of core-level PES and X-ray absorption spectra validate the contrasting electronic parameters of CoTe$_2$ and CoO, revealing the link between electronic structure and topological behavior. The study shows that a negative charge-transfer energy $\Delta$ together with a reduced $U_{dd}$ is key to realizing topological metals.

Key Takeaways

  1. CoTe$_2$ and NiTe$_2$ are topological Dirac Type-II metals whose $d$-band behavior differs from that of CoO and NiO.
  2. From the partial density of states and the two-hole correlation satellite, $U_{dd}$ = 3.0 eV for CoTe$_2$ and 3.7 eV for NiTe$_2$.
  3. Charge-transfer cluster-model simulations of core-level PES and X-ray absorption spectra validate the contrasting electronic parameters of CoTe$_2$ and CoO.
  4. A negative charge-transfer energy $\Delta$ and a reduced $U_{dd}$ are essential for topological-metal behavior.
  5. The $d$-$p$ hybridization strength in CoTe$_2$ is lower than in CoO, indicating that the reduced $U_{dd}$ is not caused by hybridization.
  6. Charge transfer from ligand to Co increases the $d^n$-count by about 1, due to the negative $\Delta$ and reduced $U_{dd}$; yet only because $U_{dd} > |\Delta|$ does CoTe$_2$ become a topological metal.


Enhancing Medical Image Segmentation via Heat Conduction Equation

Authors:Rong Wu, Yim-Sang Yu

Medical image segmentation has been significantly advanced by deep learning architectures, notably U-Net variants. However, existing models struggle to simultaneously achieve efficient global context modeling and long-range dependency reasoning under practical computational budgets. In this work, we propose a novel hybrid architecture utilizing U-Mamba with the Heat Conduction Equation. Our model combines Mamba-based state-space modules for efficient long-range reasoning with Heat Conduction Operators (HCOs) in the bottleneck layers, simulating frequency-domain thermal diffusion for enhanced semantic abstraction. Experimental results on multimodal abdominal CT and MRI datasets demonstrate that the proposed model consistently outperforms strong baselines, validating its effectiveness and generalizability. This suggests that blending state-space dynamics with heat-based global diffusion offers a scalable and interpretable solution for medical segmentation tasks.


Paper & Project Links

PDF

Summary

Deep learning architectures, notably U-Net variants, have advanced medical image segmentation, but existing models struggle to combine efficient global context modeling with long-range dependency reasoning under practical computational budgets. This work proposes a hybrid architecture that augments U-Mamba with the heat conduction equation: Mamba-based state-space modules handle efficient long-range reasoning, while Heat Conduction Operators (HCOs) in the bottleneck layers simulate frequency-domain thermal diffusion for enhanced semantic abstraction. Experiments on multimodal abdominal CT and MRI datasets show consistent gains over strong baselines, suggesting that blending state-space dynamics with heat-based global diffusion offers a scalable and interpretable solution for medical segmentation.

Key Takeaways

  1. Deep learning architectures, especially U-Net variants, have significantly advanced medical image segmentation.
  2. Existing models struggle to balance global context modeling and long-range dependency reasoning under limited computational budgets.
  3. A hybrid architecture combining U-Mamba with the heat conduction equation is proposed, pairing state-space modules with Heat Conduction Operators.
  4. The state-space modules enable efficient long-range reasoning, while the HCOs simulate frequency-domain thermal diffusion to enhance semantic abstraction.
  5. Experiments on multimodal abdominal CT and MRI datasets show strong performance.
  6. The method consistently outperforms strong baselines, demonstrating effectiveness and generalizability.
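
A frequency-domain heat-diffusion step of the kind the takeaways describe can be sketched with the FFT: multiply each spatial-frequency component by $e^{-|k|^2 t}$, the spectral solution of the heat equation. This is a generic spectral heat-equation solver, not the paper's exact HCO:

```python
import numpy as np

def heat_conduct(feat, t=1.0):
    """Diffuse a 2D feature map for 'time' t by damping high frequencies:
    FFT -> multiply by exp(-|k|^2 t) -> inverse FFT."""
    H, W = feat.shape
    ky = np.fft.fftfreq(H)[:, None] * 2 * np.pi
    kx = np.fft.fftfreq(W)[None, :] * 2 * np.pi
    kernel = np.exp(-(ky**2 + kx**2) * t)
    return np.real(np.fft.ifft2(np.fft.fft2(feat) * kernel))

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 32))
y = heat_conduct(x, t=2.0)
```

The DC component (kernel value 1) is untouched, so the mean is preserved while high-frequency noise is damped — a global, parameter-free smoothing, which is the appeal of heat-based operators in a bottleneck.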


Diffusion-Guided Mask-Consistent Paired Mixing for Endoscopic Image Segmentation

Authors:Pengyu Jie, Wanquan Liu, Rui He, Yihui Wen, Deyu Meng, Chenqiang Gao

Augmentation for dense prediction typically relies on either sample mixing or generative synthesis. Mixing improves robustness but misaligned masks yield soft label ambiguity. Diffusion synthesis increases apparent diversity but, when trained as common samples, overlooks the structural benefit of mask conditioning and introduces synthetic-real domain shift. We propose a paired, diffusion-guided paradigm that fuses the strengths of both. For each real image, a synthetic counterpart is generated under the same mask and the pair is used as a controllable input for Mask-Consistent Paired Mixing (MCPMix), which mixes only image appearance while supervision always uses the original hard mask. This produces a continuous family of intermediate samples that smoothly bridges synthetic and real appearances under shared geometry, enlarging diversity without compromising pixel-level semantics. To keep learning aligned with real data, Real-Anchored Learnable Annealing (RLA) adaptively adjusts the mixing strength and the loss weight of mixed samples over training, gradually re-anchoring optimization to real data and mitigating distributional bias. Across Kvasir-SEG, PICCOLO, CVC-ClinicDB, a private NPC-LES cohort, and ISIC 2017, the approach achieves state-of-the-art segmentation performance and consistent gains over baselines. The results show that combining label-preserving mixing with diffusion-driven diversity, together with adaptive re-anchoring, yields robust and generalizable endoscopic segmentation.


Paper & Project Links

PDF

Summary

This paper combines sample mixing and diffusion synthesis for augmentation in dense prediction. For each real image, a synthetic counterpart is generated under the same mask, and Mask-Consistent Paired Mixing (MCPMix) mixes only image appearance while supervision always uses the original hard mask. Real-Anchored Learnable Annealing (RLA) adaptively adjusts the mixing strength and the loss weight of mixed samples over training so that optimization stays anchored to real data. The approach achieves state-of-the-art segmentation performance on multiple datasets.

Key Takeaways

  1. A paradigm combining sample mixing and diffusion synthesis is proposed to enhance dense prediction.
  2. For each real image, a synthetic counterpart is generated under the same mask and used as a controllable input for Mask-Consistent Paired Mixing (MCPMix).
  3. MCPMix mixes only image appearance while supervision always uses the original hard mask.
  4. Real-Anchored Learnable Annealing (RLA) adaptively adjusts the mixing strength and the loss weight of mixed samples.
  5. The method achieves state-of-the-art segmentation performance across multiple datasets.
  6. Combining label-preserving mixing with diffusion-driven diversity and adaptive re-anchoring yields robust, generalizable endoscopic segmentation.
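
The core MCPMix idea above — mix appearance only, keep the hard mask — reduces to a convex combination of the paired images; the toy images and the fixed mixing coefficient below are illustrative:

```python
import numpy as np

def mask_consistent_mix(x_real, x_syn, lam):
    """Blend appearance of a real image and its same-mask synthetic counterpart.
    The label is *not* mixed: supervision keeps the original hard mask."""
    return lam * x_real + (1.0 - lam) * x_syn

rng = np.random.default_rng(0)
x_real = rng.random((64, 64))
x_syn = rng.random((64, 64))       # stand-in for a diffusion sample under the same mask
mask = np.zeros((64, 64), dtype=np.int64)
mask[16:48, 16:48] = 1             # shared geometry for both images
x_mix = mask_consistent_mix(x_real, x_syn, lam=0.7)
label = mask                       # hard label, unchanged by the mixing
```

Sweeping `lam` from 1 to 0 traces the continuous family of intermediate samples between real and synthetic appearance that the abstract describes.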


A Plug-and-Play Framework for Volumetric Light-Sheet Image Reconstruction

Authors:Yi Gong, Xinyuan Zhang, Jichen Chai, Yichen Ding, Yifei Lou

Cardiac contraction is a rapid, coordinated process that unfolds across three-dimensional tissue on millisecond timescales. Traditional optical imaging is often inadequate for capturing dynamic cellular structure in the beating heart because of a fundamental trade-off between spatial and temporal resolution. To overcome these limitations, we propose a high-performance computational imaging framework that integrates Compressive Sensing (CS) with Light-Sheet Microscopy (LSM) for efficient, low-phototoxic cardiac imaging. The system performs compressed acquisition of fluorescence signals via random binary mask coding using a Digital Micromirror Device (DMD). We propose a Plug-and-Play (PnP) framework, solved using the alternating direction method of multipliers (ADMM), which flexibly incorporates advanced denoisers, including Tikhonov, Total Variation (TV), and BM3D. To preserve structural continuity in dynamic imaging, we further introduce temporal regularization enforcing smoothness between adjacent z-slices. Experimental results on zebrafish heart imaging under high compression ratios demonstrate that the proposed method successfully reconstructs cellular structures with excellent denoising performance and image clarity, validating the effectiveness and robustness of our algorithm in real-world high-speed, low-light biological imaging scenarios.


Paper & Project Links

PDF

Summary

Traditional optical imaging is often inadequate for capturing dynamic cellular structure in the beating heart because of the trade-off between spatial and temporal resolution. This work proposes a computational imaging framework that integrates Compressive Sensing (CS) with Light-Sheet Microscopy (LSM) for efficient, low-phototoxicity cardiac imaging. Fluorescence signals are compressively acquired via random binary mask coding with a Digital Micromirror Device (DMD). A Plug-and-Play (PnP) framework, solved with ADMM, flexibly incorporates advanced denoisers (Tikhonov, Total Variation, BM3D), and temporal regularization enforces smoothness between adjacent z-slices. Experiments on zebrafish heart imaging under high compression ratios show successful reconstruction of cellular structures with excellent denoising and image clarity, validating the algorithm for high-speed, low-light biological imaging.

Key Takeaways

  1. Traditional optical imaging struggles to capture rapidly moving cardiac cellular structure due to the spatial-temporal resolution trade-off.
  2. A computational imaging framework combining compressive sensing with light-sheet microscopy enables efficient, low-phototoxicity cardiac imaging.
  3. A digital micromirror device performs compressed acquisition of fluorescence signals via random binary masks.
  4. A Plug-and-Play framework, solved with ADMM, flexibly incorporates advanced denoisers.
  5. Temporal regularization preserves structural continuity across adjacent z-slices.
  6. Cellular structures are successfully reconstructed from zebrafish heart images at high compression ratios.
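
The PnP-ADMM loop described above can be sketched on a toy inpainting problem with a random binary (DMD-style) mask; the box-blur denoiser stands in for the Tikhonov/TV/BM3D denoisers of the paper, and all parameters below are illustrative:

```python
import numpy as np

def box_blur(z):
    """3x3 mean filter used as a stand-in plug-in denoiser."""
    return sum(np.roll(np.roll(z, dy, 0), dx, 1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

def pnp_admm(y, A, At, denoise, rho=1.0, outer=30, inner=5, step=0.1):
    """Plug-and-Play ADMM for min ||A x - y||^2 + g(x): the proximal map of g
    is replaced by an off-the-shelf denoiser."""
    x = At(y)
    v, u = x.copy(), np.zeros_like(x)
    for _ in range(outer):
        for _ in range(inner):   # inexact x-update by gradient descent
            x -= step * (At(A(x) - y) + rho * (x - (v - u)))
        v = denoise(x + u)       # plug-in denoiser replaces prox_g
        u += x - v               # dual (scaled multiplier) update
    return x

# Toy DMD-style acquisition: keep half the pixels via a random binary mask.
n = 64
yy, xx = np.mgrid[:n, :n]
x_true = np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / 200.0)
rng = np.random.default_rng(2)
M = (rng.random((n, n)) < 0.5).astype(float)
y = M * x_true
x_hat = pnp_admm(y, lambda z: M * z, lambda z: M * z, box_blur)
```

Swapping `box_blur` for a TV or BM3D denoiser changes only one line, which is exactly the flexibility the PnP framing buys.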


A Foundation Model for Brain MRI with Dynamic Modality Integration

Authors:Minh Sao Khue Luu, Bair N. Tuchinov

We present a foundation model for brain MRI that can work with different combinations of imaging sequences. The model uses one encoder with learnable modality embeddings, conditional layer normalization, and a masked autoencoding objective that accounts for missing modalities. A variance-covariance regularizer is applied to stabilize feature learning and improve representation diversity. This design removes the need for separate models for each modality and allows the network to adapt when some sequences are missing or unseen. It is trained on about 60,000 multi-center MRIs using self-supervised reconstruction and modality imputation to learn flexible representations. A learnable modality embedding guides feature extraction so the encoder can adjust to different inputs. We describe our planned evaluation on brain tumor and multiple sclerosis segmentation, as well as lesion classification, under various modality settings. Preliminary results show that the method is feasible, and further experiments are planned to study its performance in more detail. All code and pretrained models are available at https://github.com/BrainFM/brainfm


Paper & Project Links

PDF Preliminary work; results ongoing

Summary

This paper presents a foundation model for brain MRI that works with different combinations of imaging sequences. A single encoder with learnable modality embeddings, conditional layer normalization, and a masked autoencoding objective handles missing modalities, while a variance-covariance regularizer stabilizes feature learning and improves representation diversity. The design removes the need for separate per-modality models and lets the network adapt when sequences are missing or unseen. The model is trained on about 60,000 multi-center MRIs with self-supervised reconstruction and modality imputation to learn flexible representations. Preliminary results show the method is feasible, with further experiments planned.

Key Takeaways

  1. The model works with many combinations of MRI sequences.
  2. An encoder with learnable modality embeddings adapts to different inputs.
  3. Conditional layer normalization and a masked autoencoding objective handle missing modalities.
  4. A variance-covariance regularizer stabilizes feature learning and improves representation diversity.
  5. The model is trained on a large multi-center MRI corpus with self-supervised reconstruction and modality imputation.
  6. Preliminary results demonstrate the feasibility of the model.
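
A variance-covariance regularizer of the kind mentioned above can be sketched in the VICReg style (the paper's exact formulation may differ): penalize per-dimension standard deviations that fall below a target, plus squared off-diagonal covariances:

```python
import numpy as np

def var_cov_regularizer(z, gamma=1.0, eps=1e-4):
    """Variance term: hinge pushing each feature dim's std toward gamma.
    Covariance term: squared off-diagonal entries, penalizing correlated dims."""
    z = z - z.mean(axis=0)
    std = np.sqrt(z.var(axis=0) + eps)
    var_loss = np.mean(np.maximum(0.0, gamma - std))
    cov = (z.T @ z) / (len(z) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / z.shape[1]
    return var_loss + cov_loss

rng = np.random.default_rng(3)
z_diverse = rng.normal(size=(256, 16))                       # decorrelated features
z_collapsed = np.tile(rng.normal(size=(256, 1)), (1, 16))    # all dims identical
```

A collapsed representation (all dimensions carrying the same signal) is penalized far more than a diverse one, which is the "representation diversity" effect the abstract claims.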


SCALE-VLP: Soft-Weighted Contrastive Volumetric Vision-Language Pre-training with Spatial-Knowledge Semantics

Authors:Ailar Mahdizadeh, Puria Azadi Moghadam, Xiangteng He, Shahriar Mirabbasi, Panos Nasiopoulos, Leonid Sigal

Vision-language models (VLMs) have demonstrated strong cross-modal capabilities, yet most work remains limited to 2D data and assumes binary supervision (i.e., positive vs. negative pairs), overlooking the continuous and structured dependencies present in volumetric data such as CT. Existing approaches often treat volumetric scans as independent 2D slices, compromising spatial coherence and underutilizing rich clinical semantics. We propose SCALE-VLP, a soft-weighted contrastive vision-language pre-training framework that integrates (i) volumetric spatial semantics to preserve anatomical structure and (ii) domain-aware, knowledge-infused semantics (e.g., radiological ontologies) to guide alignment. This yields structurally consistent and semantically grounded representations under limited supervision, demonstrating strong cross-task transferability (retrieval, report generation, and classification), and cross-domain generalizability with consistent gains without further fine-tuning. In particular, compared to the previous state of the art, SCALE-VLP achieves up to 4.3x higher top-1 CT-report retrieval, improves abnormality classification by 10 points, and reaches ROUGE-L 0.44 and BERT-F1 0.89 for report generation. Further, in zero-shot evaluation on an out-of-domain external dataset, we observe consistent gains, indicating the cross-task and cross-domain generalization ability of SCALE-VLP.


Paper & Project Links

PDF

Summary

SCALE-VLP is a soft-weighted contrastive vision-language pre-training framework that integrates volumetric spatial semantics, to preserve anatomical structure, with domain-aware, knowledge-infused semantics (e.g., radiological ontologies) to guide alignment. It yields structurally consistent and semantically grounded representations under limited supervision, transferring well across tasks and domains without further fine-tuning. Compared with the previous state of the art, SCALE-VLP substantially improves CT-report retrieval, abnormality classification, and report generation.

Key Takeaways

  1. Although VLMs have strong cross-modal capabilities, most work is limited to 2D data and binary supervision, overlooking the continuous, structured dependencies in volumetric data such as CT.
  2. Existing approaches often treat volumetric scans as independent 2D slices, compromising spatial coherence and underusing rich clinical semantics.
  3. SCALE-VLP is a soft-weighted contrastive vision-language pre-training framework integrating volumetric spatial semantics and knowledge-infused domain semantics.
  4. It preserves anatomical structure and guides alignment, yielding structurally consistent, semantically grounded representations.
  5. SCALE-VLP transfers across tasks and domains, notably in CT-report retrieval, report generation, and abnormality classification.
  6. It achieves up to 4.3x higher top-1 CT-report retrieval and a 10-point gain in abnormality classification over the previous state of the art.
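
Soft-weighted contrastive alignment, as opposed to the binary positive/negative supervision of standard InfoNCE, can be sketched as cross-entropy against a soft target matrix; the target weights and temperature below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def soft_weighted_contrastive(z_img, z_txt, soft_targets, tau=0.1):
    """Cross-entropy between softmax-normalized cosine similarities and *soft*
    (non-binary) targets, generalizing the one-hot positives of InfoNCE."""
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_txt = z_txt / np.linalg.norm(z_txt, axis=1, keepdims=True)
    logits = z_img @ z_txt.T / tau
    m = logits.max(axis=1, keepdims=True)          # stable log-softmax
    logp = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-(soft_targets * logp).sum(axis=1).mean())

rng = np.random.default_rng(4)
z = rng.normal(size=(4, 8))
targets = 0.9 * np.eye(4) + 0.025                  # mostly-diagonal soft weights, rows sum to 1
loss_aligned = soft_weighted_contrastive(z, z, targets)
loss_mismatched = soft_weighted_contrastive(z, z[::-1], targets)
```

Because the targets are soft, partially related pairs can carry non-zero weight instead of being treated as pure negatives — the structured-dependency point made in takeaway 1.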


Domain-Adaptive Transformer for Data-Efficient Glioma Segmentation in Sub-Saharan MRI

Authors:Ilerioluwakiiye Abolade, Aniekan Udo, Augustine Ojo, Abdulbasit Oyetunji, Hammed Ajigbotosho, Aondana Iorumbur, Confidence Raymond, Maruf Adewole

Glioma segmentation is critical for diagnosis and treatment planning, yet remains challenging in Sub-Saharan Africa due to limited MRI infrastructure and heterogeneous acquisition protocols that induce severe domain shift. We propose SegFormer3D-plus, a radiomics-guided transformer architecture designed for robust segmentation under domain variability. Our method combines: (1) histogram matching for intensity harmonization across scanners, (2) radiomic feature extraction with PCA-reduced k-means for domain-aware stratified sampling, (3) a dual-pathway encoder with frequency-aware feature extraction and spatial-channel attention, and (4) composite Dice-Cross-Entropy loss for boundary refinement. Pretrained on BraTS 2023 and fine-tuned on BraTS-Africa data, SegFormer3D-plus demonstrates improved tumor subregion delineation and boundary localization across heterogeneous African clinical scans, highlighting the value of radiomics-guided domain adaptation for resource-limited settings.


Paper & Project Links

PDF 4 pages, 2 figures. Accepted as an abstract at the Women in Machine Learning (WiML) Workshop at NeurIPS 2025

Summary

This paper proposes SegFormer3D-plus, a radiomics-guided transformer architecture for robust glioma segmentation under domain variability. The method combines intensity histogram matching, radiomic feature extraction with PCA-reduced k-means for domain-aware stratified sampling, a dual-pathway encoder, and a composite Dice-Cross-Entropy loss to improve tumor subregion delineation and boundary localization. Pretrained on BraTS 2023 and fine-tuned on BraTS-Africa, SegFormer3D-plus performs well on heterogeneous African clinical scans, highlighting the value of radiomics-guided domain adaptation in resource-limited settings.

Key Takeaways

  1. Glioma segmentation is critical for diagnosis and treatment planning but remains challenging in Sub-Saharan Africa, where limited MRI infrastructure and heterogeneous acquisition protocols induce severe domain shift.
  2. SegFormer3D-plus is a radiomics-guided transformer architecture designed for robust segmentation under domain variability.
  3. Histogram matching harmonizes intensities across scanners.
  4. Radiomic features with PCA-reduced k-means enable domain-aware stratified sampling.
  5. A dual-pathway encoder provides frequency-aware feature extraction and spatial-channel attention.
  6. A composite Dice-Cross-Entropy loss refines boundaries.
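
The histogram-matching harmonization step above can be sketched as a CDF lookup (mirroring what e.g. `skimage.exposure.match_histograms` does); the Gaussian "scanner profiles" below are synthetic stand-ins:

```python
import numpy as np

def histogram_match(src, ref):
    """Remap src intensities so their empirical CDF matches ref's."""
    s_vals, s_idx, s_cnt = np.unique(src.ravel(),
                                     return_inverse=True, return_counts=True)
    r_vals, r_cnt = np.unique(ref.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_cnt) / src.size
    r_cdf = np.cumsum(r_cnt) / ref.size
    matched = np.interp(s_cdf, r_cdf, r_vals)   # quantile-to-quantile lookup
    return matched[s_idx].reshape(src.shape)

rng = np.random.default_rng(5)
src = rng.normal(5.0, 2.0, size=(64, 64))   # e.g. scanner A intensity profile
ref = rng.normal(0.0, 1.0, size=(64, 64))   # e.g. scanner B intensity profile
out = histogram_match(src, ref)
```

After matching, the source image's intensity statistics track the reference scanner's, which is what lets a single model see harmonized inputs across sites.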


Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing

Authors:Zhihui Chen, Mengling Feng

Recent advances in multimodal large language models have enabled remarkable medical image editing capabilities. However, the research community’s progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built specifically for medical image editing with strict anatomical and clinical constraints. We introduce Med-Banana-50K, a comprehensive 50K-image dataset for instruction-based medical image editing spanning three modalities (chest X-ray, brain MRI, fundus photography) and 23 disease types. Our dataset is constructed by leveraging Gemini-2.5-Flash-Image to generate bidirectional edits (lesion addition and removal) from real medical images. What distinguishes Med-Banana-50K from general-domain editing datasets is our systematic approach to medical quality control: we employ LLM-as-Judge with a medically grounded rubric (instruction compliance, structural plausibility, realism, and fidelity preservation) and history-aware iterative refinement up to five rounds. Beyond single-turn editing, Med-Banana-50K includes 37K failed attempts with full conversation logs for preference learning and alignment research. By providing this large-scale, medically validated, and fully documented resource, Med-Banana-50K establishes a foundation for training and evaluating the next generation of medical image editing models.Our dataset and code are publicly available at [https://github.com/richardChenzhihui/med-banana-50k].


Paper and Project Links

PDF

Summary

Med-Banana-50K, a dataset for instruction-based medical image editing, is introduced. It contains 50K images spanning three modalities (chest X-ray, brain MRI, fundus photography) and 23 disease types. The dataset is built by using Gemini-2.5-Flash-Image to generate bidirectional edits (lesion addition and removal) from real medical images, with LLM-as-Judge providing medical quality control. It additionally includes 37K failed attempts with full conversation logs, supporting preference learning and alignment research.

Key Takeaways

  1. Med-Banana-50K is a large-scale medical image editing dataset containing 50K images.
  2. The dataset spans three modalities and 23 disease types.
  3. Bidirectional edits of real medical images are generated with Gemini-2.5-Flash-Image.
  4. LLM-as-Judge provides medical quality control, covering instruction compliance, structural plausibility, realism, and fidelity preservation.
  5. The dataset includes 37K failed attempts with full conversation logs.
  6. Med-Banana-50K lays a foundation for training and evaluating the next generation of medical image editing models.
  7. The dataset and code are publicly available at https://github.com/richardChenzhihui/med-banana-50k.
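
The history-aware iterative refinement loop (up to five rounds, gated by the four-criterion rubric) can be sketched as the control flow below. This is a hypothetical skeleton: `run_editor` and `run_judge` are stand-in stubs for the real Gemini-2.5-Flash-Image editor and LLM judge, and the acceptance threshold is an assumed value:

```python
# Rubric keys follow the paper's four criteria.
RUBRIC = ("instruction_compliance", "structural_plausibility",
          "realism", "fidelity_preservation")

def refine(instruction, image, run_editor, run_judge, max_rounds=5, threshold=0.8):
    history = []  # full conversation log, kept even for failed attempts
    for round_id in range(max_rounds):
        edited = run_editor(instruction, image, history)
        scores = run_judge(instruction, image, edited)  # dict keyed by RUBRIC
        history.append({"round": round_id, "edit": edited, "scores": scores})
        if min(scores[k] for k in RUBRIC) >= threshold:
            return edited, history, True   # accepted sample
    return None, history, False           # failed attempt, log retained

# Toy stubs: this "editor" improves by 0.2 per round of judge feedback.
editor = lambda ins, img, hist: f"edit-v{len(hist)}"
judge = lambda ins, img, edit: {k: 0.5 + 0.2 * int(edit.split("v")[1]) for k in RUBRIC}
result, log, ok = refine("add lesion", "xray", editor, judge)
```

Keeping the full `history` on failure is what yields the 37K logged failed attempts usable for preference learning.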

Cool Papers

Click here to view paper screenshots

BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation

Authors:Zelin Liu, Sicheng Dong, Bocheng Li, Yixuan Yang, Jiacheng Ruan, Chenxu Zhou, Suncheng Xiang

Vision foundation models like the Segment Anything Model (SAM), pretrained on large-scale natural image datasets, often struggle in medical image segmentation due to a lack of domain-specific adaptation. In clinical practice, fine-tuning such models efficiently for medical downstream tasks with minimal resource demands, while maintaining strong performance, is challenging. To address these issues, we propose BALR-SAM, a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging. It combines three tailored components: (1) a Complementary Detail Enhancement Network (CDEN) using depthwise separable convolutions and multi-scale fusion to capture boundary-sensitive features essential for accurate segmentation; (2) low-rank adapters integrated into SAM’s Vision Transformer blocks to optimize feature representation and attention for medical contexts, while simultaneously significantly reducing the parameter space; and (3) a low-rank tensor attention mechanism in the mask decoder, cutting memory usage by 75% and boosting inference speed. Experiments on standard medical segmentation datasets show that BALR-SAM, without requiring prompts, outperforms several state-of-the-art (SOTA) methods, including fully fine-tuned MedSAM, while updating just 1.8% (11.7M) of its parameters.


Paper and Project Links

PDF

Summary

The Segment Anything Model (SAM), pretrained on large-scale natural image datasets, lacks domain-specific adaptation for medical image segmentation. To address this, BALR-SAM, a boundary-aware low-rank adaptation framework, is proposed to enhance SAM for medical imaging. It comprises three tailored components: a CDEN that captures boundary-sensitive features, low-rank adapters that optimize feature representation and attention, and a low-rank tensor attention mechanism that cuts memory usage and speeds up inference. Experiments on standard medical segmentation datasets show that BALR-SAM outperforms state-of-the-art methods while updating only a small fraction of its parameters.

Key Takeaways

  • Pretrained vision foundation models face domain-adaptation challenges in medical image segmentation.
  • BALR-SAM is a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging.
  • CDEN uses depthwise separable convolutions and multi-scale fusion to capture boundary-sensitive features.
  • Low-rank adapters optimize feature representation and attention while significantly reducing the parameter space.
  • A low-rank tensor attention mechanism cuts memory usage and accelerates inference.
  • BALR-SAM outperforms several state-of-the-art methods on standard medical segmentation datasets.
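
To see why low-rank adapters shrink the trainable parameter space so dramatically, here is a minimal LoRA-style sketch on a single linear layer. The dimensions `d`, `k` and rank `r` are assumed illustrative values, not BALR-SAM's actual configuration:

```python
import numpy as np

d, k, r = 768, 768, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((k, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((k, r))                     # trainable up-projection (init 0)

def adapted_forward(x):
    # Frozen path plus low-rank update: (W + B @ A) @ x
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d)
# With B initialised to zero, the adapter starts as a no-op update:
assert np.allclose(adapted_forward(x), W @ x)

full_params = W.size                 # what full fine-tuning would update
lora_params = A.size + B.size        # what the adapter updates
print(f"trainable fraction: {lora_params / full_params:.3%}")
```

At rank 8 the adapter trains about 2% of the layer's weights, the same order of magnitude as the 1.8% of parameters BALR-SAM reports updating model-wide.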

Cool Papers

Click here to view paper screenshots

Joint Lossless Compression and Steganography for Medical Images via Large Language Models

Authors:Pengcheng Zheng, Xiaorong Pu, Kecheng Chen, Jiaxin Huang, Meng Yang, Bai Feng, Yazhou Ren, Jianan Jiang, Chaoning Zhang, Yang Yang, Heng Tao Shen

Recently, large language models (LLMs) have driven promising progress in lossless image compression. However, directly adopting existing paradigms for medical images suffers from an unsatisfactory trade-off between compression performance and efficiency. Moreover, existing LLM-based compressors often overlook the security of the compression process, which is critical in modern medical scenarios. To this end, we propose a novel joint lossless compression and steganography framework. Inspired by bit plane slicing (BPS), we find it feasible to securely embed privacy messages into medical images in an invisible manner. Based on this insight, an adaptive modalities decomposition strategy is first devised to partition the entire image into two segments, providing global and local modalities for subsequent dual-path lossless compression. During this dual-path stage, we innovatively propose a segmented message steganography algorithm within the local modality path to ensure the security of the compression process. Coupled with the proposed anatomical priors-based low-rank adaptation (A-LoRA) fine-tuning strategy, extensive experimental results demonstrate the superiority of our proposed method in terms of compression ratios, efficiency, and security. The source code will be made publicly available.


Paper and Project Links

PDF

Summary
Lossless compression of medical images faces a trade-off between compression performance and efficiency, and existing methods overlook the security of the compression process. A novel joint lossless compression and steganography framework is proposed that draws on bit plane slicing (BPS) to embed private information into medical images invisibly. An adaptive modality decomposition strategy splits the image into global and local modalities for dual-path lossless compression, and a segmented message steganography algorithm in the local-modality path secures the compression process.

Key Takeaways

  1. Applying large language models (LLMs) to lossless medical image compression faces a trade-off between performance and efficiency.
  2. Existing methods overlook the security of the compression process, which is critical in modern medical scenarios.
  3. The proposed joint lossless compression and steganography framework builds on bit plane slicing (BPS) to embed private information into medical images securely.
  4. An adaptive modality decomposition strategy partitions the image into global and local modalities for dual-path lossless compression.
  5. A segmented message steganography algorithm in the local-modality path ensures the security of the compression process.
  6. The framework incorporates an anatomical priors-based low-rank adaptation (A-LoRA) fine-tuning strategy.
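
The bit-plane intuition behind the steganography component can be shown with a minimal least-significant-bit sketch: hide message bits in the lowest bit plane of an 8-bit image and recover them exactly. This only illustrates bit plane slicing; the paper's segmented algorithm and its choice of planes and pixels are not reproduced here:

```python
import numpy as np

def embed_lsb(image, bits):
    """Overwrite the LSB of the first len(bits) pixels with the message bits."""
    flat = image.flatten().copy()
    assert len(bits) <= flat.size
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=np.uint8)
    return flat.reshape(image.shape)

def extract_lsb(image, n_bits):
    """Read the message back from the least significant bit plane."""
    return (image.flatten()[:n_bits] & 1).tolist()

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
msg = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(img, msg)

assert extract_lsb(stego, len(msg)) == msg          # message recovered exactly
assert int(np.max(np.abs(stego.astype(int) - img.astype(int)))) <= 1  # imperceptible change
```

Because each pixel changes by at most 1 intensity level, the embedding is visually invisible, and since the stego image is then compressed losslessly, the hidden message survives the compression round-trip.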

Cool Papers

Click here to view paper screenshots

Style-Aware Blending and Prototype-Based Cross-Contrast Consistency for Semi-Supervised Medical Image Segmentation

Authors:Chaowei Chen, Xiang Zhang, Honglie Guo, Shunfang Wang

Weak-strong consistency learning strategies are widely employed in semi-supervised medical image segmentation to train models by leveraging limited labeled data and enforcing weak-to-strong consistency. However, existing methods primarily focus on designing and combining various perturbation schemes, overlooking the inherent potential and limitations within the framework itself. In this paper, we first identify two critical deficiencies: (1) separated training data streams, which lead to confirmation bias dominated by the labeled stream; and (2) incomplete utilization of supervisory information, which limits exploration of strong-to-weak consistency. To tackle these challenges, we propose a style-aware blending and prototype-based cross-contrast consistency learning framework. Specifically, inspired by the empirical observation that the distribution mismatch between labeled and unlabeled data can be characterized by statistical moments, we design a style-guided distribution blending module to break the independent training data streams. Meanwhile, considering the potential noise in strong pseudo-labels, we introduce a prototype-based cross-contrast strategy to encourage the model to learn informative supervisory signals from both weak-to-strong and strong-to-weak predictions, while mitigating the adverse effects of noise. Experimental results demonstrate the effectiveness and superiority of our framework across multiple medical segmentation benchmarks under various semi-supervised settings.


Paper and Project Links

PDF

Summary
Weak-strong consistency learning is widely used in semi-supervised medical image segmentation, training with limited labeled data while enforcing weak-to-strong consistency. Existing methods focus on designing and combining perturbation schemes while overlooking the framework's inherent potential and limitations. This paper identifies two key deficiencies: separated training data streams, which cause confirmation bias dominated by the labeled stream, and incomplete utilization of supervisory information, which limits exploration of strong-to-weak consistency. To address them, a style-aware blending and prototype-based cross-contrast consistency learning framework is proposed: a style-guided distribution blending module breaks the independent training data streams, and a prototype-based cross-contrast strategy lets the model learn informative supervisory signals from both weak-to-strong and strong-to-weak predictions while mitigating the adverse effects of noise.

Key Takeaways

  1. Weak-strong consistency learning strategies are widely used in semi-supervised medical image segmentation.
  2. Existing methods focus on perturbation-scheme design and overlook the framework's inherent potential and limitations.
  3. Separated training data streams and incomplete utilization of supervisory information are two key deficiencies.
  4. A style-aware blending and prototype-based cross-contrast consistency learning framework is proposed.
  5. A style-guided distribution blending module breaks the separated training data streams.
  6. A prototype-based cross-contrast strategy handles potential noise in strong pseudo-labels.
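
Since the labeled/unlabeled distribution mismatch is characterized by statistical moments, style blending can be sketched as an AdaIN-style mean/std transfer. The linear interpolation rule with weight `lam` below is an assumption for illustration, not the paper's exact blending module:

```python
import numpy as np

def blend_style(content, style, lam=1.0):
    """Shift the content image's first two moments toward the style image's."""
    mu_c, sd_c = content.mean(), content.std()
    mu_s, sd_s = style.mean(), style.std()
    # Interpolate target moments between content and style statistics
    mu_t = (1 - lam) * mu_c + lam * mu_s
    sd_t = (1 - lam) * sd_c + lam * sd_s
    return (content - mu_c) / (sd_c + 1e-8) * sd_t + mu_t

rng = np.random.default_rng(2)
labeled = rng.normal(0.2, 0.1, size=(64, 64))     # e.g. a labeled-stream image
unlabeled = rng.normal(0.7, 0.3, size=(64, 64))   # e.g. an unlabeled-stream image
mixed = blend_style(labeled, unlabeled, lam=1.0)

# With lam=1 the blended image matches the unlabeled stream's moments:
assert abs(mixed.mean() - unlabeled.mean()) < 1e-6
assert abs(mixed.std() - unlabeled.std()) < 1e-6
```

Feeding such blended samples to both streams is one way the independent labeled/unlabeled data streams can be broken.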

Cool Papers

Click here to view paper screenshots

Label tree semantic losses for rich multi-class medical image segmentation

Authors:Junwen Wang, Oscar MacCormac, William Rochford, Aaron Kujawa, Jonathan Shapey, Tom Vercauteren

Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting precise post-operative assessment. However, commonly used learning methods for medical and surgical imaging segmentation tasks penalise all errors equivalently and thus fail to exploit any inter-class semantics in the labels space. This becomes particularly problematic as the cardinality and richness of labels increases to include subtly different classes. In this work, we propose two tree-based semantic loss functions which take advantage of a hierarchical organisation of the labels. We further incorporate our losses in a recently proposed approach for training with sparse, background-free annotations to extend the applicability of our proposed losses. Extensive experiments are reported on two medical and surgical image segmentation tasks, namely head MRI for whole brain parcellation (WBP) with full supervision and neurosurgical hyperspectral imaging (HSI) for scene understanding with sparse annotations. Results demonstrate that our proposed method reaches state-of-the-art performance in both cases.


Paper and Project Links

PDF

Summary

This work proposes two tree-based semantic loss functions for medical and surgical image segmentation that exploit the hierarchical organisation of the labels, rather than penalising all errors equivalently. The losses are further incorporated into a recent approach for training with sparse, background-free annotations. The method reaches state-of-the-art performance both on fully supervised whole brain parcellation from head MRI and on neurosurgical hyperspectral imaging scene understanding with sparse annotations, supporting AI use in clinical practice such as pre-operative planning, intra-operative navigation, and post-operative assessment.

Key Takeaways

  1. Common learning methods for medical and surgical image segmentation penalise all errors equivalently, ignoring inter-class semantics in the label space; this becomes problematic as labels grow more numerous and include subtly different classes.
  2. Two tree-based semantic loss functions are proposed that exploit the hierarchical organisation of the labels to improve segmentation accuracy.
  3. The two tree-structured losses aim to improve segmentation performance on complex medical image data.
  4. By incorporating the losses into training with sparse, background-free annotations, their applicability is extended.
  5. The method reaches state-of-the-art performance on fully supervised head MRI whole brain parcellation and on sparsely annotated neurosurgical scene understanding.
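
The core idea of exploiting the label hierarchy can be sketched as a tree-distance-weighted error: confusing two sibling labels costs less than confusing semantically distant ones. The toy label tree and the plain tree-distance weighting below are illustrative assumptions, not the paper's actual loss functions:

```python
# Toy label hierarchy: child -> parent (None marks the root).
PARENT = {"grey_matter": "brain", "white_matter": "brain",
          "skull": "bone", "brain": "head", "bone": "head", "head": None}

def path_to_root(label):
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path

def tree_distance(a, b):
    """Number of tree edges between two labels via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    return min(pa.index(c) + pb.index(c) for c in common)

def tree_weighted_error(pred, target):
    # 0 when correct, growing with semantic distance in the label tree
    return tree_distance(pred, target)

assert tree_weighted_error("grey_matter", "grey_matter") == 0
# Sibling confusion (both under "brain") is cheaper than crossing to "skull":
assert tree_weighted_error("grey_matter", "white_matter") < tree_weighted_error("grey_matter", "skull")
```

In a trainable loss, such tree distances would typically enter as per-class-pair weights on a soft (probabilistic) segmentation loss rather than on hard predictions.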

Cool Papers

Click here to view paper screenshots

Autoadaptive Medical Segment Anything Model

Authors:Tyler Ward, Meredith K. Owen, O’Kira Coleman, Brian Noehren, Abdullah-Al-Zubaer Imran

Medical image segmentation is a key task in the imaging workflow, influencing many image-based decisions. Traditional, fully-supervised segmentation models rely on large amounts of labeled training data, typically obtained through manual annotation, which can be an expensive, time-consuming, and error-prone process. This signals a need for accurate, automatic, and annotation-efficient methods of training these models. We propose ADA-SAM (automated, domain-specific, and adaptive segment anything model), a novel multitask learning framework for medical image segmentation that leverages class activation maps from an auxiliary classifier to guide the predictions of the semi-supervised segmentation branch, which is based on the Segment Anything (SAM) framework. Additionally, our ADA-SAM model employs a novel gradient feedback mechanism to create a learnable connection between the segmentation and classification branches by using the segmentation gradients to guide and improve the classification predictions. We validate ADA-SAM on real-world clinical data collected during rehabilitation trials, and demonstrate that our proposed method outperforms both fully-supervised and semi-supervised baselines by double digits in limited label settings. Our code is available at: https://github.com/tbwa233/ADA-SAM.


Paper and Project Links

PDF 11 pages, 2 figures, 3 tables

Summary

Medical image segmentation is a key task in the imaging workflow, influencing many image-based decisions. Traditional fully-supervised segmentation models rely on large amounts of manually annotated training data, which is expensive, time-consuming, and error-prone, motivating accurate, automatic, and annotation-efficient training methods. ADA-SAM (automated, domain-specific, adaptive segment anything model) is a novel multitask learning framework built on the Segment Anything (SAM) framework: it uses class activation maps from an auxiliary classifier to guide the predictions of the semi-supervised segmentation branch, and a novel gradient feedback mechanism uses segmentation gradients to guide and improve classification predictions, creating a learnable connection between the two branches. Validated on real-world clinical data collected during rehabilitation trials, ADA-SAM outperforms both fully-supervised and semi-supervised baselines by double digits in limited-label settings.

Key Takeaways

  1. Medical image segmentation plays a key role in the imaging workflow and influences many image-based decisions.
  2. Traditional fully-supervised segmentation models rely on large amounts of manually annotated training data, which is time-consuming and expensive.
  3. More accurate, automatic, and annotation-efficient training methods for medical image segmentation are needed.
  4. ADA-SAM, a novel multitask learning framework, is proposed for medical image segmentation.
  5. ADA-SAM uses class activation maps from an auxiliary classifier to guide the predictions of the semi-supervised segmentation branch.
  6. ADA-SAM employs a novel gradient feedback mechanism that uses segmentation gradients to improve classification predictions, creating a learnable connection between the segmentation and classification branches.
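
The class activation maps that guide the segmentation branch can be sketched in their classic form: weight the classifier's final feature maps by the linear-head weights of the target class and sum over channels. Shapes and the normalisation step are illustrative assumptions; ADA-SAM's actual architecture is not reproduced here:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_id):
    """features: (C, H, W) conv feature maps; fc_weights: (num_classes, C)."""
    # Channel-weighted sum of feature maps for the chosen class -> (H, W)
    cam = np.tensordot(fc_weights[class_id], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)        # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalise to [0, 1]
    return cam

rng = np.random.default_rng(3)
feats = rng.random((16, 8, 8))        # C=16 channels on an 8x8 spatial map
w = rng.random((2, 16))               # 2-class linear classification head
cam = class_activation_map(feats, w, class_id=1)

assert cam.shape == (8, 8)
assert cam.min() >= 0.0 and cam.max() <= 1.0
```

The resulting heatmap localises the regions the classifier found most discriminative, which is the signal a segmentation branch can use as coarse spatial guidance.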

Cool Papers

Click here to view paper screenshots

MediQ-GAN: Quantum-Inspired GAN for High Resolution Medical Image Generation

Authors:Qingyue Jiao, Yongcan Tang, Jun Zhuang, Jason Cong, Yiyu Shi

Machine learning-assisted diagnosis shows promise, yet medical imaging datasets are often scarce, imbalanced, and constrained by privacy, making data augmentation essential. Classical generative models typically demand extensive computational and sample resources. Quantum computing offers a promising alternative, but existing quantum-based image generation methods remain limited in scale and often face barren plateaus. We present MediQ-GAN, a quantum-inspired GAN with prototype-guided skip connections and a dual-stream generator that fuses classical and quantum-inspired branches. Its variational quantum circuits inherently preserve full-rank mappings, avoid rank collapse, and are theory-guided to balance expressivity with trainability. Beyond generation quality, we provide the first latent-geometry and rank-based analysis of quantum-inspired GANs, offering theoretical insight into their performance. Across three medical imaging datasets, MediQ-GAN outperforms state-of-the-art GANs and diffusion models. While validated on IBM hardware for robustness, our contribution is hardware-agnostic, offering a scalable and data-efficient framework for medical image generation and augmentation.


Paper and Project Links

PDF

Summary
MediQ-GAN, a quantum-inspired GAN, shows advantages for medical image generation and augmentation, addressing medical imaging datasets that are scarce, imbalanced, and privacy-constrained. It fuses classical and quantum-inspired branches through prototype-guided skip connections and a dual-stream generator, improving generation quality and efficiency. The work additionally provides a latent-geometry and rank-based analysis of quantum-inspired GANs.

Key Takeaways

  1. Medical imaging datasets are often scarce, imbalanced, and privacy-constrained, making data augmentation essential.
  2. Quantum computing offers a promising alternative for medical image generation, but existing quantum-based methods remain limited in scale and often face barren plateaus.
  3. MediQ-GAN fuses classical and quantum-inspired branches via prototype-guided skip connections and a dual-stream generator for efficient medical image generation.
  4. Its variational quantum circuits preserve full-rank mappings, avoid rank collapse, and balance expressivity with trainability.
  5. Beyond generation quality, the work provides the first latent-geometry and rank-based analysis of quantum-inspired GANs, giving theoretical insight into their performance.
  6. MediQ-GAN outperforms state-of-the-art GANs and diffusion models on three medical imaging datasets.

Cool Papers

Click here to view paper screenshots

BRISC: Annotated Dataset for Brain Tumor Segmentation and Classification

Authors:Amirreza Fateh, Yasin Rezvani, Sara Moayedi, Sadjad Rezvani, Fatemeh Fateh, Mansoor Fateh, Vahid Abolghasemi

Accurate segmentation and classification of brain tumors from Magnetic Resonance Imaging (MRI) remain key challenges in medical image analysis, primarily due to the lack of high-quality, balanced, and diverse datasets with expert annotations. In this work, we address this gap by introducing BRISC, a dataset designed for brain tumor segmentation and classification tasks, featuring high-resolution segmentation masks. The dataset comprises 6,000 contrast-enhanced T1-weighted MRI scans, which were collated from multiple public datasets that lacked segmentation labels. Our primary contribution is the subsequent expert annotation of these images, performed by certified radiologists and physicians. It includes three major tumor types, namely glioma, meningioma, and pituitary, as well as non-tumorous cases. Each sample includes high-resolution labels and is categorized across axial, sagittal, and coronal imaging planes to facilitate robust model development and cross-view generalization. To demonstrate the utility of the dataset, we provide benchmark results for both tasks using standard deep learning models. The BRISC dataset is made publicly available. datasetlink: Kaggle (https://www.kaggle.com/datasets/briscdataset/brisc2025/), Figshare (https://doi.org/10.6084/m9.figshare.30533120), Zenodo (https://doi.org/10.5281/zenodo.17524350)


Paper and Project Links

PDF

Summary

This paper introduces BRISC, a dataset designed for brain tumor segmentation and classification with expert-annotated high-resolution segmentation masks. The dataset comprises 6,000 contrast-enhanced T1-weighted MRI scans covering three major tumor types (glioma, meningioma, and pituitary) as well as non-tumorous cases. Each sample is categorized across axial, sagittal, and coronal imaging planes to facilitate robust model development and cross-view generalization. Benchmark results with standard deep learning models demonstrate the dataset's utility, and BRISC is publicly available.

Key Takeaways

  1. The BRISC dataset targets brain tumor segmentation and classification, addressing the lack of high-quality, balanced, and diverse expert-annotated datasets in medical image analysis.
  2. It provides expert-annotated high-resolution segmentation masks, supporting model accuracy and generalization.
  3. It covers three major tumor types (glioma, meningioma, and pituitary) as well as non-tumorous cases, providing rich training samples.
  4. Each sample is categorized across axial, sagittal, and coronal planes, promoting cross-view generalization.
  5. Benchmark results with standard deep learning models demonstrate the dataset's utility.
  6. The BRISC dataset is publicly available for researchers.

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!