⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use with caution.
🔴 Note: do not use these summaries in serious academic settings; they are intended only for pre-reading triage of papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-11
Intensive X-ray/UVOIR continuum reverberation mapping of the Seyfert AGN MCG+08-11-11
Authors:D. Kynoch, I. M. McHardy, E. M. Cackett, J. Gelbord, J. V. Hernández Santisteban, K. Horne, J. A. Miller, H. Netzer, C. Done, R. Edelson, M. M. Fausnaugh, M. R. Goad, B. M. Peterson, F. M. Vincentelli
We present results from intensive (x3 daily), three-month-long X-ray, UV and optical monitoring of the bright Seyfert active galactic nucleus (AGN) MCG+08-11-11 with Swift, supported by optical-infrared ground-based monitoring. The 12 resultant, well-sampled, lightcurves are highly correlated; in particular, the X-ray to UV correlation r_max = 0.85 is, as far as we know, the highest yet recorded in a Seyfert galaxy. The lags increase with wavelength, as expected from reprocessing of central high-energy emission by surrounding material. Our lag spectrum is much shallower than that obtained from an optical monitoring campaign conducted a year earlier when MCG+08-11-11 was approximately 4 times brighter. After filtering out long-term trends in the earlier optical lightcurves we recover shorter lags consistent with our own - demonstrating concurrent reverberation signals from different spatial scales and the luminosity dependence of the measured lags. We use our lag spectrum to test several physical models, finding that disc reprocessing models cannot account for the observed ‘excess’ lags in the u and r-i-bands that are highly indicative of the Balmer and Paschen continua produced by reprocessing in the broad line region (BLR) gas. The structure seen in both the variable (rms) and lag spectra, and the large time delay between X-ray and UV variations (approximately 2 days) all suggest that the BLR is the dominant reprocessor. The hard X-ray spectrum (Gamma approximately 1.7) and faint, red, UV-optical spectrum both indicate that the Eddington accretion ratio is low: approximately 0.03. The bolometric luminosity then requires that the black hole mass is substantially greater than current reverberation mapping derived estimates.
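The interband lags above are measured by cross-correlating light curves and locating the correlation peak; a minimal illustrative sketch on synthetic data (not the paper's actual interpolated cross-correlation pipeline) is:

```python
# Minimal sketch of lag measurement by peak cross-correlation (illustrative
# only; real campaigns use interpolation-based methods on unevenly sampled data).
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def peak_lag(driver, echo, max_lag):
    """Shift `echo` back by trial lags (in samples) and return the lag
    that maximizes the correlation with `driver` over the overlap."""
    best_lag, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        r = pearson_r(driver[: len(driver) - lag], echo[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic example: a smooth "X-ray" driver and a "UV" echo delayed by 2 samples.
xray = [math.sin(0.3 * t) + 0.2 * math.sin(1.1 * t) for t in range(200)]
uv = [xray[max(t - 2, 0)] for t in range(200)]
lag, r = peak_lag(xray, uv, max_lag=10)
print(lag, round(r, 3))  # recovers the injected 2-sample lag
```

The recovered lag corresponds to the light-travel-time delay of the reprocessed band; in the campaign above, the analogous X-ray-to-UV delay is roughly 2 days.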
Paper & Project Links
PDF 24 pages, 13 figures (including appendices). Revised following referee’s report
Summary
This paper reports results from an intensive (three times daily), three-month X-ray, UV and optical monitoring campaign of the bright Seyfert AGN MCG+08-11-11. The X-ray and UV variations are strongly correlated, and lags increase with wavelength. The lag spectrum is much shallower than that from an optical campaign a year earlier. Tests of several physical models show that disc reprocessing cannot account for the observed 'excess' lags, which point to Balmer and Paschen continua produced in broad-line-region (BLR) gas. The large time delays and the structure in the rms and lag spectra indicate that the BLR is the dominant reprocessor. The hard X-ray spectrum and the faint, red UV-optical spectrum both imply a low Eddington accretion ratio, requiring a black hole mass substantially greater than current reverberation-mapping estimates.
Key Takeaways
- Intensive three-month X-ray, UV and optical monitoring of the bright Seyfert AGN MCG+08-11-11 was carried out.
- The X-ray and UV variations are highly correlated, with a maximum correlation coefficient r_max = 0.85.
- Lags increase with wavelength, as expected if surrounding material reprocesses the central high-energy emission.
- The lag spectrum is much shallower than that from an optical campaign conducted a year earlier.
- Disc reprocessing models cannot explain some of the observed lags, implying Balmer and Paschen continua from broad-line-region (BLR) gas.
- The large time delays, the rms spectrum and the lag structure all indicate that the BLR is likely the dominant reprocessor.
Walk the Lines 2: Contour Tracking for Detailed Segmentation
Authors:André Peter Kelm, Max Braeschke, Emre Gülsoylu, Simone Frintrop
This paper presents Walk the Lines 2 (WtL2), a unique contour tracking algorithm specifically adapted for detailed segmentation of infrared (IR) ships and various objects in RGB. This extends the original Walk the Lines (WtL) [12], which focused solely on detailed ship segmentation in color. These innovative WtLs can replace the standard non-maximum suppression (NMS) by using contour tracking to refine the object contour until a 1-pixel-wide closed shape can be binarized, forming a segmentable area in foreground-background scenarios. WtL2 broadens the application range of WtL beyond its original scope, adapting to IR and expanding to diverse objects within the RGB context. To achieve IR segmentation, we adapt its input, the object contour detector, to IR ships. In addition, the algorithm is enhanced to process a wide range of RGB objects, outperforming the latest generation of contour-based methods when achieving a closed object contour, offering high peak Intersection over Union (IoU) with impressive details. This positions WtL2 as a compelling method for specialized applications that require detailed segmentation or high-quality samples, potentially accelerating progress in several niche areas of image segmentation.
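The peak IoU metric reported above compares a predicted binary mask against the ground truth; a minimal, library-free sketch (illustrative, not the WtL2 code):

```python
def iou(pred, gt):
    """Intersection over Union of two binary masks given as 2D lists of 0/1."""
    inter = sum(p & g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    union = sum(p | g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    return inter / union if union else 1.0  # empty masks count as a perfect match

# Toy 3x3 masks: prediction and ground truth overlap on 3 of 5 union pixels.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
gt   = [[0, 0, 1],
        [0, 1, 1],
        [0, 1, 0]]
print(iou(pred, gt))  # 3 / 5 -> 0.6
```

WtL2's binarization of the 1-pixel-wide closed contour produces exactly such a binary mask, which is then scored against the annotation.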
Paper & Project Links
PDF 11 pages, 6 figures. Accepted at CAIP 2025: 21st International Conference on Computer Analysis of Images and Patterns, Las Palmas de Gran Canaria, Spain, September 22-25, 2025. To appear in: Proceedings Part I, Lecture Notes in Computer Science (LNCS), Springer Nature Switzerland
Summary
This paper introduces Walk the Lines 2 (WtL2), a contour tracking algorithm for detailed segmentation of infrared (IR) ships and various objects in RGB. WtL2 extends the original Walk the Lines (WtL), which handled only detailed ship segmentation in color, to IR imagery and diverse RGB objects. It refines the object contour by contour tracking until a 1-pixel-wide closed shape can be binarized, forming a segmentable region in foreground-background scenarios. When a closed object contour is achieved, the algorithm outperforms recent contour-based methods, delivering a high peak Intersection over Union (IoU) with impressive detail, making WtL2 attractive for applications that require detailed segmentation or high-quality samples.
Key Takeaways
- WtL2 is a contour-tracking algorithm for detailed segmentation of IR ships and diverse objects in RGB.
- WtL2 extends the original WtL algorithm to IR imagery and to a wide range of RGB objects.
- Contour tracking refines the object contour into a 1-pixel-wide closed shape that can be binarized into a segmentable region.
- When a closed object contour is achieved, WtL2 outperforms the latest contour-based methods, with high peak IoU and fine detail.
- WtL2 suits specialized applications that need detailed segmentation or high-quality samples.
- WtL2 may accelerate progress in several niche areas of image segmentation.
Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-based Peptide Receptor Radionuclide Therapy
Authors:Simon Baur, Tristan Ruhwedel, Ekin Böke, Zuzanna Kobus, Gergana Lishkova, Christoph Wetz, Holger Amthauer, Christoph Roderburg, Frank Tacke, Julian M. Rogasch, Wojciech Samek, Henning Jann, Jackie Ma, Johannes Eschrich
Peptide receptor radionuclide therapy (PRRT) is an established treatment for metastatic neuroendocrine tumors (NETs), yet long-term disease control occurs only in a subset of patients. Predicting progression-free survival (PFS) could support individualized treatment planning. This study evaluates laboratory, imaging, and multimodal deep learning models for PFS prediction in PRRT-treated patients. In this retrospective, single-center study 116 patients with metastatic NETs undergoing 177Lu-DOTATOC were included. Clinical characteristics, laboratory values, and pretherapeutic somatostatin receptor positron emission tomography/computed tomography (SR-PET/CT) scans were collected. Seven models were trained to classify low- vs. high-PFS groups, including unimodal (laboratory, SR-PET, or CT) and multimodal fusion approaches. Explainability was evaluated by feature importance analysis and gradient maps. Forty-two patients (36%) had short PFS (<1 year) and 74 patients long PFS (>1 year). Groups were similar in most characteristics, except for higher baseline chromogranin A (p = 0.003), elevated gamma-GT (p = 0.002), and fewer PRRT cycles (p < 0.001) in short-PFS patients. The Random Forest model trained only on laboratory biomarkers reached an AUROC of 0.59 ± 0.02. Unimodal three-dimensional convolutional neural networks using SR-PET or CT performed worse (AUROC 0.42 ± 0.03 and 0.54 ± 0.01, respectively). A multimodal fusion model combining laboratory values, SR-PET, and CT, augmented with a pretrained CT branch, achieved the best results (AUROC 0.72 ± 0.01, AUPRC 0.80 ± 0.01). Multimodal deep learning combining SR-PET, CT, and laboratory biomarkers outperformed unimodal approaches for PFS prediction after PRRT. Upon external validation, such models may support risk-adapted follow-up strategies.
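AUROC, the headline metric above, equals the probability that a randomly drawn positive case is ranked above a randomly drawn negative one (the Mann-Whitney U statistic, normalized); a minimal, library-free sketch:

```python
def auroc(scores_pos, scores_neg):
    """Probability that a positive outranks a negative (ties count 1/2);
    identical to the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for short-PFS (positive) vs. long-PFS patients.
pos = [0.9, 0.8, 0.6, 0.55]
neg = [0.7, 0.5, 0.4, 0.3]
print(auroc(pos, neg))  # 14 of 16 pairs ranked correctly -> 0.875
```

Production pipelines would compute this from sorted scores in O(n log n), but the pairwise definition makes the metric's meaning explicit.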
Paper & Project Links
Summary
This study evaluates laboratory, imaging and multimodal deep learning models for predicting progression-free survival (PFS) in patients with metastatic neuroendocrine tumors (NETs) treated with peptide receptor radionuclide therapy (PRRT). A multimodal fusion model combining laboratory values, SR-PET and CT achieved the best predictive performance.
Key Takeaways
- Background: PRRT is an established treatment for metastatic NETs, but long-term disease control occurs only in a subset of patients; predicting PFS could support individualized treatment planning.
- Methods: a retrospective, single-center study of 116 patients with metastatic NETs undergoing 177Lu-DOTATOC, collecting clinical characteristics, laboratory values, and pretherapeutic somatostatin receptor PET/CT (SR-PET/CT).
- Models: seven models were trained to classify low- vs. high-PFS groups, including unimodal (laboratory, SR-PET, or CT) and multimodal fusion approaches.
- Results: the laboratory-only Random Forest reached an AUROC of 0.59 ± 0.02; unimodal 3D CNNs on SR-PET or CT performed worse; the multimodal fusion model with a pretrained CT branch achieved the best results (AUROC 0.72 ± 0.01, AUPRC 0.80 ± 0.01).
- Significance: multimodal deep learning combining SR-PET, CT and laboratory biomarkers outperformed unimodal approaches for PFS prediction after PRRT; after external validation, such models may support risk-adapted follow-up strategies.
- Highlight: the multimodal fusion approach integrates the strengths of different data modalities, improving predictive accuracy.
Medical Referring Image Segmentation via Next-Token Mask Prediction
Authors:Xinyu Chen, Yiran Wang, Gaoyang Pang, Jiafu Hao, Chentao Yue, Luping Zhou, Yonghui Li
Medical Referring Image Segmentation (MRIS) involves segmenting target regions in medical images based on natural language descriptions. While achieving promising results, recent approaches usually involve complex designs of multimodal fusion or multi-stage decoders. In this work, we propose NTP-MRISeg, a novel framework that reformulates MRIS as an autoregressive next-token prediction task over a unified multimodal sequence of tokenized image, text, and mask representations. This formulation streamlines model design by eliminating the need for modality-specific fusion and external segmentation models, and supports a unified architecture for end-to-end training. It also enables the use of pretrained tokenizers from emerging large-scale multimodal models, enhancing generalization and adaptability. More importantly, to address challenges under this formulation - such as exposure bias, long-tail token distributions, and fine-grained lesion edges - we propose three novel strategies: (1) a Next-k Token Prediction (NkTP) scheme to reduce cumulative prediction errors, (2) Token-level Contrastive Learning (TCL) to enhance boundary sensitivity and mitigate long-tail distribution effects, and (3) a memory-based Hard Error Token (HET) optimization strategy that emphasizes difficult tokens during training. Extensive experiments on the QaTa-COV19 and MosMedData+ datasets demonstrate that NTP-MRISeg achieves new state-of-the-art performance, offering a streamlined and effective alternative to traditional MRIS pipelines.
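Treating a segmentation mask as a token sequence, as the framework above does, can be illustrated with a toy run-length tokenizer for binary masks (a deliberate simplification; the paper uses learned tokenizers from large multimodal models):

```python
def mask_to_tokens(mask):
    """Flatten a binary mask row-major and run-length encode it as
    (value, run) pairs - a toy stand-in for a learned mask tokenizer."""
    flat = [v for row in mask for v in row]
    tokens, cur, run = [], flat[0], 1
    for v in flat[1:]:
        if v == cur:
            run += 1
        else:
            tokens.append((cur, run))
            cur, run = v, 1
    tokens.append((cur, run))
    return tokens

def tokens_to_mask(tokens, width):
    """Invert mask_to_tokens back to rows of the given width."""
    flat = [v for v, run in tokens for _ in range(run)]
    return [flat[i:i + width] for i in range(0, len(flat), width)]

mask = [[0, 0, 1, 1],
        [0, 1, 1, 0]]
toks = mask_to_tokens(mask)
print(toks)  # [(0, 2), (1, 2), (0, 1), (1, 2), (0, 1)]
print(tokens_to_mask(toks, 4) == mask)  # lossless round trip -> True
```

An autoregressive decoder would then emit such mask tokens one (or, as in the NkTP scheme, k) at a time, conditioned on the preceding image and text tokens.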
Paper & Project Links
PDF This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication
Summary
This paper proposes NTP-MRISeg, a framework that reformulates medical referring image segmentation as autoregressive next-token prediction over a unified multimodal sequence of tokenized image, text and mask representations, enabling end-to-end training without modality-specific fusion or external segmentation models. Three strategies address challenges of this formulation: a Next-k Token Prediction scheme to reduce cumulative prediction errors, token-level contrastive learning to sharpen boundaries and mitigate long-tail token distributions, and a memory-based hard-error-token strategy that emphasizes difficult tokens during training. Experiments show that NTP-MRISeg achieves new state-of-the-art results on medical image segmentation benchmarks.
Key Takeaways
- NTP-MRISeg reformulates medical referring image segmentation as a sequence prediction task, simplifying model design.
- A unified multimodal token sequence enables an end-to-end training architecture.
- The framework can reuse pretrained tokenizers from large-scale multimodal models, improving generalization and adaptability.
- Three novel strategies are proposed: Next-k Token Prediction to reduce cumulative errors, token-level contrastive learning to enhance boundary sensitivity and mitigate long-tail effects, and a memory-based hard-error-token strategy to emphasize difficult tokens during training.
Pattern-Aware Diffusion Synthesis of fMRI/dMRI with Tissue and Microstructural Refinement
Authors:Xiongri Shen, Jiaqi Wang, Yi Zhong, Zhenxi Song, Leilei Zhao, Yichen Wei, Lingyan Liang, Shuqiang Wang, Baiying Lei, Demao Deng, Zhiguo Zhang
Magnetic resonance imaging (MRI), especially functional MRI (fMRI) and diffusion MRI (dMRI), is essential for studying neurodegenerative diseases. However, missing modalities pose a major barrier to their clinical use. Although GAN- and diffusion model-based approaches have shown some promise in modality completion, they remain limited in fMRI-dMRI synthesis due to (1) significant BOLD vs. diffusion-weighted signal differences between fMRI and dMRI along the time/gradient axis, and (2) inadequate integration of disease-related neuroanatomical patterns during generation. To address these challenges, we propose PDS, introducing two key innovations: (1) a pattern-aware dual-modal 3D diffusion framework for cross-modality learning, and (2) a tissue refinement network integrated with an efficient microstructure refinement to maintain structural fidelity and fine details. Evaluated on OASIS-3, ADNI, and in-house datasets, our method achieves state-of-the-art results, with PSNR/SSIM scores of 29.83 dB/90.84% for fMRI synthesis (+1.54 dB/+4.12% over baselines) and 30.00 dB/77.55% for dMRI synthesis (+1.02 dB/+2.2%). In clinical validation, the synthesized data show strong diagnostic performance, achieving 67.92%/66.02%/64.15% accuracy (NC vs. MCI vs. AD) in hybrid real-synthetic experiments. Code is available in the \href{https://github.com/SXR3015/PDS}{PDS GitHub Repository}
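PSNR, one of the metrics reported above, measures synthesis fidelity against the peak signal value; a minimal sketch for images scaled to [0, 1]:

```python
import math

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images
    given as flat lists of intensities in [0, peak]."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 4-pixel "images": one pixel differs by 0.1 -> MSE = 0.01/4 = 0.0025.
ref  = [0.0, 0.5, 1.0, 0.25]
test = [0.0, 0.5, 0.9, 0.25]
print(round(psnr(ref, test), 2))  # 10*log10(1/0.0025) ~ 26.02 dB
```

SSIM, the companion metric, additionally compares local luminance, contrast and structure rather than raw pixel error, which is why the two are usually reported together.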
Paper & Project Links
Summary
fMRI and dMRI are essential for studying neurodegenerative diseases, but missing modalities are a major barrier to clinical use. GAN- and diffusion-based completion methods remain limited for fMRI-dMRI synthesis because of large BOLD vs. diffusion-weighted signal differences along the time/gradient axis and inadequate integration of disease-related neuroanatomical patterns during generation. PDS addresses these issues with two key innovations: a pattern-aware dual-modal 3D diffusion framework for cross-modality learning, and a tissue refinement network with efficient microstructure refinement to preserve structural fidelity and fine details. On the OASIS-3, ADNI and in-house datasets it achieves state-of-the-art results: PSNR/SSIM of 29.83 dB/90.84% for fMRI synthesis (+1.54 dB/+4.12% over baselines) and 30.00 dB/77.55% for dMRI synthesis (+1.02 dB/+2.2%). In clinical validation, the synthesized data show strong diagnostic performance, reaching 67.92%/66.02%/64.15% accuracy (NC vs. MCI vs. AD) in hybrid real-synthetic experiments. Code is available in the PDS GitHub Repository.
Key Takeaways
- Functional MRI (fMRI) and diffusion MRI (dMRI) play key roles in studying neurodegenerative diseases.
- Missing modalities are one of the main barriers to clinical use of MRI.
- GAN- and diffusion-based modality completion struggles with fMRI-dMRI synthesis, mainly because of signal differences and poor integration of disease-related neuroanatomical patterns.
- PDS addresses these issues with a pattern-aware dual-modal 3D diffusion framework and a tissue refinement network with microstructure refinement.
- PDS achieves state-of-the-art results on multiple datasets and strong diagnostic performance in clinical validation.
- The high PSNR and SSIM scores of PDS's fMRI and dMRI synthesis indicate good image quality and structural similarity.
An Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention
Authors:Shuo Zhao, Yu Zhou, Jianxu Chen
Biomedical image segmentation is critical for precise structure delineation and downstream analysis. Traditional methods often struggle with noisy data, while deep learning models such as U-Net have set new benchmarks in segmentation performance. nnU-Net further automates model configuration, making it adaptable across datasets without extensive tuning. However, it requires a substantial amount of annotated data for cross-validation, posing a challenge when only raw images but no labels are available. Large foundation models offer zero-shot generalizability, but may underperform on specific datasets with unique characteristics, limiting their direct use for analysis. This work addresses these bottlenecks by proposing a data-centric AI workflow that leverages active learning and pseudo-labeling to combine the strengths of traditional neural networks and large foundation models while minimizing human intervention. The pipeline starts by generating pseudo-labels from a foundation model, which are then used for nnU-Net’s self-configuration. Subsequently, a representative core-set is selected for minimal manual annotation, enabling effective fine-tuning of the nnU-Net model. This approach significantly reduces the need for manual annotations while maintaining competitive performance, providing an accessible solution for biomedical researchers to apply state-of-the-art AI techniques in their segmentation tasks. The code is available at https://github.com/MMV-Lab/AL_BioMed_img_seg.
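The "representative core-set" selection for minimal annotation described above is commonly implemented with a greedy k-center heuristic over feature embeddings; a minimal sketch on 2D feature vectors (illustrative; the paper's actual selection criterion may differ):

```python
def k_center_greedy(points, k):
    """Pick k points that greedily cover the set: each new pick is the point
    farthest (squared Euclidean distance) from the already-selected core-set."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [0]  # seed with the first point
    while len(selected) < k:
        # For each candidate, take the distance to its nearest selected point,
        # then pick the candidate that maximizes that distance.
        far_idx = max(
            (i for i in range(len(points)) if i not in selected),
            key=lambda i: min(d2(points[i], points[j]) for j in selected),
        )
        selected.append(far_idx)
    return selected

# Three tight clusters of image embeddings; a 3-point core-set should
# touch each cluster exactly once.
feats = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0), (10, 0.1)]
print(k_center_greedy(feats, 3))  # -> [0, 5, 2], one index per cluster
```

Only the selected samples are sent for manual annotation; the rest are covered by pseudo-labels from the foundation model.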
Paper & Project Links
PDF 6 pages, 4 figures, presented at Bildverarbeitung für die Medizin (BVM) 2025, Wiesbaden, Germany
Summary
Biomedical image segmentation is critical for precise structure delineation and downstream analysis. Traditional methods often struggle with noisy data, while deep learning models such as U-Net have set new segmentation benchmarks, and nnU-Net further automates model configuration across datasets without extensive tuning. However, nnU-Net requires substantial annotated data for cross-validation, a challenge when only raw, unlabeled images are available, and large foundation models with zero-shot generalizability may underperform on datasets with unique characteristics. This work proposes a data-centric AI workflow that combines the strengths of traditional neural networks and large foundation models via active learning and pseudo-labeling while minimizing human intervention: pseudo-labels generated by a foundation model drive nnU-Net's self-configuration, and a representative core-set is then selected for minimal manual annotation to fine-tune the nnU-Net model effectively. The approach significantly reduces the need for manual annotation while maintaining competitive performance, giving biomedical researchers an accessible route to state-of-the-art segmentation.
Key Takeaways
- Biomedical image segmentation is critical for downstream analysis, but traditional methods struggle with noisy data.
- Deep learning models such as U-Net excel at segmentation, and nnU-Net automates model configuration across datasets.
- nnU-Net requires substantial annotated data for cross-validation, a challenge when labels are unavailable.
- Large foundation models offer zero-shot generalization but may underperform on datasets with unique characteristics.
- The proposed data-centric AI workflow combines traditional networks and foundation models while reducing human intervention.
- The pipeline uses pseudo-labels and active learning, selecting a representative core-set for minimal manual annotation to optimize nnU-Net performance.
Data Efficiency and Transfer Robustness in Biomedical Image Segmentation: A Study of Redundancy and Forgetting with Cellpose
Authors:Shuo Zhao, Jianxu Chen
Generalist biomedical image segmentation models such as Cellpose are increasingly applied across diverse imaging modalities and cell types. However, two critical challenges remain underexplored: (1) the extent of training data redundancy and (2) the impact of cross-domain transfer on model retention. In this study, we conduct a systematic empirical analysis of these challenges using Cellpose as a case study. First, to assess data redundancy, we propose a simple dataset quantization (DQ) strategy for constructing compact yet diverse training subsets. Experiments on the Cyto dataset show that image segmentation performance saturates with only 10% of the data, revealing substantial redundancy and potential for training with minimal annotations. Latent-space analysis using MAE embeddings and t-SNE confirms that DQ-selected patches capture greater feature diversity than random sampling. Second, to examine catastrophic forgetting, we perform cross-domain fine-tuning experiments and observe significant degradation in source-domain performance, particularly when adapting from generalist to specialist domains. We demonstrate that selective DQ-based replay, reintroducing just 5-10% of the source data, effectively restores source performance, while full replay can hinder target adaptation. Additionally, we find that training-domain sequencing improves generalization and reduces forgetting in multi-stage transfer. Our findings highlight the importance of data-centric design in biomedical image segmentation and suggest that efficient training requires not only compact subsets but also retention-aware learning strategies and informed domain ordering. The code is available at https://github.com/MMV-Lab/biomedseg-efficiency.
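The selective-replay idea above, mixing a small fraction of source-domain data into target-domain fine-tuning, can be sketched as follows (illustrative; random sampling stands in for the paper's DQ-based selection, and the 5% ratio is one of the reported settings):

```python
import random

def build_finetune_set(source, target, replay_frac=0.05, seed=0):
    """Return a fine-tuning set: all target samples plus a small replayed
    subset of the source domain, to counter catastrophic forgetting.
    Random sampling here is a stand-in for DQ-based selection."""
    rng = random.Random(seed)
    n_replay = max(1, int(replay_frac * len(source)))
    replay = rng.sample(source, n_replay)
    mixed = target + replay
    rng.shuffle(mixed)  # interleave source and target samples
    return mixed, replay

source = [f"src_{i}" for i in range(200)]   # e.g. generalist Cyto patches
target = [f"tgt_{i}" for i in range(50)]    # e.g. specialist-domain patches
mixed, replay = build_finetune_set(source, target, replay_frac=0.05)
print(len(replay), len(mixed))  # 10 replayed source samples, 60 total
```

Full replay (replay_frac=1.0) would, per the study's findings, risk hindering target adaptation, which is why the small replayed fraction matters.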
Paper & Project Links
PDF Accepted to IEEE BIBM 2025 Workshop; 6 pages; 4 figures; 5 tables; IEEEtran class. Code: https://github.com/MMV-Lab/biomedseg-efficiency
Summary
Using Cellpose as a case study, this work systematically examines two challenges for generalist biomedical segmentation models: training data redundancy and the effect of cross-domain transfer on model retention. A simple dataset quantization (DQ) strategy shows that segmentation performance saturates with only 10% of the data, revealing substantial redundancy and the potential to train with minimal annotations, and latent-space analysis confirms that DQ-selected patches capture more feature diversity than random sampling. Cross-domain fine-tuning experiments reveal significant degradation of source-domain performance, especially when moving from generalist to specialist domains; selective replay of a small DQ-chosen fraction of source data restores source performance, and training-domain sequencing improves generalization and reduces forgetting. The findings underscore data-centric design: efficient training requires compact subsets, retention-aware learning strategies and informed domain ordering. The code is publicly available.
Key Takeaways
- Generalist biomedical segmentation models such as Cellpose face training data redundancy and cross-domain transfer challenges.
- A dataset quantization strategy is used to assess redundancy, showing that performance saturates with only a small fraction of the data.
- Latent-space analysis confirms that quantization-selected patches capture greater feature diversity than random sampling.
- Cross-domain fine-tuning experiments show significant degradation of source-domain performance.
- Selective replay of 5-10% of the source data effectively restores source performance, while full replay can hinder target adaptation.
- Training-domain sequencing improves generalization and reduces forgetting.
The nexus between negative charge-transfer and reduced on-site Coulomb energy in correlated topological metals
Authors:A. R. Shelke, C. -W. Chuang, S. Hamamoto, M. Oura, M. Yoshimura, N. Hiraoka, C. -N. Kuo, C. -S. Lue, A. Fujimori, A. Chainani
The layered $3d$ transition metal dichalcogenides (TMDs) CoTe$_2$ and NiTe$_2$ are topological Dirac Type-II metals. Their $d$-bands do not exhibit the expected correlation-induced band narrowing seen in CoO and NiO. We address this conundrum by quantifying the on-site Coulomb energy $U_{dd}$ via single-particle partial density of states and the two-hole correlation satellite using valence band resonant photoemission spectroscopy (PES), and obtain $U_{dd}$ = 3.0 eV/3.7 eV for CoTe$_2$/NiTe$_2$. Charge-transfer (CT) cluster model simulations of the measured core-level PES and x-ray absorption spectra of CoTe$_2$ and CoO validate their contrasting electronic parameters: $U_{dd}$ and CT energy $\Delta$ are (3.0 eV, -2.0 eV) for CoTe$_2$, and (5.0 eV, 4.0 eV) for CoO, respectively. The $d$-$p$ hybridization strength $T_{eg}$ for CoTe$_2$ $<$ CoO, indicating that the reduced $U_{dd}$ in CoTe$_2$ is not due to $T_{eg}$. The increase in $d^n$-count $\sim$1 by CT from ligand to Co site in CoTe$_2$ is due to a negative-$\Delta$ and reduced $U_{dd}$. Yet, only because $U_{dd} > \big|\Delta\big|$, CoTe$_2$ becomes a topological metal with $p \rightarrow p$ type lowest energy excitations. Similarly, we obtain a negative-$\Delta$ and reduced $U_{dd}$ in NiTe$_2$ compared to NiO. The study reveals the nexus between negative-$\Delta$ and reduced $U_{dd}$ required for setting up the electronic structure framework for achieving topological behavior via band inversion in correlated metals.
Paper & Project Links
PDF 8 pages + 5 figures(main) and 10 pages + 9 figures (SM) (submitted to PRB)(corrected reference nos.)
Summary
The layered 3d transition metal dichalcogenides (TMDs) CoTe$_2$ and NiTe$_2$ are topological Dirac Type-II metals whose $d$-bands lack the correlation-induced band narrowing expected from CoO and NiO. By quantifying the on-site Coulomb energy $U_{dd}$ from the single-particle partial density of states and the two-hole correlation satellite in valence-band resonant photoemission, the study obtains $U_{dd}$ = 3.0 eV for CoTe$_2$ and 3.7 eV for NiTe$_2$. Charge-transfer cluster-model simulations of core-level photoemission and X-ray absorption spectra validate contrasting electronic parameters: ($U_{dd}$, $\Delta$) = (3.0 eV, -2.0 eV) for CoTe$_2$ versus (5.0 eV, 4.0 eV) for CoO. The work reveals that a negative charge-transfer energy $\Delta$ together with a reduced $U_{dd}$ is required to achieve topological behavior via band inversion in correlated metals.
Key Takeaways
- CoTe$_2$ and NiTe$_2$ are topological Dirac Type-II metals.
- The puzzle of the absent correlation-induced band narrowing is resolved by measuring the single-particle partial density of states and the two-hole correlation satellite.
- The on-site Coulomb energies are $U_{dd}$ = 3.0 eV for CoTe$_2$ and 3.7 eV for NiTe$_2$.
- The $d$-$p$ hybridization strength $T_{eg}$ is lower in CoTe$_2$ than in CoO.
- Compared with the corresponding oxides CoO and NiO, CoTe$_2$ and NiTe$_2$ show a negative charge-transfer energy $\Delta$ and a reduced $U_{dd}$.
- A negative $\Delta$ together with a reduced $U_{dd}$ is essential for achieving topological behavior in correlated metals.
Beyond Spin Coating: Homogeneous All-Inorganic Perovskite Films via High-Pressure Recrystallization
Authors:Asma Miled, Trong Tam Nguyen, José Penuelas, Aziz Benamrouche, Céline Chevalier, Thi Kim Anh Hoang, Gaëlle Trippé-Allard, Elsa Cassette, Brice Devif, Emmanuel Drouard, Emmanuelle Deleporte, Hong Hanh Mai, Abdelaziz Bouazizi, Christian Seassal, Hai Son Nguyen
Metal halide perovskites are promising materials for optoelectronic applications owing to their outstanding optical and electronic properties. Among them, all-inorganic perovskites such as CsPbBr$_3$ offer superior thermal and chemical stability. However, obtaining high-quality CsPbBr$_3$ thin films via solution processing remains challenging due to the precursor’s low solubility, and current additive or solvent engineering strategies are often complex and poorly reproducible. High-pressure recrystallization has recently emerged as a promising route to improve film quality, yet its impact on film properties remains insufficiently explored. Here, we systematically investigate the morphological, structural, and optical properties of CsPbBr$_3$ thin films prepared by high-pressure recrystallization, in comparison with standard non-recrystallized films. Optimized recrystallization at 300 bar produces smooth, pinhole-free, single-phase 3D perovskite layers with sub-nanometer roughness, while the film thickness is precisely tunable via precursor concentration. The process enhances both grain and crystallite sizes, leading to amplified spontaneous emission with a reduced excitation threshold and improved photostability. Temperature-dependent X-ray diffraction further reveals the orthorhombic–tetragonal–cubic phase transition, consistent with single-crystal behavior. This study provides fundamental insights into pressure-driven recrystallization and establishes a reproducible, scalable approach for fabricating high-quality CsPbBr$_3$ films for optoelectronic devices.
Paper & Project Links
Summary
Metal halide perovskites are promising optoelectronic materials thanks to their outstanding optical and electronic properties, but obtaining high-quality CsPbBr$_3$ thin films by solution processing remains difficult because of the precursor's low solubility, and current additive or solvent engineering strategies are often complex and poorly reproducible. High-pressure recrystallization has recently emerged as a promising route to improved film quality, yet its impact on film properties remains insufficiently explored. This work systematically compares the morphological, structural and optical properties of CsPbBr$_3$ films prepared by high-pressure recrystallization with standard non-recrystallized films. Optimized recrystallization at 300 bar yields smooth, pinhole-free, single-phase 3D perovskite layers with sub-nanometer roughness, with the film thickness precisely tunable via precursor concentration. The process enlarges grain and crystallite sizes, producing amplified spontaneous emission with a lower excitation threshold and improved photostability, and temperature-dependent X-ray diffraction reveals the orthorhombic-tetragonal-cubic phase transition, consistent with single-crystal behavior. The study provides fundamental insight into pressure-driven recrystallization and establishes a reproducible, scalable route to high-quality CsPbBr$_3$ films for optoelectronic devices.
Key Takeaways
- Metal halide perovskites such as CsPbBr$_3$ are promising for optoelectronic applications.
- High-pressure recrystallization is a promising route to improving CsPbBr$_3$ film quality.
- Optimized high-pressure recrystallization (300 bar) yields smooth, pinhole-free, single-phase perovskite layers.
- Film thickness is precisely tunable via precursor concentration.
- High-pressure recrystallization enlarges grain and crystallite sizes, enhancing amplified spontaneous emission and lowering the excitation threshold.
- Temperature-dependent X-ray diffraction reveals the phase-transition behavior of CsPbBr$_3$, consistent with single-crystal character.
- The study establishes a reproducible, scalable method for fabricating high-quality CsPbBr$_3$ films.
Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing
Authors:Zhihui Chen, Mengling Feng
Medical image editing has emerged as a pivotal technology with broad applications in data augmentation, model interpretability, medical education, and treatment simulation. However, the lack of large-scale, high-quality, and openly accessible datasets tailored for medical contexts with strict anatomical and clinical constraints has significantly hindered progress in this domain. To bridge this gap, we introduce Med-Banana-50K, a comprehensive dataset of over 50k medically curated image edits spanning chest X-ray, brain MRI, and fundus photography across 23 diseases. Each sample supports bidirectional lesion editing (addition and removal) and is constructed using Gemini-2.5-Flash-Image based on real clinical images. A key differentiator of our dataset is the medically grounded quality control protocol: we employ an LLM-as-Judge evaluation framework with criteria such as instruction compliance, structural plausibility, image realism, and fidelity preservation, alongside iterative refinement over up to five rounds. Additionally, Med-Banana-50K includes around 37,000 failed editing attempts with full evaluation logs to support preference learning and alignment research. By offering a large-scale, medically rigorous, and fully documented resource, Med-Banana-50K establishes a critical foundation for developing and evaluating reliable medical image editing systems. Our dataset and code are publicly available. [https://github.com/richardChenzhihui/med-banana-50k].
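The iterative LLM-as-Judge quality loop described above (score each edit, refine for up to five rounds, retain failure logs) can be sketched generically; `edit` and `judge` here are hypothetical stand-ins, not the dataset's actual API:

```python
def refine_until_accepted(image, instruction, edit, judge,
                          max_rounds=5, threshold=0.8):
    """Run edit -> judge repeatedly, keeping per-round logs of all attempts
    (the dataset similarly retains failed edits with full evaluation logs)."""
    logs = []
    for round_no in range(1, max_rounds + 1):
        candidate = edit(image, instruction, feedback=logs)
        score = judge(candidate, instruction)
        logs.append({"round": round_no, "score": score})
        if score >= threshold:
            return candidate, logs
    return None, logs  # all rounds failed; logs still useful for alignment research

# Toy stand-ins: each round's edit is tagged with its round number, and the
# judge's score improves with each round of feedback.
fake_edit = lambda img, instr, feedback: f"{img}+edit{len(feedback) + 1}"
fake_judge = lambda cand, instr: 0.3 * int(cand.rsplit("edit", 1)[1])
result, logs = refine_until_accepted("xray.png", "add nodule", fake_edit, fake_judge)
print(result, len(logs))  # accepted on round 3
```

In the real protocol the judge applies criteria such as instruction compliance, structural plausibility, image realism and fidelity preservation rather than a single scalar score.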
Paper & Project Links
Summary
Medical image editing has broad applications in data augmentation, model interpretability, medical education and treatment simulation, but progress has been hindered by the lack of large-scale, high-quality, openly accessible datasets tailored to medical contexts. Med-Banana-50K fills this gap with over 50k medically curated image edits spanning chest X-ray, brain MRI and fundus photography across 23 diseases. Each sample supports bidirectional lesion editing (addition and removal) and is built from real clinical images using Gemini-2.5-Flash-Image. A medically grounded quality-control protocol applies an LLM-as-Judge framework with criteria including instruction compliance, structural plausibility, image realism and fidelity preservation, with iterative refinement over up to five rounds. The dataset also includes around 37,000 failed editing attempts with full evaluation logs to support preference learning and alignment research, establishing a critical foundation for developing and evaluating reliable medical image editing systems.
Key Takeaways
- Medical image editing has wide applications in data augmentation, model interpretability and beyond.
- The lack of large-scale, high-quality datasets tailored to medical contexts has limited progress in medical image editing.
- Med-Banana-50K is a large-scale dataset spanning multiple medical imaging modalities with support for bidirectional lesion editing.
- Med-Banana-50K is built from real clinical images under a medically grounded quality-control protocol.
- Quality is assessed with an LLM-as-Judge framework using multiple evaluation criteria.
- The dataset includes failed editing attempts with evaluation logs to support preference learning and alignment research.
- Med-Banana-50K provides a critical foundation for developing and evaluating reliable medical image editing systems.
Wind-AE: A Fast, Open-source 1D Photoevaporation Code with Metal and Multi-frequency X-ray Capabilities
Authors:Madelyn Broome, Ruth Murray-Clay, John McCann, James E Owen
Throughout their lives, short period exoplanets (<100 days) experience X-ray and extreme-UV (XUV) stellar irradiation that can heat and photoionize planets’ upper atmospheres, driving transonic outflows. This photoevaporative mass loss plays a role in both evolution and observed demographics; however, mass loss rates are not currently directly observable and can only be inferred from models. To that end, we present an open-source fast 1D, XUV multi-frequency, multispecies, steady-state, hydrodynamic Parker Wind photoevaporation relaxation model based on Murray-Clay et al. (2009, arXiv:0811.0006). The model can move smoothly between high and low flux regimes and accepts custom multi-frequency stellar spectra. While the inclusion of high-energy X-rays increases mass loss rates ($\dot{M}$), metals decrease $\dot{M}$, and the net result for a typical hot Jupiter is a similar $\dot{M}$, but a hotter, faster, and more gradually ionized wind. We find that multifrequency photons (e.g., 13.6-2000 eV) are absorbed over a broader range of heights in the atmosphere, resulting in a wind-launch radius, $R_{XUV}$, that is of order 10 nanobars for all but the highest surface gravity planets. Grids of H/He solar metallicity atmospheres reveal that, for typical hot Jupiters like HD 209458b, $R_{XUV} \sim 1.1$-$1.8\,R_P$ for low fluxes, meaning that the energy-limited mass loss rate, $\dot{M}_{Elim}(R)$, computed at $R=R_P$ is a good approximation. However, for planets with low escape velocities, like many sub-Neptunes and super-Earths, $R_{XUV}$ can be $\gg R_P$, making it necessary to use $\dot{M}_{Elim}(R=R_{XUV})$ to avoid significantly underestimating mass loss rates. For both high escape velocities and large incident fluxes, radiative cooling is significant and energy-limited mass loss overestimates $\dot{M}$.
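The energy-limited approximation referred to above is commonly written $\dot{M}_{Elim} \simeq \epsilon \pi F_{XUV} R^3 / (G M_p)$, evaluated at some radius $R$; a minimal sketch (the standard textbook form with an assumed heating efficiency $\epsilon$, not Wind-AE itself) shows why evaluating it at $R = R_{XUV} \gg R_P$ raises the estimate:

```python
import math

# Energy-limited mass loss, a standard approximation (not the Wind-AE model):
# Mdot ~ eps * pi * F_XUV * R**3 / (G * M_p), evaluated at radius R.
G = 6.674e-8  # gravitational constant in cgs [cm^3 g^-1 s^-2]

def mdot_elim(flux_xuv, radius, mass, eps=0.1):
    """Energy-limited mass loss rate [g/s] for cgs inputs; eps is the
    assumed heating efficiency (a free parameter, here 0.1)."""
    return eps * math.pi * flux_xuv * radius ** 3 / (G * mass)

# Hypothetical sub-Neptune: F_XUV = 1e3 erg/cm^2/s, M_p = 5 Earth masses.
m_p = 5 * 5.972e27       # g
r_p = 2.5 * 6.371e8      # cm (planet radius, 2.5 Earth radii)
r_xuv = 3.0 * r_p        # wind-launch radius well above R_P

at_rp = mdot_elim(1e3, r_p, m_p)
at_rxuv = mdot_elim(1e3, r_xuv, m_p)
print(at_rxuv / at_rp)  # cubic scaling in R: a factor ~27 for R_XUV = 3 R_P
```

The cubic dependence on the absorption radius is exactly why using $R = R_P$ for low-gravity planets can significantly underestimate the mass loss rate.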
Paper & Project Links
PDF 40 pages, 23 figures, Accepted
Summary
Short-period exoplanets (orbital periods < 100 days) are exposed to X-ray and extreme-UV (XUV) stellar irradiation that heats and photoionizes their upper atmospheres, driving transonic outflows. The paper presents an open-source, fast, 1D, XUV multi-frequency, multispecies, steady-state hydrodynamic Parker-wind photoevaporation relaxation model based on Murray-Clay et al. (2009) and uses it to explore planetary mass loss. The model moves smoothly between high- and low-flux regimes and accepts custom multi-frequency stellar spectra. The opposing effects of metals and high-energy X-rays leave the net mass loss rate (Mdot) of a typical hot Jupiter roughly unchanged but make the wind hotter, faster, and more gradually ionized. The work emphasizes choosing the appropriate radius when computing energy-limited estimates; for planets with high escape velocities or large incident fluxes, radiative cooling is significant and the energy-limited rate overestimates Mdot.
Key Takeaways
- Short-period exoplanets experience strong X-ray and extreme-UV irradiation that heats and photoionizes their upper atmospheres, shaping both their evolution and their observed demographics.
- Mass loss rates cannot be observed directly and must be inferred from models; the paper introduces a Parker-wind photoevaporation relaxation model for this purpose.
- The model accepts custom multi-frequency stellar spectra and adapts to both high- and low-flux regimes.
- Including high-energy X-rays increases the mass loss rate (Mdot), while metals decrease it.
View paper screenshots
Self-supervised Deep Unrolled Model with Implicit Neural Representation Regularization for Accelerating MRI Reconstruction
Authors:Jingran Xu, Yuanyuan Liu, Yuanbiao Yang, Zhuo-Xu Cui, Jing Cheng, Qingyong Zhu, Nannan Zhang, Yihang Zhou, Dong Liang, Yanjie Zhu
Magnetic resonance imaging (MRI) is a vital clinical diagnostic tool, yet its application is limited by prolonged scan times. Accelerating MRI reconstruction addresses this issue by reconstructing high-fidelity MR images from undersampled k-space measurements. In recent years, deep learning-based methods have demonstrated remarkable progress. However, most methods rely on supervised learning, which requires large amounts of fully-sampled training data that are difficult to obtain. This paper proposes a novel zero-shot self-supervised reconstruction method named UnrollINR, which enables scan-specific MRI reconstruction without external training data. UnrollINR adopts a physics-guided unrolled reconstruction architecture and introduces implicit neural representation (INR) as a regularization prior to effectively constrain the solution space. This method overcomes the local bias limitation of CNNs in traditional deep unrolled methods and avoids the instability associated with relying solely on INR’s implicit regularization in highly ill-posed scenarios. Consequently, UnrollINR significantly improves MRI reconstruction performance under high acceleration rates. Experimental results show that even at a high acceleration rate of 10, UnrollINR achieves superior reconstruction performance compared to supervised and self-supervised learning methods, validating its effectiveness and superiority.
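The physics-guided unrolled architecture described above alternates data-consistency steps with a learned prior. A minimal NumPy sketch of one unrolled iteration, with the measurement operator A as a masked 2-D FFT and the learned regularizer (an INR-based prior in UnrollINR) replaced by an identity placeholder:

```python
import numpy as np

def forward(x, mask):
    """Measurement operator A: 2-D FFT followed by k-space undersampling."""
    return mask * np.fft.fft2(x, norm="ortho")

def adjoint(k, mask):
    """Adjoint A^H: zero-filled inverse FFT of the masked k-space data."""
    return np.fft.ifft2(mask * k, norm="ortho")

def unrolled_step(x, y, mask, eta=1.0, regularizer=lambda z: z):
    """One unrolled iteration: gradient step on ||Ax - y||^2, then a
    learned prior (an INR in UnrollINR; an identity placeholder here)."""
    return regularizer(x - eta * adjoint(forward(x, mask) - y, mask))

rng = np.random.default_rng(0)
gt = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
mask = rng.random((32, 32)) < 0.25   # ~4x undersampling pattern
y = forward(gt, mask)                # undersampled k-space measurements

x = adjoint(y, mask)                 # zero-filled initial estimate
for _ in range(5):
    x = unrolled_step(x, y, mask)
```

With an identity regularizer the iterates become exactly data-consistent; it is the learned prior that determines how the missing k-space is filled in at high acceleration rates.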
Paper & Project Links
Summary
This paper proposes UnrollINR, a zero-shot self-supervised reconstruction method that addresses the long scan times of magnetic resonance imaging (MRI). The method adopts a physics-guided unrolled reconstruction architecture and introduces implicit neural representation (INR) as a regularization prior to constrain the solution space effectively. It overcomes the local bias of the CNNs used in traditional deep unrolled methods and avoids the instability of relying solely on INR's implicit regularization in highly ill-posed settings. UnrollINR therefore substantially improves MRI reconstruction at high acceleration rates: even at an acceleration rate of 10, it outperforms supervised and self-supervised learning methods, validating its effectiveness and superiority.
Key Takeaways
- UnrollINR is a zero-shot self-supervised MRI reconstruction method that needs no external training data.
- It adopts a physics-guided unrolled reconstruction architecture.
- Implicit neural representation (INR) serves as a regularization prior that effectively constrains the solution space.
- It overcomes the local bias of CNNs in traditional deep unrolled methods.
- It avoids the instability of relying solely on INR's implicit regularization in highly ill-posed settings.
- At high acceleration rates, UnrollINR markedly improves MRI reconstruction performance.
- Experiments show UnrollINR outperforms both supervised and self-supervised learning methods.
View paper screenshots
Dual Teacher-Student Learning for Semi-supervised Medical Image Segmentation
Authors:Pengchen Zhang, Alan J. X. Guo, Sipin Luo, Zhe Han, Lin Guo
Semi-supervised learning reduces the costly manual annotation burden in medical image segmentation. A popular approach is the mean teacher (MT) strategy, which applies consistency regularization using a temporally averaged teacher model. In this work, the MT strategy is reinterpreted as a form of self-paced learning in the context of supervised learning, where agreement between the teacher’s predictions and the ground truth implicitly guides the model from easy to hard. Extending this insight to semi-supervised learning, we propose dual teacher-student learning (DTSL). It regulates the learning pace on unlabeled data using two signals: a temporally averaged signal from an in-group teacher and a cross-architectural signal from a student in a second, distinct model group. Specifically, a novel consensus label generator (CLG) creates the pseudo-labels from the agreement between these two signals, establishing an effective learning curriculum. Extensive experiments on four benchmark datasets demonstrate that the proposed method consistently outperforms existing state-of-the-art approaches. Remarkably, on three of the four datasets, our semi-supervised method with limited labeled data surpasses its fully supervised counterparts, validating the effectiveness of our self-paced learning design.
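The two ingredients of the DTSL recipe, a temporally averaged (EMA) teacher and a consensus rule over two prediction signals, can be sketched minimally. The thresholding details below are assumptions for illustration, not the paper's exact CLG:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher temporal averaging of parameters:
    theta_t <- alpha * theta_t + (1 - alpha) * theta_s."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def consensus_labels(p_teacher, p_student, thresh=0.8):
    """Hypothetical consensus-label generator: keep a pixel's pseudo-label
    only where the in-group teacher and the cross-group student agree and
    both are confident; disagreeing pixels are masked out of the loss."""
    lab_t, lab_s = p_teacher.argmax(-1), p_student.argmax(-1)
    conf = np.minimum(p_teacher.max(-1), p_student.max(-1))
    valid = (lab_t == lab_s) & (conf > thresh)
    return lab_t, valid

# Three pixels, two classes: agree+confident / disagree / agree but not confident
p_t = np.array([[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]])
p_s = np.array([[0.85, 0.15], [0.3, 0.7], [0.2, 0.8]])
labels, valid = consensus_labels(p_t, p_s)
# labels -> [0, 0, 1]; valid -> [True, False, False]
```

Gating the unsupervised loss by this agreement mask is what realizes the easy-to-hard curriculum on unlabeled data.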
Paper & Project Links
Summary
Semi-supervised learning reduces the costly manual annotation burden in medical image segmentation. This work reinterprets the mean teacher (MT) strategy as a form of self-paced learning in the supervised setting, where agreement between the teacher's predictions and the ground truth implicitly guides the model from easy to hard. Extending this insight to semi-supervised learning, the authors propose dual teacher-student learning (DTSL), which regulates the learning pace on unlabeled data with two signals: a temporally averaged signal from an in-group teacher and a cross-architectural signal from a student in a second, distinct model group. A novel consensus label generator (CLG) creates pseudo-labels from the agreement between these two signals, establishing an effective learning curriculum. Extensive experiments on four benchmark datasets show consistent gains over state-of-the-art methods; on three of the four, the semi-supervised method with limited labeled data surpasses its fully supervised counterpart, validating the self-paced learning design.
Key Takeaways
- Semi-supervised learning reduces manual annotation costs in medical image segmentation.
- The mean teacher strategy is reinterpreted as a form of self-paced learning, in which agreement between the teacher's predictions and the ground truth paces the model's learning.
- A new dual teacher-student learning method uses two signals to regulate the learning pace on unlabeled data.
- A consensus label generator creates pseudo-labels from the agreement between the two signals, yielding a more effective curriculum.
View paper screenshots
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Authors:Weiyi You, Mingyang Zhang, Leheng Zhang, Xingyu Zhou, Kexuan Shi, Shuhang Gu
Current diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Therefore, distillation techniques are utilized to accelerate the multi-step teacher model into one-step student model. Nevertheless, these methods significantly raise training costs and constrain the performance of the student model by the teacher model. To overcome these tough challenges, we propose Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that is able to generate photo-realistic SR results in one step. Concretely, we first formulate a Probability Flow Ordinary Differential Equation (PF-ODE) trajectory to establish a deterministic mapping from low-resolution (LR) images with noise to high-resolution (HR) images. Then we apply the Consistency Training (CT) strategy to directly learn the mapping in one step, eliminating the necessity of pre-trained diffusion model. To further enhance the performance and better leverage the ground-truth during the training process, we aim to align the distribution of SR results more closely with that of the natural images. To this end, we propose to minimize the discrepancy between their respective PF-ODE trajectories from the LR image distribution by our meticulously designed Distribution Trajectory Matching (DTM) loss, resulting in improved realism of our recovered HR images. Comprehensive experimental results demonstrate that the proposed methods can attain comparable or even superior capabilities on both synthetic and real datasets while maintaining minimal inference latency.
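The self-consistency property at the heart of consistency training can be shown on a toy deterministic trajectory. The sketch below uses a rectified-flow-style linear interpolation as the "PF-ODE", an assumption for illustration only, not the paper's exact formulation:

```python
import numpy as np

# Toy trajectory: points move with constant velocity v = x1 - x0,
# so x_t = x0 + t * v and the ODE is dx/dt = v.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(8)   # "noisy LR" end of the trajectory
x1 = rng.standard_normal(8)   # "HR" end of the trajectory
v = x1 - x0

def euler_solve(x, t, steps=100):
    """Multi-step ODE integration from time t to 1 (what an
    iterative diffusion sampler does)."""
    dt = (1.0 - t) / steps
    for _ in range(steps):
        x = x + dt * v            # dx/dt = v on this toy trajectory
    return x

def consistency_fn(x_t, t):
    """An ideal consistency function jumps straight to the trajectory
    endpoint in one evaluation: f(x_t, t) = x_t + (1 - t) * v.
    Consistency training fits a network to this map so that any two
    points on the same trajectory yield the same output."""
    return x_t + (1.0 - t) * v

x_mid = x0 + 0.3 * v              # a point on the trajectory at t = 0.3
```

The one-step `consistency_fn` reproduces the endpoint of the many-step `euler_solve`, which is the sense in which a consistency-trained model replaces an iterative sampler.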
Paper & Project Links
PDF Accepted by ICCV 2025
Summary
This paper proposes Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that generates photo-realistic super-resolution results in a single step. The method formulates a probability flow ordinary differential equation (PF-ODE) trajectory to map noisy low-resolution images to high-resolution images, and applies consistency training to learn this mapping directly in one step, removing the need for a pre-trained diffusion model. To further improve performance and better exploit the ground truth during training, a carefully designed distribution trajectory matching (DTM) loss minimizes the discrepancy between the PF-ODE trajectories of the SR results and of natural images, improving the realism of the recovered high-resolution images. Experiments show performance that matches or exceeds existing methods on both synthetic and real datasets while keeping inference latency minimal.
Key Takeaways
- Existing diffusion-based SR methods perform well but carry high inference overhead, which is usually reduced through distillation.
- The proposed CTMSR strategy needs no distillation and generates SR results in one step, lowering training costs.
- CTMSR builds a PF-ODE trajectory to map low-resolution images to high-resolution images.
- Consistency training learns this mapping directly, without a pre-trained diffusion model.
- A DTM loss narrows the gap between the distribution of SR results and that of natural images.
- Experiments show CTMSR excels on both synthetic and real datasets with low inference latency.
View paper screenshots
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
Authors:Barbara Toniella Corradini, Mustafa Shukor, Paul Couairon, Guillaume Couairon, Franco Scarselli, Matthieu Cord
Foundation models have exhibited unprecedented capabilities in tackling many domains and tasks. Models such as CLIP are currently widely used to bridge cross-modal representations, and text-to-image diffusion models are arguably the leading models in terms of realistic image generation. Image generative models are trained on massive datasets that provide them with powerful internal spatial representations. In this work, we explore the potential benefits of such representations, beyond image generation, in particular, for dense visual prediction tasks. We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets, with pixel-level annotations. To avoid the annotation cost or training large diffusion models, we constrain our setup to be zero-shot and training-free. In a nutshell, our pipeline leverages different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation. The pipeline is as follows: the image is passed to both a captioner model (i.e. BLIP) and a diffusion model (i.e., Stable Diffusion Model) to generate a text description and visual representation, respectively. The features are clustered and binarized to obtain class agnostic masks for each object. These masks are then mapped to a textual class, using the CLIP model to support open-vocabulary. Finally, we add a refinement step that allows us to obtain a more precise segmentation mask. Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets. In addition, we show very competitive results compared to the recent weakly-supervised segmentation approaches. We provide comprehensive experiments showing the superiority of diffusion model features compared to other pretrained models. Project page: https://bcorrad.github.io/freesegdiff/
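The cluster-and-binarize step of the pipeline can be sketched with a tiny k-means over per-pixel features. The synthetic two-region "feature map" below stands in for real diffusion features and is an assumption for illustration:

```python
import numpy as np

def kmeans2(feats, iters=20):
    """Tiny 2-cluster Lloyd's k-means over per-pixel feature vectors
    (a stand-in for clustering diffusion features in FreeSeg-Diff).
    Initializes with the first point and its farthest point."""
    c0 = feats[0]
    c1 = feats[((feats - c0) ** 2).sum(1).argmax()]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(2):
            if (assign == j).any():
                centers[j] = feats[assign == j].mean(0)
    return assign

# Synthetic 8x8 feature map: left and right halves carry distinct features
h = w = 8
feats = np.zeros((h, w, 2))
feats[:, : w // 2] = [1.0, 0.0]
feats[:, w // 2 :] = [0.0, 1.0]
feats += np.random.default_rng(2).normal(0, 0.05, feats.shape)

assign = kmeans2(feats.reshape(-1, 2)).reshape(h, w)
masks = [(assign == j) for j in range(2)]   # binarized, class-agnostic masks
# In FreeSeg-Diff each mask region would then be scored with CLIP against
# BLIP-derived class names to attach an open-vocabulary label.
```

The masks are class-agnostic by construction; the open-vocabulary labeling comes entirely from the subsequent CLIP matching step.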
Paper & Project Links
Summary
This work explores the internal representations of foundation models, CLIP for cross-modal alignment and text-to-image diffusion models for generation, for dense visual prediction, focusing on image segmentation. It proposes FreeSeg-Diff, a zero-shot, training-free pipeline that uses relatively small open-source foundation models for open-vocabulary segmentation. The image is passed to a captioner (BLIP) and a diffusion model (Stable Diffusion) to obtain a text description and visual features; the features are clustered and binarized into class-agnostic masks, which are mapped to textual classes with CLIP, followed by a refinement step for more precise masks. Without any training, the method outperforms many training-based approaches on the Pascal VOC and COCO datasets and is competitive with recent weakly supervised segmentation methods. Comparative experiments also demonstrate the superiority of diffusion model features over other pretrained models.
Key Takeaways
- Foundation models such as CLIP are widely used to bridge cross-modal representations.
- Image generative models provide powerful internal spatial representations, a potential advantage for dense visual prediction tasks.
- The work targets zero-shot, training-free image segmentation, using open-source foundation models for open-vocabulary segmentation.
- The pipeline combines generated text descriptions and visual representations, with CLIP mapping masks to open-vocabulary classes.
- FreeSeg-Diff performs strongly on Pascal VOC and COCO, competitive with both training-based and weakly supervised methods.
- Diffusion model features outperform those of other pretrained models.