⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never use these summaries in serious academic settings; they are intended only for initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-19
SAGE: Saliency-Guided Contrastive Embeddings
Authors: Colton R. Crum, Adam Czajka
Integrating human perceptual priors into the training of neural networks has been shown to raise model generalization, serve as an effective regularizer, and align models with human expertise for applications in high-risk domains. Existing approaches to integrate saliency into model training often rely on internal model mechanisms, which recent research suggests may be unreliable. Our insight is that many challenges associated with saliency-guided training stem from the placement of the guidance approaches solely within the image space. Instead, we move away from the image space, use the model’s latent space embeddings to steer human guidance during training, and we propose SAGE (Saliency-Guided Contrastive Embeddings): a loss function that integrates human saliency into network training using contrastive embeddings. We apply salient-preserving and saliency-degrading signal augmentations to the input and capture the changes in embeddings and model logits. We guide the model towards salient features and away from non-salient features using a contrastive triplet loss. Additionally, we perform a sanity check on the logit distributions to ensure that the model outputs match the saliency-based augmentations. We demonstrate a boost in classification performance across both open- and closed-set scenarios against SOTA saliency-based methods, showing SAGE’s effectiveness across various backbones, and include experiments to suggest its wide generalization across tasks.
Paper and project links
PDF 11 pages, 2 figures, 5 tables
Summary
This paper proposes a new way to integrate human perceptual priors into neural network training: SAGE (Saliency-Guided Contrastive Embeddings), a loss function that steers human guidance through the model's latent-space embeddings. The method applies salient-preserving and saliency-degrading signal augmentations to the input and uses a contrastive triplet loss to guide the model toward salient features and away from non-salient ones. Experiments show improved classification performance in both open- and closed-set scenarios, effectiveness across multiple backbones, and broad generalization across tasks.
Key Takeaways
- Integrating human perceptual priors raises neural network generalization and serves as an effective regularizer.
- Existing approaches that integrate saliency into model training often rely on internal model mechanisms, which recent research suggests may be unreliable.
- This paper instead steers human guidance through the model's latent-space embeddings rather than the image space.
- The proposed SAGE loss function integrates human saliency into network training via contrastive embeddings.
- Salient-preserving and saliency-degrading signal augmentations are applied to the input, and a contrastive triplet loss guides the model toward salient features (a minimal sketch follows this list).
- Experiments demonstrate improved classification performance across a variety of backbones.
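To make the triplet formulation concrete, here is a minimal PyTorch sketch. It assumes a generic `encoder` that maps images to embeddings and stands in for the paper's augmentations by blending noise into the masked regions; the actual augmentations, loss weighting, and the additional logit sanity check described in the abstract are not reproduced here.

```python
# Minimal sketch of a saliency-guided triplet loss (illustrative only).
# Assumptions: `encoder` returns an embedding per image; `saliency` is a
# per-pixel map in [0, 1]; noise blending stands in for the paper's
# salient-preserving / saliency-degrading augmentations.
import torch
import torch.nn.functional as F

def saliency_augment(images, saliency, keep_salient=True):
    """Blend noise into either the non-salient or the salient regions."""
    mask = saliency if keep_salient else 1.0 - saliency
    return images * mask + torch.randn_like(images) * (1.0 - mask)

def sage_style_loss(encoder, images, saliency, margin=1.0):
    anchor = encoder(images)                      # clean-input embedding
    positive = encoder(saliency_augment(images, saliency, keep_salient=True))
    negative = encoder(saliency_augment(images, saliency, keep_salient=False))
    # Pull embeddings toward salient views, push away from degraded views.
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)
```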
C3Net: Context-Contrast Network for Camouflaged Object Detection
Authors: Baber Jan, Aiman H. El-Maleh, Abdul Jabbar Siddiqui, Abdul Bais, Saeed Anwar
Camouflaged object detection identifies objects that blend seamlessly with their surroundings through similar colors, textures, and patterns. This task challenges both traditional segmentation methods and modern foundation models, which fail dramatically on camouflaged objects. We identify six fundamental challenges in COD: Intrinsic Similarity, Edge Disruption, Extreme Scale Variation, Environmental Complexities, Contextual Dependencies, and Salient-Camouflaged Object Disambiguation. These challenges frequently co-occur and compound the difficulty of detection, requiring comprehensive architectural solutions. We propose C3Net, which addresses all challenges through a specialized dual-pathway decoder architecture. The Edge Refinement Pathway employs gradient-initialized Edge Enhancement Modules to recover precise boundaries from early features. The Contextual Localization Pathway utilizes our novel Image-based Context Guidance mechanism to achieve intrinsic saliency suppression without external models. An Attentive Fusion Module synergistically combines the two pathways via spatial gating. C3Net achieves state-of-the-art performance with S-measures of 0.898 on COD10K, 0.904 on CAMO, and 0.913 on NC4K, while maintaining efficient processing. C3Net demonstrates that complex, multifaceted detection challenges require architectural innovation, with specialized components working synergistically to achieve comprehensive coverage beyond isolated improvements. Code, model weights, and results are available at https://github.com/Baber-Jan/C3Net.
Paper and project links
Summary
This paper identifies six fundamental challenges in camouflaged object detection (COD) and proposes C3Net to address them. C3Net resolves these challenges through a specialized dual-pathway decoder architecture and achieves efficient camouflaged object detection, reaching state-of-the-art performance on three benchmark datasets while handling complex, compounding detection challenges.
Key Takeaways
- Camouflaged object detection faces six fundamental challenges: intrinsic similarity, edge disruption, extreme scale variation, environmental complexities, contextual dependencies, and salient-camouflaged object disambiguation.
- C3Net addresses these challenges with a dual-pathway decoder architecture.
- The Edge Refinement Pathway uses gradient-initialized Edge Enhancement Modules to recover precise boundaries from early features.
- The Contextual Localization Pathway uses an image-based context guidance mechanism to achieve intrinsic saliency suppression without external models.
- An Attentive Fusion Module combines the two pathways via spatial gating (see the sketch after this list).
- C3Net achieves state-of-the-art results on three benchmarks, with S-measures of 0.898 on COD10K, 0.904 on CAMO, and 0.913 on NC4K.
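As an illustration of spatial gating, the hypothetical module below fuses two feature maps with a learned per-pixel gate. C3Net's actual Attentive Fusion Module is more elaborate, so treat this purely as a sketch of the mechanism; all names and shapes are assumptions.

```python
# Hypothetical spatial-gating fusion of an edge pathway and a context
# pathway; not C3Net's released implementation.
import torch
import torch.nn as nn

class SpatialGatedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Predict a per-pixel gate from the concatenated pathways.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, edge_feat, context_feat):
        g = self.gate(torch.cat([edge_feat, context_feat], dim=1))  # (B,1,H,W)
        return g * edge_feat + (1.0 - g) * context_feat

fusion = SpatialGatedFusion(64)
out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```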
SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition
Authors: Qing Cai, Guihao Yan, Fan Zhang, Cheng Zhang, Zhi Liu
Ultrasound standard plane recognition is essential for clinical tasks such as disease screening, organ evaluation, and biometric measurement. However, existing methods fail to effectively exploit shallow structural information and struggle to capture fine-grained semantic differences through contrastive samples generated by image augmentations, ultimately resulting in suboptimal recognition of both structural and discriminative details in ultrasound standard planes. To address these issues, we propose SEMC, a novel Structure-Enhanced Mixture-of-Experts Contrastive learning framework that combines structure-aware feature fusion with expert-guided contrastive learning. Specifically, we first introduce a novel Semantic-Structure Fusion Module (SSFM) to exploit multi-scale structural information and enhance the model’s ability to perceive fine-grained structural details by effectively aligning shallow and deep features. Then, a novel Mixture-of-Experts Contrastive Recognition Module (MCRM) is designed to perform hierarchical contrastive learning and classification across multi-level features using a mixture-of-experts (MoE) mechanism, further improving class separability and recognition performance. More importantly, we also curate a large-scale and meticulously annotated liver ultrasound dataset containing six standard planes. Extensive experimental results on our in-house dataset and two public datasets demonstrate that SEMC outperforms recent state-of-the-art methods across various metrics.
Paper and project links
PDF Accepted by AAAI 2026
Summary
This paper proposes SEMC, a novel Structure-Enhanced Mixture-of-Experts Contrastive learning framework for ultrasound standard plane recognition. The framework combines structure-aware feature fusion with expert-guided contrastive learning: a Semantic-Structure Fusion Module strengthens the model's perception of fine-grained structural details, and a Mixture-of-Experts Contrastive Recognition Module improves class separability, together boosting recognition of ultrasound standard planes.
Key Takeaways
- Ultrasound standard plane recognition is essential for clinical tasks such as disease screening, organ evaluation, and biometric measurement.
- Existing methods fail to exploit shallow structural information effectively and struggle to capture fine-grained semantic differences from contrastive samples generated by image augmentations.
- The SEMC framework combines structure-aware feature fusion with expert-guided contrastive learning to improve recognition of ultrasound standard planes.
- The Semantic-Structure Fusion Module (SSFM) exploits multi-scale structural information and enhances the perception of fine-grained structural details.
- The Mixture-of-Experts Contrastive Recognition Module (MCRM) performs hierarchical contrastive learning and classification using an MoE mechanism, improving class separability and recognition performance (a toy sketch follows this list).
- A large-scale, meticulously annotated liver ultrasound dataset covering six standard planes was curated.
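The digest does not spell out the MCRM internals, but a mixture-of-experts classification head generally looks like the toy sketch below: a router softly weights several expert classifiers (here with six output classes, matching the six standard planes). All names and shapes are illustrative assumptions, not SEMC's implementation.

```python
# Toy mixture-of-experts head: a router weights several expert classifiers.
# Shapes and the six-class output are illustrative assumptions.
import torch
import torch.nn as nn

class MoEHead(nn.Module):
    def __init__(self, dim, num_experts=4, num_classes=6):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_experts)]
        )
        self.router = nn.Linear(dim, num_experts)

    def forward(self, feat):                                   # feat: (B, dim)
        weights = self.router(feat).softmax(dim=-1)            # (B, E)
        logits = torch.stack([e(feat) for e in self.experts], dim=1)  # (B, E, C)
        return (weights.unsqueeze(-1) * logits).sum(dim=1)     # (B, C)
```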
FaNe: Towards Fine-Grained Cross-Modal Contrast with False-Negative Reduction and Text-Conditioned Sparse Attention
Authors: Peng Zhang, Zhihui Lai, Wenting Chen, Xu Wu, Heng Kong
Medical vision-language pre-training (VLP) offers significant potential for advancing medical image understanding by leveraging paired image-report data. However, existing methods are limited by False Negatives (FaNe) induced by semantically similar texts and insufficient fine-grained cross-modal alignment. To address these limitations, we propose FaNe, a semantic-enhanced VLP framework. To mitigate false negatives, we introduce a semantic-aware positive pair mining strategy based on text-text similarity with adaptive normalization. Furthermore, we design a text-conditioned sparse attention pooling module to enable fine-grained image-text alignment through localized visual representations guided by textual cues. To strengthen intra-modal discrimination, we develop a hard-negative aware contrastive loss that adaptively reweights semantically similar negatives. Extensive experiments on five downstream medical imaging benchmarks demonstrate that FaNe achieves state-of-the-art performance across image classification, object detection, and semantic segmentation, validating the effectiveness of our framework.
Paper and project links
PDF AAAI 2026
Summary
Medical vision-language pre-training (VLP) offers significant potential for advancing medical image understanding by leveraging paired image-report data, but existing methods are limited by false negatives (FaNe) induced by semantically similar texts and by insufficient fine-grained cross-modal alignment. This paper proposes FaNe, a semantic-enhanced VLP framework: a semantic-aware positive-pair mining strategy based on text-text similarity mitigates false negatives; a text-conditioned sparse attention pooling module enables fine-grained image-text alignment through text-guided visual representations; and a hard-negative-aware contrastive loss adaptively reweights semantically similar negatives to strengthen intra-modal discrimination. Extensive experiments on five downstream medical imaging benchmarks show state-of-the-art performance in image classification, object detection, and semantic segmentation, validating the framework.
Key Takeaways
- Medical vision-language pre-training (VLP) can advance medical image understanding by leveraging paired image-report data.
- Existing VLP methods face false-negative (FaNe) problems and insufficient fine-grained cross-modal alignment.
- The FaNe semantic-enhanced VLP framework mitigates false negatives through a semantic-aware positive-pair mining strategy.
- A text-conditioned sparse attention pooling module enables fine-grained image-text alignment.
- A hard-negative-aware contrastive loss adaptively reweights semantically similar negatives, strengthening intra-modal discrimination (a sketch follows this list).
- State-of-the-art performance on multiple downstream medical imaging benchmarks validates the framework.
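One common way to realize a hard-negative-aware contrastive loss is to up-weight negatives in proportion to their similarity to the anchor, as in the generic InfoNCE variant below. This is a sketch of the idea, not FaNe's published loss; `beta` and the weighting scheme are assumptions.

```python
# InfoNCE with similarity-weighted negatives (generic sketch, not FaNe's
# exact loss). Harder negatives contribute more to the denominator.
import torch
import torch.nn.functional as F

def hard_negative_nce(img_emb, txt_emb, temperature=0.07, beta=0.5):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / temperature          # (B, B) similarity logits
    B = sim.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=sim.device)
    # Weight w_ij = exp(beta * s_ij) for negatives, 1 for the positive pair;
    # adding log(w) to the logits realizes a weighted softmax denominator.
    log_w = (beta * sim.detach()).masked_fill(eye, 0.0)
    targets = torch.arange(B, device=sim.device)
    return F.cross_entropy(sim + log_w, targets)
```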
Comparative Study of UNet-based Architectures for Liver Tumor Segmentation in Multi-Phase Contrast-Enhanced Computed Tomography
Authors: Doan-Van-Anh Ly, Thi-Thu-Hien Pham, Thanh-Hai Le
Segmentation of liver structures in multi-phase contrast-enhanced computed tomography (CECT) plays a crucial role in computer-aided diagnosis and treatment planning for liver diseases, including tumor detection. In this study, we investigate the performance of UNet-based architectures for liver tumor segmentation, starting from the original UNet and extending to UNet3+ with various backbone networks. We evaluate ResNet, Transformer-based, and State-space (Mamba) backbones, all initialized with pretrained weights. Surprisingly, despite the advances in modern architectures, ResNet-based models consistently outperform Transformer- and Mamba-based alternatives across multiple evaluation metrics. To further improve segmentation quality, we introduce attention mechanisms into the backbone and observe that incorporating the Convolutional Block Attention Module (CBAM) yields the best performance. ResNetUNet3+ with the CBAM module not only produced the best overlap metrics, with a Dice score of 0.755 and IoU of 0.662, but also achieved the most precise boundary delineation, evidenced by the lowest HD95 distance of 77.911. The model's superiority was further cemented by its leading overall accuracy of 0.925 and specificity of 0.926, showcasing its robust capability in accurately identifying both lesion and healthy tissue. To further enhance interpretability, Grad-CAM visualizations were employed to highlight the regions most influential to the model's predictions, providing insights into its decision-making process. These findings demonstrate that classical ResNet architectures, when combined with modern attention modules, remain highly competitive for medical image segmentation tasks, offering a promising direction for liver tumor detection in clinical practice.
Paper and project links
PDF 28 pages, 9 figures
Summary
This study investigates UNet-based architectures for liver tumor segmentation, from the original UNet to UNet3+, using a range of pretrained backbones: ResNet, Transformer-based, and State-space (Mamba) networks. Despite the advances of modern architectures, ResNet-based models performed best across multiple evaluation metrics. Adding attention mechanisms improved performance further, and ResNetUNet3+ with the Convolutional Block Attention Module (CBAM) performed best of all: it achieved the top Dice and IoU overlap metrics, the most precise boundary delineation, and high overall accuracy and specificity, demonstrating robust identification of both lesions and healthy tissue. Grad-CAM visualizations were used to enhance the model's interpretability.
Key Takeaways
- UNet-based architectures with various pretrained backbones were studied for liver tumor segmentation.
- ResNet-based models outperformed Transformer- and Mamba-based models across multiple evaluation metrics.
- Adding attention mechanisms improved performance further, with the CBAM-equipped model performing best (a compact CBAM sketch follows this list).
- The best model achieved the top Dice (0.755) and IoU (0.662) overlap metrics, precise boundary delineation (lowest HD95 of 77.911), and high overall accuracy (0.925) and specificity (0.926).
- Grad-CAM visualizations enhanced the model's interpretability.
- The results show that ResNet architectures remain competitive for medical image segmentation tasks.
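For reference, here is a compact PyTorch rendering of CBAM (Woo et al., 2018): channel attention computed from average- and max-pooled descriptors, followed by spatial attention over pooled channel maps. It illustrates the cited module in general, not the authors' exact ResNetUNet3+ integration.

```python
# Compact CBAM: channel attention, then spatial attention. Illustrative of
# the standard module; the paper's exact placement in ResNetUNet3+ may differ.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),    # spatial attention
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```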
SynSeg: Feature Synergy for Multi-Category Contrastive Learning in End-to-End Open-Vocabulary Semantic Segmentation
Authors: Weichen Zhang, Kebin Liu, Fan Dang, Zhui Zhu, Xikai Sun, Yunhao Liu
Semantic segmentation in open-vocabulary scenarios presents significant challenges due to the wide range and granularity of semantic categories. Existing weakly-supervised methods often rely on category-specific supervision and ill-suited feature construction methods for contrastive learning, leading to semantic misalignment and poor performance. In this work, we propose a novel weakly-supervised approach, SynSeg, to address the challenges. SynSeg performs Multi-Category Contrastive Learning (MCCL) as a stronger training signal with a new feature reconstruction framework named Feature Synergy Structure (FSS). Specifically, the MCCL strategy robustly combines both intra- and inter-category alignment and separation in order to make the model learn the knowledge of correlations from different categories within the same image. Moreover, FSS reconstructs discriminative features for contrastive learning through prior fusion and semantic-activation-map enhancement, effectively avoiding the foreground bias introduced by the visual encoder. Furthermore, SynSeg is a lightweight end-to-end solution that uses no mid-term outputs from large-scale pretrained models and is capable of real-time inference. In general, SynSeg efficiently improves semantic localization and discrimination under weak supervision. Extensive experiments on benchmarks demonstrate that our method outperforms state-of-the-art (SOTA) methods. In particular, SynSeg achieves higher accuracy than SOTA baselines by margins ranging from 6.9% up to 26.2%.
Paper and project links
Summary
This paper proposes SynSeg, a novel weakly-supervised approach to semantic segmentation in open-vocabulary scenarios. SynSeg trains with Multi-Category Contrastive Learning (MCCL) and a Feature Synergy Structure (FSS), effectively improving semantic localization and discrimination. It needs no mid-term outputs from large-scale pretrained models, supports real-time inference, and outperforms the state of the art on benchmarks.
Key Takeaways
- Semantic segmentation in open-vocabulary scenarios is challenged by the wide range and granularity of semantic categories.
- Existing weakly-supervised methods suffer from semantic misalignment and poor performance.
- SynSeg is a novel weakly-supervised approach built on Multi-Category Contrastive Learning (MCCL) and a Feature Synergy Structure (FSS).
- The MCCL strategy lets the model learn correlations among different categories within the same image (a simplified sketch follows this list).
- FSS reconstructs discriminative features through prior fusion and semantic-activation-map enhancement, effectively avoiding foreground bias.
- SynSeg is a lightweight end-to-end solution that needs no mid-term outputs from large-scale pretrained models and is suitable for real-time inference.
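A heavily simplified way to express intra-/inter-category alignment and separation is a supervised-contrastive loss over region features, sketched below. SynSeg's MCCL operates under weak supervision with its own feature construction, so this is only an approximation of the objective's shape; all names are assumptions.

```python
# Supervised-contrastive sketch: same-category region features attract,
# different categories repel. A simplification, not SynSeg's MCCL.
import torch
import torch.nn.functional as F

def multi_category_contrast(feats, labels, temperature=0.1):
    """feats: (N, D) region features; labels: (N,) category ids.
    Assumes each category appears at least twice in the batch."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.t() / temperature
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # -log p(positive | all non-self pairs), averaged over positive pairs.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(eye, float('-inf')), dim=1, keepdim=True)
    return -log_prob[pos].mean()
```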
MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss
Authors: Can Zhao, Pengfei Guo, Dong Yang, Yucheng Tang, Yufan He, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu
Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability that only work for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses generalizability issues but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast and high quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss to enhance the sensitivity to region of interest. Our experiments show that MAISI-v2 can achieve SOTA image quality with $33 \times$ acceleration for latent diffusion model. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
Paper and project links
PDF Accepted by AAAI 2026
Summary
Medical image synthesis matters for both clinical and research applications, and diffusion models have recently become the leading approach, yet existing methods suffer from limited generalizability, slow inference, and weak alignment with input conditions. MAISI-v2 is the first accelerated 3D medical image synthesis framework to integrate rectified flow, enabling fast, high-quality generation. To improve condition fidelity, a novel region-specific contrastive loss increases sensitivity to regions of interest. Experiments show MAISI-v2 achieves a $33\times$ speedup over the latent diffusion model while maintaining image quality, and a downstream segmentation experiment demonstrates that the synthetic images can be used for data augmentation.
Key Takeaways
- Diffusion models dominate the field of medical image synthesis.
- Existing methods have problems with generalizability, inference speed, and alignment with input conditions.
- The MAISI-v2 framework integrates rectified flow for fast, high-quality medical image generation (a minimal training-step sketch follows this list).
- A region-specific contrastive loss improves condition fidelity and sensitivity to regions of interest.
- MAISI-v2 achieves a significant speedup over the latent diffusion model while maintaining image quality.
- The synthetic images can be used for data augmentation, effectively supporting downstream tasks such as segmentation.
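Rectified flow trains a network to predict a constant velocity along the straight line between noise and data, which is what enables few-step sampling. Below is a minimal, generic training step; MAISI-v2's 3D latent setup, conditioning, and region-specific contrastive term are omitted, and `model` is an assumed velocity-prediction network.

```python
# Generic rectified-flow training step (illustrative; omits MAISI-v2's
# 3D latents, conditioning, and region-specific contrastive loss).
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x1):
    """x1: clean samples (B, ...); model(x_t, t) predicts velocity."""
    x0 = torch.randn_like(x1)                       # noise endpoint
    t = torch.rand(x1.size(0), device=x1.device)    # uniform time in [0, 1]
    t_exp = t.view(-1, *([1] * (x1.dim() - 1)))
    x_t = (1.0 - t_exp) * x0 + t_exp * x1           # straight-line interpolation
    return F.mse_loss(model(x_t, t), x1 - x0)       # regress constant velocity
```

Because the learned trajectories are nearly straight, sampling can use a handful of Euler steps, x ← x + Δt · model(x, t), which is where this family of methods gets its acceleration over standard diffusion sampling.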
AstroECP: towards more practical Electron Channeling Contrast Imaging
Authors: M. Haroon Qaiser, Lukas Berners, Robin J. Scales, Tianbi Zhang, Martin Heller, Jiri Dluhos, Sandra Korte-Kerzel, T. Ben Britton
Electron channeling contrast imaging (ECCI) is a scanning electron microscopy (SEM) based technique that enables bulk-sample characterization of crystallographic defects (e.g. dislocations, stacking faults, low angle boundaries). Despite its potential, ECCI remains underused for quantitative defect analysis as compared to transmission electron microscope (TEM) based methods. Here, we overcome barriers that limit the use of ECCI including optimizing signal-to-noise contrast, precise determination of the incident beam vector with calibrated and easy to use simulations and experimental selected area electron channeling patterns (SA-ECP). We introduce a systematic ECCI workflow, alongside a new open-source software tool (AstroECP), that includes calibration of stage tilting, SA-ECP field of view, and the energy that forms the ECP/ECCI contrast using dynamical simulations. The functionality of this workflow is demonstrated with case studies that include threading dislocations in GaAs and the cross validation of precession based ECCI-contrast, which is otherwise known as Electron Channeling Orientation Determination (eCHORD). To assist the reader, we also provide best practice guidelines for ECCI implementation to promote high-resolution defect imaging in the SEM.
Paper and project links
PDF as submitted version, post-peer review
Summary
Electron channeling contrast imaging (ECCI) is an SEM-based technique for characterizing crystallographic defects such as dislocations, stacking faults, and low-angle boundaries. This work optimizes signal-to-noise contrast, precisely determines the incident beam vector, and introduces a systematic workflow together with the open-source software tool AstroECP, which calibrates stage tilting, the SA-ECP field of view, and the energy forming ECP/ECCI contrast using dynamical simulations. Case studies demonstrate the functionality, including threading dislocations in GaAs and cross-validation of precession-based ECCI contrast, also known as Electron Channeling Orientation Determination (eCHORD). Best-practice guidelines for implementing ECCI are provided to promote high-resolution defect imaging in the SEM.
Key Takeaways
- ECCI is an SEM-based technique for characterizing crystallographic defects in bulk samples.
- ECCI remains underused for quantitative defect analysis compared with TEM-based methods.
- The limitations of ECCI are overcome by optimizing signal-to-noise contrast and precisely determining the incident beam vector.
- A systematic workflow and the open-source software tool AstroECP calibrate stage tilting, the SA-ECP field of view, and the contrast-forming energy.
- Case studies demonstrate the functionality, including threading dislocations in GaAs and cross-validation of precession-based ECCI contrast (eCHORD).
- Best-practice guidelines for ECCI implementation are provided.