Published: 2025-10-02

Updated: 2025-11-27

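⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not rely on them in serious academic settings. They are intended only for initial screening before reading the papers.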

Updated 2025-10-02

Evaluating the Impact of Radiographic Noise on Chest X-ray Semantic Segmentation and Disease Classification Using a Scalable Noise Injection Framework

Authors: Derek Jiu, Kiran Nijjer, Nishant Chinta, Ryan Bui, Ben Liu, Kevin Zhu

Deep learning models are increasingly used for radiographic analysis, but their reliability is challenged by the stochastic noise inherent in clinical imaging. A systematic, cross-task understanding of how different noise types impact these models is lacking. Here, we evaluate the robustness of state-of-the-art convolutional neural networks (CNNs) to simulated quantum (Poisson) and electronic (Gaussian) noise in two key chest X-ray tasks: semantic segmentation and pulmonary disease classification. Using a novel, scalable noise injection framework, we applied controlled, clinically-motivated noise severities to common architectures (UNet, DeepLabV3, FPN; ResNet, DenseNet, EfficientNet) on public datasets (Landmark, ChestX-ray14). Our results reveal a stark dichotomy in task robustness. Semantic segmentation models proved highly vulnerable, with lung segmentation performance collapsing under severe electronic noise (Dice Similarity Coefficient drop of 0.843), signifying a near-total model failure. In contrast, classification tasks demonstrated greater overall resilience, but this robustness was not uniform. We discovered a differential vulnerability: certain tasks, such as distinguishing Pneumothorax from Atelectasis, failed catastrophically under quantum noise (AUROC drop of 0.355), while others were more susceptible to electronic noise. These findings demonstrate that while classification models possess a degree of inherent robustness, pixel-level segmentation tasks are far more brittle. The task- and noise-specific nature of model failure underscores the critical need for targeted validation and mitigation strategies before the safe clinical deployment of diagnostic AI.

Paper and project links

PDF Accepted to ARRS 2026 Annual Meeting

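Summary: Deep learning models are increasingly used in radiographic analysis, but the stochastic noise inherent in clinical images challenges their reliability. This study evaluates how robust state-of-the-art convolutional neural networks are to simulated quantum (Poisson) and electronic (Gaussian) noise on two key chest X-ray tasks: semantic segmentation and pulmonary disease classification. Using a new scalable noise injection framework, the authors apply controlled, clinically motivated noise severities to common architectures on public datasets. Segmentation models prove highly vulnerable: under severe electronic noise, lung segmentation nearly fails outright (Dice Similarity Coefficient drop of 0.843). Classification is more resilient overall, but not uniformly so. Some tasks, such as distinguishing Pneumothorax from Atelectasis, fail under quantum noise (AUROC drop of 0.355), while others are more susceptible to electronic noise. These findings underline the need for targeted validation and mitigation strategies before diagnostic AI is deployed clinically.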

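Key Takeaways:

Stochastic noise in clinical images challenges the reliability of deep learning models.
The study evaluates CNN robustness to quantum (Poisson) and electronic (Gaussian) noise on semantic segmentation and pulmonary disease classification.
Segmentation models are highly sensitive to noise, and severe electronic noise causes near-total failure (Dice drop of 0.843).
Classification is more resilient overall, but specific tasks fail to different degrees under quantum or electronic noise.
Because failures depend on both task and noise type, targeted validation and mitigation strategies are needed.
Classification models have a degree of inherent robustness that varies by task, while pixel-level segmentation is far more brittle.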
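
The two noise types and the Dice metric named above can be illustrated with a minimal sketch. It assumes grayscale images scaled to [0, 1]; the function names and the `photons_at_white` and `sigma` parameters are illustrative assumptions and are not taken from the paper's framework. A fixed threshold stands in for a segmentation model, purely so the example runs on its own.

```python
import numpy as np

def add_quantum_noise(image, photons_at_white, rng):
    """Poisson (quantum) noise: each pixel is treated as an expected photon count.
    `photons_at_white` is an assumed dose parameter; fewer photons means more noise."""
    expected = np.clip(image, 0.0, 1.0) * photons_at_white
    noisy = rng.poisson(expected) / photons_at_white
    return np.clip(noisy, 0.0, 1.0)

def add_electronic_noise(image, sigma, rng):
    """Gaussian (electronic) noise: additive, zero-mean, independent of the signal."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def dice(pred_mask, true_mask, eps=1e-8):
    """Dice Similarity Coefficient between two binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

rng = np.random.default_rng(0)
image = np.full((64, 64), 0.5)
image[16:48, 16:48] = 0.9          # toy bright region standing in for a lung field
true_mask = image > 0.7            # toy ground truth

# Sweep severities; thresholding is a stand-in for a real segmentation model.
for sigma in (0.01, 0.1, 0.5):
    noisy = add_electronic_noise(image, sigma, rng)
    print("sigma", sigma, "dice", float(dice(noisy > 0.7, true_mask)))
for photons in (1000, 50, 5):
    noisy = add_quantum_noise(image, photons, rng)
    print("photons", photons, "dice", float(dice(noisy > 0.7, true_mask)))
```

The key design difference is that Poisson noise scales with the signal (variance equals the expected count), whereas Gaussian noise has the same variance at every pixel, which is one plausible reason the two can affect tasks differently.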

Rethinking Weak-to-Strong Augmentation in Source-Free Domain Adaptive Object Detection

Authors: Song Tang, Jiuzheng Yang, Mao Ye, Boyu Wang, Yan Gan, Xiatian Zhu

Strong data augmentation is a fundamental component of state-of-the-art mean teacher-based Source-Free domain adaptive Object Detection (SFOD) methods, enabling consistency-based self-supervised optimization along weak augmentation. However, our theoretical analysis and empirical observations reveal a critical limitation: strong augmentation can inadvertently erase class-relevant components, leading to artificial inter-category confusion. To address this issue, we introduce Weak-to-strong Semantics Compensation (WSC), a novel remedy that leverages weakly augmented images, which preserve full semantics, as anchors to enrich the feature space of their strongly augmented counterparts. Essentially, this compensates for the class-relevant semantics that may be lost during strong augmentation on the fly. Notably, WSC can be implemented as a generic plug-in, easily integrable with any existing SFOD pipelines. Extensive experiments validate the negative impact of strong augmentation on detection performance, and the effectiveness of WSC in enhancing the performance of previous detection models on standard benchmarks.

Paper and project links

PDF

Summary

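Strong data augmentation is a key component of mean teacher-based Source-Free domain adaptive Object Detection (SFOD) methods, where it enables consistency-based self-supervised optimization. The paper's theoretical analysis and experiments show, however, that strong augmentation can inadvertently erase class-relevant components and cause artificial inter-category confusion. To address this, the authors introduce Weak-to-strong Semantics Compensation (WSC), which uses weakly augmented images, which preserve full semantics, as anchors to enrich the feature space of their strongly augmented counterparts. This compensates on the fly for class-relevant semantics that may be lost during strong augmentation. Extensive experiments confirm the negative impact of strong augmentation on detection performance and show that WSC improves previous detection models on standard benchmarks.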

Key Takeaways

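Strong data augmentation is a core component of mean teacher-based SFOD methods and enables consistency-based self-supervised optimization.
Strong augmentation can erase class-relevant components, causing artificial inter-category confusion.
WSC uses weakly augmented images as anchors to enrich the feature space of their strongly augmented counterparts.
WSC compensates for class-relevant semantics that may be lost during strong augmentation.
WSC can be implemented as a generic plug-in that integrates easily with existing SFOD pipelines.
Experiments show that strong augmentation has a negative impact on detection performance.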
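
To make the mean-teacher setting and the idea of a weak-view anchor concrete, here is a minimal sketch. The `ema_update` rule is the standard mean-teacher weight update. The `compensate` function is a hypothetical convex blend chosen only for illustration: the abstract does not specify how WSC enriches the strong-view features, so this should not be read as the paper's formulation. The feature vectors are toy values.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Mean-teacher rule: teacher weights follow an exponential moving average of the student."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}

def compensate(strong_feat, weak_feat, alpha=0.5):
    """HYPOTHETICAL compensation (not the paper's method): blend the weak-view
    anchor feature into the strong-view feature with mixing weight `alpha`."""
    return (1.0 - alpha) * strong_feat + alpha * weak_feat

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy features: the weak view keeps the class-relevant components (first two dims).
weak_feat = np.array([1.0, 0.8, 0.1, 0.0])
strong_feat = weak_feat.copy()
strong_feat[:2] = 0.0   # strong augmentation erased the class-relevant components
strong_feat[3] = 0.5    # and introduced an unrelated distortion

compensated = compensate(strong_feat, weak_feat)
print("similarity to weak view before:", cosine(strong_feat, weak_feat))
print("similarity to weak view after: ", cosine(compensated, weak_feat))

# One teacher update step with toy weights.
teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student, momentum=0.9)
```

In a real pipeline the teacher would see the weak view and the student the strong view; the sketch only shows why an anchor that retains full semantics can pull an over-augmented feature back toward its class.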

Investigating Long-term Training for Remote Sensing Object Detection

Authors: JongHyun Park, Yechan Kim, Moongu Jeon

Recently, numerous methods have achieved impressive performance in remote sensing object detection, relying on convolution or transformer architectures. Such detectors typically have a feature backbone to extract useful features from raw input images. A common practice in current detectors is initializing the backbone with pre-trained weights available online. Fine-tuning the backbone is typically required to generate features suitable for remote-sensing images. While the prolonged training could lead to over-fitting, hindering the extraction of basic visual features, it can enable models to gradually extract deeper insights and richer representations from remote sensing data. Striking a balance between these competing factors is critical for achieving optimal performance. In this study, we aim to investigate the performance and characteristics of remote sensing object detection models under very long training schedules, and propose a novel method named Dynamic Backbone Freezing (DBF) for feature backbone fine-tuning on remote sensing object detection under long-term training. Our method addresses the dilemma of whether the backbone should extract low-level generic features or possess specific knowledge of the remote sensing domain, by introducing a module called ‘Freezing Scheduler’ to manage the update of backbone features during long-term training dynamically. Extensive experiments on DOTA and DIOR-R show that our approach enables more accurate model learning while substantially reducing computational costs in long-term training. Besides, it can be seamlessly adopted without additional effort due to its straightforward design. The code is available at https://github.com/unique-chan/dbf.

Paper and project links

PDF Accepted to Machine Vision and Applications (MVA)

Summary
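This paper investigates the performance and characteristics of remote sensing object detection models under very long training schedules and proposes Dynamic Backbone Freezing (DBF), a method for fine-tuning the feature backbone during long-term training. DBF introduces a module called the "Freezing Scheduler" that dynamically manages backbone updates, addressing the dilemma of whether the backbone should extract low-level generic features or hold knowledge specific to the remote sensing domain. Extensive experiments on DOTA and DIOR-R show that the approach yields more accurate models while substantially reducing the computational cost of long-term training.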

Key Takeaways

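Recent methods based on convolution or transformer architectures have achieved impressive results in remote sensing object detection.
Current detectors commonly initialize the backbone with pre-trained weights available online, then fine-tune it for remote sensing images.
Prolonged training can cause over-fitting, but it can also let the model gradually learn deeper and richer representations.
Balancing these competing factors is critical for optimal performance.
The paper studies detectors under very long training schedules and proposes Dynamic Backbone Freezing (DBF).
DBF uses a Freezing Scheduler to manage backbone updates, addressing whether the backbone should extract generic features or remote-sensing-specific knowledge.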
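
The idea of scheduling when the backbone is updated can be sketched as follows. The abstract does not describe the actual schedule used by the Freezing Scheduler, so the alternating rule below, along with the names `backbone_is_frozen`, `train_epochs`, `freeze_epochs`, and `ToyDetector`, is a hypothetical illustration rather than the authors' design. The toy model only counts updates so the example runs without a deep learning framework.

```python
def backbone_is_frozen(epoch, train_epochs=1, freeze_epochs=2):
    """HYPOTHETICAL schedule: update the backbone for `train_epochs` epochs,
    then freeze it for `freeze_epochs` epochs, and repeat."""
    cycle = train_epochs + freeze_epochs
    return (epoch % cycle) >= train_epochs

class ToyDetector:
    """Stand-in for a detector; it only counts how often each part is updated."""
    def __init__(self):
        self.backbone_trainable = True
        self.backbone_updates = 0
        self.head_updates = 0

    def train_one_epoch(self):
        self.head_updates += 1              # the detection head always trains
        if self.backbone_trainable:
            self.backbone_updates += 1      # the backbone trains only when unfrozen

def run(total_epochs):
    model = ToyDetector()
    for epoch in range(total_epochs):
        model.backbone_trainable = not backbone_is_frozen(epoch)
        model.train_one_epoch()
    return model

model = run(12)
print("head updates:", model.head_updates)
print("backbone updates:", model.backbone_updates)
```

In a framework such as PyTorch, freezing would typically be done by toggling `requires_grad` on the backbone parameters. Skipping backbone gradients in frozen epochs is where a compute saving of the kind the paper reports could come from.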

Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection

Authors: Zixing Li, Chao Yan, Zhen Lan, Xiaojia Xiang, Han Zhou, Jun Lai, Dengqing Tang

Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs the EEG-image data pairs with eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects with the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robust capability of the fusion modality, simultaneous training and mutual learning between modalities are enabled by end-to-end online knowledge distillation. During the learning process, an adaptive modality balancing module is proposed to ensure multimodal equilibrium by dynamically adjusting the weights of the importance and the training gradients across various modalities. The effectiveness and superiority of our method are demonstrated by comparing it with existing state-of-the-art methods. Additionally, experiments conducted on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and the designed method.

Paper and project links

PDF 18 pages, 15 figures

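Summary:
Advanced cognition can be extracted from the human brain through brain-computer interfaces, and combining these interfaces with computer vision can make the detection of dim targets in aerial images more robust and accurate. Existing target detection methods, however, focus mainly on homogeneous data and lack efficient, versatile handling of heterogeneous multimodal data. The paper builds a brain-eye-computer based object detection system for aerial images under few-shot conditions. The system detects suspicious targets with region proposal networks, evokes event-related potential (ERP) signals in the electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and uses eye movement data to construct EEG-image data pairs. The authors then propose adaptive modality balanced online knowledge distillation (AMBOKD) to recognize dim objects from these pairs. AMBOKD fuses EEG and image features with a multi-head attention module to form a new modality with comprehensive features. End-to-end online knowledge distillation lets the modalities train simultaneously and learn from one another, which improves the performance and robustness of the fused modality. An adaptive modality balancing module dynamically adjusts the importance weights and training gradients across modalities to keep them in equilibrium. Comparisons with existing state-of-the-art methods show the method's effectiveness, and experiments on public datasets together with real-world system validation support the reliability and practicality of both the system and the method.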

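Key Takeaways:

Combining brain-computer interfaces with computer vision improves the accuracy and robustness of dim target detection in aerial images.
The paper builds a brain-eye-computer based object detection system for aerial images under few-shot conditions.
Region proposal networks detect suspicious targets, and the ESSVP paradigm evokes ERP signals in the EEG.
The proposed AMBOKD method recognizes dim objects from paired EEG and image data.
A multi-head attention module fuses EEG and image features into a new modality with comprehensive features.
End-to-end online knowledge distillation enables simultaneous training and mutual learning between modalities, improving the fused modality's performance and robustness.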
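
Online knowledge distillation between modalities, together with a balancing weight per modality, can be sketched in a few lines. This is an illustration under stated assumptions, not the AMBOKD algorithm: the abstract gives no formulas, so the peer-average distillation target, the loss-proportional weighting in `balance_weights`, the temperature value, and the toy logits are all hypothetical choices.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis, numerically stabilized."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def mutual_distillation_losses(logits_by_modality, temperature=2.0):
    """Assumed scheme: each modality is pulled toward the average of its peers'
    softened predictions, so all branches teach each other during training."""
    probs = {m: softmax(l, temperature) for m, l in logits_by_modality.items()}
    losses = {}
    for m in probs:
        peers = [probs[o] for o in probs if o != m]
        target = np.mean(peers, axis=0)
        losses[m] = kl_divergence(target, probs[m])
    return losses

def balance_weights(losses):
    """HYPOTHETICAL balancing: weight each modality in proportion to its loss,
    so a lagging modality receives more emphasis. Falls back to uniform weights."""
    names = list(losses)
    values = np.array([losses[n] for n in names])
    total = values.sum()
    if total <= 0:
        weights = np.full(len(names), 1.0 / len(names))
    else:
        weights = values / total
    return dict(zip(names, weights))

# Toy logits for a batch of two samples and two classes.
logits = {
    "eeg": np.array([[2.0, 0.5], [0.2, 1.0]]),
    "image": np.array([[1.0, 1.2], [0.1, 2.0]]),
    "fused": np.array([[1.8, 0.6], [0.0, 2.2]]),
}
losses = mutual_distillation_losses(logits)
weights = balance_weights(losses)
total_loss = sum(weights[m] * losses[m] for m in losses)
print(losses, weights, total_loss)
```

A real system would add a supervised loss per branch and could also rescale gradients per modality, as the abstract mentions; those parts are omitted here.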

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-10-02/%E6%A3%80%E6%B5%8B_%E5%88%86%E5%89%B2_%E8%B7%9F%E8%B8%AA/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!

Detection / Segmentation / Tracking