Published: 2025-09-12

Updated: 2025-10-07

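⚠️ All of the summaries below were generated by a large language model. They may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not use these summaries in serious academic settings. They are intended only for initial screening before reading a paper!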

2025-09-12 update

EfficientIML: Efficient High-Resolution Image Manipulation Localization

Authors: Jinhan Li, Haoyang He, Lei Xie, Jiangning Zhang

With imaging devices delivering ever-higher resolutions and the emerging diffusion-based forgery methods, current detectors trained only on traditional datasets (with splicing, copy-moving and object removal forgeries) lack exposure to this new manipulation type. To address this, we propose a novel high-resolution SIF dataset of 1200+ diffusion-generated manipulations with semantically extracted masks. However, this also imposes a challenge on existing methods, as they face significant computational resource constraints due to their prohibitive computational complexities. Therefore, we propose a novel EfficientIML model with a lightweight, three-stage EfficientRWKV backbone. EfficientRWKV’s hybrid state-space and attention network captures global context and local details in parallel, while a multi-scale supervision strategy enforces consistency across hierarchical predictions. Extensive evaluations on our dataset and standard benchmarks demonstrate that our approach outperforms ViT-based and other SOTA lightweight baselines in localization performance, FLOPs and inference speed, underscoring its suitability for real-time forensic applications.
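To make the computational constraint concrete, the rough sketch below counts patch tokens and pairwise attention entries for a plain ViT-style model at several input resolutions. It is only an illustration: the 16-pixel patch size and the names `num_tokens` and `attention_pairs` are assumptions, and the abstract does not break down where existing methods spend their compute.

```python
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patches (tokens) for a height x width image."""
    return (height // patch) * (width // patch)


def attention_pairs(height: int, width: int, patch: int = 16) -> int:
    """Entries in one self-attention score matrix: tokens squared."""
    n = num_tokens(height, width, patch)
    return n * n


if __name__ == "__main__":
    # Assumed square inputs; patch size 16 is a common ViT default, not a
    # value taken from the paper.
    for side in (512, 1024, 2048):
        print(side, num_tokens(side, side), attention_pairs(side, side))
```

Under these assumptions, doubling the side length quadruples the token count and multiplies the number of pairwise attention entries by sixteen.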

Paper and project links

PDF

Summary
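As imaging resolutions rise and diffusion-based manipulation techniques mature, existing detectors trained only on traditional datasets struggle with the new manipulation type. In response, the authors introduce a new high-resolution SIF dataset and the EfficientIML model. EfficientIML uses a lightweight, three-stage EfficientRWKV backbone whose hybrid state-space and attention network captures global context and local details, together with a multi-scale supervision strategy that enforces consistency across hierarchical predictions. Extensive evaluations on the new dataset and standard benchmarks show that the method outperforms ViT-based and other lightweight baselines in localization performance, FLOPs, and inference speed, making it suitable for real-time forensic applications.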

Key Takeaways

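High-resolution imaging and diffusion-based manipulation techniques challenge existing detectors.
Existing detectors, trained only on traditional datasets, lack exposure to the new manipulation type.
A new high-resolution SIF dataset is introduced to address this gap.
The EfficientIML model uses a lightweight, three-stage EfficientRWKV backbone to cope with computational resource constraints.
EfficientRWKV captures both global context and local details.
A multi-scale supervision strategy enforces consistency across hierarchical predictions.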
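A multi-scale supervision strategy can be sketched roughly as follows. The summary does not specify the loss, so this example assumes binary cross-entropy at each scale, average-pooled ground-truth masks, and equal weights by default. The names `avg_pool`, `bce`, and `multi_scale_loss` are hypothetical and are not the authors' code.

```python
import math


def avg_pool(mask, factor):
    """Average-pool a 2D list by an integer factor (sizes must divide evenly)."""
    h, w = len(mask), len(mask[0])
    return [
        [
            sum(mask[r + i][c + j] for i in range(factor) for j in range(factor))
            / factor**2
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two same-sized 2D lists."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def multi_scale_loss(preds, mask, weights=None):
    """Weighted sum of BCE losses, one per prediction scale.

    preds: list of square probability maps whose side divides the mask side.
    """
    if weights is None:
        weights = [1.0] * len(preds)  # assumed equal weighting
    loss = 0.0
    for pred, wgt in zip(preds, weights):
        factor = len(mask) // len(pred)
        loss += wgt * bce(pred, avg_pool(mask, factor))
    return loss
```

Each prediction is compared against the mask pooled to its own resolution, so coarse and fine outputs are all pushed toward the same ground truth.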

RepViT-CXR: A Channel Replication Strategy for Vision Transformers in Chest X-ray Tuberculosis and Pneumonia Classification

Authors: Faisal Ahmed

Chest X-ray (CXR) imaging remains one of the most widely used diagnostic tools for detecting pulmonary diseases such as tuberculosis (TB) and pneumonia. Recent advances in deep learning, particularly Vision Transformers (ViTs), have shown strong potential for automated medical image analysis. However, most ViT architectures are pretrained on natural images and require three-channel inputs, while CXR scans are inherently grayscale. To address this gap, we propose RepViT-CXR, a channel replication strategy that adapts single-channel CXR images into a ViT-compatible format without introducing additional information loss. We evaluate RepViT-CXR on three benchmark datasets. On the TB-CXR dataset, our method achieved an accuracy of 99.9% and an AUC of 99.9%, surpassing prior state-of-the-art methods such as Topo-CXR (99.3% accuracy, 99.8% AUC). For the Pediatric Pneumonia dataset, RepViT-CXR obtained 99.0% accuracy, with 99.2% recall, 99.3% precision, and an AUC of 99.0%, outperforming strong baselines including DCNN and VGG16. On the Shenzhen TB dataset, our approach achieved 91.1% accuracy and an AUC of 91.2%, marking a performance improvement over previously reported CNN-based methods. These results demonstrate that a simple yet effective channel replication strategy allows ViTs to fully leverage their representational power on grayscale medical imaging tasks. RepViT-CXR establishes a new state of the art for TB and pneumonia detection from chest X-rays, showing strong potential for deployment in real-world clinical screening systems.
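The core idea of channel replication can be sketched in a few lines. This is a minimal pure-Python illustration using nested lists, not the paper's pipeline: the abstract does not describe resizing or normalization, and the helper names `replicate_channels` and `first_channel` are hypothetical.

```python
def replicate_channels(gray, channels=3):
    """Turn an H x W grayscale image into H x W x channels by copying each pixel."""
    return [[[px] * channels for px in row] for row in gray]


def first_channel(image):
    """Recover the grayscale image; valid because all channels are identical."""
    return [[px[0] for px in row] for row in image]


if __name__ == "__main__":
    gray = [[0, 128], [255, 64]]      # toy 2 x 2 single-channel "scan"
    rgb = replicate_channels(gray)    # now 2 x 2 x 3
    print(rgb[0][1])                  # the three copies of one pixel
    print(first_channel(rgb) == gray) # round trip preserves every value
```

Because every channel is an exact copy, the original image can be recovered from any one of them, which is one way to read the claim that no additional information is lost. With an array library such as NumPy, `np.repeat(img[..., None], 3, axis=-1)` performs the same replication on an `(H, W)` array.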

Paper and project links

PDF 10 pages, 5 figures

Summary

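The paper proposes RepViT-CXR, a channel replication strategy that converts single-channel CXR images into a ViT-compatible format without introducing additional information loss. Evaluations on three benchmark datasets show that RepViT-CXR reaches 99.9% accuracy and 99.9% AUC on the TB-CXR dataset, 99.0% accuracy on the Pediatric Pneumonia dataset, and 91.1% accuracy on the Shenzhen TB dataset, where it outperforms previously reported CNN-based methods. These results demonstrate the strong potential of ViTs for grayscale medical imaging tasks.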

Key Takeaways

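RepViT-CXR addresses the mismatch between ViT models, which expect three-channel inputs, and single-channel CXR images.
Its channel replication strategy converts single-channel CXR images into a ViT-compatible format without introducing additional information loss.
On the TB-CXR dataset, RepViT-CXR reaches 99.9% accuracy and 99.9% AUC, surpassing prior state-of-the-art methods such as Topo-CXR.
On the Pediatric Pneumonia dataset, RepViT-CXR reaches 99.0% accuracy and 99.0% AUC.
On the Shenzhen TB dataset, RepViT-CXR (91.1% accuracy, 91.2% AUC) outperforms previously reported CNN-based methods.
The results demonstrate the strong potential of ViTs for grayscale medical image analysis.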
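For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, and recall are derived from binary predictions. The labels are made-up toy data, not results from the paper, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

Recall measures how many true disease cases are caught, while precision measures how many flagged cases are real, which is why screening studies often report both alongside accuracy.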

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-12/Vision%20Transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting.

Vision Transformer
