
Detection / Segmentation / Tracking


⚠️ All of the summaries below were generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never use these summaries in serious academic settings; they are only meant for an initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated on 2025-02-12

Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation

Authors: Matteo Mule, Matteo Pannacci, Ali Ghasemi Goudarzi, Francesco Pro, Lorenzo Papa, Luca Maiano, Irene Amerini

The recent advancements in generative AI techniques, which have significantly increased the online dissemination of altered images and videos, have raised serious concerns about the credibility of digital media available on the Internet and distributed through information channels and social networks. This issue particularly affects domains that rely heavily on trustworthy data, such as journalism, forensic analysis, and Earth observation. To address these concerns, the ability to geolocate a non-geo-tagged ground-view image without external information, such as GPS coordinates, has become increasingly critical. This study tackles the challenge of linking a ground-view image, potentially exhibiting varying fields of view (FoV), to its corresponding satellite image without the aid of GPS data. To achieve this, we propose a novel four-stream Siamese-like architecture, the Quadruple Semantic Align Net (SAN-QUAD), which extends previous state-of-the-art (SOTA) approaches by leveraging semantic segmentation applied to both ground and satellite imagery. Experimental results on a subset of the CVUSA dataset demonstrate significant improvements of up to 9.8% over prior methods across various FoV settings.


Paper and Project Links

PDF 9 pages, 4 figures

Summary
Recent generative AI techniques have led to widespread tampering of images and videos circulating online, raising serious concerns about the credibility of digital media on the Internet. To match a ground-view image to its corresponding satellite image without relying on GPS data, this study proposes a new four-stream Siamese-like architecture, the Quadruple Semantic Align Net (SAN-QUAD), which uses semantic segmentation of both ground and satellite imagery to improve matching. Experiments on a subset of the CVUSA dataset show improvements of up to 9.8% over prior methods across different field-of-view (FoV) settings.

Key Takeaways

  1. The rapid development of generative AI has triggered a crisis of trust in digital media on the Internet.
  2. The work addresses matching non-geo-tagged ground-view images to satellite images without relying on GPS data.
  3. Quadruple Semantic Align Net (SAN-QUAD) is a new four-stream Siamese-like architecture for this problem (a hedged sketch of such a design follows this list).
  4. SAN-QUAD extends existing approaches by applying semantic segmentation to both ground and satellite imagery.
  5. Experiments show SAN-QUAD significantly improves matching performance across various FoV settings.
  6. The experiments were validated on a subset of the CVUSA dataset.
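
The abstract gives only a high-level picture of SAN-QUAD, so below is a minimal, hypothetical PyTorch sketch of a four-stream Siamese-like matcher that encodes RGB and semantic-segmentation inputs for both views; the encoder layout, the fusion scheme, and the triplet-loss note are illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch of a four-stream Siamese-like ground-to-satellite
# matcher; all architectural details here are assumptions, not SAN-QUAD.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamEncoder(nn.Module):
    """Tiny CNN that maps one input stream to an L2-normalized embedding."""
    def __init__(self, in_channels: int, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class FourStreamMatcher(nn.Module):
    """Four streams (ground RGB/seg, satellite RGB/seg), fused per view."""
    def __init__(self, seg_channels: int, dim: int = 256):
        super().__init__()
        self.ground_rgb = StreamEncoder(3, dim)
        self.ground_seg = StreamEncoder(seg_channels, dim)
        self.sat_rgb = StreamEncoder(3, dim)
        self.sat_seg = StreamEncoder(seg_channels, dim)
        self.fuse_ground = nn.Linear(2 * dim, dim)
        self.fuse_sat = nn.Linear(2 * dim, dim)

    def forward(self, g_rgb, g_seg, s_rgb, s_seg):
        g = F.normalize(self.fuse_ground(torch.cat(
            [self.ground_rgb(g_rgb), self.ground_seg(g_seg)], dim=-1)), dim=-1)
        s = F.normalize(self.fuse_sat(torch.cat(
            [self.sat_rgb(s_rgb), self.sat_seg(s_seg)], dim=-1)), dim=-1)
        return g, s

# Retrieval training would pull matching pairs together in embedding space,
# e.g. with a triplet loss: F.triplet_margin_loss(g, s_pos, s_neg).
```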


TACO: Training-free Sound Prompted Segmentation via Semantically Constrained Audio-visual CO-factorization

Authors: Hugo Malard, Michel Olvera, Stephane Lathuiliere, Slim Essid

Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications. Here, we tackle the specific task of sound-prompted segmentation, aiming to segment image regions corresponding to objects heard in an audio signal. Most existing approaches tackle this problem by fine-tuning pre-trained models or by training additional modules specifically for the task. We adopt a different strategy: we introduce a training-free approach that leverages Non-negative Matrix Factorization (NMF) to co-factorize audio and visual features from pre-trained models so as to reveal shared interpretable concepts. These concepts are passed on to an open-vocabulary segmentation model for precise segmentation maps. By using frozen pre-trained models, our method achieves high generalization and establishes state-of-the-art performance in unsupervised sound-prompted segmentation, significantly surpassing previous unsupervised methods.


Paper and Project Links

PDF

Summary

This paper presents an unsupervised sound-prompted segmentation method built on pre-trained models and Non-negative Matrix Factorization (NMF). The method co-factorizes audio and visual features from pre-trained models to reveal shared interpretable concepts, which are passed to an open-vocabulary segmentation model for precise segmentation maps. By keeping the pre-trained models frozen, the method generalizes well and achieves state-of-the-art performance in unsupervised sound-prompted segmentation.

Key Takeaways

Key points extracted from the abstract:

  • Large-scale pre-trained audio and image models generalize well and suit a wide range of applications.
  • For sound-prompted segmentation, the paper proposes a training-free strategy that uses Non-negative Matrix Factorization (NMF) to co-factorize audio and visual features from pre-trained models (a rough illustration follows this list).
  • By revealing shared interpretable concepts, the strategy segments the image regions corresponding to objects heard in the audio signal.
  • Frozen pre-trained models provide high generalization.
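
To make the co-factorization idea concrete, here is a rough NumPy/scikit-learn illustration of factorizing stacked audio and visual features so both modalities share per-sample activations; the feature shapes, stacking scheme, and rank are illustrative assumptions, not the authors' exact TACO formulation.

```python
# Rough illustration of audio-visual co-factorization with shared activations;
# the stacking scheme is an assumption, not the exact TACO method.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Stand-ins for non-negative features from frozen pre-trained models,
# aligned over the same N samples (e.g. audio frames paired with patches).
audio_feats = rng.random((100, 64))    # N x D_audio
visual_feats = rng.random((100, 128))  # N x D_visual

# Stacking along the feature axis forces both modalities to share the same
# per-sample activations, so each NMF component acts as a joint "concept".
stacked = np.hstack([audio_feats, visual_feats])   # N x (D_a + D_v)
nmf = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
activations = nmf.fit_transform(stacked)           # N x K concept activations
bases = nmf.components_                            # K x (D_a + D_v) bases
audio_bases, visual_bases = bases[:, :64], bases[:, 64:]

# The discovered concepts could then be handed to an open-vocabulary
# segmentation model to localize the sounding object in the image.
```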


Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection

Authors: Pengfei Lyu, Pak-Hei Yeung, Xiaosheng Yu, Chengdong Wu, Jagath C. Rajapakse

The rapid development of deep learning has significantly improved salient object detection (SOD) combining both RGB and thermal (RGB-T) images. However, existing deep learning-based RGB-T SOD models suffer from two major limitations. First, Transformer-based models with quadratic complexity are computationally expensive and memory-intensive, limiting their application in high-resolution bi-modal feature fusion. Second, even when these models converge to an optimal solution, there remains a frequency gap between the prediction and ground-truth. To overcome these limitations, we propose a purely Fourier transform-based model, namely Deep Fourier-Embedded Network (DFENet), for accurate RGB-T SOD. To address the computational complexity when dealing with high-resolution images, we leverage the efficiency of fast Fourier transform with linear complexity to design three key components: (1) the Modal-coordinated Perception Attention, which fuses RGB and thermal modalities with enhanced multi-dimensional representation; (2) the Frequency-decomposed Edge-aware Block, which clarifies object edges by deeply decomposing and enhancing frequency components of low-level features; and (3) the Fourier Residual Channel Attention Block, which prioritizes high-frequency information while aligning channel-wise global relationships. To mitigate the frequency gap, we propose Co-focus Frequency Loss, which dynamically weights hard frequencies during edge frequency reconstruction by cross-referencing bi-modal edge information in the Fourier domain. Extensive experiments on four RGB-T SOD benchmark datasets demonstrate that DFENet outperforms fifteen existing state-of-the-art RGB-T SOD models. Comprehensive ablation studies further validate the value and effectiveness of our newly proposed components. The code is available at https://github.com/JoshuaLPF/DFENet.


Paper and Project Links

PDF 12 pages, 13 figures. Submitted to Journal on April 29, 2024

Summary

This paper reviews the progress and challenges of deep learning for salient object detection (SOD) on fused RGB and thermal (RGB-T) images. To address the limitations of existing models, it proposes DFENet, a purely Fourier transform-based model. DFENet exploits the efficiency of the fast Fourier transform to design three key components that tame the computational cost of high-resolution bi-modal fusion and narrow the frequency gap between predictions and ground truth. Experiments show that DFENet outperforms other state-of-the-art models on four RGB-T SOD benchmark datasets.

Key Takeaways

  1. Deep learning has significantly advanced salient object detection (SOD) that combines RGB and thermal (RGB-T) images.
  2. Existing models are computationally expensive and memory-intensive, especially for high-resolution bi-modal feature fusion.
  3. DFENet is a purely Fourier transform-based model that uses the fast Fourier transform to reduce computational complexity and narrow the frequency gap between predictions and ground truth.
  4. DFENet comprises three key components: the Modal-coordinated Perception Attention, the Frequency-decomposed Edge-aware Block, and the Fourier Residual Channel Attention Block (a speculative sketch of a Fourier-attention block follows this list).
  5. A Co-focus Frequency Loss dynamically weights hard frequencies to mitigate the frequency gap.
  6. Experiments on four RGB-T SOD benchmark datasets show DFENet outperforms other state-of-the-art models.
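
As a rough illustration of the Fourier-domain channel-attention idea, below is a speculative PyTorch sketch that re-weights channels by the energy of their amplitude spectra; it is a stand-in under stated assumptions, not the authors' actual Fourier Residual Channel Attention Block.

```python
# Speculative sketch: channel attention driven by per-channel frequency
# energy; not the actual DFENet block.
import torch
import torch.nn as nn

class FourierChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                             # x: (B, C, H, W)
        amp = torch.fft.rfft2(x, norm="ortho").abs()  # amplitude spectrum
        desc = amp.mean(dim=(-2, -1))                 # per-channel energy (B, C)
        w = self.mlp(desc)[..., None, None]           # channel weights in (0, 1)
        return x + x * w                              # residual re-weighting

# Quick shape check:
block = FourierChannelAttention(32)
assert block(torch.randn(2, 32, 64, 64)).shape == (2, 32, 64, 64)
```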


CISCA and CytoDArk0: a Cell Instance Segmentation and Classification method for histo(patho)logical image Analyses and a new, open, Nissl-stained dataset for brain cytoarchitecture studies

Authors: Valentina Vadori, Jean-Marie Graïc, Antonella Peruffo, Giulia Vadori, Livio Finos, Enrico Grisan

Delineating and classifying individual cells in microscopy tissue images is inherently challenging yet remains essential for advancements in medical and neuroscientific research. In this work, we propose a new deep learning framework, CISCA, for automatic cell instance segmentation and classification in histological slices. At the core of CISCA is a network architecture featuring a lightweight U-Net with three heads in the decoder. The first head classifies pixels into boundaries between neighboring cells, cell bodies, and background, while the second head regresses four distance maps along four directions. The outputs from the first and second heads are integrated through a tailored post-processing step, which ultimately produces the segmentation of individual cells. The third head enables the simultaneous classification of cells into relevant classes, if required. We demonstrate the effectiveness of our method using four datasets, including CoNIC, PanNuke, and MoNuSeg, which are publicly available H&E-stained datasets that cover diverse tissue types and magnifications. In addition, we introduce CytoDArk0, the first annotated dataset of Nissl-stained histological images of the mammalian brain, containing nearly 40k annotated neurons and glia cells, aimed at facilitating advancements in digital neuropathology and brain cytoarchitecture studies. We evaluate CISCA against other state-of-the-art methods, demonstrating its versatility, robustness, and accuracy in segmenting and classifying cells across diverse tissue types, magnifications, and staining techniques. This makes CISCA well-suited for detailed analyses of cell morphology and efficient cell counting in both digital pathology workflows and brain cytoarchitecture research.


Paper and Project Links

PDF

Summary

This paper proposes CISCA, a new deep learning framework for automatic cell instance segmentation and classification in microscopy tissue slices. At its core is a lightweight U-Net with three decoder heads: pixel classification and distance-map regression along four directions are combined through a tailored post-processing step to segment individual cells, while the third head optionally classifies cells into relevant classes. Experiments on four datasets show that CISCA segments and classifies cells accurately and robustly across tissue types, magnifications, and staining techniques, supporting detailed cell-morphology analysis and efficient cell counting in digital pathology workflows and brain cytoarchitecture research.

Key Takeaways

  • Proposes CISCA, a new deep learning framework for automatic cell instance segmentation and classification.
  • CISCA uses a lightweight U-Net with three decoder heads (a simplified sketch follows this list).
  • Cell segmentation is obtained from per-pixel classification combined with distance-map regression along four directions.
  • Introduces CytoDArk0, the first annotated dataset of Nissl-stained histological images of the mammalian brain, with nearly 40k annotated neurons and glia cells, to support digital neuropathology and brain cytoarchitecture studies.
  • Experiments on multiple public datasets validate CISCA's versatility, robustness, and accuracy.
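
Following the abstract's description, here is a simplified PyTorch sketch of three 1x1-convolution heads over a shared decoder feature map; the head shapes and output naming are illustrative assumptions, and the real CISCA network and its tailored post-processing differ.

```python
# Simplified sketch of CISCA-style decoder heads; shapes and naming are
# assumptions based only on the abstract.
import torch
import torch.nn as nn

class ThreeHeadDecoder(nn.Module):
    def __init__(self, feat_channels: int, num_cell_types: int):
        super().__init__()
        # Head 1: 3-way pixel classification (boundary / cell body / background)
        self.tripartite = nn.Conv2d(feat_channels, 3, kernel_size=1)
        # Head 2: regression of four directional distance maps
        self.distances = nn.Conv2d(feat_channels, 4, kernel_size=1)
        # Head 3: optional per-pixel cell-type classification
        self.cell_type = nn.Conv2d(feat_channels, num_cell_types, kernel_size=1)

    def forward(self, feats):  # feats: (B, C, H, W) from the shared U-Net decoder
        return {
            "tripartite": self.tripartite(feats),
            "distances": self.distances(feats),
            "cell_type": self.cell_type(feats),
        }

# The paper's tailored post-processing (not sketched here) would combine the
# tripartite and distance outputs to split touching cells into instances.
```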



Author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!