Published: 2025-08-19

Updated: 2025-09-08

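⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use this in serious academic settings; it is suitable only for preliminary screening before reading a paper.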

Update of 2025-08-19

Authors: Bin Ma, Yifei Zhang, Yongjin Xian, Qi Li, Linna Zhou, Gongxun Miao

Existing rumor detection methods often neglect the content within images as well as the inherent relationships between contexts and images across different visual scales, thereby resulting in the loss of critical information pertinent to rumor identification. To address these issues, this paper presents a novel cross-modal rumor detection scheme based on contrastive learning, namely the Multi-scale Image and Context Correlation exploration algorithm (MICC). Specifically, we design an SCLIP encoder to generate unified semantic embeddings for text and multi-scale image patches through contrastive pretraining, enabling their relevance to be measured via dot-product similarity. Building upon this, a Cross-Modal Multi-Scale Alignment module is introduced to identify image regions most relevant to the textual semantics, guided by mutual information maximization and the information bottleneck principle, through a Top-K selection strategy based on a cross-modal relevance matrix constructed between the text and multi-scale image patches. Moreover, a scale-aware fusion network is designed to integrate the highly correlated multi-scale image features with global text features by assigning adaptive weights to image regions based on their semantic importance and cross-modal relevance. The proposed methodology has been extensively evaluated on two real-world datasets. The experimental results demonstrate that it achieves a substantial performance improvement over existing state-of-the-art approaches in rumor detection, highlighting its effectiveness and potential for practical applications.

Paper and project links

PDF

Summary
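Building on contrastive learning, the paper proposes the Multi-scale Image and Context Correlation exploration algorithm (MICC) for cross-modal rumor detection. An SCLIP encoder generates unified semantic embeddings for text and multi-scale image patches, and a Cross-Modal Multi-Scale Alignment module identifies the image regions most relevant to the text semantics. A scale-aware fusion network then integrates the highly correlated multi-scale image features with the global text features. Experimental results show that the method achieves a substantial performance improvement in rumor detection.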
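
To make the selection-and-fusion idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the embeddings are hand-written toy vectors rather than SCLIP outputs, the function names (`top_k_patches`, `fuse`) are hypothetical, relevance is approximated with a dot product over L2-normalized vectors, and the adaptive weights are approximated with a softmax over the relevance scores. The mutual-information and information-bottleneck guidance described in the abstract is not modeled.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (assumption: embeddings are compared after normalization)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k_patches(text_emb, patch_embs, k):
    """Return (index, score) pairs for the k patches most relevant to the text."""
    t = l2_normalize(text_emb)
    scores = [dot(t, l2_normalize(p)) for p in patch_embs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_emb, patch_embs, selected):
    """Pool the selected patches with relevance-based weights, then append to the text vector."""
    weights = softmax([score for _, score in selected])
    pooled = [0.0] * len(text_emb)
    for w, (i, _) in zip(weights, selected):
        for d, value in enumerate(patch_embs[i]):
            pooled[d] += w * value
    return list(text_emb) + pooled


# Toy data: one text embedding and four patch embeddings (pooled from all scales).
text = [1.0, 0.0, 0.0]
patches = [
    [0.9, 0.1, 0.0],   # nearly parallel to the text
    [0.0, 1.0, 0.0],   # orthogonal to the text
    [0.5, 0.5, 0.0],   # partially related
    [-1.0, 0.0, 0.0],  # opposite direction
]

selected = top_k_patches(text, patches, k=2)
fused = fuse(text, patches, selected)
```

In this toy setup the two patches pointing most nearly in the text's direction are kept, and the fused vector is the text embedding followed by their weighted average. A real system would feed such a fused representation to a classifier; that step is omitted here.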

Key Takeaways