Published: 2025-10-07

Updated: 2025-11-27

Word count: 1.7k

Reading time: 7 min

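⚠️ All summaries below were generated by a large language model and may contain errors. Treat them as reference only and use them with caution.
🔴 Note: never use them in serious academic contexts. They are only suitable for initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace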

2025-10-07 update

MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding

Authors: Jingyuan Deng, Yujiu Yang

Large vision-language models (LVLMs) have shown remarkable performance in visual-language understanding for downstream multimodal tasks. While their capabilities are improving, new problems are emerging. Among them, hallucination has attracted much attention: the phenomenon in which LVLMs generate content that contradicts their visual and textual inputs. Many approaches have been proposed to deal with this issue, such as contrastive decoding and attention manipulation. However, contrastive decoding methods struggle to construct appropriate contrastive samples, and attention manipulation methods are highly sensitive and lack stability. In this work, we propose image head Masked Contrastive Decoding (MaskCD). Our approach utilizes the “image heads” in LVLMs, masking them to construct contrastive samples for contrastive decoding. We evaluated MaskCD on LLaVA-1.5-7b and Qwen-VL-7b, using various benchmarks such as CHAIR, POPE, AMBER and MME. The results demonstrate that MaskCD effectively alleviates hallucinations and retains the general capabilities of LVLMs. Corresponding resources can be found at: https://github.com/Deng-Jingyuan/MaskCD.

Paper and project links

PDF Accepted to EMNLP 2025 Findings

Summary

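Large vision-language models (LVLMs) show strong understanding on downstream multimodal tasks, but as their performance improves, problems such as hallucination have emerged alongside it. This paper proposes MaskCD, an image head masked contrastive decoding method that masks the image heads in an LVLM to construct contrastive samples, effectively mitigating hallucination while retaining the LVLM's general capabilities.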
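The core idea above, masking selected attention heads to obtain a second "contrastive" forward pass, can be illustrated with a toy single-layer multi-head attention in NumPy. This is a minimal sketch under stated assumptions, not the MaskCD code: the weights are random, the layer has no biases, and the `image_heads` indices are hypothetical placeholders, since the summary does not say how MaskCD identifies which heads count as image heads. A masked head's output is zeroed before the output projection.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads, head_mask=None):
    """Toy self-attention over x of shape (seq, d_model), no biases.

    head_mask: optional array of length num_heads; 1.0 keeps a head and
    0.0 masks it. A masked head's output is zeroed before the output
    projection, so it contributes nothing to the layer's result.
    """
    seq, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq, d_model) -> (num_heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ v  # (num_heads, seq, d_head)
    if head_mask is not None:
        heads = heads * np.asarray(head_mask)[:, None, None]
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ w_o


rng = np.random.default_rng(0)
seq, d_model, num_heads = 5, 8, 4
x = rng.normal(size=(seq, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

image_heads = [1, 3]  # hypothetical "image head" indices, not from the paper
mask = np.ones(num_heads)
mask[image_heads] = 0.0

out_full = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
out_masked = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads, mask)
print("max change from masking:", np.abs(out_full - out_masked).max())
```

In a full LVLM one would presumably apply such a mask in the relevant layers and take the resulting next-token logits as the contrastive sample; the toy layer only shows the masking mechanics.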

Key Takeaways

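LVLMs perform well on downstream multimodal tasks but suffer from hallucination.
Hallucination means that an LVLM generates content that contradicts its visual and textual inputs.
Existing remedies such as contrastive decoding and attention manipulation have been proposed, but they either struggle to construct suitable contrastive samples or lack stability.
This paper proposes MaskCD, which masks the image heads in an LVLM to construct contrastive samples for contrastive decoding.
Experiments on LLaVA-1.5-7b and Qwen-VL-7b show that MaskCD effectively mitigates hallucination while retaining the general capabilities of LVLMs.
MaskCD's implementation resources are available at: https://github.com/Deng-Jingyuan/MaskCD.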
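Once the normal pass and the image-head-masked pass both produce next-token logits, contrastive decoding combines them. The sketch below assumes the widely used VCD-style rule, (1 + alpha) * logits_orig - alpha * logits_masked, restricted by an adaptive plausibility cutoff beta. Whether MaskCD uses exactly this formula and these hyperparameters is an assumption; `contrastive_next_token`, `alpha`, and `beta` are hypothetical names.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def contrastive_next_token(logits_orig, logits_masked, alpha=1.0, beta=0.1):
    """Greedy next-token choice from a VCD-style contrastive distribution.

    logits_orig:   next-token logits from the normal forward pass
    logits_masked: logits from the pass with the (hypothetical) image heads masked
    alpha:         contrast strength (assumed hyperparameter)
    beta:          adaptive plausibility cutoff in (0, 1] (assumed hyperparameter)
    """
    p_orig = softmax(logits_orig)
    # Only tokens the unmasked model already finds reasonably likely stay
    # eligible, so the subtraction cannot promote implausible tokens.
    plausible = p_orig >= beta * p_orig.max()
    cd_logits = (1.0 + alpha) * logits_orig - alpha * logits_masked
    cd_logits = np.where(plausible, cd_logits, -np.inf)
    probs = softmax(cd_logits)
    return int(np.argmax(probs)), probs


logits_orig = np.array([2.0, 1.9, -3.0])
logits_masked = np.array([2.0, 0.5, -3.0])  # token 1 relied on the image heads
token, probs = contrastive_next_token(logits_orig, logits_masked)
print("greedy:", int(np.argmax(logits_orig)), "contrastive:", token)
```

In this toy example, token 0 scores the same with or without the image heads (a language-prior guess), while token 1 depends on them. Plain greedy decoding picks token 0, the contrast promotes token 1, and token 2 is excluded by the plausibility cutoff.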

AstroECP: towards more practical Electron Channeling Contrast Imaging

Authors: M. Haroon Qaiser, Lukas Berners, Robin J. Scales, Tianbi Zhang, Martin Heller, Jiri Dluhos, Sandra Korte-Kerzel, T. Ben Britton

Electron channeling contrast imaging (ECCI) is a scanning electron microscopy (SEM) based technique that enables bulk-sample characterization of crystallographic defects (e.g. dislocations, stacking faults, low angle boundaries). Despite its potential, ECCI remains underused for quantitative defect analysis as compared to transmission electron microscope (TEM) based methods. Here, we overcome barriers that limit the use of ECCI, including optimizing signal-to-noise contrast and precisely determining the incident beam vector with calibrated, easy-to-use simulations and experimental selected area electron channeling patterns (SA-ECP). We introduce a systematic ECCI workflow, alongside a new open-source software tool (AstroECP), that includes calibration of stage tilting, SA-ECP field of view, and the energy that forms the ECP/ECCI contrast using dynamical simulations. The functionality of this workflow is demonstrated with case studies that include threading dislocations in GaAs and the cross-validation of precession-based ECCI contrast, which is otherwise known as Electron Channeling Orientation Determination (eCHORD). To assist the reader, we also provide best practice guidelines for ECCI implementation to promote high-resolution defect imaging in the SEM.

Paper and project links

PDF as submitted version, post-peer review

Summary

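ECCI is a scanning electron microscopy (SEM) based technique for characterizing crystallographic defects in bulk samples. This work optimizes ECCI, including improving signal-to-noise contrast and precisely determining the incident beam vector. It introduces a systematic ECCI workflow and a new open-source software tool, AstroECP, which calibrates stage tilt, the SA-ECP field of view, and, using dynamical simulations, the energy that forms the ECP/ECCI contrast. Case studies, including threading dislocations in GaAs and cross-validation of precession-based ECCI contrast, demonstrate the workflow, and best-practice guidelines are provided to promote high-resolution defect imaging in the SEM.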
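AstroECP calibrates the energy that forms the ECP/ECCI contrast using dynamical simulations. As a much simpler kinematic illustration of why beam energy and pattern geometry are linked, the sketch below uses the relativistic electron wavelength and first-order Bragg's law, taking the angular width of a channeling band as roughly twice the Bragg angle, to go from accelerating voltage to band width and back. This is not AstroECP's method: the function names are hypothetical, and the Si {220} spacing (a = 5.431 Å) is only an example.

```python
import math

# SI constants (CODATA 2018)
H = 6.62607015e-34          # Planck constant, J s
M0 = 9.1093837015e-31       # electron rest mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
C = 299_792_458.0           # speed of light, m/s


def electron_wavelength(voltage: float) -> float:
    """Relativistic de Broglie wavelength (m) for an accelerating voltage (V)."""
    energy = E_CHARGE * voltage
    momentum = math.sqrt(2 * M0 * energy * (1 + energy / (2 * M0 * C**2)))
    return H / momentum


def voltage_from_wavelength(wavelength: float) -> float:
    """Inverse of electron_wavelength: KE = sqrt((pc)^2 + (m0 c^2)^2) - m0 c^2."""
    pc = H * C / wavelength
    rest = M0 * C**2
    return (math.sqrt(pc**2 + rest**2) - rest) / E_CHARGE


def band_width(voltage: float, d_spacing: float) -> float:
    """Kinematic channeling-band width (rad), approximated as 2 * Bragg angle."""
    return 2 * math.asin(electron_wavelength(voltage) / (2 * d_spacing))


def voltage_from_band_width(width: float, d_spacing: float) -> float:
    """Effective voltage implied by a measured band width (first-order Bragg)."""
    return voltage_from_wavelength(2 * d_spacing * math.sin(width / 2))


d_si_220 = 5.431e-10 / math.sqrt(8)  # Si {220} plane spacing, m
w = band_width(20e3, d_si_220)
print(f"band width at 20 kV: {math.degrees(w):.2f} deg")
print(f"recovered voltage: {voltage_from_band_width(w, d_si_220):.1f} V")
```

For Si {220} at 20 kV this gives a band width of about 2.56°, and inverting that width recovers 20 kV. Real patterns involve effects this kinematic estimate ignores, which is why the workflow above relies on dynamical simulations instead.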

Key Takeaways