Published: 2025-09-16

Updated: 2025-10-07

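⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings. They are only suitable for initial screening before reading a paper!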

Updated 2025-09-16

Attacking Attention of Foundation Models Disrupts Downstream Tasks

Authors:Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari

Foundation models represent the most prominent and recent paradigm shift in artificial intelligence. Foundation models are large models, trained on broad data, that deliver high accuracy in many downstream tasks, often without fine-tuning. For this reason, models such as CLIP, DINO or Vision Transformers (ViT) are becoming the bedrock of many industrial AI-powered applications. However, the reliance on pre-trained foundation models also introduces significant security concerns, as these models are vulnerable to adversarial attacks. Such attacks involve deliberately crafted inputs designed to deceive AI systems, jeopardizing their reliability. This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs, and explores the transferability of adversarial attacks to downstream tasks. We introduce a novel attack, targeting the structure of transformer-based architectures in a task-agnostic fashion. We demonstrate the effectiveness of our attack on several downstream tasks: classification, captioning, image/text retrieval, segmentation and depth estimation. Code available at: https://github.com/HondamunigePrasannaSilva/attack-attention

Paper and project links

PDF Paper published at CVPR 2025 Workshop Advml

Summary

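Large pre-trained models such as CLIP, DINO, and Vision Transformers (ViT) represent the latest paradigm shift in AI. They are used across many downstream tasks and achieve high accuracy. However, these models also pose security risks because they are vulnerable to adversarial attacks. The paper studies the vulnerabilities of vision foundation models, focusing on CLIP and ViT, and explores how adversarial attacks transfer to downstream tasks. It proposes a new task-agnostic attack that targets the structure of transformer-based architectures. Experiments show that the attack is effective on several downstream tasks: classification, captioning, image/text retrieval, segmentation, and depth estimation.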

Key Takeaways

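Foundation models (such as CLIP, DINO, and ViT) underpin many AI applications and are used across a wide range of downstream tasks.
These pre-trained models are vulnerable to adversarial attacks, which raises security concerns.
The paper studies the vulnerabilities of vision foundation models, with a focus on adversarial attacks against CLIP and ViT.
It proposes a new task-agnostic attack that targets the structure of transformer-based architectures.
The attack is effective on several downstream tasks: classification, captioning, image/text retrieval, segmentation, and depth estimation.
Transferability is central to the study: an attack crafted against the foundation model carries over to the downstream tasks built on it.
The code is publicly available, providing a basis for further research.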
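
To make the idea of a task-agnostic attack on a transformer's structure more concrete, here is a minimal sketch. It is not the authors' method: the summary does not state their loss or optimizer, so everything below is an assumption chosen for illustration. The toy single-head attention, the squared-difference objective `attention_shift`, the finite-difference sign-ascent loop, and all function names are hypothetical. The sketch perturbs the input tokens inside an L∞ budget `eps` so that the attention map moves as far as possible from the clean one, without using any task label.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def project(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def attention_map(tokens, wq, wk):
    """Toy single-head self-attention weights: softmax(Q K^T / sqrt(d))."""
    queries = [project(t, wq) for t in tokens]
    keys = [project(t, wk) for t in tokens]
    scale = math.sqrt(len(queries[0]))
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
            for q in queries]


def attention_shift(clean_map, other_map):
    """Assumed objective: sum of squared differences between two attention maps."""
    return sum((o - c) ** 2
               for ro, rc in zip(other_map, clean_map)
               for o, c in zip(ro, rc))


def attack_attention(tokens, wq, wk, eps=0.1, step=0.02, iters=15, h=1e-4, seed=0):
    """Sign-ascent on attention_shift inside an L-infinity ball of radius eps."""
    rng = random.Random(seed)
    clean = attention_map(tokens, wq, wk)

    def objective(delta):
        adv = [[x + d for x, d in zip(t, dt)] for t, dt in zip(tokens, delta)]
        return attention_shift(clean, attention_map(adv, wq, wk))

    # Random start: at delta = 0 the objective sits at its minimum (zero gradient).
    delta = [[rng.uniform(-eps, eps) for _ in t] for t in tokens]
    best_delta, best_score = [r[:] for r in delta], objective(delta)
    for _ in range(iters):
        signs = []
        for row in delta:
            sign_row = []
            for j, value in enumerate(row):
                # Central finite difference stands in for autograd in this toy.
                row[j] = value + h
                up = objective(delta)
                row[j] = value - h
                down = objective(delta)
                row[j] = value  # restore the coordinate exactly
                sign_row.append((up > down) - (up < down))
            signs.append(sign_row)
        # Step along the gradient sign, then clip back into the eps budget.
        delta = [[min(eps, max(-eps, v + step * s)) for v, s in zip(r, sr)]
                 for r, sr in zip(delta, signs)]
        score = objective(delta)
        if score > best_score:
            best_delta, best_score = [r[:] for r in delta], score
    adversarial = [[x + d for x, d in zip(t, dt)] for t, dt in zip(tokens, best_delta)]
    return adversarial, best_score
```

Because the objective only looks at the model's internal attention, the same perturbed input could be passed to any downstream head built on that backbone, which is the sense in which such an attack is task-agnostic. A real implementation would use automatic differentiation on an actual ViT or CLIP encoder rather than finite differences.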

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation

Authors:Hongji Yang, Yucheng Zhou, Wencheng Han, Jianbing Shen

Text-to-image models are powerful for producing high-quality images based on given text prompts, but crafting these prompts often requires specialized vocabulary. To address this, existing methods train rewriting models with supervision from large amounts of manually annotated data and trained aesthetic assessment models. To alleviate the dependence on data scale for model training and the biases introduced by trained models, we propose a novel prompt optimization framework, designed to rephrase a simple user prompt into a sophisticated prompt to a text-to-image model. Specifically, we employ the large vision language models (LVLMs) as the solver to rewrite the user prompt, and concurrently, employ LVLMs as a reward model to score the aesthetics and alignment of the images generated by the optimized prompt. Instead of laborious human feedback, we exploit the prior knowledge of the LVLM to provide rewards, i.e., AI feedback. Simultaneously, the solver and the reward model are unified into one model and iterated in reinforcement learning to achieve self-improvement by giving a solution and judging itself. Results on two popular datasets demonstrate that our method outperforms other strong competitors.

Paper and project links

PDF Accepted by ACL2025 Findings

Summary

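This paper proposes a novel prompt optimization framework that rewrites a simple user prompt into a sophisticated prompt for a text-to-image model. To reduce the reliance on large amounts of manually annotated data and on trained aesthetic assessment models, it uses a large vision-language model (LVLM) as the solver that rewrites the prompt, and also as the reward model that scores the aesthetics and alignment of the images generated from the optimized prompt. Rewards come from the LVLM's prior knowledge (AI feedback) rather than laborious human feedback. The solver and the reward model are unified into a single model that is iterated with reinforcement learning to improve itself. Results on two popular datasets show that the method outperforms other strong competitors.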
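
The rewrite-then-self-score loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: `ToyLVLM`, `toy_text_to_image`, `collect_round`, and `best_rewrite` are hypothetical names, the reward formula is invented, and the reinforcement learning update itself is left out. The point is the data flow: one model object plays both solver and reward model, and its own scores label its own rewrites.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredRewrite:
    user_prompt: str
    rewritten_prompt: str
    reward: float


class ToyLVLM:
    """Hypothetical stand-in for a single LVLM that both rewrites and scores."""

    STYLE_WORDS = ["highly detailed", "soft lighting", "sharp focus"]

    def rewrite(self, user_prompt: str, n: int) -> List[str]:
        # Solver role: candidate k appends the first k style phrases.
        return [", ".join([user_prompt] + self.STYLE_WORDS[:k]) for k in range(1, n + 1)]

    def score(self, user_prompt: str, image: str) -> float:
        # Reward role (assumed formula): alignment plus aesthetics, each in [0, 1].
        alignment = 1.0 if user_prompt in image else 0.0
        aesthetics = sum(w in image for w in self.STYLE_WORDS) / len(self.STYLE_WORDS)
        return alignment + aesthetics


def toy_text_to_image(prompt: str) -> str:
    """Hypothetical text-to-image model; returns a description instead of pixels."""
    return f"<image of: {prompt}>"


def collect_round(model: ToyLVLM,
                  text_to_image: Callable[[str], str],
                  user_prompts: List[str],
                  n_candidates: int = 3) -> List[ScoredRewrite]:
    """One round of self-rewarding data collection: rewrite, render, self-score."""
    samples = []
    for user_prompt in user_prompts:
        for candidate in model.rewrite(user_prompt, n_candidates):
            image = text_to_image(candidate)
            reward = model.score(user_prompt, image)
            samples.append(ScoredRewrite(user_prompt, candidate, reward))
    # A real system would now update the model with these rewards (not shown).
    return samples


def best_rewrite(samples: List[ScoredRewrite], user_prompt: str) -> ScoredRewrite:
    """Pick the highest-reward rewrite collected for one user prompt."""
    return max((s for s in samples if s.user_prompt == user_prompt),
               key=lambda s: s.reward)
```

For example, `best_rewrite(collect_round(ToyLVLM(), toy_text_to_image, ["a cat"]), "a cat")` selects the candidate carrying all three style phrases, since the toy reward favors it. In an actual system, the scored samples would feed a policy update so that the next round's rewrites start from an improved model.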

Key Takeaways