
Agent


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never rely on these summaries in serious academic settings — they are only meant as a first-pass filter before actually reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-04-24

Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation

Authors:Zhiyuan Hu, Shiyun Xiong, Yifan Zhang, See-Kiong Ng, Anh Tuan Luu, Bo An, Shuicheng Yan, Bryan Hooi

Recent advancements in visual language models (VLMs) have notably enhanced their capabilities in handling complex Graphical User Interface (GUI) interaction tasks. Despite these improvements, current frameworks often struggle to generate correct actions in challenging GUI environments. State-of-the-art commercial VLMs are black-boxes, and fine-tuning open-source VLMs for GUI tasks requires significant resources. Additionally, existing trajectory-level evaluation and refinement techniques frequently fall short due to delayed feedback and local optimization issues. To address these challenges, we propose an approach that guides VLM agents with process supervision by a reward model during GUI navigation and control at inference time. This guidance allows the VLM agent to optimize actions at each inference step, thereby improving performance in both static and dynamic environments. In particular, our method demonstrates significant performance gains in three GUI navigation tasks, achieving a 3.4% improvement in single-step action accuracy for static environments, along with an approximately 33% increase in task success rate in one dynamic environment. With further integration of trajectory reflection and retry mechanisms, we also demonstrate even greater enhancement in task success.


Paper and Project Links

PDF

Summary

Visual language models (VLMs) have become markedly more capable on complex graphical user interface (GUI) interaction tasks, yet existing frameworks still struggle to generate correct actions in challenging GUI environments. To address this, the paper proposes guiding VLM agents with process supervision from a reward model at inference time. The guidance lets the agent optimize its action at every inference step, improving performance in both static and dynamic environments. In particular, the method achieves notable gains on three GUI navigation tasks: a 3.4% improvement in single-step action accuracy in static environments and an increase of roughly 33% in task success rate in one dynamic environment. Integrating trajectory reflection and retry mechanisms raises the task success rate further.

Key Takeaways

  1. VLMs have become markedly better at complex GUI interaction tasks, but still struggle to generate correct actions.
  2. State-of-the-art commercial VLMs are black boxes, and fine-tuning open-source VLMs for GUI tasks requires substantial resources.
  3. Existing trajectory-level evaluation and refinement techniques often fall short because of delayed feedback and local-optimization issues.
  4. The paper proposes guiding VLM agents during GUI navigation and control with process supervision from a reward model at inference time.
  5. This lets the agent optimize its action at every inference step (a minimal sketch of such step-level selection follows this list), improving performance in both static and dynamic environments.
  6. The method yields significant gains on three GUI navigation tasks, including higher single-step action accuracy in static environments and a markedly higher task success rate in a dynamic environment.
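
To make the step-level guidance concrete, here is a minimal sketch (assuming a generic VLM agent and reward model, not the authors' implementation) of reward-guided action selection at inference time: sample several candidate actions from the VLM, score each with the process reward model, and execute the highest-scoring one. `propose_fn`, `score_fn`, and the `env` interface are hypothetical placeholders.

```python
# Minimal sketch of process-reward-guided action selection at inference time.
# `propose_fn` (the VLM agent), `score_fn` (the process reward model), and the
# `env` interface are hypothetical placeholders, not the paper's actual API.

def guided_step(propose_fn, score_fn, screenshot, goal, history, n_candidates=5):
    """Pick one GUI action for the current step using a process reward model."""
    # 1) Ask the VLM agent for several candidate actions (click/type/scroll/...).
    candidates = propose_fn(screenshot, goal, history, n_candidates)

    # 2) Score each candidate: how promising is it for the overall task here?
    scored = [(score_fn(screenshot, goal, history, a), a) for a in candidates]

    # 3) Execute the highest-scoring action instead of the VLM's first guess.
    _, best_action = max(scored, key=lambda pair: pair[0])
    return best_action

def run_episode(env, propose_fn, score_fn, goal, max_steps=30):
    """Step-level guidance loop; `env` exposes reset() and step(action)."""
    history, obs = [], env.reset()
    for _ in range(max_steps):
        action = guided_step(propose_fn, score_fn, obs, goal, history)
        obs, done = env.step(action)
        history.append(action)
        if done:
            break
    return history
```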


WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Authors:Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang

Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between the prior knowledge of LLMs and the specified environment’s dynamics usually bottlenecks LLMs’ performance as world models. To bridge the gap, we propose a training-free “world alignment” that learns an environment’s symbolic knowledge complementary to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which are extracted by LLMs from exploration trajectories and encoded into executable codes to regulate LLM agents’ policies. We further propose an RL-free, model-based agent “WALL-E 2.0” through the model-predictive control (MPC) framework. Unlike classical MPC requiring costly optimization on the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future steps’ actions by interacting with the neurosymbolic world model. While the LLM agent’s strong heuristics make it an efficient planner in MPC, the quality of its planned actions is also secured by the accurate predictions of the aligned world model. They together considerably improve learning efficiency in a new environment. On open-world challenges in Mars (Minecraft like) and ALFWorld (embodied indoor environments), WALL-E 2.0 significantly outperforms existing methods, e.g., surpassing baselines in Mars by 16.1%-51.6% of success rate and by at least 61.7% in score. In ALFWorld, it achieves a new record 98% success rate after only 4 iterations.


Paper and Project Links

PDF Code is available at https://github.com/elated-sawyer/WALL-E

Summary

Can accurate world models be built from large language models (LLMs), and how can such world models benefit LLM agents? The gap between an LLM's prior knowledge and the dynamics of a specific environment usually bottlenecks its performance as a world model. To bridge this gap, the authors propose a training-free "world alignment" method that learns symbolic knowledge about the environment to complement the LLM. This symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which the LLM extracts from exploration trajectories and encodes into executable code that regulates the LLM agent's policy. They further propose WALL-E 2.0, an RL-free, model-based agent built on the model-predictive control (MPC) framework. Unlike classical MPC, which requires costly on-the-fly optimization, an LLM agent serves as an efficient look-ahead optimizer of future actions by interacting with the neurosymbolic world model. The LLM agent's strong heuristics make it an efficient planner within MPC, while the aligned world model's accurate predictions safeguard the quality of the planned actions; together they substantially improve learning efficiency in new environments. On open-world challenges in Mars (Minecraft-like) and ALFWorld (embodied indoor environments), WALL-E 2.0 significantly outperforms existing methods, surpassing baselines in Mars by 16.1%-51.6% in success rate and by at least 61.7% in score, and reaching a record 98% success rate in ALFWorld after only 4 iterations.

Key Takeaways

  1. LLMs face a gap between their prior knowledge and the environment's dynamics when used as world models.
  2. A training-free "world alignment" method is proposed to bridge this gap by extracting the environment's symbolic knowledge.
  3. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which the LLM extracts from exploration trajectories and encodes into executable code.
  4. WALL-E 2.0 is an RL-free, model-based agent built on the model-predictive control (MPC) framework (see the sketch after this list).
  5. The LLM agent acts as an efficient look-ahead optimizer of future actions by interacting with the neurosymbolic world model.
  6. The LLM agent's strong heuristics make it an efficient planner within MPC, while the aligned world model's predictions safeguard action quality.
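
As a rough illustration of the MPC-style loop in takeaway 4 (a sketch under our own assumptions, not the released WALL-E 2.0 code), the LLM proposes a short plan, the neurosymbolic world model (learned symbolic action rules plus a transition predictor) rolls it out, and only a plan that survives the rollout is executed, one step at a time.

```python
# Rough sketch of an MPC-style look-ahead loop against a neurosymbolic world
# model. `propose_plan` (LLM agent), `rules_ok` (learned symbolic action rules)
# and `predict` (world-model transition) are hypothetical callables, not the
# released WALL-E 2.0 interfaces.

def mpc_step(state, goal, propose_plan, rules_ok, predict, horizon=5, max_tries=4):
    """Return the next action after verifying a proposed plan in the world model."""
    plan, feedback = [], None
    for _ in range(max_tries):
        # 1) The LLM agent is the look-ahead optimizer: propose a short plan.
        plan = propose_plan(state, goal, horizon, feedback)

        # 2) Roll the plan out in the aligned world model.
        sim_state, violation = state, None
        for action in plan:
            ok, reason = rules_ok(sim_state, action)   # symbolic action rules
            if not ok:
                violation = (action, reason)
                break
            sim_state = predict(sim_state, action)     # predicted next state

        # 3) A plan that passes the rollout is accepted; execute its first
        #    action and replan at the next step (model-predictive control).
        if violation is None:
            return plan[0]
        feedback = violation   # otherwise replan, telling the LLM what failed

    return plan[0] if plan else None   # fall back to the last proposal
```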


Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation

Authors:Ning Wang, Zihan Yan, Weiyang Li, Chuan Ma, He Chen, Tao Xiang

Embodied agents exhibit immense potential across a multitude of domains, making the assurance of their behavioral safety a fundamental prerequisite for their widespread deployment. However, existing research predominantly concentrates on the security of general large language models, lacking specialized methodologies for establishing safety benchmarks and input moderation tailored to embodied agents. To bridge this gap, this paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents. This framework encompasses the entire pipeline, including taxonomy definition, dataset curation, moderator architecture, model training, and rigorous evaluation. Notably, we introduce EAsafetyBench, a meticulously crafted safety benchmark engineered to facilitate both the training and stringent assessment of moderators specifically designed for embodied agents. Furthermore, we propose Pinpoint, an innovative prompt-decoupled input moderation scheme that harnesses a masked attention mechanism to effectively isolate and mitigate the influence of functional prompts on moderation tasks. Extensive experiments conducted on diverse benchmark datasets and models validate the feasibility and efficacy of the proposed approach. The results demonstrate that our methodologies achieve an impressive average detection accuracy of 94.58%, surpassing the performance of existing state-of-the-art techniques, alongside an exceptional moderation processing time of merely 0.002 seconds per instance.


Paper and Project Links

PDF 9 pages

Summary

The paper addresses the safety of embodied agents by proposing a new input-moderation framework together with the EAsafetyBench safety benchmark, designed to safeguard embodied agents' behavior. The framework covers the full pipeline: taxonomy definition, dataset curation, moderator architecture, model training, and rigorous evaluation. It also introduces Pinpoint, a prompt-decoupled input-moderation scheme that uses a masked attention mechanism to isolate and mitigate the influence of functional prompts on the moderation task. Experiments show the method is both effective and fast.

Key Takeaways

  1. Embodied agents show great potential across many domains, but guaranteeing their behavioral safety is a prerequisite for wide deployment.
  2. Existing research focuses mainly on the security of general large language models and lacks safety benchmarks and input-moderation methods tailored to embodied agents.
  3. A new input-moderation framework is introduced, designed specifically to safeguard embodied agents and covering the entire pipeline.
  4. EAsafetyBench is introduced as a safety benchmark for training and rigorously evaluating moderators for embodied agents.
  5. Pinpoint is a prompt-decoupled input-moderation scheme based on masked attention that isolates and mitigates the influence of functional prompts on the moderation task (a minimal sketch of such a mask follows this list).
  6. Experiments on multiple benchmark datasets and models confirm the approach's effectiveness and efficiency.
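
The core of Pinpoint, as described in takeaway 5, is an attention mask that keeps the moderation decision from being influenced by the functional (system/task) prompt. Below is a minimal sketch of how such a prompt-decoupled mask could be built; the token-span convention and function name are our own assumptions, not the paper's code.

```python
import torch

# Minimal sketch of a prompt-decoupled attention mask: tokens outside the
# functional prompt (user input and decision tokens) are prevented from
# attending to the prompt, isolating it from the moderation decision.
# The (start, end) span convention is an assumption for illustration only.

def build_decoupled_mask(seq_len, prompt_span):
    """Boolean mask of shape (seq_len, seq_len); True means attention is allowed."""
    mask = torch.ones(seq_len, seq_len, dtype=torch.bool)
    p_start, p_end = prompt_span                      # token indices of the functional prompt
    in_prompt = torch.zeros(seq_len, dtype=torch.bool)
    in_prompt[p_start:p_end] = True
    # Rows = query tokens, columns = key tokens: non-prompt queries may not
    # attend to prompt keys.
    mask[(~in_prompt).unsqueeze(1) & in_prompt.unsqueeze(0)] = False
    return mask

# Usage: convert to an additive bias for a transformer layer expecting float masks.
bool_mask = build_decoupled_mask(seq_len=128, prompt_span=(0, 32))
attn_bias = torch.zeros(128, 128).masked_fill(~bool_mask, float("-inf"))
```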


Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL

Authors:Songyuan Zhang, Oswin So, Mitchell Black, Zachary Serlin, Chuchu Fan

Tasks for multi-robot systems often require the robots to collaborate and complete a team goal while maintaining safety. This problem is usually formalized as a constrained Markov decision process (CMDP), which targets minimizing a global cost and bringing the mean of constraint violation below a user-defined threshold. Inspired by real-world robotic applications, we define safety as zero constraint violation. While many safe multi-agent reinforcement learning (MARL) algorithms have been proposed to solve CMDPs, these algorithms suffer from unstable training in this setting. To tackle this, we use the epigraph form for constrained optimization to improve training stability and prove that the centralized epigraph form problem can be solved in a distributed fashion by each agent. This results in a novel centralized training distributed execution MARL algorithm named Def-MARL. Simulation experiments on 8 different tasks across 2 different simulators show that Def-MARL achieves the best overall performance, satisfies safety constraints, and maintains stable training. Real-world hardware experiments on Crazyflie quadcopters demonstrate the ability of Def-MARL to safely coordinate agents to complete complex collaborative tasks compared to other methods.


Paper and Project Links

PDF 28 pages, 16 figures; Accepted by Robotics: Science and Systems 2025

Summary
Multi-robot tasks require robots to collaborate toward a team goal while staying safe, and are usually formalized as constrained Markov decision processes (CMDPs) that minimize a global cost while keeping the mean constraint violation below a user-defined threshold. Here, safety is defined as zero constraint violation. Many safe multi-agent reinforcement learning (MARL) algorithms have been proposed for CMDPs, but they train unstably in this setting. This work uses the epigraph form of constrained optimization to improve training stability, proves that the centralized epigraph-form problem can be solved in a distributed fashion by each agent, and proposes Def-MARL, a novel centralized-training, distributed-execution MARL algorithm. Simulation experiments show that Def-MARL achieves the best overall performance, satisfies safety constraints, and trains stably. Real-world hardware experiments on Crazyflie quadcopters demonstrate that, compared with other methods, Def-MARL can safely coordinate agents to complete complex collaborative tasks.

Key Takeaways

  1. Multi-robot collaborative tasks must account for safety and are modeled as constrained Markov decision processes (CMDPs).
  2. Safety is defined as zero constraint violation.
  3. Existing MARL algorithms suffer from unstable training when solving CMDPs.
  4. The epigraph form of constrained optimization is used to improve training stability (the general reformulation is shown below).
  5. The centralized epigraph-form problem can be solved in a distributed fashion by each agent.
  6. A new MARL algorithm, Def-MARL, realizes centralized training with distributed execution.
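
For reference, the epigraph reformulation mentioned in takeaway 4 moves the objective into the constraints via an auxiliary variable; the general form (standard constrained-optimization notation, not the paper's exact multi-agent formulation) is

$$
\min_{x} \; f(x) \ \ \text{s.t.} \ \ g(x) \le 0
\qquad\Longleftrightarrow\qquad
\min_{x,\,t} \; t \ \ \text{s.t.} \ \ f(x) \le t,\ \ g(x) \le 0,
$$

so that the objective only enters the problem through the inequality $f(x) \le t$, which is what enables the more stable constrained training described above.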


Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision

Authors:Shilin Zhang, Zican Hu, Wenhao Wu, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang

RL systems usually tackle generalization by inferring task beliefs from high-quality samples or warmup explorations. The restricted form limits their generality and usability since these supervision signals are expensive and even infeasible to acquire in advance for unseen tasks. Learning directly from the raw text about decision tasks is a promising alternative to leverage a much broader source of supervision. In the paper, we propose Text-to-Decision Agent (T2DA), a simple and scalable framework that supervises generalist policy learning with natural language. We first introduce a generalized world model to encode multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, we predict which textual description goes with which decision embedding, effectively bridging their semantic gap via contrastive language-decision pre-training and aligning the text embeddings to comprehend the environment dynamics. After training the text-conditioned generalist policy, the agent can directly realize zero-shot text-to-decision generation in response to language instructions. Comprehensive experiments on MuJoCo and Meta-World benchmarks show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines.


Paper and Project Links

PDF 18 pages, 8 figures

Summary
The paper proposes Text-to-Decision Agent (T2DA), a framework that supervises generalist policy learning with natural language. It first builds a generalized world model that encodes multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, it predicts which textual description goes with which decision embedding, closing the semantic gap through contrastive language-decision pre-training and aligning the text embeddings with the environment dynamics. After training the text-conditioned generalist policy, the agent can perform zero-shot text-to-decision generation in response to language instructions. On the MuJoCo and Meta-World benchmarks, T2DA enables high-capacity zero-shot generalization and outperforms a variety of baselines.

Key Takeaways

  1. The T2DA framework supervises generalist decision-policy learning with natural language.
  2. A generalized world model encodes multi-task decision data into a dynamics-aware embedding space.
  3. T2DA uses contrastive language-decision pre-training to close the semantic gap between text descriptions and decision embeddings (a minimal sketch of such a loss follows this list).
  4. The framework aligns text embeddings so that they comprehend the environment dynamics.
  5. The trained text-conditioned generalist policy can directly perform zero-shot text-to-decision generation in response to language instructions.
  6. T2DA shows high-capacity zero-shot generalization on the MuJoCo and Meta-World benchmarks.
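
To ground takeaway 3, here is a minimal sketch of a CLIP-style symmetric contrastive loss between text embeddings and decision embeddings; it illustrates the general contrastive pre-training technique, and T2DA's actual encoders and loss may differ.

```python
import torch
import torch.nn.functional as F

# Minimal CLIP-style contrastive alignment between text and decision embeddings.
# This illustrates the general technique; T2DA's actual encoders/loss may differ.

def contrastive_language_decision_loss(text_emb, decision_emb, temperature=0.07):
    """text_emb, decision_emb: (batch, dim) embeddings of matched text/decision pairs."""
    text_emb = F.normalize(text_emb, dim=-1)
    decision_emb = F.normalize(decision_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares text i with decision j.
    logits = text_emb @ decision_emb.t() / temperature
    targets = torch.arange(text_emb.size(0), device=text_emb.device)

    # Symmetric InfoNCE: match each text to its decision and vice versa.
    loss_t2d = F.cross_entropy(logits, targets)
    loss_d2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2d + loss_d2t)
```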


Manipulating Multimodal Agents via Cross-Modal Prompt Injection

Authors:Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, Xianglong Liu

The emergence of multimodal large language models has redefined the agent paradigm by integrating language and vision modalities with external data sources, enabling agents to better interpret human instructions and execute increasingly complex tasks. However, in this work, we identify a critical yet previously overlooked security vulnerability in multimodal agents: cross-modal prompt injection attacks. To exploit this vulnerability, we propose CrossInject, a novel attack framework in which attackers embed adversarial perturbations across multiple modalities to align with target malicious content, allowing external instructions to hijack the agent’s decision-making process and execute unauthorized tasks. Our approach consists of two key components. First, we introduce Visual Latent Alignment, where we optimize adversarial features to the malicious instructions in the visual embedding space based on a text-to-image generative model, ensuring that adversarial images subtly encode cues for malicious task execution. Subsequently, we present Textual Guidance Enhancement, where a large language model is leveraged to infer the black-box defensive system prompt through adversarial meta prompting and generate a malicious textual command that steers the agent’s output toward better compliance with attackers’ requests. Extensive experiments demonstrate that our method outperforms existing injection attacks, achieving at least a +26.4% increase in attack success rates across diverse tasks. Furthermore, we validate our attack’s effectiveness in real-world multimodal autonomous agents, highlighting its potential implications for safety-critical applications.


Paper and Project Links

PDF 17 pages, 5 figures

Summary
The emergence of multimodal large language models has redefined the agent paradigm: by integrating language and vision modalities with external data sources, agents can better interpret human instructions and execute more complex tasks. This work identifies a critical but previously overlooked security vulnerability in multimodal agents: cross-modal prompt injection attacks. To exploit it, the authors propose CrossInject, an attack framework in which adversarial perturbations are embedded across multiple modalities and aligned with target malicious content, allowing external instructions to hijack the agent's decision-making and execute unauthorized tasks. The approach has two key components: Visual Latent Alignment and Textual Guidance Enhancement.

Key Takeaways

  1. Multimodal large language models have changed how agents process and execute tasks: by integrating language and vision modalities with external data sources, they can interpret and carry out more complex tasks.
  2. Cross-modal prompt injection is a critical security vulnerability of multimodal agents.
  3. The CrossInject framework lets attackers embed adversarial perturbations across multiple modalities and align them with target malicious content, hijacking the agent's decision-making process.
  4. CrossInject consists of two key components: Visual Latent Alignment and Textual Guidance Enhancement.
  5. Visual Latent Alignment optimizes adversarial features toward the malicious instructions in the visual embedding space, based on a text-to-image generative model.
  6. Textual Guidance Enhancement uses a large language model to infer the black-box defensive system prompt via adversarial meta-prompting and to generate a malicious textual command that steers the agent's output toward the attacker's request.


DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Authors:Leo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dvijotham

We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks like BrowserGym (for web agents) and $\tau$-bench (for tool calling agents); 2) It is configurable and allows for detailed threat modeling, allowing configuration of specific components of the agentic framework being attackable, and specifying targets for the attacker; and 3) It is modular and decouples the development of attacks from details of the environment in which the agent is deployed, allowing for the same attacks to be applied across multiple environments. We illustrate several advantages of our framework, including the ability to adapt to new threat models and environments easily, the ability to easily combine several previously published attacks to enable comprehensive and fine-grained security testing, and the ability to analyze trade-offs between various vulnerabilities and performance. We apply DoomArena to state-of-the-art (SOTA) web and tool-calling agents and find a number of surprising results: 1) SOTA agents have varying levels of vulnerability to different threat models (malicious user vs malicious environment), and there is no Pareto dominant agent across all threat models; 2) When multiple attacks are applied to an agent, they often combine constructively; 3) Guardrail model-based defenses seem to fail, while defenses based on powerful SOTA LLMs work better. DoomArena is available at https://github.com/ServiceNow/DoomArena.


Paper and Project Links

PDF

Summary

The paper presents DoomArena, a security evaluation framework for AI agents built on three principles: it is a plug-in framework, it is configurable, and it is modular. It integrates easily into realistic agentic frameworks such as BrowserGym (for web agents) and τ-bench (for tool-calling agents); it supports detailed threat modeling, letting users configure which components of the agentic framework are attackable and specify the attacker's targets; and its modular design decouples attack development from the details of the deployment environment, so the same attacks can be applied across multiple environments. The framework can adapt to new threat models and environments, easily combine previously published attacks for comprehensive and fine-grained security testing, and analyze trade-offs between vulnerabilities and performance. Applying it to state-of-the-art web and tool-calling agents revealed several surprising results.

Key Takeaways

  1. The DoomArena framework provides a security-evaluation solution for AI agents.
  2. It is a plug-in, configurable, and modular framework that integrates easily into multiple agentic frameworks.
  3. It supports detailed threat modeling and lets users configure which components of the agentic framework can be attacked and what the attacker's target is (a hypothetical configuration sketch follows this list).
  4. DoomArena can adapt to new threat models and environments, and can easily combine previously published attacks for comprehensive security testing.
  5. When applied to state-of-the-art web and tool-calling agents, agents show different levels of vulnerability to different threat models (malicious user vs. malicious environment), and no agent is Pareto-dominant across all threat models.
  6. When multiple attacks are applied to one agent, they often combine constructively.
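
To illustrate the configurable threat modeling in takeaway 3, a hypothetical configuration object might look like the sketch below; every field name here is invented for illustration and is not DoomArena's actual API (see the linked repository for the real interface).

```python
from dataclasses import dataclass, field

# Hypothetical threat-model configuration illustrating the plug-in,
# configurable design described above. Field names are invented for
# illustration and are NOT DoomArena's actual API.

@dataclass
class ThreatModelConfig:
    agent_framework: str                                        # e.g. "BrowserGym" or "tau-bench"
    attackable_components: list = field(default_factory=list)   # parts of the framework an attacker controls
    attacker_goal: str = ""                                     # what counts as a successful attack
    attacks: list = field(default_factory=list)                 # attack modules to combine in one run

config = ThreatModelConfig(
    agent_framework="BrowserGym",
    attackable_components=["web_page_content"],
    attacker_goal="make the agent navigate to an attacker-chosen URL",
    attacks=["banner_prompt_injection"],
)
```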


Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems

Authors:Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu

Hybrid storage systems (HSS) combine multiple storage devices with diverse characteristics to achieve high performance and capacity at low cost. The performance of an HSS highly depends on the effectiveness of two key policies: (1) the data-placement policy, which determines the best-fit storage device for incoming data, and (2) the data-migration policy, which rearranges stored data across the devices to sustain high HSS performance. Prior works focus on improving only data placement or only data migration in HSS, which leads to relatively low HSS performance. Unfortunately, no prior work tries to optimize both policies together. Our goal is to design a holistic data-management technique that optimizes both data-placement and data-migration policies to fully exploit the potential of an HSS, and thus significantly improve system performance. We demonstrate the need for multiple reinforcement learning (RL) agents to accomplish our goal. We propose Harmonia, a multi-agent RL-based data-management technique that employs two lightweight autonomous RL agents, a data-placement agent and a data-migration agent, which adapt their policies for the current workload and HSS configuration, and coordinate with each other to improve overall HSS performance. We evaluate Harmonia on a real HSS with up to four heterogeneous and diverse storage devices. Our evaluation using 17 data-intensive workloads on performance-optimized (cost-optimized) HSS with two storage devices shows that, on average, Harmonia outperforms the best-performing prior approach by 49.5% (31.7%). On an HSS with three (four) devices, Harmonia outperforms the best-performing prior work by 37.0% (42.0%). Harmonia’s performance benefits come with low latency (240ns for inference) and storage overheads (206 KiB in DRAM for both RL agents together). We will open-source Harmonia’s implementation to aid future research on HSS.


Paper and Project Links

PDF

Summary

Hybrid storage systems (HSS) combine multiple storage devices to achieve high performance and capacity at low cost. Prior work optimizes only the data-placement policy or only the data-migration policy, which leaves HSS performance relatively low. The goal here is a holistic data-management technique that optimizes both policies to fully exploit the potential of an HSS and substantially improve system performance. Using multi-agent reinforcement learning, the paper proposes Harmonia, which employs two lightweight autonomous RL agents — a data-placement agent and a data-migration agent — that adapt to the current workload and HSS configuration and coordinate with each other to improve overall performance. Evaluations on a real HSS with up to four heterogeneous storage devices show that, on a two-device HSS, Harmonia outperforms the best prior approach by 49.5% on average in the performance-optimized setting and by 31.7% in the cost-optimized setting; the advantage grows with more devices, reaching 37.0% and 42.0% on three- and four-device HSS respectively. Harmonia also has low latency (240 ns per inference) and low storage overhead (206 KiB of DRAM for both RL agents together). The implementation will be open-sourced to aid future HSS research.

Key Takeaways:

  1. Hybrid storage systems (HSS) combine multiple storage devices to raise performance and lower cost.
  2. Existing work does not optimize the data-placement and data-migration policies together, which limits system performance.
  3. Harmonia is a holistic data-management technique that uses two autonomous RL agents, one for data placement and one for data migration, which coordinate to improve overall performance (a toy sketch of this split follows the list).
  4. Evaluation on a real HSS with multiple devices shows that Harmonia significantly outperforms existing approaches.
  5. Harmonia has low latency and low storage overhead.
  6. Harmonia's implementation will be open-sourced to aid future research.
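
A toy sketch of the two-agent split in takeaway 3 is shown below: one agent chooses a device for each incoming write, the other periodically relocates already-stored blocks. Both agents are random stand-ins for the trained RL policies, and the states, rewards, and device models are deliberately simplified; this illustrates the structure only, not Harmonia's implementation.

```python
import random

# Toy sketch of Harmonia's two-agent structure: a placement agent picks a
# device for each incoming write, and a migration agent periodically moves
# already-stored blocks between devices. The "agents" are random stand-ins
# for the trained RL policies; states, rewards, and device behaviour are
# deliberately simplified and are not the paper's implementation.

DEVICES = ["fast_ssd", "large_hdd"]

class RandomAgent:
    """Placeholder for a trained RL policy: returns a device index."""
    def act(self, state):
        return random.randrange(len(DEVICES))

def run(num_requests=100, migrate_every=20):
    placement_agent, migration_agent = RandomAgent(), RandomAgent()
    placed = []                               # each block: [block_id, device_index, hotness]

    for block_id in range(num_requests):
        hotness = random.random()             # stand-in for access-frequency features
        device = placement_agent.act((hotness, len(placed)))
        placed.append([block_id, device, hotness])

        if block_id % migrate_every == 0:
            # Migration pass: the second agent may relocate stored blocks so
            # that hot data ends up on the fast device and cold data on the
            # large, cheap one.
            for block in placed:
                block[1] = migration_agent.act((block[2], block[1]))
    return placed

if __name__ == "__main__":
    print(run()[:5])
```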


Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Authors:Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami

Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model which generates structured, high-level plans to achieve user goals, and an Executor model that translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.


Paper and Project Links

PDF

Summary

Large language models (LLMs) have made remarkable progress on simple tasks but still struggle with complex, multi-step, long-horizon tasks. Recent work improves results by separating high-level planning from low-level execution, yet generating accurate plans remains difficult. This paper proposes Plan-and-Act, a framework that incorporates explicit planning into LLM-based agents and strengthens plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model that produces structured high-level plans and an Executor model that translates those plans into environment-specific actions. To train the Planner effectively, the synthetic data method annotates ground-truth trajectories with feasible plans and augments them with diverse, extensive examples to improve generalization. Evaluated on web navigation as a representative long-horizon planning environment, Plan-and-Act achieves a state-of-the-art success rate on the WebArena-Lite benchmark.

Key Takeaways

  1. Large language models (LLMs) struggle with complex, multi-step, long-horizon tasks.
  2. Separating high-level planning from low-level execution is an effective way to improve performance.
  3. Plan-and-Act incorporates explicit planning into LLM-based agents, with a Planner model that generates structured plans and an Executor model that translates them into environment-specific actions (a minimal sketch of this split follows the list).
  4. To train the Planner, a synthetic data generation method annotates ground-truth trajectories with feasible plans and augments them with diverse examples to improve generalization.
  5. Plan-and-Act performs strongly in long-horizon planning environments such as web navigation.
  6. On the WebArena-Lite benchmark, Plan-and-Act achieves a state-of-the-art 57.58% success rate.
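
A minimal sketch of the Planner/Executor split in takeaway 3 follows; `call_llm` is a hypothetical helper to be replaced by an actual LLM client, and the prompts and parsing are simplified for illustration rather than taken from the paper.

```python
# Minimal sketch of a Plan-and-Act style split: a Planner model produces a
# structured high-level plan, and an Executor model turns each plan step into
# an environment-specific action. `call_llm` is a hypothetical helper and the
# prompts/parsing are simplified; this is not the paper's released code.

def call_llm(model_name, prompt):
    raise NotImplementedError("plug in your own LLM client here")

def make_plan(goal, observation):
    prompt = (f"Goal: {goal}\nCurrent page: {observation}\n"
              "Write a numbered high-level plan, one step per line.")
    text = call_llm("planner-model", prompt)
    return [line.strip() for line in text.splitlines() if line.strip()]

def run_task(env, goal, max_steps=20):
    """`env` exposes reset() and step(action) returning (observation, done)."""
    obs = env.reset()
    for step in make_plan(goal, obs)[:max_steps]:
        prompt = (f"Goal: {goal}\nPlan step: {step}\nObservation: {obs}\n"
                  "Output one concrete browser action, e.g. CLICK(id) or TYPE(id, text).")
        action = call_llm("executor-model", prompt)    # low-level grounding
        obs, done = env.step(action)
        if done:
            break
    return obs
```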


AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection

Authors:Roohan Ahmed Khan, Valerii Serpiva, Demetros Aschalew, Aleksey Fedoseev, Dzmitry Tsetserukou

Autonomous drone navigation in dynamic environments remains a critical challenge, especially when dealing with unpredictable scenarios including fast-moving objects with rapidly changing goal positions. While traditional planners and classical optimisation methods have been extensively used to address this dynamic problem, they often face real-time, unpredictable changes that ultimately lead to sub-optimal performance in terms of adaptiveness and real-time decision making. In this work, we propose a novel motion planner, AgilePilot, based on Deep Reinforcement Learning (DRL) that is trained in dynamic conditions, coupled with real-time Computer Vision (CV) for object detections during flight. The training-to-deployment framework bridges the Sim2Real gap, leveraging sophisticated reward structures that promote both safety and agility depending upon environment conditions. The system can rapidly adapt to changing environments, while achieving a maximum speed of 3.0 m/s in real-world scenarios. In comparison, our approach outperforms classical algorithms such as the Artificial Potential Field (APF) based motion planner by 3 times, both in performance and in tracking accuracy of dynamic targets by using velocity predictions, while exhibiting a 90% success rate in 75 conducted experiments. This work highlights the effectiveness of DRL in tackling real-time dynamic navigation challenges, offering intelligent safety and agility.


Paper and Project Links

PDF Manuscript has been accepted at 2025 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS)

Summary

The paper presents AgilePilot, a Deep Reinforcement Learning (DRL) based motion planner for drones that, aided by real-time computer vision (CV) object detection, handles navigation in dynamic environments. With a carefully designed reward structure, AgilePilot adapts quickly to different environments while balancing safety and agility. Compared with classical algorithms such as Artificial Potential Field (APF) planners, AgilePilot performs better in both planning performance and tracking accuracy of dynamic targets thanks to its velocity predictions, reaching speeds of up to 3.0 m/s in real-world scenarios and a 90% success rate across 75 experiments. The work highlights the effectiveness of DRL for real-time dynamic navigation.

Key Takeaways

  1. The study proposes AgilePilot, a new DRL-based motion planner for drone navigation in dynamic environments.
  2. AgilePilot incorporates real-time computer vision (CV) object detection to cope with changes in dynamic environments (a schematic control loop follows this list).
  3. A carefully designed reward structure lets AgilePilot adapt quickly to different environments while balancing safety and agility.
  4. Compared with classical algorithms, AgilePilot achieves better performance and higher tracking accuracy for dynamic targets.
  5. AgilePilot uses velocity predictions and reaches a top speed of 3.0 m/s in real-world scenarios.
  6. AgilePilot achieved a 90% success rate across 75 experiments.
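
A schematic perception-to-control loop consistent with the description above is sketched below; the detector, policy, and drone interfaces are hypothetical placeholders, not AgilePilot's implementation.

```python
import time

# Schematic perception-to-control loop for a DRL motion planner that consumes
# object detections and velocity predictions. The detector, policy, and drone
# interfaces are hypothetical placeholders, not AgilePilot's implementation.

def estimate_velocity(prev_pos, curr_pos, dt):
    """Finite-difference velocity estimate from two detected positions."""
    return [(c - p) / dt for p, c in zip(prev_pos, curr_pos)]

def control_loop(detector, policy, drone, goal, rate_hz=30.0):
    dt = 1.0 / rate_hz
    prev_positions = {}
    while not drone.reached(goal):
        detections = detector(drone.camera_frame())    # {object_id: (x, y, z)}

        # Build the observation: obstacle positions plus predicted velocities.
        obstacles = []
        for obj_id, pos in detections.items():
            vel = estimate_velocity(prev_positions.get(obj_id, pos), pos, dt)
            obstacles.append((pos, vel))
        prev_positions = detections

        command = policy.act((drone.state(), goal, obstacles))  # e.g. a velocity setpoint
        drone.send_velocity(command)
        time.sleep(dt)
```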


AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Authors:Xiaogeng Liu, Peiran Li, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao

In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. As a result, AutoDAN-Turbo can significantly outperform baseline methods, achieving a 74.3% higher average attack success rate on public benchmarks. Notably, AutoDAN-Turbo achieves an 88.5 attack success rate on GPT-4-1106-turbo. In addition, AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner. By integrating human-designed strategies, AutoDAN-Turbo can even achieve a higher attack success rate of 93.4 on GPT-4-1106-turbo.


Paper and Project Links

PDF ICLR 2025 Spotlight. Project Page: https://autodans.github.io/AutoDAN-Turbo Code: https://github.com/SaFoLab-WISC/AutoDAN-Turbo

Summary

The paper proposes AutoDAN-Turbo, a black-box jailbreak method that automatically discovers as many jailbreak strategies as possible from scratch, without human intervention or predefined scopes, and uses them for red-teaming. Compared with baseline methods, AutoDAN-Turbo raises the average attack success rate on public benchmarks by 74.3% and reaches an 88.5% attack success rate on GPT-4-1106-turbo. AutoDAN-Turbo is also a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner; with these strategies integrated, its attack success rate on GPT-4-1106-turbo rises further to 93.4%.

Key Takeaways

  1. AutoDAN-Turbo automatically discovers black-box jailbreak strategies without human intervention or predefined scopes.
  2. Compared with baseline methods, it raises the attack success rate on public benchmarks by 74.3%.
  3. It achieves an 88.5% attack success rate on GPT-4-1106-turbo.
  4. AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner.
  5. With human-designed strategies integrated, the attack success rate on GPT-4-1106-turbo rises further, to 93.4%.
  6. AutoDAN-Turbo is highly automated and significantly improves attack success rates.



Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!