发布日期: 2024-12-19

更新日期: 2024-12-20

文章字数: 16.2k

阅读时长: 66 分

阅读次数:

⚠️ 以下所有内容总结都来自于大语言模型的能力，如有错误，仅供参考，谨慎使用
🔴 请注意：千万不要用于严肃的学术场景，只能用于论文阅读前的初筛！
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ，还请您给我们一些鼓励！⭐️ HuggingFace免费体验

2024-12-19 更新

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Authors:Yifei Zhou, Qianlan Yang, Kaixiang Lin, Min Bai, Xiong Zhou, Yu-Xiong Wang, Sergey Levine, Erran Li

The vision of a broadly capable and goal-directed agent, such as an Internet-browsing agent in the digital world and a household humanoid in the physical world, has rapidly advanced, thanks to the generalization capability of foundation models. Such a generalist agent needs to have a large and diverse skill repertoire, such as finding directions between two travel locations and buying specific items from the Internet. If each skill needs to be specified manually through a fixed set of human-annotated instructions, the agent’s skill repertoire will necessarily be limited due to the quantity and diversity of human-annotated instructions. In this work, we address this challenge by proposing Proposer-Agent-Evaluator, an effective learning system that enables foundation model agents to autonomously discover and practice skills in the wild. At the heart of PAE is a context-aware task proposer that autonomously proposes tasks for the agent to practice with context information of the environment such as user demos or even just the name of the website itself for Internet-browsing agents. Then, the agent policy attempts those tasks with thoughts and actual grounded operations in the real world with resulting trajectories evaluated by an autonomous VLM-based success evaluator. The success evaluation serves as the reward signal for the agent to refine its policies through RL. We validate PAE on challenging vision-based web navigation, using both real-world and self-hosted websites from WebVoyager and WebArena.To the best of our knowledge, this work represents the first effective learning system to apply autonomous task proposal with RL for agents that generalizes real-world human-annotated benchmarks with SOTA performances. Our open-source checkpoints and code can be found in https://yanqval.github.io/PAE/

具有广泛能力和目标导向的代理（如数字世界中的互联网浏览代理和物理世界中的家庭人形机器人）的设想迅速得到推进，这要归功于基础模型的泛化能力。这种泛化代理需要拥有大量且多样化的技能库，例如找到两个旅行地点之间的路线以及从互联网上购买特定商品。如果每个技能都需要通过固定的一组人类标注指令来手动指定，那么由于人类标注指令的数量和多样性，代理的技能库必将受到限制。在这项工作中，我们通过提出Proposer-Agent-Evaluator（PAE）来解决这一挑战，这是一种有效的学习系统，使基础模型代理能够自主地发现和练习真实环境中的技能。PAE的核心是一个情境感知的任务提出者，它可以根据环境上下文信息自主地提出任务供代理练习，如用户演示或仅仅是互联网浏览代理的名称本身。然后，代理策略尝试这些任务，在现实世界中通过思考和实际操作来完成任务轨迹，并由自主VLM（基于视觉语言模型）的成功评估器对结果进行评估。成功评估作为奖励信号，用于通过强化学习使代理优化其策略。我们在基于视觉的网页导航等挑战任务上验证了PAE的有效性，这些任务包括使用WebVoyager和WebArena提供的现实世界和自我托管网站。据我们所知，这项工作代表着首次有效地将自主任务提案与强化学习应用于代理的有效学习系统，该代理能够概括现实世界的人类注释基准测试并达到最新性能水平。我们的开源检查点和代码可在https://yanqval.github.io/PAE/找到。

论文及项目相关链接

PDF

Summary

通用能力代理，如数字世界中的网络浏览代理和物理世界中的家庭人形机器人，其视野迅速扩展，得益于基础模型的泛化能力。此类通用代理需要拥有庞大且多样化的技能库，如确定两个旅行地点之间的方向以及从互联网上购买特定商品等。若每项技能都需要通过固定的人类注释指令手动指定，则由于人类注释指令的数量和多样性，代理的技能库必将受到限制。本研究通过提出Proposer-Agent-Evaluator（PAE）这一有效的学习系统来解决这一挑战，使基础模型代理能够自主地发现和练习环境中的技能。PAE的核心是一个情境感知任务提出者，可自动为代理提出练习任务，并利用环境上下文信息，如用户演示或仅是网站名称本身（对于网络浏览代理）。然后，代理策略尝试这些任务，并伴随着实际的世界操作轨迹，由自主VLM成功评估器进行评估。成功评估作为奖励信号，用于通过强化学习优化代理策略。我们在具有挑战性的视觉网络导航任务上验证了PAE的有效性，同时使用WebVoyager和WebArena的真实世界和自托管网站。据我们所知，这项工作首次实现了有效的学习系统，通过强化学习进行自主任务提议，适用于泛化人类注释基准测试的代理，并取得了SOTA性能。

Key Takeaways

通用代理需要拥有广泛且多样化的技能库以适应不同任务需求。
手动指定技能由于指令的数量和多样性而受到限制。
Proposer-Agent-Evaluator（PAE）系统可解决此挑战，使代理能够自主发现环境技能。
PAE的核心是情境感知任务提出者，可利用环境上下文信息为代理提出任务。
代理策略尝试任务并伴随实际操作轨迹，由自主VLM成功评估器进行评估。
成功评估作为强化学习的奖励信号，用于优化代理策略。

Cool Papers

点此查看论文截图

SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents

Authors:Sheng Yin, Xianghe Pang, Yuanzhuo Ding, Menglan Chen, Yutong Bi, Yichen Xiong, Wenhao Huang, Zhen Xiang, Jing Shao, Siheng Chen

With the integration of large language models (LLMs), embodied agents have strong capabilities to execute complicated instructions in natural language, paving a way for the potential deployment of embodied robots. However, a foreseeable issue is that those embodied agents can also flawlessly execute some hazardous tasks, potentially causing damages in real world. To study this issue, we present SafeAgentBench – a new benchmark for safety-aware task planning of embodied LLM agents. SafeAgentBench includes: (1) a new dataset with 750 tasks, covering 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 8 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives. Experimental results show that the best-performing baseline gets 69% success rate for safe tasks, but only 5% rejection rate for hazardous tasks, indicating significant safety risks. More details and codes are available at https://github.com/shengyin1224/SafeAgentBench.

随着大型语言模型（LLM）的集成，实体代理具备了执行自然语言中的复杂指令的强大能力，为实体机器人的潜在部署铺平了道路。然而，一个可预见的问题是，这些实体代理也可以完美地执行一些危险任务，可能在现实世界中造成损害。为了研究这个问题，我们推出了SafeAgentBench——一个新的针对实体LLM代理的安全意识任务规划的基准测试。SafeAgentBench包括：（1）包含750个任务的新数据集，涵盖10种潜在危险和3种任务类型；（2）SafeAgentEnv，一个通用的实体环境，带有低级控制器，支持多代理执行，具有针对8种最新技术的基线模型的17个高级操作；（3）从执行和语义角度的可靠评估方法。实验结果表明，表现最佳的基线模型在安全任务上的成功率为69%，但在危险任务上的拒绝率为仅5%，表明存在显著的安全风险。更多详细信息和代码请访问https://github.com/shengyin1224/SafeAgentBench。

论文及项目相关链接

PDF 21 pages, 14 tables, 7 figures, submitted to ICRA 2024

Summary

基于大型语言模型（LLM）的实体代理具有强大的自然语言执行能力，这为实体机器人的潜在部署铺平了道路。然而，实体代理也可能完美执行一些危险任务，从而在现实世界中造成潜在损害。为解决这一问题，我们推出了SafeAgentBench——一个针对实体LLM代理的安全意识任务规划的新基准测试。SafeAgentBench包括：一、包含750个任务的新数据集，涵盖10种潜在危害和三种任务类型；二、SafeAgentEnv通用实体环境，具有底层控制器，支持多代理执行，为八个最新基线提供十七种高级操作；三、从执行和语义角度提供可靠的评估方法。实验结果表明，最佳基线在安全任务上的成功率达到百分之六十九，但在危险任务上的拒绝率仅为百分之五，显示出显著的安全风险。更多详情和代码可通过链接访问：https://github.com/shengyin1224/SafeAgentBench。

Key Takeaways

大型语言模型（LLM）使实体代理能够执行复杂的自然语言指令。
实体代理在现实世界中有潜在的危险执行任务的能力。
SafeAgentBench是一个新的基准测试，用于评估实体LLM代理的安全意识任务规划。
SafeAgentBench包括一个新数据集，涵盖多种潜在危害和任务类型。
SafeAgentEnv是一个通用实体环境，支持多代理执行和高层次操作。
实验结果表明，现有方法在安全任务上的表现较好，但在危险任务上仍存在显著的安全风险。

Cool Papers

点此查看论文截图

Contract-based Design and Verification of Multi-Agent Systems with Quantitative Temporal Requirements

Authors:Rafael Dewes, Rayna Dimitrova

Quantitative requirements play an important role in the context of multi-agent systems, where there is often a trade-off between the tasks of individual agents and the constraints that the agents must jointly adhere to. We study multi-agent systems whose requirements are formally specified in the quantitative temporal logic LTL[$\mathcal{F}$] as a combination of local task specifications for the individual agents and a shared safety constraint, The intricate dependencies between the individual agents entailed by their local and shared objectives make the design of multi-agent systems error-prone, and their verification time-consuming. In this paper we address this problem by proposing a novel notion of quantitative assume-guarantee contracts, that enables the compositional design and verification of multi-agent systems with quantitative temporal specifications. The crux of these contracts lies in their ability to capture the coordination between the individual agents to achieve an optimal value of the overall specification under any possible behavior of the external environment. We show that the proposed framework improves the scalability and modularity of formal verification of multi-agent systems against quantitative temporal specifications.

在多智能体系统背景下，定量要求发挥着重要作用。个体智能体的任务与智能体必须共同遵守的约束之间往往存在权衡。我们研究的多智能体系统的要求以定量时序逻辑LTL[$\mathcal{F}$]的形式正式指定，这是个体智能体的局部任务规范与共享安全约束的结合。个体智能体之间由局部和共享目标产生的复杂依赖关系使得多智能体系统的设计容易出现错误，并且其验证过程非常耗时。针对这一问题，本文提出了一种新型的定量假设保证合同概念，它能实现对具有定量时序规范的多智能体系统的组合设计和验证。这些合同的核心在于它们能够捕捉个体智能体之间的协调，以实现总体规范的最优值，无论外部环境有任何可能的行为。我们证明了所提出的框架提高了针对定量时序规范的多智能体系统形式验证的可扩展性和模块化程度。

论文及项目相关链接

PDF Extended version of paper accepted at AAAI-25

Summary
多智能体系统的量化要求在协调个体任务与共享约束之间起着重要的权衡作用。本文研究了使用定量时序逻辑LTL[$\mathcal{F}$]规范的多智能体系统，包括个体代理局部任务规范和共享安全约束。代理间复杂的相互依赖关系使得多智能体系统的设计易出错且验证耗时。为解决这一问题，本文提出了基于定量假设保证合同的新概念，使具有定量时序规范的多智能体系统能够实现组合设计与验证。合同的核心在于捕捉个体代理之间的协调，以在任何可能的外界环境行为下实现总体规范的最优值。该框架提高了多智能体系统对定量时序规范的正式验证的可扩展性和模块化。

Key Takeaways

多智能体系统中存在量化要求的重要性，涉及个体任务和共享约束之间的权衡。
使用定量时序逻辑LTL[$\mathcal{F}$]来正式规范多智能体系统的要求。
多智能体系统设计中存在复杂的代理间相互依赖关系，导致设计和验证的困难。
引入定量假设保证合同概念以解决多智能体系统的设计和验证问题。
合同能捕捉个体代理间的协调，实现整体规范的最优值，无论外界环境如何变化。
提出的框架提高了多智能体系统对定量时序规范的正式验证的可扩展性和模块化。
该方法对于处理复杂的智能体系统具有重要的实用性和理论价值。

Cool Papers

点此查看论文截图

An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions

Authors:Shreeyash Gowaikar, Srinivasan Iyengar, Sameer Segal, Shivkumar Kalyanaraman

The Piping and Instrumentation Diagrams (P&IDs) are foundational to the design, construction, and operation of workflows in the engineering and process industries. However, their manual creation is often labor-intensive, error-prone, and lacks robust mechanisms for error detection and correction. While recent advancements in Generative AI, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), have demonstrated significant potential across various domains, their application in automating generation of engineering workflows remains underexplored. In this work, we introduce a novel copilot for automating the generation of P&IDs from natural language descriptions. Leveraging a multi-step agentic workflow, our copilot provides a structured and iterative approach to diagram creation directly from Natural Language prompts. We demonstrate the feasibility of the generation process by evaluating the soundness and completeness of the workflow, and show improved results compared to vanilla zero-shot and few-shot generation approaches.

管道与仪表图（P&IDs）是工程和工艺行业中工作流程设计、构建和运营的基础。然而，其手动创建往往劳动强度大、易出错，且缺乏稳健的误差检测和校正机制。虽然最近生成式人工智能的最新进展，特别是大型语言模型（LLM）和视觉语言模型（VLM），已在各个领域显示出巨大的潜力，但其在自动化工程工作流程生成方面的应用仍被探索不足。在这项工作中，我们引入了一种新型 copilot，用于根据自然语言描述自动生成 P&IDs。利用多步代理工作流程，我们的 copilot 提供了一种结构化、迭代的方法，可直接从自然语言提示创建图表。我们通过评估工作流程的健全性和完整性来证明生成过程的可行性，并展示了与零样本和少样本生成方法相比的改进结果。

论文及项目相关链接

PDF Accepted at the AAAI’25 Workshop on AI to Accelerate Science and Engineering (AI2ASE)

Summary

工程流程图和仪器流程图（P&IDs）是工程及流程设计、建设和操作的基础。手动创建P&IDs往往劳动强度大、易出错，缺乏可靠的错误检测和纠正机制。最近，生成式人工智能（尤其是大型语言模型和视觉语言模型）在各领域展现出巨大潜力，但其在自动化生成工程流程方面的应用仍被忽视。本研究引入了一种新型P&IDs自动生成助手，该助手可直接从自然语言描述中生成流程图。利用多步骤代理工作流程，该助手提供结构化、迭代式方法，用于创建直观语言提示下的图表。通过对生成流程的正确性和完整性进行评估，展示了其在提升性能方面的效果，相对于传统的零样本和少量样本生成方法有明显的改进。

Key Takeaways

P&IDs在工程和流程行业中的重要性。
手动创建P&IDs存在的劳动强度高、易出错等问题。
人工智能技术在自动化生成工程流程方面的潜力与应用现状。
新型自动生成助手能够直接从自然语言描述中生成P&IDs的功能特点。
利用多步骤代理工作流程的生成助手实现结构化、迭代式图表创建的方法。
对生成流程的正确性和完整性评估的重要性和方法。

Cool Papers

点此查看论文截图

Bayesian Persuasion with Externalities: Exploiting Agent Types

Authors:Jonathan Shaki, Jiarui Gan, Sarit Kraus

We study a Bayesian persuasion problem with externalities. In this model, a principal sends signals to inform multiple agents about the state of the world. Simultaneously, due to the existence of externalities in the agents’ utilities, the principal also acts as a correlation device to correlate the agents’ actions. We consider the setting where the agents are categorized into a small number of types. Agents of the same type share identical utility functions and are treated equitably in the utility functions of both other agents and the principal. We study the problem of computing optimal signaling strategies for the principal, under three different types of signaling channels: public, private, and semi-private. Our results include revelation-principle-style characterizations of optimal signaling strategies, linear programming formulations, and analysis of in/tractability of the optimization problems. It is demonstrated that when the maximum number of deviating agents is bounded by a constant, our LP-based formulations compute optimal signaling strategies in polynomial time. Otherwise, the problems are NP-hard.

我们研究了一个带有外部性的贝叶斯劝说问题。在这个模型中，一个主体发送信号来告知多个代理世界状态。同时，由于代理效用中存在外部性，主体还充当关联设备来关联代理的行动。我们考虑将代理人分为少数几种类型的情境。同一类型的代理人具有相同的效用函数，并在其他代理人和主体的效用函数中公平对待。我们研究了主体在三种不同类型的信号通道下计算最优信号策略的问题：公共通道、私人通道和半私人通道。我们的结果包括揭示性原理风格的最优信号策略特征、线性规划公式和优化问题的可解性分析。结果表明，当偏离代理人的最大数量被一个常数限制时，我们的基于LP的公式可以在多项式时间内计算最优信号策略。否则，这些问题都是NP难的。

论文及项目相关链接

PDF to be published in AAAI 2025

Summary
在一个带有外部性的贝叶斯说服问题中，主要研究者通过发送信号来告知多个代理世界状态。由于代理的效用存在外部性，主要研究者也作为一个关联设备来关联代理的行动。在代理被分类为少数类型的情况下，同一类型的代理具有相同的效用函数，并且在其他代理和主要研究者的效用函数中公平对待。本文主要研究了主要研究者计算最优信号策略的三大问题，包括公开、私人以及半私人三种信号渠道。研究结果表明，当偏离代理的最大数量被常数限制时，我们的LP基于的公式可以在多项式时间内计算最优信号策略。否则，问题是NP难的。

Key Takeaways

研究了贝叶斯说服问题中的外部性问题，主要研究者通过发送信号告知多个代理世界状态。
主要研究者作为关联设备，关联代理的行动。
在代理分类的情况下，同一类型代理具有相同效用函数。
研究了三种不同的信号渠道：公开、私人以及半私人。
揭示了最优信号策略的计算问题，包括线性规划公式和问题的可解性分析。
当偏离代理的最大数量有限时，可以在多项式时间内计算最优信号策略。

Cool Papers

点此查看论文截图

CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception

Authors:Senkang Hu, Yihang Tao, Guowen Xu, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

Collaborative Perception (CP) has shown a promising technique for autonomous driving, where multiple connected and autonomous vehicles (CAVs) share their perception information to enhance the overall perception performance and expand the perception range. However, in CP, ego CAV needs to receive messages from its collaborators, which makes it easy to be attacked by malicious agents. For example, a malicious agent can send harmful information to the ego CAV to mislead it. To address this critical issue, we propose a novel method, \textbf{CP-Guard}, a tailored defense mechanism for CP that can be deployed by each agent to accurately detect and eliminate malicious agents in its collaboration network. Our key idea is to enable CP to reach a consensus rather than a conflict against the ego CAV’s perception results. Based on this idea, we first develop a probability-agnostic sample consensus (PASAC) method to effectively sample a subset of the collaborators and verify the consensus without prior probabilities of malicious agents. Furthermore, we define a collaborative consistency loss (CCLoss) to capture the discrepancy between the ego CAV and its collaborators, which is used as a verification criterion for consensus. Finally, we conduct extensive experiments in collaborative bird’s eye view (BEV) tasks and our results demonstrate the effectiveness of our CP-Guard.

协同感知（CP）作为一种自动驾驶技术，展现出广阔的应用前景。在协同感知中，多个互联的自动驾驶车辆（CAVs）共享感知信息，以提高整体的感知性能和扩大感知范围。然而，在协同感知过程中，自我CAV需要从其合作伙伴那里接收信息，这使得它容易受到恶意代理的攻击。例如，恶意代理可能会向自我CAV发送有害信息以误导它。为了解决这一关键问题，我们提出了一种新方法——CP守卫者（CP-Guard），这是一种针对协同感知的定制防御机制，可以被每个代理部署，以准确检测和消除合作网络中的恶意代理。我们的核心思想是使协同感知能够达成共识，而不是与自我CAV的感知结果发生冲突。基于这一思想，我们首先开发了一种概率未知样本共识（PASAC）方法，以有效地对合作者子集进行采样，并在不知道恶意代理先验概率的情况下验证共识。此外，我们定义了一个协作一致性损失（CCLoss），以捕捉自我CAV与其合作伙伴之间的差异，并将其用作共识的验证标准。最后，我们在协同鸟瞰（BEV）任务中进行了大量实验，实验结果表明CP-Guard的有效性。

论文及项目相关链接

PDF Accepted by AAAI’25

Summary

协作感知（CP）已成为自动驾驶的一种有前途的技术，其中多个互联的自动驾驶汽车（CAVs）共享其感知信息以提高整体的感知性能和扩大感知范围。然而，在CP中，自我CAV需要接收来自合作车辆的信息，这使其容易受到恶意代理的攻击。针对此问题，我们提出了一种新方法CP-Guard，这是一种针对CP的定制防御机制，可以部署在每个代理上，以准确检测和消除合作网络中的恶意代理。我们的核心思想是使CP达成与自我CAV的感知结果一致的共识，而不是产生冲突。基于这一思想，我们开发了一种概率无关样本共识（PASAC）方法，可有效采样合作方的一部分并验证共识，而无需事先知道恶意代理的概率。此外，我们定义了一个协同一致性损失（CCLoss）来捕捉自我CAV与其合作方之间的差异，用作共识的验证标准。最终的实验结果表明CP-Guard在协同鸟瞰视图（BEV）任务中的有效性。

Key Takeaways

协作感知（CP）允许多个互联的自动驾驶汽车（CAVs）共享感知信息，提高感知性能和扩大感知范围。
CP中自我CAV在接收合作车辆信息时易受到恶意代理攻击。
提出的CP-Guard是一种防御机制，旨在准确检测和消除合作网络中的恶意代理。
CP-Guard的核心思想是使CP达成与自我CAV感知结果一致的共识。
开发了概率无关样本共识（PASAC）方法，有效采样合作方并验证共识。
定义了协同一致性损失（CCLoss）以捕捉自我CAV与合作者之间的差异，作为共识验证标准。

Cool Papers

点此查看论文截图

LLMs Can Simulate Standardized Patients via Agent Coevolution

Authors:Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haohao Ying

Training medical personnel using standardized patients (SPs) remains a complex challenge, requiring extensive domain expertise and role-specific practice. Most research on Large Language Model (LLM)-based simulated patients focuses on improving data retrieval accuracy or adjusting prompts through human feedback. However, this focus has overlooked the critical need for patient agents to learn a standardized presentation pattern that transforms data into human-like patient responses through unsupervised simulations. To address this gap, we propose EvoPatient, a novel simulated patient framework in which a patient agent and doctor agents simulate the diagnostic process through multi-turn dialogues, simultaneously gathering experience to improve the quality of both questions and answers, ultimately enabling human doctor training. Extensive experiments on various cases demonstrate that, by providing only overall SP requirements, our framework improves over existing reasoning methods by more than 10% in requirement alignment and better human preference, while achieving an optimal balance of resource consumption after evolving over 200 cases for 10 hours, with excellent generalizability. The code will be available at https://github.com/ZJUMAI/EvoPatient.

使用标准化病人（SPs）培训医务人员仍然是一个复杂的挑战，需要广泛的领域专业知识和特定角色的实践。大多数关于基于大型语言模型（LLM）的模拟病人的研究都集中在提高数据检索准确性或通过人工反馈调整提示。然而，这种关注忽视了病人代理需要学习标准化表现模式的关键需求，该模式通过无监督模拟将数据转化为人类般的病人反应。为了解决这一差距，我们提出了EvoPatient，这是一个新的模拟病人框架，其中病人代理和医生代理通过多轮对话模拟诊断过程，同时积累经验以提高问题和答案的质量，最终实现对人类医生的培训。在多种病例上的广泛实验表明，仅通过提供总体SP要求，我们的框架在要求对齐和更好的人类偏好方面比现有推理方法提高了10%以上。在超过200个病例进行10小时的进化后，实现了资源消耗的优化平衡，具有良好的通用性。代码将在https://github.com/ZJUMAI/EvoPatient上提供。

论文及项目相关链接

PDF Work in Progress

Summary

使用标准化病人（SPs）训练医疗人员是一项复杂的挑战，需要广泛的领域专业知识和角色特定实践。现有研究多关注提高基于大型语言模型（LLM）模拟患者的数据检索准确性或通过人类反馈调整提示，但忽略了患者代理需要学习标准化呈现模式的重要性。为解决这一不足，我们提出了EvoPatient框架，该框架通过患者代理和医生代理进行多轮对话模拟诊断过程，同时积累经验以提高问题和答案的质量，最终为医生训练提供支持。实验证明，我们的框架在仅提供总体SP要求的情况下，相较于现有推理方法，要求对齐性和人类偏好上有超过10%的提升，且在经过200个案例的10小时演化后实现了资源消耗的优化平衡，展现出良好的通用性。

Key Takeaways

训练医疗人员使用标准化病人（SPs）是一个需要广泛领域知识和特定角色实践的挑战。
当前研究主要关注提高基于大型语言模型（LLM）的模拟患者的数据检索准确性或调整提示。
现有研究忽视了患者代理需要学习标准化呈现模式的重要性。
EvoPatient框架通过患者代理和医生代理进行多轮对话模拟诊断过程。
EvoPatient框架提高了问题和答案的质量，为医生训练提供支持。
实验证明，EvoPatient框架在要求对齐性和人类偏好上相较于现有推理方法有显著提升。
EvoPatient框架在资源消耗方面实现了优化平衡，展现出良好的通用性。

Cool Papers

点此查看论文截图

Loosely Synchronized Rule-Based Planning for Multi-Agent Path Finding with Asynchronous Actions

Authors:Shuai Zhou, Shizhe Zhao, Zhongqiang Ren

Multi-Agent Path Finding (MAPF) seeks collision-free paths for multiple agents from their respective starting locations to their respective goal locations while minimizing path costs. Although many MAPF algorithms were developed and can handle up to thousands of agents, they usually rely on the assumption that each action of the agent takes a time unit, and the actions of all agents are synchronized in a sense that the actions of agents start at the same discrete time step, which may limit their use in practice. Only a few algorithms were developed to address asynchronous actions, and they all lie on one end of the spectrum, focusing on finding optimal solutions with limited scalability. This paper develops new planners that lie on the other end of the spectrum, trading off solution quality for scalability, by finding an unbounded sub-optimal solution for many agents. Our method leverages both search methods (LSS) in handling asynchronous actions and rule-based planning methods (PIBT) for MAPF. We analyze the properties of our method and test it against several baselines with up to 1000 agents in various maps. Given a runtime limit, our method can handle an order of magnitude more agents than the baselines with about 25% longer makespan.

多智能体路径寻找（MAPF）旨在为多个智能体从各自的起始位置到各自的目标位置寻找无碰撞路径，同时最小化路径成本。虽然已开发了许多MAPF算法，并能够处理高达数千个智能体，但它们通常建立在这样一个假设之上，即智能体的每个动作都需要一个时间单位，所有智能体的动作都在同一离散时间步开始，这在实践中可能限制了它们的使用。只有少数算法被开发出来处理异步动作，它们都集中在寻找最优解，但可扩展性有限。本文开发了新的规划器，它们位于光谱的另一端，通过为许多智能体找到一个无界的次优解来权衡解决方案的质量与可扩展性。我们的方法结合了搜索方法（LSS）来处理异步动作和基于规则的规划方法（PIBT）来解决MAPF问题。我们分析了我们方法的属性，并在各种地图上与几个基准点进行了测试，涉及高达1000个智能体。在给定运行时间限制的情况下，我们的方法能够处理的智能体数量比基准方法多出一个数量级，并且具有约25%的更长的完成时间。

论文及项目相关链接

PDF AAAI2025

Summary

该文介绍了多智能体路径规划（MAPF）问题，即在考虑路径成本最小化的前提下，为多个智能体规划碰撞自由的路径。虽然已开发许多MAPF算法，能够处理多达千个智能体的问题，但它们通常假设智能体的每个动作都需要一个时间单位，且所有智能体的动作同步开始，这限制了其在实践中的应用。针对异步动作的问题，本文开发了新的规划方法，通过搜索方法和基于规则的方法相结合来解决MAPF问题。该方法在牺牲一定解质量的前提下提高了可扩展性，能够在各种地图上处理多达千个智能体的问题。在运行时限内，相较于基线方法，本文提出的方法可以处理的智能体数量多出一个数量级，且平均最长完成时间仅增加约25%。

Key Takeaways

多智能体路径规划（MAPF）的目标是规划多个智能体的路径，确保碰撞自由并最小化路径成本。
现有MAPF算法大多假设智能体的动作同步开始，限制了其实际应用。
异步动作处理的异步性是解决此问题的关键。本文通过结合搜索方法和基于规则的方法来处理异步动作和MAPF问题。
在处理大规模问题时，牺牲部分解质量可以提高算法的可扩展性。
本文开发的新方法能够在各种地图上处理多达千个智能体的问题。与基线方法相比，其在运行时限内可以处理的智能体数量多出一个数量级。
本文方法的平均最长完成时间相较于基线方法仅增加约25%。

Cool Papers

点此查看论文截图

Embodied CoT Distillation From LLM To Off-the-shelf Agents

Authors:Wonje Choi, Woo Kyung Kim, Minjong Yoo, Honguk Woo

We address the challenge of utilizing large language models (LLMs) for complex embodied tasks, in the environment where decision-making systems operate timely on capacity-limited, off-the-shelf devices. We present DeDer, a framework for decomposing and distilling the embodied reasoning capabilities from LLMs to efficient, small language model (sLM)-based policies. In DeDer, the decision-making process of LLM-based strategies is restructured into a hierarchy with a reasoning-policy and planning-policy. The reasoning-policy is distilled from the data that is generated through the embodied in-context learning and self-verification of an LLM, so it can produce effective rationales. The planning-policy, guided by the rationales, can render optimized plans efficiently. In turn, DeDer allows for adopting sLMs for both policies, deployed on off-the-shelf devices. Furthermore, to enhance the quality of intermediate rationales, specific to embodied tasks, we devise the embodied knowledge graph, and to generate multiple rationales timely through a single inference, we also use the contrastively prompted attention model. Our experiments with the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DeDer.

我们应对在决策制定系统及时在容量有限、现成的设备上运行环境下，利用大型语言模型（LLM）执行复杂实体任务所面临的挑战。我们提出了DeDer框架，该框架可从LLM中分解和提炼实体推理能力，以构建基于高效小型语言模型（sLM）的策略。在DeDer中，基于LLM的策略的决策制定过程被重构为一个层次结构，包括推理策略和规划策略。推理策略是从通过LLM的实体上下文学习和自我验证生成的数据中提炼出来的，因此能够产生有效的推理。规划策略在推理的指导下，能够高效地制定优化计划。反过来，DeDer允许对两项政策都采用sLM，并部署在现成的设备上。此外，为了提高与实体任务相关的中间推理的质量，我们设计了实体知识图谱，并使用对比提示注意力模型来通过单次推断及时生成多个推理。我们在ALFRED基准测试上的实验表明，DeDer超越了领先的语言规划和蒸馏方法，证明了通过DeDer衍生出的基于sLM的实体策略的应用性和效率。

论文及项目相关链接

PDF Accepted at ICML 2024

Summary

基于大型语言模型（LLM）解决复杂实体任务面临的挑战，特别是在决策系统需及时在容量有限、现成的设备上运行的环境中。提出DeDer框架，该框架可从LLM中分解和提炼实体推理能力，转化为基于小型语言模型（sLM）的策略。DeDer将LLM策略的决策过程重构为层次结构，包括推理策略和规划策略。推理策略通过实体上下文学习和LLM的自我验证数据生成进行提炼，以产生有效的理由。规划策略根据这些理由可以高效生成优化计划。因此，DeDer可采用sLM作为政策基础，部署在现成的设备上。为提高特定于实体任务的中间理由的质量，开发了实体知识图谱。此外，通过对比提示注意力模型，可实现单次推断生成多个理由。在ALFRED基准测试上的实验表明，DeDer超越了领先的语言规划和蒸馏方法，证明了通过DeDer衍生的sLM实体政策的适用性和效率。

Key Takeaways

DeDer框架解决了利用大型语言模型（LLM）进行复杂实体任务的挑战。
DeDer可以将LLM的决策过程重构为包含推理策略和规划策略的层次结构。
推理策略通过实体上下文学习和自我验证数据生成进行提炼。
规划策略基于推理策略产生的理由来生成优化计划。
DeDer允许使用小型语言模型（sLM）作为策略基础，适用于部署在资源有限的设备上。
开发实体知识图谱提高特定任务的中间理由质量。
对比提示注意力模型可单次推断生成多个理由，提高效率。

Cool Papers

点此查看论文截图

Large Language Model-Brained GUI Agents: A Survey

Authors:Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This emerging field is rapidly advancing, with significant progress in both research and industry. To provide a structured understanding of this trend, this paper presents a comprehensive survey of LLM-brained GUI agents, exploring their historical evolution, core components, and advanced techniques. We address research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of large action models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. Additionally, we examine emerging applications powered by these agents. Through a detailed analysis, this survey identifies key research gaps and outlines a roadmap for future advancements in the field. By consolidating foundational knowledge and state-of-the-art developments, this work aims to guide both researchers and practitioners in overcoming challenges and unlocking the full potential of LLM-brained GUI agents.

图形用户界面（GUIs）长期以来在人机交互中占据核心地位，为用户提供一种直观且视觉驱动的方式来访问和与数字系统进行交互。大语言模型（LLMs）的出现，特别是多模态模型，已经开启了GUI自动化的新时代。它们在自然语言理解、代码生成和视觉处理方面表现出卓越的能力。这为新一代基于LLM的GUI代理铺平了道路，这些代理能够解释复杂的GUI元素并基于自然语言指令自主执行操作。这些代理代表了范式转变，使用户能够通过简单的对话命令执行复杂的多步骤任务。它们的应用程序跨越网页导航、移动应用交互和桌面自动化，提供变革性的用户体验，彻底改变个人与软件的交互方式。这个新兴领域正在迅速发展，在研究和工业方面都取得了重大进展。为了对这一趋势进行结构化理解，本文全面概述了基于LLM的GUI代理，探索了其历史演变、核心组件和先进技术。我们解答了诸如现有GUI代理框架、收集和利用数据来训练专业GUI代理、针对GUI任务开发的大型动作模型的开发、以及评估其有效性的评估指标和基准测试等研究问题。此外，我们还介绍了这些代理推动的新兴应用。通过详细分析，这篇综述确定了关键的研究空白，并为该领域的未来进步制定了路线图。通过整合基础知识和最新发展，这项工作旨在指导研究者和实践者克服挑战，发挥基于LLM的GUI代理的全部潜力。

论文及项目相关链接

PDF The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey Additionally, a searchable webpage is available at https://aka.ms/gui-agent for easier access and exploration

Summary

随着GUI在人类计算机交互中的核心地位日益凸显，多模态模型等LLM技术的出现为GUI自动化开启了新纪元。LLM技术在自然语言理解、代码生成和视觉处理方面的卓越能力，为新一代LLM驱动的GUI代理提供了可能。这些代理能够解释复杂的GUI元素，并根据自然语言指令自主执行操作。它们代表了用户体验的范式转变，通过简单的对话命令完成复杂的多步骤任务。本论文提供了关于LLM驱动的GUI代理的全面了解，包括历史演变、核心组件和高级技术。此外，论文还探讨了现有的GUI代理框架、数据收集和利用、针对GUI任务的行动模型开发等议题。

Key Takeaways

LLMs推动了GUI代理的进化：新一代代理集成了多模态技术，实现了基于自然语言指令的复杂任务自动化执行。
GUI代理代表了用户体验的范式转变：用户可通过简单的对话命令完成复杂的多步骤任务。
LLM驱动的GUI代理应用范围广泛：包括网页导航、移动应用交互和桌面自动化等。
论文全面概述了LLM驱动的GUI代理的发展：包括历史演变、核心组件和高级技术。
论文探讨了现有GUI代理框架和数据收集与利用的问题，并讨论了针对GUI任务的行动模型开发的重要性。
论文强调了评估这些代理的有效性的必要性和相关指标：指出了需要怎样的评估指标和基准测试来评估这些代理的效果。

Cool Papers

点此查看论文截图

A More Advanced Group Polarization Measurement Approach Based on LLM-Based Agents and Graphs

Authors:Zixin Liu, Ji Zhang, Yiran Ding

Group polarization is an important research direction in social media content analysis, attracting many researchers to explore this field. Therefore, how to effectively measure group polarization has become a critical topic. Measuring group polarization on social media presents several challenges that have not yet been addressed by existing solutions. First, social media group polarization measurement involves processing vast amounts of text, which poses a significant challenge for information extraction. Second, social media texts often contain hard-to-understand content, including sarcasm, memes, and internet slang. Additionally, group polarization research focuses on holistic analysis, while texts is typically fragmented. To address these challenges, we designed a solution based on a multi-agent system and used a graph-structured Community Sentiment Network (CSN) to represent polarization states. Furthermore, we developed a metric called Community Opposition Index (COI) based on the CSN to quantify polarization. Finally, we tested our multi-agent system through a zero-shot stance detection task and achieved outstanding results. In summary, the proposed approach has significant value in terms of usability, accuracy, and interpretability.

群体极化是社会媒体内容分析的重要研究方向，吸引了众多研究者探索该领域。因此，如何有效地衡量群体极化成为了一个关键话题。在社交媒体上衡量群体极化面临着一些挑战，而现有解决方案尚未解决这些挑战。首先，社交媒体群体极化的衡量需要处理大量的文本，这对信息提取提出了巨大的挑战。其次，社交媒体文本往往包含难以理解的内容，包括讽刺、表情包和网络俚语。此外，群体极化研究侧重于整体分析，而文本通常是零碎的。为了应对这些挑战，我们设计了一种基于多智能体的系统解决方案，并使用图结构的社区情感网络（CSN）来表示极化状态。此外，我们基于CSN开发了一个名为社区反对指数（COI）的指标来量化极化程度。最后，我们通过零起点立场检测任务测试了我们的多智能体系统，并获得了出色的结果。总之，所提出的方法在可用性、准确性和可解释性方面具有重要的价值。

论文及项目相关链接

PDF

Summary
社交媒体群体极化现象的测量是社交媒体内容分析的重要研究方向。存在文本处理量大、文本内容难以理解以及文本碎片化等问题。我们提出了基于多Agent系统的解决方案，采用社区情感网络（CSN）表示群体极化状态，并基于CSN构建社区反对指数（COI）来量化群体极化程度。该方案具有易用性、准确性和可解释性等优点。

Key Takeaways

社交媒体群体极化现象测量是社交媒体内容分析的重要方向。
社交媒体群体极化测量面临文本处理量大、内容难以理解和文本碎片化等挑战。
提出了基于多Agent系统的解决方案来处理社交媒体群体极化问题。
采用社区情感网络（CSN）表示群体极化状态。
构建社区反对指数（COI）来量化群体极化程度。
通过零样本立场检测任务测试了多Agent系统方案。

Cool Papers

点此查看论文截图

Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent

Authors:Janghoon Ock, Tirtha Vinchurkar, Yayati Jadhav, Amir Barati Farimani

Adsorption energy is a key reactivity descriptor in catalysis, enabling efficient screening for optimal catalysts. However, determining adsorption energy typically requires evaluating numerous adsorbate-catalyst configurations. Current algorithmic approaches rely on exhaustive enumeration of adsorption sites and configurations, which makes the process computationally intensive and does not inherently guarantee the identification of the global minimum energy. In this work, we introduce Adsorb-Agent, a Large Language Model (LLM) agent designed to efficiently identify system-specific stable adsorption configurations corresponding to the global minimum adsorption energy. Adsorb-Agent leverages its built-in knowledge and emergent reasoning capabilities to strategically explore adsorption configurations likely to hold adsorption energy. By reducing the reliance on exhaustive sampling, it significantly decreases the number of initial configurations required while improving the accuracy of adsorption energy predictions. We evaluate Adsorb-Agent’s performance across twenty representative systems encompassing a range of complexities. The Adsorb-Agent successfully identifies comparable adsorption energies for 83.7% of the systems and achieves lower energies, closer to the actual global minimum, for 35% of the systems, while requiring significantly fewer initial configurations than conventional methods. Its capability is particularly evident in complex systems, where it identifies lower adsorption energies for 46.7% of systems involving intermetallic surfaces and 66.7% of systems with large adsorbate molecules. These results demonstrate the potential of Adsorb-Agent to accelerate catalyst discovery by reducing computational costs and improving the reliability of adsorption energy predictions.

吸附能是催化中的关键反应性描述符，能够实现高效筛选最佳催化剂。然而，确定吸附能通常需要评估大量的吸附物-催化剂构型。当前的算法方法依赖于吸附位点和构型的详尽列举，这使得过程计算密集，并且不保证能固有地识别全局最低能量。在这项工作中，我们引入了Adsorb-Agent，这是一个大型语言模型（LLM）代理，旨在有效地识别与全局最低吸附能相对应的特定系统稳定吸附构型。Adsorb-Agent利用其内置知识和新兴推理能力来有策略地探索可能含有吸附能的吸附构型。通过减少对详尽采样的依赖，它在减少所需初始构型数量的同时，提高了吸附能预测的准确性。我们评估了Adsorb-Agent在涵盖各种复杂性的二十个代表性系统上的性能。Adsorb-Agent成功地为83.7%的系统确定了相当的吸附能，并为35%的系统达到了更接近实际全局最小值更低的能量，同时所需的初始构型远少于传统方法。在复杂系统中，其能力尤为突出，为涉及金属间表面的46.7%的系统以及具有大吸附分子的系统的66.7%确定了较低的吸附能。这些结果证明了Adsorb-Agent在减少计算成本、提高吸附能预测可靠性的过程中加速催化剂发现的潜力。

论文及项目相关链接

PDF

Summary

本文介绍了一种名为Adsorb-Agent的大型语言模型代理，该代理能够高效确定系统特定的稳定吸附构型，对应全局最低吸附能。与传统的依赖详尽采样的方法相比，Adsorb-Agent通过利用其内置知识和新兴推理能力来策略性地探索可能的吸附构型，显著减少了所需的初始构型数量，同时提高了对吸附能预测的准确度。在二十个代表性系统的评估中，Adsorb-Agent成功识别出相当吸附能的系统比例较高，并且在复杂系统中表现尤为出色。这些结果表明，Adsorb-Agent在加速催化剂发现方面具有潜力，能够降低计算成本并提高吸附能预测的可靠性。

Key Takeaways

Adsorption能量是催化中的重要反应描述符，用于筛选最佳催化剂。
当前算法方法依赖于详尽枚举吸附位点和方法配置，计算量大且不一定能找到全局最低能量。
Adsorb-Agent是一种大型语言模型代理，能够高效确定系统特定的稳定吸附构型。
Adsorb-Agent利用内置知识和新兴推理能力，减少了对详尽采样的依赖，提高了预测吸附能的准确性。
在二十个代表性系统的评估中，Adsorb-Agent在识别吸附能量方面表现出色，特别是在复杂系统中。
Adsorb-Agent成功识别出相当吸附能的系统比例较高，并且在某些系统中实现了更接近实际全局最低的吸附能量。

Cool Papers

点此查看论文截图

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Authors:Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Mądry

We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. We establish human baselines for each competition using Kaggle’s publicly available leaderboards. We use open-source agent scaffolds to evaluate several frontier language models on our benchmark, finding that the best-performing setup–OpenAI’s o1-preview with AIDE scaffolding–achieves at least the level of a Kaggle bronze medal in 16.9% of competitions. In addition to our main results, we investigate various forms of resource scaling for AI agents and the impact of contamination from pre-training. We open-source our benchmark code (github.com/openai/mle-bench/) to facilitate future research in understanding the ML engineering capabilities of AI agents.

我们推出了MLE-bench，这是一个用于衡量AI代理在机器学习工程方面表现如何的基准测试。为此，我们从Kaggle中精心挑选了75场与机器学习工程相关的竞赛，创建了一系列具有挑战性的任务集，这些任务测试了现实世界中机器学习工程的技能，如训练模型、准备数据集和进行实验。我们使用Kaggle的公开排行榜为每个竞赛制定人类基准线。我们使用开源代理脚手架来评估我们的基准测试上的最前沿语言模型，发现表现最佳的组合是OpenAI的o1-preview与AIDE脚手架组合，在16.9%的竞赛中至少达到了Kaggle铜牌水平。除了我们的主要结果外，我们还研究了AI代理的各种资源扩展形式以及预训练中的污染影响。我们开源我们的基准测试代码（github.com/openai/mle-bench/），以促进未来对AI代理的机器学习工程能力的研究。

论文及项目相关链接

PDF 10 pages, 17 pages appendix. Equal contribution by first seven authors, authors randomized. Corrected footnote 4. Added citation

Summary

本文介绍了MLE-bench的诞生背景及其作用，它是一个用于衡量AI机器学习工程能力的基准测试。通过从Kaggle收集的75个机器学习竞赛任务，创建了多样化的挑战任务集，测试真实世界的机器学习工程技能，如模型训练、数据集准备和实验运行等。研究结果显示，使用OpenAI的o1-preview与AIDE脚手架的最佳配置在某些竞赛中达到甚至超越Kaggle铜牌选手的水平。此外，文章还探讨了AI代理的各种资源扩展形式和预训练污染的影响，并公开了基准测试代码以促进未来研究。

Key Takeaways

MLE-bench是用于衡量AI机器学习工程能力的基准测试，包含多样化的挑战任务集。
从Kaggle收集了75个与机器学习相关的竞赛任务。
通过公开可用的Kaggle排行榜为竞赛建立人类基准线。
利用开源代理脚手架对前沿语言模型进行评估。
OpenAI的o1-preview与AIDE脚手架组合表现最佳，在某些竞赛中达到或超越Kaggle铜牌选手水平。
研究探讨了AI代理的各种资源扩展形式的影响。

Cool Papers

点此查看论文截图

Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

Authors:Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig

Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. As common destroy heuristics are non-adaptive, any performance bottleneck caused by these heuristics cannot be overcome via adaptive heuristic selection alone, thus limiting the overall effectiveness of MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost improvements by at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.

多智能体路径寻找（MAPF）是一种在多智能体系统中实现可扩展路径优化的有前途的方法。基于局部搜索（LNS）的MAPF-LNS是当前最先进的方法，它通过破坏和修复解决方案中的选定路径来快速迭代优化初始解决方案。当前的MAPF-LNS变体通常使用自适应选择机制来选择多个破坏启发式方法。然而，为了确定有前景的破坏启发式方法，MAPF-LNS需要大量的探索时间。由于常见的破坏启发式方法是非自适应的，因此仅通过自适应启发式选择无法克服这些启发式方法造成的性能瓶颈，从而限制了MAPF-LNS在解决方案成本方面的总体有效性。在本文中，我们提出了基于自适应延迟的破坏与修复增强方法，辅以基于成功的自我学习（ADDRESS）作为MAPF-LNS的单破坏启发式变体。ADDRESS对延迟最高的K个智能体应用有限的Thompson采样，以选择种子智能体用于自适应LNS邻域生成。我们在MAPF基准测试集的多个地图中对ADDRESS进行了评估，与原始MAPF-LNS和其他最先进的方法相比，在大规模场景中实现了至少50%的成本改进，涉及多达一千个智能体。

论文及项目相关链接

PDF Accepted to AAAI 2025

Summary
本论文提出一种名为ADDRESS的单销毁启发式策略变体，用于多智能体路径优化。ADDRESS通过自适应延迟机制结合成功型自我学习来改进基于大型邻域搜索的MAPF算法（MAPF-LNS）。在多个地图上的评估显示，与原始MAPF-LNS和其他最新方法相比，ADDRESS在大规模场景下至少将成本降低了50%。

Key Takeaways

Cool Papers

点此查看论文截图

Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration

Authors:Zijian Wang, Bin Wang, Haifeng Jing, Huayu Li, Hongbo Dou

Recent years, multi-hop reasoning has been widely studied for knowledge graph (KG) reasoning due to its efficacy and interpretability. However, previous multi-hop reasoning approaches are subject to two primary shortcomings. First, agents struggle to learn effective and robust policies at the early phase due to sparse rewards. Second, these approaches often falter on specific datasets like sparse knowledge graphs, where agents are required to traverse lengthy reasoning paths. To address these problems, we propose a multi-hop reasoning model with dual agents based on hierarchical reinforcement learning (HRL), which is named FULORA. FULORA tackles the above reasoning challenges by eFficient GUidance-ExpLORAtion between dual agents. The high-level agent walks on the simplified knowledge graph to provide stage-wise hints for the low-level agent walking on the original knowledge graph. In this framework, the low-level agent optimizes a value function that balances two objectives: (1) maximizing return, and (2) integrating efficient guidance from the high-level agent. Experiments conducted on three real-word knowledge graph datasets demonstrate that FULORA outperforms RL-based baselines, especially in the case of long-distance reasoning.

近年来，多跳推理（Multi-hop Reasoning）在知识图谱（KG）推理中得到了广泛的研究，因为其有效性和可解释性。然而，之前的多跳推理方法存在两个主要缺点。首先，由于奖励稀疏，智能体在早期阶段很难学习有效且稳健的策略。其次，这些方法在特定数据集上表现不佳，例如稀疏知识图谱，需要智能体遍历长推理路径。为了解决这些问题，我们提出了一种基于分层强化学习（HRL）的双智能体多跳推理模型，命名为FULORA。FULORA通过双智能体之间的有效指导-探索来解决上述推理挑战。高级智能体在简化的知识图谱上行走，为在原始知识图谱上行走的低级智能体提供阶段性提示。在此框架中，低级智能体优化一个平衡两个目标的值函数：（1）最大化回报；（2）整合高级智能体的有效指导。在三个真实知识图谱数据集上进行的实验表明，FULORA优于基于RL的基线方法，尤其在长距离推理的情况下。

论文及项目相关链接

PDF Accepted by AAAI-25

Summary

多跳推理是知识图谱推理中广泛研究的主题，但其存在两个主要问题。为解决这些问题，我们提出了基于分层强化学习的双代理多跳推理模型FULORA。FULORA通过双代理之间的有效指导与探索，解决知识图谱中的推理挑战。高层次的代理在简化后的知识图谱上行走，为低层次的代理提供阶段性提示。低层次的代理在优化价值函数时，平衡了最大化回报和接受高层次代理的有效指导两个目标。在三个真实知识图谱数据集上的实验表明，FULORA优于强化学习基线，尤其在长距离推理方面表现更优秀。

Key Takeaways

多跳推理是知识图谱推理中的热门研究主题。
现有多跳推理方法面临两个主要挑战：早期阶段奖励稀疏和特定数据集（如稀疏知识图谱）上的长路径推理难题。
提出了一种基于分层强化学习的双代理多跳推理模型FULORA。
FULORA通过高效指导与探索解决推理挑战。
高层次代理提供阶段性提示，简化知识图谱的导航。
低层次代理在优化价值函数时，平衡了最大化回报和接受指导的需求。

Cool Papers

点此查看论文截图

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment

Authors:Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen

Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a monolith. Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We address this gap by adapting a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals across various problem areas. Misalignment scores in our framework depend on the observed agent population, the domain in question, and conflict between agents’ weighted preferences. Through simulations, we demonstrate how our model captures intuitive aspects of misalignment across different scenarios. We then apply our model to two case studies, including an autonomous vehicle setting, showcasing its practical utility. Our approach offers enhanced explanatory power for complex sociotechnical environments and could inform the design of more aligned AI systems in real-world applications.

关于对齐问题的现有研究主要集中在以下几个方面：（1）对齐问题的定性描述；（2）通过聚焦价值规范和机器学习尝试将AI行为与人的利益进行对齐；（3）以及关注单一实体或将人类视作一个单一整体的研究。最新的社会技术方法强调需要理解人类和AI实体之间的复杂错配问题。我们通过适应人类争论的计算社会科学模型来解决这一问题。我们的模型量化大型、多样化的实体群体在不同问题领域的潜在冲突目标中的错配问题。我们框架中的错配分数取决于观察到的实体群体、相关域以及实体加权偏好之间的冲突。通过模拟，我们展示了我们的模型如何在不同场景中捕捉到直观的错配方面。然后我们将模型应用于两个案例研究，包括自动驾驶车辆设置，展示其实用性。我们的方法为复杂的社技环境提供了增强的解释能力，并可以为现实应用中的更对齐AI系统的设计提供信息。

论文及项目相关链接

PDF 7 pages, 8 figures, 3 tables, forthcoming at the AAAI-25 Special Track on AI Alignment

Summary

本文着重解决人工智能与人类社会多主体间的复杂错位问题。现有研究主要关注错位问题的定性描述、通过价值特定与学习对齐AI行动与人类利益，以及单一主体的研究或人类整体的单一视角。近期社会技术方法强调理解人类与人工智能主体间的复杂错位。本文借鉴计算社会科学中人类争端的模型，对错位问题进行量化处理，尤其针对大型、多元且目标冲突的群体。错位评分取决于观察到的主体群体、特定领域以及主体间冲突。通过模拟，展示该模型在不同场景下捕捉错位问题的直观性。在自动驾驶汽车案例中，验证模型的实际应用效用。该模型对于复杂的社会技术环境更具解释力，为设计更贴合实际应用的AI系统提供依据。

Key Takeaways