
Agent


⚠️ All of the summaries below are generated by a large language model; they may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on them for serious academic work; they are only meant as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-03-11

A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval

Authors:Yu Zhang, Shutong Qiao, Jiaqi Zhang, Tzu-Heng Lin, Chen Gao, Yong Li

Information technology has profoundly altered the way humans interact with information. The vast amount of content created, shared, and disseminated online has made it increasingly difficult to access relevant information. Over the past two decades, search and recommendation systems (collectively referred to as information retrieval systems) have evolved significantly to address these challenges. Recent advances in large language models (LLMs) have demonstrated capabilities that surpass human performance in various language-related tasks and exhibit general understanding, reasoning, and decision-making abilities. This paper explores the transformative potential of large language model agents in enhancing search and recommendation systems. We discuss the motivations and roles of LLM agents, and establish a classification framework to elaborate on the existing research. We highlight the immense potential of LLM agents in addressing current challenges in search and recommendation, providing insights into future research directions. This paper is the first to systematically review and classify the research on LLM agents in these domains, offering a novel perspective on leveraging this advanced AI technology for information retrieval. To help understand the existing works, we list the existing papers on agent-based simulation with large language models at this link: https://github.com/tsinghua-fib-lab/LLM-Agent-for-Recommendation-and-Search.


Paper and project links

PDF

Summary
Information technology has profoundly changed how people interact with information, and the flood of online content makes relevant information increasingly hard to find. Search and recommendation systems (collectively, information retrieval systems) have evolved substantially over the past two decades to address this challenge. Recent large language models (LLMs) surpass human performance on many language-related tasks and exhibit general understanding, reasoning, and decision-making abilities. This paper examines the transformative potential of LLM agents for search and recommendation, discusses their motivations and roles, builds a classification framework for existing research, highlights their potential for addressing current challenges, and points out future research directions. It is the first systematic review and classification of LLM-agent research in these domains, offering a novel perspective on leveraging this advanced AI technology for information retrieval.

Key Takeaways

  1. Information technology has changed how humans interact with information, and information overload makes relevant content hard to find.
  2. Search and recommendation systems are the core components of information retrieval and have evolved substantially to cope with information overload.
  3. Large language models (LLMs) surpass human performance on many language tasks and exhibit understanding, reasoning, and decision-making abilities.
  4. LLM agents show great transformative potential for enhancing search and recommendation systems.
  5. The paper discusses the motivations and roles of LLM agents, including how they address challenges in information retrieval.
  6. Existing research is classified and systematically reviewed, the first such survey of LLM agents in search and recommendation.


GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation

Authors:Zhenxuan Zhang, Kinhei Lee, Weihang Deng, Huichi Zhou, Zihao Jin, Jiahao Huang, Zhifan Gao, Dominic C Marshall, Yingying Fang, Guang Yang

Automatic medical report generation supports clinical diagnosis, reduces the workload of radiologists, and holds the promise of improving diagnosis consistency. However, existing evaluation metrics primarily assess the accuracy of key medical information coverage in generated reports compared to human-written reports, while overlooking crucial details such as the location and certainty of reported abnormalities. These limitations hinder the comprehensive assessment of the reliability of generated reports and pose risks in their selection for clinical use. Therefore, we propose a Granular Explainable Multi-Agent Score (GEMA-Score) in this paper, which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs NER-F1 calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. Extensive experiments validate that GEMA-Score achieves the highest correlation with human expert evaluations on a public dataset, demonstrating its effectiveness in clinical scoring (Kendall coefficient = 0.70 for Rexval dataset and Kendall coefficient = 0.54 for RadEvalX dataset). The anonymous project demo is available at: https://github.com/Zhenxuan-Zhang/GEMA_score.
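
The Kendall coefficients reported above measure rank agreement between GEMA-Score and expert ratings. As a minimal sketch of that evaluation step only (not the authors' code), with the score lists below being hypothetical:

```python
# Minimal sketch: rank correlation between an automatic report score and expert ratings.
# The score values are invented; only the use of Kendall's tau mirrors the abstract.
from scipy.stats import kendalltau

gema_scores = [0.82, 0.55, 0.91, 0.40, 0.73]   # hypothetical GEMA-Score outputs per report
expert_scores = [4, 2, 5, 1, 4]                # hypothetical radiologist ratings per report

tau, p_value = kendalltau(gema_scores, expert_scores)
print(f"Kendall tau = {tau:.2f} (p = {p_value:.3f})")
```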


Paper and project links

PDF

Summary
This paper proposes GEMA-Score, an evaluation method for automatically generated medical reports that combines objective quantification with subjective assessment through an LLM-based multi-agent workflow, covering disease diagnosis, location, severity, and uncertainty. An LLM-based scoring agent further assesses completeness, readability, and clinical terminology while providing explanatory feedback. Experiments show that GEMA-Score achieves the highest correlation with expert evaluations, making it effective for clinical scoring.

Key Takeaways

  1. Automatic medical report generation can support clinical diagnosis, reduce radiologists' workload, and improve diagnostic consistency.
  2. Existing metrics mainly check the accuracy of key medical information in generated reports while overlooking details such as abnormality location and certainty.
  3. The proposed GEMA-Score combines objective quantification with subjective evaluation through a multi-agent workflow for comprehensive report assessment.
  4. GEMA-Score evaluates disease diagnosis, location, severity, and uncertainty.
  5. An LLM-based agent scores completeness, readability, and clinical terminology and provides explanatory feedback.
  6. Experiments show GEMA-Score correlates best with expert evaluations and can effectively assess report reliability.


FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data

Authors:Wenhao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Siheng Chen, Yanfeng Wang

Mobile agents have attracted tremendous research participation recently. Traditional approaches to mobile agent training rely on centralized data collection, leading to high cost and limited scalability. Distributed training utilizing federated learning offers an alternative by harnessing real-world user data, providing scalability and reducing costs. However, pivotal challenges, including the absence of standardized benchmarks, hinder progress in this field. To tackle the challenges, we introduce FedMABench, the first benchmark for federated training and evaluation of mobile agents, specifically designed for heterogeneous scenarios. FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments. Through extensive experiments, we uncover several key insights: federated algorithms consistently outperform local training; the distribution of specific apps plays a crucial role in heterogeneity; and, even apps from distinct categories can exhibit correlations during training. FedMABench is publicly available at: https://github.com/wwh0411/FedMABench with the datasets at: https://huggingface.co/datasets/wwh0411/FedMABench.
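
FedMABench compares federated algorithms against local training. As a rough sketch of the simplest federated baseline (a FedAvg-style weighted parameter average, not FedMABench's own implementation), under hypothetical client data:

```python
# FedAvg-style sketch: average client model parameters weighted by local dataset size.
# Illustrative only; FedMABench provides its own federated algorithms and datasets.
import numpy as np

def fed_avg(client_params, client_sizes):
    """client_params: list of {name: np.ndarray}; client_sizes: number of samples per client."""
    total = float(sum(client_sizes))
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }

# Hypothetical round with two clients holding different amounts of app data.
clients = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]
print(fed_avg(clients, client_sizes=[100, 300]))  # {'w': array([2.5, 3.5])}
```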


Paper and project links

PDF

Summary

Mobile agents have recently attracted substantial research interest. Traditional training relies on centralized data collection, which is costly and hard to scale; federated learning offers a distributed alternative that leverages real-world user data. The lack of a standardized benchmark has been a key obstacle, so the authors introduce FedMABench, the first benchmark for federated training and evaluation of mobile agents in heterogeneous scenarios. It covers 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents in diverse environments. Extensive experiments show that federated algorithms consistently outperform local training, that the distribution of specific apps strongly drives heterogeneity, and that even apps from different categories can be correlated during training. FedMABench is publicly available at https://github.com/wwh0411/FedMABench, with datasets at https://huggingface.co/datasets/wwh0411/FedMABench.

Key Takeaways

  1. Mobile agents have attracted considerable research attention, with a shift from centralized data collection toward distributed training.
  2. Federated learning offers a practical route to distributed training of mobile agents, leveraging real-world user data for better scalability at lower cost.
  3. The lack of standardized benchmarks is a major challenge for federated training of mobile agents.
  4. FedMABench is the first benchmark for federated training and evaluation of mobile agents in heterogeneous scenarios, covering multiple datasets, algorithms, and models.
  5. Federated algorithms generally outperform local training.
  6. The distribution of specific apps plays a crucial role in heterogeneous training environments.


SafeArena: Evaluating the Safety of Autonomous Web Agents

Authors:Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, Siva Reddy

LLM-based agents are becoming increasingly proficient at solving web-based tasks. With this capability comes a greater risk of misuse for malicious purposes, such as posting misinformation in an online forum or selling illicit substances on a website. To evaluate these risks, we propose SafeArena, the first benchmark to focus on the deliberate misuse of web agents. SafeArena comprises 250 safe and 250 harmful tasks across four websites. We classify the harmful tasks into five harm categories – misinformation, illegal activity, harassment, cybercrime, and social bias, designed to assess realistic misuses of web agents. We evaluate leading LLM-based web agents, including GPT-4o, Claude-3.5 Sonnet, Qwen-2-VL 72B, and Llama-3.2 90B, on our benchmark. To systematically assess their susceptibility to harmful tasks, we introduce the Agent Risk Assessment framework that categorizes agent behavior across four risk levels. We find agents are surprisingly compliant with malicious requests, with GPT-4o and Qwen-2 completing 34.7% and 27.3% of harmful requests, respectively. Our findings highlight the urgent need for safety alignment procedures for web agents. Our benchmark is available here: https://safearena.github.io
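
The headline numbers (e.g., GPT-4o completing 34.7% of harmful requests) are completion rates over the harmful task set. A minimal sketch of how such rates could be tallied per harm category, using invented evaluation records rather than SafeArena's actual harness:

```python
# Minimal sketch: per-category completion rate over harmful tasks.
# The records are invented placeholders; SafeArena ships its own evaluation pipeline.
from collections import defaultdict

records = [
    {"category": "misinformation", "completed": True},
    {"category": "misinformation", "completed": False},
    {"category": "cybercrime", "completed": True},
    {"category": "harassment", "completed": False},
]

totals, completed = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["category"]] += 1
    completed[r["category"]] += int(r["completed"])

for category in totals:
    print(f"{category}: {completed[category] / totals[category]:.1%} of harmful tasks completed")
```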


Paper and project links

PDF

Summary

LLM-based agents are becoming increasingly capable at web tasks, which raises the risk of malicious misuse such as posting misinformation on forums or selling illicit goods on websites. SafeArena is the first benchmark focused on deliberate misuse of web agents, with 250 safe and 250 harmful tasks across four websites; harmful tasks span five categories: misinformation, illegal activity, harassment, cybercrime, and social bias. Evaluations show that leading agents such as GPT-4o and Qwen-2 comply with a worrying share of malicious requests, underscoring the urgent need for safety alignment of web agents. The benchmark is available at https://safearena.github.io.

Key Takeaways

  1. The growing web-task competence of LLM-based agents increases the risk of deliberate misuse.
  2. SafeArena is the first benchmark targeting intentional misuse of web agents, with 250 safe and 250 harmful tasks across four websites.
  3. Harmful tasks span five categories: misinformation, illegal activity, harassment, cybercrime, and social bias.
  4. The Agent Risk Assessment framework categorizes agent behavior across four risk levels.
  5. GPT-4o and Qwen-2 completed 34.7% and 27.3% of harmful requests respectively, underscoring the need for safety alignment of web agents.


VQEL: Enabling Self-Developed Symbolic Language in Agents through Vector Quantization in Emergent Language Games

Authors:Mohammad Mahdi Samiei Paqaleh, Mahdieh Soleymani Baghshah

In the field of emergent language, efforts have traditionally focused on developing communication protocols through interactions between agents in referential games. However, the aspect of internal language learning, where language serves not only as a communicative tool with others but also as a means for individual thinking, self-reflection, and problem-solving remains underexplored. Developing a language through self-play, without another agent’s involvement, poses a unique challenge. It requires an agent to craft symbolic representations and train them using direct gradient methods. The challenge here is that if an agent attempts to learn symbolic representations through self-play using conventional modeling and techniques such as REINFORCE, the solution will offer no advantage over previous multi-agent approaches. We introduce VQEL, a novel method that incorporates Vector Quantization into the agents’ architecture, enabling them to autonomously invent and develop discrete symbolic representations in a self-play referential game. Following the self-play phase, agents can enhance their language through reinforcement learning and interactions with other agents in the mutual-play phase. Our experiments across various datasets demonstrate that VQEL not only outperforms the traditional REINFORCE method but also benefits from improved control and reduced susceptibility to collapse, thanks to the incorporation of vector quantization.
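
The core mechanism, vector quantization, snaps a continuous representation to its nearest entry in a learned codebook so the agent emits a discrete symbol; during training, a straight-through estimator passes gradients through the lookup. A minimal numpy sketch of the nearest-codebook step only, with the codebook size and dimensions chosen arbitrarily:

```python
# Minimal vector-quantization sketch: map a continuous encoding to the nearest
# codebook entry, whose index serves as the discrete "symbol". Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))      # 16 symbols with 8-dim embeddings (arbitrary sizes)

def quantize(z):
    """Return (symbol index, quantized vector) for a continuous encoding z."""
    distances = np.linalg.norm(codebook - z, axis=1)
    index = int(np.argmin(distances))
    return index, codebook[index]

z = rng.normal(size=8)                   # e.g., an encoder output for one message slot
symbol, z_q = quantize(z)
print("emitted symbol:", symbol)
# In training, a straight-through estimator would copy gradients from z_q back to z.
```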


Paper and project links

PDF

Summary

Work on emergent language has traditionally focused on communication protocols learned through agent interactions in referential games, while internal language learning, where language also serves individual thinking, self-reflection, and problem-solving, remains underexplored. Developing a language through self-play without a partner agent is challenging because conventional techniques such as REINFORCE offer no advantage over multi-agent approaches. VQEL incorporates vector quantization into the agent architecture so that agents can invent and refine discrete symbolic representations in a self-play referential game, then further improve the language through reinforcement learning in mutual play. Experiments across datasets show VQEL outperforms REINFORCE while offering better control and reduced susceptibility to collapse.

Key Takeaways

  • Emergent-language research has traditionally focused on communication protocols developed through agent interaction, leaving internal language learning underexplored.
  • Language is not only a communication tool but also a means of individual thinking, self-reflection, and problem-solving.
  • Developing a language through self-play is challenging: the agent must craft symbolic representations and train them with direct gradient methods.
  • VQEL incorporates vector quantization so that agents can autonomously invent and develop discrete symbolic representations in self-play.
  • VQEL outperforms the traditional REINFORCE method and benefits from better control and reduced susceptibility to collapse.


ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks

Authors:Heng Zhou, Hejia Geng, Xiangyuan Xue, Zhenfei Yin, Lei Bai

Multi-agent systems have emerged as a promising approach for enhancing the reasoning capabilities of large language models in complex problem-solving. However, current MAS frameworks are limited by poor flexibility and scalability, with underdeveloped optimization strategies. To address these challenges, we propose ReSo, which integrates task graph generation with a reward-driven two-stage agent selection process. The core of ReSo is the proposed Collaborative Reward Model, which can provide fine-grained reward signals for MAS cooperation for optimization. We also introduce an automated data synthesis framework for generating MAS benchmarks, without human annotations. Experimentally, ReSo matches or outperforms existing methods. ReSo achieves 33.7% and 32.3% accuracy on Math-MAS and SciBench-MAS, while other methods completely fail. Code is available at: https://github.com/hengzzzhou/ReSo
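
ReSo's selection step is reward-driven and two-stage. The sketch below illustrates only that general idea (a coarse filter over candidate agents followed by fine ranking with a reward estimate); the agent profiles and scores are hypothetical and do not reflect the Collaborative Reward Model itself:

```python
# Loose sketch of two-stage, reward-driven agent selection for one subtask:
# stage 1 filters candidates by a cheap compatibility check, stage 2 ranks the
# survivors by a finer reward estimate. All profiles and scores are invented.

agents = [
    {"name": "solver-math", "skills": {"math"}, "reward": 0.71},
    {"name": "solver-code", "skills": {"code"}, "reward": 0.65},
    {"name": "generalist", "skills": {"math", "code"}, "reward": 0.58},
]

def select_agent(required_skill, candidates):
    pool = [a for a in candidates if required_skill in a["skills"]]   # stage 1: coarse filter
    return max(pool, key=lambda a: a["reward"]) if pool else None     # stage 2: fine ranking

print(select_agent("math", agents)["name"])  # -> solver-math
```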


Paper and project links

PDF

Summary

Multi-agent systems are a promising way to boost LLM reasoning on complex problems, but current frameworks suffer from poor flexibility, scalability, and underdeveloped optimization strategies. ReSo addresses this by combining task-graph generation with a reward-driven two-stage agent selection process; its core is a Collaborative Reward Model that provides fine-grained reward signals for optimizing cooperation. An automated data-synthesis framework generates MAS benchmarks without human annotation. Experiments show ReSo reaches 33.7% and 32.3% accuracy on Math-MAS and SciBench-MAS, where other methods fail completely. Code has been released on GitHub.

Key Takeaways

  1. Multi-agent systems are used to enhance the reasoning capabilities of large language models.
  2. Current MAS frameworks suffer from limited flexibility and scalability.
  3. ReSo addresses these issues by combining task-graph generation with reward-driven two-stage agent selection.
  4. The core of ReSo is a Collaborative Reward Model that provides fine-grained reward signals to optimize MAS cooperation.
  5. An automated data-synthesis framework generates MAS benchmarks without human annotation.
  6. ReSo achieves strong accuracy on Math-MAS and SciBench-MAS.


M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

Authors:Ziyan Wang, Zhicheng Zhang, Fei Fang, Yali Du

Designing effective reward functions in multi-agent reinforcement learning (MARL) is a significant challenge, often leading to suboptimal or misaligned behaviors in complex, coordinated environments. We introduce Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality (M3HF), a novel framework that integrates multi-phase human feedback of mixed quality into the MARL training process. By involving humans with diverse expertise levels to provide iterative guidance, M3HF leverages both expert and non-expert feedback to continuously refine agents’ policies. During training, we strategically pause agent learning for human evaluation, parse feedback using large language models to assign it appropriately and update reward functions through predefined templates and adaptive weight by using weight decay and performance-based adjustments. Our approach enables the integration of nuanced human insights across various levels of quality, enhancing the interpretability and robustness of multi-agent cooperation. Empirical results in challenging environments demonstrate that M3HF significantly outperforms state-of-the-art methods, effectively addressing the complexities of reward design in MARL and enabling broader human participation in the training process.
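
The abstract describes updating reward functions through predefined templates and adaptive weights, combined with weight decay and performance-based adjustments. A minimal sketch of that weighting idea only, where the decay rate, adjustment size, and reward terms are all hypothetical:

```python
# Minimal sketch: combine reward terms derived from successive human-feedback phases
# with adaptive weights that decay over time and shift with recent team performance.
# The decay rate, step size, and reward terms are hypothetical placeholders.

def update_weights(weights, performance_delta, decay=0.95, step=0.1):
    """Decay older feedback weights and nudge them by recent performance change."""
    return [max(0.0, w * decay + step * performance_delta) for w in weights]

def shaped_reward(base_reward, feedback_terms, weights):
    """Environment reward plus weighted feedback-derived reward terms."""
    return base_reward + sum(w * t for w, t in zip(weights, feedback_terms))

weights = [1.0, 0.5]                                   # one weight per feedback phase so far
weights = update_weights(weights, performance_delta=0.2)
print(shaped_reward(base_reward=1.0, feedback_terms=[0.3, -0.1], weights=weights))
```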


Paper and project links

PDF Seventeen pages, four figures

Summary

Designing effective reward functions in multi-agent reinforcement learning (MARL) is difficult and often leads to suboptimal or misaligned behavior in complex, coordinated environments. M3HF is a novel framework that integrates multi-phase human feedback of mixed quality into MARL training: people with different levels of expertise provide iterative guidance, agent learning is strategically paused for human evaluation, feedback is parsed with large language models, and reward functions are updated through predefined templates and adaptive weights with weight decay and performance-based adjustments. This lets nuanced human insight of varying quality improve the interpretability and robustness of multi-agent cooperation. Experiments in challenging environments show M3HF significantly outperforms state-of-the-art methods, easing reward design in MARL and enabling broader human participation in training.

Key Takeaways

  • M3HF is a novel framework that integrates multi-phase human feedback into multi-agent reinforcement learning (MARL) training.
  • M3HF leverages both expert and non-expert feedback to continuously refine agents' policies and behaviors.
  • During training, M3HF pauses agent learning for human evaluation and parses the feedback with large language models.
  • M3HF updates reward functions through predefined templates and adaptive weights, using weight decay and performance-based adjustments.
  • M3HF integrates nuanced human insight across quality levels, improving the interpretability and robustness of multi-agent cooperation.
  • M3HF outperforms state-of-the-art methods in challenging environments.


Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning

Authors:Meng Feng, Viraj Parimi, Brian Williams

Safe navigation is essential for autonomous systems operating in hazardous environments. Traditional planning methods excel at long-horizon tasks but rely on a predefined graph with fixed distance metrics. In contrast, safe Reinforcement Learning (RL) can learn complex behaviors without relying on manual heuristics but fails to solve long-horizon tasks, particularly in goal-conditioned and multi-agent scenarios. In this paper, we introduce a novel method that integrates the strengths of both planning and safe RL. Our method leverages goal-conditioned RL and safe RL to learn a goal-conditioned policy for navigation while concurrently estimating cumulative distance and safety levels using learned value functions via an automated self-training algorithm. By constructing a graph with states from the replay buffer, our method prunes unsafe edges and generates a waypoint-based plan that the agent follows until reaching its goal, effectively balancing faster and safer routes over extended distances. Utilizing this unified high-level graph and a shared low-level goal-conditioned safe RL policy, we extend this approach to address the multi-agent safe navigation problem. In particular, we leverage Conflict-Based Search (CBS) to create waypoint-based plans for multiple agents allowing for their safe navigation over extended horizons. This integration enhances the scalability of goal-conditioned safe RL in multi-agent scenarios, enabling efficient coordination among agents. Extensive benchmarking against state-of-the-art baselines demonstrates the effectiveness of our method in achieving distance goals safely for multiple agents in complex and hazardous environments. Our code and further details about our work are available at https://safe-visual-mapf-mers.csail.mit.edu/.
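
The planner builds a graph over replay-buffer states, prunes edges whose learned safety estimate falls below a threshold, and extracts a waypoint plan over the remaining edges. A toy sketch of that prune-then-plan step using networkx, with all distance and safety values as placeholders:

```python
# Toy sketch: build a state graph, drop edges below a safety threshold, then plan
# waypoints by shortest path over learned distance estimates. Values are placeholders.
import networkx as nx

edges = [  # (state_a, state_b, learned_distance, learned_safety)
    ("start", "a", 1.0, 0.9),
    ("a", "goal", 1.0, 0.3),   # short but unsafe
    ("a", "b", 1.5, 0.95),
    ("b", "goal", 1.2, 0.9),
]

SAFETY_THRESHOLD = 0.5
G = nx.Graph()
for u, v, dist, safety in edges:
    if safety >= SAFETY_THRESHOLD:        # prune unsafe edges
        G.add_edge(u, v, weight=dist)

waypoints = nx.shortest_path(G, "start", "goal", weight="weight")
print(waypoints)  # ['start', 'a', 'b', 'goal']
```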


Paper and project links

PDF Due to the limitation “The abstract field cannot be longer than 1,920 characters”, the abstract here is shorter than that in the PDF file

Summary

This paper combines planning with safe reinforcement learning (RL) for safe navigation of autonomous systems in hazardous environments. The method learns a goal-conditioned safe RL policy while estimating cumulative distance and safety with learned value functions, builds a graph from replay-buffer states, prunes unsafe edges, and produces a waypoint-based plan that the agent follows to its goal, balancing faster and safer routes over long distances. Conflict-Based Search extends the approach to multi-agent safe navigation and improves the scalability of goal-conditioned safe RL. Benchmarks show the method safely reaches distance goals for multiple agents in complex, hazardous environments.

Key Takeaways

  • Traditional planning methods excel at long-horizon tasks but rely on a predefined graph with fixed distance metrics.
  • Safe reinforcement learning (RL) can learn complex behaviors without manual heuristics but struggles with long-horizon tasks, especially in goal-conditioned and multi-agent scenarios.
  • The proposed method combines the strengths of planning and safe RL: it builds a graph from replay-buffer states and generates waypoint-based plans that balance faster and safer routes.
  • Conflict-Based Search (CBS) extends the approach to multi-agent safe navigation, improving the scalability of goal-conditioned safe RL.


Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs

Authors:Ruichen Zhang, Mufan Qiu, Zhen Tan, Mohan Zhang, Vincent Lu, Jie Peng, Kaidi Xu, Leandro Z. Agudelo, Peter Qian, Tianlong Chen

Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task-performance, yielding a “symbiotic improvement” for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs-owing to their distinct reasoning capabilities-often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a Hybrid Mode for Privacy Preservation to address user privacy concerns. Evaluated on the WEBARENA benchmark, AgentSymbiotic achieves SOTA performance with both LLM types. Our best Large LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model demonstrates a competitive 49%, exceeding the prior best of 28%. Code will be released upon acceptance.


Paper and project links

PDF

Summary

Web-browsing agents powered by large language models show great promise for automating complex web tasks. Existing approaches use large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used for demonstration retrieval or to distill small LLMs (e.g., Llama3), but this process remains decoupled from exploration. AgentSymbiotic is an iterative framework that couples data synthesis with task performance, yielding a "symbiotic improvement" for both model sizes: large LLMs produce high-quality trajectories for distillation, while the distilled small LLMs, with their distinct reasoning, choose divergent actions that explore novel trajectories and enrich the synthesized data. Because small-LLM performance becomes the bottleneck, the authors introduce a speculative data-synthesis strategy that mitigates off-policy bias and a multi-task learning approach that strengthens the student model's reasoning, plus a Hybrid Mode for privacy preservation. On the WEBARENA benchmark, AgentSymbiotic sets state-of-the-art results with both LLM types: the best large-LLM agent reaches 52% (previous best 45%), and the 8B distilled model reaches a competitive 49% (previous best 28%).

Key Takeaways

  1. Large LLMs excel at generating high-quality trajectories for distillation.
  2. Distilled small LLMs, with their distinct reasoning abilities, choose actions that diverge from those of the larger models.
  3. A speculative data-synthesis strategy and multi-task learning are used to boost the small LLM's performance.
  4. A Hybrid Mode for Privacy Preservation addresses user privacy concerns.
  5. AgentSymbiotic achieves state-of-the-art results on the WEBARENA benchmark, reaching 52% with the large LLM agent and 49% with the 8B distilled model.
  6. Code will be released upon acceptance.
  7. The work offers a new perspective and solution for applying language models to automating complex web tasks.



Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!