⚠️ 以下所有内容总结都来自于 大语言模型的能力,如有错误,仅供参考,谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ,还请您给我们一些鼓励!⭐️ HuggingFace免费体验
2025-02-28 更新
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Authors:Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li
Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals which have shown strong potential in training LLMs. In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness signals from different aspects to provide reliable rewards. We empirically implement a reward agent, named RewardAgent, that combines human preference rewards with two verifiable signals: factuality and instruction following, to provide more reliable rewards. We conduct comprehensive experiments on existing reward model benchmarks and inference time best-of-n searches on real-world downstream tasks. RewardAgent significantly outperforms vanilla reward models, demonstrating its effectiveness. We further construct training preference pairs using RewardAgent and train an LLM with the DPO objective, achieving superior performance on various NLP benchmarks compared to conventional reward models. Our codes are publicly released to facilitate further research (https://github.com/THU-KEG/Agentic-Reward-Modeling).
奖励模型(RMs)对于大规模语言模型(LLMs)的训练和推理时间扩展至关重要。然而,现有的奖励模型主要关注人类偏好,忽视了可验证的正确性信号,这些信号在训练LLMs方面已显示出巨大潜力。在本文中,我们提出了代理奖励建模,这是一种结合奖励模型与从不同方面来的可验证正确性信号的奖励系统,以提供可靠的奖励。我们通过实证实现了一个名为RewardAgent的奖励代理,它将人类偏好奖励与两个可验证的信号(真实性和指令遵循性)相结合,以提供更可靠的奖励。我们在现有的奖励模型基准测试和现实世界下游任务的推理时间best-of-n搜索上进行了全面的实验。RewardAgent显著优于普通奖励模型,证明了其有效性。我们进一步使用RewardAgent构建训练偏好对,并使用DPO目标训练LLM,在各种NLP基准测试上实现了相较于传统奖励模型的卓越性能。我们的代码已公开发布,以便进行进一步的研究(https://github.com/THU-KEG/Agentic-Reward-Modeling)。
论文及项目相关链接
PDF 16 pages, 5 figures
Summary
奖励模型(RMs)对于大型语言模型(LLMs)的训练和推理时间扩展至关重要。然而,现有奖励模型主要关注人类偏好,忽略了可验证的正确性信号在训练LLMs方面的强大潜力。本文提出一种结合奖励模型与从不同方面获取的可验证正确性信号的奖励系统,称为“Agentic奖励建模”。我们实证实现了一种名为RewardAgent的奖励代理,它结合了人类偏好奖励与两个可验证的信号:真实性和指令遵循性,以提供更可靠的奖励。在现有的奖励模型基准测试和现实世界下游任务的推理时间上进行全面实验。RewardAgent显著优于普通奖励模型,证明了其有效性。我们使用RewardAgent构建训练偏好对,并使用DPO目标训练LLM,在多种NLP基准测试中实现优于传统奖励模型的表现。
Key Takeaways
- 奖励模型(RMs)在大型语言模型(LLMs)的训练和推理时间扩展中起关键作用。
- 现有奖励模型主要关注人类偏好,忽略了可验证的正确性信号。
- 提出了“Agentic奖励建模”方法,结合奖励模型与可验证的正确性信号。
- RewardAgent结合人类偏好奖励与两个可验证的信号:真实性和指令遵循性。
- RewardAgent在现有奖励模型基准测试中表现优异。
- 使用RewardAgent构建的训练偏好对可以提升LLM的性能。
- 公开了代码以促进进一步研究。
点此查看论文截图




Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Authors:Pierre Peigne-Lefebvre, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schroeder de Witt, Esben Kran
As AI agents are increasingly adopted to collaborate on complex objectives, ensuring the security of autonomous multi-agent systems becomes crucial. We develop simulations of agents collaborating on shared objectives to study these security risks and security trade-offs. We focus on scenarios where an attacker compromises one agent, using it to steer the entire system toward misaligned outcomes by corrupting other agents. In this context, we observe infectious malicious prompts - the multi-hop spreading of malicious instructions. To mitigate this risk, we evaluated several strategies: two “vaccination” approaches that insert false memories of safely handling malicious input into the agents’ memory stream, and two versions of a generic safety instruction strategy. While these defenses reduce the spread and fulfillment of malicious instructions in our experiments, they tend to decrease collaboration capability in the agent network. Our findings illustrate potential trade-off between security and collaborative efficiency in multi-agent systems, providing insights for designing more secure yet effective AI collaborations.
随着人工智能代理越来越多地被用于共同应对复杂的任务目标,确保自主多智能体系统的安全变得至关重要。我们开发代理模拟来应对共享目标,以研究这些安全风险和权衡。我们关注的场景是攻击者可能会利用某个代理来控制整个系统的输出方向,攻击其他代理使结果无法正确对齐目标。在这种情况下,我们发现具有恶意诱导信息的传播极为快速和广泛。为了缓解这种风险,我们评估了多种策略:两种“疫苗”方法通过在代理内存流中插入关于如何安全处理恶意输入的虚假记忆来发挥作用,还有两种安全指令策略的版本。尽管这些防御措施减少了恶意指令在我们的实验中的传播和完成次数,但它们往往降低了代理网络中的协作能力。我们的研究为多智能体系统中安全与协作效率之间的潜在权衡提供了见解,为设计更加安全且有效的AI协作提供了启示。
论文及项目相关链接
PDF Accepted to AAAI 2025 Conference
Summary:随着人工智能代理被越来越广泛地应用于复杂的协同目标中,确保自主多智能体系统的安全性变得至关重要。该研究通过模拟代理之间的协同合作来研究这些安全风险及安全权衡问题。研究重点是一个攻击者如何通过腐败一个或多个代理来操控整个系统并引导其偏离预期目标的情况。研究发现恶意指令会在代理之间多跳传播,即感染性恶意提示现象。为了降低这种风险,研究评估了几种策略,包括两种“疫苗接种”方法,即在代理的记忆流中插入关于安全处理恶意输入的虚假记忆,以及两种安全指令策略的版本。这些防御措施虽然减少了恶意指令的传播和履行,但往往会降低代理网络中的协作能力。本研究揭示了多智能体系统中安全性和协作效率之间的潜在权衡,为设计更安全的AI协同合作提供了启示。
Key Takeaways:
- AI代理在协同工作中面临安全风险,尤其是当攻击者腐败一个或多个代理以操控整个系统时。
- 研究通过模拟发现恶意指令在多智能体系统中的传播现象,即感染性恶意提示。
- 评估了多种策略来降低风险,包括“疫苗接种”方法和安全指令策略。
- 这些防御策略虽能减少恶意指令的传播和履行,但可能影响代理之间的协作效率。
- 研究揭示了多智能体系统中安全性和协作效率之间的权衡。
- 需要设计更安全的AI协同合作策略,以平衡安全性和效率。
点此查看论文截图






Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation
Authors:Humza Sami, Mubashir ul Islam, Samy Charas, Asav Gandhi, Pierre-Emmanuel Gaillardon, Valerio Tenace
Recent advancements in Large Language Models (LLMs) have substantially evolved Multi-Agent Systems (MASs) capabilities, enabling systems that not only automate tasks but also leverage near-human reasoning capabilities. To achieve this, LLM-based MASs need to be built around two critical principles: (i) a robust architecture that fully exploits LLM potential for specific tasks – or related task sets – and ($ii$) an effective methodology for equipping LLMs with the necessary capabilities to perform tasks and manage information efficiently. It goes without saying that a priori architectural designs can limit the scalability and domain adaptability of a given MAS. To address these challenges, in this paper we introduce Nexus: a lightweight Python framework designed to easily build and manage LLM-based MASs. Nexus introduces the following innovations: (i) a flexible multi-supervisor hierarchy, (ii) a simplified workflow design, and (iii) easy installation and open-source flexibility: Nexus can be installed via pip and is distributed under a permissive open-source license, allowing users to freely modify and extend its capabilities. Experimental results demonstrate that architectures built with Nexus exhibit state-of-the-art performance across diverse domains. In coding tasks, Nexus-driven MASs achieve a 99% pass rate on HumanEval and a flawless 100% on VerilogEval-Human, outperforming cutting-edge reasoning language models such as o3-mini and DeepSeek-R1. Moreover, these architectures display robust proficiency in complex reasoning and mathematical problem solving, achieving correct solutions for all randomly selected problems from the MATH dataset. In the realm of multi-objective optimization, Nexus-based architectures successfully address challenging timing closure tasks on designs from the VTR benchmark suite, while guaranteeing, on average, a power saving of nearly 30%.
最近大型语言模型(LLM)的进展极大地推动了多智能体系统(MAS)的能力,使系统不仅能够自动化任务,而且能够利用接近人类的推理能力。为实现这一点,基于LLM的MAS需要围绕两个关键原则构建:(i)一个能够充分利用LLM对特定任务(或相关任务集)的潜力的稳健架构;(ii)一种为LLM配备必要能力以高效执行任务和管理信息的有效方法。不言而喻,预先的架构设计可能会限制给定MAS的可扩展性和领域适应性。
论文及项目相关链接
Summary
基于大型语言模型(LLM)的多智能体系统(MAS)近期取得了显著进展,实现了自动化任务并具备了近似人类推理的能力。为实现这一目标,需要围绕两个关键原则构建LLM-based MAS:一是充分利用LLM潜力的稳健架构,二是为LLM配备执行任务和管理信息所需能力的有效方法。本文介绍了一个名为Nexus的轻量级Python框架,该框架易于构建和管理基于LLM的MAS,并引入了灵活的多监督层次结构、简化工作流程、易于安装和开源灵活性等特点。实验结果表明,使用Nexus构建的架构在各个领域都表现出卓越的性能。
Key Takeaways
- 大型语言模型(LLM)的进展已显著提升了多智能体系统(MAS)的能力,使其具备了自动化任务和近似人类推理的能力。
- 构建LLM-based MAS需要围绕两个关键原则:利用LLM潜力的稳健架构,以及为LLM配备执行任务和管理信息所需能力的有效方法。
- Nexus是一个轻量级的Python框架,用于构建和管理基于LLM的MAS,具有灵活的多监督层次结构、简化工作流程、易于安装和开源灵活性等特点。
- 使用Nexus构建的架构在各个领域表现出卓越的性能,如在编码任务中实现了高通过率,以及在多目标优化中成功解决了具有挑战性的问题。
- Nexus框架可以通过pip进行安装,并且是在许可的开源许可下分发,允许用户自由修改和扩展其能力。
- 实验结果表明,Nexus在复杂推理和数学问题解决方面表现出强大的能力,能够正确解决MATH数据集中所有随机选择的问题。
点此查看论文截图



ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration
Authors:Zixiang Wang, Yinghao Zhu, Huiya Zhao, Xiaochen Zheng, Dehao Sui, Tianlong Wang, Wen Tang, Yasha Wang, Ewen Harrison, Chengwei Pan, Junyi Gao, Liantao Ma
We introduce ColaCare, a framework that enhances Electronic Health Record (EHR) modeling through multi-agent collaboration driven by Large Language Models (LLMs). Our approach seamlessly integrates domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by the Multidisciplinary Team (MDT) approach used in clinical settings, ColaCare employs two types of agents: DoctorAgents and a MetaAgent, which collaboratively analyze patient data. Expert models process and generate predictions from numerical EHR data, while LLM agents produce reasoning references and decision-making reports within the MDT-driven collaborative consultation framework. The MetaAgent orchestrates the discussion, facilitating consultations and evidence-based debates among DoctorAgents, simulating diverse expertise in clinical decision-making. We additionally incorporate the Merck Manual of Diagnosis and Therapy (MSD) medical guideline within a retrieval-augmented generation (RAG) module for medical evidence support, addressing the challenge of knowledge currency. Extensive experiments conducted on three EHR datasets demonstrate ColaCare’s superior performance in clinical mortality outcome and readmission prediction tasks, underscoring its potential to revolutionize clinical decision support systems and advance personalized precision medicine. All code, case studies and a questionnaire are available at the project website: https://colacare.netlify.app.
我们介绍了ColaCare,这是一个通过大型语言模型(LLM)驱动的多智能体协作增强电子健康记录(EHR)建模的框架。我们的方法将特定领域的专家模型与LLM无缝集成,以弥合结构化EHR数据与基于文本的推理之间的鸿沟。受临床环境中多学科团队(MDT)方法的启发,ColaCare采用两种类型的智能体:DoctorAgents和MetaAgent,它们协同分析患者数据。专家模型处理来自数字EHR数据生成预测,而LLM智能体在MDT驱动的协作咨询框架中产生推理参考和决策报告。MetaAgent负责协调讨论,促进DoctorAgents之间的会诊和循证辩论,模拟临床决策中的不同专业知识。此外,我们还在检索增强生成(RAG)模块中融入了默克诊断和治疗手册(MSD)医学指南,以支持医学证据,应对知识更新的挑战。在三个EHR数据集上进行的广泛实验表明,ColaCare在临床死亡率和再入院率预测任务中表现出卓越的性能,突显其在革命临床决策支持系统以及推动个性化精准医学方面的潜力。所有代码、案例研究和问卷均可在项目网站找到:https://colacare.netlify.app。
论文及项目相关链接
PDF ACM TheWebConf 2025 Conference (WWW 2025) Research Track
摘要
ColaCare框架通过大型语言模型驱动的多智能体协作,提升了电子病历记录(EHR)建模。该框架无缝集成了领域特定专家模型和大型语言模型,以缩小结构化EHR数据和文本推理之间的鸿沟。ColaCare采用两种智能体:DoctorAgents和MetaAgent,协同分析患者数据。专家模型处理并预测数字EHR数据,而大型语言模型智能体在跨学科团队驱动的协作咨询框架中产生推理参考和决策报告。MetaAgent协调讨论,促进DoctorAgents之间的会诊和基于证据的辩论,模拟临床决策中的多样化专业知识。此外,还将默克诊断和治疗手册(MSD)医学指南纳入检索增强生成(RAG)模块,为医学证据提供支持,解决知识更新挑战。在三个EHR数据集上进行的广泛实验表明,ColaCare在临床死亡结果和再入院预测任务上的性能卓越,具有颠覆临床决策支持系统并推动个性化精准医学的潜力。更多代码、案例研究和问卷可在项目网站找到:https://colacare.netlify.app。
关键见解
- ColaCare框架通过多智能体协作增强EHR建模。
- 整合领域特定专家模型和大型语言模型以优化文本推理与结构化EHR数据间的匹配。
- 采用DoctorAgents和MetaAgent两种智能体进行协同分析患者数据。
- 专家模型处理预测数字EHR数据,大型语言模型智能体提供推理参考和决策报告。
- MetaAgent协调讨论和基于证据的辩论,模拟临床决策中的专业知识多样性。
- 结合默克诊断和治疗手册(MSD)医学指南支持医学证据检索和更新知识挑战。
点此查看论文截图







Large Language Model Agent for Hyper-Parameter Optimization
Authors:Siyi Liu, Chen Gao, Yong Li
Hyperparameter optimization is critical in modern machine learning, requiring expert knowledge, numerous trials, and high computational and human resources. Despite the advancements in Automated Machine Learning (AutoML), challenges in terms of trial efficiency, setup complexity, and interoperability still persist. To address these issues, we introduce a novel paradigm leveraging Large Language Models (LLMs) to automate hyperparameter optimization across diverse machine learning tasks, which is named AgentHPO (short for LLM Agent-based Hyperparameter Optimization). Specifically, AgentHPO processes the task information autonomously, conducts experiments with specific hyperparameters (HPs), and iteratively optimizes them based on historical trials. This human-like optimization process largely reduces the number of required trials, simplifies the setup process, and enhances interpretability and user trust, compared to traditional AutoML methods. Extensive empirical experiments conducted on 12 representative machine-learning tasks indicate that AgentHPO not only matches but also often surpasses the best human trials in terms of performance while simultaneously providing explainable results. Further analysis sheds light on the strategies employed by the LLM in optimizing these tasks, highlighting its effectiveness and adaptability in various scenarios.
超参数优化在现代机器学习领域至关重要,需要专业知识、多次试验以及大量的计算和人力资源。尽管自动机器学习(AutoML)有所发展,但在试验效率、设置复杂性和可操作性方面仍存在挑战。为了解决这些问题,我们引入了一种利用大型语言模型(LLM)自动化执行多样化机器学习任务的超参数优化的新范式,名为AgentHPO(基于LLM的代理超参数优化)。具体来说,AgentHPO可以自主处理任务信息,使用特定的超参数进行实验,并根据历史试验进行迭代优化。与传统AutoML方法相比,这种人机协同的优化过程大大减少了所需的试验次数,简化了设置过程,并提高了可解释性和用户信任度。在具有代表性的机器学习任务的广泛实证实验表明,AgentHPO不仅在性能上达到最佳水平,而且经常超越最佳人工试验的结果,同时提供可解释的结果。进一步的分析揭示了LLM在优化这些任务时所采用的策略,突显其在各种场景中的有效性和适应性。
论文及项目相关链接
Summary
现代机器学习中的超参数优化至关重要,需要专业知识、多次试验以及大量的计算资源。尽管自动化机器学习(AutoML)取得了一些进展,但试验效率、设置复杂性以及互通性等方面仍然存在挑战。为了解决这些问题,我们提出了一种利用大型语言模型(LLM)进行跨多种机器学习任务的自动化超参数优化的新范式,称为AgentHPO。相较于传统AutoML方法,AgentHPO自主处理任务信息、进行特定超参数实验并基于历史试验进行迭代优化,这一过程大大减少了所需的试验次数,简化了设置过程,并提高了可解释性和用户信任度。在12个代表性机器学习任务上进行的广泛实验表明,AgentHPO不仅在性能上达到了最佳水平,甚至在某些情况下超过了人类试验的最佳水平,同时提供了可解释的结果。
Key Takeaways
- 超参数优化在现代机器学习中的重要性及其所需的资源投入。
- 虽然AutoML有进展,但仍存在试验效率、设置复杂性和互通性的挑战。
- AgentHPO利用大型语言模型(LLM)自动化进行跨多种机器学习任务的超参数优化。
- AgentHPO能自主处理任务信息并进行超参数实验,基于历史试验进行迭代优化。
- AgentHPO相较于传统AutoML方法提高了试验效率、简化了设置过程、增强了可解释性和用户信任度。
- 在多个代表性机器学习任务上的实验表明,AgentHPO性能达到了甚至超越了最佳水平的人类试验。
点此查看论文截图


