
Agent


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them in serious academic settings; they are only meant as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated on 2025-09-13

Maximizing social welfare among EF1 allocations at the presence of two types of agents

Authors:Jiaxuan Ma, Yong Chen, Guangting Chen, Mingyang Gong, Guohui Lin, An Zhang

We study the fair allocation of indivisible items to $n$ agents to maximize the utilitarian social welfare, where the fairness criterion is envy-free up to one item and there are only two different utility functions shared by the agents. We present a $2$-approximation algorithm when the two utility functions are normalized, improving the previous best ratio of $16 \sqrt{n}$ shown for general normalized utility functions; thus this constant ratio approximation algorithm confirms the APX-completeness in this special case previously shown APX-hard. When there are only three agents, i.e., $n = 3$, the previous best ratio is $3$ shown for general utility functions, and we present an improved and tight $\frac 53$-approximation algorithm when the two utility functions are normalized, and a best possible and tight $2$-approximation algorithm when the two utility functions are unnormalized.


Paper and project links

PDF A shorter version appears in ISAAC 2025; 20 pages in this full version

Summary

The paper studies the fair allocation of indivisible items to maximize utilitarian social welfare, where the fairness criterion is envy-freeness up to one item (EF1) and the agents share only two distinct utility functions. When both utility functions are normalized, a 2-approximation algorithm is presented, improving the previous best ratio of 16√n for general normalized utilities and confirming the APX-completeness of this special case, which was previously shown to be APX-hard. For three agents (n = 3), where the previous best ratio was 3 for general utilities, the paper gives an improved and tight 5/3-approximation for normalized utilities and a best-possible, tight 2-approximation for unnormalized utilities.

Key Takeaways

  1. Studies the fair allocation of indivisible items with the goal of maximizing utilitarian social welfare.
  2. The fairness criterion is envy-freeness up to one item (EF1); a small checking sketch follows this list.
  3. For normalized utility functions, a 2-approximation algorithm is given, improving the previous best ratio.
  4. APX-completeness is confirmed for this special case.
  5. For three agents (n = 3), the algorithms are improved further: a tight 5/3-approximation for normalized utilities and a best-possible 2-approximation for unnormalized utilities.
  6. The results show how the difference between the two shared utility functions affects algorithmic complexity and approximation ratios.
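
As a quick illustration of the two notions the abstract combines, the minimal sketch below (not taken from the paper) checks whether an allocation of indivisible items is EF1 under additive utilities and computes its utilitarian social welfare. The item values and the allocation are made-up numbers, with two agent types sharing two utility functions as in the paper's setting.

```python
from typing import Dict, List, Set

def utilitarian_welfare(utilities: List[Dict[int, float]], allocation: List[Set[int]]) -> float:
    """Sum of each agent's (additive) utility for their own bundle."""
    return sum(sum(utilities[i][g] for g in bundle) for i, bundle in enumerate(allocation))

def is_ef1(utilities: List[Dict[int, float]], allocation: List[Set[int]]) -> bool:
    """EF1: every agent i either does not envy agent j, or would stop envying j
    after removing some single item from j's bundle."""
    n = len(allocation)
    for i in range(n):
        u_own = sum(utilities[i][g] for g in allocation[i])
        for j in range(n):
            if i == j:
                continue
            u_other = sum(utilities[i][g] for g in allocation[j])
            if u_own >= u_other:
                continue  # no envy toward j
            best_drop = max((utilities[i][g] for g in allocation[j]), default=0.0)
            if u_own < u_other - best_drop:
                return False  # envy survives even after dropping j's best item
    return True

# Three agents but only two distinct utility functions, as in the paper's setting.
u_a = {0: 5, 1: 1, 2: 4, 3: 2}
u_b = {0: 2, 1: 6, 2: 1, 3: 3}
utilities = [u_a, u_a, u_b]
allocation = [{0}, {2, 3}, {1}]
print(is_ef1(utilities, allocation), utilitarian_welfare(utilities, allocation))  # True 17
```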

Cool Papers

Click here to view paper screenshots

Bridging the Capability Gap: Joint Alignment Tuning for Harmonizing LLM-based Multi-Agent Systems

Authors:Minghang Zhu, Zhengliang Shi, Zhiwei Xu, Shiguang Wu, Lingjie Wang, Pengjie Ren, Zhaochun Ren, Zhumin Chen

The advancement of large language models (LLMs) has enabled the construction of multi-agent systems to solve complex tasks by dividing responsibilities among specialized agents, such as a planning agent for subgoal generation and a grounding agent for executing tool-use actions. Most existing methods typically fine-tune these agents independently, leading to capability gaps among them with poor coordination. To address this, we propose MOAT, a Multi-Agent Joint Alignment Tuning framework that improves agent collaboration through iterative alignment. MOAT alternates between two key stages: (1) Planning Agent Alignment, which optimizes the planning agent to generate subgoal sequences that better guide the grounding agent; and (2) Grounding Agent Improving, which fine-tunes the grounding agent using diverse subgoal-action pairs generated by the agent itself to enhance its generalization capability. Theoretical analysis proves that MOAT ensures a non-decreasing and progressively convergent training process. Experiments across six benchmarks demonstrate that MOAT outperforms state-of-the-art baselines, achieving average improvements of 3.1% on held-in tasks and 4.4% on held-out tasks.


Paper and project links

PDF EMNLP 2025 Findings

Summary

Advances in large language models (LLMs) have enabled multi-agent systems in which specialized agents divide up a complex task, for example a planning agent that generates subgoals and a grounding agent that executes tool-use actions. Existing methods usually fine-tune these agents independently, producing capability gaps and poor coordination. To address this, MOAT, a Multi-Agent Joint Alignment Tuning framework, improves collaboration through iterative alignment, alternating between two stages: Planning Agent Alignment, which optimizes the planner to generate subgoal sequences that better guide the grounding agent, and Grounding Agent Improving, which fine-tunes the grounding agent on diverse subgoal-action pairs generated by the agent itself to improve its generalization. Theoretical analysis and experiments show that MOAT yields a non-decreasing, progressively convergent training process and outperforms state-of-the-art baselines on multiple benchmarks.

Key Takeaways

  1. Progress in large language models has enabled multi-agent systems in which specialized agents divide up complex tasks.
  2. Existing methods fine-tune the agents independently, which creates capability gaps and coordination problems.
  3. The MOAT framework improves collaboration through iterative alignment, alternating between a Planning Agent Alignment stage and a Grounding Agent Improving stage (see the schematic loop after this list).
  4. The planning-alignment stage optimizes the planner to generate subgoal sequences that better guide the grounding agent.
  5. The grounding-improvement stage fine-tunes the grounding agent on diverse self-generated subgoal-action pairs to improve its generalization.
  6. Theoretical analysis shows that MOAT's training process is non-decreasing and progressively convergent.
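
As a rough, runnable illustration of the alternating procedure named in takeaway 3, the sketch below mimics the two stages with trivial stand-ins: the planner is a string template, the grounder is a random success check, and "fine-tuning" is a scalar bump. All of these are placeholders invented for this sketch, not the authors' implementation; only the control flow mirrors the abstract.

```python
import random

def plan(task, variant):
    # Stand-in for the planning agent sampling a subgoal sequence.
    return [f"{task}/subgoal{i}/v{variant}" for i in range(2)]

def grounder_success(subgoals, grounder_skill):
    # Stand-in for executing the subgoals with tools and measuring task success.
    random.seed(hash(tuple(subgoals)))
    return random.random() < grounder_skill

def moat_loop(tasks, rounds=3):
    planner_pref, grounder_skill = {}, 0.5
    for _ in range(rounds):
        # Stage 1: Planning Agent Alignment -- keep the subgoal sequence that the
        # current grounding agent handles best (here: the first successful variant).
        for task in tasks:
            candidates = [plan(task, v) for v in range(4)]
            winners = [c for c in candidates if grounder_success(c, grounder_skill)]
            planner_pref[task] = winners[0] if winners else candidates[0]
        # Stage 2: Grounding Agent Improving -- "fine-tune" on the self-generated
        # subgoal-action pairs that succeeded (here: just bump a skill scalar).
        successes = sum(grounder_success(sg, grounder_skill) for sg in planner_pref.values())
        grounder_skill = min(0.95, grounder_skill + 0.05 * successes)
    return planner_pref, grounder_skill

print(moat_loop(["book_flight", "answer_question"]))
```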

Cool Papers

Click here to view paper screenshots

AEGIS: An Agent for Extraction and Geographic Identification in Scholarly Proceedings

Authors:Om Vishesh, Harshad Khadilkar, Deepak Akkil

Keeping pace with the rapid growth of academia literature presents a significant challenge for researchers, funding bodies, and academic societies. To address the time-consuming manual effort required for scholarly discovery, we present a novel, fully automated system that transitions from data discovery to direct action. Our pipeline demonstrates how a specialized AI agent, ‘Agent-E’, can be tasked with identifying papers from specific geographic regions within conference proceedings and then executing a Robotic Process Automation (RPA) to complete a predefined action, such as submitting a nomination form. We validated our system on 586 papers from five different conferences, where it successfully identified every target paper with a recall of 100% and a near perfect accuracy of 99.4%. This demonstration highlights the potential of task-oriented AI agents to not only filter information but also to actively participate in and accelerate the workflows of the academic community.


Paper and project links

PDF 5 pages, 2 figures

Summary: The rapid growth of the academic literature poses a significant challenge for researchers, funding bodies, and academic societies. To reduce the time-consuming manual effort of scholarly discovery, the authors present a novel fully automated system that moves from data discovery to direct action. The pipeline shows how a specialized AI agent, 'Agent-E', can identify papers from specific geographic regions within conference proceedings and then execute a Robotic Process Automation (RPA) step to complete a predefined action, such as submitting a nomination form. Validated on 586 papers from five different conferences, the system identified every target paper with 100% recall and a near-perfect 99.4% accuracy. This demonstration highlights the potential of task-oriented AI agents not only to filter information but also to actively participate in and accelerate the workflows of the academic community.

Key Takeaways:

  1. The rapid growth of the academic literature poses challenges for researchers, funding bodies, and academic societies.
  2. A fully automated system is proposed that moves from data discovery to direct action.
  3. A specialized AI agent, 'Agent-E', identifies papers from specific geographic regions (a minimal filtering sketch follows this list).
  4. Robotic Process Automation (RPA) then completes a predefined action, such as submitting a nomination form.
  5. The system identified every target paper, with 100% recall and a near-perfect 99.4% accuracy.
  6. Task-oriented AI agents can do more than filter information: they can actively participate in and accelerate academic workflows.
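
To make takeaway 3 concrete, here is a minimal, hypothetical sketch of the "identify by region, then act" pipeline: filter proceedings entries by affiliation keywords, then hand each hit to an action step. The data layout and the submit_nomination() hand-off are invented for illustration; the paper's Agent-E uses an LLM for identification and Robotic Process Automation for the action, neither of which is reproduced here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Paper:
    title: str
    affiliations: List[str]  # e.g., ["IIT Bombay, India", "ETH Zurich, Switzerland"]

def from_region(paper: Paper, region_keywords: List[str]) -> bool:
    """Keyword-based stand-in for the LLM-based geographic identification step."""
    text = " ".join(paper.affiliations).lower()
    return any(kw.lower() in text for kw in region_keywords)

def submit_nomination(paper: Paper) -> None:
    # Placeholder for the RPA step (form filling / submission) in the real pipeline.
    print(f"Nominating: {paper.title}")

proceedings = [
    Paper("Agents for X", ["IIT Bombay, India"]),
    Paper("Agents for Y", ["MIT, USA"]),
]
for p in proceedings:
    if from_region(p, ["India"]):
        submit_nomination(p)
```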

Cool Papers

Click here to view paper screenshots

LightAgent: Production-level Open-source Agentic AI Framework

Authors:Weige Cai, Tong Zhu, Jinyi Niu, Ruiqi Hu, Lingyao Li, Tenglong Wang, Xiaowu Dai, Weining Shen, Liwen Zhang

With the rapid advancement of large language models (LLMs), Multi-agent Systems (MAS) have achieved significant progress in various application scenarios. However, substantial challenges remain in designing versatile, robust, and efficient platforms for agent deployment. To address these limitations, we propose LightAgent, a lightweight yet powerful agentic framework, effectively resolving the trade-off between flexibility and simplicity found in existing frameworks. LightAgent integrates core functionalities such as Memory (mem0), Tools, and Tree of Thought (ToT), while maintaining an extremely lightweight structure. As a fully open-source solution, it seamlessly integrates with mainstream chat platforms, enabling developers to easily build self-learning agents. We have released LightAgent at https://github.com/wxai-space/LightAgent


Paper and project links

PDF

Summary
LightAgent is a lightweight yet efficient agentic framework that addresses the challenges of deploying multi-agent systems. It provides core functionalities such as memory, tools, and Tree of Thought while keeping a very small footprint, integrates easily with mainstream chat platforms, and supports developers in building self-learning agents.

Key Takeaways

  1. The rapid advancement of large language models has driven notable progress in multi-agent systems across many application scenarios.
  2. Designing versatile, robust, and efficient platforms for agent deployment remains challenging.
  3. The LightAgent framework is proposed to resolve the trade-off between flexibility and simplicity found in existing frameworks.
  4. LightAgent integrates core functionalities such as Memory (mem0), Tools, and Tree of Thought (ToT) while staying extremely lightweight (a generic sketch of combining these pieces follows this list).
  5. LightAgent is a fully open-source solution that integrates seamlessly with mainstream chat platforms.
  6. LightAgent lets developers easily build self-learning agents.
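
The following is a generic sketch, not the LightAgent API, of how the three ingredients named in takeaway 4 can fit together in one small agent loop: a memory store, a tool registry, and a simple tree-of-thought style selection over candidate plans. Every function here is a stand-in invented for illustration.

```python
def search_tool(query: str) -> str:
    return f"stub results for '{query}'"   # stand-in for a real tool call

TOOLS = {"search": search_tool}
MEMORY = []  # append-only (role, text) pairs; in LightAgent a memory module (mem0) plays this role

def propose_thoughts(task: str, k: int = 3):
    # Stand-in for LLM sampling: k candidate plans (the branching step of a tree of thoughts).
    return [f"{task}: plan variant {i}" for i in range(k)]

def score(thought: str) -> float:
    # Stand-in for LLM self-evaluation of each branch (the selection step).
    return 10.0 if thought.endswith("variant 0") else 1.0

def run_agent(task: str) -> str:
    MEMORY.append(("user", task))
    best = max(propose_thoughts(task), key=score)   # keep the best-scoring branch
    observation = TOOLS["search"](best)              # ground the chosen plan with a tool call
    MEMORY.append(("agent", observation))
    return observation

print(run_agent("summarize recent agent papers"))
```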

Cool Papers

Click here to view paper screenshots

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Authors:Jiawei Wang, Jiacai Liu, Yuqian Fu, Yingru Li, Xintao Wang, Yuan Lin, Yu Yue, Lin Zhang, Yang Wang, Ke Wang

In long-horizon tasks, recent agents based on Large Language Models (LLMs) face a significant challenge that sparse, outcome-based rewards make it difficult to assign credit to intermediate steps. Previous methods mainly focus on creating dense reward signals to guide learning, either through traditional reinforcement learning techniques like inverse reinforcement learning or by using Process Reward Models for step-by-step feedback. In this paper, we identify a fundamental problem in the learning dynamics of LLMs: the magnitude of policy gradients is inherently coupled with the entropy, which leads to inefficient small updates for confident correct actions and potentially destabilizes large updates for uncertain ones. To resolve this, we propose Entropy-Modulated Policy Gradients (EMPG), a framework that re-calibrates the learning signal based on step-wise uncertainty and the final task outcome. EMPG amplifies updates for confident correct actions, penalizes confident errors, and attenuates updates from uncertain steps to stabilize exploration. We further introduce a bonus term for future clarity that encourages agents to find more predictable solution paths. Through comprehensive experiments on three challenging agent tasks, WebShop, ALFWorld, and Deep Search, we demonstrate that EMPG achieves substantial performance gains and significantly outperforms strong policy gradient baselines. Project page is at https://empgseed-seed.github.io/


Paper and project links

PDF ICLR 2026 Under review

Summary

The paper proposes the Entropy-Modulated Policy Gradients (EMPG) framework to address a challenge faced by large language model (LLM) agents in long-horizon tasks. The framework re-calibrates the learning signal to resolve the inherent coupling between policy-gradient magnitude and entropy: it makes updates for confident correct actions more efficient, penalizes confident errors, and attenuates updates from uncertain steps to stabilize exploration. On three challenging agent tasks, EMPG achieves substantial performance gains and significantly outperforms strong policy-gradient baselines.

Key Takeaways

  1. In long-horizon tasks, LLM agents face sparse, outcome-based rewards that make it hard to assign credit to intermediate steps.
  2. The magnitude of policy gradients is inherently coupled with entropy, which makes updates inefficient or unstable.
  3. The EMPG framework re-calibrates the learning signal based on step-wise uncertainty and the final task outcome.
  4. EMPG amplifies updates for confident correct actions, penalizes confident errors, and attenuates updates from uncertain steps (a small sketch of this modulation follows this list).
  5. EMPG adds a future-clarity bonus that encourages agents to find more predictable solution paths.
  6. Experiments on WebShop, ALFWorld, and Deep Search show that EMPG achieves substantial gains over strong policy-gradient baselines.
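
To make the modulation in takeaway 4 concrete, the sketch below computes a per-step confidence from the entropy of the action distribution and uses it to scale the update weight, with the sign set by the trajectory outcome. The exact modulation function used by EMPG is not given in the abstract; the exp(-entropy) form here is just one plausible choice for illustration.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def modulated_weight(step_probs, outcome):
    """outcome: +1 for a successful trajectory, -1 for a failed one."""
    h = entropy(step_probs)
    h_max = math.log(len(step_probs))       # entropy of a uniform distribution
    confidence = math.exp(-h / h_max)       # near 1 when confident, smaller when uncertain
    return outcome * confidence             # amplify confident steps, attenuate uncertain ones

confident_step = [0.97, 0.01, 0.01, 0.01]
uncertain_step = [0.25, 0.25, 0.25, 0.25]
print(modulated_weight(confident_step, +1))   # large positive: confident and correct
print(modulated_weight(confident_step, -1))   # large negative: confident error is penalized
print(modulated_weight(uncertain_step, +1))   # small: uncertain step, update attenuated
```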

Cool Papers

Click here to view paper screenshots

ProgD: Progressive Multi-scale Decoding with Dynamic Graphs for Joint Multi-agent Motion Forecasting

Authors:Xing Gao, Zherui Huang, Weiyao Lin, Xiao Sun

Accurate motion prediction of surrounding agents is crucial for the safe planning of autonomous vehicles. Recent advancements have extended prediction techniques from individual agents to joint predictions of multiple interacting agents, with various strategies to address complex interactions within future motions of agents. However, these methods overlook the evolving nature of these interactions. To address this limitation, we propose a novel progressive multi-scale decoding strategy, termed ProgD, with the help of dynamic heterogeneous graph-based scenario modeling. In particular, to explicitly and comprehensively capture the evolving social interactions in future scenarios, given their inherent uncertainty, we design a progressive modeling of scenarios with dynamic heterogeneous graphs. With the unfolding of such dynamic heterogeneous graphs, a factorized architecture is designed to process the spatio-temporal dependencies within future scenarios and progressively eliminate uncertainty in future motions of multiple agents. Furthermore, a multi-scale decoding procedure is incorporated to improve on the future scenario modeling and consistent prediction of agents’ future motion. The proposed ProgD achieves state-of-the-art performance on the INTERACTION multi-agent prediction benchmark, ranking $1^{st}$, and the Argoverse 2 multi-world forecasting benchmark.


Paper and project links

PDF

Summary
Accurate motion prediction of surrounding agents is crucial for the safe planning of autonomous vehicles. Recent techniques have extended from single-agent prediction to joint prediction of multiple interacting agents, with various strategies for handling the complex interactions within future motions, but these methods overlook the evolving nature of the interactions. To address this, the paper proposes ProgD, a progressive multi-scale decoding strategy built on dynamic heterogeneous graph-based scenario modeling. To explicitly and comprehensively capture evolving social interactions under inherent uncertainty, future scenarios are modeled progressively with dynamic heterogeneous graphs; as these graphs unfold, a factorized architecture processes the spatio-temporal dependencies within future scenarios and progressively eliminates uncertainty in the future motions of multiple agents. A multi-scale decoding procedure further improves scenario modeling and the consistency of predicted motions. ProgD achieves state-of-the-art performance on the INTERACTION multi-agent prediction benchmark, ranking 1st, and performs strongly on the Argoverse 2 multi-world forecasting benchmark.

Key Takeaways

  1. Accurate motion prediction of surrounding agents is crucial for the safe planning of autonomous vehicles.
  2. Recent techniques have moved from predicting individual agents to jointly predicting multiple interacting agents.
  3. Existing methods overlook the evolving nature of the interactions among agents.
  4. A new progressive multi-scale decoding strategy (ProgD) is proposed to improve prediction.
  5. Dynamic heterogeneous graphs are used for scenario modeling to capture the evolving social interactions.
  6. ProgD achieves state-of-the-art performance on benchmarks including INTERACTION (ranked 1st) and Argoverse 2.

Cool Papers

Click here to view paper screenshots

Shell or Nothing: Real-World Benchmarks and Memory-Activated Agents for Automated Penetration Testing

Authors:Wuyuao Mai, Geng Hong, Qi Liu, Jinsong Chen, Jiarun Dai, Xudong Pan, Yuan Zhang, Min Yang

Penetration testing is critical for identifying and mitigating security vulnerabilities, yet traditional approaches remain expensive, time-consuming, and dependent on expert human labor. Recent work has explored AI-driven pentesting agents, but their evaluation relies on oversimplified capture-the-flag (CTF) settings that embed prior knowledge and reduce complexity, leading to performance estimates far from real-world practice. We close this gap by introducing the first real-world, agent-oriented pentesting benchmark, TermiBench, which shifts the goal from ‘flag finding’ to achieving full system control. The benchmark spans 510 hosts across 25 services and 30 CVEs, with realistic environments that require autonomous reconnaissance, discrimination between benign and exploitable services, and robust exploit execution. Using this benchmark, we find that existing systems can hardly obtain system shells under realistic conditions. To address these challenges, we propose TermiAgent, a multi-agent penetration testing framework. TermiAgent mitigates long-context forgetting with a Located Memory Activation mechanism and builds a reliable exploit arsenal via structured code understanding rather than naive retrieval. In evaluations, our work outperforms state-of-the-art agents, exhibiting stronger penetration testing capability, reducing execution time and financial cost, and demonstrating practicality even on laptop-scale deployments. Our work delivers both the first open-source benchmark for real-world autonomous pentesting and a novel agent framework that establishes a milestone for AI-driven penetration testing.


Paper and project links

PDF

Summary
Penetration testing is critical for identifying and mitigating security vulnerabilities, yet traditional approaches are expensive, time-consuming, and dependent on expert human labor. Recent work explores AI-driven pentesting agents, but their evaluation relies on oversimplified capture-the-flag (CTF) settings that embed prior knowledge and reduce complexity, so performance estimates are far from real-world practice. To close this gap, the authors introduce TermiBench, the first real-world, agent-oriented pentesting benchmark, which shifts the goal from finding flags to achieving full system control. It spans 510 hosts across 25 services and 30 CVEs, with realistic environments that require autonomous reconnaissance, discrimination between benign and exploitable services, and robust exploit execution; under these conditions, existing systems can hardly obtain system shells. To address these challenges, the authors propose TermiAgent, a multi-agent penetration-testing framework that mitigates long-context forgetting with a Located Memory Activation mechanism and builds a reliable exploit arsenal via structured code understanding rather than naive retrieval. In evaluations, TermiAgent outperforms state-of-the-art agents, exhibits stronger penetration-testing capability, reduces execution time and cost, and remains practical even on laptop-scale deployments. The work delivers both the first open-source benchmark for real-world autonomous pentesting and a novel agent framework, a milestone for AI-driven penetration testing.

Key Takeaways

  1. Penetration testing is critical for identifying security vulnerabilities, but traditional approaches are expensive and time-consuming.
  2. Recent AI-driven pentesting agents are evaluated in oversimplified settings that diverge considerably from real-world practice.
  3. TermiBench is introduced as the first real-world, agent-oriented pentesting benchmark, shifting the goal from flag finding to full system control.
  4. TermiBench spans many hosts, services, and CVEs and requires autonomous reconnaissance, discrimination between benign and exploitable services, and robust exploit execution.
  5. Existing systems can hardly obtain system shells under these realistic conditions.
  6. The TermiAgent framework improves penetration-testing capability through a Located Memory Activation mechanism and structured code understanding.

Cool Papers

Click here to view paper screenshots

Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning

Authors:Xuefeng Wang, Lei Zhang, Henglin Pu, Ahmed H. Qureshi, Husheng Li

Existing reinforcement learning (RL) methods struggle with complex dynamical systems that demand interactions at high frequencies or irregular time intervals. Continuous-time RL (CTRL) has emerged as a promising alternative by replacing discrete-time Bellman recursion with differential value functions defined as viscosity solutions of the Hamilton–Jacobi–Bellman (HJB) equation. While CTRL has shown promise, its applications have been largely limited to the single-agent domain. This limitation stems from two key challenges: (i) conventional solution methods for HJB equations suffer from the curse of dimensionality (CoD), making them intractable in high-dimensional systems; and (ii) even with HJB-based learning approaches, accurately approximating centralized value functions in multi-agent settings remains difficult, which in turn destabilizes policy training. In this paper, we propose a CT-MARL framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale. To ensure the value is consistent with its differential structure, we align value learning with value-gradient learning by introducing a Value Gradient Iteration (VGI) module that iteratively refines value gradients along trajectories. This improves gradient fidelity, in turn yielding more accurate values and stronger policy learning. We evaluate our method using continuous-time variants of standard benchmarks, including multi-agent particle environment (MPE) and multi-agent MuJoCo. Our results demonstrate that our approach consistently outperforms existing continuous-time RL baselines and scales to complex multi-agent dynamics.


Paper and project links

PDF 19 pages, 10 figures

Summary

The paper proposes a continuous-time multi-agent reinforcement learning (CT-MARL) framework based on physics-informed neural networks (PINNs), addressing the shortcomings of conventional RL on complex dynamical systems. A Value Gradient Iteration (VGI) module improves the fidelity of value gradients, which yields more accurate values and stronger policy learning. Evaluations on continuous-time variants of standard benchmarks, including the multi-agent particle environment (MPE) and multi-agent MuJoCo, show that the approach outperforms existing continuous-time RL baselines and scales to complex multi-agent dynamics.

Key Takeaways

  1. Continuous-time RL (CTRL) replaces discrete-time Bellman recursion with differential value functions defined as viscosity solutions of the Hamilton-Jacobi-Bellman (HJB) equation, targeting complex dynamical systems with high-frequency or irregular interactions.
  2. CTRL has largely been limited to single-agent settings; multi-agent use faces two challenges: the curse of dimensionality (CoD) in solving HJB equations and the difficulty of accurately approximating centralized value functions.
  3. A CT-MARL framework based on physics-informed neural networks (PINNs) is proposed to approximate HJB-based value functions at scale (a single-agent toy sketch follows this list).
  4. A Value Gradient Iteration (VGI) module aligns value learning with value-gradient learning by iteratively refining value gradients along trajectories, improving gradient fidelity and yielding more accurate values and stronger policies.
  5. The method is evaluated on continuous-time variants of multi-agent benchmarks, including the multi-agent particle environment (MPE) and multi-agent MuJoCo.
  6. The approach consistently outperforms existing continuous-time RL baselines and scales to complex multi-agent dynamics.
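
As a minimal, single-agent toy version of the PINN idea in takeaway 3, the sketch below trains a value network by penalizing the residual of an HJB equation instead of using discrete-time Bellman backups. The dynamics (dx/dt = u), quadratic running cost, and zero terminal condition are illustrative choices for this sketch; the paper's framework is multi-agent and adds a Value Gradient Iteration module, neither of which is reproduced here.

```python
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # V(t, x)
opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)
T = 1.0

def hjb_residual(t, x):
    tx = torch.cat([t, x], dim=1).requires_grad_(True)
    V = value_net(tx)
    grads = torch.autograd.grad(V.sum(), tx, create_graph=True)[0]
    V_t, V_x = grads[:, :1], grads[:, 1:]
    # dynamics dx/dt = u, running reward r = -(x^2 + u^2); the maximizing control is
    # u* = V_x / 2, so max_u [r + V_x * u] = -x^2 + V_x^2 / 4, giving the residual below.
    return V_t - tx[:, 1:] ** 2 + V_x ** 2 / 4.0

for step in range(500):
    t = torch.rand(256, 1) * T
    x = torch.rand(256, 1) * 4 - 2
    pde_loss = hjb_residual(t, x).pow(2).mean()
    # terminal condition V(T, x) = 0 enforced as a squared penalty
    terminal_loss = value_net(torch.cat([torch.full_like(x, T), x], dim=1)).pow(2).mean()
    loss = pde_loss + terminal_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))
```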

Cool Papers

Click here to view paper screenshots

Understanding Economic Tradeoffs Between Human and AI Agents in Bargaining Games

Authors:Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain

Coordination tasks traditionally performed by humans are increasingly being delegated to autonomous agents. As this pattern progresses, it becomes critical to evaluate not only these agents’ performance but also the processes through which they negotiate in dynamic, multi-agent environments. Furthermore, different agents exhibit distinct advantages: traditional statistical agents, such as Bayesian models, may excel under well-specified conditions, whereas large language models (LLMs) can generalize across contexts. In this work, we compare humans (N = 216), LLMs (GPT-4o, Gemini 1.5 Pro), and Bayesian agents in a dynamic negotiation setting that enables direct, identical-condition comparisons across populations, capturing both outcomes and behavioral dynamics. Bayesian agents extract the highest surplus through aggressive optimization, at the cost of frequent trade rejections. Humans and LLMs can achieve similar overall surplus, but through distinct behaviors: LLMs favor conservative, concessionary trades with few rejections, while humans employ more strategic, risk-taking, and fairness-oriented behaviors. Thus, we find that performance parity – a common benchmark in agent evaluation – can conceal fundamental differences in process and alignment, which are critical for practical deployment in real-world coordination tasks.


Paper and project links

PDF

Summary
Coordination tasks traditionally performed by humans are increasingly delegated to autonomous agents, so it becomes important to evaluate not only these agents' outcomes but also the processes through which they negotiate in dynamic, multi-agent environments. The study compares humans (N = 216), LLMs (GPT-4o, Gemini 1.5 Pro), and Bayesian agents in a dynamic negotiation setting under identical conditions. Bayesian agents extract the highest surplus through aggressive optimization, at the cost of frequent trade rejections; humans and LLMs achieve similar overall surplus but through distinct behaviors, with LLMs favoring conservative, concessionary trades and humans acting more strategically, taking risks, and emphasizing fairness. Performance parity, a common benchmark in agent evaluation, can therefore conceal fundamental differences in process and alignment that are critical for real-world deployment in coordination tasks.

Key Takeaways

Key points from the text:

  • As autonomous agents take over more coordination tasks traditionally performed by humans, it becomes critical to evaluate not only their performance but also the processes through which they negotiate in dynamic, multi-agent environments.
  • Bayesian agents can extract the highest surplus through aggressive optimization, but their frequent trade rejections may hurt efficiency and the relationship between parties.

Cool Papers

Click here to view paper screenshots

Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference

Authors:Xiyu Guo, Shan Wang, Chunfang Ji, Xuefeng Zhao, Wenhao Xi, Yaoyao Liu, Qinglan Li, Chao Deng, Junlan Feng

The rapid advancement of large language models (LLMs) and domain-specific AI agents has greatly expanded the ecosystem of AI-powered services. User queries, however, are highly diverse and often span multiple domains and task types, resulting in a complex and heterogeneous landscape. This diversity presents a fundamental routing challenge: how to accurately direct each query to an appropriate execution unit while optimizing both performance and efficiency. To address this, we propose MoMA (Mixture of Models and Agents), a generalized routing framework that integrates both LLM and agent-based routing. Built upon a deep understanding of model and agent capabilities, MoMA effectively handles diverse queries through precise intent recognition and adaptive routing strategies, achieving an optimal balance between efficiency and cost. Specifically, we construct a detailed training dataset to profile the capabilities of various LLMs under different routing model structures, identifying the most suitable tasks for each LLM. During inference, queries are dynamically routed to the LLM with the best cost-performance efficiency. We also introduce an efficient agent selection strategy based on a context-aware state machine and dynamic masking. Experimental results demonstrate that the MoMA router offers superior cost-efficiency and scalability compared to existing approaches.


Paper and project links

PDF

Summary

The rapid advancement of large language models and domain-specific AI agents has greatly expanded the ecosystem of AI-powered services. User queries are highly diverse and often span multiple domains and task types, posing a fundamental routing challenge: how to direct each query to an appropriate execution unit while optimizing both performance and efficiency. To address this, MoMA (Mixture of Models and Agents) is proposed as a generalized routing framework that integrates both LLM and agent-based routing. Built on a deep understanding of model and agent capabilities, MoMA handles diverse queries through precise intent recognition and adaptive routing strategies, balancing efficiency and cost. Experimental results show that the MoMA router offers better cost-efficiency and scalability than existing approaches.

Key Takeaways

  1. Advances in large language models and domain-specific AI agents have greatly expanded the ecosystem of AI-powered services.
  2. The diversity of user queries across domains and task types creates a routing challenge.
  3. MoMA is a generalized routing framework that integrates both LLM and agent-based routing to address this challenge.
  4. MoMA handles diverse queries through precise intent recognition and adaptive routing strategies.
  5. MoMA uses a detailed training dataset to profile various LLMs and identify the tasks each one suits best.
  6. At inference time, each query is dynamically routed to the LLM with the best cost-performance efficiency (a minimal cost-aware router is sketched after this list).
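
A minimal sketch of the cost-aware routing decision in takeaway 6: pick the cheapest execution unit whose profiled quality clears a floor for the recognized task type. The quality estimates and prices below are made-up numbers; in MoMA they come from profiling LLMs on a training dataset and from a context-aware agent-selection state machine, which are not reproduced here.

```python
MODELS = {
    #  name          (estimated quality per task type,                  price per 1k tokens)
    "small-llm":  ({"chitchat": 0.90, "code": 0.55, "math": 0.50}, 0.1),
    "large-llm":  ({"chitchat": 0.95, "code": 0.85, "math": 0.88}, 2.0),
    "code-agent": ({"chitchat": 0.30, "code": 0.92, "math": 0.60}, 1.0),
}

def route(task_type: str, quality_floor: float = 0.8) -> str:
    """Pick the cheapest execution unit whose estimated quality clears the floor;
    fall back to the highest-quality unit if none does."""
    ok = [(price, name) for name, (q, price) in MODELS.items() if q[task_type] >= quality_floor]
    if ok:
        return min(ok)[1]
    return max(MODELS, key=lambda name: MODELS[name][0][task_type])

print(route("chitchat"))  # small-llm: cheapest model that is good enough
print(route("code"))      # code-agent: cheaper than the large LLM at similar quality
print(route("math"))      # large-llm: only unit above the quality floor
```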

Cool Papers

Click here to view paper screenshots

Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems

Authors:Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu

Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost. The performance of an HSS highly depends on the effectiveness of two key policies: (1) the data-placement policy, which determines the best-fit storage device for incoming data, and (2) the data-migration policy, which dynamically rearranges stored data (i.e., prefetches hot data and evicts cold data) across the devices to sustain high HSS performance. Prior works optimize either data placement or data migration in isolation, which leads to suboptimal HSS performance. Unfortunately, no prior work tries to optimize both policies together. Our goal is to design a holistic data-management technique that optimizes both data-placement and data-migration policies to fully exploit the potential of an HSS, and thus significantly improve system performance. We propose Harmonia, a multi-agent reinforcement learning (RL)-based data-management technique that employs two lightweight autonomous RL agents, a data-placement agent and a data-migration agent, that adapt their policies for the current workload and HSS configuration while coordinating with each other to improve overall HSS performance. We evaluate Harmonia on real HSS configurations with up to four heterogeneous storage devices and seventeen data-intensive workloads. On performance-optimized (cost-optimized) HSS with two storage devices, Harmonia outperforms the best-performing prior approach by 49.5% (31.7%) on average. On an HSS with three (four) devices, Harmonia outperforms the best-performing prior work by 37.0% (42.0%) on average. Harmonia’s performance benefits come with low latency (240ns for inference) and storage overheads (206 KiB in DRAM for both RL agents combined). We will open-source Harmonia’s implementation to aid future research on HSS.


Paper and project links

PDF

Summary

The paper addresses the optimization of hybrid storage systems (HSS). It points out that existing work optimizes either data placement or data migration in isolation and overlooks their coordination. The authors propose Harmonia, a holistic data-management technique that uses multi-agent reinforcement learning to optimize the data-placement and data-migration policies together. Evaluations on real HSS configurations with a variety of workloads show that Harmonia significantly outperforms existing approaches.

Key Takeaways

  • Hybrid storage systems (HSS) integrate multiple storage devices to deliver high performance and capacity at low cost.
  • Data placement and data migration are the two key policies governing HSS performance, yet prior work optimizes only one of them in isolation.
  • Harmonia is a new data-management technique that uses multi-agent reinforcement learning to optimize the placement and migration policies together (a minimal two-agent sketch follows this list).
  • Across real HSS configurations and diverse workloads, Harmonia outperforms the best prior approach by 49.5% on average on a performance-optimized two-device HSS and by 31.7% on a cost-optimized one.
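
A minimal sketch of the division of labor described in the third bullet: one lightweight agent decides where each incoming block goes, and another handles periodic rearrangement of stored data. The epsilon-greedy tabular agents and the hotness-based reward below are stand-ins invented for this sketch, not Harmonia's actual RL design, state features, or reward.

```python
import random

DEVICES = ["fast_ssd", "slow_hdd"]

class TinyAgent:
    """Epsilon-greedy tabular stand-in for one of Harmonia's lightweight RL agents."""
    def __init__(self, actions):
        self.actions, self.q = actions, {}
    def act(self, state, eps=0.1):
        if random.random() < eps or state not in self.q:
            return random.choice(self.actions)
        return max(self.q[state], key=self.q[state].get)
    def learn(self, state, action, reward, lr=0.5):
        self.q.setdefault(state, {a: 0.0 for a in self.actions})
        self.q[state][action] += lr * (reward - self.q[state][action])

placement_agent = TinyAgent(DEVICES)                       # picks a device for each incoming block
migration_agent = TinyAgent(["promote", "evict", "stay"])  # rearranges stored blocks over time

def handle_write(block_hotness: float) -> str:
    state = "hot" if block_hotness > 0.5 else "cold"
    device = placement_agent.act(state)
    # Toy reward: hot data belongs on the fast device, cold data on the cheap one.
    reward = 1.0 if (state == "hot") == (device == "fast_ssd") else -1.0
    placement_agent.learn(state, device, reward)
    return device

def maintenance_tick(stored_state: str) -> str:
    # The migration agent would be trained analogously, with a reward reflecting
    # sustained HSS performance; learning is omitted in this sketch.
    return migration_agent.act(stored_state)

for _ in range(500):
    handle_write(random.random())
print(placement_agent.q)          # learned preference: hot -> fast_ssd, cold -> slow_hdd
print(maintenance_tick("hot"))
```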

Cool Papers

Click here to view paper screenshots

VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

Authors:Jungjae Lee, Dongjae Lee, Chihun Choi, Youngmin Im, Jaeyoung Wi, Kihong Heo, Sangeun Oh, Sunjae Lee, Insik Shin

Large Foundation Models (LFMs) have unlocked new possibilities in human-computer interaction, particularly with the rise of mobile Graphical User Interface (GUI) Agents capable of interacting with mobile GUIs. These agents allow users to automate complex mobile tasks through simple natural language instructions. However, the inherent probabilistic nature of LFMs, coupled with the ambiguity and context-dependence of mobile tasks, makes LFM-based automation unreliable and prone to errors. To address this critical challenge, we introduce VeriSafe Agent (VSA): a formal verification system that serves as a logically grounded safeguard for Mobile GUI Agents. VSA deterministically ensures that an agent’s actions strictly align with user intent before executing the action. At its core, VSA introduces a novel autoformalization technique that translates natural language user instructions into a formally verifiable specification. This enables runtime, rule-based verification of agent’s actions, detecting erroneous actions even before they take effect. To the best of our knowledge, VSA is the first attempt to bring the rigor of formal verification to GUI agents, bridging the gap between LFM-driven actions and formal software verification. We implement VSA using off-the-shelf LFM services (GPT-4o) and evaluate its performance on 300 user instructions across 18 widely used mobile apps. The results demonstrate that VSA achieves 94.33%-98.33% accuracy in verifying agent actions, outperforming existing LFM-based verification methods by 30.00%-16.33%, and increases the GUI agent’s task completion rate by 90%-130%.


Paper and project links

PDF

Summary

Large foundation models (LFMs) enable natural-language interaction with mobile GUI agents, but their probabilistic nature makes the resulting automation unreliable. To address this, VeriSafe Agent (VSA) is a formal verification system that serves as a logically grounded safeguard for mobile GUI agents. VSA uses an autoformalization technique that translates natural language user instructions into formally verifiable specifications, enabling runtime, rule-based verification that detects erroneous actions before they take effect. VSA is the first attempt to bring the rigor of formal verification to GUI agents, bridging the gap between LFM-driven actions and formal software verification. Experiments on widely used mobile apps show that VSA achieves high accuracy in verifying agent actions, outperforming existing LFM-based verification methods.

Key Takeaways

  1. Large foundation models (LFMs) have opened new possibilities for natural-language interaction with mobile GUI agents.
  2. Mobile GUI agents let users automate complex mobile tasks through simple natural language instructions.
  3. The probabilistic nature of LFMs, together with the ambiguity and context-dependence of mobile tasks, makes such automation unreliable.
  4. VeriSafe Agent (VSA), a formal verification system, is designed as a logically grounded safeguard against these problems.
  5. VSA uses a novel autoformalization technique to translate natural language instructions into formally verifiable specifications.
  6. VSA performs runtime, rule-based verification that detects erroneous actions before they take effect (a toy precondition check is sketched after this list).
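
A toy illustration of the "verify before execute" pattern in takeaway 6: an action is only executed if it satisfies a specification derived from the user's instruction. The tiny rule format and the example spec below are invented for this sketch; VSA derives its specifications automatically from natural language via autoformalization, which is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # e.g., "tap", "type", "send"
    target: str      # UI element identifier
    payload: str = ""

# Hand-written spec standing in for the autoformalized instruction
# "send 10 dollars to Alice": predicates that must hold before the final send.
def spec_send_money(action: Action, ui_state: dict) -> bool:
    return (action.kind == "send"
            and ui_state.get("recipient") == "Alice"
            and ui_state.get("amount") == "10")

def guarded_execute(action: Action, ui_state: dict, spec) -> str:
    if action.kind == "send" and not spec(action, ui_state):
        return "BLOCKED: action does not satisfy the verified specification"
    return f"executed {action.kind} on {action.target}"

state = {"recipient": "Bob", "amount": "10"}    # the agent mistakenly selected Bob
print(guarded_execute(Action("send", "confirm_button"), state, spec_send_money))
state["recipient"] = "Alice"
print(guarded_execute(Action("send", "confirm_button"), state, spec_send_money))
```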

Cool Papers

Click here to view paper screenshots

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Authors:Lu Chen, Yizhou Wang, Shixiang Tang, Qianhong Ma, Tong He, Wanli Ouyang, Xiaowei Zhou, Hujun Bao, Sida Peng

Learning an agent model that behaves like humans-capable of jointly perceiving the environment, predicting the future, and taking actions from a first-person perspective-is a fundamental challenge in computer vision. Existing methods typically train separate models for these abilities, which fail to capture their intrinsic relationships and prevent them from learning from each other. Inspired by how humans learn through the perception-action loop, we propose EgoAgent, a unified agent model that simultaneously learns to represent, predict, and act within a single transformer. EgoAgent explicitly models the causal and temporal dependencies among these abilities by formulating the task as an interleaved sequence of states and actions. It further introduces a joint embedding-action-prediction architecture with temporally asymmetric predictor and observer branches, enabling synergistic optimization across all three capabilities. Comprehensive evaluations of EgoAgent on representative tasks such as image classification, egocentric future state prediction, and 3D human motion prediction demonstrate the superiority of our method. The code and trained models will be publicly available at https://github.com/zju3dv/EgoAgent.


Paper and project links

PDF Project Page: https://egoagent.github.io | Demo Video: https://youtu.be/qhfHp_sfDvY

Summary

Learning an agent model that behaves like humans, jointly perceiving the environment, predicting the future, and taking actions from a first-person perspective, is a fundamental challenge in computer vision. Existing methods typically train separate models for these abilities, which fails to capture their intrinsic relationships and prevents them from learning from each other. Inspired by how humans learn through the perception-action loop, EgoAgent is a unified agent model that simultaneously learns to represent, predict, and act within a single transformer. It explicitly models the causal and temporal dependencies among these abilities by formulating the task as an interleaved sequence of states and actions, and it introduces a joint embedding-action-prediction architecture with temporally asymmetric predictor and observer branches, enabling synergistic optimization across all three capabilities. Comprehensive evaluations on representative tasks such as image classification, egocentric future state prediction, and 3D human motion prediction demonstrate the superiority of the method.

Key Takeaways

  1. Learning a human-like agent model is a fundamental challenge in computer vision, requiring joint perception of the environment, prediction of the future, and action from a first-person perspective.
  2. Existing methods train separate models for these abilities and fail to capture their intrinsic relationships.
  3. EgoAgent is a unified agent model that learns to represent, predict, and act within a single transformer.
  4. EgoAgent explicitly models the causal and temporal dependencies among these abilities by formulating the task as an interleaved sequence of states and actions (a small layout sketch follows this list).
  5. EgoAgent introduces a joint embedding-action-prediction architecture with temporally asymmetric predictor and observer branches.
  6. The model demonstrates superior performance on image classification, egocentric future state prediction, and 3D human motion prediction.
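
To make takeaway 4 concrete, the sketch below shows one plausible data layout for "an interleaved sequence of states and actions": state and action embeddings alternate along the time axis so that a single causal transformer can attend to both. The dimensions and toy tensors are arbitrary; EgoAgent's actual tokenizers, predictor/observer branches, and training losses are not reproduced here.

```python
import torch

T, state_dim, action_dim, d_model = 4, 512, 32, 256
states = torch.randn(T, state_dim)     # e.g., egocentric frame embeddings s_1..s_T
actions = torch.randn(T, action_dim)   # e.g., body/hand motion vectors a_1..a_T

proj_s = torch.nn.Linear(state_dim, d_model)
proj_a = torch.nn.Linear(action_dim, d_model)

# Interleave as [s_1, a_1, s_2, a_2, ...]; with a causal mask, each position can
# only attend to earlier states and actions, so the model can be trained to predict
# the next action from past states/actions, and the next state likewise.
tokens = torch.stack([proj_s(states), proj_a(actions)], dim=1).reshape(2 * T, d_model)
causal_mask = torch.triu(torch.full((2 * T, 2 * T), float("-inf")), diagonal=1)

encoder_layer = torch.nn.TransformerEncoderLayer(d_model, nhead=4)
out = encoder_layer(tokens.unsqueeze(1), src_mask=causal_mask)  # shape: (2T, 1, d_model)
print(out.shape)
```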

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!