Published: 2025-09-16

Updated: 2025-10-07

Word count: 2k

Reading time: 8 min

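⚠️ All summaries below are generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: never use them in serious academic settings; they are only for screening papers before you read them!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace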

Updated 2025-09-16

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

Authors: Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong

Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs’ long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
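
The abstract says DeepDive automatically synthesizes hard-to-find questions from open knowledge graphs but does not describe the procedure. A minimal sketch, assuming questions are built from a random walk over (head, relation, tail) triples with the last entity as the answer, might look like the following. The triples, function names, and question template are hypothetical; a real pipeline would also need to obscure the starting entity and the relations so that the question is genuinely hard to find, which this sketch does not attempt.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy knowledge graph.
TRIPLES: List[Triple] = [
    ("Pierre Curie", "spouse_of", "Marie Curie"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Poland", "member_of", "European Union"),
]


def build_adjacency(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each head entity to its outgoing (relation, tail) edges."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def random_walk(adj, start: str, hops: int, rng: random.Random) -> List[Triple]:
    """Follow up to `hops` random outgoing edges from `start`, never revisiting an entity."""
    path: List[Triple] = []
    node, visited = start, {start}
    for _ in range(hops):
        candidates = [(r, t) for r, t in adj.get(node, []) if t not in visited]
        if not candidates:
            break  # dead end: stop early with a shorter chain
        rel, nxt = rng.choice(candidates)
        path.append((node, rel, nxt))
        visited.add(nxt)
        node = nxt
    return path


def compose_question(path: List[Triple]) -> Tuple[str, str]:
    """Turn a relation chain into a multi-hop question; the answer is the final entity."""
    if not path:
        raise ValueError("cannot build a question from an empty walk")
    chain = " -> ".join(rel for _, rel, _ in path)
    question = f"Starting from '{path[0][0]}', follow {chain}. Which entity do you reach?"
    return question, path[-1][2]


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    walk = random_walk(adj, "Pierre Curie", hops=3, rng=random.Random(0))
    question, answer = compose_question(walk)
    print(question)
    print("Answer:", answer)
```

Longer walks produce more hops and therefore harder questions; refusing to revisit an entity keeps the chain from looping back to a node the solver already knows.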

Paper and project links

PDF

Summary

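Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents for complex, real-world tasks. However, open LLMs still perform poorly in this setting because of their limited long-horizon reasoning with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we introduce DeepDive to advance deep search agents. First, we propose a strategy that automatically synthesizes complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to strengthen LLMs' long-horizon reasoning in deep search. Experiments show that DeepDive-32B achieves a new competitive result among open-source models on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We show that multi-turn RL training improves deep search ability and contributes significantly to the performance gains across multiple benchmarks. DeepDive also enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.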
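
The second stage, end-to-end multi-turn RL, can be illustrated with a minimal rollout-and-reward sketch. It assumes a binary outcome reward on the final answer and GRPO-style group-normalized advantages; neither detail is stated in the summary, and `toy_policy`, `toy_search`, and every other name below are hypothetical stand-ins for an LLM policy and a real browsing tool. The gradient update itself is omitted.

```python
import statistics
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# An action is either ("search", query) or ("answer", final_text).
Action = Tuple[str, str]


@dataclass
class Trajectory:
    """Sequence of (action, observation) pairs plus the final answer, if any."""
    steps: List[Tuple[Action, Optional[str]]] = field(default_factory=list)
    final_answer: Optional[str] = None


def rollout(policy: Callable[[List[str]], Action],
            search_tool: Callable[[str], str],
            question: str,
            max_turns: int = 8) -> Trajectory:
    """Interleave policy actions and tool observations until an answer or the turn limit."""
    traj = Trajectory()
    context = [question]
    for _ in range(max_turns):
        kind, content = policy(context)
        if kind == "answer":
            traj.steps.append(((kind, content), None))
            traj.final_answer = content
            break
        observation = search_tool(content)
        traj.steps.append(((kind, content), observation))
        context.append(observation)
    return traj


def outcome_reward(traj: Trajectory, gold: str) -> float:
    """Assumed binary outcome reward: 1.0 for a normalized exact match, else 0.0."""
    if traj.final_answer is None:
        return 0.0
    return float(traj.final_answer.strip().lower() == gold.strip().lower())


def group_advantages(rewards: List[float]) -> List[float]:
    """Assumed GRPO-style advantages: normalize rewards within a group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Hypothetical stand-ins for an LLM policy and a web search tool.
def toy_policy(context: List[str]) -> Action:
    return ("search", context[0]) if len(context) == 1 else ("answer", context[-1])


def toy_search(query: str) -> str:
    return {"Where was Marie Curie born?": "Warsaw"}.get(query, "no result")


if __name__ == "__main__":
    traj = rollout(toy_policy, toy_search, "Where was Marie Curie born?")
    print(len(traj.steps), traj.final_answer, outcome_reward(traj, "Warsaw"))
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rewarding only the final answer keeps the signal simple for long trajectories; every action in a rollout still receives credit through the shared trajectory-level advantage.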

Key Takeaways

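Combining large language models (LLMs) with browsing tools raises their potential as deep search agents.
Open LLMs currently struggle with long-horizon reasoning when using browsing tools, and sufficiently difficult supervised data is scarce.
DeepDive improves LLMs' long-horizon reasoning by automatically synthesizing complex questions from open knowledge graphs and by training with multi-turn reinforcement learning.
DeepDive-32B achieves a new competitive result among open-source models on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1.
Multi-turn RL training contributes substantially to the improvement in deep search ability.
DeepDive supports test-time scaling of tool calls and parallel sampling.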
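
The parallel-sampling takeaway can be made concrete with a small sketch: run N independent agent rollouts and aggregate their final answers. Majority voting is used here as one common aggregation rule; it is an assumption, and the paper may select among samples differently. `toy_agent` is a hypothetical stand-in for a full search-agent rollout.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers vote together."""
    return " ".join(answer.lower().split())


def parallel_answers(run_agent: Callable[[str, int], Optional[str]],
                     question: str, n: int) -> List[Optional[str]]:
    """Run n independent rollouts concurrently, one seed each; results keep seed order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda seed: run_agent(question, seed), range(n)))


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent normalized answer; ties go to the answer seen first."""
    valid = [normalize(a) for a in answers if a]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Hypothetical stand-in for a full search-agent rollout; the seed picks the "sample".
def toy_agent(question: str, seed: int) -> Optional[str]:
    return "Poland" if seed % 3 else "Warsaw"


if __name__ == "__main__":
    answers = parallel_answers(toy_agent, "In which country is Warsaw?", n=5)
    print(answers, "->", majority_vote(answers))
```

More samples cost more compute but let occasional wrong trajectories be outvoted, which is one way extra test-time computation can translate into accuracy.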

Cool Papers

DECAMP: Towards Scene-Consistent Multi-Agent Motion Prediction with Disentangled Context-Aware Pre-Training

Authors: Jianxin Shi, Zengqi Peng, Xiaolong Chen, Tianyu Wo, Jun Ma

Trajectory prediction is a critical component of autonomous driving, essential for ensuring both safety and efficiency on the road. However, traditional approaches often struggle with the scarcity of labeled data and exhibit suboptimal performance in multi-agent prediction scenarios. To address these challenges, we introduce a disentangled context-aware pre-training framework for multi-agent motion prediction, named DECAMP. Unlike existing methods that entangle representation learning with pretext tasks, our framework decouples behavior pattern learning from latent feature reconstruction, prioritizing interpretable dynamics and thereby enhancing scene representation for downstream prediction. Additionally, our framework incorporates context-aware representation learning alongside collaborative spatial-motion pretext tasks, which enables joint optimization of structural and intentional reasoning while capturing the underlying dynamic intentions. Our experiments on the Argoverse 2 benchmark showcase the superior performance of our method, and the results attained underscore its effectiveness in multi-agent motion forecasting. To the best of our knowledge, this is the first context autoencoder framework for multi-agent motion forecasting in autonomous driving. The code and models will be made publicly available.
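
The abstract describes DECAMP as a context autoencoder that decouples behavior pattern learning from latent feature reconstruction. The sketch below shows only generic ingredients that such pre-training builds on, under stated assumptions: agents are split into visible context and masked targets, a reconstruction loss is computed on masked agents only, and, in the style of the image-domain context autoencoder (CAE), a latent alignment term ties latents predicted for masked agents to those of a target encoder. It does not model DECAMP's collaborative spatial-motion pretext tasks or its networks; all function names, the mask ratio, and the loss weighting are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # (x, y) per time step for one agent


def split_agents(num_agents: int, mask_ratio: float,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split agent indices into (visible, masked); at least one agent is masked."""
    indices = list(range(num_agents))
    rng.shuffle(indices)
    num_masked = min(num_agents, max(1, round(num_agents * mask_ratio)))
    return sorted(indices[num_masked:]), sorted(indices[:num_masked])


def masked_reconstruction_loss(pred: Sequence[Trajectory], target: Sequence[Trajectory],
                               masked: Sequence[int]) -> float:
    """Mean squared (x, y) error over masked agents only; visible agents are ignored."""
    total, count = 0.0, 0
    for i in masked:
        for (px, py), (tx, ty) in zip(pred[i], target[i]):
            total += (px - tx) ** 2 + (py - ty) ** 2
            count += 1
    if count == 0:
        raise ValueError("no masked time steps to reconstruct")
    return total / count


def latent_alignment_loss(pred_latents: Sequence[Sequence[float]],
                          target_latents: Sequence[Sequence[float]]) -> float:
    """Mean squared gap between latents predicted for masked agents and target latents."""
    diffs = [(p - t) ** 2
             for pv, tv in zip(pred_latents, target_latents)
             for p, t in zip(pv, tv)]
    return sum(diffs) / len(diffs)


def context_autoencoder_loss(recon: float, align: float, align_weight: float = 1.0) -> float:
    """CAE-style total objective: reconstruction plus weighted latent alignment."""
    return recon + align_weight * align


if __name__ == "__main__":
    visible, masked = split_agents(num_agents=4, mask_ratio=0.5, rng=random.Random(0))
    print("visible:", visible, "masked:", masked)
    target = [[(0.0, 0.0), (1.0, 2.0)], [(5.0, 5.0), (6.0, 6.0)]]
    pred = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (0.0, 0.0)]]
    recon = masked_reconstruction_loss(pred, target, masked=[0])
    align = latent_alignment_loss([[1.0, 2.0]], [[1.0, 0.0]])
    print(context_autoencoder_loss(recon, align, align_weight=0.5))
```

Computing the loss only on masked agents forces the encoder to infer their behavior from the visible scene context, which is the property a pre-training stage like this aims to transfer to downstream prediction.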

Paper and project links

PDF

Summary

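Motivated by the critical role of trajectory prediction in autonomous driving, this paper proposes DECAMP, a disentangled context-aware pre-training framework for multi-agent motion prediction. It targets two weaknesses of traditional approaches: scarce labeled data and suboptimal performance in multi-agent prediction scenarios. By decoupling behavior pattern learning from latent feature reconstruction, it prioritizes interpretable dynamics and thereby improves the scene representation used for downstream prediction. The framework also combines context-aware representation learning with collaborative spatial-motion pretext tasks, jointly optimizing structural and intentional reasoning while capturing the underlying dynamic intentions. Experiments on the Argoverse 2 benchmark show superior performance, particularly for multi-agent motion forecasting. To the authors' knowledge, this is the first context autoencoder framework for multi-agent motion forecasting in autonomous driving.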
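
The "scene-consistent" goal in DECAMP's title is usually understood as predicting futures that are jointly plausible for all agents in a scene. The difference between joint and marginal displacement metrics shows why this matters. The sketch below, assuming a joint-mode formulation like that of the Argoverse 2 multi-agent track, picks the single joint mode with the lowest agent-averaged final displacement and reports that mode's averaged ADE and FDE; a marginal metric instead lets every agent pick its own best mode. Function names and toy data are hypothetical, and the official Argoverse 2 evaluation code is the authoritative definition.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def _errors(pred: Trajectory, gt: Trajectory) -> List[float]:
    """Per-time-step Euclidean displacement between a prediction and the ground truth."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def joint_ade_fde(joint_modes: Sequence[Sequence[Trajectory]],
                  ground_truth: Sequence[Trajectory]) -> Tuple[float, float]:
    """Scene-level metric: pick the one joint mode with the lowest agent-averaged FDE
    (ties broken by ADE), then return that mode's agent-averaged (ADE, FDE)."""
    best = (math.inf, math.inf)  # (avg FDE, avg ADE) of the best mode so far
    for mode in joint_modes:
        errs = [_errors(pred, gt) for pred, gt in zip(mode, ground_truth)]
        avg_ade = sum(sum(e) / len(e) for e in errs) / len(errs)
        avg_fde = sum(e[-1] for e in errs) / len(errs)
        best = min(best, (avg_fde, avg_ade))
    return best[1], best[0]


def marginal_min_fde(joint_modes: Sequence[Sequence[Trajectory]],
                     ground_truth: Sequence[Trajectory]) -> float:
    """Marginal metric: each agent independently picks its best mode, then average."""
    per_agent = [
        min(_errors(mode[a], ground_truth[a])[-1] for mode in joint_modes)
        for a in range(len(ground_truth))
    ]
    return sum(per_agent) / len(per_agent)


if __name__ == "__main__":
    gt = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
    modes = [
        [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 3.0)]],  # right for agent 0 only
        [[(0.0, 0.0), (1.0, 4.0)], [(0.0, 0.0), (0.0, 1.0)]],  # right for agent 1 only
    ]
    print("joint (ADE, FDE):", joint_ade_fde(modes, gt))
    print("marginal minFDE:", marginal_min_fde(modes, gt))
```

In this toy scene the marginal metric reports a perfect 0.0 because each agent borrows a different mode, while the joint metric exposes that no single predicted scene gets both agents right.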

Key Takeaways