⚠️ 以下所有内容总结都来自于 大语言模型的能力,如有错误,仅供参考,谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目 ChatPaperFree 对您有帮助,还请您给我们一些鼓励!⭐️ HuggingFace免费体验
2025-03-05 更新
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer
Authors:Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, Yossi Matias, Joshua Maynez, Adi Mayrav Gilady, Jason Riesa, Shruti Rijhwani, Laura Rimell, Idan Szpektor, Reut Tsarfaty, Matan Eyal
To achieve equitable performance across languages, multilingual large language models (LLMs) must be able to abstract knowledge beyond the language in which it was acquired. However, the current literature lacks reliable ways to measure LLMs’ capability of cross-lingual knowledge transfer. To that end, we present ECLeKTic, a multilingual closed-book QA (CBQA) dataset that Evaluates Cross-Lingual Knowledge Transfer in a simple, black-box manner. We detected information with uneven coverage across languages by controlling for presence and absence of Wikipedia articles in 12 languages. We generated knowledge-seeking questions in a source language, for which the answer appears in a relevant Wikipedia article and translated them to all other 11 languages, for which the respective Wikipedias lack equivalent articles. Assuming that Wikipedia reflects the prominent knowledge in the LLM’s training data, to solve ECLeKTic’s CBQA task the model is required to transfer knowledge between languages. Experimenting with 8 LLMs, we show that SOTA models struggle to effectively share knowledge across languages, even if they can predict the answer well for queries in the same language the knowledge was acquired in.
为了实现不同语言的公平性能,多语言大型语言模型(LLM)必须能够抽象获取语言之外的知识。然而,现有文献缺乏可靠的方法来衡量LLM跨语言知识转移的能力。为此,我们提出了ECLeKTic,这是一个多语言封闭问答(CBQA)数据集,以简单、黑箱的方式评估跨语言知识的转移。我们通过控制12种语言的维基百科文章的有无,检测了语言间信息覆盖的不均匀性。我们在源语言中生成寻求知识的问题,这些问题的答案出现在相关的维基百科文章中,然后将它们翻译到其他所有11种语言,对于这些语言,相应的维基百科缺乏等价的文章。假设维基百科反映了LLM训练数据中的突出知识,要解决ECLeKTic的CBQA任务,模型需要在语言之间进行知识转移。通过对8种LLM进行实验,我们发现即使对于在其获取知识的同一语言中的查询能够很好地预测答案,SOTA模型在跨语言共享知识方面仍然面临困难。
论文及项目相关链接
Summary
要在不同语言间取得公平的表现,LLM必须能把知识抽象到其习得语言之外。然而,当前文献缺乏可靠的方法来测量LLM跨语言知识转移的能力。为此,我们提出了ECLeKTic数据集,以简单的黑箱方式评估跨语言的知识转移能力。我们通过控制维基百科文章在12种语言中的有无,定位覆盖不均的知识,并据此构造闭卷问答任务来检验知识迁移。在8个LLM上的实验表明,即使模型在习得知识的同一语言中能够很好地回答相应问题,SOTA模型在跨语言共享知识方面仍然面临困难。
Key Takeaways
- LLM要实现不同语言间的公平表现,必须超越语言本身进行知识抽象。
- 当前缺乏评估LLM跨语言知识转移能力的可靠方法。
- ECLeKTic数据集用于简单直观地评估跨语言的知识转移能力。
- 不同语言之间的信息覆盖不均影响知识的迁移。
- LLM在跨语言环境下难以有效共享知识。
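下面给出一个极简的评测思路草图,演示如何由每道题在源语言与目标语言上的对错统计一个“知识迁移得分”。其中的字段命名与指标定义均为基于摘要的假设,并非 ECLeKTic 的官方评测脚本。

```python
# 示意性评测草图:演示"跨语言知识迁移"得分的一种可能算法,
# 并非 ECLeKTic 官方评测代码;字段名与指标定义均为假设。
from dataclasses import dataclass

@dataclass
class Record:
    question_id: str
    source_lang: str      # 问题知识来源语言(其维基百科有对应条目)
    target_lang: str      # 翻译后的目标语言
    correct_source: bool  # 源语言提问时模型是否答对
    correct_target: bool  # 目标语言提问时模型是否答对

def transfer_score(records: list) -> float:
    """在源语言已答对的问题中,目标语言也答对的比例(假设的迁移指标)。"""
    known = [r for r in records if r.correct_source]
    if not known:
        return 0.0
    return sum(r.correct_target for r in known) / len(known)

if __name__ == "__main__":
    demo = [
        Record("q1", "he", "ja", True, False),
        Record("q1", "he", "fr", True, True),
        Record("q2", "hi", "en", False, False),
    ]
    print(f"transfer score = {transfer_score(demo):.2f}")  # 0.50
```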
点此查看论文截图






CAMEx: Curvature-aware Merging of Experts
Authors:Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, Rachel S. Y. Teo, Tan M. Nguyen, Linh Duy Tran
Existing methods for merging experts during model training and fine-tuning predominantly rely on Euclidean geometry, which assumes a flat parameter space. This assumption can limit the model’s generalization ability, especially during the pre-training phase, where the parameter manifold might exhibit more complex curvature. Curvature-aware merging methods typically require additional information and computational resources to approximate the Fisher Information Matrix, adding memory overhead. In this paper, we introduce CAMEx (Curvature-Aware Merging of Experts), a novel expert merging protocol that incorporates natural gradients to account for the non-Euclidean curvature of the parameter manifold. By leveraging natural gradients, CAMEx adapts more effectively to the structure of the parameter space, improving alignment between model updates and the manifold’s geometry. This approach enhances both pre-training and fine-tuning, resulting in better optimization trajectories and improved generalization without the substantial memory overhead typically associated with curvature-aware methods. Our contributions are threefold: (1) CAMEx significantly outperforms traditional Euclidean-based expert merging techniques across various natural language processing tasks, leading to enhanced performance during pre-training and fine-tuning; (2) we introduce a dynamic merging architecture that optimizes resource utilization, achieving high performance while reducing computational costs, facilitating efficient scaling of large language models; and (3) we provide both theoretical and empirical evidence to demonstrate the efficiency of our proposed method. The code is publicly available at: https://github.com/kpup1710/CAMEx.
现有的模型训练和微调过程中的专家融合方法主要依赖于欧几里得几何,这假设了参数空间是平坦的。这个假设可能会限制模型的泛化能力,特别是在预训练阶段,参数流形可能表现出更复杂的曲率。考虑到曲率的融合方法通常需要额外的信息和计算资源来近似费舍尔信息矩阵,增加了内存开销。在本文中,我们介绍了CAMEx(基于曲率的专家融合),这是一种新的专家融合协议,它采用自然梯度来考虑参数流形的非欧几里得曲率。通过利用自然梯度,CAMEx更有效地适应了参数空间的结构,改进了模型更新与流形几何之间的对齐。这种方法提高了预训练和微调的效果,带来了更好的优化轨迹和改进的泛化能力,而且没有与曲率感知方法相关的巨大内存开销。我们的贡献有三点:(1)CAMEx在各种自然语言处理任务上显著优于传统的基于欧几里得的专家融合技术,在预训练和微调过程中提高了性能;(2)我们引入了一种动态融合架构,优化了资源利用,在降低计算成本的同时实现了高性能,促进了大型语言模型的效率扩展;(3)我们提供了理论和实证证据,证明了我们提出的方法的效率。代码公开在:https://github.com/kpup1710/CAMEx。
论文及项目相关链接
PDF 10 pages, 5 Figures, 7 Tables. Published at ICLR 2025
摘要
本文提出了CAMEx(基于曲率的专家合并),这是一种新的专家合并协议,它采用自然梯度来应对参数流形非欧几里得曲率的问题。通过利用自然梯度,CAMEx更有效地适应了参数空间的结构,改善了模型更新与流形几何之间的对齐。此方法在预训练和微调中都表现出色,能在不增加大量内存开销的情况下优化轨迹并提高泛化能力。本文的主要贡献包括:CAMEx在多种自然语言处理任务上显著优于传统的欧几里得专家合并技术;我们引入了一种动态合并架构,在优化资源利用的同时降低了计算成本,促进了大型语言模型的效率扩展;我们提供了理论和实证证据来证明我们方法的效率。
关键见解
- CAMEx引入了一种新的专家合并方法,考虑参数流形的非欧几里得曲率,通过自然梯度进行更有效的模型更新。
- CAMEx在预训练和微调阶段都表现出卓越的性能,提高了模型的泛化能力。
- 与传统的欧几里得专家合并技术相比,CAMEx在各种自然语言处理任务上表现更优秀。
- 动态合并架构的优化资源利用,降低了计算成本,有利于大型语言模型的效率扩展。
- 本文提供了理论和实证证据来证明CAMEx方法的高效性。
- CAMEx的代码已公开发布在GitHub上。
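下面用一个玩具示例对比“欧氏平均”与“按对角曲率(如梯度二阶矩)加权”的专家增量合并,只为说明“曲率感知合并”的基本思路;具体的自然梯度构造与合并协议请以论文与官方仓库为准。

```python
# 极简示意:用"对角曲率近似"对专家参数增量做加权合并,对比普通欧氏平均。
# 仅为概念演示,并非 CAMEx 论文的精确合并协议。
import numpy as np

def euclidean_merge(base, deltas, alpha=1.0):
    """欧氏几何下的专家合并:直接平均各专家的参数增量。"""
    return base + alpha * np.mean(deltas, axis=0)

def curvature_aware_merge(base, deltas, curvatures, alpha=1.0, eps=1e-8):
    """曲率感知合并(示意):按对角曲率(如梯度二阶矩估计)逆向缩放增量,
    近似自然梯度方向后再平均。curvatures[i] 与 deltas[i] 同形状。"""
    preconditioned = [d / (np.sqrt(c) + eps) for d, c in zip(deltas, curvatures)]
    return base + alpha * np.mean(preconditioned, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(4, 4))
    deltas = [rng.normal(size=(4, 4)) * 0.1 for _ in range(3)]
    curvs = [rng.uniform(0.5, 2.0, size=(4, 4)) for _ in range(3)]  # 假设的二阶矩估计
    print(np.linalg.norm(euclidean_merge(base, deltas) - curvature_aware_merge(base, deltas, curvs)))
```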
点此查看论文截图




Chain of Draft: Thinking Faster by Writing Less
Authors:Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He
Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks. By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks. Our code and data are available at https://github.com/sileix/chain-of-draft.
大型语言模型(LLM)通过思维链(CoT)提示等机制,在解决复杂推理任务方面表现出卓越的性能,强调详细、逐步推理。然而,人类通常采用更有效的策略:起草简洁的中间思想,只捕捉关键信息。在这项工作中,我们提出了受人类认知过程启发的全新范式——草稿链(CoD)。在此范式下,LLM在解决问题时会生成简洁而富有信息量的中间推理输出。通过减少冗余并专注于关键见解,CoD在准确性方面与CoT相匹配或超越,同时仅使用7.6%的符号,显著降低了各种推理任务的成本和延迟。我们的代码和数据在https://github.com/sileix/chain-of-draft可公开访问。
论文及项目相关链接
Summary:大型语言模型(LLM)通过如思维链(CoT)提示等机制在解决复杂推理任务方面表现出卓越性能。然而,人类通常采用更高效的策略,即撰写简洁的中间想法,仅捕捉关键信息。本研究提出一种新的范式——思维链草案(CoD),它受到人类认知过程的启发,使LLM在解决问题时能够生成简洁而富有信息量的中间推理输出。通过减少冗余信息并专注于关键见解,CoD在准确性方面与CoT相匹配或更胜一筹,同时仅使用7.6%的标记,显著降低了各种推理任务的成本和延迟。
Key Takeaways:
- 大型语言模型(LLM)通过思维链(CoT)提示解决复杂推理任务表现出卓越性能。
- 人类在解决推理问题时通常采用简洁的中间思考策略,仅关注关键信息。
- 研究提出了一种新的范式——思维链草案(CoD),它受到人类认知过程的启发。
- CoD使LLM能够生成简洁而富有信息量的中间推理输出。
- CoD在准确性方面与CoT相匹配,并在某些情况下表现更优秀。
- CoD显著降低了推理任务的成本和延迟,仅使用7.6%的标记。
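下面的小段代码演示如何把 CoT 与 CoD 两种提示词模板化。其中 CoD 指令的措辞是对摘要思想(“每步只保留极简草稿”)的近似复述,并非论文官方提示词的逐字引用,具体指令文本请以论文与官方仓库为准。

```python
# 示意:构造 CoT 与 CoD 两种提示词。CoD 的措辞为基于摘要的近似复述,并非官方提示词。
COT_INSTRUCTION = "Think step by step to answer the question. Give the final answer after '####'."
COD_INSTRUCTION = (
    "Think step by step, but keep only a minimum draft for each thinking step, "
    "with at most five words per step. Give the final answer after '####'."
)

def build_prompt(question: str, style: str = "cod") -> str:
    instruction = COD_INSTRUCTION if style == "cod" else COT_INSTRUCTION
    return f"{instruction}\n\nQ: {question}\nA:"

if __name__ == "__main__":
    q = "Jason had 20 lollipops. He gave some to Denny and now has 12. How many did he give away?"
    print(build_prompt(q, "cod"))
```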
点此查看论文截图







InductionBench: LLMs Fail in the Simplest Complexity Class
Authors:Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang
Large language models (LLMs) have shown remarkable improvements in reasoning and many existing benchmarks have been addressed by models such as o1 and o3 either fully or partially. However, a majority of these benchmarks emphasize deductive reasoning, including mathematical and coding tasks in which rules such as mathematical axioms or programming syntax are clearly defined, based on which LLMs can plan and apply these rules to arrive at a solution. In contrast, inductive reasoning, where one infers the underlying rules from observed data, remains less explored. Such inductive processes lie at the heart of scientific discovery, as they enable researchers to extract general principles from empirical observations. To assess whether LLMs possess this capacity, we introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs. Our experimental findings reveal that even the most advanced models available struggle to master the simplest complexity classes within the subregular hierarchy of functions, highlighting a notable deficiency in current LLMs’ inductive reasoning capabilities. Code and data are available at https://github.com/Wenyueh/inductive_reasoning_benchmark.
大型语言模型(LLM)在推理方面取得了显著的进步,许多现有的基准测试已经被o1和o3等模型完全或部分解决。然而,这些基准测试中大多数都侧重于演绎推理,包括数学和编码任务,这些任务中的规则(如数学公理或编程语法)是明确定义的,LLM可以规划并应用这些规则来得出解决方案。相比之下,归纳推理——即从观察到的数据中推断出潜在规则——的研究较少。这样的归纳过程处于科学发现的核心,因为它们使研究人员能够从实证观察中提取一般原则。为了评估LLM是否具备这种能力,我们引入了InductionBench,这是一个新的基准测试,旨在评估LLM的归纳推理能力。我们的实验结果表明,即使是最先进的模型,也难以掌握函数次正则(subregular)层次中最简单的复杂性类别,这突显了当前LLM归纳推理能力的显著不足。相关代码和数据可通过https://github.com/Wenyueh/inductive_reasoning_benchmark获取。
论文及项目相关链接
PDF 24 pages, 7 figures
Summary
大型语言模型(LLM)在推理方面取得显著进步,现有许多基准测试已被模型如o1和o3部分或完全解决。然而,大多数基准测试侧重于明确的规则定义,如数学公理或编程语法,LLM可以通过应用这些规则来解决问题。相比之下,关于从观察到的数据中推断潜在规则的归纳推理研究较少。为了评估LLM的这种能力,我们引入了InductionBench,这是一个新的基准测试,旨在评估LLM的归纳推理能力。实验发现,最先进的模型在函数次正规层次结构中最简单的复杂性类别中也存在困难,这表明当前LLM的归纳推理能力存在明显不足。
Key Takeaways
- LLM在推理方面取得显著进步,并已在多个基准测试中表现优异。
- 现有基准测试大多侧重于明确的规则定义的推理,如数学公理或编程语法。
- 归纳推理是一个相对较新的研究领域,尤其是与LLM的能力评估有关。
- InductionBench是一个新的基准测试,旨在评估LLM的归纳推理能力。
- 最先进的LLM在归纳推理方面仍存在明显不足,即使在简单的复杂性类别中也面临挑战。
- 数据和代码可通过特定链接获取。
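为了直观理解“从观测数据归纳底层规则”这一任务形式,下面用次正则类中一个非常简单的局部字符串改写函数生成输入—输出对,并给出一个一致性检查函数。具体函数、字母表与数据格式均为示例性假设,并非 InductionBench 的官方任务定义。

```python
# 示意:从一个极简的局部字符串改写函数生成观测数据,用于考察模型能否归纳底层规则。
# 函数类与数据格式仅为示例,并非 InductionBench 的官方任务定义。
import random

def target_function(s: str) -> str:
    """一个简单的局部(ISL 风格)改写:紧跟在 'b' 后面的 'a' 改写为 'c'。"""
    out = []
    for i, ch in enumerate(s):
        if ch == "a" and i > 0 and s[i - 1] == "b":
            out.append("c")
        else:
            out.append(ch)
    return "".join(out)

def sample_pairs(n: int, alphabet: str = "ab", max_len: int = 8, seed: int = 0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        pairs.append((s, target_function(s)))
    return pairs

def consistent(hypothesis, pairs) -> bool:
    """检查候选假设(字符串到字符串的函数)是否与全部观测一致。"""
    return all(hypothesis(x) == y for x, y in pairs)

if __name__ == "__main__":
    data = sample_pairs(5)
    print(data)
    print(consistent(target_function, data))  # True
```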
点此查看论文截图



SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Authors:Bohan Lyu, Siqiao Huang, Zichen Liang
Neural surrogate models have emerged as powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks. We investigate a novel application: using LLMs as surrogate models for code execution prediction. Given LLMs’ unique ability to understand and process diverse programs, they present a promising direction for building general-purpose surrogate models. To systematically investigate this capability, we introduce SURGE, a comprehensive benchmark with $1160$ problems covering $8$ key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive empirical analysis of $21$ open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes, with implications for automated software testing, program analysis, and computational resource optimization in data mining applications. Code and dataset are released at https://github.com/Imbernoulli/SURGE.
神经代理模型已经在数据挖掘中展现出强大且高效的工具能力。与此同时,大型语言模型(LLM)在代码相关任务中表现出了显著的能力。我们研究了一个新应用:使用LLM作为代码执行预测的代理模型。鉴于LLM理解和处理各种程序的独特能力,它们为构建通用代理模型提供了一个有前景的方向。为了系统地研究这一能力,我们引入了SURGE,这是一个包含1160个问题的综合基准测试,涵盖8个关键方面:多语言编程任务、竞赛级编程问题、仓库级代码分析、高成本科学计算、时间复杂度密集算法、错误代码分析、依赖于特定编译器或执行环境的程序以及形式化数学证明验证。通过对21个开源和专有LLM进行广泛的实证分析,我们研究了规模定律、数据效率和预测精度。我们的研究结果揭示了LLM作为计算过程的高效代理的可行性,为自动化软件测试、程序分析和数据挖掘应用中的计算资源优化提供了启示。代码和数据集已在https://github.com/Imbernoulli/SURGE发布。
论文及项目相关链接
Summary
神经网络代理模型在数据挖掘中展现出强大且高效的工具能力。同时,大型语言模型(LLM)在代码相关任务中表现出卓越的能力。本研究探索了一种新型应用:将LLM用作代码执行预测的代理模型。鉴于LLM理解和处理多样程序的能力,它们为构建通用代理模型提供了有前景的方向。为了系统地研究这一能力,我们推出了SURGE综合基准测试,包含涵盖八个关键方面的1160个问题:多语言编程任务、竞赛级编程问题、仓库级代码分析、高成本科学计算、时间复杂度密集算法、错误代码分析、特定编译器或执行环境依赖的程序以及形式化数学证明验证。通过对21个开源和专有LLM的广泛实证分析,我们研究了规模定律、数据效率和预测精度。研究结果表明,LLM作为计算过程的代理是可行的,对自动化软件测试、程序分析和数据挖掘应用的计算资源优化具有重要影响。
Key Takeaways
- 神经网络代理模型在数据挖掘中表现强大且高效。
- 大型语言模型(LLM)在代码相关任务中具有卓越能力。
- LLM可应用于代码执行预测,作为通用代理模型具有潜力。
- 推出SURGE综合基准测试,涵盖多个关键方面的编程问题。
- 通过实证分析研究了LLM的规模定律、数据效率和预测精度。
- LLM作为计算过程的代理具有可行性,对自动化软件测试和程序分析有重要影响。
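下面是一个评测流程的示意草图:对同一段程序,分别取真实执行的标准输出与 LLM 预测的输出进行比对。其中 query_llm 为占位函数(假设),题目构造与判分细节以 SURGE 论文和仓库为准。

```python
# 示意:让 LLM 充当"代码执行代理"的评测流程——给定程序与输入,模型预测标准输出,
# 再与真实执行结果比对。query_llm 为占位函数(假设)。
import subprocess, sys

def real_execution(source: str, stdin: str = "") -> str:
    """真实执行:在子进程中运行 Python 源码,返回标准输出。"""
    proc = subprocess.run([sys.executable, "-c", source],
                          input=stdin, capture_output=True, text=True, timeout=10)
    return proc.stdout

def query_llm(prompt: str) -> str:  # 占位:实际应调用某个 LLM API
    raise NotImplementedError

def surrogate_correct(source: str, stdin: str = "") -> bool:
    prompt = ("Predict the exact stdout of the following Python program.\n"
              f"Program:\n{source}\nStdin:\n{stdin}\nOutput:")
    return query_llm(prompt).strip() == real_execution(source, stdin).strip()
```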
点此查看论文截图




Preconditioned Inexact Stochastic ADMM for Deep Model
Authors:Shenglong Zhou, Ouya Wang, Ziyan Luo, Yongxu Zhu, Geoffrey Ye Li
The recent advancement of foundation models (FMs) has brought about a paradigm shift, revolutionizing various sectors worldwide. The popular optimizers used to train these models are stochastic gradient descent-based algorithms, which face inherent limitations, such as slow convergence and stringent assumptions for convergence. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. This paper develops an algorithm, PISA ({P}reconditioned {I}nexact {S}tochastic {A}lternating Direction Method of Multipliers), which enables scalable parallel computing and supports various second-moment schemes. Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables PISA to tackle the challenge of data heterogeneity effectively. Comprehensive experimental evaluations for training or fine-tuning diverse FMs, including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art optimizers.
近期基础模型(FMs)的发展带来了范式转变,为全球的多个行业带来了革命性的变革。用于训练这些模型的流行优化器是基于随机梯度下降的算法,它们面临着固有的局限性,如收敛速度慢和收敛条件假设苛刻。特别是分布式设置中产生的数据异构性对其理论和数值性能提出了重大挑战。本文开发了一种算法PISA(预条件非精确随机交替方向乘子法,Preconditioned Inexact Stochastic ADMM),可实现可扩展的并行计算并支持多种二阶矩方案。该算法建立在严格的理论保证之上,仅在梯度具有Lipschitz连续性的假设下即可收敛,从而去掉了随机方法通常施加的其它条件。这种能力使得PISA能够更有效地应对数据异构性的挑战。在训练或微调各种基础模型(包括视觉模型、大型语言模型、强化学习模型、生成对抗网络和循环神经网络等)的综合实验评估中,PISA展现出优于各种最先进优化器的数值性能。
论文及项目相关链接
Summary
基础模型(FMs)的最新进展带来了范式转变,正在全球范围内变革各个领域。当前用于训练模型的优化器大多基于随机梯度下降,但其存在固有的局限性,如收敛速度慢和收敛条件假设严格等。数据异构性带来的挑战严重影响了它们的理论和数值性能。本文提出了一种算法PISA,该算法可实现可扩展的并行计算并支持多种二阶矩方案。基于严格的理论保证,该算法仅在梯度满足Lipschitz连续性假设下即可收敛,无需其他随机方法常见的条件限制。这使得PISA能够更有效地应对数据异构性的挑战。对多种FMs(包括视觉模型、大型语言模型、强化学习模型、生成对抗网络和循环神经网络等)的实验评估显示,其数值性能优于各种最先进的优化器。
Key Takeaways
- 基础模型(FMs)的进展带来了各领域变革。
- 当前优化器存在收敛速度慢和假设条件严格等问题。
- 数据异构性对优化器的理论和数值性能带来了挑战。
- PISA算法具有可扩展的并行计算能力,并支持多种二阶矩方案。
- PISA基于严格的理论保证,在梯度满足Lipschitz连续性假设下收敛。
- PISA能有效应对数据异构性的挑战。
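下面给出一个“预条件 + 不精确 x 步”的一致性 ADMM 玩具演示:每个节点的子问题只用一步带二阶矩(对角)预条件的梯度更新来近似求解。这只是对摘要思路的示意性重构,问题设置、超参数与更新形式并非论文中 PISA 的精确算法。

```python
# 一致性 ADMM 的通用示意:x 步不精确求解——只做一步带二阶矩(对角)预条件的梯度更新。
# 仅为"预条件不精确随机 ADMM"思路的玩具演示,并非 PISA 的精确算法或超参数设置。
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_i = 5, 4, 50
x_true = rng.normal(size=d)
A = [rng.normal(size=(n_i, d)) for _ in range(n_workers)]
b = [A_i @ x_true + 0.01 * rng.normal(size=n_i) for A_i in A]  # 每个节点的本地数据

rho, lr, beta, eps = 1.0, 0.1, 0.9, 1e-8
x = [np.zeros(d) for _ in range(n_workers)]
u = [np.zeros(d) for _ in range(n_workers)]
v = [np.zeros(d) for _ in range(n_workers)]   # 梯度二阶矩估计(对角预条件)
z = np.zeros(d)

for t in range(500):
    for i in range(n_workers):
        # 本地增广拉格朗日的梯度(为简化直接用全量本地数据)
        g = A[i].T @ (A[i] @ x[i] - b[i]) / n_i + rho * (x[i] - z + u[i])
        v[i] = beta * v[i] + (1 - beta) * g * g
        x[i] = x[i] - lr * g / (np.sqrt(v[i]) + eps)   # 预条件的不精确 x 步
    z = np.mean([x[i] + u[i] for i in range(n_workers)], axis=0)  # 全局变量更新
    for i in range(n_workers):
        u[i] = u[i] + x[i] - z                                    # 对偶变量更新

print("与真值的误差:", np.linalg.norm(z - x_true))
```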
点此查看论文截图


Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond
Authors:Weiyu Chen, Xiaoyuan Zhang, Baijiong Lin, Xi Lin, Han Zhao, Qingfu Zhang, James T. Kwok
Multi-objective optimization (MOO) in deep learning aims to simultaneously optimize multiple conflicting objectives, a challenge frequently encountered in areas like multi-task learning and multi-criteria learning. Recent advancements in gradient-based MOO methods have enabled the discovery of diverse types of solutions, ranging from a single balanced solution to finite or even infinite Pareto sets, tailored to user needs. These developments have broad applications across domains such as reinforcement learning, computer vision, recommendation systems, and large language models. This survey provides the first comprehensive review of gradient-based MOO in deep learning, covering algorithms, theories, and practical applications. By unifying various approaches and identifying critical challenges, it serves as a foundational resource for driving innovation in this evolving field. A comprehensive list of MOO algorithms in deep learning is available at https://github.com/Baijiong-Lin/Awesome-Multi-Objective-Deep-Learning.
深度学习中的多目标优化(MOO)旨在同时优化多个相互冲突的目标,这在多任务学习和多标准学习等领域中经常遇到挑战。基于梯度的MOO方法的最新进展已经能够发现多种解决方案,从单一平衡解决方案到有限甚至无限的帕累托集,这些解决方案都符合用户的需求。这些发展在强化学习、计算机视觉、推荐系统和大型语言模型等领域有广泛应用。本文提供了基于梯度的深度学习MOO的首份全面综述,涵盖了算法、理论和实践应用。通过统一各种方法和确定关键挑战,它成为推动这一不断发展的领域创新的基础资源。深度学习中的MOO算法的综合列表可在https://github.com/Baijiong-Lin/Awesome-Multi-Objective-Deep-Learning找到。
论文及项目相关链接
Summary
深度学习中的多目标优化(MOO)旨在同时优化多个冲突目标,这在多任务学习和多标准学习中经常遇到。近期基于梯度的MOO方法的发展,使用户能够发现各种类型的解决方案,从单一平衡解决方案到有限的甚至无限的帕累托集,满足不同需求。该方法广泛应用于强化学习、计算机视觉、推荐系统和大语言模型等领域。这篇综述文章首次全面回顾了深度学习中的基于梯度的MOO,涵盖了算法、理论和应用。它为推动这一新兴领域的创新提供了基础资源。
Key Takeaways
- 多目标优化(MOO)在深度学习中用于同时优化多个冲突目标。
- 基于梯度的MOO方法能发现多种类型的解决方案,满足不同用户需求。
- 该方法广泛应用于强化学习、计算机视觉、推荐系统和大语言模型。
- 综述文章全面介绍了深度学习中的基于梯度的MOO,包括算法、理论和应用。
- 该文章为驱动MOO领域创新提供了基础资源。
- 文章中提到的挑战对于未来研究具有重要意义。
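作为综述所涵盖方法的一个具体例子,下面实现两目标情形下 MGDA 的闭式权重:在 [0,1] 内选取 α 使组合梯度范数最小,从而得到两任务的共同下降方向。这只是众多基于梯度的 MOO 算法中的一个特例演示,并非综述的全部内容。

```python
# 示意:MGDA 在两个目标时的闭式解——求 α ∈ [0,1] 使 ||α·g1 + (1-α)·g2|| 最小,
# 得到的组合方向是两任务的共同下降方向。
import numpy as np

def mgda_two_tasks(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        alpha = 0.5
    else:
        alpha = float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2  # 用该方向做一步参数更新

if __name__ == "__main__":
    g1 = np.array([1.0, 0.0])
    g2 = np.array([0.0, 1.0])
    print(mgda_two_tasks(g1, g2))  # [0.5, 0.5]
```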
点此查看论文截图




Monet: Mixture of Monosemantic Experts for Transformers
Authors:Jungwoo Park, Young Jin Ahn, Kee-Eung Kim, Jaewoo Kang
Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity – where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.
理解大型语言模型(LLM)的内部计算对于将其与人类价值观对齐并防止生成有毒内容等不当行为至关重要。然而,多义性(即单个神经元对多个不相关概念的响应)阻碍了机械解释性。稀疏自动编码器(SAE)曾试图通过稀疏字典学习来解开这些特征,但由于依赖事后重建损失而损害了LLM的性能。为了解决这一问题,我们引入了“Transformer的单语义专家混合体”(Monet)架构,它将稀疏字典学习直接集成到端到端的专家混合预训练过程中。我们新颖的专家分解方法能够实现每层专家数量达到262,144个,同时总参数数量与专家数量的平方根成比例增长。我们的分析证明了专家之间知识的相互独立性,并展示了单个专家所包含的参数知识。此外,Monet允许在域、语言和毒性方面进行知识操控,同时不会降低整体性能。我们对透明LLM的追求突显了增加专家数量以提升机械解释性的潜力,并可直接切除内部知识以从根本上调整模型行为。源代码和预训练检查点可访问https://github.com/dmis-lab/Monet。
论文及项目相关链接
Summary
大型语言模型(LLM)的内部计算理解对于与人类价值观对齐及防止生成有毒内容等不可取行为至关重要。然而,由于神经元的多义性(即单个神经元对多个无关概念的响应),机械解释性受阻。为解决此问题,我们引入Monet架构,将稀疏字典学习直接融入端到端的专家混合预训练。Monet的新型专家分解法可实现每层专家数量增至262,144个,同时总参数与专家数量的平方根成比例增长。分析显示,专家间的知识相互独立,展示了单个专家内部的参数知识。此外,Monet可实现跨领域、语言的知识操控及毒性缓解,且不影响整体性能。我们对透明LLM的追求突显了增加专家数量以提升机械解释性的潜力,并可直接调整内部知识以改变模型行为。
Key Takeaways
- 理解LLM的内部计算对于与人类价值观对齐和防止不良行为至关重要。
- 神经元的多义性阻碍了机械解释性。
- Sparse Autoencoders(SAE)尝试通过稀疏字典学习解开这些特征,但影响了LLM性能。
- 引入Monet架构,将稀疏字典学习直接融入专家混合预训练的端到端过程。
- Monet的新型专家分解法允许每层专家数量大幅增加。
- 专家间的知识相互独立,展示了单个专家的参数知识。
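下面用一个玩具类演示“专家分解”为何能让参数量随专家数的平方根增长:m 个“下半段”与 m 个“上半段”子专家两两组合出 m×m 个逻辑专家(m=512 时即 262,144 个)。这是对论文思想的概念性重构,并非 Monet 的真实架构或路由方式。

```python
# 概念示意:参数量随专家数的"平方根"增长——2m 份子专家参数组合出 m*m 个逻辑专家。
# 这是对 Monet 专家分解思想的玩具重构,非论文的精确架构。
import numpy as np

class DecomposedExperts:
    def __init__(self, d_model: int, d_hidden: int, m: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.m = m                                                   # 逻辑专家数 = m * m
        self.W_in = rng.normal(size=(m, d_model, d_hidden)) * 0.02   # m 个"下半段"子专家
        self.W_out = rng.normal(size=(m, d_hidden, d_model)) * 0.02  # m 个"上半段"子专家

    def expert_forward(self, x: np.ndarray, expert_id: int) -> np.ndarray:
        i, j = divmod(expert_id, self.m)          # 逻辑专家 (i, j) 由两个子专家拼成
        h = np.maximum(x @ self.W_in[i], 0.0)     # ReLU
        return h @ self.W_out[j]

if __name__ == "__main__":
    experts = DecomposedExperts(d_model=16, d_hidden=8, m=512)  # 512*512 = 262,144 个逻辑专家
    y = experts.expert_forward(np.zeros(16), expert_id=123456)
    print(y.shape)  # (16,)
```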
点此查看论文截图



Controllable Context Sensitivity and the Knob Behind It
Authors:Julian Minder, Kevin Du, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell
When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question-answering. In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob in not only that model but also non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model’s performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.
在做出预测时,语言模型必须在依赖上下文和先前知识之间进行权衡。选择模型对上下文的敏感程度是一个基本功能,因为这使模型能够在诸如检索增强生成和问答等任务中表现出色。在本文中,我们寻找一个控制这种敏感性的旋钮,以确定语言模型是从上下文还是先前知识中得出答案。为了指导这次搜索,我们设计了一个可控上下文敏感度的任务。在此任务中,我们首先向模型提供上下文(例如“巴黎在英国”)和问题(“巴黎在哪里?”);然后指示模型使用其先前的知识或上下文知识,并评估它是否能够针对两种意图(法国或英国)给出正确的答案。在此任务上进行微调后,Llama-3.1、Mistral-v0.3和Gemma-2的指令调整版可以以高准确率(85-95%)解决此问题。通过分析这些高性能模型,我们利用一种新颖的线性时间算法缩小了可能对上下文敏感度起关键作用的层的范围。然后,在每个模型中,我们在单层中识别出一个一维子空间,该子空间编码了模型是遵循上下文还是先前知识。有趣的是,虽然这个子空间是在微调模型中找到的,但我们发现它不仅在该模型中是有效的旋钮,在同一模型家族中未经微调的指令模型和基础模型中同样有效。最后,我们发现模型的性能与其在该子空间中将“遵循上下文”与“忽略上下文”的答案区分开的清晰程度之间存在很强的相关性。这些结果表明,单个子空间促进了模型在上下文和先前知识之间的选择,暗示了一个简单的基本机制在控制这种行为。
论文及项目相关链接
PDF Published as a conference paper at ICLR 2025
摘要
本文探讨了语言模型在预测时如何平衡上下文依赖与先验知识依赖的权衡问题。文章提出了一种控制语境敏感度的任务,通过设计特定任务来指导模型如何选择使用上下文知识或先验知识。研究发现,通过微调任务,某些模型能够准确地在不同情境下选择使用上下文知识或先验知识,并表现出高准确率。进一步的分析揭示了控制语境敏感度的关键层级和子空间,这些子空间在不同模型和情境下均有效。研究还发现模型性能与在子空间中区分上下文一致和忽略上下文的答案的能力之间存在强烈关联。这些结果揭示了单一子空间如何帮助模型在上下文和先验知识之间做出选择,并暗示了一种控制这种行为的基本机制。
关键见解
- 语言模型在预测时需权衡上下文依赖与先验知识依赖。
- 通过设计特定任务来指导模型如何选择使用上下文知识或先验知识。
- 高性能模型能够通过微调任务来准确地在不同情境下选择使用上下文知识或先验知识。
- 揭示了控制语境敏感度的关键层级和子空间。
- 该子空间不仅适用于微调过的模型,也适用于同一模型家族中未经微调的指令模型和基础模型。
- 模型性能与子空间中区分上下文一致性和忽略上下文的答案的能力之间存在强烈关联。
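下面用 PyTorch 前向钩子演示“在某一层沿一维方向平移隐藏表示”这种干预方式。示例中的模型是一个玩具 MLP,方向向量为随机占位,层号与系数 alpha 的含义均为假设;论文中该方向需在真实 LLM 的特定层上用其方法求得。

```python
# 示意:在某一层的隐藏表示上沿一个一维方向加偏移,作为"上下文敏感度旋钮"的干预机制。
# 玩具 MLP + 随机方向,仅演示机制;真实方向需按论文方法在 LLM 的特定层求得。
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 4))
knob_direction = torch.randn(32)
knob_direction = knob_direction / knob_direction.norm()  # 一维子空间的单位方向(占位)
alpha = 5.0                                              # 假设的约定:>0 偏向"跟随上下文"

def add_knob(module, inputs, output):
    return output + alpha * knob_direction               # 对该层输出做线性干预

handle = model[2].register_forward_hook(add_knob)        # 干预第二个线性层的输出
x = torch.randn(1, 16)
print(model(x))
handle.remove()
```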
点此查看论文截图


CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR
Authors:Kadir Burak Buldu, Süleyman Özdel, Ka Hei Carrie Lau, Mengdi Wang, Daniel Saad, Sofie Schönborn, Auxane Boch, Enkelejda Kasneci, Efe Bozkir
Recent developments in computer graphics, machine learning, and sensor technologies enable numerous opportunities for extended reality (XR) setups for everyday life, from skills training to entertainment. With large corporations offering affordable consumer-grade head-mounted displays (HMDs), XR will likely become pervasive, and HMDs will develop as personal devices like smartphones and tablets. However, having intelligent spaces and naturalistic interactions in XR is as important as technological advances so that users grow their engagement in virtual and augmented spaces. To this end, large language model (LLM)–powered non-player characters (NPCs) with speech-to-text (STT) and text-to-speech (TTS) models bring significant advantages over conventional or pre-scripted NPCs for facilitating more natural conversational user interfaces (CUIs) in XR. This paper provides the community with an open-source, customizable, extendable, and privacy-aware Unity package, CUIfy, that facilitates speech-based NPC-user interaction with widely used LLMs, STT, and TTS models. Our package also supports multiple LLM-powered NPCs per environment and minimizes latency between different computational models through streaming to achieve usable interactions between users and NPCs. We publish our source code in the following repository: https://gitlab.lrz.de/hctl/cuify
最近的计算机图形学、机器学习和传感器技术的发展为扩展现实(XR)在日常生活中的运用提供了众多机会,无论是技能培训还是娱乐。随着大型企业提供经济实惠的消费级头戴显示器(HMD),XR很可能变得普及,HMD将发展成为像智能手机和平板电脑一样的个人设备。然而,拥有智能空间和自然交互的XR与科技进步一样重要,使用户能够在虚拟和增强空间中增加参与度。为此,使用大型语言模型(LLM)驱动的非玩家角色(NPC)通过语音转文本(STT)和文本转语音(TTS)模型,为XR中更自然的对话式用户界面(CUI)带来了相较于传统或预设NPC的显著优势。本文为社区提供了一个开源、可定制、可扩展且注重隐私的Unity软件包CUIfy,它促进了基于语音的NPC-用户交互,广泛使用了LLM、STT和TTS模型。我们的软件包还支持每个环境多个LLM驱动的NPC,并通过流式传输减少不同计算模型之间的延迟,从而实现用户与NPC之间的可用交互。我们在以下仓库中发布源代码:https://gitlab.lrz.de/hctl/cuify 。
论文及项目相关链接
PDF 7th IEEE International Conference on Artificial Intelligence & eXtended and Virtual Reality (IEEE AIxVR 2025)
Summary
随着计算机图形学、机器学习和传感器技术的最新发展,扩展现实(XR)设置在日常生活中的机会日益增多,从技能培训到娱乐皆可应用。大型企业推出的经济实惠的消费级头戴式显示器(HMDs)将使XR普及,HMDs将像智能手机和平板电脑一样成为个人设备。然而,实现XR中的智能空间和自然交互与科技进步同样重要,以使用户增加对虚拟和增强空间的参与度。为此,利用大型语言模型(LLM)驱动的非玩家角色(NPCs)结合语音转文本(STT)和文本转语音(TTS)模型,可为XR带来比传统或预先编写脚本的NPC更自然的对话式用户界面(CUI)。本文向社区提供了一个开源、可定制、可扩展且注重隐私的Unity软件包CUIfy,它基于广泛使用的LLM、STT和TTS模型,促进基于语音的NPC-用户交互。我们的软件包还支持每个环境中有多个LLM驱动的NPC,并通过流式传输最小化不同计算模型之间的延迟,以实现用户和NPC之间可用的交互。
Key Takeaways
- 扩展现实(XR)在日常生活中的机会增多,涉及技能培训、娱乐等多个领域。
- 头戴式显示器(HMDs)将成为个人设备,如同智能手机和平板电脑。
- 智能空间和自然交互在XR中的重要性不亚于科技进步。
- 大型语言模型(LLM)驱动的非玩家角色(NPCs)结合语音转文本(STT)和文本转语音(TTS)技术,可实现更自然的对话式用户界面(CUI)。
- 本文提供了一个开源的Unity软件包CUIfy,支持基于语音的NPC-用户交互。
- CUIfy支持多个LLM驱动的NPC在同一环境内,提高交互多样性。
- 通过流技术最小化计算模型间的延迟,提高用户与NPC之间的交互体验。
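CUIfy 本身是 Unity(C#)软件包;为了说明“通过流式传输衔接 STT → LLM → TTS 以降低延迟”的思路,下面用 Python 协程给出一个与具体实现无关的流水线示意,三个阶段均为假设的占位实现。

```python
# 概念示意:边生成边消费的 STT → LLM → TTS 流水线,以减少端到端延迟。
# 三个阶段均为占位实现;真实软件包为 Unity(C#),此处仅演示流式思想。
import asyncio

async def stt_stream():                      # 占位:语音转文本,分片产出
    for chunk in ["你好,", "今天", "天气怎么样?"]:
        await asyncio.sleep(0.1)
        yield chunk

async def llm_stream(user_text: str):        # 占位:LLM 流式生成回复 token
    for token in ["今天", "多云", ",适合", "散步。"]:
        await asyncio.sleep(0.05)
        yield token

async def tts_play(text_piece: str):         # 占位:对一小段文本合成并播放语音
    await asyncio.sleep(0.05)
    print("播放:", text_piece)

async def main():
    user_text = "".join([c async for c in stt_stream()])
    buffer = ""
    async for token in llm_stream(user_text):  # LLM 尚未生成完就开始 TTS
        buffer += token
        if buffer.endswith(("。", ",", "?")):
            await tts_play(buffer)
            buffer = ""
    if buffer:
        await tts_play(buffer)

asyncio.run(main())
```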
点此查看论文截图




Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Authors:Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster
Large language models (LLMs) are expensive to deploy. Parameter sharing offers a possible path towards reducing their size and cost, but its effectiveness in modern LLMs remains fairly limited. In this work, we revisit “layer tying” as form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller “Recursive Transformers” that share parameters across layers, with minimal loss of performance. Here, our Recursive Transformers are efficiently initialized from standard pretrained Transformers, but only use a single block of unique layers that is then repeated multiple times in a loop. We further improve performance by introducing Relaxed Recursive Transformers that add flexibility to the layer tying constraint via depth-wise low-rank adaptation (LoRA) modules, yet still preserve the compactness of the overall model. We show that our recursive models (e.g., recursive Gemma 1B) outperform both similar-sized vanilla pretrained models (such as TinyLlama 1.1B and Pythia 1B) and knowledge distillation baselines – and can even recover most of the performance of the original “full-size” model (e.g., Gemma 2B with no shared parameters). Finally, we propose Continuous Depth-wise Batching, a promising new inference paradigm enabled by the Recursive Transformer when paired with early exiting. In a theoretical analysis, we show that this has the potential to lead to significant (2-3x) gains in inference throughput.
大型语言模型(LLM)的部署成本很高。参数共享提供了一条减小其规模和成本的可能路径,但它在现代LLM中的有效性仍然相当有限。在这项工作中,我们重新审视了Transformer中的参数共享形式“层绑定”,并介绍了将现有LLM转换为较小的“递归Transformer”的新方法,这些递归Transformer在层之间共享参数,同时性能损失最小。我们的递归Transformer由标准预训练Transformer高效初始化,但只使用单个唯一的层块,然后在循环中多次重复。我们进一步引入宽松递归Transformer(Relaxed Recursive Transformers),通过逐深度的低秩适应(LoRA)模块为层绑定约束增加灵活性,同时仍保持整体模型的紧凑性,从而进一步提升性能。我们展示了我们的递归模型(例如,递归Gemma 1B)优于类似大小的普通预训练模型(如TinyLlama 1.1B和Pythia 1B)和知识蒸馏基线,甚至可以恢复原始“全尺寸”模型(例如,没有共享参数的Gemma 2B)的大部分性能。最后,我们提出了连续深度分批处理(Continuous Depth-wise Batching),这是由递归Transformer与早期退出相结合而实现的具有前景的新推理范式。在理论分析中,我们证明了这有可能带来显著(2-3倍)的推理吞吐量提升。
论文及项目相关链接
PDF ICLR 2025; 49 pages, 17 figures, 19 tables
Summary
大型语言模型(LLM)部署成本高昂。本研究重新审视了Transformer中的参数共享方式“层绑定”,并引入了将现有LLM转换为较小的“递归Transformer”的新方法,这些递归Transformer在层之间共享参数,同时性能损失最小化。通过标准预训练Transformer有效地初始化递归Transformer,并仅使用一个独特的图层块,然后在循环中多次重复。通过引入灵活的递归Transformer,即通过深度低秩适应(LoRA)模块增加层绑定约束的灵活性,仍保留了模型的紧凑性。递归模型(例如递归Gemma 1B)表现出优于类似大小的普通预训练模型(如TinyLlama 1.1B和Pythia 1B)和知识蒸馏基线的能力,甚至可以恢复大部分原始“全尺寸”模型(如没有共享参数的Gemma 2B)的性能。最后,提出了由递归Transformer与早期退出相结合实现的全新推理模式——连续深度分批,理论上有潜力实现推理吞吐量的显著(2-3倍)提升。
Key Takeaways
- 研究探讨了如何使用参数共享来减小大型语言模型(LLM)的尺寸和降低成本。
- 提出了“递归Transformer”模型,通过层绑定实现参数共享,并最小化性能损失。
- 递归Transformer通过标准预训练Transformer进行高效初始化,且仅使用一个独特图层块进行多次重复。
- 通过引入灵活的递归Transformer和深度低秩适应(LoRA)模块,提高了模型性能并保持了其紧凑性。
- 递归模型在性能上超越了类似的预训练模型和知识蒸馏基线,并接近全尺寸模型的表现。
- 结合早期退出机制,提出了新的推理模式——连续深度分批,有望显著提高推理效率。
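下面的玩具模块演示“共享一个参数块循环 T 次,并为每个深度配独立的低秩 LoRA 增量”的组合方式。真实模型中被绑定的是完整的 Transformer 块,这里用单个线性层代替,仅作结构示意,并非论文的实现。

```python
# 玩具示意:一个共享参数块循环 T 次,每个"深度"配独立的低秩 LoRA 增量,
# 对应"宽松的层绑定"。真实模型绑定的是整个 Transformer 块,这里用单个线性层代替。
import torch
import torch.nn as nn

class RelaxedRecursiveToy(nn.Module):
    def __init__(self, d: int, num_loops: int, rank: int = 4):
        super().__init__()
        self.shared = nn.Linear(d, d)   # 所有深度共享的块
        self.lora_A = nn.ParameterList([nn.Parameter(torch.zeros(rank, d)) for _ in range(num_loops)])
        self.lora_B = nn.ParameterList([nn.Parameter(torch.randn(d, rank) * 0.01) for _ in range(num_loops)])
        self.num_loops = num_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for t in range(self.num_loops):                # 同一个块循环展开
            delta = self.lora_B[t] @ self.lora_A[t]    # 第 t 个深度的低秩权重增量
            x = torch.relu(x @ (self.shared.weight + delta).T + self.shared.bias)
        return x

if __name__ == "__main__":
    model = RelaxedRecursiveToy(d=16, num_loops=3)
    print(model(torch.randn(2, 16)).shape)             # torch.Size([2, 16])
```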
点此查看论文截图





Can Knowledge Editing Really Correct Hallucinations?
Authors:Baixiang Huang, Canyu Chen, Xiongxiao Xu, Ali Payani, Kai Shu
Large Language Models (LLMs) suffer from hallucinations, referring to the non-factual information in generated content, despite their superior capacities across tasks. Meanwhile, knowledge editing has been developed as a new popular paradigm to correct erroneous factual knowledge encoded in LLMs with the advantage of avoiding retraining from scratch. However, a common issue of existing evaluation datasets for knowledge editing is that they do not ensure that LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, it is hard to directly adopt the performance to assess the effectiveness of different knowledge editing methods in correcting hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We proposed HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods in a holistic way on five dimensions including Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we have provided new insights into the potentials and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing.
大型语言模型(LLM)存在虚构问题,即生成内容中的非事实信息,尽管其在各项任务中的能力超群。同时,知识编辑作为一种新的流行范式,以其避免从头开始重新训练的优势,被开发出来以纠正编码在LLM中的错误事实知识。然而,现有知识编辑评估数据集的一个普遍问题是,它们并不能确保LLM在编辑前实际上生成了虚构的答案。当在各种技术编辑后的LLM上使用此类数据集进行评估时,很难直接采用其性能来评估不同知识编辑方法在纠正虚构内容方面的有效性。因此,一个基本的问题仍未得到充分验证:知识编辑真的能够纠正LLM中的虚构内容吗?我们提出了HalluEditBench,以全面评估知识编辑方法在纠正现实世界中的虚构内容方面的表现。首先,我们严格构建了一个大规模虚构数据集,包含9个领域、26个主题和超过6000个虚构案例。然后,我们从五个维度全面评估了知识编辑方法的表现,包括效率、泛化能力、可移植性、局部性和稳健性。通过HalluEditBench,我们对不同知识编辑方法在纠正虚构内容方面的潜力与局限性有了全新的认识,这可以激发未来的改进,并推动知识编辑领域的发展。
论文及项目相关链接
PDF ICLR 2025. Main paper: 10 pages; total: 34 pages (including appendix). The first two authors contributed equally to this work. Code, data, results, and additional resources are available on the project website: https://llm-editing.github.io
Summary
大语言模型(LLM)存在生成内容中的非事实信息问题,即所谓的“幻觉”。为纠正编码在LLM中的错误事实知识,知识编辑作为一种新型范式得到发展,其优势在于避免从头开始重新训练。然而,现有评估数据集普遍存在一个问题,即它们无法保证LLM在编辑前实际上生成了幻觉答案。当在各种知识编辑技术评估数据集上评估LLM时,很难直接采用性能评估来检验不同知识编辑方法在纠正幻觉方面的效果。为此,我们提出HalluEditBench,全面评估知识编辑方法在纠正现实幻觉方面的能力。我们严格构建了包含9个领域、26个主题和6000多个幻觉的大规模幻觉数据集,从效率、通用性、可移植性、局部性和稳健性五个维度全面评估知识编辑方法的性能。HalluEditBench为不同知识编辑方法在纠正幻觉方面的潜力和局限性提供了新的见解,有望激发未来改进并推动知识编辑领域的进步。
Key Takeaways
- LLMs存在生成非事实信息的幻觉问题。
- 知识编辑作为一种新兴范式,旨在纠正LLMs中的错误事实知识,且避免从头开始重新训练。
- 现有评估数据集在评估知识编辑方法时存在缺陷,无法保证LLMs在编辑前生成幻觉答案。
- HalluEditBench旨在全面评估知识编辑方法在纠正现实幻觉方面的能力。
- HalluEditBench包括一个大规模幻觉数据集,涵盖多个领域和主题。
- 知识编辑方法的评估涉及五个维度:效率、通用性、可移植性、局部性和稳健性。
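下面的小函数示意该基准的核心前提:先筛选出模型在编辑前确实产生幻觉的问题,再统计编辑后的纠正率(Efficacy)。answer_before / answer_after 为占位回调(假设),其余四个维度的判分此处省略。

```python
# 示意:先确认模型在编辑前确实答错(幻觉),再统计编辑后的纠正率(Efficacy)。
# 回调函数为占位,具体判分细节以 HalluEditBench 论文与项目网站为准。
from typing import Callable

def efficacy(questions: list,
             answer_before: Callable[[str], str],
             answer_after: Callable[[str], str]) -> float:
    # 只保留"编辑前确实答错(幻觉)"的问题
    hallucinated = [q for q in questions if answer_before(q["question"]).strip() != q["gold"]]
    if not hallucinated:
        return 0.0
    fixed = sum(answer_after(q["question"]).strip() == q["gold"] for q in hallucinated)
    return fixed / len(hallucinated)
```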
点此查看论文截图





LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch
Authors:Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, Yang Yu
Optimization problems are prevalent across various scenarios. Formulating and then solving optimization problems described by natural language often requires highly specialized human expertise, which could block the widespread application of optimization-based decision making. To automate problem formulation and solving, leveraging large language models (LLMs) has emerged as a potential way. However, this kind of approach suffers from the issue of optimization generalization. Namely, the accuracy of most current LLM-based methods and the generality of optimization problem types that they can model are still limited. In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. Starting from the natural language descriptions of optimization problems and a pre-trained LLM, LLMOPT constructs the introduced five-element formulation as a universal model for learning to define diverse optimization problem types. Then, LLMOPT employs the multi-instruction tuning to enhance both problem formalization and solver code generation accuracy and generality. After that, to prevent hallucinations in LLMs, such as sacrificing solving accuracy to avoid execution errors, the model alignment and self-correction mechanism are adopted in LLMOPT. We evaluate the optimization generalization ability of LLMOPT and compared methods across six real-world datasets covering roughly 20 fields such as health, environment, energy and manufacturing, etc. Extensive experiment results show that LLMOPT is able to model various optimization problem types such as linear/nonlinear programming, mixed integer programming, and combinatorial optimization, and achieves a notable 11.08% average solving accuracy improvement compared with the state-of-the-art methods. The code is available at https://github.com/caigaojiang/LLMOPT.
优化问题在各种场景中普遍存在。对以自然语言描述的优化问题进行形式化建模并求解,通常需要高度专业化的人类专业知识,这可能会阻碍基于优化的决策制定的广泛应用。为了自动化问题的形式化与求解,利用大型语言模型(LLM)已成为一种可能的途径。然而,这类方法存在优化泛化的问题:当前大多数基于LLM的方法的准确度,以及它们能够建模的优化问题类型的普遍性,仍然有限。在本文中,我们提出了一个统一的基于学习的框架,称为LLMOPT,以提升优化泛化能力。LLMOPT从优化问题的自然语言描述和预训练LLM开始,构建了五元素形式化(five-element formulation)作为学习定义多种优化问题类型的通用模型。然后,LLMOPT采用多指令微调,以提高问题形式化和求解器代码生成的准确性和普遍性。此外,为了防止LLM产生幻觉(例如为避免执行错误而牺牲求解精度),LLMOPT采用了模型对齐和自我校正机制。我们在涵盖大约20个领域(如健康、环境、能源和制造等)的六个真实世界数据集上评估了LLMOPT和对比方法的优化泛化能力。大量的实验结果表明,LLMOPT能够建模多种优化问题类型,如线性/非线性规划、混合整数规划和组合优化等,与最先进的方法相比,实现了11.08%的平均求解精度提升。代码可在https://github.com/caigaojiang/LLMOPT上找到。
论文及项目相关链接
摘要
优化问题广泛存在于各种场景中。自然语言描述优化问题的形成和解决需要高度专业化的人类专业知识,这阻碍了基于优化的决策制定的广泛应用。为了自动化问题形成和解决,利用大型语言模型(LLM)已经成为一种可能的方法。然而,这种方法面临着优化泛化的问题。即当前大多数基于LLM的方法的准确性和它们能够建模的优化问题类型的普遍性仍然受到限制。在本文中,我们提出了一个名为LLMOPT的统一学习框架,以提高优化泛化能力。LLMOPT从优化问题的自然语言描述和一个预训练的大型语言模型开始,构建了五元素公式作为学习定义多种优化问题类型的通用模型。然后,LLMOPT采用多指令调整增强问题形式化和求解器代码生成准确性和普遍性。之后,为了防止LLM中的幻觉,如牺牲求解精度以避免执行错误,在LLMOPT中采用了模型对齐和自我校正机制。我们在涵盖大约20个领域的六个真实数据集上评估了LLMOPT和对比方法的优化泛化能力,如健康、环境、能源和制造等。大量实验结果表明,LLMOPT能够模拟多种优化问题类型,如线性/非线性规划、混合整数规划和组合优化,与最先进的方法相比,平均求解精度提高了11.08%。代码可在https://github.com/caigaojiang/LLMOPT上找到。
关键见解
- 大型语言模型(LLM)可用于自动化优化问题的形成和解决,但存在优化泛化的问题。
- LLMOPT框架通过构建五元素公式来提高优化泛化能力,并处理多种优化问题类型。
- LLMOPT采用多指令调整增强问题和求解器代码生成的准确性和普遍性。
- LLMOPT采用模型对齐和自我校正机制以防止LLM中的幻觉。
- 在六个真实数据集上的实验结果表明,LLMOPT在多种优化问题类型上具有显著优势,平均求解精度较现有方法提高了11.08%。
- LLMOPT代码库公开可用,便于进一步研究和应用。
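下面用一个数据类承载“五元素”式的问题描述,并把一个极小的线性规划手工翻译成 scipy 求解代码,示意“先形式化、再生成求解器代码”的流程。五个元素的具体划分是对论文 five-element formulation 的假设;在 LLMOPT 中形式化与代码生成由 LLM 完成,这里为演示而手工写死。

```python
# 示意:用数据结构承载"五元素"式的问题描述(元素划分为假设),
# 再把一个极小的线性规划翻译成求解器代码,演示端到端流程。
from dataclasses import dataclass, field
from scipy.optimize import linprog

@dataclass
class FiveElementProblem:
    sets: dict = field(default_factory=dict)
    parameters: dict = field(default_factory=dict)
    variables: list = field(default_factory=list)
    objective: str = ""
    constraints: list = field(default_factory=list)

problem = FiveElementProblem(
    sets={"products": ["A", "B"]},
    parameters={"profit": [3, 5], "machine_hours": [1, 2], "capacity": 10},
    variables=["x_A >= 0", "x_B >= 0"],
    objective="maximize 3*x_A + 5*x_B",
    constraints=["1*x_A + 2*x_B <= 10"],
)

# 手工"编译"为 linprog 形式(linprog 做最小化,故目标系数取负)
res = linprog(c=[-3, -5], A_ub=[[1, 2]], b_ub=[10], bounds=[(0, None), (0, None)])
print("最优解:", res.x, "最大利润:", -res.fun)
```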
点此查看论文截图



Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Authors:Mingyang Chen, Haoze Sun, Tianpeng Li, Fan Yang, Hao Liang, Keer Lu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen
Large Language Models (LLMs) have exhibited significant potential in performing diverse tasks, including the ability to call functions or use external tools to enhance their performance. While current research on function calling by LLMs primarily focuses on single-turn interactions, this paper addresses the overlooked necessity for LLMs to engage in multi-turn function calling–critical for handling compositional, real-world queries that require planning with functions but not only use functions. To facilitate this, we introduce an approach, BUTTON, which generates synthetic compositional instruction tuning data via bottom-up instruction construction and top-down trajectory generation. In the bottom-up phase, we generate simple atomic tasks based on real-world scenarios and build compositional tasks using heuristic strategies based on atomic tasks. Corresponding function definitions are then synthesized for these compositional tasks. The top-down phase features a multi-agent environment where interactions among simulated humans, assistants, and tools are utilized to gather multi-turn function calling trajectories. This approach ensures task compositionality and allows for effective function and trajectory generation by examining atomic tasks within compositional tasks. We produce a dataset BUTTONInstruct comprising 8k data points and demonstrate its effectiveness through extensive experiments across various LLMs.
大型语言模型(LLM)在执行多样化任务时表现出了巨大的潜力,包括调用函数或使用外部工具来提升性能的能力。虽然目前关于LLM的函数调用研究主要集中在单轮交互上,但本文强调了LLM进行多轮函数调用的必要性——这对于处理需要规划函数而非仅使用函数的组合、现实世界查询至关重要。为了促进这一点,我们提出了一种方法——BUTTON,它通过自下而上的指令构建和自上而下的轨迹生成,生成合成组合指令微调数据。在自下而上的阶段,我们基于现实场景生成简单的原子任务,并使用基于原子任务的启发式策略构建组合任务。然后为这些组合任务合成相应的函数定义。自上而下阶段的特点是多智能体环境,其中模拟人类、助理和工具之间的交互用于收集多轮函数调用轨迹。这种方法确保了任务的组合性,并允许通过检查组合任务中的原子任务来进行有效的函数和轨迹生成。我们制作了一个包含8000个数据点的BUTTONInstruct数据集,并通过在各种LLM上进行的大量实验证明了其有效性。
论文及项目相关链接
PDF Accepted to ICLR 2025
Summary
大型语言模型(LLM)具有调用函数或使用外部工具提升性能的能力,目前研究主要关注单轮交互中的函数调用。本文为解决处理需要规划函数而非仅使用函数的组合式、现实世界查询的需求,提出了一种名为BUTTON的方法。该方法通过自下而上的指令构建和自上而下的轨迹生成,生成合成组合指令调整数据。首先,基于真实场景生成简单原子任务,并利用启发式策略构建组合任务及相应函数定义。接着,在多智能体环境中模拟人类、助理和工具之间的交互,收集多轮函数调用轨迹。此方法确保了任务的可组合性,并允许通过检查组合任务中的原子任务来有效地生成功能和轨迹。我们制作了包含8k数据点的BUTTONInstruct数据集,并通过跨多种LLM的广泛实验证明了其有效性。
Key Takeaways
- LLMs具备调用函数和外部工具的能力,目前研究主要集中在单轮交互中的函数调用。
- 本文强调处理组合式、现实世界查询的需求,这些查询需要LLMs进行多轮函数调用。
- BUTTON方法用于生成合成组合指令调整数据,包含自下而上的指令构建和自上而下的轨迹生成两个阶段。
- 在自下而上(bottom-up)阶段,基于真实场景生成原子任务,并使用启发式策略构建组合任务和相应函数定义。
- 在自上而下(top-down)阶段,利用多智能体环境模拟人类、助理和工具之间的交互,生成多轮函数调用轨迹。
- BUTTON方法确保了任务的可组合性,并能有效生成功能和轨迹。
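下面的示意代码给出原子任务、组合任务与多轮函数调用轨迹三类对象的一种可能表示,用来直观说明“自下而上构造任务、自上而下收集轨迹”所产出的数据形态。字段与消息格式均为假设,并非 BUTTONInstruct 的真实 schema。

```python
# 示意:原子任务、组合任务与多轮函数调用轨迹的一种可能表示(字段与格式均为假设)。
from dataclasses import dataclass, field

@dataclass
class AtomicTask:
    name: str
    description: str
    function_schema: dict          # 为该任务合成的函数定义(JSON schema 风格)

@dataclass
class CompositionalTask:
    goal: str
    atomic_tasks: list             # 需要按依赖顺序规划调用的原子任务

@dataclass
class Trajectory:
    task: CompositionalTask
    messages: list = field(default_factory=list)   # 多轮 user/assistant/tool 消息

weather = AtomicTask("get_weather", "查询某城市天气",
                     {"name": "get_weather", "parameters": {"city": "string"}})
advice = AtomicTask("suggest_outfit", "根据天气给出穿衣建议",
                    {"name": "suggest_outfit", "parameters": {"weather": "string"}})
task = CompositionalTask("查一下北京天气并给出穿衣建议", [weather, advice])

traj = Trajectory(task, messages=[
    {"role": "user", "content": task.goal},
    {"role": "assistant", "function_call": {"name": "get_weather", "arguments": {"city": "北京"}}},
    {"role": "tool", "name": "get_weather", "content": "多云,12°C"},
    {"role": "assistant", "function_call": {"name": "suggest_outfit", "arguments": {"weather": "多云,12°C"}}},
    {"role": "tool", "name": "suggest_outfit", "content": "建议穿外套"},
    {"role": "assistant", "content": "北京多云 12°C,建议穿外套。"},
])
print(len(traj.messages), "turns")
```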
点此查看论文截图




Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Authors:Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
In-context learning has been recognized as a key factor in the success of Large Language Models (LLMs). It refers to the model’s ability to learn patterns on the fly from provided in-context examples in the prompt during inference. Previous studies have demonstrated that the Transformer architecture used in LLMs can implement a single-step gradient descent update by processing in-context examples in a single forward pass. Recent work has further shown that, during in-context learning, a looped Transformer can implement multi-step gradient descent updates in forward passes. However, their theoretical results require an exponential number of in-context examples, $n = \exp(\Omega(T))$, where $T$ is the number of loops or passes, to achieve a reasonably low error. In this paper, we study linear looped Transformers in-context learning on linear vector generation tasks. We show that linear looped Transformers can implement multi-step gradient descent efficiently for in-context learning. Our results demonstrate that as long as the input data has a constant condition number, e.g., $n = O(d)$, the linear looped Transformers can achieve a small error by multi-step gradient descent during in-context learning. Furthermore, our preliminary experiments validate our theoretical analysis. Our findings reveal that the Transformer architecture possesses a stronger in-context learning capability than previously understood, offering new insights into the mechanisms behind LLMs and potentially guiding the better design of efficient inference algorithms for LLMs.
上下文学习已被认为是大型语言模型(LLM)成功的关键因素。它指的是模型在推理过程中从提供的上下文示例中即时学习模式的能力。先前的研究表明,LLM中使用的Transformer架构可以通过一次前向传递处理上下文示例来实现单步梯度下降更新。最近的工作进一步表明,在上下文学习过程中,循环Transformer可以在前向传递中实现多步梯度下降更新。然而,他们的理论结果需要呈指数增长的上下文示例数量,即$n = \exp(\Omega(T))$,其中$T$是循环或传递的次数,才能达到合理的低误差率。在本文中,我们研究了线性向量生成任务中线性循环Transformer的上下文学习。我们展示了线性循环Transformer可以有效地实现多步梯度下降,用于上下文学习。我们的结果表明,只要输入数据具有恒定的条件数,例如$n = O(d)$,线性循环Transformer就可以通过多步梯度下降在上下文学习中实现小误差。此外,我们的初步实验验证了我们的理论分析。我们的研究结果表明,Transformer架构的上下文学习能力比以前所理解的要强,这为LLM的机制提供了新的见解,并可能为LLM设计更有效的推理算法提供指导。
论文及项目相关链接
PDF AIStats 2025
Summary
本文探讨了线性循环Transformer在上下文学习中的表现,特别是在线性向量生成任务上。研究结果显示,线性循环Transformer可以有效地实现多步梯度下降,即使在输入数据具有恒定条件数的情况下,也能通过多步梯度下降实现低误差的上下文学习。初步实验验证了理论分析的有效性,揭示了Transformer架构比先前所理解的拥有更强大的上下文学习能力,为LLM的机制和高效推理算法的设计提供了新见解。
Key Takeaways
- In-context learning是LLM成功的关键因素,指模型从提示中的上下文示例实时学习模式的能力。
- 之前的研究表明,Transformer架构可以通过单次前向传递实现单步梯度下降更新。
- 最新工作显示,在上下文学习中,循环Transformer可以实施多步梯度下降更新。
- 理论结果需要指数级的上下文示例数量来达到较低的误差。
- 线性循环Transformer在上下文学习中可以高效实现多步梯度下降。
- 当输入数据具有恒定条件数时,线性循环Transformer可以通过多步梯度下降实现小误差的上下文学习。
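下面直接模拟理论所刻画的对象:对上下文中的 (x_i, y_i) 做最小二乘,循环 T 次等价于 T 步梯度下降;在高斯设计、n 与 d 同阶(条件数为常数量级)时,误差随循环次数下降。这只是对结论的数值示意,并非 looped Transformer 的权重级实现。

```python
# 示意:用显式多步梯度下降模拟"循环 T 次 ≈ T 步梯度下降"所刻画的上下文线性回归过程。
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 32                       # n = O(d) 个上下文样本
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))        # 高斯设计矩阵,条件数以高概率为常数量级
y = X @ w_star

w = np.zeros(d)
eta = 1.0 / np.linalg.norm(X.T @ X / n, 2)    # 步长取 1/λ_max
for t in range(30):                            # 对应 30 次"循环/前向传递"
    grad = X.T @ (X @ w - y) / n
    w = w - eta * grad
    if t % 10 == 9:
        print(f"loop {t+1:2d}: ||w - w*|| = {np.linalg.norm(w - w_star):.4f}")
```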
点此查看论文截图


From Tokens to Words: On the Inner Lexicon of LLMs
Authors:Guy Kaplan, Matanel Oren, Yuval Reif, Roy Schwartz
Natural language is composed of words, but modern large language models (LLMs) process sub-words as input. A natural question raised by this discrepancy is whether LLMs encode words internally, and if so how. We present evidence that LLMs engage in an intrinsic detokenization process, where sub-word sequences are combined into coherent whole-word representations at their last token. Our experiments show that this process primarily takes place within the early and middle layers of the model. We further demonstrate its robustness to arbitrary splits (e.g., “cats” to “ca” and “ts”), typos, and importantly-to out-of-vocabulary words: when feeding the last token internal representations of such words to the model as input, it can “understand” them as the complete word despite never seeing such representations as input during training. Our findings suggest that LLMs maintain a latent vocabulary beyond the tokenizer’s scope. These insights provide a practical, finetuning-free application for expanding the vocabulary of pre-trained models. By enabling the addition of new vocabulary words, we reduce input length and inference iterations, which reduces both space and model latency, with little to no loss in model accuracy.
自然语言由单词组成,但现代大型语言模型(LLM)将子词作为输入进行处理。由此产生的一个自然问题是,LLM是否在内部对单词进行编码,如果是的话是如何进行的。我们提供证据表明,LLM内部存在一个内在的“去词元化”(detokenization)过程,其中子词序列在其最后一个词元处被组合成连贯的整词表示。我们的实验表明,这个过程主要发生在模型的早期和中间层。我们还证明了它对任意拆分(例如,“cats”拆成“ca”和“ts”)、拼写错误,以及重要的——词表之外的单词都具有稳健性:当将这类单词最后一个词元的内部表示作为输入提供给模型时,即使它在训练期间从未见过这样的表示作为输入,它也可以将它们“理解”为完整的单词。我们的研究结果表明,LLM在分词器范围之外维持了一个潜在词汇表。这些见解带来了一种实用、无需微调即可扩展预训练模型词汇表的应用:通过支持添加新的整词词元,我们减少了输入长度和推理迭代次数,从而在几乎不损失模型精度的情况下降低了空间占用和模型延迟。
论文及项目相关链接
PDF Accepted to the International Conference on Learning Representations (ICLR) 2025
Summary
大型语言模型(LLM)在处理自然语言时,虽然输入的是子词,但会在内部将子词序列合并、组合成连贯的整词表示。实验显示这一过程主要发生在模型的早期和中间层,且对任意拆分、拼写错误及词表之外的词汇具有稳健性。模型能在无需额外训练的情况下,将单词最后一个词元的内部表示作为输入并将其识别为完整单词。这些发现显示LLM具备超出分词器范围的潜在词汇表,可用于扩展词表,在几乎不损失精度的情况下提高模型效率。
Key Takeaways
- LLM具有内在词汇合并过程,将子词序列组合成完整的词汇表示。
- 该过程主要发生在模型的早期和中间层,将子词组合成连贯的整词表示。
- LLM对词汇的任意拆分、拼写错误及超出词汇表的词汇具有稳健性。
- LLM能够识别并理解训练时未曾接触过的词汇内部表示。
- LLM具备潜在的词汇库扩展能力,可超越分词器的范围。
- 通过扩展词汇量,可以减少输入长度和推理迭代,从而减少空间需求和模型延迟。
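下面用 Hugging Face transformers 演示按“内部词典”思路扩充词表的一种做法:取某个多词元单词最后一个子词在中间层的隐状态,作为新增整词词元的输入嵌入。模型选用 gpt2、层号取第 6 层、以及不修改输出嵌入等选择均为示意性假设,并非论文的完整流程。

```python
# 示意:取多词元单词最后一个子词的中间层隐状态,作为新增整词词元的输入嵌入。
# 模型、层号与简化均为假设,并非论文的完整流程。
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

word = "unbelievability"                       # 会被 GPT-2 切成多个子词的词
inputs = tokenizer(word, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
last_token_hidden = outputs.hidden_states[6][0, -1]       # 中间层、最后一个子词的隐状态

tokenizer.add_tokens([word])                               # 新增整词词元
model.resize_token_embeddings(len(tokenizer))
new_id = tokenizer.convert_tokens_to_ids(word)
with torch.no_grad():
    model.get_input_embeddings().weight[new_id] = last_token_hidden

print("新词元 id:", new_id)
```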
点此查看论文截图




TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Authors:Yongxin Guo, Jingyu Liu, Mingda Li, Qingbin Liu, Xi Chen, Xiaoying Tang
Video Temporal Grounding (VTG) is a crucial capability for video understanding models and plays a vital role in downstream tasks such as video browsing and editing. To effectively handle various tasks simultaneously and enable zero-shot prediction, there is a growing trend in employing video LLMs for VTG tasks. However, current video LLM-based methods rely exclusively on natural language generation, lacking the ability to model the clear structure inherent in videos, which restricts their effectiveness in tackling VTG tasks. To address this issue, this paper first formally introduces causal event modeling framework, which represents video LLM outputs as sequences of events, and predict the current event using previous events, video inputs, and textural instructions. Each event consists of three components: timestamps, salient scores, and textual captions. We then propose a novel task-interleaved video LLM called TRACE to effectively implement the causal event modeling framework in practice. The TRACE process visual frames, timestamps, salient scores, and text as distinct tasks, employing various encoders and decoding heads for each. Task tokens are arranged in an interleaved sequence according to the causal event modeling framework’s formulation. Extensive experiments on various VTG tasks and datasets demonstrate the superior performance of TRACE compared to state-of-the-art video LLMs. Our model and code are available at https://github.com/gyxxyg/TRACE.
视频时序定位(VTG)是视频理解模型的关键能力,并在视频浏览和编辑等下游任务中起到至关重要的作用。为了有效地同时处理各种任务并实现零样本预测,越来越多的趋势是采用视频LLM进行VTG任务。然而,基于当前视频LLM的方法仅依赖于自然语言生成,缺乏建模视频内在清晰结构的能力,这限制了它们在处理VTG任务时的有效性。为了解决这个问题,本文首先正式引入了因果事件建模框架,该框架将视频LLM输出表示为事件序列,并使用以前的事件、视频输入和文本指令来预测当前事件。每个事件由三个组件组成:时间戳、突出得分和文本标题。然后,我们提出了一种新的任务交织视频LLM,称为TRACE,以有效地在实际中实施因果事件建模框架。TRACE将视觉帧、时间戳、突出得分和文本视为不同的任务,并为每个任务使用各种编码器和解码头。任务标记根据因果事件建模框架的公式进行交错序列安排。在各种VTG任务和数据集上的大量实验表明,TRACE的性能优于最先进的视频LLM。我们的模型和代码可在https://github.com/gyxxyg/TRACE上找到。
论文及项目相关链接
PDF ICLR 2025
Summary:视频时序定位(VTG)是视频理解模型中的核心能力,在视频浏览和编辑等下游任务中扮演重要角色。现有基于视频LLM的方法主要依赖自然语言生成,缺乏建模视频内在结构的能力,限制了其在VTG任务上的效果。本文引入因果事件建模框架,将视频LLM输出表示为事件序列,并使用先前事件、视频输入和文本指令预测当前事件。提出任务交织的视频LLM(TRACE),实现因果事件建模框架。TRACE对视觉帧、时间戳、显著分数和文本进行不同任务处理,根据因果事件建模框架的公式安排任务标记。在多个VTG任务和数据集上的实验表明TRACE优于现有视频LLM。
Key Takeaways:
- 视频时序定位(VTG)在视频理解模型中占据重要地位,对视频浏览和编辑等下游任务至关重要。
- 当前基于视频LLM的方法主要依赖自然语言生成,缺乏建模视频内在结构的能力。
- 引入因果事件建模框架,将视频LLM输出表示为事件序列,包括时间戳、显著分数和文本描述。
- 提出任务交织的视频LLM(TRACE)来实施因果事件建模框架。
- TRACE对视觉帧、时间戳、显著分数和文本进行不同任务处理,使用各种编码器和解码头。
- 根据因果事件建模框架的公式安排任务标记,实现更好的VTG任务性能。
- 在多个VTG任务和数据集上的实验表明TRACE优于现有视频LLM。
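下面的示意代码给出“事件 = (时间戳, 显著性得分, 文本描述)”的数据结构,以及把三类信息按任务交错序列化的一种可能格式。特殊标记与编码方式均为假设,并非 TRACE 的真实分词与解码头设计。

```python
# 示意:因果事件建模中"事件"的结构,以及按任务交错排列成序列的一种可能方式。
# 特殊标记与格式均为假设,并非 TRACE 的真实方案。
from dataclasses import dataclass

@dataclass
class Event:
    start: float          # 秒
    end: float
    salient_score: float
    caption: str

def interleave(events: list) -> str:
    pieces = []
    for e in events:       # 时间、分数、文本作为不同"任务"交错排列
        pieces.append(f"<time>{e.start:.1f}-{e.end:.1f}</time>")
        pieces.append(f"<score>{e.salient_score:.2f}</score>")
        pieces.append(f"<caption>{e.caption}</caption>")
    return "".join(pieces)

events = [Event(0.0, 4.5, 0.91, "一名男子走进厨房"),
          Event(4.5, 9.0, 0.64, "他开始切菜")]
print(interleave(events))
```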
点此查看论文截图





FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation
Authors:Yuki Imajuku, Yoko Yamakata, Kiyoharu Aizawa
Research on food image understanding using recipe data has been a long-standing focus due to the diversity and complexity of the data. Moreover, food is inextricably linked to people’s lives, making it a vital research area for practical applications such as dietary management. Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities, not only in their vast knowledge but also in their ability to handle languages naturally. While English is predominantly used, they can also support multiple languages including Japanese. This suggests that MLLMs are expected to significantly improve performance in food image understanding tasks. We fine-tuned open MLLMs LLaVA-1.5 and Phi-3 Vision on a Japanese recipe dataset and benchmarked their performance against the closed model GPT-4o. We then evaluated the content of generated recipes, including ingredients and cooking procedures, using 5,000 evaluation samples that comprehensively cover Japanese food culture. Our evaluation demonstrates that the open models trained on recipe data outperform GPT-4o, the current state-of-the-art model, in ingredient generation. Our model achieved F1 score of 0.531, surpassing GPT-4o’s F1 score of 0.481, indicating a higher level of accuracy. Furthermore, our model exhibited comparable performance to GPT-4o in generating cooking procedure text.
使用食谱数据进行食品图像理解的研究一直是焦点,主要由于数据的多样性和复杂性。此外,食品与人们的生活紧密相连,使得其在饮食管理等实际应用中成为重要研究领域。近期多模态大语言模型(MLLMs)的进展表现出了显著的实力,不仅在于其丰富的知识,还在于其自然处理语言的能力。虽然主要是英语使用为主,但它们也支持多种语言,包括日语。这表明MLLMs有望在食品图像理解任务中显著提高性能。我们对公开MLLMs LLaVA-1.5和Phi-3 Vision进行了微调,使用日本食谱数据集进行基准测试,与封闭模型GPT-4o的性能进行了比较。然后,我们使用全面覆盖日本食品文化的5000个评估样本,对生成食谱的内容(包括食材和烹饪程序)进行了评估。我们的评估表明,在食材生成方面,基于食谱数据训练的公开模型优于当前最先进的模型GPT-4o。我们的模型F1分数达到0.531,超过了GPT-4o的0.481,显示出更高的准确性。此外,我们的模型在生成烹饪过程文本方面的性能与GPT-4o相当。
论文及项目相关链接
PDF 15 pages, 5 figures. We found errors in the calculation of evaluation metrics, which were corrected in this version with $\color{blue}{\text{modifications highlighted in blue}}$. Please also see the Appendix
Summary:
由于食谱数据的多样性和复杂性,利用食谱数据研究食品图像理解一直是长期的研究重点。近期多模态大语言模型(MLLMs)展现出显著能力,不仅知识丰富,还能自然地处理包括日语在内的多种语言,有望提升食品图像理解任务的表现。本研究在日本食谱数据集上对LLaVA-1.5和Phi-3 Vision两个开源模型进行微调,并与闭源模型GPT-4o进行对比,结果显示微调后的开源模型在食材生成方面优于GPT-4o,F1分数达到0.531(GPT-4o为0.481)。
Key Takeaways:
- 食品图像识别研究因数据的多样性和复杂性而持续受到关注。
- 多模态大语言模型(MLLMs)在食品图像识别任务上具有显著优势。
- MLLMs不仅能处理英语,也可支持包括日语在内的多种语言。
- 开源模型LLaVA-1.5和Phi-3 Vision在日本食谱数据集上的表现优于当前最先进的模型GPT-4o。
- 在食材生成方面,LLaVA-1.5和Phi-3 Vision的F1分数达到0.531,高于GPT-4o的0.481。
- 在生成烹饪流程文本方面,LLaVA-1.5和Phi-3 Vision与GPT-4o表现相当。
- 研究结果表明开源模型在食品图像识别任务中的性能提升有潜力。
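下面是按“食材集合”计算单样本 F1 的一种常见做法的示意(预测与标注取集合交集),文本归一化与匹配细节为假设,应以论文中修订后的评测脚本为准。

```python
# 示意:按"食材集合"计算单样本 F1 的一种常见做法;归一化与匹配细节为假设。
def ingredient_f1(pred: list, gold: list) -> float:
    pred_set, gold_set = set(pred), set(gold)
    if not pred_set or not gold_set:
        return 0.0
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

pred = ["米", "鮭", "塩", "醤油"]
gold = ["米", "鮭", "塩", "海苔"]
print(f"F1 = {ingredient_f1(pred, gold):.3f}")   # 0.750
```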
点此查看论文截图




StarVid: Enhancing Semantic Alignment in Video Diffusion Models via Spatial and SynTactic Guided Attention Refocusing
Authors:Yuanhang Li, Qi Mao, Lan Chen, Zhen Fang, Lei Tian, Xinyan Xiao, Libiao Jin, Hua Wu
Recent advances in text-to-video (T2V) generation with diffusion models have garnered significant attention. However, they typically perform well in scenes with a single object and motion, struggling in compositional scenarios with multiple objects and distinct motions to accurately reflect the semantic content of text prompts. To address these challenges, we propose \textbf{StarVid}, a plug-and-play, training-free method that improves semantic alignment between multiple subjects, their motions, and text prompts in T2V models. StarVid first leverages the spatial reasoning capabilities of large language models (LLMs) for two-stage motion trajectory planning based on text prompts. Such trajectories serve as spatial priors, guiding a spatial-aware loss to refocus cross-attention (CA) maps into distinctive regions. Furthermore, we propose a syntax-guided contrastive constraint to strengthen the correlation between the CA maps of verbs and their corresponding nouns, enhancing motion-subject binding. Both qualitative and quantitative evaluations demonstrate that the proposed framework significantly outperforms baseline methods, delivering videos of higher quality with improved semantic consistency.
最近,文本到视频(T2V)生成中的扩散模型的进展引起了人们的广泛关注。然而,它们在单一物体和运动的场景中表现较好,但在多个物体和不同运动的组合场景中,难以准确反映文本提示的语义内容。为了解决这些挑战,我们提出了StarVid,这是一种即插即用、无需训练的方法,旨在提高T2V模型中多个主题、其运动与文本提示之间的语义对齐。StarVid首先利用大型语言模型(LLM)的空间推理能力,进行基于文本提示的两阶段运动轨迹规划。这些轨迹作为空间先验,引导空间感知损失来重新定位交叉注意力(CA)地图到不同的区域。此外,我们提出了语法指导的对比约束,以加强动词和对应名词的CA地图之间的相关性,增强运动主题绑定。定性和定量评估均表明,所提出的框架显著优于基准方法,生成的视频质量更高,语义一致性更强。
论文及项目相关链接
Summary
文本到视频(T2V)生成领域近期利用扩散模型取得了显著进展,但在处理包含多个对象和不同运动的场景时仍面临挑战,难以准确反映文本提示的语义内容。为解决此问题,提出一种名为StarVid的即插即用、无需训练的方法,可提升T2V模型中多个主题、其运动和文本提示之间的语义对齐。StarVid首先利用大型语言模型的空间推理能力,基于文本提示进行两阶段运动轨迹规划。这些轨迹作为空间先验,引导空间感知损失,将交叉注意力(CA)地图重新定位到不同区域。此外,还提出一种语法指导的对比约束,增强动词和对应名词的CA地图之间的关联性,增强运动主题绑定。评估和实验证明,该方法显著优于基准方法,生成高质量且语义一致的视频。
Key Takeaways
- 文本到视频生成领域存在挑战:处理包含多个对象和不同运动的场景时难以准确反映文本提示的语义内容。
- StarVid是一种无需训练的方法,可改善T2V模型中语义对齐问题。
- StarVid利用大型语言模型的空间推理能力进行两阶段运动轨迹规划。
- 空间先验和交叉注意力地图的重定位是提高视频质量的关键。
- 语法指导的对比约束增强了动词和名词之间的关联性,提高运动主题绑定。
- 与基准方法相比,StarVid显著提高了视频生成的语义一致性和质量。
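下面给出“把规划出的区域作为空间先验、惩罚框外注意力质量”这一类重聚焦损失的极简示意。归一化方式与具体损失定义均为假设,并非 StarVid 的精确实现。

```python
# 示意:以规划区域为空间先验,惩罚交叉注意力落在框外的质量占比;并非 StarVid 的精确实现。
import numpy as np

def refocus_loss(attn_map: np.ndarray, box: tuple) -> float:
    """attn_map: (H, W) 的交叉注意力图; box: (y0, y1, x0, x1) 为该帧规划的区域。"""
    attn = attn_map / (attn_map.sum() + 1e-8)
    y0, y1, x0, x1 = box
    inside = attn[y0:y1, x0:x1].sum()
    return float(1.0 - inside)          # 框外注意力占比越大,损失越大

rng = np.random.default_rng(0)
attn = rng.random((16, 16))
print(refocus_loss(attn, (4, 12, 4, 12)))
```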
点此查看论文截图






PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
Authors:Xueluan Gong, Mingzhe Li, Yilin Zhang, Fengyuan Ran, Chen Chen, Yanjiao Chen, Qian Wang, Kwok-Yan Lam
Large Language Models (LLMs) have excelled in various tasks but are still vulnerable to jailbreaking attacks, where attackers create jailbreak prompts to mislead the model to produce harmful or offensive content. Current jailbreak methods either rely heavily on manually crafted templates, which pose challenges in scalability and adaptability, or struggle to generate semantically coherent prompts, making them easy to detect. Additionally, most existing approaches involve lengthy prompts, leading to higher query costs. In this paper, to remedy these challenges, we introduce a novel jailbreaking attack framework called PAPILLON, which is an automated, black-box jailbreaking attack framework that adapts the black-box fuzz testing approach with a series of customized designs. Instead of relying on manually crafted templates,PAPILLON starts with an empty seed pool, removing the need to search for any related jailbreaking templates. We also develop three novel question-dependent mutation strategies using an LLM helper to generate prompts that maintain semantic coherence while significantly reducing their length. Additionally, we implement a two-level judge module to accurately detect genuine successful jailbreaks. We evaluated PAPILLON on 7 representative LLMs and compared it with 5 state-of-the-art jailbreaking attack strategies. For proprietary LLM APIs, such as GPT-3.5 turbo, GPT-4, and Gemini-Pro, PAPILLONs achieves attack success rates of over 90%, 80%, and 74%, respectively, exceeding existing baselines by more than 60%. Additionally, PAPILLON can maintain high semantic coherence while significantly reducing the length of jailbreak prompts. When targeting GPT-4, PAPILLON can achieve over 78% attack success rate even with 100 tokens. Moreover, PAPILLON demonstrates transferability and is robust to state-of-the-art defenses. Code: https://github.com/aaFrostnova/Papillon
大型语言模型(LLM)在各种任务中表现出色,但仍容易受到越狱攻击的影响。攻击者会制造越狱提示来误导模型,产生有害或冒犯性的内容。当前的越狱方法要么严重依赖手工制作的模板,这在可扩展性和适应性方面带来挑战,要么难以生成语义连贯的提示,使其容易被检测。此外,大多数现有方法都需要冗长的提示,导致查询成本较高。
针对这些挑战,本文介绍了一种名为PAPILLON的新型越狱攻击框架。它是一个自动化、黑箱越狱攻击框架,采用黑箱模糊测试方法与一系列定制设计相结合。与依赖手工制作的模板不同,PAPILLON从一个空的种子池开始,无需搜索任何相关的越狱模板。我们还利用LLM助手开发了三种新颖的问题相关变异策略,生成保持语义连贯性同时显著缩短长度的提示。此外,我们实现了一个两级判断模块,以准确检测真正的成功越狱。
我们在7个代表性LLM上评估了PAPILLON,并与5种最先进的越狱攻击策略进行了比较。对于专有LLM API,如GPT-3.5 turbo、GPT-4和Gemini-Pro,PAPILLON的攻击成功率分别超过90%、80%和74%,超过现有基线60%以上。此外,PAPILLON在保持高语义连贯性的同时,能显著缩短越狱提示的长度。当针对GPT-4时,即使在100个令牌的情况下,PAPILLON的攻击成功率也能达到78%以上。而且,PAPILLON具有可迁移性,对最先进的防御措施表现出稳健性。
代码:https://github.com/aaFrostnova/Papillon
论文及项目相关链接
Summary:
LLMs虽然擅长各种任务,但仍易受越狱攻击的影响。现有越狱方法要么依赖于手工制作的模板,面临可扩展性和适应性的挑战,要么难以生成语义连贯的提示,容易被检测。针对这些问题,本文提出了一种名为PAPILLON的自动化黑箱越狱攻击框架,采用黑箱模糊测试方法与一系列定制设计相结合。PAPILLON通过自动化生成语义连贯且简短的内容绕过现有LLM防线实施攻击,显著提高了越狱成功率并降低了查询成本。
Key Takeaways:
- LLMs尽管在各种任务中表现出色,但仍面临越狱攻击的风险。
- 当前越狱方法面临的挑战包括缺乏可扩展性和适应性、难以生成语义连贯的提示以及高查询成本。
- PAPILLON是一种新型自动化黑箱越狱攻击框架,无需搜索相关越狱模板。
- PAPILLON采用三种新型问题相关的变异策略生成简短且语义连贯的提示。
- PAPILLON实施了一个两级判断模块来准确检测真正的成功越狱。
- 在代表性LLMs上评估,PAPILLON对专有LLM API(如GPT-3.5 turbo、GPT-4和Gemini-Pro)的越狱成功率超过现有基线60%以上。
点此查看论文截图



