Published: 2025-02-28

Updated: 2025-05-14

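⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are intended only for initial screening before reading the papers.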

Updated 2025-02-28

FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users

Authors:Anikait Singh, Sheryl Hsu, Kyle Hsu, Eric Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn

Effective personalization of LLMs is critical for a broad range of user-interfacing applications such as virtual assistants and content curation. Inspired by the strong in-context learning capabilities of LLMs, we propose Few-Shot Preference Optimization (FSPO), which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. Additionally, since real-world preference data is scarce and challenging to collect at scale, we propose careful design choices to construct synthetic preference datasets for personalization, generating over 1M synthetic personalized preferences using publicly available LLMs. In particular, to successfully transfer from synthetic data to real users, we find it crucial for the data to exhibit both high diversity and coherent, self-consistent structure. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study. Overall, FSPO achieves an 87% Alpaca Eval winrate on average in generating responses that are personalized to synthetic users and a 72% winrate with real human users in open-ended question answering.

Summary

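This paper proposes Few-Shot Preference Optimization (FSPO), a method for personalizing LLMs. By recasting reward modeling as a meta-learning problem, FSPO lets an LLM adapt to a user from a few labeled preferences and construct a personalized reward function for that user. For the model to transfer from synthetic data to real users, the synthetic data must be both highly diverse and coherently, self-consistently structured. In evaluation, FSPO reaches an average 87% Alpaca Eval win rate on synthetic users and a 72% win rate with real human users in open-ended question answering.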

Key Takeaways

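Few-Shot Preference Optimization (FSPO) is an LLM personalization method aimed at user-facing applications such as virtual assistants and content curation.
FSPO recasts reward modeling as a meta-learning problem, so the LLM can adapt quickly to a user's preferences and build a personalized reward function.
Because real-world preference data is scarce and hard to collect at scale, the authors construct synthetic preference datasets, generating over 1M synthetic personalized preferences with publicly available LLMs.
High diversity and a coherent, self-consistent structure in the data are the key factors for transferring from synthetic data to real users.
FSPO is evaluated on personalized open-ended generation in three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering.
On synthetic users, FSPO reaches an average Alpaca Eval win rate of 87%.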
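
The idea of conditioning on a few labeled preferences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Preference` class, the prompt layout, and the function names are hypothetical, and the Bradley-Terry formula is a standard way to turn two reward scores into a preference probability that is assumed here for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Preference:
    """One labeled preference from a user (hypothetical container)."""
    prompt: str
    chosen: str
    rejected: str


def build_few_shot_prompt(shots, query):
    """Serialize a user's labeled preferences ahead of a new query.

    The layout is an assumption; the paper's actual format may differ.
    """
    parts = []
    for i, shot in enumerate(shots, start=1):
        parts.append(
            f"Example {i}\n"
            f"Question: {shot.prompt}\n"
            f"Preferred: {shot.chosen}\n"
            f"Dispreferred: {shot.rejected}"
        )
    parts.append(f"New question: {query}\nResponse:")
    return "\n\n".join(parts)


def bradley_terry_prob(reward_chosen, reward_rejected):
    """Probability that the chosen response beats the rejected one."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


shots = [
    Preference("Recommend a film.", "A slow character drama.", "A loud action film."),
    Preference("Explain gravity.", "A short intuitive analogy.", "A page of equations."),
]
print(build_few_shot_prompt(shots, "Suggest a weekend activity."))
```

A model trained on many such user-specific contexts sees a different "task" per user, which is what makes the setup resemble meta-learning.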

Brain-inspired analogical mixture prototypes for few-shot class-incremental learning

Authors:Wanyi Li, Wei Wei, Yongkang Luo, Peng Wang

Few-shot class-incremental learning (FSCIL) poses significant challenges for artificial neural networks due to the need to efficiently learn from limited data while retaining knowledge of previously learned tasks. Inspired by the brain's mechanisms for categorization and analogical learning, we propose a novel approach called Brain-inspired Analogical Mixture Prototypes (BAMP). BAMP has three components: mixed prototypical feature learning, statistical analogy, and soft voting. Starting from a pre-trained Vision Transformer (ViT), mixed prototypical feature learning represents each class using a mixture of prototypes and fine-tunes these representations during the base session. The statistical analogy calibrates the mean and covariance matrix of prototypes for new classes according to similarity to the base classes, and computes the classification score with Mahalanobis distance. Soft voting combines the merits of statistical analogy and an off-the-shelf FSCIL method. Our experiments on benchmark datasets demonstrate that BAMP outperforms the state of the art in both the traditional big-start FSCIL setting and the challenging small-start FSCIL setting. The study suggests that brain-inspired analogical mixture prototypes can alleviate catastrophic forgetting and over-fitting problems in FSCIL.

Paper and project links

PDF under review

Summary

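Starting from a pre-trained Vision Transformer (ViT), the paper proposes Brain-inspired Analogical Mixture Prototypes (BAMP), a new method for few-shot class-incremental learning (FSCIL). Its three components (mixed prototypical feature learning, statistical analogy, and soft voting) let the model learn from limited data while retaining previously learned knowledge. Experiments on benchmark datasets show that BAMP outperforms the state of the art in both the traditional big-start FSCIL setting and the more challenging small-start setting, which suggests that BAMP can help alleviate catastrophic forgetting and over-fitting in FSCIL.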

Key Takeaways

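BAMP starts from a pre-trained Vision Transformer (ViT).
Mixed prototypical feature learning represents each class with a mixture of prototypes and fine-tunes these representations during the base session.
Statistical analogy calibrates the mean and covariance matrix of new-class prototypes according to their similarity to the base classes.
Classification scores are computed with the Mahalanobis distance.
Soft voting combines the merits of statistical analogy and an off-the-shelf FSCIL method.
BAMP outperforms the state of the art on benchmark datasets in both the big-start and small-start FSCIL settings.
The results suggest that BAMP can alleviate catastrophic forgetting and over-fitting in FSCIL.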
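
The statistical-analogy step can be illustrated with a small NumPy sketch. The abstract does not give the calibration formula, so everything specific here is an assumption: the softmax weighting over cosine similarities, the blending factor `alpha`, the `temperature`, and the ridge term `eps` are hypothetical choices, and the function names are made up for this example.

```python
import numpy as np


def calibrate_new_class(new_feats, base_means, base_covs, alpha=0.5, temperature=1.0):
    """Borrow statistics from similar base classes for a few-shot class.

    new_feats:  (n, d) features of the few new-class samples
    base_means: (k, d) base-class means
    base_covs:  (k, d, d) base-class covariance matrices
    """
    mu_few = new_feats.mean(axis=0)
    # Cosine similarity between the few-shot mean and each base-class mean.
    norms = np.linalg.norm(base_means, axis=1) * np.linalg.norm(mu_few) + 1e-12
    sims = base_means @ mu_few / norms
    # Softmax weights: more similar base classes contribute more.
    weights = np.exp(sims / temperature)
    weights /= weights.sum()
    mean = alpha * mu_few + (1.0 - alpha) * (weights @ base_means)
    cov = np.einsum("k,kij->ij", weights, base_covs)
    return mean, cov


def mahalanobis_sq(x, mean, cov, eps=1e-3):
    """Squared Mahalanobis distance with a small ridge for stability."""
    diff = x - mean
    regularized = cov + eps * np.eye(cov.shape[0])
    return float(diff @ np.linalg.solve(regularized, diff))


rng = np.random.default_rng(0)
base_means = rng.normal(size=(4, 3))
base_covs = np.stack([np.eye(3) * (i + 1) for i in range(4)])
new_feats = rng.normal(size=(5, 3))
mean, cov = calibrate_new_class(new_feats, base_means, base_covs)
print(mahalanobis_sq(new_feats[0], mean, cov))
```

A smaller distance means a higher classification score for that class; using `np.linalg.solve` avoids forming the explicit inverse of the covariance matrix.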

Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation

Authors:Yuxiang Wang, Xinnan Dai, Wenqi Fan, Yao Ma

Graph-structured data has become increasingly prevalent across various domains, raising the demand for effective models to handle graph tasks like node classification and link prediction. Traditional graph learning models like Graph Neural Networks (GNNs) have made significant strides, but their capabilities in handling graph data remain limited in certain contexts. In recent years, large language models (LLMs) have emerged as promising candidates for graph tasks, yet most studies focus primarily on performance benchmarks and fail to address their broader potential, including their ability to handle limited data, their transferability across tasks, and their robustness. In this work, we provide a comprehensive exploration of LLMs applied to graph tasks. We evaluate the performance of pure LLMs, including those without parameter optimization and those fine-tuned with instructions, across various scenarios. Our analysis goes beyond accuracy, assessing LLM ability to perform in few-shot/zero-shot settings, transfer across domains, understand graph structures, and demonstrate robustness in challenging scenarios. We conduct extensive experiments with 16 graph learning models alongside 6 LLMs (e.g., Llama3B, GPT-4o, Qwen-plus), comparing their performance on datasets like Cora, PubMed, ArXiv, and Products. Our findings show that LLMs, particularly those with instruction tuning, outperform traditional models in few-shot settings, exhibit strong domain transferability, and demonstrate excellent generalization and robustness. This work offers valuable insights into the capabilities of LLMs for graph learning, highlighting their advantages and potential for real-world applications, and paving the way for future research in this area. Codes and datasets are released in https://github.com/myflashbarry/LLM-benchmarking.

Paper and project links

PDF

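Summary

Large language models (LLMs) show promise on graph tasks, particularly in few-shot and zero-shot settings. This study explores LLMs applied to graph tasks comprehensively, running experiments with 16 graph learning models alongside 6 LLMs. It finds that LLMs, especially instruction-tuned ones, outperform traditional models in few-shot settings and show strong domain transferability, generalization, and robustness.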

Key Takeaways

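LLMs show promise on graph tasks such as node classification and link prediction.
LLMs can handle graph data in few-shot and zero-shot settings.
LLMs show strong cross-domain transferability.
In few-shot settings, LLMs outperform traditional graph learning models.
Instruction tuning further strengthens LLM performance on graph tasks.
LLMs show good generalization and robustness on graph tasks.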
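
To make "pure LLM on a graph task" concrete, the sketch below serializes one node and a few of its neighbors into a text prompt for node classification. The prompt wording, the function name `build_node_prompt`, and the toy data are all hypothetical; the benchmark's actual prompt templates are in the authors' repository and may look quite different.

```python
def build_node_prompt(node, texts, edges, known_labels, classes, max_neighbors=3):
    """Serialize one node and a few neighbors into a plain-text prompt."""
    # Treat edges as undirected and collect up to max_neighbors neighbors.
    neighbors = sorted(
        {v for u, v in edges if u == node} | {u for u, v in edges if v == node}
    )[:max_neighbors]
    lines = [
        "Task: node classification on a citation graph.",
        f"Candidate classes: {', '.join(classes)}",
        f"Target paper: {texts[node]}",
        "Linked papers:",
    ]
    for n in neighbors:
        # Labeled neighbors act as few-shot hints; others are marked unknown.
        label = known_labels.get(n, "unknown")
        lines.append(f"- {texts[n]} (label: {label})")
    lines.append("Answer with exactly one class name.")
    return "\n".join(lines)


texts = {
    0: "Attention mechanisms for sequence models",
    1: "Graph convolution on citation networks",
    2: "A survey of protein folding methods",
}
edges = [(0, 1), (2, 0)]
print(build_node_prompt(0, texts, edges, {1: "Machine Learning"},
                        ["Machine Learning", "Biology"]))
```

With `known_labels` left empty the same function yields a zero-shot prompt, which mirrors the few-shot versus zero-shot distinction the study evaluates.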

On the Generalization and Adaptation Ability of Machine-Generated Text Detectors in Academic Writing

Authors:Yule Liu, Zhiyuan Zhong, Yifan Liao, Zhen Sun, Jingyi Zheng, Jiaheng Wei, Qingyuan Gong, Fenghua Tong, Yang Chen, Yang Zhang, Xinlei He

The rising popularity of large language models (LLMs) has raised concerns about machine-generated text (MGT), particularly in academic settings, where issues like plagiarism and misinformation are prevalent. As a result, developing a highly generalizable and adaptable MGT detection system has become an urgent priority. Given that LLMs are most commonly misused in academic writing, this work investigates the generalization and adaptation capabilities of MGT detectors in three key aspects specific to academic writing: First, we construct MGT-Acedemic, a large-scale dataset comprising over 336M tokens and 749K samples. MGT-Acedemic focuses on academic writing, featuring human-written texts (HWTs) and MGTs across STEM, Humanities, and Social Sciences, paired with an extensible code framework for efficient benchmarking. Second, we benchmark the performance of various detectors for binary classification and attribution tasks in both in-domain and cross-domain settings. This benchmark reveals the often-overlooked challenges of attribution tasks. Third, we introduce a novel attribution task where models have to adapt to new classes over time without (or with very limited) access to prior training data in both few-shot and many-shot scenarios. We implement eight different adapting techniques to improve the performance and highlight the inherent complexity of the task. Our findings provide insights into the generalization and adaptation ability of MGT detectors across diverse scenarios and lay the foundation for building robust, adaptive detection systems. The code framework is available at https://github.com/Y-L-LIU/MGTBench-2.0.

Paper and project links

PDF

Summary

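The spread of large language models (LLMs) has raised concerns about machine-generated text (MGT), particularly in academic settings, where plagiarism and misinformation are common problems, so building a highly generalizable and adaptable MGT detection system has become urgent. The paper constructs MGT-Acedemic, a large-scale dataset focused on academic writing, and benchmarks a range of detectors on binary classification and attribution tasks in both in-domain and cross-domain settings. It also introduces a new attribution task in which models must adapt to new classes over time with little or no access to prior training data. The findings shed light on how well MGT detectors generalize and adapt across scenarios and lay a foundation for robust, adaptive detection systems.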

Key Takeaways

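The spread of LLMs has led to misuse of machine-generated text (MGT) in academic writing, which calls for highly generalizable and adaptable MGT detectors.
The authors build MGT-Acedemic, a large-scale academic-writing dataset with over 336M tokens and 749K samples, covering human-written and machine-generated texts across STEM, Humanities, and Social Sciences.
Detectors are benchmarked on binary classification and attribution tasks in in-domain and cross-domain settings, revealing the often-overlooked difficulty of attribution.
A new attribution task requires models to adapt to new classes over time with limited or no prior training data, in both few-shot and many-shot scenarios; eight adaptation techniques are implemented to improve performance.
The results highlight the challenges MGT detectors face in generalizing and adapting across diverse scenarios.
The released code framework gives researchers a base for developing and improving MGT detection systems.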

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

Authors:Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Min Wu, Ming-Ming Cheng, Ender Konukoglu, Serge Belongie

Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal annotated support samples. While existing FS-PCS methods have shown promise, they primarily focus on unimodal point cloud inputs, overlooking the potential benefits of leveraging multimodal information. In this paper, we address this gap by introducing a multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality. Under this easy-to-achieve setup, we present the MultiModal Few-Shot SegNet (MM-FSS), a model effectively harnessing complementary information from multiple modalities. MM-FSS employs a shared backbone with two heads to extract intermodal and unimodal visual features, and a pretrained text encoder to generate text embeddings. To fully exploit the multimodal information, we propose a Multimodal Correlation Fusion (MCF) module to generate multimodal correlations, and a Multimodal Semantic Fusion (MSF) module to refine the correlations using text-aware semantic guidance. Additionally, we propose a simple yet effective Test-time Adaptive Cross-modal Calibration (TACC) technique to mitigate training bias, further improving generalization. Experimental results on S3DIS and ScanNet datasets demonstrate significant performance improvements achieved by our method. The efficacy of our approach indicates the benefits of leveraging commonly-ignored free modalities for FS-PCS, providing valuable insights for future research. The code is available at https://github.com/ZhaochongAn/Multimodality-3D-Few-Shot

Paper and project links

PDF Published at ICLR 2025 (Spotlight)

Summary

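This paper presents a multimodal approach to few-shot 3D point cloud segmentation (FS-PCS). Existing FS-PCS methods focus mainly on unimodal point cloud input and overlook the potential of multimodal information. The paper introduces a multimodal FS-PCS setup that uses textual labels and the potentially available 2D image modality. The proposed MultiModal Few-Shot SegNet (MM-FSS) exploits complementary information across modalities: a shared backbone with two heads extracts intermodal and unimodal visual features, and a pretrained text encoder generates text embeddings. To make full use of the multimodal information, the paper proposes a Multimodal Correlation Fusion (MCF) module and a Multimodal Semantic Fusion (MSF) module. It also proposes a simple yet effective Test-time Adaptive Cross-modal Calibration (TACC) technique that mitigates training bias and further improves generalization. Experiments on the S3DIS and ScanNet datasets show significant performance improvements.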

Key Takeaways

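Existing FS-PCS methods focus mainly on unimodal point cloud input and overlook the potential of multimodal information.
The paper introduces a multimodal FS-PCS setup that uses textual labels and the 2D image modality.
The proposed MultiModal Few-Shot SegNet (MM-FSS) exploits complementary information from multiple modalities.
MM-FSS uses a shared backbone with two heads to extract intermodal and unimodal visual features, together with a pretrained text encoder.
The Multimodal Correlation Fusion (MCF) module generates multimodal correlations, and the Multimodal Semantic Fusion (MSF) module refines them with text-aware semantic guidance.
Test-time Adaptive Cross-modal Calibration (TACC) mitigates training bias and improves generalization.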

Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection

Authors:Fanhu Zeng, Zhen Cheng, Fei Zhu, Hongxin Wei, Xu-Yao Zhang

Out-of-Distribution (OOD) detection, aiming to distinguish outliers from known categories, has gained prominence in practical scenarios. Recently, the advent of vision-language models (VLM) has heightened interest in enhancing OOD detection for VLM through few-shot tuning. However, existing methods mainly focus on optimizing global prompts, ignoring refined utilization of local information with regard to outliers. Motivated by this, we freeze global prompts and introduce Local-Prompt, a novel coarse-to-fine tuning paradigm to emphasize regional enhancement with local prompts. Our method comprises two integral components: global prompt guided negative augmentation and local prompt enhanced regional regularization. The former utilizes frozen, coarse global prompts as guiding cues to incorporate negative augmentation, thereby leveraging local outlier knowledge. The latter employs trainable local prompts and a regional regularization to capture local information effectively, aiding in outlier identification. We also propose regional-related metric to empower the enrichment of OOD detection. Moreover, since our approach explores enhancing local prompts only, it can be seamlessly integrated with trained global prompts during inference to boost the performance. Comprehensive experiments demonstrate the effectiveness and potential of our method. Notably, our method reduces average FPR95 by 5.17% against state-of-the-art method in 4-shot tuning on challenging ImageNet-1k dataset, even outperforming 16-shot results of previous methods. Code is released at https://github.com/AuroraZengfh/Local-Prompt.

Paper and project links

PDF Accepted by The Thirteenth International Conference on Learning Representations (ICLR 2025). Code is available at https://github.com/AuroraZengfh/Local-Prompt

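Summary

Out-of-distribution (OOD) detection aims to distinguish outliers from known categories. Existing few-shot tuning methods for vision-language models (VLMs) mainly optimize global prompts and neglect local information about outliers. This paper freezes the global prompts and proposes Local-Prompt, a coarse-to-fine tuning paradigm that emphasizes regional enhancement with local prompts. It has two components: global prompt guided negative augmentation and local prompt enhanced regional regularization. The authors also propose a region-related metric for OOD detection. Because the method tunes only local prompts, it can be combined with trained global prompts at inference time to boost performance. In 4-shot tuning on ImageNet-1k, it reduces average FPR95 by 5.17% relative to the state-of-the-art method, outperforming even the 16-shot results of previous methods.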

Key Takeaways

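OOD detection aims to distinguish outliers from known categories and matters in practical scenarios.
Existing few-shot tuning methods for VLMs mainly optimize global prompts and neglect local information.
Local-Prompt is a coarse-to-fine tuning paradigm that uses local prompts to enhance regional information. Its two main components are global prompt guided negative augmentation and local prompt enhanced regional regularization.
The method can be combined with trained global prompts at inference time to improve performance.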
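
The FPR95 figure cited for this paper is the false positive rate at 95% true positive rate, a standard OOD metric. Below is a minimal NumPy sketch of one common way to compute it, assuming higher scores mean "more in-distribution" and that in-distribution samples are the positive class; the function name and the toy scores are made up for illustration, and evaluation code in the authors' repository may differ in details such as tie handling.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """False positive rate on OOD data when 95% of ID data is accepted.

    Assumes a higher score means the sample looks more in-distribution.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold at the 5th percentile of ID scores: about 95% of ID
    # samples score at or above it and are accepted.
    threshold = np.percentile(id_scores, 5)
    # OOD samples that also pass the threshold are false positives.
    return float(np.mean(ood_scores >= threshold))


id_scores = np.linspace(0.5, 1.0, 200)
ood_scores = np.linspace(0.0, 0.7, 200)
print(fpr_at_95_tpr(id_scores, ood_scores))
```

Lower is better: a value of 0 means no OOD sample is mistaken for in-distribution data at the operating point where 95% of in-distribution samples are kept.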

Meta Prompting for AI Systems

Authors:Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao

We introduce Meta Prompting (MP), a prompting paradigm designed to enhance the utilization of large language models (LLMs) and AI systems in complex problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting prioritizes structural and syntactical considerations over traditional content-centric methods. In this work, we formally define Meta Prompting, delineate its distinctions from few-shot prompting, and demonstrate its effectiveness across various AI applications. In particular, we show that Meta Prompting can decompose intricate reasoning tasks into simpler sub-problems, thereby improving token efficiency and enabling fairer comparisons with conventional few-shot techniques. Furthermore, we extend this framework to prompting tasks, allowing LLMs to recursively self-generate refined prompts in a metaprogramming-like manner. Empirical evaluations reveal that a Qwen-72B base language model equipped with Meta Prompting, without additional instruction tuning, achieves a PASS@1 accuracy of 46.3% on MATH problems, surpassing a supervised fine-tuned counterpart, 83.5% accuracy on GSM8K, and a 100% success rate on Game of 24 tasks using GPT-4. The code is available at https://github.com/meta-prompting/meta-prompting.

Paper and project links

PDF

Summary

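This paper introduces Meta Prompting (MP), a prompting paradigm intended to make large language models (LLMs) and AI systems more effective at complex problem-solving and data interaction. Grounded in type theory and category theory, it prioritizes structure and syntax over traditional content-centric methods. The paper formally defines Meta Prompting, sets out how it differs from few-shot prompting, and demonstrates its effectiveness across several AI applications. In particular, Meta Prompting can decompose intricate reasoning tasks into simpler sub-problems, which improves token efficiency and allows fairer comparisons with conventional few-shot techniques. The framework is also extended to prompting tasks themselves, so that LLMs can recursively self-generate refined prompts in a metaprogramming-like manner. In the empirical evaluation, a Qwen-72B base language model equipped with Meta Prompting, without additional instruction tuning, reaches 46.3% PASS@1 accuracy on MATH problems, surpassing a supervised fine-tuned counterpart, and 83.5% accuracy on GSM8K; with GPT-4, the approach reaches a 100% success rate on Game of 24 tasks.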

Key Takeaways

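Meta Prompting is a prompting paradigm intended to improve how LLMs and AI systems handle complex problem-solving and data interaction.
It is grounded in type theory and category theory and prioritizes structure and syntax over content.
It can decompose intricate reasoning tasks into simpler sub-problems, improving token efficiency.
It is evaluated on MATH problems, GSM8K, and Game of 24 tasks.
A Qwen-72B base model with Meta Prompting reaches 46.3% PASS@1 accuracy on MATH and 83.5% accuracy on GSM8K without additional instruction tuning.
The framework extends to prompting tasks, letting LLMs recursively self-generate refined prompts.
With GPT-4, Meta Prompting reaches a 100% success rate on Game of 24 tasks.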
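
For readers unfamiliar with the Game of 24 task mentioned above: given four numbers, combine them with +, -, *, and / to reach 24. The brute-force solver below is a reference for the task itself, for instance to check whether a puzzle is solvable or to verify a model's answer. It is not the paper's method and has nothing to do with how Meta Prompting produces solutions; the function names are made up for this sketch.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return one expression that reaches the target, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value is left; check it against the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two values, combine them, and recurse on the shorter list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


print(solve_24([4, 7, 8, 8]))
print(solve_24([1, 1, 1, 1]))
```

Exact rational arithmetic via `Fraction` avoids floating-point false negatives on puzzles such as 3, 3, 8, 8, whose solution passes through a non-integer intermediate value.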

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-02-28/Few-Shot/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!
