Published: 2025-10-03

Updated: 2025-11-27

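⚠️ All summaries below were generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: do not use these summaries in serious academic contexts. They are intended only for initial screening before reading the papers.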

Updated 2025-10-03

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP

Authors:Christoph Timmermann, Hyunse Lee, Woojin Lee

While Contrastive Language-Image Pretraining (CLIP) excels at zero-shot tasks by aligning image and text embeddings, its performance in few-shot classification is hindered by a critical limitation: intra-modal misalignment. This issue, caused by a persistent modality gap and CLIP’s exclusively inter-modal training objective, leaves the embedding spaces uncalibrated, making direct image-to-image comparisons unreliable. Existing methods attempt to address this by refining similarity logits or by computationally expensive per-sample optimization. To overcome these challenges, we introduce SeMoBridge, a lightweight yet powerful approach that directly addresses the misalignment. Our method maps images into the text modality, while keeping their semantic content intact through what we call a Semantic Modality Bridge. SeMoBridge is closed-form and can optionally be trained through multi-modal supervision, combining image and text-alignment losses to optimize the projection. Experiments show that the trained version, SeMoBridge-T, requires only a fraction of the training time while overall outperforming other methods, particularly in low-data scenarios (1, 2, and 4 shots). The code is available at https://github.com/christti98/semobridge.

Paper and project links

PDF 19 pages, 12 figures, Under review as a conference paper at ICLR 2026

Summary

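CLIP excels at zero-shot tasks, but its few-shot classification performance is limited by intra-modal misalignment. SeMoBridge addresses this by building a Semantic Modality Bridge that maps images into the text modality while keeping their semantic content intact. The method is lightweight and closed-form, and it can optionally be trained with multi-modal supervision to optimize the projection. Experiments show that the trained version, SeMoBridge-T, overall outperforms other methods in few-shot settings, particularly with 1, 2, and 4 shots, while requiring only a fraction of the training time.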

Key Takeaways

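CLIP performs well on zero-shot tasks, but intra-modal misalignment limits its few-shot classification performance.
SeMoBridge directly addresses this intra-modal misalignment in few-shot classification with CLIP.
SeMoBridge builds a Semantic Modality Bridge that maps images into the text modality while keeping their semantic content intact.
SeMoBridge is lightweight and can optionally be trained with multi-modal supervision.
SeMoBridge-T overall outperforms other methods in few-shot settings, especially at low shot counts.
SeMoBridge-T requires only a fraction of the training time.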
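
The idea of comparing images only after mapping them into the text modality can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `bridge` function is a plain matrix multiplication with a hypothetical matrix `weights`, not the paper's closed-form Semantic Modality Bridge, and the two-dimensional embeddings are made up rather than produced by CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bridge(image_emb, weights):
    """Hypothetical linear bridge: map an image embedding into text space."""
    return [sum(w * x for w, x in zip(row, image_emb)) for row in weights]


def classify(query_img, class_text_embs, few_shot_imgs, weights):
    """Score each class in text space and return the best label.

    The bridged query is compared with the class text embedding and with
    the bridged few-shot images, so no raw image-to-image comparison is made.
    """
    q = bridge(query_img, weights)
    scores = {}
    for label, text_emb in class_text_embs.items():
        score = cosine(q, text_emb)
        shots = few_shot_imgs.get(label, [])
        if shots:
            score += sum(cosine(q, bridge(x, weights)) for x in shots) / len(shots)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data (assumed): identity bridge, two classes, one shot per class.
identity = [[1.0, 0.0], [0.0, 1.0]]
text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
shots = {"cat": [[0.9, 0.1]], "dog": [[0.1, 0.9]]}
label = classify([0.8, 0.2], text_embs, shots, identity)
```

In a real setting the embeddings would come from CLIP's encoders and the bridge would be the one defined in the paper; the sketch only shows where such a mapping sits in the classification pipeline.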

Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning

Authors:Matteo Fuoli, Weihang Huang, Jeannette Littlemore, Sarah Turner, Ellen Wilding

Metaphor is a pervasive feature of discourse and a powerful lens for examining cognition, emotion, and ideology. Large-scale analysis, however, has been constrained by the need for manual annotation due to the context-sensitive nature of metaphor. This study investigates the potential of large language models (LLMs) to automate metaphor identification in full texts. We compare three methods: (i) retrieval-augmented generation (RAG), where the model is provided with a codebook and instructed to annotate texts based on its rules and examples; (ii) prompt engineering, where we design task-specific verbal instructions; and (iii) fine-tuning, where the model is trained on hand-coded texts to optimize performance. Within prompt engineering, we test zero-shot, few-shot, and chain-of-thought strategies. Our results show that state-of-the-art closed-source LLMs can achieve high accuracy, with fine-tuning yielding a median F1 score of 0.79. A comparison of human and LLM outputs reveals that most discrepancies are systematic, reflecting well-known grey areas and conceptual challenges in metaphor theory. We propose that LLMs can be used to at least partly automate metaphor identification and can serve as a testbed for developing and refining metaphor identification protocols and the theory that underpins them.

Paper and project links

PDF

Summary

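This paper examines the potential of large language models (LLMs) to automatically identify metaphors in full texts. It compares three methods: retrieval-augmented generation (RAG), in which the model is given a codebook and instructed to annotate texts according to its rules and examples; prompt engineering with task-specific verbal instructions; and fine-tuning on hand-coded texts. The results show that state-of-the-art closed-source LLMs can achieve high accuracy, with fine-tuning yielding a median F1 score of 0.79. Most discrepancies between human and LLM outputs are systematic and reflect well-known grey areas and conceptual challenges in metaphor theory. The study suggests that LLMs can at least partly automate metaphor identification and can serve as a testbed for developing and refining metaphor identification protocols and the theory behind them.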

Key Takeaways

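LLMs show potential for automatically identifying metaphors in text.
Three approaches are compared: RAG with a codebook, prompt engineering with task-specific verbal instructions, and fine-tuning on hand-coded texts.
State-of-the-art closed-source LLMs achieve high accuracy, with fine-tuning yielding a median F1 score of 0.79.
Most discrepancies between human and LLM annotations are systematic and reflect grey areas and conceptual challenges in metaphor theory.
LLMs can at least partly automate metaphor identification and can serve as a testbed for developing and refining identification protocols and the underlying theory.
The study highlights the potential of LLMs for context-sensitive tasks such as metaphor identification.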
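
For readers unfamiliar with the reported metric, here is a minimal sketch of how a median F1 score could be computed. It assumes one binary label per word (1 for metaphorical, 0 otherwise) and one F1 value per text; the paper's actual unit of annotation and aggregation may differ, and the labels below are made up.

```python
from statistics import median


def f1_score(gold, pred):
    """F1 for binary labels, where 1 marks a metaphorical word."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def median_f1(annotated_texts):
    """Median of per-text F1 scores over (gold, predicted) label pairs."""
    return median(f1_score(gold, pred) for gold, pred in annotated_texts)


# Made-up annotations for three short texts: (human labels, model labels).
texts = [
    ([1, 0, 1, 1], [1, 1, 0, 1]),
    ([1, 1], [1, 1]),
    ([1, 0], [0, 0]),
]
result = median_f1(texts)
```

The median is less sensitive than the mean to a few texts on which the model does very badly or very well, which is one plausible reason to report it.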

Training-free LLM Verification via Recycling Few-shot Examples

Authors:Dongseok Lee, Jimyung Hong, Dongyoung Kim, Jaehyung Kim

Although LLMs have achieved remarkable performance, the inherent stochasticity of their reasoning process and varying conclusions present significant challenges. Majority voting or Best-of-N with external verification models has been explored to find the most promising solution among multiple LLM outputs. However, these approaches have certain limitations, such as limited applicability or the cost of an additional training step. To address this problem, we propose a novel and effective framework that Recycles Few-shot examples to verify LLM outputs (ReFeri). Our key idea is to additionally utilize the given few-shot examples to evaluate the candidate outputs of the target query, not only using them to generate outputs as in the conventional few-shot prompting setup. Specifically, ReFeri evaluates the generated outputs by combining two different scores motivated by Bayes’ rule, and subsequently selects the candidate that is both confidently determined and contextually coherent through a few additional LLM inferences. Experiments with three different LLMs and across seven diverse tasks demonstrate that our framework significantly improves the accuracy of LLMs, achieving an average gain of 4.8%, through effective response selection, without additional training.

Paper and project links

PDF

Summary

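The inherent stochasticity of LLM reasoning and the resulting variation in conclusions pose significant challenges. Majority voting and Best-of-N with external verification models have been explored to pick the most promising of several LLM outputs, but they suffer from limited applicability or the cost of an additional training step. This paper proposes ReFeri, a framework that uses the given few-shot examples not only to generate outputs but also to evaluate the candidate outputs for the target query. Experiments show that ReFeri significantly improves LLM accuracy through effective response selection, without additional training.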

Key Takeaways

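The stochasticity of LLM reasoning and the resulting variation in conclusions pose challenges.
Majority voting and Best-of-N with external verification models have been explored for selecting among multiple LLM outputs, but both have limitations.
ReFeri uses the given few-shot examples to evaluate the candidate outputs for the target query.
ReFeri evaluates the generated outputs by combining two different scores motivated by Bayes' rule.
With a few additional LLM inferences, ReFeri selects the candidate that is both confidently determined and contextually coherent.
Experiments show that ReFeri significantly improves LLM accuracy, with an average gain of 4.8%.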
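
The selection step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the two scores, so `confidence_score` and `coherence_score` are hypothetical log-probability-like values (higher is better) that would in practice come from extra LLM inferences, and the additive combination with a `weight` parameter is likewise an assumption rather than the paper's formula.

```python
def select_candidate(candidates, confidence_score, coherence_score, weight=1.0):
    """Return the candidate with the highest combined score.

    confidence_score and coherence_score map each candidate to a
    hypothetical log-probability-like value; `weight` trades them off.
    """
    def combined(candidate):
        return confidence_score[candidate] + weight * coherence_score[candidate]

    return max(candidates, key=combined)


# Made-up scores for three candidate answers to one query.
candidates = ["A", "B", "C"]
confidence = {"A": -0.2, "B": -0.1, "C": -1.5}
coherence = {"A": -0.5, "B": -2.0, "C": -0.4}

chosen = select_candidate(candidates, confidence, coherence)
confidence_only = select_candidate(candidates, confidence, coherence, weight=0.0)
```

With these toy numbers the most confident candidate ("B") is not the one selected once coherence is taken into account, which is the kind of case where combining two signals can differ from relying on one.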

PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models

Authors:Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Dongseop Kim, Sung Ju Hwang

Knowledge distillation (KD) is a widely used framework for training compact, task-specific models by transferring the knowledge from teacher models. However, its application to active learning (AL), which aims to minimize annotation costs through iterative sample selection, remains underexplored. This gap stems from the fact that KD typically assumes access to sufficient labeled data, whereas AL operates in data-scarce scenarios where task-specific teacher models are often unavailable. In this paper, we first introduce ActiveKD, a framework that integrates AL with KD by leveraging the zero- and few-shot capabilities of large vision-language models (VLMs). A key aspect of ActiveKD is the structured prediction bias of VLMs, i.e., their predictions form clusters in the probability space. We regard this structure as an inductive bias of the teacher model, capturing generalizable output patterns beneficial to student learning. To exploit this bias, we propose Probabilistic CoreSet (PCoreSet), a selection strategy that maximizes coverage in the probability space rather than the feature space. PCoreSet strategically selects probabilistically diverse unlabeled samples, facilitating more efficient transfer of teacher knowledge under limited annotation budgets. Extensive evaluations on 11 datasets show that ActiveKD consistently improves performance across selection methods (e.g., +29.07% on ImageNet, averaged over methods). Under ActiveKD, PCoreSet ranks first in 64/73 settings (approximately 87.7%) across 5 student and 3 teacher networks, always achieving the best performance except for the first 2 AL rounds. Our code is available at https://github.com/erjui/PCoreSet.

Paper and project links

PDF 39 pages, 25 figures, preprint

Summary

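Knowledge distillation (KD) is a widely used framework for training compact, task-specific models by transferring knowledge from teacher models, but its application to active learning (AL) remains underexplored. AL aims to minimize annotation costs through iterative sample selection, whereas KD typically assumes access to sufficient labeled data. This paper proposes ActiveKD, a framework that integrates AL with KD by leveraging the zero- and few-shot capabilities of large vision-language models (VLMs). A key aspect of ActiveKD is the structured prediction bias of VLMs: their predictions form clusters in the probability space. The authors regard this structure as an inductive bias of the teacher model that captures generalizable output patterns beneficial to student learning. To exploit it, they propose Probabilistic CoreSet (PCoreSet), a selection strategy that maximizes coverage in the probability space rather than the feature space, enabling more efficient transfer of teacher knowledge under limited annotation budgets. Evaluations on 11 datasets show that ActiveKD consistently improves performance across selection methods, and PCoreSet ranks first in 64 of 73 settings.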

Key Takeaways

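Knowledge distillation (KD) trains compact, task-specific models by transferring knowledge from teacher models.
Active learning (AL) aims to minimize annotation costs through iterative sample selection, whereas conventional KD assumes sufficient labeled data.
ActiveKD integrates KD with AL by leveraging the zero- and few-shot capabilities of large vision-language models (VLMs).
The structured prediction bias of VLMs is central: their predictions form clusters in the probability space, which is treated as an inductive bias of the teacher model.
To exploit this bias, Probabilistic CoreSet (PCoreSet) maximizes coverage in the probability space, making the transfer of teacher knowledge more efficient.
Evaluations on 11 datasets show that ActiveKD consistently improves performance across selection methods, and PCoreSet ranks first in 64 of 73 settings.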
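
Coverage maximization of this kind is commonly implemented as greedy k-center selection, so a minimal sketch under that assumption follows. It applies the greedy rule to predicted probability vectors instead of features. The Euclidean distance, the function name `pcoreset_select`, and the toy probabilities are all assumptions for illustration; the authors' repository should be consulted for the actual procedure.

```python
import math


def l2(p, q):
    """Euclidean distance between two probability vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pcoreset_select(unlabeled_probs, labeled_probs, budget):
    """Greedy k-center selection in probability space.

    Repeatedly picks the unlabeled sample whose predicted distribution is
    farthest from everything already labeled or selected.
    """
    min_dist = [
        min((l2(p, q) for q in labeled_probs), default=math.inf)
        for p in unlabeled_probs
    ]
    remaining = set(range(len(unlabeled_probs)))
    selected = []
    for _ in range(min(budget, len(unlabeled_probs))):
        idx = max(sorted(remaining), key=lambda i: min_dist[i])
        selected.append(idx)
        remaining.discard(idx)
        # The new pick may now be the nearest center for other samples.
        for i in remaining:
            min_dist[i] = min(min_dist[i], l2(unlabeled_probs[i], unlabeled_probs[idx]))
    return selected


# Toy teacher predictions over three classes (made up).
labeled = [[0.9, 0.05, 0.05]]
unlabeled = [
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.10, 0.85],
    [0.10, 0.85, 0.05],
]
picked = pcoreset_select(unlabeled, labeled, budget=2)
```

In this toy case the sample whose prediction nearly matches the labeled one is skipped in favor of samples predicted to belong to other classes, which is the intended effect of covering the probability space.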

Prompt Tuning Decision Transformers with Structured and Scalable Bandits

Authors:Finn Rietz, Oleg Smirnov, Sara Karimi, Lele Cao

Prompt tuning has emerged as a key technique for adapting large pre-trained Decision Transformers (DTs) in offline Reinforcement Learning (RL), particularly in multi-task and few-shot settings. The Prompting Decision Transformer (PDT) enables task generalization via trajectory prompts sampled uniformly from expert demonstrations – without accounting for prompt informativeness. In this work, we propose a bandit-based prompt-tuning method that learns to construct optimal trajectory prompts from demonstration data at inference time. We devise a structured bandit architecture operating in the trajectory prompt space, achieving linear rather than combinatorial scaling with prompt size. Additionally, we show that the pre-trained PDT itself can serve as a powerful feature extractor for the bandit, enabling efficient reward modeling across various environments. We theoretically establish regret bounds and demonstrate empirically that our method consistently enhances performance across a wide range of tasks, high-dimensional environments, and out-of-distribution scenarios, outperforming existing baselines in prompt tuning.

Paper and project links

PDF Accepted at NeurIPS 2025

Summary

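Prompt tuning is a key technique for adapting large pre-trained Decision Transformers (DTs) in offline reinforcement learning (RL), particularly in multi-task and few-shot settings. This work proposes a bandit-based prompt-tuning method that learns to construct optimal trajectory prompts from demonstration data at inference time. It uses a structured bandit architecture that operates in the trajectory prompt space and scales linearly rather than combinatorially with prompt size. The authors also show that the pre-trained Prompting Decision Transformer (PDT) itself can serve as a powerful feature extractor for the bandit, enabling efficient reward modeling across various environments. Theoretical regret bounds and empirical results indicate that the method consistently improves performance across a wide range of tasks, high-dimensional environments, and out-of-distribution scenarios, outperforming existing prompt-tuning baselines.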

Key Takeaways

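Prompt tuning is a key technique for adapting large pre-trained Decision Transformers (DTs) in offline reinforcement learning (RL), particularly in multi-task and few-shot settings.
The proposed bandit-based prompt-tuning method learns to construct optimal trajectory prompts from demonstration data at inference time.
The structured bandit architecture operates in the trajectory prompt space and scales linearly rather than combinatorially with prompt size.
The pre-trained PDT can serve as a powerful feature extractor for the bandit, enabling efficient reward modeling.
The method consistently improves performance across a wide range of tasks, high-dimensional environments, and out-of-distribution scenarios.
The method outperforms existing prompt-tuning baselines.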
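
One way to see how a bandit can scale linearly with prompt size is to keep a separate set of arm statistics for each prompt slot, so that a prompt with L slots and K candidate segments per slot needs K * L estimates instead of K ** L. The sketch below does this with a simple epsilon-greedy rule. It is an assumption-laden toy: the class name `SlotwiseBandit`, the epsilon-greedy strategy, and the tabular reward averages are all hypothetical, whereas the paper describes reward modeling on PDT features and proves regret bounds for its own design.

```python
import random


class SlotwiseBandit:
    """Hypothetical per-slot epsilon-greedy bandit over prompt segments."""

    def __init__(self, num_slots, num_arms, epsilon=0.1, seed=0):
        self.num_slots = num_slots
        self.num_arms = num_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # One running mean per (slot, arm): num_slots * num_arms values.
        self.counts = [[0] * num_arms for _ in range(num_slots)]
        self.means = [[0.0] * num_arms for _ in range(num_slots)]

    def greedy(self):
        """Best-known arm for every slot."""
        return [
            max(range(self.num_arms), key=lambda a: self.means[s][a])
            for s in range(self.num_slots)
        ]

    def select(self):
        """Greedy prompt with independent random exploration per slot."""
        prompt = self.greedy()
        for s in range(self.num_slots):
            if self.rng.random() < self.epsilon:
                prompt[s] = self.rng.randrange(self.num_arms)
        return prompt

    def update(self, prompt, reward):
        """Credit the observed reward to the arm chosen in each slot."""
        for s, a in enumerate(prompt):
            self.counts[s][a] += 1
            self.means[s][a] += (reward - self.means[s][a]) / self.counts[s][a]


# Toy run with made-up rewards for two hand-picked prompts.
bandit = SlotwiseBandit(num_slots=2, num_arms=3, epsilon=0.0)
bandit.update([1, 2], 1.0)
bandit.update([0, 2], 0.5)
best_prompt = bandit.greedy()
```

In an actual setup the reward passed to `update` would be the return obtained by rolling out the Decision Transformer with the selected prompt, and each arm would index a trajectory segment from the demonstration data.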

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-10-03/Few-Shot/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting.
