Published: 2025-09-29

Updated: 2025-11-27

Word count: 4k

Reading time: 16 min

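⚠️ All summaries below were generated by a large language model. They may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not use them in serious academic work. They are only suitable for initial screening before reading the papers.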

Updated 2025-09-29

Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models

Authors:Behraj Khan, Tahir Syed

Confidence calibration is an emerging challenge in real-world decision systems based on foundation models when they are used for downstream vision classification tasks. For various reasons, logit scores on the CLIP head remain large irrespective of whether the image-language pairs reconcile. This is difficult to address in data space, given the few-shot regime. We propose a penalty incorporated into the loss objective that penalizes incorrect classifications whenever one is made during finetuning, by moving an amount of log-likelihood to the true class commensurate with the relative amplitudes of the two likelihoods. We refer to it as the confidence misalignment penalty (CMP). Extensive experiments on 12 vision datasets and 5 domain generalization datasets support the calibration performance of our method against the state of the art. CMP outperforms the benchmarked prompt learning methods, improving Expected Calibration Error (ECE) by 6.01% on average, 4.01% at minimum, and 9.72% at maximum.

Summary
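Confidence calibration is an emerging challenge when foundation models are used for downstream vision classification. Logit scores on the CLIP head remain large whether or not the image-language pair matches, and the few-shot regime makes this hard to fix in data space. The paper proposes a penalty in the loss objective: whenever a misclassification occurs during fine-tuning, it moves an amount of log-likelihood to the true class, commensurate with the relative magnitudes of the two likelihoods. The authors call it the confidence misalignment penalty (CMP). In experiments on 12 vision datasets and 5 domain generalization datasets, CMP calibrates better than the benchmarked prompt learning methods, improving Expected Calibration Error (ECE) by 6.01% on average, 4.01% at minimum, and 9.72% at maximum.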
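
For readers unfamiliar with the metric quoted above, here is a minimal sketch of Expected Calibration Error with equal-width confidence bins. The binning scheme and the default of 10 bins are assumptions made for illustration; the abstract does not say which ECE variant the paper uses.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi]; a confidence of exactly 0 is placed in the first bin.
        in_bin = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            in_bin |= confidences == 0.0
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the fraction of samples in the bin
    return float(ece)

# A model that always reports 0.9 confidence but is right half the time.
overconfident = expected_calibration_error([0.9] * 10, [1, 0] * 5)
```

A lower value means the reported confidences track the observed accuracy more closely; the overconfident toy model above has a gap of 0.4 between confidence and accuracy.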

Key Takeaways

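Confidence calibration is an emerging challenge for downstream vision classification built on foundation models.
Logit scores on the CLIP head stay large regardless of whether the image-language pair matches, and with only a few samples the problem is hard to address in data space.
The authors propose a method called the confidence misalignment penalty (CMP) to address this.
CMP adds a penalty term to the loss function that takes effect whenever a misclassification occurs during fine-tuning. It moves an amount of log-likelihood to the true class based on the relative likelihoods of the two classes.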
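
The abstract describes the penalty only in words, so the following is one possible reading rather than the paper's formula. In this toy version the penalty is zero for correctly classified samples and otherwise equals -log(p_true / (p_true + p_pred)), which grows as the wrongly predicted class dominates the true class. The function names, the exact expression, and the weighting term `lam` mentioned afterwards are all assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def misalignment_penalty(logits, labels, eps=1e-12):
    """Hypothetical penalty (not the paper's definition): active only on
    misclassified samples, scaled by how far the predicted class outweighs
    the true class."""
    probs = softmax(np.asarray(logits, dtype=float))
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    pred = probs.argmax(axis=1)
    p_true = probs[rows, labels]
    p_pred = probs[rows, pred]
    wrong = pred != labels
    per_sample = np.log(p_true + p_pred + eps) - np.log(p_true + eps)
    return float(np.where(wrong, per_sample, 0.0).mean())

correct_case = misalignment_penalty([[2.0, 0.0]], [0])   # prediction matches the label
mild_error = misalignment_penalty([[0.0, 2.0]], [0])     # wrong, moderate margin
severe_error = misalignment_penalty([[0.0, 4.0]], [0])   # wrong, large margin
```

Under this reading, a training objective would look like `cross_entropy + lam * misalignment_penalty(...)`, with `lam` a hypothetical weight that the abstract does not specify.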

FSOS-AMC: Few-Shot Open-Set Learning for Automatic Modulation Classification Over Multipath Fading Channels

Authors:Hao Zhang, Fuhui Zhou, Qihui Wu, Chau Yuen

Automatic modulation classification (AMC) plays a vital role in advancing future wireless communication networks. Although deep learning (DL)-based AMC frameworks have demonstrated remarkable classification capabilities, they typically require large-scale training datasets and assume consistent class distributions between training and testing data, prerequisites that prove challenging in few-shot and open-set scenarios. To address these limitations, we propose a novel few-shot open-set AMC (FSOS-AMC) framework that integrates a multisequence multiscale attention network (MS-MSANet), meta-prototype training, and a modular open-set classifier. The MS-MSANet extracts features from multisequence input signals, while meta-prototype training optimizes both the feature extractor and the modular open-set classifier, which can effectively categorize testing data into known modulation types or identify potential unknown modulations. Extensive simulation results demonstrate that our FSOS-AMC framework achieves superior performance in few-shot open-set scenarios compared to state-of-the-art methods. Specifically, the framework exhibits higher classification accuracy for both known and unknown modulations, as validated by improved accuracy and area under the receiver operating characteristic curve (AUROC) metrics. Moreover, the proposed framework demonstrates remarkable robustness under challenging low signal-to-noise ratio (SNR) conditions, significantly outperforming existing approaches.

Summary
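For automatic modulation classification in future wireless communication networks, the paper proposes a few-shot open-set AMC (FSOS-AMC) framework. It combines a multisequence multiscale attention network (MS-MSANet), meta-prototype training, and a modular open-set classifier, so that it can classify known modulations and flag unknown ones from only a few samples. Extensive simulations show that it outperforms existing methods in accuracy and AUROC, including under low-SNR conditions.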

Key Takeaways

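Automatic modulation classification (AMC) is vital to the development of future wireless communication networks.
Deep learning (DL)-based AMC classifies well but typically needs large training sets and assumes matching class distributions between training and testing data, which breaks down in few-shot and open-set scenarios.
The proposed FSOS-AMC framework integrates MS-MSANet, meta-prototype training, and a modular open-set classifier.
MS-MSANet extracts features from multisequence input signals.
Meta-prototype training optimizes both the feature extractor and the modular open-set classifier.
The modular open-set classifier assigns test data to known modulation types or identifies potential unknown modulations.
Simulations show that FSOS-AMC outperforms state-of-the-art methods in few-shot open-set scenarios and remains robust at low SNR.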
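
The combination of prototypes and open-set rejection can be illustrated with a generic nearest-prototype classifier that labels a query as unknown when it is too far from every class prototype. This is a sketch under stated assumptions, not the paper's modular open-set classifier: the Euclidean distance, the fixed `threshold`, and the `-1` label for unknowns are illustrative choices, and the features here stand in for whatever MS-MSANet would produce.

```python
import numpy as np

def class_prototypes(support_feats, support_labels):
    """Mean feature vector per known class, computed from the few labelled shots."""
    support_feats = np.asarray(support_feats, dtype=float)
    support_labels = np.asarray(support_labels)
    labels = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == c].mean(axis=0) for c in labels])
    return labels, protos

def classify_open_set(query_feats, labels, protos, threshold):
    """Assign each query to its nearest prototype, or -1 (unknown) if none is close enough."""
    query_feats = np.asarray(query_feats, dtype=float)
    # Pairwise Euclidean distances, shape (n_queries, n_classes).
    dists = np.linalg.norm(query_feats[:, None, :] - protos[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return np.where(dists.min(axis=1) > threshold, -1, labels[nearest])

# Two known classes with two shots each, then three queries (the last is far from both).
labels, protos = class_prototypes([[0, 0], [0, 1], [10, 10], [10, 11]], [0, 0, 1, 1])
preds = classify_open_set([[0, 0.4], [10, 10], [50, 50]], labels, protos, threshold=3.0)
```

In practice the rejection threshold would be tuned or learned; a fixed value is used here only to keep the example short.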

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Authors:Xudong Li, Zihao Huang, Yan Zhang, Yunhang Shen, Ke Li, Xiawu Zheng, Liujuan Cao, Rongrong Ji

Image Quality Assessment (IQA) remains an unresolved challenge in computer vision due to complex distortions, diverse image content, and limited data availability. Existing Blind IQA (BIQA) methods largely rely on extensive human annotations, which are labor-intensive and costly due to the demanding nature of creating IQA datasets. To reduce this dependency, we propose the Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA), designed to efficiently adapt the visual-language pre-trained model, CLIP, to IQA tasks, achieving high accuracy even with limited data. GRMP-IQA consists of two core modules: (i) Meta-Prompt Pre-training Module and (ii) Quality-Aware Gradient Regularization. The Meta-Prompt Pre-training Module leverages a meta-learning paradigm to pre-train soft prompts with shared meta-knowledge across different distortions, enabling rapid adaptation to various IQA tasks. On the other hand, the Quality-Aware Gradient Regularization is designed to adjust the update gradients during fine-tuning, focusing the model’s attention on quality-relevant features and preventing overfitting to semantic information. Extensive experiments on standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods in the limited-data setting. Notably, utilizing just 20% of the training data, GRMP-IQA is competitive with most existing fully supervised BIQA approaches.

Summary

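GRMP-IQA adapts CLIP to IQA tasks through meta-prompt pre-training and gradient regularization, reaching high accuracy even with limited data. Soft prompts are pre-trained under a meta-learning paradigm so that they adapt quickly to different IQA tasks. During fine-tuning, Quality-Aware Gradient Regularization adjusts the update gradients so that the model attends to quality-relevant features and does not overfit to semantic information. On standard BIQA datasets, GRMP-IQA outperforms state-of-the-art BIQA methods in the limited-data setting, and with only 20% of the training data it is competitive with most existing fully supervised BIQA approaches.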

Key Takeaways

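Image Quality Assessment (IQA) remains a challenge in computer vision because of complex distortions, diverse image content, and limited data.
Existing Blind IQA (BIQA) methods rely heavily on large human-annotated datasets, which are time-consuming and costly to create.
GRMP-IQA is designed to reduce this dependency by adapting CLIP to IQA tasks through meta-prompt pre-training and gradient regularization.
GRMP-IQA has two core modules: the Meta-Prompt Pre-training Module and Quality-Aware Gradient Regularization.
The Meta-Prompt Pre-training Module pre-trains soft prompts under a meta-learning paradigm, enabling rapid adaptation to different IQA tasks.
Quality-Aware Gradient Regularization adjusts the update gradients during fine-tuning so that the model focuses on quality-relevant features and avoids overfitting to semantic information.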
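
The abstract does not give the rule used to adjust the gradients. One generic way to regulate an update, shown here purely as an illustration, is projection: when the fine-tuning gradient conflicts with a reference direction, the conflicting component is removed. The names `g_task` and `g_ref` are hypothetical, and treating `g_ref` as a "quality-relevant" direction is an assumption, not the paper's definition.

```python
import numpy as np

def regulate_gradient(g_task, g_ref):
    """Generic gradient projection (assumed rule, not the paper's).

    If g_task points against g_ref (negative dot product), drop the component
    of g_task along g_ref; otherwise return g_task unchanged."""
    g_task = np.asarray(g_task, dtype=float)
    g_ref = np.asarray(g_ref, dtype=float)
    dot = g_task @ g_ref
    if dot >= 0:
        return g_task
    # dot < 0 implies g_ref is non-zero, so the division is safe.
    return g_task - (dot / (g_ref @ g_ref)) * g_ref

conflicting = regulate_gradient([1.0, -1.0], [0.0, 1.0])  # conflict along the second axis
aligned = regulate_gradient([1.0, 1.0], [0.0, 1.0])       # no conflict, left as is
```

After projection the update is orthogonal to the reference direction instead of opposing it, which is the sense in which such a rule keeps fine-tuning from undoing what the reference direction encodes.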

TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot

Authors:Kaiqi Zhang, Shuai Yuan, Honghan Zhao

With the rapid development of large language models (LLMs), evaluating them becomes increasingly important. Measuring text generation tasks such as summarization and article creation is very difficult. Especially in specific application domains (e.g., to-business or to-customer service), in-house evaluation criteria have to meet not only general standards (correctness, helpfulness, creativity, etc.) but also the specific needs of customers and business security requirements at the same time, making the evaluation more difficult. So far, the evaluation of LLMs in business scenarios has mainly relied on manual evaluation, which is expensive and time-consuming. In this paper, we propose a model-based evaluation method, TALEC, which allows users to flexibly set their own evaluation criteria and uses in-context learning (ICL) to teach the judge model these in-house criteria. In addition, we try combining zero-shot and few-shot to make the judge model focus on more information. We also propose a prompt paradigm and an engineering approach to adjust and iterate the shots, helping the judge model to better understand the complex criteria. We then compare fine-tuning with ICL, finding that fine-tuning can be replaced by ICL. TALEC demonstrates a strong capability to accurately reflect human preferences and achieves a correlation of over 80% with human judgments, outperforming even the inter-human correlation in some tasks. The code is released at https://github.com/zlkqz/auto_eval

Summary

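As large language models (LLMs) develop rapidly, evaluating them becomes increasingly important. Measuring text generation tasks such as summarization and article creation is especially hard in specific application domains (for example, to-business or to-customer service), because in-house criteria must satisfy general standards (correctness, helpfulness, creativity, etc.) together with customers' specific needs and business security requirements. So far, LLM evaluation in business scenarios has relied mainly on manual evaluation, which is expensive and time-consuming. The paper proposes TALEC, a model-based evaluation method that lets users set their own evaluation criteria and uses in-context learning (ICL) to teach the judge model these in-house criteria. It also combines zero-shot and few-shot prompting so that the judge model attends to more information, and it proposes a prompt paradigm and an engineering approach for adjusting and iterating the shots so that the judge model better understands complex criteria. The authors find that ICL can replace fine-tuning. TALEC accurately reflects human preferences, with a correlation of over 80% with human judgments, exceeding even the inter-human correlation on some tasks. The code is released at https://github.com/zlkqz/auto_eval.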
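
As a rough illustration of teaching a judge model in-house criteria through in-context learning, the sketch below assembles a single prompt from a list of criteria (the zero-shot instruction part) and a few scored examples (the few-shot part). The prompt wording, the function name, and the example criteria are hypothetical; the TALEC repository may combine zero-shot and few-shot differently.

```python
def build_judge_prompt(criteria, shots, question, answer):
    """Assemble a judge prompt: numbered criteria first, then scored examples,
    then the item to evaluate. The layout is an assumed one, not TALEC's."""
    lines = ["You are an evaluator. Score the answer against every criterion.", "", "Criteria:"]
    lines += [f"{i}. {c}" for i, c in enumerate(criteria, 1)]
    if shots:
        lines += ["", "Scored examples:"]
        for shot in shots:
            lines += [
                f"Question: {shot['question']}",
                f"Answer: {shot['answer']}",
                f"Score: {shot['score']}",
                "",
            ]
    else:
        lines.append("")
    lines += ["Now evaluate:", f"Question: {question}", f"Answer: {answer}", "Score:"]
    return "\n".join(lines)

prompt = build_judge_prompt(
    criteria=["The answer must be factually correct.", "The answer must not reveal internal pricing."],
    shots=[{"question": "What is the refund window?", "answer": "30 days.", "score": 5}],
    question="Can I change my plan mid-cycle?",
    answer="Yes, changes apply from the next billing date.",
)
```

Iterating the shots, as the abstract describes, would then amount to editing the `shots` list and re-checking the judge's agreement with human scores.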

Key Takeaways