⚠️ All of the summaries below are generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: never rely on these summaries in serious academic settings. They are intended only as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
2025-09-17 Update
UniPar: A Unified LLM-Based Framework for Parallel and Accelerated Code Translation in HPC
Authors:Tomer Bitan, Tal Kadosh, Erel Kaplan, Shira Meiri, Le Chen, Peter Morales, Niranjan Hasabnis, Gal Oren
Translating programs between various parallel programming languages is an important problem in the high-performance computing (HPC) community. Existing tools for this problem are either too narrow in scope and/or outdated. Recent explosive growth in the popularity of large language models (LLMs) and their ability to generate and translate code offers a potential alternative approach. Toward that end, we first need to systematically evaluate the ability of LLMs to translate between parallel languages. In this work, we introduce UniPar, a systematic evaluation framework for LLM-based parallel code translation. Specifically, in this work, we target translations between serial code, CUDA, and OpenMP. Our goal is to assess how well current instruction-tuned LLMs – specifically GPT-4o-mini and LLaMA-3.3-70B-Instruct – can be used out of the box or enhanced through known strategies. We evaluated four major usage modes: hyperparameter optimization for decoding, zero- and few-shot prompting, supervised fine-tuning, and iterative feedback through compiler-based repair. As a part of the evaluation, we construct a new dataset called PARATRANS, covering both serial-to-parallel translation and cross-paradigm transformations. Our findings reveal that while off-the-shelf models struggle under the default settings (e.g., GPT-4o-mini achieves only 46% compilation and 15% functional correctness), our UniPar methodology – combining fine-tuning, hyperparameter tuning, and compiler-guided repair – improves performance by up to 2X (69% compilation and 33% correctness). We believe that our findings will provide useful insights for researchers to further improve LLMs for the parallel language translation problem. UniPar source code and PARATRANS dataset are available at our GitHub repository https://github.com/Scientific-Computing-Lab/UniPar_AI.
Paper & Project Links
PDF Accepted to IEEE HPEC conference 2025. 9 pages, incl references
Summary
This paper introduces UniPar, a framework for evaluating the ability of large language models (LLMs) to translate parallel code. The study evaluates translations between serial code, CUDA, and OpenMP and finds that existing models perform only modestly out of the box. By combining fine-tuning, hyperparameter tuning, and compiler-guided repair, the UniPar methodology improves performance to as much as 69% compilation success and 33% functional correctness. The source code and dataset have been released in a GitHub repository.
Key Takeaways
- UniPar is a systematic evaluation framework for assessing how well large language models (LLMs) translate parallel code.
- The study focuses on translations between serial code, CUDA, and OpenMP.
- Off-the-shelf models struggle under default settings: GPT-4o-mini reaches only 46% compilation success and 15% functional correctness.
- Combining fine-tuning, hyperparameter tuning, and compiler-guided repair (a minimal repair-loop sketch follows below), the UniPar methodology lifts performance to up to 69% compilation success and 33% correctness.
- The UniPar source code and the PARATRANS dataset are publicly available for research use.
- The findings offer useful guidance for further improving LLMs on the parallel-language translation problem.
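The compiler-in-the-loop repair mode lends itself to a simple loop. Below is a minimal sketch, not UniPar's actual implementation: `llm_translate` is a hypothetical wrapper around whichever LLM API is used, and a local `gcc` with OpenMP support is assumed as the compiler.

```python
import os
import subprocess
import tempfile

def compile_check(code: str, compiler=("gcc", "-fopenmp", "-c")) -> str:
    """Try to compile a candidate translation; return "" on success, else the diagnostics."""
    with tempfile.NamedTemporaryFile(suffix=".c", mode="w", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([*compiler, path, "-o", os.devnull],
                              capture_output=True, text=True)
        return "" if proc.returncode == 0 else proc.stderr
    finally:
        os.remove(path)

def translate_with_repair(source: str, llm_translate, max_rounds: int = 3) -> str:
    """Compiler-guided repair: feed the previous round's diagnostics back to the model."""
    candidate = llm_translate(source, feedback="")
    for _ in range(max_rounds):
        errors = compile_check(candidate)
        if not errors:
            break
        candidate = llm_translate(source, feedback=f"Fix these compiler errors:\n{errors}")
    return candidate
```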

FS-SAM2: Adapting Segment Anything Model 2 for Few-Shot Semantic Segmentation via Low-Rank Adaptation
Authors:Bernardo Forni, Gabriele Lombardi, Federico Pozzi, Mirco Planamente
Few-shot semantic segmentation has recently attracted great attention. The goal is to develop a model capable of segmenting unseen classes using only a few annotated samples. Most existing approaches adapt a pre-trained model by training from scratch an additional module. Achieving optimal performance with these approaches requires extensive training on large-scale datasets. The Segment Anything Model 2 (SAM2) is a foundational model for zero-shot image and video segmentation with a modular design. In this paper, we propose a Few-Shot segmentation method based on SAM2 (FS-SAM2), where SAM2’s video capabilities are directly repurposed for the few-shot task. Moreover, we apply a Low-Rank Adaptation (LoRA) to the original modules in order to handle the diverse images typically found in standard datasets, unlike the temporally connected frames used in SAM2’s pre-training. With this approach, only a small number of parameters is meta-trained, which effectively adapts SAM2 while benefiting from its impressive segmentation performance. Our method supports any K-shot configuration. We evaluate FS-SAM2 on the PASCAL-5$^i$, COCO-20$^i$ and FSS-1000 datasets, achieving remarkable results and demonstrating excellent computational efficiency during inference. Code is available at https://github.com/fornib/FS-SAM2
Paper & Project Links
PDF Accepted at ICIAP 2025
Summary
This paper presents FS-SAM2, a few-shot semantic segmentation method built on the Segment Anything Model 2 (SAM2). The method repurposes SAM2's video capabilities for the few-shot task and applies Low-Rank Adaptation (LoRA) to the original modules. Only a small number of parameters are meta-trained, which adapts SAM2 effectively to diverse image datasets, yielding strong results on PASCAL-5i, COCO-20i, and FSS-1000 while remaining computationally efficient at inference.
Key Takeaways
- Few-shot semantic segmentation aims to segment unseen classes using only a few annotated samples.
- The Segment Anything Model 2 (SAM2) is a foundation model for zero-shot image and video segmentation.
- The proposed FS-SAM2 builds on SAM2 and directly repurposes its video capabilities for the few-shot task.
- Applying Low-Rank Adaptation (LoRA) to the original modules (see the sketch below) lets FS-SAM2 handle diverse images, unlike the temporally connected frames used in SAM2's pre-training.
- FS-SAM2 meta-trains only a small number of parameters yet adapts SAM2 effectively across datasets.
- FS-SAM2 achieves remarkable results on the PASCAL-5i, COCO-20i, and FSS-1000 datasets.
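As a rough illustration of the LoRA idea referenced above, here is a minimal PyTorch sketch of a low-rank adapter wrapped around a frozen linear layer. The layer sizes, rank, and scaling are placeholder values; this is not the actual FS-SAM2 integration into SAM2's modules.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where only A and B are (meta-)trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pre-trained weights frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # the low-rank update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# usage: wrap a (hypothetical) projection layer inside the backbone
layer = LoRALinear(nn.Linear(256, 256), r=8)
out = layer(torch.randn(4, 256))
```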

CEMTM: Contextual Embedding-based Multimodal Topic Modeling
Authors:Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini
We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language models (LVLMs) to obtain contextualized embeddings, and employs a distributional attention mechanism to weight token-level contributions to topic inference. A reconstruction objective aligns topic-based representations with the document embedding, encouraging semantic consistency across modalities. Unlike existing approaches, CEMTM can process multiple images per document without repeated encoding and maintains interpretability through explicit word-topic and document-topic distributions. Extensive experiments on six multimodal benchmarks show that CEMTM consistently outperforms unimodal and multimodal baselines, achieving a remarkable average LLM score of 2.61. Further analysis shows its effectiveness in downstream few-shot retrieval and its ability to capture visually grounded semantics in complex domains such as scientific articles.
Paper & Project Links
PDF EMNLP 2025
Summary
CEMTM is a context-enhanced multimodal topic model that infers coherent, interpretable topic structures from short and long documents containing text and images. It obtains contextualized embeddings from fine-tuned large vision-language models and uses a distributional attention mechanism to weight token-level contributions to topic inference. A reconstruction objective aligns the topic-based representation with the document embedding, encouraging semantic consistency across modalities. Unlike existing approaches, CEMTM can process multiple images per document without repeated encoding and remains interpretable through explicit word-topic and document-topic distributions. Experiments on six multimodal benchmarks show that CEMTM consistently outperforms unimodal and multimodal baselines, reaching an average LLM score of 2.61. Further analysis demonstrates its effectiveness in downstream few-shot retrieval and its ability to capture visually grounded semantics in complex domains such as scientific articles.
Key Takeaways
- CEMTM is a multimodal topic model that infers coherent, interpretable topic structures from both short and long documents.
- It obtains contextualized embeddings from large vision-language models (LVLMs).
- A distributional attention mechanism weights token-level contributions to topic inference (a toy sketch follows below).
- A reconstruction objective aligns the topic-based representation with the document embedding, preserving semantic consistency across modalities.
- CEMTM can process multiple images per document without repeated encoding.
- It performs strongly, consistently outperforming other models across benchmarks with an average LLM score of 2.61.
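To make the attention-weighted topic inference and the reconstruction objective concrete, here is a toy PyTorch sketch. The module names, single-head scoring, and cosine-based alignment loss are guesses for illustration and do not reproduce CEMTM's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicHead(nn.Module):
    """Attention-weighted pooling of contextual token embeddings, a document-topic
    distribution, and a loss pulling the topic-based representation toward the
    document embedding (illustrative only)."""
    def __init__(self, dim: int, n_topics: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)              # token-level attention scores
        self.to_topics = nn.Linear(dim, n_topics)   # document-topic logits
        self.topic_emb = nn.Parameter(torch.randn(n_topics, dim))

    def forward(self, token_emb, doc_emb):
        # token_emb: (seq, dim) contextual embeddings from a vision-language model
        w = torch.softmax(self.score(token_emb), dim=0)         # (seq, 1)
        pooled = (w * token_emb).sum(dim=0)                     # (dim,)
        theta = torch.softmax(self.to_topics(pooled), dim=-1)   # document-topic distribution
        recon = theta @ self.topic_emb                          # topic-based reconstruction
        loss = 1 - F.cosine_similarity(recon, doc_emb, dim=0)   # alignment objective
        return theta, loss
```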

Intelligent Reservoir Decision Support: An Integrated Framework Combining Large Language Models, Advanced Prompt Engineering, and Multimodal Data Fusion for Real-Time Petroleum Operations
Authors:Seyed Kourosh Mahjour, Seyed Saman Mahjour
The petroleum industry faces unprecedented challenges in reservoir management, requiring rapid integration of complex multimodal datasets for real-time decision support. This study presents a novel integrated framework combining state-of-the-art large language models (GPT-4o, Claude 4 Sonnet, Gemini 2.5 Pro) with advanced prompt engineering techniques and multimodal data fusion for comprehensive reservoir analysis. The framework implements domain-specific retrieval-augmented generation (RAG) with over 50,000 petroleum engineering documents, chain-of-thought reasoning, and few-shot learning for rapid field adaptation. Multimodal integration processes seismic interpretations, well logs, and production data through specialized AI models with vision transformers. Field validation across 15 diverse reservoir environments demonstrates exceptional performance: 94.2% reservoir characterization accuracy, 87.6% production forecasting precision, and 91.4% well placement optimization success rate. The system achieves sub-second response times while maintaining 96.2% safety reliability with no high-risk incidents during evaluation. Economic analysis reveals 62-78% cost reductions (mean 72%) relative to traditional methods with 8-month payback period. Few-shot learning reduces field adaptation time by 72%, while automated prompt optimization achieves 89% improvement in reasoning quality. The framework processed real-time data streams with 96.2% anomaly detection accuracy and reduced environmental incidents by 45%. We provide detailed experimental protocols, baseline comparisons, ablation studies, and statistical significance testing to ensure reproducibility. This research demonstrates practical integration of cutting-edge AI technologies with petroleum domain expertise for enhanced operational efficiency, safety, and economic performance.
Paper & Project Links
Summary
The study presents an integrated framework that combines state-of-the-art large language models (GPT-4o, Claude 4 Sonnet, Gemini 2.5 Pro) with advanced prompt engineering and multimodal data fusion for comprehensive petroleum reservoir analysis. The framework implements domain-specific retrieval-augmented generation over more than 50,000 petroleum engineering documents, together with chain-of-thought reasoning and few-shot learning for rapid adaptation to new fields. Multimodal integration processes seismic interpretations, well logs, and production data through specialized AI models with vision transformers. Field validation across diverse reservoir environments shows strong performance, with accuracies above 90% for reservoir characterization, production forecasting, and well placement optimization, alongside fast response times and high safety and reliability. Economic analysis indicates substantial cost reductions and a short payback period relative to traditional methods. The framework also handles real-time data streams, improves anomaly detection accuracy, and reduces environmental incidents. Overall, the work demonstrates a practical integration of cutting-edge AI with petroleum domain expertise to improve operational efficiency, safety, and economic performance.
Key Takeaways
- Combines state-of-the-art large language models for comprehensive reservoir analysis.
- Uses retrieval-augmented generation (a minimal RAG sketch follows below) to adapt quickly to new field environments.
- Multimodal integration of seismic interpretations, well logs, and production data improves analysis accuracy.
- Reports high accuracy and strong performance in reservoir characterization, production forecasting, and well placement optimization.
- Achieves sub-second response times with high safety and reliability.
- Significantly reduces operating costs and shortens the payback period compared with traditional methods.
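The retrieval-augmented generation step can be pictured with a short sketch. Everything below is illustrative: the documents are assumed to be pre-embedded with some sentence-embedding model, and the prompt template, persona, and few-shot format are placeholders rather than the paper's actual prompts.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    """Cosine-similarity retrieval over a pre-embedded document store."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def build_prompt(question, retrieved, examples):
    """Few-shot, chain-of-thought style prompt assembly (illustrative template)."""
    shots = "\n\n".join(f"Q: {q}\nReasoning: {r}\nA: {a}" for q, r, a in examples)
    context = "\n---\n".join(retrieved)
    return (f"You are a petroleum reservoir engineering assistant.\n"
            f"Reference excerpts:\n{context}\n\n{shots}\n\n"
            f"Q: {question}\nThink step by step, then answer.\n")
```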

ANROT-HELANet: Adversarially and Naturally Robust Attention-Based Aggregation Network via The Hellinger Distance for Few-Shot Classification
Authors:Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong
Few-Shot Learning (FSL), which involves learning to generalize using only a few data samples, has demonstrated promising and superior performances to ordinary CNN methods. While Bayesian based estimation approaches using Kullback-Leibler (KL) divergence have shown improvements, they remain vulnerable to adversarial attacks and natural noises. We introduce ANROT-HELANet, an Adversarially and Naturally RObusT Hellinger Aggregation Network that significantly advances the state-of-the-art in FSL robustness and performance. Our approach implements an adversarially and naturally robust Hellinger distance-based feature class aggregation scheme, demonstrating resilience to adversarial perturbations up to $\epsilon=0.30$ and Gaussian noise up to $\sigma=0.30$. The network achieves substantial improvements across benchmark datasets, including gains of 1.20% and 1.40% for 1-shot and 5-shot scenarios on miniImageNet respectively. We introduce a novel Hellinger Similarity contrastive loss function that generalizes cosine similarity contrastive loss for variational few-shot inference scenarios. Our approach also achieves superior image reconstruction quality with a FID score of 2.75, outperforming traditional VAE (3.43) and WAE (3.38) approaches. Extensive experiments conducted on four few-shot benchmarked datasets verify that ANROT-HELANet’s combination of Hellinger distance-based feature aggregation, attention mechanisms, and our novel loss function establishes new state-of-the-art performance while maintaining robustness against both adversarial and natural perturbations. Our code repository will be available at https://github.com/GreedYLearner1146/ANROT-HELANet/tree/main.
Paper & Project Links
PDF Preprint version. The manuscript has been submitted to a journal. All changes will be transferred to the final version if accepted. Also an erratum: In Figure 10 and 11, the $\epsilon = 0.005$ value should be $\epsilon = 0.05$
Summary
This paper presents ANROT-HELANet, an adversarially and naturally robust attention-based aggregation network for few-shot learning (FSL). The network uses a Hellinger distance-based feature class aggregation scheme that is robust to both adversarial perturbations and natural noise. It achieves significant gains on multiple benchmark datasets and introduces a new Hellinger similarity contrastive loss function suited to variational few-shot inference. ANROT-HELANet also attains excellent image reconstruction quality while remaining robust to adversarial and natural perturbations.
Key Takeaways
- ANROT-HELANet delivers excellent performance in few-shot learning (FSL).
- Its Hellinger distance-based feature class aggregation scheme (the distance itself is sketched below) strengthens robustness to adversarial attacks and natural noise.
- It achieves notable gains on benchmark datasets, particularly the 1-shot and 5-shot settings on miniImageNet.
- A new Hellinger similarity contrastive loss function is introduced for variational few-shot inference.
- It reaches an FID score of 2.75 for image reconstruction, outperforming traditional VAE and WAE approaches.
- Extensive experiments on four few-shot benchmark datasets verify its effectiveness and robustness.
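The Hellinger distance at the core of the aggregation scheme is simple to state. The sketch below computes it between softmax-normalised toy feature vectors used as stand-in "distributions"; the actual network's prototype construction and attention-based aggregation are not reproduced here.

```python
import math
import torch

def hellinger(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Hellinger distance along the last dim: H(P, Q) = ||sqrt(P) - sqrt(Q)||_2 / sqrt(2),
    bounded in [0, 1] for probability vectors."""
    return torch.norm(torch.sqrt(p) - torch.sqrt(q), dim=-1) / math.sqrt(2.0)

# toy query samples and class prototypes, softmax-normalised so they behave like distributions
query = torch.softmax(torch.randn(5, 64), dim=-1)
protos = torch.softmax(torch.randn(3, 64), dim=-1)
dist = hellinger(query.unsqueeze(1), protos.unsqueeze(0))   # (5, 3) pairwise distances
pred = dist.argmin(dim=1)                                   # nearest-prototype assignment
```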

CCoMAML: Efficient Cattle Identification Using Cooperative Model-Agnostic Meta-Learning
Authors:Rabin Dulal, Lihong Zheng, Ashad Kabir
Cattle identification is critical for efficient livestock farming management, currently reliant on radio-frequency identification (RFID) ear tags. However, RFID-based systems are prone to failure due to loss, damage, tampering, and vulnerability to external attacks. As a robust alternative, biometric identification using cattle muzzle patterns similar to human fingerprints has emerged as a promising solution. Deep learning techniques have demonstrated success in leveraging these unique patterns for accurate identification. But deep learning models face significant challenges, including limited data availability, disruptions during data collection, and dynamic herd compositions that require frequent model retraining. To address these limitations, this paper proposes a novel few-shot learning framework for real-time cattle identification using Cooperative Model-Agnostic Meta-Learning (CCoMAML) with Multi-Head Attention Feature Fusion (MHAFF) as a feature extractor model. This model offers great model adaptability to new data through efficient learning from few data samples without retraining. The proposed approach has been rigorously evaluated against current state-of-the-art few-shot learning techniques applied in cattle identification. Comprehensive experimental results demonstrate that our proposed CCoMAML with MHAFF has superior cattle identification performance with 98.46% and 97.91% F1 scores.
Paper & Project Links
Summary
Cattle identification is essential for livestock farming management and currently relies on RFID ear tags, which are prone to loss, damage, interference, and external attacks. Biometric identification from cattle muzzle patterns, analogous to human fingerprints, is a promising alternative, and deep learning has been used successfully to exploit these unique patterns. Deep models, however, face limited data, disruptions during data collection, and dynamic herd compositions that force frequent retraining. To address this, the paper proposes a few-shot learning framework for real-time cattle identification that combines Cooperative Model-Agnostic Meta-Learning (CCoMAML) with Multi-Head Attention Feature Fusion (MHAFF) as the feature extractor. The model adapts to new data by learning efficiently from few samples without retraining, and experiments show superior identification performance with F1 scores of 98.46% and 97.91%.
Key Takeaways
- RFID-based cattle identification suffers from tag loss, damage, and vulnerability to external attacks.
- Biometric identification from cattle muzzle patterns is a promising alternative.
- Deep learning has been applied successfully to identify cattle accurately from muzzle patterns.
- Deep models face challenges in cattle identification, including limited data, disruptions during data collection, and changing herd compositions.
- The paper proposes a few-shot solution combining Cooperative Model-Agnostic Meta-Learning (CCoMAML) with Multi-Head Attention Feature Fusion (MHAFF) to address these challenges (a generic MAML-style sketch follows below).
- The model adapts to new data by learning efficiently from few samples, without retraining.
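The cooperative and MHAFF-specific parts of CCoMAML are not reproduced here; the sketch below only shows the generic MAML-style inner/outer loop such a method builds on. It assumes PyTorch 2.x for `torch.func.functional_call`, and `support`/`query` are (inputs, labels) tuples from one identification task.

```python
import torch
import torch.nn as nn

def maml_step(model, support, query, inner_lr=0.01, inner_steps=1):
    """Adapt on the support set with a few gradient steps, then return the
    meta-loss on the query set (generic MAML sketch, not the CCoMAML variant)."""
    loss_fn = nn.CrossEntropyLoss()
    fast = dict(model.named_parameters())
    for _ in range(inner_steps):
        xs, ys = support
        inner_loss = loss_fn(torch.func.functional_call(model, fast, (xs,)), ys)
        grads = torch.autograd.grad(inner_loss, list(fast.values()), create_graph=True)
        fast = {n: p - inner_lr * g for (n, p), g in zip(fast.items(), grads)}
    xq, yq = query
    return loss_fn(torch.func.functional_call(model, fast, (xq,)), yq)  # backprop this
```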

An Advanced Convolutional Neural Network for Bearing Fault Diagnosis under Limited Data
Authors:Shengke Sun, Shuzhen Han, Ziqian Luan, Xinghao Qin, Jiao Yin, Zhanshan Zhao, Jinli Cao, Hua Wang
In the area of bearing fault diagnosis, deep learning (DL) methods have been widely used recently. However, due to the high cost or privacy concerns, high-quality labeled data are scarce in real world scenarios. While few-shot learning has shown promise in addressing data scarcity, existing methods still face significant limitations in this domain. Traditional data augmentation techniques often suffer from mode collapse and generate low-quality samples that fail to capture the diversity of bearing fault patterns. Moreover, conventional convolutional neural networks (CNNs) with local receptive fields makes them inadequate for extracting global features from complex vibration signals. Additionally, existing methods fail to model the intricate relationships between limited training samples. To solve these problems, we propose an advanced data augmentation and contrastive fourier convolution framework (DAC-FCF) for bearing fault diagnosis under limited data. Firstly, a novel conditional consistent latent representation and reconstruction generative adversarial network (CCLR-GAN) is proposed to generate more diverse data. Secondly, a contrastive learning based joint optimization mechanism is utilized to better model the relations between the available training data. Finally, we propose a 1D fourier convolution neural network (1D-FCNN) to achieve a global-aware of the input data. Experiments demonstrate that DAC-FCF achieves significant improvements, outperforming baselines by up to 32% on case western reserve university (CWRU) dataset and 10% on a self-collected test bench. Extensive ablation experiments prove the effectiveness of the proposed components. Thus, the proposed DAC-FCF offers a promising solution for bearing fault diagnosis under limited data.
Paper & Project Links
Summary
Deep learning is widely used for bearing fault diagnosis, but high-quality labeled data are scarce because of cost and privacy concerns, and existing few-shot methods still have limitations. The paper proposes DAC-FCF, an advanced data augmentation and contrastive Fourier convolution framework for bearing fault diagnosis under limited data. A conditional consistent latent representation and reconstruction GAN (CCLR-GAN) generates more diverse data, a contrastive learning based joint optimization mechanism models the relations among the available training samples, and a 1D Fourier convolution neural network (1D-FCNN) gives the model a global view of the input signal. Experiments show that DAC-FCF outperforms baselines by up to 32% on the Case Western Reserve University (CWRU) dataset and by about 10% on a self-collected test bench, and extensive ablation studies confirm the effectiveness of each component, making it a promising solution for bearing fault diagnosis under limited data.
Key Takeaways
- Deep learning is widely used in bearing fault diagnosis, but high-quality labeled data are scarce.
- Existing few-shot learning methods still have limitations in this domain.
- The proposed DAC-FCF framework combines CCLR-GAN based data augmentation, a contrastive learning joint optimization mechanism, and a globally aware 1D Fourier convolution network (a minimal spectral-convolution sketch follows below).
- Experiments show that DAC-FCF performs strongly and clearly improves diagnosis accuracy over baselines.
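The "global receptive field" idea behind a 1D Fourier convolution can be sketched with an FNO-style spectral layer: transform the signal with an FFT, mix a limited number of low frequencies with learned complex weights, and invert. This is a generic illustration under those assumptions, not the paper's exact 1D-FCNN.

```python
import torch
import torch.nn as nn

class FourierConv1d(nn.Module):
    """Spectral 1D layer: FFT, learned complex mixing of the lowest `modes`
    frequencies, inverse FFT. Every output position sees the whole input."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        self.weight = nn.Parameter(
            0.02 * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                        # x: (batch, channels, length)
        spec = torch.fft.rfft(x, dim=-1)         # (batch, channels, length//2 + 1)
        out = torch.zeros_like(spec)
        out[..., :self.modes] = torch.einsum(
            "bci,coi->boi", spec[..., :self.modes], self.weight)
        return torch.fft.irfft(out, n=x.shape[-1], dim=-1)

x = torch.randn(8, 16, 1024)                     # e.g. vibration signal segments
y = FourierConv1d(16, modes=32)(x)               # same shape, globally mixed
```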

An Interpretable Benchmark for Clickbait Detection and Tactic Attribution
Authors:Lihi Nofar, Tomer Portal, Aviv Elbaz, Alexander Apartsin, Yehudit Aperstein
The proliferation of clickbait headlines poses significant challenges to the credibility of information and user trust in digital media. While recent advances in machine learning have improved the detection of manipulative content, the lack of explainability limits their practical adoption. This paper presents a model for explainable clickbait detection that not only identifies clickbait titles but also attributes them to specific linguistic manipulation strategies. We introduce a synthetic dataset generated by systematically augmenting real news headlines using a predefined catalogue of clickbait strategies. This dataset enables controlled experimentation and detailed analysis of model behaviour. We present a two-stage framework for automatic clickbait analysis comprising detection and tactic attribution. In the first stage, we compare a fine-tuned BERT classifier with large language models (LLMs), specifically GPT-4.0 and Gemini 2.4 Flash, under both zero-shot prompting and few-shot prompting enriched with illustrative clickbait headlines and their associated persuasive tactics. In the second stage, a dedicated BERT-based classifier predicts the specific clickbait strategies present in each headline. This work advances the development of transparent and trustworthy AI systems for combating manipulative media content. We share the dataset with the research community at https://github.com/LLM-HITCS25S/ClickbaitTacticsDetection
Paper & Project Links
PDF 7 pages
Summary
This paper presents an explainable clickbait detection model that not only identifies clickbait headlines but also attributes them to specific linguistic manipulation strategies. The model is built on a synthetic dataset generated by systematically augmenting real news headlines with a predefined catalogue of clickbait strategies. A two-stage framework performs detection and tactic attribution: the first stage compares a fine-tuned BERT classifier with large language models (GPT-4.0 and Gemini 2.4 Flash) under zero-shot prompting and few-shot prompting enriched with illustrative clickbait headlines and their persuasive tactics; the second stage uses a dedicated BERT-based classifier to predict the specific clickbait strategies in each headline. The work advances transparent and trustworthy AI systems for combating manipulative media content.
Key Takeaways
- Clickbait headlines pose a major challenge to information credibility and user trust in digital media.
- Although machine learning has improved the detection of manipulative content, the lack of explainability limits practical adoption.
- The paper proposes a two-stage framework for automatic clickbait analysis, consisting of detection and tactic attribution.
- A synthetic dataset is generated by systematically augmenting real news headlines with a predefined catalogue of clickbait strategies, enabling controlled experiments and detailed analysis of model behaviour.
- The study compares a fine-tuned BERT classifier with large language models under zero-shot and few-shot prompting (an illustrative few-shot prompt template follows below).
- A dedicated BERT-based classifier predicts the specific clickbait strategies present in each headline.
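The few-shot prompting stage can be pictured with a small template-building helper. The tactic names, example headlines, and wording below are invented for illustration and are not the paper's catalogue or prompts.

```python
TACTICS = ["curiosity gap", "exaggeration", "urgency", "listicle teaser"]  # hypothetical labels

def few_shot_prompt(headline, examples):
    """Pair example headlines with their persuasive tactics, then append the query."""
    shots = "\n\n".join(
        f'Headline: "{h}"\nClickbait: {"yes" if tactics else "no"}'
        + (f"\nTactics: {', '.join(tactics)}" if tactics else "")
        for h, tactics in examples)
    return (f"Decide whether the headline is clickbait and, if so, name the tactics "
            f"used (choose from: {', '.join(TACTICS)}).\n\n{shots}\n\n"
            f'Headline: "{headline}"\nClickbait:')

examples = [
    ("You won't believe what this study found about sleep", ["curiosity gap"]),
    ("City council approves 2026 budget", []),
]
print(few_shot_prompt("Doctors hate this one weird trick", examples))
```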

Two Sides of the Same Optimization Coin: Model Degradation and Representation Collapse in Graph Foundation Models
Authors:Xunkai Li, Daohan Su, Sicheng Liu, Ru Zhang, Zhenjun Li, Bing Zhou, Rong-Hua Li, Guoren Wang
Graph foundation models, inspired by the success of LLMs, are designed to learn the optimal embedding from multi-domain TAGs for the downstream cross-task generalization capability. During our investigation, graph VQ-MAE stands out among the increasingly diverse landscape of GFM architectures. This is attributed to its ability to jointly encode topology and textual attributes from multiple domains into discrete embedding spaces with clear semantic boundaries. Despite its potential, domain generalization conflicts cause imperceptible pitfalls. In this paper, we instantiate two of them, and they are just like two sides of the same GFM optimization coin - Side 1 Model Degradation: The encoder and codebook fail to capture the diversity of inputs; Side 2 Representation Collapse: The hidden embedding and codebook vector fail to preserve semantic separability due to constraints from narrow representation subspaces. These two pitfalls (sides) collectively impair the decoder and generate the low-quality reconstructed supervision, causing the GFM optimization dilemma during pre-training (coin). Through empirical investigation, we attribute the above challenges to Information Bottleneck and Regularization Deficit. To address them, we propose MoT (Mixture-of-Tinkers) - (1) Information Tinker for Two Pitfalls, which utilizes an edge-wise semantic fusion strategy and a mixture-of-codebooks with domain-aware routing to improve information capacity. (2) Regularization Tinker for Optimization Coin, which utilizes two additional regularizations to further improve gradient supervision in our proposed Information Tinker. Notably, as a flexible architecture, MoT adheres to the scaling laws of GFM, offering a controllable model scale. Compared to SOTA baselines, experiments on 22 datasets across 6 domains demonstrate that MoT achieves significant improvements in supervised, few-shot, and zero-shot scenarios.
Paper & Project Links
Summary
Graph foundation models (GFMs), inspired by the success of LLMs, aim to learn optimal embeddings from multi-domain text-attributed graphs (TAGs) for downstream cross-task generalization. Among GFM architectures, graph VQ-MAE stands out because it jointly encodes topology and textual attributes from multiple domains into discrete embedding spaces with clear semantic boundaries. However, domain generalization conflicts introduce two subtle pitfalls, two sides of the same optimization coin: model degradation, where the encoder and codebook fail to capture input diversity, and representation collapse, where hidden embeddings and codebook vectors lose semantic separability because of narrow representation subspaces. Together they impair the decoder and yield low-quality reconstruction supervision during pre-training. To address this, the paper proposes MoT (Mixture-of-Tinkers), which combines an edge-wise semantic fusion strategy and a mixture-of-codebooks with domain-aware routing to increase information capacity, plus two additional regularizations that improve gradient supervision. As a flexible architecture, MoT follows GFM scaling laws with a controllable model scale, and experiments on 22 datasets across 6 domains show significant improvements over state-of-the-art baselines in supervised, few-shot, and zero-shot scenarios.
Key Takeaways
- Graph foundation models, inspired by the success of LLMs, learn optimal embeddings from multi-domain text-attributed graphs to support downstream cross-task generalization.
- Graph VQ-MAE stands out among GFM architectures because it jointly encodes topology and textual attributes into discrete embedding spaces with clear semantic boundaries (a toy mixture-of-codebooks sketch follows below).
- Domain generalization conflicts cause two subtle pitfalls, model degradation and representation collapse, which jointly impair the decoder, produce low-quality reconstruction supervision, and create the GFM optimization dilemma (the "coin") during pre-training.
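The vector-quantization step and the domain-aware routing can be pictured with a toy sketch. The routing-by-domain-id, codebook sizes, and straight-through trick below are generic VQ ingredients chosen for illustration; MoT's actual router, fusion strategy, and regularizers are more involved.

```python
import torch
import torch.nn as nn

class DomainRoutedVQ(nn.Module):
    """Mixture-of-codebooks toy: pick a codebook per domain, snap each embedding
    to its nearest code, and pass gradients straight through to the encoder."""
    def __init__(self, n_domains: int, codes_per_book: int, dim: int):
        super().__init__()
        self.books = nn.Parameter(torch.randn(n_domains, codes_per_book, dim))

    def forward(self, z, domain_id):
        book = self.books[domain_id]                 # (codes, dim) for this domain
        idx = torch.cdist(z, book).argmin(dim=-1)    # nearest code per embedding
        quantized = book[idx]
        return z + (quantized - z).detach(), idx     # straight-through estimator

vq = DomainRoutedVQ(n_domains=6, codes_per_book=256, dim=64)
z = torch.randn(100, 64)                             # node/text hidden embeddings
zq, codes = vq(z, domain_id=2)
```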

Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes
Authors:Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Qingming Li, Tianyu Du, Shouling Ji
The rapid advancement of voice deepfake technologies has raised serious concerns about user audio privacy, as attackers increasingly exploit publicly available voice data to generate convincing fake audio for malicious purposes such as identity theft, financial fraud, and misinformation campaigns. While existing defense methods offer partial protection, they face critical limitations, including weak adaptability to unseen user data, poor scalability to long audio, rigid reliance on white-box knowledge, and high computational and temporal costs during the encryption process. To address these challenges and defend against personalized voice deepfake threats, we propose Enkidu, a novel user-oriented privacy-preserving framework that leverages universal frequential perturbations generated through black-box knowledge and few-shot training on a small amount of user data. These highly malleable frequency-domain noise patches enable real-time, lightweight protection with strong generalization across variable-length audio and robust resistance to voice deepfake attacks, all while preserving perceptual quality and speech intelligibility. Notably, Enkidu achieves over 50 to 200 times processing memory efficiency (as low as 0.004 gigabytes) and 3 to 7000 times runtime efficiency (real-time coefficient as low as 0.004) compared to six state-of-the-art countermeasures. Extensive experiments across six mainstream text-to-speech models and five cutting-edge automated speaker verification models demonstrate the effectiveness, transferability, and practicality of Enkidu in defending against both vanilla and adaptive voice deepfake attacks. Our code is currently available.
Paper & Project Links
PDF Accepted by ACM MM 2025, Open-sourced
Summary
The rapid progress of voice deepfake technology threatens user audio privacy. Existing defenses adapt poorly to unseen user data, scale poorly to long audio, rely rigidly on white-box knowledge, and incur high computational and temporal costs during encryption. This paper proposes Enkidu, a user-oriented privacy-preserving framework that uses universal frequential perturbations generated with black-box knowledge and few-shot training on a small amount of user data. The highly malleable frequency-domain noise patches enable real-time, lightweight protection that generalizes across variable-length audio and resists voice deepfake attacks while preserving perceptual quality and speech intelligibility. Compared with six state-of-the-art countermeasures, Enkidu is 50 to 200 times more memory-efficient and 3 to 7000 times more runtime-efficient. Experiments across six mainstream text-to-speech models and five automated speaker verification models confirm its effectiveness, transferability, and practicality against both vanilla and adaptive attacks.
Key Takeaways
- The rapid advance of voice deepfake technology raises serious concerns about user audio privacy.
- Existing defenses adapt poorly to unseen user data, scale poorly to long audio, and rely on white-box knowledge.
- The Enkidu framework generates universal frequency-domain perturbations using black-box knowledge and few-shot training on a small amount of user data (a toy frequency-patch sketch follows below).
- Enkidu provides real-time, lightweight protection with strong generalization and effective resistance to voice deepfake attacks.
- Enkidu is markedly more memory- and runtime-efficient than existing countermeasures.
- Enkidu remains effective and practical against both vanilla and adaptive voice deepfake attacks.
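How a single "universal frequential perturbation" can protect audio of any length is easy to sketch: tile one fixed frequency-domain patch over the signal frame by frame. The frame size, the random patch, and the lack of perceptual constraints below are simplifications; Enkidu's patch is learned with black-box feedback rather than drawn at random.

```python
import numpy as np

FRAME = 1024  # samples per analysis frame (illustrative choice)

def apply_frequential_patch(audio: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Add one fixed complex frequency-domain patch to every frame of the signal."""
    out = audio.copy()
    for start in range(0, len(audio) - FRAME + 1, FRAME):
        spec = np.fft.rfft(audio[start:start + FRAME])
        out[start:start + FRAME] = np.fft.irfft(spec + patch, n=FRAME)
    return out

rng = np.random.default_rng(0)
audio = rng.standard_normal(16_000)                       # placeholder 1 s at 16 kHz
patch = 0.01 * (rng.standard_normal(FRAME // 2 + 1)
                + 1j * rng.standard_normal(FRAME // 2 + 1))
protected = apply_frequential_patch(audio, patch)
```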

The Diffusion Duality
Authors:Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo
Paper & Project Links
PDF ICML 2025. We provide the code at: https://github.com/s-sahoo/duo [v2]: Camera ready revisions
Summary
This work narrows the performance gap between uniform-state discrete diffusion models and stronger autoregressive and masked diffusion models by exploiting a key insight: uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. The resulting method, Duo, transfers techniques from Gaussian diffusion to improve both training and sampling. A curriculum learning strategy guided by the Gaussian process reduces variance and doubles training speed, and Discrete Consistency Distillation adapts consistency distillation from the continuous to the discrete setting, unlocking few-step generation by accelerating sampling by two orders of magnitude.
Key Takeaways
- Uniform-state discrete diffusion models can self-correct, which promises fast text generation.
- Duo transfers key techniques from Gaussian diffusion to improve uniform-state discrete diffusion (the underlying Gaussian-to-discrete mapping is sketched below).
- A curriculum learning strategy guided by the Gaussian process reduces training variance and doubles training speed.
- Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks.
- Discrete Consistency Distillation adapts consistency distillation from the continuous to the discrete setting, enabling few-step generation in diffusion language models.
- The distillation accelerates sampling by two orders of magnitude.
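The stated key insight, that a uniform-state discrete process emerges from an underlying Gaussian diffusion, can be illustrated qualitatively: perturb one-hot token vectors with Gaussian noise and decode by argmax, and the token survives at low noise but becomes uniformly random at high noise. The toy below only shows that qualitative behaviour; the paper's precise correspondence and noise schedule are not reproduced.

```python
import torch

def gaussian_then_argmax(tokens, vocab_size, sigma):
    """Diffuse one-hot token vectors with Gaussian noise, then decode by argmax."""
    one_hot = torch.nn.functional.one_hot(tokens, vocab_size).float()
    noisy = one_hot + sigma * torch.randn_like(one_hot)
    return noisy.argmax(dim=-1)

tokens = torch.randint(0, 100, (8, 32))          # toy token ids
for sigma in (0.1, 1.0, 5.0):
    corrupted = gaussian_then_argmax(tokens, 100, sigma)
    keep = (corrupted == tokens).float().mean().item()
    print(f"sigma={sigma}: fraction of tokens preserved = {keep:.2f}")
```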

Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation
Authors:T. G. D. K. Sumanathilaka, Nicholas Micallef, Julian Hough
Ambiguous words are often found in modern digital communications. Lexical ambiguity challenges traditional Word Sense Disambiguation (WSD) methods, due to limited data. Consequently, the efficiency of translation, information retrieval, and question-answering systems is hindered by these limitations. This study investigates the use of Large Language Models (LLMs) to improve WSD using a novel approach combining a systematic prompt augmentation mechanism with a knowledge base (KB) consisting of different sense interpretations. The proposed method incorporates a human-in-loop approach for prompt augmentation where prompt is supported by Part-of-Speech (POS) tagging, synonyms of ambiguous words, aspect-based sense filtering and few-shot prompting to guide the LLM. By utilizing a few-shot Chain of Thought (COT) prompting-based approach, this work demonstrates a substantial improvement in performance. The evaluation was conducted using FEWS test data and sense tags. This research advances accurate word interpretation in social media and digital communication.
Paper & Project Links
PDF 12 pages, 6 tables, 1 figure, Proceedings of the 1st International Conference on NLP & AI for Cyber Security
Summary
This study investigates the use of large language models (LLMs) to improve word sense disambiguation (WSD) by combining a systematic prompt augmentation mechanism with a knowledge base of different sense interpretations. The approach augments prompts, with a human in the loop, using part-of-speech (POS) tags, synonyms of the ambiguous word, aspect-based sense filtering, and few-shot examples to guide the LLM. Using a few-shot chain-of-thought (CoT) prompting approach, the work demonstrates a substantial improvement in performance, evaluated on FEWS test data and sense tags.
Key Takeaways
- Ambiguous words are common in modern digital communication and challenge traditional word sense disambiguation (WSD) methods.
- Because of limited data, these limitations hinder the efficiency of translation, information retrieval, and question-answering systems.
- Large language models (LLMs) show potential for resolving lexical ambiguity.
- Combining a systematic prompt augmentation mechanism with a knowledge base of different sense interpretations improves WSD.
- Prompts are augmented with a human in the loop, using POS tags, synonyms, aspect-based sense filtering, and few-shot examples to guide the LLM (an illustrative prompt template follows below).
- The few-shot chain-of-thought (CoT) prompting approach yields a substantial performance improvement.
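A prompt of the kind described can be assembled as below. The wording, the example sentence, and the sense glosses are made up for illustration; they are not the study's actual prompts or the FEWS sense inventory.

```python
def wsd_prompt(sentence, target, pos, synonyms, senses, shots):
    """Few-shot chain-of-thought WSD prompt enriched with POS tags, synonyms,
    and candidate sense glosses (illustrative template only)."""
    shot_txt = "\n\n".join(
        f"Sentence: {s}\nTarget: {t}\nReasoning: {r}\nSense: {a}" for s, t, r, a in shots)
    sense_txt = "\n".join(f"  ({i + 1}) {g}" for i, g in enumerate(senses))
    return (f"{shot_txt}\n\n"
            f"Sentence: {sentence}\n"
            f"Target: {target} (POS: {pos}; synonyms: {', '.join(synonyms)})\n"
            f"Candidate senses:\n{sense_txt}\n"
            f"Think step by step, then answer with the number of the best sense.")

print(wsd_prompt(
    "She sat by the bank and watched the river.",
    "bank", "NOUN", ["shore", "riverside"],
    ["sloping land beside a body of water", "a financial institution"],
    shots=[("He deposited cash at the bank.", "bank",
            "Money is involved, so the financial sense fits.", "a financial institution")]))
```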

Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning
Authors:Huali Xu, Li Liu, Tianpeng Liu, Shuaifeng Zhi, Shuzhou Sun, Ming-Ming Cheng
Existing cross-domain few-shot learning (CDFSL) methods, which develop source-domain training strategies to enhance model transferability, face challenges with large-scale pre-trained models (LMs) due to inaccessible source data and training strategies. Moreover, fine-tuning LMs for CDFSL demands substantial computational resources, limiting practicality. This paper addresses the source-free CDFSL (SF-CDFSL) problem, tackling few-shot learning (FSL) in the target domain using only pre-trained models and a few target samples without source data or strategies. To overcome the challenge of inaccessible source data, this paper introduces Step-wise Distribution Alignment Guided Style Prompt Tuning (StepSPT), which implicitly narrows domain gaps through prediction distribution optimization. StepSPT proposes a style prompt to align target samples with the desired distribution and adopts a dual-phase optimization process. In the external process, a step-wise distribution alignment strategy factorizes prediction distribution optimization into a multi-step alignment problem to tune the style prompt. In the internal process, the classifier is updated using standard cross-entropy loss. Evaluations on five datasets demonstrate that StepSPT outperforms existing prompt tuning-based methods and SOTAs. Ablation studies further verify its effectiveness. Code will be made publicly available at https://github.com/xuhuali-mxj/StepSPT.
Paper & Project Links
PDF Accepted at IEEE TPAMI, 16 pages, 12 figures, 7 tables
Summary
This paper addresses source-free cross-domain few-shot learning (SF-CDFSL): few-shot learning in the target domain using only pre-trained models and a few target samples, without source data or source training strategies. The proposed StepSPT (Step-wise Distribution Alignment Guided Style Prompt Tuning) implicitly narrows domain gaps through prediction distribution optimization. It introduces a style prompt that aligns target samples with the desired distribution and adopts a dual-phase optimization process. Evaluations on five datasets show that StepSPT outperforms existing prompt-tuning-based methods and state-of-the-art approaches.
Key Takeaways
- Inaccessible source data is the core challenge: existing CDFSL methods struggle with large pre-trained models because neither the source data nor the source training strategies are available.
- StepSPT implicitly narrows domain gaps through prediction distribution optimization and introduces a style prompt that aligns target samples with the desired distribution.
- StepSPT uses a dual-phase optimization: the external process applies a step-wise distribution alignment strategy, factorizing prediction distribution optimization into a multi-step alignment problem to tune the style prompt, while the internal process updates the classifier with standard cross-entropy loss (a toy two-phase sketch follows below).
- Evaluations on five datasets show that StepSPT outperforms existing prompt-tuning-based methods and state-of-the-art approaches (SOTAs).
- The code will be made publicly available at https://github.com/xuhuali-mxj/StepSPT.
- Practical value: the work addresses a real obstacle to using large pre-trained models for few-shot learning, which gives it considerable practical relevance.
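The dual-phase idea can be sketched in a few lines. Everything below is a toy stand-in: the "style prompt" is reduced to a per-channel affine transform on frozen features, and the external-phase alignment objective is replaced by a placeholder entropy term, since the actual step-wise distribution alignment loss is specific to StepSPT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StylePrompt(nn.Module):
    """Toy style prompt: a learnable per-channel affine map on backbone features."""
    def __init__(self, dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, feats):
        return feats * self.scale + self.shift

def dual_phase_step(prompt, classifier, opt_prompt, opt_clf, feats, labels):
    """External phase tunes the prompt with an alignment-style objective
    (placeholder: prediction entropy); internal phase updates the classifier
    with standard cross-entropy on the few labelled target samples."""
    opt_prompt.zero_grad()
    probs = torch.softmax(classifier(prompt(feats)), dim=-1)
    (-(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)).mean().backward()
    opt_prompt.step()

    opt_clf.zero_grad()
    F.cross_entropy(classifier(prompt(feats).detach()), labels).backward()
    opt_clf.step()

prompt, clf = StylePrompt(64), nn.Linear(64, 5)
dual_phase_step(prompt, clf,
                torch.optim.SGD(prompt.parameters(), lr=1e-2),
                torch.optim.SGD(clf.parameters(), lr=1e-3),
                torch.randn(20, 64), torch.randint(0, 5, (20,)))
```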
