
Few-Shot


⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them in serious academic settings; they are only a first-pass filter before actually reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-07

Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models

Authors:Gahyeon Kim, Sohee Kim, Seokju Lee

Recent advances in large-scale vision and language models have led to significant progress in zero-shot learning tasks. Methods such as CoOp and CoCoOp have shown that replacing handcrafted prompts with learnable vectors, known as prompt learning, can result in improved performance. However, these models often struggle to generalize to entirely unseen categories. While traditional zero-shot learning techniques benefit from various data augmentation strategies, prompt learning has primarily focused on text-based modifications, leaving the potential of image-based augmentation largely unexplored. In this work, we explore how image-level augmentations, particularly those that introduce attribute-specific variations, can support and enhance prompt learning. Our analysis examines the interaction between these augmentations and soft prompt frameworks, revealing their potential to improve generalization. We also identify a limitation in existing methods, such as CoCoOp, which do not provide explicit guidance for learning prompts that focus on semantically meaningful visual features. To address this, we propose Adding Attributes to Prompt Learning, AAPL, a novel method that introduces adversarial token embeddings to decouple superficial visual variations introduced by augmentation from class-relevant semantic representations. This decoupling enables the learned prompts to concentrate on visually discriminative features that align with the target categories. We conduct comprehensive experiments on eleven benchmark datasets, and AAPL consistently outperforms existing methods across few-shot, zero-shot, cross-dataset, and domain generalization settings. Our source code is publicly available at: https://github.com/Gahyeonkim09/AAPL


Paper & Project Links

PDF | Accepted in Pattern Recognition

Summary
Recent advances in large vision-language models have driven notable progress in zero-shot learning. This study examines how image-level augmentations, especially those introducing attribute-specific variations, can support and strengthen prompt learning. The analysis shows that the interaction between these augmentations and soft-prompt frameworks has the potential to improve generalization. The authors also identify a limitation of existing methods such as CoCoOp and propose AAPL, a new method that introduces adversarial token embeddings to decouple the superficial visual variations introduced by augmentation from class-relevant semantic representations. Experiments on eleven benchmark datasets show that AAPL outperforms existing methods in few-shot, zero-shot, cross-dataset, and domain generalization settings.

Key Takeaways

  1. Recent advances in large vision-language models have driven progress in zero-shot learning.
  2. Prompt learning has shown it can improve performance on zero-shot tasks.
  3. Image-level augmentations, especially those introducing attribute-specific variations, are important for supporting and strengthening prompt learning.
  4. The interaction between soft-prompt frameworks and image augmentation helps improve model generalization.
  5. Existing methods such as CoCoOp lack explicit guidance toward semantically meaningful visual features when learning prompts.
  6. AAPL introduces adversarial token embeddings to decouple superficial visual variations from class-relevant semantic representations.
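
To make the "learnable vectors replace handcrafted prompts" idea above concrete, here is a minimal CoOp-style sketch in NumPy. Everything in it is a toy assumption: the embedding table, the tiny dimension, and the mean-pooling stand-in for a real text encoder; it is not the AAPL implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8     # toy embedding width (CLIP uses 512)
N_CTX = 4   # number of learnable context tokens, as in CoOp

# Hypothetical frozen word embeddings for two class names
class_embeds = {"cat": rng.normal(size=DIM), "dog": rng.normal(size=DIM)}

# Learnable context vectors that replace the handcrafted "a photo of a"
ctx = rng.normal(scale=0.02, size=(N_CTX, DIM))

def prompt_feature(class_name):
    """Build the prompt [ctx_1..ctx_N, class] and pool it into one feature.
    A real text encoder is a Transformer; mean-pooling is a stand-in."""
    tokens = np.vstack([ctx, class_embeds[class_name]])
    feat = tokens.mean(axis=0)
    return feat / np.linalg.norm(feat)

def classify(image_feat):
    """Pick the class whose prompt feature is most cosine-similar."""
    image_feat = image_feat / np.linalg.norm(image_feat)
    scores = {c: float(image_feat @ prompt_feature(c)) for c in class_embeds}
    return max(scores, key=scores.get)
```

In the real method, `ctx` is optimized by backpropagating a classification loss through the frozen text encoder, which is what makes the prompt "learnable".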

Cool Papers

View the paper screenshots here

SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment

Authors:Wenbo Lu

Vision-Language Pretraining (VLP) has achieved remarkable success across various downstream tasks, but such gains are largely driven by scaling up training data. Yet, existing methods treat image-text pairs as isolated training examples; this neglects the rich relational structure naturally present in many domains, such as e-commerce product co-purchase graphs and social recommendation networks. Inspired by neuroscientific evidence that humans encode knowledge as relational cognitive maps, we introduce Structure-aware Language-Image Pretraining (SLIP). SLIP integrates a structural contrastive loss to align modalities while also modeling relationships between neighboring entities in a structured graph. To support this paradigm, we construct a large-scale Amazon Product Co-purchase Multimodal Graph Dataset, enabling structured cross-modality supervision at scale. Experimental results show that SLIP consistently outperforms CLIP on cross-modal retrieval and classification tasks in both zero-shot and few-shot settings, demonstrating the value of relational supervision for cross-modal alignment.


Paper & Project Links

PDF | Capstone Paper

Summary

Vision-Language Pretraining (VLP) has achieved remarkable success on a range of downstream tasks, but those gains come mostly from scaling up training data. Existing methods treat image-text pairs as isolated training samples, ignoring the rich relational structure naturally present in many domains, such as e-commerce product co-purchase graphs and social recommendation networks. Inspired by neuroscientific evidence that humans encode knowledge as relational cognitive maps, this paper proposes Structure-aware Language-Image Pretraining (SLIP). SLIP integrates a structural contrastive loss that aligns the modalities while modeling relationships between neighboring entities in a structured graph. To support this paradigm, the authors build a large-scale Amazon Product Co-purchase Multimodal Graph Dataset, enabling structured cross-modal supervision at scale. Experiments show that SLIP consistently outperforms CLIP on cross-modal retrieval and classification in both zero-shot and few-shot settings, demonstrating the value of relational supervision for cross-modal alignment.

Key Takeaways

  1. Vision-Language Pretraining (VLP) has achieved notable success on diverse downstream tasks, largely by scaling up training data.
  2. Existing methods ignore the rich relational structure among image-text pairs.
  3. SLIP introduces a structural contrastive loss to model the relational structure between images and text.
  4. SLIP outperforms CLIP on cross-modal retrieval and classification tasks.
  5. SLIP is trained on a large-scale Amazon product co-purchase multimodal graph dataset.
  6. Relational supervision is valuable for cross-modal alignment.
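
The structural contrastive loss described above can be sketched as a standard CLIP-style InfoNCE term plus a graph-neighbour alignment term. The weight `lam`, the cosine form of the neighbour term, and the toy inputs are assumptions for illustration, not SLIP's exact objective.

```python
import numpy as np

def info_nce(img, txt, tau=0.07):
    """CLIP-style loss: row i of `img` and row i of `txt` are positives."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(p)).mean())

def structural_loss(img, txt, edges, tau=0.07, lam=0.5):
    """Hypothetical SLIP-style objective: CLIP alignment plus a term that
    pulls graph-neighbouring entities (e.g. co-purchased products) together
    in the image embedding space."""
    base = info_nce(img, txt, tau)
    img_n = img / np.linalg.norm(img, axis=1, keepdims=True)
    neigh = np.mean([1.0 - img_n[i] @ img_n[j] for i, j in edges])
    return base + lam * float(neigh)
```

With an empty or self-loop edge set the neighbour term vanishes and the objective reduces to plain CLIP, which is the intended sanity check.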


NABench: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction

Authors:Zhongmin Li, Runze Ma, Jiahao Tan, Chengzi Tan, Shuangjia Zheng

Nucleotide sequence variation can induce significant shifts in functional fitness. Recent nucleotide foundation models promise to predict such fitness effects directly from sequence, yet heterogeneous datasets and inconsistent preprocessing make it difficult to compare methods fairly across DNA and RNA families. Here we introduce NABench, a large-scale, systematic benchmark for nucleic acid fitness prediction. NABench aggregates 162 high-throughput assays and curates 2.6 million mutated sequences spanning diverse DNA and RNA families, with standardized splits and rich metadata. We show that NABench surpasses prior nucleotide fitness benchmarks in scale, diversity, and data quality. Under a unified evaluation suite, we rigorously assess 29 representative foundation models across zero-shot, few-shot prediction, transfer learning, and supervised settings. The results quantify performance heterogeneity across tasks and nucleic-acid types, demonstrating clear strengths and failure modes for different modeling choices and establishing strong, reproducible baselines. We release NABench to advance nucleic acid modeling, supporting downstream applications in RNA/DNA design, synthetic biology, and biochemistry. Our code is available at https://github.com/mrzzmrzz/NABench.


Paper & Project Links

PDF

Summary

This paper introduces NABench, a large-scale, systematic benchmark for nucleic acid fitness prediction. The benchmark aggregates 162 high-throughput assays and curates 2.6 million mutated sequences spanning diverse DNA and RNA families, with standardized splits and rich metadata. Under a unified evaluation suite that rigorously assesses 29 representative foundation models, NABench surpasses prior nucleotide fitness benchmarks in scale, diversity, and data quality. The results quantify performance differences across tasks and nucleic-acid types, reveal clear strengths and failure modes of different modeling choices, and support downstream applications in RNA/DNA design, synthetic biology, and biochemistry.

Key Takeaways

  1. NABench is a large-scale benchmark for nucleic acid fitness prediction.
  2. The benchmark covers mutated sequences from diverse DNA and RNA families.
  3. Standardized splits and rich metadata improve data quality.
  4. Evaluating a broad set of foundation models, NABench surpasses prior benchmarks in scale and diversity.
  5. The results reveal performance differences of modeling choices across tasks and nucleic-acid types.
  6. NABench advances nucleic acid modeling, supporting downstream applications such as RNA/DNA design, synthetic biology, and biochemistry.
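
Fitness-prediction benchmarks of this kind are typically scored with Spearman rank correlation between predicted and measured fitness; that NABench reports exactly this metric is an assumption here. A minimal tie-free implementation:

```python
import numpy as np

def spearman(pred, truth):
    """Spearman rank correlation between two score vectors.
    Assumes no ties (real evaluation code should average tied ranks)."""
    def ranks(x):
        order = np.argsort(x)
        r = np.empty(len(x), dtype=float)
        r[order] = np.arange(len(x))
        return r
    a, b = ranks(np.asarray(pred)), ranks(np.asarray(truth))
    a -= a.mean()
    b -= b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```

A zero-shot evaluation then amounts to scoring each mutant with a model (e.g. its log-likelihood) and computing `spearman(model_scores, assay_fitness)`; rank correlation is used because assay scales differ across the 162 datasets.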


Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity

Authors:Siddharth Chaudhary

Large language models display in-context learning as an emergent effect of scale, but they rely on static weights during inference. In contrast, biological systems continually adapt via synaptic plasticity. We investigate whether explicit, biologically inspired plasticity can endow Transformers with faster in-sequence adaptation. To this end, we augment decoder-only Transformers with fast-weight modules updated either by (i) a neuromodulated Hebbian rule or (ii) the gradient-based plasticity mechanism of Duan et al. (2023). Across copying, regression, and few-shot classification tasks (CIFAR-FS, Omniglot), Hebbian plasticity consistently achieves lower loss and stronger few-shot generalization, while gradient-based updates perform best on long-horizon credit assignment. When associations are short and linearly separable, static weights suffice, defining a clear boundary condition for when plasticity helps. Analysis of learned modulatory signals reveals that gradient-based rules maintain large, persistent updates, whereas Hebbian plasticity is sharply gated around salient events. Together, these results show that explicit plasticity complements attention by enabling rapid, task-specific adaptation, and clarify when different plasticity mechanisms are most effective.


Paper & Project Links

PDF

Summary

Large language models display in-context learning as an emergent effect of scale, but they rely on static weights during inference. Biological systems, by contrast, adapt continually through synaptic plasticity. This work asks whether explicit, biologically inspired plasticity can give Transformers faster in-sequence adaptation. To this end, decoder-only Transformers are augmented with fast-weight modules updated either by (i) a neuromodulated Hebbian rule or (ii) the gradient-based plasticity mechanism of Duan et al. (2023). Across copying, regression, and few-shot classification tasks (CIFAR-FS, Omniglot), Hebbian plasticity achieves lower loss and stronger few-shot generalization, while gradient-based updates perform best on long-horizon credit assignment. When associations are short and linearly separable, static weights suffice, defining a clear boundary condition for when plasticity helps. Analysis of the learned modulatory signals shows that gradient-based rules maintain large, persistent updates, whereas Hebbian plasticity is sharply gated around salient events. Overall, explicit plasticity complements attention by enabling rapid, task-specific adaptation, and the results clarify when each plasticity mechanism is most effective.

Key Takeaways

  1. Large language models exhibit in-context learning but rely on static weights during inference.
  2. Biological systems continually adapt via synaptic plasticity.
  3. Explicit, biologically inspired plasticity can give Transformers faster in-sequence adaptation.
  4. Fast-weight modules updated by Hebbian or gradient-based rules enhance Transformer performance.
  5. Hebbian and gradient-based plasticity each excel in different task regimes.
  6. When associations are short and linearly separable, static weights suffice; this is one boundary condition for when plasticity helps.
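
The neuromodulated Hebbian rule for a fast-weight module can be sketched as a decayed outer-product update gated by a scalar modulatory signal (produced by the network itself, so updates concentrate around salient events). The decay constant, learning rate, and gating form below are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def hebbian_step(W_fast, pre, post, modulator, eta=0.1, decay=0.95):
    """One neuromodulated Hebbian update of a fast-weight matrix.

    W_fast    : (n_post, n_pre) fast weights carried along the sequence
    pre, post : pre- and post-synaptic activation vectors at this step
    modulator : scalar gate; near zero except around salient events
    """
    return decay * W_fast + eta * modulator * np.outer(post, pre)
```

With `modulator = 0` the fast weights simply decay toward zero, which is how the gating suppresses updates away from salient tokens.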


Authors:Max Gupta

Rapidly learning abstract concepts from limited examples is a hallmark of human intelligence. This work investigates whether gradient-based meta-learning can equip neural networks with inductive biases for efficient few-shot acquisition of discrete concepts. I compare meta-learning methods against a supervised learning baseline on Boolean concepts (logical statements) generated by a probabilistic context-free grammar (PCFG). By systematically varying concept dimensionality (number of features) and recursive compositionality (depth of grammar recursion), I delineate between complexity regimes in which meta-learning robustly improves few-shot concept learning and regimes in which it does not. Meta-learners are much better able to handle compositional complexity than featural complexity. I highlight some reasons for this with a representational analysis of the weights of meta-learners and a loss landscape analysis demonstrating how featural complexity increases the roughness of loss trajectories, allowing curvature-aware optimization to be more effective than first-order methods. I find improvements in out-of-distribution generalization on complex concepts by increasing the number of adaptation steps in meta-SGD, where adaptation acts as a way of encouraging exploration of rougher loss basins. Overall, this work highlights the intricacies of learning compositional versus featural complexity in high dimensional concept spaces and provides a road to understanding the role of 2nd order methods and extended gradient adaptation in few-shot concept learning.


Paper & Project Links

PDF | 7 pages, 3 figures. Presented at the ICML 2025 HiLD Workshop

Summary

This paper investigates whether gradient-based meta-learning can equip neural networks with inductive biases for efficiently learning abstract concepts from limited examples. Meta-learning methods are compared against a supervised baseline on Boolean concepts generated by a probabilistic context-free grammar. By systematically varying concept dimensionality and recursive compositionality, the study delineates complexity regimes in which meta-learning robustly improves few-shot concept learning and regimes in which it does not. A representational analysis of meta-learner weights and a loss-landscape analysis reveal why meta-learners handle compositional complexity much better than featural complexity: featural complexity roughens the loss trajectories, making curvature-aware optimization more effective than first-order methods. Increasing the number of adaptation steps in meta-SGD improves out-of-distribution generalization on complex concepts, with adaptation acting as a way of encouraging exploration of rougher loss basins. Overall, the work highlights the intricacies of learning compositional versus featural complexity in high-dimensional concept spaces and points toward understanding the role of second-order methods and extended gradient adaptation in few-shot concept learning.

Key Takeaways

  1. The paper studies gradient-based meta-learning for few-shot concept learning, focusing on efficient acquisition of abstract concepts.
  2. Comparative experiments probe meta-learning on concepts of varying dimensionality and compositionality.
  3. Meta-learners handle compositional complexity far better than featural complexity.
  4. Analyses of meta-learner weights and loss landscapes reveal the mechanisms behind this gap.
  5. Increasing the number of adaptation steps improves generalization on complex concepts.
  6. Adaptation encourages exploration of rougher loss basins.
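
The role of extra adaptation steps can be seen in a minimal meta-SGD-style inner loop: per-parameter learning rates `alpha` (meta-learned in the real algorithm, fixed here) are applied for `steps` gradient updates on a support set. The quadratic toy task and all concrete values are illustrative assumptions.

```python
import numpy as np

def adapt(theta, alpha, grad_fn, steps):
    """meta-SGD inner loop: `alpha` is a vector of per-parameter learning
    rates; more `steps` lets the learner travel further into a loss basin."""
    for _ in range(steps):
        theta = theta - alpha * grad_fn(theta)
    return theta

# Toy task: loss(theta) = ||theta - target||^2, so grad = 2 (theta - target)
target = np.array([1.0, -2.0])
grad = lambda th: 2.0 * (th - target)
alpha = np.array([0.1, 0.2])   # stands in for meta-learned rates
theta0 = np.zeros(2)
```

Running more inner steps moves `theta0` closer to the task optimum, the mechanism the paper credits for better out-of-distribution generalization on complex concepts.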


Scalable Medication Extraction and Discontinuation Identification from Electronic Health Records Using Large Language Models

Authors:Chong Shao, Douglas Snyder, Chiran Li, Bowen Gu, Kerry Ngan, Chun-Ting Yang, Jiageng Wu, Richard Wyss, Kueiyu Joshua Lin, Jie Yang

Identifying medication discontinuations in electronic health records (EHRs) is vital for patient safety but is often hindered by information being buried in unstructured notes. This study aims to evaluate the capabilities of advanced open-sourced and proprietary large language models (LLMs) in extracting medications and classifying their medication status from EHR notes, focusing on their scalability on medication information extraction without human annotation. We collected three EHR datasets from diverse sources to build the evaluation benchmark. We evaluated 12 advanced LLMs and explored multiple LLM prompting strategies. Performance on medication extraction, medication status classification, and their joint task (extraction then classification) was systematically compared across all experiments. We found that LLMs showed promising performance on the medication extraction and discontinuation classification from EHR notes. GPT-4o consistently achieved the highest average F1 scores in all tasks under zero-shot setting - 94.0% for medication extraction, 78.1% for discontinuation classification, and 72.7% for the joint task. Open-sourced models followed closely, Llama-3.1-70B-Instruct achieved the highest performance in medication status classification on the MIV-Med dataset (68.7%) and in the joint task on both the Re-CASI (76.2%) and MIV-Med (60.2%) datasets. Medical-specific LLMs demonstrated lower performance compared to advanced general-domain LLMs. Few-shot learning generally improved performance, while CoT reasoning showed inconsistent gains. LLMs demonstrate strong potential for medication extraction and discontinuation identification on EHR notes, with open-sourced models offering scalable alternatives to proprietary systems, and few-shot learning can further improve LLMs’ capability.


Paper & Project Links

PDF

Summary

This study evaluates the ability of advanced open-source and proprietary large language models (LLMs) to extract medications from electronic health records (EHRs) and classify their medication status. Experiments show that LLMs perform well on medication extraction and discontinuation classification, with GPT-4o the strongest in the zero-shot setting. Open-source models such as Llama-3.1-70B-Instruct also perform strongly, while medical-specific LLMs lag behind general-domain ones. Few-shot learning improves performance, whereas chain-of-thought (CoT) reasoning shows inconsistent gains. LLMs show strong potential for medication extraction and discontinuation identification on EHR notes, with open-source models offering scalable alternatives to proprietary systems.

Key Takeaways

  1. Advanced LLMs perform well at medication extraction and discontinuation classification.
  2. GPT-4o performs best in the zero-shot setting, with the highest average F1 scores.
  3. Open-source models such as Llama-3.1-70B-Instruct excel at medication status classification.
  4. Medical-specific LLMs underperform advanced general-domain LLMs.
  5. Few-shot learning improves LLM performance.
  6. CoT reasoning yields inconsistent gains.
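
The F1 scores quoted above come from comparing extracted medication mentions against gold annotations. A minimal set-based precision/recall/F1 helper (the actual benchmark's span-matching rules may differ):

```python
def prf1(predicted, gold):
    """Precision, recall, and F1 over sets of extracted mention strings."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # true positives
    p = tp / len(predicted) if predicted else 0.0   # precision
    r = tp / len(gold) if gold else 0.0             # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For the joint task (extraction then classification), the same computation is applied to (mention, status) pairs, so a correct mention with a wrong status counts as an error.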


UniFault: A Fault Diagnosis Foundation Model from Bearing Data

Authors:Emadeldeen Eldele, Mohamed Ragab, Xu Qing, Edward, Zhenghua Chen, Min Wu, Xiaoli Li, Jay Lee

Machine fault diagnosis (FD) is a critical task for predictive maintenance, enabling early fault detection and preventing unexpected failures. Despite its importance, existing FD models are operation-specific with limited generalization across diverse datasets. Foundation models (FM) have demonstrated remarkable potential in both visual and language domains, achieving impressive generalization capabilities even with minimal data through few-shot or zero-shot learning. However, translating these advances to FD presents unique hurdles. Unlike the large-scale, cohesive datasets available for images and text, FD datasets are typically smaller and more heterogeneous, with significant variations in sampling frequencies and the number of channels across different systems and applications. This heterogeneity complicates the design of a universal architecture capable of effectively processing such diverse data while maintaining robust feature extraction and learning capabilities. In this paper, we introduce UniFault, a foundation model for fault diagnosis that systematically addresses these issues. Specifically, the model incorporates a comprehensive data harmonization pipeline featuring two key innovations. First, a unification scheme transforms multivariate inputs into standardized univariate sequences. Second, a novel cross-domain temporal fusion strategy mitigates distribution shifts and enriches sample diversity and count, improving the model generalization across varying conditions. UniFault is pretrained on over 6.9 million samples spanning diverse FD datasets, enabling superior few-shot performance. Extensive experiments on real-world FD datasets demonstrate that UniFault achieves state-of-the-art performance, setting a new benchmark for fault diagnosis models and paving the way for more scalable and robust predictive maintenance solutions.


Paper & Project Links

PDF

Summary
Machine fault diagnosis is a key task in predictive maintenance, enabling early fault detection and preventing unexpected failures. Current fault-diagnosis models are operation-specific and generalize poorly across diverse datasets. Foundation models have shown remarkable potential in the visual and language domains, achieving impressive generalization even with minimal data, but transferring these advances to fault diagnosis is hard: the datasets are small and heterogeneous, with large variations in sampling frequency and channel count, which complicates designing a universal architecture. This paper proposes UniFault, which addresses these issues systematically. The model includes a comprehensive data-harmonization pipeline: a unification scheme converts multivariate inputs into standardized univariate sequences, and a novel cross-domain temporal fusion strategy mitigates distribution shifts while enriching sample diversity and count, improving generalization across conditions. Pretrained on over 6.9 million samples spanning diverse fault-diagnosis datasets, UniFault achieves state-of-the-art few-shot performance, setting a new benchmark for fault-diagnosis models and paving the way for more scalable and robust predictive-maintenance solutions.

Key Takeaways

  1. Machine fault diagnosis (FD) is a key predictive-maintenance task requiring early fault detection.
  2. Current FD models are operation-specific with limited generalization.
  3. Foundation models (FM) show excellent generalization potential in the visual and language domains.
  4. Applying FMs to FD faces challenges such as small, heterogeneous datasets and varying sampling frequencies.
  5. UniFault addresses these issues with a data-harmonization pipeline comprising a unification scheme and a cross-domain temporal fusion strategy.
  6. Pretrained across many FD datasets, UniFault achieves superior few-shot performance.
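
The unification scheme, as described, turns multivariate records with varying channel counts and sampling rates into standardized univariate sequences. A plausible minimal sketch (resampling each channel to a fixed length and z-normalizing it; the actual pipeline may differ):

```python
import numpy as np

def unify(signal, target_len=64):
    """Convert a (channels, length) vibration record into standardized
    univariate sequences of shape (channels, target_len).

    Each channel becomes its own sample, which sidesteps the varying
    channel counts across fault-diagnosis datasets."""
    out = []
    for ch in signal:
        # Linear resampling to a common length, regardless of sampling rate
        xs = np.interp(np.linspace(0, 1, target_len),
                       np.linspace(0, 1, len(ch)), ch)
        # Z-normalization removes per-sensor offset and scale
        out.append((xs - xs.mean()) / (xs.std() + 1e-8))
    return np.stack(out)
```

After this step every sample has identical shape and statistics, so one backbone can consume records from heterogeneous machines and assays.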


Sundial: A Family of Highly Capable Time Series Foundation Models

Authors:Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, Mingsheng Long

We introduce Sundial, a family of native, flexible, and scalable time series foundation models. To predict the next-patch’s distribution, we propose a TimeFlow Loss based on flow-matching, which facilitates native pre-training of Transformers on continuous-valued time series without discrete tokenization. Conditioned on arbitrary-length time series, our models are pre-trained without specifying any prior distribution and can generate multiple probable predictions, achieving more flexibility in representation learning than using parametric densities. Towards time series foundation models, we leverage minimal but crucial adaptations of Transformers and curate TimeBench with one trillion time points, comprising mostly real-world datasets and synthetic data. By mitigating mode collapse via TimeFlow Loss, we pre-train a family of Sundial models on TimeBench, which achieve unprecedented model capacity and generalization performance. In addition to excellent scalability, Sundial achieves state-of-the-art results on both point and probabilistic forecasting benchmarks with a just-in-time inference speed, i.e., making zero-shot predictions within a few milliseconds. We believe that Sundial’s pioneering generative forecasting capability can improve model reliability in real-world decision-making. Code is available at: https://github.com/thuml/Sundial.


Paper & Project Links

PDF

Summary

This paper introduces Sundial, a family of native, flexible, and scalable time series foundation models. A flow-matching-based TimeFlow Loss enables native pretraining of Transformers on continuous-valued time series without discrete tokenization. The models are pretrained without specifying any prior distribution and can generate multiple probable predictions, achieving more flexibility in representation learning than parametric densities. With minimal but crucial adaptations of the Transformer, the authors curate TimeBench, a corpus of one trillion time points comprising mostly real-world datasets plus synthetic data. By mitigating mode collapse via the TimeFlow Loss, the pretrained Sundial models achieve unprecedented capacity and generalization, reaching state-of-the-art results on both point and probabilistic forecasting benchmarks with just-in-time inference, making zero-shot predictions within a few milliseconds. Sundial's generative forecasting capability can improve model reliability in real-world decision-making.

Key Takeaways

  1. Sundial is a family of native, flexible, and scalable time series foundation models.
  2. TimeFlow Loss, based on flow matching, enables pretraining on continuous-valued time series without discrete tokenization.
  3. The models can generate multiple probable predictions, giving more flexibility in representation learning.
  4. Minimal but crucial Transformer adaptations are paired with the curated TimeBench corpus.
  5. TimeBench contains one trillion time points, mostly real-world data plus synthetic data, covering diverse time series scenarios.
  6. TimeFlow Loss mitigates mode collapse, improving model capacity and generalization.
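
The flow-matching idea behind the TimeFlow Loss can be sketched as a velocity-regression objective: interpolate between a source point x0 and the target patch x1 at a random time t, then regress the model's predicted velocity onto x1 - x0. This is the generic conditional flow-matching recipe, not Sundial's exact per-patch formulation.

```python
import numpy as np

def timeflow_loss(v_pred_fn, x0, x1, rng):
    """Generic flow-matching loss on batches of continuous-valued patches.

    v_pred_fn : model, maps (x_t, t) -> predicted velocity
    x0, x1    : (batch, patch_len) source points and target patches
    """
    t = rng.uniform(size=(x0.shape[0], 1))     # one time per batch element
    x_t = (1 - t) * x0 + t * x1                # linear interpolation path
    target = x1 - x0                           # constant velocity of the path
    v = v_pred_fn(x_t, t)
    return float(np.mean((v - target) ** 2))   # MSE regression, no tokenizer
```

At inference, sampling different x0 and integrating the learned velocity field yields multiple probable forecasts, which is what gives the model its generative, non-parametric predictive distribution.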


MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping

Authors:Amirreza Fateh, Mohammad Reza Mohammadi, Mohammad Reza Jahed Motlagh

Few-shot Semantic Segmentation addresses the challenge of segmenting objects in query images with only a handful of annotated examples. However, many previous state-of-the-art methods either have to discard intricate local semantic features or suffer from high computational complexity. To address these challenges, we propose a new Few-shot Semantic Segmentation framework based on the Transformer architecture. Our approach introduces the spatial transformer decoder and the contextual mask generation module to improve the relational understanding between support and query images. Moreover, we introduce a multi-scale decoder to refine the segmentation mask by incorporating features from different resolutions in a hierarchical manner. Additionally, our approach integrates global features from intermediate encoder stages to improve contextual understanding, while maintaining a lightweight structure to reduce complexity. This balance between performance and efficiency enables our method to achieve competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i in both 1-shot and 5-shot settings. Notably, our model with only 1.5 million parameters demonstrates competitive performance while overcoming limitations of existing methodologies. https://github.com/amirrezafateh/MSDNet


Paper & Project Links

PDF

Summary

This paper proposes a new few-shot semantic segmentation framework based on the Transformer architecture. A spatial transformer decoder, a contextual mask generation module, and a multi-scale decoder improve the relational understanding between support and query images, enabling segmentation from only a handful of annotated examples. The model achieves competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i while striking a strong balance between performance and efficiency.

Key Takeaways

  1. A new Transformer-based few-shot semantic segmentation framework is introduced.
  2. A spatial transformer decoder and a contextual mask generation module improve relational understanding.
  3. A multi-scale decoder refines the segmentation mask by hierarchically incorporating features at different resolutions.
  4. Global features from intermediate encoder stages improve contextual understanding.
  5. The structure stays lightweight, reducing computational complexity.
  6. Performance is competitive on benchmarks such as PASCAL-5^i and COCO-20^i.
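
Coarse-to-fine fusion of features at different resolutions, as in the multi-scale decoder described above, can be sketched generically: upsample the running estimate and add the next finer feature map. This is a generic sketch of hierarchical fusion, not MSDNet's exact decoder.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (H, W) feature map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def multiscale_fuse(feats):
    """Refine from coarsest to finest resolution: at each level, upsample
    the running estimate and add the next feature map. A real decoder would
    interleave learned convolutions; addition is the structural skeleton."""
    feats = sorted(feats, key=lambda f: f.shape[0])   # coarse -> fine
    out = feats[0]
    for f in feats[1:]:
        out = upsample2x(out) + f
    return out
```

The final map has the finest input's resolution while still carrying information aggregated from every coarser stage, which is the point of hierarchical refinement.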



Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!