Published: 2025-10-18

Updated: 2025-11-27

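⚠️ All summaries below were generated by a large language model and may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not use them in serious academic settings. They are intended only for screening papers before reading.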

2025-10-18 update

Unifying Environment Perception and Route Choice Modeling for Trajectory Representation Learning

Authors:Ji Cao, Yu Wang, Tongya Zheng, Zujie Ren, Canghong Jin, Gang Chen, Mingli Song

Trajectory Representation Learning (TRL) aims to encode raw trajectories into low-dimensional vectors, which can then be leveraged in various downstream tasks, including travel time estimation, location prediction, and trajectory similarity analysis. However, existing TRL methods suffer from a key oversight: treating trajectories as isolated spatio-temporal sequences, without considering the external environment and internal route choice behavior that govern their formation. To bridge this gap, we propose a novel framework that unifies comprehensive environment Perception and explicit Route choice modeling for effective Trajectory representation learning, dubbed PRTraj. Specifically, PRTraj first introduces an Environment Perception Module to enhance the road network by capturing multi-granularity environmental semantics from surrounding POI distributions. Building on this environment-aware backbone, a Route Choice Encoder then captures the route choice behavior inherent in each trajectory by modeling its constituent road segment transitions as a sequence of decisions. These route-choice-aware representations are finally aggregated to form the global trajectory embedding. Extensive experiments on 3 real-world datasets across 5 downstream tasks validate the effectiveness and generalizability of PRTraj. Moreover, PRTraj demonstrates strong data efficiency, maintaining robust performance under few-shot scenarios. Our code is available at: https://anonymous.4open.science/r/PRTraj.

Summary

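Trajectory representation learning (TRL) encodes raw trajectories into low-dimensional vectors for downstream tasks such as travel time estimation, location prediction, and trajectory similarity analysis. Existing TRL methods overlook the external environment and the internal route choice behavior that shape a trajectory. The paper proposes PRTraj, a framework that unifies environment perception and route choice modeling for trajectory representation learning. Experiments on 3 real-world datasets across 5 downstream tasks show that PRTraj is effective and generalizes well, and that it stays data-efficient in few-shot settings.

Key Takeaways

Trajectory representation learning (TRL) encodes trajectories into low-dimensional vectors for a range of downstream tasks.
Existing TRL methods ignore the external environment and the internal route choice behavior behind a trajectory.
PRTraj's Environment Perception Module enhances the road network with multi-granularity environmental semantics drawn from surrounding POI distributions.
PRTraj's Route Choice Encoder models road segment transitions as a sequence of decisions, and the resulting route-choice-aware representations are aggregated into a global trajectory embedding.
PRTraj is effective and generalizes across 5 downstream tasks on 3 real-world datasets.
PRTraj keeps robust performance in few-shot scenarios.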

David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation

Authors:Philipp Bauerfeind, Amir Salarpour, David Fernandez, Pedram MohajerAnsari, Johannes Reschke, Mert D. Pesé

Scenario simulation is central to testing autonomous driving systems. Scenic, a domain-specific language (DSL) for CARLA, enables precise and reproducible scenarios, but NL-to-Scenic generation with large language models (LLMs) suffers from scarce data, limited reproducibility, and inconsistent metrics. We introduce NL2Scenic, an open dataset and framework with 146 NL/Scenic pairs, a difficulty-stratified 30-case test split, an Example Retriever, and 14 prompting variants (ZS, FS, CoT, SP, MoT). We evaluate 13 models: four proprietary (GPT-4o, GPT-5, Claude-Sonnet-4, Gemini-2.5-pro) and nine open-source code models (Qwen2.5Coder 0.5B-32B; CodeLlama 7B/13B/34B), using text metrics (BLEU, ChrF, EDIT-SIM, CrystalBLEU) and execution metrics (compilation and generation), and compare them with an expert study (n=11). EDIT-SIM correlates best with human judgments; we also propose EDIT-COMP (F1 of EDIT-SIM and compilation) as a robust dataset-level proxy that improves ranking fidelity. GPT-4o performs best overall, while Qwen2.5Coder-14B reaches about 88 percent of its expert score on local hardware. Retrieval-augmented prompting, Few-Shot with Example Retriever (FSER), consistently boosts smaller models, and scaling shows diminishing returns beyond mid-size, with Qwen2.5Coder outperforming CodeLlama at comparable scales. NL2Scenic and EDIT-COMP offer a standardized, reproducible basis for evaluating Scenic code generation and indicate that mid-size open-source models are practical, cost-effective options for autonomous-driving scenario programming.

Summary
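To support scenario simulation for testing autonomous driving systems, the paper introduces NL2Scenic, an open dataset and framework that targets the scarce data, limited reproducibility, and inconsistent metrics of NL-to-Scenic generation. Across 13 models evaluated with text and execution metrics and compared against an expert study, GPT-4o performs best overall, while Qwen2.5Coder-14B reaches about 88 percent of GPT-4o's expert score on local hardware. Retrieval-augmented prompting (FSER) consistently helps smaller models, and scaling shows diminishing returns beyond mid-size. NL2Scenic and EDIT-COMP provide a standardized, reproducible basis for evaluating Scenic code generation.

Key Takeaways

NL2Scenic is an open dataset and framework for evaluating Scenic code generation.
It targets the scarce data, limited reproducibility, and inconsistent metrics of NL-to-Scenic generation.
GPT-4o performs best overall.
Qwen2.5Coder-14B reaches about 88 percent of GPT-4o's expert score while running on local hardware.
Retrieval-augmented prompting (Few-Shot with Example Retriever, FSER) consistently boosts smaller models.
Scaling shows diminishing returns beyond mid-size models.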
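
The abstract describes EDIT-COMP as the F1 of EDIT-SIM and compilation. A minimal sketch of that combination follows, assuming both inputs are dataset-level scores in [0, 1]. The function names are hypothetical, and `difflib.SequenceMatcher.ratio()` is only a stand-in for EDIT-SIM, whose exact definition is not given in the abstract.

```python
from difflib import SequenceMatcher


def edit_sim(candidate: str, reference: str) -> float:
    """Stand-in similarity in [0, 1]; the paper's EDIT-SIM may be defined differently."""
    return SequenceMatcher(None, candidate, reference).ratio()


def edit_comp(mean_edit_sim: float, compile_rate: float) -> float:
    """F1-style harmonic mean of a similarity score and a compilation rate."""
    total = mean_edit_sim + compile_rate
    if total == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * mean_edit_sim * compile_rate / total


if __name__ == "__main__":
    # Hypothetical numbers: average similarity 0.5, every program compiles.
    print(edit_comp(0.5, 1.0))
```

Because the harmonic mean is dominated by the smaller input, a model that produces similar-looking text that rarely compiles (or the reverse) scores low, which is presumably why the combined score is proposed as a ranking proxy.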

Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models

Authors:Jia Yun Chua, Argyrios Zolotas, Miguel Arana-Catania

Remote sensing has become a vital tool across sectors such as urban planning, environmental monitoring, and disaster response. While the volume of data generated has increased significantly, traditional vision models are often constrained by the requirement for extensive domain-specific labelled data and their limited ability to understand the context within complex environments. Vision Language Models offer a complementary approach by integrating visual and textual data; however, their application to remote sensing remains underexplored, particularly given their generalist nature. This work investigates the combination of vision models and VLMs to enhance image analysis in remote sensing, with a focus on aircraft detection and scene understanding. The integration of YOLO with VLMs such as LLaVA, ChatGPT, and Gemini aims to achieve more accurate and contextually aware image interpretation. Performance is evaluated on both labelled and unlabelled remote sensing data, as well as degraded image scenarios which are crucial for remote sensing. The findings show an average MAE improvement of 48.46% across models in the accuracy of aircraft detection and counting, especially in challenging conditions, in both raw and degraded scenarios. A 6.17% improvement in CLIPScore for comprehensive understanding of remote sensing images is obtained. The proposed approach combining traditional vision models and VLMs paves the way for more advanced and efficient remote sensing image analysis, especially in few-shot learning scenarios.

Comments: 11 pages, 7 figures, 8 tables. To be published in Applied AI Letters

Summary
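Remote sensing is a vital tool in urban planning, environmental monitoring, and disaster response. Traditional vision models need extensive domain-specific labelled data and have limited ability to understand context in complex environments. This work combines vision models with vision-language models (VLMs), integrating YOLO with LLaVA, ChatGPT, and Gemini, for aircraft detection and scene understanding. Evaluated on labelled and unlabelled remote sensing data and on degraded images, the combination yields an average MAE improvement of 48.46% in aircraft detection and counting, in both raw and degraded scenarios and especially under challenging conditions, plus a 6.17% improvement in CLIPScore for overall image understanding. The approach is aimed in particular at few-shot learning scenarios.

Key Takeaways

Remote sensing is widely used in areas such as urban planning, environmental monitoring, and disaster response.
Traditional vision models are constrained by their need for extensive domain-specific labelled data and by limited contextual understanding in complex environments.
Vision-language models (VLMs) offer a complementary approach by integrating visual and textual data.
The study combines YOLO with VLMs such as LLaVA, ChatGPT, and Gemini, focusing on aircraft detection and scene understanding.
Performance is evaluated on labelled and unlabelled remote sensing data and on degraded images.
MAE for aircraft detection and counting improves by 48.46% on average across models, especially under challenging conditions.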

CoLoR-GAN: Continual Few-Shot Learning with Low-Rank Adaptation in Generative Adversarial Networks

Authors:Munsif Ali, Leonardo Rossi, Massimo Bertozzi

Continual learning (CL) in the context of Generative Adversarial Networks (GANs) remains a challenging problem, particularly when it comes to learn from a few-shot (FS) samples without catastrophic forgetting. Current most effective state-of-the-art (SOTA) methods, like LFS-GAN, introduce a non-negligible quantity of new weights at each training iteration, which would become significant when considering the long term. For this reason, this paper introduces continual few-shot learning with low-rank adaptation in GANs named CoLoR-GAN, a framework designed to handle both FS and CL together, leveraging low-rank tensors to efficiently adapt the model to target tasks while reducing even more the number of parameters required. Applying a vanilla LoRA implementation already permitted us to obtain pretty good results. In order to optimize even further the size of the adapters, we challenged LoRA limits introducing a LoRA in LoRA (LLoRA) technique for convolutional layers. Finally, aware of the criticality linked to the choice of the hyperparameters of LoRA, we provide an empirical study to easily find the best ones. We demonstrate the effectiveness of CoLoR-GAN through experiments on several benchmark CL and FS tasks and show that our model is efficient, reaching SOTA performance but with a number of resources enormously reduced. Source code is available on Github: https://github.com/munsifali11/CoLoR-GAN.

Summary

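The paper addresses continual learning (CL) in generative adversarial networks (GANs), particularly learning from few-shot (FS) samples without catastrophic forgetting. It proposes CoLoR-GAN, a framework that uses low-rank tensors to adapt the model to target tasks efficiently while further reducing the number of required parameters. A LoRA in LoRA (LLoRA) technique for convolutional layers shrinks the adapters further, and an empirical study helps choose LoRA hyperparameters. Experiments on several CL and FS benchmark tasks show that CoLoR-GAN reaches SOTA performance with far fewer resources.

Key Takeaways

CoLoR-GAN is a framework for continual few-shot learning in generative adversarial networks.
It adapts the model with low-rank tensors, which reduces the number of parameters required.
A LoRA in LoRA (LLoRA) technique for convolutional layers further reduces adapter size.
An empirical study of LoRA hyperparameters makes good settings easier to find.
Experiments on several CL and FS benchmark tasks validate CoLoR-GAN.
CoLoR-GAN reaches SOTA performance with a greatly reduced resource budget.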
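
The low-rank adaptation idea that CoLoR-GAN builds on can be sketched for a single dense weight matrix. This is plain vanilla LoRA under common conventions (frozen `W`, trainable `A` and `B`, scale `alpha / r`), not the paper's LLoRA variant for convolutional layers. All function names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where A is r x d_in and B is d_out x r."""
    r = len(A)                 # adapter rank
    delta = matmul(B, A)       # low-rank update with the same shape as W
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def param_counts(d_out, d_in, r):
    """Parameters in the full matrix versus in the two adapter factors."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    A = [[1.0, 1.0]]           # r = 1
    B = [[0.0], [0.0]]         # zero-initialised, so the adapted layer starts equal to W
    print(lora_weight(W, A, B, alpha=1.0))
    print(param_counts(512, 512, 4))
```

Only `A` and `B` would be trained per task, so the per-task storage grows with `r * (d_in + d_out)` instead of `d_out * d_in`, which is the kind of saving the abstract attributes to low-rank adapters.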

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Authors:Yutao Wu, Xiao Liu, Yinghui Li, Yifeng Gao, Yifan Ding, Jiale Ding, Xiang Zheng, Xingjun Ma

Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases, tricking Large Language Models (LLMs) into producing attacker-controlled outputs grounded in manipulated context. Prior work highlights LLMs' susceptibility to misleading or malicious retrieved content. However, real-world fact-checking scenarios are more challenging, as credible evidence typically dominates the retrieval pool. To investigate this problem, we extend knowledge poisoning to the fact-checking setting, where retrieved context includes authentic supporting or refuting evidence. We propose ADMIT (ADversarial Multi-Injection Technique), a few-shot, semantically aligned poisoning attack that flips fact-checking decisions and induces deceptive justifications, all without access to the target LLMs, retrievers, or token-level control. Extensive experiments show that ADMIT transfers effectively across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks, achieving an average attack success rate (ASR) of 86% at an extremely low poisoning rate of $0.93 \times 10^{-6}$, and remaining robust even in the presence of strong counter-evidence. Compared with prior state-of-the-art attacks, ADMIT improves ASR by 11.2% across all settings, exposing significant vulnerabilities in real-world RAG-based fact-checking systems.

Summary

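Knowledge poisoning threatens Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases so that large language models (LLMs) produce attacker-controlled outputs grounded in manipulated context. For the fact-checking setting, the paper proposes ADMIT (ADversarial Multi-Injection Technique), a few-shot, semantically aligned attack that flips fact-checking decisions and induces deceptive justifications without access to the target LLMs, retrievers, or token-level control. ADMIT transfers across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks with an average attack success rate (ASR) of 86% at a poisoning rate of $0.93 \times 10^{-6}$, and it stays robust against strong counter-evidence. It improves ASR by 11.2% over prior state-of-the-art attacks across all settings.

Key Takeaways

Knowledge poisoning is a critical threat to RAG systems: adversarial content injected into the knowledge base can mislead the LLM.
Fact checking is a harder target than earlier settings because credible supporting or refuting evidence typically dominates the retrieval pool.
ADMIT flips fact-checking decisions without access to the target LLMs, retrievers, or token-level control.
Across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks, ADMIT reaches an average ASR of 86%.
The attack succeeds at an extremely low poisoning rate of $0.93 \times 10^{-6}$.
ADMIT improves ASR by 11.2% over prior state-of-the-art attacks across all settings.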

Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval

Authors:Subhendu Khatuya, Shashwat Naidu, Pawan Goyal, Niloy Ganguly

Despite continuous advancements in the capabilities of large language models (LLMs), numerical reasoning remains a challenging area. Techniques like chain-of-thought prompting, tree-of-thought prompting, and program-of-thought prompting guide LLMs through intermediate reasoning steps. Although in-context learning with few-shot prompting has improved performance, LLMs still lag behind state-of-the-art models on financial numerical reasoning datasets such as FinQA and ConvFinQA. In this work, we introduce FINDER, a novel two-step framework, to enhance LLMs’ capabilities in financial numerical reasoning. The first step utilizes a generative retriever to extract relevant facts from unstructured data, including both text and tables. This is followed by context-aware Program of Thought prompting with dynamic selection of in-context examples. Our model FINDER achieves a new state-of-the-art performance on both the FinQA and ConvFinQA datasets, surpassing previous benchmarks with execution accuracy improvements of 5.98% and 4.05%, respectively.

Comments: This work has been accepted for publication in the Main Conference of the Empirical Methods in Natural Language Processing (EMNLP) 2025

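Summary

Numerical reasoning remains difficult for large language models (LLMs) despite their continued progress. The paper introduces FINDER, a two-step framework for financial numerical reasoning. A generative retriever first extracts relevant facts from unstructured data, including both text and tables. Context-aware Program of Thought prompting with dynamically selected in-context examples follows. FINDER sets a new state of the art on FinQA and ConvFinQA, improving execution accuracy by 5.98% and 4.05%, respectively.

Key Takeaways

LLMs still struggle with numerical reasoning.
FINDER is a two-step framework that strengthens LLMs on financial numerical reasoning.
A generative retriever extracts relevant facts from unstructured text and tables.
Context-aware Program of Thought prompting guides the model's reasoning.
In-context examples are selected dynamically.
FINDER reaches a new state of the art on both FinQA and ConvFinQA.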
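
Program of Thought prompting asks the model to emit a short program whose execution yields the answer, so the arithmetic is done by an interpreter and not by the model. A minimal sketch of the execution side follows, assuming the generated program stores its result in a variable named `ans`. The program text, the figures in it, and the `run_program` helper are all hypothetical and not taken from FINDER.

```python
# Hypothetical model output for a question such as
# "By what percentage did revenue grow from 2019 to 2020?"
generated_program = """
revenue_2019 = 120.0
revenue_2020 = 150.0
ans = (revenue_2020 - revenue_2019) / revenue_2019 * 100
"""


def run_program(source: str) -> float:
    """Execute generated code and return the value bound to `ans`."""
    scope = {}
    # Emptying __builtins__ limits what the snippet can call, but this is
    # NOT a real sandbox; untrusted code needs proper isolation.
    exec(source, {"__builtins__": {}}, scope)
    return scope["ans"]


if __name__ == "__main__":
    print(run_program(generated_program))
```

Execution accuracy, the metric quoted in the abstract, then compares the value produced this way with the reference answer.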

Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories

Authors:Muhammad Ayub Sabir, Junbiao Pang, Jiaqi Wu, Fatima Ashraf

Abnormal stop detection (ASD) in intercity coach transportation is critical for ensuring passenger safety, operational reliability, and regulatory compliance. However, two key challenges hinder ASD effectiveness: sparse GPS trajectories, which obscure short or unauthorized stops, and limited labeled data, which restricts supervised learning. Existing methods often assume dense sampling or regular movement patterns, limiting their applicability. To address data sparsity, we propose a Sparsity-Aware Segmentation (SAS) method that adaptively defines segment boundaries based on local spatial-temporal density. Building upon these segments, we introduce three domain-specific indicators to capture abnormal stop behaviors. To further mitigate the impact of sparsity, we develop Locally Temporal-Indicator Guided Adjustment (LTIGA), which smooths these indicators via local similarity graphs. To overcome label scarcity, we construct a spatial-temporal graph where each segment is a node with LTIGA-refined features. We apply label propagation to expand weak supervision across the graph, followed by a GCN to learn relational patterns. A final self-training module incorporates high-confidence pseudo-labels to iteratively improve predictions. Experiments on real-world coach data show an AUC of 0.854 and AP of 0.866 using only 10 labeled instances, outperforming prior methods. The code and dataset are publicly available at https://github.com/pangjunbiao/Abnormal-Stop-Detection-SSL.git

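Summary

Abnormal stop detection (ASD) in intercity coach transportation matters for passenger safety, operational reliability, and regulatory compliance. Two challenges stand in the way: sparse GPS trajectories, which obscure short or unauthorized stops, and limited labeled data, which restricts supervised learning. To handle sparsity, the paper proposes Sparsity-Aware Segmentation (SAS), which adaptively sets segment boundaries from local spatial-temporal density, and defines three domain-specific indicators on those segments. Locally Temporal-Indicator Guided Adjustment (LTIGA) then smooths the indicators via local similarity graphs. To handle label scarcity, each segment becomes a node with LTIGA-refined features in a spatial-temporal graph; label propagation spreads weak supervision over the graph, a GCN learns relational patterns, and a self-training module adds high-confidence pseudo-labels to improve predictions iteratively. On real-world coach data the method reaches an AUC of 0.854 and an AP of 0.866 with only 10 labeled instances, outperforming prior methods.

Key Takeaways

Abnormal stop detection is critical in intercity coach transportation.
Data sparsity and label scarcity are the main obstacles.
Sparsity-Aware Segmentation (SAS) adapts segment boundaries to local spatial-temporal density.
Three domain-specific indicators capture abnormal stop behaviors.
Locally Temporal-Indicator Guided Adjustment (LTIGA) smooths the indicators.
A spatial-temporal graph with label propagation and a GCN expands weak supervision and learns relational patterns.
A self-training module with high-confidence pseudo-labels improves predictions, and experiments show the method outperforms prior approaches.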
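
The label-propagation step can be illustrated on a toy graph. This is a generic sketch of the technique, assuming an unweighted graph, neighbour averaging, and clamped seed labels. It does not reproduce the paper's graph construction, LTIGA features, GCN, or self-training stage, and the function name is hypothetical.

```python
def propagate_labels(adj, seeds, iters=50):
    """Spread seed labels over a graph by repeated neighbour averaging.

    adj:   node -> list of neighbouring nodes
    seeds: node -> known label score (1.0 = abnormal, 0.0 = normal)
    """
    # Unlabeled nodes start at an uninformative 0.5.
    scores = {node: seeds.get(node, 0.5) for node in adj}
    for _ in range(iters):
        updated = {}
        for node, neighbours in adj.items():
            if node in seeds:
                updated[node] = seeds[node]          # keep labeled nodes fixed
            elif neighbours:
                updated[node] = sum(scores[m] for m in neighbours) / len(neighbours)
            else:
                updated[node] = scores[node]         # isolated node: nothing to average
        scores = updated
    return scores


if __name__ == "__main__":
    # Chain of four segments with one labeled segment at each end.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(propagate_labels(adj, {"a": 1.0, "d": 0.0}))
```

Nodes closer to a labeled abnormal segment end up with higher scores, which is how a handful of labels can provide weak supervision for the rest of the graph.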

CoRA: Covariate-Aware Adaptation of Time Series Foundation Models

Authors:Guo Qin, Zhi Chen, Yong Liu, Zhiyuan Shi, Haixuan Liu, Xiangdong Huang, Jianmin Wang, Mingsheng Long

Time Series Foundation Models (TSFMs) have shown significant impact through their model capacity, scalability, and zero-shot generalization. However, due to the heterogeneity of inter-variate dependencies and the backbone scalability on large-scale multivariate datasets, most TSFMs are typically pre-trained on univariate time series. This limitation renders them oblivious to crucial information from diverse covariates in real-world forecasting tasks. To further enhance the performance of TSFMs, we propose a general covariate-aware adaptation (CoRA) framework for TSFMs. It leverages pre-trained backbones of foundation models while effectively incorporating exogenous covariates from various modalities, including time series, language, and images, to improve the quality of predictions. Technically, CoRA maintains the equivalence of initialization and parameter consistency during adaptation. With preserved backbones of foundation models as frozen feature extractors, the outcome embeddings from foundation models are empirically demonstrated more informative than raw data. Further, CoRA employs a novel Granger Causality Embedding (GCE) to automatically evaluate covariates regarding their causal predictability with respect to the target variate. We incorporate these weighted embeddings with a zero-initialized condition-injection mechanism, avoiding catastrophic forgetting of pre-trained foundation models and gradually integrates exogenous information. Extensive experiments show that CoRA of TSFMs surpasses state-of-the-art covariate-aware deep forecasters with full or few-shot training samples, achieving 31.1% MSE reduction on covariate-aware forecasting. Compared to other adaptation methods, CoRA exhibits strong compatibility with various advanced TSFMs and extends the scope of covariates to other modalities, presenting a practical paradigm for the application of TSFMs.

Summary

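Time Series Foundation Models (TSFMs) are mostly pre-trained on univariate series, so they miss information carried by covariates in real-world forecasting. The paper proposes CoRA, a general covariate-aware adaptation framework that keeps the pre-trained backbone frozen as a feature extractor and incorporates exogenous covariates from several modalities, including time series, language, and images. CoRA maintains equivalence of initialization and parameter consistency during adaptation, uses a Granger Causality Embedding (GCE) to weight covariates by their causal predictability for the target variate, and injects them through a zero-initialized condition-injection mechanism that avoids catastrophic forgetting. Experiments show that CoRA surpasses state-of-the-art covariate-aware deep forecasters with full or few-shot training samples, with a 31.1% MSE reduction on covariate-aware forecasting.

Key Takeaways

Most TSFMs are pre-trained on univariate time series and therefore ignore information from diverse covariates.
CoRA reuses the pre-trained backbone and adds exogenous covariates to improve prediction quality.
CoRA maintains equivalence of initialization and parameter consistency, and its zero-initialized injection avoids catastrophic forgetting.
A Granger Causality Embedding (GCE) evaluates how predictive each covariate is for the target variate.
CoRA is compatible with various advanced TSFMs and extends covariates to other modalities.
CoRA achieves a 31.1% MSE reduction on covariate-aware forecasting.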
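
The effect of a zero-initialized injection can be shown with a few lines of arithmetic. The sketch below assumes a residual form, output = backbone embedding + projection(weighted covariate embedding), with the projection matrix starting at zero. That form, the function names, and the way covariate weights are supplied are illustrative assumptions, not CoRA's actual architecture.

```python
def zero_projection(dim):
    """A dim x dim projection matrix initialised to all zeros."""
    return [[0.0] * dim for _ in range(dim)]


def inject(backbone_emb, cov_embs, cov_weights, proj):
    """Add a projected, weighted mix of covariate embeddings to the backbone embedding."""
    dim = len(backbone_emb)
    # Weighted sum of covariate embeddings (weights stand in for GCE-style scores).
    mixed = [sum(w * emb[i] for w, emb in zip(cov_weights, cov_embs))
             for i in range(dim)]
    projected = [sum(proj[i][j] * mixed[j] for j in range(dim)) for i in range(dim)]
    return [b + p for b, p in zip(backbone_emb, projected)]


if __name__ == "__main__":
    backbone = [1.0, 2.0]
    covariates = [[0.5, 0.5], [1.0, -1.0]]
    weights = [0.75, 0.25]
    # With the zero projection the output equals the frozen backbone's embedding.
    print(inject(backbone, covariates, weights, zero_projection(2)))
```

Starting from an output identical to the pre-trained model's means training can only move away from that behaviour gradually, which matches the abstract's description of avoiding catastrophic forgetting while integrating exogenous information step by step.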

Graph Few-Shot Learning via Adaptive Spectrum Experts and Cross-Set Distribution Calibration

Authors:Yonghao Liu, Yajun Wang, Chunli Guo, Wei Pang, Ximing Li, Fausto Giunchiglia, Xiaoyue Feng, Renchu Guan

Graph few-shot learning has attracted increasing attention due to its ability to rapidly adapt models to new tasks with only limited labeled nodes. Despite the remarkable progress made by existing graph few-shot learning methods, several key limitations remain. First, most current approaches rely on predefined and unified graph filters (e.g., low-pass or high-pass filters) to globally enhance or suppress node frequency signals. Such fixed spectral operations fail to account for the heterogeneity of local topological structures inherent in real-world graphs. Moreover, these methods often assume that the support and query sets are drawn from the same distribution. However, under few-shot conditions, the limited labeled data in the support set may not sufficiently capture the complex distribution of the query set, leading to suboptimal generalization. To address these challenges, we propose GRACE, a novel Graph few-shot leaRning framework that integrates Adaptive spectrum experts with Cross-sEt distribution calibration techniques. Theoretically, the proposed approach enhances model generalization by adapting to both local structural variations and cross-set distribution calibration. Empirically, GRACE consistently outperforms state-of-the-art baselines across a wide range of experimental settings. Our code can be found here.

Comments: NeurIPS25

Summary
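Graph few-shot learning is attracting attention because it adapts models to new tasks quickly with only a few labeled nodes, but existing methods have key limitations. Most rely on predefined, unified graph filters that enhance or suppress node frequency signals globally, which ignores the heterogeneity of local topological structures in real-world graphs. They also tend to assume that the support and query sets come from the same distribution, yet under few-shot conditions the limited labeled data in the support set may not capture the complex distribution of the query set, which leads to suboptimal generalization. The paper proposes GRACE, a graph few-shot learning framework that integrates adaptive spectrum experts with cross-set distribution calibration. In theory the approach improves generalization by adapting to local structural variations and calibrating across sets; in experiments GRACE consistently outperforms state-of-the-art baselines across a wide range of settings.

Key Takeaways

Graph few-shot learning adapts quickly to new tasks with only a few labeled nodes.
Existing methods rely on predefined graph filters and overlook the heterogeneity of local topological structures in real-world graphs.
Most methods assume the support and query sets share a distribution, which may not hold under few-shot conditions.
GRACE is a graph few-shot learning framework that integrates adaptive spectrum experts with cross-set distribution calibration.
GRACE improves generalization by adapting to local structural variations and calibrating across sets.
GRACE outperforms state-of-the-art baselines across a wide range of experimental settings.
The authors state that the code for GRACE is available.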
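
The "predefined and unified graph filters" that the abstract criticizes can be illustrated with two fixed operations on a node signal. The sketch assumes the simplest forms, a neighbourhood mean as the low-pass filter and the residual from that mean as the high-pass filter. It shows what a global, non-adaptive filter does and is not GRACE's adaptive spectrum experts; the function names are hypothetical.

```python
def low_pass(adj, x):
    """Smooth each node's value by averaging it with its neighbours."""
    return {node: (x[node] + sum(x[m] for m in nbrs)) / (1 + len(nbrs))
            for node, nbrs in adj.items()}


def high_pass(adj, x):
    """Keep only what differs from the smoothed neighbourhood value."""
    smooth = low_pass(adj, x)
    return {node: x[node] - smooth[node] for node in adj}


if __name__ == "__main__":
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    alternating = {"a": 1.0, "b": -1.0, "c": 1.0}   # neighbours disagree
    print(low_pass(adj, alternating))
    print(high_pass(adj, alternating))
```

A signal that agrees across neighbours passes through the low-pass filter unchanged and is zeroed by the high-pass one, while a signal that flips between neighbours behaves the opposite way. Applying one such filter to every node regardless of its local structure is the limitation the paper sets out to address.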

Can Representation Gaps Be the Key to Enhancing Robustness in Graph-Text Alignment?

Authors:Heng Zhang, Tianyi Zhang, Yuling Shi, Xiaodong Gu, Yaomin Shen, Zijian Zhang, Yilei Yuan, Hao Zhang, Jin Huang

Representation learning on text-attributed graphs (TAGs) integrates structural connectivity with rich textual semantics, enabling applications in diverse domains. Current methods largely rely on contrastive learning to maximize cross-modal similarity, assuming tighter coupling between graph and text representations improves transfer performance. However, our empirical analysis reveals that both natural gap expansion and forced gap reduction result in performance degradation by disrupting pre-trained knowledge structures and impairing generalization. This arises from the geometric incompatibility between encoders, where graph encoders capture topological patterns, while text encoders capture semantic structures. Over-alignment compresses these distinct spaces into shared subspaces, causing structure collapse that diminishes both topological reasoning and semantic understanding. We propose LLM4GTA, a gap-aware alignment framework that preserves representation gaps as geometric necessities for maintaining modality-specific knowledge and improving transfer performance. LLM4GTA includes an adaptive gap preservation module to prevent over-alignment by monitoring similarity evolution and an intra-modal compensation mechanism that boosts discriminative power using auxiliary classifiers in graph space. Extensive experiments show significant improvements over existing methods in zero-shot and few-shot scenarios.

Summary

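Representation learning on text-attributed graphs (TAGs) combines structural connectivity with rich textual semantics. Current methods mostly use contrastive learning to maximize cross-modal similarity, on the assumption that tighter coupling between graph and text representations improves transfer. The paper's empirical analysis finds that both natural gap expansion and forced gap reduction degrade performance, because they disrupt pre-trained knowledge structures and impair generalization. The cause is a geometric incompatibility between encoders: graph encoders capture topological patterns, while text encoders capture semantic structures. Over-alignment compresses these distinct spaces into shared subspaces and causes a structure collapse that weakens both topological reasoning and semantic understanding. The proposed LLM4GTA is a gap-aware alignment framework that preserves representation gaps as geometric necessities. It includes an adaptive gap preservation module that monitors similarity evolution to prevent over-alignment, and an intra-modal compensation mechanism that uses auxiliary classifiers in graph space to boost discriminative power. Experiments show significant improvements over existing methods in zero-shot and few-shot scenarios.

Key Takeaways

Representation learning on text-attributed graphs (TAGs) combines structural connectivity with rich textual semantics.
Existing methods rely on contrastive learning and assume tighter coupling improves transfer, but both natural gap expansion and forced gap reduction degrade performance.
LLM4GTA is a gap-aware alignment framework that preserves representation gaps to improve transfer performance.
LLM4GTA consists of an adaptive gap preservation module and an intra-modal compensation mechanism.
The adaptive gap preservation module prevents over-alignment by monitoring similarity evolution.
The intra-modal compensation mechanism boosts discriminative power with auxiliary classifiers in graph space.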

Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning

Authors:Hao Tang, Shengfeng He, Jing Qin

Few-shot learning (FSL) addresses the challenge of classifying novel classes with limited training samples. While some methods leverage semantic knowledge from smaller-scale models to mitigate data scarcity, these approaches often introduce noise and bias due to the data’s inherent simplicity. In this paper, we propose a novel framework, Synergistic Knowledge Transfer (SynTrans), which effectively transfers diverse and complementary knowledge from large multimodal models to empower the off-the-shelf few-shot learner. Specifically, SynTrans employs CLIP as a robust teacher and uses a few-shot vision encoder as a weak student, distilling semantic-aligned visual knowledge via an unsupervised proxy task. Subsequently, a training-free synergistic knowledge mining module facilitates collaboration among large multimodal models to extract high-quality semantic knowledge. Building upon this, a visual-semantic bridging module enables bi-directional knowledge transfer between visual and semantic spaces, transforming explicit visual and implicit semantic knowledge into category-specific classifier weights. Finally, SynTrans introduces a visual weight generator and a semantic weight reconstructor to adaptively construct optimal multimodal FSL classifiers. Experimental results on four FSL datasets demonstrate that SynTrans, even when paired with a simple few-shot vision encoder, significantly outperforms current state-of-the-art methods.

Comments: Accepted by IJCAI 2025

Summary
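Few-shot learning (FSL) must classify novel classes from limited training samples. The paper proposes Synergistic Knowledge Transfer (SynTrans), a framework that transfers diverse and complementary knowledge from large multimodal models to an off-the-shelf few-shot learner. SynTrans uses CLIP as a robust teacher and a few-shot vision encoder as a weak student, distilling semantic-aligned visual knowledge through an unsupervised proxy task. A training-free synergistic knowledge mining module then extracts high-quality semantic knowledge, and a visual-semantic bridging module enables bi-directional transfer between visual and semantic spaces. Experiments on four FSL datasets show that SynTrans significantly outperforms current state-of-the-art methods even when paired with a simple few-shot vision encoder.

Key Takeaways

Few-shot learning (FSL) has to cope with limited training samples.
SynTrans is a new framework proposed to address this.
SynTrans uses CLIP as a robust teacher.
A few-shot vision encoder acts as the weak student, and the approach avoids the noise and bias that arise when semantic knowledge comes from smaller-scale models.
Semantic-aligned visual knowledge is distilled through an unsupervised proxy task.
A training-free synergistic knowledge mining module extracts high-quality semantic knowledge.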

FusionGen: Feature Fusion-Based Few-Shot EEG Data Generation

Authors:Yuheng Chen, Dingkun Liu, Xinyao Yang, Xinping Xu, Baicheng Chen, Dongrui Wu

Brain-computer interfaces (BCIs) provide potential for applications ranging from medical rehabilitation to cognitive state assessment by establishing direct communication pathways between the brain and external devices via electroencephalography (EEG). However, EEG-based BCIs are severely constrained by data scarcity and significant inter-subject variability, which hinder the generalization and applicability of EEG decoding models in practical settings. To address these challenges, we propose FusionGen, a novel EEG data generation framework based on disentangled representation learning and feature fusion. By integrating features across trials through a feature matching fusion module and combining them with a lightweight feature extraction and reconstruction pipeline, FusionGen ensures both data diversity and trainability under limited data constraints. Extensive experiments on multiple publicly available EEG datasets demonstrate that FusionGen significantly outperforms existing augmentation techniques, yielding notable improvements in classification accuracy.

Summary

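Brain-computer interfaces (BCIs) use electroencephalography (EEG) to establish direct communication between the brain and external devices, with applications from medical rehabilitation to cognitive state assessment. EEG-based BCIs are held back by data scarcity and significant inter-subject variability, which limit how well EEG decoding models generalize in practice. The paper proposes FusionGen, an EEG data generation framework based on disentangled representation learning and feature fusion. A feature matching fusion module integrates features across trials, and a lightweight feature extraction and reconstruction pipeline keeps the model trainable, so that both data diversity and trainability hold under limited data. Experiments on multiple public EEG datasets show that FusionGen significantly outperforms existing augmentation techniques and improves classification accuracy.

Key Takeaways

Brain-computer interfaces (BCIs) use EEG to establish direct communication between the brain and external devices.
EEG-based BCIs are constrained by data scarcity and inter-subject variability.
FusionGen is an EEG data generation framework based on disentangled representation learning and feature fusion.
A feature matching fusion module integrates features across trials to ensure data diversity and trainability.
FusionGen is designed to work under limited data constraints.
On multiple public EEG datasets, FusionGen yields notable gains in classification accuracy.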

Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting

Authors:Abdelrhman Elrawy, Emad A. Mohammed

3D Gaussian Splatting (3DGS) struggles in few-shot scenarios, where its standard adaptive density control (ADC) can lead to overfitting and bloated reconstructions. While state-of-the-art methods like FSGS improve quality, they often do so by significantly increasing the primitive count. This paper presents a framework that revises the core 3DGS optimization to prioritize efficiency. We replace the standard positional gradient heuristic with a novel densification trigger that uses the opacity gradient as a lightweight proxy for rendering error. We find this aggressive densification is only effective when paired with a more conservative pruning schedule, which prevents destructive optimization cycles. Combined with a standard depth-correlation loss for geometric guidance, our framework demonstrates a fundamental improvement in efficiency. On the 3-view LLFF dataset, our model is over 40% more compact (32k vs. 57k primitives) than FSGS, and on the Mip-NeRF 360 dataset, it achieves a reduction of approximately 70%. This dramatic gain in compactness is achieved with a modest trade-off in reconstruction metrics, establishing a new state-of-the-art on the quality-vs-efficiency Pareto frontier for few-shot view synthesis.

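Paper and project links

PDF

Summary

This paper revises the core 3D Gaussian Splatting (3DGS) optimization for few-shot scenarios, where standard adaptive density control (ADC) can overfit and bloat the reconstruction. It replaces the positional gradient heuristic with a densification trigger that uses the opacity gradient as a lightweight proxy for rendering error, pairs this trigger with a more conservative pruning schedule, and adds a standard depth-correlation loss for geometric guidance. The resulting models are far more compact than FSGS (32k vs. 57k primitives on 3-view LLFF, and roughly 70% fewer on Mip-NeRF 360) at a modest cost in reconstruction metrics.

Key Takeaways

The paper proposes a revised 3DGS optimization framework that addresses the method's weaknesses in few-shot scenarios.
The opacity gradient serves as a lightweight proxy for rendering error and triggers densification.
A more conservative pruning schedule prevents destructive optimization cycles.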
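
The densification and pruning logic described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the thresholds (`grad_threshold`, `min_opacity`), and the pruning interval `prune_every` are hypothetical, and averaging the absolute opacity gradient over views is an assumption about how such a trigger might be accumulated. The last helper only re-derives the compactness figure quoted for 3-view LLFF.

```python
def select_for_densification(opacity_grad_sums, view_counts, grad_threshold=0.01):
    """Return indices of Gaussians whose mean |dL/d(opacity)| exceeds a threshold.

    opacity_grad_sums[i] is the accumulated absolute opacity gradient of
    Gaussian i; view_counts[i] is how often it contributed to a render.
    The accumulation rule and the threshold are assumptions.
    """
    selected = []
    for i, (grad_sum, count) in enumerate(zip(opacity_grad_sums, view_counts)):
        if count > 0 and grad_sum / count > grad_threshold:
            selected.append(i)
    return selected


def select_for_pruning(opacities, iteration, prune_every=2000, min_opacity=0.005):
    """Conservative pruning: act only every `prune_every` iterations
    (hypothetical value) and drop only nearly transparent Gaussians."""
    if iteration == 0 or iteration % prune_every != 0:
        return []
    return [i for i, alpha in enumerate(opacities) if alpha < min_opacity]


def relative_reduction(ours, baseline):
    """Fraction of primitives saved relative to a baseline."""
    return 1.0 - ours / baseline


grads = [0.50, 0.002, 0.09]
views = [10, 10, 3]
print(select_for_densification(grads, views))        # [0, 2]
print(select_for_pruning([0.9, 0.001, 0.3], 1000))   # [] (not a pruning step)
print(select_for_pruning([0.9, 0.001, 0.3], 2000))   # [1]
print(round(relative_reduction(32_000, 57_000), 3))  # 0.439
```

The last line confirms that 32k versus 57k primitives is a reduction of about 44%, consistent with the "over 40%" claim in the abstract.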

Preference-driven Knowledge Distillation for Few-shot Node Classification

Authors:Xing Wei, Chunchun Chen, Rui Fan, Xiaofeng Cao, Sourav Medya, Wei Ye

Graph neural networks (GNNs) can efficiently process text-attributed graphs (TAGs) due to their message-passing mechanisms, but their training heavily relies on the human-annotated labels. Moreover, the complex and diverse local topologies of nodes of real-world TAGs make it challenging for a single mechanism to handle. Large language models (LLMs) perform well in zero-/few-shot learning on TAGs but suffer from a scalability challenge. Therefore, we propose a preference-driven knowledge distillation (PKD) framework to synergize the complementary strengths of LLMs and various GNNs for few-shot node classification. Specifically, we develop a GNN-preference-driven node selector that effectively promotes prediction distillation from LLMs to teacher GNNs. To further tackle nodes’ intricate local topologies, we develop a node-preference-driven GNN selector that identifies the most suitable teacher GNN for each node, thereby facilitating tailored knowledge distillation from teacher GNNs to the student GNN. Extensive experiments validate the efficacy of our proposed framework in few-shot node classification on real-world TAGs.

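Paper and project links

PDF Accepted at NeurIPS 2025

Summary

The paper proposes a preference-driven knowledge distillation (PKD) framework that combines the complementary strengths of large language models (LLMs) and graph neural networks (GNNs) for few-shot node classification. A GNN-preference-driven node selector promotes prediction distillation from LLMs to teacher GNNs, and a node-preference-driven GNN selector picks the most suitable teacher GNN for each node, which addresses the nodes' intricate local topologies.

Key Takeaways

GNNs process text-attributed graphs (TAGs) efficiently, but their training relies heavily on human-annotated labels.
The complex and diverse local topologies of nodes in real-world TAGs are hard for a single message-passing mechanism to handle.
LLMs perform well in zero-/few-shot learning on TAGs but face a scalability challenge.
The PKD framework combines the strengths of LLMs and various GNNs for few-shot node classification.
A GNN-preference-driven node selector promotes prediction distillation from LLMs to teacher GNNs.
To handle intricate local topologies, a node-preference-driven GNN selector identifies the most suitable teacher GNN for each node.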
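
To make the two ideas in the takeaways concrete, here is a minimal sketch of per-node teacher selection followed by a standard temperature-scaled distillation loss. It is only an illustration under stated assumptions: the abstract does not say how preference scores are computed or which distillation loss is used, so the scores, the teacher names (`gcn`, `gat`, `sage`), and the choice of KL divergence with temperature are all hypothetical stand-ins.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def pick_teacher(preference_scores):
    """Return the teacher with the highest (hypothetical) preference score."""
    return max(preference_scores, key=preference_scores.get)


# One node, three hypothetical teacher GNNs with made-up logits and scores.
teacher_logits = {
    "gcn": [2.0, 0.5, 0.1],
    "gat": [0.2, 1.8, 0.3],
    "sage": [1.0, 1.0, 1.0],
}
node_preferences = {"gcn": 0.2, "gat": 0.7, "sage": 0.1}
student_logits = [0.1, 1.5, 0.4]

best = pick_teacher(node_preferences)
print(best)                                                      # gat
print(distillation_loss(teacher_logits[best], student_logits) > 0.0)  # True
```

Selecting one teacher per node, rather than averaging all teachers, mirrors the idea that different message-passing mechanisms suit different local topologies.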

FSP-DETR: Few-Shot Prototypical Parasitic Ova Detection

Authors:Shubham Trehan, Udhav Ramachandran, Akash Rao, Ruth Scimeca, Sathyanarayanan N. Aakur

Object detection in biomedical settings is fundamentally constrained by the scarcity of labeled data and the frequent emergence of novel or rare categories. We present FSP-DETR, a unified detection framework that enables robust few-shot detection, open-set recognition, and generalization to unseen biomedical tasks within a single model. Built upon a class-agnostic DETR backbone, our approach constructs class prototypes from original support images and learns an embedding space using augmented views and a lightweight transformer decoder. Training jointly optimizes a prototype matching loss, an alignment-based separation loss, and a KL divergence regularization to improve discriminative feature learning and calibration under scarce supervision. Unlike prior work that tackles these tasks in isolation, FSP-DETR enables inference-time flexibility to support unseen class recognition, background rejection, and cross-task adaptation without retraining. We also introduce a new ova species detection benchmark with 20 parasite classes and establish standardized evaluation protocols. Extensive experiments across ova, blood cell, and malaria detection tasks demonstrate that FSP-DETR significantly outperforms prior few-shot and prototype-based detectors, especially in low-shot and open-set scenarios.

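Paper and project links

PDF 10 pages, 3 Figures, 5 Tables. Under Review

Summary

FSP-DETR is a unified detection framework for biomedical images that supports few-shot detection, open-set recognition, and generalization to unseen tasks within a single model. Built on a class-agnostic DETR backbone, it constructs class prototypes from support images and learns an embedding space using augmented views and a lightweight transformer decoder. Training jointly optimizes a prototype matching loss, an alignment-based separation loss, and a KL divergence regularization to improve discriminative feature learning and calibration under scarce supervision. At inference time, FSP-DETR supports unseen class recognition, background rejection, and cross-task adaptation without retraining. Experiments on ova, blood cell, and malaria detection tasks show that it significantly outperforms prior few-shot and prototype-based detectors, especially in low-shot and open-set scenarios.

Key Takeaways

FSP-DETR is a unified detection framework that supports few-shot detection, open-set recognition, and generalization to unseen tasks.
It builds on a class-agnostic DETR backbone, constructs class prototypes from support images, and learns an embedding space.
Joint optimization of a prototype matching loss, an alignment-based separation loss, and a KL divergence regularization improves discriminative feature learning and calibration.
Unlike prior work, FSP-DETR supports unseen class recognition, background rejection, and cross-task adaptation at inference time without retraining.
The paper introduces a new ova species detection benchmark with 20 parasite classes and establishes standardized evaluation protocols.
Experiments across multiple detection tasks show that FSP-DETR significantly outperforms prior detectors, especially in low-shot and open-set scenarios.
The framework targets biomedical detection settings where labeled data is scarce and novel or rare categories appear frequently.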
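
The prototype idea behind few-shot and open-set recognition can be illustrated with a small sketch. It is a generic prototypical classifier under assumptions, not FSP-DETR itself: the embeddings are hand-written toy vectors rather than DETR features, and averaging support embeddings, cosine similarity, and the `reject_below` threshold are illustrative choices that the abstract does not specify.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def build_prototypes(support):
    """support maps class name -> list of embedding vectors (the few shots)."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(query, prototypes, reject_below=0.8):
    """Return (label, score); label is "unknown" if no prototype is close enough."""
    label, score = max(
        ((name, cosine(query, proto)) for name, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= reject_below else ("unknown", score)


support = {
    "class_a": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "class_b": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
prototypes = build_prototypes(support)
print(classify([0.95, 0.05, 0.05], prototypes)[0])  # class_a
print(classify([0.0, 0.0, 1.0], prototypes)[0])     # unknown
```

Because classes live only in the `prototypes` dictionary, a new class can be added at inference time by inserting one more averaged vector, which is the kind of retraining-free flexibility the summary describes.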

Higher-order interactions of multi-layer prompt

Authors:Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyan Huang, Weigang Lu

The “pre-train, prompt” paradigm has successfully evolved in representation learning. While current prompt-tuning methods often introduce learnable prompts, they predominantly treat prompts as isolated, independent components across different network layers. This overlooks the complex and synergistic higher-order interactions that exist between prompts at various hierarchical depths, consequently limiting the expressive power and semantic richness of the prompted model. To address this fundamental gap, we propose a novel framework that explicitly models the Higher-order Interactions of Multi-layer Prompt. Our approach conceptualizes prompts from different layers not as separate entities, but as a cohesive system where their inter-relationships are critical. We design an innovative interaction module that captures these sophisticated, non-linear correlations among multi-layer prompts, effectively modeling their cooperative effects. This allows the model to dynamically aggregate and refine prompt information across the network’s depth, leading to a more integrated and powerful prompting strategy. Extensive experiments on eight benchmark datasets demonstrate that our method, by leveraging these higher-order interactions, consistently surpasses state-of-the-art prompt-tuning baselines. The performance advantage is particularly pronounced in few-shot scenarios, validating that capturing the intricate interplay between multi-layer prompts is key to unlocking more robust and generalizable representation learning.

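Paper and project links

PDF under review

Summary

The paper addresses the "pre-train, prompt" paradigm in representation learning. Current prompt-tuning methods treat prompts at different network layers as isolated components and overlook the higher-order interactions between them. The proposed framework explicitly models the higher-order interactions of multi-layer prompts through an interaction module that captures complex, non-linear correlations among them and models their cooperative effects. Experiments on eight benchmark datasets show that the method consistently surpasses state-of-the-art prompt-tuning baselines, with the advantage most pronounced in few-shot scenarios.

Key Takeaways

Current prompt-tuning methods mostly treat prompts as independent components and overlook the higher-order interactions between prompts at different network layers.
The proposed framework explicitly models higher-order interactions of multi-layer prompts and treats prompts from different layers as one cohesive system.
An interaction module captures the complex, non-linear correlations among multi-layer prompts.
By leveraging these interactions, the method outperforms state-of-the-art prompt-tuning baselines on eight benchmark datasets.
The performance advantage is most pronounced in few-shot scenarios.
The authors conclude that capturing the interplay between multi-layer prompts is key to more robust and generalizable representation learning.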

Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL

Authors:Ruitao Wu, Yifan Zhao, Guangyao Chen, Jia Li

Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from minimal examples without forgetting prior knowledge, a task complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods often struggle with generalization due to their reliance on limited datasets. While diffusion models offer a path for data augmentation, their direct application can lead to semantic misalignment or ineffective guidance. This paper introduces Diffusion-Classifier Synergy (DCS), a novel framework that establishes a mutual boosting loop between diffusion model and FSCIL classifier. DCS utilizes a reward-aligned learning strategy, where a dynamic, multi-faceted reward function derived from the classifier’s state directs the diffusion model. This reward system operates at two levels: the feature level ensures semantic coherence and diversity using prototype-anchored maximum mean discrepancy and dimension-wise variance matching, while the logits level promotes exploratory image generation and enhances inter-class discriminability through confidence recalibration and cross-session confusion-aware mechanisms. This co-evolutionary process, where generated images refine the classifier and an improved classifier state yields better reward signals, demonstrably achieves state-of-the-art performance on FSCIL benchmarks, significantly enhancing both knowledge retention and new class learning.

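Paper and project links

PDF Accepted by NeurIPS 2025

Summary

The paper proposes Diffusion-Classifier Synergy (DCS), a framework for Few-Shot Class-Incremental Learning (FSCIL). DCS establishes a mutual boosting loop between a diffusion model and the FSCIL classifier, in which a dynamic, multi-faceted reward function derived from the classifier's state guides the diffusion model. The reward operates at the feature level, where it ensures semantic coherence and diversity, and at the logits level, where it promotes exploratory image generation and enhances inter-class discriminability. This co-evolutionary process improves both knowledge retention and new class learning and achieves state-of-the-art performance on FSCIL benchmarks.

Key Takeaways

The Diffusion-Classifier Synergy (DCS) framework targets the challenges of Few-Shot Class-Incremental Learning (FSCIL).
DCS establishes a mutual boosting loop between a diffusion model and the FSCIL classifier.
A dynamic, multi-faceted reward function derived from the classifier's state guides the diffusion model.
At the feature level, the reward ensures semantic coherence and diversity using prototype-anchored maximum mean discrepancy and dimension-wise variance matching.
At the logits level, the reward promotes exploratory image generation and enhances inter-class discriminability through confidence recalibration and cross-session confusion-aware mechanisms.
The co-evolutionary process significantly improves both knowledge retention and new class learning.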
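
The two feature-level quantities named in the abstract, maximum mean discrepancy (MMD) and dimension-wise variance matching, can be computed as in the sketch below. This shows only the generic statistics under assumptions: the RBF kernel, its `gamma`, and the biased MMD estimator are choices made here, and how DCS anchors these terms to prototypes and combines them into a reward is not specified in the abstract.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    k_xx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (len(xs) * len(xs))
    k_yy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (len(ys) * len(ys))
    k_xy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def per_dim_variance(vectors):
    """Population variance of each feature dimension."""
    count = len(vectors)
    result = []
    for column in zip(*vectors):
        mean = sum(column) / count
        result.append(sum((v - mean) ** 2 for v in column) / count)
    return result


def variance_gap(xs, ys):
    """Mean absolute difference between per-dimension variances of two sets."""
    var_x, var_y = per_dim_variance(xs), per_dim_variance(ys)
    return sum(abs(a - b) for a, b in zip(var_x, var_y)) / len(var_x)


real = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
near = [[0.1, 0.1], [0.2, 0.2], [0.0, 0.2]]   # generated features close to real
far = [[3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]    # generated features far from real

print(mmd2(real, near) < mmd2(real, far))  # True
print(variance_gap(real, real))            # 0.0
```

A reward could, for example, be the negative of `mmd2` so that generated features closer to the real class distribution score higher; that mapping is an assumption for illustration.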

Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning

Authors:Zilun Zhang, Zian Guan, Tiancheng Zhao, Haozhan Shen, Tianyu Li, Yuxiang Cai, Zhonggen Su, Zhaojun Liu, Jianwei Yin, Xiang Li

Referring expression understanding in remote sensing poses unique challenges, as it requires reasoning over complex object-context relationships. While supervised fine-tuning (SFT) on multimodal large language models achieves strong performance with massive labeled datasets, they struggle in data-scarce scenarios, leading to poor generalization. To address this limitation, we propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 enforces the model to first generate explicit, interpretable reasoning chains that decompose referring expressions, and then leverage these rationales to localize target objects. This “reason first, then act” process enables the model to make more effective use of limited annotations, enhances generalization, and provides interpretability. We validate Geo-R1 on three carefully designed few-shot geospatial referring benchmarks, where our model consistently and substantially outperforms SFT baselines. It also demonstrates strong cross-dataset generalization, highlighting its robustness. Code and data will be released at: https://github.com/Geo-R1/geo-r1.

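Paper and project links

PDF

Summary

Referring expression understanding in remote sensing is challenging because it requires reasoning over complex object-context relationships. Supervised fine-tuning (SFT) of multimodal large language models performs well with massive labeled datasets but struggles in data-scarce scenarios and generalizes poorly. To address this, the paper proposes Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 requires the model to first generate explicit, interpretable reasoning chains that decompose the referring expression and then to use these rationales to localize the target objects. This "reason first, then act" process makes better use of limited annotations, improves generalization, and provides interpretability. On three carefully designed few-shot geospatial referring benchmarks, Geo-R1 consistently and substantially outperforms SFT baselines, and it also shows strong cross-dataset generalization.

Key Takeaways

Referring expression understanding in remote sensing requires reasoning over complex object-context relationships.
SFT of multimodal large language models performs well with abundant labels but generalizes poorly when data is scarce.
Geo-R1 is a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring.
Geo-R1 generates explicit, interpretable reasoning chains that decompose referring expressions, which improves generalization and interpretability.
The "reason first, then act" process lets the model make more effective use of limited annotations.
Geo-R1 substantially outperforms SFT baselines on three few-shot benchmarks and generalizes well across datasets.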
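
A "reason first, then act" policy is typically trained with a reward that checks both the output format and the localization quality. The sketch below is one plausible design and is entirely an assumption: the abstract does not describe Geo-R1's reward, and the `<think>`/`<answer>` tags, the `format_bonus` value, and the use of box IoU are hypothetical choices made for illustration.

```python
import re

# Hypothetical output format: reasoning in <think>, then a box in <answer>.
PATTERN = re.compile(
    r"<think>(.+?)</think>\s*<answer>\s*\[(.+?)\]\s*</answer>", re.DOTALL
)


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reward(response, gt_box, format_bonus=0.1):
    """Return 0.0 unless reasoning precedes a parseable box; else bonus + IoU."""
    match = PATTERN.fullmatch(response.strip())
    if match is None:
        return 0.0
    try:
        box = [float(value) for value in match.group(2).split(",")]
    except ValueError:
        return 0.0
    if len(box) != 4:
        return 0.0
    return format_bonus + iou(box, gt_box)


with_reasoning = (
    "<think>The target is the building north of the river.</think>"
    "<answer>[10, 10, 50, 50]</answer>"
)
without_reasoning = "<answer>[10, 10, 50, 50]</answer>"

print(reward(with_reasoning, (10, 10, 50, 50)) > 1.0)  # True
print(reward(without_reasoning, (10, 10, 50, 50)))     # 0.0
```

Giving zero reward to answers without a reasoning chain is what pushes the policy toward producing rationales before localizing.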

SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions

Authors:Cristian Sbrolli, Matteo Matteucci

The whole is greater than the sum of its parts-even in 3D-text contrastive learning. We introduce SceneForge, a novel framework that enhances contrastive alignment between 3D point clouds and text through structured multi-object scene compositions. SceneForge leverages individual 3D shapes to construct multi-object scenes with explicit spatial relations, pairing them with coherent multi-object descriptions refined by a large language model. By augmenting contrastive training with these structured, compositional samples, SceneForge effectively addresses the scarcity of large-scale 3D-text datasets, significantly enriching data complexity and diversity. We systematically investigate critical design elements, such as the optimal number of objects per scene, the proportion of compositional samples in training batches, and scene construction strategies. Extensive experiments demonstrate that SceneForge delivers substantial performance gains across multiple tasks, including zero-shot classification on ModelNet, ScanObjNN, Objaverse-LVIS, and ScanNet, as well as few-shot part segmentation on ShapeNetPart. SceneForge’s compositional augmentations are model-agnostic, consistently improving performance across multiple encoder architectures. Moreover, SceneForge improves 3D visual question answering on ScanQA, generalizes robustly to retrieval scenarios with increasing scene complexity, and showcases spatial reasoning capabilities by adapting spatial configurations to align precisely with textual instructions.

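Paper and project links

PDF to appear in NeurIPS 2025

Summary

SceneForge enhances contrastive alignment between 3D point clouds and text through structured multi-object scene compositions. It uses individual 3D shapes to construct multi-object scenes with explicit spatial relations and pairs them with multi-object descriptions refined by a large language model. Augmenting contrastive training with these compositional samples addresses the scarcity of large-scale 3D-text datasets and enriches data complexity and diversity.

Key Takeaways

SceneForge is a framework that enhances contrastive alignment between 3D point clouds and text through structured multi-object scene compositions.
It builds multi-object scenes with explicit spatial relations from individual 3D shapes and pairs them with descriptions refined by a large language model.
Augmenting contrastive training with compositional samples addresses the scarcity of large-scale 3D-text datasets.
It delivers substantial gains on zero-shot classification on ModelNet, ScanObjNN, Objaverse-LVIS, and ScanNet, and on few-shot part segmentation on ShapeNetPart.
The compositional augmentations are model-agnostic and improve performance across multiple encoder architectures.
SceneForge improves 3D visual question answering on ScanQA, generalizes to retrieval with increasing scene complexity, and shows spatial reasoning capabilities.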
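
The scene composition step can be pictured with a toy example that places one object next to another and emits a matching caption. This is a simplified sketch under assumptions, not SceneForge's pipeline: the helper names, the `gap` value, the convention that +x means "right", and the caption template are hypothetical, and the language-model refinement of captions is omitted.

```python
def bounds(points, axis):
    """Minimum and maximum coordinate of a point cloud along one axis."""
    values = [point[axis] for point in points]
    return min(values), max(values)


def place_right_of(anchor, other, gap=0.2):
    """Translate `other` along +x so it sits to the right of `anchor`."""
    _, anchor_max = bounds(anchor, 0)
    other_min, _ = bounds(other, 0)
    shift = anchor_max + gap - other_min
    return [(x + shift, y, z) for x, y, z in other]


def compose_scene(name_a, cloud_a, name_b, cloud_b):
    """Return (merged point cloud, caption) for a two-object scene."""
    moved_b = place_right_of(cloud_a, cloud_b)
    caption = f"a {name_b} to the right of a {name_a}"
    return cloud_a + moved_b, caption


# Tiny hand-written point clouds standing in for real 3D shapes.
chair = [(0.0, 0.0, 0.0), (0.5, 0.0, 1.0)]
table = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.8)]

scene, caption = compose_scene("chair", chair, "table", table)
print(caption)     # a table to the right of a chair
print(len(scene))  # 4
```

Each composed pair (scene, caption) would then serve as an additional positive pair during contrastive training, alongside the single-object samples.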

SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space

Authors:Ekaterina Redekop, Mara Pleasure, Zichen Wang, Kimberly Flores, Anthony Sisk, William Speier, Corey W. Arnold

The rapid growth of digital pathology and advances in self-supervised deep learning have enabled the development of foundational models for various pathology tasks across diverse diseases. While multimodal approaches integrating diverse data sources have emerged, a critical gap remains in the comprehensive integration of whole-slide images (WSIs) with spatial transcriptomics (ST), which is crucial for capturing critical molecular heterogeneity beyond standard hematoxylin & eosin (H&E) staining. We introduce SPADE, a foundation model that integrates histopathology with ST data to guide image representation learning within a unified framework, in effect creating an ST-informed latent space. SPADE leverages a mixture-of-data experts technique, where experts are created via two-stage imaging feature-space clustering using contrastive learning to learn representations of co-registered WSI patches and gene expression profiles. Pre-trained on the comprehensive HEST-1k dataset, SPADE is evaluated on 20 downstream tasks, demonstrating significantly superior few-shot performance compared to baseline models, highlighting the benefits of integrating morphological and molecular information into one latent space. Code and pretrained weights are available at https://github.com/uclabair/SPADE.

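Paper and project links

PDF

Summary

Rapid growth in digital pathology and advances in self-supervised deep learning have enabled foundation models for a range of pathology tasks. The paper introduces SPADE, a foundation model that integrates histopathology with spatial transcriptomics (ST) data to guide image representation learning within a unified framework, creating an ST-informed latent space. SPADE uses a mixture-of-data-experts technique: experts are created via two-stage clustering of the imaging feature space, and contrastive learning is used to learn representations of co-registered whole-slide image (WSI) patches and gene expression profiles. Pre-trained on HEST-1k and evaluated on 20 downstream tasks, SPADE shows significantly better few-shot performance than baseline models, which highlights the benefit of integrating morphological and molecular information into one latent space. Code and pretrained weights are available at https://github.com/uclabair/SPADE.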

Key Takeaways