⚠️ All of the summaries below were produced by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them in serious academic settings; they are only a first-pass screen before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-05
Patient-Centered Summarization Framework for AI Clinical Summarization: A Mixed-Methods Design
Authors:Maria Lizarazo Jimenez, Ana Gabriela Claros, Kieran Green, David Toro-Tobon, Felipe Larios, Sheena Asthana, Camila Wenczenovicz, Kerly Guevara Maldonado, Luis Vilatuna-Andrango, Cristina Proano-Velez, Satya Sai Sri Bandi, Shubhangi Bagewadi, Megan E. Branda, Misk Al Zahidy, Saturnino Luz, Mirella Lapata, Juan P. Brito, Oscar J. Ponce-Ponte
Large Language Models (LLMs) are increasingly demonstrating the potential to reach human-level performance in generating clinical summaries from patient-clinician conversations. However, these summaries often focus on patients’ biology rather than their preferences, values, wishes, and concerns. To achieve patient-centered care, we propose a new standard for Artificial Intelligence (AI) clinical summarization tasks: Patient-Centered Summaries (PCS). Our objective was to develop a framework to generate PCS that capture patient values and ensure clinical utility and to assess whether current open-source LLMs can achieve human-level performance in this task. We used a mixed-methods process. Two Patient and Public Involvement groups (10 patients and 8 clinicians) in the United Kingdom participated in semi-structured interviews exploring what personal and contextual information should be included in clinical summaries and how it should be structured for clinical use. Findings informed annotation guidelines used by eight clinicians to create gold-standard PCS from 88 atrial fibrillation consultations. Sixteen consultations were used to refine a prompt aligned with the guidelines. Five open-source LLMs (Llama-3.2-3B, Llama-3.1-8B, Mistral-8B, Gemma-3-4B, and Qwen3-8B) generated summaries for 72 consultations using zero-shot and few-shot prompting, evaluated with ROUGE-L, BERTScore, and qualitative metrics. Patients emphasized lifestyle routines, social support, recent stressors, and care values. Clinicians sought concise functional, psychosocial, and emotional context. The best zero-shot performance was achieved by Mistral-8B (ROUGE-L 0.189) and Llama-3.1-8B (BERTScore 0.673); the best few-shot by Llama-3.1-8B (ROUGE-L 0.206, BERTScore 0.683). Completeness and fluency were similar between experts and models, while correctness and patient-centeredness favored human PCS.
Paper and project links
PDF The first two listed authors contributed equally. Pages: 21; Figures: 2; Tables: 3
Summary
Large language models show promise for clinical summarization but tend to emphasize biological information over patients' preferences, values, and concerns. This study proposes a new standard for AI clinical summarization, Patient-Centered Summaries (PCS), designed to capture patient values and ensure clinical utility, and asks whether current open-source LLMs can reach human-level performance on the task. Through a mixed-methods process combining semi-structured interviews with patient and public involvement groups and gold-standard PCS written by clinicians, the study found that patients emphasize information such as lifestyle, social support, recent stressors, and care values. An evaluation of five open-source LLMs showed solid ROUGE-L and BERTScore results under zero-shot and few-shot prompting. Overall, the models matched experts on completeness and fluency but still fell slightly short of human-written PCS on correctness and patient-centeredness.
Key Takeaways
- LLMs hold great potential for clinical summary generation, but patient-centered care needs still require attention.
- Patients and clinicians weight information differently: patients focus on lifestyle and social support, while clinicians want concise functional, psychosocial, and emotional context.
- The paper establishes Patient-Centered Summaries (PCS) as a new standard for AI clinical summarization that captures patient values and ensures clinical utility.
- A mixed-methods study produced annotation guidelines that ground both the gold-standard summaries and the model prompts.
- Five open-source LLMs perform well under zero-shot and few-shot prompting but still lag on correctness and patient-centeredness.
- ROUGE-L and BERTScore serve as the key quantitative metrics for evaluating the generated summaries.
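ROUGE-L, one of the metrics used in this study, scores a candidate summary against a reference through their longest common subsequence (LCS). A minimal pure-Python sketch of the computation (whitespace tokenization is an illustrative simplification here; published evaluations typically use the official rouge-score package):

```python
def lcs_length(a, b):
    # dynamic-programming longest common subsequence over token lists
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l(reference, candidate, beta=1.0):
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    # F-measure; beta = 1 gives the balanced F1 of LCS precision and recall
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

print(round(rouge_l("the patient values daily walks", "patient values walks"), 3))
# -> 0.75 (LCS = 3, precision 1.0, recall 0.6)
```

With `beta=1` this is exactly the F1 of LCS-based precision and recall, which is what ROUGE-L reports by default.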
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
Authors:Wu Wei, Xiaomeng Fan, Yuwei Wu, Zhi Gao, Pengxiang Li, Yunde Jia, Mehrtash Harandi
Modality alignment is critical for vision-language models (VLMs) to effectively integrate information across modalities. However, existing methods extract hierarchical features from text while representing each image with a single feature, leading to asymmetric and suboptimal alignment. To address this, we propose Alignment across Trees, a method that constructs and aligns tree-like hierarchical features for both image and text modalities. Specifically, we introduce a semantic-aware visual feature extraction framework that applies a cross-attention mechanism to visual class tokens from intermediate Transformer layers, guided by textual cues to extract visual features with coarse-to-fine semantics. We then embed the feature trees of the two modalities into hyperbolic manifolds with distinct curvatures to effectively model their hierarchical structures. To align across the heterogeneous hyperbolic manifolds with different curvatures, we formulate a KL distance measure between distributions on heterogeneous manifolds, and learn an intermediate manifold for manifold alignment by minimizing the distance. We prove the existence and uniqueness of the optimal intermediate manifold. Experiments on taxonomic open-set classification tasks across multiple image datasets demonstrate that our method consistently outperforms strong baselines under few-shot and cross-domain settings.
Paper and project links
Summary
This paper proposes "Alignment across Trees", a method that constructs and aligns tree-like hierarchical features for both the image and text modalities. It introduces a semantic-aware visual feature extraction framework that applies a cross-attention mechanism to visual class tokens from intermediate Transformer layers, guided by textual cues, to extract visual features with coarse-to-fine semantics. The feature trees of the two modalities are then embedded into hyperbolic manifolds with distinct curvatures to model their hierarchical structure, and a KL distance measure between distributions on heterogeneous manifolds is formulated to align manifolds of different curvatures. The paper proves that a unique optimal intermediate manifold exists. In classification experiments on multiple image datasets, the method shows clear advantages in few-shot and cross-domain settings.
Key Takeaways
- Modality alignment is crucial for multimodal models, especially when integrating visual and linguistic information.
- Current methods suffer from asymmetric feature alignment, which the proposed Alignment across Trees method resolves.
- A semantic-aware visual feature extraction framework applies cross-attention to visual class tokens to extract coarse-to-fine semantics.
- Feature trees are embedded into hyperbolic manifolds with distinct curvatures to effectively model their hierarchical structure.
- A KL distance measure enables alignment across heterogeneous hyperbolic manifolds with different curvatures.
- A unique optimal intermediate manifold exists for alignment, and its existence and uniqueness are proven.
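The embeddings above live on constant-curvature hyperbolic manifolds. To illustrate how curvature changes the geometry, here is the standard closed-form geodesic distance on a Poincaré ball of curvature -c (a textbook formula, not code from the paper; the tree construction and KL alignment are not reproduced):

```python
import math

def poincare_distance(x, y, c=1.0):
    """Geodesic distance on a Poincare ball of curvature -c (c > 0).

    Points are tuples with sqrt(c) * ||point|| < 1. Shown only to
    illustrate curvature-dependent geometry.
    """
    sq = lambda v: sum(t * t for t in v)
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1 - c * sq(x)) * (1 - c * sq(y))
    return math.acosh(1 + 2 * c * diff / denom) / math.sqrt(c)

origin, p = (0.0, 0.0), (0.5, 0.0)
print(poincare_distance(origin, p, c=1.0))  # ln(3) ~ 1.0986
print(poincare_distance(origin, p, c=2.0))  # same points, different metric
```

The same pair of points gets different distances under different curvatures, which is why aligning manifolds of distinct curvatures needs an explicit cross-manifold measure.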
Discovering EV Charging Site Archetypes Through Few Shot Forecasting: The First U.S.-Wide Study
Authors:Kshitij Nikhal, Luke Ackerknecht, Benjamin S. Riggan, Phil Stahlfeld
The decarbonization of transportation relies on the widespread adoption of electric vehicles (EVs), which requires an accurate understanding of charging behavior to ensure cost-effective, grid-resilient infrastructure. Existing work is constrained by small-scale datasets, simple proximity-based modeling of temporal dependencies, and weak generalization to sites with limited operational history. To overcome these limitations, this work proposes a framework that integrates clustering with few-shot forecasting to uncover site archetypes using a novel large-scale dataset of charging demand. The results demonstrate that archetype-specific expert models outperform global baselines in forecasting demand at unseen sites. By establishing forecast performance as a basis for infrastructure segmentation, we generate actionable insights that enable operators to lower costs, optimize energy and pricing strategies, and support grid resilience critical to climate goals.
Paper and project links
PDF Tackling Climate Change with Machine Learning: Workshop at NeurIPS 2025
Summary
Decarbonizing transportation depends on the widespread adoption of electric vehicles (EVs), which requires an accurate understanding of charging behavior to build cost-effective, grid-resilient infrastructure. To overcome the limitations of existing work, namely small datasets, simple proximity-based modeling of temporal dependencies, and weak generalization to new sites, this study proposes a framework that combines clustering with few-shot forecasting, using a novel large-scale charging-demand dataset to uncover site archetypes. The results show that archetype-specific expert models outperform global baseline models when forecasting demand at unseen sites. By establishing forecast performance as the basis for infrastructure segmentation, the work yields actionable insights that help operators cut costs, optimize energy and pricing strategies, and support the grid resilience critical to climate goals.
Key Takeaways
- Transportation decarbonization depends on widespread EV adoption, which calls for a solid understanding of charging behavior to guide infrastructure build-out.
- Existing research is limited by small datasets, simplistic modeling, and weak generalization.
- The proposed framework combines clustering with few-shot forecasting over a large-scale charging-demand dataset.
- Archetype-specific models outperform global baseline models at forecasting demand for unseen sites.
- Using forecast performance as the basis for infrastructure segmentation gives operators guidance on cutting costs and optimizing energy and pricing strategies.
- The work supports grid resilience, which is critical to meeting climate goals.
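The clustering step can be pictured with a toy 2-means over daily demand profiles (pure-Python sketch with a deterministic seed choice; the paper's actual archetype discovery, dataset, and few-shot forecaster are not reproduced, and all numbers are illustrative):

```python
def two_means(profiles, iters=10):
    """Toy 2-means over fixed-length demand profiles (squared Euclidean distance).

    Deterministic seeding (first and last profile) keeps the sketch
    reproducible; real archetype discovery would use a proper init.
    """
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [list(profiles[0]), list(profiles[-1])]
    for _ in range(iters):
        groups = [[], []]
        for p in profiles:
            groups[0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1].append(p)
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# hourly-bucketed toy profiles: overnight-heavy vs daytime-heavy charging sites
profiles = [[9, 8, 1, 1], [8, 9, 2, 1], [1, 1, 9, 8], [2, 1, 8, 9]]
centers, groups = two_means(profiles)
print([len(g) for g in groups])  # -> [2, 2]: two site archetypes
```

Each recovered center is a candidate "archetype" profile; an archetype-specific forecaster would then be trained per cluster.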
Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
Authors:Duc-Hai Nguyen, Vijayakumar Nanjappan, Barry O’Sullivan, Hoang D. Nguyen
Millions of people take surveys every day, from market polls and academic studies to medical questionnaires and customer feedback forms. These datasets capture valuable insights, but their scale and structure present a unique challenge for large language models (LLMs), which otherwise excel at few-shot reasoning over open-ended text. Yet, their ability to process questionnaire data or lists of questions crossed with hundreds of respondent rows remains underexplored. Current retrieval and survey analysis tools (e.g., Qualtrics, SPSS, REDCap) are typically designed for humans in the workflow, limiting such data integration with LLM and AI-empowered automation. This gap leaves scientists, surveyors, and everyday users without evidence-based guidance on how to best represent questionnaires for LLM consumption. We address this by introducing QASU (Questionnaire Analysis and Structural Understanding), a benchmark that probes six structural skills, including answer lookup, respondent count, and multi-hop inference, across six serialization formats and multiple prompt strategies. Experiments on contemporary LLMs show that choosing an effective format and prompt combination can improve accuracy by up to 8.8% points compared to suboptimal formats. For specific tasks, carefully adding a lightweight structural hint through self-augmented prompting can yield further improvements of 3-4% points on average. By systematically isolating format and prompting effects, our open source benchmark offers a simple yet versatile foundation for advancing both research and real-world practice in LLM-based questionnaire analysis.
Paper and project links
PDF 14 pages, 3 figures, 8 tables
Summary
The QASU (Questionnaire Analysis and Structural Understanding) benchmark is designed to address the challenges large language models face when processing questionnaire data. It probes six structural skills, including answer lookup, respondent count, and multi-hop inference, across six serialization formats and multiple prompt strategies. Experiments show that choosing an effective format and prompt combination can improve accuracy by up to 8.8 percentage points. By systematically isolating format and prompting effects, QASU provides a simple yet versatile foundation for advancing research and practice in LLM-based questionnaire analysis.
Key Takeaways
- Questionnaire data is challenging for large language models, motivating dedicated benchmarks such as QASU.
- QASU covers multiple structural skills, including answer lookup, respondent count, and multi-hop inference.
- Serialization formats and prompt strategies have a significant effect on LLM performance.
- Choosing an effective format and prompt combination can improve accuracy by up to 8.8 percentage points.
- Adding a lightweight structural hint through self-augmented prompting can further improve performance on specific tasks.
- QASU lays a foundation for advancing research on LLM-based questionnaire analysis.
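The serialization-format question can be made concrete with a sketch of two of the many ways the same respondent rows could be placed into a prompt (column names and wording are illustrative, not taken from the QASU benchmark):

```python
import json

# toy respondent rows crossed with questions, as in a questionnaire export
rows = [
    {"id": 1, "Q1: satisfied?": "yes", "Q2: age": 34},
    {"id": 2, "Q1: satisfied?": "no", "Q2: age": 51},
]

def to_json(rows):
    # one serialization choice: records-style JSON
    return json.dumps(rows, ensure_ascii=False, indent=2)

def to_markdown(rows):
    # another choice: a markdown table the model can scan row by row
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |" for r in rows]
    return "\n".join(lines)

prompt = (
    "Answer using only the table below.\n\n"
    + to_markdown(rows)
    + "\n\nQ: How many respondents answered yes?"
)
print(prompt)
```

A benchmark like QASU would hold the questions fixed and vary only the serialization and prompt strategy, so accuracy differences can be attributed to the representation.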
Prototype-Driven Adaptation for Few-Shot Object Detection
Authors:Yushen Huang, Zhiming Wang
Few-shot object detection (FSOD) often suffers from base-class bias and unstable calibration when only a few novel samples are available. We propose Prototype-Driven Alignment (PDA), a lightweight, plug-in metric head for DeFRCN that provides a prototype-based “second opinion” complementary to the linear classifier. PDA maintains support-only prototypes in a learnable identity-initialized projection space and optionally applies prototype-conditioned RoI alignment to reduce geometric mismatch. During fine-tuning, prototypes can be adapted via exponential moving average(EMA) updates on labeled foreground RoIs-without introducing class-specific parameters-and are frozen at inference to ensure strict protocol compliance. PDA employs a best-of-K matching scheme to capture intra-class multi-modality and temperature-scaled fusion to combine metric similarities with detector logits. Experiments on VOC FSOD and GFSOD benchmarks show that PDA consistently improves novel-class performance with minimal impact on base classes and negligible computational overhead.
Paper and project links
PDF 7 pages, 1 figure, 2 tables. Preprint
Summary
This paper addresses base-class bias and the unstable calibration caused by scarce novel samples in few-shot object detection (FSOD) with Prototype-Driven Alignment (PDA), a lightweight plug-in metric head. PDA supplements the linear classifier with a prototype-based "second opinion": it maintains support-only prototypes in a learnable, identity-initialized projection space and optionally applies prototype-conditioned RoI alignment to reduce geometric mismatch. During fine-tuning, the prototypes adapt via exponential moving average (EMA) updates on labeled foreground RoIs without class-specific parameters, and they stay frozen at inference to ensure strict protocol compliance. PDA uses a best-of-K matching scheme to capture intra-class multi-modality and temperature-scaled fusion to combine metric similarities with detector logits. On the VOC FSOD and GFSOD benchmarks, PDA consistently improves novel-class performance with minimal impact on base classes and negligible computational overhead.
Key Takeaways
- PDA offers a solution to base-class bias and unstable calibration from scarce novel samples in few-shot object detection.
- PDA is a lightweight plug-in metric head that gives DeFRCN a prototype-based "second opinion".
- PDA reduces geometric mismatch by maintaining support prototypes in a learnable, identity-initialized projection space.
- During fine-tuning, prototypes adapt via EMA updates, while inference keeps them frozen for strict protocol compliance.
- PDA uses best-of-K matching to capture intra-class multi-modality, combined with temperature-scaled fusion of metric similarities and detector logits.
- PDA performs strongly on the VOC FSOD and GFSOD benchmarks, consistently improving novel-class detection.
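A minimal sketch of the best-of-K matching and temperature-scaled fusion ideas on toy 2-D features (cosine similarity stands in for the metric head; the EMA updates, projection space, and DeFRCN integration are omitted, and all vectors, weights, and temperatures are illustrative):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def best_of_k(feature, prototypes):
    # K prototypes per class capture intra-class modes; score by the best match
    return {cls: max(cosine(feature, p) for p in protos)
            for cls, protos in prototypes.items()}

def fuse(logits, sims, tau=0.5, alpha=0.5):
    # temperature-scaled fusion of detector logits with metric similarities
    fused = {c: (1 - alpha) * logits[c] + alpha * sims[c] / tau for c in logits}
    z = sum(math.exp(v) for v in fused.values())
    return {c: math.exp(v) / z for c, v in fused.items()}

prototypes = {  # two modes per class (K = 2), toy 2-D features
    "cat": [(1.0, 0.1), (0.8, 0.4)],
    "dog": [(0.1, 1.0), (0.3, 0.9)],
}
roi = (0.9, 0.2)
sims = best_of_k(roi, prototypes)
probs = fuse({"cat": 0.2, "dog": 0.1}, sims)
print(max(probs, key=probs.get))  # -> cat: the metric head reinforces the decision
```

The `tau` and `alpha` knobs control how strongly the prototype "second opinion" is allowed to override the detector's own logits.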
Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning
Authors:Ivica Dimitrovski, Vlatko Spasev, Ivan Kitanovski
Remote sensing applications increasingly rely on deep learning for scene classification. However, their performance is often constrained by the scarcity of labeled data and the high cost of annotation across diverse geographic and sensor domains. While recent vision-language models like CLIP have shown promise by learning transferable representations at scale by aligning visual and textual modalities, their direct application to remote sensing remains suboptimal due to significant domain gaps and the need for task-specific semantic adaptation. To address this critical challenge, we systematically explore prompt learning as a lightweight and efficient adaptation strategy for few-shot remote sensing image scene classification. We evaluate several representative methods, including Context Optimization, Conditional Context Optimization, Multi-modal Prompt Learning, and Prompting with Self-Regulating Constraints. These approaches reflect complementary design philosophies: from static context optimization to conditional prompts for enhanced generalization, multi-modal prompts for joint vision-language adaptation, and semantically regularized prompts for stable learning without forgetting. We benchmark these prompt-learning methods against two standard baselines: zero-shot CLIP with hand-crafted prompts and a linear probe trained on frozen CLIP features. Through extensive experiments on multiple benchmark remote sensing datasets, including cross-dataset generalization tests, we demonstrate that prompt learning consistently outperforms both baselines in few-shot scenarios. Notably, Prompting with Self-Regulating Constraints achieves the most robust cross-domain performance. Our findings underscore prompt learning as a scalable and efficient solution for bridging the domain gap in satellite and aerial imagery, providing a strong foundation for future research in this field.
Paper and project links
Summary
Deep learning is increasingly used for remote sensing scene classification, but scarce labeled data and high annotation costs limit its performance. This paper explores prompt learning as a lightweight, efficient adaptation strategy for the few-shot setting in remote sensing image scene classification. Comparing several representative prompt-learning methods, including Context Optimization, Conditional Context Optimization, Multi-modal Prompt Learning, and Prompting with Self-Regulating Constraints, the study finds that prompt learning performs strongly in cross-dataset tests, offering a scalable and efficient solution for bridging the domain gap in satellite and aerial imagery.
Key Takeaways
- Deep learning for remote sensing scene classification is constrained by the scarcity of annotated data.
- Prompt learning serves as a lightweight, efficient adaptation strategy for the few-shot problem.
- Context Optimization, Conditional Context Optimization, Multi-modal Prompt Learning, and Prompting with Self-Regulating Constraints embody complementary design philosophies.
- Prompt learning performs best in cross-dataset tests, with Prompting with Self-Regulating Constraints achieving the most robust cross-domain performance.
Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?
Authors:Teague McMillan, Gabriele Dominici, Martin Gjoreski, Marc Langheinrich
Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and lead to unsafe decision support. We study how inference and training-time choices shape explanation faithfulness, focusing on factors practitioners can control at deployment. We evaluate three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) on two datasets-BBQ (social bias) and MedQA (medical licensing questions), and manipulate the number and type of few-shot examples, prompting strategies, and training procedure. Our results show: (i) both the quantity and quality of few-shot examples significantly impact model faithfulness; (ii) faithfulness is sensitive to prompting design; (iii) the instruction-tuning phase improves measured faithfulness on MedQA. These findings offer insights into strategies for enhancing the interpretability and trustworthiness of LLMs in sensitive domains.
Paper and project links
PDF 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling
Summary
Explanations produced by large language models (LLMs) often fail to faithfully reflect the factors behind their predictions. In healthcare settings, unfaithful explanations are especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and create risks in decision support. This study examines how inference- and training-time choices shape explanation faithfulness, focusing on the factors practitioners can control at deployment. Three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) are evaluated on two datasets (BBQ, on social bias, and MedQA, on medical licensing questions) while manipulating the number and type of few-shot examples, prompting strategies, and training procedure. The results show that (i) both the quantity and quality of few-shot examples significantly affect model faithfulness; (ii) faithfulness is sensitive to prompt design; and (iii) the instruction-tuning phase improves measured faithfulness on MedQA. These findings offer insight into improving the interpretability and trustworthiness of LLMs in sensitive domains.
Key Takeaways
- Explanations from large language models (LLMs) are not always faithful to the factors behind their predictions.
- In healthcare settings, unfaithful explanations can erode clinician trust and create decision risks.
- Inference- and training-time choices affect explanation faithfulness.
- The quantity and quality of few-shot examples strongly influence model faithfulness.
- Model faithfulness is sensitive to prompt design.
- The instruction-tuning phase can improve model faithfulness.
MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning
Authors:Han Wu, Jie Yin
Few-shot knowledge graph relational learning seeks to perform reasoning over relations given only a limited number of training examples. While existing approaches largely adopt a meta-learning framework for enabling fast adaptation to new relations, they suffer from two key pitfalls. First, they learn relation meta-knowledge in isolation, failing to capture common relational patterns shared across tasks. Second, they struggle to effectively incorporate local, task-specific contexts crucial for rapid adaptation. To address these limitations, we propose MoEMeta, a novel meta-learning framework that disentangles globally shared knowledge from task-specific contexts to enable both effective generalization and rapid adaptation. MoEMeta introduces two key innovations: (i) a mixture-of-experts (MoE) model that learns globally shared relational prototypes to enhance generalization, and (ii) a task-tailored adaptation mechanism that captures local contexts for fast task-specific adaptation. By balancing global generalization with local adaptability, MoEMeta significantly advances few-shot relational learning. Extensive experiments and analyses on three KG benchmarks demonstrate that MoEMeta consistently outperforms existing baselines, achieving state-of-the-art performance.
Paper and project links
PDF Accepted by NeurIPS 2025
Summary
Few-shot knowledge graph relational learning performs reasoning over relations using only a limited number of training examples. Existing methods adopt meta-learning for fast adaptation to new relations but have two key pitfalls: they learn relation meta-knowledge in isolation, missing common relational patterns shared across tasks, and they struggle to incorporate the local, task-specific context crucial for rapid adaptation. To address these limitations, MoEMeta is a meta-learning framework that disentangles globally shared knowledge from task-specific contexts to enable both effective generalization and rapid adaptation. It introduces two key innovations: (i) a mixture-of-experts (MoE) model that learns globally shared relational prototypes to enhance generalization, and (ii) a task-tailored adaptation mechanism that captures local context for fast task-specific adaptation. By balancing global generalization with local adaptability, MoEMeta substantially advances few-shot relational learning; extensive experiments on three knowledge graph benchmarks show it consistently outperforms existing baselines and achieves state-of-the-art performance.
Key Takeaways
- Few-shot knowledge graph relational learning aims to reason over relations from limited training samples.
- Existing meta-learning methods learn relation meta-knowledge in isolation and struggle to incorporate local task-specific context.
- The MoEMeta framework disentangles globally shared knowledge from task-specific contexts, achieving effective generalization and rapid adaptation.
- MoEMeta introduces a mixture-of-experts (MoE) model that learns globally shared relational prototypes to enhance generalization.
- MoEMeta proposes a task-tailored adaptation mechanism that captures local context for fast task-specific adaptation.
- By balancing global generalization with local adaptability, MoEMeta markedly improves few-shot relational learning.
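A generic mixture-of-experts gate, the building block behind MoE-style models, can be sketched as follows (toy vectors standing in for relational prototypes; MoEMeta's actual prototype learning and task-tailored adaptation are not reproduced, and all names are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_combine(task_embedding, experts):
    """Gate a task embedding over expert prototypes, then mix their outputs.

    Each expert holds a shared prototype vector; the gate scores the task
    against every prototype and returns a convex combination of them.
    """
    scores = [sum(a * b for a, b in zip(task_embedding, e)) for e in experts]
    weights = softmax(scores)
    mixed = [sum(w * e[i] for w, e in zip(weights, experts))
             for i in range(len(task_embedding))]
    return weights, mixed

experts = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]  # globally shared prototypes
weights, mixed = moe_combine((0.9, 0.1), experts)
print(weights.index(max(weights)))  # -> 0: the gate favors the matching expert
```

The soft gate lets every task draw on all shared prototypes while weighting the most relevant one highest, which is the intuition behind sharing knowledge across relations.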
Conjugate Relation Modeling for Few-Shot Knowledge Graph Completion
Authors:Zilong Wang, Qingtian Zeng, Hua Duan, Cheng Cheng, Minghao Zou, Ziyang Wang
Few-shot Knowledge Graph Completion (FKGC) infers missing triples from limited support samples, tackling long-tail distribution challenges. Existing methods, however, struggle to capture complex relational patterns and mitigate data sparsity. To address these challenges, we propose a novel FKGC framework for conjugate relation modeling (CR-FKGC). Specifically, it employs a neighborhood aggregation encoder to integrate higher-order neighbor information, a conjugate relation learner combining an implicit conditional diffusion relation module with a stable relation module to capture stable semantics and uncertainty offsets, and a manifold conjugate decoder for efficient evaluation and inference of missing triples in manifold space. Experiments on three benchmarks demonstrate that our method achieves superior performance over state-of-the-art methods.
Paper and project links
Summary
Few-shot Knowledge Graph Completion (FKGC) infers missing triples from limited support samples, but existing methods struggle to capture complex relational patterns and to mitigate data sparsity. This paper proposes CR-FKGC, a framework for conjugate relation modeling. It employs a neighborhood aggregation encoder to integrate higher-order neighbor information, a conjugate relation learner that combines an implicit conditional diffusion relation module with a stable relation module to capture stable semantics and uncertainty offsets, and a manifold conjugate decoder for efficient evaluation and inference of missing triples in manifold space. Experiments show that the method outperforms existing approaches on three benchmarks.
Key Takeaways
- Few-shot Knowledge Graph Completion (FKGC) infers missing triples from limited samples, addressing long-tail distribution challenges.
- Existing methods struggle to capture complex relational patterns and to mitigate data sparsity.
- The proposed CR-FKGC framework uses a neighborhood aggregation encoder to integrate higher-order neighbor information.
- CR-FKGC combines an implicit conditional diffusion relation module with a stable relation module to capture stable semantics and uncertainty offsets.
- A manifold conjugate decoder efficiently evaluates and infers missing triples in manifold space.
- Experimental results show CR-FKGC outperforms other methods on three benchmark sets.
Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations
Authors:Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta
Knowledge distillation is a promising approach to transfer capabilities from complex teacher models to smaller, resource-efficient student models that can be deployed easily, particularly in task-aware scenarios. However, existing methods of task-aware distillation typically require substantial quantities of data which may be unavailable or expensive to obtain in many practical scenarios. In this paper, we address this challenge by introducing a novel strategy called Counterfactual-explanation-infused Distillation CoD for few-shot task-aware knowledge distillation by systematically infusing counterfactual explanations. Counterfactual explanations (CFEs) refer to inputs that can flip the output prediction of the teacher model with minimum perturbation. Our strategy CoD leverages these CFEs to precisely map the teacher’s decision boundary with significantly fewer samples. We provide theoretical guarantees for motivating the role of CFEs in distillation, from both statistical and geometric perspectives. We mathematically show that CFEs can improve parameter estimation by providing more informative examples near the teacher’s decision boundary. We also derive geometric insights on how CFEs effectively act as knowledge probes, helping the students mimic the teacher’s decision boundaries more effectively than standard data. We perform experiments across various datasets and LLMs to show that CoD outperforms standard distillation approaches in few-shot regimes (as low as 8-512 samples). Notably, CoD only uses half of the original samples used by the baselines, paired with their corresponding CFEs and still improves performance.
Paper and project links
PDF NeurIPS 2025
Summary
This paper proposes CoD (Counterfactual-explanation-infused Distillation), a strategy for few-shot task-aware knowledge distillation. By systematically infusing counterfactual explanations (CFEs), inputs that flip the teacher model's prediction with minimal perturbation, CoD maps the teacher's decision boundary precisely from only a few samples. Theoretical and mathematical analysis establishes the role of CFEs in distillation, and experiments across datasets and large language models (LLMs) show that CoD outperforms standard distillation in few-shot regimes.
Key Takeaways
- Knowledge distillation transfers capabilities from complex teacher models to smaller, resource-efficient student models, particularly in task-aware settings.
- Existing task-aware distillation methods typically need substantial data, which may be unavailable or expensive in many practical scenarios.
- CoD addresses the few-shot distillation challenge by infusing counterfactual explanations (CFEs), precisely mapping the teacher's decision boundary.
- Counterfactual explanations are inputs that flip the teacher model's prediction with minimal perturbation.
- Statistical and geometric theory supports the importance of CFEs in distillation.
- Experiments across datasets and LLMs show CoD significantly outperforms standard distillation in few-shot settings.
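For a linear teacher, a minimal-perturbation counterfactual has a simple closed form: project the input onto the decision hyperplane and step just past it. A toy sketch (a stand-in for CFEs of a real teacher model such as an LLM; the teacher, step size, and data here are all illustrative):

```python
def counterfactual(x, w, b, eps=1e-3):
    """Minimal L2 perturbation that flips a linear classifier sign(w.x + b).

    The closed form moves x along w by -(w.x + b)/||w||^2, then overshoots
    by a factor (1 + eps) so the prediction actually flips.
    """
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    scale = -(margin / norm_sq) * (1 + eps)
    return [xi + scale * wi for wi, xi in zip(w, x)]

w, b = [1.0, -1.0], 0.0            # toy teacher: predicts sign(x1 - x2)
x = [2.0, 0.0]                     # predicted positive (margin = 2)
x_cf = counterfactual(x, w, b)
flipped = sum(wi * xi for wi, xi in zip(w, x_cf)) + b
print(flipped < 0)  # -> True: the counterfactual crosses the boundary
```

Because each counterfactual sits just across the boundary from its source point, the pair (x, x_cf) localizes the boundary far more tightly than two random samples would, which is the intuition CoD exploits.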
An Automatic Detection Method for Hematoma Features in Placental Abruption Ultrasound Images Based on Few-Shot Learning
Authors:Xiaoqing Liu, Jitai Han, Hua Yan, Peng Li, Sida Tang, Ying Li, Kaiwen Zhang, Min Yu
Placental abruption is a severe complication during pregnancy, and its early accurate diagnosis is crucial for ensuring maternal and fetal safety. Traditional ultrasound diagnostic methods heavily rely on physician experience, leading to issues such as subjective bias and diagnostic inconsistencies. This paper proposes an improved model, EH-YOLOv11n (Enhanced Hemorrhage-YOLOv11n), based on small-sample learning, aiming to achieve automatic detection of hematoma features in placental ultrasound images. The model enhances performance through multidimensional optimization: it integrates wavelet convolution and coordinate convolution to strengthen frequency and spatial feature extraction; incorporates a cascaded group attention mechanism to suppress ultrasound artifacts and occlusion interference, thereby improving bounding box localization accuracy. Experimental results demonstrate a detection accuracy of 78%, representing a 2.5% improvement over YOLOv11n and a 13.7% increase over YOLOv8. The model exhibits significant superiority in precision-recall curves, confidence scores, and occlusion scenarios. Combining high accuracy with real-time processing, this model provides a reliable solution for computer-aided diagnosis of placental abruption, holding significant clinical application value.
Paper and project links
Summary
This paper proposes EH-YOLOv11n, an improved model based on few-shot learning for automatically detecting hematoma features in placental ultrasound images, enabling computer-aided diagnosis of placental abruption. The model improves performance through multidimensional optimization: it integrates wavelet convolution and coordinate convolution to strengthen frequency and spatial feature extraction, and adopts a cascaded group attention mechanism to suppress ultrasound artifacts and occlusion interference, improving bounding-box localization accuracy. Experiments show a detection accuracy of 78%, a 2.5% improvement over YOLOv11n and a 13.7% improvement over YOLOv8. The model shows clear advantages in precision-recall curves, confidence scores, and occlusion scenarios, and its combination of high accuracy and real-time processing gives it significant clinical application value.
Key Takeaways
- Accurate early diagnosis of placental abruption is crucial to maternal and fetal safety.
- Traditional ultrasound diagnosis suffers from subjective bias and inconsistency.
- The EH-YOLOv11n model uses few-shot learning to automatically detect hematoma features in placental ultrasound images.
- The model improves performance via multidimensional optimization, including integrated wavelet and coordinate convolutions and a cascaded group attention mechanism.
- Experiments show EH-YOLOv11n reaches 78% detection accuracy, a clear improvement over comparable models.
- EH-YOLOv11n shows marked advantages in precision-recall curves, confidence scores, and occlusion scenarios.
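Bounding-box localization accuracy in detectors of this family is conventionally scored with intersection-over-union (IoU); a minimal sketch with boxes as (x1, y1, x2, y2) tuples (standard metric code, unrelated to the paper's model or data):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

pred, truth = (0, 0, 2, 2), (1, 1, 3, 3)
print(iou(pred, truth))  # -> 0.14285714285714285 (overlap 1 / union 7)
```

A predicted hematoma box typically counts as correct only when its IoU with the ground-truth box clears a threshold such as 0.5, which is why suppressing artifact-induced localization drift matters.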
Parameter-Free Hypergraph Neural Network for Few-Shot Node Classification
Authors:Chaewoon Bae, Doyun Choi, Jaehyun Lee, Jaemin Yoo
Few-shot node classification on hypergraphs requires models that generalize from scarce labels while capturing high-order structures. Existing hypergraph neural networks (HNNs) effectively encode such structures but often suffer from overfitting and scalability issues due to complex, black-box architectures. In this work, we propose ZEN (Zero-Parameter Hypergraph Neural Network), a fully linear and parameter-free model that achieves both expressiveness and efficiency. Built upon a unified formulation of linearized HNNs, ZEN introduces a tractable closed-form solution for the weight matrix and a redundancy-aware propagation scheme to avoid iterative training and to eliminate redundant self information. On 11 real-world hypergraph benchmarks, ZEN consistently outperforms eight baseline models in classification accuracy while achieving up to 696x speedups over the fastest competitor. Moreover, the decision process of ZEN is fully interpretable, providing insights into the characteristic of a dataset. Our code and datasets are fully available at https://github.com/chaewoonbae/ZEN.
Paper and project links
Summary
ZEN (Zero-Parameter Hypergraph Neural Network) is a concise, effective model that is fully linear and parameter-free. By introducing a tractable closed-form solution for the weight matrix and a redundancy-aware propagation scheme, it resolves the overfitting and scalability problems faced by existing hypergraph neural networks (HNNs). ZEN outperforms eight baseline models in classification accuracy and runs up to 696x faster than the fastest competitor. Its decision process is fully interpretable, giving insight into the characteristics of a dataset. The code and datasets are publicly available on GitHub.
Key Takeaways
- ZEN is a parameter-free, fully linear hypergraph neural network that combines expressiveness with efficiency.
- ZEN resolves the overfitting and scalability issues of existing hypergraph neural networks.
- Across 11 real-world hypergraph benchmarks, ZEN's classification accuracy surpasses eight baseline models.
- ZEN runs up to 696x faster than the fastest existing model.
- ZEN's decision process is fully interpretable.
- ZEN improves performance by introducing a closed-form weight-matrix solution and a redundancy-aware propagation scheme.
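Linearized hypergraph propagation of the kind such models build on can be sketched as one round of hyperedge averaging (pure Python; ZEN's closed-form weight solution and redundancy-aware scheme are not reproduced, and the tiny hypergraph is illustrative):

```python
def propagate(features, hyperedges):
    """One parameter-free propagation step: node -> hyperedge -> node.

    Each hyperedge takes the mean of its member nodes' features; each node
    then averages the means of its incident hyperedges. No trainable weights.
    """
    d = len(features[0])
    edge_means = [
        [sum(features[v][k] for v in e) / len(e) for k in range(d)]
        for e in hyperedges
    ]
    out = []
    for v in range(len(features)):
        incident = [m for e, m in zip(hyperedges, edge_means) if v in e]
        out.append(
            [sum(m[k] for m in incident) / len(incident) for k in range(d)]
            if incident else list(features[v])
        )
    return out

feats = [[1.0], [0.0], [0.0], [1.0]]
edges = [{0, 1, 2}, {2, 3}]  # two hyperedges over four nodes
out = propagate(feats, edges)
print(out[2])  # node 2 blends both hyperedges' means
```

Because the step is linear and parameter-free, labels only enter afterwards, e.g. through a closed-form least-squares fit on the propagated features, so no iterative training is needed.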
M-GLC: Motif-Driven Global-Local Context Graphs for Few-shot Molecular Property Prediction
Authors:Xiangyang Xu, Hongyang Gao
Molecular property prediction (MPP) is a cornerstone of drug discovery and materials science, yet conventional deep learning approaches depend on large labeled datasets that are often unavailable. Few-shot Molecular property prediction (FSMPP) addresses this scarcity by incorporating relational inductive bias through a context graph that links molecule nodes to property nodes, but such molecule-property graphs offer limited structural guidance. We propose a comprehensive solution: Motif Driven Global-Local Context Graph for few-shot molecular property prediction, which enriches contextual information at both the global and local levels. At the global level, chemically meaningful motif nodes representing shared substructures, such as rings or functional groups, are introduced to form a global tri-partite heterogeneous graph, yielding motif-molecule-property connections that capture long-range compositional patterns and enable knowledge transfer among molecules with common motifs. At the local level, we build a subgraph for each node in the molecule-property pair and encode them separately to concentrate the model’s attention on the most informative neighboring molecules and motifs. Experiments on five standard FSMPP benchmarks demonstrate that our framework consistently outperforms state-of-the-art methods. These results underscore the effectiveness of integrating global motif knowledge with fine-grained local context to advance robust few-shot molecular property prediction.
Paper and project links
Summary
This paper introduces a new approach to molecular property prediction that addresses conventional deep learning's reliance on large labeled datasets. By enriching the context graph at both the global and local levels, it introduces chemically meaningful motif nodes to form a global tri-partite heterogeneous graph, capturing long-range compositional patterns and enabling knowledge transfer among molecules. In addition, a subgraph is built and encoded separately for each node in the molecule-property pair, focusing the model's attention on the most informative neighboring molecules and motifs. Experiments show the method outperforms existing approaches on five standard few-shot molecular property prediction benchmarks.
Key Takeaways
- Molecular property prediction (MPP) is central to drug discovery and materials science, but conventional deep learning methods require large labeled datasets that are often unavailable in practice.
- Few-shot molecular property prediction (FSMPP) addresses data scarcity through relational inductive bias, using a context graph to connect molecule nodes and property nodes.
- A new method, the motif-driven global-local context graph, is proposed to enrich contextual information at both the global and local levels.
- At the global level, chemically meaningful motif nodes are introduced to form a global tri-partite heterogeneous graph, capturing long-range compositional patterns and enabling knowledge transfer among molecules.
- At the local level, a subgraph is built and encoded separately for each node in the molecule-property pair, focusing the model on the most relevant neighboring molecules and motifs.
- Experiments show the method performs strongly on multiple standard FSMPP benchmarks.
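To make the global tri-partite construction concrete, here is a minimal, hypothetical sketch of how motif-molecule-property edges might be assembled; `mol_motifs`, `mol_props`, and the typed-edge format are illustrative assumptions, not the paper's actual data structures.

```python
def build_tripartite_edges(mol_motifs, mol_props):
    """Assemble typed edges for a motif-molecule-property context graph.

    mol_motifs: dict molecule_id -> iterable of motif ids (rings, functional groups)
    mol_props:  dict molecule_id -> iterable of labeled property ids
    Returns a list of (src_type, src_id, dst_type, dst_id) tuples.
    """
    edges = []
    for mol, motifs in mol_motifs.items():
        for mf in motifs:
            # Shared motifs create paths between molecules, enabling transfer.
            edges.append(("motif", mf, "molecule", mol))
    for mol, props in mol_props.items():
        for p in props:
            edges.append(("molecule", mol, "property", p))
    return edges
```

Molecules that share a motif (e.g. the same ring system) end up two hops apart through the motif node, which is exactly the long-range connectivity the global graph exploits.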
On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration
Authors:Yehonathan Refael, Amit Aides, Aviad Barzilai, George Leifman, Genady Beryozkin, Vered Silverman, Bolous Jaber, Tomer Shekel
Open-vocabulary object detection (OVD) models offer remarkable flexibility by detecting objects from arbitrary text queries. However, their zero-shot performance in specialized domains like Remote Sensing (RS) is often compromised by the inherent ambiguity of natural language, limiting critical downstream applications. For instance, an OVD model may struggle to distinguish between fine-grained classes such as “fishing boat” and “yacht” since their embeddings are similar and often inseparable. This can hamper specific user goals, such as monitoring illegal fishing, by producing irrelevant detections. To address this, we propose a cascaded approach that couples the broad generalization of a large pre-trained OVD model with a lightweight few-shot classifier. Our method first employs the zero-shot model to generate high-recall object proposals. These proposals are then refined for high precision by a compact classifier trained in real time on only a handful of user-annotated examples, drastically reducing the high costs of RS imagery annotation. The core of our framework is FLAME, a one-step active learning strategy that selects the most informative samples for training. FLAME identifies, on the fly, uncertain marginal candidates near the decision boundary using density estimation, followed by clustering to ensure sample diversity. This efficient sampling technique achieves high accuracy without costly full-model fine-tuning and enables instant adaptation, within less than a minute, which is significantly faster than state-of-the-art alternatives. Our method consistently surpasses state-of-the-art performance on RS benchmarks, establishing a practical and resource-efficient framework for adapting foundation models to specific user needs.
Paper and project links
Summary
This paper proposes a solution to the weak zero-shot performance of open-vocabulary object detection (OVD) models in remote sensing (RS). By coupling the broad generalization of a large pre-trained OVD model with a lightweight few-shot classifier, the approach first uses the zero-shot model to generate high-recall object proposals, then refines them for precision with a compact classifier trained in real time on a handful of user-annotated examples. At the core of the framework is FLAME, a one-step active learning strategy that selects the most informative samples for training: it identifies uncertain marginal candidates near the decision boundary via density estimation and applies clustering to ensure sample diversity. This enables efficient sampling without costly full-model fine-tuning, adapting in under a minute. The approach surpasses state-of-the-art performance on RS benchmarks, establishing a practical and resource-efficient framework for adapting foundation models to specific user needs.
Key Takeaways
- Open-vocabulary object detection (OVD) models flexibly detect objects from text queries, but their zero-shot performance is limited in remote sensing (RS).
- The proposed approach couples the broad generalization of a pre-trained OVD model with a few-shot classifier to improve object detection in RS imagery.
- The zero-shot model generates high-recall object proposals, which are then refined for precision by a classifier trained in real time on a few user-annotated examples.
- The core of the framework is FLAME, a one-step active learning strategy that selects the most informative samples for training, enabling efficient sampling without full-model fine-tuning.
- FLAME identifies uncertain marginal candidates near the decision boundary using density estimation, followed by clustering for diversity.
- The approach surpasses state-of-the-art performance on RS benchmarks.
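As a rough illustration of the marginal-sample idea (not FLAME's actual algorithm, which relies on density estimation and clustering), the sketch below keeps the binary-classifier candidates closest to the decision boundary and then greedily spreads the picks in feature space; all names and the farthest-point heuristic are assumptions.

```python
import numpy as np

def flame_select(probs, feats, n_select, pool_frac=0.5):
    """Toy one-step active selection: uncertainty filter + diversity spread.

    probs: predicted positive-class probabilities, shape (N,)
    feats: feature vectors, shape (N, D)
    Returns n_select indices of diverse, near-boundary samples.
    """
    margin = np.abs(np.asarray(probs) - 0.5)          # distance from boundary
    pool_size = max(n_select, int(len(margin) * pool_frac))
    pool = np.argsort(margin)[:pool_size]             # most uncertain candidates
    chosen = [int(pool[0])]
    while len(chosen) < n_select:
        # Greedy farthest-point pick inside the uncertain pool for diversity.
        d = np.min(
            np.linalg.norm(feats[pool][:, None, :] - feats[chosen][None, :, :], axis=-1),
            axis=1,
        )
        chosen.append(int(pool[int(np.argmax(d))]))
    return chosen
```

A real implementation would also estimate sample density near the boundary and cluster before picking, but the uncertainty-then-diversity structure is the same.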
Preference-driven Knowledge Distillation for Few-shot Node Classification
Authors:Xing Wei, Chunchun Chen, Rui Fan, Xiaofeng Cao, Sourav Medya, Wei Ye
Graph neural networks (GNNs) can efficiently process text-attributed graphs (TAGs) due to their message-passing mechanisms, but their training heavily relies on human-annotated labels. Moreover, the complex and diverse local topologies of nodes in real-world TAGs make it challenging for a single mechanism to handle. Large language models (LLMs) perform well in zero-/few-shot learning on TAGs but suffer from a scalability challenge. Therefore, we propose a preference-driven knowledge distillation (PKD) framework to synergize the complementary strengths of LLMs and various GNNs for few-shot node classification. Specifically, we develop a GNN-preference-driven node selector that effectively promotes prediction distillation from LLMs to teacher GNNs. To further tackle nodes’ intricate local topologies, we develop a node-preference-driven GNN selector that identifies the most suitable teacher GNN for each node, thereby facilitating tailored knowledge distillation from teacher GNNs to the student GNN. Extensive experiments validate the efficacy of our proposed framework in few-shot node classification on real-world TAGs. Our code is available.
Paper and project links
PDF Accepted by NeurIPS 2025
Summary
Graph neural networks (GNNs) process text-attributed graphs (TAGs) efficiently thanks to their message-passing mechanisms, but their training depends heavily on human-annotated labels, and the complex, diverse local topologies of real-world TAG nodes are hard for a single mechanism to handle. Large language models (LLMs) perform well at zero-/few-shot learning on TAGs but face scalability challenges. The paper therefore proposes a preference-driven knowledge distillation (PKD) framework that synergizes the complementary strengths of LLMs and various GNNs for few-shot node classification. The framework comprises a GNN-preference-driven node selector, which promotes prediction distillation from LLMs to teacher GNNs, and a node-preference-driven GNN selector, which addresses nodes' intricate local topologies. Extensive experiments on real-world TAGs validate the framework's efficacy for few-shot node classification.
Key Takeaways
- Graph neural networks (GNNs) process text-attributed graphs (TAGs) efficiently thanks to their message-passing mechanisms.
- GNN training relies heavily on human-annotated labels.
- Real-world TAG nodes have complex and diverse local topologies that are difficult for a single mechanism to handle.
- Large language models (LLMs) perform well at zero-/few-shot learning on TAGs but face scalability issues.
- The proposed preference-driven knowledge distillation (PKD) framework combines the complementary strengths of LLMs and GNNs for few-shot node classification.
- PKD includes a GNN-preference-driven node selector and a node-preference-driven GNN selector, which respectively promote prediction distillation and tailored distillation for nodes' intricate local topologies.
- Experiments on real-world TAGs validate the framework's effectiveness for few-shot node classification.
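The per-node teacher choice can be pictured with a toy heuristic; PKD's actual selectors are learned preference mechanisms, so the entropy-based rule below is purely an illustrative stand-in with assumed names.

```python
import numpy as np

def pick_teacher_per_node(teacher_probs):
    """Toy stand-in for a node-preference-driven GNN selector.

    teacher_probs: array (T, N, C) of class distributions from T teacher GNNs
    over N nodes. For each node, keep the teacher with the most confident
    (lowest-entropy) prediction and use its distribution as that node's
    distillation target for the student GNN.
    """
    p = np.asarray(teacher_probs, dtype=float)
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)   # (T, N) per-teacher entropy
    best = ent.argmin(axis=0)                     # (N,) chosen teacher per node
    targets = p[best, np.arange(p.shape[1])]      # (N, C) distillation targets
    return best, targets
```

The student would then be trained against `targets` with a standard distillation loss; the point is only that the teacher varies per node rather than being fixed globally.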
SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning
Authors:Weijian Mai, Jiamin Wu, Yu Zhu, Zhouheng Yao, Dongzhan Zhou, Andrew F. Luo, Qihao Zheng, Wanli Ouyang, Chunfeng Song
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. This visual-to-neural mapping is inherently a one-to-many relationship, as identical visual inputs reliably evoke variable hemodynamic responses across trials, contexts, and subjects. However, existing deterministic methods struggle to simultaneously model this biological variability while capturing the underlying functional consistency that encodes stimulus information. To address these limitations, we propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. SynBrain introduces two key components: (i) BrainVAE models neural representations as continuous probability distributions via probabilistic learning while maintaining functional consistency through visual semantic constraints; (ii) A Semantic-to-Neural Mapper acts as a semantic transmission pathway, projecting visual semantics into the neural response manifold to facilitate high-fidelity fMRI synthesis. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance. Furthermore, SynBrain adapts efficiently to new subjects with few-shot data and synthesizes high-quality fMRI signals that are effective in improving data-limited fMRI-to-image decoding performance. Beyond that, SynBrain reveals functional consistency across trials and subjects, with synthesized signals capturing interpretable patterns shaped by biological neural variability. Our code is available at https://github.com/MichaelMaiii/SynBrain.
Paper and project links
PDF Accepted by NeurIPS 2025
Summary
This paper studies how visual stimuli are transformed into cortical responses, a fundamental challenge in computational neuroscience. Because identical visual inputs evoke variable hemodynamic responses across trials, contexts, and subjects, the visual-to-neural mapping is inherently one-to-many. Existing deterministic methods cannot simultaneously model this biological variability and the underlying functional consistency that encodes stimulus information, so the authors propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses probabilistically, built on two key components: BrainVAE and a Semantic-to-Neural Mapper. Experiments show SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding, adapts efficiently to new subjects with few-shot data, and synthesizes high-quality fMRI signals that improve data-limited fMRI-to-image decoding. It also reveals functional consistency across trials and subjects, with synthesized signals capturing interpretable patterns shaped by biological neural variability.
Key Takeaways
- Addresses how visual stimuli are transformed into cortical responses in computational neuroscience, an inherently one-to-many mapping.
- Existing deterministic methods struggle to model biological variability and underlying functional consistency simultaneously.
- SynBrain is a generative framework that simulates the visual-semantics-to-neural-response transformation via probabilistic learning and semantic constraints.
- SynBrain comprises two key components: BrainVAE and a Semantic-to-Neural Mapper.
- SynBrain surpasses state-of-the-art methods in visual-to-fMRI encoding performance.
- SynBrain adapts efficiently to new subjects with few-shot data and synthesizes high-quality fMRI signals.
CompoST: A Benchmark for Analyzing the Ability of LLMs To Compositionally Interpret Questions in a QALD Setting
Authors:David Maria Schmidt, Raoul Schubert, Philipp Cimiano
Language interpretation is a compositional process, in which the meaning of more complex linguistic structures is inferred from the meaning of their parts. Large language models possess remarkable language interpretation capabilities and have been successfully applied to interpret questions by mapping them to SPARQL queries. An open question is how systematic this interpretation process is. Toward this question, in this paper, we propose a benchmark for investigating to what extent the abilities of LLMs to interpret questions are actually compositional. For this, we generate three datasets of varying difficulty based on graph patterns in DBpedia, relying on Lemon lexica for verbalization. Our datasets are created in a very controlled fashion in order to test the ability of LLMs to interpret structurally complex questions, given that they have seen the atomic building blocks. This allows us to evaluate to what degree LLMs are able to interpret complex questions for which they “understand” the atomic parts. We conduct experiments with models of different sizes using both various prompt and few-shot optimization techniques as well as fine-tuning. Our results show that performance in terms of macro $F_1$ degrades from $0.45$ over $0.26$ down to $0.09$ with increasing deviation from the samples optimized on. Even when all necessary information was provided to the model in the input, the $F_1$ scores do not exceed $0.57$ for the dataset of lowest complexity. We thus conclude that LLMs struggle to systematically and compositionally interpret questions and map them into SPARQL queries.
Paper and project links
PDF Research Track, 24th International Semantic Web Conference (ISWC 2025), November 2-6, 2025, Nara, Japan
Summary
This paper investigates how systematically large language models (LLMs) interpret questions and map them to SPARQL queries. It proposes a benchmark for assessing whether LLMs' question-interpretation abilities are truly compositional, generating three datasets of varying difficulty from DBpedia graph patterns, verbalized using Lemon lexica. The results show that performance degrades sharply as questions deviate from the samples optimized on, leading to the conclusion that LLMs struggle to systematically and compositionally interpret questions and map them into SPARQL queries.
Key Takeaways
- LLMs possess strong language interpretation capabilities and have been applied successfully to map questions to SPARQL queries.
- The paper proposes a benchmark for evaluating the compositionality of LLMs' question interpretation, built from three datasets of varying difficulty based on DBpedia graph patterns.
- Experiments cover models of different sizes, various prompt and few-shot optimization techniques, and fine-tuning.
- Macro F1 degrades from 0.45 over 0.26 down to 0.09 as deviation from the optimized-on samples increases.
- Even with all necessary information provided in the input, F1 does not exceed 0.57 on the lowest-complexity dataset.
- LLMs struggle to systematically and compositionally interpret questions and map them into SPARQL queries.
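Since the benchmark reports macro F1, here is a self-contained reminder of how that metric is computed (the standard definition, not code from the paper): per-class F1 averaged unweighted over classes, so rare classes count as much as frequent ones.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given label set."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```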
Context Tuning for In-Context Optimization
Authors:Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren
We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for LLMs, they typically initialize a trainable prompt or prefix with irrelevant tokens for the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model’s inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves competitive accuracy to Test-Time Training with significantly higher training efficiency.
Paper and project links
PDF A short version of this paper was accepted at ICML 2025 Workshop on Test-Time Adaptation
Summary
This paper introduces Context Tuning, a simple and effective method that significantly improves few-shot adaptation of language models without fine-tuning model parameters. Unlike prompt-based adaptation techniques that initialize a trainable prompt or prefix with task-irrelevant tokens, Context Tuning initializes it with task-specific demonstration examples, leveraging the model's inherent in-context learning (ICL) ability to extract relevant information and improve few-shot performance. Extensive evaluations on CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC show that Context Tuning outperforms traditional prompt-based adaptation methods and achieves accuracy competitive with Test-Time Training at significantly higher training efficiency.
Key Takeaways
- Context Tuning is a simple, effective way to enhance few-shot adaptation of language models without fine-tuning model parameters.
- Unlike other prompt-based adaptation techniques, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples.
- Context Tuning leverages the model's inherent in-context learning ability to extract relevant information.
- Context Tuning performs strongly across benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC.
- Context Tuning outperforms traditional prompt-based adaptation methods.
- Context Tuning achieves accuracy competitive with Test-Time Training, with significantly higher training efficiency.
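The core trick, seeding the soft prompt with demonstration-token embeddings rather than unrelated tokens, can be sketched as follows; the function name and the truncate/zero-pad choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def init_context_prompt(demo_token_ids, embedding_table, prompt_len):
    """Initialize a trainable soft prompt from task demonstration tokens.

    Instead of starting from task-irrelevant vectors, copy the embeddings of
    the demonstration tokens (truncated or zero-padded to prompt_len). The
    returned array would then be optimized as a free parameter downstream,
    while the base model's weights stay frozen.
    """
    dim = embedding_table.shape[1]
    prompt = np.zeros((prompt_len, dim))
    ids = list(demo_token_ids)[:prompt_len]
    prompt[:len(ids)] = embedding_table[ids]
    return prompt
```

Because the starting point already encodes the task's demonstrations, optimization begins where in-context learning would, which is the intuition behind the method's efficiency.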
PlantSegNeRF: A few-shot, cross-species method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching
Authors:Xin Yang, Ruiming Du, Hanyang Huang, Jiayang Xie, Pengyao Xie, Leisen Fang, Ziyue Guo, Nanjun Jiang, Yu Jiang, Haiyan Cen
Organ segmentation of plant point clouds is a prerequisite for the high-resolution and accurate extraction of organ-level phenotypic traits. Although the fast development of deep learning has boosted much research on segmentation of plant point clouds, the existing techniques for organ segmentation still face limitations in resolution, segmentation accuracy, and generalizability across various plant species. In this study, we proposed a novel approach called plant segmentation neural radiance fields (PlantSegNeRF), aiming to directly generate high-precision instance point clouds from multi-view RGB image sequences for a wide range of plant species. PlantSegNeRF performed 2D instance segmentation on the multi-view images to generate instance masks for each organ with a corresponding ID. The multi-view instance IDs corresponding to the same plant organ were then matched and refined using a specially designed instance matching module. The instance NeRF was developed to render an implicit scene, containing color, density, semantic and instance information. The implicit scene was ultimately converted into high-precision plant instance point clouds based on the volume density. The results proved that in semantic segmentation of point clouds, PlantSegNeRF outperformed the commonly used methods, demonstrating an average improvement of 16.1%, 18.3%, 17.8%, and 24.2% in precision, recall, F1-score, and IoU compared to the second-best results on structurally complex species. More importantly, PlantSegNeRF exhibited significant advantages in plant point cloud instance segmentation tasks. Across all plant species, it achieved average improvements of 11.7%, 38.2%, 32.2% and 25.3% in mPrec, mRec, mCov, mWCov, respectively. This study extends the organ-level plant phenotyping and provides a high-throughput way to supply high-quality 3D data for the development of large-scale models in plant science.
Paper and project links
Summary
This paper proposes PlantSegNeRF, a method for generating high-precision plant instance point clouds directly from multi-view RGB image sequences. The method performs 2D instance segmentation to produce instance masks, matches and refines them with a specially designed instance matching module, then renders an implicit scene with an instance NeRF and converts it into high-precision plant instance point clouds based on volume density. Experiments show the method outperforms existing approaches on both semantic and instance segmentation of plant point clouds, providing a high-throughput source of high-quality 3D data for developing large-scale models in plant science.
Key Takeaways
- PlantSegNeRF directly generates high-precision plant instance point clouds.
- 2D instance segmentation on multi-view RGB image sequences produces instance masks.
- A specially designed instance matching module matches and refines multi-view instances.
- An instance NeRF renders an implicit scene containing color, density, semantic, and instance information.
- PlantSegNeRF performs strongly on both semantic and instance segmentation of plant point clouds.
- Compared with the second-best methods on structurally complex species, PlantSegNeRF improves precision, recall, F1-score, and IoU by an average of 16.1%, 18.3%, 17.8%, and 24.2%, respectively.
AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays
Authors:Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang
Contrastive Language-Image Pre-training (CLIP) models have demonstrated superior performance across various visual tasks including medical image classification. However, fairness concerns, including demographic biases, have received limited attention for CLIP models. This oversight leads to critical issues, particularly those related to race and gender, resulting in disparities in diagnostic outcomes and reduced reliability for underrepresented groups. To address these challenges, we introduce AdFair-CLIP, a novel framework employing adversarial feature intervention to suppress sensitive attributes, thereby mitigating spurious correlations and improving prediction fairness. We conduct comprehensive experiments on chest X-ray (CXR) datasets, and show that AdFair-CLIP significantly enhances both fairness and diagnostic accuracy, while maintaining robust generalization in zero-shot and few-shot scenarios. These results establish new benchmarks for fairness-aware learning in CLIP-based medical diagnostic models, particularly for CXR analysis.
Paper and project links
PDF This preprint has been accepted by MICCAI 2025
Summary
CLIP models excel at visual tasks including medical image classification, but fairness concerns, particularly racial and gender biases, have received limited attention. To address this, the authors propose AdFair-CLIP, a framework that uses adversarial feature intervention to suppress sensitive attributes, reducing spurious correlations and improving prediction fairness. Experiments on chest X-ray (CXR) datasets show that AdFair-CLIP significantly improves both fairness and diagnostic accuracy in zero-shot and few-shot scenarios while maintaining robust generalization, establishing new benchmarks for fairness-aware learning in CLIP-based medical diagnostic models, particularly for CXR analysis.
Key Takeaways
- CLIP models perform well on visual tasks such as medical image classification, but suffer from fairness issues, particularly racial and gender biases.
- AdFair-CLIP is proposed to address these issues, using adversarial feature intervention to suppress sensitive attributes.
- AdFair-CLIP reduces spurious correlations and improves prediction fairness.
- Experiments on chest X-ray datasets show AdFair-CLIP improves fairness and diagnostic accuracy in zero-shot and few-shot scenarios.
- AdFair-CLIP sets a new benchmark for fairness-aware learning in CLIP-based medical diagnostic models.