⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never rely on these summaries in serious academic settings; they are only meant for an initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-09-11
Object-level Correlation for Few-Shot Segmentation
Authors:Chunlin Wen, Yu Zhang, Jie Fan, Hongyuan Zhu, Xiu-Shen Wei, Yijun Wang, Zhiqiang Kou, Shuzhou Sun
Few-shot semantic segmentation (FSS) aims to segment objects of novel categories in the query images given only a few annotated support samples. Existing methods primarily build the image-level correlation between the support target object and the entire query image. However, this correlation contains the hard pixel noise, i.e., irrelevant background objects, that is intractable to trace and suppress, leading to the overfitting of the background. To address the limitation of this correlation, we imitate the biological vision process to identify novel objects in the object-level information. Target identification in the general objects is more valid than in the entire image, especially in the low-data regime. Inspired by this, we design an Object-level Correlation Network (OCNet) by establishing the object-level correlation between the support target object and query general objects, which is mainly composed of the General Object Mining Module (GOMM) and Correlation Construction Module (CCM). Specifically, GOMM constructs the query general object feature by learning saliency and high-level similarity cues, where the general objects include the irrelevant background objects and the target foreground object. Then, CCM establishes the object-level correlation by allocating the target prototypes to match the general object feature. The generated object-level correlation can mine the query target feature and suppress the hard pixel noise for the final prediction. Extensive experiments on PASCAL-$5^{i}$ and COCO-$20^{i}$ show that our model achieves the state-of-the-art performance.
Paper and project links
PDF: This paper was accepted by ICCV 2025
Summary
This paper proposes a new approach to few-shot semantic segmentation. Instead of building image-level correlation between the support target object and the entire query image, as existing methods do, it imitates the biological vision process to identify novel target objects at the object level. By establishing object-level correlation between the support target object and query objects, the authors design an Object-level Correlation Network (OCNet), composed mainly of a General Object Mining Module (GOMM) and a Correlation Construction Module (CCM). Experiments on datasets such as PASCAL-5i and COCO-20i show that the method achieves state-of-the-art performance.
Key Takeaways
- Few-shot semantic segmentation (FSS) aims to segment objects of novel categories in query images using only a few annotated support samples.
- Existing methods mainly build image-level correlation between the support target object and the entire query image.
- Image-level correlation contains hard pixel noise (i.e., irrelevant background objects) that is hard to trace and suppress, leading to overfitting of the background.
- This paper imitates the biological vision process to identify novel target objects at the object level, overcoming this limitation.
- An Object-level Correlation Network (OCNet) is proposed, composed of a General Object Mining Module (GOMM) and a Correlation Construction Module (CCM); a minimal sketch of this object-level matching appears below.
- GOMM constructs the query general-object features by learning saliency and high-level similarity cues.
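The abstract gives no implementation details, but the core idea of scoring query "general objects" against a support target prototype can be illustrated with a small PyTorch sketch. Everything here (tensor shapes, `masked_average_pool`, `object_level_correlation`) is a hypothetical reconstruction for illustration, not OCNet's actual code:

```python
# Hypothetical sketch: score query "general objects" against a support
# target prototype. Shapes and function names are illustrative only.
import torch
import torch.nn.functional as F

def masked_average_pool(feat, mask):
    """Pool a (C, H, W) feature map over a binary (H, W) mask into a (C,) prototype."""
    mask = mask.unsqueeze(0).float()                       # (1, H, W)
    return (feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)

def object_level_correlation(query_feat, object_masks, support_prototype):
    """query_feat: (C, H, W); object_masks: (N, H, W) binary masks of N
    candidate general objects; support_prototype: (C,) pooled from the
    support target object. Returns one cosine score per object, (N,)."""
    protos = torch.stack([masked_average_pool(query_feat, m) for m in object_masks])
    return F.cosine_similarity(protos, support_prototype.unsqueeze(0), dim=1)

# Toy usage with random tensors: a higher score means a more target-like object.
C, H, W, N = 64, 32, 32, 5
scores = object_level_correlation(
    torch.randn(C, H, W), torch.rand(N, H, W) > 0.7, torch.randn(C))
print(scores.shape)  # torch.Size([5])
```

In the paper, GOMM would supply the object features and CCM would perform a learned version of this prototype allocation; the cosine scores above merely illustrate the matching step.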

Universal Few-Shot Spatial Control for Diffusion Models
Authors:Kiet T. Nguyen, Chanhuyk Lee, Donggyun Kim, Dong Hoon Lee, Seunghoon Hong
Spatial conditioning in pretrained text-to-image diffusion models has significantly improved fine-grained control over the structure of generated images. However, existing control adapters exhibit limited adaptability and incur high training costs when encountering novel spatial control conditions that differ substantially from the training tasks. To address this limitation, we propose Universal Few-Shot Control (UFC), a versatile few-shot control adapter capable of generalizing to novel spatial conditions. Given a few image-condition pairs of an unseen task and a query condition, UFC leverages the analogy between query and support conditions to construct task-specific control features, instantiated by a matching mechanism and an update on a small set of task-specific parameters. Experiments on six novel spatial control tasks show that UFC, fine-tuned with only 30 annotated examples of novel tasks, achieves fine-grained control consistent with the spatial conditions. Notably, when fine-tuned with 0.1% of the full training data, UFC achieves competitive performance with the fully supervised baselines in various control tasks. We also show that UFC is applicable agnostically to various diffusion backbones and demonstrate its effectiveness on both UNet and DiT architectures. Code is available at https://github.com/kietngt00/UFC.
Paper and project links
Summary
Spatial conditioning gives pretrained text-to-image diffusion models fine-grained control over the structure of generated images. However, existing control adapters show limited adaptability and high training costs when faced with novel spatial control conditions that differ substantially from the training tasks. To address this, the paper proposes Universal Few-Shot Control (UFC), a versatile few-shot control adapter that generalizes to novel spatial conditions. Given a few image-condition pairs of an unseen task and a query condition, UFC exploits the analogy between query and support conditions to construct task-specific control features, realized through a matching mechanism and an update on a small set of task-specific parameters. Experiments on six novel spatial control tasks show that UFC, fine-tuned with only 30 annotated examples of a novel task, achieves fine-grained control consistent with the spatial conditions. Notably, when fine-tuned with 0.1% of the full training data, UFC is competitive with fully supervised baselines across various control tasks. UFC is also applicable to various diffusion backbones and is effective on both UNet and DiT architectures.
Key Takeaways
- Spatial conditioning gives text-to-image diffusion models fine-grained control over the structure of generated images.
- Existing control adapters face adaptability and training-cost problems when encountering novel spatial control conditions.
- Universal Few-Shot Control (UFC) is a versatile few-shot control adapter that generalizes to novel spatial conditions.
- UFC achieves fine-grained control by constructing task-specific control features from a small number of annotated examples (a toy sketch of the underlying matching step follows below).
- UFC performs strongly across multiple spatial control tasks, achieving good control even with very little annotated data.
- UFC is competitive with fully supervised baselines and is applicable to various diffusion backbones.
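As a rough illustration of the matching mechanism described in the abstract, the sketch below lets a query condition attend over a few support conditions to borrow their paired control features. The function name, shapes, and the softmax-attention formulation are assumptions made for illustration, not UFC's actual interface:

```python
# Hypothetical sketch: a query condition attends over a few support
# conditions to borrow their paired control features. Not UFC's real API.
import torch
import torch.nn.functional as F

def match_support(query_cond, support_conds, support_feats, temperature=0.1):
    """query_cond: (L, D) tokens of the unseen-task query condition;
    support_conds: (K, L, D) tokens of K support conditions;
    support_feats: (K, L, D) control features paired with each support.
    Returns (L, D) task-specific control features for the query."""
    d = support_conds.shape[-1]
    keys = support_conds.reshape(-1, d)                          # (K*L, D)
    vals = support_feats.reshape(-1, d)                          # (K*L, D)
    attn = F.softmax(query_cond @ keys.T / temperature, dim=-1)  # (L, K*L)
    return attn @ vals                                           # (L, D)

# Toy usage: 3 support pairs, 16 tokens, 32-dim features.
out = match_support(torch.randn(16, 32), torch.randn(3, 16, 32), torch.randn(3, 16, 32))
print(out.shape)  # torch.Size([16, 32])
```

In UFC the matching is reportedly paired with updates to a small set of task-specific parameters; this sketch only shows the non-parametric analogy step.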

HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
Authors:Xin Wang, Ting Dang, Xinyu Zhang, Vassilis Kostakos, Michael J. Witbrock, Hong Jia
Mobile and wearable healthcare monitoring play a vital role in facilitating timely interventions, managing chronic health conditions, and ultimately improving individuals’ quality of life. Previous studies on large language models (LLMs) have highlighted their impressive generalization abilities and effectiveness in healthcare prediction tasks. However, most LLM-based healthcare solutions are cloud-based, which raises significant privacy concerns and results in increased memory usage and latency. To address these challenges, there is growing interest in compact models, Small Language Models (SLMs), which are lightweight and designed to run locally and efficiently on mobile and wearable devices. Nevertheless, how well these models perform in healthcare prediction remains largely unexplored. We systematically evaluated SLMs on health prediction tasks using zero-shot, few-shot, and instruction fine-tuning approaches, and deployed the best performing fine-tuned SLMs on mobile devices to evaluate their real-world efficiency and predictive performance in practical healthcare scenarios. Our results show that SLMs can achieve performance comparable to LLMs while offering substantial gains in efficiency and privacy. However, challenges remain, particularly in handling class imbalance and few-shot scenarios. These findings highlight SLMs, though imperfect in their current form, as a promising solution for next-generation, privacy-preserving healthcare monitoring.
Paper and project links
PDF: 9 pages, 6 tables, 6 figures
Summary
Small Language Models (SLMs) show promise for health monitoring on mobile and wearable devices. The study evaluates the performance of SLMs on health prediction tasks and compares them with large language models (LLMs). The results show that although SLMs still struggle with class imbalance and few-shot scenarios, they offer superior efficiency and privacy, and have the potential to become a privacy-preserving solution for next-generation health monitoring.
Key Takeaways
- Mobile and wearable devices play a vital role in healthcare monitoring, facilitating timely interventions, managing chronic conditions, and improving quality of life.
- Large language models (LLMs) show strong generalization and effectiveness on healthcare prediction tasks.
- Most LLM-based solutions are cloud-based, raising privacy concerns and increasing memory usage and latency.
- Small Language Models (SLMs) are lightweight and designed to run locally and efficiently on mobile and wearable devices.
- On health prediction tasks, SLMs achieve performance comparable to LLMs while offering substantial efficiency and privacy gains.
- SLMs still face challenges, particularly with class imbalance and few-shot scenarios (a toy few-shot prompt is sketched below).
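The paper evaluates SLMs under zero-shot, few-shot, and instruction fine-tuning settings, but the abstract does not specify the prompts or prediction targets. As a purely hypothetical illustration of the few-shot setting, the sketch below assembles a prompt from wearable-style features and labeled examples; the feature names, labels, and stress-prediction task are invented:

```python
# Hypothetical sketch: assemble a few-shot health-prediction prompt from
# wearable-style features. Feature names, labels, and the stress-prediction
# task are invented for illustration.
def build_prompt(examples, query):
    """examples: list of (feature_dict, label) few-shot pairs; query: feature_dict."""
    lines = ["Predict the user's stress level (low/high) from wearable data."]
    for feats, label in examples:
        desc = ", ".join(f"{k}={v}" for k, v in feats.items())
        lines.append(f"Data: {desc}\nAnswer: {label}")
    desc = ", ".join(f"{k}={v}" for k, v in query.items())
    lines.append(f"Data: {desc}\nAnswer:")
    return "\n\n".join(lines)

shots = [({"resting_hr": 58, "sleep_hours": 7.5, "steps": 9100}, "low"),
         ({"resting_hr": 74, "sleep_hours": 4.2, "steps": 2300}, "high")]
print(build_prompt(shots, {"resting_hr": 69, "sleep_hours": 5.0, "steps": 4000}))
```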

MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning
Authors:Minghao Han, Linhao Qu, Dingkang Yang, Xukun Zhang, Xiaoying Wang, Lihua Zhang
Multiple instance learning (MIL) has become a standard paradigm for the weakly supervised classification of whole slide images (WSIs). However, this paradigm relies on using a large number of labeled WSIs for training. The lack of training data and the presence of rare diseases pose significant challenges for these methods. Prompt tuning combined with pre-trained Vision-Language models (VLMs) is an effective solution to the Few-shot Weakly Supervised WSI Classification (FSWC) task. Nevertheless, applying prompt tuning methods designed for natural images to WSIs presents three significant challenges: 1) These methods fail to fully leverage the prior knowledge from the VLM’s text modality; 2) They overlook the essential multi-scale and contextual information in WSIs, leading to suboptimal results; and 3) They lack exploration of instance aggregation methods. To address these problems, we propose a Multi-Scale and Context-focused Prompt Tuning (MSCPT) method for FSWC task. Specifically, MSCPT employs the frozen large language model to generate pathological visual language prior knowledge at multiple scales, guiding hierarchical prompt tuning. Additionally, we design a graph prompt tuning module to learn essential contextual information within WSI, and finally, a non-parametric cross-guided instance aggregation module has been introduced to derive the WSI-level features. Extensive experiments, visualizations, and interpretability analyses were conducted on five datasets and three downstream tasks using three VLMs, demonstrating the strong performance of our MSCPT. All codes have been made publicly accessible at https://github.com/Hanminghao/MSCPT.
Paper and project links
PDF: This work has been submitted to the IEEE TMI for possible publication
Summary
Weakly supervised classification of whole slide images (WSIs) is highly challenging when training data are limited. For the Few-shot Weakly Supervised WSI Classification (FSWC) task, the paper combines prompt tuning with pre-trained Vision-Language Models (VLMs) and proposes a Multi-Scale and Context-focused Prompt Tuning (MSCPT) method. MSCPT generates pathological visual-language prior knowledge at multiple scales to guide hierarchical prompt tuning, and designs a graph prompt tuning module to learn contextual information within a WSI. In addition, a non-parametric cross-guided instance aggregation module is introduced to derive WSI-level features. Experiments on five datasets and three downstream tasks demonstrate its strong performance. The code is publicly available.
Key Takeaways
- Multiple instance learning (MIL) is the standard paradigm for weakly supervised WSI classification, but it requires a large amount of labeled data for training.
- Given scarce data and rare diseases, prompt tuning combined with pre-trained Vision-Language models is an effective approach to few-shot weakly supervised WSI classification.
- Applying prompt tuning methods designed for natural images to WSIs faces three challenges: underuse of the prior knowledge in the VLM's text modality, neglect of multi-scale and contextual information, and lack of exploration of instance aggregation methods.
- The proposed MSCPT method addresses these problems by generating pathological visual-language prior knowledge at multiple scales and designing hierarchical prompt tuning modules to learn contextual information.
- A non-parametric cross-guided instance aggregation module is introduced to derive WSI-level features (a toy sketch of text-guided aggregation follows below).
- Experiments on multiple datasets and downstream tasks demonstrate the strong performance of MSCPT.
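As a rough illustration of non-parametric, text-guided instance aggregation, the sketch below pools patch features with weights given by their similarity to a class prompt embedding from a VLM text encoder. The shapes and the softmax weighting are assumptions for illustration and do not reproduce MSCPT's actual module:

```python
# Hypothetical sketch: pool patch (instance) features into one WSI-level
# feature, weighted by similarity to a class prompt embedding. Illustrative
# only; it does not reproduce MSCPT's actual module.
import torch
import torch.nn.functional as F

def text_guided_aggregate(patch_feats, text_feat, temperature=0.07):
    """patch_feats: (N, D) instance features from one WSI; text_feat: (D,)
    class-prompt embedding from the VLM text encoder. Returns a (D,)
    WSI-level feature with no learned parameters."""
    sims = F.cosine_similarity(patch_feats, text_feat.unsqueeze(0), dim=1)  # (N,)
    weights = F.softmax(sims / temperature, dim=0)                          # (N,)
    return (weights.unsqueeze(1) * patch_feats).sum(dim=0)                  # (D,)

patches = torch.randn(1000, 512)  # e.g. 1000 patch embeddings from a VLM image encoder
prompt = torch.randn(512)         # e.g. embedding of "an H&E image of <class>"
print(text_guided_aggregate(patches, prompt).shape)  # torch.Size([512])
```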