⚠️ All content below is summarized by a large language model; it may contain errors, is for reference only, and should be used with caution.
🔴 Note: never rely on these summaries for serious academic work — they are intended only as a first-pass filter before reading the papers themselves!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-02-28
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Authors:Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Min Wu, Ming-Ming Cheng, Ender Konukoglu, Serge Belongie
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal annotated support samples. While existing FS-PCS methods have shown promise, they primarily focus on unimodal point cloud inputs, overlooking the potential benefits of leveraging multimodal information. In this paper, we address this gap by introducing a multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality. Under this easy-to-achieve setup, we present the MultiModal Few-Shot SegNet (MM-FSS), a model effectively harnessing complementary information from multiple modalities. MM-FSS employs a shared backbone with two heads to extract intermodal and unimodal visual features, and a pretrained text encoder to generate text embeddings. To fully exploit the multimodal information, we propose a Multimodal Correlation Fusion (MCF) module to generate multimodal correlations, and a Multimodal Semantic Fusion (MSF) module to refine the correlations using text-aware semantic guidance. Additionally, we propose a simple yet effective Test-time Adaptive Cross-modal Calibration (TACC) technique to mitigate training bias, further improving generalization. Experimental results on S3DIS and ScanNet datasets demonstrate significant performance improvements achieved by our method. The efficacy of our approach indicates the benefits of leveraging commonly-ignored free modalities for FS-PCS, providing valuable insights for future research. The code is available at https://github.com/ZhaochongAn/Multimodality-3D-Few-Shot
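As a loose illustration of the fusion idea the abstract describes — not the authors' actual implementation; every shape, name, projection, and fusion rule below is a hypothetical stand-in — the MCF/MSF flow can be sketched as: build one correlation map per visual head, fuse them, then refine with text-alignment scores as semantic guidance.

```python
import numpy as np

# Hypothetical shapes: N query points, K novel classes, C visual dim, D text dim.
rng = np.random.default_rng(0)
N, K, C, D = 6, 2, 8, 4

def cosine(a, b):
    """Row-wise cosine similarity between (N, C) and (K, C) -> (N, K)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Random stand-ins for the two heads of the shared backbone.
f_intermodal = rng.normal(size=(N, C))   # features aligned with 2D/text modality
f_unimodal = rng.normal(size=(N, C))     # pure 3D point features
protos_inter = rng.normal(size=(K, C))   # support prototypes per head
protos_uni = rng.normal(size=(K, C))

# MCF-like step: one correlation map per head, fused (simple average here).
corr = 0.5 * (cosine(f_intermodal, protos_inter) + cosine(f_unimodal, protos_uni))

# MSF-like step: refine with text-aware guidance — weight each class column by
# how well the point's intermodal feature matches that class's text embedding.
text_emb = rng.normal(size=(K, D))
proj = rng.normal(size=(C, D)) / np.sqrt(C)       # hypothetical projection to text space
semantic = cosine(f_intermodal @ proj, text_emb)  # (N, K) text-alignment scores
refined = corr * (1.0 + semantic)                 # guidance as a multiplicative gate

pred = refined.argmax(axis=1)  # per-point class among the K novel classes
```

The average in the MCF step and the multiplicative gate in the MSF step are deliberate simplifications; the paper's modules are learned, but the data flow (two visual correlations in, one text-refined correlation out) is the point being illustrated.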
Paper and Project Links
PDF Published at ICLR 2025 (Spotlight)
Summary
This paper introduces a multimodal approach to few-shot 3D point cloud segmentation (FS-PCS). Existing methods focus primarily on unimodal point cloud inputs, overlooking the potential of multimodal information. The paper proposes a multimodal FS-PCS setup that leverages textual labels and the potentially available 2D image modality. Under this setup, it presents MultiModal Few-Shot SegNet (MM-FSS), a model that effectively exploits complementary information across modalities: a shared backbone with two heads extracts intermodal and unimodal visual features, and a pretrained text encoder generates text embeddings. To fully exploit the multimodal information, Multimodal Correlation Fusion (MCF) and Multimodal Semantic Fusion (MSF) modules generate and refine multimodal correlations. In addition, a simple yet effective Test-time Adaptive Cross-modal Calibration (TACC) technique mitigates training bias and further improves generalization. Experiments on the S3DIS and ScanNet datasets show that the method achieves significant performance gains.
Key Takeaways
- Existing FS-PCS methods focus mainly on unimodal point cloud inputs, overlooking the potential of multimodal information.
- A multimodal FS-PCS setup is proposed, leveraging textual labels and the potentially available 2D image modality.
- The MultiModal Few-Shot SegNet (MM-FSS) model is introduced to exploit complementary information across modalities.
- MM-FSS extracts intermodal and unimodal visual features via a shared backbone with two heads, and uses a pretrained text encoder for text embeddings.
- Multimodal Correlation Fusion (MCF) and Multimodal Semantic Fusion (MSF) modules generate and refine multimodal correlations.
- Test-time Adaptive Cross-modal Calibration (TACC) mitigates training bias and improves the model's generalization.
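A minimal sketch of the general idea behind test-time calibration of this kind (hypothetical — the blending rule and reliability estimate here are illustrative stand-ins, not the paper's TACC formulation): use the labeled support points to estimate how reliable the text-derived scores are, then blend them with the learned correlations accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 2  # query points, novel classes

# Stand-ins for the model's outputs on the query points.
corr = rng.random((N, K))      # learned multimodal correlations (may carry training bias)
semantic = rng.random((N, K))  # text-alignment scores (training-free, less biased)

# Labeled support points let us estimate, at test time, how reliable the
# semantic scores are for the current novel classes.
support_sem = rng.random((12, K))            # semantic scores on support points
support_lbl = rng.integers(0, K, size=12)    # their ground-truth labels
gamma = (support_sem.argmax(axis=1) == support_lbl).mean()  # reliability in [0, 1]

# Blend: the more reliable the semantic cue, the more it corrects the correlations.
calibrated = (1.0 - gamma) * corr + gamma * semantic
pred = calibrated.argmax(axis=1)
```

Because `gamma` is computed from the support set of the current episode, the calibration adapts per episode without any extra training, which is the appeal of a test-time technique like this.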


