
NeRF


⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on these for serious academic work; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-10-25

Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses

Authors:Damian Bowness, Charalambos Poullis

When viewing a 3D Gaussian Splatting (3DGS) model from camera positions significantly outside the training data distribution, substantial visual noise commonly occurs. These artifacts result from the lack of training data in these extrapolated regions, leading to uncertain density, color, and geometry predictions from the model. To address this issue, we propose a novel real-time render-aware filtering method. Our approach leverages sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. This filtering method directly addresses the core issue of generative uncertainty, allowing 3D reconstruction systems to maintain high visual fidelity even when users freely navigate outside the original training viewpoints. Experimental evaluation demonstrates that our method substantially improves visual quality, realism, and consistency compared to existing Neural Radiance Field (NeRF)-based approaches such as BayesRays. Critically, our filter seamlessly integrates into existing 3DGS rendering pipelines in real-time, unlike methods that require extensive post-hoc retraining or fine-tuning. Code and results at https://damian-bowness.github.io/EV3DGS
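The abstract describes the filter only at a high level. As a rough mental model (an illustrative sketch, not the authors' implementation: the score formula, names, and threshold below are all assumptions), one can picture a per-Gaussian sensitivity that grows when a splat is strongly elongated and sits edge-on to the current view, with high-sensitivity splats culled before rasterization:

```python
import numpy as np

def anisotropy_sensitivity(scales, axes, view_dir, tau=0.7):
    """Toy per-Gaussian keep-mask, illustrative only (not the authors' filter).

    scales: (N, 3) axis scales of each Gaussian.
    axes:   (N, 3, 3) orthonormal frames; row k is the k-th principal axis.
    Intuition from the abstract: instability comes from anisotropic
    orientations, so score each splat by how elongated it is and how
    edge-on its dominant axis sits relative to the viewing direction."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    # Elongation in [0, 1): 0 = isotropic sphere, ->1 = needle-like splat.
    elongation = 1.0 - scales.min(axis=1) / scales.max(axis=1)
    # Dominant axis of each Gaussian (the axis with the largest scale).
    dominant = axes[np.arange(len(scales)), scales.argmax(axis=1)]
    # Edge-on splats (dominant axis nearly perpendicular to the view ray)
    # project to thin, unstable footprints.
    edge_on = 1.0 - np.abs(dominant @ view_dir)
    sensitivity = elongation * edge_on
    return sensitivity < tau  # True = keep (render) this Gaussian

# Usage: 1000 random Gaussians, keep only the stable ones for this view.
rng = np.random.default_rng(0)
scales = rng.uniform(0.01, 1.0, size=(1000, 3))
axes = np.stack([np.linalg.qr(m)[0] for m in rng.normal(size=(1000, 3, 3))])
keep = anisotropy_sensitivity(scales, axes, view_dir=np.array([0.0, 0.0, 1.0]))
print(f"kept {keep.sum()} / {keep.size} Gaussians")
```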


Paper and Project Links

PDF

Summary

When a 3D Gaussian Splatting (3DGS) model is viewed from camera poses far outside the training data distribution, substantial visual noise appears. This paper proposes a real-time render-aware filtering method that uses sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. The filter directly addresses generative uncertainty, preserving high visual fidelity even when users navigate freely beyond the original training viewpoints. Experiments show clear gains in visual quality, realism, and consistency over existing NeRF-based approaches such as BayesRays. Most importantly, the filter integrates seamlessly into existing 3DGS rendering pipelines in real time, unlike methods that require extensive post-hoc retraining or fine-tuning.

Key Takeaways

  1. 3DGS models exhibit visual noise when the camera pose moves outside the training data distribution.
  2. The noise stems from the lack of training data in those regions, which makes the model's density, color, and geometry predictions uncertain.
  3. A real-time render-aware filtering method is proposed to address this problem.
  4. The method uses sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations.
  5. The filter improves visual quality, realism, and consistency, with clear advantages over existing NeRF-based methods.
  6. The filter integrates seamlessly into existing 3DGS rendering pipelines and runs in real time.

Cool Papers

Click here to view paper screenshots

AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields

Authors:Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-eui Yoon

As Neural Radiance Fields (NeRFs) have emerged as a powerful tool for 3D scene representation and novel view synthesis, protecting their intellectual property (IP) from unauthorized use is becoming increasingly crucial. In this work, we aim to protect the IP of NeRFs by injecting adversarial perturbations that disrupt their unauthorized applications. However, perturbing the 3D geometry of NeRFs can easily deform the underlying scene structure and thus substantially degrade the rendering quality, which has led existing attempts to avoid geometric perturbations or restrict them to explicit spaces like meshes. To overcome this limitation, we introduce a learnable sensitivity to quantify the spatially varying impact of geometric perturbations on rendering quality. Building upon this, we propose AegisRF, a novel framework that consists of a Perturbation Field, which injects adversarial perturbations into the pre-rendering outputs (color and volume density) of NeRF models to fool an unauthorized downstream target model, and a Sensitivity Field, which learns the sensitivity to adaptively constrain geometric perturbations, preserving rendering quality while disrupting unauthorized use. Our experimental evaluations demonstrate the generalized applicability of AegisRF across diverse downstream tasks and modalities, including multi-view image classification and voxel-based 3D localization, while maintaining high visual fidelity. Codes are available at https://github.com/wkim97/AegisRF.
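As a loose illustration of how the two fields could interact during training (a minimal sketch under assumed shapes and losses; `TinyField`, the loss weights, and the placeholder logits are all hypothetical, not the paper's architecture):

```python
import torch
import torch.nn as nn

class TinyField(nn.Module):
    """Small coordinate MLP standing in for AegisRF's fields
    (hypothetical sizes, not the paper's architecture)."""
    def __init__(self, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )
    def forward(self, x):
        return self.net(x)

perturb_field = TinyField(out_dim=4)      # delta (r, g, b, sigma) per 3D point
sensitivity_field = TinyField(out_dim=1)  # how fragile the geometry is at a point

def aegis_losses(points, target_logits, true_labels, lam=1.0):
    """Sketch of the training signal: fool the unauthorized downstream
    model while a learned sensitivity adaptively limits geometric
    (density) perturbations. In the real pipeline `target_logits` would
    come from rendering the perturbed NeRF and running the target model."""
    delta = perturb_field(points)
    d_sigma = delta[:, 3:]                        # geometric (density) part
    s = torch.sigmoid(sensitivity_field(points))  # (0, 1): 1 = fragile region
    # Adversarial objective: push the target model away from the truth.
    adv = -nn.functional.cross_entropy(target_logits, true_labels)
    # Fidelity constraint: penalize density changes harder where the
    # scene is sensitive, so rendering quality is preserved.
    fidelity = (s * d_sigma.pow(2)).mean()
    return adv + lam * fidelity
```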


Paper and Project Links

PDF BMVC 2025

Summary

Protecting the intellectual property (IP) of Neural Radiance Fields (NeRFs) is becoming increasingly important. This work protects NeRF IP by injecting adversarial perturbations that disrupt unauthorized applications. Since perturbing a NeRF's 3D geometry easily deforms the underlying scene and degrades rendering quality, prior attempts either avoid geometric perturbations or restrict them to explicit spaces such as meshes; this work instead introduces a learnable sensitivity that quantifies the spatially varying impact of geometric perturbations on rendering quality. Building on it, the proposed AegisRF framework combines a Perturbation Field, which injects adversarial perturbations into the pre-rendering outputs (color and volume density) of NeRF models to fool an unauthorized downstream target model, with a Sensitivity Field, which learns to adaptively constrain geometric perturbations, preserving rendering quality while disrupting unauthorized use.

Key Takeaways

  1. This work focuses on protecting the intellectual property of NeRF models.
  2. NeRF IP is protected by injecting adversarial perturbations.
  3. A learnable sensitivity quantifies the impact of geometric perturbations on rendering quality.
  4. The proposed AegisRF framework consists of a Perturbation Field and a Sensitivity Field.
  5. The Perturbation Field injects perturbations that disrupt unauthorized downstream target models.
  6. The Sensitivity Field adaptively constrains geometric perturbations to maintain high rendering quality.
  7. Experiments demonstrate that AegisRF generalizes across diverse downstream tasks and modalities.

Cool Papers

Click here to view paper screenshots

Advances in 4D Representation: Geometry, Motion, and Interaction

Authors:Mingrui Zhao, Sauradip Nag, Kai Wang, Aditya Vora, Guangda Ji, Peter Chun, Ali Mahdavi-Amiri, Hao Zhang

We present a survey on 4D generation and reconstruction, a fast-evolving subfield of computer graphics whose developments have been propelled by recent advances in neural fields, geometric and motion deep learning, as well as 3D generative artificial intelligence (GenAI). While our survey is not the first of its kind, we build our coverage of the domain from a unique and distinctive perspective of 4D representations, to model 3D geometry evolving over time while exhibiting motion and interaction. Specifically, instead of offering an exhaustive enumeration of many works, we take a more selective approach by focusing on representative works to highlight both the desirable properties and ensuing challenges of each representation under different computation, application, and data scenarios. The main take-away message we aim to convey to the readers is on how to select and then customize the appropriate 4D representations for their tasks. Organizationally, we separate the 4D representations based on three key pillars: geometry, motion, and interaction. Our discourse will not only encompass the most popular representations of today, such as neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS), but also bring attention to relatively under-explored representations in the 4D context, such as structured models and long-range motions. Throughout our survey, we will reprise the role of large language models (LLMs) and video foundational models (VFMs) in a variety of 4D applications, while steering our discussion towards their current limitations and how they can be addressed. We also provide a dedicated coverage on what 4D datasets are currently available, as well as what is lacking, in driving the subfield forward. Project page:https://mingrui-zhao.github.io/4DRep-GMI/
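One representation family the survey covers, 3D geometry evolving over time, is often realized as a canonical shape plus a time-conditioned deformation field. A minimal sketch of that recipe (layer sizes and names are illustrative assumptions, not any specific system):

```python
import torch
import torch.nn as nn

class Deform4D(nn.Module):
    """Minimal sketch of one 4D-representation family: a canonical
    (static) 3D geometry plus a learned, time-conditioned deformation
    field that carries the motion. Sizes are illustrative."""
    def __init__(self):
        super().__init__()
        self.motion = nn.Sequential(          # (x, y, z, t) -> displacement
            nn.Linear(4, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )
    def forward(self, xyz, t):
        # Warp canonical-space points to their position at time t.
        t = t.expand(xyz.shape[0], 1)
        return xyz + self.motion(torch.cat([xyz, t], dim=-1))

# Query the same canonical points at two timestamps.
model = Deform4D()
pts = torch.rand(256, 3)
pts_t0 = model(pts, torch.tensor([[0.0]]))
pts_t1 = model(pts, torch.tensor([[0.5]]))
```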


Paper and Project Links

PDF 21 pages. Project Page: https://mingrui-zhao.github.io/4DRep-GMI/

Summary: This survey reviews 4D generation and reconstruction, a fast-evolving subfield of computer graphics propelled by neural fields, geometric and motion deep learning, and 3D generative AI. Rather than exhaustively enumerating prior work, it focuses on representative works to highlight the desirable properties and challenges of each 4D representation under different computation, application, and data scenarios, and on how to select and customize the appropriate representation for a given task. The discussion covers popular representations such as neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS) as well as under-explored ones, examines the role and current limitations of large language models (LLMs) and video foundation models (VFMs) in 4D applications, and surveys which 4D datasets are available and which are still missing.

Key Takeaways

  1. The survey reviews recent progress in 4D generation and reconstruction, a subfield driven by neural fields, geometric and motion deep learning, and 3D generative AI.
  2. It focuses on representative works to highlight the strengths and challenges of each 4D representation under different computation, application, and data scenarios.
  3. Beyond popular representations such as NeRFs and 3D Gaussian Splatting, it also covers relatively under-explored representations in the 4D context, such as structured models and long-range motions.
  4. It discusses the role and current limitations of large language models (LLMs) and video foundation models (VFMs) in 4D applications.
  5. It reviews which 4D datasets are currently available and which types are still lacking.
  6. It emphasizes the factors to weigh when choosing a 4D representation, including computational efficiency, application scenario, and data characteristics.

Cool Papers

Click here to view paper screenshots

PlantSegNeRF: A few-shot, cross-species method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching

Authors:Xin Yang, Ruiming Du, Hanyang Huang, Jiayang Xie, Pengyao Xie, Leisen Fang, Ziyue Guo, Nanjun Jiang, Yu Jiang, Haiyan Cen

Organ segmentation of plant point clouds is a prerequisite for the high-resolution and accurate extraction of organ-level phenotypic traits. Although the fast development of deep learning has boosted much research on segmentation of plant point clouds, the existing techniques for organ segmentation still face limitations in resolution, segmentation accuracy, and generalizability across various plant species. In this study, we proposed a novel approach called plant segmentation neural radiance fields (PlantSegNeRF), aiming to directly generate high-precision instance point clouds from multi-view RGB image sequences for a wide range of plant species. PlantSegNeRF performed 2D instance segmentation on the multi-view images to generate instance masks for each organ with a corresponding ID. The multi-view instance IDs corresponding to the same plant organ were then matched and refined using a specially designed instance matching module. The instance NeRF was developed to render an implicit scene, containing color, density, semantic and instance information. The implicit scene was ultimately converted into high-precision plant instance point clouds based on the volume density. The results proved that in semantic segmentation of point clouds, PlantSegNeRF outperformed the commonly used methods, demonstrating an average improvement of 16.1%, 18.3%, 17.8%, and 24.2% in precision, recall, F1-score, and IoU compared to the second-best results on structurally complex species. More importantly, PlantSegNeRF exhibited significant advantages in plant point cloud instance segmentation tasks. Across all plant species, it achieved average improvements of 11.7%, 38.2%, 32.2% and 25.3% in mPrec, mRec, mCov, mWCov, respectively. This study extends the organ-level plant phenotyping and provides a high-throughput way to supply high-quality 3D data for the development of large-scale models in plant science.
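The abstract's instance matching module aligns instance IDs for the same organ across views. A toy stand-in for that idea (the paper's module is more involved; this sketch assumes view B's masks have already been reprojected into view A's image plane and uses plain IoU plus Hungarian assignment):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instance_ids(masks_a, masks_b):
    """Align instance IDs between two views by maximum mask overlap.

    masks_a, masks_b: boolean arrays of shape (num_instances, H, W),
    with masks_b assumed already reprojected into view A's image plane
    (the geometric reprojection is omitted in this sketch)."""
    iou = np.zeros((len(masks_a), len(masks_b)))
    for i, ma in enumerate(masks_a):
        for j, mb in enumerate(masks_b):
            inter = np.logical_and(ma, mb).sum()
            union = np.logical_or(ma, mb).sum()
            iou[i, j] = inter / union if union else 0.0
    rows, cols = linear_sum_assignment(-iou)  # Hungarian: maximize total IoU
    return {int(i): int(j) for i, j in zip(rows, cols)}  # ID in A -> ID in B
```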


Paper and Project Links

PDF

Summary
Organ segmentation of plant point clouds is a prerequisite for high-resolution, accurate extraction of organ-level phenotypic traits. This study proposes PlantSegNeRF, a novel method that directly generates high-precision instance point clouds from multi-view RGB image sequences across a wide range of plant species. The method performs 2D instance segmentation on multi-view images to generate per-organ instance masks, then matches and refines the instance IDs with a specially designed instance matching module. Experiments show that PlantSegNeRF delivers superior performance in both semantic and instance segmentation, with average gains in precision, recall, F1-score, and IoU. The study provides a high-throughput source of high-quality 3D data for plant phenotyping and for developing large-scale models in plant science.

Key Takeaways

  1. Organ segmentation of plant point clouds is essential for extracting organ-level phenotypic traits.
  2. PlantSegNeRF is a novel method that directly generates high-precision instance point clouds and works across a wide range of plant species.
  3. PlantSegNeRF generates per-organ instance masks via multi-view instance segmentation, then matches and refines them with a dedicated instance matching module.
  4. PlantSegNeRF performs strongly in both semantic and instance segmentation, with clear improvements over other methods.
  5. On structurally complex species, it outperforms other methods across multiple evaluation metrics.
  6. The study provides high-quality 3D data to support plant phenotyping research.

Cool Papers

Click here to view paper screenshots

Discretized Gaussian Representation for Tomographic Reconstruction

Authors:Shaokai Wu, Yuxiang Lu, Yapan Guo, Wei Ji, Suizhi Huang, Fengyu Yang, Shalayiding Sirejiding, Qichen He, Jing Tong, Yanbiao Ji, Yue Ding, Hongtao Lu

Computed Tomography (CT) enables detailed cross-sectional imaging but continues to face challenges in balancing reconstruction quality and computational efficiency. While deep learning-based methods have significantly improved image quality and noise reduction, they typically require large-scale training data and intensive computation. Recent advances in scene reconstruction, such as Neural Radiance Fields and 3D Gaussian Splatting, offer alternative perspectives but are not well-suited for direct volumetric CT reconstruction. In this work, we propose Discretized Gaussian Representation (DGR), a novel framework that reconstructs the 3D volume directly using a set of discretized Gaussian functions in an end-to-end manner. To further enhance efficiency, we introduce Fast Volume Reconstruction, a highly parallelized technique that aggregates Gaussian contributions into the voxel grid with minimal overhead. Extensive experiments on both real-world and synthetic datasets demonstrate that DGR achieves superior reconstruction quality and runtime performance across various CT reconstruction scenarios. Our code is publicly available at https://github.com/wskingdom/DGR.
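In the spirit of the described Fast Volume Reconstruction step, here is an illustrative (serial, non-differentiable) sketch of aggregating isotropic Gaussian contributions into a voxel grid; the actual method is highly parallelized and end-to-end, and the kernel, window radius, and shapes here are assumptions:

```python
import numpy as np

def splat_gaussians_to_grid(centers, sigmas, weights, res=64, radius=3):
    """Accumulate isotropic Gaussian contributions into a voxel grid
    (illustrative loop, not DGR's parallelized implementation).
    Coordinates are assumed normalized to [0, 1)^3."""
    grid = np.zeros((res, res, res))
    axis = np.arange(res)
    for c, s, w in zip(centers, sigmas, weights):
        v = np.clip((c * res).astype(int), 0, res - 1)
        lo, hi = np.maximum(v - radius, 0), np.minimum(v + radius + 1, res)
        # Local window around the Gaussian center; evaluate the kernel there.
        xs, ys, zs = np.meshgrid(axis[lo[0]:hi[0]], axis[lo[1]:hi[1]],
                                 axis[lo[2]:hi[2]], indexing="ij")
        d2 = ((xs / res - c[0]) ** 2 + (ys / res - c[1]) ** 2
              + (zs / res - c[2]) ** 2)
        grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] += w * np.exp(-d2 / (2 * s ** 2))
    return grid

# Usage: 200 random Gaussians accumulated into a 64^3 volume.
rng = np.random.default_rng(0)
vol = splat_gaussians_to_grid(rng.uniform(0.2, 0.8, (200, 3)),
                              rng.uniform(0.01, 0.05, 200),
                              rng.uniform(0.5, 1.5, 200))
print(vol.shape, float(vol.max()))
```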


Paper and Project Links

PDF Accepted to ICCV 2025

Summary

Deep learning-based CT reconstruction has markedly improved image quality and noise reduction, but still faces a trade-off between reconstruction quality and computational efficiency. Recent scene-reconstruction techniques such as Neural Radiance Fields and 3D Gaussian Splatting offer new perspectives but are not suited to direct volumetric CT reconstruction. This work proposes Discretized Gaussian Representation (DGR), a framework that reconstructs the 3D volume directly with a set of discretized Gaussian functions in an end-to-end manner, together with a highly parallelized Fast Volume Reconstruction technique. Experiments show that DGR delivers superior reconstruction quality and runtime performance across a variety of CT reconstruction scenarios. The code is publicly available on GitHub.

Key Takeaways

  • CT imaging faces a trade-off between reconstruction quality and computational efficiency; deep learning improves quality and denoising but demands large-scale training data and heavy computation.
  • Existing scene-reconstruction techniques such as Neural Radiance Fields and 3D Gaussian Splatting are not optimized for direct volumetric CT reconstruction.
  • The Discretized Gaussian Representation (DGR) framework reconstructs the 3D volume directly with discretized Gaussian functions in an end-to-end manner.
  • Fast Volume Reconstruction, a highly parallelized technique, aggregates Gaussian contributions into the voxel grid with minimal overhead.
  • DGR achieves superior quality and runtime across diverse CT reconstruction scenarios, validated by extensive experiments on real-world and synthetic datasets.

Cool Papers

Click here to view paper screenshots

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

Authors:Xianzheng Ma, Brandon Smart, Yash Bhalgat, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
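Many of the 3D-LLM systems the survey covers follow a common recipe: encode the 3D input, project the features into the LLM's token embedding space, and prepend them as prefix tokens. A minimal sketch of that pattern (the encoder, pooling, and dimensions are placeholders, not any specific system):

```python
import torch
import torch.nn as nn

class PointPrefix(nn.Module):
    """Encode a point cloud, project features into the LLM's token
    embedding space, and emit a fixed number of "visual" prefix tokens.
    The encoder and dimensions are placeholders, not any specific system."""
    def __init__(self, llm_dim=4096, n_tokens=32):
        super().__init__()
        self.encoder = nn.Sequential(      # stand-in for a point-cloud encoder
            nn.Linear(3, 256), nn.ReLU(),
            nn.Linear(256, 256),
        )
        self.proj = nn.Linear(256, llm_dim)
        self.n_tokens = n_tokens

    def forward(self, points):             # points: (B, N, 3)
        feats = self.encoder(points)       # (B, N, 256) per-point features
        # Pool points into a fixed number of prefix tokens (here: chunked mean).
        B, N, C = feats.shape
        feats = feats[:, : N - N % self.n_tokens]
        tokens = feats.reshape(B, self.n_tokens, -1, C).mean(dim=2)
        return self.proj(tokens)           # (B, n_tokens, llm_dim): concatenated with text embeddings downstream

prefix = PointPrefix()(torch.rand(1, 2048, 3))
print(prefix.shape)  # torch.Size([1, 32, 4096])
```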


Paper and Project Links

PDF 2nd version update to Jun.2025

Summary

This survey reviews the rapid progress in integrating large language models with 3D spatial data (3D-LLMs), highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, for spatial understanding. It covers the integration of LLMs with various 3D representations, from point clouds to Neural Radiance Fields (NeRFs), across tasks such as 3D scene understanding, captioning, question answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper aims to chart a course for future research that more fully exploits the potential of 3D-LLMs.

Key Takeaways

  1. The integration of large language models (LLMs) with 3D spatial data (3D-LLMs) is progressing rapidly.
  2. LLMs bring unique advantages: in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge.
  3. 3D-LLMs hold great potential for advancing spatial comprehension and interaction.
  4. The survey examines how various 3D data representations (such as point clouds and NeRFs) are integrated with LLMs.
  5. LLMs are widely applied to 3D scene understanding, captioning, question answering, and dialogue.
  6. LLMs also serve as agents for spatial reasoning, planning, and navigation.

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!