3DGS

Published: 2025-09-29

Updated: 2025-11-27


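⚠️ All summaries below were generated by a large language model and may contain errors. They are provided for reference only; use them with caution.
🔴 Note: do not use these summaries in serious academic work. They are intended only for initial screening before reading the papers.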

Updated 2025-09-29

Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS

Authors: Tao Wang, Mengyu Li, Geduo Zeng, Cheng Meng, Qiong Zhang

3D Gaussian Splatting (3DGS) has emerged as a powerful technique for radiance field rendering, but it typically requires millions of redundant Gaussian primitives, overwhelming memory and rendering budgets. Existing compaction approaches address this by pruning Gaussians based on heuristic importance scores, without global fidelity guarantee. To bridge this gap, we propose a novel optimal transport perspective that casts 3DGS compaction as global Gaussian mixture reduction. Specifically, we first minimize the composite transport divergence over a KD-tree partition to produce a compact geometric representation, and then decouple appearance from geometry by fine-tuning color and opacity attributes with far fewer Gaussian primitives. Experiments on benchmark datasets show that our method (i) yields negligible loss in rendering quality (PSNR, SSIM, LPIPS) compared to vanilla 3DGS with only 10% Gaussians; and (ii) consistently outperforms state-of-the-art 3DGS compaction techniques. Notably, our method is applicable to any stage of vanilla or accelerated 3DGS pipelines, providing an efficient and agnostic pathway to lightweight neural rendering. The code is publicly available at https://github.com/DrunkenPoet/GHAP


Paper and project links

PDF 26 pages, 15 figures

Summary

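This paper addresses compaction of 3D Gaussian Splatting (3DGS) for radiance field rendering and proposes an optimal-transport-based method to overcome the lack of a global fidelity guarantee in existing pruning approaches. It first minimizes the composite transport divergence over a KD-tree partition to obtain a compact geometric representation, then fine-tunes color and opacity attributes on the reduced set of Gaussian primitives. Experiments show that the method keeps rendering quality close to vanilla 3DGS while using only 10% of the Gaussians, and that it outperforms existing compaction techniques. It can be applied at any stage of vanilla or accelerated 3DGS pipelines, offering an efficient, pipeline-agnostic route to lightweight neural rendering.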

Key Takeaways

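3DGS is a powerful radiance field rendering technique, but it typically requires millions of redundant Gaussian primitives.
Existing compaction techniques prune Gaussians using heuristic importance scores and offer no global fidelity guarantee.
The paper takes an optimal transport perspective, casting 3DGS compaction as global Gaussian mixture reduction.
A compact geometric representation is obtained by minimizing the composite transport divergence over a KD-tree partition.
Appearance is decoupled from geometry: color and opacity attributes are fine-tuned on far fewer Gaussian primitives.
Experiments show negligible loss in rendering quality (PSNR, SSIM, LPIPS) with only 10% of the Gaussians, and consistent gains over state-of-the-art compaction techniques.
The method applies at any stage of vanilla or accelerated 3DGS pipelines.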
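
The reduction idea can be illustrated with a minimal sketch. It is not the GHAP algorithm: instead of minimizing the composite transport divergence, it partitions Gaussian centers with KD-tree-style median splits and collapses each leaf into a single Gaussian by moment matching, a standard building block of Gaussian mixture reduction. The function names, the `leaf_size` parameter, and the one-Gaussian-per-leaf choice are assumptions made for illustration only.

```python
import numpy as np


def merge_gaussians(weights, means, covs):
    """Moment-matched merge of weighted Gaussians into one Gaussian.

    Preserves the total weight, the weighted mean, and the overall covariance
    of the sub-mixture.
    """
    total = weights.sum()
    w = weights / total
    mu = (w[:, None] * means).sum(axis=0)
    diff = means - mu
    # Within-component covariance plus spread of the component means.
    spread = diff[:, :, None] * diff[:, None, :]
    cov = (w[:, None, None] * (covs + spread)).sum(axis=0)
    return total, mu, cov


def kd_partition(means, idx, leaf_size):
    """Recursively split indices at the median of the widest axis."""
    if len(idx) <= leaf_size:
        return [idx]
    pts = means[idx]
    axis = (pts.max(axis=0) - pts.min(axis=0)).argmax()
    order = idx[np.argsort(pts[:, axis])]
    mid = len(order) // 2
    return (kd_partition(means, order[:mid], leaf_size)
            + kd_partition(means, order[mid:], leaf_size))


def reduce_mixture(weights, means, covs, leaf_size=10):
    """Illustrative reduction: one merged Gaussian per KD-tree leaf."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    leaves = kd_partition(means, np.arange(len(means)), leaf_size)
    merged = [merge_gaussians(weights[i], means[i], covs[i]) for i in leaves]
    w, m, c = zip(*merged)
    return np.array(w), np.stack(m), np.stack(c)
```

With `leaf_size=10`, each leaf holds between 5 and 10 Gaussians, so the output has roughly a tenth of the input count. Appearance attributes (color, opacity) are deliberately left out of the sketch, since the abstract describes fine-tuning them in a separate step.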


Generating 360° Video is What You Need For a 3D Scene

Authors: Zhaoyang Zhang, Yannick Hold-Geoffroy, Miloš Hašan, Ziwen Chen, Fujun Luan, Julie Dorsey, Yiwei Hu

Generating 3D scenes is still a challenging task due to the lack of readily available scene data. Most existing methods only produce partial scenes and provide limited navigational freedom. We introduce a practical and scalable solution that uses 360° video as an intermediate scene representation, capturing the full-scene context and ensuring consistent visual content throughout the generation. We propose WorldPrompter, a generative pipeline that synthesizes traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model, trained with a mix of image and video data, achieves convincing spatial and temporal consistency for static scenes. This is validated by an average COLMAP matching rate of 94.6%, allowing for high-quality panoramic Gaussian splat reconstruction and improved navigation throughout the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models.


Paper and project links

PDF SIGGRAPH Asia 2025. Project Page: https://zhaoyangzh.github.io/projects/worldprompter/

Summary

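The paper proposes a practical, scalable approach to 3D scene generation that uses 360° video as an intermediate scene representation. Its generative pipeline, WorldPrompter, synthesizes traversable 3D scenes from text prompts. The pipeline includes a conditional 360° panoramic video generator that produces a 128-frame video simulating a person walking through and capturing a virtual environment. A fast feedforward 3D reconstructor then reconstructs the video as Gaussian splats, enabling a walkable experience within the 3D scene. Experiments show that the panoramic video generation model achieves convincing spatial and temporal consistency for static scenes.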

Key Takeaways

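360° video serves as the intermediate representation for 3D scene generation.
WorldPrompter is a generative pipeline that synthesizes traversable 3D scenes from text prompts.
It includes a conditional 360° panoramic video generator that produces a 128-frame video simulating a person walking through and capturing a virtual environment.
A fast feedforward 3D reconstructor turns the video into Gaussian splats, enabling a walkable experience.
The panoramic video generation model achieves convincing spatial and temporal consistency for static scenes.
An average COLMAP matching rate of 94.6% supports high-quality panoramic Gaussian splat reconstruction.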
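
To make the panoramic representation concrete, here is a small sketch of how the pixels of a 360° frame can be mapped to viewing directions. It assumes an equirectangular projection and a particular axis convention (+z forward at the image center, +x right, +y up); the abstract does not state which projection or convention WorldPrompter uses, and the function name is hypothetical.

```python
import numpy as np


def equirect_rays(width, height):
    """Unit ray direction for every pixel center of an equirectangular image.

    Assumed convention: +z forward (image center), +x right, +y up.
    Returns an array of shape (height, width, 3).
    """
    u = (np.arange(width) + 0.5) / width      # horizontal position in (0, 1)
    v = (np.arange(height) + 0.5) / height    # vertical position in (0, 1)
    lon = (u - 0.5) * 2.0 * np.pi             # longitude in (-pi, pi)
    lat = (0.5 - v) * np.pi                   # latitude in (-pi/2, pi/2); top row looks up
    lon, lat = np.meshgrid(lon, lat)          # both of shape (height, width)
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
```

Each frame covers the full sphere of directions, which is why a single walking trajectory can observe the whole scene rather than a narrow frustum.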


Online Language Splatting

Authors: Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, Liu Ren

To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications.


Paper and project links

PDF

Summary

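This paper introduces Online Language Splatting, a framework that achieves online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without pre-generated language features. It combines a high-resolution CLIP embedding module, a two-stage online auto-encoder, and a color-language disentangled optimization approach to fuse high-dimensional language features into 3D representations while balancing computation speed, memory usage, rendering quality, and open-vocabulary capability. Compared with state-of-the-art offline methods, it improves accuracy and achieves a more than 40x efficiency boost, showing potential for dynamic and interactive AI applications.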

Key Takeaways

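Online Language Splatting performs online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system.
Its key components are a high-resolution CLIP embedding module (18 ms per frame), a two-stage online auto-encoder (768 to 15 dimensions), and a color-language disentangled optimization approach.
The design balances computation speed, memory usage, rendering quality, and open-vocabulary capability.
High-dimensional language features are fused efficiently into the 3D representation.
Compared with state-of-the-art offline methods, it is more accurate and more than 40x more efficient.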
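
The compress-then-query pattern behind the 768-to-15 dimension reduction can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: a linear PCA codec stands in for the learned two-stage online auto-encoder, arbitrary arrays stand in for CLIP features, and all function names are hypothetical.

```python
import numpy as np


def fit_linear_codec(features, code_dim=15):
    """Fit a PCA basis as a stand-in for a learned auto-encoder."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    basis = vt[:code_dim]                  # (code_dim, feature_dim)
    return mean, basis


def encode(features, mean, basis):
    """Project high-dimensional features to compact codes."""
    return (features - mean) @ basis.T


def decode(codes, mean, basis):
    """Lift compact codes back to the original feature space."""
    return codes @ basis + mean


def cosine_relevance(decoded, text_embedding):
    """Cosine similarity between decoded features and one text embedding."""
    a = decoded / np.linalg.norm(decoded, axis=-1, keepdims=True)
    b = text_embedding / np.linalg.norm(text_embedding)
    return a @ b
```

Only the compact codes would need to be stored per Gaussian; an open-vocabulary query decodes them and compares against the text embedding. A linear codec reconstructs exactly only when the features lie in a subspace of at most `code_dim` dimensions, which is one reason a learned non-linear auto-encoder is attractive in practice.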


REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints

Authors: Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, Cewu Lu

Articulated objects, as prevalent entities in human life, their 3D representations play crucial roles across various applications. However, achieving both high-fidelity textured surface reconstruction and dynamic generation for articulated objects remains challenging for existing methods. In this paper, we present REArtGS, a novel framework that introduces additional geometric and motion constraints to 3D Gaussian primitives, enabling realistic surface reconstruction and generation for articulated objects. Specifically, given multi-view RGB images of arbitrary two states of articulated objects, we first introduce an unbiased Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields, enhancing geometry constraints and improving surface reconstruction quality. Then we establish deformable fields for 3D Gaussians constrained by the kinematic structures of articulated objects, achieving unsupervised generation of surface meshes in unseen states. Extensive experiments on both synthetic and real datasets demonstrate our approach achieves high-quality textured surface reconstruction for given states, and enables high-fidelity surface generation for unseen states. Project site: https://sites.google.com/view/reartgs/home.


Paper and project links

PDF 11 pages, 6 figures

Summary

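This paper presents REArtGS, a framework that adds geometric and motion constraints to 3D Gaussian primitives to enable realistic surface reconstruction and generation for articulated objects. Given multi-view RGB images of two arbitrary states, it uses unbiased Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields, strengthening geometric constraints and improving surface reconstruction quality. It then establishes deformable fields for the 3D Gaussians, constrained by the kinematic structure of the articulated object, to generate surface meshes in unseen states without supervision. Experiments on synthetic and real datasets show high-quality textured surface reconstruction for the given states and high-fidelity surface generation for unseen states.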

Key Takeaways

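REArtGS adds geometric and motion constraints to 3D Gaussian primitives to reconstruct and generate realistic surfaces for articulated objects.
Unbiased SDF guidance, applied to multi-view RGB images, regularizes the Gaussian opacity fields and improves surface reconstruction quality.
Deformable fields constrained by the object's kinematic structure enable unsupervised generation of surface meshes in unseen states.
On synthetic and real datasets, the method achieves high-quality textured surface reconstruction for given states and high-fidelity surface generation for unseen states.
Possible applications include animation rendering and virtual reality.
The framework is most relevant to domains that require simulating articulated objects.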
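
A kinematic constraint on Gaussians can be pictured with a minimal sketch of a single revolute joint: Gaussian centers and covariances on the moving part are rotated about the joint axis, and a parameter `t` interpolates between the two observed states. This illustrates the general idea only; the deformable fields in REArtGS are learned, and the function names and parameters here are assumptions.

```python
import numpy as np


def axis_angle_matrix(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    ax, ay, az = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -az, ay],
                  [az, 0.0, -ax],
                  [-ay, ax, 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def articulate(means, covs, axis, pivot, full_angle, t):
    """Move Gaussians on a revolute part to the intermediate state t in [0, 1].

    means: (N, 3) centers, covs: (N, 3, 3) covariances,
    axis/pivot: joint axis direction and a point on the axis,
    full_angle: rotation between the two observed states (radians).
    """
    rot = axis_angle_matrix(axis, t * full_angle)
    new_means = (means - pivot) @ rot.T + pivot
    new_covs = rot @ covs @ rot.T          # rotate each covariance: R S R^T
    return new_means, new_covs
```

Setting `t` to a value between 0 and 1 yields a state that was never observed, which is the kind of unseen-state generation the summary refers to. A prismatic joint would replace the rotation with a translation along the axis.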


GaussianSeal: Rooting Adaptive Watermarks for 3D Gaussian Generation Model

Authors: Runyi Li, Xuanyu Zhang, Chuhan Tong, Zhipei Xu, Jian Zhang

With the advancement of AIGC technologies, the modalities generated by models have expanded from images and videos to 3D objects, leading to an increasing number of works focused on 3D Gaussian Splatting (3DGS) generative models. Existing research on copyright protection for generative models has primarily concentrated on watermarking in image and text modalities, with little exploration into the copyright protection of 3D object generative models. In this paper, we propose the first bit watermarking framework for 3DGS generative models, named GaussianSeal, to enable the decoding of bits as copyright identifiers from the rendered outputs of generated 3DGS. By incorporating adaptive bit modulation modules into the generative model and embedding them into the network blocks in an adaptive way, we achieve high-precision bit decoding with minimal training overhead while maintaining the fidelity of the model’s outputs. Experiments demonstrate that our method outperforms post-processing watermarking approaches for 3DGS objects, achieving superior performance of watermark decoding accuracy and preserving the quality of the generated results.


Paper and project links

PDF To appear in Machine Intelligence Research

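Summary

With the advance of AIGC technologies, the outputs of generative models have expanded from images and videos to 3D objects. Research on copyright protection for generative models has focused mainly on watermarking in the image and text modalities, with little work on 3D object generative models. This paper proposes GaussianSeal, the first bit watermarking framework for 3DGS generative models, which allows bits to be decoded as copyright identifiers from the rendered outputs of generated 3DGS. By incorporating adaptive bit modulation modules into the generative model and embedding them adaptively into the network blocks, it achieves high-precision bit decoding with minimal training overhead while maintaining the fidelity of the model's outputs. Experiments show that the method outperforms post-processing watermarking approaches for 3DGS objects in both watermark decoding accuracy and preservation of generation quality.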

Key Takeaways

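Generative models now extend to 3D objects, creating a need for copyright protection of 3DGS generative models.
Existing copyright protection research focuses mainly on watermarking in the image and text modalities.
GaussianSeal is the first bit watermarking framework for 3DGS generative models.
It decodes bits as copyright identifiers from the rendered outputs of generated 3DGS.
Adaptive bit modulation modules embedded in the generative model achieve high-precision bit decoding while maintaining output fidelity.
The method outperforms post-processing watermarking approaches for 3DGS objects in watermark decoding accuracy.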
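
Watermark decoding accuracy is typically reported as the fraction of message bits recovered correctly. The sketch below shows that metric under assumptions: a hypothetical decoder has already produced one real-valued score (logit) per bit for each rendered view, scores are averaged over views, and a positive average is read as bit 1. None of these details are specified in the abstract.

```python
import numpy as np


def decode_bits(logits):
    """Threshold per-bit scores at zero, averaging over views if several are given.

    logits: shape (n_bits,) for one view or (n_views, n_bits) for several.
    """
    mean_logits = np.atleast_2d(np.asarray(logits, dtype=float)).mean(axis=0)
    return (mean_logits > 0).astype(np.int64)


def bit_accuracy(logits, message):
    """Fraction of message bits recovered correctly."""
    return float((decode_bits(logits) == np.asarray(message)).mean())
```

For example, scores `[2.0, -1.0, 0.5, -3.0]` against the message `[1, 0, 0, 0]` decode to `[1, 0, 1, 0]`, so three of the four bits match.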


Variational Bayes Gaussian Splatting

Authors: Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. Buckley, Tim Verbelen

Recently, 3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians. The predominant optimization method for these models relies on backpropagating gradients through a differentiable rendering pipeline, which struggles with catastrophic forgetting when dealing with continuous streams of data. To address this limitation, we propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters. By leveraging the conjugacy properties of multivariate Gaussians, we derive a closed-form variational update rule, allowing efficient updates from partial, sequential observations without the need for replay buffers. Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in this setting.


Paper and project links

PDF

Summary

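3D Gaussian Splatting has recently shown strong promise for modeling 3D scenes with mixtures of Gaussians. The predominant optimization method backpropagates gradients through a differentiable rendering pipeline and suffers from catastrophic forgetting on continuous data streams. To address this, the paper proposes Variational Bayes Gaussian Splatting (VBGS), which frames training a Gaussian splat as variational inference over model parameters. Using the conjugacy properties of multivariate Gaussians, it derives a closed-form variational update rule that allows efficient updates from partial, sequential observations without replay buffers. Experiments show that VBGS matches state-of-the-art performance on static datasets and enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in that setting.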
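
Why conjugacy removes the need for a replay buffer can be seen in a single-Gaussian sketch. With a Normal-Inverse-Wishart prior over a Gaussian's mean and covariance, the posterior has a closed form that depends on the data only through sufficient statistics, so updating chunk by chunk gives the same result as one update on all the data. This is a simplified illustration, not the VBGS update itself, which operates on a mixture model; the function name and the tuple layout of the prior are assumptions.

```python
import numpy as np


def niw_update(prior, x):
    """Closed-form Normal-Inverse-Wishart posterior after observing the rows of x.

    prior: tuple (m, kappa, nu, psi) with mean m (D,), scalars kappa and nu,
    and scale matrix psi (D, D). Returns a tuple of the same layout.
    """
    m0, kappa0, nu0, psi0 = prior
    n = x.shape[0]
    xbar = x.mean(axis=0)
    centered = x - xbar
    scatter = centered.T @ centered
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    m_n = (kappa0 * m0 + n * xbar) / kappa_n
    d = (xbar - m0)[:, None]
    psi_n = psi0 + scatter + (kappa0 * n / kappa_n) * (d @ d.T)
    return m_n, kappa_n, nu_n, psi_n


# Usage: streaming the data in chunks, feeding each posterior back in as the prior.
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 3)) + np.array([1.0, -2.0, 0.5])
prior = (np.zeros(3), 1.0, 5.0, np.eye(3))
batch = niw_update(prior, data)
stream = prior
for chunk in np.array_split(data, 6):
    stream = niw_update(stream, chunk)
```

The `stream` posterior agrees with `batch` up to floating-point error even though no earlier chunk is revisited, which is the property gradient-based training lacks when data arrives sequentially.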

Key Takeaways