3DGS

Published: 2025-09-13

Updated: 2025-10-07

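⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are intended only for initial screening before reading the papers.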

Updated 2025-09-13

ForestSplats: Deformable transient field for Gaussian Splatting in the Wild

Authors: Wongi Park, Myeongseok Nam, Siwon Kim, Sangwoo Jo, Soomok Lee

Recently, 3D Gaussian Splatting (3D-GS) has emerged, showing real-time rendering speeds and high-quality results in static scenes. Although 3D-GS shows effectiveness in static scenes, their performance significantly degrades in real-world environments due to transient objects, lighting variations, and diverse levels of occlusion. To tackle this, existing methods estimate occluders or transient elements by leveraging pre-trained models or integrating additional transient field pipelines. However, these methods still suffer from two defects: 1) Using semantic features from the Vision Foundation model (VFM) causes additional computational costs. 2) The transient field requires significant memory to handle transient elements with per-view Gaussians and struggles to define clear boundaries for occluders, solely relying on photometric errors. To address these problems, we propose ForestSplats, a novel approach that leverages the deformable transient field and a superpixel-aware mask to efficiently represent transient elements in the 2D scene across unconstrained image collections and effectively decompose static scenes from transient distractors without VFM. We designed the transient field to be deformable, capturing per-view transient elements. Furthermore, we introduce a superpixel-aware mask that clearly defines the boundaries of occluders by considering photometric errors and superpixels. Additionally, we propose uncertainty-aware densification to avoid generating Gaussians within the boundaries of occluders during densification. Through extensive experiments across several benchmark datasets, we demonstrate that ForestSplats outperforms existing methods without VFM and shows significant memory efficiency in representing transient elements.

Paper and project links

PDF

Summary

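This paper presents ForestSplats, which targets a known weakness of 3D Gaussian Splatting (3D-GS): it renders static scenes in real time and at high quality, but its performance degrades significantly in real-world environments. ForestSplats uses a deformable transient field and a superpixel-aware mask to represent transient elements in the 2D scene efficiently across unconstrained image collections and to separate the static scene from transient distractors. The transient field is deformable so that it can capture per-view transient elements, and the superpixel-aware mask delineates occluder boundaries by combining photometric errors with superpixels. The method also proposes uncertainty-aware densification, which avoids generating Gaussians inside occluder boundaries. Experiments show that ForestSplats outperforms existing methods without using a Vision Foundation Model (VFM) and is markedly more memory-efficient when representing transient elements.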
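
To make the idea of a superpixel-aware mask concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the aggregation rule (mean photometric error per superpixel), the `threshold` parameter, and the function name `superpixel_mask` are all assumptions made for illustration. The superpixel labels are assumed to be precomputed by any superpixel algorithm (SLIC, for example).

```python
from collections import defaultdict


def superpixel_mask(error, labels, threshold):
    """Mark a whole superpixel as transient when its mean photometric
    error exceeds `threshold` (hypothetical rule, not from the paper).

    error:  2D list of per-pixel photometric errors
    labels: 2D list of superpixel ids with the same shape as `error`
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Accumulate the error of every pixel into its superpixel.
    for err_row, lab_row in zip(error, labels):
        for e, s in zip(err_row, lab_row):
            sums[s] += e
            counts[s] += 1
    # A superpixel is transient if its mean error is above the threshold.
    transient = {s for s in sums if sums[s] / counts[s] > threshold}
    # Every pixel inherits the decision of its superpixel.
    return [[1 if s in transient else 0 for s in lab_row] for lab_row in labels]


# Toy 2x4 image: superpixel 0 covers the left half, superpixel 1 the right half.
error = [[0.1, 0.1, 0.9, 0.2],
         [0.0, 0.2, 0.8, 0.9]]
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
mask = superpixel_mask(error, labels, threshold=0.5)
print(mask)
```

In the toy input, the pixel with error 0.2 in the top-right corner is still masked because it belongs to a superpixel whose mean error is high. This is the intuition behind using superpixels: the mask follows image-region boundaries instead of noisy per-pixel errors.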

Key Takeaways

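ForestSplats addresses the performance degradation of 3D Gaussian Splatting when real-world scenes contain transient objects.
It represents transient elements in the 2D scene efficiently using a deformable transient field and a superpixel-aware mask.
It delineates occluder boundaries clearly and avoids the extra computational cost of Vision Foundation Model (VFM) features.
It is memory-efficient when representing transient elements.
Its uncertainty-aware densification avoids generating Gaussians inside occluders.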
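
The abstract does not spell out how uncertainty-aware densification is computed, so the following is only a rough sketch of one way such a filter could look. It assumes a gradient-threshold densification criterion in the style of the original 3D-GS, plus a per-pixel uncertainty map that is high inside occluders. The candidate layout `(row, col, grad)`, both thresholds, and the function name `select_for_densification` are hypothetical.

```python
def select_for_densification(candidates, uncertainty, grad_threshold, unc_threshold):
    """Return the indices of Gaussians that may be densified.

    candidates:  list of (row, col, grad) tuples, i.e. the projected pixel
                 position and positional gradient magnitude of each Gaussian
                 (hypothetical layout)
    uncertainty: 2D list of per-pixel uncertainty values, assumed to be
                 high inside occluders
    """
    selected = []
    for i, (row, col, grad) in enumerate(candidates):
        # Gradient criterion: skip Gaussians that are already well fitted.
        if grad < grad_threshold:
            continue
        # Assumed uncertainty criterion: skip Gaussians that project into
        # regions likely to be occluders.
        if uncertainty[row][col] > unc_threshold:
            continue
        selected.append(i)
    return selected


# Toy 2x2 uncertainty map: the right column is an occluder region.
uncertainty = [[0.1, 0.9],
               [0.2, 0.8]]
candidates = [(0, 0, 0.5), (0, 1, 0.7), (1, 0, 0.01), (1, 1, 0.6)]
chosen = select_for_densification(
    candidates, uncertainty, grad_threshold=0.1, unc_threshold=0.5
)
print(chosen)
```

Only the first candidate passes both checks here: the second and fourth fall into the high-uncertainty column, and the third has too small a gradient to need densification.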

The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods

Authors: Yifu Tao, Miguel Ángel Muñoz-Bañón, Lintong Zhang, Jiahao Wang, Lanke Frank Tarimo Fu, Maurice Fallon

This paper introduces a large-scale multi-modal dataset captured in and around well-known landmarks in Oxford using a custom-built multi-sensor perception unit as well as a millimetre-accurate map from a Terrestrial LiDAR Scanner (TLS). The perception unit includes three synchronised global shutter colour cameras, an automotive 3D LiDAR scanner, and an inertial sensor - all precisely calibrated. We also establish benchmarks for tasks involving localisation, reconstruction, and novel-view synthesis, which enable the evaluation of Simultaneous Localisation and Mapping (SLAM) methods, Structure-from-Motion (SfM) and Multi-view Stereo (MVS) methods as well as radiance field methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting. To evaluate 3D reconstruction the TLS 3D models are used as ground truth. Localisation ground truth is computed by registering the mobile LiDAR scans to the TLS 3D models. Radiance field methods are evaluated not only with poses sampled from the input trajectory, but also from viewpoints that are from trajectories which are distant from the training poses. Our evaluation demonstrates a key limitation of state-of-the-art radiance field methods: we show that they tend to overfit to the training poses/images and do not generalise well to out-of-sequence poses. They also underperform in 3D reconstruction compared to MVS systems using the same visual inputs. Our dataset and benchmarks are intended to facilitate better integration of radiance field methods and SLAM systems. The raw and processed data, along with software for parsing and evaluation, can be accessed at https://dynamic.robots.ox.ac.uk/datasets/oxford-spires/.

Paper and project links

PDF Accepted by IJRR. Website: https://dynamic.robots.ox.ac.uk/datasets/oxford-spires/

Summary
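This paper introduces a large-scale multi-modal dataset captured in and around well-known Oxford landmarks with a custom-built multi-sensor perception unit and a Terrestrial LiDAR Scanner (TLS). The dataset provides benchmarks for localisation, reconstruction, and novel-view synthesis, and is used to evaluate SLAM, SfM, and MVS methods as well as radiance field methods such as NeRF and 3D Gaussian Splatting. The evaluation shows that current radiance field methods tend to overfit to the training poses/images, generalise poorly to out-of-sequence poses, and underperform MVS systems in 3D reconstruction given the same visual inputs. The dataset and benchmarks are intended to facilitate better integration of radiance field methods and SLAM systems.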
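
The abstract states that the TLS 3D models serve as ground truth for 3D reconstruction, but it does not name the metrics. As an illustration only, the sketch below computes two commonly used point-cloud scores against a ground-truth cloud: accuracy (the fraction of reconstructed points within a distance `tau` of the ground truth) and completeness (the fraction of ground-truth points within `tau` of the reconstruction). The choice of these metrics, the threshold `tau`, and the function names are assumptions, and the brute-force nearest-neighbour search is only suitable for tiny examples.

```python
import math


def nearest_distance(p, cloud):
    """Distance from point p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def accuracy_completeness(reconstruction, ground_truth, tau):
    """Return (accuracy, completeness) for distance threshold tau.

    accuracy:     share of reconstructed points within tau of the ground truth
    completeness: share of ground-truth points within tau of the reconstruction
    """
    acc = sum(
        nearest_distance(p, ground_truth) <= tau for p in reconstruction
    ) / len(reconstruction)
    comp = sum(
        nearest_distance(q, reconstruction) <= tau for q in ground_truth
    ) / len(ground_truth)
    return acc, comp


# Toy data: four ground-truth points on a line, and a reconstruction that
# covers only the first two and contains one outlier.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.01, 0.0), (1.0, 0.02, 0.0), (1.0, 0.5, 0.0)]
acc, comp = accuracy_completeness(reconstruction, ground_truth, tau=0.05)
print(acc, comp)
```

Reporting both numbers matters because each one hides a different failure: a sparse but precise reconstruction scores well on accuracy and poorly on completeness, while a dense but noisy one does the opposite.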

Key Takeaways