3DGS


⚠️ All of the summaries below were generated by large language models. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on these summaries in serious academic settings; use them only for an initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

2025-10-22 Update

Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats

Authors:Simeon Adebola, Chung Min Kim, Justin Kerr, Shuangyu Xie, Prithvi Akella, Jose Luis Susa Rincon, Eugen Solowjow, Ken Goldberg

Commercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed “annotated digital twins” of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmented Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the underside/topside of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed topside/underside images with 77.3% accuracy. Code, videos, and datasets are available at https://berkeleyautomation.github.io/Botany-Bot/.

Paper and Project Links

PDF 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

Summary

Commercial plant phenotyping systems miss many plant details because of leaf occlusion. This paper presents Botany-Bot, a system that builds detailed “annotated digital twins” of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmented Gaussian Splat models. The paper also presents robot algorithms that manipulate leaves to capture high-resolution, indexable images of occluded details such as stem buds and the undersides/topsides of leaves. Experiments suggest that Botany-Bot segments leaves with 90.8% accuracy, detects leaves with 86.2% accuracy, lifts/pushes leaves with 77.9% accuracy, and captures detailed topside/underside images with 77.3% accuracy. Code, videos, and datasets are available at the project link.

Key Takeaways

  1. Botany-Bot builds detailed “annotated digital twins” of living plants using two stereo cameras, a digital turntable inside a lightbox, and an industrial robot arm.
  2. The system uses 3D segmented Gaussian Splat models to represent the plant.
  3. Robot algorithms manipulate leaves to photograph occluded plant details, such as stem buds and the topsides/undersides of leaves.
  4. Experiments show that Botany-Bot achieves high accuracy in leaf segmentation, detection, lifting/pushing, and detailed image capture (see the sketch after this list).
  5. The system addresses the incomplete detail capture caused by leaf occlusion in commercial plant phenotyping systems.
  6. Code, videos, and datasets are available at the project link for users to explore the system.
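
To make the hardware loop concrete, here is a minimal Python sketch of an indexed turntable scan. It is hypothetical: `rotate_turntable` and `capture_stereo_pair` are stand-ins for whatever controller and camera interfaces Botany-Bot actually uses, not the authors' API.

```python
import numpy as np

def rotate_turntable(angle_deg: float) -> None:
    """Hypothetical stand-in for the turntable controller."""
    pass  # a real system would command the hardware here

def capture_stereo_pair() -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical stand-in for the two stereo cameras (dummy images)."""
    h, w = 480, 640
    return np.zeros((h, w, 3), np.uint8), np.zeros((h, w, 3), np.uint8)

def scan_plant(num_stops: int = 36) -> list[dict]:
    """Rotate the plant in fixed increments, capturing an indexed stereo
    pair at every stop; the index makes the images queryable later."""
    views = []
    for i in range(num_stops):
        angle = i * 360.0 / num_stops
        rotate_turntable(angle)
        left, right = capture_stereo_pair()
        views.append({"angle_deg": angle, "left": left, "right": right})
    return views

views = scan_plant()
print(f"captured {len(views)} stereo pairs")
```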

Cool Papers

Click here to view paper screenshots

Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS

Authors:Feng Zhou, Wenkai Guo, Pu Cao, Zhicheng Zhang, Jianqin Yin

Sparse-view 3D Gaussian Splatting (3DGS) often overfits to the training views, leading to artifacts like blurring in novel view rendering. Prior work addresses it either by enhancing the initialization (i.e., the point cloud from Structure-from-Motion (SfM)) or by adding training-time constraints (regularization) to the 3DGS optimization. Yet our controlled ablations reveal that initialization is the decisive factor: it determines the attainable performance band in sparse-view 3DGS, while training-time constraints yield only modest within-band improvements at extra cost. Given initialization’s primacy, we focus our design there. Although SfM performs poorly under sparse views due to its reliance on feature matching, it still provides reliable seed points. Thus, building on SfM, our effort aims to supplement the regions it fails to cover as comprehensively as possible. Specifically, we design: (i) frequency-aware SfM that improves low-texture coverage via low-frequency view augmentation and relaxed multi-view correspondences; (ii) 3DGS self-initialization that lifts photometric supervision into additional points, compensating SfM-sparse regions with learned Gaussian centers; and (iii) point-cloud regularization that enforces multi-view consistency and uniform spatial coverage through simple geometric/visibility priors, yielding a clean and reliable point cloud. Our experiments on LLFF and Mip-NeRF360 demonstrate consistent gains in sparse-view settings, establishing our approach as a stronger initialization strategy. Code is available at https://github.com/zss171999645/ItG-GS.

Paper and Project Links

PDF Preprint

Summary

This paper examines the tendency of sparse-view 3D Gaussian Splatting (3DGS) to overfit the training views, which causes artifacts such as blurring in novel view rendering. Prior work addresses this by enhancing the initialization or by adding training-time constraints, but controlled ablations show that initialization is the decisive factor. The paper therefore focuses its design on initialization: building on SfM, it supplements the regions SfM fails to cover via frequency-aware SfM, 3DGS self-initialization, and point-cloud regularization, improving performance under sparse views. Experiments on LLFF and Mip-NeRF360 show consistent gains in sparse-view settings, establishing the approach as a stronger initialization strategy.

Key Takeaways

  1. Sparse-view 3D Gaussian Splatting (3DGS) overfits the training views, producing artifacts in rendered novel views.
  2. Initialization is the decisive factor for sparse-view 3DGS performance.
  3. Enhancing the initialization is more effective at removing artifacts than adding training-time constraints.
  4. The method builds on SfM, improving it to better suit sparse-view settings.
  5. Frequency-aware SfM improves low-texture coverage via low-frequency view augmentation and relaxed multi-view correspondences (see the sketch after this list).
  6. 3DGS self-initialization lifts photometric supervision into additional points that compensate for SfM-sparse regions.
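
One plausible reading of "low-frequency view augmentation" is to add low-pass-filtered copies of each training view so that feature matching can still find correspondences in weakly textured regions. The sketch below illustrates that interpretation with SciPy's Gaussian filter; it is an assumption about the mechanism, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def low_frequency_views(images: list[np.ndarray],
                        sigmas=(2.0, 4.0)) -> list[np.ndarray]:
    """Augment each view with progressively low-passed copies.

    Blurring suppresses high-frequency texture, so matching on the blurred
    copies can recover correspondences where raw features fail."""
    augmented = []
    for img in images:
        augmented.append(img)  # keep the original view
        for sigma in sigmas:
            # filter spatial axes only; sigma=0 leaves color channels intact
            blurred = gaussian_filter(img.astype(np.float32),
                                      sigma=(sigma, sigma, 0))
            augmented.append(blurred.astype(img.dtype))
    return augmented

# toy usage: three random "views"
views = [np.random.randint(0, 255, (120, 160, 3), np.uint8) for _ in range(3)]
print(len(low_frequency_views(views)))  # 3 originals + 2 blurred each = 9
```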

Cool Papers

Click here to view paper screenshots

GSPlane: Concise and Accurate Planar Reconstruction via Structured Representation

Authors:Ruitong Gan, Junran Peng, Yang Liu, Chuanchen Luo, Qing Li, Zhaoxiang Zhang

Planes are fundamental primitives of 3D scenes, especially in man-made environments such as indoor spaces and urban streets. Representing these planes in a structured and parameterized format facilitates scene editing and physical simulations in downstream applications. Recently, Gaussian Splatting (GS) has demonstrated remarkable effectiveness in the Novel View Synthesis task, with extensions showing great potential in accurate surface reconstruction. However, even state-of-the-art GS representations often struggle to reconstruct planar regions with sufficient smoothness and precision. To address this issue, we propose GSPlane, which recovers accurate geometry and produces clean and well-structured mesh connectivity for plane regions in the reconstructed scene. By leveraging off-the-shelf segmentation and normal prediction models, GSPlane extracts robust planar priors to establish structured representations for planar Gaussian coordinates, which help guide the training process by enforcing geometric consistency. To further enhance training robustness, a Dynamic Gaussian Re-classifier is introduced to adaptively reclassify planar Gaussians with persistently high gradients as non-planar, ensuring more reliable optimization. Furthermore, we utilize the optimized planar priors to refine the mesh layouts, significantly improving topological structure while reducing the number of vertices and faces. We also explore applications of the structured planar representation, which enable decoupling and flexible manipulation of objects on supportive planes. Extensive experiments demonstrate that, with no sacrifice in rendering quality, the introduction of planar priors significantly improves the geometric accuracy of the extracted meshes across various baselines.

Paper and Project Links

PDF

Summary

This paper highlights the importance of plane representations in 3D scene reconstruction and proposes GSPlane. The method extracts robust planar priors and builds a structured representation of planar Gaussian coordinates that enforces geometric consistency during training. A Dynamic Gaussian Re-classifier adaptively reclassifies planar Gaussians with persistently high gradients as non-planar, making training more robust. The optimized planar priors are then used to refine the mesh layout, improving topological structure while reducing the number of vertices and faces. The paper also explores applications of the structured planar representation, enabling decoupling and flexible manipulation of objects on supportive planes. Experiments show that introducing planar priors significantly improves the geometric accuracy of the extracted meshes without sacrificing rendering quality.

Key Takeaways

  1. Planes are fundamental primitives of 3D scenes, especially in man-made environments such as indoor spaces and streets.
  2. Gaussian Splatting (GS) is highly effective for novel view synthesis and shows great potential for surface reconstruction.
  3. GSPlane recovers accurate geometry and produces clean, well-structured mesh connectivity.
  4. GSPlane extracts robust planar priors to build a structured representation of planar Gaussian coordinates, enforcing geometric consistency (see the sketch after this list).
  5. A Dynamic Gaussian Re-classifier adaptively handles planar Gaussians with persistently high gradients, making training more robust.
  6. The optimized planar priors refine the mesh layout, improving topology while reducing the number of vertices and faces.
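
A natural way to realize a "structured representation for planar Gaussian coordinates" is to re-parameterize each planar Gaussian center by 2D (u, v) coordinates in a fitted plane's basis, so that planarity holds by construction. The NumPy sketch below illustrates this idea under that assumption; it is not the paper's code.

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Least-squares plane through a point set via SVD.

    Returns the centroid, a unit normal, and two in-plane basis vectors."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    e1, e2, normal = vt[0], vt[1], vt[2]  # normal = least-variance direction
    return centroid, normal, e1, e2

def planar_coordinates(points: np.ndarray):
    """Re-parameterize 3D points as 2D (u, v) coordinates on their fitted
    plane; reconstructing from (u, v) snaps every point exactly onto the
    plane, which is one way to hard-enforce planarity."""
    c, n, e1, e2 = fit_plane(points)
    uv = (points - c) @ np.stack([e1, e2], axis=1)   # (N, 2) plane coords
    snapped = c + uv @ np.stack([e1, e2], axis=0)    # back to 3D, on-plane
    return uv, snapped

# toy usage: noisy samples of the z = 0 plane
pts = np.random.rand(100, 3)
pts[:, 2] *= 0.01
uv, snapped = planar_coordinates(pts)
# residual along the fitted normal is ~0 after snapping
print(np.abs(fit_plane(snapped)[1] @ (snapped - snapped.mean(0)).T).max())
```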

Cool Papers

Click here to view paper screenshots

REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting

Authors:Changyue Shi, Minghao Chen, Yiping Mao, Chuxiao Yang, Xinyuan Hu, Jiajun Ding, Zhou Yu

Bridging the gap between complex human instructions and precise 3D object grounding remains a significant challenge in vision and robotics. Existing 3D segmentation methods often struggle to interpret ambiguous, reasoning-based instructions, while 2D vision-language models that excel at such reasoning lack intrinsic 3D spatial understanding. In this paper, we introduce REALM, an innovative MLLM-agent framework that enables open-world reasoning-based segmentation without requiring extensive 3D-specific post-training. We perform segmentation directly on 3D Gaussian Splatting representations, capitalizing on their ability to render photorealistic novel views that are highly suitable for MLLM comprehension. As directly feeding one or more rendered views to the MLLM can lead to high sensitivity to viewpoint selection, we propose a novel Global-to-Local Spatial Grounding strategy. Specifically, multiple global views are first fed into the MLLM agent in parallel for coarse-level localization, aggregating responses to robustly identify the target object. Then, several close-up novel views of the object are synthesized to perform fine-grained local segmentation, yielding accurate and consistent 3D masks. Extensive experiments show that REALM achieves remarkable performance in interpreting both explicit and implicit instructions across LERF, 3D-OVS, and our newly introduced REALM3D benchmarks. Furthermore, our agent framework seamlessly supports a range of 3D interaction tasks, including object removal, replacement, and style transfer, demonstrating its practical utility and versatility. Project page: https://ChangyueShi.github.io/REALM.

Paper and Project Links

PDF

Summary

This paper proposes REALM, an innovative MLLM-agent framework that enables open-world reasoning-based segmentation without extensive 3D-specific post-training. The framework segments directly on 3D Gaussian Splatting representations, exploiting their ability to render photorealistic novel views that are well suited to MLLM comprehension. Because feeding one or a few rendered views to the MLLM makes results sensitive to viewpoint selection, a Global-to-Local Spatial Grounding strategy is proposed: multiple global views are first fed to the MLLM agent in parallel for coarse localization, and the responses are aggregated to robustly identify the target object; several close-up novel views of the object are then synthesized for fine-grained local segmentation, yielding accurate and consistent 3D masks. Experiments show that REALM excels at interpreting both explicit and implicit instructions, with strong results on the LERF, 3D-OVS, and newly introduced REALM3D benchmarks. The framework also seamlessly supports a range of 3D interaction tasks, including object removal, replacement, and style transfer, demonstrating its practicality and versatility.

Key Takeaways

  1. REALM is an MLLM-agent framework that enables open-world reasoning-based segmentation without extensive 3D-specific post-training.
  2. It segments directly on 3D Gaussian Splatting representations, exploiting their photorealistic novel-view rendering.
  3. The Global-to-Local Spatial Grounding strategy addresses sensitivity to viewpoint selection (see the sketch after this list).
  4. REALM combines global and local views, localizing coarsely before segmenting finely, to identify target objects accurately.
  5. The framework performs strongly on multiple benchmarks, including LERF, 3D-OVS, and the new REALM3D.
  6. REALM supports a range of 3D interaction tasks, including object removal, replacement, and style transfer.
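
The global stage's aggregation can be pictured as a vote over per-view answers. Below is a toy Python sketch of that idea; `query_mllm` is a hypothetical stand-in (with canned answers) for a real MLLM call on a rendered global view.

```python
from collections import Counter

def query_mllm(view_id: int, instruction: str) -> str:
    """Hypothetical stand-in for one MLLM call on a rendered global view;
    a real system would send the image plus the instruction to the model."""
    canned = {0: "mug", 1: "mug", 2: "teapot", 3: "mug"}
    return canned[view_id]

def coarse_localize(view_ids: list[int], instruction: str) -> str:
    """Aggregate per-view answers so no single viewpoint dominates:
    query every global view, then majority-vote the object identity."""
    answers = [query_mllm(v, instruction) for v in view_ids]
    target, votes = Counter(answers).most_common(1)[0]
    print(f"votes: {Counter(answers)} -> target '{target}'")
    return target

target = coarse_localize([0, 1, 2, 3], "the thing you drink coffee from")
# next stage (not shown): render close-up views of `target` and run
# fine-grained segmentation on those crops
```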

Cool Papers

Click here to view paper screenshots

MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference

Authors:Wenyuan Zhang, Jimin Tang, Weiqi Zhang, Yi Fang, Yu-Shen Liu, Zhizhong Han

Modeling reflections from 2D images is essential for photorealistic rendering and novel view synthesis. Recent approaches enhance Gaussian primitives with reflection-related material attributes to enable physically based rendering (PBR) with Gaussian Splatting. However, the material inference often lacks sufficient constraints, especially under limited environment modeling, resulting in illumination aliasing and reduced generalization. In this work, we revisit the problem from a multi-view perspective and show that multi-view consistent material inference with more physically-based environment modeling is key to learning accurate reflections with Gaussian Splatting. To this end, we enforce 2D Gaussians to produce multi-view consistent material maps during deferred shading. We also track photometric variations across views to identify highly reflective regions, which serve as strong priors for reflection strength terms. To handle indirect illumination caused by inter-object occlusions, we further introduce an environment modeling strategy through ray tracing with 2DGS, enabling photorealistic rendering of indirect radiance. Experiments on widely used benchmarks show that our method faithfully recovers both illumination and geometry, achieving state-of-the-art rendering quality in novel view synthesis.

Paper and Project Links

PDF Accepted by NeurIPS 2025. Project Page: https://wen-yuan-zhang.github.io/MaterialRefGS

Summary

This paper addresses the importance of modeling reflections from 2D images for photorealistic rendering and novel view synthesis. Revisiting the problem from a multi-view perspective, it argues that multi-view consistent material inference with more physically based environment modeling is key to learning accurate reflections with Gaussian Splatting, countering the under-constrained material inference that causes illumination aliasing and reduced generalization under limited environment modeling. To this end, 2D Gaussians are enforced to produce multi-view consistent material maps during deferred shading, and photometric variations are tracked across views to identify highly reflective regions, which serve as strong priors for the reflection strength terms. To handle indirect illumination caused by inter-object occlusions, an environment modeling strategy based on ray tracing with 2DGS enables photorealistic rendering of indirect radiance. Experiments show the method faithfully recovers both illumination and geometry, achieving state-of-the-art rendering quality in novel view synthesis.

Key Takeaways

  1. Modeling reflections from 2D images is essential for photorealistic rendering and novel view synthesis.
  2. Gaussian Splatting is extended with reflection-related material attributes for physically based rendering (PBR), but material inference is often under-constrained.
  3. Multi-view consistent material inference is key to learning accurate reflections.
  4. 2D Gaussians are enforced to produce multi-view consistent material maps during deferred shading.
  5. Photometric variations tracked across views identify highly reflective regions, which serve as strong priors for the reflection strength terms (see the sketch after this list).
  6. Ray tracing with 2D Gaussian Splatting (2DGS) models indirect radiance for photorealistic rendering.
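
The photometric-variation prior has a simple intuition: a diffuse point looks roughly the same from every view, while a specular one changes color with viewpoint. The sketch below scores points by cross-view color variance; this is an illustrative guess at the prior, not the paper's exact formulation.

```python
import numpy as np

def reflectivity_prior(colors_per_view: np.ndarray) -> np.ndarray:
    """Score surface points by cross-view photometric variation.

    colors_per_view: (num_views, num_points, 3) observed RGB per view.
    Diffuse points have nearly view-independent color (low variance);
    specular points change with viewpoint (high variance)."""
    var = colors_per_view.var(axis=0).mean(axis=-1)  # (num_points,)
    return var / (var.max() + 1e-8)                  # normalize to [0, 1]

# toy example: point 0 diffuse (constant), point 1 specular (view-varying)
obs = np.stack([
    np.array([[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]),
    np.array([[0.5, 0.5, 0.5], [0.9, 0.9, 0.9]]),
    np.array([[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]]),
])
print(reflectivity_prior(obs))  # point 1 gets the higher score
```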

Cool Papers

Click here to view paper screenshots

GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR

Authors:Christophe Bolduc, Yannick Hold-Geoffroy, Zhixin Shu, Jean-François Lalonde

We present GaSLight, a method that generates spatially-varying lighting from regular images. Our method proposes using HDR Gaussian Splats as light source representation, marking the first time regular images can serve as light sources in a 3D renderer. Our two-stage process first enhances the dynamic range of images plausibly and accurately by leveraging the priors embedded in diffusion models. Next, we employ Gaussian Splats to model 3D lighting, achieving spatially variant lighting. Our approach yields state-of-the-art results on HDR estimations and their applications in illuminating virtual objects and scenes. To facilitate the benchmarking of images as light sources, we introduce a novel dataset of calibrated and unsaturated HDR to evaluate images as light sources. We assess our method using a combination of this novel dataset and an existing dataset from the literature. Project page: https://lvsn.github.io/gaslight/

Paper and Project Links

PDF

Summary

GaSLight generates spatially varying lighting from regular images. It uses HDR Gaussian Splats as the light-source representation, marking the first time regular images can serve as light sources in a 3D renderer. The method has two stages: it first enhances the dynamic range of images plausibly and accurately by leveraging priors embedded in diffusion models, then employs Gaussian Splats to model 3D lighting and achieve spatially varying illumination. The approach yields state-of-the-art results on HDR estimation and on illuminating virtual objects and scenes. To facilitate benchmarking images as light sources, a novel dataset of calibrated, unsaturated HDR imagery is introduced.

Key Takeaways

  1. GaSLight generates spatially varying lighting from regular images, letting them serve as light sources.
  2. It represents light sources with HDR Gaussian Splats, the first time regular images can act as light sources in a 3D renderer.
  3. The method has two stages: enhancing the images' dynamic range with diffusion-model priors, then modeling 3D lighting with Gaussian Splats (see the sketch after this list).
  4. It achieves state-of-the-art results on HDR estimation and on illuminating virtual objects and scenes.
  5. A novel dataset of calibrated, unsaturated HDR images is introduced to evaluate images as light sources.
  6. The approach advances 3D rendering, particularly lighting simulation.
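
The first stage of the pipeline can be pictured with a crude inverse tone mapper. The paper uses diffusion-model priors to hallucinate plausible HDR content; the closed-form expansion below is only a hypothetical baseline stand-in to make the pipeline's shape concrete.

```python
import numpy as np

def naive_inverse_tone_map(ldr: np.ndarray, gamma: float = 2.2,
                           boost: float = 8.0) -> np.ndarray:
    """Crude LDR->HDR expansion: linearize, then boost near-saturated pixels.

    GaSLight's first stage uses diffusion-model priors to recover plausible
    HDR detail; this closed-form expansion is only a baseline stand-in."""
    linear = (ldr.astype(np.float32) / 255.0) ** gamma  # undo display gamma
    weight = np.clip((linear - 0.8) / 0.2, 0.0, 1.0)    # 1 near saturation
    return linear * (1.0 + weight * (boost - 1.0))      # stretch highlights

ldr = np.random.randint(0, 256, (64, 64, 3), np.uint8)
hdr = naive_inverse_tone_map(ldr)
print(hdr.min(), hdr.max())  # values may exceed 1.0, i.e. HDR range
```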

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!