3DGS

Published: 2025-11-08

Updated: 2025-11-27

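⚠️ All summaries below were generated by a large language model; they may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not rely on them in serious academic settings; they are meant only for an initial screening before reading the papers.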

Updated 2025-11-08

Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions

Authors:Kaifeng Zhang, Shuo Sha, Hanxiao Jiang, Matthew Loper, Hyunjong Song, Guangyan Cai, Zhuo Xu, Xiaochen Hu, Changxi Zheng, Yunzhu Li

Robotic manipulation policies are advancing rapidly, but their direct evaluation in the real world remains costly, time-consuming, and difficult to reproduce, particularly for tasks involving deformable objects. Simulation provides a scalable and systematic alternative, yet existing simulators often fail to capture the coupled visual and physical complexity of soft-body interactions. We present a real-to-sim policy evaluation framework that constructs soft-body digital twins from real-world videos and renders robots, objects, and environments with photorealistic fidelity using 3D Gaussian Splatting. We validate our approach on representative deformable manipulation tasks, including plush toy packing, rope routing, and T-block pushing, demonstrating that simulated rollouts correlate strongly with real-world execution performance and reveal key behavioral patterns of learned policies. Our results suggest that combining physics-informed reconstruction with high-quality rendering enables reproducible, scalable, and accurate evaluation of robotic manipulation policies. Website: https://real2sim-eval.github.io/

Summary

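Robotic manipulation policies are advancing rapidly, but evaluating them directly in the real world remains costly, time-consuming, and hard to reproduce, especially for tasks involving deformable objects. Simulation offers a scalable and systematic alternative, yet existing simulators often fail to capture the coupled visual and physical complexity of soft-body interactions. The paper proposes a real-to-sim policy evaluation framework that builds soft-body digital twins from real-world videos and renders robots, objects, and environments photorealistically with 3D Gaussian Splatting. The approach is validated on representative deformable manipulation tasks (plush toy packing, rope routing, and T-block pushing), where simulated rollouts correlate strongly with real-world performance and reveal key behavioral patterns of the learned policies. The results suggest that combining physics-informed reconstruction with high-quality rendering enables reproducible, scalable, and accurate policy evaluation.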

Key Takeaways

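Evaluating robotic manipulation policies directly in the real world is costly, time-consuming, and hard to reproduce.
Simulation offers a scalable and systematic alternative for evaluating manipulation policies.
Existing simulators struggle to capture the coupled visual and physical complexity of soft-body interactions.
The proposed real-to-sim policy evaluation framework builds soft-body digital twins from real-world videos.
Robots, objects, and environments are rendered photorealistically with 3D Gaussian Splatting.
The method is validated on representative deformable manipulation tasks: plush toy packing, rope routing, and T-block pushing.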
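
The claim that simulated rollouts "correlate strongly" with real-world performance can be made concrete with a correlation coefficient over per-policy success rates. The sketch below is only an illustration: the abstract does not say which statistic the authors use, so Pearson correlation is an assumption, and the success rates are made-up numbers for four hypothetical policy checkpoints, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up success rates for four hypothetical policy checkpoints
# (illustrative only; NOT data from the paper).
sim_success = [0.10, 0.35, 0.60, 0.80]
real_success = [0.15, 0.30, 0.65, 0.75]

r = pearson(sim_success, real_success)
print(f"Pearson r between simulated and real success rates: {r:.3f}")
```

A value of r close to 1 would mean that policies ranking higher in simulation also tend to rank higher on the real robot, which is the property a real-to-sim evaluation framework needs.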

FastGS: Training 3D Gaussian Splatting in 100 Seconds

Authors:Shiwei Ren, Tianci Wen, Yongchun Fang, Biao Lu

The dominant 3D Gaussian splatting (3DGS) acceleration methods fail to properly regulate the number of Gaussians during training, causing redundant computational time overhead. In this paper, we propose FastGS, a novel, simple, and general acceleration framework that fully considers the importance of each Gaussian based on multi-view consistency, efficiently solving the trade-off between training time and rendering quality. We innovatively design a densification and pruning strategy based on multi-view consistency, dispensing with the budgeting mechanism. Extensive experiments on Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets demonstrate that our method significantly outperforms the state-of-the-art methods in training speed, achieving a 3.32$\times$ training acceleration and comparable rendering quality compared with DashGaussian on the Mip-NeRF 360 dataset and a 15.45$\times$ acceleration compared with vanilla 3DGS on the Deep Blending dataset. We demonstrate that FastGS exhibits strong generality, delivering 2-7$\times$ training acceleration across various tasks, including dynamic scene reconstruction, surface reconstruction, sparse-view reconstruction, large-scale reconstruction, and simultaneous localization and mapping. The project page is available at https://fastgs.github.io/

Paper and project links

Project page: https://fastgs.github.io/

Summary

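The paper examines a shortcoming of the dominant 3D Gaussian Splatting (3DGS) acceleration methods: they do not properly regulate the number of Gaussians during training, which wastes computation time. It proposes FastGS, a novel, simple, and general acceleration framework that weighs the importance of each Gaussian based on multi-view consistency and thereby addresses the trade-off between training time and rendering quality.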

Key Takeaways

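FastGS introduces a densification and pruning strategy based on multi-view consistency and dispenses with the budgeting mechanism.
In experiments on the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets, FastGS trains significantly faster than state-of-the-art methods.
On Mip-NeRF 360, FastGS trains 3.32× faster than DashGaussian with comparable rendering quality.
On Deep Blending, FastGS trains 15.45× faster than vanilla 3DGS.
FastGS generalizes well, delivering 2-7× training acceleration across tasks including dynamic scene reconstruction, surface reconstruction, sparse-view reconstruction, large-scale reconstruction, and simultaneous localization and mapping.
The framework is described as novel, simple, and general.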
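
To give a feel for what "importance based on multi-view consistency" could mean, here is a toy sketch. It is not FastGS's actual criterion, which the abstract does not specify. It assumes a hypothetical table `contrib[v][g]` holding some per-view importance of Gaussian `g` in view `v`, counts the views in which each Gaussian matters, prunes Gaussians that matter in too few views, and marks those that matter in many views as densification candidates. All names and thresholds are illustrative assumptions.

```python
def multiview_scores(contrib, threshold):
    """For each Gaussian, count the views where its contribution exceeds threshold.

    contrib[v][g] is a hypothetical per-view importance of Gaussian g in view v.
    """
    num_gaussians = len(contrib[0])
    return [
        sum(1 for view in contrib if view[g] > threshold)
        for g in range(num_gaussians)
    ]


def select(scores, prune_below, densify_at_least):
    """Split Gaussian indices into prune and densify candidates by view count."""
    prune = [g for g, s in enumerate(scores) if s < prune_below]
    densify = [g for g, s in enumerate(scores) if s >= densify_at_least]
    return prune, densify


# Three views, four Gaussians; values are made up for illustration.
contrib = [
    [0.9, 0.0, 0.4, 0.1],
    [0.8, 0.1, 0.0, 0.0],
    [0.7, 0.0, 0.5, 0.0],
]
scores = multiview_scores(contrib, threshold=0.3)
prune, densify = select(scores, prune_below=1, densify_at_least=3)
print("scores:", scores)
print("prune:", prune, "densify:", densify)
```

The point of the sketch is the shape of the decision: a Gaussian is judged by evidence aggregated over several views rather than by a single view or a fixed budget.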

CaRF: Enhancing Multi-View Consistency in Referring 3D Gaussian Splatting Segmentation

Authors:Yuwen Tao, Kanglei Zhou, Xin Tan, Yuan Xie

Referring 3D Gaussian Splatting Segmentation (R3DGS) aims to interpret free-form language expressions and localize the corresponding 3D regions in Gaussian fields. While recent advances have introduced cross-modal alignment between language and 3D geometry, existing pipelines still struggle with cross-view consistency due to their reliance on 2D rendered pseudo supervision and view specific feature learning. In this work, we present Camera Aware Referring Field (CaRF), a fully differentiable framework that operates directly in the 3D Gaussian space and achieves multi view consistency. Specifically, CaRF introduces Gaussian Field Camera Encoding (GFCE), which incorporates camera geometry into Gaussian text interactions to explicitly model view dependent variations and enhance geometric reasoning. Building on this, In Training Paired View Supervision (ITPVS) is proposed to align per Gaussian logits across calibrated views during training, effectively mitigating single view overfitting and exposing inter view discrepancies for optimization. Extensive experiments on three representative benchmarks demonstrate that CaRF achieves average improvements of 16.8%, 4.3%, and 2.0% in mIoU over state of the art methods on the Ref LERF, LERF OVS, and 3D OVS datasets, respectively. Moreover, this work promotes more reliable and view consistent 3D scene understanding, with potential benefits for embodied AI, AR/VR interaction, and autonomous perception.

Summary

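Referring 3D Gaussian Splatting Segmentation (R3DGS) interprets free-form language expressions and localizes the corresponding 3D regions in Gaussian fields. Existing pipelines struggle with cross-view consistency because they rely on 2D rendered pseudo supervision and view-specific feature learning. The paper presents Camera Aware Referring Field (CaRF), a fully differentiable framework that operates directly in the 3D Gaussian space and achieves multi-view consistency. CaRF introduces Gaussian Field Camera Encoding (GFCE), which incorporates camera geometry into Gaussian-text interactions to model view-dependent variations explicitly and strengthen geometric reasoning. It also proposes In Training Paired View Supervision (ITPVS), which aligns per-Gaussian logits across calibrated views during training, mitigating single-view overfitting and exposing inter-view discrepancies for optimization. On three benchmarks (Ref LERF, LERF OVS, and 3D OVS), CaRF improves average mIoU over state-of-the-art methods by 16.8%, 4.3%, and 2.0%, respectively. The work promotes more reliable and view-consistent 3D scene understanding, with potential benefits for embodied AI, AR/VR interaction, and autonomous perception.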

Key Takeaways

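R3DGS aims to interpret free-form language expressions and localize the corresponding 3D regions in Gaussian fields.
Existing pipelines struggle with cross-view consistency because they rely on 2D rendered pseudo supervision and view-specific feature learning.
CaRF operates directly in the 3D Gaussian space and achieves multi-view consistency.
GFCE incorporates camera geometry into Gaussian-text interactions, strengthening geometric reasoning.
ITPVS aligns per-Gaussian logits across calibrated views during training, mitigating single-view overfitting.
CaRF improves average mIoU over state-of-the-art methods by 16.8%, 4.3%, and 2.0% on Ref LERF, LERF OVS, and 3D OVS, respectively.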
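
The reported metric, mIoU, is the mean of intersection-over-union scores between predicted and ground-truth masks. The following minimal sketch shows the standard definition on flat binary masks. How the benchmarks average (per query, per scene, or per view) is not stated in the abstract, and treating an empty prediction against an empty target as IoU 1.0 is a convention assumed here.

```python
def iou(pred, target):
    """Intersection over union of two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over (prediction, target) mask pairs."""
    if not pairs:
        raise ValueError("need at least one mask pair")
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


# Toy masks flattened to 1D, for illustration only.
pairs = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # perfect overlap
    ([1, 0, 0, 0], [0, 1, 0, 0]),  # no overlap
]
print("mIoU:", mean_iou(pairs))
```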

Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos

Authors:Kaifeng Zhang, Baoyu Li, Kris Hauser, Yunzhu Li

Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splattings for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects – such as ropes, cloths, stuffed animals, and paper bags – from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks. The project page is available at https://kywind.github.io/pgnd .

Paper and project links

Project page: https://kywind.github.io/pgnd

Summary

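The paper presents a neural dynamics framework for modeling deformable objects that combines object particles and spatial grids in a hybrid representation. The particle-grid model captures global shape and motion information while predicting dense particle movements, which lets it handle objects with varied shapes and materials. Coupled with Gaussian Splatting for visual rendering, the framework yields a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Experiments show that the model learns the dynamics of diverse objects (ropes, cloths, stuffed animals, and paper bags) from sparse-view RGB-D recordings of robot-object interactions and generalizes at the category level to unseen instances. It outperforms state-of-the-art learning-based and physics-based simulators, particularly with limited camera views, and the learned models are useful for model-based planning, enabling goal-conditioned object manipulation across a range of tasks. Project page: https://kywind.github.io/pgnd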

Key Takeaways

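The paper proposes a neural dynamics framework that combines particles and spatial grids to model deformable objects; the hybrid representation captures global shape and motion information.
The framework predicts dense particle movements for objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes 3D space to ensure spatial continuity and improve learning efficiency.
Gaussian Splatting handles visual rendering, yielding a fully learning-based digital twin of deformable objects that generates 3D action-conditioned videos.
The model learns object dynamics from sparse-view RGB-D recordings of robot-object interactions and generalizes at the category level to unseen instances.
The model outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views.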
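
The interplay between particles and a spatial grid can be illustrated with a deliberately simple, non-learned sketch: particle velocities are averaged into uniform grid cells, then read back per particle. This is not the paper's neural model, whose architecture the abstract does not detail. The function names, the nearest-cell averaging, and the sample values are all assumptions, chosen only to show how a grid imposes spatial smoothing on per-particle quantities.

```python
from collections import defaultdict


def cell_of(position, cell_size):
    """Integer grid-cell index containing a 3D position."""
    return tuple(int(c // cell_size) for c in position)


def particles_to_grid(positions, velocities, cell_size):
    """Average particle velocities inside each occupied grid cell."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for p, v in zip(positions, velocities):
        key = cell_of(p, cell_size)
        for axis in range(3):
            sums[key][axis] += v[axis]
        counts[key] += 1
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}


def grid_to_particles(positions, grid, cell_size):
    """Read the cell-averaged velocity back at each particle position."""
    return [grid[cell_of(p, cell_size)] for p in positions]


# Toy data: two particles share a cell, one sits in a neighbouring cell.
positions = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.0, 0.0)]
velocities = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

grid = particles_to_grid(positions, velocities, cell_size=1.0)
smoothed = grid_to_particles(positions, grid, cell_size=1.0)
print(smoothed)
```

Particles in the same cell end up with the same velocity, which is a crude version of the spatial continuity that the grid is said to provide.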

Optimized Minimal 3D Gaussian Splatting

Authors:Joo Chan Lee, Jong Hwan Ko, Eunbyung Park

3D Gaussian Splatting (3DGS) has emerged as a powerful representation for real-time, high-performance rendering, enabling a wide range of applications. However, representing 3D scenes with numerous explicit Gaussian primitives imposes significant storage and memory overhead. Recent studies have shown that high-quality rendering can be achieved with a substantially reduced number of Gaussians when represented with high-precision attributes. Nevertheless, existing 3DGS compression methods still rely on a relatively large number of Gaussians, focusing primarily on attribute compression. This is because a smaller set of Gaussians becomes increasingly sensitive to lossy attribute compression, leading to severe quality degradation. Since the number of Gaussians is directly tied to computational costs, it is essential to reduce the number of Gaussians effectively rather than only optimizing storage. In this paper, we propose Optimized Minimal Gaussians representation (OMG), which significantly reduces storage while using a minimal number of primitives. First, we determine the distinct Gaussian from the near ones, minimizing redundancy without sacrificing quality. Second, we propose a compact and precise attribute representation that efficiently captures both continuity and irregularity among primitives. Additionally, we propose a sub-vector quantization technique for improved irregularity representation, maintaining fast training with a negligible codebook size. Extensive experiments demonstrate that OMG reduces storage requirements by nearly 50% compared to the previous state-of-the-art and enables 600+ FPS rendering while maintaining high rendering quality. Our source code is available at https://maincold2.github.io/omg/.

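Summary

3D Gaussian Splatting (3DGS) enables real-time, high-performance rendering, but representing a scene with many explicit Gaussian primitives imposes significant storage and memory overhead. The paper proposes Optimized Minimal Gaussians (OMG), a representation that significantly reduces storage while using a minimal number of primitives. OMG identifies distinct Gaussians among nearby ones to remove redundancy without sacrificing quality, and uses a compact, precise attribute representation that captures both continuity and irregularity among primitives. A sub-vector quantization technique improves the representation of irregularity while keeping training fast with a negligible codebook size. Experiments show that OMG reduces storage requirements by nearly 50% compared with the previous state of the art and enables rendering at 600+ FPS while maintaining high rendering quality. Source code is available at https://maincold2.github.io/omg/.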

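Key Takeaways

3DGS has emerged as a powerful representation for real-time, high-performance rendering with a wide range of applications.
The many explicit Gaussian primitives in a scene impose significant storage and memory overhead.
OMG significantly reduces storage while using a minimal number of primitives and maintaining high rendering quality.
OMG removes redundant primitives and uses a compact, precise attribute representation that captures continuity and irregularity among primitives.
Sub-vector quantization improves the representation of irregularity while keeping training fast with a negligible codebook size.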
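
Sub-vector quantization, in its generic form, splits an attribute vector into equal-length chunks and replaces each chunk with the index of its nearest entry in a small per-chunk codebook. The sketch below shows that generic idea only. The codebooks are hand-written here, whereas in practice they would be learned, and how OMG builds and applies its codebooks is not described in the abstract, so every name and value is an illustrative assumption.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vec)),
    )


def encode(vec, codebooks):
    """Quantize vec chunk by chunk; returns one codebook index per sub-vector."""
    if len(vec) % len(codebooks) != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    d = len(vec) // len(codebooks)
    return [
        nearest(cb, vec[k * d:(k + 1) * d])
        for k, cb in enumerate(codebooks)
    ]


def decode(indices, codebooks):
    """Reconstruct an approximate vector by concatenating the chosen entries."""
    out = []
    for idx, cb in zip(indices, codebooks):
        out.extend(cb[idx])
    return out


# Two hand-written codebooks, one per 2D sub-vector (illustrative values).
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
attribute = [0.9, 1.1, 0.2, 0.8]

indices = encode(attribute, codebooks)
approx = decode(indices, codebooks)
print("indices:", indices)
print("reconstruction:", approx)
```

Storing two small indices instead of four floats is where the storage saving comes from; quantizing sub-vectors separately keeps each codebook small compared with one codebook over the full vector.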
