
3DGS


⚠️ All of the summaries below are produced by a large language model; they may contain errors, are for reference only, and should be used with caution.
🔴 Please note: never use them in serious academic settings; they are only intended as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-10-25

GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation

Authors:Guangqi Jiang, Haoran Chang, Ri-Zhao Qiu, Yutong Liang, Mazeyu Ji, Jiyue Zhu, Zhao Dong, Xueyan Zou, Xiaolong Wang

This paper presents GSWorld, a robust, photo-realistic simulator for robotics manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates “closing the loop” of developing manipulation policies with reproducible evaluation of policies learned from real-robot data and sim2real policy training without using real robots. To enable photo-realistic rendering of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian Scene Description File), that infuses Gaussian-on-Mesh representation with robot URDF and other objects. With a streamlined reconstruction pipeline, we curate a database of GSDF that contains 3 robot embodiments for single-arm and bimanual manipulation, as well as more than 40 objects. Combining GSDF with physics engines, we demonstrate several immediate interesting applications: (1) learning zero-shot sim2real pixel-to-action manipulation policy with photo-realistic rendering, (2) automated high-quality DAgger data collection for adapting policies to deployment environments, (3) reproducible benchmarking of real-robot manipulation policies in simulation, (4) simulation data collection by virtual teleoperation, and (5) zero-shot sim2real visual reinforcement learning. Website: https://3dgsworld.github.io/.


Paper and Project Links

PDF

Summary

This paper introduces GSWorld, a robust, photo-realistic simulator for robotic manipulation. It combines 3D Gaussian Splatting with physics engines and advocates learning and evaluating policies in simulation, enabling sim2real policy training. A new asset format, GSDF, supports photo-realistic rendering of diverse scenes, and the system is demonstrated on a range of manipulation applications.

Key Takeaways

  1. GSWorld is a simulator for robotic manipulation that combines 3D Gaussian Splatting with physics engines.
  2. The simulator enables sim2real policy training, so policies can be learned and evaluated in simulation.
  3. A new asset format, GSDF (Gaussian Scene Description File), supports photo-realistic rendering of diverse scenes.
  4. GSWorld supports learning zero-shot sim2real pixel-to-action manipulation policies, automated high-quality DAgger data collection, and reproducible benchmarking of real-robot manipulation policies (a minimal GSDF-style sketch follows below).
  5. Simulation data can be collected through virtual teleoperation.
  6. It covers single-arm and bimanual robot embodiments as well as more than 40 objects.
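To make the GSDF idea concrete, here is a minimal, purely illustrative sketch of what such an asset record might hold in Python. The class and field names (GSDFScene, GaussianAsset, splat_path, and so on) are our own assumptions for illustration, not the format actually released by GSWorld.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch of a GSDF-style asset record: a robot URDF plus Gaussian-on-Mesh
# splat files bound to individual links and objects. Names are illustrative guesses.
@dataclass
class GaussianAsset:
    mesh_path: str            # collision/physics mesh consumed by the physics engine
    splat_path: str           # Gaussian-on-Mesh appearance (e.g. a .ply of splats)

@dataclass
class GSDFScene:
    robot_urdf: str                                   # articulated embodiment
    link_assets: Dict[str, GaussianAsset] = field(default_factory=dict)
    object_assets: List[GaussianAsset] = field(default_factory=list)

    def all_splat_paths(self) -> List[str]:
        """Collect every splat file the renderer must load for this scene."""
        return [a.splat_path for a in self.link_assets.values()] + \
               [a.splat_path for a in self.object_assets]

if __name__ == "__main__":
    scene = GSDFScene(robot_urdf="franka_panda.urdf")
    scene.link_assets["panda_hand"] = GaussianAsset("hand.obj", "hand_splats.ply")
    scene.object_assets.append(GaussianAsset("mug.obj", "mug_splats.ply"))
    print(scene.all_splat_paths())
```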


OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects

Authors:Mark He Huang, Lin Geng Foo, Christian Theobalt, Ying Sun, De Wen Soh

Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatial-guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.


Paper and Project Links

PDF NeurIPS 2025 (Spotlight)

Summary

OnlineSplatter is a novel online feed-forward framework that generates high-quality, object-centric 3D Gaussians directly from RGB frames, without camera poses, depth priors, or bundle optimization. It anchors reconstruction on the first frame and progressively refines the object representation through a dense Gaussian primitive field, keeping the computational cost constant regardless of sequence length. Its core contribution is a dual-key memory module that combines latent appearance-geometry keys with explicit directional keys to robustly fuse current-frame features with temporally aggregated object states. Spatially guided memory readout and an efficient sparsification mechanism allow it to handle free-moving objects with comprehensive yet compact coverage. Evaluations on real-world datasets show that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, improving steadily with more observations while keeping memory and runtime constant.

Key Takeaways

  1. OnlineSplatter is an online feed-forward framework that generates high-quality, object-centric 3D Gaussians directly from RGB frames.
  2. The method requires no camera poses, depth priors, or bundle optimization.
  3. Reconstruction is anchored on the first frame and progressively refined through a dense Gaussian primitive field.
  4. A dual-key memory module combines latent appearance-geometry keys with explicit directional keys to robustly fuse information (a toy readout sketch follows below).
  5. Spatially guided memory readout and an efficient sparsification mechanism handle free-moving objects.
  6. OnlineSplatter outperforms other state-of-the-art pose-free reconstruction methods on real-world datasets.
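The dual-key idea can be illustrated with a toy attention-style readout: each memory slot carries an appearance-geometry key and a directional key, and the current-frame query attends over both. The additive scoring, shapes, and random inputs below are our own simplification, not the paper's exact formulation.

```python
import numpy as np

# Toy dual-key memory readout: scores from appearance-geometry keys and directional
# keys are summed, softmax-normalized over memory slots, and used to pool values.
def dual_key_readout(q_app, q_dir, k_app, k_dir, values):
    d = q_app.shape[-1]
    scores = (q_app @ k_app.T + q_dir @ k_dir.T) / np.sqrt(d)   # (Q, M)
    scores -= scores.max(axis=-1, keepdims=True)                 # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                     # softmax over memory slots
    return attn @ values                                         # fused object state per query

rng = np.random.default_rng(0)
Q, M, d = 4, 16, 32
out = dual_key_readout(rng.normal(size=(Q, d)), rng.normal(size=(Q, d)),
                       rng.normal(size=(M, d)), rng.normal(size=(M, d)),
                       rng.normal(size=(M, d)))
print(out.shape)  # (4, 32)
```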


COS3D: Collaborative Open-Vocabulary 3D Segmentation

Authors:Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu

Open-vocabulary 3D segmentation is a fundamental yet challenging task, requiring a mutual understanding of both segmentation and language. However, existing Gaussian-splatting-based methods rely either on a single 3D language field, leading to inferior segmentation, or on pre-computed class-agnostic segmentations, suffering from error accumulation. To address these limitations, we present COS3D, a new collaborative prompt-segmentation framework that contributes to effectively integrating complementary language and segmentation cues throughout its entire pipeline. We first introduce the new concept of collaborative field, comprising an instance field and a language field, as the cornerstone for collaboration. During training, to effectively construct the collaborative field, our key idea is to capture the intrinsic relationship between the instance field and language field, through a novel instance-to-language feature mapping and designing an efficient two-stage training strategy. During inference, to bridge distinct characteristics of the two fields, we further design an adaptive language-to-instance prompt refinement, promoting high-quality prompt-segmentation inference. Extensive experiments not only demonstrate COS3D’s leading performance over existing methods on two widely-used benchmarks but also show its high potential to various applications,~\ie, novel image-based 3D segmentation, hierarchical segmentation, and robotics. The code is publicly available at \href{https://github.com/Runsong123/COS3D}{https://github.com/Runsong123/COS3D}.


Paper and Project Links

PDF NeurIPS 2025. The code is publicly available at \href{https://github.com/Runsong123/COS3D}{https://github.com/Runsong123/COS3D}

Summary
Open-vocabulary 3D segmentation is a fundamental yet challenging task that requires a mutual understanding of both segmentation and language. Existing Gaussian-splatting-based methods have clear limitations, and COS3D addresses them by introducing the new concept of a collaborative field, comprising an instance field and a language field, to effectively integrate complementary language and segmentation cues. The collaborative field is built through an instance-to-language feature mapping and an efficient two-stage training strategy, while an adaptive language-to-instance prompt refinement at inference time enables high-quality prompt-segmentation. Experiments show that COS3D leads existing methods on two widely used benchmarks and holds strong potential for applications such as novel image-based 3D segmentation, hierarchical segmentation, and robotics. The code is publicly available on GitHub.

Key Takeaways

  • Open-vocabulary 3D segmentation requires a deep understanding of both segmentation and language.
  • Existing Gaussian-splatting-based methods either rely on a single 3D language field or on pre-computed class-agnostic segmentations, leading to inferior segmentation or error accumulation.
  • COS3D introduces the new concept of a collaborative field to integrate language and segmentation cues.
  • The collaborative field is built via an instance-to-language feature mapping and a two-stage training strategy (a minimal mapping sketch follows below).
  • At inference, an adaptive language-to-instance prompt refinement enables high-quality prompt-segmentation.
  • COS3D achieves leading performance and applies to novel image-based 3D segmentation, hierarchical segmentation, and robotics.
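As a rough picture of what an instance-to-language mapping can look like, the sketch below projects per-Gaussian instance features into a language embedding space and matches them against a text query by cosine similarity. The two-layer MLP, the 512-dimensional embedding, and the thresholding rule are illustrative assumptions, not COS3D's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal instance-to-language mapping sketch: per-Gaussian instance features are
# projected to a language space and scored against a (stand-in) text embedding.
class InstanceToLanguage(nn.Module):
    def __init__(self, inst_dim=16, lang_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(inst_dim, 128), nn.ReLU(),
                                 nn.Linear(128, lang_dim))

    def forward(self, inst_feat):                    # (N, inst_dim) per-Gaussian features
        return F.normalize(self.mlp(inst_feat), dim=-1)

mapper = InstanceToLanguage()
inst_feat = torch.randn(1000, 16)                    # 1000 Gaussians
text_emb = F.normalize(torch.randn(512), dim=-1)     # stand-in for a CLIP text embedding
lang_feat = mapper(inst_feat)
relevance = lang_feat @ text_emb                     # cosine similarity per Gaussian
mask = relevance > relevance.mean() + relevance.std()  # crude open-vocabulary selection
print(int(mask.sum()), "Gaussians matched the query")
```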


Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses

Authors:Damian Bowness, Charalambos Poullis

When viewing a 3D Gaussian Splatting (3DGS) model from camera positions significantly outside the training data distribution, substantial visual noise commonly occurs. These artifacts result from the lack of training data in these extrapolated regions, leading to uncertain density, color, and geometry predictions from the model. To address this issue, we propose a novel real-time render-aware filtering method. Our approach leverages sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. This filtering method directly addresses the core issue of generative uncertainty, allowing 3D reconstruction systems to maintain high visual fidelity even when users freely navigate outside the original training viewpoints. Experimental evaluation demonstrates that our method substantially improves visual quality, realism, and consistency compared to existing Neural Radiance Field (NeRF)-based approaches such as BayesRays. Critically, our filter seamlessly integrates into existing 3DGS rendering pipelines in real-time, unlike methods that require extensive post-hoc retraining or fine-tuning. Code and results at https://damian-bowness.github.io/EV3DGS


Paper and Project Links

PDF

Summary

When a 3DGS model is viewed from camera positions far outside the training data distribution, substantial visual noise appears. These artifacts stem from the lack of training data in the extrapolated regions, which makes the model's density, color, and geometry predictions uncertain. To address this, the authors propose a real-time, render-aware filtering method that uses sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. The filter directly tackles generative uncertainty, so users can navigate freely beyond the original training viewpoints while retaining high visual fidelity. Experiments show clear gains in visual quality, realism, and consistency over NeRF-based approaches such as BayesRays, and the filter integrates seamlessly into existing real-time 3DGS rendering pipelines without post-hoc retraining or fine-tuning. Code and results are available at the project link.

Key Takeaways

  1. When viewed from camera positions outside the training distribution, 3D Gaussian Splatting (3DGS) models exhibit visual noise caused by uncertain predictions in regions lacking training data.
  2. The proposed real-time render-aware filter addresses this generative uncertainty using sensitivity scores derived from intermediate gradients, explicitly targeting instabilities from anisotropic orientations.
  3. The method significantly improves visual quality, realism, and consistency compared with NeRF-based approaches such as BayesRays.
  4. The filter integrates seamlessly into existing real-time 3DGS rendering pipelines, unlike methods that require extensive post-hoc retraining or fine-tuning (a simple thresholding sketch follows below).
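The filtering step itself can be pictured as view-time thresholding of per-Gaussian scores: splats whose score exceeds a quantile are dropped for the current out-of-distribution view. How the scores are derived from intermediate gradients is the paper's contribution; in the sketch below they are random stand-ins.

```python
import numpy as np

# Minimal render-time filtering by per-Gaussian sensitivity: the most unstable splats
# have their opacity zeroed so they no longer contribute to the current view.
def filter_splats(opacities, sensitivity, keep_quantile=0.9):
    thresh = np.quantile(sensitivity, keep_quantile)
    filtered = opacities.copy()
    filtered[sensitivity > thresh] = 0.0     # unstable splats no longer contribute
    return filtered

rng = np.random.default_rng(1)
opacities = rng.uniform(0.1, 1.0, size=50_000)
sensitivity = rng.gamma(shape=2.0, scale=1.0, size=50_000)   # stand-in scores
print("kept:", int((filter_splats(opacities, sensitivity) > 0).sum()), "of", opacities.size)
```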


Re-Activating Frozen Primitives for 3D Gaussian Splatting

Authors:Yuxin Cheng, Binxiao Huang, Wenyong Zhou, Taiqiang Wu, Zhengwu Liu, Graziano Chesi, Ngai Wong

3D Gaussian Splatting (3D-GS) achieves real-time photorealistic novel view synthesis, yet struggles with complex scenes due to over-reconstruction artifacts, manifesting as local blurring and needle-shape distortions. While recent approaches attribute these issues to insufficient splitting of large-scale Gaussians, we identify two fundamental limitations: gradient magnitude dilution during densification and the primitive frozen phenomenon, where essential Gaussian densification is inhibited in complex regions while suboptimally scaled Gaussians become trapped in local optima. To address these challenges, we introduce ReAct-GS, a method founded on the principle of re-activation. Our approach features: (1) an importance-aware densification criterion incorporating $\alpha$-blending weights from multiple viewpoints to re-activate stalled primitive growth in complex regions, and (2) a re-activation mechanism that revitalizes frozen primitives through adaptive parameter perturbations. Comprehensive experiments across diverse real-world datasets demonstrate that ReAct-GS effectively eliminates over-reconstruction artifacts and achieves state-of-the-art performance on standard novel view synthesis metrics while preserving intricate geometric details. Additionally, our re-activation mechanism yields consistent improvements when integrated with other 3D-GS variants such as Pixel-GS, demonstrating its broad applicability.


Paper and Project Links

PDF

Summary

This paper targets the over-reconstruction artifacts of 3D Gaussian Splatting (3D-GS) in complex scenes, which appear as local blurring and needle-shaped distortions, and proposes ReAct-GS. ReAct-GS introduces an importance-aware densification criterion and a re-activation mechanism that together remove these artifacts and improve reconstruction of complex regions. Experiments on real-world datasets confirm state-of-the-art novel view synthesis quality while preserving intricate geometric detail, and the re-activation mechanism also yields consistent gains when combined with other 3D-GS variants such as Pixel-GS.

Key Takeaways

  1. ReAct-GS studies the over-reconstruction artifacts of 3D Gaussian Splatting (3D-GS) in complex scenes, which manifest as local blurring and needle-shaped distortions.
  2. It proposes an importance-aware densification criterion that incorporates $\alpha$-blending weights from multiple viewpoints to re-activate stalled primitive growth (a minimal accumulation sketch follows below).
  3. It introduces a re-activation mechanism that revitalizes frozen primitives through adaptive parameter perturbations, effectively addressing reconstruction in complex regions.
  4. Experiments show that ReAct-GS eliminates over-reconstruction artifacts, preserves fine geometric detail, and reaches state-of-the-art novel view synthesis performance across multiple real-world datasets.
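The densification test can be pictured as follows: per-Gaussian alpha-blending weights are accumulated over several training views, and primitives that look important but receive tiny positional gradients ("frozen") get a small parameter perturbation instead of a split. The thresholds, noise scale, and random inputs below are illustrative assumptions, not ReAct-GS's exact criterion.

```python
import numpy as np

# Minimal importance-aware densification / re-activation sketch.
def densify_or_reactivate(blend_weights, grad_norms, means,
                          importance_thresh=0.5, grad_thresh=2e-4, noise=1e-3):
    importance = blend_weights.mean(axis=0)                  # (N,) averaged over views
    frozen = (importance > importance_thresh) & (grad_norms < grad_thresh)
    to_split = (importance > importance_thresh) & (grad_norms >= grad_thresh)
    means = means.copy()
    means[frozen] += noise * np.random.standard_normal(means[frozen].shape)  # re-activate
    return to_split, frozen, means

V, N = 8, 10_000
rng = np.random.default_rng(2)
splits, frozen, new_means = densify_or_reactivate(
    rng.uniform(size=(V, N)), rng.exponential(1e-4, size=N), rng.normal(size=(N, 3)))
print("split:", int(splits.sum()), "re-activated:", int(frozen.sum()))
```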


VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction

Authors:Junhong Lin, Kangli Wang, Shunzhou Wang, Songlin Fan, Ge Li, Wei Gao

Feed-forward surround-view autonomous driving scene reconstruction offers fast, generalizable inference ability, which faces the core challenge of ensuring generalization while elevating novel view quality. Due to the surround-view with minimal overlap regions, existing methods typically fail to ensure geometric consistency and reconstruction quality for novel views. To tackle this tension, we claim that geometric information must be learned explicitly, and the resulting features should be leveraged to guide the elevating of semantic quality in novel views. In this paper, we introduce \textbf{Visual Gaussian Driving (VGD)}, a novel feed-forward end-to-end learning framework designed to address this challenge. To achieve generalizable geometric estimation, we design a lightweight variant of the VGGT architecture to efficiently distill its geometric priors from the pre-trained VGGT to the geometry branch. Furthermore, we design a Gaussian Head that fuses multi-scale geometry tokens to predict Gaussian parameters for novel view rendering, which shares the same patch backbone as the geometry branch. Finally, we integrate multi-scale features from both geometry and Gaussian head branches to jointly supervise a semantic refinement model, optimizing rendering quality through feature-consistent learning. Experiments on nuScenes demonstrate that our approach significantly outperforms state-of-the-art methods in both objective metrics and subjective quality under various settings, which validates VGD’s scalability and high-fidelity surround-view reconstruction.


Paper and Project Links

PDF 10 pages, 7 figures

Summary

This paper proposes Visual Gaussian Driving (VGD), a feed-forward, end-to-end learning framework for surround-view autonomous driving scene reconstruction that tackles the difficulty of generalizable geometric estimation. VGD distills geometric priors from a pre-trained VGGT into a lightweight geometry branch, and a Gaussian Head, sharing the same patch backbone, fuses multi-scale geometry tokens to predict the Gaussian parameters used for novel view rendering. Multi-scale features from both branches jointly supervise a semantic refinement model, optimizing rendering quality through feature-consistent learning. Experiments on nuScenes show that the approach outperforms existing methods in both objective metrics and subjective quality, validating VGD's scalability and high-fidelity surround-view reconstruction.

Key Takeaways

  • VGD addresses the core challenge of surround-view driving reconstruction: ensuring generalizable geometric estimation while elevating novel view quality.
  • Geometric information is learned explicitly, and the resulting features guide the semantic quality of novel views.
  • A lightweight VGGT variant distills and shares geometric priors from the pre-trained VGGT.
  • A Gaussian Head fuses multi-scale geometry tokens to predict the Gaussian parameters for novel view rendering (a toy head sketch follows below).
  • Multi-scale features from the geometry and Gaussian Head branches jointly supervise a semantic refinement model to optimize rendering quality.
  • Experiments show VGD significantly outperforms existing methods under various settings, demonstrating scalability and high-fidelity surround-view reconstruction.
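To give a rough sense of what a "Gaussian head" does, the toy sketch below fuses multi-scale tokens and maps each token to a small set of Gaussian parameters (position offset, scale, rotation quaternion, opacity, color). The additive fusion and the 14-dimensional output split are our own assumptions, not VGD's actual design.

```python
import torch
import torch.nn as nn

# Toy Gaussian head: project each token scale to a shared width, sum, then regress
# per-token Gaussian parameters with standard activations.
class ToyGaussianHead(nn.Module):
    def __init__(self, dims=(64, 128), hidden=128):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.out = nn.Linear(hidden, 3 + 3 + 4 + 1 + 3)        # 14 params per Gaussian

    def forward(self, tokens):                                  # list of (B, N, d_i)
        fused = sum(p(t) for p, t in zip(self.proj, tokens))    # simple additive fusion
        p = self.out(fused)
        xyz, log_scale, rot, opacity, rgb = p.split([3, 3, 4, 1, 3], dim=-1)
        return {"xyz": xyz,
                "scale": log_scale.exp(),
                "rot": nn.functional.normalize(rot, dim=-1),
                "opacity": torch.sigmoid(opacity),
                "rgb": torch.sigmoid(rgb)}

head = ToyGaussianHead()
out = head([torch.randn(2, 1024, 64), torch.randn(2, 1024, 128)])
print({k: tuple(v.shape) for k, v in out.items()})
```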


Advances in 4D Representation: Geometry, Motion, and Interaction

Authors:Mingrui Zhao, Sauradip Nag, Kai Wang, Aditya Vora, Guangda Ji, Peter Chun, Ali Mahdavi-Amiri, Hao Zhang

We present a survey on 4D generation and reconstruction, a fast-evolving subfield of computer graphics whose developments have been propelled by recent advances in neural fields, geometric and motion deep learning, as well 3D generative artificial intelligence (GenAI). While our survey is not the first of its kind, we build our coverage of the domain from a unique and distinctive perspective of 4D representations/}, to model 3D geometry evolving over time while exhibiting motion and interaction. Specifically, instead of offering an exhaustive enumeration of many works, we take a more selective approach by focusing on representative works to highlight both the desirable properties and ensuing challenges of each representation under different computation, application, and data scenarios. The main take-away message we aim to convey to the readers is on how to select and then customize the appropriate 4D representations for their tasks. Organizationally, we separate the 4D representations based on three key pillars: geometry, motion, and interaction. Our discourse will not only encompass the most popular representations of today, such as neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS), but also bring attention to relatively under-explored representations in the 4D context, such as structured models and long-range motions. Throughout our survey, we will reprise the role of large language models (LLMs) and video foundational models (VFMs) in a variety of 4D applications, while steering our discussion towards their current limitations and how they can be addressed. We also provide a dedicated coverage on what 4D datasets are currently available, as well as what is lacking, in driving the subfield forward. Project page:https://mingrui-zhao.github.io/4DRep-GMI/


Paper and Project Links

PDF 21 pages. Project Page: https://mingrui-zhao.github.io/4DRep-GMI/

Summary
This survey reviews 4D generation and reconstruction, a fast-evolving subfield of computer graphics propelled by advances in neural fields, geometric and motion deep learning, and 3D generative AI (GenAI). It approaches the domain from the distinctive perspective of 4D representations and aims to show readers how to select and customize the appropriate representation for their tasks. The coverage spans popular representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) as well as relatively under-explored ones, and it discusses the role, current limitations, and open challenges of large language models (LLMs) and video foundation models (VFMs) across 4D applications. It concludes with an overview of the 4D datasets that are currently available and of what is still missing to drive the subfield forward.

Key Takeaways

  • The survey outlines the current state and trends of 4D generation and reconstruction.
  • It focuses on representative works to highlight the desirable properties and challenges of each representation under different computation, application, and data scenarios.
  • It stresses the importance of choosing the right 4D representation and organizes them around three key pillars: geometry, motion, and interaction.
  • It covers popular representations such as NeRF and 3D Gaussian Splatting, and draws attention to under-explored 4D representations such as structured models and long-range motions.
  • It discusses the role and current limitations of large language models and video foundation models in various 4D applications.


MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting

Authors:In-Hwan Jin, Hyeongju Mun, Joonsoo Kim, Kugjin Yun, Kyeongbo Kong

Recent advances in dynamic scene reconstruction have significantly benefited from 3D Gaussian Splatting, yet existing methods show inconsistent performance across diverse scenes, indicating no single approach effectively handles all dynamic challenges. To overcome these limitations, we propose Mixture of Experts for Dynamic Gaussian Splatting (MoE-GS), a unified framework integrating multiple specialized experts via a novel Volume-aware Pixel Router. Our router adaptively blends expert outputs by projecting volumetric Gaussian-level weights into pixel space through differentiable weight splatting, ensuring spatially and temporally coherent results. Although MoE-GS improves rendering quality, the increased model capacity and reduced FPS are inherent to the MoE architecture. To mitigate this, we explore two complementary directions: (1) single-pass multi-expert rendering and gate-aware Gaussian pruning, which improve efficiency within the MoE framework, and (2) a distillation strategy that transfers MoE performance to individual experts, enabling lightweight deployment without architectural changes. To the best of our knowledge, MoE-GS is the first approach incorporating Mixture-of-Experts techniques into dynamic Gaussian splatting. Extensive experiments on the N3V and Technicolor datasets demonstrate that MoE-GS consistently outperforms state-of-the-art methods with improved efficiency. Video demonstrations are available at https://anonymous.4open.science/w/MoE-GS-68BA/.


Paper and Project Links

PDF

Summary

To address the inconsistent performance of existing dynamic scene reconstruction methods, the authors propose MoE-GS, a Mixture-of-Experts approach to dynamic Gaussian Splatting. It integrates multiple specialized experts through a Volume-aware Pixel Router that adaptively blends expert outputs, ensuring spatially and temporally coherent results. To improve efficiency, the authors also explore single-pass multi-expert rendering and gate-aware Gaussian pruning. MoE-GS outperforms state-of-the-art methods on the N3V and Technicolor datasets.

Key Takeaways

  1. Existing dynamic scene reconstruction methods perform inconsistently across scenes; no single approach handles all dynamic challenges well.
  2. MoE-GS integrates multiple specialized experts for dynamic scenes within a Mixture-of-Experts Gaussian Splatting framework.
  3. A novel Volume-aware Pixel Router adaptively blends the expert outputs.
  4. Differentiable weight splatting projects volumetric, Gaussian-level weights into pixel space (a toy per-pixel blending sketch follows below).
  5. While MoE-GS improves rendering quality, the increased model capacity and reduced FPS are inherent to the MoE architecture.
  6. Single-pass multi-expert rendering and gate-aware Gaussian pruning improve efficiency within the MoE framework, and a distillation strategy transfers MoE performance to individual experts for lightweight deployment.
  7. MoE-GS is the first approach to bring Mixture-of-Experts techniques into dynamic Gaussian splatting and outperforms state-of-the-art methods on the N3V and Technicolor datasets.
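The final blending step can be pictured per pixel: each expert renders the frame, per-pixel routing logits are softmax-normalized, and the output is their weighted sum. In the sketch below the logits are random stand-ins; producing them by splatting Gaussian-level weights into pixel space is the paper's actual mechanism.

```python
import numpy as np

# Toy pixel-level expert blending for a Mixture-of-Experts renderer.
def blend_experts(expert_images, pixel_logits):
    # expert_images: (E, H, W, 3), pixel_logits: (E, H, W)
    logits = pixel_logits - pixel_logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)                # softmax over experts
    return (weights[..., None] * expert_images).sum(axis=0)      # (H, W, 3)

E, H, W = 3, 64, 64
rng = np.random.default_rng(3)
frame = blend_experts(rng.uniform(size=(E, H, W, 3)), rng.normal(size=(E, H, W)))
print(frame.shape, float(frame.min()) >= 0.0, float(frame.max()) <= 1.0)
```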


GRASPLAT: Enabling dexterous grasping through novel view synthesis

Authors:Matteo Bortolon, Nuno Ferreira Duarte, Plinio Moreno, Fabio Poiesi, José Santos-Victor, Alessio Del Bue

Achieving dexterous robotic grasping with multi-fingered hands remains a significant challenge. While existing methods rely on complete 3D scans to predict grasp poses, these approaches face limitations due to the difficulty of acquiring high-quality 3D data in real-world scenarios. In this paper, we introduce GRASPLAT, a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. To achieve this, we utilize 3D Gaussian Splatting to generate high-fidelity novel views of real hand-object interactions, enabling end-to-end training with RGB data. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images. We conduct extensive experiments on both synthetic and real-world grasping datasets, demonstrating that GRASPLAT improves grasp success rates up to 36.9% over existing image-based methods. Project page: https://mbortolon97.github.io/grasplat/


Paper and Project Links

PDF Accepted IROS 2025

Summary

GRASPLAT is a novel grasping framework trained solely on RGB images: by synthesizing physically plausible images of a hand grasping an object, it regresses the hand joints needed for a successful grasp. 3D Gaussian Splatting is used to generate high-fidelity novel views of real hand-object interactions, enabling end-to-end training with RGB data, and a photometric loss refines grasp predictions by minimizing the discrepancy between rendered and real images, significantly improving grasp success rates.

Key Takeaways

  1. GRASPLAT is a novel RGB-based robotic grasping framework that does not require complete 3D scans.
  2. The model is trained by synthesizing images of a hand grasping an object and regressing the corresponding hand joint positions.
  3. 3D Gaussian Splatting generates high-fidelity novel views of real hand-object interactions for end-to-end training.
  4. A photometric loss refines grasp predictions and further improves the success rate (a toy refinement sketch follows below).
  5. Extensive experiments on synthetic and real-world grasping datasets demonstrate its effectiveness.
  6. GRASPLAT improves grasp success rates over existing image-based methods by up to 36.9%.
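The photometric refinement idea reduces to gradient descent on the joint vector of a differentiable renderer against the real photo. The linear "renderer" below is only a placeholder for the 3DGS-based hand-object renderer described in the paper; everything else (sizes, learning rate, target) is synthetic.

```python
import torch

# Toy photometric grasp refinement: minimize the L2 gap between a rendered image of
# the hand at the current joints and the (here, synthetic) real image.
torch.manual_seed(0)
H, W, J = 32, 32, 16
basis = torch.randn(H * W * 3, J)                       # stand-in rendering operator

def render(joints):                                     # (J,) -> (H, W, 3), differentiable
    return torch.sigmoid(basis @ joints).reshape(H, W, 3)

target_joints = torch.randn(J)
real_image = render(target_joints).detach()             # pretend this is the real photo

joints = torch.zeros(J, requires_grad=True)             # initial grasp prediction
opt = torch.optim.Adam([joints], lr=0.1)
for step in range(200):
    opt.zero_grad()
    loss = ((render(joints) - real_image) ** 2).mean()  # photometric loss
    loss.backward()
    opt.step()
print("final photometric loss:", float(loss))
```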


Moving Light Adaptive Colonoscopy Reconstruction via Illumination-Attenuation-Aware 3D Gaussian Splatting

Authors:Hao Wang, Ying Zhou, Haoyu Zhao, Rui Wang, Qiang Hu, Xing Zhang, Qiang Li, Zhiwei Wang

3D Gaussian Splatting (3DGS) has emerged as a pivotal technique for real-time view synthesis in colonoscopy, enabling critical applications such as virtual colonoscopy and lesion tracking. However, the vanilla 3DGS assumes static illumination and that observed appearance depends solely on viewing angle, which causes incompatibility with the photometric variations in colonoscopic scenes induced by dynamic light source/camera. This mismatch forces most 3DGS methods to introduce structure-violating vaporous Gaussian blobs between the camera and tissues to compensate for illumination attenuation, ultimately degrading the quality of 3D reconstructions. Previous works only consider the illumination attenuation caused by light distance, ignoring the physical characters of light source and camera. In this paper, we propose ColIAGS, an improved 3DGS framework tailored for colonoscopy. To mimic realistic appearance under varying illumination, we introduce an Improved Appearance Modeling with two types of illumination attenuation factors, which enables Gaussians to adapt to photometric variations while preserving geometry accuracy. To ensure the geometry approximation condition of appearance modeling, we propose an Improved Geometry Modeling using high-dimensional view embedding to enhance Gaussian geometry attribute prediction. Furthermore, another cosine embedding input is leveraged to generate illumination attenuation solutions in an implicit manner. Comprehensive experimental results on standard benchmarks demonstrate that our proposed ColIAGS achieves the dual capabilities of novel view synthesis and accurate geometric reconstruction. It notably outperforms other state-of-the-art methods by achieving superior rendering fidelity while significantly reducing Depth MSE. Code will be available.


Paper and Project Links

PDF

Summary

This paper proposes ColIAGS, an improved 3D Gaussian Splatting framework tailored for real-time view synthesis in colonoscopy. By improving both the appearance model and the geometry model, it mimics realistic appearance under varying illumination, overcoming the illumination-attenuation problem of vanilla 3DGS and yielding high-quality 3D reconstruction. High-dimensional view embeddings improve Gaussian geometry attribute prediction, and a cosine embedding input generates illumination attenuation solutions in an implicit manner. Experiments on standard benchmarks show that ColIAGS achieves both novel view synthesis and accurate geometric reconstruction, with superior rendering fidelity and a significantly reduced Depth MSE.

Key Takeaways

  • 3DGS is a key technique for real-time view synthesis in colonoscopy, enabling applications such as virtual colonoscopy and lesion tracking.
  • Vanilla 3DGS assumes static illumination and cannot cope with the photometric variations of colonoscopic scenes caused by the moving light source and camera.
  • ColIAGS improves the appearance model with two types of illumination attenuation factors so Gaussians adapt to photometric variation while preserving geometric accuracy (a simple attenuation sketch follows below).
  • This avoids the structure-violating vaporous Gaussian blobs other methods introduce to compensate for illumination attenuation.
  • High-dimensional view embeddings and a cosine embedding input improve geometry attribute prediction and generate illumination attenuation solutions implicitly.
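For intuition, the sketch below applies a textbook light-falloff model for a light co-located with the camera: observed splat radiance is scaled by an inverse-square distance term and a cosine angular term. This analytic form is only illustrative; ColIAGS learns its two attenuation factors rather than fixing them.

```python
import numpy as np

# Simple distance + angular illumination attenuation applied to per-splat base color.
def attenuated_color(base_rgb, splat_pos, cam_pos, cam_dir, k=1.0):
    v = splat_pos - cam_pos
    dist = np.linalg.norm(v, axis=-1, keepdims=True)
    cos_theta = np.clip((v / dist) @ cam_dir, 0.0, 1.0)[..., None]
    attenuation = cos_theta / (1.0 + k * dist ** 2)          # angular + distance falloff
    return base_rgb * attenuation

rng = np.random.default_rng(4)
rgb = attenuated_color(rng.uniform(size=(5, 3)),
                       rng.normal(size=(5, 3)) + np.array([0.0, 0.0, 2.0]),
                       cam_pos=np.zeros(3), cam_dir=np.array([0.0, 0.0, 1.0]))
print(np.round(rgb, 3))
```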


Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications

Authors:Naruya Kondo, Yuto Asano, Yoichi Ochiai

We present Instant Skinned Gaussian Avatars, a real-time and cross-platform 3D avatar system. Many approaches have been proposed to animate Gaussian Splatting, but they often require camera arrays, long preprocessing times, or high-end GPUs. Some methods attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance but sacrificing visual fidelity. In contrast, our system efficiently animates Gaussian Splatting by leveraging parallel splat-wise processing to dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity. From smartphone-based 3D scanning to on-device preprocessing, the entire process takes just around five minutes, with the avatar generation step itself completed in only about 30 seconds. Our system enables users to instantly transform their real-world appearance into a 3D avatar, making it ideal for seamless integration with social media and metaverse applications. Website: https://gaussian-vrm.github.io/


Paper and Project Links

PDF Accepted to SUI 2025 Demo Track

Summary
Instant Skinned Gaussian Avatars is a real-time, cross-platform 3D avatar system. By leveraging parallel splat-wise processing, it animates Gaussian Splatting so that the splats dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity. From smartphone-based 3D scanning to on-device preprocessing, the whole pipeline takes only about five minutes, with the avatar generation step itself completed in roughly 30 seconds. The system lets users instantly turn their real-world appearance into a 3D avatar, making it well suited for seamless integration with social media and metaverse applications.

Key Takeaways

  1. The system generates real-time, cross-platform 3D avatars from Gaussian Splatting.
  2. Parallel splat-wise processing animates the Gaussian splats efficiently.
  3. The splats dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity (a minimal skinning sketch follows below).
  4. Smartphone-based 3D scanning and on-device preprocessing enable fast avatar creation.
  5. The entire process takes about five minutes, with the avatar generation step itself taking only about 30 seconds.
  6. The system is well suited for seamless integration with social media and metaverse applications.
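One standard way to make splats follow a skinned mesh is linear blend skinning applied to the Gaussian centers, shown below as a minimal sketch. Bone count, weights, and transforms are random stand-ins, and a full system would also rotate each splat's covariance, which is omitted here for brevity.

```python
import numpy as np

# Minimal linear blend skinning of Gaussian means: each splat's center is deformed by
# the blend of its bound bones' rigid transforms.
def skin_splat_means(means, weights, rotations, translations):
    # means (N,3), weights (N,B), rotations (B,3,3), translations (B,3)
    per_bone = np.einsum('bij,nj->nbi', rotations, means) + translations[None]  # (N,B,3)
    return (weights[..., None] * per_bone).sum(axis=1)                          # (N,3)

rng = np.random.default_rng(5)
N, B = 20_000, 24
w = rng.uniform(size=(N, B))
w /= w.sum(axis=1, keepdims=True)                  # normalized skinning weights
R = np.stack([np.eye(3)] * B)                      # identity rotations for the demo
t = rng.normal(scale=0.01, size=(B, 3))
print(skin_splat_means(rng.normal(size=(N, 3)), w, R, t).shape)
```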


Pose-free 3D Gaussian splatting via shape-ray estimation

Authors:Youngju Na, Taeyeon Kim, Jumin Lee, Kyu Beom Han, Woo Jae Kim, Sung-eui Yoon

While generalizable 3D Gaussian splatting enables efficient, high-quality rendering of unseen scenes, it heavily depends on precise camera poses for accurate geometry. In real-world scenarios, obtaining accurate poses is challenging, leading to noisy pose estimates and geometric misalignments. To address this, we introduce SHARE, a pose-free, feed-forward Gaussian splatting framework that overcomes these ambiguities by joint shape and camera rays estimation. Instead of relying on explicit 3D transformations, SHARE builds a pose-aware canonical volume representation that seamlessly integrates multi-view information, reducing misalignment caused by inaccurate pose estimates. Additionally, anchor-aligned Gaussian prediction enhances scene reconstruction by refining local geometry around coarse anchors, allowing for more precise Gaussian placement. Extensive experiments on diverse real-world datasets show that our method achieves robust performance in pose-free generalizable Gaussian splatting. Code is avilable at https://github.com/youngju-na/SHARE


Paper and Project Links

PDF ICIP 2025 (Best Student Paper Award) Code available at: https://github.com/youngju-na/SHARE

Summary

SHARE is a pose-free, feed-forward Gaussian splatting framework that addresses the geometric misalignments caused by inaccurate camera pose estimates in real-world scenes. By building a pose-aware canonical volume representation that seamlessly integrates multi-view information, it improves the accuracy and robustness of scene reconstruction, and anchor-aligned Gaussian prediction further refines local geometry around coarse anchors for more precise Gaussian placement.

Key Takeaways

  • SHARE is a pose-free, feed-forward Gaussian splatting framework that tackles geometric misalignment caused by noisy pose estimates in real-world scenes.
  • Instead of relying on explicit 3D transformations, it builds a pose-aware canonical volume representation that integrates multi-view information.
  • Anchor-aligned Gaussian prediction refines local geometry around coarse anchors, enabling more precise Gaussian placement.
  • Extensive experiments on diverse real-world datasets show robust performance for pose-free generalizable Gaussian splatting.
  • The method is publicly available, with code on GitHub.
  • Joint shape and camera-ray estimation removes the dependence on precise camera poses.


Discretized Gaussian Representation for Tomographic Reconstruction

Authors:Shaokai Wu, Yuxiang Lu, Yapan Guo, Wei Ji, Suizhi Huang, Fengyu Yang, Shalayiding Sirejiding, Qichen He, Jing Tong, Yanbiao Ji, Yue Ding, Hongtao Lu

Computed Tomography (CT) enables detailed cross-sectional imaging but continues to face challenges in balancing reconstruction quality and computational efficiency. While deep learning-based methods have significantly improved image quality and noise reduction, they typically require large-scale training data and intensive computation. Recent advances in scene reconstruction, such as Neural Radiance Fields and 3D Gaussian Splatting, offer alternative perspectives but are not well-suited for direct volumetric CT reconstruction. In this work, we propose Discretized Gaussian Representation (DGR), a novel framework that reconstructs the 3D volume directly using a set of discretized Gaussian functions in an end-to-end manner. To further enhance efficiency, we introduce Fast Volume Reconstruction, a highly parallelized technique that aggregates Gaussian contributions into the voxel grid with minimal overhead. Extensive experiments on both real-world and synthetic datasets demonstrate that DGR achieves superior reconstruction quality and runtime performance across various CT reconstruction scenarios. Our code is publicly available at https://github.com/wskingdom/DGR.


Paper and Project Links

PDF Accepted to ICCV 2025

Summary

This paper proposes the Discretized Gaussian Representation (DGR), a framework that reconstructs the 3D volume directly, in an end-to-end manner, from a set of discretized Gaussian functions. To improve efficiency, it introduces Fast Volume Reconstruction, a highly parallelized technique that aggregates Gaussian contributions into the voxel grid with minimal overhead. Experiments show that DGR achieves superior reconstruction quality and runtime performance across a variety of CT reconstruction scenarios.

Key Takeaways

  1. The DGR framework reconstructs the 3D volume directly from a set of discretized Gaussian functions, enabling high-quality CT reconstruction.
  2. A highly parallelized Fast Volume Reconstruction technique aggregates Gaussian contributions into the voxel grid with minimal overhead (a minimal aggregation sketch follows below).
  3. Experiments on real-world and synthetic datasets demonstrate superior performance across various CT reconstruction scenarios.
  4. The code is publicly available for researchers and engineers.
  5. Deep-learning-based methods improve image quality and noise reduction but typically require large-scale training data and intensive computation, whereas DGR reconstructs the volume directly from discretized Gaussians.
  6. DGR balances reconstruction quality and computational efficiency better than competing approaches.
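The aggregation step can be illustrated by evaluating isotropic Gaussians on a voxel grid and summing their contributions. The brute-force loop below is only meant to show the aggregation itself; a practical implementation would restrict each Gaussian to a local window and parallelize on the GPU, as the paper's Fast Volume Reconstruction does.

```python
import numpy as np

# Minimal aggregation of discretized Gaussian contributions into a voxel grid.
def splat_gaussians_to_volume(means, sigmas, amps, grid_res=32):
    axis = np.linspace(0.0, 1.0, grid_res)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    voxels = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)        # (R^3, 3)
    vol = np.zeros(voxels.shape[0])
    for mu, sigma, a in zip(means, sigmas, amps):
        d2 = ((voxels - mu) ** 2).sum(axis=1)
        vol += a * np.exp(-0.5 * d2 / sigma ** 2)
    return vol.reshape(grid_res, grid_res, grid_res)

rng = np.random.default_rng(6)
volume = splat_gaussians_to_volume(rng.uniform(size=(50, 3)),
                                   rng.uniform(0.02, 0.08, size=50),
                                   rng.uniform(0.5, 1.0, size=50))
print(volume.shape, float(volume.max()))
```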



Author: Kedreamix
Copyright: Unless otherwise stated, all posts on this blog are released under the CC BY 4.0 license. Please credit Kedreamix when reposting!