
3DGS


⚠️ All of the summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Please note: do not rely on these summaries in serious academic settings; use them only as an initial screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated on 2025-10-11

ReSplat: Learning Recurrent Gaussian Splats

Authors: Haofei Xu, Daniel Barath, Andreas Geiger, Marc Pollefeys

While feed-forward Gaussian splatting models provide computational efficiency and effectively handle sparse input settings, their performance is fundamentally limited by the reliance on a single forward pass during inference. We propose ReSplat, a feed-forward recurrent Gaussian splatting model that iteratively refines 3D Gaussians without explicitly computing gradients. Our key insight is that the Gaussian splatting rendering error serves as a rich feedback signal, guiding the recurrent network to learn effective Gaussian updates. This feedback signal naturally adapts to unseen data distributions at test time, enabling robust generalization. To initialize the recurrent process, we introduce a compact reconstruction model that operates in a $16 \times$ subsampled space, producing $16 \times$ fewer Gaussians than previous per-pixel Gaussian models. This substantially reduces computational overhead and allows for efficient Gaussian updates. Extensive experiments across varying numbers of input views (2, 8, 16), resolutions ($256 \times 256$ to $540 \times 960$), and datasets (DL3DV and RealEstate10K) demonstrate that our method achieves state-of-the-art performance while significantly reducing the number of Gaussians and improving the rendering speed. Our project page is at https://haofeixu.github.io/resplat/.


Paper & Project Links

PDF Project page: https://haofeixu.github.io/resplat/

Summary
ReSplat is a feed-forward recurrent Gaussian splatting model that iteratively refines 3D Gaussians without explicitly computing gradients. It uses the Gaussian splatting rendering error as a rich feedback signal that guides the network to learn effective Gaussian updates, and this signal naturally adapts to unseen data distributions, enabling robust generalization. A compact reconstruction model initializes the recurrent process in a $16\times$ subsampled space, producing $16\times$ fewer Gaussians than previous per-pixel Gaussian models, which substantially reduces computational overhead and enables efficient Gaussian updates.

Key Takeaways

  1. ReSplat is a feed-forward recurrent Gaussian splatting model that iteratively refines 3D Gaussians.
  2. The model uses the Gaussian splatting rendering error as a feedback signal that guides the network to learn effective Gaussian updates.
  3. The feedback signal naturally adapts to unseen data distributions, enabling robust generalization.
  4. A compact reconstruction model initializes the recurrent process and reduces computational overhead.
  5. ReSplat produces far fewer Gaussians than previous per-pixel Gaussian models.
  6. ReSplat achieves state-of-the-art performance across multiple input-view counts, resolutions, and datasets.
  7. The project page is at https://haofeixu.github.io/resplat/.
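
To make the recurrent idea concrete, below is a minimal PyTorch sketch of how a rendering error can drive Gaussian updates without test-time gradients. It is an illustration under my own assumptions, not the authors' implementation; `render` is a hypothetical differentiable rasterizer and `UpdateNet` a hypothetical update network.

```python
import torch
import torch.nn as nn

class UpdateNet(nn.Module):
    """Hypothetical update network: maps a pooled rendering-error descriptor
    plus the current Gaussian parameters to per-Gaussian parameter deltas."""
    def __init__(self, gaussian_dim: int = 14, feat_dim: int = 32):
        super().__init__()
        self.err_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global descriptor of the error image
        )
        self.mlp = nn.Sequential(
            nn.Linear(gaussian_dim + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, gaussian_dim),
        )

    def forward(self, gaussians, error_img):
        # gaussians: (N, gaussian_dim); error_img: (1, 3, H, W)
        e = self.err_encoder(error_img).flatten(1)          # (1, feat_dim)
        e = e.expand(gaussians.shape[0], -1)                # broadcast to every Gaussian
        return self.mlp(torch.cat([gaussians, e], dim=-1))  # per-Gaussian delta

def recurrent_refine(gaussians, views, targets, render, update_net, steps=3):
    """Refine Gaussians from the rendering error; no explicit gradients
    w.r.t. the Gaussians are computed at inference time."""
    for _ in range(steps):
        for view, target in zip(views, targets):
            pred = render(gaussians, view)      # hypothetical rasterizer, (1, 3, H, W)
            error = pred - target               # rendering error = feedback signal
            gaussians = gaussians + update_net(gaussians, error)
    return gaussians
```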


D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction

Authors: Meixi Song, Xin Lin, Dizhe Zhang, Haodong Li, Xiangtai Li, Bo Du, Lu Qi

Recent advances in 3D Gaussian Splatting (3DGS) enable real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. However, performance degradation and instability remain significant under sparse-view conditions. In this work, we identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage. To address these challenges, we propose a unified framework D$^2$GS, comprising two key components: a Depth-and-Density Guided Dropout strategy that suppresses overfitting by adaptively masking redundant Gaussians based on density and depth, and a Distance-Aware Fidelity Enhancement module that improves reconstruction quality in under-fitted far-field areas through targeted supervision. Moreover, we introduce a new evaluation metric to quantify the stability of learned Gaussian distributions, providing insights into the robustness of the sparse-view 3DGS. Extensive experiments on multiple datasets demonstrate that our method significantly improves both visual quality and robustness under sparse view conditions. The project page can be found at: https://insta360-research-team.github.io/DDGS-website/.


Paper & Project Links

PDF

Summary

Built on recent 3D Gaussian Splatting (3DGS), real-time, high-fidelity novel view synthesis (NVS) is possible with an explicit 3D representation, but performance degradation and instability remain significant under sparse-view conditions. This paper proposes a unified framework, D$^2$GS, that addresses these challenges by combining a Depth-and-Density Guided Dropout strategy with a Distance-Aware Fidelity Enhancement module. It also introduces a new evaluation metric that quantifies the stability of the learned Gaussian distributions, providing insight into the robustness of sparse-view 3DGS. Experiments show that the method significantly improves both visual quality and stability under sparse-view conditions.

Key Takeaways

  • Recent 3DGS enables real-time, high-fidelity NVS.
  • Performance degradation and instability under sparse-view conditions are two major challenges for existing methods.
  • The unified D$^2$GS framework combines a Depth-and-Density Guided Dropout strategy with a Distance-Aware Fidelity Enhancement module.
  • A new evaluation metric is introduced to measure the stability of the learned Gaussian distributions.
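
As a rough illustration of the dropout idea (a toy sketch under my own assumptions, not the released code), Gaussians that are both close to the camera and locally dense can be given a higher probability of being masked out during a training iteration:

```python
import torch

def depth_density_dropout(centers, cam_pos, k=16, max_drop=0.5):
    """Toy depth-and-density guided dropout.
    centers: (N, 3) Gaussian centers; cam_pos: (3,) camera position.
    Returns a boolean keep-mask for one training iteration."""
    depth = torch.linalg.norm(centers - cam_pos, dim=-1)          # (N,)
    # local density: inverse mean distance to the k nearest neighbours
    d = torch.cdist(centers, centers)                             # (N, N)
    knn = d.topk(k + 1, largest=False).values[:, 1:]              # drop self-distance
    density = 1.0 / (knn.mean(dim=-1) + 1e-6)

    # normalise to [0, 1]: near the camera AND dense => high dropout probability
    near = 1.0 - (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)
    dense = (density - density.min()) / (density.max() - density.min() + 1e-6)
    p_drop = max_drop * near * dense
    return torch.rand_like(p_drop) > p_drop                       # True = keep
```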


ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

Authors: Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, Jiangmiao Pang

On-the-fly 3D reconstruction from monocular image sequences is a long-standing challenge in computer vision, critical for applications such as real-to-sim, AR/VR, and robotics. Existing methods face a major tradeoff: per-scene optimization yields high fidelity but is computationally expensive, whereas feed-forward foundation models enable real-time inference but struggle with accuracy and robustness. In this work, we propose ARTDECO, a unified framework that combines the efficiency of feed-forward models with the reliability of SLAM-based pipelines. ARTDECO uses 3D foundation models for pose estimation and point prediction, coupled with a Gaussian decoder that transforms multi-scale features into structured 3D Gaussians. To sustain both fidelity and efficiency at scale, we design a hierarchical Gaussian representation with a LoD-aware rendering strategy, which improves rendering fidelity while reducing redundancy. Experiments on eight diverse indoor and outdoor benchmarks show that ARTDECO delivers interactive performance comparable to SLAM, robustness similar to feed-forward systems, and reconstruction quality close to per-scene optimization, providing a practical path toward on-the-fly digitization of real-world environments with both accurate geometry and high visual fidelity. Explore more demos on our project page: https://city-super.github.io/artdeco/.


Paper & Project Links

PDF

Summary

ARTDECO is a unified framework for on-the-fly 3D reconstruction from monocular image sequences that combines real-time performance with reliability. It uses 3D foundation models for pose estimation and point prediction, together with a Gaussian decoder that converts multi-scale features into structured 3D Gaussians. To improve rendering quality and efficiency, it adopts a hierarchical Gaussian representation with a LoD-aware rendering strategy. Experiments show that ARTDECO performs strongly in speed, robustness, and reconstruction quality.

Key Takeaways

  1. ARTDECO is a unified framework that combines the efficiency of feed-forward foundation models with the reliability of SLAM-based pipelines.
  2. It uses 3D foundation models for pose estimation and point prediction, and a Gaussian decoder that converts multi-scale features into structured 3D Gaussians.
  3. A hierarchical Gaussian representation with a LoD-aware rendering strategy improves rendering quality and efficiency.
  4. Experiments on eight indoor and outdoor benchmarks show performance comparable to SLAM, robustness similar to feed-forward systems, and reconstruction quality close to per-scene optimization.
  5. The approach is a practical step toward on-the-fly digitization of real-world environments.
  6. It reconstructs scenes with accurate geometry and high visual fidelity.
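
One way an LoD-aware rendering strategy can work, sketched here as an assumption rather than ARTDECO's actual design, is to keep Gaussians at several levels of detail and choose a level per region from its distance to the camera:

```python
import torch

def select_lod(levels, region_centers, cam_pos, base_dist=2.0):
    """Pick a level-of-detail index per region from its camera distance.
    levels: list of per-level Gaussian sets, coarse (0) to fine (len-1).
    region_centers: (R, 3); cam_pos: (3,)."""
    dist = torch.linalg.norm(region_centers - cam_pos, dim=-1)    # (R,)
    # drop one level of detail every time the distance doubles past base_dist
    coarsen = torch.log2(torch.clamp(dist / base_dist, min=1.0)).long()
    lod = torch.clamp(len(levels) - 1 - coarsen, min=0)           # nearer => finer
    return lod  # render region r with levels[lod[r]]
```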


Splat the Net: Radiance Fields with Splattable Neural Primitives

Authors: Xilong Zhou, Bao-Huy Nguyen, Loïc Magne, Vladislav Golyanik, Thomas Leimkühler, Christian Theobalt

Radiance fields have emerged as a predominant representation for modeling 3D scene appearance. Neural formulations such as Neural Radiance Fields provide high expressivity but require costly ray marching for rendering, whereas primitive-based methods such as 3D Gaussian Splatting offer real-time efficiency through splatting, yet at the expense of representational power. Inspired by advances in both these directions, we introduce splattable neural primitives, a new volumetric representation that reconciles the expressivity of neural models with the efficiency of primitive-based splatting. Each primitive encodes a bounded neural density field parameterized by a shallow neural network. Our formulation admits an exact analytical solution for line integrals, enabling efficient computation of perspectively accurate splatting kernels. As a result, our representation supports integration along view rays without the need for costly ray marching. The primitives flexibly adapt to scene geometry and, being larger than prior analytic primitives, reduce the number required per scene. On novel-view synthesis benchmarks, our approach matches the quality and speed of 3D Gaussian Splatting while using $10\times$ fewer primitives and $6\times$ fewer parameters. These advantages arise directly from the representation itself, without reliance on complex control or adaptation frameworks. The project page is https://vcai.mpi-inf.mpg.de/projects/SplatNet/.


Paper & Project Links

PDF

Summary

This paper introduces splattable neural primitives, a new volumetric representation that combines the expressivity of neural models with efficient primitive-based splatting. Each primitive encodes a bounded neural density field parameterized by a shallow neural network. The formulation admits an exact analytical solution for line integrals, so perspectively accurate splatting kernels can be computed without costly ray marching. The representation supports integration along view rays, adapts flexibly to scene geometry, and achieves high-quality, fast 3D scene rendering with fewer primitives and parameters. The project page is https://vcai.mpi-inf.mpg.de/projects/SplatNet/.

Key Takeaways

  1. Introduces splattable neural primitives, a new volumetric representation that combines the expressivity of neural models with the real-time efficiency of primitive-based methods.
  2. Each primitive encodes a bounded neural density field parameterized by a shallow neural network, which increases flexibility and adaptability.
  3. Line integrals admit an exact analytical solution, enabling efficient computation of perspectively accurate splatting kernels without costly ray marching.
  4. The representation supports integration along view rays, improving rendering quality and speed.
  5. Compared with prior methods, it achieves equal or better rendering quality with fewer primitives and parameters.
  6. The primitives adapt flexibly to scene geometry, including complex shapes and structures.
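
The computational claim above is that each primitive's contribution along a ray comes from an exact line integral rather than ray marching. The sketch below shows generic front-to-back compositing under that assumption; `line_integral` is a hypothetical stand-in for the paper's analytic solution, not its actual formulation.

```python
import torch

def composite_ray(primitives, ray_o, ray_d, line_integral):
    """Front-to-back compositing of splattable primitives along one ray.
    `line_integral(p, ray_o, ray_d)` must return (optical_depth, color, depth)
    for primitive p -- analytic in the paper, a placeholder here."""
    contribs = []
    for p in primitives:
        tau, color, depth = line_integral(p, ray_o, ray_d)
        contribs.append((depth, 1.0 - torch.exp(-tau), color))   # alpha from optical depth
    contribs.sort(key=lambda c: float(c[0]))                     # near-to-far ordering

    transmittance, out = 1.0, torch.zeros(3)
    for _, alpha, color in contribs:
        out = out + transmittance * alpha * color
        transmittance = transmittance * (1.0 - alpha)
    return out
```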


Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting

Authors: Ankit Gahlawat, Anirban Mukherjee, Dinesh Babu Jayagopi

Accurate face parsing under extreme viewing angles remains a significant challenge due to limited labeled data in such poses. Manual annotation is costly and often impractical at scale. We propose a novel label refinement pipeline that leverages 3D Gaussian Splatting (3DGS) to generate accurate segmentation masks from noisy multiview predictions. By jointly fitting two 3DGS models, one to RGB images and one to their initial segmentation maps, our method enforces multiview consistency through shared geometry, enabling the synthesis of pose-diverse training data with only minimal post-processing. Fine-tuning a face parsing model on this refined dataset significantly improves accuracy on challenging head poses, while maintaining strong performance on standard views. Extensive experiments, including human evaluations, demonstrate that our approach achieves superior results compared to state-of-the-art methods, despite requiring no ground-truth 3D annotations and using only a small set of initial images. Our method offers a scalable and effective solution for improving face parsing robustness in real-world settings.


Paper & Project Links

PDF Accepted to VCIP 2025 (International Conference on Visual Communications and Image Processing 2025)

Summary

This paper proposes a label refinement pipeline based on 3D Gaussian Splatting (3DGS) that generates accurate segmentation masks from noisy multi-view predictions. By jointly fitting two 3DGS models, one to RGB images and one to the initial segmentation maps, the method enforces multi-view consistency through shared geometry and can synthesize pose-diverse training data with only minimal post-processing. Fine-tuning a face parsing model on the refined dataset significantly improves accuracy on challenging head poses while maintaining strong performance on standard views. Experiments show results superior to the state of the art, despite requiring no ground-truth 3D annotations and using only a small set of initial images. The approach offers a scalable and effective solution for improving face parsing robustness in real-world settings.

Key Takeaways

  1. 3D Gaussian Splatting (3DGS) is leveraged to generate accurate segmentation masks.
  2. Multi-view consistency is enforced by jointly fitting two 3DGS models.
  3. Pose-diverse training data can be synthesized with only minimal post-processing.
  4. The proposed method significantly improves face parsing accuracy on challenging head poses.
  5. Strong performance on standard views is maintained.
  6. Experiments show that superior results are achieved without ground-truth 3D annotations and with only a small set of initial images.
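
A minimal sketch of the shared-geometry idea (my illustration, not the paper's code), assuming a differentiable `render` function that splats an arbitrary per-Gaussian feature: the two 3DGS models share positions, scales, rotations, and opacities, and differ only in the attribute being splatted (RGB vs. segmentation logits).

```python
import torch
import torch.nn.functional as F

class SharedGeometrySplats(torch.nn.Module):
    """Two 3DGS 'models' that share geometry: one splats RGB, the other splats
    per-class segmentation logits, enforcing multi-view consistency."""
    def __init__(self, n_gaussians: int, n_classes: int):
        super().__init__()
        self.xyz = torch.nn.Parameter(torch.randn(n_gaussians, 3))          # shared
        self.scale = torch.nn.Parameter(torch.zeros(n_gaussians, 3))        # shared
        self.rot = torch.nn.Parameter(torch.randn(n_gaussians, 4))          # shared
        self.opacity = torch.nn.Parameter(torch.zeros(n_gaussians, 1))      # shared
        self.rgb = torch.nn.Parameter(torch.rand(n_gaussians, 3))           # RGB branch
        self.seg = torch.nn.Parameter(torch.zeros(n_gaussians, n_classes))  # label branch

    def loss(self, render, view, image, noisy_mask, w_seg=1.0):
        # render(xyz, scale, rot, opacity, feature, view) is a hypothetical rasterizer
        geom = (self.xyz, self.scale, self.rot, self.opacity)
        rgb_pred = render(*geom, self.rgb, view)                  # (3, H, W)
        seg_pred = render(*geom, self.seg, view)                  # (C, H, W) logits
        l_rgb = F.l1_loss(rgb_pred, image)
        l_seg = F.cross_entropy(seg_pred[None], noisy_mask[None]) # noisy_mask: (H, W) long
        return l_rgb + w_seg * l_seg
```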


PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting

Authors: Houqiang Zhong, Zhenglong Wu, Sihua Fu, Zihan Zheng, Xin Jin, Xiaoyun Zhang, Li Song, Qiang Hu

3D Gaussian Splatting (3DGS) has recently enabled real-time photorealistic rendering in compact scenes, but scaling to large urban environments introduces severe aliasing artifacts and optimization instability, especially under high-resolution (e.g., 4K) rendering. These artifacts, manifesting as flickering textures and jagged edges, arise from the mismatch between Gaussian primitives and the multi-scale nature of urban geometry. While existing "divide-and-conquer" pipelines address scalability, they fail to resolve this fidelity gap. In this paper, we propose PrismGS, a physically-grounded regularization framework that improves the intrinsic rendering behavior of 3D Gaussians. PrismGS integrates two synergistic regularizers. The first is pyramidal multi-scale supervision, which enforces consistency by supervising the rendering against a pre-filtered image pyramid. This compels the model to learn an inherently anti-aliased representation that remains coherent across different viewing scales, directly mitigating flickering textures. This is complemented by an explicit size regularization that imposes a physically-grounded lower bound on the dimensions of the 3D Gaussians. This prevents the formation of degenerate, view-dependent primitives, leading to more stable and plausible geometric surfaces and reducing jagged edges. Our method is plug-and-play and compatible with existing pipelines. Extensive experiments on MatrixCity, Mill-19, and UrbanScene3D demonstrate that PrismGS achieves state-of-the-art performance, yielding significant PSNR gains around 1.5 dB against CityGaussian, while maintaining its superior quality and robustness under demanding 4K rendering.


Paper & Project Links

PDF

Summary

This paper proposes PrismGS, a physically-grounded regularization framework for 3D Gaussian Splatting (3DGS) that improves its rendering in large urban environments. By introducing pyramidal multi-scale supervision and an explicit size regularization, the framework addresses the flickering textures and jagged edges that 3DGS exhibits when rendering large city scenes. PrismGS is plug-and-play, compatible with existing pipelines, and achieves state-of-the-art performance in experiments on MatrixCity, Mill-19, and UrbanScene3D.

Key Takeaways

  1. PrismGS is a regularization framework for 3D Gaussian Splatting (3DGS) that improves rendering in large urban environments.
  2. Pyramidal multi-scale supervision enforces rendering consistency across viewing scales and counteracts flickering textures.
  3. An explicit size regularization imposes a physically-grounded lower bound on the dimensions of the 3D Gaussians, preventing degenerate, view-dependent primitives and reducing jagged edges.
  4. PrismGS is plug-and-play and compatible with existing pipelines.
  5. In experiments on MatrixCity, Mill-19, and UrbanScene3D, PrismGS achieves state-of-the-art performance, with PSNR gains of about 1.5 dB over CityGaussian.
  6. PrismGS maintains its quality and robustness under demanding high-resolution (e.g., 4K) rendering.
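
The two regularizers can be illustrated with a toy loss; the weights, pyramid depth, and size floor below are made up for the example and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def prism_style_losses(rendered, target, scales, min_scale=0.01,
                       levels=3, w_pyr=1.0, w_size=0.1):
    """Toy version of the two regularizers described above.
    rendered/target: (1, 3, H, W) images; scales: (N, 3) per-Gaussian sizes."""
    # 1) pyramidal multi-scale supervision: compare pre-filtered pyramids
    l_pyr = F.l1_loss(rendered, target)
    r, t = rendered, target
    for _ in range(1, levels):
        r = F.avg_pool2d(r, 2)          # low-pass + downsample the rendering
        t = F.avg_pool2d(t, 2)          # pre-filtered target pyramid level
        l_pyr = l_pyr + F.l1_loss(r, t)

    # 2) explicit size regularization: penalise Gaussians below a size floor
    l_size = F.relu(min_scale - scales).mean()

    return w_pyr * l_pyr + w_size * l_size
```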


DEGS: Deformable Event-based 3D Gaussian Splatting from RGB and Event Stream

Authors: Junhao He, Jiaxu Wang, Jia Li, Mingyuan Sun, Qiang Zhang, Jiahang Cao, Ziyi Zhang, Yi Gu, Jingkai Sun, Renjing Xu

Reconstructing Dynamic 3D Gaussian Splatting (3DGS) from low-framerate RGB videos is challenging. This is because large inter-frame motions will increase the uncertainty of the solution space. For example, one pixel in the first frame might have more choices to reach the corresponding pixel in the second frame. Event cameras can asynchronously capture rapid visual changes and are robust to motion blur, but they do not provide color information. Intuitively, the event stream can provide deterministic constraints for the inter-frame large motion by the event trajectories. Hence, combining low-temporal-resolution images with high-framerate event streams can address this challenge. However, it is challenging to jointly optimize Dynamic 3DGS using both RGB and event modalities due to the significant discrepancy between these two data modalities. This paper introduces a novel framework that jointly optimizes dynamic 3DGS from the two modalities. The key idea is to adopt event motion priors to guide the optimization of the deformation fields. First, we extract the motion priors encoded in event streams by using the proposed LoCM unsupervised fine-tuning framework to adapt an event flow estimator to a certain unseen scene. Then, we present the geometry-aware data association method to build the event-Gaussian motion correspondence, which is the primary foundation of the pipeline, accompanied by two useful strategies, namely motion decomposition and inter-frame pseudo-label. Extensive experiments show that our method outperforms existing image and event-based approaches across synthetic and real scenes and prove that our method can effectively optimize dynamic 3DGS with the help of event data.


Paper & Project Links

PDF Accepted by TVCG

Summary

This paper proposes a framework for reconstructing dynamic 3D Gaussian Splatting (3DGS) that combines RGB video with event-stream data. Event motion priors guide the optimization of the deformation fields, and event trajectories provide deterministic constraints on large inter-frame motion. The motion priors are extracted from event streams with the proposed LoCM unsupervised fine-tuning framework, and a geometry-aware data association method builds event-Gaussian motion correspondences. On both synthetic and real scenes the method outperforms existing image- and event-based approaches, showing that event data can effectively help optimize dynamic 3DGS.

Key Takeaways

  1. Reconstructing dynamic 3D Gaussian Splatting (3DGS) from low-framerate RGB video is challenging because large inter-frame motions increase the uncertainty of the solution space.
  2. Event cameras asynchronously capture rapid visual changes and are robust to motion blur, but provide no color information.
  3. Event streams can provide deterministic constraints on large inter-frame motion through event trajectories.
  4. Combining low-temporal-resolution images with high-framerate event streams addresses this challenge.
  5. The difficulty lies in jointly optimizing dynamic 3DGS from both RGB and event modalities, which differ significantly.
  6. The paper introduces a framework that jointly optimizes dynamic 3DGS from both modalities, using event motion priors to guide the optimization of the deformation fields.


ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes

Authors: Jian Gao, Mengqi Yuan, Yifei Zeng, Chang Zeng, Zhihao Li, Zhenyu Chen, Weichao Qiu, Xiao-Xiao Long, Hao Zhu, Xun Cao, Yao Yao

Gaussian Splatting (GS) enables immersive rendering, but realistic 3D object-scene composition remains challenging. Baked appearance and shadow information in GS radiance fields cause inconsistencies when combining objects and scenes. Addressing this requires relightable object reconstruction and scene lighting estimation. For relightable object reconstruction, existing Gaussian-based inverse rendering methods often rely on ray tracing, leading to low efficiency. We introduce Surface Octahedral Probes (SOPs), which store lighting and occlusion information and allow efficient 3D querying via interpolation, avoiding expensive ray tracing. SOPs provide at least a 2x speedup in reconstruction and enable real-time shadow computation in Gaussian scenes. For lighting estimation, existing Gaussian-based inverse rendering methods struggle to model intricate light transport and often fail in complex scenes, while learning-based methods predict lighting from a single image and are viewpoint-sensitive. We observe that 3D object-scene composition primarily concerns the object’s appearance and nearby shadows. Thus, we simplify the challenging task of full scene lighting estimation by focusing on the environment lighting at the object’s placement. Specifically, we capture a 360 degrees reconstructed radiance field of the scene at the location and fine-tune a diffusion model to complete the lighting. Building on these advances, we propose ComGS, a novel 3D object-scene composition framework. Our method achieves high-quality, real-time rendering at around 28 FPS, produces visually harmonious results with vivid shadows, and requires only 36 seconds for editing. Code and dataset are available at https://nju-3dv.github.io/projects/ComGS/.


Paper & Project Links

PDF

Summary

This work addresses the limitations of Gaussian Splatting (GS) in immersive rendering, specifically for 3D object-scene composition, where combining objects and scenes leads to inconsistent appearance and shadows. It proposes Surface Octahedral Probes (SOPs) to make relightable object reconstruction more efficient and to enable real-time shadow computation, and it simplifies scene lighting estimation by focusing on the environment lighting at the object's placement. Building on these, the new 3D object-scene composition framework ComGS achieves high-quality, real-time rendering with vivid shadows.

Key Takeaways

  • Gaussian Splatting faces challenges in immersive rendering, particularly for 3D object-scene composition.
  • Existing methods suffer from inconsistent appearance and shadows when combining objects and scenes.
  • Surface Octahedral Probes (SOPs) store lighting and occlusion information and allow efficient 3D querying via interpolation, avoiding expensive ray tracing.
  • SOPs provide at least a 2x speedup in reconstruction and enable real-time shadow computation in Gaussian scenes.
  • The scene lighting estimation task is simplified by focusing on the environment lighting at the object's placement.
  • Lighting is completed by capturing a 360-degree reconstructed radiance field of the scene at that location and fine-tuning a diffusion model.
  • The new ComGS object-scene composition framework achieves high-quality, real-time rendering at around 28 FPS with vivid shadows, and editing takes only 36 seconds.
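
To illustrate how probe-based querying can replace ray tracing, here is a hedged sketch of an octahedral-mapped probe lookup with inverse-distance interpolation over the nearest probes; the storage layout and nearest-texel sampling are my assumptions, not ComGS's implementation.

```python
import torch

def octahedral_uv(d: torch.Tensor) -> torch.Tensor:
    """Map a unit direction (3,) to [0, 1]^2 octahedral coordinates."""
    d = d / d.abs().sum()                                      # project onto |x|+|y|+|z| = 1
    xy = d[:2]
    if d[2] < 0:                                               # fold the lower hemisphere
        xy = (1.0 - d[:2].flip(0).abs()) * torch.sign(d[:2])
    return xy * 0.5 + 0.5

def query_probes(probe_pos, probe_maps, point, direction, k=4):
    """Interpolate lighting/occlusion from the k nearest surface probes.
    probe_pos: (P, 3); probe_maps: (P, res, res, C) octahedral maps;
    point: (3,) query position; direction: (3,) unit query direction."""
    dist = torch.linalg.norm(probe_pos - point, dim=-1)
    idx = dist.topk(k, largest=False).indices
    w = 1.0 / (dist[idx] + 1e-6)
    w = w / w.sum()                                            # inverse-distance weights
    uv = octahedral_uv(direction)
    res = probe_maps.shape[1]
    px = (uv * (res - 1)).long()                               # nearest-texel lookup
    samples = probe_maps[idx, px[1], px[0]]                    # (k, C)
    return (w[:, None] * samples).sum(dim=0)                   # interpolated radiance/occlusion
```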


RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction

Authors: Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, Yang Katie Zhao

3D Gaussian Splatting (3DGS) based Simultaneous Localization and Mapping (SLAM) systems can largely benefit from 3DGS’s state-of-the-art rendering efficiency and accuracy, but have not yet been adopted in resource-constrained edge devices due to insufficient speed. Addressing this, we identify notable redundancies across the SLAM pipeline for acceleration. While conceptually straightforward, practical approaches are required to minimize the overhead associated with identifying and eliminating these redundancies. In response, we propose RTGS, an algorithm-hardware co-design framework that comprehensively reduces the redundancies for real-time 3DGS-SLAM on edge. To minimize the overhead, RTGS fully leverages the characteristics of the 3DGS-SLAM pipeline. On the algorithm side, we introduce (1) an adaptive Gaussian pruning step to remove the redundant Gaussians by reusing gradients computed during backpropagation; and (2) a dynamic downsampling technique that directly reuses the keyframe identification and alpha computing steps to eliminate redundant pixels. On the hardware side, we propose (1) a subtile-level streaming strategy and a pixel-level pairwise scheduling strategy that mitigates workload imbalance via a Workload Scheduling Unit (WSU) guided by previous iteration information; (2) a Rendering and Backpropagation (R&B) Buffer that accelerates the rendering backpropagation by reusing intermediate data computed during rendering; and (3) a Gradient Merging Unit (GMU) to reduce intensive memory accesses caused by atomic operations while enabling pipelined aggregation. Integrated into an edge GPU, RTGS achieves real-time performance (>= 30 FPS) on four datasets and three algorithms, with up to 82.5x energy efficiency over the baseline and negligible quality loss. Code is available at https://github.com/UMN-ZhaoLab/RTGS.


Paper & Project Links

PDF Accepted by MICRO2025

Summary

3D Gaussian Splatting (3DGS)-based SLAM systems benefit from the state-of-the-art rendering efficiency and accuracy of 3DGS, but insufficient speed has kept them off resource-constrained edge devices. To address this, the paper proposes RTGS, an algorithm-hardware co-design framework that comprehensively reduces the redundancies of real-time 3DGS-SLAM on the edge. On the algorithm side, an adaptive Gaussian pruning step and a dynamic downsampling technique remove redundant Gaussians and pixels. On the hardware side, a workload scheduling unit with subtile-level streaming and pixel-level pairwise scheduling, a rendering-and-backpropagation buffer, and a gradient merging unit accelerate rendering and backpropagation while reducing memory accesses. Integrated into an edge GPU, RTGS achieves real-time performance (>= 30 FPS) on four datasets and three algorithms, with up to 82.5x better energy efficiency than the baseline and negligible quality loss.

Key Takeaways

  1. 3DGS-based SLAM systems offer efficient, accurate rendering.
  2. The SLAM pipeline contains redundancies that must be removed to reach sufficient speed.
  3. RTGS is an algorithm-hardware co-design framework that reduces the redundancies of real-time 3DGS-SLAM.
  4. RTGS removes redundancy through adaptive Gaussian pruning and dynamic downsampling.
  5. Hardware acceleration strategies include subtile-level streaming, pixel-level pairwise scheduling, a rendering-and-backpropagation buffer, and a gradient merging unit.
  6. On an edge GPU, RTGS achieves real-time performance with a significant energy-efficiency improvement over the baseline.
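
A toy version of gradient-reusing pruning (my interpretation of the idea, not the released code): treat the gradient magnitudes already accumulated during backpropagation as an importance score and keep only the highest-scoring Gaussians.

```python
import torch

def prune_by_reused_gradients(gaussians, grad_accum, opacity,
                              keep_ratio=0.7, min_opacity=0.01):
    """Toy adaptive pruning step.
    gaussians: (N, D) parameters; grad_accum: (N,) accumulated |grad| per Gaussian
    (already available from backpropagation, so no extra passes are needed);
    opacity: (N, 1). Returns pruned Gaussians and the keep-mask."""
    score = grad_accum * opacity.squeeze(-1).clamp(min=0.0)   # low score => redundant
    keep = max(1, int(keep_ratio * gaussians.shape[0]))
    idx = score.topk(keep).indices
    mask = torch.zeros(gaussians.shape[0], dtype=torch.bool)
    mask[idx] = True
    mask &= opacity.squeeze(-1) > min_opacity                 # also drop near-transparent ones
    return gaussians[mask], mask
```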


MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction

Authors: Kunyi Li, Michael Niemeyer, Zeyu Chen, Nassir Navab, Federico Tombari

Accurate meshing from monocular images remains a key challenge in 3D vision. While state-of-the-art 3D Gaussian Splatting (3DGS) methods excel at synthesizing photorealistic novel views through rasterization-based rendering, their reliance on sparse, explicit primitives severely limits their ability to recover watertight and topologically consistent 3D surfaces. We introduce MonoGSDF, a novel method that couples Gaussian-based primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. During training, the SDF guides Gaussians’ spatial distribution, while at inference, Gaussians serve as priors to reconstruct surfaces, eliminating the need for memory-intensive Marching Cubes. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. A multi-resolution training scheme further refines details and monocular geometric cues from off-the-shelf estimators enhance reconstruction quality. Experiments on real-world datasets show MonoGSDF outperforms prior methods while maintaining efficiency.


Paper & Project Links

PDF

Summary

Accurate meshing from monocular images remains a key challenge in 3D vision. State-of-the-art 3D Gaussian Splatting (3DGS) methods excel at synthesizing photorealistic novel views via rasterization-based rendering, but their reliance on sparse, explicit primitives limits their ability to recover watertight, topologically consistent 3D surfaces. This paper proposes MonoGSDF, which couples Gaussian-based primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. During training the SDF guides the spatial distribution of the Gaussians; at inference the Gaussians serve as priors for surface reconstruction, eliminating the need for memory-intensive Marching Cubes. A robust scaling strategy handles arbitrary-scale scenes, a multi-resolution training scheme further refines details, and monocular geometric cues from off-the-shelf estimators improve reconstruction quality. Experiments on real-world datasets show MonoGSDF outperforms prior methods while remaining efficient.

Key Takeaways

  1. Accurate meshing from monocular images remains a key challenge in 3D vision.
  2. Advanced methods such as 3DGS synthesize photorealistic views but are limited in recovering watertight, topologically consistent 3D surfaces.
  3. MonoGSDF couples Gaussian-based primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction.
  4. During training, the SDF guides the spatial distribution of the Gaussians.
  5. At inference, the Gaussians serve as priors for surface reconstruction, avoiding memory-intensive Marching Cubes.
  6. A robust scaling strategy handles arbitrary-scale scenes.
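
A minimal sketch of how an SDF can guide the Gaussians' spatial distribution during training (my own simplification, not the paper's loss): penalize the signed distance at each Gaussian center, optionally together with the standard eikonal regularizer commonly used when training SDFs.

```python
import torch

def sdf_guidance_loss(sdf_net, centers):
    """Pull Gaussian centers toward the SDF zero level set so the SDF shapes
    their spatial distribution during training (toy guidance term).
    sdf_net: callable (M, 3) -> (M, 1) signed distances; centers: (N, 3)."""
    return sdf_net(centers).abs().mean()

def eikonal_loss(sdf_net, points):
    """Standard eikonal regularizer for SDF training (the paper may use a
    different set of terms): encourage unit-norm SDF gradients."""
    points = points.detach().requires_grad_(True)
    sdf = sdf_net(points)
    (grad,) = torch.autograd.grad(sdf.sum(), points, create_graph=True)
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()

# Example total objective (weights are illustrative):
# total = rgb_loss + 0.1 * sdf_guidance_loss(sdf_net, gaussian_centers) \
#       + 0.01 * eikonal_loss(sdf_net, sample_points)
```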



Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!