⚠️ All summaries below are generated by large language models and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on these summaries in serious academic settings; use them only for initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-11
4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos
Authors:Mengqi Guo, Bo Xu, Yanyan Li, Gim Hee Lee
Novel view synthesis from monocular videos of dynamic scenes with unknown camera poses remains a fundamental challenge in computer vision and graphics. While recent advances in 3D representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown promising results for static scenes, they struggle with dynamic content and typically rely on pre-computed camera poses. We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach. Our method first leverages 3D foundational models for initial pose and geometry estimation, followed by motion-aware refinement. 4D3R introduces two key technical innovations: (1) a motion-aware bundle adjustment (MA-BA) module that combines transformer-based learned priors with SAM2 for robust dynamic object segmentation, enabling more accurate camera pose refinement; and (2) an efficient Motion-Aware Gaussian Splatting (MA-GS) representation that uses control points with a deformation field MLP and linear blend skinning to model dynamic motion, significantly reducing computational cost while maintaining high-quality reconstruction. Extensive experiments on real-world dynamic datasets demonstrate that our approach achieves up to 1.8dB PSNR improvement over state-of-the-art methods, particularly in challenging scenarios with large dynamic objects, while reducing computational requirements by 5x compared to previous dynamic scene representations.
Paper and project links
PDF 17 pages, 5 figures
Summary
This paper proposes 4D3R, a pose-free dynamic neural rendering framework for novel view synthesis from monocular videos of dynamic scenes. A two-stage approach decouples the static and dynamic components: 3D foundation models provide initial pose and geometry estimates, followed by motion-aware refinement. Two key techniques, a motion-aware bundle adjustment module and a Motion-Aware Gaussian Splatting representation, improve the reconstruction quality and efficiency of dynamic scenes. Experiments demonstrate strong results on real-world dynamic datasets.
Key Takeaways
- 4D3R is a pose-free dynamic neural rendering framework for novel view synthesis from monocular videos of dynamic scenes.
- 3D foundation models provide the initial pose and geometry estimation.
- A two-stage approach decouples the static and dynamic components.
- The motion-aware bundle adjustment (MA-BA) module combines transformer-based learned priors with SAM2 for dynamic object segmentation, improving camera pose accuracy.
- The Motion-Aware Gaussian Splatting (MA-GS) representation models dynamic motion with control points, a deformation-field MLP, and linear blend skinning, cutting computational cost while preserving high-quality reconstruction (see the sketch below).
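To make the MA-GS motion model concrete, here is a minimal numpy sketch of control-point linear blend skinning. The Gaussian-kernel skinning weights and the toy inputs are assumptions; in the paper the per-control-point transforms would come from the deformation-field MLP, which is omitted here.

```python
import numpy as np

def lbs_deform(points, ctrl_pts, rotations, translations, sigma=0.5):
    """Deform Gaussian centers via linear blend skinning from control points.

    points:       (N, 3) canonical Gaussian centers
    ctrl_pts:     (K, 3) control-point positions
    rotations:    (K, 3, 3) per-control-point rotations (from a deformation MLP)
    translations: (K, 3) per-control-point translations
    sigma:        bandwidth of the Gaussian skinning-weight kernel (assumed)
    """
    # Skinning weights: soft assignment of each point to nearby control points.
    d2 = ((points[:, None, :] - ctrl_pts[None, :, :]) ** 2).sum(-1)   # (N, K)
    w = np.exp(-d2 / (2 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)                                 # normalize

    # Blend the rigid transforms: x' = sum_k w_k (R_k x + t_k).
    moved = np.einsum('kij,nj->nki', rotations, points) + translations[None]
    return (w[..., None] * moved).sum(axis=1)                         # (N, 3)

# Toy usage: two control points pulling points in opposite directions.
pts = np.random.rand(5, 3)
ctrl = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
R = np.stack([np.eye(3)] * 2)
t = np.array([[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]])
print(lbs_deform(pts, ctrl, R, t))
```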
Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
Authors:Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull
Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is separately trained on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground contains diverse dynamic features, so changes in color, position and rotation are learned, while the background, containing film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results; up to 3 dB higher PSNR with half the model size on 3-D scenes. Unlike the SotA and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures. Code and video comparisons are available online: https://interims-git.github.io/
Paper and project links
Summary
Deformable Gaussian Splatting (GS) achieves photorealistic dynamic 3D reconstruction from dense multi-view video, but the sparse camera configurations forced by tight filmmaking budgets limit state-of-the-art methods on complex dynamic features. To address this, the canonical Gaussians and deformation field are split into foreground and background components using a sparse set of masks at t=0; each representation is trained on its own loss functions during canonical pre-training, and different deformation parameters are modeled per component during dynamic training, following common filmmaking practices. Experiments on 3D and 2.5D entertainment datasets show state-of-the-art qualitative and quantitative results, up to 3 dB higher PSNR with half the model size, and the method produces segmented dynamic reconstructions, including transparent and dynamic textures, without dense mask supervision. Code and video comparisons are available online.
Key Takeaways
- Deformable Gaussian Splatting is extended to the sparse camera configurations common under tight filmmaking budgets, while remaining photorealistic for dynamic 3D reconstruction.
- The reconstruction is split into foreground and background components, each handled differently to better capture complex dynamic features.
- Foreground and background are pre-trained separately, each on its own loss functions.
- Each deformation field models different parameters to match filmmaking practice: the foreground learns changes in color, position, and rotation, while the dimmer, less dynamic background learns only position changes (see the sketch after this list).
- Experiments on multiple entertainment datasets show higher reconstruction quality with half the model size compared with existing methods.
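A hypothetical PyTorch sketch of that per-component split: each deformation field predicts only the attribute deltas its component is allowed to change. The MLP width, attribute names, and time input are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DeformField(nn.Module):
    """Deformation field that predicts deltas only for selected attributes."""

    DIMS = {'xyz': 3, 'rot': 4, 'rgb': 3}  # position, quaternion, color

    def __init__(self, attrs):
        super().__init__()
        self.attrs = attrs
        out_dim = sum(self.DIMS[a] for a in attrs)
        self.mlp = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, xyz, t):
        """xyz: (N, 3) canonical positions; t: (N, 1) normalized time."""
        out = self.mlp(torch.cat([xyz, t], dim=-1))
        deltas, i = {}, 0
        for a in self.attrs:              # split the output per attribute
            deltas[a] = out[:, i:i + self.DIMS[a]]
            i += self.DIMS[a]
        return deltas

# Foreground learns color, position and rotation; background only position.
fg_field = DeformField(('xyz', 'rot', 'rgb'))
bg_field = DeformField(('xyz',))
deltas = fg_field(torch.rand(8, 3), torch.full((8, 1), 0.5))
print({k: v.shape for k, v in deltas.items()})
```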
CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting
Authors:Hexu Zhao, Xiwen Min, Xiaoteng Liu, Moonjun Gong, Yiming Li, Ang Li, Saining Xie, Jinyang Li, Aurojit Panda
3D Gaussian Splatting (3DGS) is an increasingly popular novel view synthesis approach due to its fast rendering time and high-quality output. However, scaling 3DGS to large (or intricate) scenes is challenging due to its large memory requirement, which exceeds most GPUs' memory capacity. In this paper, we describe CLM, a system that allows 3DGS to render large scenes using a single consumer-grade GPU, e.g., RTX4090. It does so by offloading Gaussians to CPU memory, and loading them into GPU memory only when necessary. To reduce performance and communication overheads, CLM uses a novel offloading strategy that exploits observations about 3DGS's memory access pattern for pipelining, thus overlapping GPU-to-CPU communication, GPU computation and CPU computation. Furthermore, we also exploit observations about the access pattern to reduce communication volume. Our evaluation shows that the resulting implementation can render a large scene that requires 100 million Gaussians on a single RTX4090 and achieve state-of-the-art reconstruction quality.
Paper and project links
PDF Accepted to appear in the 2026 ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Summary
3DGS faces memory challenges on large scenes. The CLM system offloads Gaussians to CPU memory and loads them into GPU memory only when necessary, making large-scene rendering possible on a single consumer-grade GPU. A novel offloading strategy optimizes performance and reduces communication overhead, overlapping work through pipelining and cutting communication volume based on observed data access patterns. With CLM, a large scene can be rendered on a single RTX4090 while maintaining state-of-the-art reconstruction quality.
Key Takeaways
- 3DGS is a novel view synthesis approach popular for its fast rendering time and high-quality output, but large or intricate scenes exceed most GPUs' memory capacity.
- CLM removes this barrier by dynamically managing where Gaussians are stored and when they are loaded, allowing large scenes to be rendered on a single consumer-grade GPU.
- CLM offloads Gaussians to CPU memory and loads them into GPU memory only when necessary, running efficiently while making effective use of memory.
- An offloading strategy built on observations of the memory access pattern pipelines and overlaps GPU-to-CPU communication, GPU computation, and CPU computation, reducing performance and communication overheads (see the sketch below).
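A minimal PyTorch sketch of the on-demand loading idea, assuming a CUDA device: Gaussians live in CPU memory, and only the subset a view touches is staged through a pinned buffer and copied on a side stream so the transfer can overlap other GPU work. The packed tensor layout, buffer size, and function name are illustrative assumptions, not CLM's implementation.

```python
import torch

# Illustrative layout: all Gaussian attributes packed into one CPU tensor;
# 59 floats per Gaussian is a placeholder, not CLM's actual format.
cpu_gaussians = torch.randn(1_000_000, 59)
staging = torch.empty(65_536, 59).pin_memory()  # pinned buffer for async copies
copy_stream = torch.cuda.Stream()

def fetch_gaussians(needed_ids):
    """Bring only the Gaussians this view touches onto the GPU.

    needed_ids: CPU LongTensor of Gaussian indices (at most the buffer size).
    The copy runs on a side stream so it can overlap unrelated GPU work.
    """
    n = needed_ids.numel()
    torch.index_select(cpu_gaussians, 0, needed_ids, out=staging[:n])
    with torch.cuda.stream(copy_stream):
        gpu_batch = staging[:n].to('cuda', non_blocking=True)
    torch.cuda.current_stream().wait_stream(copy_stream)  # ready before use
    return gpu_batch

print(fetch_gaussians(torch.randint(0, 1_000_000, (4096,))).shape)
```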
GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
Authors:Jiahe Li, Jiawei Zhang, Youmin Zhang, Xiao Bai, Jin Zheng, Xiaohan Yu, Lin Gu
Reconstructing accurate surfaces with radiance fields has achieved remarkable progress in recent years. However, prevailing approaches, primarily based on Gaussian Splatting, are increasingly constrained by representational bottlenecks. In this paper, we introduce GeoSVR, an explicit voxel-based framework that explores and extends the under-investigated potential of sparse voxels for achieving accurate, detailed, and complete surface reconstruction. As strengths, sparse voxels support preserving the coverage completeness and geometric clarity, while corresponding challenges also arise from absent scene constraints and locality in surface refinement. To ensure correct scene convergence, we first propose a Voxel-Uncertainty Depth Constraint that maximizes the effect of monocular depth cues while presenting a voxel-oriented uncertainty to avoid quality degradation, enabling effective and robust scene constraints yet preserving highly accurate geometries. Subsequently, Sparse Voxel Surface Regularization is designed to enhance geometric consistency for tiny voxels and facilitate the voxel-based formation of sharp and accurate surfaces. Extensive experiments demonstrate our superior performance compared to existing methods across diverse challenging scenarios, excelling in geometric accuracy, detail preservation, and reconstruction completeness while maintaining high efficiency. Code is available at https://github.com/Fictionarry/GeoSVR.
Paper and project links
PDF Accepted at NeurIPS 2025 (Spotlight). Project page: https://fictionarry.github.io/GeoSVR-project/
Summary
This paper presents GeoSVR, an explicit sparse-voxel framework for accurate, detailed, and complete surface reconstruction. Sparse voxels preserve coverage completeness and geometric clarity, but absent scene constraints and locality in surface refinement pose challenges. A Voxel-Uncertainty Depth Constraint maximizes the effect of monocular depth cues while introducing voxel-oriented uncertainty to avoid quality degradation, improving scene convergence, and Sparse Voxel Surface Regularization enhances geometric consistency for tiny voxels, enabling sharp, accurate voxel-based surfaces. Across diverse challenging scenarios, the method outperforms existing approaches in geometric accuracy, detail preservation, and reconstruction completeness.
Key Takeaways
- GeoSVR is an explicit voxel-based framework that exploits sparse voxels for accurate, detailed, and complete surface reconstruction.
- Sparse voxels preserve coverage completeness and geometric clarity, but absent scene constraints and locality in surface refinement remain challenges.
- A Voxel-Uncertainty Depth Constraint exploits monocular depth cues while avoiding quality degradation through voxel-oriented uncertainty, yielding effective and robust scene constraints (sketched after this list).
- Sparse Voxel Surface Regularization improves geometric consistency for tiny voxels and produces sharp, accurate surfaces.
- Compared with existing methods, GeoSVR excels in geometric accuracy, detail preservation, and reconstruction completeness.
- The method stays effective across diverse challenging scenarios while maintaining high efficiency.
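One plausible reading of the depth constraint, as a minimal PyTorch sketch: weight a monocular-depth supervision term by a per-pixel uncertainty derived from the voxels, so unreliable cues are down-weighted. The exponential weighting and L1 form are assumptions; the paper's exact formulation may differ.

```python
import torch

def uncertainty_weighted_depth_loss(rendered_depth, mono_depth, uncertainty):
    """Down-weight monocular depth supervision where voxel uncertainty is high.

    rendered_depth, mono_depth: (H, W) depths; mono_depth comes from a
    monocular depth network and is assumed scale-aligned beforehand.
    uncertainty: (H, W) per-pixel uncertainty rendered from the voxels.
    """
    w = torch.exp(-uncertainty)           # confident regions get weight near 1
    return (w * (rendered_depth - mono_depth).abs()).mean()

# Toy usage on random maps.
print(uncertainty_weighted_depth_loss(
    torch.rand(4, 4), torch.rand(4, 4), torch.rand(4, 4)))
```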
Diffusion Denoised Hyperspectral Gaussian Splatting
Authors:Sunil Kumar Narayanan, Lingjun Zhao, Lu Gan, Yongsheng Chen
Hyperspectral imaging (HSI) has been widely used in agricultural applications for non-destructive estimation of plant nutrient composition and precise determination of nutritional elements of samples. Recently, 3D reconstruction methods have been used to create implicit neural representations of HSI scenes, which can help localize the target object’s nutrient composition spatially and spectrally. Neural Radiance Field (NeRF) is a cutting-edge implicit representation that can be used to render hyperspectral channel compositions of each spatial location from any viewing direction. However, it faces limitations in training time and rendering speed. In this paper, we propose Diffusion-Denoised Hyperspectral Gaussian Splatting (DD-HGS), which enhances the state-of-the-art 3D Gaussian Splatting (3DGS) method with wavelength-aware spherical harmonics, a Kullback-Leibler divergence-based spectral loss, and a diffusion-based denoiser to enable 3D explicit reconstruction of hyperspectral scenes across the full spectral range. We present extensive evaluations on diverse real-world hyperspectral scenes from the Hyper-NeRF dataset to show the effectiveness of DD-HGS. The results demonstrate that DD-HGS achieves new state-of-the-art performance among previously published methods. Project page: https://dragonpg2000.github.io/DDHGS-website/
Paper and project links
PDF Accepted to 3DV 2026
Summary
The paper proposes DD-HGS, a diffusion-denoised hyperspectral Gaussian splatting method that combines wavelength-aware spherical harmonics, a Kullback-Leibler divergence-based spectral loss, and a diffusion-based denoiser to achieve explicit 3D reconstruction of hyperspectral scenes across the full spectral range, reaching new state-of-the-art performance.
Key Takeaways
- Hyperspectral imaging (HSI) is widely used in agriculture for non-destructive estimation of plant nutrient composition and precise determination of nutritional elements in samples.
- 3D reconstruction methods create implicit neural representations of HSI scenes, helping localize a target object's nutrient composition spatially and spectrally.
- Neural Radiance Fields (NeRF) are a cutting-edge implicit representation that can render the hyperspectral channel composition of each spatial location from any viewing direction, but face limits in training time and rendering speed.
- DD-HGS enhances 3D Gaussian Splatting to achieve explicit 3D reconstruction of hyperspectral scenes across the full spectral range.
- Wavelength-aware spherical harmonics handle the characteristics of hyperspectral data.
- A Kullback-Leibler divergence-based spectral loss improves spectral fidelity (see the sketch below).
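A minimal PyTorch sketch of a KL-based spectral loss, assuming per-pixel spectra are normalized into distributions over the channels; the exact normalization and divergence direction used in the paper are assumptions here.

```python
import torch

def spectral_kl_loss(pred, target, eps=1e-8):
    """KL divergence between per-pixel spectra treated as distributions.

    pred, target: (..., C) non-negative intensities over C spectral channels.
    """
    p = target / (target.sum(-1, keepdim=True) + eps)  # reference spectrum
    q = pred / (pred.sum(-1, keepdim=True) + eps)      # rendered spectrum
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(-1).mean()

# Toy usage: an 8x8 image with 128 spectral channels.
pred = torch.rand(8, 8, 128)
target = torch.rand(8, 8, 128)
print(spectral_kl_loss(pred, target))
```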
ControlGS: Consistent Structural Compression Control for Deployment-Aware Gaussian Splatting
Authors:Fengdi Zhang, Yibao Sun, Hongkun Cao, Ruqi Huang
3D Gaussian Splatting (3DGS) is a highly deployable real-time method for novel view synthesis. In practice, it requires a universal, consistent control mechanism that adjusts the trade-off between rendering quality and model compression without scene-specific tuning, enabling automated deployment across different device performances and communication bandwidths. In this work, we present ControlGS, a control-oriented optimization framework that maps the trade-off between Gaussian count and rendering quality to a continuous, scene-agnostic, and highly responsive control axis. Extensive experiments across a wide range of scene scales and types (from small objects to large outdoor scenes) demonstrate that, by adjusting a globally unified control hyperparameter, ControlGS can flexibly generate models biased toward either structural compactness or high fidelity, regardless of the specific scene scale or complexity, while achieving markedly higher rendering quality with the same or fewer Gaussians compared to potential competing methods. Project page: https://zhang-fengdi.github.io/ControlGS/
Paper and project links
Summary
3D Gaussian Splatting (3DGS) is a highly deployable real-time method for novel view synthesis. Automated deployment across different device performances and communication bandwidths requires a universal, consistent control mechanism that adjusts the trade-off between rendering quality and model compression without scene-specific tuning. This work presents ControlGS, a control-oriented optimization framework that maps the trade-off between Gaussian count and rendering quality onto a continuous, scene-agnostic, and highly responsive control axis. Experiments show that adjusting a globally unified control hyperparameter flexibly biases models toward structural compactness or high fidelity across scene scales and types, while achieving markedly higher rendering quality with the same or fewer Gaussians.
Key Takeaways
- 3DGS is a real-time novel view synthesis method suited to practical deployment.
- ControlGS is an optimization framework that controls the trade-off between 3DGS rendering quality and model compression.
- ControlGS exposes a continuous, scene-agnostic control axis driven by a single hyperparameter (one hypothetical realization is sketched after this list).
- Via one globally unified control hyperparameter, ControlGS adapts to different scene scales and types, flexibly generating models biased toward structural compactness or high fidelity.
- ControlGS achieves higher rendering quality with the same or fewer Gaussians.
- The approach targets automated deployment, with no scene-specific tuning required.
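Purely as a hypothetical stand-in for ControlGS's actual mechanism, one way a single compactness knob could work is an opacity penalty scaled by a global hyperparameter: raising it drives more Gaussians toward zero opacity, where standard 3DGS pruning removes them.

```python
import torch

def compactness_objective(render_loss, opacities, control_lambda):
    """Hypothetical single-knob objective: a larger control_lambda pushes
    more Gaussians toward zero opacity (later pruned), biasing the model
    toward compactness; a smaller value favors fidelity."""
    return render_loss + control_lambda * opacities.sum()

# Toy usage: same photometric loss under two settings of the knob.
opacities = torch.rand(10_000)
print(compactness_objective(torch.tensor(0.05), opacities, 1e-6))
print(compactness_objective(torch.tensor(0.05), opacities, 1e-4))
```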
GSRF: Complex-Valued 3D Gaussian Splatting for Efficient Radio-Frequency Data Synthesis
Authors:Kang Yang, Gaofeng Dong, Sijie Ji, Wan Du, Mani Srivastava
Synthesizing radio-frequency (RF) data given the transmitter and receiver positions, e.g., received signal strength indicator (RSSI), is critical for wireless networking and sensing applications, such as indoor localization. However, it remains challenging due to complex propagation interactions, including reflection, diffraction, and scattering. State-of-the-art neural radiance field (NeRF)-based methods achieve high-fidelity RF data synthesis but are limited by long training times and high inference latency. We introduce GSRF, a framework that extends 3D Gaussian Splatting (3DGS) from the optical domain to the RF domain, enabling efficient RF data synthesis. GSRF realizes this adaptation through three key innovations: First, it introduces complex-valued 3D Gaussians with a hybrid Fourier-Legendre basis to model directional and phase-dependent radiance. Second, it employs orthographic splatting for efficient ray-Gaussian intersection identification. Third, it incorporates a complex-valued ray tracing algorithm, executed on RF-customized CUDA kernels and grounded in wavefront propagation principles, to synthesize RF data in real time. Evaluated across various RF technologies, GSRF preserves high-fidelity RF data synthesis while achieving significant improvements in training efficiency, shorter training time, and reduced inference latency.
Paper and project links
Summary
Synthesizing radio-frequency (RF) data such as RSSI from transmitter and receiver positions is critical for wireless networking and sensing applications like indoor localization, yet complex propagation interactions (reflection, diffraction, scattering) make it challenging. This paper introduces GSRF, a framework that extends 3D Gaussian Splatting (3DGS) from the optical domain to the RF domain for efficient RF data synthesis. GSRF makes three key innovations: complex-valued 3D Gaussians with a hybrid Fourier-Legendre basis to model directional and phase-dependent radiance; orthographic splatting for efficient ray-Gaussian intersection identification; and a complex-valued ray tracing algorithm, executed on RF-customized CUDA kernels and grounded in wavefront propagation principles, that synthesizes RF data in real time. Evaluations across RF technologies show that GSRF preserves high-fidelity synthesis while improving training efficiency, shortening training time, and reducing inference latency.
Key Takeaways
- GSRF extends 3D Gaussian Splatting (3DGS) from the optical domain to the RF domain for RF data synthesis in wireless networking and sensing applications.
- Complex-valued 3D Gaussians with a hybrid Fourier-Legendre basis effectively model directional and phase-dependent radiance.
- Orthographic splatting enables efficient identification of ray-Gaussian intersections.
- A complex-valued ray tracing algorithm on RF-customized CUDA kernels synthesizes RF data in real time (see the toy sketch after this list).
- GSRF preserves high-fidelity RF data synthesis while improving training efficiency, shortening training time, and reducing inference latency.
- Its adaptability gives it strong performance across a variety of RF technologies.
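A toy numpy sketch of the wavefront-propagation idea behind complex-valued accumulation: sum complex contributions coherently along a ray and read power from the field magnitude. The amplitude/phase parameterization and the dB scaling are illustrative assumptions, not GSRF's kernel.

```python
import numpy as np

def synthesize_rssi(amplitudes, phases, path_lengths, wavelength):
    """Coherently sum complex contributions from Gaussians along a ray.

    amplitudes:   (N,) real gains attributed to the intersected Gaussians
    phases:       (N,) phase offsets modeled by each Gaussian
    path_lengths: (N,) propagation distances to each intersection
    """
    k = 2 * np.pi / wavelength                      # wavenumber
    field = np.sum(amplitudes * np.exp(1j * (k * path_lengths + phases)))
    power = np.abs(field) ** 2
    return 10 * np.log10(power + 1e-12)             # RSSI in dB (toy scale)

# Toy usage: five intersections along one ray at 2.4 GHz (wavelength ~0.125 m).
rng = np.random.default_rng(0)
print(synthesize_rssi(rng.random(5), rng.random(5), 1 + rng.random(5), 0.125))
```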
On Scaling Up 3D Gaussian Splatting Training
Authors:Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie
3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. As each Gaussian affects a small, dynamic subset of rendered pixels, Grendel employs sparse all-to-all communication to transfer the necessary Gaussians to pixel partitions and performs dynamic load balancing. Unlike existing 3DGS systems that train using one camera view image at a time, Grendel supports batched training with multiple views. We explore various optimization hyperparameter scaling strategies and find that a simple sqrt(batch size) scaling rule is highly effective. Evaluations using large-scale, high-resolution scenes show that Grendel enhances rendering quality by scaling up 3DGS parameters across multiple GPUs. On the Rubble dataset, we achieve a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared to a PSNR of 26.28 using 11.2 million Gaussians on a single GPU. Grendel is an open-source project available at: https://github.com/nyu-systems/Grendel-GS
Paper and project links
PDF ICLR 2025 Oral; Homepage: https://daohanlu.github.io/scaling-up-3dgs/
Summary
This paper introduces Grendel, a distributed system built on 3DGS for large-scale, high-resolution 3D reconstruction. Grendel partitions 3DGS parameters and parallelizes computation, supporting training across multiple GPUs. It uses sparse all-to-all communication with dynamic load balancing and supports batched training over multiple camera views. Evaluations show that Grendel improves rendering quality by scaling 3DGS parameters across GPUs: on the Rubble dataset, distributing 40.4 million Gaussians across 16 GPUs reaches a test PSNR of 27.28, versus 26.28 with 11.2 million Gaussians on a single GPU.
Key Takeaways
- Grendel is a distributed system built on 3DGS for large-scale, high-resolution 3D reconstruction tasks.
- By partitioning 3DGS parameters and parallelizing computation, Grendel trains across multiple GPUs, breaking the single-GPU memory limit.
- Sparse all-to-all communication with dynamic load balancing keeps computation efficient.
- Grendel supports batched training with multiple camera views, improving training speed and quality.
- A simple sqrt(batch size) hyperparameter scaling rule proves highly effective (see the one-liner below).
- Tests on the Rubble dataset show that distributed training markedly improves rendering quality.
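The sqrt(batch size) rule from the abstract as a one-line helper; the base learning rate in the usage example is only an illustrative value.

```python
import math

def scale_lr(base_lr, base_batch, batch_size):
    """Grow the learning rate by sqrt(B) when the batch grows by a factor B."""
    return base_lr * math.sqrt(batch_size / base_batch)

# Illustrative: an lr tuned for batch size 1, scaled for batched training on 16 views.
print(scale_lr(1.6e-4, 1, 16))  # -> 6.4e-04
```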