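⚠️ All summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are meant only for initial screening before reading a paper.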
Updated 2025-01-23
HAC++: Towards 100X Compression of 3D Gaussian Splatting
Authors:Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To achieve a compact size, we propose HAC++, which leverages the relationships between unorganized anchors and a structured hash grid, utilizing their mutual information for context modeling. Additionally, HAC++ captures intra-anchor contextual relationships to further enhance compression performance. To facilitate entropy coding, we utilize Gaussian distributions to precisely estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Moreover, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously improving fidelity. It also delivers more than 20X size reduction compared to Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.
PDF IEEE TPAMI Submission. This paper is an extension of HAC at arXiv:2403.14530 (ECCV 2024)
Key Takeaways
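- 3DGS is a promising framework for novel view synthesis, but its large number of Gaussians and associated attributes calls for effective compression.
- HAC++ exploits the relationship between unorganized anchors and a structured hash grid, using their mutual information for context modeling to reach a compact size.
- An adaptive quantization module enables high-precision quantization of each attribute, which improves fidelity restoration.
- An adaptive masking strategy eliminates invalid Gaussians and anchors.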
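The entropy-coding idea in the abstract can be illustrated with a minimal sketch. It assumes that a context model has already predicted a Gaussian mean `mu` and scale `sigma` for an attribute, and that the attribute is quantized with step size `step`. All function and parameter names here are hypothetical and are not taken from the HAC++ code. The probability of a quantized symbol is the Gaussian mass inside its quantization bin, and the negative base-2 logarithm of that probability estimates its bit cost.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantize(value, step):
    """Round a value to the nearest multiple of the quantization step."""
    return round(value / step) * step

def estimated_bits(q, mu, sigma, step, eps=1e-9):
    """Estimated bit cost of symbol q: -log2 of the Gaussian mass in its bin."""
    upper = gaussian_cdf(q + step / 2, mu, sigma)
    lower = gaussian_cdf(q - step / 2, mu, sigma)
    # Clamp to eps so a vanishing probability does not produce log2(0).
    return -math.log2(max(upper - lower, eps))

# Hypothetical attribute value and two hypothetical context predictions.
q = quantize(0.37, step=0.1)
bits_good = estimated_bits(q, mu=0.4, sigma=0.1, step=0.1)  # accurate prediction
bits_bad = estimated_bits(q, mu=2.0, sigma=0.1, step=0.1)   # poor prediction
```

In this toy setting an accurate prediction makes the symbol cheap to encode, while a poor one makes it expensive, which is why better context modeling translates into a smaller file.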
GaussianVideo: Efficient Video Representation Through 2D Gaussian Splatting
Authors:Longan Wang, Yuang Shi, Wei Tsang Ooi
3D Gaussian splats have emerged as a revolutionary, effective, learned representation for static 3D scenes. In this work, we explore using 2D Gaussian splats as a new primitive for representing videos. We propose GaussianVideo, an approach to learning a set of 2D Gaussian splats that can effectively represent video frames. GaussianVideo incorporates the following techniques: (i) To exploit temporal redundancy among adjacent frames, which can speed up training and improve the compression efficiency, we predict the Gaussian splats of a frame based on its previous frame; (ii) To control the trade-offs between file size and quality, we remove Gaussian splats with low contribution to the video quality; (iii) To capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly-appeared objects; (iv) To handle significant changes in the scene, we detect key frames based on loss differences during the learning process. Experiment results show that GaussianVideo achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs such as AV1 and VVC, and a rendering speed of 1500 fps for a 1920x1080 video.
Key Takeaways
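- The paper proposes 2D Gaussian splats as a new primitive for representing videos.
- GaussianVideo learns a set of 2D Gaussian splats that effectively represents video frames.
- It exploits temporal redundancy by predicting a frame's Gaussian splats from the previous frame, which speeds up training and improves compression efficiency.
- Removing Gaussian splats that contribute little to video quality controls the trade-off between file size and quality.
- Gaussian splats are added randomly to fit content with large motion or newly appeared objects.
- Key frames are detected from loss differences during learning to handle significant scene changes.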
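Two of the techniques above, pruning low-contribution splats and detecting key frames from loss differences, can be sketched in a few lines. This is only an illustration under assumed criteria: the abstract does not give the contribution measure or the loss-difference rule, so `keep_ratio`, `jump_threshold`, and both function names are hypothetical.

```python
def prune_low_contribution(contributions, keep_ratio):
    """Return the indices of the splats to keep, dropping the lowest contributors."""
    n_keep = max(1, int(len(contributions) * keep_ratio))
    order = sorted(range(len(contributions)),
                   key=lambda i: contributions[i], reverse=True)
    return sorted(order[:n_keep])

def detect_key_frames(losses, jump_threshold):
    """Flag frame t as a key frame when its loss jumps relative to frame t-1."""
    key_frames = [0]  # the first frame has no predecessor to predict from
    for t in range(1, len(losses)):
        if losses[t] - losses[t - 1] > jump_threshold:
            key_frames.append(t)
    return key_frames

# Hypothetical per-splat contribution scores and per-frame losses.
kept = prune_low_contribution([0.9, 0.01, 0.5, 0.02], keep_ratio=0.5)
keys = detect_key_frames([0.10, 0.11, 0.45, 0.12], jump_threshold=0.2)
```

Here `kept` is `[0, 2]` and `keys` is `[0, 2]`: the two weakest splats are dropped, and frame 2 is flagged because predicting it from frame 1 fits poorly.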
See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization
Authors:Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam
3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis. However, its rendering quality deteriorates with sparse input views, leading to distorted content and reduced details. This limitation hinders its practical application. To address this issue, we propose a sparse-view 3DGS method. Given the inherently ill-posed nature of sparse-view rendering, incorporating prior information is crucial. We propose a semantic regularization technique, using features extracted from the pretrained DINO-ViT model, to ensure multi-view semantic consistency. Additionally, we propose local depth regularization, which constrains depth values to improve generalization on unseen views. Our method outperforms state-of-the-art novel view synthesis approaches, achieving up to 0.4dB improvement in terms of PSNR on the LLFF dataset, with reduced distortion and enhanced visual quality.
PDF 5 pages, 5 figures, has been accepted by the ICASSP 2025
Key Takeaways
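- 3DGS performs remarkably well in novel view synthesis, but its rendering quality deteriorates with sparse input views.
- To address this, the paper proposes a sparse-view 3DGS method.
- As prior information, features extracted from the pretrained DINO-ViT model drive a semantic regularization that ensures multi-view semantic consistency.
- Local depth regularization constrains depth values to improve generalization to unseen views.
- The method outperforms state-of-the-art novel view synthesis approaches.
- On the LLFF dataset, it improves PSNR by up to 0.4 dB.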
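To put the 0.4 dB figure in perspective, the short calculation below uses the standard PSNR definition, PSNR = 10 · log10(MAX² / MSE). The baseline MSE of 0.01 is an arbitrary assumption chosen for illustration, not a number from the paper.

```python
import math

def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink to gain `gain_db` dB of PSNR."""
    return 10.0 ** (-gain_db / 10.0)

ratio = mse_ratio_for_gain(0.4)        # about 0.912
baseline_mse = 0.01                    # assumed baseline, PSNR = 20 dB
improved_psnr = psnr(baseline_mse * ratio)
```

A 0.4 dB gain therefore corresponds to roughly 9% lower mean squared error, independent of the baseline.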
RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering
Authors:Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang
Efficiently synthesizing novel views from sparse inputs while maintaining accuracy remains a critical challenge in 3D reconstruction. While advanced techniques like radiance fields and 3D Gaussian Splatting achieve rendering quality and impressive efficiency with dense view inputs, they suffer from significant geometric reconstruction errors when applied to sparse input views. Moreover, although recent methods leverage monocular depth estimation to enhance geometric learning, their dependence on single-view estimated depth often leads to view inconsistency issues across different viewpoints. Consequently, this reliance on absolute depth can introduce inaccuracies in geometric information, ultimately compromising the quality of scene reconstruction with Gaussian splats. In this paper, we present RDG-GS, a novel sparse-view 3D rendering framework with Relative Depth Guidance based on 3D Gaussian Splatting. The core innovation lies in utilizing relative depth guidance to refine the Gaussian field, steering it towards view-consistent spatial geometric representations, thereby enabling the reconstruction of accurate geometric structures and capturing intricate textures. First, we devise refined depth priors to rectify the coarse estimated depth and insert global and fine-grained scene information to regular Gaussians. Building on this, to address spatial geometric inaccuracies from absolute depth, we propose relative depth guidance by optimizing the similarity between spatially correlated patches of depth and images. Additionally, we also directly deal with the sparse areas challenging to converge by the adaptive sampling for quick densification. Across extensive experiments on Mip-NeRF360, LLFF, DTU, and Blender, RDG-GS demonstrates state-of-the-art rendering quality and efficiency, making a significant advancement for real-world application.
PDF 24 pages, 12 figures
Key Takeaways
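- Synthesizing novel views from sparse inputs while maintaining accuracy remains a critical challenge. Current techniques suffer from significant geometric reconstruction errors, especially with sparse input views.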
Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car Reconstruction
Authors:Congcong Li, Jin Wang, Xiaomeng Wang, Xingchen Zhou, Wei Wu, Yuzhi Zhang, Tongyi Cao
3D car modeling is crucial for applications in autonomous driving systems, virtual and augmented reality, and gaming. However, due to the distinctive properties of cars, such as highly reflective and transparent surface materials, existing methods often struggle to achieve accurate 3D car reconstruction. To address these limitations, we propose Car-GS, a novel approach designed to mitigate the effects of specular highlights and the coupling of RGB and geometry in 3D geometric and shading reconstruction (3DGS). Our method incorporates three key innovations: First, we introduce view-dependent Gaussian primitives to effectively model surface reflections. Second, we identify the limitations of using a shared opacity parameter for both image rendering and geometric attributes when modeling transparent objects. To overcome this, we assign a learnable geometry-specific opacity to each 2D Gaussian primitive, dedicated solely to rendering depth and normals. Third, we observe that reconstruction errors are most prominent when the camera view is nearly orthogonal to glass surfaces. To address this issue, we develop a quality-aware supervision module that adaptively leverages normal priors from a pre-trained large-scale normal model. Experimental results demonstrate that Car-GS achieves precise reconstruction of car surfaces and significantly outperforms prior methods. The project page is available at https://lcc815.github.io/Car-GS.
Key Takeaways
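- 3D car modeling is crucial for autonomous driving systems, virtual and augmented reality, and gaming, but the highly reflective and transparent materials of cars make it difficult.
- Existing methods struggle to reconstruct cars accurately; Car-GS is proposed to address these limitations.
- Car-GS introduces view-dependent Gaussian primitives to model surface reflections effectively.
- Car-GS identifies the limitation of sharing one opacity parameter between image rendering and geometric attributes when modeling transparent objects. It assigns each 2D Gaussian primitive a learnable geometry-specific opacity used solely for rendering depth and normals.
- Reconstruction errors are most prominent when the camera view is nearly orthogonal to glass surfaces. Car-GS therefore adds a quality-aware supervision module that adaptively leverages normal priors from a pre-trained large-scale normal model.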
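The motivation for a separate geometry opacity can be seen in a toy front-to-back compositing example. This is a simplified sketch, not the Car-GS rasterizer: the ray, the depths, the single-channel intensities, and the opacity values are all made up for illustration.

```python
def composite(values, opacities):
    """Front-to-back alpha compositing of per-primitive values along one ray."""
    result, transmittance = 0.0, 1.0
    for value, alpha in zip(values, opacities):
        result += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return result

# Hypothetical ray: a glass window (depth 2.0) in front of a car seat (depth 3.0).
depths = [2.0, 3.0]
intensities = [0.2, 0.8]       # single-channel stand-in for RGB
render_opacity = [0.1, 1.0]    # glass is nearly transparent in the image
geometry_opacity = [1.0, 1.0]  # but should act as a solid surface for geometry

pixel = composite(intensities, render_opacity)        # mostly shows the seat
depth_shared = composite(depths, render_opacity)      # 2.9, pulled behind the glass
depth_separate = composite(depths, geometry_opacity)  # 2.0, on the glass surface
```

With a shared opacity, the rendered depth lands near the seat because the glass barely contributes. With a dedicated geometry opacity, the image still shows the interior while the depth stays on the glass.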
Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting
Authors:Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song, Wenming Yang
Gaussian Splatting has emerged as a prominent 3D representation in novel view synthesis, but it still suffers from appearance variations, which are caused by various factors, such as modern camera ISPs, different time of day, weather conditions, and local light changes. These variations can lead to floaters and color distortions in the rendered images/videos. Recent appearance modeling approaches in Gaussian Splatting are either tightly coupled with the rendering process, hindering real-time rendering, or they only account for mild global variations, performing poorly in scenes with local light changes. In this paper, we propose DAVIGS, a method that decouples appearance variations in a plug-and-play and efficient manner. By transforming the rendering results at the image level instead of the Gaussian level, our approach can model appearance variations with minimal optimization time and memory overhead. Furthermore, our method gathers appearance-related information in 3D space to transform the rendered images, thus building 3D consistency across views implicitly. We validate our method on several appearance-variant scenes, and demonstrate that it achieves state-of-the-art rendering quality with minimal training time and memory usage, without compromising rendering speeds. Additionally, it provides performance improvements for different Gaussian Splatting baselines in a plug-and-play manner.
PDF Accepted to AAAI 2025. Project website: https://davi-gaussian.github.io
Key Takeaways
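- Gaussian Splatting is a prominent 3D representation, but it still suffers from appearance variations.
- These variations can stem from modern camera ISPs, time of day, weather conditions, and local light changes.
- Existing approaches are either tightly coupled with the rendering process, which hinders real-time rendering, or only handle mild global variations.
- DAVIGS transforms rendering results at the image level to model appearance variations with minimal optimization time and memory overhead.
- DAVIGS gathers appearance-related information in 3D space, which implicitly builds 3D consistency across views.
- Validation on several appearance-variant scenes shows state-of-the-art rendering quality with minimal training time and memory usage.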
GSTAR: Gaussian Surface Tracking and Reconstruction
Authors:Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song
3D Gaussian Splatting techniques have enabled efficient photo-realistic rendering of static scenes. Recent works have extended these approaches to support surface reconstruction and tracking. However, tracking dynamic surfaces with 3D Gaussians remains challenging due to complex topology changes, such as surfaces appearing, disappearing, or splitting. To address these challenges, we propose GSTAR, a novel method that achieves photo-realistic rendering, accurate surface reconstruction, and reliable 3D tracking for general dynamic scenes with changing topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains the mesh topology and tracks the meshes using Gaussians. In regions where topology changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration and the generation of new surfaces based on these optimized Gaussians. Additionally, we introduce a surface-based scene flow method that provides robust initialization for tracking between frames. Experiments demonstrate that our method effectively tracks and reconstructs dynamic surfaces, enabling a range of applications. Our project page with the code release is available at https://eth-ait.github.io/GSTAR/.
Key Takeaways
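- GSTAR achieves photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes.
- GSTAR represents dynamic objects by binding Gaussians to mesh faces.
- For surfaces with consistent topology, GSTAR maintains the mesh topology and tracks the meshes using Gaussians.
- In regions where topology changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration and the generation of new surfaces.
- GSTAR introduces a surface-based scene flow method that provides robust initialization for tracking between frames.
- Experiments show that GSTAR effectively tracks and reconstructs dynamic surfaces.
- The method enables a range of applications.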
F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting
Authors:Yuxin Wang, Qianyi Wu, Dan Xu
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets, e.g., ImageNet. The key challenge of this task is learning a robust 3D-aware representation without multi-view or dynamic data, while ensuring consistent texture and geometry across different viewpoints. Although some baseline methods are capable of 3D-aware generation, the quality of the generated images still lags behind state-of-the-art 2D generation approaches, which excel in producing high-quality, detailed images. To address this severe limitation, we propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting, coined as F3D-Gaus, which can produce more realistic and reliable 3D renderings from monocular inputs. In addition, we introduce a self-supervised cycle-consistent constraint to enforce cross-view consistency in the learned 3D representation. This training strategy naturally allows aggregation of multiple aligned Gaussian primitives and significantly alleviates the interpolation limitations inherent in single-view pixel-aligned Gaussian Splatting. Furthermore, we incorporate video model priors to perform geometry-aware refinement, enhancing the generation of fine details in wide-viewpoint scenarios and improving the model’s capability to capture intricate 3D textures. Extensive experiments demonstrate that our approach not only achieves high-quality, multi-view consistent 3D-aware generation from monocular datasets, but also significantly improves training and inference efficiency.
PDF Project Page: https://w-ted.github.io/publications/F3D-Gaus
Key Takeaways
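- The paper tackles 3D-aware generation from monocular datasets, aiming to learn a robust 3D representation without multi-view or dynamic data.
- It proposes F3D-Gaus, a new feed-forward pipeline based on pixel-aligned Gaussian Splatting that produces more realistic and reliable 3D renderings.
- A self-supervised cycle-consistent constraint enforces cross-view consistency in the learned 3D representation.
- Video model priors improve the generation of fine details in wide-viewpoint scenarios and strengthen the model's ability to capture intricate 3D textures.
- The method achieves high-quality, multi-view consistent 3D-aware generation while improving training and inference efficiency.
- By aggregating multiple aligned Gaussian primitives, F3D-Gaus alleviates the interpolation limitations of single-view pixel-aligned Gaussian Splatting.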
DehazeGS: Seeing Through Fog with 3D Gaussian Splatting
Authors:Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng Zhang
Current novel view synthesis tasks primarily rely on high-quality and clear images. However, in foggy scenes, scattering and attenuation can significantly degrade the reconstruction and rendering quality. Although NeRF-based dehazing reconstruction algorithms have been developed, their use of deep fully connected neural networks and per-ray sampling strategies leads to high computational costs. Moreover, NeRF’s implicit representation struggles to recover fine details from hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction by explicitly modeling point clouds into 3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation to explain the foggy image formation process through a physically accurate forward rendering process. We introduce DehazeGS, a method capable of decomposing and rendering a fog-free background from participating media using only multi-view foggy images as input. We model the transmission within each Gaussian distribution to simulate the formation of fog. During this process, we jointly learn the atmospheric light and scattering coefficient while optimizing the Gaussian representation of the hazy scene. In the inference stage, we eliminate the effects of scattering and attenuation on the Gaussians and directly project them onto a 2D plane to obtain a clear view. Experiments on both synthetic and real-world foggy datasets demonstrate that DehazeGS achieves state-of-the-art performance in terms of both rendering quality and computational efficiency. Visualizations are available at https://dehazegs.github.io/
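PDF 9 pages, 4 figures. Visualizations are available at https://dehazegs.github.io/
This paper proposes DehazeGS, a dehazing method based on 3D Gaussian Splatting. It uses the explicit Gaussian representation to model the foggy image formation process and, from multi-view foggy images alone, decomposes and renders a fog-free background. The method models the transmission within each Gaussian distribution, jointly learns the atmospheric light and scattering coefficient while optimizing the Gaussian representation of the hazy scene, and at inference removes the effects of scattering and attenuation on the Gaussians before projecting them directly onto a 2D plane to obtain a clear view. Experiments show that DehazeGS achieves state-of-the-art rendering quality and computational efficiency.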
Key Takeaways
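- Current novel view synthesis tasks rely mainly on high-quality, clear images; in foggy scenes, scattering and attenuation significantly degrade reconstruction and rendering quality.
- NeRF's implicit representation struggles to recover fine details from hazy scenes, whereas 3D Gaussian Splatting achieves high-quality 3D scene reconstruction.
- DehazeGS uses the explicit Gaussian representation to model the foggy image formation process and dehazes from multi-view foggy images.
- The method models the transmission within each Gaussian distribution and jointly learns the atmospheric light and scattering coefficient.
- At inference, DehazeGS removes the effects of scattering and attenuation and projects the Gaussians directly onto a 2D plane to obtain a clear view.
- Experiments show that DehazeGS achieves state-of-the-art rendering quality and computational efficiency.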
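The roles of transmission, atmospheric light, and the scattering coefficient can be illustrated with the classical atmospheric scattering model, I = J · t + A · (1 − t) with t = exp(−β · d). This per-pixel sketch is an assumption made for illustration; DehazeGS models transmission per Gaussian, and its exact formulation may differ. The function names and numeric values are hypothetical.

```python
import math

def transmission(depth, beta):
    """Fraction of light that survives a path of length `depth` through fog."""
    return math.exp(-beta * depth)

def add_haze(clear, depth, beta, airlight):
    """Forward model: attenuate the scene radiance and add scattered airlight."""
    t = transmission(depth, beta)
    return clear * t + airlight * (1.0 - t)

def remove_haze(hazy, depth, beta, airlight, t_min=1e-3):
    """Invert the forward model; t is clamped to avoid dividing by ~0."""
    t = max(transmission(depth, beta), t_min)
    return (hazy - airlight * (1.0 - t)) / t

# Hypothetical single-channel pixel: radiance 0.3 seen through 5 units of fog.
hazy = add_haze(clear=0.3, depth=5.0, beta=0.2, airlight=0.9)
recovered = remove_haze(hazy, depth=5.0, beta=0.2, airlight=0.9)
```

Once the scattering coefficient and atmospheric light are known, the fog-free value is recovered by inverting the same formula, which mirrors the idea of removing scattering and attenuation at inference time.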
3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement
Authors:Ziqi Lu, Jianbo Ye, John Leonard
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS’s novel view rendering and EfficientSAM’s zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D change masks and object transformations. Our method can accurately identify changes in cluttered environments using sparse (as few as one) post-change images within as little as 18s. It does not rely on depth input, user instructions, pre-defined object classes, or object models – An object is recognized simply if it has been re-arranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD.
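This paper proposes a new method, based on 3D Gaussian Splatting, for detecting physical object rearrangements. It estimates object-level changes by comparing unaligned images taken at different times. Using EfficientSAM's zero-shot segmentation together with information associated across views, it detects object changes and estimates 3D change masks and object transformations. It accurately captures rearranged objects in a short time without relying on depth input, user instructions, pre-defined object classes, or object models, which opens up a broad range of applications, including object reconstruction, robot workspace reset, and 3DGS model update.
Key Takeaways
- The paper introduces 3DGS-CD, a new 3D Gaussian Splatting-based method for detecting physical object rearrangements in 3D scenes.
- It estimates 3D object-level changes by comparing two sets of unaligned images taken at different times.
- Using 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation, it detects 2D object-level changes, then associates and fuses them across views to estimate 3D change masks and object transformations.
- The method accurately identifies rearranged objects in as little as 18s using sparse post-change images (as few as one).
- It does not rely on depth input, user instructions, pre-defined object classes, or object models; any object that has been rearranged is recognized.
- Evaluated on public and self-collected real-world datasets, it achieves up to 14% higher accuracy and runs three orders of magnitude faster than the state-of-the-art radiance-field-based change detection method.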