
3DGS


⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: do not rely on these summaries for serious academic work; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-10-10

RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction

Authors:Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, Yang Zhao

3D Gaussian Splatting (3DGS) based Simultaneous Localization and Mapping (SLAM) systems can largely benefit from 3DGS’s state-of-the-art rendering efficiency and accuracy, but have not yet been adopted in resource-constrained edge devices due to insufficient speed. Addressing this, we identify notable redundancies across the SLAM pipeline for acceleration. While conceptually straightforward, practical approaches are required to minimize the overhead associated with identifying and eliminating these redundancies. In response, we propose RTGS, an algorithm-hardware co-design framework that comprehensively reduces the redundancies for real-time 3DGS-SLAM on edge. To minimize the overhead, RTGS fully leverages the characteristics of the 3DGS-SLAM pipeline. On the algorithm side, we introduce (1) an adaptive Gaussian pruning step to remove the redundant Gaussians by reusing gradients computed during backpropagation; and (2) a dynamic downsampling technique that directly reuses the keyframe identification and alpha computing steps to eliminate redundant pixels. On the hardware side, we propose (1) a subtile-level streaming strategy and a pixel-level pairwise scheduling strategy that mitigates workload imbalance via a Workload Scheduling Unit (WSU) guided by previous iteration information; (2) a Rendering and Backpropagation (R&B) Buffer that accelerates the rendering backpropagation by reusing intermediate data computed during rendering; and (3) a Gradient Merging Unit (GMU) to reduce intensive memory accesses caused by atomic operations while enabling pipelined aggregation. Integrated into an edge GPU, RTGS achieves real-time performance (>= 30 FPS) on four datasets and three algorithms, with up to 82.5x energy efficiency over the baseline and negligible quality loss. Code is available at https://github.com/UMN-ZhaoLab/RTGS.


Paper & Project Links

PDF Accepted by MICRO 2025

Summary

3D Gaussian Splatting (3DGS) based SLAM systems offer efficient, accurate rendering, but insufficient speed has kept them off resource-constrained edge devices. To close this gap, the paper proposes RTGS, an algorithm-hardware co-design framework that comprehensively removes redundancy to enable real-time 3DGS-SLAM on the edge. Exploiting the structure of the 3DGS-SLAM pipeline, RTGS introduces adaptive Gaussian pruning and dynamic downsampling on the algorithm side. On the hardware side, it adds a subtile-level streaming strategy, a pixel-level pairwise scheduling strategy, a Workload Scheduling Unit (WSU) guided by previous-iteration information, a Rendering and Backpropagation (R&B) Buffer that reuses intermediates computed during rendering, and a Gradient Merging Unit (GMU) that reduces the heavy memory traffic caused by atomic operations. Integrated into an edge GPU, RTGS reaches real-time performance (>= 30 FPS) on four datasets and three algorithms, with up to 82.5x better energy efficiency than the baseline and negligible quality loss.

Key Takeaways

  1. 3DGS-based SLAM is promising thanks to 3DGS's rendering efficiency and accuracy.
  2. Insufficient speed is what currently keeps it off resource-constrained edge devices.
  3. RTGS is an algorithm-hardware co-design framework that removes redundancy across the SLAM pipeline to gain speed.
  4. On the algorithm side, RTGS eliminates redundancy through adaptive Gaussian pruning and dynamic downsampling (see the sketch after this list).
  5. On the hardware side, RTGS combines subtile-level streaming, pixel-level pairwise scheduling, a Workload Scheduling Unit, a Rendering and Backpropagation Buffer, and a Gradient Merging Unit.
  6. RTGS reaches real-time performance across multiple datasets and algorithms with a large gain in energy efficiency.
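
The paper releases code at the GitHub link above; the snippet below is only a minimal PyTorch-style sketch of the gradient-reuse idea behind adaptive Gaussian pruning, assuming per-Gaussian opacities whose `.grad` has already been populated by backpropagation. The function names, the percentile threshold, and the choice of opacity gradients as the pruning signal are illustrative assumptions, not the authors' implementation.

```python
import torch

def accumulate_grad_stats(opacities: torch.Tensor, grad_accum: torch.Tensor) -> torch.Tensor:
    """Reuse gradients that backpropagation already produced (no extra pass):
    call once after each backward() to accumulate per-Gaussian magnitudes."""
    with torch.no_grad():
        grad_accum += opacities.grad.abs()
    return grad_accum

def adaptive_prune_mask(grad_accum: torch.Tensor, prune_quantile: float = 0.05) -> torch.Tensor:
    """Boolean keep-mask over N Gaussians: drop those whose accumulated
    gradient signal falls below an adaptive, percentile-based threshold."""
    threshold = torch.quantile(grad_accum, prune_quantile)
    return grad_accum > threshold
```

In a full 3DGS trainer, the keep-mask would be applied jointly to positions, scales, rotations, spherical-harmonic coefficients, and opacities.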

Cool Papers

Click here to view paper screenshots

Active Next-Best-View Optimization for Risk-Averse Path Planning

Authors:Amirhossein Mollaei Khass, Guangyi Liu, Vivek Pandey, Wen Jiang, Boshu Lei, Kostas Daniilidis, Nader Motee

Safe navigation in uncertain environments requires planning methods that integrate risk aversion with active perception. In this work, we present a unified framework that refines a coarse reference path by constructing tail-sensitive risk maps from Average Value-at-Risk statistics on an online-updated 3D Gaussian-splat Radiance Field. These maps enable the generation of locally safe and feasible trajectories. In parallel, we formulate Next-Best-View (NBV) selection as an optimization problem on the SE(3) pose manifold, where Riemannian gradient descent maximizes an expected information gain objective to reduce uncertainty most critical for imminent motion. Our approach advances the state-of-the-art by coupling risk-averse path refinement with NBV planning, while introducing scalable gradient decompositions that support efficient online updates in complex environments. We demonstrate the effectiveness of the proposed framework through extensive computational studies.


Paper & Project Links

PDF

Summary

This paper presents a unified framework for safe navigation in uncertain environments. A coarse reference path is refined using tail-sensitive risk maps built from Average Value-at-Risk statistics over an online-updated 3D Gaussian-splat radiance field, yielding locally safe and feasible trajectories. In parallel, Next-Best-View (NBV) selection is cast as an optimization problem on the SE(3) pose manifold, where Riemannian gradient descent maximizes an expected information gain objective to reduce the uncertainty most critical for imminent motion. By coupling risk-averse path refinement with NBV planning and introducing scalable gradient decompositions that support efficient online updates in complex environments, the approach advances the state of the art.

Key Takeaways

  1. A unified framework is proposed for planning safe navigation in uncertain environments.
  2. A coarse reference path is refined by building tail-sensitive risk maps from Average Value-at-Risk statistics (see the sketch after this list).
  3. Locally safe and feasible trajectories are generated from an online-updated 3D Gaussian-splat radiance field.
  4. Next-Best-View selection is cast as an optimization problem to reduce the uncertainty most critical for imminent motion.
  5. Riemannian gradient descent maximizes an expected information gain objective.
  6. Coupling risk-averse path refinement with NBV planning improves navigation.
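
As a concrete reference for the tail-sensitive statistic the risk maps are built on, here is a minimal NumPy sketch of Average Value-at-Risk (also called CVaR): the mean of the worst alpha-fraction of sampled costs. How costs are actually sampled from the Gaussian-splat radiance field is not shown and is an assumption here.

```python
import numpy as np

def average_value_at_risk(costs: np.ndarray, alpha: float = 0.1) -> float:
    """AVaR_alpha (a.k.a. CVaR): the mean of the worst alpha-fraction of costs."""
    var = np.quantile(costs, 1.0 - alpha)  # Value-at-Risk threshold
    return float(costs[costs >= var].mean())

# Hypothetical usage: a per-cell risk value from stand-in cost samples.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=1000)
print(average_value_at_risk(samples, alpha=0.1))
```

Because it averages only the tail, AVaR penalizes rare high-cost outcomes that a plain mean would wash out, which is what makes the resulting risk maps tail-sensitive.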

Cool Papers

Click here to view paper screenshots

Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

Authors:Chi Yan, Dan Xu

The 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, there exists a trade-off in text-aligned scene modeling: sparse Gaussian representation struggles to capture small objects in the scene, while dense representation incurs significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer Framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is the introduction of an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that PG-Occ achieves state-of-the-art performance with a relative 14.3% mIoU improvement over the previous best performing method. Code and pretrained models will be released upon publication on our project page: https://yanchi-3dv.github.io/PG-Occ


Paper & Project Links

PDF Project Page: https://yanchi-3dv.github.io/PG-Occ

Summary

3D occupancy prediction has advanced rapidly and plays a crucial role in vision-based autonomous driving. Traditional methods are limited to fixed semantic categories, whereas newer approaches predict text-aligned features to support open-vocabulary text queries in real-world scenes. This paper proposes PG-Occ, a Progressive Gaussian Transformer framework for open-vocabulary 3D occupancy prediction. A progressive online densification strategy, applied in a feed-forward manner, gradually enhances the 3D Gaussian representation to capture fine-grained scene details, while an anisotropy-aware sampling strategy with spatio-temporal fusion adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Evaluations show that PG-Occ achieves a relative 14.3% mIoU improvement over the previous best performing method.

Key Takeaways

  • 3D occupancy prediction is crucial for vision-based autonomous driving and has progressed markedly in recent years.
  • Traditional methods are limited to fixed semantic categories; newer methods predict text-aligned features to support open-vocabulary text queries in real-world scenes.
  • PG-Occ uses progressive online densification to gradually enhance the 3D Gaussian representation and capture fine-grained scene details.
  • An anisotropy-aware sampling strategy with spatio-temporal fusion enables more effective feature aggregation and richer scene information capture (see the sketch after this list).
  • PG-Occ achieves a significant mIoU improvement (relative 14.3%) over the previous best method.
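
The abstract does not spell out how receptive fields are assigned; purely as an illustration of the geometric intuition behind anisotropy-aware sampling, the sketch below derives per-axis sampling extents from each Gaussian's covariance eigenvalues, so an elongated Gaussian gets a correspondingly elongated receptive field. Everything here (the function name, the k-sigma rule) is a hypothetical stand-in, not PG-Occ's actual rule.

```python
import torch

def receptive_field_from_covariance(cov: torch.Tensor, k: float = 3.0) -> torch.Tensor:
    """Per-Gaussian anisotropic sampling extents from 3x3 covariances.

    cov: (N, 3, 3) symmetric covariance matrices.
    Returns (N, 3): k standard deviations along each principal axis.
    """
    eigvals = torch.linalg.eigvalsh(cov)   # (N, 3) eigenvalues, ascending
    return k * eigvals.clamp_min(0).sqrt()  # std-dev extents per principal axis
```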

Cool Papers

Click here to view paper screenshots

VGGT-X: When VGGT Meets Dense Novel View Synthesis

Authors:Yang Liu, Chuanchen Luo, Zimo Tang, Junran Peng, Zhaoxiang Zhang

We study the problem of applying 3D Foundation Models (3DFMs) to dense Novel View Synthesis (NVS). Despite significant progress in Novel View Synthesis powered by NeRF and 3DGS, current approaches remain reliant on accurate 3D attributes (e.g., camera poses and point clouds) acquired from Structure-from-Motion (SfM), which is often slow and fragile in low-texture or low-overlap captures. Recent 3DFMs showcase orders of magnitude speedup over the traditional pipeline and great potential for online NVS. But most of the validation and conclusions are confined to sparse-view settings. Our study reveals that naively scaling 3DFMs to dense views encounters two fundamental barriers: dramatically increasing VRAM burden and imperfect outputs that degrade initialization-sensitive 3D training. To address these barriers, we introduce VGGT-X, incorporating a memory-efficient VGGT implementation that scales to 1,000+ images, an adaptive global alignment for VGGT output enhancement, and robust 3DGS training practices. Extensive experiments show that these measures substantially close the fidelity gap with COLMAP-initialized pipelines, achieving state-of-the-art results in dense COLMAP-free NVS and pose estimation. Additionally, we analyze the causes of remaining gaps with COLMAP-initialized rendering, providing insights for the future development of 3D foundation models and dense NVS. Our project page is available at https://dekuliutesla.github.io/vggt-x.github.io/


Paper & Project Links

PDF Project Page: https://dekuliutesla.github.io/vggt-x.github.io/

Summary

This paper studies applying 3D Foundation Models (3DFMs) to dense Novel View Synthesis (NVS). Existing approaches rely on accurate 3D attributes (camera poses and point clouds) from Structure-from-Motion, which is slow and fragile in low-texture or low-overlap captures. Naively scaling 3DFMs to dense views hits two barriers: a dramatically increased VRAM burden and imperfect outputs that degrade initialization-sensitive 3D training. VGGT-X addresses both with a memory-efficient VGGT implementation that scales to 1,000+ images, adaptive global alignment to enhance VGGT outputs, and robust 3DGS training practices. Experiments show these measures substantially close the fidelity gap with COLMAP-initialized pipelines, achieving state-of-the-art results in dense COLMAP-free NVS and pose estimation; the paper also analyzes the causes of the remaining gap, offering insights for future 3D foundation models and dense NVS.

Key Takeaways

  1. The paper studies applying 3D Foundation Models (3DFMs) to dense Novel View Synthesis (NVS).
  2. Existing methods rely on Structure-from-Motion for accurate 3D attributes, which is slow and fragile in low-texture or low-overlap captures.
  3. VGGT-X combines a memory-efficient VGGT implementation, adaptive global alignment, and robust 3DGS training practices (a generic memory-saving sketch follows this list).
  4. VGGT-X achieves state-of-the-art results in dense COLMAP-free NVS and pose estimation.
  5. A gap with COLMAP-initialized rendering remains.
  6. The causes of the remaining gap are analyzed, informing future 3D foundation models and dense NVS.
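
The abstract credits scalability to a memory-efficient VGGT implementation without giving details. One standard ingredient for pushing a feed-forward model to 1,000+ frames on fixed VRAM is chunked, no-grad, mixed-precision inference with CPU offloading, sketched generically below; the `model` interface and chunk size are assumptions, not the paper's actual implementation.

```python
import torch

@torch.no_grad()
def chunked_inference(model: torch.nn.Module, frames: torch.Tensor, chunk: int = 64) -> torch.Tensor:
    """Run a feed-forward model over many frames in fixed-size chunks,
    keeping VRAM bounded by offloading each chunk's output to CPU RAM."""
    outputs = []
    for start in range(0, frames.shape[0], chunk):
        batch = frames[start:start + chunk].cuda(non_blocking=True)
        with torch.autocast("cuda", dtype=torch.bfloat16):  # mixed precision
            out = model(batch)
        outputs.append(out.float().cpu())  # move results off the GPU immediately
    return torch.cat(outputs)
```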

Cool Papers

Click here to view paper screenshots

HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping

Authors:Yu Ma, Guoliang Wei, Haihong Xiao, Yue Cheng

Novel View Synthesis (NVS) from sparse views presents a formidable challenge in 3D reconstruction, where limited multi-view constraints lead to severe overfitting, geometric distortion, and fragmented scenes. While 3D Gaussian Splatting (3DGS) delivers real-time, high-fidelity rendering, its performance drastically deteriorates under sparse inputs, plagued by floating artifacts and structural failures. To address these challenges, we introduce HBSplat, a unified framework that elevates 3DGS by seamlessly integrating robust structural cues, virtual view constraints, and occluded region completion. Our core contributions are threefold: a Hybrid-Loss Depth Estimation module that ensures multi-view consistency by leveraging dense matching priors and integrating reprojection, point propagation, and smoothness constraints; a Bidirectional Warping Virtual View Synthesis method that enforces substantially stronger constraints by creating high-fidelity virtual views through bidirectional depth-image warping and multi-view fusion; and an Occlusion-Aware Reconstruction component that recovers occluded areas using a depth-difference mask and a learning-based inpainting model. Extensive evaluations on LLFF, Blender, and DTU benchmarks validate that HBSplat sets a new state-of-the-art, achieving up to 21.13 dB PSNR and 0.189 LPIPS, while maintaining real-time inference. Code is available at: https://github.com/eternalland/HBSplat.


Paper & Project Links

PDF 14 pages, 21 figures

Summary

Novel View Synthesis (NVS) from sparse views is a formidable challenge in 3D reconstruction: limited multi-view constraints lead to severe overfitting, geometric distortion, and fragmented scenes. While 3D Gaussian Splatting (3DGS) delivers real-time, high-fidelity rendering, its performance deteriorates drastically under sparse inputs, plagued by floating artifacts and structural failures. HBSplat is a unified framework that elevates 3DGS by integrating robust structural cues, virtual view constraints, and occluded-region completion. Its three core contributions are a Hybrid-Loss Depth Estimation module that ensures multi-view consistency by combining dense matching priors with reprojection, point propagation, and smoothness constraints; a Bidirectional Warping Virtual View Synthesis method that enforces substantially stronger constraints by creating high-fidelity virtual views through bidirectional depth-image warping and multi-view fusion; and an Occlusion-Aware Reconstruction component that recovers occluded areas using a depth-difference mask and a learning-based inpainting model. On the LLFF, Blender, and DTU benchmarks, HBSplat sets a new state of the art, reaching up to 21.13 dB PSNR and 0.189 LPIPS while maintaining real-time inference. Code: https://github.com/eternalland/HBSplat.

Key Takeaways

  1. NVS from sparse views is a formidable challenge in 3D reconstruction, with overfitting, geometric distortion, and fragmented scenes.
  2. 3DGS offers real-time, high-fidelity rendering but degrades sharply under sparse inputs, suffering floating artifacts and structural failures.
  3. HBSplat improves 3DGS by integrating structural cues, virtual view constraints, and occluded-region completion.
  4. A Hybrid-Loss Depth Estimation module leverages dense matching priors plus reprojection, point propagation, and smoothness constraints to ensure multi-view consistency (see the sketch after this list).
  5. Bidirectional Warping Virtual View Synthesis creates high-fidelity virtual views that enforce substantially stronger constraints.
  6. An Occlusion-Aware Reconstruction component recovers occluded regions with a depth-difference mask and a learning-based inpainting model.
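
As a schematic of how the three ingredients named for the Hybrid-Loss Depth Estimation module (reprojection, dense-matching priors via point propagation, and smoothness) might combine into one loss, consider the sketch below; the weights, term definitions, and tensor layout are illustrative assumptions, not HBSplat's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_depth_loss(depth: torch.Tensor, reproj_depth: torch.Tensor,
                      matched_depth: torch.Tensor, match_mask: torch.Tensor,
                      w_reproj: float = 1.0, w_match: float = 1.0,
                      w_smooth: float = 0.1) -> torch.Tensor:
    """depth, reproj_depth: (H, W) predicted and cross-view reprojected depth;
    matched_depth / match_mask: sparse priors from dense matching."""
    l_reproj = F.l1_loss(depth, reproj_depth)                          # reprojection consistency
    l_match = F.l1_loss(depth[match_mask], matched_depth[match_mask])  # dense-matching prior
    l_smooth = (depth[:, 1:] - depth[:, :-1]).abs().mean() \
             + (depth[1:, :] - depth[:-1, :]).abs().mean()             # first-order smoothness
    return w_reproj * l_reproj + w_match * l_match + w_smooth * l_smooth
```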

Cool Papers

Click here to view paper screenshots

DWTGS: Rethinking Frequency Regularization for Sparse-view 3D Gaussian Splatting

Authors:Hung Nguyen, Runfa Li, An Le, Truong Nguyen

Sparse-view 3D Gaussian Splatting (3DGS) presents significant challenges in reconstructing high-quality novel views, as it often overfits to the widely-varying high-frequency (HF) details of the sparse training views. While frequency regularization can be a promising approach, its typical reliance on Fourier transforms causes difficult parameter tuning and biases towards detrimental HF learning. We propose DWTGS, a framework that rethinks frequency regularization by leveraging wavelet-space losses that provide additional spatial supervision. Specifically, we supervise only the low-frequency (LF) LL subbands at multiple DWT levels, while enforcing sparsity on the HF HH subband in a self-supervised manner. Experiments across benchmarks show that DWTGS consistently outperforms Fourier-based counterparts, as this LF-centric strategy improves generalization and reduces HF hallucinations.


Paper & Project Links

PDF Accepted to VCIP 2025

Summary

Sparse-view 3D Gaussian Splatting (3DGS) tends to overfit the widely-varying high-frequency details of sparse training views. Frequency regularization is a promising remedy, but its usual reliance on Fourier transforms makes parameter tuning difficult and biases learning toward detrimental high-frequency content. DWTGS instead uses wavelet-space losses that provide additional spatial supervision: only the low-frequency LL subbands are supervised at multiple DWT levels, while sparsity is enforced on the high-frequency HH subband in a self-supervised manner. Across benchmarks, this LF-centric strategy improves generalization, reduces high-frequency hallucinations, and consistently outperforms Fourier-based counterparts.

Key Takeaways

  1. Sparse-view 3DGS overfits the widely-varying high-frequency details of sparse training views.
  2. Fourier-based frequency regularization is hard to tune and biased toward detrimental high-frequency learning.
  3. DWTGS supervises only the low-frequency LL subbands at multiple DWT levels and enforces self-supervised sparsity on the high-frequency HH subband (see the sketch after this list).
  4. This LF-centric strategy improves generalization, reduces HF hallucinations, and consistently outperforms Fourier-based counterparts.
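
A minimal sketch of the stated wavelet-space recipe: supervise only the LL subbands across multiple DWT levels against the target, and impose self-supervised L1 sparsity on the HH subband with no high-frequency ground-truth term. The Haar basis, two levels, and loss weights are assumptions; the paper's exact choices are not given in the abstract.

```python
import torch

def haar_dwt2(x: torch.Tensor):
    """One level of a 2D Haar DWT. x: (B, C, H, W) with even H and W."""
    a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def dwtgs_style_loss(render: torch.Tensor, target: torch.Tensor,
                     levels: int = 2, w_ll: float = 1.0, w_hh: float = 0.1) -> torch.Tensor:
    """Supervise LL against the target at every level; push HH toward
    L1 sparsity in a self-supervised way (no HF ground-truth term)."""
    loss = render.new_zeros(())
    r, t = render, target
    for _ in range(levels):
        r_ll, _, _, r_hh = haar_dwt2(r)
        t_ll, _, _, _ = haar_dwt2(t)
        loss = loss + w_ll * (r_ll - t_ll).abs().mean() + w_hh * r_hh.abs().mean()
        r, t = r_ll, t_ll  # recurse on the low-frequency band
    return loss
```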

Cool Papers

Click here to view paper screenshots

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Authors:Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister

In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42 $\times$ speedup and a 47 $\times$ boost over LangSplat respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time inference performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. In this paper, we first conduct a detailed time analysis of LangSplat, identifying the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2 assumes that each Gaussian acts as a sparse code within a global dictionary, leading to the learning of a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-dimensional feature maps at high quality while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Codes and demos are available at our project page: https://langsplat-v2.github.io.


Paper & Project Links

PDF Accepted by NeurIPS 2025. Project Page: https://langsplat-v2.github.io

Summary
LangSplatV2 achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS on high-resolution images, a 42x and 47x speedup over LangSplat respectively, with improved query accuracy. A detailed time analysis identifies LangSplat's heavyweight decoder as the main speed bottleneck; LangSplatV2 removes it by learning a 3D sparse coefficient field in which each Gaussian acts as a sparse code within a global dictionary, and adds a CUDA-optimized sparse coefficient splatting method that renders high-dimensional feature maps at the time cost of splatting an ultra-low-dimensional feature. Code and demos are provided on the project page.

Key Takeaways

  1. LangSplatV2 achieves high-dimensional feature splatting at 476.2 FPS and open-vocabulary 3D text querying at 384.6 FPS, with improved query accuracy.
  2. It builds on LangSplat, which uses Gaussian Splatting to embed 2D CLIP language features into 3D with SAM semantics.
  3. LangSplat's speed bottleneck stems from its heavyweight decoder.
  4. LangSplatV2 removes the decoder by learning a 3D sparse coefficient field in which each Gaussian is a sparse code over a global dictionary, exploiting sparsity for efficiency (see the sketch after this list).
  5. Experiments show LangSplatV2 is both more accurate (or competitive) and significantly faster.
  6. Code and demos are available on the project page.
  7. The technique is promising for high-resolution image processing and language interaction in complex scenes.
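
The core trick as stated: each Gaussian carries a sparse code over a small global dictionary, so splatting produces only a K-dimensional coefficient map, and the high-dimensional feature map is recovered with a single matrix product instead of a heavyweight decoder. A toy sketch with made-up dimensions (K, D, H, W are illustrative; the real system additionally exploits code sparsity and CUDA optimization):

```python
import torch

# Hypothetical sizes: K dictionary atoms, D-dim CLIP features, an H x W image.
K, D, H, W = 64, 512, 480, 640
dictionary = torch.randn(K, D)    # small global dictionary shared by all Gaussians
coeff_map = torch.randn(H, W, K)  # ultra-low-dim result of splatting per-Gaussian sparse codes

# One matrix product recovers the high-dimensional feature map: no decoder network.
feature_map = (coeff_map.reshape(-1, K) @ dictionary).reshape(H, W, D)
```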

Cool Papers

Click here to view paper screenshots

Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting

Authors:Chengqi Li, Zhihao Shi, Yangdi Lu, Wenbo He, Xiangyu Xu

3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors. Existing methods typically rely on heuristic strategies to handle the low-quality training data, which often struggle to produce stable and consistent reconstructions, frequently resulting in visual artifacts. In this work, we propose AsymGS, a novel framework that leverages the stochastic nature of these artifacts: they tend to vary across different training runs due to minor randomness. Specifically, our method trains two 3D Gaussian Splatting (3DGS) models in parallel, enforcing a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. To prevent the two models from collapsing into similar failure modes due to confirmation bias, we introduce a divergent masking strategy that applies two complementary masks: a multi-cue adaptive mask and a self-supervised soft mask, which leads to an asymmetric training process of the two models, reducing shared error modes. In addition, to improve the efficiency of model training, we introduce a lightweight variant called Dynamic EMA Proxy, which replaces one of the two models with a dynamically updated Exponential Moving Average (EMA) proxy, and employs an alternating masking strategy to preserve divergence. Extensive experiments on challenging real-world datasets demonstrate that our method consistently outperforms existing approaches while achieving high efficiency. See the project website at https://steveli88.github.io/AsymGS.


Paper & Project Links

PDF NeurIPS 2025 Spotlight; Project page: https://steveli88.github.io/AsymGS/

Summary

3D reconstruction from in-the-wild images is hampered by inconsistent lighting and transient distractors, and heuristic handling of such low-quality training data often yields unstable reconstructions with visual artifacts. This work proposes AsymGS, a framework that exploits the stochastic nature of those artifacts: they vary across training runs due to minor randomness. Two 3D Gaussian Splatting models are trained in parallel under a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. A divergent masking strategy with two complementary masks (a multi-cue adaptive mask and a self-supervised soft mask) keeps the training asymmetric and prevents the models from collapsing into shared failure modes, and a lightweight Dynamic EMA Proxy variant replaces one model with a dynamically updated exponential moving average to improve training efficiency. Experiments on challenging real-world datasets show consistent gains over existing approaches at high efficiency.

Key Takeaways

  • A new framework tackles 3D reconstruction from in-the-wild images, where inconsistent lighting and transient distractors corrupt the training data.
  • Two 3D Gaussian Splatting models trained in parallel counteract the instability caused by low-quality training data.
  • A consistency constraint encourages convergence on reliable scene geometry while suppressing inconsistent artifacts.
  • A divergent masking strategy, combining a multi-cue adaptive mask and a self-supervised soft mask, prevents the two models from collapsing into similar failure modes.
  • A Dynamic EMA Proxy improves training efficiency, with an alternating masking strategy preserving divergence (see the sketch after this list).
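
A minimal sketch of two pieces described in the abstract, with all names, the mask construction, and the decay rate assumed: a masked consistency loss between the two parallel renders, and the Dynamic EMA Proxy update that replaces the second trained model with an exponential moving average of the first.

```python
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, decay: float = 0.999) -> None:
    """Dynamic EMA proxy: the second 'model' is not trained but tracks an
    exponential moving average of the first model's parameters."""
    for p_ema, p in zip(ema_params, model_params):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

def masked_consistency_loss(render_a: torch.Tensor, render_b: torch.Tensor,
                            reliable_mask: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between the two parallel renders only where the
    (multi-cue / self-supervised) mask marks pixels as reliable."""
    return ((render_a - render_b).abs() * reliable_mask).mean()
```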

Cool Papers

Click here to view paper screenshots

