⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never use these for serious academic work; they are intended only for a first-pass screening before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-05
SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction
Authors:Wenfeng Huang, Xiangyun Liao, Yinling Qian, Hao Liu, Yongming Yang, Wenjing Jia, Qiong Wang
Surgical reconstruction of dynamic tissues from endoscopic videos is a crucial technology in robot-assisted surgery. The development of Neural Radiance Fields (NeRFs) has greatly advanced deformable tissue reconstruction, achieving high-quality results from video and image sequences. However, reconstructing deformable endoscopic scenes remains challenging due to aliasing and artifacts caused by tissue movement, which can significantly degrade visualization quality. The introduction of 3D Gaussian Splatting (3DGS) has improved reconstruction efficiency by enabling a faster rendering pipeline. Nevertheless, existing 3DGS methods often prioritize rendering speed while neglecting these critical issues. To address these challenges, we propose SAGS, a self-adaptive alias-free Gaussian splatting framework. We introduce an attention-driven, dynamically weighted 4D deformation decoder, leveraging 3D smoothing filters and 2D Mip filters to mitigate artifacts in deformable tissue reconstruction and better capture the fine details of tissue movement. Experimental results on two public benchmarks, EndoNeRF and SCARED, demonstrate that our method achieves superior performance in all metrics of PSNR, SSIM, and LPIPS compared to the state of the art while also delivering better visualization quality.
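The "3D smoothing filters and 2D Mip filters" the abstract leans on follow the Mip-Splatting recipe: cap each primitive's frequency content by its maximal training-view sampling rate, and replace screen-space dilation with a pixel-sized low-pass. Below is a minimal, hedged PyTorch sketch of both filters; the hyperparameter values and tensor shapes are illustrative assumptions, not taken from the paper.

```python
import torch

def smooth_3d_filter(scales, opacities, max_sampling_rate, s=0.2):
    """3D smoothing filter (Mip-Splatting style): convolve each Gaussian with
    an isotropic low-pass Gaussian whose width is tied to the maximal
    sampling rate (~ focal_length / depth) observed across training views.
    scales: (N, 3) per-axis standard deviations; opacities: (N,);
    max_sampling_rate: (N,); s is an assumed hyperparameter."""
    filter_var = (s / max_sampling_rate) ** 2          # (N,)
    var = scales ** 2                                   # (N, 3)
    new_var = var + filter_var[:, None]
    # Rescale opacity so the filtered Gaussian keeps its peak value.
    coef = torch.sqrt(var.prod(dim=1) / new_var.prod(dim=1))
    return new_var.sqrt(), opacities * coef

def mip_2d_filter(cov2d, box_var=0.1):
    """2D Mip filter: replace the fixed screen-space dilation with a 2D
    Gaussian low-pass approximating a one-pixel box filter.
    cov2d: (N, 2, 2) projected covariances."""
    eye = torch.eye(2, device=cov2d.device)
    return cov2d + box_var * eye
```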
Paper and project links
Summary
Dynamic tissue reconstruction from endoscopic video is a key technology in robot-assisted surgery. Neural Radiance Fields (NeRFs) have greatly advanced deformable tissue reconstruction, producing high-quality results from video and image sequences. However, reconstructing deformable endoscopic scenes remains challenging because tissue motion causes aliasing and artifacts that can significantly degrade visualization quality. 3D Gaussian Splatting (3DGS) improves reconstruction efficiency through a faster rendering pipeline, but existing 3DGS methods tend to prioritize rendering speed and neglect these issues. To address this, the authors propose SAGS, a self-adaptive alias-free Gaussian splatting framework with an attention-driven, dynamically weighted 4D deformation decoder that uses 3D smoothing filters and 2D Mip filters to suppress artifacts and better capture fine tissue motion. Experiments on the EndoNeRF and SCARED benchmarks show the method outperforms the state of the art on PSNR, SSIM, and LPIPS while delivering better visualization quality.
Key Takeaways
- NeRF techniques reconstruct deformable tissue from video and image sequences with high-quality results.
- Reconstructing deformable endoscopic scenes is challenged by aliasing and artifacts caused by tissue motion.
- 3D Gaussian Splatting (3DGS) improves reconstruction efficiency, but existing methods neglect aliasing and artifact issues.
- The proposed SAGS framework uses a dynamically weighted 4D deformation decoder with 3D/2D filters to reduce artifacts and capture fine tissue motion.
- SAGS outperforms the state of the art on the EndoNeRF and SCARED benchmarks.
- SAGS achieves superior PSNR, SSIM, and LPIPS scores.
Click here to view paper screenshots
WildfireX-SLAM: A Large-scale Low-altitude RGB-D Dataset for Wildfire SLAM and Beyond
Authors:Zhicong Sun, Jacqueline Lo, Jinxing Hu
3D Gaussian splatting (3DGS) and its subsequent variants have led to remarkable progress in simultaneous localization and mapping (SLAM). While most recent 3DGS-based SLAM works focus on small-scale indoor scenes, developing 3DGS-based SLAM methods for large-scale forest scenes holds great potential for many real-world applications, especially for wildfire emergency response and forest management. However, this line of research is impeded by the absence of a comprehensive and high-quality dataset, and collecting such a dataset over real-world scenes is costly and technically infeasible. To this end, we have built a large-scale, comprehensive, and high-quality synthetic dataset for SLAM in wildfire and forest environments. Leveraging the Unreal Engine 5 Electric Dreams Environment Sample Project, we developed a pipeline to easily collect aerial and ground views, including ground-truth camera poses and a range of additional data modalities from an unmanned aerial vehicle. Our pipeline also provides flexible controls on environmental factors such as light, weather, and types and conditions of wildfire, supporting the need for various tasks covering forest mapping, wildfire emergency response, and beyond. The resulting pilot dataset, WildfireX-SLAM, contains 5.5k low-altitude RGB-D aerial images from a large-scale forest map with a total size of 16 km². On top of WildfireX-SLAM, a thorough benchmark is also conducted, which not only reveals the unique challenges of 3DGS-based SLAM in the forest but also highlights potential improvements for future works. The dataset and code will be publicly available. Project page: https://zhicongsun.github.io/wildfirexslam.
Paper and project links
PDF This paper has been accepted by MMM 2026
Summary
This paper studies 3D Gaussian Splatting (3DGS)-based simultaneous localization and mapping (SLAM) for forest scenes. It notes the potential of SLAM in large-scale forests and the lack of a high-quality dataset. Using the Unreal Engine 5 Electric Dreams environment, the authors build WildfireX-SLAM, a large-scale, comprehensive, high-quality synthetic dataset for applications such as wildfire emergency response. The dataset contains low-altitude RGB-D aerial images and offers flexible control over environmental factors such as lighting, weather, and wildfire type and condition. A thorough benchmark reveals the unique challenges of 3DGS-based SLAM in forests and directions for future improvement. The dataset and code will be publicly released.
Key Takeaways
- 3DGS-based SLAM holds promise for large-scale forest scenes.
- High-quality SLAM datasets for forest scenes are lacking.
- WildfireX-SLAM, a large-scale, comprehensive, high-quality synthetic dataset, was built in Unreal Engine 5.
- WildfireX-SLAM provides low-altitude RGB-D aerial images with flexible control over environmental factors.
- Benchmarks on WildfireX-SLAM reveal the challenges of, and improvement directions for, 3DGS-based SLAM in forests.
Click here to view paper screenshots
DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting
Authors:Moonsoo Jeong, Dongbeen Kim, Minseong Kim, Sungkil Lee
We present a Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS). Whereas the conventional ADC bases its primitive splitting on the magnitudes of positional gradients, we further incorporate the DC of the gradients into ADC, and realize it through the angular coherence of the gradients. Our DC better captures local structural complexities in ADC, avoiding redundant splitting. When splitting is required, we again utilize the DC to define optimal split positions so that sub-primitives align with the local structures better than with the conventional random placement. As a consequence, our DC4GS greatly reduces the number of primitives (by up to 30% in our experiments) compared with the existing ADC, and also greatly enhances reconstruction fidelity.
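One simple way to realize "angular coherence of the gradients" is to compare the norm of the summed per-step gradients against the sum of their norms: values near 1 mean the accumulated gradients agree in direction (a plain positional update suffices), while low values flag conflicting directions, i.e., local structural complexity that warrants a split. The sketch below illustrates that idea only; the threshold names and the exact splitting rule are assumptions, not the paper's definitions.

```python
import torch

def directional_consistency(grads):
    """grads: (K, 3) positional gradients accumulated for one Gaussian
    over K optimization steps. Returns a value in [0, 1];
    1 means all gradients point the same way."""
    summed = grads.sum(dim=0).norm()
    total = grads.norm(dim=1).sum().clamp_min(1e-8)
    return summed / total

def should_split(grads, mag_thresh=0.0002, dc_thresh=0.5):
    # Split only when gradients are both large and directionally
    # inconsistent; coherent gradients are handled by moving the Gaussian.
    mean_mag = grads.norm(dim=1).mean()
    return bool(mean_mag > mag_thresh) and bool(directional_consistency(grads) < dc_thresh)
```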
Paper and project links
PDF Accepted to NeurIPS 2025 / Project page: https://github.com/cgskku/dc4gs
Summary
This paper proposes Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS). The method incorporates the directional consistency of positional gradients into ADC, realized through their angular coherence. DC better captures local structural complexity in ADC and avoids redundant splitting. When a split is required, DC is used to define optimal split positions so that sub-primitives align better with local structures. As a result, DC4GS greatly reduces the number of primitives (by up to 30% in the experiments) and substantially improves reconstruction fidelity.
Key Takeaways
- Proposes Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS).
- DC incorporates the angular coherence of positional gradients into ADC, better capturing local structural complexity.
- DC avoids redundant splitting, making densification more efficient.
- When splitting is needed, DC defines optimal split positions so sub-primitives align better with local structures.
- DC4GS uses far fewer primitives than conventional ADC.
- DC4GS significantly improves reconstruction fidelity.
Click here to view paper screenshots
HEIR: Learning Graph-Based Motion Hierarchies
Authors:Cheng Zheng, William Koch, Baiang Li, Felix Heide
Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods to model such dynamics typically rely on manually-defined or heuristic hierarchies with fixed motion primitives, limiting their generalizability across different tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method reconstructs the intrinsic motion hierarchy in 1D and 2D cases, and produces more realistic and interpretable deformations compared to the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks. Project Page: https://light.princeton.edu/HEIR/
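The paper's core decomposition, absolute motion = parent-inherited motion + local residual, can be illustrated with a toy tree traversal. A minimal sketch with 1D translations and hypothetical vertex ids (the learned GNN edge inference is not modeled here):

```python
from typing import Dict

def compose_motion(parent: Dict[int, int], residual: Dict[int, float]) -> Dict[int, float]:
    """Absolute motion of each vertex = parent's absolute motion + local
    residual. parent[v] == -1 marks a root. Assumes parent ids are smaller
    than child ids so parents are processed first."""
    absolute: Dict[int, float] = {}
    for v in sorted(residual):
        p = parent[v]
        inherited = absolute[p] if p != -1 else 0.0
        absolute[v] = inherited + residual[v]
    return absolute

# Two-level hierarchy: vertex 0 is the root; 1 and 2 inherit its motion.
parent = {0: -1, 1: 0, 2: 0}
residual = {0: 2.0, 1: 0.5, 2: -0.3}
print(compose_motion(parent, residual))  # {0: 2.0, 1: 2.5, 2: 1.7}
```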
Paper and project links
PDF Code link: https://github.com/princeton-computational-imaging/HEIR
Summary
This paper proposes a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Observed motions are represented with graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. Hierarchy inference is formulated as a differentiable graph learning problem in which vertices represent elemental motions and directed edges capture learned parent-child dependencies via graph neural networks. Across three cases (1D translation, 2D rotation, and dynamic 3D scene deformation), the method recovers the intrinsic motion hierarchy and produces more realistic, more interpretable deformations than the baseline on dynamic 3D Gaussian splatting scenes. The approach offers an adaptable, data-driven hierarchical modeling paradigm applicable to a broad range of motion-centric tasks.
Key Takeaways
- The hierarchical motion modeling method learns structured motion relationships directly from data.
- Observed motions are represented with graph-based hierarchies, decomposing global motion into parent-inherited patterns and local residuals.
- Hierarchy inference is cast as a differentiable graph learning problem where vertices are elemental motions and edges are parent-child dependencies.
- The method recovers the intrinsic motion hierarchy in the 1D and 2D cases.
- On dynamic 3D scene deformation, it produces more realistic and interpretable results than the baseline.
- The approach provides a data-driven hierarchical modeling paradigm with broad applicability.
Click here to view paper screenshots
AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM
Authors:Mirko Usuelli, David Rapado-Rincon, Gert Kootstra, Matteo Matteucci
Autonomous robots in orchards require real-time 3D scene understanding despite repetitive row geometry, seasonal appearance changes, and wind-driven foliage motion. We present AgriGS-SLAM, a Visual–LiDAR SLAM framework that couples direct LiDAR odometry and loop closures with multi-camera 3D Gaussian Splatting (3DGS) rendering. Batch rasterization across complementary viewpoints recovers orchard structure under occlusions, while a unified gradient-driven map lifecycle executed between keyframes preserves fine details and bounds memory. Pose refinement is guided by a probabilistic LiDAR-based depth consistency term, back-propagated through the camera projection to tighten geometry-appearance coupling. We deploy the system on a field platform in apple and pear orchards across dormancy, flowering, and harvesting, using a standardized trajectory protocol that evaluates both training-view and novel-view synthesis to reduce 3DGS overfitting in evaluation. Across seasons and sites, AgriGS-SLAM delivers sharper, more stable reconstructions and steadier trajectories than recent state-of-the-art 3DGS-SLAM baselines while maintaining real-time performance on-tractor. While demonstrated in orchard monitoring, the approach can be applied to other outdoor domains requiring robust multimodal perception.
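One plausible reading of the "probabilistic LiDAR-based depth consistency term" is a Gaussian negative log-likelihood of sparse LiDAR depth under the depth rendered from the Gaussians, whose gradient is back-propagated through the camera projection into the pose. The sketch below is an assumption of that form; the per-pixel uncertainty `sigma` and the mask handling are hypothetical.

```python
import torch

def lidar_depth_consistency(rendered_depth, lidar_depth, sigma, valid):
    """rendered_depth, lidar_depth, sigma: (H, W); valid: (H, W) bool mask of
    pixels with LiDAR returns. Gaussian negative log-likelihood of the LiDAR
    depth under the rendered depth; gradients flow into rendered_depth and
    hence, through the differentiable projection, into the pose."""
    r = (rendered_depth - lidar_depth)[valid]
    s = sigma[valid].clamp_min(1e-6)
    return (0.5 * (r / s) ** 2 + torch.log(s)).mean()
```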
Paper and project links
Summary
Autonomous robots in orchards need real-time 3D scene understanding despite repetitive row geometry, seasonal appearance changes, and wind-driven foliage motion. AgriGS-SLAM couples direct LiDAR odometry and loop closures with multi-camera 3D Gaussian Splatting (3DGS) rendering. Batch rasterization across complementary viewpoints recovers orchard structure under occlusion, and a unified gradient-driven map lifecycle executed between keyframes preserves fine details while bounding memory. Pose refinement is guided by a probabilistic LiDAR-based depth consistency term back-propagated through the camera projection to tighten geometry-appearance coupling. Deployed in apple and pear orchards across dormancy, flowering, and harvest under a standardized trajectory protocol that evaluates both training-view and novel-view synthesis to reduce 3DGS overfitting, AgriGS-SLAM delivers sharper, more stable reconstructions and steadier trajectories than recent state-of-the-art 3DGS-SLAM baselines while maintaining real-time performance on-tractor. Though demonstrated for orchard monitoring, the approach applies to other outdoor domains requiring robust multimodal perception.
Key Takeaways
- Real-time 3D scene understanding is essential for autonomous robots in orchards.
- AgriGS-SLAM couples direct LiDAR odometry with loop closures.
- Multi-camera 3D Gaussian Splatting (3DGS) rendering recovers orchard structure.
- Batch rasterization across complementary viewpoints handles occlusion.
- A unified gradient-driven map lifecycle preserves fine details and bounds memory.
- Pose refinement is guided by a probabilistic LiDAR-based depth consistency term.
Click here to view paper screenshots
6D Channel Knowledge Map Construction via Bidirectional Wireless Gaussian Splatting
Authors:Juncong Zhou, Chao Hu, Guanlin Wu, Zixiang Ren, Han Hu, Juyong Zhang, Rui Zhang, Jie Xu
This paper investigates the construction of channel knowledge map (CKM) from sparse channel measurements. Different from conventional two-/three-dimensional (2D/3D) CKM approaches assuming fixed base station configurations, we present a six-dimensional (6D) CKM framework named bidirectional wireless Gaussian splatting (BiWGS), which is capable of modeling wireless channels across dynamic transmitter (Tx) and receiver (Rx) positions in 3D space. BiWGS uses Gaussian ellipsoids to represent virtual scatterer clusters and environmental obstacles in the wireless environment. By properly learning the bidirectional scattering patterns and complex attenuation profiles based on channel measurements, these ellipsoids inherently capture the electromagnetic transmission characteristics of wireless environments, thereby accurately modeling signal transmission under varying transceiver configurations. Experiment results show that BiWGS significantly outperforms classic multi-layer perception (MLP) for the construction of 6D channel power gain map with varying Tx-Rx positions, and achieves spatial spectrum prediction accuracy comparable to the state-of-the-art wireless radiation field Gaussian splatting (WRF-GS) for 3D CKM construction. This validates the capability of the proposed BiWGS in accomplishing dimensional expansion of 6D CKM construction, without compromising fidelity.
Paper and project links
Summary
This paper presents a six-dimensional (6D) channel knowledge map (CKM) framework, bidirectional wireless Gaussian splatting (BiWGS). Unlike conventional 2D/3D CKM approaches that assume fixed base station configurations, BiWGS models wireless channels across dynamic transmitter and receiver positions and accurately models signal transmission under varying transceiver configurations. Experiments show that BiWGS significantly outperforms a classic multi-layer perceptron (MLP) for constructing 6D channel power gain maps with varying Tx-Rx positions, while matching the prediction accuracy of the state-of-the-art wireless radiation field Gaussian splatting (WRF-GS) for 3D CKM construction. This validates that BiWGS achieves the dimensional expansion to 6D CKM construction without sacrificing fidelity.
Key Takeaways
- Studies channel knowledge map (CKM) construction from sparse channel measurements.
- Proposes a new 6D CKM framework, bidirectional wireless Gaussian splatting (BiWGS).
- BiWGS models wireless channels across dynamic transmitter and receiver positions.
- BiWGS captures the electromagnetic transmission characteristics of the wireless environment by modeling virtual scatterer clusters and obstacles.
- Experiments show BiWGS outperforms an MLP for constructing 6D channel power gain maps.
- BiWGS matches the prediction accuracy of the state-of-the-art wireless radiation field Gaussian splatting (WRF-GS).
Click here to view paper screenshots
JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting
Authors:Yuxuan Li, Tao Wang, Xianben Yang
Traditional novel view synthesis methods heavily rely on external camera pose estimation tools such as COLMAP, which often introduce computational bottlenecks and propagate errors. To address these challenges, we propose a unified framework that jointly optimizes 3D Gaussian points and camera poses without requiring pre-calibrated inputs. Our approach iteratively refines 3D Gaussian parameters and updates camera poses through a novel co-optimization strategy, ensuring simultaneous improvements in scene reconstruction fidelity and pose accuracy. The key innovation lies in decoupling the joint optimization into two interleaved phases: first, updating 3D Gaussian parameters via differentiable rendering with fixed poses, and second, refining camera poses using a customized 3D optical flow algorithm that incorporates geometric and photometric constraints. This formulation progressively reduces projection errors, particularly in challenging scenarios with large viewpoint variations and sparse feature distributions, where traditional methods struggle. Extensive evaluations on multiple datasets demonstrate that our approach significantly outperforms existing COLMAP-free techniques in reconstruction quality, and also surpasses the standard COLMAP-based baseline in general.
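The decoupled co-optimization alternates two interleaved phases: fit the Gaussians by differentiable rendering with poses frozen, then refine the poses with the Gaussians frozen. A hedged skeleton of that loop follows; the photometric pose step stands in for the paper's customized 3D optical-flow refinement, and `render` is a caller-supplied differentiable rasterizer.

```python
import torch

def joint_optimize(render, gaussians, poses, images, rounds=100, steps=50):
    """Alternate: (1) fit Gaussian parameters with poses frozen, then
    (2) refine poses with Gaussians frozen. `gaussians` and `poses` are
    lists of tensors with requires_grad=True; `render(gaussians, pose)`
    returns an image differentiably."""
    opt_gs = torch.optim.Adam(gaussians, lr=1e-3)
    opt_pose = torch.optim.Adam(poses, lr=1e-4)
    for _ in range(rounds):
        for _ in range(steps):  # Phase 1: Gaussians only
            loss = sum((render(gaussians, p) - im).abs().mean()
                       for p, im in zip(poses, images))
            opt_gs.zero_grad(); loss.backward(); opt_gs.step()
        for _ in range(steps):  # Phase 2: poses only (photometric stand-in
            # for the paper's 3D optical-flow pose update)
            loss = sum((render(gaussians, p) - im).abs().mean()
                       for p, im in zip(poses, images))
            opt_pose.zero_grad(); loss.backward(); opt_pose.step()
    return gaussians, poses
```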
Paper and project links
Summary
This paper proposes a unified framework that jointly optimizes 3D Gaussian points and camera poses without pre-calibrated inputs. It iteratively refines 3D Gaussian parameters and updates camera poses, simultaneously improving reconstruction fidelity and pose accuracy. The key innovation is decoupling the joint optimization into two interleaved phases: first, updating 3D Gaussian parameters via differentiable rendering with poses fixed; second, refining camera poses with a customized 3D optical-flow algorithm that incorporates geometric and photometric constraints. The method excels in challenging scenarios with large viewpoint changes and sparse feature distributions, significantly outperforming existing COLMAP-free techniques and generally surpassing the standard COLMAP-based baseline.
Key Takeaways
- Proposes a unified framework that removes the dependence on external camera pose estimation tools such as COLMAP, addressing a bottleneck of traditional novel view synthesis methods.
- Jointly optimizing 3D Gaussian points and camera poses improves both reconstruction fidelity and pose accuracy.
- The optimization is decoupled into two interleaved phases: updating 3D Gaussian parameters and updating camera poses.
- 3D Gaussian parameters are updated via differentiable rendering with camera poses held fixed.
- Camera poses are refined with a customized 3D optical-flow algorithm under geometric and photometric constraints.
- The method performs strongly in challenging scenes with large viewpoint changes and sparse features.
Click here to view paper screenshots
D²GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction
Authors:Kejing Xia, Jidong Jia, Ke Jin, Yucai Bai, Li Sun, Dacheng Tao, Youjian Zhang
Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in the field of autonomous driving. However, current urban scene reconstruction methods often depend on multimodal sensors as inputs, i.e., LiDAR and images. Though the geometry prior provided by LiDAR point clouds can largely mitigate ill-posedness in reconstruction, acquiring such accurate LiDAR data is still challenging in practice: i) precise spatiotemporal calibration between LiDAR and other sensors is required, as they may not capture data simultaneously; ii) reprojection errors arise from spatial misalignment when LiDAR and cameras are mounted at different locations. To avoid the difficulty of acquiring accurate LiDAR depth, we propose D²GS, a LiDAR-free urban scene reconstruction framework. In this work, we obtain geometry priors that are as effective as LiDAR while being denser and more accurate. First, we initialize a dense point cloud by back-projecting multi-view metric depth predictions. This point cloud is then optimized by a Progressive Pruning strategy to improve the global consistency. Second, we jointly refine Gaussian geometry and predicted dense metric depth via a Depth Enhancer. Specifically, we leverage diffusion priors from a depth foundation model to enhance the depth maps rendered by Gaussians. In turn, the enhanced depths provide stronger geometric constraints during Gaussian training. Finally, we improve the accuracy of ground geometry by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset demonstrate that our method consistently outperforms state-of-the-art methods, producing more accurate geometry even when compared with those using ground-truth LiDAR data.
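Back-projecting multi-view metric depth to seed a dense point cloud is standard pinhole geometry: lift each pixel through the inverse intrinsics, scale by its predicted depth, and move into world space with the camera-to-world pose. A minimal NumPy sketch under the usual conventions (the Progressive Pruning step is omitted):

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """depth: (H, W) metric depth; K: (3, 3) intrinsics;
    cam_to_world: (4, 4) pose. Returns (H*W, 3) world-space points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # camera-space directions, z = 1
    pts_cam = rays * depth.reshape(-1, 1)    # scale by metric depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]   # homogeneous transform to world
```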
Paper and project links
Summary
Gaussian Splatting (GS) shows great potential for urban scene reconstruction in autonomous driving, but current methods rely on multimodal sensor input such as LiDAR and images. Although LiDAR point clouds provide geometric priors that mitigate ill-posedness, accurate LiDAR data is hard to acquire in practice: precise spatiotemporal calibration between LiDAR and other sensors is required, and reprojection errors arise from spatial misalignment when LiDAR and cameras are mounted at different locations. To avoid these difficulties, the authors propose D²GS, a LiDAR-free urban scene reconstruction framework that obtains geometry priors as effective as LiDAR while being denser and more accurate. First, a dense point cloud is initialized by back-projecting multi-view metric depth predictions and optimized with a Progressive Pruning strategy for global consistency. Second, Gaussian geometry and predicted dense metric depth are jointly refined via a Depth Enhancer that leverages diffusion priors from a depth foundation model to enhance Gaussian-rendered depth maps; the enhanced depths in turn provide stronger geometric constraints during training. Finally, ground geometry accuracy is improved by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset show the method consistently outperforms state-of-the-art approaches, producing more accurate geometry even compared with methods that use ground-truth LiDAR data.
Key Takeaways
- Gaussian Splatting (GS) holds promise for urban scene reconstruction.
- Current methods depend on multimodal sensors such as LiDAR and cameras, but accurate LiDAR data is hard to acquire.
- The proposed D²GS framework reconstructs urban scenes without LiDAR.
- A dense point cloud is initialized by back-projecting multi-view metric depth predictions and optimized with Progressive Pruning.
- Gaussian geometry and predicted depth are jointly refined, using diffusion priors from a depth foundation model to enhance depth maps.
- The enhanced depths provide stronger geometric constraints during Gaussian training.
Click here to view paper screenshots
AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians
Authors:Xiyu Zhang, Chong Bao, Yipeng Chen, Hongjia Zhai, Yitong Dong, Hujun Bao, Zhaopeng Cui, Guofeng Zhang
3D reconstruction of indoor and urban environments is a prominent research topic with various downstream applications. However, existing geometric priors for addressing low-texture regions in indoor and urban settings often lack global consistency. Moreover, Gaussian Splatting and implicit SDF fields often suffer from discontinuities or exhibit computational inefficiencies, resulting in a loss of detail. To address these issues, we propose an Atlanta-world guided implicit-structured Gaussian Splatting that achieves smooth indoor and urban scene reconstruction while preserving high-frequency details and rendering efficiency. By leveraging the Atlanta-world model, we ensure the accurate surface reconstruction for low-texture regions, while the proposed novel implicit-structured GS representations provide smoothness without sacrificing efficiency and high-frequency details. Specifically, we propose a semantic GS representation to predict the probability of all semantic regions and deploy a structure plane regularization with learnable plane indicators for global accurate surface reconstruction. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in both indoor and urban scenes, delivering superior surface reconstruction quality.
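Under the Atlanta-world assumption, scene normals concentrate around one vertical axis plus a set of horizontal wall directions. Below is a hedged sketch of what a structure-plane regularizer with learnable plane indicators could look like; the loss form, the semantic plane probabilities, and the parameterization by wall angles are all assumptions rather than the paper's exact formulation.

```python
import torch

def atlanta_normal_loss(normals, plane_prob, horiz_angles):
    """normals: (N, 3) unit normals; plane_prob: (N,) probability that a
    point lies on a planar (floor/wall) region; horiz_angles: (M,) learnable
    wall orientations. Penalizes deviation from the closest Atlanta
    direction (one vertical axis + M horizontal directions)."""
    up = torch.tensor([0.0, 0.0, 1.0], device=normals.device)
    dirs = [up]
    for a in horiz_angles:  # horizontal wall normals lie in the x-y plane
        dirs.append(torch.stack([torch.cos(a), torch.sin(a), torch.zeros_like(a)]))
    dirs = torch.stack(dirs)                 # (M+1, 3)
    cos = (normals @ dirs.T).abs()           # (N, M+1)
    best = cos.max(dim=1).values             # alignment with closest direction
    return (plane_prob * (1.0 - best)).mean()
```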
Paper and project links
PDF 18 pages, 11 figures. NeurIPS 2025; Project page: https://zju3dv.github.io/AtlasGS/
Summary
This paper proposes an Atlanta-world guided, implicit-structured Gaussian Splatting method that addresses the lack of global consistency in existing geometric priors for low-texture regions in indoor and urban scenes. The method reconstructs scenes smoothly while preserving high-frequency details and rendering efficiency. A semantic GS representation predicts the probability of each semantic region, and a structure-plane regularization with learnable plane indicators enables globally accurate surface reconstruction of low-texture regions.
Key Takeaways
- 3D reconstruction of indoor and urban environments is a prominent research topic with many downstream applications.
- Existing geometric priors lack global consistency in low-texture regions.
- The proposed Atlanta-world guided, implicit-structured Gaussian Splatting achieves smooth indoor and urban scene reconstruction.
- The method preserves high-frequency details and rendering efficiency.
- A semantic GS representation predicts the probability of each semantic region.
- Structure-plane regularization with learnable plane indicators enables globally accurate surface reconstruction.
Click here to view paper screenshots
NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation
Authors:Mingyu Jeong, Eunsung Kim, Sehun Park, Andrew Jaeyong Choi
We present NVSim, a framework that automatically constructs large-scale, navigable indoor simulators from only common image sequences, overcoming the cost and scalability limitations of traditional 3D scanning. Our approach adapts 3D Gaussian Splatting to address visual artifacts on sparsely observed floors, a common issue in robotic traversal data. We introduce Floor-Aware Gaussian Splatting to ensure a clean, navigable ground plane, and a novel mesh-free traversability checking algorithm that constructs a topological graph by directly analyzing rendered views. We demonstrate our system's ability to generate valid, large-scale navigation graphs from real-world data. A video demonstration is available at https://youtu.be/tTiIQt6nXC8
Paper and project links
PDF 9 pages, 10 figures
Summary
NVSim automatically constructs large-scale, navigable indoor simulators from only common image sequences, overcoming the cost and scalability limits of traditional 3D scanning. It adapts 3D Gaussian Splatting to handle visual artifacts on sparsely observed floors, a common issue in robotic traversal data. Floor-Aware Gaussian Splatting ensures a clean, navigable ground plane, and a mesh-free traversability checking algorithm builds a topological graph by directly analyzing rendered views. The system generates valid, large-scale navigation graphs from real-world data.
Key Takeaways
- NVSim automatically builds large-scale navigable indoor simulators from common image sequences.
- NVSim overcomes the cost and scalability limitations of traditional 3D scanning.
- Adapted 3D Gaussian Splatting handles visual artifacts on sparsely observed floors.
- Floor-Aware Gaussian Splatting ensures a clean, navigable ground plane.
- A mesh-free traversability checking algorithm constructs a topological graph directly from rendered views.
- The system generates valid, large-scale navigation graphs.
Click here to view paper screenshots
LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
Authors:Haotian Zhou, Xiaole Wang, He Li, Fusheng Sun, Shengyu Guo, Guolei Qi, Jianghuan Xu, Huijing Zhao
Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo’s memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
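Querying a language 3D Gaussian memory with an open-vocabulary goal typically reduces to similarity search between an embedded text query and per-Gaussian language features. A minimal sketch under that assumption (the actual feature pipeline, e.g. which text encoder is used, is not specified here):

```python
import torch

def query_goal(gauss_xyz, gauss_feat, text_feat, top_k=5):
    """gauss_xyz: (N, 3) Gaussian centers; gauss_feat: (N, D) language
    features stored in the 3D memory; text_feat: (D,) embedded goal query
    (e.g., a CLIP-style text embedding). Returns candidate goal locations
    and their similarity scores, to be verified by local perception."""
    sim = torch.nn.functional.cosine_similarity(
        gauss_feat, text_feat[None, :], dim=1)   # (N,)
    idx = sim.topk(top_k).indices
    return gauss_xyz[idx], sim[idx]
```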
Paper and project links
Summary
LagMemo is a navigation system for intelligent robots that addresses multi-modal, open-vocabulary goal queries and multi-goal visual navigation. During exploration it builds a unified 3D language memory based on language 3D Gaussian Splatting; given incoming task goals, it queries the memory to predict candidate goal locations and uses a local perception-based verification mechanism to dynamically match and validate goals during navigation. Experiments show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation.
Key Takeaways
- LagMemo targets multi-modal, open-vocabulary, multi-goal visual navigation.
- It builds a unified 3D language memory based on language 3D Gaussian Splatting.
- Goal locations are predicted by querying the memory, then dynamically matched and verified with local perception.
- GOAT-Core is a high-quality core split distilled from GOAT-Bench, tailored to multi-modal open-vocabulary multi-goal visual navigation.
- Experiments show the memory module effectively localizes multi-modal open-vocabulary goals.
- LagMemo outperforms state-of-the-art methods in multi-goal visual navigation.
Click here to view paper screenshots
GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering
Authors:Alexander Valverde, Brian Xu, Yuyin Zhou, Meng Xu, Hongyun Wang
Scene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting demonstrates strong performance on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely due to the inherent limitations of sparse 3D training data. In this work, we propose GauSSmart, a hybrid method that effectively bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundational models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes. Our results demonstrate the significant potential of hybrid 2D-3D approaches, highlighting how the thoughtful combination of 2D foundational models with 3D reconstruction pipelines can overcome the limitations inherent in either approach alone.
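Semantic feature supervision from a frozen 2D foundation model usually means rasterizing per-Gaussian feature embeddings and matching them to the model's feature map. Below is a hedged sketch of such a loss, assuming DINO-style dense features; the shapes and the cosine form are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def feature_supervision_loss(rendered_feat, dino_feat):
    """rendered_feat: (C, H, W) features rasterized from per-Gaussian
    embeddings; dino_feat: (C, Hd, Wd) features from a frozen 2D foundation
    model (e.g., DINO). Cosine loss after resizing to a common grid."""
    dino = F.interpolate(dino_feat[None], size=tuple(rendered_feat.shape[-2:]),
                         mode="bilinear", align_corners=False)[0]
    cos = F.cosine_similarity(rendered_feat, dino, dim=0)  # (H, W)
    return (1.0 - cos).mean()
```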
Paper and project links
Summary
GauSSmart is a hybrid method that bridges 2D foundation models and 3D Gaussian Splatting reconstruction. It integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundation models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, the method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas while preserving intricate structural details. Across three datasets, GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes, demonstrating the strong potential of hybrid 2D-3D approaches.
Key Takeaways
- Gaussian Splatting performs well on large-scale datasets but struggles with fine details and realism in sparsely covered regions.
- GauSSmart is a new hybrid method combining 2D foundation models with 3D Gaussian Splatting reconstruction.
- Established 2D techniques (convex filtering and semantic feature supervision from foundation models such as DINO) enhance Gaussian-based reconstruction.
- 2D segmentation priors and high-dimensional feature embeddings improve coverage of sparse regions while preserving structural details.
- GauSSmart is validated on multiple datasets, outperforming standard Gaussian Splatting.
- The results show the potential of hybrid 2D-3D approaches to overcome the limitations of either approach alone.
- High-dimensional feature embeddings and 2D segmentation priors are key to densifying and refining the Gaussian splats.
Click here to view paper screenshots
Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images
Authors:Emanuel Garbin, Guy Adam, Oded Krams, Zohar Barzelay, Eran Guendelman, Michael Schwarz, Matteo Presutto, Moran Vatelmacher, Yigal Shenkman, Eli Peker, Itai Druker, Uri Patish, Yoav Blum, Max Bluvstein, Junxuan Li, Rawal Khirodkar, Shunsuke Saito
We present a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods face several challenges: single-view approaches suffer from geometric inconsistencies and hallucinations, degrading identity preservation, while models trained on synthetic data fail to capture high-frequency details like skin wrinkles and fine hair, limiting realism. Our method introduces two key contributions: (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. This “Capture, Canonicalize, Splat” pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.
Paper and project links
PDF This work received the Best Paper Honorable Mention at the AMFG Workshop, ICCV 2025
Summary
This paper presents a zero-shot pipeline that creates hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods suffer from geometric inconsistencies and hallucinations that degrade identity preservation, and models trained on synthetic data miss high-frequency details. The method contributes (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. This "Capture, Canonicalize, Splat" pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.
Key Takeaways
- Presents a zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars.
- Addresses the geometric inconsistencies and hallucinations that afflict existing methods.
- A generative canonicalization module turns multiple unstructured views into a standardized, consistent representation.
- A transformer-based model is trained on a large dataset of high-fidelity Gaussian splatting avatars from dome captures of real people.
- The pipeline produces highly realistic quarter-body avatars from unstructured photos.
- Compelling realism and robust identity preservation are the method's hallmarks.
Click here to view paper screenshots
Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos
Authors:Xuankai Zhang, Junjin Xiao, Qing Zhang
This paper presents a unified framework that allows high-quality dynamic Gaussian Splatting from both defocused and motion-blurred monocular videos. Due to the significant difference between the formation processes of defocus blur and motion blur, existing methods are tailored for either one of them, lacking the ability to simultaneously deal with both of them. Although the two can be jointly modeled as blur kernel-based convolution, the inherent difficulty in estimating accurate blur kernels greatly limits the progress in this direction. In this work, we go a step further towards this direction. Particularly, we propose to estimate per-pixel reliable blur kernels using a blur prediction network that exploits blur-related scene and camera information and is subject to a blur-aware sparsity constraint. Besides, we introduce a dynamic Gaussian densification strategy to mitigate the lack of Gaussians for incomplete regions, and boost the performance of novel view synthesis by incorporating unseen view information to constrain scene optimization. Extensive experiments show that our method outperforms the state-of-the-art methods in generating photorealistic novel view synthesis from defocused and motion-blurred monocular videos. Our code is available at https://github.com/hhhddddddd/dydeblur.
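Both blur types can be written as a blur-kernel-based convolution of the sharp rendering, which is what the blur prediction network estimates per pixel. A minimal sketch of applying spatially varying predicted kernels and supervising against the blurry frame; the kernel size, normalization, and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def blur_loss(sharp, kernels, blurry):
    """sharp: (1, 3, H, W) rendered sharp image; kernels: (H*W, k*k)
    per-pixel blur kernels from a prediction network (assumed normalized);
    blurry: (1, 3, H, W) observed frame. Applies a spatially varying
    convolution via unfold, then compares to the blurry observation."""
    _, _, H, W = sharp.shape
    k = int(kernels.shape[1] ** 0.5)
    patches = F.unfold(sharp, k, padding=k // 2)   # (1, 3*k*k, H*W)
    patches = patches.view(3, k * k, H * W)
    w = kernels.T[None]                             # (1, k*k, H*W)
    resynth = (patches * w).sum(dim=1).view(1, 3, H, W)
    return (resynth - blurry).abs().mean()
```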
Paper and project links
PDF Accepted to NeurIPS 2025
Summary
This paper presents a unified framework for high-quality dynamic Gaussian Splatting from both defocused and motion-blurred monocular videos. Because defocus blur and motion blur form differently, existing methods handle only one of them. The authors estimate reliable per-pixel blur kernels with a blur prediction network that exploits blur-related scene and camera information under a blur-aware sparsity constraint. A dynamic Gaussian densification strategy mitigates the lack of Gaussians in incomplete regions, and unseen-view information is incorporated to constrain scene optimization and boost novel view synthesis. Experiments show the method outperforms the state of the art in generating photorealistic novel views from defocused and motion-blurred monocular videos.
Key Takeaways
- Proposes a unified framework for high-quality dynamic Gaussian Splatting from defocused and motion-blurred monocular videos.
- Unlike existing methods that handle only defocus blur or only motion blur, this method handles both.
- Reliable per-pixel blur kernels are estimated with a blur prediction network under a blur-aware sparsity constraint.
- A dynamic Gaussian densification strategy mitigates missing Gaussians in incomplete regions.
- Unseen-view information is incorporated to constrain scene optimization and improve novel view synthesis.
- Experiments show the method outperforms the state of the art in photorealistic novel view synthesis.
Click here to view paper screenshots
InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes
Authors:Hongyuan Liu, Haochen Yu, Bochao Zou, Jianfei Jiang, Qiankun Liu, Jiansheng Chen, Huimin Ma
Reconstructing dynamic driving scenes from dashcam videos has attracted increasing attention due to its significance in autonomous driving and scene understanding. While recent advances have made impressive progress, most methods still unify all background elements into a single representation, hindering both instance-level understanding and flexible scene editing. Some approaches attempt to lift 2D segmentation into 3D space, but often rely on pre-processed instance IDs or complex pipelines to map continuous features to discrete identities. Moreover, these methods are typically designed for indoor scenes with rich viewpoints, making them less applicable to outdoor driving scenarios. In this paper, we present InstDrive, an instance-aware 3D Gaussian Splatting framework tailored for the interactive reconstruction of dynamic driving scenes. We use masks generated by SAM as pseudo ground-truth to guide 2D feature learning via contrastive loss and pseudo-supervised objectives. At the 3D level, we introduce regularization to implicitly encode instance identities and enforce consistency through a voxel-based loss. A lightweight static codebook further bridges continuous features and discrete identities without requiring data pre-processing or complex optimization. Quantitative and qualitative experiments demonstrate the effectiveness of InstDrive, and to the best of our knowledge, it is the first framework to achieve 3D instance segmentation in dynamic, open-world driving scenes. More visualizations are available at our project page.
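Using SAM masks as pseudo ground truth for contrastive 2D feature learning can be sketched as a supervised-contrastive loss over sampled pixels: features sharing a mask id are pulled together, others pushed apart. The formulation below is a generic stand-in, not the paper's exact objective (it assumes every sampled mask id occurs at least twice).

```python
import torch

def mask_contrastive_loss(feat, mask_ids, temperature=0.1):
    """feat: (P, D) rendered per-pixel instance features (P sampled pixels);
    mask_ids: (P,) pseudo ground-truth SAM mask id per pixel."""
    feat = torch.nn.functional.normalize(feat, dim=1)
    sim = feat @ feat.T / temperature                       # (P, P)
    same = mask_ids[:, None] == mask_ids[None, :]
    # Exclude self-similarity from both numerator and denominator.
    logits = sim - torch.eye(len(feat), device=feat.device) * 1e9
    pos = torch.where(same, logits, torch.full_like(logits, -1e9))
    loss = -(pos.logsumexp(dim=1) - logits.logsumexp(dim=1))
    return loss.mean()
```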
Paper and project links
Summary
InstDrive is an instance-aware 3D Gaussian Splatting framework tailored to the interactive reconstruction of dynamic driving scenes. It uses SAM-generated masks as pseudo ground truth to guide 2D feature learning via contrastive loss and pseudo-supervised objectives, and at the 3D level introduces regularization that implicitly encodes instance identities and enforces consistency through a voxel-based loss. A lightweight static codebook bridges continuous features and discrete identities without data pre-processing or complex optimization. To the authors' knowledge, it is the first framework to achieve 3D instance segmentation in dynamic, open-world driving scenes.
Key Takeaways
- Reconstructing dynamic driving scenes is important for autonomous driving and scene understanding.
- Most existing methods merge all background elements into a single representation, hindering instance-level understanding and flexible editing.
- InstDrive enables interactive reconstruction of dynamic driving scenes with instance-aware 3D segmentation.
- SAM-generated masks serve as pseudo ground truth to guide 2D feature learning.
- 3D-level regularization implicitly encodes instance identities and enforces consistency.
- A lightweight static codebook bridges continuous features and discrete identities without pre-processing or complex optimization.
Click here to view paper screenshots
MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction
Authors:Antoine Guédon, Diego Gomez, Nissim Maruani, Bingchen Gong, George Drettakis, Maks Ovsjanikov
While recent advances in Gaussian Splatting have enabled fast reconstruction of high-quality 3D scenes from images, extracting accurate surface meshes remains a challenge. Current approaches extract the surface through costly post-processing steps, resulting in the loss of fine geometric details or requiring significant time and leading to very dense meshes with millions of vertices. More fundamentally, the a posteriori conversion from a volumetric to a surface representation limits the ability of the final mesh to preserve all geometric structures captured during training. We present MILo, a novel Gaussian Splatting framework that bridges the gap between volumetric and surface representations by differentiably extracting a mesh from the 3D Gaussians. We design a fully differentiable procedure that constructs the mesh-including both vertex locations and connectivity-at every iteration directly from the parameters of the Gaussians, which are the only quantities optimized during training. Our method introduces three key technical contributions: a bidirectional consistency framework ensuring both representations-Gaussians and the extracted mesh-capture the same underlying geometry during training; an adaptive mesh extraction process performed at each training iteration, which uses Gaussians as differentiable pivots for Delaunay triangulation; a novel method for computing signed distance values from the 3D Gaussians that enables precise surface extraction while avoiding geometric erosion. Our approach can reconstruct complete scenes, including backgrounds, with state-of-the-art quality while requiring an order of magnitude fewer mesh vertices than previous methods. Due to their light weight and empty interior, our meshes are well suited for downstream applications such as physics simulations or animation.
Paper and project links
PDF 10 pages. A presentation video of our approach is available at https://youtu.be/_SGNhhNz0fE
Summary
MILo is a novel Gaussian Splatting framework that bridges volumetric and surface representations by differentiably extracting a mesh, including both vertex locations and connectivity, directly from the Gaussian parameters at every training iteration. It introduces three key contributions: a bidirectional consistency framework ensuring the Gaussians and the extracted mesh capture the same underlying geometry during training; an adaptive mesh extraction process performed at each iteration, using Gaussians as differentiable pivots for Delaunay triangulation; and a novel method for computing signed distance values from the 3D Gaussians that enables precise surface extraction without geometric erosion. The method reconstructs complete scenes, including backgrounds, at state-of-the-art quality while using an order of magnitude fewer mesh vertices than previous approaches.
Key Takeaways
- MILo is a Gaussian Splatting framework that achieves high-quality 3D scene reconstruction.
- Conventional mesh extraction requires costly post-processing and can lose detail; MILo instead constructs the mesh directly during training, from the Gaussian parameters alone.
- MILo's key technical contributions are a bidirectional consistency framework, an adaptive per-iteration mesh extraction process, and a new way to compute signed distance values from the 3D Gaussians.
Click here to view paper screenshots
Anti-Aliased 2D Gaussian Splatting
Authors:Mae Younes, Adnane Boukhayma
2D Gaussian Splatting (2DGS) has recently emerged as a promising method for novel view synthesis and surface reconstruction, offering better view-consistency and geometric accuracy than volumetric 3DGS. However, 2DGS suffers from severe aliasing artifacts when rendering at different sampling rates than those used during training, limiting its practical applications in scenarios requiring camera zoom or varying fields of view. We identify that these artifacts stem from two key limitations: the lack of frequency constraints in the representation and an ineffective screen-space clamping approach. To address these issues, we present AA-2DGS, an anti-aliased formulation of 2D Gaussian Splatting that maintains its geometric benefits while significantly enhancing rendering quality across different scales. Our method introduces a world-space flat smoothing kernel that constrains the frequency content of 2D Gaussian primitives based on the maximal sampling frequency from training views, effectively eliminating high-frequency artifacts when zooming in. Additionally, we derive a novel object-space Mip filter by leveraging an affine approximation of the ray-splat intersection mapping, which allows us to efficiently apply proper anti-aliasing directly in the local space of each splat.
Paper and project links
PDF NeurIPS 2025. Code will be available at https://github.com/maeyounes/AA-2DGS
Summary
This paper addresses the aliasing artifacts of 2D Gaussian Splatting (2DGS), which appear when rendering at sampling rates different from those used during training and which limit applications requiring camera zoom or varying fields of view. The artifacts stem from the lack of frequency constraints in the representation and an ineffective screen-space clamping approach. The proposed AA-2DGS keeps the geometric benefits of 2DGS while markedly improving rendering quality across scales by introducing a world-space flat smoothing kernel and an object-space Mip filter, which together suppress high-frequency artifacts.
Key Takeaways
- 2D Gaussian Splatting (2DGS) performs well for novel view synthesis and surface reconstruction, but severe aliasing artifacts appear when rendering at sampling rates different from those used during training, limiting use in scenarios requiring camera zoom or varying fields of view. These artifacts stem from the lack of frequency constraints in the representation and an ineffective screen-space clamping approach.
- AA-2DGS is an anti-aliased formulation that keeps 2DGS's geometric benefits while significantly improving rendering quality across scales. A world-space flat smoothing kernel constrains the frequency content of the 2D Gaussian primitives based on the maximal sampling frequency of the training views, effectively eliminating high-frequency artifacts when zooming in.
Click here to view paper screenshots
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Authors:Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang
Reconstructing dynamic 3D scenes from monocular videos remains a fundamental challenge in 3D vision. While 3D Gaussian Splatting (3DGS) achieves real-time rendering in static settings, extending it to dynamic scenes is challenging due to the difficulty of learning structured and temporally consistent motion representations. This challenge often manifests as three limitations in existing methods: redundant Gaussian updates, insufficient motion supervision, and weak modeling of complex non-rigid deformations. These issues collectively hinder coherent and efficient dynamic reconstruction. To address these limitations, we propose HAIF-GS, a unified framework that enables structured and consistent dynamic modeling through sparse anchor-driven deformation. It first identifies motion-relevant regions via an Anchor Filter to suppress redundant updates in static areas. A self-supervised Induced Flow-Guided Deformation module induces anchor motion using multi-frame feature aggregation, eliminating the need for explicit flow labels. To further handle fine-grained deformations, a Hierarchical Anchor Propagation mechanism increases anchor resolution based on motion complexity and propagates multi-level transformations. Extensive experiments on synthetic and real-world benchmarks validate that HAIF-GS significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.
Paper and project links
PDF Accepted to NeurIPS 2025. Project page: https://echopickle.github.io/HAIF-GS.github.io/
Summary
Reconstructing dynamic 3D scenes from monocular videos remains a fundamental challenge in 3D vision. Existing methods suffer from redundant Gaussian updates, insufficient motion supervision, and weak modeling of complex non-rigid deformations. HAIF-GS is a unified framework that enables structured, consistent dynamic modeling through sparse anchor-driven deformation, addressing these limitations. Extensive experiments show that HAIF-GS significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.
Key Takeaways
- Dynamic 3D scene reconstruction from monocular video remains a core challenge in 3D vision.
- Existing methods suffer from redundant Gaussian updates, insufficient motion supervision, and weak modeling of complex non-rigid deformations.
- HAIF-GS performs structured, consistent dynamic modeling via sparse anchor-driven deformation to address these issues.
- An Anchor Filter identifies motion-relevant regions and suppresses redundant updates in static areas.
- A self-supervised Induced Flow-Guided Deformation module induces anchor motion from multi-frame feature aggregation, without explicit flow labels.
- A Hierarchical Anchor Propagation mechanism raises anchor resolution with motion complexity and propagates multi-level transformations.
Click here to view paper screenshots
GS4: Generalizable Sparse Splatting Semantic SLAM
Authors:Mingqi Jiang, Chanho Kim, Chen Ziwen, Li Fuxin
Traditional SLAM algorithms excel at camera tracking, but typically produce incomplete and low-resolution maps that are not tightly integrated with semantics prediction. Recent work integrates Gaussian Splatting (GS) into SLAM to enable dense, photorealistic 3D mapping, yet existing GS-based SLAM methods require per-scene optimization that is slow and consumes an excessive number of Gaussians. We present GS4, the first generalizable GS-based semantic SLAM system. Compared with prior approaches, GS4 runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance across color, depth, semantic mapping and camera tracking. From an RGB-D video stream, GS4 incrementally builds and updates a set of 3D Gaussians using a feed-forward network. First, the Gaussian Prediction Model estimates a sparse set of Gaussian parameters from input frame, which integrates both color and semantic prediction with the same backbone. Then, the Gaussian Refinement Network merges new Gaussians with the existing set while avoiding redundancy. Finally, we propose to optimize GS for only 1-5 iterations that corrects drift and floaters when significant pose changes are detected. Experiments on the real-world ScanNet and ScanNet++ benchmarks demonstrate state-of-the-art semantic SLAM performance, with strong generalization capability shown through zero-shot transfer to the NYUv2 and TUM RGB-D datasets.
Paper and project links
PDF 17 pages, 6 figures
Summary
GS4 is the first generalizable Gaussian Splatting (GS)-based semantic SLAM system. Compared with traditional SLAM algorithms and existing GS-based SLAM methods, it achieves fast, efficient, dense 3D mapping with strong generalization. From an RGB-D video stream it incrementally builds and updates 3D Gaussians with a feed-forward network, integrating color and semantic prediction to improve map completeness and resolution.
Key Takeaways
- GS4 is the first generalizable GS-based semantic SLAM system, enabling fast and efficient 3D mapping.
- GS4 incrementally builds and updates the map from an RGB-D video stream.
- Integrated color and semantic prediction yields photorealistic, semantically rich maps.
- Sparse Gaussian parameter estimation and redundancy avoidance make the Gaussian processing more efficient.
- Only 1-5 optimization iterations are used to correct drift and floaters when significant pose changes are detected.
- GS4 achieves state-of-the-art semantic SLAM performance on real-world datasets such as ScanNet and ScanNet++.
Click here to view paper screenshots
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering
Authors:Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt, Christina Tsalicoglou, Michael Niemeyer, Torsten Sattler, Songyou Peng, Federico Tombari
In this work, we present a novel level-of-detail (LOD) method for 3D Gaussian Splatting that enables real-time rendering of large-scale scenes on memory-constrained devices. Our approach introduces a hierarchical LOD representation that iteratively selects optimal subsets of Gaussians based on camera distance, thus largely reducing both rendering time and GPU memory usage. We construct each LOD level by applying a depth-aware 3D smoothing filter, followed by importance-based pruning and fine-tuning to maintain visual fidelity. To further reduce memory overhead, we partition the scene into spatial chunks and dynamically load only relevant Gaussians during rendering, employing an opacity-blending mechanism to avoid visual artifacts at chunk boundaries. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets, delivering high-quality renderings with reduced latency and memory requirements.
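Selecting a level of detail per spatial chunk from camera distance is a simple bucketing operation. A minimal sketch of that selection step; the distance thresholds and the chunk representation are assumptions, and the per-level Gaussian subsets would be streamed dynamically as the paper describes.

```python
import torch

def select_lod(level_bounds, cam_pos, chunk_centers):
    """level_bounds: increasing distance thresholds (L,) mapping
    camera-to-chunk distance to an LOD level; cam_pos: (3,);
    chunk_centers: (C, 3). Returns the LOD level to load per chunk."""
    dist = (chunk_centers - cam_pos[None]).norm(dim=1)  # (C,)
    return torch.bucketize(dist, level_bounds)          # coarser when farther

# e.g. chunks nearer than 5 m render at level 0 (finest), then 1, 2, 3.
bounds = torch.tensor([5.0, 15.0, 40.0])
levels = select_lod(bounds, torch.zeros(3), torch.tensor([[2.0, 0.0, 0.0],
                                                          [30.0, 0.0, 0.0]]))
print(levels)  # tensor([0, 2])
```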
Paper and project links
PDF NeurIPS 2025; Web: https://lodge-gs.github.io/
Summary
This work presents a level-of-detail (LOD) method for 3D Gaussian Splatting that enables real-time rendering of large-scale scenes on memory-constrained devices. A hierarchical LOD representation iteratively selects optimal subsets of Gaussians based on camera distance, greatly reducing rendering time and GPU memory usage. Each LOD level is built with a depth-aware 3D smoothing filter followed by importance-based pruning and fine-tuning to preserve visual fidelity. The scene is partitioned into spatial chunks, with only the relevant Gaussians loaded dynamically during rendering, and an opacity-blending mechanism avoids visual artifacts at chunk boundaries. The method achieves state-of-the-art performance on both outdoor and indoor datasets, delivering high-quality renderings with lower latency and memory requirements.
Key Takeaways
- Introduces a new LOD method for real-time rendering of large-scale scenes on memory-constrained devices.
- Optimal Gaussian subsets selected by camera distance markedly cut rendering time and GPU memory use.
- Each LOD level is built with a depth-aware 3D smoothing filter.
- Importance-based pruning and fine-tuning maintain visual fidelity.
- Partitioning the scene into spatial chunks and dynamically loading relevant Gaussians reduces memory overhead.
- An opacity-blending mechanism avoids visual artifacts at chunk boundaries.
Click here to view paper screenshots