⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on this for serious academic work; it is only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-09-12
SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting
Authors:Mahtab Dahaghin, Milind G. Padalkar, Matteo Toso, Alessio Del Bue
3D Gaussian Splatting (3DGS) has enabled the creation of highly realistic 3D scene representations from sets of multi-view images. However, inpainting missing regions, whether due to occlusion or scene editing, remains a challenging task, often leading to blurry details, artifacts, and inconsistent geometry. In this work, we introduce SplatFill, a novel depth-guided approach for 3DGS scene inpainting that achieves state-of-the-art perceptual quality and improved efficiency. Our method combines two key ideas: (1) joint depth-based and object-based supervision to ensure inpainted Gaussians are accurately placed in 3D space and aligned with surrounding geometry, and (2) a consistency-aware refinement scheme that selectively identifies and corrects inconsistent regions without disrupting the rest of the scene. Evaluations on the SPIn-NeRF dataset demonstrate that SplatFill not only surpasses existing NeRF-based and 3DGS-based inpainting methods in visual fidelity but also reduces training time by 24.5%. Qualitative results show our method delivers sharper details, fewer artifacts, and greater coherence across challenging viewpoints.
Paper and project links
Summary
This paper introduces SplatFill, a new depth-guided method for 3DGS scene inpainting. The method combines depth-based and object-based supervision to ensure that inpainted Gaussians are placed accurately in 3D space and aligned with the surrounding geometry, and it adds a consistency-aware refinement scheme that selectively identifies and corrects inconsistent regions without affecting the rest of the scene. Evaluations on the SPIn-NeRF dataset show that SplatFill surpasses existing NeRF-based and 3DGS-based inpainting methods in visual fidelity while reducing training time by 24.5%.
Key Takeaways
- 3D Gaussian Splatting (3DGS) can build highly realistic 3D scene representations from multi-view images.
- Inpainting missing regions remains a challenge for 3DGS, often producing blurry details, artifacts, and inconsistent geometry.
- SplatFill is a novel depth-guided 3DGS inpainting method that places inpainted Gaussians accurately in 3D space and aligns them with the surrounding geometry.
- SplatFill combines depth-based and object-based supervision.
- SplatFill introduces a consistency-aware refinement scheme that selectively identifies and corrects inconsistent regions while keeping the rest of the scene intact.
- Evaluations on the SPIn-NeRF dataset show SplatFill outperforms other methods in both visual fidelity and training efficiency.
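The abstract does not give the exact training objective, but the joint depth-based and object-based supervision can be pictured as a masked photometric term plus a masked depth term over the inpainted region. The sketch below is only an illustration in that spirit, assuming simple L1 terms; the function name, array layout, and `depth_weight` value are invented here and are not taken from the paper.

```python
import numpy as np

def inpainting_loss(rendered_rgb, target_rgb, rendered_depth, target_depth,
                    inpaint_mask, depth_weight=0.1):
    """Masked photometric + depth loss over the inpainted region (toy version).

    rendered_*/target_* are (H, W, 3) colour and (H, W) depth arrays from the
    rasterizer and from 2D inpainting / depth priors; inpaint_mask is a boolean
    (H, W) array marking the region to fill. depth_weight is a made-up value.
    """
    m = inpaint_mask.astype(np.float32)
    n = m.sum() + 1e-8
    # L1 colour term restricted to the masked (inpainted) pixels.
    l_rgb = (np.abs(rendered_rgb - target_rgb) * m[..., None]).sum() / (3.0 * n)
    # Depth term pushes the new Gaussians toward a plausible distance so they
    # line up with the surrounding geometry instead of floating in space.
    l_depth = (np.abs(rendered_depth - target_depth) * m).sum() / n
    return l_rgb + depth_weight * l_depth

# shape check with random data
H, W = 64, 64
loss = inpainting_loss(np.random.rand(H, W, 3), np.random.rand(H, W, 3),
                       np.random.rand(H, W), np.random.rand(H, W),
                       np.random.rand(H, W) > 0.5)
```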
Click here to view paper screenshots


HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting
Authors:Yimin Pan, Matthias Nießner, Tobias Kirschstein
Human hair reconstruction is a challenging problem in computer vision, with growing importance for applications in virtual reality and digital human modeling. Recent advances in 3D Gaussian Splatting (3DGS) provide efficient and explicit scene representations that naturally align with the structure of hair strands. In this work, we extend the 3DGS framework to enable strand-level hair geometry reconstruction from multi-view images. Our multi-stage pipeline first reconstructs detailed hair geometry using a differentiable Gaussian rasterizer, then merges individual Gaussian segments into coherent strands through a novel merging scheme, and finally refines and grows the strands under photometric supervision. While existing methods typically evaluate reconstruction quality at the geometric level, they often neglect the connectivity and topology of hair strands. To address this, we propose a new evaluation metric that serves as a proxy for assessing topological accuracy in strand reconstruction. Extensive experiments on both synthetic and real-world datasets demonstrate that our method robustly handles a wide range of hairstyles and achieves efficient reconstruction, typically completing within one hour. The project page can be found at: https://yimin-pan.github.io/hair-gs/
Paper and project links
PDF This is the arXiv preprint of the paper “Hair Strand Reconstruction based on 3D Gaussian Splatting” published at BMVC 2025. Project website: https://yimin-pan.github.io/hair-gs/
Summary
This paper presents hair-strand reconstruction based on 3D Gaussian Splatting (3DGS). A multi-stage pipeline first reconstructs detailed hair geometry with a differentiable Gaussian rasterizer, then merges individual Gaussian segments into coherent strands via a novel merging scheme, and finally refines and grows the strands under photometric supervision. A new evaluation metric is also proposed to assess the topological accuracy of strand reconstruction. Experiments show that the method handles a wide range of hairstyles robustly and reconstructs hair efficiently.
Key Takeaways
- 3DGS is applied to the challenging problem of hair reconstruction in computer vision.
- The framework is extended to reconstruct strand-level hair geometry from multi-view images.
- A multi-stage pipeline covers geometry reconstruction, merging of Gaussian segments, and strand refinement and growth.
- A new evaluation metric is proposed to assess the topological accuracy of strand reconstruction.
- The method is robust across hairstyles and reconstructs efficiently on both synthetic and real-world datasets.
- Reconstruction typically completes within one hour.
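The abstract does not detail the merging scheme, so the following is only a toy sketch of the general idea of chaining short Gaussian segments into strand polylines when their endpoints are close and their directions roughly agree. The greedy strategy, thresholds, and data layout are assumptions made for illustration.

```python
import numpy as np

def link_segments(segments, dist_thresh=0.5, angle_thresh_deg=30.0):
    """Greedily chain short segments (p0, p1) into strand polylines.

    A toy stand-in for a strand-merging step: two segments are joined when
    their endpoints are close and their directions roughly agree. Both
    thresholds are arbitrary illustrative values.
    """
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    unused = list(range(len(segments)))
    strands = []
    while unused:
        i = unused.pop(0)
        p0, p1 = segments[i]
        strand = [np.asarray(p0, float), np.asarray(p1, float)]
        extended = True
        while extended:
            extended = False
            tip, prev = strand[-1], strand[-2]
            d_tip = (tip - prev) / (np.linalg.norm(tip - prev) + 1e-8)
            for j in list(unused):
                q0, q1 = (np.asarray(p, float) for p in segments[j])
                # try both orientations of the candidate segment
                for a, b in ((q0, q1), (q1, q0)):
                    d_seg = (b - a) / (np.linalg.norm(b - a) + 1e-8)
                    if (np.linalg.norm(a - tip) < dist_thresh
                            and np.dot(d_tip, d_seg) > cos_thresh):
                        strand.append(b)
                        unused.remove(j)
                        extended = True
                        break
                if extended:
                    break
        strands.append(np.stack(strand))
    return strands

# usage with two toy segments that line up end to end
segs = [(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])),
        (np.array([0.0, 0.0, 1.1]), np.array([0.0, 0.0, 2.0]))]
print(len(link_segments(segs)))  # -> 1 strand
```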
Click here to view paper screenshots



OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
Authors:Yinan Deng, Yufeng Yue, Jianyu Dou, Jingyu Zhao, Jiahui Wang, Yujie Tang, Yi Yang, Mengyin Fu
Robotic systems demand accurate and comprehensive 3D environment perception, requiring simultaneous capture of photo-realistic appearance (optical), precise layout shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically achieve only partial fulfillment of these requirements while exhibiting optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap. Overall, OmniMap represents the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the architectural level, OmniMap employs a tightly coupled 3DGS-Voxel hybrid representation that combines fine-grained modeling with structural stability. At the implementation level, OmniMap identifies key challenges across different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap’s superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework’s versatility is further evidenced through a variety of downstream applications, including multi-domain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
Paper and project links
PDF Accepted by IEEE Transactions on Robotics (TRO), project website: https://omni-map.github.io/
Summary
OmniMap is the first online mapping framework to capture optical, geometric, and semantic scene attributes simultaneously while remaining real-time and compact. A tightly coupled 3DGS-voxel hybrid representation combines fine-grained modeling with structural stability. The framework addresses key challenges across modalities with several innovations, including adaptive camera modeling, a hybrid incremental representation, and probabilistic fusion. Experiments show that OmniMap outperforms existing methods in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation, and that it supports a variety of downstream applications.
Key Takeaways
- OmniMap is the first online mapping framework to capture optical, geometric, and semantic scene attributes simultaneously.
- OmniMap balances real-time performance with model compactness.
- OmniMap uses a tightly coupled 3DGS-voxel hybrid representation that combines fine-grained modeling with structural stability.
- OmniMap addresses modality-specific challenges such as motion blur, exposure compensation, normal constraints, and instance-level understanding.
- Innovations including adaptive camera modeling, a hybrid incremental representation, and probabilistic fusion improve performance.
- Experiments show OmniMap outperforms existing methods in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation.
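The abstract describes a tightly coupled 3DGS-voxel hybrid representation without giving implementation details. The sketch below shows one generic way such a hybrid could be organised, assuming a sparse voxel hash that buckets Gaussian indices so local insertions and neighbourhood queries stay cheap; the class name, voxel size, and stored attributes are invented here and are not taken from OmniMap.

```python
import numpy as np
from collections import defaultdict

class VoxelGaussianMap:
    """Toy 3DGS-voxel hybrid: a sparse voxel hash that buckets Gaussian
    indices by position, so local updates and queries stay proportional to
    the neighbourhood size rather than the whole map."""

    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size
        self.means = []                    # Gaussian centres
        self.features = []                 # e.g. colour / semantic feature per Gaussian
        self.voxels = defaultdict(list)    # voxel key -> list of Gaussian ids

    def _key(self, p):
        return tuple(np.floor(np.asarray(p) / self.voxel_size).astype(int))

    def insert(self, mean, feature):
        idx = len(self.means)
        self.means.append(np.asarray(mean, float))
        self.features.append(np.asarray(feature, float))
        self.voxels[self._key(mean)].append(idx)
        return idx

    def query_neighbours(self, p, radius_vox=1):
        """Return ids of Gaussians whose voxel lies within radius_vox voxels."""
        cx, cy, cz = self._key(p)
        r = radius_vox
        ids = []
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    ids.extend(self.voxels.get((cx + dx, cy + dy, cz + dz), []))
        return ids

# usage
m = VoxelGaussianMap(voxel_size=0.2)
m.insert([0.05, 0.1, 0.0], feature=[1.0, 0.0, 0.0])
print(m.query_neighbours([0.0, 0.0, 0.0]))  # -> [0]
```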
Click here to view paper screenshots





DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
Authors:Ze-Xin Yin, Jiaxiong Qiu, Liu Liu, Xinjie Wang, Wei Sui, Zhizhong Su, Jian Yang, Jin Xie
The labor- and experience-intensive creation of 3D assets with physically based rendering (PBR) materials demands an autonomous 3D asset creation pipeline. However, most existing 3D generation methods focus on geometry modeling, either baking textures into simple vertex colors or leaving texture synthesis to post-processing with image diffusion models. To achieve end-to-end PBR-ready 3D asset generation, we present Lightweight Gaussian Asset Adapter (LGAA), a novel framework that unifies the modeling of geometry and PBR materials by exploiting multi-view (MV) diffusion priors from a novel perspective. The LGAA features a modular design with three components. Specifically, the LGAA Wrapper reuses and adapts network layers from MV diffusion models, which encapsulate knowledge acquired from billions of images, enabling better convergence in a data-efficient manner. To incorporate multiple diffusion priors for geometry and PBR synthesis, the LGAA Switcher aligns multiple LGAA Wrapper layers encapsulating different knowledge. Then, a tamed variational autoencoder (VAE), termed LGAA Decoder, is designed to predict 2D Gaussian Splatting (2DGS) with PBR channels. Finally, we introduce a dedicated post-processing procedure to effectively extract high-quality, relightable mesh assets from the resulting 2DGS. Extensive quantitative and qualitative experiments demonstrate the superior performance of LGAA with both text- and image-conditioned MV diffusion models. Additionally, the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme leads to efficient convergence trained on merely 69k multi-view instances. Our code, pre-trained weights, and the dataset used will be publicly available via our project page: https://zx-yin.github.io/dreamlifting/.
Paper and project links
PDF 14 pages, 7 figures, project page: https://zx-yin.github.io/dreamlifting/
Summary
This paper presents the Lightweight Gaussian Asset Adapter (LGAA), a framework for autonomous 3D asset creation that unifies geometry modeling and physically based rendering (PBR) material modeling. By exploiting multi-view (MV) diffusion priors, LGAA enables end-to-end generation of PBR-ready 3D assets. LGAA has a modular design with three components: the LGAA Wrapper, the LGAA Switcher, and the LGAA Decoder. A dedicated post-processing step then extracts high-quality, relightable mesh assets from the results.
Key Takeaways
- The Lightweight Gaussian Asset Adapter (LGAA) is proposed as an autonomous 3D asset creation pipeline.
- LGAA unifies geometry modeling with physically based rendering (PBR) material modeling.
- Multi-view (MV) diffusion priors enable end-to-end generation of PBR-ready 3D assets.
- LGAA has a modular design with three key components: the LGAA Wrapper, the LGAA Switcher, and the LGAA Decoder.
- A dedicated post-processing procedure extracts high-quality, relightable mesh assets from the results.
- Quantitative and qualitative experiments demonstrate LGAA's superior performance with both text- and image-conditioned MV diffusion models.
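The LGAA Wrapper is said to reuse and adapt network layers from pretrained MV diffusion models. The PyTorch sketch below only illustrates the generic frozen-backbone-plus-lightweight-adapter pattern that this description evokes, under the assumption of a simple residual bottleneck adapter; the layer sizes, zero-initialised projection, and class name are not from the paper.

```python
import torch
import torch.nn as nn

class AdapterWrapper(nn.Module):
    """Generic frozen-layer + lightweight-adapter pattern (illustrative only)."""

    def __init__(self, pretrained_block: nn.Module, dim: int, bottleneck: int = 64):
        super().__init__()
        self.block = pretrained_block
        for p in self.block.parameters():        # keep pretrained knowledge intact
            p.requires_grad_(False)
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, dim),
        )
        nn.init.zeros_(self.adapter[-1].weight)  # start as an identity mapping
        nn.init.zeros_(self.adapter[-1].bias)

    def forward(self, x):
        h = self.block(x)
        return h + self.adapter(h)               # small trainable correction

# usage: wrap a pretrained block (here just a stand-in linear layer)
pretrained = nn.Linear(128, 128)
wrapped = AdapterWrapper(pretrained, dim=128)
out = wrapped(torch.randn(4, 128))
```

Zero-initialising the last adapter layer is a common trick so that training starts from the unchanged pretrained behaviour; whether LGAA does this is not stated in the abstract.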
Click here to view paper screenshots




PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map
Authors:Yue Pan, Xingguang Zhong, Liren Jin, Louis Wiesmann, Marija Popović, Jens Behley, Cyrill Stachniss
Robots benefit from high-fidelity reconstructions of their environment, which should be geometrically accurate and photorealistic to support downstream tasks. While this can be achieved by building distance fields from range sensors and radiance fields from cameras, realising scalable incremental mapping of both fields consistently and at the same time with high quality is challenging. In this paper, we propose a novel map representation that unifies a continuous signed distance field and a Gaussian splatting radiance field within an elastic and compact point-based implicit neural map. By enforcing geometric consistency between these fields, we achieve mutual improvements by exploiting both modalities. We present a novel LiDAR-visual SLAM system called PINGS using the proposed map representation and evaluate it on several challenging large-scale datasets. Experimental results demonstrate that PINGS can incrementally build globally consistent distance and radiance fields encoded with a compact set of neural points. Compared to state-of-the-art methods, PINGS achieves superior photometric and geometric rendering at novel views by constraining the radiance field with the distance field. Furthermore, by utilizing dense photometric cues and multi-view consistency from the radiance field, PINGS produces more accurate distance fields, leading to improved odometry estimation and mesh reconstruction. We also provide an open-source implementation of PINGS at: https://github.com/PRBonn/PINGS.
Paper and project links
PDF 15 pages, 8 figures, presented at RSS 2025
Summary
This paper proposes a novel map representation that unifies a continuous signed distance field and a Gaussian splatting radiance field within an elastic, compact point-based implicit neural map. Enforcing geometric consistency between the two fields lets each modality improve the other. Experiments show that PINGS, a LiDAR-visual SLAM system built on this representation, incrementally constructs globally consistent distance and radiance fields on challenging large-scale datasets, encoded with a compact set of neural points. Compared with existing methods, constraining the radiance field with the distance field gives PINGS superior photometric and geometric rendering at novel views. Moreover, PINGS exploits dense photometric cues and multi-view consistency from the radiance field to produce more accurate distance fields, improving odometry estimation and mesh reconstruction.
Key Takeaways
- High-fidelity environment reconstruction, both geometrically accurate and photorealistic, is essential for robots performing downstream tasks.
- Such reconstruction can be achieved by building distance fields and radiance fields, but mapping both fields consistently and with high quality at the same time is challenging.
- The proposed map representation unifies a continuous signed distance field and a Gaussian splatting radiance field within an elastic, compact point-based implicit neural map.
- Enforcing geometric consistency lets the distance field and the radiance field improve each other.
- Built on this representation, PINGS incrementally constructs globally consistent distance and radiance fields on large-scale datasets.
- PINGS renders novel views with better photometric and geometric quality than existing methods.
- PINGS uses dense photometric cues and multi-view consistency from the radiance field to improve distance-field accuracy, which in turn improves odometry estimation and mesh reconstruction.
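The abstract describes neural points that jointly encode a signed distance field and a radiance field. The sketch below illustrates only the distance-field half of that idea with a generic point-based implicit map, assuming inverse-distance feature interpolation over the k nearest neural points and a small MLP decoder; the feature size, k, and network shapes are invented here and are not the PINGS implementation.

```python
import torch
import torch.nn as nn

class PointImplicitMap(nn.Module):
    """Toy point-based implicit neural map: neural points carry latent
    features, a query blends the features of its k nearest points with
    inverse-distance weights, and a shared MLP decodes a signed distance."""

    def __init__(self, positions, feat_dim=16, k=8):
        super().__init__()
        self.register_buffer("positions", positions)               # (N, 3) neural point centres
        self.features = nn.Parameter(torch.zeros(len(positions), feat_dim))
        self.k = k
        self.sdf_head = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, query):                                       # query: (B, 3)
        d = torch.cdist(query, self.positions)                      # (B, N) distances
        dist, idx = d.topk(self.k, largest=False)                   # k nearest points
        w = 1.0 / (dist + 1e-6)
        w = w / w.sum(dim=1, keepdim=True)                          # inverse-distance weights
        feats = self.features[idx]                                  # (B, k, F)
        f = (w.unsqueeze(-1) * feats).sum(dim=1)                    # blended feature
        rel = query - (w.unsqueeze(-1) * self.positions[idx]).sum(dim=1)
        return self.sdf_head(torch.cat([f, rel], dim=-1))           # signed distance

# usage
map_ = PointImplicitMap(torch.randn(1000, 3))
sdf = map_(torch.randn(32, 3))                                      # (32, 1)
```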
Click here to view paper screenshots



Don’t Splat your Gaussians: Volumetric Ray-Traced Primitives for Modeling and Rendering Scattering and Emissive Media
Authors:Jorge Condor, Sebastien Speierer, Lukas Bode, Aljaz Bozic, Simon Green, Piotr Didyk, Adrian Jarabo
Efficient scene representations are essential for many computer graphics applications. A general unified representation that can handle both surfaces and volumes simultaneously, remains a research challenge. Inspired by recent methods for scene reconstruction that leverage mixtures of 3D Gaussians to model radiance fields, we formalize and generalize the modeling of scattering and emissive media using mixtures of simple kernel-based volumetric primitives. We introduce closed-form solutions for transmittance and free-flight distance sampling for different kernels, and propose several optimizations to use our method efficiently within any off-the-shelf volumetric path tracer. We demonstrate our method as a compact and efficient alternative to other forms of volume modeling for forward and inverse rendering of scattering media. Furthermore, we adapt and showcase our method in radiance field optimization and rendering, providing additional flexibility compared to current state of the art given its ray-tracing formulation. We also introduce the Epanechnikov kernel and demonstrate its potential as an efficient alternative to the traditionally-used Gaussian kernel in scene reconstruction tasks. The versatility and physically-based nature of our approach allows us to go beyond radiance fields and bring to kernel-based modeling and rendering any path-tracing enabled functionality such as scattering, relighting and complex camera models.
Paper and project links
PDF 17 pages, 17 figures
Summary
This paper models scattering and emissive media with mixtures of simple kernel-based volumetric primitives. It derives closed-form solutions for transmittance and free-flight distance sampling for different kernels, and adds optimizations so the method can be used efficiently within any off-the-shelf volumetric path tracer. The approach serves as a compact and efficient alternative to other forms of volume modeling for forward and inverse rendering of scattering media. The method is also adapted to radiance field optimization and rendering, and the Epanechnikov kernel is introduced as an efficient alternative to the traditionally used Gaussian kernel in scene reconstruction tasks. The versatility and physically based nature of the approach extend kernel-based modeling and rendering beyond radiance fields to any path-tracing functionality, such as scattering, relighting, and complex camera models.
Key Takeaways
- Scattering and emissive media are modeled with mixtures of simple kernel-based volumetric primitives.
- Closed-form solutions are provided for transmittance and free-flight distance sampling.
- The method can be used efficiently within any off-the-shelf volumetric path tracer.
- It serves as a compact and efficient alternative to other volume-modeling approaches for forward and inverse rendering of scattering media.
- The method adapts to radiance field optimization and rendering.
- The Epanechnikov kernel is introduced as an efficient alternative to the Gaussian kernel in scene reconstruction.
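For reference, the 1D Epanechnikov kernel is the compactly supported profile K(u) = 0.75 (1 - u^2) for |u| <= 1 and 0 otherwise; its finite support and polynomial form are one reason such kernels are attractive when ray integrals must be evaluated in closed form, whereas the Gaussian integrates to error-function terms over infinite support. The snippet below merely contrasts the two 1D profiles numerically; it is not the paper's rendering code.

```python
import numpy as np

def gaussian_kernel(u, sigma=1.0):
    """Unnormalised Gaussian profile (infinite support)."""
    return np.exp(-0.5 * (np.asarray(u, dtype=float) / sigma) ** 2)

def epanechnikov_kernel(u):
    """Epanechnikov profile: 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

# Optical depth of a single 1D primitive along a ray parameterised by t,
# integrated numerically here just to compare the two profiles.
t = np.linspace(-3.0, 3.0, 6001)
dt = t[1] - t[0]
tau_gauss = gaussian_kernel(t).sum() * dt     # ~ sqrt(2*pi) ~ 2.507
tau_epan = epanechnikov_kernel(t).sum() * dt  # ~ 1.0 (unit mass, finite support)
print(tau_gauss, tau_epan)
```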
Click here to view paper screenshots


