3DGS

Published: 2025-11-06

Updated: 2025-11-27

Word count: 13.9k

Reading time: 56 min

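⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings. They are only suitable for a first-pass screening before reading the papers themselves.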

Updated 2025-11-06

PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

Authors:Antonio Oroz, Matthias Nießner, Tobias Kirschstein

We present PercHead, a method for single-image 3D head reconstruction and semantic 3D editing - two tasks that are inherently challenging due to severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. We develop a unified base model for reconstructing view-consistent 3D heads from a single input image. The model employs a dual-branch encoder followed by a ViT-based decoder that lifts 2D features into 3D space through iterative cross-attention. Rendering is performed using Gaussian Splatting. At the heart of our approach is a novel perceptual supervision strategy based on DINOv2 and SAM2.1, which provides rich, generalized signals for both geometric and appearance fidelity. Our model achieves state-of-the-art performance in novel-view synthesis and, furthermore, exhibits exceptional robustness to extreme viewing angles compared to established baselines. Furthermore, this base model can be seamlessly extended for semantic 3D editing by swapping the encoder and finetuning the network. In this variant, we disentangle geometry and style through two distinct input modalities: a segmentation map to control geometry and either a text prompt or a reference image to specify appearance. We highlight the intuitive and powerful 3D editing capabilities of our model through a lightweight, interactive GUI, where users can effortlessly sculpt geometry by drawing segmentation maps and stylize appearance via natural language or image prompts. Project Page: https://antoniooroz.github.io/PercHead Video: https://www.youtube.com/watch?v=4hFybgTk4kE

Summary

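This paper presents PercHead, a method for single-image 3D head reconstruction and semantic 3D editing. A unified base model, built from a dual-branch encoder followed by a ViT-based decoder, reconstructs view-consistent 3D heads from a single input image. The decoder lifts 2D features into 3D space through iterative cross-attention, and rendering is performed with Gaussian Splatting. The key novelty is a perceptual supervision strategy based on DINOv2 and SAM2.1, which provides rich, generalized signals for both geometric and appearance fidelity. The model achieves state-of-the-art novel-view synthesis and is notably robust to extreme viewing angles. The base model also extends to semantic 3D editing by swapping the encoder and fine-tuning the network: a segmentation map controls geometry, and a text prompt or reference image specifies appearance. A lightweight interactive GUI lets users sculpt geometry by drawing segmentation maps and stylize appearance with natural-language or image prompts.

Key Takeaways

PercHead is a method for single-image 3D head reconstruction and semantic 3D editing.
It uses a unified base model, combining a dual-branch encoder with a ViT-based decoder, to reconstruct view-consistent 3D heads from a single image.
Iterative cross-attention lifts 2D features into 3D space, and rendering is performed with Gaussian Splatting.
A novel perceptual supervision strategy based on DINOv2 and SAM2.1 provides rich signals for geometric and appearance fidelity.
The model achieves state-of-the-art novel-view synthesis and remains robust at extreme viewing angles.
The base model extends to semantic 3D editing by swapping the encoder and fine-tuning the network.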

Object-Centric 3D Gaussian Splatting for Strawberry Plant Reconstruction and Phenotyping

Authors:Jiajia Li, Keyi Zhu, Qianwen Zhang, Dong Chen, Qi Sun, Zhaojian Li

Strawberries are among the most economically significant fruits in the United States, generating over $2 billion in annual farm-gate sales and accounting for approximately 13% of the total fruit production value. Plant phenotyping plays a vital role in selecting superior cultivars by characterizing plant traits such as morphology, canopy structure, and growth dynamics. However, traditional plant phenotyping methods are time-consuming, labor-intensive, and often destructive. Recently, neural rendering techniques, notably Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have emerged as powerful frameworks for high-fidelity 3D reconstruction. By capturing a sequence of multi-view images or videos around a target plant, these methods enable non-destructive reconstruction of complex plant architectures. Despite their promise, most current applications of 3DGS in agricultural domains reconstruct the entire scene, including background elements, which introduces noise, increases computational costs, and complicates downstream trait analysis. To address this limitation, we propose a novel object-centric 3D reconstruction framework incorporating a preprocessing pipeline that leverages the Segment Anything Model v2 (SAM-2) and alpha channel background masking to achieve clean strawberry plant reconstructions. This approach produces more accurate geometric representations while substantially reducing computational time. With a background-free reconstruction, our algorithm can automatically estimate important plant traits, such as plant height and canopy width, using DBSCAN clustering and Principal Component Analysis (PCA). Experimental results show that our method outperforms conventional pipelines in both accuracy and efficiency, offering a scalable and non-destructive solution for strawberry plant phenotyping.

Paper and project links

PDF 11 pages, 4 figures, 3 tables

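Summary
Neural rendering techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) provide powerful frameworks for high-fidelity 3D reconstruction. Capturing multi-view images or video around a target plant enables non-destructive reconstruction of complex plant architectures. Because most agricultural applications of 3DGS reconstruct the whole scene, background included, this paper proposes an object-centric 3D reconstruction framework whose preprocessing pipeline uses the Segment Anything Model v2 (SAM-2) and alpha-channel background masking to obtain clean strawberry plant reconstructions. The approach yields more accurate geometry while substantially reducing computation time, and the background-free reconstruction allows plant traits such as plant height and canopy width to be estimated automatically with DBSCAN clustering and Principal Component Analysis (PCA).

Key Takeaways

Economic importance of strawberries: strawberries are among the most economically significant fruits in the United States, with over $2 billion in annual farm-gate sales and roughly 13% of total fruit production value.
Role of plant phenotyping: phenotyping supports the selection of superior cultivars by characterizing morphology, canopy structure, and growth dynamics.
Limits of traditional phenotyping: conventional methods are time-consuming, labor-intensive, and often destructive.
Neural rendering for plant reconstruction: NeRF and 3DGS enable high-fidelity, non-destructive 3D reconstruction of complex plant architectures.
Challenge with current 3DGS applications: reconstructing the entire scene, background included, introduces noise, raises computational cost, and complicates downstream trait analysis.
Object-centric reconstruction framework: combining SAM-2 with alpha-channel background masking gives clean strawberry plant reconstructions with more accurate geometry and less computation time.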
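The trait-estimation step (DBSCAN clustering followed by PCA) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `dbscan` and `plant_traits`, the choice of the z axis as "up", and the default `eps` and `min_pts` values are all assumptions made for the example. It keeps the largest DBSCAN cluster as the plant, reads plant height from the vertical extent, and reads canopy width from the extent along the first principal axis of the horizontal footprint.

```python
import math

def dbscan(points, eps, min_pts):
    """Tiny O(n^2) DBSCAN. Returns one label per point (-1 means noise)."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
    return labels

def plant_traits(points, eps=0.05, min_pts=4):
    """Return (height, canopy_width) of the largest cluster; z is assumed up."""
    labels = dbscan(points, eps, min_pts)
    kept = [l for l in labels if l >= 0]
    if not kept:
        raise ValueError("no cluster found; try a larger eps")
    main = max(set(kept), key=kept.count)
    plant = [p for p, l in zip(points, labels) if l == main]

    zs = [p[2] for p in plant]
    height = max(zs) - min(zs)

    # 2D PCA on the horizontal footprint (closed form for a 2x2 covariance).
    n = len(plant)
    mx = sum(p[0] for p in plant) / n
    my = sum(p[1] for p in plant) / n
    sxx = sum((p[0] - mx) ** 2 for p in plant) / n
    syy = sum((p[1] - my) ** 2 for p in plant) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in plant) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # major-axis angle
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [(p[0] - mx) * ux + (p[1] - my) * uy for p in plant]
    return height, max(proj) - min(proj)
```

On a real reconstruction, the input would be the Gaussian centers (or a sampled point cloud) of the background-free model, and a library DBSCAN would replace the quadratic-time version above.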

3D Gaussian Radiation Field Modeling for Integrated RIS-FAS Systems: Analysis and Optimization

Authors:Kaining Wang, Bo Yang, Yusheng Lei, Zhiwen Yu, Xuelin Cao, Liang Wang, Bin Guo, George C. Alexandropoulos, Mérouane Debbah, Zhu Han

The integration of reconfigurable intelligent surfaces (RIS) and fluid antenna systems (FAS) has attracted considerable attention due to its tremendous potential in enhancing wireless communication performance. However, under fast-fading channel conditions, rapidly and effectively performing joint optimization of the antenna positions in an FAS system and the RIS phase configuration remains a critical challenge. Traditional optimization methods typically rely on complex iterative computations, thus making it challenging to obtain optimal solutions in real time within dynamic channel environments. To address this issue, this paper introduces a field information-driven optimization method based on three-dimensional Gaussian radiation-field modeling for real-time optimization of integrated FAS-RIS systems. In the proposed approach, obstacles are treated as virtual transmitters and, by separately learning the amplitude and phase variations, the model can quickly generate high-precision channel information based on the transmitter’s position. This design eliminates the need for extensive pilot overhead and cumbersome computations. On this framework, an alternating optimization scheme is presented to jointly optimize the FAS position and the RIS phase configuration. Simulation results demonstrate that the proposed method significantly outperforms existing approaches in terms of spectrum prediction accuracy, convergence speed, and minimum achievable rate, validating its effectiveness and practicality in fast-fading scenarios.

Paper and project links

PDF

Summary

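Integrating reconfigurable intelligent surfaces (RIS) with fluid antenna systems (FAS) has attracted considerable attention for its potential to enhance wireless communication performance. Under fast-fading channel conditions, however, jointly optimizing the FAS antenna positions and the RIS phase configuration quickly and effectively remains a key challenge. Traditional optimization methods rely on complex iterative computations and struggle to reach optimal solutions in real time in dynamic channel environments. This paper proposes a field information-driven optimization method based on 3D Gaussian radiation-field modeling for real-time optimization of integrated FAS-RIS systems. Obstacles are treated as virtual transmitters, and by learning amplitude and phase variations separately the model quickly generates high-precision channel information from the transmitter's position, which removes the need for extensive pilot overhead and cumbersome computation. On top of this model, an alternating optimization scheme jointly optimizes the FAS position and the RIS phase configuration. Simulation results show that the method significantly outperforms existing approaches in spectrum prediction accuracy, convergence speed, and minimum achievable rate, validating its effectiveness and practicality in fast-fading scenarios.

Key Takeaways

Integrating RIS with FAS has great potential for enhancing wireless communication performance.
Under fast-fading channels, jointly optimizing the FAS antenna positions and the RIS phase configuration is a key challenge.
Traditional optimization methods struggle to reach optimal solutions in real time, especially in dynamic channel environments.
The paper proposes a field information-driven optimization method based on 3D Gaussian radiation-field modeling that treats obstacles as virtual transmitters and quickly generates high-precision channel information.
Learning amplitude and phase variations separately removes the need for extensive pilot overhead and cumbersome computation.
Simulation results show significant gains over existing methods in spectrum prediction accuracy, convergence speed, and minimum achievable rate.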
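To make the alternating optimization idea concrete, here is a toy sketch under strong simplifying assumptions that are not taken from the paper: a single user, a discrete set of candidate FAS ports, known channel coefficients (in the paper these would come from the learned radiation-field model), and a single-link rate instead of the minimum achievable rate. The names `direct`, `cascaded`, and `alternating_optimization` are hypothetical. With the port fixed, each RIS phase is set in closed form so that every reflected path adds coherently to the direct path. With the phases fixed, the best port is picked by exhaustive search.

```python
import cmath
import math

def channel_gain(direct, cascaded, phases, port):
    """|h|^2 at one port: direct path plus phase-shifted RIS reflections."""
    h = direct[port] + sum(c * cmath.exp(1j * t)
                           for c, t in zip(cascaded[port], phases))
    return abs(h) ** 2

def align_phases(direct, cascaded, port):
    """Closed-form RIS phases that align every reflection with the direct path."""
    ref = cmath.phase(direct[port])
    return [ref - cmath.phase(c) for c in cascaded[port]]

def alternating_optimization(direct, cascaded, snr=1.0, max_iters=20):
    """Alternate between RIS phase alignment and FAS port selection."""
    port = 0
    for _ in range(max_iters):
        phases = align_phases(direct, cascaded, port)   # step 1: fix port
        best = max(range(len(direct)),                   # step 2: fix phases
                   key=lambda p: channel_gain(direct, cascaded, phases, p))
        if best == port:
            break  # neither step can improve the gain any further
        port = best
    phases = align_phases(direct, cascaded, port)
    rate = math.log2(1.0 + snr * channel_gain(direct, cascaded, phases, port))
    return port, phases, rate
```

Each step can only keep or increase the channel gain, so the loop stops after finitely many port changes. It may still stop at a local optimum, which is a general property of alternating schemes.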

GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies

Authors:Ziye Wang, Li Kang, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang

Recently, effective coordination in embodied multi-agent systems has remained a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to balance fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations, then dynamically redistributes 3D Gaussian attributes to each agent’s local perspective. This enables all agents to adaptively query task-critical features from the shared scene representation while maintaining their individual viewpoints. This design facilitates both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point cloud). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.

Paper and project links

PDF Accepted by NeurIPS 2025. Project page: https://ziyeeee.github.io/gaudp.io/

Summary

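This paper presents GauDP, a novel Gaussian-image synergistic representation that enables scalable, perception-aware imitation learning in multi-agent collaborative systems. GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations and then dynamically redistributes the 3D Gaussian attributes to each agent's local perspective. Every agent can therefore adaptively query task-critical features from the shared scene representation while keeping its own viewpoint. The design supports both fine-grained control and globally coherent behavior without additional sensing modalities such as 3D point clouds. On the RoboFactory benchmark, which covers diverse multi-arm manipulation tasks, the method outperforms existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while scaling well as the number of agents grows.

Key Takeaways

Coordination in embodied multi-agent systems remains a fundamental challenge, especially when agents must balance individual perspectives with global environmental awareness.
Existing methods struggle to balance fine-grained local control with comprehensive scene understanding.
GauDP is a novel Gaussian-image synergistic representation for perception-aware imitation learning in multi-agent collaborative systems.
GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations.
GauDP dynamically redistributes 3D Gaussian attributes to each agent's local perspective so that agents can adaptively query task-critical features.
Agents keep their individual viewpoints while querying the shared scene representation, which yields both fine-grained control and globally coherent behavior.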

4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Guassian Splatting

Authors:Chun-Tin Wu, Jun-Cheng Chen

Although 3D Gaussian Splatting (3D-GS) achieves efficient rendering for novel view synthesis, extending it to dynamic scenes still results in substantial memory overhead from replicating Gaussians across frames. To address this challenge, we propose 4D Neural Voxel Splatting (4D-NVS), which combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling. Instead of generating separate Gaussian sets per timestamp, our method employs a compact set of neural voxels with learned deformation fields to model temporal dynamics. The design greatly reduces memory consumption and accelerates training while preserving high image quality. We further introduce a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, maintaining global efficiency while enhancing rendering quality for difficult viewing angles. Experiments demonstrate that our method outperforms state-of-the-art approaches with significant memory reduction and faster training, enabling real-time rendering with superior visual fidelity.

Paper and project links

PDF 10 pages, 7 figures

Summary

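This paper proposes 4D Neural Voxel Splatting (4D-NVS), which combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling. Instead of generating a separate set of Gaussians for every timestamp, the method models temporal dynamics with a compact set of neural voxels and learned deformation fields, which greatly reduces memory consumption and accelerates training while preserving high image quality. The paper also introduces a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, raising rendering quality at difficult viewing angles without sacrificing global efficiency. Experiments show that the method outperforms state-of-the-art approaches with significantly lower memory use and faster training, enabling real-time rendering with superior visual fidelity.

Key Takeaways

The paper proposes 4D Neural Voxel Splatting (4D-NVS), a new technique for dynamic scene modeling.
It combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling.
A compact set of neural voxels with learned deformation fields models temporal dynamics.
The design greatly reduces memory consumption and accelerates training.
A novel view refinement stage improves rendering quality for difficult viewing angles.
Experiments show significant memory reduction and faster training than state-of-the-art approaches.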

GS-Verse: Mesh-based Gaussian Splatting for Physics-aware Interaction in Virtual Reality

Authors:Anastasiya Pechko, Piotr Borycki, Joanna Waczyńska, Daniel Barczyk, Agata Szymańska, Sławomir Tadeja, Przemysław Spurek

As the demand for immersive 3D content grows, the need for intuitive and efficient interaction methods becomes paramount. Current techniques for physically manipulating 3D content within Virtual Reality (VR) often face significant limitations, including reliance on engineering-intensive processes and simplified geometric representations, such as tetrahedral cages, which can compromise visual fidelity and physical accuracy. In this paper, we introduce GS-Verse (Gaussian Splatting for Virtual Environment Rendering and Scene Editing), a novel method designed to overcome these challenges by directly integrating an object’s mesh with a Gaussian Splatting (GS) representation. Our approach enables more precise surface approximation, leading to highly realistic deformations and interactions. By leveraging existing 3D mesh assets, GS-Verse facilitates seamless content reuse and simplifies the development workflow. Moreover, our system is designed to be physics-engine-agnostic, granting developers robust deployment flexibility. This versatile architecture delivers a highly realistic, adaptable, and intuitive approach to interactive 3D manipulation. We rigorously validate our method against the current state-of-the-art technique that couples VR with GS in a comparative user study involving 18 participants. Specifically, we demonstrate that our approach is statistically significantly better for physics-aware stretching manipulation and is also more consistent in other physics-based manipulations like twisting and shaking. Further evaluation across various interactions and scenes confirms that our method consistently delivers high and reliable performance, showing its potential as a plausible alternative to existing methods.

Paper and project links

PDF

Summary

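As demand for immersive 3D content grows, intuitive and efficient interaction methods become essential. This paper introduces GS-Verse (Gaussian Splatting for Virtual Environment Rendering and Scene Editing), a method that overcomes the limits of current techniques by directly integrating an object's mesh with a Gaussian Splatting (GS) representation. This gives a more precise surface approximation and, in turn, highly realistic deformations and interactions. By reusing existing 3D mesh assets, GS-Verse enables seamless content reuse and simplifies the development workflow. The system is also physics-engine-agnostic, which gives developers flexibility in deployment. In a comparative user study with 18 participants against the current state-of-the-art technique coupling VR with GS, the approach is statistically significantly better for physics-aware stretching and more consistent in other physics-based manipulations such as twisting and shaking. Further evaluation across interactions and scenes confirms consistently high and reliable performance, which suggests it is a plausible alternative to existing methods.

Key Takeaways

Growing demand for immersive 3D content calls for more intuitive and efficient interaction methods.
Current techniques for manipulating 3D content in VR rely on engineering-intensive processes and simplified geometric representations, which can compromise visual fidelity and physical accuracy.
GS-Verse addresses these issues by directly integrating an object's mesh with a Gaussian Splatting (GS) representation.
The resulting surface approximation is more precise and gives highly realistic deformations and interactions.
Reusing existing 3D mesh assets simplifies the development workflow and enables seamless content reuse.
The system is physics-engine-agnostic, giving developers flexibility in deployment.
GS-Verse is statistically significantly better for physics-aware stretching and more consistent in other physics-based manipulations.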
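One common way to couple Gaussians to a mesh, shown here only as an illustrative assumption and not as the GS-Verse implementation, is to store each Gaussian's center as barycentric weights on a triangle. When a physics engine moves the mesh vertices, the centers and the local surface normals are recomputed from the deformed triangle, so the splats follow the surface. The function names and the `bindings` layout below are hypothetical.

```python
import math

def barycentric_point(tri, bary):
    """Point on a triangle given barycentric weights (w0, w1, w2)."""
    return tuple(sum(w * v[k] for w, v in zip(bary, tri)) for k in range(3))

def unit_normal(tri):
    """Unit normal of a triangle from the cross product of two edges."""
    a, b, c = tri
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

def update_bound_gaussians(bindings, vertices, faces):
    """Recompute (center, normal) for each Gaussian after the mesh has moved.

    bindings: list of (face_index, barycentric_weights), fixed at setup time.
    vertices: current vertex positions, e.g. as returned by a physics engine.
    faces:    list of vertex-index triples.
    """
    updated = []
    for face_index, bary in bindings:
        tri = [vertices[i] for i in faces[face_index]]
        updated.append((barycentric_point(tri, bary), unit_normal(tri)))
    return updated
```

Because the function only consumes vertex positions, any physics engine that outputs deformed vertices could drive it, which matches the physics-engine-agnostic design described above.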

Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey

Authors:Jiahui Zhang, Yuelei Li, Anpei Chen, Muyu Xu, Kunhao Liu, Jianyuan Wang, Xiao-Xiao Long, Hanxue Liang, Zexiang Xu, Hao Su, Christian Theobalt, Christian Rupprecht, Andrea Vedaldi, Kaichen Zhou, Paul Pu Liang, Shijian Lu, Fangneng Zhan

3D reconstruction and view synthesis are foundational problems in computer vision, graphics, and immersive technologies such as augmented reality (AR), virtual reality (VR), and digital twins. Traditional methods rely on computationally intensive iterative optimization in a complex chain, limiting their applicability in real-world scenarios. Recent advances in feed-forward approaches, driven by deep learning, have revolutionized this field by enabling fast and generalizable 3D reconstruction and view synthesis. This survey offers a comprehensive review of feed-forward techniques for 3D reconstruction and view synthesis, with a taxonomy according to the underlying representation architectures including point cloud, 3D Gaussian Splatting (3DGS), Neural Radiance Fields (NeRF), etc. We examine key tasks such as pose-free reconstruction, dynamic 3D reconstruction, and 3D-aware image and video synthesis, highlighting their applications in digital humans, SLAM, robotics, and beyond. In addition, we review commonly used datasets with detailed statistics, along with evaluation protocols for various downstream tasks. We conclude by discussing open research challenges and promising directions for future work, emphasizing the potential of feed-forward approaches to advance the state of the art in 3D vision.

Paper and project links

PDF A project page associated with this survey is available at https://fnzhan.com/projects/Feed-Forward-3D

Summary

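This survey reviews recent advances in deep-learning-driven feed-forward techniques for 3D reconstruction and view synthesis. It organizes the methods by their underlying representation, including point clouds, 3D Gaussian Splatting (3DGS), and Neural Radiance Fields (NeRF), and examines key tasks such as pose-free reconstruction, dynamic 3D reconstruction, and 3D-aware image and video synthesis, with applications in digital humans, SLAM, robotics, and beyond. It also reviews commonly used datasets and evaluation protocols, and discusses open research challenges and promising future directions, emphasizing the potential of feed-forward approaches to advance 3D vision.

Key Takeaways

3D reconstruction and view synthesis are foundational problems in computer vision, graphics, and immersive technologies such as augmented reality, virtual reality, and digital twins.
Traditional methods depend on computationally intensive iterative optimization, which limits their use in real-world scenarios.
Feed-forward approaches driven by deep learning have transformed the field by enabling fast and generalizable 3D reconstruction and view synthesis.
Key tasks include pose-free reconstruction, dynamic 3D reconstruction, and 3D-aware image and video synthesis, with applications in digital humans, SLAM, and robotics.
The survey reviews commonly used datasets with detailed statistics, along with evaluation protocols for various downstream tasks.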

3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting

Authors:Hao Wu, Hao Wang, Ruochong Li, Xuran Ma, Hui Xiong

Recent advancements in text-to-3D generation have shown remarkable results by leveraging 3D priors in combination with 2D diffusion. However, previous methods utilize 3D priors that lack detailed and complex structural information, limiting them to generating simple objects and presenting challenges for creating intricate structures such as bonsai. In this paper, we propose 3DBonsai, a novel text-to-3D framework for generating 3D bonsai with complex structures. Technically, we first design a trainable 3D space colonization algorithm to produce bonsai structures, which are then enhanced through random sampling and point cloud augmentation to serve as the 3D Gaussian priors. We introduce two bonsai generation pipelines with distinct structural levels: fine structure conditioned generation, which initializes 3D Gaussians using a 3D structure prior to produce detailed and complex bonsai, and coarse structure conditioned generation, which employs a multi-view structure consistency module to align 2D and 3D structures. Moreover, we have compiled a unified 2D and 3D Chinese-style bonsai dataset. Our experimental results demonstrate that 3DBonsai significantly outperforms existing methods, providing a new benchmark for structure-aware 3D bonsai generation.

Paper and project links

PDF

Summary

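This paper proposes 3DBonsai, a novel text-to-3D framework for generating bonsai with complex structures. Earlier methods combine 3D priors with 2D diffusion, but their priors lack detailed structural information. 3DBonsai first uses a trainable 3D space colonization algorithm to produce bonsai structures, which are enhanced through random sampling and point cloud augmentation and then serve as 3D Gaussian priors. Two generation pipelines operate at different structural levels: fine structure conditioned generation initializes the 3D Gaussians from a 3D structure prior, and coarse structure conditioned generation uses a multi-view structure consistency module to align 2D and 3D structures. Experiments show that 3DBonsai significantly outperforms existing methods and sets a new benchmark for structure-aware 3D bonsai generation.

Key Takeaways

3DBonsai combines 3D priors with 2D diffusion to generate 3D bonsai with complex structures from text.
A trainable 3D space colonization algorithm produces the basic bonsai structure.
The generated structures are enhanced through random sampling and point cloud augmentation and serve as 3D Gaussian priors.
Two generation pipelines target different structural levels: fine structure conditioned and coarse structure conditioned generation.
A multi-view structure consistency module aligns 2D and 3D structures.
The authors compiled a unified 2D and 3D Chinese-style bonsai dataset.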
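For readers unfamiliar with space colonization, the sketch below shows the classic, non-trainable form of the algorithm: attraction points pull the nearest branch node toward them, nodes grow by a fixed step in the averaged pull direction, and attraction points are removed once a branch gets close. The paper's trainable variant is not described in enough detail here to reproduce, so the function name and every parameter (`step`, `influence`, `kill`) should be read as assumptions for illustration.

```python
import math

def space_colonization(attractors, root, step=0.5, influence=3.0,
                       kill=0.6, max_iters=100):
    """Grow a branch skeleton toward attraction points.

    Returns (nodes, parents), where parents[i] is the index of the node
    that node i grew from (-1 for the root).
    """
    nodes, parents = [tuple(root)], [-1]
    remaining = [tuple(a) for a in attractors]
    for _ in range(max_iters):
        if not remaining:
            break
        # Each attraction point pulls only its nearest node, if close enough.
        pulls = {}
        for a in remaining:
            i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], a))
            d = math.dist(nodes[i], a)
            if 0.0 < d <= influence:
                unit = [(ac - nc) / d for ac, nc in zip(a, nodes[i])]
                pulls.setdefault(i, []).append(unit)
        if not pulls:
            break
        # Every pulled node sprouts one child along the mean pull direction.
        for i, dirs in pulls.items():
            mean = [sum(axis) for axis in zip(*dirs)]
            norm = math.hypot(*mean)
            if norm > 0.0:
                child = tuple(n + step * m / norm
                              for n, m in zip(nodes[i], mean))
                nodes.append(child)
                parents.append(i)
        # Attraction points reached by a branch are consumed.
        remaining = [a for a in remaining
                     if min(math.dist(a, n) for n in nodes) > kill]
    return nodes, parents
```

Sampling points along the resulting skeleton edges would give a point cloud that could initialize 3D Gaussians, which is the role the abstract assigns to the structure prior.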

SonarSplat: Novel View Synthesis of Imaging Sonar via Gaussian Splatting

Authors:Advaith V. Sethuraman, Max Rucker, Onur Bagoren, Pou-Chun Kung, Nibarkavi N. B. Amutha, Katherine A. Skinner

In this paper, we present SonarSplat, a novel Gaussian splatting framework for imaging sonar that demonstrates realistic novel view synthesis and models acoustic streaking phenomena. Our method represents the scene as a set of 3D Gaussians with acoustic reflectance and saturation properties. We develop a novel method to efficiently rasterize Gaussians to produce a range/azimuth image that is faithful to the acoustic image formation model of imaging sonar. In particular, we develop a novel approach to model azimuth streaking in a Gaussian splatting framework. We evaluate SonarSplat using real-world datasets of sonar images collected from an underwater robotic platform in a controlled test tank and in a real-world river environment. Compared to the state-of-the-art, SonarSplat offers improved image synthesis capabilities (+3.2 dB PSNR) and more accurate 3D reconstruction (77% lower Chamfer Distance). We also demonstrate that SonarSplat can be leveraged for azimuth streak removal.

Paper and project links

PDF

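Summary
SonarSplat is a novel Gaussian splatting framework for imaging sonar that delivers realistic novel view synthesis and models acoustic streaking. The scene is represented as a set of 3D Gaussians with acoustic reflectance and saturation properties. A new, efficient rasterization of these Gaussians produces a range/azimuth image that is faithful to the acoustic image formation model of imaging sonar, and a dedicated component models azimuth streaking within the Gaussian splatting framework. SonarSplat is evaluated on real-world sonar images collected from an underwater robotic platform in a controlled test tank and in a river. Compared with the state of the art, it improves image synthesis (+3.2 dB PSNR) and 3D reconstruction accuracy (77% lower Chamfer Distance), and it can also be used for azimuth streak removal.

Key Takeaways

SonarSplat is a novel Gaussian splatting framework for imaging sonar that supports realistic novel view synthesis and models acoustic streaking.
The scene is represented as a set of 3D Gaussians with acoustic reflectance and saturation properties.
An efficient rasterizer turns the Gaussians into a range/azimuth image consistent with the acoustic image formation model of imaging sonar, including a model of azimuth streaking.
On real-world data from a test tank and a river, SonarSplat improves image synthesis by +3.2 dB PSNR and lowers Chamfer Distance by 77% relative to the state of the art, and it can also remove azimuth streaks.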
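The range/azimuth image mentioned above differs from a camera image: each pixel is indexed by distance and horizontal bearing, and elevation is collapsed. The sketch below is a deliberately naive point-binning illustration of that geometry, not the paper's rasterizer. It assumes a sensor frame with x forward, y to the side, and z vertical, and the function name and all parameters are hypothetical.

```python
import math

def render_range_azimuth(centers, reflectances, max_range, fov_deg,
                         n_range, n_azimuth, elev_fov_deg=20.0):
    """Accumulate point reflectances into a (range, azimuth) grid.

    centers are (x, y, z) in the sonar frame. Points outside the horizontal
    or vertical field of view, or beyond max_range, are ignored.
    """
    image = [[0.0] * n_azimuth for _ in range(n_range)]
    half_az = math.radians(fov_deg) / 2.0
    half_el = math.radians(elev_fov_deg) / 2.0
    for (x, y, z), refl in zip(centers, reflectances):
        rng = math.sqrt(x * x + y * y + z * z)
        azimuth = math.atan2(y, x)
        elevation = math.atan2(z, math.hypot(x, y))
        if not (0.0 < rng < max_range):
            continue
        if abs(azimuth) >= half_az or abs(elevation) > half_el:
            continue
        row = int(rng / max_range * n_range)
        col = int((azimuth + half_az) / (2.0 * half_az) * n_azimuth)
        # Elevation is discarded here: points at different heights but the
        # same range and bearing land in the same pixel.
        image[row][col] += refl
    return image
```

Two points at nearly the same range and bearing but different heights fall into one pixel, which is the elevation ambiguity that makes 3D reconstruction from imaging sonar hard.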

MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians

Authors:Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu

Reconstructing high-fidelity 3D head avatars is crucial in various applications such as virtual reality. The pioneering methods reconstruct realistic head avatars with Neural Radiance Fields (NeRF), which have been limited by training and rendering speed. Recent methods based on 3D Gaussian Splatting (3DGS) significantly improve the efficiency of training and rendering. However, the surface inconsistency of 3DGS results in subpar geometric accuracy; later, 2DGS uses 2D surfels to enhance geometric accuracy at the expense of rendering fidelity. To leverage the benefits of both 2DGS and 3DGS, we propose a novel method named MixedGaussianAvatar for realistically and geometrically accurate head avatar reconstruction. Our main idea is to utilize 2D Gaussians to reconstruct the surface of the 3D head, ensuring geometric accuracy. We attach the 2D Gaussians to the triangular mesh of the FLAME model and connect additional 3D Gaussians to those 2D Gaussians where the rendering quality of 2DGS is inadequate, creating a mixed 2D-3D Gaussian representation. These 2D-3D Gaussians can then be animated using FLAME parameters. We further introduce a progressive training strategy that first trains the 2D Gaussians and then fine-tunes the mixed 2D-3D Gaussians. We use a unified mixed Gaussian representation to integrate the two modalities of 2D image and 3D mesh. Furthermore, the comprehensive experiments demonstrate the superiority of MixedGaussianAvatar. The code will be released.

Paper and project links

PDF

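Summary
Pioneering methods reconstruct realistic head avatars with Neural Radiance Fields (NeRF) but are limited by training and rendering speed. Methods based on 3D Gaussian Splatting (3DGS) improve training and rendering efficiency, yet their surface inconsistency leads to subpar geometric accuracy, while 2DGS improves geometric accuracy with 2D surfels at the expense of rendering fidelity. This paper proposes MixedGaussianAvatar, which combines 2D and 3D Gaussians: 2D Gaussians reconstruct the surface of the 3D head to ensure geometric accuracy and are attached to the triangular mesh of the FLAME model, and additional 3D Gaussians are connected to those 2D Gaussians where the rendering quality of 2DGS is inadequate, giving a mixed 2D-3D Gaussian representation. The representation can be animated with FLAME parameters. A progressive training strategy first trains the 2D Gaussians and then fine-tunes the mixed 2D-3D Gaussians. Comprehensive experiments demonstrate the method's superiority, and the authors state that the code will be released.

Key Takeaways

NeRF-based head avatar reconstruction is realistic but limited by training and rendering speed.
3DGS-based methods improve training and rendering efficiency, but their geometric accuracy is subpar.
MixedGaussianAvatar combines 2D and 3D Gaussians to reconstruct head avatars that are both realistic and geometrically accurate.
2D Gaussians reconstruct the surface of the 3D head and are attached to the triangular mesh of the FLAME model.
Additional 3D Gaussians are added where the rendering quality of 2DGS is inadequate, forming a mixed 2D-3D Gaussian representation.
A progressive training strategy first trains the 2D Gaussians and then fine-tunes the mixed 2D-3D Gaussians.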

Gaussian Splashing: Direct Volumetric Rendering Underwater

Authors:Nir Mualem, Roy Amoyal, Oren Freifeld, Derya Akkaynak

In underwater images, most useful features are occluded by water. The extent of the occlusion depends on imaging geometry and can vary even across a sequence of burst images. As a result, 3D reconstruction methods robust on in-air scenes, like Neural Radiance Field methods (NeRFs) or 3D Gaussian Splatting (3DGS), fail on underwater scenes. While a recent underwater adaptation of NeRFs achieved state-of-the-art results, it is impractically slow: reconstruction takes hours and its rendering rate, in frames per second (FPS), is less than 1. Here, we present a new method that takes only a few minutes for reconstruction and renders novel underwater scenes at 140 FPS. Named Gaussian Splashing, our method unifies the strengths and speed of 3DGS with an image formation model for capturing scattering, introducing innovations in the rendering and depth estimation procedures and in the 3DGS loss function. Despite the complexities of underwater adaptation, our method produces images at unparalleled speeds with superior details. Moreover, it reveals distant scene details with far greater clarity than other methods, dramatically improving reconstructed and rendered images. We demonstrate results on existing datasets and a new dataset we have collected. Additional visual results are available at: https://bgu-cs-vil.github.io/gaussiansplashingUW.github.io/ .

Paper and project links

PDF

Summary
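The paper proposes Gaussian Splashing, a method for underwater scenes that unifies the strengths and speed of 3DGS with an image formation model that captures scattering. It introduces innovations in the rendering and depth estimation procedures and in the 3DGS loss function. Reconstruction takes only a few minutes and novel underwater views render at 140 FPS, with distant scene details revealed more clearly than with other methods. Results are shown on existing datasets and on a newly collected dataset.

Key Takeaways

In underwater images, useful features are occluded by water. The extent of the occlusion depends on imaging geometry and can vary even across a sequence of burst images.
3D reconstruction methods that are robust on in-air scenes, such as NeRFs and 3DGS, fail on underwater scenes.
A recent underwater adaptation of NeRFs achieves state-of-the-art results but is impractically slow: reconstruction takes hours and rendering runs below 1 FPS.
Gaussian Splashing reconstructs a scene in a few minutes and renders novel underwater views at 140 FPS.
The method unifies the strengths and speed of 3DGS with an image formation model for capturing scattering.
It introduces innovations in the rendering and depth estimation procedures and in the 3DGS loss function.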
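The abstract does not spell out the image formation model, so the sketch below uses a form that is common in the underwater imaging literature and should be treated as an assumption: the observed color is the true color attenuated with distance, plus backscatter that saturates toward a veiling-light color. The function name and the coefficient names `beta_d`, `beta_b`, and `b_inf` are hypothetical.

```python
import math

def underwater_pixel(true_rgb, depth, beta_d, beta_b, b_inf):
    """Apply a per-channel attenuation-plus-backscatter model to one pixel.

    observed_c = J_c * exp(-beta_d_c * z) + B_inf_c * (1 - exp(-beta_b_c * z))

    true_rgb: clear-water color J of the surface point.
    depth:    distance z from the camera to the surface point.
    beta_d:   per-channel attenuation coefficients of the direct signal.
    beta_b:   per-channel backscatter coefficients.
    b_inf:    per-channel veiling light (water color at infinite distance).
    """
    return tuple(
        j * math.exp(-bd * depth) + b * (1.0 - math.exp(-bb * depth))
        for j, bd, bb, b in zip(true_rgb, beta_d, beta_b, b_inf)
    )
```

At zero distance the function returns the true color, and at large distance it returns the veiling light, which illustrates why distant scene details are the hardest to recover and why a splatting renderer needs per-pixel depth to apply such a model.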

EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

Authors:Bohao Liao, Wei Zhai, Zengyu Wan, Zhixin Cheng, Wenfei Yang, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha

Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. Our project page is https://lbh666.github.io/ef-3dgs/.

Paper and project links

PDF Accepted to NeurIPS 2025. Project page: https://lbh666.github.io/ef-3dgs/

Summary
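Scene reconstruction from casually captured videos has wide real-world applications, but methods that rely on conventional camera input tend to fail in high-speed (equivalently, low-frame-rate) scenarios. This paper is the first to bring in an event camera for this task and proposes Event-Aided Free-Trajectory 3DGS (EF-3DGS), which integrates the advantages of event cameras into 3DGS through three key components. The Event Generation Model (EGM) fuses events and frames to supervise the rendered views observed by the event stream. The Contrast Maximization (CMax) framework, applied piece-wise, extracts motion information by maximizing the contrast of the Image of Warped Events (IWE) and so calibrates the estimated poses, while the Linear Event Generation Model (LEGM) uses the brightness information encoded in the IWE to constrain 3DGS in the gradient domain. Photometric bundle adjustment (PBA) compensates for the missing color information in events and ensures view consistency across events and frames. The method is evaluated on the public Tanks and Temples benchmark and on a newly collected real-world dataset, RealEv-DAVIS.

Key Takeaways

Event cameras provide valuable scene and motion information between frames, which matters most in high-speed or low-frame-rate scenarios.
This is the first work to use an event camera to aid scene reconstruction from casually captured video.
The proposed Event-Aided Free-Trajectory 3DGS (EF-3DGS) integrates the advantages of event cameras into 3DGS.
The Event Generation Model (EGM) fuses events and frames and supervises the rendered views observed by the event stream.
The Contrast Maximization (CMax) framework extracts motion information and calibrates the estimated poses.
Based on the Linear Event Generation Model (LEGM), brightness information in the IWE constrains 3DGS in the gradient domain.
Photometric bundle adjustment (PBA) ensures view consistency across events and frames.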
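Contrast maximization is easiest to see in a one-dimensional toy setting, sketched below. This is an assumption-laden illustration and not the EF-3DGS pipeline: events are reduced to (position, time) pairs from a single moving edge, the motion model is a constant velocity, and the search is a plain scan over hypothetical candidate velocities. Warping every event back to time zero with the correct velocity stacks them into one sharp bin of the Image of Warped Events (IWE), which maximizes its variance.

```python
import math

def iwe_contrast(events, velocity, n_bins=20, width=20.0):
    """Variance of the 1D image of warped events for a candidate velocity."""
    image = [0] * n_bins
    for x, t in events:
        warped = x - velocity * t  # move the event back to time 0
        b = math.floor(warped / width * n_bins)
        if 0 <= b < n_bins:
            image[b] += 1
    mean = sum(image) / n_bins
    return sum((count - mean) ** 2 for count in image) / n_bins

def estimate_velocity(events, candidates):
    """Pick the candidate velocity whose warped events are sharpest."""
    return max(candidates, key=lambda v: iwe_contrast(events, v))

# An edge starting at x = 5.5 and moving at 2 units per second.
events = [(5.5 + 2.0 * (k * 0.5), k * 0.5) for k in range(10)]
best = estimate_velocity(events, [0.0, 1.0, 2.0, 3.0])
```

In the real setting the warp depends on camera pose and scene depth instead of a scalar velocity, and the contrast is maximized with gradient-based optimization.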

GASP: Gaussian Splatting for Physic-Based Simulations

Authors:Piotr Borycki, Weronika Smolak, Joanna Waczyńska, Marcin Mazur, Sławomir Tadeja, Przemysław Spurek

Physics simulation is paramount for modeling and utilizing 3D scenes in various real-world applications. However, integrating with state-of-the-art 3D scene rendering techniques such as Gaussian Splatting (GS) remains challenging. Existing models use additional meshing mechanisms, including triangle or tetrahedron meshing, marching cubes, or cage meshes. Alternatively, we can modify the physics-grounded Newtonian dynamics to align with 3D Gaussian components. Current models take the first-order approximation of a deformation map, which locally approximates the dynamics by linear transformations. In contrast, our GS for Physics-Based Simulations (GASP) pipeline uses parametrized flat Gaussian distributions. Consequently, the problem of modeling Gaussian components using the physics engine is reduced to working with 3D points. In our work, we present additional rules for manipulating Gaussians, demonstrating how to adapt the pipeline to incorporate meshes, control Gaussian sizes during simulations, and enhance simulation efficiency. This is achieved through the Gaussian grouping strategy, which implements hierarchical structuring and enables simulations to be performed exclusively on selected Gaussians. The resulting solution can be integrated into any physics engine that can be treated as a black box. As demonstrated in our studies, the proposed pipeline exhibits superior performance on a diverse range of benchmark datasets designed for 3D object rendering. The project webpage, which includes additional visualizations, can be found at https://waczjoan.github.io/GASP.

Paper and project links

PDF

Summary

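This paper addresses the challenge of combining physics simulation with Gaussian Splatting (GS) rendering. Existing models typically rely on additional meshing mechanisms or on a first-order approximation of the deformation map, whereas the proposed GS for Physics-Based Simulations (GASP) pipeline uses parametrized flat Gaussian distributions, which reduces the problem of driving Gaussian components with a physics engine to working with 3D points. A Gaussian grouping strategy builds a hierarchical structure so that simulation runs only on selected Gaussians, which improves efficiency, and the pipeline shows superior performance on a range of benchmark datasets designed for 3D object rendering.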

Key Takeaways