
3DGS


⚠️ All summaries below are produced by large language models and may contain errors; they are for reference only, use with caution.
🔴 Note: never rely on these for serious academic work; they are only meant for a first pass before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-04-04

Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis

Authors:Niluthpol Chowdhury Mithun, Tuan Pham, Qiao Wang, Ben Southall, Kshitij Minhas, Bogdan Matei, Stephan Mandt, Supun Samarasekera, Rakesh Kumar

Recent advancements in 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have achieved impressive results in real-time 3D reconstruction and novel view synthesis. However, these methods struggle in large-scale, unconstrained environments where sparse and uneven input coverage, transient occlusions, appearance variability, and inconsistent camera settings lead to degraded quality. We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address these limitations. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones, enabling robust optimization even with sparse data. GS-Diff further integrates several enhancements, including appearance embedding, monocular depth priors, dynamic object modeling, anisotropy regularization, and advanced rasterization techniques, to tackle geometric and photometric challenges in real-world settings. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
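
Among the enhancements listed above, anisotropy regularization is the most self-contained. The sketch below shows one common formulation that penalizes needle-like Gaussians by bounding the ratio of their largest to smallest scale; it is an illustrative assumption, not necessarily the exact regularizer used in GS-Diff.

```python
import torch

def anisotropy_regularization(scales: torch.Tensor, max_ratio: float = 10.0) -> torch.Tensor:
    """Penalize overly elongated Gaussians.

    scales:    (N, 3) per-Gaussian scale parameters (positive, already exponentiated).
    max_ratio: largest allowed ratio between the longest and shortest axis.
    Returns a scalar loss that is zero while every Gaussian stays below the ratio.
    """
    ratio = scales.max(dim=-1).values / scales.min(dim=-1).values.clamp_min(1e-8)
    return torch.relu(ratio - max_ratio).mean()

# Hypothetical use inside a 3DGS training step:
# loss = photometric_loss + lambda_aniso * anisotropy_regularization(gaussians.scales)
```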


Paper & Project Links

PDF WACV ULTRRA Workshop 2025

Summary

Recent advances in 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have achieved impressive results in real-time 3D reconstruction and novel view synthesis, but they degrade in large, unconstrained environments where sparse and uneven coverage, transient occlusions, appearance variability, and inconsistent camera settings hurt quality. The authors propose GS-Diff, a 3DGS framework guided by a multi-view diffusion model: by generating pseudo-observations conditioned on multi-view inputs, it turns under-constrained 3D reconstruction problems into well-posed ones, enabling robust optimization even with sparse data. GS-Diff further integrates appearance embeddings, monocular depth priors, dynamic object modeling, anisotropy regularization, and advanced rasterization to handle geometric and photometric challenges in real-world settings. Experiments on four benchmarks show that GS-Diff significantly outperforms state-of-the-art baselines.

Key Takeaways

  1. Recent 3DGS and NeRF methods achieve strong results in real-time 3D reconstruction and novel view synthesis.
  2. In large, unconstrained environments, existing methods struggle with sparse and uneven inputs, transient occlusions, appearance variation, and inconsistent camera settings.
  3. GS-Diff is a new 3DGS framework that addresses these issues by generating pseudo-observations, enabling robust optimization even from sparse data.
  4. GS-Diff integrates several enhancements, including appearance embeddings, monocular depth priors, and dynamic object modeling, to handle geometric and photometric challenges in real-world scenes.
  5. GS-Diff outperforms state-of-the-art baselines on four benchmarks.
  6. By conditioning pseudo-observations on multi-view inputs, GS-Diff turns under-constrained 3D reconstruction into a well-posed problem.

Cool Papers

View paper screenshots here

GaussianLSS – Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Authors:Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen

Bird’s-eye view (BEV) perception has gained significant attention because it provides a unified representation to fuse multiple view images and enables a wide range of down-stream autonomous driving tasks, such as forecasting and planning. Recent state-of-the-art models utilize projection-based methods which formulate BEV perception as query learning to bypass explicit depth estimation. While we observe promising advancements in this paradigm, they still fall short of real-world applications because of the lack of uncertainty modeling and expensive computational requirement. In this work, we introduce GaussianLSS, a novel uncertainty-aware BEV perception framework that revisits unprojection-based methods, specifically the Lift-Splat-Shoot (LSS) paradigm, and enhances them with depth un-certainty modeling. GaussianLSS represents spatial dispersion by learning a soft depth mean and computing the variance of the depth distribution, which implicitly captures object extents. We then transform the depth distribution into 3D Gaussians and rasterize them to construct uncertainty-aware BEV features. We evaluate GaussianLSS on the nuScenes dataset, achieving state-of-the-art performance compared to unprojection-based methods. In particular, it provides significant advantages in speed, running 2.5x faster, and in memory efficiency, using 0.3x less memory compared to projection-based methods, while achieving competitive performance with only a 0.4% IoU difference.
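
A minimal sketch of the depth-statistics step described above, assuming a per-pixel categorical depth distribution over discrete depth bins; the bin layout and the network producing the logits are placeholders, not the paper's exact design.

```python
import torch

def depth_mean_and_variance(depth_logits: torch.Tensor, depth_bins: torch.Tensor):
    """Soft depth mean and variance from per-pixel depth logits.

    depth_logits: (B, D, H, W) unnormalized scores over D depth bins.
    depth_bins:   (D,) metric depth value of each bin center.
    Returns (mean, var), each of shape (B, H, W).
    """
    probs = depth_logits.softmax(dim=1)                          # (B, D, H, W)
    bins = depth_bins.view(1, -1, 1, 1)                          # (1, D, 1, 1)
    mean = (probs * bins).sum(dim=1)                             # expected depth per pixel
    var = (probs * (bins - mean.unsqueeze(1)) ** 2).sum(dim=1)   # spread, implicitly capturing object extent
    return mean, var
```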


Paper & Project Links

PDF Accepted to CVPR 2025

Summary
This paper proposes GaussianLSS, an uncertainty-aware BEV perception framework that revisits unprojection-based methods, specifically the Lift-Splat-Shoot (LSS) paradigm, and enhances them with depth uncertainty modeling. The method learns a soft depth mean and computes the variance of the depth distribution, then transforms the distribution into 3D Gaussians and rasterizes them to build uncertainty-aware BEV features. Evaluated on nuScenes, GaussianLSS achieves state-of-the-art performance among unprojection-based methods; compared with projection-based methods it runs 2.5x faster and uses far less memory while remaining competitive, with only a 0.4% IoU gap.

Key Takeaways

  • BEV perception provides a unified representation for fusing multi-view images and supports many downstream autonomous-driving tasks such as forecasting and planning.
  • Recent state-of-the-art models use projection-based methods that cast BEV perception as query learning to bypass explicit depth estimation.
  • Despite their progress, projection-based methods remain hard to deploy in the real world because they lack uncertainty modeling and are computationally expensive.
  • GaussianLSS is an uncertainty-aware BEV perception framework that revisits unprojection-based methods (specifically the LSS paradigm) and adds depth uncertainty modeling.
  • GaussianLSS represents spatial dispersion by learning a soft depth mean and computing the variance of the depth distribution.
  • The depth distribution is converted into 3D Gaussians and rasterized to construct uncertainty-aware BEV features.

Cool Papers

View paper screenshots here

BOGausS: Better Optimized Gaussian Splatting

Authors:Stéphane Pateux, Matthieu Gendrin, Luce Morin, Théo Ladune, Xiaoran Jiang

3D Gaussian Splatting (3DGS) proposes an efficient solution for novel view synthesis. Its framework provides fast and high-fidelity rendering. Although less complex than other solutions such as Neural Radiance Fields (NeRF), there are still some challenges building smaller models without sacrificing quality. In this study, we perform a careful analysis of 3DGS training process and propose a new optimization methodology. Our Better Optimized Gaussian Splatting (BOGausS) solution is able to generate models up to ten times lighter than the original 3DGS with no quality degradation, thus significantly boosting the performance of Gaussian Splatting compared to the state of the art.


Paper & Project Links

PDF

Summary

3D Gaussian Splatting (3DGS) offers an efficient solution for novel view synthesis with fast, high-fidelity rendering, yet building smaller models without sacrificing quality remains challenging. After a careful analysis of the 3DGS training process, the authors propose Better Optimized Gaussian Splatting (BOGausS), which generates models up to ten times lighter than the original 3DGS with no quality degradation, significantly boosting the performance of Gaussian Splatting relative to the state of the art.

Key Takeaways

  • 3DGS provides an efficient approach to novel view synthesis.
  • Its framework delivers fast, high-fidelity rendering.
  • Although less complex than alternatives such as Neural Radiance Fields (NeRF), building compact models without losing quality is still challenging.
  • The paper carefully analyzes the 3DGS training process.
  • The proposed BOGausS generates models up to ten times lighter than the original 3DGS without quality degradation.

Cool Papers

View paper screenshots here

3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting

Authors:Hao Wu, Hao Wang, Ruochong Li, Xuran Ma, Hui Xiong

Recent advancements in text-to-3D generation have shown remarkable results by leveraging 3D priors in combination with 2D diffusion. However, previous methods utilize 3D priors that lack detailed and complex structural information, limiting them to generating simple objects and presenting challenges for creating intricate structures such as bonsai. In this paper, we propose 3DBonsai, a novel text-to-3D framework for generating 3D bonsai with complex structures. Technically, we first design a trainable 3D space colonization algorithm to produce bonsai structures, which are then enhanced through random sampling and point cloud augmentation to serve as the 3D Gaussian priors. We introduce two bonsai generation pipelines with distinct structural levels: fine structure conditioned generation, which initializes 3D Gaussians using a 3D structure prior to produce detailed and complex bonsai, and coarse structure conditioned generation, which employs a multi-view structure consistency module to align 2D and 3D structures. Moreover, we have compiled a unified 2D and 3D Chinese-style bonsai dataset. Our experimental results demonstrate that 3DBonsai significantly outperforms existing methods, providing a new benchmark for structure-aware 3D bonsai generation.
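
The trainable 3D space colonization algorithm is only named in the abstract; as background, here is a minimal, non-trainable sketch of classic space colonization, which grows a branching structure toward a cloud of attraction points. All parameters and the growth rule are illustrative defaults, not the paper's design.

```python
import numpy as np

def space_colonization(attractors: np.ndarray, root: np.ndarray,
                       influence=0.6, kill=0.15, step=0.05, iters=200):
    """Grow branch nodes toward attraction points (classic space colonization).

    attractors: (M, 3) points sampled inside the target crown volume.
    root:       (3,) position of the initial branch node.
    Returns (nodes, parents): node positions and each node's parent index (-1 for the root).
    """
    nodes = [np.asarray(root, dtype=float)]
    parents = [-1]
    attractors = attractors.astype(float)
    for _ in range(iters):
        if len(attractors) == 0:
            break
        node_arr = np.stack(nodes)                                           # (N, 3)
        d = np.linalg.norm(attractors[:, None] - node_arr[None], axis=-1)    # (M, N)
        nearest = d.argmin(axis=1)
        grow_dirs = {}
        for a_idx, n_idx in enumerate(nearest):
            if d[a_idx, n_idx] < influence:                                  # attractor pulls its nearest node
                direction = attractors[a_idx] - node_arr[n_idx]
                direction /= np.linalg.norm(direction) + 1e-8
                grow_dirs.setdefault(n_idx, []).append(direction)
        if not grow_dirs:
            break
        for n_idx, dirs in grow_dirs.items():
            avg = np.mean(dirs, axis=0)
            avg /= np.linalg.norm(avg) + 1e-8
            nodes.append(node_arr[n_idx] + step * avg)                       # add a new branch segment
            parents.append(n_idx)
        # remove attractors that a branch has reached
        d_new = np.linalg.norm(attractors[:, None] - np.stack(nodes)[None], axis=-1)
        attractors = attractors[d_new.min(axis=1) > kill]
    return np.stack(nodes), parents
```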


Paper & Project Links

PDF Accepted by ICME 2025

Summary

This paper proposes 3DBonsai, a text-to-3D framework for generating bonsai with complex structures. A trainable 3D space colonization algorithm produces bonsai structures, which are refined through random sampling and point-cloud augmentation to serve as 3D Gaussian priors. Two generation pipelines target different structural levels: fine-structure-conditioned generation initializes 3D Gaussians from the structure prior to produce detailed, complex bonsai, while coarse-structure-conditioned generation uses a multi-view structure consistency module to align 2D and 3D structures and improve realism. The authors also compile a unified 2D and 3D Chinese-style bonsai dataset. Experiments show that 3DBonsai significantly outperforms existing methods, setting a new benchmark for structure-aware 3D bonsai generation.

Key Takeaways

Key insights from the text:

  • Proposes 3DBonsai, a text-to-3D framework for generating bonsai with complex structures, overcoming the limitations of prior methods.
  • Designs a trainable 3D space colonization algorithm to produce bonsai structures, refined through random sampling and point-cloud augmentation.
  • Introduces two bonsai generation pipelines, fine-structure conditioned and coarse-structure conditioned, for different structural levels and needs.
  • Uses a multi-view structure consistency module to align 2D and 3D structures and improve the realism of the generated bonsai.
  • Compiles a unified 2D and 3D Chinese-style bonsai dataset as a rich resource for further research in this area.

Cool Papers

View paper screenshots here

High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model

Authors:Yiyang Shen, Kun Zhou, He Wang, Yin Yang, Tianjia Shao

Recently single-view 3D generation via Gaussian splatting has emerged and developed quickly. They learn 3D Gaussians from 2D RGB images generated from pre-trained multi-view diffusion (MVD) models, and have shown a promising avenue for 3D generation through a single image. Despite the current progress, these methods still suffer from the inconsistency jointly caused by the geometric ambiguity in the 2D images, and the lack of structure of 3D Gaussians, leading to distorted and blurry 3D object generation. In this paper, we propose to fix these issues by GS-RGBN, a new RGBN-volume Gaussian Reconstruction Model designed to generate high-fidelity 3D objects from single-view images. Our key insight is a structured 3D representation can simultaneously mitigate the afore-mentioned two issues. To this end, we propose a novel hybrid Voxel-Gaussian representation, where a 3D voxel representation contains explicit 3D geometric information, eliminating the geometric ambiguity from 2D images. It also structures Gaussians during learning so that the optimization tends to find better local optima. Our 3D voxel representation is obtained by a fusion module that aligns RGB features and surface normal features, both of which can be estimated from 2D images. Extensive experiments demonstrate the superiority of our methods over prior works in terms of high-quality reconstruction results, robust generalization, and good efficiency.


Paper & Project Links

PDF 12 pages

Summary

This paper proposes GS-RGBN, an RGBN-volume Gaussian Reconstruction Model for generating high-fidelity 3D objects from single-view images. To address the geometric ambiguity of 2D images and the lack of structure in 3D Gaussians, it introduces a hybrid voxel-Gaussian representation: a 3D voxel representation carries explicit geometric information that removes the ambiguity from 2D images, and it structures the Gaussians during learning so that optimization tends to find better local optima. The voxel representation is obtained via a fusion module that aligns RGB features with surface-normal features, both estimated from 2D images. Experiments show superior reconstruction quality, robust generalization, and good efficiency compared with prior work.

Key Takeaways

  • Tackles the geometric ambiguity that 2D images introduce into single-view 3D reconstruction.
  • Proposes GS-RGBN, a new RGBN-volume Gaussian reconstruction model for generating 3D objects from a single image.
  • A hybrid voxel-Gaussian representation removes the geometric ambiguity and structures the Gaussians during learning.
  • A fusion module builds the 3D voxel representation by combining RGB features with surface-normal features.

Cool Papers

View paper screenshots here

Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment

Authors:Ziteng Cui, Xuangeng Chu, Tatsuya Harada

Capturing high-quality photographs under diverse real-world lighting conditions is challenging, as both natural lighting (e.g., low-light) and camera exposure settings (e.g., exposure time) significantly impact image quality. This challenge becomes more pronounced in multi-view scenarios, where variations in lighting and image signal processor (ISP) settings across viewpoints introduce photometric inconsistencies. Such lighting degradations and view-dependent variations pose substantial challenges to novel view synthesis (NVS) frameworks based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). To address this, we introduce Luminance-GS, a novel approach to achieving high-quality novel view synthesis results under diverse challenging lighting conditions using 3DGS. By adopting per-view color matrix mapping and view-adaptive curve adjustments, Luminance-GS achieves state-of-the-art (SOTA) results across various lighting conditions – including low-light, overexposure, and varying exposure – while not altering the original 3DGS explicit representation. Compared to previous NeRF- and 3DGS-based baselines, Luminance-GS provides real-time rendering speed with improved reconstruction quality.
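
A minimal sketch of the per-view color-matrix mapping and view-adaptive curve adjustment described above, applied to a rendered image. The curve parameterization used here (a single learnable gamma per view) is an assumption for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class PerViewColorAdjust(nn.Module):
    """Per-view 3x3 color matrix followed by a view-adaptive tone curve."""

    def __init__(self, num_views: int):
        super().__init__()
        # one color matrix and one curve parameter (gamma) per training view
        self.color_mats = nn.Parameter(torch.eye(3).repeat(num_views, 1, 1))
        self.log_gamma = nn.Parameter(torch.zeros(num_views))

    def forward(self, rendered: torch.Tensor, view_idx: int) -> torch.Tensor:
        """rendered: (3, H, W) image from the 3DGS rasterizer, values in [0, 1]."""
        c, h, w = rendered.shape
        flat = rendered.reshape(c, -1)                 # (3, H*W)
        mapped = self.color_mats[view_idx] @ flat      # per-view color matrix
        mapped = mapped.clamp(0.0, 1.0).reshape(c, h, w)
        gamma = self.log_gamma[view_idx].exp()         # view-adaptive curve (gamma here)
        return mapped.pow(gamma)
```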


Paper & Project Links

PDF CVPR 2025, project page: https://cuiziteng.github.io/Luminance_GS_web/

Summary
To address the difficulty of capturing high-quality photos under diverse real-world lighting, this paper proposes Luminance-GS, a 3DGS-based method for high-quality novel view synthesis under challenging illumination. Using per-view color matrix mapping and view-adaptive curve adjustments, it achieves state-of-the-art results across lighting conditions including low light, overexposure, and varying exposure, without altering the original explicit 3DGS representation. Compared with previous NeRF- and 3DGS-based baselines, Luminance-GS offers real-time rendering with improved reconstruction quality.

Key Takeaways

  1. Diverse real-world lighting conditions make capturing high-quality photos challenging.
  2. Both natural illumination and camera exposure settings strongly affect image quality.
  3. In multi-view settings, lighting changes and per-view ISP settings introduce photometric inconsistencies.
  4. These lighting degradations and view-dependent variations challenge NeRF- and 3DGS-based novel view synthesis (NVS) frameworks.
  5. Luminance-GS is a 3DGS-based method for high-quality novel view synthesis under diverse challenging lighting conditions.
  6. Luminance-GS adapts to various lighting conditions through per-view color matrix mapping and view-adaptive curve adjustment.

Cool Papers

View paper screenshots here

3D Gaussian Inverse Rendering with Approximated Global Illumination

Authors:Zirui Wu, Jianteng Chen, Laijian Li, Shaoteng Wu, Zhikai Zhu, Kang Xu, Martin R. Oswald, Jie Song

3D Gaussian Splatting shows great potential in reconstructing photo-realistic 3D scenes. However, these methods typically bake illumination into their representations, limiting their use for physically-based rendering and scene editing. Although recent inverse rendering approaches aim to decompose scenes into material and lighting components, they often rely on simplifying assumptions that fail when editing. We present a novel approach that enables efficient global illumination for 3D Gaussians Splatting through screen-space ray tracing. Our key insight is that a substantial amount of indirect light can be traced back to surfaces visible within the current view frustum. Leveraging this observation, we augment the direct shading computed by 3D Gaussians with Monte-Carlo screen-space ray-tracing to capture one-bounce indirect illumination. In this way, our method enables realistic global illumination without sacrificing the computational efficiency and editability benefits of 3D Gaussians. Through experiments, we show that the screen-space approximation we utilize allows for indirect illumination and supports real-time rendering and editing. Code, data, and models will be made available at our project page: https://wuzirui.github.io/gs-ssr.


Paper & Project Links

PDF

Summary

This paper shows how screen-space ray tracing can bring efficient approximate global illumination to 3D Gaussian Splatting. The key observation is that a substantial amount of indirect light can be traced back to surfaces visible within the current view frustum; the direct shading computed from 3D Gaussians is therefore augmented with Monte-Carlo screen-space ray tracing to capture one-bounce indirect illumination. The approach achieves realistic global illumination while retaining the computational efficiency and editability of 3D Gaussians, and the screen-space approximation supports real-time rendering and editing.

Key Takeaways

Cool Papers

View paper screenshots here

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Authors:Hyunwoo Park, Gun Ryu, Wonjun Kim

Recently, 3D Gaussian splatting (3DGS) has gained considerable attentions in the field of novel view synthesis due to its fast performance while yielding the excellent image quality. However, 3DGS in sparse-view settings (e.g., three-view inputs) often faces with the problem of overfitting to training views, which significantly drops the visual quality of novel view images. Many existing approaches have tackled this issue by using strong priors, such as 2D generative contextual information and external depth signals. In contrast, this paper introduces a prior-free method, so-called DropGaussian, with simple changes in 3D Gaussian splatting. Specifically, we randomly remove Gaussians during the training process in a similar way of dropout, which allows non-excluded Gaussians to have larger gradients while improving their visibility. This makes the remaining Gaussians to contribute more to the optimization process for rendering with sparse input views. Such simple operation effectively alleviates the overfitting problem and enhances the quality of novel view synthesis. By simply applying DropGaussian to the original 3DGS framework, we can achieve the competitive performance with existing prior-based 3DGS methods in sparse-view settings of benchmark datasets without any additional complexity. The code and model are publicly available at: https://github.com/DCVL-3D/DropGaussian release.
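
A minimal sketch of the dropout-style idea described above: randomly exclude a fraction of Gaussians at each training iteration so the survivors receive larger gradients. Rescaling the surviving opacities to compensate for the dropped mass mirrors standard dropout and is an assumption here, not necessarily the paper's exact scheme.

```python
import torch

def drop_gaussians(opacities: torch.Tensor, drop_rate: float = 0.1, training: bool = True):
    """Randomly drop Gaussians for one training iteration.

    opacities: (N,) per-Gaussian opacity used by the rasterizer.
    Returns the modified opacities (a random subset zeroed out, the rest rescaled)
    and the boolean keep mask.
    """
    if not training or drop_rate <= 0.0:
        return opacities, torch.ones_like(opacities, dtype=torch.bool)
    keep = torch.rand_like(opacities) >= drop_rate          # Bernoulli keep mask
    scaled = opacities * keep / (1.0 - drop_rate)           # compensate for the dropped Gaussians
    return scaled, keep

# Hypothetical use per iteration: rasterize with the returned opacities so excluded
# Gaussians contribute nothing and the remaining ones receive larger gradients.
```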


Paper & Project Links

PDF Accepted by CVPR 2025

Summary

3D Gaussian splatting (3DGS) has drawn wide attention for novel view synthesis, but in sparse-view settings it tends to overfit the training views. This paper proposes DropGaussian, a prior-free method that randomly removes Gaussians during training, dropout-style, so that the remaining Gaussians receive larger gradients and better visibility and contribute more to the optimization. This simple operation effectively alleviates overfitting and improves the quality of novel view synthesis under sparse inputs.

Key Takeaways

  1. 3D Gaussian splatting (3DGS) performs well for novel view synthesis but overfits in sparse-view settings.
  2. Existing methods usually rely on strong priors (e.g., 2D generative contextual information or external depth signals) to counter overfitting.
  3. DropGaussian randomly removes Gaussians during training, improving the visibility of the remaining Gaussians and the optimization.
  4. DropGaussian effectively alleviates overfitting and improves sparse-view novel view synthesis quality.
  5. The method is simple, drops into the original 3DGS framework, and adds no extra complexity.
  6. DropGaussian is competitive with existing prior-based 3DGS methods in sparse-view settings on benchmark datasets.

Cool Papers

View paper screenshots here

UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction

Authors:Yunxuan Mao, Rong Xiong, Yue Wang, Yiyi Liao

Reconstructing and decomposing dynamic urban scenes is crucial for autonomous driving, urban planning, and scene editing. However, existing methods fail to perform instance-aware decomposition without manual annotations, which is crucial for instance-level scene editing.We propose UnIRe, a 3D Gaussian Splatting (3DGS) based approach that decomposes a scene into a static background and individual dynamic instances using only RGB images and LiDAR point clouds. At its core, we introduce 4D superpoints, a novel representation that clusters multi-frame LiDAR points in 4D space, enabling unsupervised instance separation based on spatiotemporal correlations. These 4D superpoints serve as the foundation for our decomposed 4D initialization, i.e., providing spatial and temporal initialization to train a dynamic 3DGS for arbitrary dynamic classes without requiring bounding boxes or object templates.Furthermore, we introduce a smoothness regularization strategy in both 2D and 3D space, further improving the temporal stability.Experiments on benchmark datasets show that our method outperforms existing methods in decomposed dynamic scene reconstruction while enabling accurate and flexible instance-level editing, making it a practical solution for real-world applications.
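
The abstract does not spell out how 4D superpoints are formed; as an illustrative stand-in, the sketch below clusters multi-frame LiDAR points in scaled (x, y, z, t) space with DBSCAN, which captures the idea of grouping points by spatiotemporal proximity. The paper's actual clustering and correlation criterion may differ, and all parameters are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def superpoints_4d(points_xyz: np.ndarray, timestamps: np.ndarray,
                   time_scale: float = 2.0, eps: float = 0.5, min_samples: int = 10):
    """Group multi-frame LiDAR points into 4D clusters.

    points_xyz: (N, 3) LiDAR points aggregated over several frames, in meters.
    timestamps: (N,) frame time of each point, in seconds.
    time_scale: converts seconds into a distance comparable to meters.
    Returns an (N,) array of cluster labels (-1 marks noise points).
    """
    feats = np.concatenate([points_xyz, time_scale * timestamps[:, None]], axis=1)  # (N, 4)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
```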


Paper & Project Links

PDF

Summary

This paper proposes UnIRe, a 3D Gaussian Splatting (3DGS) based approach that decomposes an urban scene into a static background and individual dynamic instances using only RGB images and LiDAR point clouds. Its core is 4D superpoints, a new representation that clusters multi-frame LiDAR points in 4D space and enables unsupervised instance separation based on spatiotemporal correlations. These superpoints provide the spatial and temporal initialization for training a dynamic 3DGS on arbitrary dynamic classes without bounding boxes or object templates. Experiments show that the method outperforms existing approaches in decomposed dynamic scene reconstruction while enabling accurate and flexible instance-level editing, making it practical for real-world applications.

Key Takeaways

  1. UnIRe uses 3D Gaussian Splatting (3DGS) to decompose urban scenes into a static background and dynamic instances.
  2. Its core contribution, 4D superpoints, enables unsupervised instance separation.
  3. The method only requires RGB images and LiDAR point clouds.
  4. No bounding boxes or object templates are needed to train on arbitrary dynamic classes.
  5. A smoothness regularization in both 2D and 3D space improves temporal stability.
  6. Experiments show that UnIRe outperforms existing methods in decomposed dynamic scene reconstruction.

Cool Papers

View paper screenshots here

Monocular and Generalizable Gaussian Talking Head Animation

Authors:Shengjie Gong, Haojie Li, Jiapeng Tang, Dongming Hu, Shuangping Huang, Hao Chen, Tianshui Chen, Zhuoman Liu

In this work, we introduce Monocular and Generalizable Gaussian Talking Head Animation (MGGTalk), which requires monocular datasets and generalizes to unseen identities without personalized re-training. Compared with previous 3D Gaussian Splatting (3DGS) methods that requires elusive multi-view datasets or tedious personalized learning/inference, MGGtalk enables more practical and broader applications. However, in the absence of multi-view and personalized training data, the incompleteness of geometric and appearance information poses a significant challenge. To address these challenges, MGGTalk explores depth information to enhance geometric and facial symmetry characteristics to supplement both geometric and appearance features. Initially, based on the pixel-wise geometric information obtained from depth estimation, we incorporate symmetry operations and point cloud filtering techniques to ensure a complete and precise position parameter for 3DGS. Subsequently, we adopt a two-stage strategy with symmetric priors for predicting the remaining 3DGS parameters. We begin by predicting Gaussian parameters for the visible facial regions of the source image. These parameters are subsequently utilized to improve the prediction of Gaussian parameters for the non-visible regions. Extensive experiments demonstrate that MGGTalk surpasses previous state-of-the-art methods, achieving superior performance across various metrics.
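
A minimal sketch of the symmetry operation used to complete the depth-derived point cloud: in a canonical, frontal face frame, visible points are mirrored across the facial symmetry plane. The alignment step, the choice of x = 0 as the symmetry plane, and the simple distance filter are assumptions; the paper's point-cloud filtering is more involved.

```python
import numpy as np

def mirror_face_points(points: np.ndarray, max_abs_x: float = 0.15) -> np.ndarray:
    """Complete a half-observed face point cloud by reflecting it across x = 0.

    points: (N, 3) points from monocular depth, already aligned so that the
            facial symmetry plane is x = 0 and units are meters.
    max_abs_x: crude filter dropping points too far from the face before mirroring.
    Returns the union of the kept and mirrored points.
    """
    kept = points[np.abs(points[:, 0]) < max_abs_x]
    mirrored = kept * np.array([-1.0, 1.0, 1.0])   # flip the x coordinate only
    return np.concatenate([kept, mirrored], axis=0)
```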


Paper & Project Links

PDF Accepted by CVPR 2025

Summary

This paper introduces Monocular and Generalizable Gaussian Talking Head Animation (MGGTalk), which requires only monocular data and generalizes to unseen identities without personalized re-training, making it more practical and broadly applicable than previous 3D Gaussian Splatting (3DGS) methods that need multi-view data or per-identity learning. To compensate for the missing geometric and appearance information, MGGTalk exploits depth and facial symmetry: pixel-wise geometry from depth estimation is combined with symmetry operations and point-cloud filtering to obtain complete, precise 3DGS position parameters, and a two-stage strategy with symmetric priors first predicts Gaussian parameters for the visible facial regions and then uses them to improve the prediction for the non-visible regions. Experiments show that MGGTalk surpasses previous state-of-the-art methods across various metrics.

Key Takeaways

  1. MGGTalk works from monocular data, needing neither multi-view captures nor personalized training, which makes it more practical and widely applicable.
  2. Facing incomplete geometric and appearance information, MGGTalk exploits depth information to enhance geometric and facial symmetry characteristics.
  3. Pixel-wise geometry from depth estimation, combined with symmetry operations and point-cloud filtering, ensures complete and precise 3DGS position parameters.
  4. A two-stage strategy with symmetric priors first predicts Gaussian parameters for the visible facial regions, then uses them for the non-visible regions.
  5. MGGTalk outperforms previous 3DGS-based methods across various metrics.
  6. The approach suits talking-head animation, providing realistic animation for virtual characters.

Cool Papers

View paper screenshots here

Coca-Splat: Collaborative Optimization for Camera Parameters and 3D Gaussians

Authors:Jiamin Wu, Hongyang Li, Xiaoke Jiang, Yuan Yao, Lei Zhang

In this work, we introduce Coca-Splat, a novel approach to addressing the challenges of sparse view pose-free scene reconstruction and novel view synthesis (NVS) by jointly optimizing camera parameters with 3D Gaussians. Inspired by deformable DEtection TRansformer, we design separate queries for 3D Gaussians and camera parameters and update them layer by layer through deformable Transformer layers, enabling joint optimization in a single network. This design demonstrates better performance because to accurately render views that closely approximate ground-truth images relies on precise estimation of both 3D Gaussians and camera parameters. In such a design, the centers of 3D Gaussians are projected onto each view by camera parameters to get projected points, which are regarded as 2D reference points in deformable cross-attention. With camera-aware multi-view deformable cross-attention (CaMDFA), 3D Gaussians and camera parameters are intrinsically connected by sharing the 2D reference points. Additionally, 2D reference point determined rays (RayRef) defined from camera centers to the reference points assist in modeling relationship between 3D Gaussians and camera parameters through RQ-decomposition on an overdetermined system of equations derived from the rays, enhancing the relationship between 3D Gaussians and camera parameters. Extensive evaluation shows that our approach outperforms previous methods, both pose-required and pose-free, on RealEstate10K and ACID within the same pose-free setting.
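
A minimal sketch of the projection step described above: Gaussian centers are mapped into each view with the camera parameters to obtain the 2D reference points shared by the deformable cross-attention. This is standard pinhole projection; the attention layers themselves are not shown.

```python
import torch

def project_centers(centers: torch.Tensor, K: torch.Tensor,
                    R: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Project 3D Gaussian centers into one view to get 2D reference points.

    centers: (N, 3) Gaussian centers in world coordinates.
    K: (3, 3) camera intrinsics; R: (3, 3), t: (3,) world-to-camera extrinsics.
    Returns (N, 2) pixel coordinates (points behind the camera are not handled here).
    """
    cam = centers @ R.T + t                         # world -> camera coordinates
    uvw = cam @ K.T                                 # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3].clamp_min(1e-6)
```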


Paper & Project Links

PDF

Summary

This paper proposes Coca-Splat, a new approach to sparse-view, pose-free scene reconstruction and novel view synthesis (NVS) that jointly optimizes camera parameters and 3D Gaussians. Inspired by the deformable DEtection TRansformer, it assigns separate queries to the 3D Gaussians and the camera parameters and updates them layer by layer through deformable Transformer layers, so both are optimized in a single network; rendering views close to ground truth requires accurate estimates of both. Gaussian centers are projected into each view by the camera parameters to form 2D reference points for camera-aware multi-view deformable cross-attention (CaMDFA), intrinsically tying the Gaussians and cameras together, and rays from the camera centers to the reference points (RayRef) further model their relationship via RQ-decomposition of an overdetermined system of equations derived from the rays. Extensive evaluation shows that the method outperforms previous pose-required and pose-free approaches on RealEstate10K and ACID in the same pose-free setting.

Key Takeaways

  1. Coca-Splat is a novel method for sparse-view, pose-free scene reconstruction and novel view synthesis.
  2. It jointly optimizes camera parameters and 3D Gaussians to handle sparse, pose-free inputs.
  3. Inspired by the deformable DEtection TRansformer, it designs separate queries for the 3D Gaussians and the camera parameters.
  4. Sharing 2D reference points intrinsically connects the 3D Gaussians with the camera parameters.
  5. Rays from the camera centers to the reference points help model the relationship between the two.
  6. This relationship is strengthened through RQ-decomposition of an overdetermined system of equations built from those rays.

Cool Papers

View paper screenshots here

Distilling Multi-view Diffusion Models into 3D Generators

Authors:Hao Qin, Luyuan Chen, Ming Kong, Mengxu Lu, Qiang Zhu

We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous amortized optimization approaches, we align the MV-DM and 3D generator representation spaces to transfer the teacher’s probabilistic flow to the student, thus avoiding inconsistencies in optimization objectives caused by probabilistic sampling. The introduction of probabilistic flow and the coupling of various attributes in 3D Gaussians introduce challenges in the generation process. To tackle this, we propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow and converts a single image into 3D Gaussians within 0.06 seconds. Furthermore, to reduce knowledge loss and overcome sparse-view supervision, we design a joint optimization objective that ensures the quality of generated samples through explicit supervision and implicit verification. Leveraging existing 2D generation models, we compile 120k high-quality RGBA images for distillation. Experiments on synthetic and public datasets demonstrate the effectiveness of our method. Our project is available at: https://qinbaigao.github.io/DD3G_project/


Paper & Project Links

PDF

Summary

This paper presents DD3G, a formulation that distills a multi-view diffusion model (MV-DM) into a 3D generator based on Gaussian splatting. By simulating the MV-DM's ODE trajectory, DD3G compresses and integrates its visual and spatial geometric knowledge, so the distilled generator generalizes better than generators trained only on 3D data. Instead of amortized optimization, the representation spaces of the MV-DM and the 3D generator are aligned to transfer the teacher's probabilistic flow to the student, avoiding the inconsistent optimization objectives caused by probabilistic sampling. To handle the challenges introduced by probabilistic flow and the coupled attributes of 3D Gaussians, the proposed PEPD generator, with Pattern Extraction and Progressive Decoding phases, fuses the probabilistic flow efficiently and converts a single image into 3D Gaussians within 0.06 seconds. A joint optimization objective with explicit supervision and implicit verification reduces knowledge loss and copes with sparse-view supervision. Leveraging existing 2D generative models, the authors compile 120k high-quality RGBA images for distillation; experiments on synthetic and public datasets demonstrate the method's effectiveness.

Key Takeaways

  1. DD3G distills a multi-view diffusion model (MV-DM) into a 3D generator.
  2. By simulating the MV-DM's ODE trajectory, DD3G integrates its visual and spatial geometric knowledge.
  3. Aligning the representation spaces of the MV-DM and the 3D generator transfers the teacher's probabilistic flow and avoids inconsistent optimization objectives.
  4. The PEPD generator fuses the probabilistic flow efficiently and converts a single image into 3D Gaussians within 0.06 seconds.
  5. A joint optimization objective safeguards sample quality while reducing knowledge loss and overcoming sparse-view supervision.
  6. 120k high-quality RGBA images are compiled from existing 2D generative models for distillation.
  7. Experiments on synthetic and public datasets demonstrate the effectiveness of DD3G.

Cool Papers

View paper screenshots here

ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs

Authors:Qi Song, Chenghong Li, Haotong Lin, Sida Peng, Rui Huang

We present a novel approach, termed ADGaussian, for generalizable street scene reconstruction. The proposed method enables high-quality rendering from single-view input. Unlike prior Gaussian Splatting methods that primarily focus on geometry refinement, we emphasize the importance of joint optimization of image and depth features for accurate Gaussian prediction. To this end, we first incorporate sparse LiDAR depth as an additional input modality, formulating the Gaussian prediction process as a joint learning framework of visual information and geometric clue. Furthermore, we propose a multi-modal feature matching strategy coupled with a multi-scale Gaussian decoding model to enhance the joint refinement of multi-modal features, thereby enabling efficient multi-modal Gaussian learning. Extensive experiments on two large-scale autonomous driving datasets, Waymo and KITTI, demonstrate that our ADGaussian achieves state-of-the-art performance and exhibits superior zero-shot generalization capabilities in novel-view shifting.


Paper & Project Links

PDF The project page can be found at https://maggiesong7.github.io/research/ADGaussian/

Summary

This work proposes ADGaussian, a generalizable street-scene reconstruction method that renders high-quality novel views from single-view input. Unlike prior Gaussian Splatting methods focused mainly on geometry refinement, it stresses the joint optimization of image and depth features for accurate Gaussian prediction: sparse LiDAR depth is incorporated as an additional input modality, casting Gaussian prediction as joint learning of visual information and geometric cues, and a multi-modal feature matching strategy combined with a multi-scale Gaussian decoding model enhances the joint refinement of multi-modal features for efficient multi-modal Gaussian learning. Extensive experiments on the Waymo and KITTI autonomous-driving datasets show state-of-the-art performance and superior zero-shot generalization under novel-view shifts.

Key Takeaways

  1. Proposes ADGaussian, a generalizable street-scene reconstruction method that renders high-quality views from single-view input.
  2. Emphasizes the joint optimization of image and depth features for accurate Gaussian prediction.
  3. Incorporates sparse LiDAR depth as an additional input modality, formulating Gaussian prediction as a joint learning framework of visual information and geometric cues.
  4. Proposes a multi-modal feature matching strategy combined with a multi-scale Gaussian decoding model to enhance joint multi-modal refinement.
  5. Enables efficient multi-modal Gaussian learning.
  6. Experiments on the large-scale Waymo and KITTI autonomous-driving datasets show that ADGaussian reaches state-of-the-art performance.

Cool Papers

View paper screenshots here

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

Authors:Zilong Huang, Jun He, Junyan Ye, Lihan Jiang, Weijia Li, Yiping Chen, Ting Han

The reconstruction of immersive and realistic 3D scenes holds significant practical importance in various fields of computer vision and computer graphics. Typically, immersive and realistic scenes should be free from obstructions by dynamic objects, maintain global texture consistency, and allow for unrestricted exploration. The current mainstream methods for image-driven scene construction involves iteratively refining the initial image using a moving virtual camera to generate the scene. However, previous methods struggle with visual discontinuities due to global texture inconsistencies under varying camera poses, and they frequently exhibit scene voids caused by foreground-background occlusions. To this end, we propose a novel layered 3D scene reconstruction framework from panoramic image, named Scene4U. Specifically, Scene4U integrates an open-vocabulary segmentation model with a large language model to decompose a real panorama into multiple layers. Then, we employs a layered repair module based on diffusion model to restore occluded regions using visual cues and depth information, generating a hierarchical representation of the scene. The multi-layer panorama is then initialized as a 3D Gaussian Splatting representation, followed by layered optimization, which ultimately produces an immersive 3D scene with semantic and structural consistency that supports free exploration. Scene4U outperforms state-of-the-art method, improving by 24.24% in LPIPS and 24.40% in BRISQUE, while also achieving the fastest training speed. Additionally, to demonstrate the robustness of Scene4U and allow users to experience immersive scenes from various landmarks, we build WorldVista3D dataset for 3D scene reconstruction, which contains panoramic images of globally renowned sites. The implementation code and dataset will be released at https://github.com/LongHZ140516/Scene4U .


Paper & Project Links

PDF CVPR 2025, 11 pages, 7 figures

Summary

Scene4U is a layered 3D scene reconstruction framework that builds immersive scenes from a single panoramic image. It combines an open-vocabulary segmentation model with a large language model to decompose a real panorama into multiple layers, then uses a diffusion-based layered repair module, guided by visual cues and depth information, to restore occluded regions and obtain a hierarchical scene representation. The multi-layer panorama is initialized as a 3D Gaussian Splatting representation and refined with layered optimization, yielding an immersive 3D scene with semantic and structural consistency that supports free exploration. Scene4U improves over the state of the art by 24.24% in LPIPS and 24.40% in BRISQUE while training fastest, and the authors build the WorldVista3D dataset of panoramic images of globally renowned landmarks to demonstrate its robustness.

Key Takeaways

  1. Scene4U is a layered 3D scene reconstruction framework from panoramic images that integrates an open-vocabulary segmentation model with a large language model to decompose the panorama into multiple layers.
  2. A diffusion-based repair module restores occluded regions using visual cues and depth information.
  3. The result is a hierarchical scene representation with semantic and structural consistency.
  4. Scene4U clearly outperforms existing methods on LPIPS and BRISQUE while training fastest.
  5. The WorldVista3D dataset, containing panoramic images of globally renowned sites, is built for 3D scene reconstruction.
  6. The Scene4U framework demonstrates strong robustness.

Cool Papers

View paper screenshots here

LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

Authors:Han Zhou, Wei Dong, Jun Chen

Directly employing 3D Gaussian Splatting (3DGS) on images with adverse illumination conditions exhibits considerable difficulty in achieving high-quality, normally-exposed representations due to: (1) The limited Structure from Motion (SfM) points estimated in adverse illumination scenarios fail to capture sufficient scene details; (2) Without ground-truth references, the intensive information loss, significant noise, and color distortion pose substantial challenges for 3DGS to produce high-quality results; (3) Combining existing exposure correction methods with 3DGS does not achieve satisfactory performance due to their individual enhancement processes, which lead to the illumination inconsistency between enhanced images from different viewpoints. To address these issues, we propose LITA-GS, a novel illumination-agnostic novel view synthesis method via reference-free 3DGS and physical priors. Firstly, we introduce an illumination-invariant physical prior extraction pipeline. Secondly, based on the extracted robust spatial structure prior, we develop the lighting-agnostic structure rendering strategy, which facilitates the optimization of the scene structure and object appearance. Moreover, a progressive denoising module is introduced to effectively mitigate the noise within the light-invariant representation. We adopt the unsupervised strategy for the training of LITA-GS and extensive experiments demonstrate that LITA-GS surpasses the state-of-the-art (SOTA) NeRF-based method while enjoying faster inference speed and costing reduced training time. The code is released at https://github.com/LowLevelAI/LITA-GS.


Paper & Project Links

PDF Accepted by CVPR 2025. 3DGS, Adverse illumination conditions, Reference-free, Physical priors

Summary

Directly applying 3D Gaussian Splatting (3DGS) to images captured under adverse illumination makes it hard to obtain high-quality, normally exposed representations. The paper proposes LITA-GS, an illumination-agnostic novel view synthesis method built on reference-free 3DGS and physical priors: it extracts an illumination-invariant physical prior, develops a lighting-agnostic structure rendering strategy to optimize scene structure and object appearance, and adds a progressive denoising module to suppress noise in the light-invariant representation. Trained with an unsupervised strategy, LITA-GS surpasses the state-of-the-art NeRF-based method while offering faster inference and shorter training time; the code has been released on GitHub.

Key Takeaways

  1. Under adverse illumination, directly applying 3DGS is difficult because of insufficient scene-detail capture, severe information loss, noise, and color distortion.
  2. LITA-GS addresses this by extracting illumination-invariant physical priors, which help optimize scene structure and object appearance.
  3. LITA-GS is trained with an unsupervised strategy and experimentally surpasses the NeRF-based method.
  4. LITA-GS offers faster inference and reduced training time.
  5. A progressive denoising module effectively suppresses noise in the light-invariant representation.
  6. The code is publicly released, facilitating follow-up research and applications.
  7. The method has broad application prospects, particularly for image processing under adverse lighting and for augmented reality.

Cool Papers

View paper screenshots here

Visual Acoustic Fields

Authors:Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang

Objects produce different sounds when hit, and humans can intuitively infer how an object might sound based on its appearance and material properties. Inspired by this intuition, we propose Visual Acoustic Fields, a framework that bridges hitting sounds and visual signals within a 3D space using 3D Gaussian Splatting (3DGS). Our approach features two key modules: sound generation and sound localization. The sound generation module leverages a conditional diffusion model, which takes multiscale features rendered from a feature-augmented 3DGS to generate realistic hitting sounds. Meanwhile, the sound localization module enables querying the 3D scene, represented by the feature-augmented 3DGS, to localize hitting positions based on the sound sources. To support this framework, we introduce a novel pipeline for collecting scene-level visual-sound sample pairs, achieving alignment between captured images, impact locations, and corresponding sounds. To the best of our knowledge, this is the first dataset to connect visual and acoustic signals in a 3D context. Extensive experiments on our dataset demonstrate the effectiveness of Visual Acoustic Fields in generating plausible impact sounds and accurately localizing impact sources. Our project page is at https://yuelei0428.github.io/projects/Visual-Acoustic-Fields/.


Paper & Project Links

PDF

Summary

Inspired by how humans infer an object's sound from its appearance and material properties, this paper proposes Visual Acoustic Fields, a framework that bridges hitting sounds and visual signals within a 3D space using 3D Gaussian Splatting (3DGS). It has two core modules: sound generation, where a conditional diffusion model takes multiscale features rendered from a feature-augmented 3DGS to generate realistic hitting sounds, and sound localization, which queries the same feature-augmented 3DGS scene to localize hitting positions from sound sources. To support the framework, a new pipeline collects scene-level visual-sound sample pairs, aligning captured images, impact locations, and corresponding sounds; to the authors' knowledge this is the first dataset connecting visual and acoustic signals in a 3D context. Experiments show that Visual Acoustic Fields generates plausible impact sounds and accurately localizes impact sources.

Key Takeaways

  • Visual Acoustic Fields couples visual and acoustic signals to model the sound field of struck objects in 3D space.
  • The framework builds on 3D Gaussian Splatting (3DGS) to bridge sound and visual signals.
  • It contains two core modules: sound generation and sound localization.
  • The sound generation module uses a conditional diffusion model on multiscale features to generate realistic hitting sounds.
  • A new scene-level collection pipeline aligns captured images, impact locations, and the corresponding sounds.

Cool Papers

View paper screenshots here

VizFlyt: Perception-centric Pedagogical Framework For Autonomous Aerial Robots

Authors:Kushagra Srivastava, Rutwik Kulkarni, Manoj Velmurugan, Nitin J. Sanket

Autonomous aerial robots are becoming commonplace in our lives. Hands-on aerial robotics courses are pivotal in training the next-generation workforce to meet the growing market demands. Such an efficient and compelling course depends on a reliable testbed. In this paper, we present VizFlyt, an open-source perception-centric Hardware-In-The-Loop (HITL) photorealistic testing framework for aerial robotics courses. We utilize pose from an external localization system to hallucinate real-time and photorealistic visual sensors using 3D Gaussian Splatting. This enables stress-free testing of autonomy algorithms on aerial robots without the risk of crashing into obstacles. We achieve over 100Hz of system update rate. Lastly, we build upon our past experiences of offering hands-on aerial robotics courses and propose a new open-source and open-hardware curriculum based on VizFlyt for the future. We test our framework on various course projects in real-world HITL experiments and present the results showing the efficacy of such a system and its large potential use cases. Code, datasets, hardware guides and demo videos are available at https://pear.wpi.edu/research/vizflyt.html


Paper & Project Links

PDF Accepted at ICRA 2025. Projected Page: https://pear.wpi.edu/research/vizflyt.html

Summary
As autonomous aerial robots become commonplace, hands-on aerial robotics courses need a reliable testbed to meet market demand. This paper presents VizFlyt, an open-source, perception-centric Hardware-In-The-Loop (HITL) photorealistic testing framework for such courses: pose from an external localization system drives real-time, photorealistic virtual visual sensors rendered with 3D Gaussian Splatting, so autonomy algorithms can be tested without any risk of crashing into obstacles, at a system update rate above 100 Hz. Building on past course experience, the authors also propose a new open-source, open-hardware curriculum based on VizFlyt, and real-world HITL experiments on various course projects demonstrate the system's effectiveness and broad potential. Code, datasets, hardware guides, and demo videos are available at https://pear.wpi.edu/research/vizflyt.html.

Key Takeaways

  1. Hands-on aerial robotics courses need a reliable testing framework to meet market demand.
  2. VizFlyt is a perception-centric Hardware-In-The-Loop testing framework designed for aerial robotics courses.
  3. Pose from an external localization system drives photorealistic virtual visual sensors, enabling stress-free testing of autonomy algorithms.
  4. The system update rate exceeds 100 Hz, improving testing efficiency.
  5. An open-source, open-hardware curriculum is built on top of VizFlyt.
  6. The framework is validated on various course projects in real-world HITL experiments.

Cool Papers

View paper screenshots here

Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis

Authors:Shuai Shen, Wanhua Li, Yunpeng Zhang, Weipeng Hu, Yap-Peng Tan

Talking head synthesis has become a key research area in computer graphics and multimedia, yet most existing methods often struggle to balance generation quality with computational efficiency. In this paper, we present a novel approach that leverages an Audio Factorization Plane (Audio-Plane) based Gaussian Splatting for high-quality and real-time talking head generation. For modeling a dynamic talking head, 4D volume representation is needed. However, directly storing a dense 4D grid is impractical due to the high cost and lack of scalability for longer durations. We overcome this challenge with the proposed Audio-Plane, where the 4D volume representation is decomposed into audio-independent space planes and audio-dependent planes. This provides a compact and interpretable feature representation for talking head, facilitating more precise audio-aware spatial encoding and enhanced audio-driven lip dynamic modeling. To further improve speech dynamics, we develop a dynamic splatting method that helps the network more effectively focus on modeling the dynamics of the mouth region. Extensive experiments demonstrate that by integrating these innovations with the powerful Gaussian Splatting, our method is capable of synthesizing highly realistic talking videos in real time while ensuring precise audio-lip synchronization. Synthesized results are available in https://sstzal.github.io/Audio-Plane/.
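
The abstract only states that the 4D volume is split into audio-independent space planes and audio-dependent planes. The sketch below queries a K-planes-style factorization at a spatial point and an audio/time coordinate; the specific plane layout and the additive combination are assumptions for illustration, not the paper's exact decomposition.

```python
import torch
import torch.nn.functional as F

def sample_plane(plane: torch.Tensor, coords_2d: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample a feature plane.

    plane:     (C, H, W) learnable feature plane.
    coords_2d: (K, 2) query coordinates normalized to [-1, 1].
    Returns (K, C) features.
    """
    grid = coords_2d.view(1, 1, -1, 2)                                   # (1, 1, K, 2)
    out = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)    # (1, C, 1, K)
    return out[0, :, 0].t()                                              # (K, C)

def factorized_4d_features(xyz: torch.Tensor, audio_t: torch.Tensor, planes: dict) -> torch.Tensor:
    """Query a factorized 4D field at spatial points xyz and audio coordinate audio_t.

    xyz:     (K, 3) spatial query points in [-1, 1].
    audio_t: (K,) audio/time coordinate in [-1, 1].
    planes:  dict with audio-independent planes 'xy', 'xz', 'yz' and
             audio-dependent planes 'xa', 'ya', 'za', each of shape (C, H, W).
    """
    x, y, z = xyz[:, 0:1], xyz[:, 1:2], xyz[:, 2:3]
    a = audio_t.view(-1, 1)
    static = (sample_plane(planes['xy'], torch.cat([x, y], -1))
              + sample_plane(planes['xz'], torch.cat([x, z], -1))
              + sample_plane(planes['yz'], torch.cat([y, z], -1)))       # audio-independent part
    dynamic = (sample_plane(planes['xa'], torch.cat([x, a], -1))
               + sample_plane(planes['ya'], torch.cat([y, a], -1))
               + sample_plane(planes['za'], torch.cat([z, a], -1)))      # audio-dependent part
    return static + dynamic                                              # (K, C) combined feature
```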


Paper & Project Links

PDF

Summary

This paper presents an Audio Factorization Plane (Audio-Plane) based Gaussian Splatting approach for high-quality, real-time talking head generation. Modeling a dynamic talking head calls for a 4D volume representation, but directly storing a dense 4D grid is impractical; Audio-Plane therefore decomposes the 4D volume into audio-independent space planes and audio-dependent planes, giving a compact, interpretable feature representation that enables more precise audio-aware spatial encoding and better audio-driven lip-dynamics modeling. A dynamic splatting method further helps the network focus on the mouth region. Combined with Gaussian Splatting, the method synthesizes highly realistic talking videos in real time with precise audio-lip synchronization.

Key Takeaways

  1. Talking head synthesis is a key research area in computer graphics and multimedia, but balancing generation quality with computational efficiency remains challenging.
  2. The paper proposes an Audio Factorization Plane based Gaussian Splatting method for high-quality, real-time talking head generation.
  3. A 4D volume representation is needed to model a dynamic talking head, but directly storing a dense 4D grid is impractical.
  4. Audio-Plane solves this by decomposing the 4D volume into audio-independent and audio-dependent planes.
  5. This yields a compact, interpretable representation that improves audio-aware spatial encoding and lip-dynamics modeling.
  6. Combined with Gaussian Splatting, the method synthesizes highly realistic talking videos in real time with precise audio-lip synchronization.

Cool Papers

View paper screenshots here

RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting

Authors:Qiyu Dai, Xingyu Ni, Qianfan Shen, Wenzheng Chen, Baoquan Chen, Mengyu Chu

We consider the problem of adding dynamic rain effects to in-the-wild scenes in a physically-correct manner. Recent advances in scene modeling have made significant progress, with NeRF and 3DGS techniques emerging as powerful tools for reconstructing complex scenes. However, while effective for novel view synthesis, these methods typically struggle with challenging scene editing tasks, such as physics-based rain simulation. In contrast, traditional physics-based simulations can generate realistic rain effects, such as raindrops and splashes, but they often rely on skilled artists to carefully set up high-fidelity scenes. This process lacks flexibility and scalability, limiting its applicability to broader, open-world environments. In this work, we introduce RainyGS, a novel approach that leverages the strengths of both physics-based modeling and 3DGS to generate photorealistic, dynamic rain effects in open-world scenes with physical accuracy. At the core of our method is the integration of physically-based raindrop and shallow water simulation techniques within the fast 3DGS rendering framework, enabling realistic and efficient simulations of raindrop behavior, splashes, and reflections. Our method supports synthesizing rain effects at over 30 fps, offering users flexible control over rain intensity – from light drizzles to heavy downpours. We demonstrate that RainyGS performs effectively for both real-world outdoor scenes and large-scale driving scenarios, delivering more photorealistic and physically-accurate rain effects compared to state-of-the-art methods. Project page can be found at https://pku-vcl-geometry.github.io/RainyGS/


Paper & Project Links

PDF CVPR 2025

Summary

This paper introduces RainyGS, which combines the strengths of physics-based modeling and 3DGS to add photorealistic, physically accurate dynamic rain to open-world scenes. Physically-based raindrop and shallow-water simulation techniques are integrated into the fast 3DGS rendering framework, enabling realistic and efficient simulation of raindrop behavior, splashes, and reflections. Rain effects are synthesized at over 30 fps, with flexible user control over intensity from light drizzle to heavy downpour. RainyGS works well on real-world outdoor scenes and large-scale driving scenarios, delivering more photorealistic and physically accurate rain than state-of-the-art methods.

Key Takeaways

  1. RainyGS combines physics-based modeling with 3DGS to generate highly realistic dynamic rain effects.
  2. Physically-based raindrop and shallow-water simulation are integrated into the 3DGS rendering framework for realistic, efficient simulation.
  3. Users can flexibly control rain intensity to simulate different levels of rainfall.
  4. RainyGS performs well on real-world outdoor scenes and large-scale driving scenarios.
  5. Compared with existing methods, it delivers more photorealistic and physically accurate rain effects.
  6. Rain effects are synthesized at high frame rates (over 30 fps).

Cool Papers

View paper screenshots here

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Authors:Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister

Learning 4D language fields to enable time-sensitive, open-ended language queries in dynamic scenes is essential for many real-world applications. While LangSplat successfully grounds CLIP features into 3D Gaussian representations, achieving precision and efficiency in 3D static scenes, it lacks the ability to handle dynamic 4D fields as CLIP, designed for static image-text tasks, cannot capture temporal dynamics in videos. Real-world environments are inherently dynamic, with object semantics evolving over time. Building a precise 4D language field necessitates obtaining pixel-aligned, object-wise video features, which current vision models struggle to achieve. To address these challenges, we propose 4D LangSplat, which learns 4D language fields to handle time-agnostic or time-sensitive open-vocabulary queries in dynamic scenes efficiently. 4D LangSplat bypasses learning the language field from vision features and instead learns directly from text generated from object-wise video captions via Multimodal Large Language Models (MLLMs). Specifically, we propose a multimodal object-wise video prompting method, consisting of visual and text prompts that guide MLLMs to generate detailed, temporally consistent, high-quality captions for objects throughout a video. These captions are encoded using a Large Language Model into high-quality sentence embeddings, which then serve as pixel-aligned, object-specific feature supervision, facilitating open-vocabulary text queries through shared embedding spaces. Recognizing that objects in 4D scenes exhibit smooth transitions across states, we further propose a status deformable network to model these continuous changes over time effectively. Our results across multiple benchmarks demonstrate that 4D LangSplat attains precise and efficient results for both time-sensitive and time-agnostic open-vocabulary queries.
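
A minimal sketch of the open-vocabulary querying enabled by the shared embedding space described above: each Gaussian carries a sentence embedding distilled from its object-wise caption, and a text query is matched by cosine similarity. The embeddings are assumed precomputed; the MLLM captioning pipeline and the status deformable network are not shown.

```python
import torch

def query_relevance(gaussian_embeds: torch.Tensor, query_embed: torch.Tensor) -> torch.Tensor:
    """Score how well each Gaussian matches an open-vocabulary text query.

    gaussian_embeds: (N, D) caption-derived sentence embedding stored per Gaussian.
    query_embed:     (D,) embedding of the text query from the same language model.
    Returns (N,) cosine similarities, usable as a per-Gaussian relevance map.
    """
    g = torch.nn.functional.normalize(gaussian_embeds, dim=-1)
    q = torch.nn.functional.normalize(query_embed, dim=-1)
    return g @ q
```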


Paper & Project Links

PDF CVPR 2025. Project Page: https://4d-langsplat.github.io

Summary

Learning 4D language fields is essential for time-sensitive, open-ended language queries in dynamic scenes. LangSplat grounds CLIP features into 3D Gaussian representations with precision and efficiency in static scenes, but it cannot handle dynamic 4D fields because CLIP, designed for static image-text tasks, does not capture temporal dynamics. The proposed 4D LangSplat learns 4D language fields for both time-agnostic and time-sensitive open-vocabulary queries by learning directly from text rather than vision features: a multimodal object-wise video prompting method (visual plus text prompts) guides Multimodal Large Language Models (MLLMs) to produce detailed, temporally consistent captions for objects throughout a video, and a Large Language Model encodes these captions into high-quality sentence embeddings that serve as pixel-aligned, object-specific supervision, enabling open-vocabulary text queries through shared embedding spaces. A status deformable network further models the smooth transitions of object states over time. Across multiple benchmarks, 4D LangSplat delivers precise and efficient results for both time-sensitive and time-agnostic queries.

Key Takeaways

  • Learning 4D language fields is key to time-sensitive, open-ended language queries in dynamic scenes.
  • LangSplat performs well in static 3D scenes but cannot handle dynamic 4D fields.
  • 4D LangSplat addresses this by learning the language field directly from object-wise video captions and large language models.
  • A multimodal object-wise video prompting method produces high-quality, temporally consistent captions for objects in a video.
  • A status deformable network models the continuous state changes of objects in 4D scenes.

Cool Papers

View paper screenshots here


Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!