⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only and should be used with caution.
🔴 Note: never rely on these summaries in serious academic settings; use them only as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-02
HART: Human Aligned Reconstruction Transformer
Authors:Xiyi Chen, Shaofei Wang, Marko Mihajlovic, Taewon Kang, Sergey Prokudin, Ming Lin
We introduce HART, a unified framework for sparse-view human reconstruction. Given a small set of uncalibrated RGB images of a person as input, it outputs a watertight clothed mesh, the aligned SMPL-X body mesh, and a Gaussian-splat representation for photorealistic novel-view rendering. Prior methods for clothed human reconstruction either optimize parametric templates, which overlook loose garments and human-object interactions, or train implicit functions under simplified camera assumptions, limiting applicability in real scenes. In contrast, HART predicts per-pixel 3D point maps, normals, and body correspondences, and employs an occlusion-aware Poisson reconstruction to recover complete geometry, even in self-occluded regions. These predictions also align with a parametric SMPL-X body model, ensuring that reconstructed geometry remains consistent with human structure while capturing loose clothing and interactions. These human-aligned meshes initialize Gaussian splats to further enable sparse-view rendering. While trained on only 2.3K synthetic scans, HART achieves state-of-the-art results: Chamfer Distance improves by 18-23 percent for clothed-mesh reconstruction, PA-V2V drops by 6-27 percent for SMPL-X estimation, LPIPS decreases by 15-27 percent for novel-view synthesis on a wide range of datasets. These results suggest that feed-forward transformers can serve as a scalable model for robust human reconstruction in real-world settings. Code and models will be released.
Paper and project links
PDF Project page: https://xiyichen.github.io/hart
Summary
This paper proposes HART, a unified framework for sparse-view 3D human reconstruction. Given a small set of uncalibrated RGB images, it outputs a watertight clothed mesh, an aligned SMPL-X body mesh, and a Gaussian-splat representation for photorealistic rendering. Compared with prior approaches, HART predicts per-pixel 3D point maps, normals, and body correspondences, and uses occlusion-aware Poisson reconstruction to recover complete geometry even in self-occluded regions. It also keeps the reconstructed geometry consistent with human body structure while capturing loose clothing and interactions. The human-aligned meshes then initialize Gaussian splats, enabling sparse-view rendering. Although trained on only 2.3K synthetic scans, HART matches or surpasses the state of the art across evaluation metrics, suggesting that feed-forward transformers can serve as a scalable model for robust human reconstruction in real-world settings.
Key Takeaways
- HART is a unified framework for sparse-view 3D human reconstruction that takes a small set of RGB images as input.
- Its outputs include a watertight clothed mesh, an aligned SMPL-X body mesh, and a Gaussian-splat representation for photorealistic rendering.
- HART predicts per-pixel 3D point maps, normals, and body correspondences, and uses occlusion-aware Poisson reconstruction to recover complete geometry in self-occluded regions (a generic Poisson sketch follows below).
- The method captures loose clothing and interactions while keeping the reconstructed geometry consistent with human body structure.
- HART matches or surpasses the state of the art across evaluation metrics, including Chamfer Distance for clothed-mesh reconstruction, PA-V2V for SMPL-X estimation, and LPIPS for novel-view synthesis.
- The work demonstrates the potential of feed-forward transformers for robust human reconstruction in real-world settings.
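To make the geometry-recovery step concrete, here is a minimal, generic sketch of fusing per-pixel point maps and normals into an oriented point cloud and running screened Poisson reconstruction with Open3D. It is not HART's occlusion-aware variant; the array shapes and the density-trimming threshold are assumptions for illustration only.

```python
import numpy as np
import open3d as o3d

def poisson_mesh_from_pointmaps(points_hw3, normals_hw3, mask_hw, depth=9):
    """points_hw3/normals_hw3: (H, W, 3) per-pixel predictions; mask_hw: (H, W) bool."""
    pts = points_hw3[mask_hw].reshape(-1, 3)
    nrm = normals_hw3[mask_hw].reshape(-1, 3)
    nrm = nrm / (np.linalg.norm(nrm, axis=1, keepdims=True) + 1e-8)

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(pts)
    pcd.normals = o3d.utility.Vector3dVector(nrm)

    # Screened Poisson reconstruction produces a watertight surface.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    # Trim low-density vertices that extrapolate far from the observed points.
    densities = np.asarray(densities)
    mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.01))
    return mesh
```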
Click here to view paper screenshots



GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts
Authors:Zhenyu Shu, Junlong Yu, Kai Chao, Shiqing Xin, Ligang Liu
This paper presents GaussEdit, a framework for adaptive 3D scene editing guided by text and image prompts. GaussEdit leverages 3D Gaussian Splatting as its backbone for scene representation, enabling convenient Region of Interest selection and efficient editing through a three-stage process. The first stage involves initializing the 3D Gaussians to ensure high-quality edits. The second stage employs an Adaptive Global-Local Optimization strategy to balance global scene coherence and detailed local edits and a category-guided regularization technique to alleviate the Janus problem. The final stage enhances the texture of the edited objects using a sophisticated image-to-image synthesis technique, ensuring that the results are visually realistic and align closely with the given prompts. Our experimental results demonstrate that GaussEdit surpasses existing methods in editing accuracy, visual fidelity, and processing speed. By successfully embedding user-specified concepts into 3D scenes, GaussEdit is a powerful tool for detailed and user-driven 3D scene editing, offering significant improvements over traditional methods.
Paper and project links
Summary
This paper presents GaussEdit, a framework for adaptive 3D scene editing guided by text and image prompts. GaussEdit uses 3D Gaussian Splatting as its scene representation, enabling convenient Region-of-Interest selection and efficient editing through a three-stage process. Experiments show that GaussEdit outperforms existing methods in editing accuracy, visual fidelity, and processing speed, making it a powerful tool for detailed, user-driven 3D scene editing.
Key Takeaways
- GaussEdit is a framework for adaptive 3D scene editing that supports both text and image prompts.
- It uses 3D Gaussian Splatting as the underlying scene representation.
- Editing proceeds in three stages, combining convenient Region-of-Interest selection with efficient optimization.
- The first stage initializes the 3D Gaussians to ensure high-quality edits.
- The second stage applies an Adaptive Global-Local Optimization strategy to balance global scene coherence with detailed local edits, together with category-guided regularization to alleviate the Janus problem (a minimal loss-weighting sketch follows below).
- The third stage enhances the texture of edited objects with image-to-image synthesis, so the results are visually realistic and closely follow the given prompts.
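As a rough illustration of balancing global coherence with local edits, the sketch below blends a full-image consistency loss with an ROI-restricted edit loss under a simple weight schedule. The schedule and loss choices are assumptions for illustration, not GaussEdit's actual Adaptive Global-Local Optimization.

```python
import torch
import torch.nn.functional as F

def global_local_loss(render, target_scene, target_edit, roi_mask, step, total_steps):
    """render/target_*: (3, H, W) images; roi_mask: (1, H, W) with values in {0, 1}."""
    l_global = F.l1_loss(render, target_scene)                       # keep the rest of the scene
    l_local = F.l1_loss(render * roi_mask, target_edit * roi_mask)   # match the edit inside the ROI
    w = min(1.0, step / (0.5 * total_steps))                         # ramp toward local detail over time
    return (1.0 - 0.5 * w) * l_global + w * l_local
```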
Click here to view paper screenshots




PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion
Authors:Zhiwei Zhang, Ruikai Xu, Weijian Zhang, Zhizhong Zhang, Xin Tan, Jingyu Gong, Yuan Xie, Lizhuang Ma
In this paper, we present the first pinhole-fisheye framework for heterogeneous multi-view depth estimation, PFDepth. Our key insight is to exploit the complementary characteristics of pinhole and fisheye imagery (undistorted vs. distorted, small vs. large FOV, far vs. near field) for joint optimization. PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. Within PFDepth, we first explicitly lift 2D features from each heterogeneous view into a canonical 3D volumetric space. Then, a core module termed Heterogeneous Spatial Fusion is designed to process and fuse distortion-aware volumetric features across overlapping and non-overlapping regions. Additionally, we subtly reformulate the conventional voxel fusion into a novel 3D Gaussian representation, in which learnable latent Gaussian spheres dynamically adapt to local image textures for finer 3D aggregation. Finally, fused volume features are rendered into multi-view depth maps. Through extensive experiments, we demonstrate that PFDepth sets a state-of-the-art performance on KITTI-360 and RealHet datasets over current mainstream depth networks. To the best of our knowledge, this is the first systematic study of heterogeneous pinhole-fisheye depth estimation, offering both technical novelty and valuable empirical insights.
Paper and project links
PDF Accepted by ACM MM 2025 Conference
Summary
This paper presents PFDepth, the first pinhole-fisheye framework for heterogeneous multi-view depth estimation. It exploits the complementary characteristics of pinhole and fisheye imagery for joint optimization and adopts a unified architecture that handles arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. The framework lifts 2D features from each heterogeneous view into a canonical 3D volumetric space, fuses distortion-aware volumetric features with a Heterogeneous Spatial Fusion module, and reformulates voxel fusion as a learnable latent 3D Gaussian representation that adapts to local image textures for finer aggregation. The fused volume features are rendered into multi-view depth maps, yielding state-of-the-art accuracy and offering both technical novelty and practical insights.
Key Takeaways
- PFDepth is the first pinhole-fisheye framework for heterogeneous multi-view depth estimation.
- It exploits the complementary characteristics of pinhole and fisheye imagery (undistorted vs. distorted, small vs. large FOV, far vs. near field) for joint optimization.
- A unified architecture handles arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics.
- The core modules lift 2D features from each heterogeneous view into a canonical 3D volumetric space and fuse features across views (a simplified pinhole-only lifting sketch follows below).
- A Heterogeneous Spatial Fusion module and learnable latent Gaussian spheres refine the volumetric feature aggregation.
- PFDepth achieves state-of-the-art depth estimation on the KITTI-360 and RealHet datasets.
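The sketch below shows the generic idea of lifting 2D view features into a canonical 3D volume: project voxel centers into each camera and bilinearly sample features, then average over valid views. It only covers the undistorted pinhole case and omits the paper's fisheye distortion handling and Heterogeneous Spatial Fusion; the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def lift_features_to_volume(feats, K, T_wc, voxel_xyz):
    """feats: (V, C, H, W) per-view features; K: (V, 3, 3) intrinsics;
    T_wc: (V, 4, 4) world-to-camera extrinsics; voxel_xyz: (N, 3) canonical voxel centers.
    Returns (C, N) fused features, averaged over the views that see each voxel."""
    V, C, H, W = feats.shape
    N = voxel_xyz.shape[0]
    homo = torch.cat([voxel_xyz, torch.ones(N, 1)], dim=1)        # (N, 4) homogeneous coords
    fused, weight = torch.zeros(C, N), torch.zeros(1, N)
    for v in range(V):
        cam = (T_wc[v] @ homo.T)[:3]                              # (3, N) camera-space points
        uv = (K[v] @ cam)[:2] / cam[2].clamp(min=1e-6)            # (2, N) pixel coordinates
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,              # normalize to [-1, 1]
                            uv[1] / (H - 1) * 2 - 1], dim=-1)
        sampled = F.grid_sample(feats[v:v + 1], grid.view(1, 1, N, 2),
                                align_corners=True)[0, :, 0]      # (C, N) bilinear samples
        valid = ((cam[2] > 0) & (grid.abs() <= 1).all(dim=-1)).float()
        fused, weight = fused + sampled * valid, weight + valid
    return fused / weight.clamp(min=1.0)
```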
Click here to view paper screenshots





LLM-Powered Code Analysis and Optimization for Gaussian Splatting Kernels
Authors:Yi Hu, Huiyang Zhou
3D Gaussian splatting (3DGS) is a transformative technique with profound implications on novel view synthesis and real-time rendering. Given its importance, there have been many attempts to improve its performance. However, with the increasing complexity of GPU architectures and the vast search space of performance-tuning parameters, it is a challenging task. Although manual optimizations have achieved remarkable speedups, they require domain expertise and the optimization process can be highly time consuming and error prone. In this paper, we propose to exploit large language models (LLMs) to analyze and optimize Gaussian splatting kernels. To our knowledge, this is the first work to use LLMs to optimize highly specialized real-world GPU kernels. We reveal the intricacies of using LLMs for code optimization and analyze the code optimization techniques from the LLMs. We also propose ways to collaborate with LLMs to further leverage their capabilities. For the original 3DGS code on the MipNeRF360 datasets, LLMs achieve significant speedups, 19% with Deepseek and 24% with GPT-5, demonstrating the different capabilities of different LLMs. By feeding additional information from performance profilers, the performance improvement from LLM-optimized code is enhanced to up to 42% and 38% on average. In comparison, our best-effort manually optimized version can achieve a performance improvement up to 48% and 39% on average, showing that there are still optimizations beyond the capabilities of current LLMs. On the other hand, even upon a newly proposed 3DGS framework with algorithmic optimizations, Seele, LLMs can still further enhance its performance by 6%, showing that there are optimization opportunities missed by domain experts. This highlights the potential of collaboration between domain experts and LLMs.
Paper and project links
Summary
This paper uses large language models (LLMs) to analyze and optimize the GPU kernels of 3D Gaussian Splatting (3DGS). The LLM-optimized kernels achieve significant speedups on the original 3DGS code with the MipNeRF360 datasets. Although the LLM results do not yet reach the best manual optimizations, the approach shows strong potential and complements domain experts, leaving ample room for collaboration.
Key Takeaways
- This is the first work to use large language models (LLMs) to optimize highly specialized real-world GPU kernels.
- On the original 3DGS code, LLMs achieve significant speedups: 19% with Deepseek and 24% with GPT-5.
- Feeding additional information from performance profilers raises the improvement of LLM-optimized code to up to 42%, with an average of 38% (a profiler-in-the-loop workflow sketch follows below).
- The best manually optimized version still reaches up to 48% (39% on average), showing that some optimizations remain beyond current LLMs; conversely, LLMs further speed up the expert-optimized Seele framework by 6%, revealing opportunities missed by domain experts.
- Collaboration between domain experts and LLMs is a promising way to push optimization further.
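The workflow described above can be pictured as a profiler-in-the-loop optimization loop. The skeleton below is a hypothetical sketch: ask_llm, profile_kernel, and build_and_benchmark are placeholder callables the reader must supply (they are not the authors' tooling), and the prompt format is an assumption.

```python
from pathlib import Path
from typing import Callable, Tuple

def optimize_kernel(kernel_path: str,
                    ask_llm: Callable[[str], str],
                    profile_kernel: Callable[[str], str],
                    build_and_benchmark: Callable[[str], float],
                    rounds: int = 3) -> Tuple[str, float]:
    """Iteratively ask an LLM to rewrite a kernel, keeping the fastest version
    that still passes the correctness check inside build_and_benchmark."""
    best_src = Path(kernel_path).read_text()
    best_ms = build_and_benchmark(best_src)
    for _ in range(rounds):
        report = profile_kernel(best_src)    # e.g. occupancy / memory-stall summary
        prompt = ("Optimize this CUDA kernel; outputs must stay numerically identical.\n"
                  f"Profiler report:\n{report}\n\nKernel source:\n{best_src}")
        candidate = ask_llm(prompt)
        ms = build_and_benchmark(candidate)  # compile, verify outputs, time the kernel
        if ms < best_ms:
            best_src, best_ms = candidate, ms
    return best_src, best_ms
```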
Click here to view paper screenshots




VGGT-X: When VGGT Meets Dense Novel View Synthesis
Authors:Yang Liu, Chuanchen Luo, Zimo Tang, Junran Peng, Zhaoxiang Zhang
We study the problem of applying 3D Foundation Models (3DFMs) to dense Novel View Synthesis (NVS). Despite significant progress in Novel View Synthesis powered by NeRF and 3DGS, current approaches remain reliant on accurate 3D attributes (e.g., camera poses and point clouds) acquired from Structure-from-Motion (SfM), which is often slow and fragile in low-texture or low-overlap captures. Recent 3DFMs showcase orders of magnitude speedup over the traditional pipeline and great potential for online NVS. But most of the validation and conclusions are confined to sparse-view settings. Our study reveals that naively scaling 3DFMs to dense views encounters two fundamental barriers: dramatically increasing VRAM burden and imperfect outputs that degrade initialization-sensitive 3D training. To address these barriers, we introduce VGGT-X, incorporating a memory-efficient VGGT implementation that scales to 1,000+ images, an adaptive global alignment for VGGT output enhancement, and robust 3DGS training practices. Extensive experiments show that these measures substantially close the fidelity gap with COLMAP-initialized pipelines, achieving state-of-the-art results in dense COLMAP-free NVS and pose estimation. Additionally, we analyze the causes of remaining gaps with COLMAP-initialized rendering, providing insights for the future development of 3D foundation models and dense NVS. Our project page is available at https://dekuliutesla.github.io/vggt-x.github.io/
Paper and project links
PDF Project Page: https://dekuliutesla.github.io/vggt-x.github.io/
Summary
This work studies applying 3D Foundation Models (3DFMs) to dense novel view synthesis (NVS). Existing approaches rely on 3D attributes obtained from Structure-from-Motion (SfM), which is slow and fragile for low-texture or low-overlap captures. The proposed VGGT-X tackles the VRAM burden and imperfect outputs that arise when naively scaling 3DFMs to dense views, using a memory-efficient VGGT implementation, adaptive global alignment, and robust 3DGS training practices. Experiments show state-of-the-art results for dense COLMAP-free NVS and pose estimation.
Key Takeaways
- Applying 3D Foundation Models (3DFMs) to dense novel view synthesis (NVS) is gaining attention.
- Current methods depend on 3D attributes from Structure-from-Motion (SfM), which is slow and fragile in low-texture or low-overlap captures.
- VGGT-X addresses the memory burden and output-quality issues of scaling 3DFMs to 1,000+ images via a memory-efficient VGGT implementation, adaptive global alignment, and robust 3DGS training practices (a generic similarity-alignment sketch follows below).
- VGGT-X achieves state-of-the-art results in dense COLMAP-free NVS and pose estimation.
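For flavor, the sketch below computes a least-squares similarity transform (Umeyama alignment) that could be used to globally align predicted geometry to a reference before 3DGS training. It is a generic routine, not VGGT-X's adaptive global alignment.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity (s, R, t) such that s * R @ src_i + t ≈ dst_i.
    src, dst: (N, 3) corresponding 3D points (e.g. predicted vs. reference)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # avoid reflections
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s        # isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```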
Click here to view paper screenshots



PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos
Authors:Ting-Hsuan Liao, Haowen Liu, Yiran Xu, Songwei Ge, Gengshan Yang, Jia-Bin Huang
We present PAD3R, a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. Unlike existing approaches, PAD3R handles long video sequences featuring substantial object deformation, large-scale camera movement, and limited view coverage that typically challenge conventional systems. At its core, our approach trains a personalized, object-centric pose estimator, supervised by a pre-trained image-to-3D model. This guides the optimization of deformable 3D Gaussian representation. The optimization is further regularized by long-term 2D point tracking over the entire input video. By combining generative priors and differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations of objects in a category-agnostic way. Extensive qualitative and quantitative results show that PAD3R is robust and generalizes well across challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.
Paper and project links
PDF SIGGRAPH Asia 2025. Project page:https://pad3r.github.io/
Summary
PAD3R is a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. It handles long sequences with substantial object deformation, large-scale camera motion, and limited view coverage, which challenge conventional systems. At its core, it trains a personalized, object-centric pose estimator supervised by a pre-trained image-to-3D model, which guides the optimization of a deformable 3D Gaussian representation. The optimization is further regularized by long-term 2D point tracking over the entire input video. By combining generative priors with differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations in a category-agnostic way. Extensive qualitative and quantitative results show that PAD3R is robust and generalizes well to challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.
Key Takeaways
- PAD3R is a new method for reconstructing deformable 3D objects from monocular videos.
- It handles videos with substantial object deformation, large-scale camera movement, and limited view coverage.
- At its core, PAD3R trains a personalized, object-centric pose estimator supervised by a pre-trained image-to-3D model.
- The deformable 3D Gaussian representation is further regularized by long-term 2D point tracking over the entire input video (a generic reprojection-loss sketch follows below).
- Combining generative priors with differentiable rendering yields high-fidelity, articulated 3D reconstructions.
- PAD3R is category-agnostic.
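The long-term 2D tracking regularization can be pictured as a reprojection loss: points on the deforming model should project onto their tracked pixel locations in every frame. The sketch below is a generic version with assumed tensor shapes, not PAD3R's exact formulation.

```python
import torch

def track_reprojection_loss(points_3d, K, T_wc, tracks_2d, visibility):
    """points_3d: (T, N, 3) deforming points over T frames; K: (3, 3) intrinsics;
    T_wc: (T, 4, 4) world-to-camera poses; tracks_2d: (T, N, 2) tracked pixels;
    visibility: (T, N) bool. Returns the mean L1 reprojection error over visible tracks."""
    ones = torch.ones(*points_3d.shape[:2], 1)
    cam = torch.einsum('tij,tnj->tni', T_wc, torch.cat([points_3d, ones], -1))[..., :3]
    uv = torch.einsum('ij,tnj->tni', K, cam)
    uv = uv[..., :2] / uv[..., 2:3].clamp(min=1e-6)   # perspective division
    err = (uv - tracks_2d).abs().sum(-1)              # per-point L1 pixel error
    vis = visibility.float()
    return (err * vis).sum() / vis.sum().clamp(min=1.0)
```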
Click here to view paper screenshots





Dynamic Novel View Synthesis in High Dynamic Range
Authors:Kaixuan Zhang, Zhipeng Xiong, Minxian Li, Mingwu Ren, Jiankang Deng, Xiatian Zhu
High Dynamic Range Novel View Synthesis (HDR NVS) seeks to learn an HDR 3D model from Low Dynamic Range (LDR) training images captured under conventional imaging conditions. Current methods primarily focus on static scenes, implicitly assuming all scene elements remain stationary and non-living. However, real-world scenarios frequently feature dynamic elements, such as moving objects, varying lighting conditions, and other temporal events, thereby presenting a significantly more challenging scenario. To address this gap, we propose a more realistic problem named HDR Dynamic Novel View Synthesis (HDR DNVS), where the additional dimension "Dynamic" emphasizes the necessity of jointly modeling temporal radiance variations alongside sophisticated 3D translation between LDR and HDR. To tackle this complex, intertwined challenge, we introduce HDR-4DGS, a Gaussian Splatting-based architecture featured with an innovative dynamic tone-mapping module that explicitly connects HDR and LDR domains, maintaining temporal radiance coherence by dynamically adapting tone-mapping functions according to the evolving radiance distributions across the temporal dimension. As a result, HDR-4DGS achieves both temporal radiance consistency and spatially accurate color translation, enabling photorealistic HDR renderings from arbitrary viewpoints and time instances. Extensive experiments demonstrate that HDR-4DGS surpasses existing state-of-the-art methods in both quantitative performance and visual fidelity. Source code will be released.
Paper and project links
Summary
This paper introduces HDR Dynamic Novel View Synthesis (HDR DNVS) to handle dynamic elements in real-world scenes, such as moving objects and varying lighting. To this end, it proposes HDR-4DGS, a Gaussian Splatting-based architecture with a dynamic tone-mapping module that explicitly connects the HDR and LDR domains and maintains temporal radiance coherence. Experiments show that HDR-4DGS surpasses existing methods in both quantitative performance and visual fidelity.
Key Takeaways
- HDR Dynamic Novel View Synthesis (HDR DNVS) targets real-world scenes containing dynamic elements such as moving objects and changing illumination.
- Existing methods focus mainly on static scenes and ignore such dynamics.
- HDR-4DGS is a Gaussian Splatting-based architecture equipped with a dynamic tone-mapping module (a minimal time-conditioned tone-mapping sketch follows below).
- The module explicitly connects the HDR and LDR domains and maintains temporal radiance coherence by adapting the tone-mapping function over time.
- HDR-4DGS achieves both temporal radiance consistency and spatially accurate color translation.
- HDR-4DGS surpasses existing methods in quantitative performance and visual fidelity.
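To illustrate the idea of a time-conditioned tone-mapping module, the sketch below predicts per-channel exposure and gamma from the timestamp and applies a Reinhard-style curve to map HDR radiance to LDR. The network size and curve are assumptions, not HDR-4DGS's actual dynamic tone-mapping design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeConditionedToneMapper(nn.Module):
    """Map linear HDR radiance to LDR with exposure/gamma predicted from time t."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 6))   # 3 exposure + 3 gamma values

    def forward(self, hdr, t):
        # hdr: (..., 3) linear radiance; t: scalar tensor, normalized time in [0, 1]
        params = self.mlp(t.view(1, 1))[0]
        exposure = torch.exp(params[:3])                 # strictly positive scale
        gamma = F.softplus(params[3:]) + 0.5             # keep gamma away from zero
        x = hdr * exposure
        ldr = x / (1.0 + x)                              # Reinhard global operator
        return ldr.clamp(1e-6, 1.0) ** (1.0 / gamma)

# Example: ldr = TimeConditionedToneMapper()(hdr_image, torch.tensor(0.25))
```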
Click here to view paper screenshots



Image-Conditioned 3D Gaussian Splat Quantization
Authors:Xinshuang Liu, Runfa Blark Li, Keito Suzuki, Truong Nguyen
3D Gaussian Splatting (3DGS) has attracted considerable attention for enabling high-quality real-time rendering. Although 3DGS compression methods have been proposed for deployment on storage-constrained devices, two limitations hinder archival use: (1) they compress medium-scale scenes only to the megabyte range, which remains impractical for large-scale scenes or extensive scene collections; and (2) they lack mechanisms to accommodate scene changes after long-term archival. To address these limitations, we propose an Image-Conditioned Gaussian Splat Quantizer (ICGS-Quantizer) that substantially enhances compression efficiency and provides adaptability to scene changes after archiving. ICGS-Quantizer improves quantization efficiency by jointly exploiting inter-Gaussian and inter-attribute correlations and by using shared codebooks across all training scenes, which are then fixed and applied to previously unseen test scenes, eliminating the overhead of per-scene codebooks. This approach effectively reduces the storage requirements for 3DGS to the kilobyte range while preserving visual fidelity. To enable adaptability to post-archival scene changes, ICGS-Quantizer conditions scene decoding on images captured at decoding time. The encoding, quantization, and decoding processes are trained jointly, ensuring that the codes, which are quantized representations of the scene, are effective for conditional decoding. We evaluate ICGS-Quantizer on 3D scene compression and 3D scene updating. Experimental results show that ICGS-Quantizer consistently outperforms state-of-the-art methods in compression efficiency and adaptability to scene changes. Our code, model, and data will be publicly available on GitHub.
Paper and project links
Summary
To make 3D Gaussian Splatting (3DGS) practical for archival use, this paper proposes the Image-Conditioned Gaussian Splat Quantizer (ICGS-Quantizer). It improves compression efficiency by jointly exploiting inter-Gaussian and inter-attribute correlations and by sharing codebooks across all training scenes, reducing storage to the kilobyte range while preserving visual fidelity. In addition, ICGS-Quantizer conditions scene decoding on images captured at decoding time, so archived scenes can adapt to later changes. Experiments show that it outperforms state-of-the-art methods in both compression efficiency and adaptability to scene changes. The code, model, and data will be publicly available on GitHub.
Key Takeaways
- 3DGS enables high-quality real-time rendering, but existing compression methods remain impractical for large-scale scenes or extensive scene collections.
- The Image-Conditioned Gaussian Splat Quantizer (ICGS-Quantizer) improves compression efficiency and adapts to post-archival scene changes by exploiting inter-Gaussian and inter-attribute correlations and by sharing codebooks across scenes (a minimal shared-codebook quantization sketch follows below).
- ICGS-Quantizer reduces storage to the kilobyte range while preserving visual fidelity, and conditions decoding on images captured at decoding time to handle scene changes.
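The shared-codebook idea can be illustrated with a minimal nearest-neighbour vector quantizer: one codebook is trained over all scenes, then frozen and reused for unseen scenes, so only integer indices need to be stored. This sketch omits ICGS-Quantizer's joint inter-Gaussian/inter-attribute modeling and its image-conditioned decoder; the sizes are assumptions.

```python
import torch
import torch.nn as nn

class SharedCodebookQuantizer(nn.Module):
    """Quantize per-Gaussian attribute vectors against a single shared codebook."""
    def __init__(self, num_codes=1024, dim=32):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim) * 0.02)

    def forward(self, x):
        # x: (N, dim) attribute vectors -> (quantized vectors, integer codes to archive)
        dists = torch.cdist(x, self.codebook)          # (N, num_codes) L2 distances
        codes = dists.argmin(dim=1)                    # nearest codebook entry per vector
        q = self.codebook[codes]
        q = x + (q - x).detach()                       # straight-through gradient estimator
        return q, codes
```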
Click here to view paper screenshots




ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
Authors:Daniel Wang, Patrick Rim, Tian Tian, Dong Lao, Alex Wong, Ganesh Sundaramoorthi
We introduce ODE-GS, a novel approach that integrates 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to enable future extrapolation of dynamic 3D scenes. Unlike existing dynamic scene reconstruction methods, which rely on time-conditioned deformation networks and are limited to interpolation within a fixed time window, ODE-GS eliminates timestamp dependency by modeling Gaussian parameter trajectories as continuous-time latent dynamics. Our approach first learns an interpolation model to generate accurate Gaussian trajectories within the observed window, then trains a Transformer encoder to aggregate past trajectories into a latent state evolved via a neural ODE. Finally, numerical integration produces smooth, physically plausible future Gaussian trajectories, enabling rendering at arbitrary future timestamps. On the D-NeRF, NVFi, and HyperNeRF benchmarks, ODE-GS achieves state-of-the-art extrapolation performance, improving metrics by 19.8% compared to leading baselines, demonstrating its ability to accurately represent and predict 3D scene dynamics.
Paper and project links
Summary
ODE-GS combines 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to extrapolate dynamic 3D scenes into the future. Unlike existing dynamic scene reconstruction methods that rely on time-conditioned deformation networks and only interpolate within a fixed time window, ODE-GS removes the timestamp dependency by modeling Gaussian parameter trajectories as continuous-time latent dynamics. The method first learns an interpolation model that produces accurate Gaussian trajectories within the observed window, then trains a Transformer encoder to aggregate past trajectories into a latent state evolved by a neural ODE. Numerical integration finally yields smooth, physically plausible future trajectories that can be rendered at arbitrary future timestamps. On the D-NeRF, NVFi, and HyperNeRF benchmarks, ODE-GS achieves state-of-the-art extrapolation performance, improving metrics by 19.8% over leading baselines.
Key Takeaways
- ODE-GS combines 3D Gaussian Splatting with latent neural ODEs to predict the future evolution of dynamic scenes.
- Unlike existing methods that rely on time-conditioned deformation networks, ODE-GS removes the timestamp dependency.
- ODE-GS first learns an interpolation model for accurate Gaussian trajectories within the observed window.
- A Transformer encoder aggregates past trajectories into a latent state evolved by a neural ODE.
- Numerical integration produces future Gaussian trajectories, enabling rendering at arbitrary future timestamps (a minimal RK4 latent-rollout sketch follows below).
- ODE-GS achieves state-of-the-art extrapolation performance on multiple benchmarks.
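The extrapolation step can be sketched as integrating an MLP-parameterized latent ODE forward in time with a fixed-step RK4 solver; the rolled-out latent would then be decoded into Gaussian parameters. The network sizes, the solver, and the decode step are assumptions, not ODE-GS's exact implementation.

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """dz/dt = f(z, t): a small MLP over the latent state and time."""
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, z, t):
        return self.net(torch.cat([z, t.expand(z.shape[0], 1)], dim=-1))

def rk4_rollout(f, z0, t0, t1, steps=20):
    """Integrate the latent state from t0 to t1 with fixed-step RK4."""
    z, h, t = z0, (t1 - t0) / steps, torch.tensor([t0])
    for _ in range(steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(z + h * k3, t + h)
        z, t = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4), t + h
    return z  # decode into future Gaussian parameters downstream

# Example: z_future = rk4_rollout(LatentDynamics(), torch.zeros(1, 128), 0.0, 1.5)
```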
Click here to view paper screenshots



ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
Authors:Yanzhe Lyu, Kai Cheng, Xin Kang, Xuejin Chen
Recently, 3D Gaussian Splatting (3D-GS) has prevailed in novel view synthesis, achieving high fidelity and efficiency. However, it often struggles to capture rich details and complete geometry. Our analysis reveals that the 3D-GS densification operation lacks adaptiveness and faces a dilemma between geometry coverage and detail recovery. To address this, we introduce a novel densification operation, residual split, which adds a downscaled Gaussian as a residual. Our approach is capable of adaptively retrieving details and complementing missing geometry. To further support this method, we propose a pipeline named ResGS. Specifically, we integrate a Gaussian image pyramid for progressive supervision and implement a selection scheme that prioritizes the densification of coarse Gaussians over time. Extensive experiments demonstrate that our method achieves SOTA rendering quality. Consistent performance improvements can be achieved by applying our residual split on various 3D-GS variants, underscoring its versatility and potential for broader application in 3D-GS-based applications.
Paper and project links
Summary
3D Gaussian Splatting (3D-GS) excels at novel view synthesis but often struggles to capture rich details and complete geometry. The analysis in this paper shows that the 3D-GS densification operation lacks adaptiveness and faces a dilemma between geometry coverage and detail recovery. To address this, the paper introduces a new densification operation, residual split, which adds a downscaled Gaussian as a residual to adaptively recover details and complement missing geometry. To support it, the authors propose the ResGS pipeline, which integrates a Gaussian image pyramid for progressive supervision and a selection scheme that prioritizes densifying coarse Gaussians over time. Experiments show state-of-the-art rendering quality, and applying residual split to various 3D-GS variants yields consistent gains, demonstrating its versatility and potential for broader 3D-GS applications.
Key Takeaways
- 3D Gaussian Splatting (3D-GS) excels at novel view synthesis but struggles with fine details and complete geometry.
- The existing densification operation lacks adaptiveness and faces a trade-off between geometry coverage and detail recovery.
- A new densification operation, residual split, adds a downscaled Gaussian as a residual to improve adaptiveness (a minimal sketch follows below).
- The ResGS pipeline integrates a Gaussian image pyramid for progressive supervision.
- A selection scheme prioritizes densifying coarse Gaussians over time.
- Experiments show that the method achieves state-of-the-art rendering quality.
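A minimal sketch of the residual-split idea: for each selected coarse Gaussian, append a downscaled copy that acts as a residual. The shrink factor and opacity handling are assumptions for illustration only, not the paper's exact densification rule.

```python
import torch

def residual_split(means, scales, opacities, select_mask, shrink=0.5):
    """means: (N, 3), scales: (N, 3), opacities: (N, 1); select_mask: (N,) bool.
    Returns the attribute tensors with one downscaled residual Gaussian appended
    per selected coarse Gaussian."""
    res_means = means[select_mask]                    # residual stays at the coarse center
    res_scales = scales[select_mask] * shrink         # downscaled copy recovers detail
    res_opac = opacities[select_mask] * 0.5           # start the residual semi-transparent
    return (torch.cat([means, res_means], dim=0),
            torch.cat([scales, res_scales], dim=0),
            torch.cat([opacities, res_opac], dim=0))
```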
Click here to view paper screenshots





