
3DGS


⚠️ All summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on these summaries in serious academic settings; they are only intended as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Update for 2025-10-21

Fix False Transparency by Noise Guided Splatting

Authors:Aly El Hakie, Yiren Lu, Yu Yin, Michael Jenkins, Yehe Liu

Opaque objects reconstructed by 3DGS often exhibit a falsely transparent surface, leading to inconsistent background and internal patterns under camera motion in interactive viewing. This issue stems from the ill-posed optimization in 3DGS. During training, background and foreground Gaussians are blended via alpha-compositing and optimized solely against the input RGB images using a photometric loss. As this process lacks an explicit constraint on surface opacity, the optimization may incorrectly assign transparency to opaque regions, resulting in view-inconsistent, falsely transparent renderings. This issue is difficult to detect in standard evaluation settings but becomes particularly evident in object-centric reconstructions under interactive viewing. Although other causes of view-inconsistency have been explored recently, false transparency has not been explicitly identified. To the best of our knowledge, we are the first to identify, characterize, and develop solutions for this underreported artifact in 3DGS. Our strategy, NGS, encourages surface Gaussians to adopt higher opacity by injecting opaque noise Gaussians in the object volume during training, requiring only minimal modifications to the existing splatting process. To quantitatively evaluate false transparency in static renderings, we propose a transmittance-based metric that measures the severity of this artifact. In addition, we introduce a customized, high-quality object-centric scan dataset exhibiting pronounced transparency issues, and we augment popular existing datasets with complementary infill noise specifically designed to assess the robustness of 3D reconstruction methods to false transparency. Experiments across multiple datasets show that NGS substantially reduces false transparency while maintaining competitive performance on standard rendering metrics, demonstrating its overall effectiveness.
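
The false-transparency issue can be made concrete with the alpha-compositing rule that 3DGS uses along each camera ray. The sketch below is a minimal NumPy illustration, not the paper's NGS code or its exact transmittance metric: it computes the residual transmittance left after compositing the splats on a ray, and a toy per-object score in which a large residual inside the object silhouette means background light still leaks through a surface that should be opaque. The function names, the example opacities, and the mask are assumptions made for illustration.

```python
import numpy as np

def accumulated_transmittance(alphas):
    """Front-to-back alpha compositing along one camera ray.

    alphas: per-Gaussian opacities sorted near-to-far.
    Returns (T_before, T_residual): the transmittance in front of each splat,
    T_i = prod_{j<i} (1 - alpha_j), and the transmittance left after all splats.
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    T_before = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]
    return T_before, float(np.prod(1.0 - alphas))

def false_transparency_score(residual_T, object_mask):
    """Toy severity score: mean residual transmittance inside the object silhouette.

    A truly opaque surface should leave ~0 residual transmittance; values well
    above 0 mean the background remains visible through the reconstruction.
    """
    return float(residual_T[object_mask].mean())

if __name__ == "__main__":
    # A ray through an "opaque" surface reconstructed with low-opacity splats.
    T_before, leak = accumulated_transmittance([0.3, 0.4, 0.2])
    print(T_before, leak)          # leak ~= 0.336: a third of the background shines through

    # Image-level score over a toy 2x2 residual-transmittance map.
    residual = np.array([[0.05, 0.40], [0.02, 0.30]])
    mask = np.ones_like(residual, dtype=bool)
    print(false_transparency_score(residual, mask))   # 0.1925
```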

Paper and Project Links

PDF

Summary

This paper examines the false-transparency artifact that appears on surfaces of opaque objects reconstructed with 3D Gaussian Splatting (3DGS). The problem stems from an ill-posed optimization and leads to backgrounds and internal patterns that are inconsistent across viewpoints during interactive viewing. The authors are the first to identify, characterize, and develop a solution for this artifact: by injecting opaque noise Gaussians into the object volume during training, surface Gaussians are encouraged to adopt higher opacity, reducing false transparency. The paper also proposes a transmittance-based metric to quantitatively evaluate false transparency in static renderings and introduces a high-quality object-centric scan dataset with pronounced transparency issues. Experiments show the strategy substantially reduces false transparency while maintaining standard rendering performance.

Key Takeaways

  • Opaque objects reconstructed with 3DGS often show falsely transparent surfaces, making backgrounds and internal patterns inconsistent during interactive viewing.
  • The issue stems from ill-posed optimization, in particular the lack of an explicit constraint on surface opacity.
  • This work is the first to identify, characterize, and develop solutions for the false-transparency artifact.
  • Injecting opaque noise Gaussians encourages surface Gaussians to adopt higher opacity, reducing false transparency.
  • A transmittance-based metric is proposed to quantitatively evaluate false transparency in static renderings.
  • A high-quality object-centric scan dataset with pronounced transparency issues is introduced, and existing datasets are augmented to assess the robustness of reconstruction methods to false transparency.

Cool Papers

Click here to view paper screenshots

PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction

Authors:Ting-Yu Yen, Yu-Sheng Chiu, Shih-Hsuan Hung, Peter Wonka, Hung-Kuo Chu

Recent advances in 3D Gaussian Splatting (3DGS) have enabled high-quality, real-time novel-view synthesis from multi-view images. However, most existing methods assume the object is captured in a single, static pose, resulting in incomplete reconstructions that miss occluded or self-occluded regions. We introduce PFGS, a pose-aware 3DGS framework that addresses the practical challenge of reconstructing complete objects from multi-pose image captures. Given images of an object in one main pose and several auxiliary poses, PFGS iteratively fuses each auxiliary set into a unified 3DGS representation of the main pose. Our pose-aware fusion strategy combines global and local registration to merge views effectively and refine the 3DGS model. While recent advances in 3D foundation models have improved registration robustness and efficiency, they remain limited by high memory demands and suboptimal accuracy. PFGS overcomes these challenges by incorporating them more intelligently into the registration process: it leverages background features for per-pose camera pose estimation and employs foundation models for cross-pose registration. This design captures the best of both approaches while resolving background inconsistency issues. Experimental results demonstrate that PFGS consistently outperforms strong baselines in both qualitative and quantitative evaluations, producing more complete reconstructions and higher-fidelity 3DGS models.
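
Fusing an auxiliary-pose capture into the main-pose frame ultimately requires a rigid transform between corresponding 3D points (for example Gaussian centers). The snippet below is a generic least-squares Kabsch solve shown purely to illustrate that global-registration step under the assumption that correspondences are known; PFGS's actual pipeline additionally uses local refinement, background features, and foundation-model registration, none of which are shown, and the function name and toy data are illustrative only.

```python
import numpy as np

def kabsch_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) aligning src -> dst (both Nx3, corresponding rows).

    A standard Kabsch/Procrustes solve; pose-fused reconstruction would apply this kind of
    global alignment to auxiliary-pose Gaussians before refining the merged model.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aux = rng.normal(size=(100, 3))                  # Gaussian centers in an auxiliary pose
    angle = np.deg2rad(30.0)
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    main = aux @ R_true.T + np.array([0.5, -0.2, 1.0])   # same centers in the main pose
    R, t = kabsch_rigid_transform(aux, main)
    print(np.allclose(aux @ R.T + t, main, atol=1e-8))   # True: auxiliary set aligned to main pose
```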

Paper and Project Links

PDF

Summary

This paper presents PFGS, a pose-aware 3D Gaussian Splatting (3DGS) framework that addresses the practical problem of reconstructing complete objects from multi-pose image captures. PFGS iteratively fuses each auxiliary pose set into a unified 3DGS representation of the main pose, combining global and local registration in a pose-aware fusion strategy to merge views and refine the model. By incorporating 3D foundation models more intelligently into the registration process, PFGS overcomes their high memory demands and suboptimal accuracy. Experiments show that PFGS consistently outperforms strong baselines in both qualitative and quantitative evaluations, producing more complete reconstructions and higher-fidelity 3DGS models.

Key Takeaways

  1. PFGS is a pose-aware 3D Gaussian Splatting (3DGS) framework for reconstructing complete objects from multi-pose image captures.
  2. PFGS iteratively fuses each auxiliary set into a unified 3DGS representation of the main pose.
  3. The pose-aware fusion strategy combines global and local registration to refine the 3DGS model.
  4. By incorporating foundation models more intelligently into the registration process, PFGS overcomes their high memory demands and accuracy limitations.
  5. PFGS uses background features for per-pose camera pose estimation and foundation models for cross-pose registration.
  6. PFGS resolves background inconsistency issues and is shown experimentally to outperform strong baselines.

Cool Papers

Click here to view paper screenshots

GaussGym: An open-source real-to-sim framework for learning locomotion from pixels

Authors:Alejandro Escontrela, Justin Kerr, Arthur Allshire, Jonas Frey, Rocky Duan, Carmelo Sferrazza, Pieter Abbeel

We present a novel approach for photorealistic robot simulation that integrates 3D Gaussian Splatting as a drop-in renderer within vectorized physics simulators such as IsaacGym. This enables unprecedented speed – exceeding 100,000 steps per second on consumer GPUs – while maintaining high visual fidelity, which we showcase across diverse tasks. We additionally demonstrate its applicability in a sim-to-real robotics setting. Beyond depth-based sensing, our results highlight how rich visual semantics improve navigation and decision-making, such as avoiding undesirable regions. We further showcase the ease of incorporating thousands of environments from iPhone scans, large-scale scene datasets (e.g., GrandTour, ARKit), and outputs from generative video models like Veo, enabling rapid creation of realistic training worlds. This work bridges high-throughput simulation and high-fidelity perception, advancing scalable and generalizable robot learning. All code and data will be open-sourced for the community to build upon. Videos, code, and data available at https://escontrela.me/gauss_gym/.
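
The core architectural point, that a splatting renderer can act as a drop-in observation source inside a vectorized simulation loop, can be sketched independently of any particular simulator. The toy loop below uses hypothetical stand-in callables (dummy_step, dummy_render, dummy_policy are not IsaacGym or GaussGym APIs) and only illustrates that rendering and policy inference happen as one batched call per step rather than a per-environment Python loop.

```python
import numpy as np

def rollout(step_fn, render_fn, policy_fn, num_envs, horizon, obs_dim):
    """Toy vectorized rollout: physics stepping and rendering are batched over all envs.

    step_fn / render_fn / policy_fn are hypothetical stand-ins for a vectorized physics
    step, a 3DGS renderer, and a visuomotor policy; none of these are real library calls.
    """
    state = np.zeros((num_envs, obs_dim), dtype=np.float32)
    total_reward = np.zeros(num_envs, dtype=np.float32)
    for _ in range(horizon):
        pixels = render_fn(state)        # batched photorealistic observations
        action = policy_fn(pixels)       # batched policy inference
        state, reward = step_fn(state, action)
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy_render = lambda s: s + rng.normal(scale=0.01, size=s.shape).astype(np.float32)
    dummy_policy = lambda obs: np.tanh(obs[:, :8])
    dummy_step = lambda s, a: (0.99 * s, -np.linalg.norm(a, axis=1).astype(np.float32))
    returns = rollout(dummy_step, dummy_render, dummy_policy, num_envs=16, horizon=5, obs_dim=64)
    print(returns.shape)                 # (16,)
```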

Paper and Project Links

PDF

Summary

This paper introduces a photorealistic robot simulation approach that integrates 3D Gaussian Splatting into vectorized physics simulators such as IsaacGym. The method exceeds 100,000 steps per second on consumer GPUs while maintaining high visual fidelity, demonstrated across diverse tasks, and its applicability is shown in a sim-to-real robotics setting, where rich visual semantics improve navigation and decision-making, for example by avoiding undesirable regions. The work also shows how thousands of environments can be easily incorporated from iPhone scans, large-scale scene datasets (e.g., GrandTour, ARKit), and outputs of generative video models such as Veo, enabling rapid creation of realistic training worlds. Overall, it bridges high-throughput simulation and high-fidelity perception, advancing scalable and generalizable robot learning.

Key Takeaways

  1. A new robot simulation approach integrates 3D Gaussian Splatting as the renderer, greatly improving simulation speed and visual quality.
  2. The method is validated across diverse tasks and demonstrates potential in sim-to-real robotics settings.
  3. Rich visual semantics help improve robot navigation and decision-making.
  4. Environments from diverse sources, such as iPhone scans, large-scale scene datasets, and generative video models, can be easily incorporated.
  5. This enables rapid creation of realistic training worlds, improving the realism and effectiveness of robot learning.
  6. The work advances robot simulation by improving the balance between simulation throughput and perception quality.

Cool Papers

Click here to view paper screenshots

DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

Authors:Weijie Wang, Jiagang Zhu, Zeyu Zhang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Haoxiao Wang, Guan Huang, Xinze Chen, Yukun Zhou, Wenkang Qin, Duochao Shi, Haoyun Li, Guanghong Jia, Jiwen Lu

We present DriveGen3D, a novel framework for generating high-quality and highly controllable dynamic 3D driving scenes that addresses critical limitations in existing methodologies. Current approaches to driving scene synthesis either suffer from prohibitive computational demands for extended temporal generation, focus exclusively on prolonged video synthesis without 3D representation, or restrict themselves to static single-scene reconstruction. Our work bridges this methodological gap by integrating accelerated long-term video generation with large-scale dynamic scene reconstruction through multimodal conditional control. DriveGen3D introduces a unified pipeline consisting of two specialized components: FastDrive-DiT, an efficient video diffusion transformer for high-resolution, temporally coherent video synthesis under text and Bird’s-Eye-View (BEV) layout guidance; and FastRecon3D, a feed-forward reconstruction module that rapidly builds 3D Gaussian representations across time, ensuring spatial-temporal consistency. Together, these components enable real-time generation of extended driving videos (up to $424\times800$ at 12 FPS) and corresponding dynamic 3D scenes, achieving SSIM of 0.811 and PSNR of 22.84 on novel view synthesis, all while maintaining parameter efficiency.
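
For readers unfamiliar with the reported metrics, PSNR is straightforward to compute. The sketch below is the standard definition for images in [0, 1], not DriveGen3D's evaluation code; the noise level in the demo is arbitrary and chosen so that the result lands roughly in the low-20 dB range quoted above.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((64, 64, 3))                                           # toy ground-truth view
    noisy = np.clip(gt + rng.normal(scale=0.07, size=gt.shape), 0.0, 1.0)  # toy rendered view
    print(round(psnr(noisy, gt), 2))   # roughly 23 dB for this noise level
```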

Paper and Project Links

PDF Accepted by NeurIPS Workshop on Next Practices in Video Generation and Evaluation (Short Paper Track)

Summary

A new advance in driving-scene generation: the DriveGen3D framework produces high-quality, highly controllable dynamic 3D driving scenes, addressing the limitations of existing methods. It integrates accelerated long-term video generation with large-scale dynamic scene reconstruction, using multimodal conditional control to achieve high-resolution, temporally coherent video synthesis and 3D reconstruction guided by Bird's-Eye-View (BEV) layouts.

Key Takeaways

  1. DriveGen3D is a new framework for generating high-quality, highly controllable dynamic 3D driving scenes.
  2. Existing driving-scene synthesis methods suffer from prohibitive computational cost, lack 3D representations for long sequences, or are restricted to static single-scene reconstruction; DriveGen3D addresses these limitations.
  3. DriveGen3D integrates accelerated long-term video generation with large-scale dynamic scene reconstruction through multimodal conditional control.
  4. The framework has two specialized components: FastDrive-DiT, an efficient video diffusion transformer for high-resolution, temporally coherent video synthesis, and FastRecon3D, a feed-forward reconstruction module that ensures spatial-temporal consistency.
  5. The framework generates extended driving videos in real time (up to 424x800 resolution at 12 FPS).
  6. On novel view synthesis, DriveGen3D achieves an SSIM of 0.811 and a PSNR of 22.84.

Cool Papers

Click here to view paper screenshots

SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images

Authors:Jiaxin Guo, Tongfan Guan, Wenzhen Dong, Wenzhao Zheng, Wenting Wang, Yue Wang, Yeung Yam, Yun-Hui Liu

Recent advances in 3D Gaussian Splatting (3DGS) have enabled generalizable, on-the-fly reconstruction of sequential input views. However, existing methods often predict per-pixel Gaussians and combine Gaussians from all views as the scene representation, leading to substantial redundancies and geometric inconsistencies in long-duration video sequences. To address this, we propose SaLon3R, a novel framework for Structure-aware, Long-term 3DGS Reconstruction. To our best knowledge, SaLon3R is the first online generalizable GS method capable of reconstructing over 50 views in over 10 FPS, with 50% to 90% redundancy removal. Our method introduces compact anchor primitives to eliminate redundancy through differentiable saliency-aware Gaussian quantization, coupled with a 3D Point Transformer that refines anchor attributes and saliency to resolve cross-frame geometric and photometric inconsistencies. Specifically, we first leverage a 3D reconstruction backbone to predict dense per-pixel Gaussians and a saliency map encoding regional geometric complexity. Redundant Gaussians are compressed into compact anchors by prioritizing high-complexity regions. The 3D Point Transformer then learns spatial structural priors in 3D space from training data to refine anchor attributes and saliency, enabling regionally adaptive Gaussian decoding for geometric fidelity. Without known camera parameters or test-time optimization, our approach effectively resolves artifacts and prunes the redundant 3DGS in a single feed-forward pass. Experiments on multiple datasets demonstrate our state-of-the-art performance on both novel view synthesis and depth estimation, demonstrating superior efficiency, robustness, and generalization ability for long-term generalizable 3D reconstruction. Project Page: https://wrld.github.io/SaLon3R/.
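
The redundancy-removal idea, keeping a compact set of anchors while prioritizing geometrically complex regions, can be illustrated with a much cruder mechanism than the paper's differentiable saliency-aware quantization. The sketch below simply keeps the top fraction of Gaussians ranked by a saliency score; the function name, the keep ratio, and the hard top-k selection are assumptions for illustration and stand in for SaLon3R's learned anchors and 3D Point Transformer refinement.

```python
import numpy as np

def compress_to_anchors(means, saliency, keep_ratio=0.3):
    """Toy redundancy removal: keep the top `keep_ratio` of Gaussians by saliency.

    `means` is (N, 3) Gaussian centers and `saliency` a (N,) score, e.g. derived from a
    regional geometric-complexity map. This hard selection only illustrates the budget /
    prioritisation idea, not the paper's differentiable quantization.
    """
    k = max(1, int(keep_ratio * len(means)))
    keep = np.argsort(saliency)[-k:]            # highest-complexity regions survive
    return means[keep], keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = rng.normal(size=(10000, 3))         # dense per-pixel Gaussian centers
    saliency = rng.random(10000)                # toy per-Gaussian saliency scores
    anchors, idx = compress_to_anchors(means, saliency, keep_ratio=0.1)
    print(anchors.shape, f"{1 - len(idx) / len(means):.0%} redundancy removed")
```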

Paper and Project Links

PDF

Summary

Building on recent advances in 3D Gaussian Splatting (3DGS), the authors propose SaLon3R, an online generalizable reconstruction framework that addresses redundancy and geometric inconsistency in long video sequences. The framework removes redundancy with compact anchor primitives obtained through differentiable saliency-aware Gaussian quantization, while a 3D Point Transformer refines anchor attributes and saliency to resolve cross-frame geometric and photometric inconsistencies. Without known camera parameters or test-time optimization, a single feed-forward pass effectively resolves artifacts and prunes redundant Gaussians. The method offers high efficiency, robustness, and generalization for long-term 3D reconstruction, and experiments on multiple datasets demonstrate state-of-the-art performance on novel view synthesis and depth estimation. Project page: https://wrld.github.io/SaLon3R/

Key Takeaways

  • SaLon3R is the first online generalizable framework for long-term 3D reconstruction, addressing redundancy and geometric inconsistency in long video sequences.
  • Redundancy is removed with compact anchor primitives, using differentiable saliency-aware Gaussian quantization to refine the anchors.

Cool Papers

Click here to view paper screenshots

BSGS: Bi-stage 3D Gaussian Splatting for Camera Motion Deblurring

Authors:An Zhao, Piaopiao Yu, Zhe Zhu, Mingqiang Wei

3D Gaussian Splatting has exhibited remarkable capabilities in 3D scene reconstruction. However, reconstructing high-quality 3D scenes from motion-blurred images caused by camera motion poses a significant challenge. The performance of existing 3DGS-based deblurring methods is limited due to their inherent mechanisms, such as extreme dependence on the accuracy of camera poses and inability to effectively control erroneous Gaussian primitives densification caused by motion blur. To solve these problems, we introduce a novel framework, Bi-Stage 3D Gaussian Splatting, to accurately reconstruct 3D scenes from motion-blurred images. BSGS contains two stages. First, Camera Pose Refinement roughly optimizes camera poses to reduce motion-induced distortions. Second, with fixed rough camera poses, Global Rigid Transformation further corrects motion-induced blur distortions. To alleviate multi-subframe gradient conflicts, we propose a subframe gradient aggregation strategy to optimize both stages. Furthermore, a space-time bi-stage optimization strategy is introduced to dynamically adjust primitive densification thresholds and prevent premature noisy Gaussian generation in blurred regions. Comprehensive experiments verify the effectiveness of our proposed deblurring method and show its superiority over the state of the art. Our source code is available at https://github.com/wsxujm/bsgs
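
A motion-blurred exposure can be modeled as several sharp subframes, each contributing its own gradient to the scene parameters. The sketch below shows one simple way to combine such per-subframe gradients into a single update, a plain (optionally weighted) average; it is only an illustration of the aggregation idea, and the actual BSGS strategy and its weighting may differ.

```python
import numpy as np

def aggregate_subframe_gradients(grads, weights=None):
    """Combine per-subframe gradients into one update via a (weighted) average.

    `grads` is a list of same-shaped arrays, one per sharp subframe rendered inside the
    exposure window. Averaging keeps partially conflicting subframe gradients from
    pulling the parameters back and forth on alternating steps.
    """
    grads = np.stack(grads, axis=0)
    if weights is None:
        weights = np.full(len(grads), 1.0 / len(grads))
    weights = np.asarray(weights, dtype=grads.dtype).reshape(-1, *([1] * (grads.ndim - 1)))
    return (weights * grads).sum(axis=0)

if __name__ == "__main__":
    g1 = np.array([1.0, -2.0, 0.5])    # gradient from subframe 1
    g2 = np.array([-0.5, -1.0, 0.7])   # gradient from subframe 2 (partially conflicting)
    print(aggregate_subframe_gradients([g1, g2]))   # [ 0.25 -1.5   0.6 ]
```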

Paper and Project Links

PDF Accepted by ACM MM 2025

Summary

Existing methods struggle to reconstruct high-quality 3D scenes from images blurred by camera motion. To address this, the authors introduce Bi-Stage 3D Gaussian Splatting (BSGS), a framework that reconstructs 3D scenes from motion-blurred images in two stages. First, Camera Pose Refinement roughly optimizes camera poses to reduce motion-induced distortions; then, with the rough poses fixed, Global Rigid Transformation further corrects motion-induced blur distortions. A subframe gradient aggregation strategy optimizes both stages and alleviates multi-subframe gradient conflicts, and a space-time bi-stage optimization strategy dynamically adjusts primitive densification thresholds to prevent premature generation of noisy Gaussians in blurred regions. Experiments confirm the method's superiority.

Key Takeaways

  • Existing methods struggle to reconstruct 3D scenes from images blurred by camera motion.
  • The Bi-Stage 3D Gaussian Splatting framework consists of two stages: Camera Pose Refinement and Global Rigid Transformation.
  • The pose-refinement stage roughly optimizes camera poses to reduce motion-induced distortions.
  • The global rigid transformation stage further corrects motion-induced blur distortions.
  • A subframe gradient aggregation strategy optimizes both stages, and a space-time bi-stage optimization strategy dynamically adjusts densification thresholds to prevent noisy Gaussians.

Cool Papers

Click here to view paper screenshots

Low-Frequency First: Eliminating Floating Artifacts in 3D Gaussian Splatting

Authors:Jianchao Wang, Peng Zhou, Cen Li, Rong Quan, Jie Qin

3D Gaussian Splatting (3DGS) is a powerful and computationally efficient representation for 3D reconstruction. Despite its strengths, 3DGS often produces floating artifacts, which are erroneous structures detached from the actual geometry and significantly degrade visual fidelity. The underlying mechanisms causing these artifacts, particularly in low-quality initialization scenarios, have not been fully explored. In this paper, we investigate the origins of floating artifacts from a frequency-domain perspective and identify under-optimized Gaussians as the primary source. Based on our analysis, we propose \textit{Eliminating-Floating-Artifacts} Gaussian Splatting (EFA-GS), which selectively expands under-optimized Gaussians to prioritize accurate low-frequency learning. Additionally, we introduce complementary depth-based and scale-based strategies to dynamically refine Gaussian expansion, effectively mitigating detail erosion. Extensive experiments on both synthetic and real-world datasets demonstrate that EFA-GS substantially reduces floating artifacts while preserving high-frequency details, achieving an improvement of 1.68 dB in PSNR over baseline method on our RWLQ dataset. Furthermore, we validate the effectiveness of our approach in downstream 3D editing tasks. Project Website: https://jcwang-gh.github.io/EFA-GS
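
The "low-frequency first" idea, enlarging Gaussians that have not yet been optimized well so they cover coarse structure instead of hovering as floating speckle, can be sketched with a generic selection rule. In the snippet below, `opt_score` is an assumed per-Gaussian proxy for how well each primitive has been optimized (EFA-GS's actual criterion is frequency-based and coupled with depth- and scale-based refinement, not shown), and the threshold, expansion factor, and cap are illustrative.

```python
import numpy as np

def expand_under_optimized(scales, opt_score, score_thresh=0.2, expand_factor=1.5, max_scale=0.5):
    """Toy version of 'expand under-optimized Gaussians first'.

    `scales` is (N, 3) per-Gaussian scale and `opt_score` a (N,) proxy for optimization
    quality. Enlarging poorly optimized Gaussians biases them toward representing
    low-frequency structure rather than floating high-frequency artifacts.
    """
    under = opt_score < score_thresh
    new_scales = scales.copy()
    new_scales[under] = np.minimum(new_scales[under] * expand_factor, max_scale)
    return new_scales, under

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scales = rng.uniform(0.01, 0.1, size=(5, 3))
    opt_score = np.array([0.05, 0.9, 0.15, 0.6, 0.8])
    new_scales, under = expand_under_optimized(scales, opt_score)
    print(under)                          # [ True False  True False False]
    print((new_scales / scales)[under])   # selected rows grew by 1.5x (unless capped at max_scale)
```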


Paper and Project Links

PDF Our paper has been accepted by the 24th International Conference on Cyberworlds and received the Best Paper Honorable Mention

Summary

3D Gaussian Splatting (3DGS) is a powerful and computationally efficient representation for 3D reconstruction, but it often produces floating artifacts: erroneous structures detached from the actual geometry that significantly degrade visual fidelity. This paper investigates the origins of these artifacts from a frequency-domain perspective and identifies under-optimized Gaussians as the primary source. Based on this analysis, the authors propose Eliminating-Floating-Artifacts Gaussian Splatting (EFA-GS), which selectively expands under-optimized Gaussians to prioritize accurate low-frequency learning, complemented by depth-based and scale-based strategies that dynamically refine Gaussian expansion and avoid detail erosion. Experiments show that EFA-GS substantially reduces floating artifacts while preserving high-frequency details, improving PSNR by 1.68 dB over the baseline on the RWLQ dataset, and the approach is also validated on downstream 3D editing tasks.

Key Takeaways

  • 3D Gaussian Splatting (3DGS) is an effective representation for 3D reconstruction, but it suffers from floating artifacts.
  • A frequency-domain analysis identifies under-optimized Gaussians as the primary source of floating artifacts.
  • The proposed Eliminating-Floating-Artifacts Gaussian Splatting (EFA-GS) selectively expands under-optimized Gaussians to reduce floating artifacts.
  • EFA-GS also combines depth-based and scale-based strategies to dynamically refine Gaussian expansion and avoid detail erosion.
  • Experiments show that EFA-GS reduces floating artifacts while preserving high-frequency details, improving image quality.

Cool Papers

Click here to view paper screenshots

iLRM: An Iterative Large 3D Reconstruction Model

Authors:Gyeongjin Kang, Seungtae Nam, Seungkwon Yang, Xiangyu Sun, Sameh Khamis, Abdelrahman Mohamed, Eunbyung Park

Feed-forward 3D modeling has emerged as a promising approach for rapid and high-quality 3D reconstruction. In particular, directly generating explicit 3D representations, such as 3D Gaussian splatting, has attracted significant attention due to its fast and high-quality rendering, as well as numerous applications. However, many state-of-the-art methods, primarily based on transformer architectures, suffer from severe scalability issues because they rely on full attention across image tokens from multiple input views, resulting in prohibitive computational costs as the number of views or image resolution increases. Toward a scalable and efficient feed-forward 3D reconstruction, we introduce an iterative Large 3D Reconstruction Model (iLRM) that generates 3D Gaussian representations through an iterative refinement mechanism, guided by three core principles: (1) decoupling the scene representation from input-view images to enable compact 3D representations; (2) decomposing fully-attentional multi-view interactions into a two-stage attention scheme to reduce computational costs; and (3) injecting high-resolution information at every layer to achieve high-fidelity reconstruction. Experimental results on widely used datasets, such as RE10K and DL3DV, demonstrate that iLRM outperforms existing methods in both reconstruction quality and speed.
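
The scalability argument, replacing one full-attention pass over all V*T image tokens with a per-view stage followed by a cross-view stage, can be illustrated at the level of tensor shapes. The sketch below is a plain NumPy decomposition used only to show the cost structure (V*T^2 plus T*V^2 instead of (V*T)^2); it is not iLRM's actual transformer blocks, and the shapes and function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain scaled dot-product attention over the last two axes."""
    d = q.shape[-1]
    return softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d)) @ v

def two_stage_attention(tokens):
    """Toy decomposition of full multi-view attention into two cheaper stages.

    `tokens` has shape (V, T, D): V views, T tokens per view, D channels.
    Stage 1 attends within each view; stage 2 attends across views per token index,
    instead of one full-attention pass over all V*T tokens.
    """
    intra = attention(tokens, tokens, tokens)          # (V, T, D): per-view attention
    cross_in = intra.transpose(1, 0, 2)                # (T, V, D): group by token index
    cross = attention(cross_in, cross_in, cross_in)    # attend across views
    return cross.transpose(1, 0, 2)                    # back to (V, T, D)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    out = two_stage_attention(rng.normal(size=(8, 256, 32)))
    print(out.shape)   # (8, 256, 32)
```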

Paper and Project Links

PDF Project page: https://gynjn.github.io/iLRM/

Summary

This paper introduces the iterative Large 3D Reconstruction Model (iLRM) for fast, high-quality feed-forward 3D reconstruction. iLRM generates 3D Gaussian representations through an iterative refinement mechanism guided by three core principles: decoupling the scene representation from the input-view images to obtain compact 3D representations, decomposing fully attentional multi-view interactions into a two-stage attention scheme to reduce computational cost, and injecting high-resolution information at every layer to achieve high-fidelity reconstruction. Experiments on widely used datasets such as RE10K and DL3DV show that iLRM outperforms existing methods in both reconstruction quality and speed.

Key Takeaways

  1. Feed-forward 3D modeling has become a promising approach for rapid, high-quality 3D reconstruction.
  2. Directly generating explicit 3D representations such as 3D Gaussian splats has attracted attention for its fast, high-quality rendering and wide range of applications.
  3. Current state-of-the-art methods, mostly based on transformer architectures, face severe scalability issues: computational cost grows prohibitively with the number of views or the image resolution.
  4. The iterative Large 3D Reconstruction Model (iLRM) generates 3D Gaussian representations through an iterative refinement mechanism.
  5. iLRM follows three core principles: decoupling the scene representation from the input-view images, using a two-stage attention scheme to reduce computational cost, and injecting high-resolution information for high-fidelity reconstruction.
  6. Experiments on datasets such as RE10K and DL3DV show that iLRM outperforms existing methods in both reconstruction quality and speed.

Cool Papers

Click here to view paper screenshots

X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Authors:Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan

Four-dimensional computed tomography (4D CT) reconstruction is crucial for capturing dynamic anatomical changes but faces inherent limitations from conventional phase-binning workflows. Current methods discretize temporal resolution into fixed phases with respiratory gating devices, introducing motion misalignment and restricting clinical practicality. In this paper, We propose X$^2$-Gaussian, a novel framework that enables continuous-time 4D-CT reconstruction by integrating dynamic radiative Gaussian splatting with self-supervised respiratory motion learning. Our approach models anatomical dynamics through a spatiotemporal encoder-decoder architecture that predicts time-varying Gaussian deformations, eliminating phase discretization. To remove dependency on external gating devices, we introduce a physiology-driven periodic consistency loss that learns patient-specific breathing cycles directly from projections via differentiable optimization. Extensive experiments demonstrate state-of-the-art performance, achieving a 9.93 dB PSNR gain over traditional methods and 2.25 dB improvement against prior Gaussian splatting techniques. By unifying continuous motion modeling with hardware-free period learning, X$^2$-Gaussian advances high-fidelity 4D CT reconstruction for dynamic clinical imaging. Code is publicly available at: https://x2-gaussian.github.io/.
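
The periodic consistency idea can be stated very compactly: deformations predicted one breathing period apart should agree, so mismatches penalize an incorrect period estimate. The sketch below only evaluates such a loss value on a toy periodic deformation; `deform_fn`, the sampled times, and the toy motion are assumptions, and the paper's actual loss is optimized end-to-end with differentiable rendering rather than evaluated with NumPy.

```python
import numpy as np

def periodic_consistency_loss(deform_fn, times, period):
    """Toy periodic-consistency loss: deformations one breathing period apart should match.

    `deform_fn(t)` returns the predicted deformation at time t (any array shape) and
    `period` is a patient-specific breathing period (learnable in the real method).
    """
    diffs = [deform_fn(t + period) - deform_fn(t) for t in times]
    return float(np.mean([np.mean(d ** 2) for d in diffs]))

if __name__ == "__main__":
    true_period = 4.0
    # Perfectly periodic toy motion with period 4.0 seconds.
    deform = lambda t: np.sin(2 * np.pi * t / true_period) * np.ones(3)
    ts = np.linspace(0.0, 8.0, 20)
    print(periodic_consistency_loss(deform, ts, period=4.0))   # ~0: correct period
    print(periodic_consistency_loss(deform, ts, period=3.0))   # > 0: wrong period is penalized
```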

Paper and Project Links

PDF Project Page: https://x2-gaussian.github.io/

Summary

This paper proposes X$^2$-Gaussian, a novel 4D-CT reconstruction framework that overcomes the limitations of conventional phase-binning workflows and enables continuous-time capture of dynamic anatomy. The framework combines dynamic radiative Gaussian splatting with self-supervised respiratory motion learning: a spatiotemporal encoder-decoder predicts time-varying Gaussian deformations, eliminating phase discretization, while a physiology-driven periodic consistency loss learns patient-specific breathing cycles directly from projections, removing the dependence on external gating devices. Experiments show state-of-the-art performance, with a 9.93 dB PSNR gain over traditional methods. By unifying continuous motion modeling with hardware-free period learning, the framework provides a powerful tool for high-fidelity dynamic clinical imaging. Code: https://x2-gaussian.github.io/

Key Takeaways

The key points of the paper are as follows:

  • X$^2$-Gaussian is a novel continuous-time four-dimensional computed tomography (4D CT) reconstruction framework that combines dynamic radiative Gaussian splatting with self-supervised respiratory motion learning, enabling continuous reconstruction that captures dynamic anatomical changes.
  • A spatiotemporal encoder-decoder architecture predicts time-varying Gaussian deformations, avoiding the motion misalignment that conventional methods introduce through respiratory gating devices and removing the dependence on external gating hardware, which improves clinical practicality.
  • A physiology-driven periodic consistency loss learns each patient's specific breathing cycle directly from the projections, without any external device, so the breathing period is learned in a self-supervised manner.

Cool Papers

Click here to view paper screenshots

CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

Authors:Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesha Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu

Reconstructing clothed humans from a single image is a fundamental task in computer vision with wide-ranging applications. Although existing monocular clothed human reconstruction solutions have shown promising results, they often rely on the assumption that the human subject is in an occlusion-free environment. Thus, when encountering in-the-wild occluded images, these algorithms produce multiview inconsistent and fragmented reconstructions. Additionally, most algorithms for monocular 3D human reconstruction leverage geometric priors such as SMPL annotations for training and inference, which are extremely challenging to acquire in real-world applications. To address these limitations, we propose CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-ConsistEncy from a Single Image, a novel pipeline designed to reconstruct occlusion-resilient 3D humans with multiview consistency from a single occluded image, without requiring either ground-truth geometric prior annotations or 3D supervision. Specifically, CHROME leverages a multiview diffusion model to first synthesize occlusion-free human images from the occluded input, compatible with off-the-shelf pose control to explicitly enforce cross-view consistency during synthesis. A 3D reconstruction model is then trained to predict a set of 3D Gaussians conditioned on both the occluded input and synthesized views, aligning cross-view details to produce a cohesive and accurate 3D representation. CHROME achieves significant improvements in terms of both novel view synthesis (up to 3 dB PSNR) and geometric reconstruction under challenging conditions.
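
At a structural level the pipeline is a two-stage composition: synthesize occlusion-free views first, then predict 3D Gaussians conditioned on both the occluded input and those synthesized views. The sketch below only fixes the data flow and tensor shapes; both callables are hypothetical stand-ins (a multiview diffusion model and a feed-forward reconstruction model in the paper), and the 14-dimensional Gaussian parameterization is an illustrative assumption.

```python
from typing import Callable, Sequence
import numpy as np

def occlusion_resilient_pipeline(
    occluded_image: np.ndarray,
    synthesize_views: Callable[[np.ndarray], Sequence[np.ndarray]],
    reconstruct_gaussians: Callable[[np.ndarray, Sequence[np.ndarray]], np.ndarray],
) -> np.ndarray:
    """Shape-level sketch: stage 1 synthesizes occlusion-free views, stage 2 predicts
    3D Gaussians conditioned on both the occluded input and the synthesized views."""
    views = synthesize_views(occluded_image)                 # stage 1: occlusion-free multi-view images
    return reconstruct_gaussians(occluded_image, views)      # stage 2: cross-view consistent 3D Gaussians

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((256, 256, 3))
    fake_diffusion = lambda x: [np.rot90(x, k) for k in range(4)]   # 4 dummy "novel views"
    fake_recon = lambda x, vs: rng.random((20000, 14))              # N Gaussians x (xyz, scale, rotation, opacity, color)
    print(occlusion_resilient_pipeline(img, fake_diffusion, fake_recon).shape)   # (20000, 14)
```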

Paper and Project Links

PDF Accepted at ICCV 2025

Summary

This paper proposes CHROME, a novel pipeline that reconstructs occlusion-resilient 3D humans with multiview consistency from a single occluded image, without requiring ground-truth geometric prior annotations or 3D supervision. CHROME first uses a multiview diffusion model to synthesize occlusion-free human images from the occluded input, enforcing cross-view consistency during synthesis. A 3D reconstruction model is then trained to predict a set of 3D Gaussians conditioned on both the occluded input and the synthesized views, aligning cross-view details to produce a cohesive and accurate 3D representation. The method achieves significant improvements in both novel view synthesis and geometric reconstruction under challenging conditions.

Key Takeaways

  • CHROME is a novel pipeline for reconstructing 3D humans from a single occluded image.
  • CHROME achieves occlusion-resilient, multiview-consistent reconstruction without ground-truth geometric prior annotations or 3D supervision.
  • A multiview diffusion model synthesizes occlusion-free human images from the occluded input.
  • A 3D reconstruction model predicts 3D Gaussians conditioned on both the occluded input and the synthesized views.

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!