⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never use these summaries in serious academic settings; they are only intended as a first-pass screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-20
iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion
Authors: Hao Wang, Linqing Zhao, Xiuwei Xu, Jiwen Lu, Haibin Yan
Recent trends in SLAM and visual navigation have embraced 3D Gaussians as the preferred scene representation, highlighting the importance of estimating camera poses from a single image using a pre-built Gaussian model. However, existing approaches typically rely on an iterative render-compare-refine loop, where candidate views are first rendered using NeRF or Gaussian Splatting, then compared against the target image, and finally, discrepancies are used to update the pose. This multi-round process incurs significant computational overhead, hindering real-time performance in robotics. In this paper, we propose iGaussian, a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion. Our method first regresses a coarse 6DoF pose using a Gaussian Scene Prior-based Pose Regression Network with spatial uniform sampling and guided attention mechanisms, then refines it through feature matching and multi-model fusion. The key contribution lies in our cross-correlation module that aligns image embeddings with 3D Gaussian attributes without differentiable rendering, coupled with a Weighted Multiview Predictor that fuses features from multiple strategically sampled viewpoints. Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets demonstrate a significant performance improvement over previous methods, reducing median rotation errors to 0.2° while achieving 2.87 FPS tracking on mobile robots, which is an impressive 10 times speedup compared to optimization-based approaches. Code: https://github.com/pythongod-exe/iGaussian
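To make the two-stage design concrete, the sketch below illustrates (in PyTorch) a coarse pose head in the spirit of the abstract: image tokens are cross-correlated with 3D Gaussian attributes via attention, and per-viewpoint predictions are fused with learned weights. All module names, feature dimensions, and the 9D pose parameterization (3D translation plus a 6D rotation representation) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code) of a feed-forward coarse pose
# regressor in the spirit of iGaussian's first stage.
import torch
import torch.nn as nn

class CrossCorrelationPoseHead(nn.Module):
    """Cross-correlates image tokens with 3D Gaussian attributes and regresses a pose."""
    def __init__(self, img_dim=256, gauss_dim=59, hidden=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)       # query: image embedding tokens
        self.gauss_proj = nn.Linear(gauss_dim, hidden)   # key/value: per-Gaussian attributes
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.pose_mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 9),                        # 3D translation + 6D rotation
        )

    def forward(self, img_feat, gauss_attr):
        # img_feat: (B, N_img, img_dim); gauss_attr: (B, N_gauss, gauss_dim)
        q, kv = self.img_proj(img_feat), self.gauss_proj(gauss_attr)
        fused, _ = self.attn(q, kv, kv)                  # attention, no differentiable rendering
        return self.pose_mlp(fused.mean(dim=1))          # pooled tokens -> 9D pose vector

class WeightedMultiviewPredictor(nn.Module):
    """Fuses per-viewpoint pose predictions with learned softmax weights."""
    def __init__(self, pose_dim=9):
        super().__init__()
        self.score = nn.Linear(pose_dim, 1)

    def forward(self, per_view_poses):                   # (B, V, pose_dim)
        w = torch.softmax(self.score(per_view_poses), dim=1)
        return (w * per_view_poses).sum(dim=1)           # weighted fusion -> (B, pose_dim)

# Toy usage with random tensors; each "viewpoint" here just re-samples a Gaussian subset,
# whereas the paper samples viewpoints over the scene in a structured way.
head, fuser = CrossCorrelationPoseHead(), WeightedMultiviewPredictor()
img_feat = torch.randn(2, 196, 256)                      # e.g. ViT patch tokens of the query image
gauss_attr = torch.randn(2, 1024, 59)                    # attributes of a pre-built Gaussian scene
per_view = torch.stack(
    [head(img_feat, gauss_attr[:, torch.randperm(1024)[:512]]) for _ in range(4)], dim=1)
coarse_pose = fuser(per_view)                            # (2, 9) coarse 6DoF estimate
```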
Paper and project links
PDF IROS 2025
Summary
The paper proposes iGaussian, a two-stage feed-forward framework for real-time camera pose estimation. A Gaussian scene prior-based pose regression network first produces a coarse 6DoF pose, which is then refined through feature matching and multi-model fusion. The key contributions are a cross-correlation module that aligns image embeddings with 3D Gaussian attributes without differentiable rendering, and a Weighted Multiview Predictor that fuses features from multiple viewpoints. Experiments on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets show a significant improvement over previous methods, reducing the median rotation error to 0.2° and reaching a tracking speed of 2.87 FPS on a mobile robot, about 10 times faster than optimization-based methods.
Key Takeaways
- The paper proposes iGaussian, a two-stage feed-forward framework for real-time camera pose estimation.
- A Gaussian scene prior-based pose regression network provides the initial coarse pose estimate.
- A cross-correlation module aligns image embeddings with 3D Gaussian attributes without differentiable rendering.
- A Weighted Multiview Predictor fuses multi-view features to improve estimation accuracy.
- Experiments on several datasets show a significant improvement over previous methods.
- iGaussian reaches a tracking speed of 2.87 FPS, about 10 times faster than optimization-based methods.
PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos
Authors: Dianbing Xi, Guoyuan An, Jingsen Zhu, Zhijian Liu, Yuan Liu, Ruiyuan Zhang, Jiayuan Lu, Yuchi Huo, Rui Wang
We propose PFAvatar (Pose-Fusion Avatar), a new method that reconstructs high-quality 3D avatars from Outfit of the Day (OOTD) photos, which exhibit diverse poses, occlusions, and complex backgrounds. Our method consists of two stages: (1) fine-tuning a pose-aware diffusion model from few-shot OOTD examples and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). In the first stage, unlike previous methods that segment images into assets (e.g., garments, accessories) for 3D assembly, which is prone to inconsistency, we avoid decomposition and directly model the full-body appearance. By integrating a pre-trained ControlNet for pose estimation and a novel Condition Prior Preservation Loss (CPPL), our method enables end-to-end learning of fine details while mitigating language drift in few-shot training. Our method completes personalization in just 5 minutes, achieving a 48x speed-up compared to previous approaches. In the second stage, we introduce a NeRF-based avatar representation optimized by canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared to mesh-based representations that suffer from resolution-dependent discretization and erroneous occluded geometry, our continuous radiance field can preserve high-frequency textures (e.g., hair) and handle occlusions correctly through transmittance. Experiments demonstrate that PFAvatar outperforms state-of-the-art methods in terms of reconstruction fidelity, detail preservation, and robustness to occlusions/truncations, advancing practical 3D avatar generation from real-world OOTD albums. In addition, the reconstructed 3D avatar supports downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the versatility and practical value of our approach.
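Stage two of the abstract distills the avatar by score distillation from the fine-tuned, frozen diffusion model. As a rough illustration, here is a minimal sketch of the generic Score Distillation Sampling (SDS) update that 3D-SDS variants build on; the denoiser, the weighting w(t), and all tensor shapes are stand-in assumptions, and PFAvatar's canonical SMPL-X space sampling and multi-resolution scheme are not reproduced here.

```python
# Minimal sketch (assumptions, not the authors' code) of a generic SDS update.
import torch

def sds_loss(rendered_rgb, denoiser, cond, alphas_cumprod, t):
    """Noise the rendered image at level t, query the frozen denoiser, and turn
    (eps_pred - eps) into a gradient on the render via a surrogate loss."""
    noise = torch.randn_like(rendered_rgb)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)                         # \bar{alpha}_t
    noisy = a_t.sqrt() * rendered_rgb + (1.0 - a_t).sqrt() * noise    # forward diffusion q(x_t | x_0)
    with torch.no_grad():
        eps_pred = denoiser(noisy, t, cond)                           # frozen, fine-tuned diffusion model
    w = 1.0 - a_t                                                     # a common weighting choice
    grad = w * (eps_pred - noise)
    # Surrogate loss: backward() delivers `grad` to whatever produced `rendered_rgb`
    return (grad.detach() * rendered_rgb).sum()

# Toy usage: a dummy denoiser stands in for the fine-tuned model, and a random tensor
# with requires_grad stands in for a differentiable NeRF render of the avatar.
dummy_denoiser = lambda x, t, c: torch.zeros_like(x)
alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
render = torch.rand(1, 3, 64, 64, requires_grad=True)
loss = sds_loss(render, dummy_denoiser, None, alphas_cumprod, torch.tensor([500]))
loss.backward()                                                       # render.grad now holds the SDS gradient
```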
Paper and project links
PDF Accepted by AAAI 2026
Summary
PFAvatar (Pose-Fusion Avatar) reconstructs high-quality 3D avatars from everyday Outfit-of-the-Day photos. The method has two stages: fine-tuning a pose-aware diffusion model from a few OOTD photos, and distilling a 3D avatar represented by a neural radiance field (NeRF). It avoids the inconsistency of asset-level decomposition, learns fine details end to end, and mitigates language drift in few-shot training. The second stage uses a NeRF-based avatar representation optimized with canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared with mesh-based representations, the continuous radiance field preserves high-frequency textures and handles occlusions correctly. Experiments show that PFAvatar outperforms existing methods in reconstruction fidelity, detail preservation, and robustness to occlusions/truncations, advancing practical 3D avatar generation from real-world OOTD albums.
Key Takeaways
- PFAvatar reconstructs high-quality 3D avatars from OOTD photos with diverse poses, occlusions, and complex backgrounds.
- The method has two stages: fine-tuning a pose-aware diffusion model, then distilling a 3D avatar represented by a NeRF.
- It avoids the inconsistency of asset decomposition, learns fine details end to end, and mitigates language drift in few-shot training.
- The NeRF-based avatar representation preserves high-frequency textures and handles occlusions correctly.
- Compared with mesh-based representations, the continuous radiance field avoids resolution-dependent discretization and erroneous occluded geometry.
- PFAvatar outperforms existing methods in reconstruction fidelity, detail preservation, and robustness to occlusions/truncations.