⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them for serious academic work; they are only meant as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-22
TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming
Authors: Zeyuan Yin, Xiaoming Liu
Recent advances in 3D Gaussian diffusion models suffer from time-intensive denoising and post-denoising processing due to the massive number of Gaussian primitives, resulting in slow generation and limited scalability along sampling trajectories. To improve the efficiency of 3D diffusion models, we propose $\textbf{TRIM}$ ($\textbf{T}$rajectory $\textbf{R}$eduction and $\textbf{I}$nstance $\textbf{M}$ask denoising), a post-training approach that incorporates both temporal and spatial trimming strategies, to accelerate inference without compromising output quality while supporting inference-time scaling for Gaussian diffusion models. Instead of scaling denoising trajectories in a costly end-to-end manner, we develop a lightweight selector model to evaluate latent Gaussian primitives derived from multiple sampled noises, enabling early trajectory reduction by selecting candidates with high-quality potential. Furthermore, we introduce instance mask denoising to prune learnable Gaussian primitives by filtering out redundant background regions, reducing inference computation at each denoising step. Extensive experiments and analysis demonstrate that TRIM significantly improves both the efficiency and quality of 3D generation. Source code is available at $\href{https://github.com/zeyuanyin/TRIM}{link}$.
Paper and project links
PDF NeurIPS 2025
Summary
To address the efficiency problems of 3D Gaussian diffusion models, this paper proposes TRIM, a post-training acceleration method that combines trajectory reduction with instance mask denoising. TRIM speeds up inference without sacrificing output quality and supports inference-time scaling for Gaussian diffusion models. A lightweight selector model enables early trajectory reduction, and instance mask denoising cuts the computation of each denoising step. Experiments and analysis show that TRIM significantly improves both the efficiency and quality of 3D generation.
Key Takeaways
- TRIM improves the efficiency of 3D Gaussian diffusion models by combining trajectory reduction with instance mask denoising.
- TRIM is a post-training method that accelerates inference without modifying the model architecture.
- A lightweight selector model enables early trajectory reduction, improving inference efficiency by keeping only high-potential candidates (see the sketch after this list).
- Instance mask denoising filters out redundant background regions, reducing the computation of each denoising step.
- TRIM significantly speeds up 3D generation without compromising output quality.
- Extensive experiments and analysis validate TRIM's effectiveness and reliability.
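To make the trajectory-reduction idea concrete, here is a minimal Python sketch of early candidate selection. The `selector` and `denoise_step` stand-ins, the latent shape, and all step counts are hypothetical assumptions, not the paper's implementation.

```python
# Minimal sketch of early trajectory reduction; NOT the authors' code.
import torch

def trim_trajectories(selector, denoise_step, num_noises=8, keep=2,
                      early_steps=5, total_steps=50, latent_shape=(1024, 16)):
    """Start many trajectories, score them early, finish only the best."""
    # 1) Sample several initial noises (candidate trajectories).
    latents = [torch.randn(latent_shape) for _ in range(num_noises)]

    # 2) Run only a few cheap early denoising steps per candidate.
    for t in range(early_steps):
        latents = [denoise_step(z, t) for z in latents]

    # 3) Score the partially denoised latent Gaussians with the lightweight
    #    selector and keep the top-`keep` candidates.
    scores = torch.stack([selector(z) for z in latents])
    top = torch.topk(scores, keep).indices.tolist()
    latents = [latents[i] for i in top]

    # 4) Spend the remaining denoising steps only on the survivors.
    for t in range(early_steps, total_steps):
        latents = [denoise_step(z, t) for z in latents]
    return latents

# Toy usage with dummy components (both are placeholders):
selector = lambda z: -z.abs().mean()   # hypothetical quality score
denoise_step = lambda z, t: 0.95 * z   # hypothetical denoiser
out = trim_trajectories(selector, denoise_step)
print(len(out), out[0].shape)
```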
Click here to view paper screenshots
EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering
Authors: Pierrick Bournez, Luca Savant Aira, Thibaud Ehret, Gabriele Facciolo
Recently, 3D Gaussian Splatting has been introduced as a compelling alternative to NeRF for Earth observation, offering competitive reconstruction quality with significantly reduced training times. In this work, we extend the Earth Observation Gaussian Splatting (EOGS) framework to propose EOGS++, a novel method tailored for satellite imagery that directly operates on raw high-resolution panchromatic data without requiring external preprocessing. Furthermore, leveraging optical flow techniques we embed bundle adjustment directly within the training process, avoiding reliance on external optimization tools while improving camera pose estimation. We also introduce several improvements to the original implementation, including early stopping and TSDF post-processing, all contributing to sharper reconstructions and better geometric accuracy. Experiments on the IARPA 2016 and DFC2019 datasets demonstrate that EOGS++ achieves state-of-the-art performance in terms of reconstruction quality and efficiency, outperforming the original EOGS method and other NeRF-based methods while maintaining the computational advantages of Gaussian Splatting. Our model reduces the mean MAE on buildings from 1.33 to 1.19 compared to the original EOGS model.
Paper and project links
PDF 8 pages, ISPRS
Summary
This paper presents new progress on 3D Gaussian Splatting for Earth observation. The authors extend the Earth Observation Gaussian Splatting (EOGS) framework with EOGS++, a method tailored to satellite imagery that operates directly on raw high-resolution panchromatic data without external preprocessing. Leveraging optical-flow techniques, it embeds bundle adjustment in the training process, improving camera pose estimation while avoiding external optimization tools. Further improvements to the original implementation, including early stopping and TSDF post-processing, yield sharper reconstructions and better geometric accuracy. Experiments on the IARPA 2016 and DFC2019 datasets show that EOGS++ achieves state-of-the-art reconstruction quality and efficiency, outperforming the original EOGS and NeRF-based methods while retaining the computational advantages of Gaussian Splatting.
Key Takeaways
- EOGS++ extends the Earth Observation Gaussian Splatting (EOGS) framework and is designed specifically for satellite imagery.
- EOGS++ operates directly on raw high-resolution panchromatic data without external preprocessing.
- Leveraging optical-flow techniques, EOGS++ embeds bundle adjustment in the training process, improving camera pose estimation (see the sketch after this list).
- EOGS++ removes the dependence on external optimization tools.
- Several improvements over the original implementation are introduced, including early stopping and TSDF post-processing.
- EOGS++ achieves state-of-the-art performance on the IARPA 2016 and DFC2019 datasets, with better reconstruction quality and efficiency than competing methods.
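As a rough illustration of in-training camera refinement, the Python sketch below treats per-view pose offsets as learnable parameters that receive gradients from the same photometric loss as the scene model. The toy renderer, the 6-DoF delta parameterization, and the omission of the optical-flow guidance are all simplifying assumptions, not the EOGS++ implementation.

```python
# Hedged sketch: jointly optimizing scene parameters and per-view pose deltas.
import torch

n_views = 4
pose_offsets = torch.zeros(n_views, 6, requires_grad=True)  # (rx,ry,rz,tx,ty,tz) deltas
scene_params = torch.randn(100, requires_grad=True)         # stand-in for the splats

opt = torch.optim.Adam([scene_params, pose_offsets], lr=1e-2)

def render(scene, pose_delta):
    # Hypothetical differentiable renderer: any differentiable function of the
    # scene and pose delta serves the sketch; gradients flow into both.
    return scene.sum() * (1.0 + pose_delta.sum())

targets = torch.randn(n_views)  # stand-in for observed panchromatic images
for step in range(200):
    opt.zero_grad()
    loss = 0.0
    for v in range(n_views):
        pred = render(scene_params, pose_offsets[v])
        loss = loss + (pred - targets[v]) ** 2  # photometric-style loss
    loss.backward()
    opt.step()  # updates the cameras and the scene together, BA-style
```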
Click here to view paper screenshots
Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
Authors: Minseok Seo, Mark Hamilton, Changick Kim
We present \textbf{Upsample Anything}, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14x/16x (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in only $\approx0.419 \text{s}$ per 224x224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling.
Paper and project links
PDF 15 pages, 12 figures
Summary
This paper introduces Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. It avoids the dataset-specific retraining and heavy implicit optimization that existing feature upsampling methods rely on: a simple per-image optimization learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps.
Key Takeaways
- Upsample Anything is a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any additional training.
- Existing feature upsampling methods depend on dataset-specific retraining or heavy implicit optimization, which limits their scalability and generalization.
- A simple per-image optimization learns an anisotropic Gaussian kernel that combines spatial and range cues, enabling effective feature upsampling (see the sketch after this list).
- The learned Gaussian kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities.
- The framework enables precise high-resolution reconstruction of features, depth, and probability maps.
- It runs in only about 0.419 s per 224x224 image.
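The sketch below shows plain Joint Bilateral Upsampling with fixed isotropic sigmas in NumPy, the classical operator the paper builds on; the paper instead learns an anisotropic kernel per image at test time, so the sigmas, radius, and grayscale guidance here are illustrative assumptions.

```python
# Illustrative Joint Bilateral Upsampling (fixed isotropic kernel), in NumPy.
import numpy as np

def jbu(lr_feat, hr_guide, scale, sigma_s=1.0, sigma_r=0.1, radius=2):
    """lr_feat: (h, w, c) low-res features; hr_guide: (H, W) guidance image."""
    H, W = hr_guide.shape
    h, w, c = lr_feat.shape
    out = np.zeros((H, W, c))
    for y in range(H):
        for x in range(W):
            cy, cx = y / scale, x / scale  # position in low-res coordinates
            acc, wsum = np.zeros(c), 1e-8
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ly = min(max(int(round(cy)) + dy, 0), h - 1)
                    lx = min(max(int(round(cx)) + dx, 0), w - 1)
                    # Spatial Gaussian, measured on the low-res grid.
                    ws = np.exp(-((ly - cy) ** 2 + (lx - cx) ** 2) / (2 * sigma_s ** 2))
                    # Range Gaussian from the high-res guidance image.
                    gy = min(int(ly * scale), H - 1)
                    gx = min(int(lx * scale), W - 1)
                    wr = np.exp(-(hr_guide[y, x] - hr_guide[gy, gx]) ** 2 / (2 * sigma_r ** 2))
                    acc += ws * wr * lr_feat[ly, lx]
                    wsum += ws * wr
            out[y, x] = acc / wsum  # normalized, edge-aware blend
    return out

feat = np.random.rand(4, 4, 8)    # stand-in for downsampled ViT features
guide = np.random.rand(64, 64)    # stand-in grayscale guidance image
up = jbu(feat, guide, scale=16)
print(up.shape)  # (64, 64, 8)
```

Per-image test-time optimization, in the paper's spirit, would fit the kernel parameters (made anisotropic) by gradient descent instead of fixing `sigma_s` and `sigma_r` as done here.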
Click here to view paper screenshots
Clustered Error Correction with Grouped 4D Gaussian Splatting
Authors: Taeho Kang, Jaeyeon Park, Kyungjin Lee, Youngki Lee
Existing 4D Gaussian Splatting (4DGS) methods struggle to accurately reconstruct dynamic scenes, often failing to resolve ambiguous pixel correspondences and inadequate densification in dynamic regions. We address these issues by introducing a novel method composed of two key components: (1) Elliptical Error Clustering and Error Correcting Splat Addition that pinpoints dynamic areas to improve and initialize fitting splats, and (2) Grouped 4D Gaussian Splatting that improves consistency of mapping between splats and represented dynamic objects. Specifically, we classify rendering errors into missing-color and occlusion types, then apply targeted corrections via backprojection or foreground splitting guided by cross-view color consistency. Evaluations on Neural 3D Video and Technicolor datasets demonstrate that our approach significantly improves temporal consistency and achieves state-of-the-art perceptual rendering quality, improving 0.39dB of PSNR on the Technicolor Light Field dataset. Our visualization shows improved alignment between splats and dynamic objects, and the error correction method’s capability to identify errors and properly initialize new splats. Our implementation details and source code are available at https://github.com/tho-kn/cem-4dgs.
Paper and project links
PDF 16 pages, 8 figures, SIGGRAPH Asia Conference Papers 2025
Summary
This paper tackles the difficulty existing 4D Gaussian Splatting (4DGS) methods have in reconstructing dynamic scenes, improving accuracy and mapping consistency in dynamic regions with two innovations: Elliptical Error Clustering with Error Correcting Splat Addition, which pinpoints dynamic areas and initializes fitting splats, and Grouped 4D Gaussian Splatting, which strengthens the consistency of the mapping between splats and the dynamic objects they represent. Rendering errors are classified into missing-color and occlusion types and corrected in a targeted way. Evaluations show improved temporal consistency and state-of-the-art perceptual rendering quality. Implementation details and source code are available on GitHub.
Key Takeaways
- The method addresses the failure of existing 4D Gaussian Splatting (4DGS) approaches to accurately reconstruct dynamic scenes.
- Elliptical Error Clustering and Error Correcting Splat Addition pinpoint dynamic regions to improve and initialize fitting splats.
- Grouped 4D Gaussian Splatting improves the consistency of the mapping between splats and the dynamic objects they represent.
- Rendering errors are classified into missing-color and occlusion types and corrected in a targeted way via backprojection or foreground splitting (see the sketch after this list).
- The method improves temporal consistency and achieves state-of-the-art perceptual rendering quality, gaining 0.39 dB of PSNR on the Technicolor Light Field dataset.
- Strong results are demonstrated on the Neural 3D Video and Technicolor datasets.
- Visualizations show improved alignment between splats and dynamic objects, and the error correction's ability to identify errors and properly initialize new splats.
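As a toy illustration of the missing-color / occlusion split, the sketch below triages high-error pixels by rendered opacity. The actual method clusters elliptical error regions and uses cross-view color consistency to guide the corrections; the thresholds and the opacity-based rule here are assumptions for illustration only.

```python
# Toy per-pixel error triage, loosely inspired by the paper's error taxonomy.
import numpy as np

def classify_errors(rendered, gt, alpha, err_thresh=0.2, alpha_thresh=0.5):
    """rendered, gt: (H, W, 3) in [0, 1]; alpha: (H, W) rendered opacity."""
    err = np.abs(rendered - gt).mean(axis=-1)      # per-pixel color error
    bad = err > err_thresh                         # pixels needing correction
    # Low opacity: nothing covers the pixel -> missing color; a corrective
    # splat could be backprojected there.
    missing_color = bad & (alpha < alpha_thresh)
    # High opacity but wrong color: an occluding splat is misplaced; the
    # foreground could be split or corrected.
    occlusion = bad & (alpha >= alpha_thresh)
    return missing_color, occlusion

H = W = 8
rendered = np.random.rand(H, W, 3)
gt = np.random.rand(H, W, 3)
alpha = np.random.rand(H, W)
mc, oc = classify_errors(rendered, gt, alpha)
print(mc.sum(), oc.sum())
```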
Click here to view paper screenshots
CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis
Authors: Zijian Wu, Mingfeng Jiang, Zidian Lin, Ying Song, Hanjie Ma, Qun Wu, Dongping Zhang, Guiyang Pu
3D Gaussian Splatting (3DGS) has recently emerged as an efficient, high-fidelity representation for real-time scene reconstruction and rendering. However, extending 3DGS to sparse-view settings remains challenging because of supervision scarcity and overfitting caused by limited viewpoint coverage. In this paper, we present CuriGS, a curriculum-guided framework for sparse-view 3D reconstruction using 3DGS. CuriGS addresses the core challenge of sparse-view synthesis by introducing student views: pseudo-views sampled around ground-truth poses (teacher). For each teacher, we generate multiple groups of student views with different perturbation levels. During training, we follow a curriculum schedule that gradually unlocks higher perturbation levels, randomly sampling candidate students from the active level to assist training. Each sampled student is regularized via depth-correlation and co-regularization, and evaluated using a multi-signal metric that combines SSIM, LPIPS, and an image-quality measure. For every teacher and perturbation level, we periodically retain the best-performing students and promote those that satisfy a predefined quality threshold to the training set, resulting in a stable augmentation of sparse training views. Experimental results show that CuriGS outperforms state-of-the-art baselines in both rendering fidelity and geometric consistency across various synthetic and real sparse-view scenes. Project page: https://zijian1026.github.io/CuriGS/
Paper and project links
Summary
This paper proposes CuriGS, a curriculum-guided framework for sparse-view 3D reconstruction with 3DGS. CuriGS tackles the core challenge of sparse-view synthesis by introducing student views: pseudo-views sampled around ground-truth (teacher) poses at multiple perturbation levels, with a curriculum schedule that gradually unlocks higher levels. Sampled student views are regularized via depth correlation and co-regularization and evaluated with a multi-signal metric. Experiments show that CuriGS outperforms state-of-the-art baselines in both rendering fidelity and geometric consistency on synthetic and real sparse-view scenes.
Key Takeaways
- 3DGS provides an efficient, high-fidelity representation for real-time scene reconstruction and rendering.
- Extending 3DGS to sparse-view settings is challenging because of supervision scarcity and overfitting caused by limited viewpoint coverage.
- CuriGS is a curriculum-guided framework that addresses sparse-view synthesis by introducing student views.
- Student views are pseudo-views sampled around ground-truth (teacher) poses; each teacher yields multiple groups of students at different perturbation levels.
- A curriculum schedule gradually unlocks higher perturbation levels, and sampled student views are regularized via depth correlation and co-regularization (see the sketch after this list).
- Students are evaluated with a multi-signal metric that combines SSIM, LPIPS, and an image-quality measure.
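To illustrate the curriculum over student views, here is a hedged Python sketch. The 6-vector pose parameterization, the unlock schedule, the metric weights, and the promotion threshold are all assumptions for illustration, not the paper's settings.

```python
# Hedged sketch of curriculum-scheduled student-view sampling and scoring.
import numpy as np

rng = np.random.default_rng(0)
levels = [0.01, 0.05, 0.10]  # increasing pose-perturbation magnitudes

def sample_student(teacher_pose, level):
    """Perturb a teacher pose (here a 6-vector) to get a pseudo (student) view."""
    return teacher_pose + rng.normal(scale=levels[level], size=6)

def active_level(step, unlock_every=1000):
    """Curriculum: unlock one higher perturbation level every `unlock_every` steps."""
    return min(step // unlock_every, len(levels) - 1)

def student_score(ssim, lpips, iq, w=(0.4, 0.4, 0.2)):
    """Multi-signal quality score; LPIPS is a distance, so lower is better."""
    return w[0] * ssim + w[1] * (1.0 - lpips) + w[2] * iq

teachers = [rng.normal(size=6) for _ in range(3)]
for step in range(0, 3000, 1000):
    lvl = active_level(step)
    teacher = teachers[step % len(teachers)]
    # Randomly sample a candidate student from the currently unlocked levels.
    student = sample_student(teacher, rng.integers(0, lvl + 1))
    # In training, `student` would be rendered, regularized via depth
    # correlation / co-regularization, then promoted to the training set
    # if its multi-signal score clears a threshold:
    score = student_score(ssim=0.9, lpips=0.2, iq=0.8)  # dummy numbers
    print(step, lvl, score > 0.75)
```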