⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on these summaries in serious academic settings; they are intended only as a first-pass screen before reading a paper!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-09-17
Sphere-GAN: a GAN-based Approach for Saliency Estimation in 360° Videos
Authors:Mahmoud Z. A. Wahba, Sara Baldoni, Federica Battisti
The recent success of immersive applications is pushing the research community to define new approaches to process 360° images and videos and optimize their transmission. Among these, saliency estimation provides a powerful tool that can be used to identify visually relevant areas and, consequently, adapt processing algorithms. Although saliency estimation has been widely investigated for 2D content, very few algorithms have been proposed for 360° saliency estimation. Towards this goal, we introduce Sphere-GAN, a saliency detection model for 360° videos that leverages a Generative Adversarial Network with spherical convolutions. Extensive experiments were conducted using a public 360° video saliency dataset, and the results demonstrate that Sphere-GAN outperforms state-of-the-art models in accurately predicting saliency maps.
Paper and project links
Summary
The paper notes that the success of immersive applications is driving research into new methods for processing 360° images and videos and optimizing their transmission. Saliency estimation serves to identify visually important regions and adapt processing algorithms accordingly. While widely studied for 2D content, few saliency algorithms target 360° content. The paper therefore proposes Sphere-GAN, a saliency detection model for 360° videos built on a Generative Adversarial Network with spherical convolutions. Experiments show that Sphere-GAN outperforms existing models in predicting saliency maps.
Key Takeaways
- The success of immersive applications is driving research on 360° image and video processing.
- Saliency estimation identifies visually important regions in 360° images and videos so that processing algorithms can be adapted accordingly.
- Sphere-GAN is proposed as a saliency detection model for 360° videos, a setting with few existing algorithms.
- Sphere-GAN combines a Generative Adversarial Network with spherical convolutions (a sketch of the adversarial objective follows this list).
- Sphere-GAN performs strongly at predicting saliency maps for 360° videos.
- Experiments on a public 360° video saliency dataset show that Sphere-GAN outperforms state-of-the-art models.
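To make the adversarial setup concrete, here is a minimal sketch of the kind of GAN objective the abstract describes, assuming a SALGAN-style formulation: a generator predicts a saliency map from a frame, a discriminator judges (frame, saliency) pairs, and a pixel-wise content term is added to the generator loss. The plain Conv2d layers stand in for the paper's spherical convolutions, and all layer widths and loss weightings are illustrative assumptions, not the authors' design.

```python
# Hedged sketch: toy GAN losses for saliency estimation (SALGAN-style).
# NOTE: plain Conv2d stands in for the paper's spherical convolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps an equirectangular frame to a 1-channel saliency map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores (frame, saliency) pairs as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )
    def forward(self, frame, sal):
        return self.net(torch.cat([frame, sal], dim=1))

G, D = Generator(), Discriminator()
adv = nn.BCEWithLogitsLoss()
frame = torch.rand(2, 3, 64, 128)   # toy 2:1 equirectangular frames
gt_sal = torch.rand(2, 1, 64, 128)  # toy ground-truth saliency maps

# Discriminator: real (frame, gt) pairs vs. generated pairs.
d_loss = adv(D(frame, gt_sal), torch.ones(2, 1)) \
       + adv(D(frame, G(frame).detach()), torch.zeros(2, 1))

# Generator: fool D, plus a pixel-wise content term (weighting assumed).
pred = G(frame)
g_loss = adv(D(frame, pred), torch.ones(2, 1)) \
       + F.binary_cross_entropy(pred, gt_sal)
d_loss.backward()  # in a real loop, D and G steps alternate
g_loss.backward()
```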
Click here to view paper screenshots

ROSGS: Relightable Outdoor Scenes With Gaussian Splatting
Authors:Lianjun Liao, Chunhui Zhang, Tong Wu, Henglei Lv, Bailin Deng, Lin Gao
Image data captured outdoors often exhibit unbounded scenes and unconstrained, varying lighting conditions, making it challenging to decompose them into geometry, reflectance, and illumination. Recent works have focused on achieving this decomposition using Neural Radiance Fields (NeRF) or the 3D Gaussian Splatting (3DGS) representation but remain hindered by two key limitations: the high computational overhead associated with neural networks of NeRF and the use of low-frequency lighting representations, which often result in inefficient rendering and suboptimal relighting accuracy. We propose ROSGS, a two-stage pipeline designed to efficiently reconstruct relightable outdoor scenes using the Gaussian Splatting representation. By leveraging monocular normal priors, ROSGS first reconstructs the scene’s geometry with the compact 2D Gaussian Splatting (2DGS) representation, providing an efficient and accurate geometric foundation. Building upon this reconstructed geometry, ROSGS then decomposes the scene’s texture and lighting through a hybrid lighting model. This model effectively represents typical outdoor lighting by employing a spherical Gaussian function to capture the directional, high-frequency components of sunlight, while learning a radiance transfer function via Spherical Harmonic coefficients to model the remaining low-frequency skylight comprehensively. Both quantitative metrics and qualitative comparisons demonstrate that ROSGS achieves state-of-the-art performance in relighting outdoor scenes and highlight its ability to deliver superior relighting accuracy and rendering efficiency.
Paper and project links
Summary
This paper proposes ROSGS, a two-stage pipeline based on the Gaussian Splatting representation for efficiently reconstructing relightable outdoor scenes. Leveraging monocular normal priors, it first reconstructs scene geometry with the compact 2D Gaussian Splatting (2DGS) representation, then decomposes texture and lighting on top of that geometry with a hybrid lighting model: a spherical Gaussian function captures the directional, high-frequency component of sunlight, while a radiance transfer function learned via Spherical Harmonic coefficients models the remaining low-frequency skylight. The method achieves state-of-the-art relighting accuracy and rendering efficiency for outdoor scenes.
Key Takeaways
- Outdoor image data exhibits unbounded scenes and unconstrained, varying lighting, making decomposition into geometry, reflectance, and illumination challenging.
- Existing approaches based on NeRF or 3DGS are limited by NeRF's high computational overhead and by low-frequency lighting representations, leading to inefficient rendering and suboptimal relighting accuracy.
- ROSGS is a two-stage pipeline: it first reconstructs scene geometry with the 2DGS representation, then decomposes texture and lighting with a hybrid lighting model.
- ROSGS leverages monocular normal priors to provide an efficient, accurate geometric foundation.
- A spherical Gaussian function captures the directional, high-frequency component of sunlight, while Spherical Harmonic coefficients model the low-frequency skylight (see the sketch after this list).
- ROSGS achieves state-of-the-art performance in relighting outdoor scenes.
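As a concrete illustration of the hybrid lighting model, the following minimal sketch sums one spherical Gaussian lobe (directional, high-frequency sunlight) with an order-2 spherical-harmonic expansion (low-frequency skylight). The lobe parameters, SH order, and coefficient values are illustrative assumptions; in ROSGS the skylight coefficients come from a learned radiance transfer function.

```python
# Hedged sketch: hybrid outdoor lighting = one spherical Gaussian (sun)
# + order-2 spherical harmonics (sky). All parameter values are toy.
import numpy as np

def spherical_gaussian(v, lobe_axis, sharpness, amplitude):
    """G(v) = a * exp(lambda * (mu . v - 1)), peaked along lobe_axis."""
    return amplitude * np.exp(sharpness * (np.dot(lobe_axis, v) - 1.0))

def sh_basis_order2(v):
    """Real spherical-harmonic basis Y_0..Y_8 at unit direction v."""
    x, y, z = v
    return np.array([
        0.282095,                                    # l=0
        0.488603 * y, 0.488603 * z, 0.488603 * x,    # l=1
        1.092548 * x * y, 1.092548 * y * z,          # l=2
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

rng = np.random.default_rng(0)
sun_dir = np.array([0.3, 0.9, 0.3])
sun_dir /= np.linalg.norm(sun_dir)
sky_coeffs = 0.1 * rng.normal(size=9)   # stand-in for the learned transfer
view = np.array([0.0, 1.0, 0.0])        # query direction (toward zenith)

radiance = (spherical_gaussian(view, sun_dir, sharpness=50.0, amplitude=5.0)
            + sky_coeffs @ sh_basis_order2(view))
print(f"radiance along zenith: {radiance:.4f}")
```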
Click here to view paper screenshots

SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion
Authors:Zhiwen Yang, Yuxin Peng
Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, assessing voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress, they struggle to capture physical regularities for realistic geometric details. On the other hand, neural reconstruction methods like NeRF and 3DGS demonstrate superior physical awareness, but suffer from high computational cost and slow convergence when handling large-scale, complex autonomous driving scenes, leading to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC, which integrates voxel and Gaussian representations for joint exploitation of semantic and physical information. First, the Semantic-guided Gaussian Initialization (SGI) module leverages dual-branch 3D scene representations to locate focal voxels as anchors to guide efficient Gaussian initialization. Then, the Physical-aware Harmonics Enhancement (PHE) module incorporates semantic spherical harmonics to model physical-aware contextual details and promote semantic-geometry consistency through focal distribution alignment, generating SSC results with realistic details. Extensive experiments and analyses on the popular SemanticKITTI and SSCBench-KITTI-360 benchmarks validate the effectiveness of SPHERE. The code is available at https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025.
Paper and project links
PDF 10 pages, 6 figures
Summary
This paper targets camera-based 3D Semantic Scene Completion (SSC) for autonomous driving and proposes the Semantic-PHysical Engaged REpresentation (SPHERE), which integrates voxel and Gaussian representations to jointly exploit semantic and physical information. A Semantic-guided Gaussian Initialization (SGI) module locates focal voxels as anchors for efficient Gaussian initialization, and a Physical-aware Harmonics Enhancement (PHE) module models physical-aware contextual details with semantic spherical harmonics, yielding scene-completion results with realistic detail. Experiments confirm SPHERE's effectiveness.
Key Takeaways
- Camera-based SSC assesses voxel-level geometry and semantics for holistic scene perception and is critical to autonomous driving.
- Existing voxel-based and plane-based SSC methods struggle to capture physical regularities, so their geometric details look unrealistic.
- Neural reconstruction methods such as NeRF and 3DGS offer strong physical awareness but suffer high computational cost and slow convergence on large, complex driving scenes, hurting semantic accuracy.
- SPHERE integrates voxel and Gaussian representations to jointly exploit semantic and physical information.
- SPHERE initializes Gaussians via the SGI module (see the sketch after this list) and models physical-aware contextual details via the PHE module, producing SSC results with realistic detail.
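The following is a minimal sketch of the anchor-selection idea behind the SGI module, assuming a coarse per-voxel semantic volume: high-confidence occupied voxels are chosen as focal anchors whose centers seed the Gaussian initialization. The confidence threshold, top-k rule, and grid scale are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: semantic-guided anchor selection (SGI-style). The
# threshold, top-k rule, and grid scale are assumptions, not the paper's.
import torch

def select_focal_anchors(sem_logits, voxel_size=0.2, conf_thresh=0.3, top_k=1024):
    """sem_logits: (C, X, Y, Z) per-voxel class logits; class 0 = empty."""
    probs = sem_logits.softmax(dim=0)
    occupied_conf, _ = probs[1:].max(dim=0)        # best non-empty class prob
    mask = occupied_conf > conf_thresh             # focal voxels
    coords = mask.nonzero(as_tuple=False).float()  # (N, 3) voxel indices
    conf = occupied_conf[mask]
    if coords.shape[0] > top_k:                    # keep the most confident
        conf, idx = conf.topk(top_k)
        coords = coords[idx]
    centers = (coords + 0.5) * voxel_size          # voxel grid -> metric space
    return centers, conf                           # Gaussian means + weights

sem_logits = 3.0 * torch.randn(20, 32, 32, 8)      # toy semantic volume
centers, conf = select_focal_anchors(sem_logits)
print(centers.shape, conf.shape)
```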
Click here to view paper screenshots

Multispectral-NeRF: a multispectral modeling approach based on neural radiance fields
Authors:Hong Zhang, Fei Guo, Zihan Xie, Dizhao Yao
3D reconstruction technology generates three-dimensional representations of real-world objects, scenes, or environments using sensor data such as 2D images, with extensive applications in robotics, autonomous vehicles, and virtual reality systems. Traditional 3D reconstruction techniques based on 2D images typically rely on RGB spectral information. With advances in sensor technology, additional spectral bands beyond RGB have been increasingly incorporated into 3D reconstruction workflows. Existing methods that integrate these expanded spectral data often suffer from high cost, low accuracy, and poor geometric features. Three-dimensional reconstruction based on NeRF can effectively address these issues in current multispectral 3D reconstruction methods, producing high-precision, high-quality results. However, NeRF and improved models such as NeRFacto are currently trained on three-band data and cannot take multi-band information into account. To address this problem, we propose Multispectral-NeRF, an enhanced neural architecture derived from NeRF that effectively integrates multispectral information. Our technical contributions comprise threefold modifications: expanding hidden layer dimensionality to accommodate 6-band spectral inputs; redesigning residual functions to optimize spectral discrepancy calculations between reconstructed and reference images; and adapting data compression modules to address the increased bit-depth requirements of multispectral imagery. Experimental results confirm that Multispectral-NeRF successfully processes multi-band spectral features while accurately preserving the original scenes' spectral characteristics.
Paper and project links
Summary
NeRF offers a new route to building 3D models of the real world, and advances in sensor technology make it possible to bring multispectral information into 3D reconstruction. Traditional RGB-based methods have clear limitations, and existing NeRF variants such as NeRFacto are trained on three-band data and cannot handle additional bands. Multispectral-NeRF addresses this with three changes: expanding hidden layer dimensionality to accept 6-band spectral inputs, redesigning the residual function to optimize the spectral discrepancy between reconstructed and reference images, and adapting the data compression module to the higher bit depth of multispectral imagery. Experiments show that Multispectral-NeRF processes multi-band spectral features while accurately preserving the original scenes' spectral characteristics.
Key Takeaways
- NeRF-based reconstruction yields high-precision, high-quality 3D results.
- Traditional 3D reconstruction relies mainly on RGB information, and existing multispectral schemes suffer from high cost, low accuracy, and poor geometric features.
- Multispectral data opens new possibilities for 3D reconstruction, but existing NeRF models cannot handle multi-band inputs.
- Multispectral-NeRF integrates multispectral information by expanding the hidden layer dimensionality, redesigning the residual function, and adapting the data compression module (a sketch follows this list).
- Multispectral-NeRF processes multi-band spectral features while accurately preserving the original scenes' spectral characteristics.
- The approach has potential applications in robotics, autonomous driving, and virtual reality.
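The following is a minimal sketch of the 6-band extension described above: a NeRF-style MLP head emits six spectral channels instead of RGB, and training minimizes a per-band residual between rendered and reference pixels. The layer widths, the plain MSE residual, and the positional-encoding dimensionality are illustrative assumptions.

```python
# Hedged sketch: NeRF-style head with a 6-band color branch and a per-band
# spectral residual. Widths and the MSE residual are assumptions.
import torch
import torch.nn as nn

class MultispectralHead(nn.Module):
    def __init__(self, pos_dim=63, hidden=256, n_bands=6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)        # volume density
        self.bands = nn.Linear(hidden, n_bands)  # 6 spectral radiances

    def forward(self, x):
        h = self.trunk(x)
        return self.sigma(h), torch.sigmoid(self.bands(h))

model = MultispectralHead()
x = torch.randn(4096, 63)           # positionally encoded sample points (toy)
ref = torch.rand(4096, 6)           # reference 6-band pixel values (toy)
sigma, bands = model(x)
loss = ((bands - ref) ** 2).mean()  # per-band spectral residual
loss.backward()
```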
Click here to view paper screenshots


NeRF-Aug: Data Augmentation for Robotics with Neural Radiance Fields
Authors:Eric Zhu, Mara Levy, Matthew Gwilliam, Abhinav Shrivastava
Training a policy that can generalize to unknown objects is a long-standing challenge within the field of robotics. The performance of a policy often drops significantly in situations where an object in the scene was not seen during training. To solve this problem, we present NeRF-Aug, a novel method that is capable of teaching a policy to interact with objects that are not present in the dataset. This approach differs from existing approaches by leveraging the speed, photorealism, and 3D consistency of a neural radiance field for augmentation. NeRF-Aug both creates more photorealistic data and runs 63% faster than existing methods. We demonstrate the effectiveness of our method on 5 tasks with 9 novel objects that are not present in the expert demonstrations. We achieve an average performance boost of 55.6% when comparing our method to the next best method. You can see video results at https://nerf-aug.github.io.
Paper and project links
Summary
This paper presents NeRF-Aug, a method for teaching a policy to interact with objects absent from the training data. By exploiting the speed, photorealism, and 3D consistency of a neural radiance field for augmentation, NeRF-Aug produces more photorealistic data and runs 63% faster than existing methods. Across 5 tasks with 9 novel objects not present in the expert demonstrations, it achieves an average performance boost of 55.6% over the next best method.
Key Takeaways
- NeRF-Aug teaches a policy to interact with objects not seen during training.
- It leverages the speed, photorealism, and 3D consistency of a neural radiance field for augmentation (a compositing sketch follows this list).
- NeRF-Aug produces more photorealistic training data.
- It runs 63% faster than existing methods.
- Effectiveness is demonstrated on 5 tasks with 9 novel objects absent from the expert demonstrations.
- NeRF-Aug achieves an average performance boost of 55.6% over the next best method.
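The following is a minimal sketch of the compositing step such an augmentation implies: alpha-blend a rendered novel object (color plus opacity, as a NeRF renderer would output) over an expert-demonstration frame to obtain an augmented training image. The render_object stub and the simple over-compositing rule are illustrative assumptions, not the paper's pipeline.

```python
# Hedged sketch: over-compositing a rendered object onto a demo frame.
# render_object is a stand-in for an actual NeRF render call.
import numpy as np

def render_object(h, w, seed=0):
    """Returns toy (rgb, alpha) in place of a NeRF-rendered novel object."""
    rng = np.random.default_rng(seed)
    rgb = rng.random((h, w, 3), dtype=np.float32)
    alpha = np.zeros((h, w, 1), dtype=np.float32)
    alpha[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = 1.0  # toy object mask
    return rgb, alpha

def composite(frame, rgb, alpha):
    """Standard 'over' blend: the new object occludes the original scene."""
    return alpha * rgb + (1.0 - alpha) * frame

frame = np.zeros((64, 64, 3), dtype=np.float32)  # toy demonstration frame
rgb, alpha = render_object(64, 64)
augmented = composite(frame, rgb, alpha)
print(augmented.shape)  # (64, 64, 3) augmented training image
```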
Click here to view paper screenshots
