⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on these summaries for serious academic work; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-03-18
TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis
Authors: Jiaming Kang, Keyan Chen, Zhengxia Zou, Zhenwei Shi
Remote sensing novel view synthesis (NVS) offers significant potential for 3D interpretation of remote sensing scenes, with important applications in urban planning and environmental monitoring. However, remote sensing scenes frequently lack sufficient multi-view images due to acquisition constraints. While existing NVS methods tend to overfit when processing limited input views, advanced few-shot NVS methods are computationally intensive and perform sub-optimally in remote sensing scenes. This paper presents TriDF, an efficient hybrid 3D representation for fast remote sensing NVS from as few as 3 input views. Our approach decouples color and volume density information, modeling them independently to reduce the computational burden on implicit radiance fields and accelerate reconstruction. We explore the potential of the triplane representation in few-shot NVS tasks by mapping high-frequency color information onto this compact structure, and the direct optimization of feature planes significantly speeds up convergence. Volume density is modeled as continuous density fields, incorporating reference features from neighboring views through image-based rendering to compensate for limited input data. Additionally, we introduce depth-guided optimization based on point clouds, which effectively mitigates the overfitting problem in few-shot NVS. Comprehensive experiments across multiple remote sensing scenes demonstrate that our hybrid representation achieves a 30x speed increase compared to NeRF-based methods, while simultaneously improving rendering quality metrics over advanced few-shot methods (7.4% increase in PSNR, 12.2% in SSIM, and 18.7% in LPIPS). The code is publicly available at https://github.com/kanehub/TriDF
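Below is a minimal PyTorch sketch (not the authors' code) of the hybrid representation the abstract describes: color comes from three learnable feature planes sampled bilinearly, while volume density comes from a separate small MLP. Plane resolution, feature width, and the MLP sizes are placeholder choices; the reference-feature and depth-guided components are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneColorField(nn.Module):
    def __init__(self, res=128, feat=16):
        super().__init__()
        # Three learnable feature planes for the XY, XZ, and YZ projections.
        self.planes = nn.Parameter(torch.randn(3, feat, res, res) * 0.1)
        self.color_mlp = nn.Sequential(
            nn.Linear(3 * feat, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, xyz):                          # xyz: (P, 3) in [-1, 1]
        feats = []
        for i, axes in enumerate(([0, 1], [0, 2], [1, 2])):
            grid = xyz[:, axes].view(1, 1, -1, 2)    # (1, 1, P, 2) sample grid
            plane = self.planes[i].unsqueeze(0)      # (1, C, res, res)
            f = F.grid_sample(plane, grid, align_corners=True)  # (1, C, 1, P)
            feats.append(f.squeeze(0).squeeze(1).t())           # (P, C)
        return self.color_mlp(torch.cat(feats, dim=-1))         # (P, 3) RGB

class DensityField(nn.Module):
    # Density lives in its own small MLP, mirroring the color/density split.
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, xyz):
        return F.softplus(self.mlp(xyz))             # non-negative sigma

pts = torch.rand(1024, 3) * 2 - 1                    # points in the unit cube
rgb, sigma = TriplaneColorField()(pts), DensityField()(pts)
```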
Paper & Project Links
Summary
This paper proposes TriDF, an efficient hybrid 3D representation for remote sensing novel view synthesis (NVS) from only a few input views. TriDF decouples color and volume density information and models them independently, reducing the computational burden on the implicit radiance field and accelerating reconstruction. It explores the potential of the triplane representation for few-shot NVS by mapping high-frequency color information onto this compact structure, and direct optimization of the feature planes significantly speeds up convergence. Volume density is modeled as a continuous density field that incorporates reference features from neighboring views via image-based rendering to compensate for the limited input data. A depth-guided optimization based on point clouds further mitigates overfitting in few-shot NVS. Experiments show that, compared with NeRF-based methods, TriDF achieves a 30x speedup while also clearly improving rendering quality metrics.
Key Takeaways
- TriDF is a hybrid 3D representation for remote sensing novel view synthesis (NVS).
- TriDF performs efficient NVS from only a few input views.
- TriDF decouples color and volume density information, reducing computation and accelerating reconstruction.
- TriDF handles high-frequency color information with a triplane representation and gains speed by directly optimizing the feature planes.
- Volume density is modeled as a continuous density field that fuses reference features from neighboring views to compensate for the limited input data.
- Depth-guided optimization based on point clouds addresses overfitting in few-shot NVS.

DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Authors: Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang
Reconstructing clean, distractor-free 3D scenes from real-world captures remains a significant challenge, particularly in highly dynamic and cluttered settings such as egocentric videos. To tackle this problem, we introduce DeGauss, a simple and robust self-supervised framework for dynamic scene reconstruction based on a decoupled dynamic-static Gaussian Splatting design. DeGauss models dynamic elements with foreground Gaussians and static content with background Gaussians, using a probabilistic mask to coordinate their composition and enable independent yet complementary optimization. DeGauss generalizes robustly across a wide range of real-world scenarios, from casual image collections to long, dynamic egocentric videos, without relying on complex heuristics or extensive supervision. Experiments on benchmarks including NeRF-on-the-go, ADT, AEA, Hot3D, and EPIC-Fields demonstrate that DeGauss consistently outperforms existing methods, establishing a strong baseline for generalizable, distractor-free 3D reconstruction in highly dynamic, interaction-rich environments.
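As a rough illustration of the decoupled design, the sketch below composites a foreground (dynamic) render and a background (static) render through a per-pixel probabilistic mask and scores the blend against the observed frame. The L1 loss and the sparsity weight are assumptions, not the paper's exact objective.

```python
import torch

def composite_loss(fg_render, bg_render, mask_logits, target, beta=0.01):
    """fg_render, bg_render, target: (3, H, W); mask_logits: (1, H, W)."""
    m = torch.sigmoid(mask_logits)                # probability a pixel is dynamic
    composite = m * fg_render + (1.0 - m) * bg_render
    photo = (composite - target).abs().mean()     # L1 photometric term
    # Keep the mask sparse so static content is explained by the background.
    return photo + beta * m.mean()
```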
Paper & Project Links
Summary
This paper introduces DeGauss, a simple and robust self-supervised framework for dynamic scene reconstruction based on a decoupled dynamic-static Gaussian Splatting design. DeGauss models dynamic elements with foreground Gaussians and static content with background Gaussians, and uses a probabilistic mask to coordinate their composition, enabling independent yet complementary optimization. It performs robustly across a wide range of real-world scenarios, from casual image collections to long, dynamic egocentric videos, without relying on complex heuristics or heavy supervision. Experiments on benchmarks including NeRF-on-the-go, ADT, AEA, Hot3D, and EPIC-Fields show that DeGauss consistently outperforms existing methods, establishing a strong baseline for distractor-free 3D reconstruction in highly dynamic, interaction-rich environments.
Key Takeaways
- DeGauss is a self-supervised dynamic scene reconstruction framework suited to complex real-world scenes.
- The framework models dynamic and static elements with separate foreground and background Gaussians.
- A probabilistic mask coordinates the composition of dynamic and static elements, enabling independent yet complementary optimization.
- DeGauss handles a wide range of real-world inputs, from casual image collections to long, dynamic egocentric videos.
- The framework needs no complex heuristics or heavy supervision and generalizes strongly.
- Experiments on multiple benchmarks show DeGauss excels in highly dynamic, interaction-rich environments.

DivCon-NeRF: Generating Augmented Rays with Diversity and Consistency for Few-shot View Synthesis
Authors: Ingyun Lee, Jae Won Jang, Seunghyeon Seo, Nojun Kwak
Neural Radiance Field (NeRF) has shown remarkable performance in novel view synthesis but requires many multiview images, making it impractical for few-shot scenarios. Ray augmentation was proposed to prevent overfitting for sparse training data by generating additional rays. However, existing methods, which generate augmented rays only near the original rays, produce severe floaters and appearance distortion due to limited viewpoints and inconsistent rays obstructed by nearby obstacles and complex surfaces. To address these problems, we propose DivCon-NeRF, which significantly enhances both diversity and consistency. It employs surface-sphere augmentation, which preserves the distance between the original camera and the predicted surface point. This allows the model to compare the order of high-probability surface points and filter out inconsistent rays easily without requiring the exact depth. By introducing inner-sphere augmentation, DivCon-NeRF randomizes angles and distances for diverse viewpoints, further increasing diversity. Consequently, our method significantly reduces floaters and visual distortions, achieving state-of-the-art performance on the Blender, LLFF, and DTU datasets. Our code will be publicly available.
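A hypothetical sketch of the surface-sphere augmentation idea follows: new ray origins are drawn on a sphere centered at the predicted surface point, with radius equal to the original camera-to-surface distance so that distance is preserved, and each new ray is aimed back at the surface point. The sample count and the absence of any cone-angle limit are simplifications.

```python
import torch

def surface_sphere_augment(origin, direction, depth, n_aug=8):
    """origin (3,), unit direction (3,), scalar depth -> origins, dirs (n_aug, 3)."""
    surface = origin + depth * direction             # predicted surface point
    u = torch.randn(n_aug, 3)                        # uniform directions on the
    u = u / u.norm(dim=-1, keepdim=True)             # sphere via normalized Gaussians
    new_origins = surface + depth * u                # camera-surface distance kept
    new_dirs = surface - new_origins                 # aim each ray at the surface
    new_dirs = new_dirs / new_dirs.norm(dim=-1, keepdim=True)
    return new_origins, new_dirs

o, d = torch.zeros(3), torch.tensor([0.0, 0.0, 1.0])
aug_o, aug_d = surface_sphere_augment(o, d, depth=2.5)
```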
Paper & Project Links
PDF 11 pages, 6 figures
Summary
NeRF delivers excellent novel view synthesis but relies on many multiview images, which is impractical in few-shot settings. Ray augmentation was proposed to counter overfitting on sparse training data by generating additional rays, but existing methods generate augmented rays only near the original rays, so limited viewpoints and rays made inconsistent by nearby obstacles and complex surfaces produce severe floaters and appearance distortion. DivCon-NeRF addresses these problems by substantially improving both diversity and consistency. Its surface-sphere augmentation preserves the distance between the original camera and the predicted surface point, allowing the model to compare the ordering of high-probability surface points and easily filter out inconsistent rays without exact depth. Inner-sphere augmentation randomizes angles and distances for more diverse viewpoints, further increasing diversity. The method significantly reduces floaters and visual distortion and achieves state-of-the-art performance on the Blender, LLFF, and DTU datasets. The code will be made publicly available.
Key Takeaways
- NeRF excels at novel view synthesis from many views but is limited in few-shot settings.
- Existing ray augmentation generates additional rays only near the original rays, causing floaters and appearance distortion.

3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction
Authors: Peizhen Zheng, Longfei Wei, Dongjing Jiang, Jianfei Zhang
The accurate reconstruction of dynamic street scenes is critical for applications in autonomous driving, augmented reality, and virtual reality. Traditional methods relying on dense point clouds and triangular meshes struggle with moving objects, occlusions, and real-time processing constraints, limiting their effectiveness in complex urban environments. While multi-view stereo and neural radiance fields have advanced 3D reconstruction, they face challenges in computational efficiency and handling scene dynamics. This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction. Our approach introduces an adaptive transparency mechanism that eliminates moving objects while preserving high-fidelity static scene details. Additionally, iterative refinement of Gaussian point distribution enhances geometric accuracy and texture representation. We integrate directional encoding with spatial position optimization to optimize storage and rendering efficiency, reducing redundancy while maintaining scene integrity. Experimental results demonstrate that our method achieves high reconstruction quality, improved rendering performance, and adaptability in large-scale dynamic environments. These contributions establish a robust framework for real-time, high-precision 3D reconstruction, advancing the practicality of dynamic scene modeling across multiple applications. The source code for this work is available to the public at https://github.com/deepcoxcom/3dgs
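The abstract does not spell out the adaptive transparency rule, so the following is a speculative sketch of one consistent reading: Gaussians whose renders disagree with the images persistently across frames (likely moving objects) have their opacity faded toward zero. The residual statistic, threshold, and step size are all assumptions.

```python
import torch

def adaptive_transparency_step(opacity, accum_residual, thresh=0.2, step=0.05):
    """opacity, accum_residual: (N,) per-Gaussian tensors in [0, 1]."""
    moving = accum_residual > thresh     # persistently inconsistent -> likely moving
    opacity = opacity.clone()
    opacity[moving] = (opacity[moving] - step).clamp(min=0.0)  # fade them out
    return opacity
```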
Paper & Project Links
Summary
This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction. An adaptive transparency mechanism removes moving objects while preserving high-fidelity static scene detail, and iterative refinement of the Gaussian point distribution improves geometric accuracy and texture representation. Integrating directional encoding with spatial position optimization improves storage and rendering efficiency. The method achieves high reconstruction quality, strong rendering performance, and adaptability to large-scale dynamic environments, providing a robust framework for real-time, high-precision 3D reconstruction and advancing the practicality of dynamic scene modeling across applications.
Key Takeaways
- Proposes a 3D Gaussian point distribution method for dynamic street scene reconstruction.
- An adaptive transparency mechanism removes moving objects while preserving static scene detail.
- Iterative refinement of the Gaussian point distribution improves geometric accuracy and texture representation.
- Directional encoding is combined with spatial position optimization to improve storage and rendering efficiency.
- Achieves high reconstruction quality, strong rendering performance, and adaptability to large-scale dynamic environments.
- Provides a robust framework for real-time, high-precision 3D reconstruction.

Memory-Efficient 3D High-Resolution Medical Image Synthesis Using CRF-Guided GANs
Authors: Mahshid Shiri, Alessandro Bruno, Daniele Loiacono
Generative Adversarial Networks (GANs) have many potential medical imaging applications. Due to the limited memory of Graphical Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images; these models cannot scale to high resolution or are susceptible to patchy artifacts. In this work, we propose an end-to-end novel GAN architecture that uses a Conditional Random Field (CRF) to model dependencies so that it can generate consistent 3D medical images without exhausting memory. To achieve this purpose, the generator is divided into two parts during training: the first part produces an intermediate representation, and CRF is applied to this intermediate representation to capture correlations. The second part of the generator produces a random sub-volume of the image using a subset of the intermediate representation. This structure has two advantages: first, the correlations are modeled by using the features that the generator is trying to optimize. Second, the generator can generate full high-resolution images during inference. Experiments on Lung CTs and Brain MRIs show that our architecture outperforms the state-of-the-art while it has lower memory usage and less complexity.
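The sketch below illustrates the two-part generator under stated assumptions: the first stage emits an intermediate feature volume, a random depth slab of it is selected, and the second stage decodes only that sub-volume, so a full high-resolution volume never has to fit in GPU memory during training. A 3D average-pooling layer merely stands in for the CRF; shapes and channel counts are placeholders.

```python
import torch
import torch.nn as nn

class TwoStageGenerator(nn.Module):
    def __init__(self, z_dim=128, feat=32):
        super().__init__()
        self.feat = feat
        self.g1 = nn.Sequential(nn.Linear(z_dim, feat * 8 * 8 * 8), nn.ReLU())
        self.smooth = nn.AvgPool3d(3, stride=1, padding=1)  # CRF stand-in
        self.g2 = nn.Sequential(
            nn.ConvTranspose3d(feat, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, z, slab=2):
        vol = self.g1(z).view(-1, self.feat, 8, 8, 8)   # intermediate volume
        vol = self.smooth(vol)                          # short-range dependencies
        d0 = torch.randint(0, 8 - slab + 1, (1,)).item()
        sub = vol[:, :, d0:d0 + slab]                   # random feature sub-volume
        return self.g2(sub)                             # decode only this slab

x = TwoStageGenerator()(torch.randn(2, 128))            # -> (2, 1, 8, 32, 32)
```

At inference, one would decode every slab of the smoothed intermediate volume and concatenate the outputs along depth to assemble the full-resolution image.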
Paper & Project Links
PDF Accepted to Artificial Intelligence for Healthcare Applications, 3rd International Workshop ICPR 2024
Summary
This paper proposes a novel end-to-end GAN architecture that uses a Conditional Random Field (CRF) to model dependencies, generating consistent 3D medical images without heavy memory use. During training the generator is split into two parts: the first produces an intermediate representation to which the CRF is applied to capture correlations, and the second generates a random sub-volume of the image from a subset of that representation. This structure both models correlations using the very features the generator is optimizing and allows full high-resolution images to be generated at inference time. In experiments on lung CTs and brain MRIs, the architecture outperforms the state of the art while using less memory and having lower complexity.
Key Takeaways
- Proposes a novel GAN architecture for generating consistent 3D medical images.
- Uses a Conditional Random Field (CRF) to model dependencies, making the generated images more coherent.
- The generator is split into two parts during training: the first produces an intermediate representation whose correlations the CRF captures; the second generates a random image sub-volume.
- High-resolution medical images can be generated without heavy memory use.
- In experiments on lung CTs and brain MRIs, the architecture surpasses existing techniques.
- The proposed architecture reduces both memory usage and complexity.

Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios
Authors: Zikang Yuan, Yuechuan Pu, Hongcheng Luo, Fengtian Lang, Cheng Chi, Teng Li, Yingying Shen, Haiyang Sun, Bing Wang, Xin Yang
Ensuring the safety of autonomous vehicles necessitates comprehensive simulation of multi-sensor data, encompassing inputs from both cameras and LiDAR sensors, across various dynamic driving scenarios. Neural rendering techniques, which utilize collected raw sensor data to simulate these dynamic environments, have emerged as a leading methodology. While NeRF-based approaches can uniformly represent scenes for rendering data from both camera and LiDAR, they are hindered by slow rendering speeds due to dense sampling. Conversely, Gaussian Splatting-based methods employ Gaussian primitives for scene representation and achieve rapid rendering through rasterization. However, these rasterization-based techniques struggle to accurately model non-linear optical sensors. This limitation restricts their applicability to sensors beyond pinhole cameras. To address these challenges and enable unified representation of dynamic driving scenarios using Gaussian primitives, this study proposes a novel hybrid approach. Our method utilizes rasterization for rendering image data while employing Gaussian ray-tracing for LiDAR data rendering. Experimental results on public datasets demonstrate that our approach outperforms current state-of-the-art methods. This work presents a unified and efficient solution for realistic simulation of camera and LiDAR data in autonomous driving scenarios using Gaussian primitives, offering significant advancements in both rendering quality and computational efficiency.
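To make the LiDAR half concrete, here is a generic closed-form derivation (not the paper's implementation) of ray-traced Gaussian response: along a ray o + t·d, the Mahalanobis distance to an anisotropic 3D Gaussian is quadratic in t, so the depth of peak response is available analytically.

```python
import numpy as np

def peak_response_depth(o, d, mu, cov):
    """Ray o + t*d against Gaussian N(mu, cov); returns (t_peak, response)."""
    prec = np.linalg.inv(cov)
    # (o + t*d - mu)^T P (o + t*d - mu) is quadratic in t; minimize analytically.
    t = ((mu - o) @ prec @ d) / (d @ prec @ d)
    x = o + t * d - mu
    return t, float(np.exp(-0.5 * x @ prec @ x))

t, resp = peak_response_depth(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                              np.array([0.1, 0.0, 5.0]), 0.05 * np.eye(3))
```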
Paper & Project Links
PDF 10 pages
Summary
Neural rendering, which re-simulates dynamic driving environments from collected raw sensor data, has become the leading methodology for autonomous driving simulation. NeRF-based approaches can represent a scene uniformly and render data for both camera and LiDAR, but dense sampling makes rendering slow; Gaussian Splatting methods render quickly via rasterization of Gaussian primitives but struggle to accurately model non-linear optical sensors. This study proposes a hybrid method that uses rasterization for camera data and Gaussian ray tracing for LiDAR data, yielding a unified representation. Experiments on public datasets show it outperforms current state-of-the-art methods, offering a unified and efficient solution for realistic camera and LiDAR simulation with improved rendering quality and computational efficiency.
Key Takeaways
- Autonomous vehicle safety requires comprehensive simulation of multi-sensor data, including camera and LiDAR inputs across diverse dynamic driving scenarios.
- Neural rendering has become the leading approach to simulating dynamic environments for autonomous driving.
- NeRF-based methods can represent scenes uniformly and render both camera and LiDAR data, but rendering is slow.
- Gaussian Splatting-based methods render quickly but struggle to accurately model non-linear optical sensors.
- This study proposes a hybrid method combining rasterization with Gaussian ray tracing to address both issues.
- Experiments on public datasets show the method outperforms existing techniques.

LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting
Authors: Qifeng Chen, Sheng Yang, Sicong Du, Tao Tang, Peng Chen, Yuchi Huo
We present LiDAR-GS, a Gaussian Splatting (GS) method for real-time, high-fidelity re-simulation of LiDAR scans in public urban road scenes. Recent GS methods proposed for cameras have achieved significant advancements in real-time rendering beyond Neural Radiance Fields (NeRF). However, applying GS representation to LiDAR, an active 3D sensor type, poses several challenges that must be addressed to preserve high accuracy and unique characteristics. Specifically, LiDAR-GS designs a differentiable laser beam splatting, using range-view representation for precise surface splatting by projecting lasers onto micro cross-sections, effectively eliminating artifacts associated with local affine approximations. Furthermore, LiDAR-GS leverages Neural Gaussian Representation, which further integrates view-dependent clues, to represent key LiDAR properties that are influenced by the incident direction and external factors. Combining these practices with some essential adaptations, e.g., dynamic instances decomposition, LiDAR-GS succeeds in simultaneously re-simulating depth, intensity, and ray-drop channels, achieving state-of-the-art results in both rendering frame rate and quality on publicly available large scene datasets when compared with the methods using explicit mesh or implicit NeRF. Our source code is publicly available at https://www.github.com/cqf7419/LiDAR-GS.
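The range-view representation mentioned above is the standard LiDAR range-image projection; a small NumPy sketch is shown below, with field-of-view bounds and image size as placeholder values (roughly those of a 64-beam sensor).

```python
import numpy as np

def to_range_image(points, h=64, w=1024,
                   fov_up=np.deg2rad(2.0), fov_down=np.deg2rad(-24.8)):
    """points: (N, 3) -> (h, w) range image holding the nearest return per pixel."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(y, x)                             # azimuth in [-pi, pi]
    inc = np.arcsin(z / np.maximum(r, 1e-9))          # inclination angle
    u = ((1.0 - (az + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up - inc) / (fov_up - fov_down) * h).clip(0, h - 1).astype(int)
    img = np.full((h, w), np.inf)
    np.minimum.at(img, (v, u), r)                     # keep the closest return
    return img
```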
Paper & Project Links
Summary
LiDAR-GS is a Gaussian Splatting method for real-time, high-fidelity re-simulation of LiDAR scans in public urban road scenes. It addresses the challenges of applying GS to LiDAR, an active 3D sensor, through differentiable laser-beam splatting on a range-view representation and a Neural Gaussian Representation that incorporates view-dependent cues, preserving high accuracy and LiDAR's unique characteristics. LiDAR-GS re-simulates depth, intensity, and ray-drop channels simultaneously and achieves state-of-the-art rendering frame rate and quality on publicly available large-scene datasets.
Key Takeaways
- LiDAR-GS is a Gaussian Splatting method for real-time, high-fidelity re-simulation of LiDAR scans in public urban road scenes.
- It tackles the challenges of applying GS to LiDAR through differentiable laser-beam splatting and a range-view representation.
- Its Neural Gaussian Representation integrates view-dependent cues to capture LiDAR properties influenced by incident direction and external factors.
- LiDAR-GS successfully re-simulates the depth, intensity, and ray-drop channels.
- Compared with explicit-mesh and implicit-NeRF methods, it achieves state-of-the-art rendering frame rate and quality on publicly available large-scene datasets.
- The source code is publicly available.

Fast Global Localization on Neural Radiance Field
Authors: Mangyu Kong, Seongwon Lee, Jaewon Lee, Euntai Kim
Neural Radiance Fields (NeRF) presented a novel way to represent scenes, allowing for high-quality 3D reconstruction from 2D images. Following its remarkable achievements, global localization within NeRF maps is an essential task for enabling a wide range of applications. Recently, Loc-NeRF demonstrated a localization approach that combines traditional Monte Carlo Localization with NeRF, showing promising results for using NeRF as an environment map. However, despite its advancements, Loc-NeRF encounters the challenge of a time-intensive ray rendering process, which can be a significant limitation in practical applications. To address this issue, we introduce Fast Loc-NeRF, which leverages a coarse-to-fine approach to enable more efficient and accurate NeRF map-based global localization. Specifically, Fast Loc-NeRF matches rendered pixels and observed images on a multi-resolution from low to high resolution. As a result, it speeds up the costly particle update process while maintaining precise localization results. Additionally, to reject the abnormal particles, we propose particle rejection weighting, which estimates the uncertainty of particles by exploiting NeRF’s characteristics and considers them in the particle weighting process. Our Fast Loc-NeRF sets new state-of-the-art localization performances on several benchmarks, convincing its accuracy and efficiency.
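A schematic sketch of the coarse-to-fine weighting step follows, with the NeRF renderer abstracted as a callable: particles are scored on a low-resolution image first, the least consistent fraction is rejected, and the survivors are re-scored at higher resolution. The error-to-weight mapping and the rejection ratio are assumptions, not the paper's exact formulation.

```python
import numpy as np

def coarse_to_fine_weights(particles, observe, render, scales=(8, 2), keep=0.5):
    """particles: (N, pose_dim). observe(s) returns the image downscaled by s;
    render(pose, s) renders the NeRF at that pose and downscale factor."""
    alive = np.arange(len(particles))
    weights = np.full(len(particles), 1.0 / len(particles))
    for s in scales:                                  # low resolution first
        obs = observe(s)
        err = np.array([np.mean((render(particles[i], s) - obs) ** 2)
                        for i in alive])
        w = np.exp(-err / (err.mean() + 1e-8))
        weights[alive] = w
        # Particle rejection: drop the least consistent fraction before refining.
        order = np.argsort(-w)
        alive = alive[order[:max(1, int(len(alive) * keep))]]
    return weights / weights.sum()
```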
Paper & Project Links
PDF Accepted at ICRA 2025
Summary
NeRF offers a scene representation that enables high-quality 3D reconstruction from 2D images, and global localization within NeRF maps is essential for a wide range of applications. Existing approaches such as Loc-NeRF, which combines Monte Carlo Localization with NeRF, suffer from a time-intensive ray rendering process. Fast Loc-NeRF tackles this with a coarse-to-fine strategy that matches rendered pixels against observed images at multiple resolutions, from low to high, accelerating the costly particle update while preserving precise localization. A particle rejection weighting scheme additionally estimates particle uncertainty by exploiting NeRF's characteristics and folds it into the weighting process. Fast Loc-NeRF sets new state-of-the-art localization performance on several benchmarks, demonstrating both accuracy and efficiency.
Key Takeaways
- NeRF provides a scene representation that enables high-quality 3D reconstruction from 2D images.
- Loc-NeRF combines traditional Monte Carlo Localization with NeRF, demonstrating the potential of global localization on NeRF maps.
- Fast Loc-NeRF resolves Loc-NeRF's time-intensive rendering with a coarse-to-fine refinement strategy.
- It matches rendered pixels and observed images across resolutions, speeding up particle updates while keeping localization precise.
- A particle rejection weighting scheme exploits NeRF's characteristics to estimate particle uncertainty and improve performance.
- Fast Loc-NeRF performs best on several benchmarks, confirming its accuracy and efficiency.

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis
Authors: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source at a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability of characterizing the entire scene environment such as room geometry, material properties, and the spatial relation between the listener and sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the space relation from the listener and sound source. To make the visual scene model audio adaptive, we propose a point densification and pruning strategy to optimally distribute the Gaussian points, with the per-point contribution in sound propagation (e.g., more points needed for texture-less wall surfaces as they affect sound path diversion). Extensive experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
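As a loose illustration (the feature layout and pooling are assumptions, not the paper's design), the sketch below pools a condition vector from Gaussian points by combining per-point audio-guidance features with each point's direction and distance to the listener and the sound source.

```python
import torch

def audio_condition(points, audio_feat, listener, source):
    """points: (N, 3); audio_feat: (N, C) per-point audio-guidance parameters;
    listener, source: (3,) positions -> pooled condition vector of size C + 8."""
    to_l, to_s = listener - points, source - points
    d_l = to_l.norm(dim=-1, keepdim=True)
    d_s = to_s.norm(dim=-1, keepdim=True)
    rel = torch.cat([to_l / d_l, to_s / d_s, d_l, d_s], dim=-1)   # (N, 8)
    per_point = torch.cat([audio_feat, rel], dim=-1)
    w = 1.0 / (d_l + d_s)          # nearby geometry matters more to the sound path
    return (per_point * w).sum(0) / w.sum()

cond = audio_condition(torch.rand(500, 3), torch.rand(500, 16),
                       torch.tensor([0.0, 0.0, 1.5]), torch.tensor([2.0, 1.0, 1.0]))
```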
Paper & Project Links
PDF Accepted to NeurIPS 2024
Summary
This paper proposes an Audio-Visual Gaussian Splatting (AV-GS) model for novel view acoustic synthesis, addressing the low efficiency of NeRF-based methods and their limited ability to characterize the scene environment. AV-GS learns an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, yielding a material- and geometry-aware condition for audio synthesis that accounts for the spatial relation between listener and sound source. A point densification and pruning strategy distributes the Gaussian points according to each point's contribution to sound propagation, making the visual scene model audio-adaptive. Extensive experiments show AV-GS outperforms existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
Key Takeaways
- Novel view acoustic synthesis (NVAS) renders binaural audio at any target viewpoint from mono audio emitted by a source in a 3D scene.
- Existing methods use NeRF-based implicit models conditioned on visual cues but suffer from low efficiency and limited characterization of the scene environment.
- The proposed AV-GS model uses an explicit point-based scene representation with audio-guidance parameters, accounting for material, geometry, and the listener-source spatial relation.
- AV-GS optimizes the distribution of Gaussian points through a densification and pruning strategy, improving the model's audio adaptivity.
- The model outperforms existing methods on real-world and simulated datasets.
- AV-GS better captures the details of sound propagation; for example, texture-less wall surfaces need more points because they affect sound path diversion.

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Authors: Yao Ni, Piotr Koniusz
Generative Adversarial Networks (GANs) significantly advanced image generation but their performance heavily depends on abundant training data. In scenarios with limited data, GANs often struggle with discriminator overfitting and unstable training. Batch Normalization (BN), despite being known for enhancing generalization and training stability, has rarely been used in the discriminator of Data-Efficient GANs. Our work addresses this gap by identifying a critical flaw in BN: the tendency for gradient explosion during the centering and scaling steps. To tackle this issue, we present CHAIN (lipsCHitz continuity constrAIned Normalization), which replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint in the scaling step. CHAIN further enhances GAN training by adaptively interpolating the normalized and unnormalized features, effectively avoiding discriminator overfitting. Our theoretical analyses firmly establishes CHAIN’s effectiveness in reducing gradients in latent features and weights, improving stability and generalization in GAN training. Empirical evidence supports our theory. CHAIN achieves state-of-the-art results in data-limited scenarios on CIFAR-10/100, ImageNet, five low-shot and seven high-resolution few-shot image datasets. Code: https://github.com/MaxwellYaoNi/CHAIN
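A hedged sketch of the normalization scheme follows: the forward pass scales without centering and instead returns a zero-mean penalty, the scaling gain is clamped to bound its Lipschitz constant, and a learned coefficient interpolates normalized and raw features. The constants and the exact form of the constraint are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class CHAINNorm1d(nn.Module):
    def __init__(self, dim, tau=1.0):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))     # per-feature gain
        self.alpha = nn.Parameter(torch.tensor(0.5))   # interpolation weight
        self.tau = tau                                 # bound on the gain

    def forward(self, x):                              # x: (B, dim)
        var = x.var(dim=0, unbiased=False)
        x_hat = x / torch.sqrt(var + 1e-5)             # scale only, no centering
        gain = self.gamma.clamp(-self.tau, self.tau)   # Lipschitz-constrained scale
        a = self.alpha.clamp(0.0, 1.0)
        out = a * (gain * x_hat) + (1.0 - a) * x       # adaptive interpolation
        zero_mean_penalty = (x.mean(dim=0) ** 2).mean()  # replaces hard centering
        return out, zero_mean_penalty

y, reg = CHAINNorm1d(64)(torch.randn(8, 64))
```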
Paper & Project Links
PDF Accepted by CVPR 2024. 26 pages. Code: https://github.com/MaxwellYaoNi/CHAIN
Summary
This paper studies GAN performance under limited data and improves the use of Batch Normalization (BN) in the GAN discriminator. It identifies a critical flaw in BN, namely gradient explosion during the centering and scaling steps, and proposes CHAIN, a new normalization that replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint into the scaling step. CHAIN adaptively interpolates normalized and unnormalized features, effectively avoiding discriminator overfitting. Theoretical analysis establishes CHAIN's effectiveness in reducing gradients of latent features and weights, improving the stability and generalization of GAN training. Experiments on multiple datasets show CHAIN achieves state-of-the-art results in data-limited settings.
Key Takeaways
- GANs degrade when data is limited, facing discriminator overfitting and unstable training.
- Batch Normalization (BN) improves generalization and training stability but has a critical flaw when used in GAN discriminators.
- CHAIN addresses this by replacing the conventional centering step with zero-mean regularization and integrating a Lipschitz continuity constraint into scaling.
- CHAIN adaptively interpolates normalized and unnormalized features, effectively avoiding discriminator overfitting.
- Theoretical analysis confirms that CHAIN reduces gradients in latent features and weights.
- Experiments on multiple datasets show CHAIN performs strongly in data-limited scenarios.

Efficient Learning With Sine-Activated Low-rank Matrices
Authors: Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey
Low-rank decomposition has emerged as a vital tool for enhancing parameter efficiency in neural network architectures, gaining traction across diverse applications in machine learning. These techniques significantly lower the number of parameters, striking a balance between compactness and performance. However, a common challenge has been the compromise between parameter efficiency and the accuracy of the model, where reduced parameters often lead to diminished accuracy compared to their full-rank counterparts. In this work, we propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. This approach not only preserves the benefits of the parameter efficiency characteristic of low-rank methods but also increases the decomposition’s rank, thereby enhancing model performance. Our method proves to be a plug-in enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF) and 3D shape modelling.
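One plausible reading of the construction is sketched below in PyTorch: an elementwise sinusoid with frequency `omega` is applied to the low-rank product before it is used as the weight, raising the effective rank at no extra parameter cost. Exactly where the sine sits and the value of `omega` are assumptions here, not the paper's prescription.

```python
import torch
import torch.nn as nn

class SineLowRankLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, omega=30.0):
        super().__init__()
        self.u = nn.Parameter(torch.randn(d_out, rank) / rank ** 0.5)
        self.v = nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5)
        self.omega = omega

    def weight(self):
        # sin(omega * U @ V) generally has much higher rank than U @ V itself.
        return torch.sin(self.omega * (self.u @ self.v))

    def forward(self, x):                 # x: (B, d_in)
        return x @ self.weight().t()

y = SineLowRankLinear(64, 32)(torch.randn(4, 64))   # -> (4, 32)
```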
Paper & Project Links
PDF The first two authors contributed equally. Paper accepted at ICLR 2025
Summary
Low-rank decomposition has become a vital tool for improving parameter efficiency in neural network architectures and is widely used across machine learning, reducing parameter counts while balancing compactness and performance. However, the gain in parameter efficiency usually comes at the cost of accuracy. This work proposes a novel theoretical framework that integrates a sinusoidal function into the low-rank decomposition process, preserving the parameter efficiency of low-rank methods while increasing the decomposition's rank and thereby improving model performance. The method works as a plug-in enhancement for existing low-rank models and has been successfully applied to Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF), and 3D shape modeling.
Key Takeaways
- Low-rank decomposition plays an important role in neural architectures by improving parameter efficiency.
- Existing low-rank methods trade parameter efficiency against model accuracy.
- The proposed framework integrates a sinusoidal function to strengthen the low-rank decomposition.
- The method improves model performance while retaining parameter efficiency.
- The approach is general, applying to ViT, LLMs, NeRF, and 3D shape modeling.
- It can serve as a plug-in enhancement for existing low-rank models.
