⚠️ 以下所有内容总结均由大语言模型生成,如有错误仅供参考,请谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目 ChatPaperFree 对您有帮助,还请给我们一些鼓励!⭐️ HuggingFace 免费体验
2025-11-05 更新
SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction
Authors:Wenfeng Huang, Xiangyun Liao, Yinling Qian, Hao Liu, Yongming Yang, Wenjing Jia, Qiong Wang
Surgical reconstruction of dynamic tissues from endoscopic videos is a crucial technology in robot-assisted surgery. The development of Neural Radiance Fields (NeRFs) has greatly advanced deformable tissue reconstruction, achieving high-quality results from video and image sequences. However, reconstructing deformable endoscopic scenes remains challenging due to aliasing and artifacts caused by tissue movement, which can significantly degrade visualization quality. The introduction of 3D Gaussian Splatting (3DGS) has improved reconstruction efficiency by enabling a faster rendering pipeline. Nevertheless, existing 3DGS methods often prioritize rendering speed while neglecting these critical issues. To address these challenges, we propose SAGS, a self-adaptive alias-free Gaussian splatting framework. We introduce an attention-driven, dynamically weighted 4D deformation decoder, leveraging 3D smoothing filters and 2D Mip filters to mitigate artifacts in deformable tissue reconstruction and better capture the fine details of tissue movement. Experimental results on two public benchmarks, EndoNeRF and SCARED, demonstrate that our method achieves superior performance in all metrics of PSNR, SSIM, and LPIPS compared to the state of the art while also delivering better visualization quality.
从内窥镜视频中重建动态组织是机器人辅助手术中的一项关键技术。神经辐射场(NeRF)的发展极大地推动了可变形组织的重建,能够从视频和图像序列中获得高质量的结果。然而,由于组织运动导致的混叠和伪影会严重降低可视化质量,重建可变形的内窥镜场景仍然具有挑战性。三维高斯泼溅(3DGS)的引入通过更快的渲染流程提高了重建效率。然而,现有的3DGS方法往往优先考虑渲染速度而忽略了这些关键问题。为了应对这些挑战,我们提出了SAGS,一种自适应无混叠高斯泼溅框架。我们引入了一种注意力驱动、动态加权的四维变形解码器,利用三维平滑滤波器和二维Mip滤波器来减少可变形组织重建中的伪影,并更好地捕捉组织运动的细节。在EndoNeRF和SCARED两个公开基准测试上的实验结果表明,我们的方法在峰值信噪比(PSNR)、结构相似性(SSIM)和学习感知图像块相似度(LPIPS)各项指标上均优于现有最先进方法,同时提供了更好的可视化质量。
论文及项目相关链接
Summary
从内窥镜视频中重建动态组织是机器人辅助手术中的关键技术。为应对因组织运动导致的混叠和伪影问题,本文提出一种自适应无混叠高斯泼溅框架SAGS。该框架引入了一种注意力驱动、动态加权的4D变形解码器,并利用3D平滑滤波器和2D Mip滤波器来减少伪影、捕捉组织运动的细微细节。实验结果表明,与现有技术相比,该方法在PSNR、SSIM和LPIPS等所有指标上均表现优越,可视化质量更高。
Key Takeaways
- NeRF技术对于机器人辅助手术中的动态组织重建至关重要。
- 现有重建方法在处理内窥镜视频时面临混叠和伪影问题。
- SAGS框架通过引入自适应无混叠高斯泼溅技术解决了这些问题。
- SAGS使用动态加权的4D变形解码器,关注组织运动的细微细节。
- 该方法结合3D平滑滤波器和2D Mip滤波器来优化重建效果。
- 在公共基准测试上的实验结果表明,SAGS在各项指标上均优于现有技术。
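上文反复提到的 PSNR、SSIM、LPIPS 是新视角合成中最常用的三个评测指标。下面给出一个最小化的指标计算示意(假设预测图与真值图都是取值在 [0, 1] 的 float32 numpy 数组,且环境中装有 scikit-image ≥ 0.19 与 lpips 包);它并非论文的官方评测代码,仅用于说明这些指标通常如何计算。

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# 实际评测时应复用同一个 LPIPS 模型,避免重复加载权重
lpips_fn = lpips.LPIPS(net="alex")

def evaluate_view(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred / gt: (H, W, 3) float32,取值范围 [0, 1]。"""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
    # LPIPS 期望输入是取值 [-1, 1] 的 NCHW 张量
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None] * 2 - 1
    lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```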
点此查看论文截图
Impact of AlN buffer thickness on electrical and thermal characteristics of AlGaN/GaN/AlN HEMTs
Authors:Minho Kim, Dat Q. Tran, Plamen P. Paskov, U. Choi, O. Nam, Vanya Darakchieva
We investigate the influence of AlN buffer thickness on the structural, electrical, and thermal properties of AlGaN/GaN high-electron mobility transistors (HEMTs) grown on semi-insulating SiC substrates by metal-organic chemical vapor deposition. X-ray diffraction and atomic force microscopy reveal that while thin AlN layers (120 nm) exhibit compressive strain and smooth step-flow surfaces, thicker single-layer buffers (550 nm) develop tensile strain and increased surface roughness. Multi-layer buffer structures up to 2 μm alleviate strain and maintain surface integrity. Low-temperature Hall measurements confirm that electron mobility decreases with increasing interface roughness, with the highest mobility observed in the structure with a thin AlN buffer. Transient thermoreflectance measurements show that thermal conductivity (ThC) of the AlN buffer increases with the thickness, reaching 188 W/m·K at 300 K for the 2 μm buffer layer, which is approximately 60% of the bulk AlN ThC value. These results highlight the importance of optimizing AlN buffer design to balance strain relaxation, thermal management, and carrier transport for high-performance GaN-based HEMTs.
我们研究了AlN缓冲层厚度对通过金属有机化学气相沉积生长在半绝缘SiC衬底上的AlGaN/GaN高电子迁移率晶体管(HEMT)的结构、电学和热学性质的影响。X射线衍射和原子力显微镜观察发现,较薄的AlN层(120纳米)表现出压缩应变和光滑的台阶流生长表面,而较厚的单层缓冲层(550纳米)则表现出拉伸应变和更高的表面粗糙度。多层缓冲结构(厚度可达2微米)可以缓解应变并保持表面完整性。低温霍尔测量结果表明,随着界面粗糙度的增加,电子迁移率降低,在具有薄AlN缓冲层的结构中观察到最高的迁移率。瞬态热反射测量表明,AlN缓冲层的热导率随厚度增加而增加,在300 K时,2微米缓冲层的热导率达到188 W/m·K,约为AlN体材料热导率的60%。这些结果强调了优化AlN缓冲层设计、在应变松弛、热管理和载流子传输之间取得平衡的重要性,从而实现高性能的GaN基HEMT。
论文及项目相关链接
PDF 5 pages, 5 figures
Summary
这篇论文研究了AlN缓冲层厚度对AlGaN/GaN高电子迁移率晶体管(HEMTs)的结构、电学和热学性质的影响。晶体管通过金属有机化学气相沉积在半绝缘SiC衬底上生长。研究发现,薄AlN层(120nm)表现出压缩应变和光滑的表面形态,而较厚的单层缓冲层(550nm)则表现出拉伸应变和增加的表面粗糙度。多层缓冲结构(最高达2μm)可以缓解应变并保持表面完整性。低温霍尔测量表明,电子迁移率随界面粗糙度的增加而降低,在具有薄AlN缓冲层的结构中观察到最高迁移率。瞬态热反射测量显示,AlN缓冲层的热导率随厚度增加而提高,2μm缓冲层的热导率在300K时达到188 W/m·K,约为AlN体材料热导率的60%。这些结果强调了优化AlN缓冲层设计的重要性,以实现应变松弛、热管理和载流子传输之间的平衡,提高基于GaN的HEMTs的性能。
Key Takeaways
- AlN缓冲层厚度对AlGaN/GaN HEMT的结构、电学和热学性质有显著影响。
- 薄AlN层表现出压缩应变和光滑表面,而厚层则出现拉伸应变和表面粗糙。
- 多层缓冲结构能够缓解应变并保持表面完整性。
- 电子迁移率随界面粗糙度增加而降低,在薄AlN缓冲层结构中达到最高。
- AlN缓冲层的热导率随厚度增加而提高。
- 2μm缓冲层的热导率接近AlN体材料的60%。
点此查看论文截图
BOLT-GAN: Bayes-Optimal Loss for Stable GAN Training
Authors:Mohammadreza Tavasoli Naeini, Ali Bereyhi, Morteza Noshad, Ben Liang, Alfred O. Hero III
We introduce BOLT-GAN, a simple yet effective modification of the WGAN framework inspired by the Bayes Optimal Learning Threshold (BOLT). We show that with a Lipschitz continuous discriminator, BOLT-GAN implicitly minimizes a different metric distance than the Earth Mover (Wasserstein) distance and achieves better training stability. Empirical evaluations on four standard image generation benchmarks (CIFAR-10, CelebA-64, LSUN Bedroom-64, and LSUN Church-64) show that BOLT-GAN consistently outperforms WGAN, achieving 10-60% lower Frechet Inception Distance (FID). Our results suggest that BOLT is a broadly applicable principle for enhancing GAN training.
我们介绍了BOLT-GAN,它是对WGAN框架的一种简单而有效的修改,灵感来源于贝叶斯最优学习阈值(BOLT)。我们表明,在判别器满足Lipschitz连续的条件下,BOLT-GAN隐式地最小化了一种不同于推土机(Earth Mover,即Wasserstein)距离的度量距离,并实现了更好的训练稳定性。在四个标准的图像生成基准测试(CIFAR-10、CelebA-64、LSUN Bedroom-64和LSUN Church-64)上的经验评估表明,BOLT-GAN持续优于WGAN,将Fréchet Inception Distance(FID)降低了10%-60%。我们的结果表明,BOLT是一个可广泛适用于改进GAN训练的原则。
论文及项目相关链接
Summary
本文介绍了基于贝叶斯最优学习阈值(BOLT)启发的WGAN框架改进版本——BOLT-GAN。在判别器满足Lipschitz连续的条件下,BOLT-GAN隐式地最小化了一种不同于推土机(Wasserstein)距离的度量距离,并实现了更好的训练稳定性。在四个标准图像生成基准测试上的经验评估表明,BOLT-GAN在性能上持续优于WGAN,将Fréchet Inception Distance(FID)降低了10%~60%。研究结果表明,BOLT是一个可广泛适用于改进GAN训练的原则。
Key Takeaways
- BOLT-GAN是基于贝叶斯最优学习阈值(BOLT)启发的WGAN框架的改进版本。
- 在判别器满足Lipschitz连续的条件下,BOLT-GAN隐式地最小化了一种不同于推土机(Wasserstein)距离的度量距离。
- BOLT-GAN实现了更好的训练稳定性。
- 在四个标准图像生成基准测试上,BOLT-GAN的性能持续优于WGAN。
- BOLT-GAN降低了10%~60%的Frechet Inception Distance(FID)。
- 经验评估表明,BOLT是一个可广泛适用于改进GAN训练的原则。
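摘要没有给出 BOLT 损失的具体形式,这里只给出它所修改的基线,即带 Lipschitz 约束(此处用梯度惩罚近似实现)的 WGAN 判别器/生成器损失的 PyTorch 示意,帮助理解"在这一基线上替换训练目标"的含义;网络结构与超参数均为假设,并非 BOLT-GAN 本身。

```python
import torch

def wgan_gp_losses(D, G, real, z, gp_weight=10.0):
    """标准 WGAN-GP 基线一步的判别器/生成器损失(示意,非 BOLT-GAN)。"""
    fake = G(z)
    # critic 损失:最大化 E[D(real)] - E[D(fake)],写成最小化形式
    d_loss = D(fake.detach()).mean() - D(real).mean()
    # 梯度惩罚:在真假样本的插值点上把梯度范数拉向 1,近似 1-Lipschitz 约束
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake.detach()).requires_grad_(True)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    d_loss = d_loss + gp_weight * ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    # 生成器损失:最大化 E[D(fake)]
    g_loss = -D(fake).mean()
    return d_loss, g_loss
```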
点此查看论文截图
OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields
Authors:Lisa Weijler, Sebastian Koch, Fabio Poiesi, Timo Ropinski, Pedro Hermosilla
Modeling the inherent hierarchical structure of 3D objects and 3D scenes is highly desirable, as it enables a more holistic understanding of environments for autonomous agents. Accomplishing this with implicit representations, such as Neural Radiance Fields, remains an unexplored challenge. Existing methods that explicitly model hierarchical structures often face significant limitations: they either require multiple rendering passes to capture embeddings at different levels of granularity, significantly increasing inference time, or rely on predefined, closed-set discrete hierarchies that generalize poorly to the diverse and nuanced structures encountered by agents in the real world. To address these challenges, we propose OpenHype, a novel approach that represents scene hierarchies using a continuous hyperbolic latent space. By leveraging the properties of hyperbolic geometry, OpenHype naturally encodes multi-scale relationships and enables smooth traversal of hierarchies through geodesic paths in latent space. Our method outperforms state-of-the-art approaches on standard benchmarks, demonstrating superior efficiency and adaptability in 3D scene understanding.
对三维物体和三维场景固有的层次结构进行建模非常有价值,因为它能让自主智能体对环境形成更全面的理解。然而,使用隐式表示(如神经辐射场)来实现这一点仍然是一个未被探索的挑战。现有的显式建模层次结构的方法常常面临重大局限性:它们要么需要多次渲染以捕获不同粒度级别的嵌入,从而显著增加推理时间;要么依赖于预设的、封闭集合的离散层次结构,难以泛化到智能体在现实世界中遇到的多样而细微的结构。为了解决这些挑战,我们提出了OpenHype这一新方法,使用连续的双曲潜空间来表示场景层次结构。通过利用双曲几何的特性,OpenHype自然地编码了多尺度关系,并通过潜空间中的测地线路径实现了层次结构的平滑遍历。我们的方法在标准基准测试中优于当前最先进的方法,展示了其在三维场景理解中的优越效率和适应性。
论文及项目相关链接
Summary
本文提出了OpenHype方法,利用连续的双曲潜在空间表示场景层次结构,通过双曲几何的属性自然地编码多尺度关系,并通过潜在空间中的测地线路径实现平滑遍历层次结构。该方法在标准基准测试中表现出卓越的性能,证明了其在3D场景理解中的高效性和适应性。
Key Takeaways
- OpenHype方法利用连续的双曲潜在空间进行场景层次建模。
- 多尺度关系通过双曲几何的自然属性进行编码。
- 通过潜在空间中的测地线路径实现层次结构的平滑遍历。
- 与现有方法相比,OpenHype在3D场景理解方面表现出卓越的性能。
- OpenHype方法提高了效率并增强了适应性。
- 该方法能够处理现实世界中的多样化和细微层次结构。
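摘要提到用连续双曲潜空间编码层次、并沿测地线遍历层次结构。下面给出庞加莱球模型(曲率取 -1)中测地距离与原点处指数映射的通用 numpy 示意,这是双曲嵌入的标准数学,并非 OpenHype 的官方实现;直觉上,靠近球心的嵌入对应较粗的层次,靠近边界的嵌入对应较细的层次。

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-7) -> float:
    """庞加莱球中两点的测地距离,要求 ||u||, ||v|| < 1。"""
    duv = np.sum((u - v) ** 2)
    denom = max((1 - np.sum(u * u)) * (1 - np.sum(v * v)), eps)
    return float(np.arccosh(1 + 2 * duv / denom))

def expmap0(t: np.ndarray) -> np.ndarray:
    """原点处的指数映射:把欧氏切向量映射进庞加莱球内部。"""
    norm = np.linalg.norm(t) + 1e-7
    return np.tanh(norm) * t / norm

root = expmap0(np.array([0.01, 0.0]))   # 靠近球心:较粗粒度
leaf = expmap0(np.array([2.0, 1.0]))    # 靠近边界:较细粒度
print(poincare_distance(root, leaf))
```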
点此查看论文截图
GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering
Authors:Alexander Valverde, Brian Xu, Yuyin Zhou, Meng Xu, Hongyun Wang
Scene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting demonstrates strong performance on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely due to the inherent limitations of sparse 3D training data. In this work, we propose GauSSmart, a hybrid method that effectively bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundational models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes. Our results demonstrate the significant potential of hybrid 2D-3D approaches, highlighting how the thoughtful combination of 2D foundational models with 3D reconstruction pipelines can overcome the limitations inherent in either approach alone.
场景重建已成为计算机视觉的核心挑战,神经辐射场(NeRF)和高斯拼贴(Gaussian Splatting)等方法取得了显著进展。尽管高斯拼贴在大型数据集上表现出强大的性能,但在稀疏覆盖的区域捕捉精细细节或保持真实性方面往往遇到困难,这主要是由于稀疏的3D训练数据固有的局限性所致。在本研究中,我们提出了一种混合方法GauSSmart,它有效地桥接了2D基础模型和3D高斯拼贴重建。我们的方法集成了成熟的2D计算机视觉技术,包括凸过滤和来自基础模型(如DINO)的语义特征监督,以增强基于高斯的场景重建。通过利用2D分割先验和高维特征嵌入,我们的方法引导高斯拼贴的致密化和精细化,改进了欠代表区域的覆盖并保留了复杂结构细节。我们在三个数据集上验证了我们的方法,GauSSmart在大多数评估场景中始终优于现有高斯拼贴。我们的结果证明了混合2D-3D方法的巨大潜力,突出了如何将2D基础模型与3D重建管道相结合,以克服单一方法的固有局限性。
论文及项目相关链接
Summary
本文提出了一种名为GauSSmart的混合方法,该方法结合了二维基础模型和三维高斯喷绘重建技术,旨在解决场景重建中的精细细节捕捉和稀疏覆盖区域的真实感保持问题。通过利用二维分割先验和高维特征嵌入,GauSSmart能够指导高斯喷绘的密集化和精细化,改善覆盖不足区域的重建并保留精细的结构细节。在三个数据集上的验证表明,GauSSmart在多数评估场景中均优于现有的高斯喷绘技术,显示出混合二维-三维方法的巨大潜力。
Key Takeaways
- Neural Radiance Fields (NeRF) 和 Gaussian Splatting 等方法在场景重建中取得显著进展。
- Gaussian Splatting在大规模数据集上表现强劲,但在捕捉精细细节和保持稀疏区域的真实感方面存在挑战。
- GauSSmart是一种混合方法,结合了二维基础模型和三维高斯喷绘重建技术。
- GauSSmart利用二维分割先验和高维特征嵌入,指导高斯喷绘的密集化和精细化。
- GauSSmart在欠代表区域的覆盖和精细结构细节的保留方面表现出色。
- 在三个数据集上的验证表明,GauSSmart在多数评估场景中优于现有高斯喷绘技术。
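摘要中的"凸过滤"可以按一种常见方式理解:把某视角下基础模型给出的分割掩膜取凸包,剔除投影后落在凸包之外的高斯中心。下面是这一思路的 scipy 示意(函数名与流程均为假设,并非论文的确切算法);实际使用时可只取掩膜边界像素来构建凸包以降低计算量。

```python
import numpy as np
from scipy.spatial import Delaunay

def convex_filter(points_2d: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """points_2d: (N, 2) 高斯中心投影到该视角的像素坐标 (x, y);
    mask: (H, W) 布尔分割掩膜。返回落在掩膜像素凸包内的布尔索引。"""
    ys, xs = np.nonzero(mask)
    hull_pts = np.stack([xs, ys], axis=1).astype(np.float64)
    tri = Delaunay(hull_pts)                    # 对掩膜像素做三角剖分(其外边界即凸包)
    return tri.find_simplex(points_2d) >= 0     # find_simplex 返回 -1 表示在凸包之外

# 用法示意:keep = convex_filter(proj_centers, seg_mask); gaussians = gaussians[keep]
```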
点此查看论文截图
NGGAN: Noise Generation GAN Based on the Practical Measurement Dataset for Narrowband Powerline Communications
Authors:Ying-Ren Chien, Po-Heng Chou, You-Jie Peng, Chun-Yuan Huang, Hen-Wai Tsao, Yu Tsao
To effectively process impulse noise for narrowband powerline communications (NB-PLCs) transceivers, capturing comprehensive statistics of nonperiodic asynchronous impulsive noise (APIN) is a critical task. However, existing mathematical noise generative models only capture part of the characteristics of noise. In this study, we propose a novel generative adversarial network (GAN) called noise generation GAN (NGGAN) that learns the complicated characteristics of practically measured noise samples for data synthesis. To closely match the statistics of complicated noise over the NB-PLC systems, we measured the NB-PLC noise via the analog coupling and bandpass filtering circuits of a commercial NB-PLC modem to build a realistic dataset. To train NGGAN, we adhere to the following principles: 1) we design the length of input signals that the NGGAN model can fit to facilitate cyclostationary noise generation; 2) the Wasserstein distance is used as a loss function to enhance the similarity between the generated noise and training data; and 3) to measure the similarity performances of GAN-based models based on the mathematical and practically measured datasets, we conduct both quantitative and qualitative analyses. The training datasets include: 1) a piecewise spectral cyclostationary Gaussian model (PSCGM); 2) a frequency-shift (FRESH) filter; and 3) practical measurements from NB-PLC systems. Simulation results demonstrate that the generated noise samples from the proposed NGGAN are highly close to the real noise samples. The principal component analysis (PCA) scatter plots and Fr'echet inception distance (FID) analysis have shown that NGGAN outperforms other GAN-based models by generating noise samples with superior fidelity and higher diversity.
针对窄带电力线通信(NB-PLC)收发器中的脉冲噪声处理,捕获非周期性异步脉冲噪声(APIN)的综合统计信息是一项关键任务。然而,现有的数学噪声生成模型只能捕捉噪声的部分特征。在本研究中,我们提出了一种名为噪声生成GAN(NGGAN)的新型生成对抗网络,通过学习实际测量噪声样本的复杂特征来进行数据合成。为了紧密匹配NB-PLC系统上的复杂噪声统计信息,我们通过商业NB-PLC调制解调器的模拟耦合和带通滤波电路测量NB-PLC噪声,以构建现实数据集。为了训练NGGAN,我们遵循以下原则:1)设计NGGAN模型能够适应的输入信号长度,以促进循环平稳噪声生成;2)使用Wasserstein距离作为损失函数,提高生成噪声与训练数据之间的相似性;3)为了测量基于数学和实际测量数据集的GAN模型的相似性性能,我们进行了定量和定性分析。训练数据集包括:1)分段谱循环平稳高斯模型(PSCGM);2)频率偏移(FRESH)滤波器;3)来自NB-PLC系统的实际测量。仿真结果表明,所提出的NGGAN生成的噪声样本与真实噪声样本高度接近。主成分分析(PCA)散点图和Fréchet inception距离(FID)分析表明,NGGAN生成的噪声样本具有更高的保真度和多样性,优于其他基于GAN的模型。
论文及项目相关链接
PDF 16 pages, 15 figures, 11 tables, and published in IEEE Transactions on Instrumentation and Measurement, 2025
Summary
本文提出一种针对窄带电力线通信(NB-PLC)的噪声生成对抗网络(NGGAN),用于捕捉非周期性异步脉冲噪声(APIN)的综合统计信息。通过商业NB-PLC调制解调器的模拟耦合和带通滤波电路采集实际噪声数据,建立现实数据集。采用Wasserstein距离作为损失函数,增强生成噪声与训练数据之间的相似性。仿真结果表明,NGGAN生成的噪声样本与真实噪声样本高度接近,且通过主成分分析(PCA)散点图和Fréchet inception距离(FID)分析,表现出优于其他GAN模型的保真度和多样性。
Key Takeaways
- 提出使用NGGAN网络处理NB-PLC中的脉冲噪声,该网络能捕捉非周期性异步脉冲噪声的综合统计信息。
- 通过商业NB-PLC调制解调器采集实际噪声数据,建立现实数据集以匹配NB-PLC系统中的复杂噪声统计。
- 采用Wasserstein距离作为损失函数,提高生成噪声与训练数据之间的相似性。
- NGGAN生成的噪声样本与真实样本高度接近。
- NGGAN在生成噪声样本的保真度和多样性上优于其他GAN模型。
- 采用PCA散点图和FID分析来评估模型性能。
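摘要用 PCA 散点图直观比较生成噪声与实测噪声的分布。下面是用 scikit-learn 与 matplotlib 复现这种对比图的最小示意(假设 real_noise 与 fake_noise 是形状相同的二维数组,每行为一段等长噪声波形);FID 的计算从略。

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def pca_scatter(real_noise: np.ndarray, fake_noise: np.ndarray) -> None:
    """real_noise / fake_noise: (N, L),每行是一段等长的噪声片段。"""
    pca = PCA(n_components=2).fit(real_noise)   # 只用实测样本拟合主成分
    r2, f2 = pca.transform(real_noise), pca.transform(fake_noise)
    plt.scatter(r2[:, 0], r2[:, 1], s=4, label="measured")
    plt.scatter(f2[:, 0], f2[:, 1], s=4, label="generated")
    plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend(); plt.show()
```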
点此查看论文截图
MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild
Authors:Deming Li, Kaiwen Jiang, Yutao Tang, Ravi Ramamoorthi, Rama Chellappa, Cheng Peng
In-the-wild photo collections often contain limited volumes of imagery and exhibit multiple appearances, e.g., taken at different times of day or seasons, posing significant challenges to scene reconstruction and novel view synthesis. Although recent adaptations of Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have improved in these areas, they tend to oversmooth and are prone to overfitting. In this paper, we present MS-GS, a novel framework designed with Multi-appearance capabilities in Sparse-view scenarios using 3DGS. To address the lack of support due to sparse initializations, our approach is built on the geometric priors elicited from monocular depth estimations. The key lies in extracting and utilizing local semantic regions with a Structure-from-Motion (SfM) points anchored algorithm for reliable alignment and geometry cues. Then, to introduce multi-view constraints, we propose a series of geometry-guided supervision steps at virtual views in pixel and feature levels to encourage 3D consistency and reduce overfitting. We also introduce a dataset and an in-the-wild experiment setting to set up more realistic benchmarks. We demonstrate that MS-GS achieves photorealistic renderings under various challenging sparse-view and multi-appearance conditions, and outperforms existing approaches significantly across different datasets.
野外拍摄的照片集通常只包含有限数量的图像,并且呈现出多种外观,例如在一天的不同时间或不同季节拍摄,这给场景重建和新视角合成带来了重大挑战。尽管最近对神经辐射场(NeRF)和三维高斯泼溅(3DGS)的改进在这些方面有所改善,但它们往往过于平滑并容易过拟合。在本文中,我们提出了MS-GS,这是一个面向稀疏视图场景、具备多外观能力的基于3DGS的新型框架。为了解决稀疏初始化导致的支撑不足问题,我们的方法建立在从单目深度估计中获得的几何先验之上。关键在于通过以运动恢复结构(SfM)点为锚的算法提取和利用局部语义区域,以获得可靠的对齐和几何线索。然后,为了引入多视图约束,我们在像素和特征层面提出了一系列在虚拟视角下进行的几何引导监督步骤,以鼓励三维一致性并减少过拟合。我们还引入了一个数据集和一个野外实验设置,以建立更现实的基准测试。我们证明,MS-GS在各种具有挑战性的稀疏视图和多外观条件下实现了逼真的渲染,并且在不同的数据集上显著优于现有方法。
论文及项目相关链接
Summary
本文提出了一种基于3DGS的新型框架MS-GS,用于处理野外照片集中稀疏视角和多场景外观下的重建和合成。它通过利用几何先验信息来解决因稀疏初始化而带来的不足,采用基于单目深度估计的结构化建模。引入多重几何指导的约束方法以实现虚拟视图的精准对齐和几何一致性,减少过拟合现象。实验证明,MS-GS在多种稀疏视角和多场景外观条件下实现了逼真的渲染效果,并在不同数据集上显著优于现有方法。
Key Takeaways
- 野外照片集中存在多种挑战,如不同时间点和季节的图像拍摄造成的复杂场景重建和视角合成问题。
- MS-GS框架基于单目深度估计提供的几何先验,缓解了稀疏初始化带来的支撑不足问题,以应对复杂的多外观场景。
- 框架利用以运动恢复结构(SfM)点为锚的算法提取局部语义区域,实现可靠对齐和几何线索提取。
- 为实现多视角约束,引入了一系列几何指导的监督步骤,实现虚拟视角的精准渲染并减少过拟合现象。
- 提出了一种新的数据集和实验设置,以建立更现实的基准测试环境。
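摘要提到该方法建立在单目深度估计给出的几何先验上,并用 SfM 点做锚定。单目深度通常只有相对尺度,一种常见做法是用稀疏 SfM 深度以最小二乘求尺度与平移把它对齐到度量尺度;下面是这一通用步骤的 numpy 示意,并非论文的确切算法。

```python
import numpy as np

def align_mono_depth(mono: np.ndarray, sfm_depth: np.ndarray, sfm_uv: np.ndarray) -> np.ndarray:
    """mono: (H, W) 单目相对深度;sfm_depth: (M,) SfM 点深度;
    sfm_uv: (M, 2) 对应像素坐标 (x, y)。求 scale、shift 使 scale*mono+shift ≈ SfM 深度。"""
    u, v = sfm_uv[:, 0].astype(int), sfm_uv[:, 1].astype(int)
    x = mono[v, u]
    A = np.stack([x, np.ones_like(x)], axis=1)          # (M, 2) 最小二乘设计矩阵
    (scale, shift), *_ = np.linalg.lstsq(A, sfm_depth, rcond=None)
    return scale * mono + shift
```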
点此查看论文截图
MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Authors:Yaopeng Lou, Liao Shen, Tianqi Liu, Jiaqi Li, Zihao Huang, Huiqiang Sun, Zhiguo Cao
We present Multi-Baseline Gaussian Splatting (MuGS), a generalized feed-forward approach for novel view synthesis that effectively handles diverse baseline settings, including sparse input views with both small and large baselines. Specifically, we integrate features from Multi-View Stereo (MVS) and Monocular Depth Estimation (MDE) to enhance feature representations for generalizable reconstruction. Next, We propose a projection-and-sampling mechanism for deep depth fusion, which constructs a fine probability volume to guide the regression of the feature map. Furthermore, We introduce a reference-view loss to improve geometry and optimization efficiency. We leverage 3D Gaussian representations to accelerate training and inference time while enhancing rendering quality. MuGS achieves state-of-the-art performance across multiple baseline settings and diverse scenarios ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K). We also demonstrate promising zero-shot performance on the LLFF and Mip-NeRF 360 datasets. Code is available at https://github.com/EuclidLou/MuGS.
我们提出了多基线高斯拼贴(MuGS),这是一种用于新视角合成的通用前馈方法,能够有效处理多种基线设置,包括基线或小或大的稀疏输入视图。具体来说,我们整合了多视图立体(MVS)和单目深度估计(MDE)的特征,以增强可泛化重建的特征表示。接着,我们提出了一种用于深度融合的投影-采样机制,构建精细的概率体来指导特征图的回归。此外,我们引入了参考视图损失,以改善几何质量并提高优化效率。我们利用三维高斯表示来加速训练和推理,同时提高渲染质量。MuGS在多种基线设置和各种场景中实现了最先进的性能,涵盖从简单对象(DTU)到复杂的室内和室外场景(RealEstate10K)。我们还在LLFF和Mip-NeRF 360数据集上展示了有前景的零样本性能。代码可在https://github.com/EuclidLou/MuGS获取。
论文及项目相关链接
PDF This work is accepted by ICCV 2025
Summary
本文提出一种名为Multi-Baseline Gaussian Splatting(MuGS)的新型视图合成广义前馈方法,适用于多种基线设置,包括输入视图稀疏、基线距离大小不同的情况。MuGS整合了多视角立体(MVS)和单目深度估计(MDE)的特征,提升特征表示的可泛化重建能力。此外,提出一种深度融合投影采样机制,构建精细概率体积以引导特征图的回归。引入参考视图损失以提高几何和优化效率,并利用3D高斯表示加速训练和推理时间,提升渲染质量。MuGS在多种基线设置和简单对象(DTU)到复杂室内室外场景(RealEstate10K)中表现优异,并在LLFF和Mip-NeRF 360数据集上展现出零样本性能。
Key Takeaways
- MuGS是一种新型视图合成方法,适用于多种基线设置。
- 集成MVS和MDE特征,提高特征表示的泛化能力。
- 提出深度融合投影采样机制,构建精细概率体积引导特征图回归。
- 引入参考视图损失,改善几何和优化效率。
- 利用3D高斯表示加速训练和推理过程,提升渲染质量。
- MuGS在多种场景中表现优异,包括简单对象和复杂室内室外场景。
- MuGS在特定数据集上展现出零样本性能。
点此查看论文截图
LiGen: GAN-Augmented Spectral Fingerprinting for Indoor Positioning
Authors:Jie Lin, Hsun-Yu Lee, Ho-Ming Li, Fang-Jing Wu
Accurate and robust indoor localization is critical for smart building applications, yet existing Wi-Fi-based systems are often vulnerable to environmental conditions. This work presents a novel indoor localization system, called LiGen, that leverages the spectral intensity patterns of ambient light as fingerprints, offering a more stable and infrastructure-free alternative to radio signals. To address the limited spectral data, we design a data augmentation framework based on generative adversarial networks (GANs), featuring two variants: PointGAN, which generates fingerprints conditioned on coordinates, and FreeGAN, which uses a weak localization model to label unconditioned samples. Our positioning model, leveraging a Multi-Layer Perceptron (MLP) architecture to train on synthesized data, achieves submeter-level accuracy, outperforming Wi-Fi-based baselines by over 50%. LiGen also demonstrates strong robustness in cluttered environments. To the best of our knowledge, this is the first system to combine spectral fingerprints with GAN-based data augmentation for indoor localization.
精确且稳定的室内定位对于智能建筑应用至关重要,然而现有的基于Wi-Fi的系统通常容易受到环境条件的干扰。这项工作提出了一种新型的室内定位系统,名为LiGen,它利用环境光的光谱强度模式作为指纹,为无线电信号提供了一种更稳定且无基础设施的替代方案。为了解决光谱数据有限的问题,我们设计了一个基于生成对抗网络(GANs)的数据增强框架,其中包含两种变体:PointGAN,它根据坐标生成指纹;FreeGAN,它使用一个弱定位模型来标记无条件的样本。我们的定位模型采用多层感知器(MLP)架构在合成数据上进行训练,实现了亚米级精度,比基于Wi-Fi的基线高出50%以上。LiGen在杂乱的环境中还表现出了强大的稳健性。据我们所知,这是第一个将光谱指纹与基于GAN的数据增强相结合用于室内定位的系统。
论文及项目相关链接
PDF 6 pages, 10 figures
Summary
室内定位对于智能建筑应用至关重要,但现有Wi-Fi系统易受环境影响。本文提出一种新型室内定位系统LiGen,利用环境光的谱强度模式作为指纹,为无线电信号提供更稳定、无需基础设施的替代方案。为解决谱数据有限问题,设计基于生成对抗网络(GANs)的数据增强框架,包括PointGAN和FreeGAN两种变体。利用多层感知器(MLP)架构在合成数据上进行训练的定位模型,实现亚米级精度,性能较Wi-Fi基线提升超过50%。LiGen在杂乱环境中表现出强大的稳健性,是首个将光谱指纹与基于GAN的数据增强相结合的室内定位系统。
Key Takeaways
- 室内定位对于智能建筑应用非常重要,但现有Wi-Fi系统存在稳定性问题。
- LiGen系统利用环境光的谱强度模式作为指纹,提供稳定的室内定位。
- 设计基于生成对抗网络(GANs)的数据增强框架,解决谱数据有限的问题。
- 开发出PointGAN和FreeGAN两种数据增强变体,用于生成指纹和标签。
- 利用多层感知器(MLP)架构在合成数据上进行训练,实现亚米级定位精度。
- LiGen性能较Wi-Fi基线提升超过50%,并在杂乱环境中表现出强大的稳健性。
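摘要说明定位模型是在真实与 GAN 合成的光谱指纹上训练的多层感知器(MLP)。下面给出一个把光谱强度向量回归为二维坐标的最小 PyTorch 示意;输入维度、网络宽度与层数都是假设值,仅用于说明结构。

```python
import torch
import torch.nn as nn

class SpectralMLP(nn.Module):
    """光谱指纹 -> (x, y) 坐标的回归器(结构为假设的示意)。"""
    def __init__(self, num_bands: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_bands, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, spectrum: torch.Tensor) -> torch.Tensor:
        return self.net(spectrum)

model = SpectralMLP(num_bands=18)                 # 18 个谱段仅为示意
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# 训练循环示意:loss = nn.functional.mse_loss(model(spectra), coords)
```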
点此查看论文截图
PlantSegNeRF: A few-shot, cross-species method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching
Authors:Xin Yang, Ruiming Du, Hanyang Huang, Jiayang Xie, Pengyao Xie, Leisen Fang, Ziyue Guo, Nanjun Jiang, Yu Jiang, Haiyan Cen
Organ segmentation of plant point clouds is a prerequisite for the high-resolution and accurate extraction of organ-level phenotypic traits. Although the fast development of deep learning has boosted much research on segmentation of plant point clouds, the existing techniques for organ segmentation still face limitations in resolution, segmentation accuracy, and generalizability across various plant species. In this study, we proposed a novel approach called plant segmentation neural radiance fields (PlantSegNeRF), aiming to directly generate high-precision instance point clouds from multi-view RGB image sequences for a wide range of plant species. PlantSegNeRF performed 2D instance segmentation on the multi-view images to generate instance masks for each organ with a corresponding ID. The multi-view instance IDs corresponding to the same plant organ were then matched and refined using a specially designed instance matching module. The instance NeRF was developed to render an implicit scene, containing color, density, semantic and instance information. The implicit scene was ultimately converted into high-precision plant instance point clouds based on the volume density. The results proved that in semantic segmentation of point clouds, PlantSegNeRF outperformed the commonly used methods, demonstrating an average improvement of 16.1%, 18.3%, 17.8%, and 24.2% in precision, recall, F1-score, and IoU compared to the second-best results on structurally complex species. More importantly, PlantSegNeRF exhibited significant advantages in plant point cloud instance segmentation tasks. Across all plant species, it achieved average improvements of 11.7%, 38.2%, 32.2% and 25.3% in mPrec, mRec, mCov, mWCov, respectively. This study extends the organ-level plant phenotyping and provides a high-throughput way to supply high-quality 3D data for the development of large-scale models in plant science.
植物点云器官分割是高分辨率和精确提取器官水平表型特征的前提。尽管深度学习的快速发展促进了植物点云分割的研究,但现有的器官分割技术仍面临分辨率、分割精度和跨物种泛化能力的局限性。本研究提出了一种新的方法,称为植物分割神经辐射场(PlantSegNeRF),旨在直接从多视角RGB图像序列为广泛的植物物种生成高精度实例点云。PlantSegNeRF在多视角图像上进行2D实例分割,为每个器官生成具有相应ID的实例掩膜。然后,使用专门设计的实例匹配模块匹配和细化对应于同一植物器官的多视角实例ID。开发了实例NeRF来呈现包含颜色、密度、语义和实例信息的隐式场景。最终,基于体积密度将隐式场景转换为高精度植物实例点云。结果证明,在点云语义分割中,PlantSegNeRF优于常用方法,在结构复杂的物种上,与第二好的结果相比,精度、召回率、F1分数和IoU平均提高了16.1%、18.3%、17.8%和24.2%。更重要的是,PlantSegNeRF在植物点云实例分割任务中表现出显著优势。在所有植物物种中,mPrec、mRec、mCov和mWCov平均提高了11.7%、38.2%、32.2%和25.3%。该研究扩展了植物表型的器官水平研究,并为植物科学中大规模模型的开发提供了一种提供高质量3D数据的高通量方法。
论文及项目相关链接
摘要
植物点云器官分割是高分辨率和精确提取器官水平表型特征的前提。尽管深度学习快速发展推动了植物点云分割的研究,但现有技术仍面临分辨率、分割精度和跨物种泛化能力的局限。本研究提出了一种名为PlantSegNeRF的新方法,旨在从多视角RGB图像序列直接生成高精度实例点云,适用于广泛植物物种。PlantSegNeRF对多视角图像进行2D实例分割,为每个器官生成实例掩膜并赋予相应ID。利用专门设计的实例匹配模块对同一植物器官的跨视角实例ID进行匹配和细化。开发实例NeRF以呈现包含颜色、密度、语义和实例信息的隐式场景。最终,根据体积密度将隐式场景转换为高精度植物实例点云。结果证明,在点云语义分割方面,PlantSegNeRF较常用方法表现出优势,在结构复杂物种的精度、召回率、F1分数和IoU方面平均提高了16.1%、18.3%、17.8%和24.2%。更重要的是,在植物点云实例分割任务中,PlantSegNeRF具有显著优势,在所有物种中,mPrec、mRec、mCov和mWCov平均提高了11.7%、38.2%、32.2%和25.3%。本研究扩展了植物表型学中的器官水平研究,为植物科学中大规模模型的发展提供了一种产生高质量3D数据的高通量方法。
关键见解
- 植物点云器官分割对于高分辨率和精确提取器官水平表型特征至关重要。
- PlantSegNeRF是一种新颖的方法,可以从多视角RGB图像序列生成高精度实例点云,适用于多种植物物种。
- PlantSegNeRF通过2D实例分割和多视角实例匹配模块提高了分割精度和泛化能力。
- 实例NeRF的引入为呈现包含多种信息的隐式场景提供了新的手段。
- PlantSegNeRF在语义分割和实例分割方面都表现出优于常规方法的性能。
- 该研究扩展了植物表型学研究,特别是在器官水平上的研究。
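摘要提到用专门的实例匹配模块把不同视角下属于同一器官的实例 ID 对齐。跨视角实例匹配的一种常见做法是:把两个视角的实例掩膜换算到同一参考系后计算两两 IoU,再用匈牙利算法求总 IoU 最大的一一匹配;下面是这一通用步骤的示意,并非论文模块的确切实现。

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(masks_a: np.ndarray, masks_b: np.ndarray, iou_thresh: float = 0.3):
    """masks_a: (Na, H, W)、masks_b: (Nb, H, W) 布尔实例掩膜(假设已对齐到同一参考视角)。
    返回匹配对列表 [(i, j)],表示视角A的实例 i 与视角B的实例 j 为同一器官。"""
    iou = np.zeros((len(masks_a), len(masks_b)))
    for i, ma in enumerate(masks_a):
        for j, mb in enumerate(masks_b):
            union = np.logical_or(ma, mb).sum()
            iou[i, j] = np.logical_and(ma, mb).sum() / union if union else 0.0
    rows, cols = linear_sum_assignment(-iou)        # 最大化总 IoU
    return [(i, j) for i, j in zip(rows, cols) if iou[i, j] >= iou_thresh]
```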
点此查看论文截图
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering
Authors:Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt, Christina Tsalicoglou, Michael Niemeyer, Torsten Sattler, Songyou Peng, Federico Tombari
In this work, we present a novel level-of-detail (LOD) method for 3D Gaussian Splatting that enables real-time rendering of large-scale scenes on memory-constrained devices. Our approach introduces a hierarchical LOD representation that iteratively selects optimal subsets of Gaussians based on camera distance, thus largely reducing both rendering time and GPU memory usage. We construct each LOD level by applying a depth-aware 3D smoothing filter, followed by importance-based pruning and fine-tuning to maintain visual fidelity. To further reduce memory overhead, we partition the scene into spatial chunks and dynamically load only relevant Gaussians during rendering, employing an opacity-blending mechanism to avoid visual artifacts at chunk boundaries. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets, delivering high-quality renderings with reduced latency and memory requirements.
在这项工作中,我们为3D高斯泼溅提出了一种新型的细节层次(LOD)方法,能够在内存受限的设备上实现大规模场景的实时渲染。我们的方法引入了一种层次化的LOD表示,根据相机距离迭代选择最优的高斯子集,从而大幅降低渲染时间和GPU内存占用。我们通过应用深度感知的3D平滑滤波器构建每个LOD级别,然后通过基于重要性的修剪和微调来保持视觉保真度。为了进一步降低内存开销,我们将场景划分为空间块,并在渲染时只动态加载相关的高斯基元,同时采用不透明度混合机制以避免块边界处的视觉伪影。我们的方法在户外(Hierarchical 3DGS)和室内(Zip-NeRF)数据集上均实现了领先的性能,在降低延迟和内存需求的同时提供高质量的渲染效果。
论文及项目相关链接
PDF NeurIPS 2025; Web: https://lodge-gs.github.io/
Summary
本文提出了一种针对3D高斯涂抹技术的新型细节层次(LOD)方法,能在内存受限的设备上实现大规模场景的实时渲染。该方法通过引入层次化LOD表示,根据相机距离迭代选择最优的高斯子集,从而大幅减少渲染时间和GPU内存使用。通过深度感知的3D平滑滤波构建每个LOD层次,然后进行基于重要性的修剪和微调以维持视觉保真度。为进一步优化内存开销,将场景分割成空间块,并在渲染时仅动态加载相关的高斯值,采用透明度混合机制避免块边界处的视觉伪影。该方法在户外(Hierarchical 3DGS)和室内(Zip-NeRF)数据集上均表现出卓越性能,可实现高质量渲染,降低延迟和内存要求。
Key Takeaways
- 引入了一种新型的细节层次(LOD)方法,用于优化3D高斯涂抹技术的实时渲染性能。
- 通过层次化LOD表示,根据相机距离选择最优的高斯子集,降低渲染时间和GPU内存使用。
- 利用深度感知的3D平滑滤波构建每个LOD层次,保持视觉保真度。
- 通过重要性修剪和微调技术进一步优化渲染效果。
- 采用场景空间分块和动态加载相关高斯值的方法,降低内存开销。
- 采用了透明度混合机制,避免块边界处的视觉伪影。
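摘要描述的两个关键机制是:按相机距离为每个空间块选择 LOD 层级,以及只动态加载相机附近的块。下面用一个极简的 Python 草图示意这两个选择逻辑;距离阈值、数据结构和 load_chunk 接口均为假设,仅帮助理解"层次化 LOD + 分块按需加载",并非官方实现。

```python
import numpy as np

LOD_MAX_DIST = [4.0, 10.0, 25.0, float("inf")]   # 层级 0(最精细)到 3(最粗糙)的距离上限,示意值
CULL_DIST = 60.0                                  # 超出该距离的块不加载

def select_lod(cam_pos: np.ndarray, chunk_center: np.ndarray) -> int:
    d = float(np.linalg.norm(cam_pos - chunk_center))
    return next(level for level, m in enumerate(LOD_MAX_DIST) if d <= m)

def gather_visible_gaussians(cam_pos, chunks, load_chunk):
    """chunks: [(center, chunk_id)];load_chunk(chunk_id, lod) 按需从磁盘读取该块某一层级的高斯。"""
    active = []
    for center, chunk_id in chunks:
        if np.linalg.norm(cam_pos - center) > CULL_DIST:
            continue                                          # 远处块不进显存
        active.append(load_chunk(chunk_id, select_lod(cam_pos, center)))
    return active
```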
点此查看论文截图
NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References
Authors:Qiang Qu, Yiran Shen, Xiaoming Chen, Yuk Ying Chung, Weidong Cai, Tongliang Liu
Neural View Synthesis (NVS), such as NeRF and 3D Gaussian Splatting, effectively creates photorealistic scenes from sparse viewpoints, typically evaluated by quality assessment methods like PSNR, SSIM, and LPIPS. However, these full-reference methods, which compare synthesized views to reference views, may not fully capture the perceptual quality of neurally synthesized scenes (NSS), particularly due to the limited availability of dense reference views. Furthermore, the challenges in acquiring human perceptual labels hinder the creation of extensive labeled datasets, risking model overfitting and reduced generalizability. To address these issues, we propose NVS-SQA, a NSS quality assessment method to learn no-reference quality representations through self-supervision without reliance on human labels. Traditional self-supervised learning predominantly relies on the “same instance, similar representation” assumption and extensive datasets. However, given that these conditions do not apply in NSS quality assessment, we employ heuristic cues and quality scores as learning objectives, along with a specialized contrastive pair preparation process to improve the effectiveness and efficiency of learning. The results show that NVS-SQA outperforms 17 no-reference methods by a large margin (i.e., on average 109.5% in SRCC, 98.6% in PLCC, and 91.5% in KRCC over the second best) and even exceeds 16 full-reference methods across all evaluation metrics (i.e., 22.9% in SRCC, 19.1% in PLCC, and 18.6% in KRCC over the second best).
神经视图合成(NVS),如NeRF和3D高斯展布技术,能够有效地从稀疏视角生成逼真的场景,通常通过PSNR、SSIM和LPIPS等质量评估方法进行评估。然而,这些全参考方法将合成视图与参考视图进行比较,可能无法完全捕捉神经合成场景(NSS)的感知质量,尤其是因为密集参考视图的可获得性有限。此外,获取人类感知标签的挑战阻碍了大规模标记数据集的创建,从而增加了模型过度拟合和泛化能力降低的风险。为了解决这些问题,我们提出了NVS-SQA,这是一种NSS质量评估方法,通过自监督学习无参考质量表示,而不依赖于人工标签。虽然传统自监督学习主要依赖于“同一实例,相似表示”的假设和大量数据集,但由于这些条件不适用于NSS质量评估,我们采用启发式线索和质量分数作为学习目标,并设计了一个专门的对比对准备过程,以提高学习的有效性和效率。结果表明,NVS-SQA在17种无参考方法中有很大的优势(例如,SRCC平均提高109.5%,PLCC提高98.6%,KRCC提高91.5%);在所有评估指标上,甚至超过了16种全参考方法(例如,SRCC提高22.9%,PLCC提高19.1%,KRCC提高18.6%)。
论文及项目相关链接
PDF Accepted by TPAMI
Summary
神经网络视图合成(NVS)如NeRF和3D高斯喷涂等技术,能从稀疏视角生成逼真的场景。但由于缺乏密集参考视角和难以获取人类感知标签,现有的质量评估方法存在局限性。为此,我们提出NVS-SQA无参考质量评估方法,通过自监督学习无需依赖人类标签学习质量表示。该方法采用启发式线索和质量分数作为学习目标,并设计对比配对准备过程以提高学习和效率。NVS-SQA在性能上大幅超越了现有方法。
Key Takeaways
- 神经网络视图合成(NVS)技术如NeRF能生成逼真的场景,但质量评估存在挑战。
- 由于缺乏密集参考视角和难以获取人类感知标签,现有的全参考质量评估方法可能无法完全捕捉神经合成场景(NSS)的感知质量。
- 提出NVS-SQA方法,通过自监督学习无需依赖人类标签学习无参考质量表示。
- NVS-SQA采用启发式线索和质量分数作为学习目标,并设计对比配对准备过程提高学习和效率。
- NVS-SQA在性能上大幅超越了现有方法,平均在SRCC、PLCC和KRCC上分别高出第二名方法109.5%、98.6%和91.5%。
- NVS-SQA甚至在所有评估指标上都超过了部分全参考方法,显示出其强大的性能。
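摘要强调该方法不依赖人工标注,而是把启发式线索和质量分数当作学习目标。一个与之精神相近(但并非论文确切损失)的做法是:对同一场景的两段渲染结果,若启发式分数认为 A 优于 B,就用排序损失要求质量表示头给 A 更高的分;下面用 PyTorch 的 MarginRankingLoss 给出示意。

```python
import torch
import torch.nn as nn

rank_loss = nn.MarginRankingLoss(margin=0.1)

def heuristic_ranking_loss(score_a: torch.Tensor, score_b: torch.Tensor,
                           heur_a: torch.Tensor, heur_b: torch.Tensor) -> torch.Tensor:
    """score_*: 模型对两组渲染结果预测的质量分;heur_*: 无需人工标注的启发式分数。
    启发式分数相同的样本对在实际使用中可以先过滤掉。"""
    target = torch.sign(heur_a - heur_b)    # +1 表示 A 应得更高分,-1 反之
    return rank_loss(score_a, score_b, target)
```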
点此查看论文截图
UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping
Authors:Yanjie Li, Kaisheng Liang, Bin Xiao
In recent research, adversarial attacks on person detectors using patches or static 3D model-based texture modifications have struggled with low success rates due to the flexible nature of human movement. Modeling the 3D deformations caused by various actions has been a major challenge. Fortunately, advancements in Neural Radiance Fields (NeRF) for dynamic human modeling offer new possibilities. In this paper, we introduce UV-Attack, a groundbreaking approach that achieves high success rates even with extensive and unseen human actions. We address the challenge above by leveraging dynamic-NeRF-based UV mapping. UV-Attack can generate human images across diverse actions and viewpoints, and even create novel actions by sampling from the SMPL parameter space. While dynamic NeRF models are capable of modeling human bodies, modifying clothing textures is challenging because they are embedded in neural network parameters. To tackle this, UV-Attack generates UV maps instead of RGB images and modifies the texture stacks. This approach enables real-time texture edits and makes the attack more practical. We also propose a novel Expectation over Pose Transformation loss (EoPT) to improve the evasion success rate on unseen poses and views. Our experiments show that UV-Attack achieves a 92.7% attack success rate against the FastRCNN model across varied poses in dynamic video settings, significantly outperforming the state-of-the-art AdvCamou attack, which only had a 28.5% ASR. Moreover, we achieve 49.5% ASR on the latest YOLOv8 detector in black-box settings. This work highlights the potential of dynamic NeRF-based UV mapping for creating more effective adversarial attacks on person detectors, addressing key challenges in modeling human movement and texture modification. The code is available at https://github.com/PolyLiYJ/UV-Attack.
在最近的研究中,利用补丁或基于静态3D模型的纹理修改对人员检测器进行对抗性攻击的成功率较低,这是由于人类运动的灵活性所带来的挑战。对由各种动作引起的3D变形进行建模一直是一个巨大的挑战。幸运的是,神经辐射场(NeRF)在动态人体建模方面的进展提供了新的可能性。在本文中,我们介绍了UV-Attack,这是一种突破性的方法,即使面对广泛且未见的人类动作,也能实现较高的成功率。我们通过利用基于动态NeRF的UV映射来解决上述挑战。UV-Attack可以生成各种动作和视角下的人类图像,甚至可以通过采样SMPL参数空间来创建新的动作。虽然动态NeRF模型能够对人体进行建模,但修改衣物纹理却是一个挑战,因为它们被嵌入在神经网络参数中。为了解决这一问题,UV-Attack生成UV贴图而不是RGB图像,并修改纹理堆栈。这种方法实现了实时纹理编辑,使攻击更加实用。我们还提出了一种新颖的基于姿态变换期望的损失(EoPT),以提高在未见姿态和视角下的躲避成功率。我们的实验表明,在动态视频设置中,针对各种姿态的FastRCNN模型,UV-Attack的攻击成功率达到了92.7%,显著优于最先进的AdvCamou攻击,其仅有28.5%的攻击成功率(ASR)。此外,我们在黑箱设置中实现了对最新YOLOv8检测器的49.5%ASR。这项工作突出了基于动态NeRF的UV映射在创建针对人员检测器的更有效对抗性攻击方面的潜力,解决了建模人类运动和纹理修改方面的关键挑战。代码可在 https://github.com/PolyLiYJ/UV-Attack 获取。
论文及项目相关链接
PDF 23 pages, 22 figures, accepted by ICLR2025
Summary
基于NeRF的动态人类建模技术为攻击人物检测器提供了新的可能性。最新研究的UV-Attack方法通过利用动态NeRF的UV映射技术,能够在多种动作和视角下生成人类图像,并创建新型动作。该方法在实时纹理编辑和攻击实用性方面具有优势,成功解决了人体动作和纹理修改建模中的关键挑战。实验表明,UV-Attack在动态视频设置中对FastRCNN模型的攻击成功率达到92.7%,显著优于仅具有28.5%ASR的AdvCamou攻击。此外,在黑色盒子设置下,对最新的YOLOv8检测器的ASR达到49.5%。
Key Takeaways
- UV-Attack利用NeRF技术实现动态人类建模,提高了攻击人物检测器的成功率。
- UV-Attack解决了传统补丁或静态3D模型纹理修改对动态人类动作的建模挑战。
- 通过利用动态NeRF的UV映射技术,UV-Attack可以生成不同动作和视角下的图像并创建新型动作。
- UV-Attack实现了实时纹理编辑,提高了攻击的实用性。
- UV-Attack通过引入期望姿态变换损失(EoPT)提高了未见姿态和视角的逃避成功率。
- 实验显示,UV-Attack在动态视频设置中对FastRCNN模型的攻击成功率远高于其他方法。
点此查看论文截图
GS-ProCams: Gaussian Splatting-based Projector-Camera Systems
Authors:Qingyue Deng, Jijiang Li, Haibin Ling, Bingyao Huang
We present GS-ProCams, the first Gaussian Splatting-based framework for projector-camera systems (ProCams). GS-ProCams is not only view-agnostic but also significantly enhances the efficiency of projection mapping (PM) that requires establishing geometric and radiometric mappings between the projector and the camera. Previous CNN-based ProCams are constrained to a specific viewpoint, limiting their applicability to novel perspectives. In contrast, NeRF-based ProCams support view-agnostic projection mapping, however, they require an additional co-located light source and demand significant computational and memory resources. To address this issue, we propose GS-ProCams that employs 2D Gaussian for scene representations, and enables efficient view-agnostic ProCams applications. In particular, we explicitly model the complex geometric and photometric mappings of ProCams using projector responses, the projection surface’s geometry and materials represented by Gaussians, and the global illumination component. Then, we employ differentiable physically-based rendering to jointly estimate them from captured multi-view projections. Compared to state-of-the-art NeRF-based methods, our GS-ProCams eliminates the need for additional devices, achieving superior ProCams simulation quality. It also uses only 1/10 of the GPU memory for training and is 900 times faster in inference speed. Please refer to our project page for the code and dataset: https://realqingyue.github.io/GS-ProCams/.
我们提出了GS-ProCams,这是基于高斯展布(Gaussian Splatting)的投影仪摄像头系统(ProCams)的首个框架。GS-ProCams不仅不受视角的限制,还能显著提高投影仪映射(PM)的效率,后者需要在投影仪和摄像头之间建立几何和辐射度量映射。以前的基于CNN的ProCams受限于特定的视角,限制了它们在新型视角下的应用。相比之下,基于NeRF的ProCams支持不受视角限制的投影映射,但它们需要额外的共置光源,并需要大量的计算和内存资源。为了解决这个问题,我们提出了GS-ProCams,它采用2D高斯进行场景表示,能够实现高效的视角无关ProCams应用。特别是,我们使用投影仪响应来显式地建模ProCams的复杂几何和光度映射,以及由高斯表示的投影表面的几何形状和材料,和全局照明组件。然后,我们采用基于物理的可微渲染技术,从捕获的多视角投影中联合估计它们。与最先进的基于NeRF的方法相比,我们的GS-ProCams不需要额外的设备,实现了更高的ProCams模拟质量。它的训练只需使用十分之一不到的GPU内存,推理速度也快了900倍。有关代码和数据集,请参见我们的项目页面:https://realqingyue.github.io/GS-ProCams/。
论文及项目相关链接
Summary
提出一种基于高斯泼溅(Gaussian Splatting)的投影仪-相机系统框架GS-ProCams。相较于以往的方法,它不仅与视角无关,还提高了投影映射的效率,且无需额外的共置光源设备。该方法采用二维高斯表示场景,显式建模复杂的几何与光度映射,并通过可微的基于物理的渲染从多视角投影中联合估计这些量。相较于基于NeRF的方法,GS-ProCams模拟质量更优,训练所需GPU内存仅为其约1/10,推理速度快900倍。
Key Takeaways
- GS-ProCams是一个基于高斯泼溅(Gaussian Splatting)的投影仪-相机系统框架。
- 它解决了以往视角相关性的投影仪相机系统的局限。
- 通过采用二维高斯函数表达场景提高了投影映射的效率。
- 构建复杂几何和光度映射模型,包括投影仪响应、投影表面几何和材料属性以及全局照明成分。
- 利用物理渲染技术从多视角投影中联合估计这些参数。
- GS-ProCams相较于基于NeRF的方法在模拟质量上更胜一筹,使用较少的GPU资源并且推理速度更快。
点此查看论文截图
MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians
Authors:Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu
Reconstructing high-fidelity 3D head avatars is crucial in various applications such as virtual reality. The pioneering methods reconstruct realistic head avatars with Neural Radiance Fields (NeRF), which have been limited by training and rendering speed. Recent methods based on 3D Gaussian Splatting (3DGS) significantly improve the efficiency of training and rendering. However, the surface inconsistency of 3DGS results in subpar geometric accuracy; later, 2DGS uses 2D surfels to enhance geometric accuracy at the expense of rendering fidelity. To leverage the benefits of both 2DGS and 3DGS, we propose a novel method named MixedGaussianAvatar for realistically and geometrically accurate head avatar reconstruction. Our main idea is to utilize 2D Gaussians to reconstruct the surface of the 3D head, ensuring geometric accuracy. We attach the 2D Gaussians to the triangular mesh of the FLAME model and connect additional 3D Gaussians to those 2D Gaussians where the rendering quality of 2DGS is inadequate, creating a mixed 2D-3D Gaussian representation. These 2D-3D Gaussians can then be animated using FLAME parameters. We further introduce a progressive training strategy that first trains the 2D Gaussians and then fine-tunes the mixed 2D-3D Gaussians. We use a unified mixed Gaussian representation to integrate the two modalities of 2D image and 3D mesh. Furthermore, the comprehensive experiments demonstrate the superiority of MixedGaussianAvatar. The code will be released.
重建高保真3D头像在虚拟现实中有着广泛应用,具有重要的价值。先驱的方法使用神经辐射场(NeRF)重建逼真的头像,但受限于训练和渲染速度。基于3D高斯拼贴(3DGS)的最新方法显著提高了训练和渲染的效率。然而,3DGS的表面不一致导致几何精度不佳;之后的2DGS使用2D表面来提高几何精度,但牺牲了渲染的保真度。为了结合2DGS和3DGS的优点,我们提出了一种名为MixedGaussianAvatar的新方法,用于真实且几何准确的头像重建。我们的主要想法是使用2D高斯重建3D头像的表面,以确保几何精度。我们将2D高斯附加到FLAME模型的三角网格上,并在2DGS渲染质量不足的地方连接到额外的3D高斯,创建混合的2D-3D高斯表示。这些2D-3D高斯可以使用FLAME参数进行动画处理。我们还引入了一种渐进的训练策略,首先训练2D高斯,然后对混合的2D-3D高斯进行微调。我们使用统一的混合高斯表示来集成2D图像和3D网格的两种模式。此外,综合实验证明了MixedGaussianAvatar的优势。代码将发布。
论文及项目相关链接
Summary
基于神经辐射场(NeRF)技术,重建高保真3D头像在虚拟现实中具有重要意义。现有方法如基于3D高斯拼贴(3DGS)的方法虽然提高了训练和渲染效率,但几何精度有限;而二维高斯拼贴(2DGS)虽然提升了几何精度,但牺牲了渲染质量。本文提出一种名为MixedGaussianAvatar的混合方法,结合二维高斯与三维高斯的优势,确保几何精度与渲染质量。该方法将二维高斯用于重建三维头像表面,将其附着于FLAME模型的三角网格上,并在必要时增加三维高斯以提高渲染质量。此外,采用渐进训练策略,先训练二维高斯,再微调混合二维-三维高斯。实验证明MixedGaussianAvatar的优越性。
Key Takeaways
- MixedGaussianAvatar结合了二维高斯和三维高斯的优势,旨在实现高保真和几何准确的头像重建。
- 方法利用二维高斯重建三维头像表面并附着于FLAME模型的三角网格上。
- 在需要时,通过增加三维高斯来提高渲染质量,形成混合的二维-三维高斯表示。
- 采用渐进训练策略,先训练二维高斯,再微调混合表示。
- 方法通过统一的混合高斯表示集成了二维图像和三维网格的两种模式。
- 实验证明MixedGaussianAvatar在头像重建方面的优越性。
点此查看论文截图
Gaussian Splashing: Direct Volumetric Rendering Underwater
Authors:Nir Mualem, Roy Amoyal, Oren Freifeld, Derya Akkaynak
In underwater images, most useful features are occluded by water. The extent of the occlusion depends on imaging geometry and can vary even across a sequence of burst images. As a result, 3D reconstruction methods robust on in-air scenes, like Neural Radiance Field methods (NeRFs) or 3D Gaussian Splatting (3DGS), fail on underwater scenes. While a recent underwater adaptation of NeRFs achieved state-of-the-art results, it is impractically slow: reconstruction takes hours and its rendering rate, in frames per second (FPS), is less than 1. Here, we present a new method that takes only a few minutes for reconstruction and renders novel underwater scenes at 140 FPS. Named Gaussian Splashing, our method unifies the strengths and speed of 3DGS with an image formation model for capturing scattering, introducing innovations in the rendering and depth estimation procedures and in the 3DGS loss function. Despite the complexities of underwater adaptation, our method produces images at unparalleled speeds with superior details. Moreover, it reveals distant scene details with far greater clarity than other methods, dramatically improving reconstructed and rendered images. We demonstrate results on existing datasets and a new dataset we have collected. Additional visual results are available at: https://bgu-cs-vil.github.io/gaussiansplashingUW.github.io/ .
在水下图像中,大多数有用的特征都被水遮挡住了。遮挡的程度取决于成像几何,甚至在连续的多帧图像中也会有所不同。因此,在空气场景中稳健的3D重建方法,如神经网络辐射场方法(NeRFs)或三维高斯平铺(3DGS),在水下场景中都会失效。虽然最近对NeRFs的水下适应性改进取得了最新结果,但其处理速度过于缓慢:重建需要数小时,并且每秒渲染帧数(FPS)低于1。在这里,我们提出了一种新方法,它可以在几分钟内完成重建,并以每秒140帧的速度渲染新的水下场景。我们将其命名为高斯飞溅法(Gaussian Splashing),该方法结合了三维高斯平铺的优点和速度,同时采用了图像形成模型来捕捉散射现象,并在渲染和深度估计程序以及三维高斯平铺的损失函数中引入了创新。尽管水下适应的复杂性,我们的方法以无与伦比的速度生成具有卓越细节的图像。此外,它比其他方法更清晰地揭示了远处的场景细节,极大地提高了重建和渲染的图像质量。我们在现有的数据集和我们新收集的数据集上展示了结果。更多视觉结果可在以下网址找到:https://bgu-cs-vil.github.io/gaussiansplashingUW.github.io/。
论文及项目相关链接
Summary
在水下图像中,由于水的影响,大多数有用特征都被遮挡了。成像几何决定遮挡程度,即使在连续图像序列中也会有所不同。因此,在空气场景有效的三维重建方法,如神经网络辐射场方法(NeRFs)或三维高斯涂抹(3DGS),在水下场景中则表现不佳。虽然最近对NeRFs的水下适应达到了最先进的水平,但它非常耗时:重建需要数小时,并且其渲染速度每秒帧数(FPS)低于一帧。这里提出了一种新的方法,它能在几分钟内完成重建并以每秒140帧的速度渲染水下新场景。名为高斯飞溅的方法结合了三维高斯涂抹的强度与速度,并采用了图像形成模型来捕捉散射现象,同时改进了渲染和深度估计程序以及三维高斯涂抹的损失函数。尽管水下适应的复杂性,该方法能够以无与伦比的速度生成具有优越细节的图像。更重要的是,与其他方法相比,它能够揭示更远的场景细节,极大地提高了重建和渲染的图像质量。我们在现有的数据集和我们收集的新数据集上展示了结果。更多视觉结果可通过链接查看。
Key Takeaways
- 水下图像的特征常常被水遮挡,影响程度取决于成像几何。
- 当前的三维重建方法在空气场景中效果很好,但在水下场景中效果欠佳。
- 一种新的水下图像处理方法——高斯飞溅结合了三维高斯涂抹的优势和速度,并考虑了图像形成中的散射现象。
- 高斯飞溅方法能够在几分钟内完成重建并以每秒高达140帧的速度渲染水下新场景。
- 高斯飞溅方法在渲染速度和图像质量方面表现优越,特别是能够揭示更远场景细节的能力。
- 高斯飞溅方法通过改进渲染和深度估计程序以及三维高斯涂抹的损失函数来实现其高效性能。
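摘要提到把 3DGS 与一个刻画散射的水下成像模型结合。水下成像常用的简化模型把观测颜色写成两部分:随距离指数衰减的直接分量,以及随距离趋近背景光的后向散射分量。下面用 numpy 给出该通用模型的示意(各系数均为待估或假设参数),并非论文使用的确切公式。

```python
import numpy as np

def underwater_image(J: np.ndarray, z: np.ndarray, beta_att: np.ndarray,
                     beta_bs: np.ndarray, B_inf: np.ndarray) -> np.ndarray:
    """J: (H, W, 3) 无水时的物体颜色;z: (H, W) 相机到物体的距离;
    beta_att / beta_bs / B_inf: (3,) 每通道的衰减系数、后向散射系数与背景散射光颜色。"""
    z = z[..., None]                                        # (H, W, 1) 便于按通道广播
    direct = J * np.exp(-beta_att * z)                      # 直接分量:随距离指数衰减
    backscatter = B_inf * (1.0 - np.exp(-beta_bs * z))      # 后向散射:随距离趋近背景光
    return direct + backscatter
```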
点此查看论文截图
EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
Authors:Bohao Liao, Wei Zhai, Zengyu Wan, Zhixin Cheng, Wenfei Yang, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha
Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. Our project page is https://lbh666.github.io/ef-3dgs/.
场景重建在现实世界场景中的应用广泛,涉及从随意拍摄的视频中进行重建。随着可微分渲染技术的最新发展,一些方法已经尝试同时优化场景表示(NeRF或3DGS)和相机姿态。尽管取得了最新的进展,但依赖传统相机输入的方法往往在高速(或等效地,低帧率)场景中表现不佳。事件相机受到生物视觉的启发,能够异步记录像素级的强度变化,具有很高的时间分辨率,并在盲帧间隔提供了有价值的场景和运动信息。在本文中,我们首次引入了事件相机来帮助从随意拍摄的视频中进行场景构建,并提出了名为EF-3DGS的事件辅助自由轨迹3DGS。它通过三个关键组件无缝集成事件相机的优势到3DGS中。首先,我们利用事件生成模型(EGM)融合事件和帧,监督事件流观察到的渲染视图。其次,我们采用分段对比最大化(CMax)框架,通过最大化变形事件的图像对比度来提取运动信息,从而校准估计的姿态。此外,基于线性事件生成模型(LEGM),我们还利用IWE中编码的亮度信息在梯度域约束3DGS。第三,为了缓解事件颜色信息的缺失,我们引入了光度捆绑调整(PBA)以确保事件和帧之间的视图一致性。我们在公共的Tanks and Temples基准测试和新收集的现实世界数据集RealEv-DAVIS上评估了我们的方法。我们的项目页面是https://lbh666.github.io/ef-3dgs/。
论文及项目相关链接
PDF Accepted to NeurIPS 2025,Project Page: https://lbh666.github.io/ef-3dgs/
Summary
引入事件相机辅助从随手拍摄的视频中进行场景构建,提出Event-Aided Free-Trajectory 3DGS(EF-3DGS)方法,融合事件相机的优势,通过三个关键组件实现场景重建的优化。利用事件生成模型(EGM)融合事件和帧信息,采用对比最大化(CMax)框架提取运动信息,并基于线性事件生成模型(LEGM)利用亮度信息约束3DGS。此外,引入光度捆绑调整(PBA)确保事件和帧之间的视图一致性。在公共的Tanks和Temples基准测试以及新收集的真实世界数据集RealEv-DAVIS上评估了该方法。
Key Takeaways
- 事件相机被引入以辅助从随手拍摄的视频中进行场景构建。
- 提出了Event-Aided Free-Trajectory 3DGS(EF-3DGS)方法,无缝集成事件相机的优势。
- 通过三个关键组件实现场景重建的优化:事件生成模型(EGM)、对比最大化(CMax)和线性事件生成模型(LEGM)。
- 利用事件相机的亮度信息在梯度域约束3D场景结构。
- 引入光度捆绑调整(PBA)以确保事件和帧之间的视图一致性。
- 在公共基准测试RealEv-DAVIS上进行了评估,展示了所提方法的有效性。
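摘要提到的对比度最大化(CMax)思路是:按候选运动参数把一段时间内的事件"扭曲"回同一参考时刻,累积成扭曲事件图(IWE),再以 IWE 的对比度(例如方差)作为目标来搜索运动参数。下面是该经典思路在图像平面匀速平移这一最简单情形下的 numpy 示意,并非论文中分段使用 CMax 的完整实现。

```python
import numpy as np

def iwe_contrast(events: np.ndarray, velocity: np.ndarray, hw: tuple) -> float:
    """events: (N, 3) 每行 (x, y, t);velocity: (2,) 图像平面像素速度;hw: (H, W)。
    返回按该速度扭曲后事件累积图(IWE)的方差,即对比度。"""
    H, W = hw
    t_ref = events[:, 2].min()
    warped = events[:, :2] - (events[:, 2:3] - t_ref) * velocity   # 扭曲到参考时刻
    xs = np.clip(np.round(warped[:, 0]).astype(int), 0, W - 1)
    ys = np.clip(np.round(warped[:, 1]).astype(int), 0, H - 1)
    iwe = np.zeros((H, W))
    np.add.at(iwe, (ys, xs), 1.0)                                  # 事件计数累积
    return float(iwe.var())

# 运动估计示意:在候选速度集合上取对比度最大的那个
# best_v = max(candidates, key=lambda v: iwe_contrast(events, np.asarray(v), (H, W)))
```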
点此查看论文截图
NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods
Authors:Jonas Kulhanek, Torsten Sattler
Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and robotic simulations. With the recent rapid development of Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) methods, it is becoming difficult to keep track of the current state of the art (SoTA) due to methods using different evaluation protocols, codebases being difficult to install and use, and methods not generalizing well to novel 3D scenes. In our experiments, we show that even tiny differences in the evaluation protocols of various methods can artificially boost the performance of these methods. This raises questions about the validity of quantitative comparisons performed in the literature. To address these questions, we propose NerfBaselines, an evaluation framework which provides consistent benchmarking tools, ensures reproducibility, and simplifies the installation and use of various methods. We validate our implementation experimentally by reproducing the numbers reported in the original papers. For improved accessibility, we release a web platform that compares commonly used methods on standard benchmarks. We strongly believe NerfBaselines is a valuable contribution to the community as it ensures that quantitative results are comparable and thus truly measure progress in the field of novel view synthesis.
新型视角合成是一个具有许多应用的重要问题,包括AR/VR、游戏和机器人模拟。随着神经辐射场(NeRFs)和3D高斯涂抹(3DGS)方法的快速发展,由于使用了不同的评估协议、代码库难以安装和使用、以及方法不能很好地推广到新型3D场景,很难跟踪当前最前沿技术(SoTA)。在我们的实验中,我们表明,即使评价协议中的微小差异也会人为地提升各种方法的性能。这引发了关于文献中进行的定量比较真实性的问题。为了回答这个问题,我们提出了NeRF基线(NerfBaselines),一个评估框架,提供一致的性能评估工具,确保可重复性,并简化了各种方法的安装和使用。我们通过复制原始论文中报告的数值来验证我们的实现。为了增强可访问性,我们发布了一个网络平台,该平台可以在标准基准测试上对常用方法进行比较。我们坚信NeRF基线对社区是一个有价值的贡献,因为它确保了定量结果的可比性,从而真正衡量了新型视角合成领域的进展。
论文及项目相关链接
PDF NeurIPS 2025 D&B; Web: https://jkulhanek.com/nerfbaselines
Summary
新型视图合成是一个涵盖增强现实(AR)、虚拟现实(VR)、游戏和机器人模拟等领域的重要课题。近期随着神经网络辐射场(NeRF)和三维高斯拼接技术(3DGS)的快速发展,跟踪当前前沿技术变得困难重重。当前的方法采用不同的评估协议,代码库难以安装和使用,难以应用于新型三维场景。本研究通过实验表明,评估协议中的微小差异可能会人为提升方法的性能表现,这引发了关于文献中定量比较有效性的质疑。为解决这些问题,我们提出了NeRF基线评估框架,提供一致的基准测试工具,确保结果的可重复性,并简化了方法的安装和使用流程。通过实验验证我们重新产生报告的原始论文中的数字结果,证明了实施的有效性。为提升实用性,我们发布了一个在线平台,可以在标准基准上比较常用的方法。我们相信NeRF基线对社区是一大贡献,确保了定量结果的可比性,为视图合成领域的真实进步衡量打下基础。
Key Takeaways
- 新型视图合成是一个涵盖多个领域的重要问题,包括AR/VR、游戏和机器人模拟等。
- 当前NeRF和3DGS等技术的快速进展使得跟踪前沿技术变得困难。
- 文献中的定量比较有效性受到质疑,因为评估协议差异可能导致性能表现提升的人为效应。
- 提出NeRF基线评估框架以提供一致的基准测试工具,确保结果的可重复性。
- 该框架简化了方法的安装和使用流程,并通过实验验证了实施的有效性。
- 发布在线平台,用于在标准基准上比较常用方法。
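摘要的核心论点之一是:评测协议中的细微差别就足以人为抬高指标。下面用 numpy 构造一个小例子说明同一组预测在两种常见 PSNR 约定下会得到不同数值(逐图 PSNR 取平均,由 Jensen 不等式总是不低于先平均 MSE 再换算的结果);数据是随机生成的示意,仅用于解释为什么需要统一的评测框架。

```python
import numpy as np

def psnr(mse: float, peak: float = 1.0) -> float:
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
gt = rng.random((4, 64, 64, 3))
noise_std = np.array([0.02, 0.05, 0.10, 0.20])           # 各测试图误差水平不同
pred = np.clip(gt + rng.normal(0, 1, gt.shape) * noise_std[:, None, None, None], 0, 1)

mse_per_img = ((pred - gt) ** 2).reshape(len(gt), -1).mean(axis=1)
psnr_a = np.mean([psnr(m) for m in mse_per_img])          # 约定 A:逐图 PSNR 再取平均
psnr_b = psnr(mse_per_img.mean())                         # 约定 B:先平均 MSE 再换算
print(f"per-image mean: {psnr_a:.2f} dB, pooled MSE: {psnr_b:.2f} dB")
```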
点此查看论文截图
OpenMaterial: A Large-scale Dataset of Complex Materials for 3D Reconstruction
Authors:Zheng Dang, Jialu Huang, Fei Wang, Mathieu Salzmann
Recent advances in deep learning, such as neural radiance fields and implicit neural representations, have significantly advanced 3D reconstruction. However, accurately reconstructing objects with complex optical properties, such as metals, glass, and plastics, remains challenging due to the breakdown of multi-view color consistency in the presence of specular reflections, refractions, and transparency. This limitation is further exacerbated by the lack of benchmark datasets that explicitly model material-dependent light transport. To address this, we introduce OpenMaterial, a large-scale semi-synthetic dataset for benchmarking material-aware 3D reconstruction. It comprises 1,001 objects spanning 295 distinct materials, including conductors, dielectrics, plastics, and their roughened variants, captured under 714 diverse lighting conditions. By integrating lab-measured Index of Refraction (IOR) spectra, OpenMaterial enables the generation of high-fidelity multi-view images that accurately simulate complex light-matter interactions. It provides multi-view images, 3D shape models, camera poses, depth maps, and object masks, establishing the first extensive benchmark for evaluating 3D reconstruction on challenging materials. We evaluate 11 state-of-the-art methods for 3D reconstruction and novel view synthesis, conducting ablation studies to assess the impact of material type, shape complexity, and illumination on reconstruction performance. Our results indicate that OpenMaterial provides a strong and fair basis for developing more robust, physically-informed 3D reconstruction techniques to better handle real-world optical complexities.
最近深度学习领域的神经辐射场和隐式神经表示等进展极大地推动了3D重建的发展。然而,由于光反射、折射和透明度的存在,导致多视角颜色一致性失效,使得对金属、玻璃和塑料等具有复杂光学特性的物体的精确重建仍然具有挑战性。这一局限性还因缺乏显式建模材料依赖的光传输的基准数据集而加剧。为了解决这一问题,我们引入了OpenMaterial,这是一个用于基准测试的材料感知3D重建的大型半合成数据集。它包含1001个对象,跨越295种不同材料,包括导体、介电体、塑料及其粗糙变体,在714种不同的光照条件下拍摄。通过整合实验室测量的折射率(IOR)光谱,OpenMaterial能够生成高保真多视角图像,准确模拟复杂的光与物质相互作用。它提供了多视角图像、3D形状模型、相机姿态、深度图和对象掩膜,为评估在具有挑战性的材料上进行3D重建建立了首个广泛的基准测试。我们评估了11种最先进的3D重建和新颖视图合成方法,进行了消融研究,以评估材料类型、形状复杂性和照明对重建性能的影响。我们的结果表明,OpenMaterial为开发更稳健、物理驱动的3D重建技术提供了强大而公平的基础,以更好地应对现实世界的光学复杂性。
论文及项目相关链接
Summary
本文介绍了利用深度学习技术,如神经辐射场和隐式神经表示法,在3D重建方面的最新进展。然而,对于具有复杂光学特性的物体的准确重建,如金属、玻璃和塑料,仍然存在挑战。为解决此问题,本文引入了OpenMaterial,一个用于基准测试的材料感知3D重建的大型半合成数据集。它包含1001个对象,跨越295种不同材料,并在714种不同的照明条件下拍摄。通过整合实验室测量的折射率(IOR)光谱,OpenMaterial能够生成高保真多视角图像,准确模拟复杂的光与物质相互作用。它为3D重建和新颖视图合成提供了第一个广泛的基准测试平台。
Key Takeaways
- 神经辐射场和隐式神经表示法在3D重建方面取得显著进展。
- 重建具有复杂光学特性的物体(如金属、玻璃和塑料)仍然具有挑战。
- 缺乏明确模拟材料依赖光线传输的基准数据集。
- 引入OpenMaterial数据集,包含1001个对象,跨越295种不同材料,并在多种照明条件下拍摄。
- OpenMaterial数据集通过整合IOR光谱,模拟复杂的光与物质相互作用。
- 提供多视角图像、3D形状模型、相机姿态、深度图和对象掩膜。
点此查看论文截图