
3DGS


⚠️ All of the summaries below are produced by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them in serious academic settings; they are only meant for a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace

Updated 2025-10-04

StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions

Authors:Bo-Hsu Ke, You-Zhe Xie, Yu-Lun Liu, Wei-Chen Chiu

3D scene representation methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have significantly advanced novel view synthesis. As these methods become prevalent, addressing their vulnerabilities becomes critical. We analyze 3DGS robustness against image-level poisoning attacks and propose a novel density-guided poisoning method. Our method strategically injects Gaussian points into low-density regions identified via Kernel Density Estimation (KDE), embedding viewpoint-dependent illusory objects clearly visible from poisoned views while minimally affecting innocent views. Additionally, we introduce an adaptive noise strategy to disrupt multi-view consistency, further enhancing attack effectiveness. We propose a KDE-based evaluation protocol to assess attack difficulty systematically, enabling objective benchmarking for future research. Extensive experiments demonstrate our method’s superior performance compared to state-of-the-art techniques. Project page: https://hentci.github.io/stealthattack/


Paper and Project Links

PDF ICCV 2025. Project page: https://hentci.github.io/stealthattack/

Summary

This paper analyzes the security vulnerabilities of 3D scene representation methods such as NeRF and 3DGS. It proposes a novel density-guided poisoning attack that injects Gaussian points into low-density regions and disrupts multi-view consistency, embedding illusory objects that are clearly visible from poisoned viewpoints while leaving other views largely unaffected. A KDE-based evaluation protocol is also proposed to assess attack difficulty and enable objective benchmarking. Experiments show the attack outperforms existing techniques.

Key Takeaways

  1. 3D scene representation methods such as NeRF and 3DGS face security challenges and require robustness analysis.
  2. A novel density-guided poisoning method injects Gaussian points into low-density regions identified via Kernel Density Estimation (KDE); a minimal sketch of this density scoring follows below.
  3. The method embeds illusory objects visible from poisoned viewpoints while minimally affecting innocent views.
  4. An adaptive noise strategy disrupts multi-view consistency, further enhancing attack effectiveness.
  5. A KDE-based evaluation protocol systematically assesses attack difficulty, providing an objective benchmark for future research.
  6. Experiments show superior performance compared to state-of-the-art techniques.
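To make takeaway 2 concrete, here is a minimal sketch of KDE-based density scoring, assuming the reconstructed Gaussian centers are available as a point cloud: fit a KDE on the centers and rank candidate injection sites by density. The function `lowest_density_sites` and all variable names are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def lowest_density_sites(centers: np.ndarray, candidates: np.ndarray, k: int) -> np.ndarray:
    """Return the k candidate points lying in the sparsest regions.

    centers:    (N, 3) xyz means of the reconstructed Gaussians
    candidates: (M, 3) xyz locations being considered for injection
    """
    kde = gaussian_kde(centers.T)   # fit a KDE on the existing point cloud
    density = kde(candidates.T)     # evaluate density at each candidate
    order = np.argsort(density)     # ascending: sparsest regions first
    return candidates[order[:k]]

# Example: pick the 100 emptiest spots of a random toy scene.
rng = np.random.default_rng(0)
scene = rng.normal(size=(5000, 3))
grid = rng.uniform(-3, 3, size=(2000, 3))
sites = lowest_density_sites(scene, grid, k=100)
```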


Performance-Guided Refinement for Visual Aerial Navigation using Editable Gaussian Splatting in FalconGym 2.0

Authors:Yan Miao, Ege Yuceel, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Sayan Mitra

Visual policy design is crucial for aerial navigation. However, state-of-the-art visual policies often overfit to a single track and their performance degrades when track geometry changes. We develop FalconGym 2.0, a photorealistic simulation framework built on Gaussian Splatting (GSplat) with an Edit API that programmatically generates diverse static and dynamic tracks in milliseconds. Leveraging FalconGym 2.0’s editability, we propose a Performance-Guided Refinement (PGR) algorithm, which concentrates visual policy’s training on challenging tracks while iteratively improving its performance. Across two case studies (fixed-wing UAVs and quadrotors) with distinct dynamics and environments, we show that a single visual policy trained with PGR in FalconGym 2.0 outperforms state-of-the-art baselines in generalization and robustness: it generalizes to three unseen tracks with 100% success without per-track retraining and maintains higher success rates under gate-pose perturbations. Finally, we demonstrate that the visual policy trained with PGR in FalconGym 2.0 can be zero-shot sim-to-real transferred to a quadrotor hardware, achieving a 98.6% success rate (69 / 70 gates) over 30 trials spanning two three-gate tracks and a moving-gate track.


Paper and Project Links

PDF

Summary

Visual policy design is crucial for aerial navigation, but state-of-the-art visual policies tend to overfit to a single track and degrade when the track geometry changes. FalconGym 2.0 is a photorealistic simulation framework built on Gaussian Splatting (GSplat) whose Edit API programmatically generates diverse static and dynamic tracks in milliseconds. Exploiting this editability, the Performance-Guided Refinement (PGR) algorithm concentrates the visual policy's training on challenging tracks while iteratively improving its performance. Across two case studies (fixed-wing UAVs and quadrotors) with distinct dynamics and environments, a single policy trained with PGR outperforms state-of-the-art baselines in generalization and robustness: it generalizes to three unseen tracks with 100% success and no per-track retraining, and it maintains higher success rates under gate-pose perturbations. The policy also transfers zero-shot from simulation to quadrotor hardware, achieving a 98.6% success rate (69/70 gates) across 30 trials spanning two three-gate tracks and a moving-gate track.

Key Takeaways

  1. State-of-the-art visual policies overfit to a single track; performance drops when track geometry changes.
  2. FalconGym 2.0 is a GSplat-based photorealistic simulator whose Edit API generates diverse static and dynamic tracks in milliseconds.
  3. The Performance-Guided Refinement (PGR) algorithm focuses training on challenging tracks while iteratively improving performance (see the sketch after this list).
  4. A single PGR-trained policy generalizes to three unseen tracks with 100% success, without per-track retraining.
  5. The policy maintains higher success rates under gate-pose perturbations.
  6. Zero-shot sim-to-real transfer to quadrotor hardware reaches 98.6% success (69/70 gates) over 30 trials.
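Below is a minimal, self-contained sketch of the performance-guided refinement loop: evaluate the current policy on a pool of editable tracks, then weight the next training batch toward the tracks it fails on. The toy `make_track`, `evaluate`, and `train_on` stubs are hypothetical stand-ins for FalconGym 2.0's Edit API, rollout evaluation, and policy update, which the abstract does not spell out.

```python
import random
random.seed(0)

def make_track(p):             # stand-in for the Edit API: a track is just its difficulty here
    return p

def evaluate(policy, track):   # toy success model: policy "skill" vs track difficulty
    return 1.0 if policy >= track else 0.0

def train_on(policy, tracks):  # toy update: training on hard tracks pulls skill toward them
    return max(policy, 0.5 * policy + 0.5 * max(tracks))

def pgr(policy, track_params, rounds=10, batch=4):
    for _ in range(rounds):
        # 1) Measure per-track success of the current policy.
        success = {p: evaluate(policy, make_track(p)) for p in track_params}
        # 2) Weight tracks by failure rate so hard tracks dominate the batch.
        weights = [1.0 - success[p] for p in track_params]
        if sum(weights) == 0:  # policy already solves everything: stop early
            break
        picked = random.choices(track_params, weights=weights, k=batch)
        # 3) Train on the sampled hard tracks and iterate.
        policy = train_on(policy, [make_track(p) for p in picked])
    return policy

print(pgr(policy=0.2, track_params=[0.1, 0.4, 0.7, 0.9]))
```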


GaussianMorphing: Mesh-Guided 3D Gaussians for Semantic-Aware Object Morphing

Authors:Mengtian Li, Yunshu Bai, Yimin Chu, Yijun Shen, Zhongmei Li, Weifeng Ge, Zhifeng Xie, Chaofeng Chen

We introduce GaussianMorphing, a novel framework for semantic-aware 3D shape and texture morphing from multi-view images. Previous approaches usually rely on point clouds or require pre-defined homeomorphic mappings for untextured data. Our method overcomes these limitations by leveraging mesh-guided 3D Gaussian Splatting (3DGS) for high-fidelity geometry and appearance modeling. The core of our framework is a unified deformation strategy that anchors 3D Gaussians to reconstructed mesh patches, ensuring geometrically consistent transformations while preserving texture fidelity through topology-aware constraints. In parallel, our framework establishes unsupervised semantic correspondence by using the mesh topology as a geometric prior and maintains structural integrity via physically plausible point trajectories. This integrated approach preserves both local detail and global semantic coherence throughout the morphing process without requiring labeled data. On our proposed TexMorph benchmark, GaussianMorphing substantially outperforms prior 2D/3D methods, reducing color consistency error ($\Delta E$) by 22.2% and EI by 26.2%. Project page: https://baiyunshu.github.io/GAUSSIANMORPHING.github.io/


Paper and Project Links

PDF Project page: https://baiyunshu.github.io/GAUSSIANMORPHING.github.io/

Summary

This paper introduces GaussianMorphing, a framework for semantic-aware 3D shape and texture morphing from multi-view images. It overcomes the reliance of earlier approaches on point clouds or pre-defined homeomorphic mappings by using mesh-guided 3D Gaussian Splatting (3DGS) for high-fidelity geometry and appearance modeling. At its core is a unified deformation strategy that anchors 3D Gaussians to reconstructed mesh patches, ensuring geometrically consistent transformations while preserving texture fidelity through topology-aware constraints. The framework additionally establishes unsupervised semantic correspondence using the mesh topology as a geometric prior, and maintains structural integrity via physically plausible point trajectories. The approach preserves both local detail and global semantic coherence throughout morphing without labeled data. On the proposed TexMorph benchmark, GaussianMorphing substantially outperforms prior 2D/3D methods, reducing color consistency error (ΔE) by 22.2% and EI by 26.2%.

Key Takeaways

  1. GaussianMorphing is a semantic-aware 3D shape and texture morphing framework driven by multi-view images.
  2. It uses mesh-guided 3D Gaussian Splatting (3DGS) for high-fidelity geometry and appearance modeling.
  3. A unified deformation strategy anchors Gaussians to mesh patches, ensuring geometrically consistent transformations while preserving texture fidelity (a minimal anchoring sketch follows this list).
  4. Mesh topology serves as a geometric prior for unsupervised semantic correspondence and structural integrity.
  5. Local detail and global semantic coherence are preserved throughout morphing, without labeled data.
  6. On the TexMorph benchmark, GaussianMorphing clearly improves over previous 2D/3D methods.
  7. It reduces color consistency error (ΔE) by 22.2% and EI by 26.2%.
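As a minimal illustration of the anchoring idea in takeaway 3, the sketch below pins a point (standing in for a Gaussian center) to one mesh triangle via barycentric coordinates plus a signed normal offset, so deforming the triangle carries the point along. The single-triangle anchoring and all names are simplifying assumptions, not the paper's actual deformation strategy.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates of p's projection onto triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def anchor(p, tri):
    """Encode p as (barycentric coords, signed offset along the triangle normal)."""
    a, b, c = tri
    n = np.cross(b - a, c - a)
    n /= np.linalg.norm(n)
    bary = barycentric(p, a, b, c)
    offset = (p - bary @ tri) @ n
    return bary, offset

def reproject(bary, offset, tri):
    """Recover the anchored point on a (possibly deformed) triangle."""
    a, b, c = tri
    n = np.cross(b - a, c - a)
    n /= np.linalg.norm(n)
    return bary @ tri + offset * n

tri0 = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])  # rest patch
p = np.array([0.2, 0.3, 0.1])                                # Gaussian center
bary, off = anchor(p, tri0)
tri1 = tri0 + np.array([0., 0., 0.5])                        # deformed patch
print(reproject(bary, off, tri1))                            # center follows: [0.2 0.3 0.6]
```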


LOBE-GS: Load-Balanced and Efficient 3D Gaussian Splatting for Large-Scale Scene Reconstruction

Authors:Sheng-Hsiang Hung, Ting-Yu Yen, Wei-Fang Sun, Simon See, Shih-Hsuan Hung, Hung-Kuo Chu

3D Gaussian Splatting (3DGS) has established itself as an efficient representation for real-time, high-fidelity 3D scene reconstruction. However, scaling 3DGS to large and unbounded scenes such as city blocks remains difficult. Existing divide-and-conquer methods alleviate memory pressure by partitioning the scene into blocks, but introduce new bottlenecks: (i) partitions suffer from severe load imbalance since uniform or heuristic splits do not reflect actual computational demands, and (ii) coarse-to-fine pipelines fail to exploit the coarse stage efficiently, often reloading the entire model and incurring high overhead. In this work, we introduce LoBE-GS, a novel Load-Balanced and Efficient 3D Gaussian Splatting framework, that re-engineers the large-scale 3DGS pipeline. LoBE-GS introduces a depth-aware partitioning method that reduces preprocessing from hours to minutes, an optimization-based strategy that balances visible Gaussians – a strong proxy for computational load – across blocks, and two lightweight techniques, visibility cropping and selective densification, to further reduce training cost. Evaluations on large-scale urban and outdoor datasets show that LoBE-GS consistently achieves up to $2\times$ faster end-to-end training time than state-of-the-art baselines, while maintaining reconstruction quality and enabling scalability to scenes infeasible with vanilla 3DGS.


Paper and Project Links

PDF

Summary

3DGS is an efficient representation for real-time, high-fidelity 3D reconstruction, but it struggles to scale to large scenes such as city blocks. Existing divide-and-conquer methods partition the scene to relieve memory pressure, yet suffer from load imbalance and inefficient coarse-to-fine pipelines. LoBE-GS introduces a depth-aware partitioning method that cuts preprocessing from hours to minutes, an optimization-based strategy that balances visible Gaussians (a strong proxy for computational load) across blocks, and two lightweight techniques, visibility cropping and selective densification, to further reduce training cost. On large-scale urban and outdoor datasets, LoBE-GS achieves up to 2× faster end-to-end training than state-of-the-art baselines while maintaining reconstruction quality and scaling to scenes infeasible with vanilla 3DGS.

Key Takeaways

  • Problem addressed: real-time 3D reconstruction of large, unbounded scenes remains hard, and existing partition-based solutions introduce new bottlenecks such as load imbalance; LoBE-GS tackles this by balancing visible-Gaussian counts across blocks (a minimal sketch follows below).
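The sketch below illustrates the load-balancing idea, assuming per-Gaussian visibility counts are available: place block boundaries along one axis at quantiles of the cumulative visibility mass, so each block carries a similar load. The real method optimizes full scene partitions; this one-axis version is only illustrative.

```python
import numpy as np

def balanced_boundaries(x: np.ndarray, visible: np.ndarray, n_blocks: int) -> np.ndarray:
    """Split coordinate x so each block holds ~equal total visibility weight.

    x:       (N,) Gaussian positions along the split axis
    visible: (N,) per-Gaussian visibility counts (how many views see each one)
    """
    order = np.argsort(x)
    cum = np.cumsum(visible[order])                # cumulative load along the axis
    targets = cum[-1] * np.arange(1, n_blocks) / n_blocks
    idx = np.searchsorted(cum, targets)            # first Gaussian past each target
    return x[order][idx]                           # boundary coordinates

rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=10_000)
visible = (x < 30).astype(float) * 9 + 1           # a crowded district near x < 30
print(balanced_boundaries(x, visible, n_blocks=4))
# Boundaries bunch up where visibility load is high, unlike a uniform split.
```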


MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics

Authors:Changmin Lee, Jihyun Lee, Tae-Kyun Kim

While there has been significant progress in the field of 3D avatar creation from visual observations, modeling physically plausible dynamics of humans with loose garments remains a challenging problem. Although a few existing works address this problem by leveraging physical simulation, they suffer from limited accuracy or robustness to novel animation inputs. In this work, we present MPMAvatar, a framework for creating 3D human avatars from multi-view videos that supports highly realistic, robust animation, as well as photorealistic rendering from free viewpoints. For accurate and robust dynamics modeling, our key idea is to use a Material Point Method-based simulator, which we carefully tailor to model garments with complex deformations and contact with the underlying body by incorporating an anisotropic constitutive model and a novel collision handling algorithm. We combine this dynamics modeling scheme with our canonical avatar that can be rendered using 3D Gaussian Splatting with quasi-shadowing, enabling high-fidelity rendering for physically realistic animations. In our experiments, we demonstrate that MPMAvatar significantly outperforms the existing state-of-the-art physics-based avatar in terms of (1) dynamics modeling accuracy, (2) rendering accuracy, and (3) robustness and efficiency. Additionally, we present a novel application in which our avatar generalizes to unseen interactions in a zero-shot manner-which was not achievable with previous learning-based methods due to their limited simulation generalizability. Our project page is at: https://KAISTChangmin.github.io/MPMAvatar/


Paper and Project Links

PDF Accepted to NeurIPS 2025

Summary

This paper presents MPMAvatar, a framework for creating 3D human avatars from multi-view videos that supports highly realistic, robust animation and photorealistic free-viewpoint rendering. A Material Point Method-based simulator, tailored with an anisotropic constitutive model and a novel collision-handling algorithm, accurately models garments with complex deformations and their contact with the underlying body. Combined with a canonical avatar rendered via 3D Gaussian Splatting with quasi-shadowing, this enables high-fidelity rendering of physically realistic animation. MPMAvatar significantly outperforms the state-of-the-art physics-based avatar in dynamics accuracy, rendering accuracy, robustness, and efficiency, and it generalizes to unseen interactions in a zero-shot manner.

Key Takeaways

  1. MPMAvatar creates 3D human avatars from multi-view videos and supports highly realistic, robust animation.
  2. A Material Point Method-based simulator models complex garment deformation and contact with the body (a minimal transfer-step sketch follows this list).
  3. MPMAvatar outperforms existing physics-based avatars in dynamics accuracy, rendering accuracy, robustness, and efficiency.
  4. The avatar generalizes to unseen interactions in a zero-shot manner.
  5. The dynamics model is combined with a canonical avatar rendered via 3D Gaussian Splatting with quasi-shadowing for high-fidelity results.
  6. The framework has broad application prospects for creating highly realistic virtual characters.
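For readers unfamiliar with the Material Point Method, the sketch below shows its core particle-to-grid (P2G) transfer with quadratic B-spline weights, the kind of step a simulator like MPMAvatar's builds on. Stress computation, the paper's anisotropic constitutive model, and its collision handling are deliberately omitted; this is a generic MPM fragment, not the authors' simulator.

```python
import numpy as np

def p2g(xp, vp, mp, n=32, dx=1.0 / 32):
    """Scatter particle mass and momentum to a 3x3x3 grid-node neighborhood."""
    grid_m = np.zeros((n, n, n))
    grid_mv = np.zeros((n, n, n, 3))
    for x, v, m in zip(xp, vp, mp):
        base = (x / dx - 0.5).astype(int)   # lower corner of the stencil
        fx = x / dx - base                  # fractional offset in [0.5, 1.5]
        # Quadratic B-spline weights for the three stencil nodes per axis.
        w = [0.5 * (1.5 - fx) ** 2,
             0.75 - (fx - 1.0) ** 2,
             0.5 * (fx - 0.5) ** 2]
        for i in range(3):
            for j in range(3):
                for k in range(3):
                    weight = w[i][0] * w[j][1] * w[k][2]
                    gi = tuple(base + np.array([i, j, k]))
                    grid_m[gi] += weight * m
                    grid_mv[gi] += weight * m * v
    return grid_m, grid_mv

rng = np.random.default_rng(2)
xp = rng.uniform(0.3, 0.7, size=(100, 3))   # particle positions
vp = np.tile([0.0, -1.0, 0.0], (100, 1))    # a falling, cloth-like blob
grid_m, grid_mv = p2g(xp, vp, mp=np.full(100, 1e-3))
print(grid_m.sum())                          # mass is conserved: 0.1
```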


Instant4D: 4D Gaussian Splatting in Minutes

Authors:Zhanpeng Luo, Haoxi Ran, Li Lu

Dynamic view synthesis has seen significant advances, yet reconstructing scenes from uncalibrated, casual video remains challenging due to slow optimization and complex parameter estimation. In this work, we present Instant4D, a monocular reconstruction system that leverages native 4D representation to efficiently process casual video sequences within minutes, without calibrated cameras or depth sensors. Our method begins with geometric recovery through deep visual SLAM, followed by grid pruning to optimize scene representation. Our design significantly reduces redundancy while maintaining geometric integrity, cutting model size to under 10% of its original footprint. To handle temporal dynamics efficiently, we introduce a streamlined 4D Gaussian representation, achieving a 30x speed-up and reducing training time to within two minutes, while maintaining competitive performance across several benchmarks. Our method reconstructs a single video within 10 minutes on the Dycheck dataset or for a typical 200-frame video. We further apply our model to in-the-wild videos, showcasing its generalizability. Our project website is published at https://instant4d.github.io/.


Paper and Project Links

PDF Accepted by NeurIPS 2025

Summary

Instant4D is a monocular reconstruction system that leverages a native 4D representation to process casual video sequences within minutes, without calibrated cameras or depth sensors. It first recovers geometry via deep visual SLAM, then applies grid pruning to optimize the scene representation, cutting redundancy while preserving geometric integrity. A streamlined 4D Gaussian representation handles temporal dynamics efficiently, achieving a 30× speed-up and training times within two minutes while remaining competitive on several benchmarks. A single video on the Dycheck dataset, or a typical 200-frame video, is reconstructed within 10 minutes, and the model also generalizes to in-the-wild videos.

Key Takeaways

  1. Instant4D reconstructs casual video sequences with a monocular pipeline.
  2. It uses a native 4D representation and needs no calibrated cameras or depth sensors.
  3. Geometry is recovered through deep visual SLAM.
  4. Grid pruning optimizes the scene representation and cuts redundancy (a minimal sketch follows this list).
  5. A streamlined 4D Gaussian representation handles temporal dynamics efficiently.
  6. Training is fast, completing within about two minutes.
  7. The model generalizes well, handling in-the-wild videos.
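A minimal sketch of grid pruning as in takeaway 4: quantize point positions to a voxel grid and keep one representative per occupied voxel, cutting redundant Gaussians while preserving coverage. The voxel size and the keep-first rule are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def grid_prune(points: np.ndarray, voxel: float) -> np.ndarray:
    """Return indices of one representative point per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)       # (N, 3) voxel ids
    _, keep = np.unique(keys, axis=0, return_index=True)   # first hit per voxel
    return np.sort(keep)

rng = np.random.default_rng(3)
pts = rng.normal(size=(100_000, 3))
kept = grid_prune(pts, voxel=0.1)
print(f"kept {kept.size} of {pts.shape[0]} points "
      f"({100 * kept.size / pts.shape[0]:.1f}%)")
```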


GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction

Authors:Huaizhi Qu, Xiao Wang, Gengwei Zhang, Jie Peng, Tianlong Chen

Cryo-electron microscopy (cryo-EM) has become a central tool for high-resolution structural biology, yet the massive scale of datasets (often exceeding 100k particle images) renders 3D reconstruction both computationally expensive and memory intensive. Traditional Fourier-space methods are efficient but lose fidelity due to repeated transforms, while recent real-space approaches based on neural radiance fields (NeRFs) improve accuracy but incur cubic memory and computation overhead. Therefore, we introduce GEM, a novel cryo-EM reconstruction framework built on 3D Gaussian Splatting (3DGS) that operates directly in real-space while maintaining high efficiency. Instead of modeling the entire density volume, GEM represents proteins with compact 3D Gaussians, each parameterized by only 11 values. To further improve the training efficiency, we designed a novel gradient computation to 3D Gaussians that contribute to each voxel. This design substantially reduced both memory footprint and training cost. On standard cryo-EM benchmarks, GEM achieves up to 48% faster training and 12% lower memory usage compared to state-of-the-art methods, while improving local resolution by as much as 38.8%. These results establish GEM as a practical and scalable paradigm for cryo-EM reconstruction, unifying speed, efficiency, and high-resolution accuracy. Our code is available at https://github.com/UNITES-Lab/GEM.


Paper and Project Links

PDF

Summary

GEM is a cryo-electron microscopy (cryo-EM) reconstruction framework built on 3D Gaussian Splatting (3DGS). Unlike traditional Fourier-space methods and NeRF-based real-space approaches, GEM operates directly in real space while remaining efficient: proteins are represented with compact 3D Gaussians, each parameterized by only 11 values, and a novel gradient computation restricted to the Gaussians contributing to each voxel improves training efficiency. On standard cryo-EM benchmarks, GEM trains up to 48% faster with 12% lower memory usage than state-of-the-art methods while improving local resolution by as much as 38.8%.

Key Takeaways

  • GEM is a new cryo-EM reconstruction method built on 3D Gaussian Splatting (3DGS).
  • Compared with traditional methods, GEM offers higher efficiency and accuracy.
  • Compact 11-parameter 3D Gaussians represent proteins, and a novel per-voxel gradient computation reduces memory use and training cost (see the parameterization sketch below).
  • GEM achieves faster training, lower memory usage, and improved local resolution.
  • The code is publicly available on GitHub.
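The abstract states that each Gaussian uses only 11 parameters. One plausible split, which is our assumption rather than the paper's stated layout, is 3 for the mean, 4 for a rotation quaternion, 3 for per-axis scales, and 1 for a density amplitude; the covariance then follows the standard 3DGS construction Σ = R S Sᵀ Rᵀ.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian11:
    mean: np.ndarray   # (3,) center                     -> 3 values
    quat: np.ndarray   # (4,) rotation as (w, x, y, z)   -> 4 values
    scale: np.ndarray  # (3,) per-axis std deviations    -> 3 values
    amp: float         # scalar density amplitude        -> 1 value (11 total)

    def covariance(self) -> np.ndarray:
        w, x, y, z = self.quat / np.linalg.norm(self.quat)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T   # standard 3DGS covariance

    def density(self, p: np.ndarray) -> float:
        d = p - self.mean
        cov_inv = np.linalg.inv(self.covariance())
        return self.amp * np.exp(-0.5 * d @ cov_inv @ d)

g = Gaussian11(np.zeros(3), np.array([1., 0., 0., 0.]),
               np.array([1.0, 0.5, 0.25]), amp=2.0)
print(g.density(np.array([0.5, 0.0, 0.0])))   # 2 * exp(-0.125)
```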


FalconWing: An Ultra-Light Indoor Fixed-Wing UAV Platform for Vision-Based Autonomy

Authors:Yan Miao, Will Shen, Hang Cui, Sayan Mitra

We introduce FalconWing, an ultra-light (150 g) indoor fixed-wing UAV platform for vision-based autonomy. Controlled indoor environment enables year-round repeatable UAV experiment but imposes strict weight and maneuverability limits on the UAV, motivating our ultra-light FalconWing design. FalconWing couples a lightweight hardware stack (137g airframe with a 9g camera) and offboard computation with a software stack featuring a photorealistic 3D Gaussian Splat (GSplat) simulator for developing and evaluating vision-based controllers. We validate FalconWing on two challenging vision-based aerial case studies. In the leader-follower case study, our best vision-based controller, trained via imitation learning on GSplat-rendered data augmented with domain randomization, achieves 100% tracking success across 3 types of leader maneuvers over 30 trials and shows robustness to leader’s appearance shifts in simulation. In the autonomous landing case study, our vision-based controller trained purely in simulation transfers zero-shot to real hardware, achieving an 80% success rate over ten landing trials. We will release hardware designs, GSplat scenes, and dynamics models upon publication to make FalconWing an open-source flight kit for engineering students and research labs.


Paper and Project Links

PDF

Summary

FalconWing is an ultra-light (150 g) indoor fixed-wing UAV platform for vision-based autonomy. A controlled indoor environment enables year-round repeatable UAV experiments but imposes strict weight and maneuverability limits, motivating the ultra-light design. The platform couples a lightweight hardware stack (137 g airframe with a 9 g camera) and offboard computation with a software stack featuring a photorealistic 3D Gaussian Splat (GSplat) simulator for developing and evaluating vision-based controllers. Its effectiveness is validated on two challenging vision-based aerial case studies.

Key Takeaways

  1. FalconWing is an ultra-light (150 g) indoor fixed-wing UAV platform designed for vision-based autonomy.
  2. Strict indoor limits on weight and maneuverability motivated the ultra-light design.
  3. FalconWing pairs a lightweight hardware stack (137 g airframe plus a 9 g camera) with offboard computation.
  4. The software stack includes a photorealistic 3D Gaussian Splat (GSplat) simulator for developing and evaluating vision-based controllers.
  5. In the leader-follower case study, a controller trained by imitation learning on GSplat-rendered data with domain randomization achieved 100% tracking success across three leader maneuvers over 30 trials (a minimal augmentation sketch follows this list).
  6. In the autonomous landing case study, a controller trained purely in simulation transferred zero-shot to real hardware with an 80% success rate over ten landing trials.
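A minimal sketch of the kind of domain randomization mentioned in takeaway 5: photometric jitter applied to GSplat-rendered frames so the controller tolerates appearance shifts of the leader. The jitter ranges are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def randomize(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply random brightness, per-channel color gain, and pixel noise.

    img: (H, W, 3) float image in [0, 1].
    """
    out = img * rng.uniform(0.6, 1.4)                  # global brightness
    out = out * rng.uniform(0.8, 1.2, size=(1, 1, 3))  # color balance
    out = out + rng.normal(0.0, 0.02, size=img.shape)  # sensor noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(4)
frame = rng.uniform(size=(64, 64, 3))        # stand-in for a GSplat render
augmented = [randomize(frame, rng) for _ in range(8)]
```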



Author: Kedreamix
License: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting.