
Vision Transformer


⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, use with caution.
🔴 Note: do not use these summaries for serious academic purposes; they are only a first-pass filter before actually reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace

Updated 2025-09-17

3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data

Authors:Nojod M. Alotaibi, Areej M. Alhothali, Manar S. Ali

Major depressive disorder (MDD) is a prevalent mental health condition that negatively impacts both individual well-being and global public health. Automated detection of MDD using structural magnetic resonance imaging (sMRI) and deep learning (DL) methods holds increasing promise for improving diagnostic accuracy and enabling early intervention. Most existing methods employ either voxel-level features or handcrafted regional representations built from predefined brain atlases, limiting their ability to capture complex brain patterns. This paper develops a unified pipeline that utilizes Vision Transformers (ViTs) to extract 3D region embeddings from sMRI data and a Graph Neural Network (GNN) for classification. We explore two strategies for defining regions: (1) an atlas-based approach using predefined structural and functional brain atlases, and (2) a cube-based method in which ViTs are trained directly to identify regions from uniformly extracted 3D patches. Further, cosine similarity graphs are generated to model interregional relationships and to guide GNN-based classification. Extensive experiments were conducted using the REST-meta-MDD dataset to demonstrate the effectiveness of our model. With stratified 10-fold cross-validation, the best model obtained 78.98% accuracy, 76.54% sensitivity, 81.58% specificity, 81.58% precision, and 78.98% F1-score. Further, atlas-based models consistently outperformed the cube-based approach, highlighting the importance of using domain-specific anatomical priors for MDD detection.
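The cube-based strategy above tiles the sMRI volume into uniform 3D patches that the ViT then embeds. A minimal sketch of that uniform patch extraction, assuming non-overlapping cubes and an illustrative patch size (the paper's exact size and any overlap handling may differ):

```python
import numpy as np

def extract_uniform_patches(volume, patch=16):
    """Uniformly tile a 3D sMRI volume into non-overlapping cubes.

    volume: (D, H, W) array; trailing voxels that do not fill a full
    cube are dropped. patch=16 is an illustrative choice, not
    necessarily the paper's.
    """
    d, h, w = (s // patch for s in volume.shape)
    vol = volume[: d * patch, : h * patch, : w * patch]
    cubes = vol.reshape(d, patch, h, patch, w, patch)
    cubes = cubes.transpose(0, 2, 4, 1, 3, 5)      # group the cube axes together
    return cubes.reshape(-1, patch, patch, patch)  # (num_cubes, p, p, p)
```

Each returned cube can then be fed to the ViT as one "region" candidate in the cube-based pipeline.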


Paper and project links

PDF 14 pages, 1 figure, 7 tables

Summary

This paper presents a new method for automated detection of major depressive disorder using a Vision Transformer and a Graph Neural Network. Built on structural MRI data and deep learning, it defines brain regions via two strategies: an atlas-based approach using predefined brain atlases, and a cube-based approach that trains the Vision Transformer directly to identify regions from uniformly extracted 3D patches. Validation on the REST-meta-MDD dataset shows that the method achieves high accuracy.

Key Takeaways

  1. A Vision Transformer and a Graph Neural Network are combined for automated MDD detection from structural MRI data using deep learning.
  2. Most existing methods use voxel-level features or regional representations from predefined brain atlases; this work instead defines regions via two strategies: an atlas-based approach and a cube-based Vision Transformer approach.
  3. Cosine similarity graphs are generated to model interregional relationships and guide a GNN-based classifier.
  4. Experiments on the REST-meta-MDD dataset show high performance: 78.98% accuracy, 76.54% sensitivity, 81.58% specificity, 81.58% precision, and a 78.98% F1-score.
  5. Atlas-based models consistently outperform the cube-based approach, indicating that domain-specific anatomical priors are important for MDD detection.
  6. The method promises improved diagnostic accuracy and earlier intervention for MDD.
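The cosine similarity graph in takeaway 3 can be sketched as follows, assuming region embeddings come from the ViT and a hypothetical similarity threshold for sparsifying the graph (the paper's exact graph-construction details may differ):

```python
import numpy as np

def cosine_similarity_graph(embeddings, threshold=0.5):
    """Build an adjacency matrix from region embeddings via cosine similarity.

    embeddings: (num_regions, dim) array of ViT region embeddings.
    threshold: hypothetical cutoff for keeping an inter-regional edge.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                       # pairwise cosine similarity
    adj = (sim >= threshold).astype(float)    # keep only strong inter-regional links
    np.fill_diagonal(adj, 0.0)                # no self-loops
    return sim, adj
```

The resulting adjacency matrix would then define the graph on which the GNN classifier operates.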

Cool Papers

Click here to view paper screenshots

ActivePose: Active 6D Object Pose Estimation and Tracking for Robotic Manipulation

Authors:Sheng Liu, Zhe Li, Weiheng Wang, Han Sun, Heng Zhang, Hongpeng Chen, Yusen Qin, Arash Ajoudani, Yizhao Wang

Accurate 6-DoF object pose estimation and tracking are critical for reliable robotic manipulation. However, zero-shot methods often fail under viewpoint-induced ambiguities, and fixed-camera setups struggle when objects move or become self-occluded. To address these challenges, we propose an active pose estimation pipeline that combines a Vision-Language Model (VLM) with “robotic imagination” to dynamically detect and resolve ambiguities in real time. In an offline stage, we render a dense set of views of the CAD model, compute the FoundationPose entropy for each view, and construct a geometry-aware prompt that includes low-entropy (unambiguous) and high-entropy (ambiguous) examples. At runtime, the system: (1) queries the VLM on the live image for an ambiguity score; (2) if ambiguity is detected, imagines a discrete set of candidate camera poses by rendering virtual views, scores each based on a weighted combination of VLM ambiguity probability and FoundationPose entropy, and then moves the camera to the Next-Best-View (NBV) to obtain a disambiguated pose estimate. Furthermore, since moving objects may leave the camera’s field of view, we introduce an active pose tracking module: a diffusion policy trained via imitation learning, which generates camera trajectories that preserve object visibility and minimize pose ambiguity. Experiments in simulation and the real world show that our approach significantly outperforms classical baselines.


Paper and project links

PDF 6D Pose, Diffusion Policy

Summary
Accurate 6-DoF object pose estimation and tracking are critical for robotic manipulation, but existing methods suffer from viewpoint-induced ambiguities, and fixed-camera setups struggle when objects move or self-occlude. This work proposes an active pose estimation pipeline that combines a Vision-Language Model (VLM) with “robotic imagination” to detect and resolve ambiguities in real time. In an offline stage, a dense set of views of the CAD model is rendered, the FoundationPose entropy of each view is computed, and a geometry-aware prompt is constructed. At runtime, the system queries the VLM on the live image for an ambiguity score; when ambiguity is detected, it renders virtual views to generate a discrete set of candidate camera poses, scores each by a weighted combination of VLM ambiguity probability and FoundationPose entropy, and moves the camera to the Next-Best-View (NBV) to obtain an unambiguous pose estimate. Further, since moving objects may leave the camera’s field of view, an active pose tracking module is introduced: a diffusion policy trained via imitation learning that generates camera trajectories preserving object visibility and minimizing pose ambiguity. Simulated and real-world experiments show the method significantly outperforms classical baselines.

Key Takeaways

  1. Accurate 6-DoF object pose estimation and tracking are critical for robotic manipulation.
  2. Zero-shot methods often fail under viewpoint-induced ambiguities.
  3. Fixed-camera setups struggle when objects move or become self-occluded.
  4. The proposed active pose estimation pipeline combines a Vision-Language Model with “robotic imagination” for real-time ambiguity detection and resolution.
  5. A geometry-aware prompt is built offline by rendering a dense set of views of the CAD model and computing per-view FoundationPose entropy.
  6. At runtime, the system queries the VLM for an ambiguity score on the live image and, when ambiguity is detected, generates discrete candidate camera poses.
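The candidate-scoring step in takeaway 6 can be sketched as below, assuming a simple linear weighting between the two signals; the weight `w` and the linear combination are illustrative assumptions, not the paper's confirmed formula:

```python
def select_next_best_view(candidates, vlm_ambiguity, pose_entropy, w=0.5):
    """Score candidate camera poses and pick the Next-Best-View (NBV).

    Each candidate is scored by a weighted combination of the VLM ambiguity
    probability and the FoundationPose entropy of its rendered virtual view;
    the lowest-scoring (least ambiguous) view wins.
    """
    scores = [w * a + (1.0 - w) * e
              for a, e in zip(vlm_ambiguity, pose_entropy)]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

In the described pipeline, the camera would then be moved to the returned pose to obtain a disambiguated estimate.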

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!