
Medical Images


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never use them in serious academic settings; they are only a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-02-12

ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models

Authors:Ehsan Zeraatkar, Salah Faroughi, Jelena Tesic

Purpose: Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate the state of regional and global climate under a wide variety of conditions. The ESMs are highly complex, and thus, deep neural network architectures are used to model the complexity and store the down-sampled data. In this paper, we propose the Vision Transformer Sinusoidal Representation Networks (ViSIR) to improve the single image super-resolution (SR) reconstruction task for the ESM data. Methods: ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to address the spectral bias observed in SR tasks. Results: The ViSIR outperforms ViT by 4.1 dB, SIREN by 7.5 dB, and SR-Generative Adversarial Networks (SR-GANs) by 7.1 dB PSNR on average for three different measurements. Conclusion: The proposed ViSIR is evaluated and compared with state-of-the-art methods. The results show that the proposed algorithm outperforms the other methods in terms of Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM).
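
For readers unfamiliar with SIREN, the sketch below shows the sine-activated layer it is built on. This is a generic SIREN layer in PyTorch (our illustration, not the authors' released code); ViSIR's actual fusion of such layers with ViT features is not reproduced here.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One SIREN layer: y = sin(omega0 * (Wx + b)).

    The sine activation lets the network represent high-frequency content
    that ReLU MLPs tend to miss (the spectral bias the paper targets).
    """
    def __init__(self, in_features, out_features, omega0=30.0, is_first=False):
        super().__init__()
        self.omega0 = omega0
        self.linear = nn.Linear(in_features, out_features)
        # SIREN's published initialization keeps activations well-distributed.
        with torch.no_grad():
            bound = 1.0 / in_features if is_first else \
                (6.0 / in_features) ** 0.5 / omega0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega0 * self.linear(x))

# Map 2D pixel coordinates to an intensity value (a coordinate MLP).
siren = nn.Sequential(SineLayer(2, 64, is_first=True),
                      SineLayer(64, 64),
                      nn.Linear(64, 1))
coords = torch.rand(1024, 2) * 2 - 1   # coordinates in [-1, 1]
pred = siren(coords)                   # (1024, 1) predicted intensities
```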


Paper and project links

PDF

Summary

This paper addresses the complexity of Earth system models (ESMs) and the challenges of processing their data. It proposes the Vision Transformer Sinusoidal Representation Networks (ViSIR) model to improve performance on the single-image super-resolution (SR) reconstruction task. By combining the SR capability of the Vision Transformer (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN), it addresses the spectral bias observed in SR tasks. Experimental results show that ViSIR outperforms the other methods.

Key Takeaways

  1. ESMs integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate regional and global climate under a wide variety of conditions.
  2. The complexity of ESM data requires deep neural network architectures for modeling.
  3. The Vision Transformer Sinusoidal Representation Networks (ViSIR) model is proposed to improve single-image SR reconstruction.
  4. ViSIR combines the strengths of the Vision Transformer (ViT) and the Sinusoidal Representation Network (SIREN) to address spectral bias in SR tasks.
  5. ViSIR outperforms ViT, SIREN, and SR-GANs by 4.1 dB, 7.5 dB, and 7.1 dB PSNR on average, respectively.
  6. Results are evaluated with Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM).


Prototype Contrastive Consistency Learning for Semi-Supervised Medical Image Segmentation

Authors:Shihuan He, Zhihui Lai, Ruxin Wang, Heng Kong

Medical image segmentation is a crucial task in medical image analysis, but it can be very challenging, especially when labeled data are scarce and unlabeled data are abundant. Contrastive learning has proven to be effective for medical image segmentation in semi-supervised learning by constructing contrastive samples from partial pixels. However, although previous contrastive learning methods can mine semantic information from partial pixels within images, they ignore the whole context information of unlabeled images, which is very important to precise segmentation. In order to solve this problem, we propose a novel prototype contrastive learning method called Prototype Contrastive Consistency Segmentation (PCCS) for semi-supervised medical image segmentation. The core idea is to enforce the prototypes of the same semantic class to be closer and push the prototypes in different semantic classes far away from each other. Specifically, we construct a signed distance map and an uncertainty map from unlabeled images. The signed distance map is used to construct prototypes for contrastive learning, and then we estimate the prototype uncertainty from the uncertainty map as a trade-off among prototypes. In order to obtain better prototypes, a new mechanism named prototype updating prototype, built on the student-teacher architecture, is designed to assist in updating the prototypes for contrastive learning. In addition, we propose an uncertainty-consistency loss to mine more reliable information from unlabeled data. Extensive experiments on medical image segmentation demonstrate that PCCS achieves better segmentation performance than the state-of-the-art methods. The code is available at https://github.com/comphsh/PCCS.
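
As a rough illustration of the prototype-contrast idea (our sketch, not the authors' implementation; the signed-distance-map construction and the uncertainty weighting are omitted), one can build class prototypes by masked average pooling and contrast student prototypes against teacher prototypes:

```python
import torch
import torch.nn.functional as F

def class_prototypes(features, masks):
    """Masked average pooling: one prototype per semantic class.

    features: (B, C, H, W) pixel embeddings
    masks:    (B, K, H, W) soft or hard class masks
    returns:  (K, C) L2-normalized prototypes
    """
    num = torch.einsum('bchw,bkhw->kc', features, masks)
    den = masks.sum(dim=(0, 2, 3)).clamp_min(1e-6).unsqueeze(1)
    return F.normalize(num / den, dim=1)

def prototype_contrastive_loss(p_student, p_teacher, tau=0.1):
    """Pull same-class prototypes together, push different classes apart."""
    logits = p_student @ p_teacher.t() / tau       # (K, K) similarities
    targets = torch.arange(p_student.size(0))      # class k should match class k
    return F.cross_entropy(logits, targets)

feats = torch.randn(2, 32, 64, 64)
masks = F.one_hot(torch.randint(0, 4, (2, 64, 64)), 4).permute(0, 3, 1, 2).float()
loss = prototype_contrastive_loss(class_prototypes(feats, masks),
                                  class_prototypes(feats.roll(1, 0), masks.roll(1, 0)))
```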


Paper and project links

PDF 17 pages, 10 figures, 7 tables

Summary
Medical image segmentation is a key task in medical image analysis, but it is very challenging in semi-supervised settings with few labeled and many unlabeled images. To address the fact that existing contrastive learning methods ignore the whole-image context of unlabeled data, this paper proposes Prototype Contrastive Consistency Segmentation (PCCS) for semi-supervised medical image segmentation. Its core mechanism pulls prototypes of the same semantic class together and pushes prototypes of different classes apart, using a signed distance map and an uncertainty map constructed from unlabeled images. A prototype-updating mechanism built on the student-teacher architecture yields better prototypes, and an uncertainty-consistency loss mines more reliable information from unlabeled data. Experiments show PCCS outperforms state-of-the-art methods on medical image segmentation.

Key Takeaways

  1. Medical image segmentation faces a semi-supervised learning challenge: few labeled but many unlabeled images.
  2. Existing contrastive learning methods ignore the whole-image context of unlabeled images.
  3. The proposed Prototype Contrastive Consistency Segmentation (PCCS) method builds a signed distance map and an uncertainty map from unlabeled data.
  4. PCCS strengthens learning by pulling prototypes of the same semantic class together and pushing prototypes of different classes apart.
  5. A student-teacher architecture with a prototype-updating mechanism refines the prototypes.
  6. An uncertainty-consistency loss improves the reliability of information mined from unlabeled data.
  7. Experiments show PCCS performs excellently on medical image segmentation.


CT-UIO: Continuous-Time UWB-Inertial-Odometer Localization Using Non-Uniform B-spline with Fewer Anchors

Authors:Jian Sun, Wei Sun, Genwei Zhang, Kailun Yang, Song Li, Xiangqi Meng, Na Deng, Chongbin Tan

Ultra-wideband (UWB) based positioning with fewer anchors has attracted significant research interest in recent years, especially under energy-constrained conditions. However, most existing methods rely on discrete-time representations and smoothness priors to infer a robot’s motion states, which often struggle with ensuring multi-sensor data synchronization. In this paper, we present an efficient UWB-Inertial-odometer localization system, utilizing a non-uniform B-spline framework with fewer anchors. Unlike traditional uniform B-spline-based continuous-time methods, we introduce an adaptive knot-span adjustment strategy for non-uniform continuous-time trajectory representation. This is accomplished by adjusting control points dynamically based on movement speed. To enable efficient fusion of IMU and odometer data, we propose an improved Extended Kalman Filter (EKF) with innovation-based adaptive estimation to provide short-term accurate motion prior. Furthermore, to address the challenge of achieving a fully observable UWB localization system under few-anchor conditions, the Virtual Anchor (VA) generation method based on multiple hypotheses is proposed. At the backend, we propose a CT-UIO factor graph with an adaptive sliding window for global trajectory estimation. Comprehensive experiments conducted on corridor and exhibition hall datasets validate the proposed system’s high precision and robust performance. The codebase and datasets of this work will be open-sourced at https://github.com/JasonSun623/CT-UIO.
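
One way to picture the adaptive knot-span idea is a policy that shrinks the knot spacing when the platform moves fast and widens it when it is slow. The sketch below is our guess at such a policy under assumed parameters (`base_dt`, `v_ref`, `dt_min`, `dt_max` are all illustrative); the paper's exact rule may differ.

```python
import numpy as np

def adaptive_knot_times(timestamps, speeds, base_dt=0.1,
                        v_ref=1.0, dt_min=0.02, dt_max=0.5):
    """Place B-spline knots densely during fast motion, sparsely when slow.

    timestamps, speeds: sampled trajectory times (s) and speeds (m/s).
    Returns the non-uniform knot times for the continuous-time spline.
    """
    knots = [timestamps[0]]
    while knots[-1] < timestamps[-1]:
        v = np.interp(knots[-1], timestamps, speeds)   # speed at current knot
        dt = np.clip(base_dt * v_ref / max(v, 1e-3), dt_min, dt_max)
        knots.append(knots[-1] + dt)
    return np.asarray(knots)

t = np.linspace(0.0, 10.0, 200)
v = 0.5 + 1.5 * (np.sin(t) > 0)          # alternating slow/fast motion
print(adaptive_knot_times(t, v)[:8])     # denser knots in the fast segments
```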


Paper and project links

PDF The codebase and datasets will be open-sourced at https://github.com/JasonSun623/CT-UIO

Summary

UWB-based positioning with fewer anchors has attracted wide attention in recent years, especially under energy-constrained conditions. This paper presents an efficient UWB-inertial-odometer localization system built on a non-uniform B-spline framework, using an adaptive knot-span adjustment strategy for non-uniform continuous-time trajectory representation. To fuse IMU and odometer data, an improved Extended Kalman Filter (EKF) provides a short-term accurate motion prior. To make the UWB localization system fully observable with few anchors, a multi-hypothesis Virtual Anchor (VA) generation method is proposed. At the backend, a CT-UIO factor graph with an adaptive sliding window performs global trajectory estimation. Experiments on corridor and exhibition hall datasets validate the system's high precision and robustness.

Key Takeaways

  • The work targets UWB-based positioning with fewer anchors, especially under energy-constrained conditions.
  • A non-uniform B-spline framework provides a non-uniform continuous-time trajectory representation, improving localization accuracy.
  • An improved Extended Kalman Filter (EKF) fuses IMU and odometer data to provide a short-term motion prior.
  • A multi-hypothesis Virtual Anchor generation method addresses the observability challenge of UWB localization with few anchors.
  • A CT-UIO factor graph with an adaptive sliding window performs global trajectory estimation, improving robustness and precision.
  • Experiments show superior performance on corridor and exhibition hall datasets.


A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation

Authors:Wenhui Lei, Hanyu Chen, Zitian Zhang, Luyang Luo, Qiong Xiao, Yannian Gu, Peng Gao, Yankai Jiang, Ci Wang, Guangtao Wu, Tongjia Xu, Yingjie Zhang, Xiaofan Zhang, Pranav Rajpurkar, Shaoting Zhang, Zhenning Wang

Artificial intelligence-assisted imaging analysis has made substantial strides in tumor diagnosis and management. Here we present PASTA, a pan-tumor CT foundation model that achieves state-of-the-art performance on 45 of 46 representative oncology tasks – including lesion segmentation, tumor detection in plain CT, tumor staging, survival prediction, structured report generation, and cross-modality transfer learning, significantly outperforming the second-best models on 35 tasks. This remarkable advancement is driven by our development of PASTA-Gen, an innovative synthetic tumor generation framework that produces a comprehensive dataset of 30,000 CT scans with pixel-level annotated lesions and paired structured reports, encompassing malignancies across ten organs and five benign lesion types. By leveraging this rich, high-quality synthetic data, we overcome a longstanding bottleneck in the development of CT foundation models – specifically, the scarcity of publicly available, high-quality annotated datasets due to privacy constraints and the substantial labor required for scaling precise data annotation. Encouragingly, PASTA demonstrates exceptional data efficiency with promising practical value, markedly improving performance on various tasks with only a small amount of real-world data. The open release of both the synthetic dataset and PASTA foundation model effectively addresses the challenge of data scarcity, thereby advancing oncological research and clinical translation.


Paper and project links

PDF 57 pages, 7 figures

Summary

AI-assisted imaging analysis has made substantial strides in tumor diagnosis and management. This paper presents PASTA, a pan-tumor CT foundation model that achieves state-of-the-art performance on a wide range of oncology tasks, including lesion segmentation, tumor detection in plain CT, tumor staging, survival prediction, structured report generation, and cross-modality transfer learning. This advance is driven by PASTA-Gen, a synthetic tumor generation framework that produces a dataset of 30,000 CT scans with pixel-level annotated lesions and paired structured reports, covering malignancies across ten organs and five benign lesion types. Leveraging this rich, high-quality synthetic data overcomes a longstanding bottleneck for CT foundation models: the scarcity of publicly available, high-quality annotated datasets caused by privacy constraints and the heavy labor of precise annotation. Encouragingly, PASTA is highly data-efficient and practically valuable, markedly improving performance on various tasks with only a small amount of real-world data. The open release of both the synthetic dataset and the PASTA foundation model addresses the data-scarcity challenge and advances oncological research and clinical translation.

Key Takeaways

  1. AI-assisted imaging analysis has made significant progress in tumor diagnosis and treatment.
  2. The PASTA model performs excellently on a wide range of tumor-related tasks.
  3. The PASTA-Gen synthetic tumor generation framework supplies large amounts of high-quality data for tumor image analysis.
  4. Leveraging synthetic data resolves the scarcity of publicly available, high-quality annotated datasets for CT foundation models.
  5. PASTA is highly data-efficient and of high practical value.
  6. The open release of the synthetic dataset and the PASTA foundation model addresses the data-scarcity challenge.


A coupled planar transmit RF array for ultrahigh field spine MR imaging

Authors:Yunkun Zhao, Komlan Payne, Leslie Ying, Xiaoliang Zhang

Ultrahigh-field MRI, such as those operating at 7 Tesla, enhances diagnostic capabilities but also presents unique challenges, including the need for advanced RF coil designs to achieve an optimal signal-to-noise ratio and transmit efficiency, particularly when imaging large samples. In this work, we introduce the coupled planar array, a novel technique for high-frequency, large-size RF coil design with enhanced RF magnetic field (B1) efficiency and transmit performance for ultrahigh-field spine imaging applications. This array comprises multiple resonators that are electromagnetically coupled to function as a single multimodal resonator. The field distribution of its highest frequency mode is suitable for spine imaging applications. Based on the numerical modeling and calculation, a prototype of the coupled planar array was constructed and its performance was evaluated through comprehensive numerical simulations, rigorous RF measurements, empirical tests, and a comparison against a conventional surface coil with the same size and geometry. The results of this study demonstrate that the proposed coupled planar array exhibits superior performance compared to conventional surface coils in terms of B1 efficiency for both transmit (B1+) and receive (B1-) fields, specific absorption rate (SAR), and the ability to operate at high frequencies. This study suggests a promising and efficient approach to the design of high-frequency, large-size RF coils for spine MR imaging at ultrahigh magnetic fields.


Paper and project links

PDF

Summary

This paper introduces the coupled planar array, a novel high-frequency, large-size RF coil design that improves RF magnetic field (B1) efficiency and transmit performance for ultrahigh-field spine imaging. Based on numerical modeling and calculation, a prototype of the array was constructed, and its performance was evaluated through comprehensive numerical simulations, rigorous RF measurements, empirical tests, and comparison against a conventional surface coil of the same size and geometry. The results show that, compared with conventional surface coils, the coupled planar array delivers superior B1 efficiency and specific absorption rate (SAR) and can operate at high frequencies. This suggests a promising and efficient approach to designing high-frequency, large-size RF coils for spine MR imaging at ultrahigh magnetic fields.

Key Takeaways

  1. Introduces the coupled planar array, a novel RF coil design for ultrahigh-field MRI (e.g., 7 Tesla).
  2. The coupled planar array comprises multiple resonators electromagnetically coupled to function as a single multimodal resonator.
  3. The field distribution of the array's highest-frequency mode suits spine imaging applications.
  4. A prototype of the coupled planar array was constructed based on numerical modeling and calculation.
  5. Compared with conventional surface coils, the array shows superior B1 efficiency and specific absorption rate (SAR).
  6. The array can operate at high frequencies with better transmit and receive performance.


ClinKD: Cross-Modal Clinic Knowledge Distiller For Multi-Task Medical Images

Authors:Hongyu Ge, Longkun Hao, Zihui Xu, Zhenxin Lin, Bin Li, Shoujun Zhou, Hongjin Zhao, Yihang Liu

Med-VQA (Medical Visual Question Answering) is a crucial subtask within the broader VQA (Visual Question Answering) domain. This task requires a visual question answering system to analyze the provided image and corresponding question, offering reasonable analysis and suggestions to assist medical professionals in making pathological diagnoses, or ideally, enabling the system to independently provide correct diagnoses. Furthermore, more advanced Med-VQA tasks involve Referring and Grounding, which not only require the system to accurately comprehend medical images but also to pinpoint specific biological locations within those images. While many large pre-trained models have demonstrated substantial VQA capabilities, challenges persist in the medical imaging domain. The intricacy of biological features in medical images and the scarcity of high-quality medical image datasets, combined with the fact that current models are not tailored for the medical field in terms of architecture and training paradigms, hinder the full exploitation of model generalization. This results in issues such as hallucination in Visual Grounding. In this paper, we introduce the ClinKD model, which incorporates modifications to model position encoding and a diversified training process. Initially, we enhance the model’s ability to perceive image and modality variations by using Med-CLIP Guided Rotary Position Embedding. Subsequently, we leverage distillation to provide prior knowledge to the model before using complete training data. Additionally, the feedback-based training process during the formal training phase further enhances data utilization. Notably, under unchanged evaluation protocols, we achieve a new state-of-the-art performance on the Med-GRIT-270k dataset, and the Med-CLIP Guided Rotary Position Embedding approach presents potential for generalizing to universal model position encoding.
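
For context, the sketch below implements plain rotary position embedding (RoPE), the mechanism the paper builds on; ClinKD's Med-CLIP-guided variant changes how the rotation is conditioned, which is not reproduced here.

```python
import torch

def rotary_position_embedding(x, positions, base=10000.0):
    """Standard RoPE: rotate each feature pair (2i, 2i+1) by an angle
    proportional to the token position and a per-pair frequency.

    x:         (seq, dim) token features, dim even
    positions: (seq,) token positions
    """
    seq, dim = x.shape
    freqs = base ** (-torch.arange(0, dim, 2).float() / dim)   # (dim/2,)
    angles = positions[:, None].float() * freqs[None, :]       # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin    # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

tokens = torch.randn(16, 64)
rotated = rotary_position_embedding(tokens, torch.arange(16))
```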


Paper and project links

PDF

Summary
Med-VQA (Medical Visual Question Answering) is a crucial subtask of VQA. It requires a system to analyze an image and a corresponding question, offering reasonable analysis and suggestions that assist medical professionals in pathological diagnosis, or ideally providing correct diagnoses independently. This paper introduces the ClinKD model, which improves visual question answering on medical images through modified position encoding and a diversified training process.

Key Takeaways

  1. Med-VQA is an important subtask of VQA, requiring systems to analyze medical images and questions to assist diagnosis.
  2. Current challenges include the intricacy of medical images, the scarcity of high-quality datasets, and architectures and training paradigms not tailored to the medical field.
  3. The ClinKD model improves performance on medical images through modified position encoding and a diversified training process.
  4. Med-CLIP Guided Rotary Position Embedding improves the model's perception of image and modality variations.
  5. Distillation provides the model with prior knowledge and improves data utilization.
  6. The model achieves new state-of-the-art performance on the Med-GRIT-270k dataset.


High pressure structural and lattice dynamics study of α-In$_2$Se$_3$

Authors:Shiyu Feng, Baihong Sun, Wenting Lu, Haikai Zou, Chenxin Wei, Qian Zhang, Bihan Wang, Martin Kunz, Hirokazu Kadobayashi, Azkar Saeed Ahmad, Elad Koren, Elissaios Stavrou

Layered $\alpha$-In$_2$Se$_3$ has been studied using a concomitant in-situ synchrotron angle dispersive powder x-ray diffraction and Raman spectroscopy study in a diamond anvil cell up to 60+ GPa, at room temperature. Helium, which remains fairly hydrostatic up to the highest pressure in this study, was used as the pressure-transmitting medium. The results from both experimental methods reveal a pressure-induced structural phase transition from $\alpha$-In$_2$Se$_3$ to a monoclinic $\beta$’-In$_2$Se$_3$ structure at $\approx$1 GPa, in agreement with previous studies. Based on our detailed measurements using both experimental techniques and the F-f formalism, the $\beta$’-In$_2$Se$_3$ structure remains stable up to 45 GPa, without a clear indication of a phase transition towards the previously reported $\beta$-In$_2$Se$_3$ phase. Above this pressure, In$_2$Se$_3$ adopts a disordered solid-solution-like orthorhombic structure, phase IV. The results are discussed in comparison with the relevant previous studies of $\alpha$-In$_2$Se$_3$ under pressure.
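
The F-f formalism mentioned above plots the normalized pressure F against the Eulerian strain f; a flat or linear trend supports a low-order Birch-Murnaghan equation of state and helps rule out subtle transitions. A minimal sketch with purely illustrative numbers (not the paper's data):

```python
import numpy as np

def eulerian_strain(V, V0):
    """Eulerian strain: f = ((V0/V)**(2/3) - 1) / 2."""
    return 0.5 * ((V0 / V) ** (2.0 / 3.0) - 1.0)

def normalized_pressure(P, V, V0):
    """Normalized pressure: F = P / (3 f (1 + 2f)^(5/2)).

    An F-f plot that is flat (constant) or linear is consistent with a
    2nd- or 3rd-order Birch-Murnaghan equation of state, respectively.
    """
    f = eulerian_strain(V, V0)
    return P / (3.0 * f * (1.0 + 2.0 * f) ** 2.5)

V0 = 100.0                                  # zero-pressure volume (illustrative)
P = np.array([2.0, 10.0, 25.0, 45.0])       # pressures in GPa (illustrative)
V = np.array([95.0, 88.0, 80.0, 73.0])      # corresponding volumes (illustrative)
print(normalized_pressure(P, V, V0))
```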


Paper and project links

PDF 16 Pages, 6 figures

Summary
Layered α-In₂Se₃ was studied by concomitant in-situ synchrotron angle-dispersive powder X-ray diffraction and Raman spectroscopy in a diamond anvil cell up to 60+ GPa at room temperature, with helium as the pressure-transmitting medium. A pressure-induced structural phase transition from α-In₂Se₃ to the monoclinic β’-In₂Se₃ structure is observed at about 1 GPa, and β’-In₂Se₃ remains stable up to 45 GPa with no clear sign of a transition to the previously reported β-In₂Se₃ phase. Above this pressure, In₂Se₃ adopts a disordered solid-solution-like orthorhombic structure. The results are discussed against the relevant previous high-pressure studies.

Key Takeaways

  1. α-In₂Se₃ was studied at pressures up to 60+ GPa.
  2. The study used concomitant synchrotron angle-dispersive powder X-ray diffraction and Raman spectroscopy at room temperature.
  3. The pressure-induced transition from α-In₂Se₃ to the monoclinic β’-In₂Se₃ structure occurs at about 1 GPa.
  4. The β’-In₂Se₃ structure remains stable up to about 45 GPa.
  5. At higher pressures, In₂Se₃ adopts a disordered solid-solution-like orthorhombic structure (phase IV).
  6. The results are compared with previous studies of α-In₂Se₃ under pressure.


Fast Omni-Directional Image Super-Resolution: Adapting the Implicit Image Function with Pixel and Semantic-Wise Spherical Geometric Priors

Authors:Xuelin Shen, Yitong Wang, Silin Zheng, Kang Xiao, Wenhan Yang, Xu Wang

In the context of Omni-Directional Image (ODI) Super-Resolution (SR), the unique challenge arises from the non-uniform oversampling characteristics caused by EquiRectangular Projection (ERP). Considerable efforts in designing complex spherical convolutions or polyhedron reprojection offer significant performance improvements but at the expense of cumbersome processing procedures and slower inference speeds. Under these circumstances, this paper proposes a new ODI-SR model characterized by its capacity to perform Fast and Arbitrary-scale ODI-SR processes, denoted as FAOR. The key innovation lies in adapting the implicit image function from the planar image domain to the ERP image domain by incorporating spherical geometric priors at both the latent representation and image reconstruction stages, in a low-overhead manner. Specifically, at the latent representation stage, we adopt a pair of pixel-wise and semantic-wise sphere-to-planar distortion maps to perform affine transformations on the latent representation, thereby incorporating it with spherical properties. Moreover, during the image reconstruction stage, we introduce a geodesic-based resampling strategy, aligning the implicit image function with spherical geometrics without introducing additional parameters. As a result, the proposed FAOR outperforms the state-of-the-art ODI-SR models with a much faster inference speed. Extensive experimental results and ablation studies have demonstrated the effectiveness of our design.
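
Implicit image functions of this kind typically encode pixel coordinates with Fourier features before feeding them to an MLP. The sketch below shows the standard octave-frequency encoding as background; the paper's exact embedding and its spherical adaptation may differ.

```python
import numpy as np

def fourier_embedding(coords, num_bands=6):
    """Encode normalized pixel coordinates with sin/cos at octave frequencies.

    coords: (N, 2) in [-1, 1]  ->  (N, 2 + 4 * num_bands)
    """
    freqs = 2.0 ** np.arange(num_bands) * np.pi          # (num_bands,)
    angles = coords[:, :, None] * freqs[None, None, :]   # (N, 2, num_bands)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    # Keep the raw coordinates alongside the Fourier features.
    return np.concatenate([coords, emb.reshape(len(coords), -1)], axis=1)

xy = np.stack(np.meshgrid(np.linspace(-1, 1, 4),
                          np.linspace(-1, 1, 4)), -1).reshape(-1, 2)
print(fourier_embedding(xy).shape)   # (16, 26)
```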


Paper and project links

PDF 9 pages, 4 figures, AAAI 2025

Summary
Omni-Directional Image (ODI) Super-Resolution (SR) faces the unique challenge of non-uniform oversampling caused by EquiRectangular Projection (ERP). This paper proposes FAOR, a new ODI-SR model that performs fast, arbitrary-scale ODI-SR by adapting the implicit image function from the planar domain to the ERP domain, incorporating spherical geometric priors at both the latent representation and image reconstruction stages in a low-overhead manner. Using pixel-wise and semantic-wise sphere-to-planar distortion maps for affine transformations of the latent representation, together with a geodesic-based resampling strategy, FAOR outperforms state-of-the-art ODI-SR models while maintaining a much faster inference speed.

Key Takeaways

  1. ODI-SR faces the challenge of non-uniform oversampling caused by ERP.
  2. Existing methods such as complex spherical convolutions or polyhedron reprojection improve performance, but at the cost of cumbersome processing and slower inference.
  3. The proposed FAOR model incorporates spherical geometric priors at both the latent representation and image reconstruction stages.
  4. FAOR applies affine transformations via sphere-to-planar distortion maps, adapting the latent representation to spherical properties.
  5. The image reconstruction stage uses a geodesic-based resampling strategy aligned with spherical geometry, without extra parameters.
  6. FAOR outperforms state-of-the-art ODI-SR models while maintaining fast inference.


Image-Based Alzheimer’s Disease Detection Using Pretrained Convolutional Neural Network Models

Authors:Nasser A Alsadhan

Alzheimer’s disease is an untreatable, progressive brain disorder that slowly robs people of their memory, thinking abilities, and ultimately their capacity to complete even the most basic tasks. Among older adults, it is the most frequent cause of dementia. Although there is presently no treatment for Alzheimer’s disease, scientific trials are ongoing to discover drugs to combat the condition. Treatments to slow the signs of dementia are also available. Many researchers throughout the world became interested in developing computer-aided diagnosis systems to aid in the early identification of this deadly disease and assure an accurate diagnosis. In particular, image-based approaches have been coupled with machine learning techniques to address the challenges of Alzheimer’s disease detection. This study proposes a computer-aided diagnosis system to detect Alzheimer’s disease from biomarkers captured using neuroimaging techniques. The proposed approach relies on deep learning techniques to extract the relevant visual features from the image collection to accurately predict the Alzheimer’s class value. In the experiments, standard datasets and pre-trained deep learning models were investigated. Moreover, standard performance measures were used to assess the models’ performances. The obtained results proved that VGG16-based models outperform the state of the art.
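
A minimal transfer-learning sketch in the spirit of the paper, assuming PyTorch/torchvision: freeze the VGG16 backbone and retrain a small head. The class count and the dummy input are placeholders, and `weights=None` keeps the example offline (in practice one would load ImageNet weights, e.g. `models.VGG16_Weights.IMAGENET1K_V1` in torchvision >= 0.13).

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 4   # hypothetical dementia-stage classes (assumed, not from the paper)

vgg = models.vgg16(weights=None)          # load ImageNet weights in practice
for p in vgg.features.parameters():
    p.requires_grad = False               # freeze the convolutional backbone

# Replace only the final fully connected layer with a task-specific head.
vgg.classifier[6] = nn.Linear(4096, num_classes)

x = torch.randn(2, 3, 224, 224)           # a dummy batch of brain-scan slices
logits = vgg(x)                            # (2, num_classes)
```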


Paper and project links

PDF

Summary

This text reviews the state of research on Alzheimer's disease. Although there is currently no cure, scientists are searching for drugs to combat the condition, and computer-aided diagnosis systems are being developed to identify the disease early and ensure accurate diagnosis. The study proposes a deep learning-based computer-aided diagnosis system that detects Alzheimer's disease from biomarkers captured with neuroimaging techniques. Experiments show that VGG16-based models outperform the state of the art.

Key Takeaways

  1. Alzheimer’s disease is the most frequent cause of dementia; it is currently untreatable, but scientists are searching for drugs against it.
  2. Computer-aided diagnosis systems are used for early identification and accurate diagnosis of Alzheimer’s disease.
  3. The study proposes a deep learning-based diagnosis system that detects Alzheimer’s disease via neuroimaging.
  4. The system extracts relevant visual features from images to accurately predict the Alzheimer’s class value.
  5. Experiments use standard datasets and pre-trained deep learning models for evaluation.
  6. VGG16-based models outperform the state of the art.


Validity-first automatic polycube labeling for CAD models

Authors:Sébastien Mestrallet, Christophe Bourcier, Franck Ledoux

For many simulation codes, block-structured hex meshes remain preferred while their automatic generation is unsolved. We investigate the usage of a polycube-based approach. More specifically, we focus on the labeling stage, which consists in assigning each boundary facet to one of the 6 signed principal axes. Similar works are confronted with 2 challenges: over-constraining validity criteria, and the conflated processing of validity criteria with quality metrics. We tackle these obstacles with automatic routines based on semi-global labeling operators. Our approach is successfully tested on CAD models, which are of interest for many numerical simulation problems.
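
The naive starting point for the labeling stage assigns each boundary facet to the signed axis closest to its normal, as sketched below; the paper's contribution, the semi-global operators that then repair labeling validity, is not shown here.

```python
import numpy as np

# The six signed principal axes: +X, -X, +Y, -Y, +Z, -Z.
AXES = np.array([[ 1, 0, 0], [-1, 0, 0],
                 [ 0, 1, 0], [ 0,-1, 0],
                 [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

def naive_labeling(facet_normals):
    """Assign each boundary facet to the closest signed axis.

    facet_normals: (F, 3) unit normals; returns labels in {0..5}.
    """
    scores = facet_normals @ AXES.T    # cosine similarity against each axis
    return scores.argmax(axis=1)       # pick the best-aligned axis per facet

n = np.array([[0.9, 0.1, 0.0], [-0.2, -0.9, 0.1], [0.1, 0.2, 0.95]])
n /= np.linalg.norm(n, axis=1, keepdims=True)
print(naive_labeling(n))   # [0 3 4] -> +X, -Y, +Z
```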


Paper and project links

PDF 14 pages. Source code: https://github.com/LIHPC-Computational-Geometry/validity-first-polycube-labeling

Summary

This paper addresses the automatic generation of block-structured hexahedral (hex) meshes, which remain preferred for many simulation codes but whose automatic generation is unsolved. It investigates a polycube-based approach, focusing on the labeling stage, in which each boundary facet is assigned to one of the six signed principal axes. Similar works face two challenges: over-constraining validity criteria, and the conflated processing of validity criteria and quality metrics. The study tackles these obstacles with automatic routines based on semi-global labeling operators and tests them successfully on CAD models, which are of interest for many numerical simulation problems.

Key Takeaways

  1. Block-structured hexahedral meshes are preferred by many simulation codes, but their automatic generation remains unsolved.
  2. A polycube-based approach is investigated to address this problem.
  3. The labeling stage is the key step, assigning each boundary facet to one of the six signed principal axes.
  4. Similar works face two main challenges: over-constraining validity criteria, and conflating validity criteria with quality metrics.
  5. The study resolves these challenges with automatic routines based on semi-global labeling operators.
  6. The method was successfully tested on CAD models.


Semantic Data Augmentation Enhanced Invariant Risk Minimization for Medical Image Domain Generalization

Authors:Yaoyao Zhu, Xiuding Cai, Yingkai Wang, Yu Yao, Xu Luo, Zhongliang Fu

Deep learning has achieved remarkable success in medical image classification. However, its clinical application is often hindered by data heterogeneity caused by variations in scanner vendors, imaging protocols, and operators. Approaches such as invariant risk minimization (IRM) aim to address this challenge of out-of-distribution generalization. For instance, VIRM improves upon IRM by tackling the issue of insufficient feature support overlap, demonstrating promising potential. Nonetheless, these methods face limitations in medical imaging due to the scarcity of annotated data and the inefficiency of augmentation strategies. To address these issues, we propose a novel domain-oriented direction selector to replace the random augmentation strategy used in VIRM. Our method leverages inter-domain covariance as a guider for augmentation direction, guiding data augmentation towards the target domain. This approach effectively reduces domain discrepancies and enhances generalization performance. Experiments on a multi-center diabetic retinopathy dataset demonstrate that our method outperforms state-of-the-art approaches, particularly under limited data conditions and significant domain heterogeneity.
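
To make the idea concrete, here is one plausible, purely hypothetical reading of a domain-oriented direction selector: perturb source features along the source-to-target mean shift, scaled per dimension by inter-domain variability, instead of along a random direction. Every name and scale factor below is our illustration, not the authors' formulation.

```python
import numpy as np

def domain_guided_direction(feats_src, feats_tgt):
    """Compute an augmentation direction pointing from the source domain
    toward the target domain, with a per-dimension spread estimate.
    """
    shift = feats_tgt.mean(axis=0) - feats_src.mean(axis=0)
    spread = np.sqrt(feats_src.var(axis=0) + feats_tgt.var(axis=0))
    direction = shift / (np.linalg.norm(shift) + 1e-8)
    return direction, spread

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(256, 16))   # source-domain features
tgt = rng.normal(0.5, 1.2, size=(256, 16))   # target-domain features
d, s = domain_guided_direction(src, tgt)
augmented = src + 0.3 * s * d   # push source samples toward the target domain
```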


Paper and project links

PDF

Summary
Deep learning has achieved remarkable success in medical image classification, but data heterogeneity hinders its clinical application. This paper proposes a novel domain-oriented direction selector that replaces the random augmentation strategy of VIRM, using inter-domain covariance to guide the augmentation direction toward the target domain, which effectively reduces domain discrepancies and improves generalization performance.

Key Takeaways

  1. Deep learning has achieved remarkable success in medical image classification, but data heterogeneity is a major challenge for clinical application.
  2. IRM-style methods target the challenge of out-of-distribution generalization, but have limitations.
  3. VIRM improves on IRM by tackling insufficient feature support overlap, showing promising potential.
  4. In medical imaging, annotated data are scarce and augmentation strategies are inefficient.
  5. A novel domain-oriented direction selector is proposed to replace VIRM's random augmentation strategy.
  6. The method uses inter-domain covariance to guide the augmentation direction, effectively reducing domain discrepancies.


LMS-Net: A Learned Mumford-Shah Network For Few-Shot Medical Image Segmentation

Authors:Shengdong Zhang, Fan Jia, Xiang Li, Hao Zhang, Jun Shi, Liyan Ma, Shihui Ying

Few-shot semantic segmentation (FSS) methods have shown great promise in handling data-scarce scenarios, particularly in medical image segmentation tasks. However, most existing FSS architectures lack sufficient interpretability and fail to fully incorporate the underlying physical structures of semantic regions. To address these issues, in this paper, we propose a novel deep unfolding network, called the Learned Mumford-Shah Network (LMS-Net), for the FSS task. Specifically, motivated by the effectiveness of pixel-to-prototype comparison in prototypical FSS methods and the capability of deep priors to model complex spatial structures, we leverage our learned Mumford-Shah model (LMS model) as a mathematical foundation to integrate these insights into a unified framework. By reformulating the LMS model into prototype update and mask update tasks, we propose an alternating optimization algorithm to solve it efficiently. Further, the iterative steps of this algorithm are unfolded into corresponding network modules, resulting in LMS-Net with clear interpretability. Comprehensive experiments on three publicly available medical segmentation datasets verify the effectiveness of our method, demonstrating superior accuracy and robustness in handling complex structures and adapting to challenging segmentation scenarios. These results highlight the potential of LMS-Net to advance FSS in medical imaging applications. Our code will be available at: https://github.com/SDZhang01/LMSNet
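
For intuition about the underlying energy, the sketch below implements the classical two-phase Mumford-Shah (Chan-Vese) functional as a differentiable loss: a piecewise-constant data term plus a total-variation penalty approximating boundary length. LMS-Net unrolls the optimization of a learned variant of such an energy, which this sketch does not reproduce.

```python
import torch

def soft_mumford_shah_loss(image, mask, lam=1.0):
    """Two-phase Mumford-Shah / Chan-Vese energy on a soft mask.

    image: (B, 1, H, W); mask: (B, 1, H, W) with values in [0, 1].
    """
    c1 = (image * mask).sum() / mask.sum().clamp_min(1e-6)              # fg mean
    c0 = (image * (1 - mask)).sum() / (1 - mask).sum().clamp_min(1e-6)  # bg mean
    data = (mask * (image - c1) ** 2 + (1 - mask) * (image - c0) ** 2).mean()
    # Anisotropic total variation of the mask approximates boundary length.
    tv = (mask[..., 1:, :] - mask[..., :-1, :]).abs().mean() + \
         (mask[..., :, 1:] - mask[..., :, :-1]).abs().mean()
    return data + lam * tv

img = torch.rand(1, 1, 64, 64)
m = torch.sigmoid(torch.randn(1, 1, 64, 64, requires_grad=True))
loss = soft_mumford_shah_loss(img, m)
loss.backward()   # gradients flow back through the soft mask
```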


Paper and project links

PDF

Summary
Few-shot semantic segmentation (FSS) can address data scarcity in medical image segmentation, but existing architectures lack interpretability and fail to incorporate the underlying physical structures of semantic regions. This paper proposes the Learned Mumford-Shah Network (LMS-Net), a deep unfolding network that combines the strengths of prototypical FSS methods and deep priors, using a learned Mumford-Shah (LMS) model as its mathematical foundation. The LMS model is solved efficiently by an alternating optimization algorithm whose iterative steps are unfolded into network modules, giving LMS-Net clear interpretability. Experiments on three public medical segmentation datasets verify its superiority, showing strong accuracy and robustness on complex structures and challenging segmentation scenarios.

Key Takeaways

  • FSS methods show great promise for handling data scarcity in medical image segmentation.
  • Existing FSS architectures lack interpretability and do not fully incorporate the physical structure of semantic regions.
  • The proposed LMS-Net combines the strengths of prototypical FSS methods and deep priors.
  • A learned Mumford-Shah model (LMS model) serves as the mathematical foundation of the framework.
  • An alternating optimization algorithm solves the LMS model efficiently.
  • LMS-Net gains clear interpretability by unfolding the iterative steps into network modules.
  • Experiments on three public medical segmentation datasets verify LMS-Net's superiority.


A Novel Convolutional-Free Method for 3D Medical Imaging Segmentation

Authors:Canxuan Gang

Segmentation of 3D medical images is a critical task for accurate diagnosis and treatment planning. Convolutional neural networks (CNNs) have dominated the field, achieving significant success in 3D medical image segmentation. However, CNNs struggle with capturing long-range dependencies and global context, limiting their performance, particularly for fine and complex structures. Recent transformer-based models, such as TransUNet and nnFormer, have demonstrated promise in addressing these limitations, though they still rely on hybrid CNN-transformer architectures. This paper introduces a novel, fully convolutional-free model based on transformer architecture and self-attention mechanisms for 3D medical image segmentation. Our approach focuses on improving multi-semantic segmentation accuracy and addressing domain adaptation challenges between thick and thin slice CT images. We propose a joint loss function that facilitates effective segmentation of thin slices based on thick slice annotations, overcoming limitations in dataset availability. Furthermore, we present a benchmark dataset for multi-semantic segmentation on thin slices, addressing a gap in current medical imaging research. Our experiments demonstrate the superiority of the proposed model over traditional and hybrid architectures, offering new insights into the future of convolution-free medical image segmentation.


Paper and project links

PDF technical report

Summary

This paper proposes a novel, fully convolution-free model based on the transformer architecture and self-attention for 3D medical image segmentation, aiming to improve multi-semantic segmentation accuracy and to address domain adaptation between thick- and thin-slice CT images. A joint loss function enables effective segmentation of thin slices from thick-slice annotations, overcoming limits in dataset availability, and a benchmark dataset for multi-semantic segmentation on thin slices fills a gap in current medical imaging research. Experiments demonstrate the model's superiority over traditional and hybrid architectures.

Key Takeaways

  • The paper introduces a new 3D medical image segmentation model based on the transformer architecture and self-attention, improving multi-semantic segmentation accuracy.
  • The model overcomes CNNs' limitations in capturing long-range dependencies and global context, especially for fine and complex structures.
  • A joint loss function addresses domain adaptation between thick- and thin-slice CT images, using thick-slice annotations to segment thin slices accurately.
  • The approach mitigates the limits that dataset availability places on model training in medical imaging research.
  • A benchmark dataset for multi-semantic segmentation on thin slices fills a gap in current medical imaging research.

Homeomorphism Prior for False Positive and Negative Problem in Medical Image Dense Contrastive Representation Learning

Authors:Yuting He, Boyu Wang, Rongjun Ge, Yang Chen, Guanyu Yang, Shuo Li

Dense contrastive representation learning (DCRL) has greatly improved the learning efficiency for image-dense prediction tasks, showing its great potential to reduce the large costs of medical image collection and dense annotation. However, the properties of medical images make unreliable correspondence discovery, bringing an open problem of large-scale false positive and negative (FP&N) pairs in DCRL. In this paper, we propose GEoMetric vIsual deNse sImilarity (GEMINI) learning which embeds the homeomorphism prior to DCRL and enables a reliable correspondence discovery for effective dense contrast. We propose a deformable homeomorphism learning (DHL) which models the homeomorphism of medical images and learns to estimate a deformable mapping to predict the pixels’ correspondence under topological preservation. It effectively reduces the searching space of pairing and drives an implicit and soft learning of negative pairs via a gradient. We also propose a geometric semantic similarity (GSS) which extracts semantic information in features to measure the alignment degree for the correspondence learning. It will promote the learning efficiency and performance of deformation, constructing positive pairs reliably. We implement two practical variants on two typical representation learning tasks in our experiments. Our promising results on seven datasets which outperform the existing methods show our great superiority. We will release our code on a companion link: https://github.com/YutingHe-list/GEMINI.


Paper and project links

PDF Accepted by T-PAMI 2025

Summary

DCRL improves learning efficiency for image-dense prediction tasks, but the properties of medical images make correspondence discovery unreliable, producing large-scale false positive and negative (FP&N) pairs. This paper proposes GEMINI learning, which embeds the homeomorphism prior into DCRL to enable reliable correspondence discovery for effective dense contrast. Deformable homeomorphism learning (DHL) models the homeomorphism of medical images and learns to estimate a deformable mapping that predicts pixel correspondence under topological preservation, reducing the pairing search space and driving an implicit, soft learning of negative pairs via a gradient. Geometric semantic similarity (GSS) extracts semantic information from features to measure the alignment degree for correspondence learning, improving learning efficiency and deformation performance and constructing positive pairs reliably. Two practical variants implemented on two typical representation learning tasks outperform existing methods on seven datasets.

Key Takeaways

  1. DCRL shows great potential for dense prediction in medical images, but suffers from unreliable correspondence discovery.
  2. GEMINI learning embeds the homeomorphism prior to resolve the FP&N pair problem in DCRL.
  3. DHL models the homeomorphism of medical images and learns a deformable mapping to predict pixel correspondence.
  4. GSS extracts semantic information from features, improving learning efficiency and deformation performance.
  5. Experiments show GEMINI outperforms existing methods on seven datasets.
  6. The code will be released at the companion link.


L2GNet: Optimal Local-to-Global Representation of Anatomical Structures for Generalized Medical Image Segmentation

Authors:Vandan Gorade, Sparsh Mittal, Neethi Dasu, Rekha Singhal, KC Santosh, Debesh Jha

Continuous Latent Space (CLS) and Discrete Latent Space (DLS) models, like AttnUNet and VQUNet, have excelled in medical image segmentation. In contrast, Synergistic Continuous and Discrete Latent Space (CDLS) models show promise in handling fine and coarse-grained information. However, they struggle with modeling long-range dependencies. CLS or CDLS-based models, such as TransUNet or SynergyNet are adept at capturing long-range dependencies. Since they rely heavily on feature pooling or aggregation using self-attention, they may capture dependencies among redundant regions. This hinders comprehension of anatomical structure content, poses challenges in modeling intra-class and inter-class dependencies, increases false negatives and compromises generalization. Addressing these issues, we propose L2GNet, which learns global dependencies by relating discrete codes obtained from DLS using optimal transport and aligning codes on a trainable reference. L2GNet achieves discriminative on-the-fly representation learning without an additional weight matrix in self-attention models, making it computationally efficient for medical applications. Extensive experiments on multi-organ segmentation and cardiac datasets demonstrate L2GNet’s superiority over state-of-the-art methods, including the CDLS method SynergyNet, offering a novel approach to enhance deep learning models’ performance in medical image analysis.
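
The optimal-transport step can be pictured with entropy-regularized Sinkhorn iterations relating two sets of discrete codes, as sketched below. Uniform marginals are assumed for simplicity, and the trainable reference alignment is omitted; this is background on the OT machinery, not the authors' code.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropy-regularized optimal transport between two uniform histograms.

    cost: (n, m) pairwise cost matrix; returns the (n, m) transport plan.
    """
    n, m = cost.shape
    K = np.exp(-cost / eps)                  # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m    # uniform marginals (assumed)
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(iters):                   # alternate scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # rows sum to a, columns to b

codes_a = np.random.rand(8, 4)               # discrete codes from one image
codes_b = np.random.rand(8, 4)               # codes from another (or a reference)
C = ((codes_a[:, None, :] - codes_b[None, :, :]) ** 2).sum(-1)
P = sinkhorn(C)
print(P.sum(axis=1))                          # each ~= 1/8
```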


Paper and project links

PDF

Summary

CLS and DLS models such as AttnUNet and VQUNet excel at medical image segmentation, and synergistic CDLS models show promise for fine- and coarse-grained information but struggle to model long-range dependencies. CLS- or CDLS-based models such as TransUNet and SynergyNet capture long-range dependencies well, yet their reliance on feature pooling or self-attention aggregation can capture dependencies among redundant regions, which hinders comprehension of anatomical structure, challenges intra- and inter-class dependency modeling, increases false negatives, and compromises generalization. To address this, the paper proposes L2GNet, which learns global dependencies by relating discrete codes from the DLS via optimal transport and aligning the codes on a trainable reference. L2GNet achieves discriminative on-the-fly representation learning without the additional weight matrix of self-attention models, making it computationally efficient for medical applications. Experiments on multi-organ segmentation and cardiac datasets show that L2GNet outperforms state-of-the-art methods, including the CDLS method SynergyNet, offering a novel way to enhance deep learning models for medical image analysis.

Key Takeaways

  1. CLS and DLS models (e.g., AttnUNet and VQUNet) perform excellently in medical image segmentation.
  2. CDLS models show potential for handling fine- and coarse-grained information, but struggle with long-range dependencies.
  3. CLS- or CDLS-based models (e.g., TransUNet and SynergyNet) capture long-range dependencies, but can be affected by dependencies among redundant regions.
  4. L2GNet learns global dependencies by relating discrete DLS codes via optimal transport and aligning codes on a trainable reference.
  5. L2GNet achieves discriminative on-the-fly representation learning, improving computational efficiency for medical applications.
  6. L2GNet outperforms existing methods on multiple datasets, including SynergyNet on complex medical segmentation tasks.


Towards Consistent and Controllable Image Synthesis for Face Editing

Authors:Mengting Wei, Tuomas Varanka, Yante Li, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao

Face editing methods, essential for tasks like virtual avatars, digital human synthesis and identity preservation, have traditionally been built upon GAN-based techniques, while recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in controlling specific attributes and preserving the consistency of other unchanged attributes especially the identity characteristics. To address these issues and facilitate more convenient editing of face images, we propose a novel approach that leverages the power of Stable-Diffusion (SD) models and crude 3D face models to control the lighting, facial expression and head pose of a portrait photo. We observe that this task essentially involves the combinations of target background, identity and face attributes aimed to edit. We strive to sufficiently disentangle the control of these factors to enable consistency of face editing. Specifically, our method, coined as RigFace, contains: 1) A Spatial Attribute Encoder that provides presise and decoupled conditions of background, pose, expression and lighting; 2) A high-consistency FaceFusion method that transfers identity features from the Identity Encoder to the denoising UNet of a pre-trained SD model; 3) An Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models. Code is publicly available at https://github.com/weimengting/RigFace.


Paper and project links

PDF

Summary

This paper presents a method that combines Stable-Diffusion (SD) models with crude 3D face models to control the lighting, facial expression, and head pose of a portrait photo. Using precise, decoupled conditions for background, pose, expression, and lighting, the approach keeps face editing consistent. The method, called RigFace, comprises a Spatial Attribute Encoder, a high-consistency FaceFusion method, and an Attribute Rigger, and achieves identity preservation and photorealism comparable or superior to existing face editing models.

Key Takeaways

  1. Face editing methods are essential for tasks such as virtual avatars, digital human synthesis, and identity preservation.
  2. Traditional methods were built on GAN techniques; recent focus has shifted to diffusion models for their success in image reconstruction.
  3. Diffusion models still struggle to control specific attributes while preserving the consistency of unchanged ones, especially identity characteristics.
  4. The new method combines Stable-Diffusion models with crude 3D face models to control lighting, facial expression, and head pose in portrait photos.
  5. Precise, decoupled conditions for background, pose, expression, and lighting keep face editing consistent.
  6. RigFace comprises a Spatial Attribute Encoder, a high-consistency FaceFusion method, and an Attribute Rigger.


LEAD: Large Foundation Model for EEG-Based Alzheimer’s Disease Detection

Authors:Yihe Wang, Nan Huang, Nadia Mammone, Marco Cecchi, Xiang Zhang

Electroencephalogram (EEG) provides a non-invasive, highly accessible, and cost-effective solution for Alzheimer’s Disease (AD) detection. However, existing methods, whether based on manual feature extraction or deep learning, face two major challenges: the lack of large-scale datasets for robust feature learning and evaluation, and poor detection performance due to inter-subject variations. To address these challenges, we curate an EEG-AD corpus containing 813 subjects, which forms the world’s largest EEG-AD dataset to the best of our knowledge. Using this unique dataset, we propose LEAD, the first large foundation model for EEG-based AD detection. Our method encompasses an entire pipeline, from data selection and preprocessing to self-supervised contrastive pretraining, fine-tuning, and key setups such as subject-independent evaluation and majority voting for subject-level detection. We pre-train the model on 11 EEG datasets and unified fine-tune it on 5 AD datasets. Our self-supervised pre-training design includes sample-level and subject-level contrasting to extract useful general EEG features. Fine-tuning is performed on 5 channel-aligned datasets together. The backbone encoder incorporates temporal and channel embeddings to capture features across both temporal and spatial dimensions. Our method demonstrates outstanding AD detection performance, achieving up to a 9.86% increase in F1 score at the sample-level and up to a 9.31% at the subject-level compared to state-of-the-art methods. The results of our model strongly confirm the effectiveness of contrastive pre-training and channel-aligned unified fine-tuning for addressing inter-subject variation. The source code is at https://github.com/DL4mHealth/LEAD.
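
Subject-level detection by majority voting, as described above, is simple to state in code. The sketch below assumes integer class predictions per EEG sample and is our illustration of the generic setup, not the released pipeline.

```python
import numpy as np

def subject_level_votes(sample_preds, subject_ids):
    """Majority voting: collapse per-sample predictions into one label
    per subject, the aggregation used for subject-level detection.
    """
    results = {}
    for sid in np.unique(subject_ids):
        votes = sample_preds[subject_ids == sid]
        results[sid] = int(np.bincount(votes).argmax())   # most frequent class
    return results

preds = np.array([1, 1, 0, 1, 0, 0, 0, 1])   # per-EEG-sample class predictions
sids  = np.array([7, 7, 7, 7, 9, 9, 9, 9])   # which subject each sample came from
print(subject_level_votes(preds, sids))      # {7: 1, 9: 0}
```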


Paper and project links

PDF

Summary

This paper introduces a new EEG-based method for Alzheimer's disease (AD) detection. To address the lack of large-scale datasets and the poor detection performance caused by inter-subject variation, the authors curate an EEG-AD corpus of 813 subjects and propose LEAD, a model covering the full pipeline from data selection and preprocessing to self-supervised contrastive pretraining, fine-tuning, and key setups such as subject-independent evaluation and majority voting for subject-level detection. The model is pre-trained on 11 EEG datasets and fine-tuned jointly on 5 AD datasets, using sample-level and subject-level contrasting to extract general EEG features. LEAD achieves outstanding AD detection performance, improving the F1 score by up to 9.86% at the sample level and up to 9.31% at the subject level over state-of-the-art methods.

Key Takeaways

  1. EEG provides a non-invasive, highly accessible, and cost-effective solution for Alzheimer's disease detection.
  2. Existing EEG-AD detection methods face the challenges of small datasets and inter-subject variation.
  3. The world's largest EEG-AD dataset, containing 813 subjects, is introduced.
  4. The LEAD model covers the full pipeline from data selection through pretraining and fine-tuning for AD detection.
  5. Self-supervised pretraining uses sample-level and subject-level contrasting to extract general EEG features.
  6. LEAD achieves outstanding AD detection performance, with clear gains over state-of-the-art methods.


Generating crossmodal gene expression from cancer histopathology improves multimodal AI predictions

Authors:Samiran Dey, Christopher R. S. Banerji, Partha Basuchowdhuri, Sanjoy K. Saha, Deepak Parashar, Tapabrata Chakraborti

Emerging research has highlighted that artificial intelligence based multimodal fusion of digital pathology and transcriptomic features can improve cancer diagnosis (grading/subtyping) and prognosis (survival risk) prediction. However, such direct fusion for joint decision is impractical in real clinical settings, where histopathology is still the gold standard for diagnosis and transcriptomic tests are rarely requested, at least in the public healthcare system. With our novel diffusion based crossmodal generative AI model PathoGen, we show that genomic expressions synthesized from digital histopathology jointly predicts cancer grading and patient survival risk with high accuracy (state-of-the-art performance), certainty (through conformal coverage guarantee) and interpretability (through distributed attention maps). PathoGen code is available for open use by the research community through GitHub at https://github.com/Samiran-Dey/PathoGen.
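
The conformal coverage guarantee mentioned above typically comes from split conformal prediction: calibrate a threshold on held-out nonconformity scores, then form prediction sets at inference. The sketch below shows the generic recipe (not the paper's exact scoring function; the beta-distributed calibration scores are synthetic).

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: the (1-alpha)-quantile of calibration
    nonconformity scores, with the standard finite-sample correction,
    yields prediction sets covering the truth with prob. >= 1-alpha.
    """
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

def prediction_set(probs, threshold):
    """Include every class whose nonconformity (1 - prob) is below it."""
    return np.where(1.0 - probs <= threshold)[0]

rng = np.random.default_rng(1)
cal = 1.0 - rng.beta(8, 2, size=500)    # synthetic true-class scores on calibration data
thr = conformal_threshold(cal, alpha=0.1)
print(prediction_set(np.array([0.7, 0.2, 0.1]), thr))
```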


Paper and project links

PDF

Summary

Emerging research shows that AI-based multimodal fusion of digital pathology and transcriptomic features can improve cancer diagnosis (grading/subtyping) and prognosis (survival risk) prediction. However, direct fusion for joint decisions is impractical in real clinical settings, where histopathology remains the gold standard and transcriptomic tests are rarely requested, at least in public healthcare systems. With the novel diffusion-based crossmodal generative AI model PathoGen, genomic expressions synthesized from digital histopathology jointly predict cancer grading and patient survival risk with high accuracy, certainty, and interpretability. The PathoGen code is openly available on GitHub at https://github.com/Samiran-Dey/PathoGen.

Key Takeaways

  1. AI-based multimodal fusion can improve the accuracy of cancer diagnosis and prognosis prediction.
  2. Direct fusion for joint decisions is currently impractical in real clinical settings.
  3. Histopathology remains the gold standard for diagnosis, and transcriptomic tests are rarely used.
  4. The PathoGen model synthesizes gene expression from digital histopathology for prediction.
  5. PathoGen predicts cancer grading and patient survival risk with high accuracy (state-of-the-art performance), certainty (conformal coverage guarantee), and interpretability (distributed attention maps).
  6. The PathoGen code is openly available to the research community.


Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models

Authors:Muhammad Atta ur Rahman

Self-supervised learning can resolve numerous image or linguistic processing problems when effectively trained. This study investigated simple yet efficient methods for adapting previously learned foundation models for open-vocabulary semantic segmentation tasks. Our research proposed “Beyond-Labels,” a lightweight transformer-based fusion module that uses a handful of image segmentation data to fuse frozen image representations with language concepts. This strategy allows the model to successfully actualize enormous knowledge from pretrained models without requiring extensive retraining, making the model data-efficient and scalable. Furthermore, we efficiently captured positional information in images using Fourier embeddings, thus improving the generalization across various image sizes, addressing one of the key limitations of previous methods. Extensive ablation tests were performed to investigate the important components of our proposed method; when tested against the common benchmark PASCAL-5i, it demonstrated superior performance despite being trained on frozen image and language characteristics.


Paper and project links

PDF

Summary
This study investigates simple yet efficient methods for adapting previously learned foundation models to open-vocabulary semantic segmentation. It proposes "Beyond-Labels," a lightweight transformer-based fusion module that uses a handful of image segmentation data to fuse frozen image representations with language concepts, allowing the model to exploit the knowledge of pretrained models without extensive retraining, which makes it data-efficient and scalable. Fourier embeddings efficiently capture positional information in images, improving generalization across various image sizes and addressing a key limitation of previous methods. Extensive ablation tests examine the important components of the method; on the common benchmark PASCAL-5i it shows superior performance despite being trained on frozen image and language features.

Key Takeaways

  1. The study uses self-supervised learning to address image and language processing problems.
  2. It proposes a fusion module named "Beyond-Labels," built on a lightweight transformer.
  3. Beyond-Labels fuses frozen image representations with language concepts using a handful of image segmentation data.
  4. The model exploits pretrained knowledge without extensive retraining, improving data efficiency and scalability.
  5. Fourier embeddings capture positional information in images, improving generalization across image sizes.
  6. Ablation tests verify the key components of the method.


Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!