
Unsupervised / Semi-supervised / Contrastive Learning


⚠️ All of the summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on these summaries in serious academic settings; use them only as a first-pass filter before actually reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated on 2025-11-20

CD-DPE: Dual-Prompt Expert Network based on Convolutional Dictionary Feature Decoupling for Multi-Contrast MRI Super-Resolution

Authors:Xianming Gu, Lihui Wang, Ying Cao, Zeyu Deng, Yingfeng Ou, Guodong Hu, Yi Chen

Multi-contrast magnetic resonance imaging (MRI) super-resolution aims to reconstruct high-resolution (HR) images from low-resolution (LR) scans by leveraging structural information present in HR reference images acquired with different contrasts. This technique enhances anatomical detail and soft tissue differentiation, which is vital for early diagnosis and clinical decision-making. However, inherent contrast disparities between modalities pose fundamental challenges in effectively utilizing reference image textures to guide target image reconstruction, often resulting in suboptimal feature integration. To address this issue, we propose a dual-prompt expert network based on a convolutional dictionary feature decoupling (CD-DPE) strategy for multi-contrast MRI super-resolution. Specifically, we introduce an iterative convolutional dictionary feature decoupling module (CD-FDM) to separate features into cross-contrast and intra-contrast components, thereby reducing redundancy and interference. To fully integrate these features, a novel dual-prompt feature fusion expert module (DP-FFEM) is proposed. This module uses a frequency prompt to guide the selection of relevant reference features for incorporation into the target image, while an adaptive routing prompt determines the optimal method for fusing reference and target features to enhance reconstruction quality. Extensive experiments on public multi-contrast MRI datasets demonstrate that CD-DPE outperforms state-of-the-art methods in reconstructing fine details. Additionally, experiments on unseen datasets demonstrate that CD-DPE exhibits strong generalization capabilities.

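The exact CD-FDM formulation is not reproduced in this digest, but the general idea it names, iteratively decomposing features into cross-contrast and intra-contrast components under convolutional dictionaries, can be illustrated with a minimal PyTorch sketch. Everything below (module names, the ISTA-style update, the shared/private split) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftThreshold(nn.Module):
    """Learnable soft-thresholding: the proximal operator of an L1 sparsity prior."""
    def __init__(self, channels, init=0.01):
        super().__init__()
        self.theta = nn.Parameter(init * torch.ones(1, channels, 1, 1))

    def forward(self, x):
        return torch.sign(x) * F.relu(torch.abs(x) - self.theta)

class UnrolledDictDecoupler(nn.Module):
    """ISTA-style unrolling that splits a feature map into a shared
    (cross-contrast) code and a private (intra-contrast) code under two
    learnable convolutional dictionaries."""
    def __init__(self, channels=64, n_iters=3):
        super().__init__()
        self.n_iters = n_iters
        # Synthesis dictionaries and their (decoupled) analysis counterparts.
        self.D_shared = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.D_private = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.A_shared = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.A_private = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.prox_s = SoftThreshold(channels)
        self.prox_p = SoftThreshold(channels)

    def forward(self, feat):
        z_s = torch.zeros_like(feat)  # shared (cross-contrast) code
        z_p = torch.zeros_like(feat)  # private (intra-contrast) code
        for _ in range(self.n_iters):
            # Residual between the input and its current two-part reconstruction.
            resid = feat - self.D_shared(z_s) - self.D_private(z_p)
            # Gradient step through the analysis convs, then sparsify.
            z_s = self.prox_s(z_s + self.A_shared(resid))
            z_p = self.prox_p(z_p + self.A_private(resid))
        return z_s, z_p

# Split a 64-channel feature map into its two components.
shared, private = UnrolledDictDecoupler()(torch.randn(2, 64, 32, 32))
```

In a full super-resolution network one would expect extra regularization (e.g., a sparsity or orthogonality penalty) so the two codes actually specialize; the sketch only shows the unrolled decomposition itself.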

Paper and Project Links

PDF Accepted by AAAI 2026

Summary

This paper introduces a multi-contrast MRI super-resolution technique that reconstructs high-resolution images from low-resolution scans using structural information from high-resolution reference images. To handle the inherent contrast disparities between modalities, the authors propose a dual-prompt expert network built on a convolutional dictionary feature decoupling (CD-DPE) strategy, which separates features and reduces redundancy and interference, improving reconstruction quality and detail recovery. Experiments on public multi-contrast MRI datasets show that CD-DPE outperforms existing methods in reconstructing fine details and generalizes strongly to unseen datasets.

Key Takeaways

  • Multi-contrast MRI super-resolution aims to reconstruct high-resolution images from low-resolution scans.
  • Structural information from high-resolution reference images is leveraged to improve resolution and detail.
  • Inherent contrast disparities between modalities pose a challenge that degrades target-image reconstruction.
  • A dual-prompt expert network based on a convolutional dictionary feature decoupling (CD-DPE) strategy is proposed to address this (a fusion sketch follows this list).
  • Separating features and reducing redundancy and interference improves reconstruction quality and detail recovery.
  • Experiments on public datasets show that CD-DPE outperforms existing methods in detail reconstruction.
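
As referenced in the list above, the dual-prompt fusion can be pictured as a gate plus a router. The following hypothetical PyTorch sketch assumes the frequency prompt is a gate computed from the reference's FFT magnitude and the routing prompt is a soft mixture over expert fusion convolutions; the paper's actual DP-FFEM may differ substantially.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPromptFusion(nn.Module):
    """Sketch: a frequency prompt gates reference features, and a routing
    prompt softly selects among expert fusion operators."""
    def __init__(self, channels=64):
        super().__init__()
        # Frequency prompt: gate derived from the reference's FFT magnitude.
        self.freq_gate = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                       nn.Sigmoid())
        # Expert fusion operators with different receptive fields.
        self.experts = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, k, padding=k // 2)
            for k in (1, 3, 5))
        # Routing prompt: global statistics -> softmax weights over experts.
        self.router = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, len(self.experts)), nn.Softmax(dim=-1))

    def forward(self, target, reference):
        # Spectrum magnitude of the reference, resized back to spatial size.
        mag = torch.fft.rfft2(reference, norm="ortho").abs()
        mag = F.interpolate(mag, size=reference.shape[-2:],
                            mode="bilinear", align_corners=False)
        gated_ref = reference * self.freq_gate(mag)
        pair = torch.cat([target, gated_ref], dim=1)
        weights = self.router(pair)  # (B, n_experts)
        fused = sum(w.view(-1, 1, 1, 1) * expert(pair)
                    for w, expert in zip(weights.unbind(dim=1), self.experts))
        return target + fused  # residual fusion into the target branch

out = DualPromptFusion()(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```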


Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning

Authors:Hongkuan Zhou, Lavdim Halilaj, Sebastian Monka, Stefan Schmid, Yuqicheng Zhu, Jingcheng Wu, Nadeem Nazer, Steffen Staab

Open-domain visual entity recognition aims to identify and link entities depicted in images to a vast and evolving set of real-world concepts, such as those found in Wikidata. Unlike conventional classification tasks with fixed label sets, it operates under open-set conditions, where most target entities are unseen during training and exhibit long-tail distributions. This makes the task inherently challenging due to limited supervision, high visual ambiguity, and the need for semantic disambiguation. We propose a Knowledge-guided Contrastive Learning (KnowCoL) framework that combines both images and text descriptions into a shared semantic space grounded by structured information from Wikidata. By abstracting visual and textual inputs to a conceptual level, the model leverages entity descriptions, type hierarchies, and relational context to support zero-shot entity recognition. We evaluate our approach on the OVEN benchmark, a large-scale open-domain visual recognition dataset with Wikidata IDs as the label space. Our experiments show that using visual, textual, and structured knowledge greatly improves accuracy, especially for rare and unseen entities. Our smallest model improves the accuracy on unseen entities by 10.5% compared to the state-of-the-art, despite being 35 times smaller.

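The contrastive core of such a framework can be written as a standard symmetric InfoNCE objective between image embeddings and embeddings of knowledge-grounded entity descriptions. This is a generic sketch rather than KnowCoL's actual loss; it assumes the Wikidata signals (descriptions, type hierarchy, relations) have already been serialized into the text the text encoder sees.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric in-batch InfoNCE: matching image/description pairs are
    positives; every other pairing in the batch is a negative."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature      # (B, B) cosine similarities
    labels = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Hypothetical usage with 512-d embeddings from the two encoders.
loss = info_nce(torch.randn(8, 512), torch.randn(8, 512))
```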

Paper and Project Links

PDF Accepted by AAAI 2026

Summary

This paper presents a knowledge-guided contrastive learning framework for open-domain visual entity recognition. The framework embeds images and text descriptions into a shared semantic space grounded in structured information from Wikidata. By abstracting visual and textual inputs to the conceptual level, it leverages entity descriptions, type hierarchies, and relational context to support zero-shot entity recognition. Experiments on the OVEN benchmark show that combining visual, textual, and structured knowledge greatly improves accuracy, especially for rare and unseen entities.

Key Takeaways

  1. Open-domain visual entity recognition aims to link entities depicted in images to a vast, evolving set of real-world concepts, such as those in Wikidata.
  2. The task is challenging due to limited supervision, high visual ambiguity, and the need for semantic disambiguation.
  3. The proposed Knowledge-guided Contrastive Learning (KnowCoL) framework embeds images and text descriptions into a shared semantic space grounded in structured information from Wikidata.
  4. By abstracting visual and textual inputs to the conceptual level, the model leverages entity descriptions, type hierarchies, and relational context to support zero-shot entity recognition (a retrieval-style sketch follows this list).
  5. On the OVEN benchmark, the framework improves recognition accuracy, especially for rare and unseen entities.
  6. The smallest model improves accuracy on unseen entities by 10.5% over the state of the art, despite being 35 times smaller.
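
As referenced in item 4 above, once images and entities share an embedding space, zero-shot recognition reduces to nearest-neighbor retrieval over a precomputed index of entity embeddings: an unseen entity needs only a text-side embedding, never a training image. The sketch below is an illustrative assumption about this inference path, not the paper's evaluation code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_entity_lookup(img_emb, entity_emb, entity_ids, k=5):
    """Rank all candidate entities by cosine similarity to each image."""
    img = F.normalize(img_emb, dim=-1)
    ents = F.normalize(entity_emb, dim=-1)
    sims = img @ ents.t()                     # (B, num_entities)
    top = sims.topk(k, dim=-1)
    ids = [[entity_ids[j] for j in row] for row in top.indices.tolist()]
    return ids, top.values

# Hypothetical usage with a tiny index keyed by Wikidata QIDs.
index = torch.randn(1000, 512)               # precomputed entity embeddings
qids = [f"Q{i}" for i in range(1000)]        # placeholder label space
preds, scores = zero_shot_entity_lookup(torch.randn(2, 512), index, qids)
```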


Efficient Fourier Filtering Network with Contrastive Learning for AAV-based Unaligned Bimodal Salient Object Detection

Authors:Pengfei Lyu, Pak-Hei Yeung, Xiaosheng Yu, Xiufei Cheng, Chengdong Wu, Jagath C. Rajapakse

Autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene utilizing complementary cues in unaligned RGB and thermal image pairs. However, the high computational expense of existing AAV-based BSOD models limits their applicability to real-world AAV devices. To address this problem, we propose an efficient Fourier filter network with contrastive learning that achieves both real-time and accurate performance. Specifically, we first design a semantic contrastive alignment loss to align the two modalities at the semantic level, which facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform that obtains global relevance in linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions by a hierarchical filtering mechanism. Our proposed model, AlignSal, reduces the number of parameters by 70.0%, decreases the floating point operations by 49.4%, and increases the inference speed by 152.5% compared to the cutting-edge BSOD model (i.e., MROS). Extensive experiments on the AAV RGB-T 2400 and seven bi-modal dense prediction datasets demonstrate that AlignSal achieves both real-time inference speed and better performance and generalizability compared to nineteen state-of-the-art models across most evaluation metrics. In addition, our ablation studies further verify AlignSal’s potential in boosting the performance of existing aligned BSOD models on AAV-based unaligned data. The code is available at: https://github.com/JoshuaLPF/AlignSal.

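The abstract's claim that the fast Fourier transform obtains global relevance at near-linear cost can be made concrete with a GFNet-style learnable frequency filter: multiplying a feature map's spectrum by a learned complex mask mixes information across the entire spatial extent at FFT cost. The sketch below is a hypothetical Fourier-domain bi-modal fusion under that assumption; it is not AlignSal's synchronized alignment fusion, which additionally aligns features hierarchically across channel and spatial dimensions.

```python
import torch
import torch.nn as nn

class FourierFilterFusion(nn.Module):
    """Sketch: filter each modality's spectrum with a learnable complex mask,
    sum the filtered spectra, and transform back; global spatial interaction
    costs O(HW log HW) instead of the quadratic cost of attention."""
    def __init__(self, channels=32, h=64, w=64):
        super().__init__()
        # One complex filter per modality on the rfft2 half-spectrum,
        # stored as (..., 2) real parameters for a fixed training resolution.
        self.filt_rgb = nn.Parameter(0.02 * torch.randn(channels, h, w // 2 + 1, 2))
        self.filt_t = nn.Parameter(0.02 * torch.randn(channels, h, w // 2 + 1, 2))
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, rgb_feat, t_feat):
        h, w = rgb_feat.shape[-2:]
        f_rgb = torch.fft.rfft2(rgb_feat, norm="ortho")
        f_t = torch.fft.rfft2(t_feat, norm="ortho")
        # Filter each modality in the frequency domain, then sum the spectra.
        fused = (f_rgb * torch.view_as_complex(self.filt_rgb) +
                 f_t * torch.view_as_complex(self.filt_t))
        out = torch.fft.irfft2(fused, s=(h, w), norm="ortho")
        return self.proj(out)

fused = FourierFilterFusion()(torch.randn(2, 32, 64, 64),
                              torch.randn(2, 32, 64, 64))
```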

Paper and Project Links

PDF Accepted by TGRS 2025

Summary

This paper targets autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD). To achieve both real-time and accurate performance, the authors propose AlignSal, an efficient Fourier filter network with contrastive learning. A semantic contrastive alignment loss aligns the two modalities at the semantic level, and a fast-Fourier-transform-inspired synchronized alignment fusion mechanism fuses bi-modal features. Compared with the cutting-edge BSOD model, AlignSal reduces parameters by 70.0%, cuts floating-point operations by 49.4%, and raises inference speed by 152.5%, while achieving better performance and generalizability across multiple datasets.

Key Takeaways

  1. AAV-based BSOD segments salient objects in a scene using complementary cues in unaligned RGB and thermal image pairs.
  2. The proposed efficient Fourier filter network, AlignSal, delivers both real-time speed and high accuracy.
  3. A semantic contrastive alignment loss aligns the two modalities at the semantic level, enabling parameter-free mutual refinement (a sketch of such a loss follows this list).
  4. Inspired by the fast Fourier transform's low-cost global relevance, a synchronized alignment fusion mechanism aligns and fuses bi-modal features along the channel and spatial dimensions via hierarchical filtering.
  5. AlignSal holds clear advantages over the cutting-edge model in parameter count, floating-point operations, and inference speed.
  6. Experiments on multiple datasets show that AlignSal's performance and generalizability surpass other state-of-the-art models.
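
As referenced in item 3 above, a parameter-free semantic alignment can be approximated by pooling each modality's features to a global descriptor and applying an in-batch contrastive loss: global pooling discards the pixel correspondence that unaligned RGB-T pairs lack, and no new learnable module is introduced. This is an illustrative guess at the shape of such a loss, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(rgb_feat, t_feat, temperature=0.1):
    """Pull globally-pooled RGB/thermal descriptors of the same scene
    together; push apart descriptors of different scenes in the batch."""
    rgb = F.normalize(rgb_feat.mean(dim=(-2, -1)), dim=-1)  # (B, C)
    t = F.normalize(t_feat.mean(dim=(-2, -1)), dim=-1)      # (B, C)
    logits = rgb @ t.t() / temperature
    labels = torch.arange(rgb.size(0), device=rgb.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = semantic_alignment_loss(torch.randn(4, 32, 64, 64),
                               torch.randn(4, 32, 64, 64))
```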



Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!