Published: 2025-11-20

Updated: 2025-11-27

Word count: 5.5k

Reading time: 22 min

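⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings. They are only suitable for initial screening before reading a paper.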

Updated 2025-11-20

A Specialized Large Language Model for Clinical Reasoning and Diagnosis in Rare Diseases

Authors:Tao Yang, Dandan Huang, Yunting Lin, Pengfei Wu, Zhikun Wu, Gangyuan Ma, Yulan Lu, Xinran Dong, Dingpeng Li, Junshuang Ge, Zhiyan Zhang, Xuanzhao Huang, Wenyan Nong, Yao Zhou, Hui Tang, Hongxi Yang, Shijie Zhang, Juan Li, Xiaojun Cao, Lin Yang, Xia Gao, Kaishou Xu, Xiaoqiong Gu, Wen Zhang, Huimin Xia, Li Liu, Wenhao Zhou, Mulin Jun Li

Rare diseases affect hundreds of millions worldwide, yet diagnosis often spans years. Conventional pipelines decouple noisy evidence extraction from downstream inferential diagnosis, and general/medical large language models (LLMs) face scarce real-world electronic health records (EHRs), stale domain knowledge, and hallucinations. We assemble a large, domain-specialized clinical corpus and a clinician-validated reasoning set, and develop RareSeek R1 via staged instruction tuning, chain-of-thought learning, and graph-grounded retrieval. Across multicenter EHR narratives and public benchmarks, RareSeek R1 attains state-of-the-art accuracy, robust generalization, and stability under noisy or overlapping phenotypes. Augmented retrieval yields the largest gains when narratives pair with prioritized variants by resolving ambiguity and aligning candidates to mechanisms. Human studies show performance on par with experienced physicians and consistent gains in assistive use. Notably, transparent reasoning highlights decisive non-phenotypic evidence (median 23.1%, such as imaging, interventions, functional tests) underpinning many correct diagnoses. This work advances a narrative-first, knowledge-integrated reasoning paradigm that shortens the diagnostic odyssey and enables auditable, clinically translatable decision support.

Paper and project links

PDF 50 pages, 5 figures

Summary

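This paper addresses rare diseases, which affect hundreds of millions of people worldwide, and the difficulty of diagnosing them. To address the shortcomings of conventional diagnostic pipelines and the challenges large language models face with scarce real-world electronic health records, the team assembled a large, domain-specialized clinical corpus and developed RareSeek R1. Using staged instruction tuning, chain-of-thought learning, and graph-grounded retrieval, the model attains state-of-the-art accuracy, robust generalization, and stability on multicenter EHR narratives and public benchmarks. Retrieval augmentation yields the largest gains when narratives are paired with prioritized variants, because it resolves ambiguity and aligns candidate diagnoses with mechanisms. Human studies show performance on par with experienced physicians and consistent gains in assistive use. The model's transparent reasoning highlights the decisive non-phenotypic evidence behind many correct diagnoses, advancing a narrative-first, knowledge-integrated reasoning paradigm that shortens the diagnostic odyssey and provides auditable clinical decision support.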

Key Takeaways

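Rare diseases affect hundreds of millions of people worldwide, and diagnosis often takes years.
Conventional diagnostic pipelines decouple evidence extraction from downstream diagnostic inference.
Large language models face scarce real-world electronic health records, stale domain knowledge, and hallucinations.
The team assembled a large, domain-specialized clinical corpus and a clinician-validated reasoning set.
RareSeek R1 attains state-of-the-art diagnostic accuracy, robust generalization, and stability through staged instruction tuning, chain-of-thought learning, and graph-grounded retrieval.
Retrieval augmentation helps most when narratives are paired with prioritized variants, resolving diagnostic ambiguity and aligning candidates to mechanisms.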

A Generative Data Framework with Authentic Supervision for Underwater Image Restoration and Enhancement

Authors:Yufeng Tian, Yifan Chen, Zhe Sun, Libang Chen, Mingyu Dou, Jijun Lu, Ye Zheng, Xuelong Li

Underwater image restoration and enhancement are crucial for correcting color distortion and restoring image details, thereby establishing a fundamental basis for subsequent underwater visual tasks. However, current deep learning methodologies in this area are frequently constrained by the scarcity of high-quality paired datasets. Since it is difficult to obtain pristine reference labels in underwater scenes, existing benchmarks often rely on manually selected results from enhancement algorithms, providing debatable reference images that lack globally consistent color and authentic supervision. This limits the model’s capabilities in color restoration, image enhancement, and generalization. To overcome this limitation, we propose using in-air natural images as unambiguous reference targets and translating them into underwater-degraded versions, thereby constructing synthetic datasets that provide authentic supervision signals for model learning. Specifically, we establish a generative data framework based on unpaired image-to-image translation, producing a large-scale dataset that covers 6 representative underwater degradation types. The framework constructs synthetic datasets with precise ground-truth labels, which facilitate the learning of an accurate mapping from degraded underwater images to their pristine scene appearances. Extensive quantitative and qualitative experiments across 6 representative network architectures and 3 independent test sets show that models trained on our synthetic data achieve comparable or superior color restoration and generalization performance to those trained on existing benchmarks. This research provides a reliable and scalable data-driven solution for underwater image restoration and enhancement. The generated dataset is publicly available at: https://github.com/yftian2025/SynUIEDatasets.git.

Paper and project links

PDF This work has been submitted to the IEEE for possible publication

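Summary

To address the lack of high-quality paired datasets for underwater image restoration and enhancement, this paper proposes using in-air natural images as unambiguous reference targets and translating them into underwater-degraded versions to construct synthetic datasets. These datasets provide authentic supervision signals for model learning. The authors build a generative data framework based on unpaired image-to-image translation that covers 6 representative underwater degradation types. Experiments show that models trained on the synthetic data achieve color restoration and generalization performance comparable or superior to models trained on existing benchmarks.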

Key Takeaways

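Underwater image restoration and enhancement underpin subsequent underwater visual tasks, but the scarcity of high-quality paired datasets limits models' color restoration, image enhancement, and generalization.
Existing benchmarks often rely on manually selected outputs of enhancement algorithms, so their reference images are debatable and lack globally consistent color and authentic supervision.
The paper proposes using in-air natural images as unambiguous reference targets and translating them into underwater-degraded versions to build synthetic datasets.
The synthetic datasets carry precise ground-truth labels, which helps models learn an accurate mapping from degraded underwater images to their pristine scene appearance.
The generative data framework is based on unpaired image-to-image translation and covers 6 representative underwater degradation types.
Across 6 network architectures and 3 independent test sets, models trained on the synthetic data achieve color restoration and generalization comparable or superior to models trained on existing benchmarks.
The generated dataset is publicly available for research use.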
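
The idea of turning a clean in-air image into a (degraded, ground-truth) training pair can be sketched as follows. Note that this is not the paper's method: the paper learns the degradation with unpaired image-to-image translation, whereas this sketch swaps in the simplified physical image formation model I = J·t + B·(1 − t) with t = exp(−β·d). The function names, the attenuation coefficients `beta`, and the backscatter color `backlight` are all illustrative assumptions.

```python
import numpy as np


def degrade_underwater(clean, depth_m,
                       beta=(0.40, 0.10, 0.05),        # assumed per-channel attenuation (R, G, B)
                       backlight=(0.05, 0.35, 0.45)):  # assumed background water color
    """Apply a simplified underwater image formation model.

    clean:   H x W x 3 float RGB image in [0, 1] (the in-air reference).
    depth_m: H x W scene distance in metres.
    """
    beta = np.asarray(beta, dtype=float)
    backlight = np.asarray(backlight, dtype=float)
    # Per-channel transmission: red decays fastest with distance.
    t = np.exp(-beta[None, None, :] * depth_m[:, :, None])
    # Attenuated scene radiance plus backscattered water light.
    return clean * t + backlight * (1.0 - t)


def make_pair(clean, depth_m):
    """Return (degraded input, exact ground-truth target) for supervised training."""
    return degrade_underwater(clean, depth_m), clean
```

Because the target is the original in-air image, the supervision is exact by construction, which is the property the paper relies on; a learned translator would replace `degrade_underwater` in practice.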

GRLoc: Geometric Representation Regression for Visual Localization

Authors:Changyang Li, Xuejian Ma, Lixiang Liu, Zhan Li, Qingan Yan, Yi Xu

Absolute Pose Regression (APR) has emerged as a compelling paradigm for visual localization. However, APR models typically operate as black boxes, directly regressing a 6-DoF pose from a query image, which can lead to memorizing training views rather than understanding 3D scene geometry. In this work, we propose a geometrically-grounded alternative. Inspired by novel view synthesis, which renders images from intermediate geometric representations, we reformulate APR as its inverse that regresses the underlying 3D representations directly from the image, and we name this paradigm Geometric Representation Regression (GRR). Our model explicitly predicts two disentangled geometric representations in the world coordinate system: (1) a ray bundle’s directions to estimate camera rotation, and (2) a corresponding pointmap to estimate camera translation. The final 6-DoF camera pose is then recovered from these geometric components using a differentiable deterministic solver. This disentangled approach, which separates the learned visual-to-geometry mapping from the final pose calculation, introduces a strong geometric prior into the network. We find that the explicit decoupling of rotation and translation predictions measurably boosts performance. We demonstrate state-of-the-art performance on 7-Scenes and Cambridge Landmarks datasets, validating that modeling the inverse rendering process is a more robust path toward generalizable absolute pose estimation.

Paper and project links

PDF

Summary
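GRLoc reformulates Absolute Pose Regression (APR) as Geometric Representation Regression (GRR), which regresses the underlying 3D representations directly from the image. This geometrically grounded alternative explicitly predicts two geometric representations in the world coordinate system: ray bundle directions to estimate camera rotation and a corresponding pointmap to estimate camera translation. A differentiable deterministic solver then recovers the final 6-DoF camera pose from these geometric components. The method achieves state-of-the-art performance on the 7-Scenes and Cambridge Landmarks datasets, supporting the claim that modeling the inverse rendering process is a more robust path toward generalizable absolute pose estimation.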

Key Takeaways

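The paper reformulates Absolute Pose Regression (APR) as Geometric Representation Regression (GRR).
GRR treats pose regression as the inverse of rendering and regresses the underlying 3D representations directly from the image.
The model explicitly predicts ray bundle directions and a corresponding pointmap in the world coordinate system to estimate camera rotation and translation, respectively.
A differentiable deterministic solver recovers the camera pose from these geometric components.
GRR achieves state-of-the-art performance on the 7-Scenes and Cambridge Landmarks datasets.
Compared with black-box APR models, the authors argue that GRR is a more robust path toward generalizable pose estimation.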
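
The summary does not describe the solver itself, so the following is only one plausible closed-form way to recover a pose from the two predicted representations, not the authors' implementation. It assumes the camera-frame ray directions are known from the intrinsics, fits the rotation with the Kabsch (SVD) algorithm, and finds the camera center as the least-squares intersection of the rays through the pointmap. All function names are hypothetical.

```python
import numpy as np


def rotation_from_rays(cam_dirs, world_dirs):
    """Kabsch/SVD fit of R (camera -> world) minimizing sum ||R c_i - w_i||^2.

    cam_dirs, world_dirs: N x 3 arrays of unit ray directions.
    """
    H = cam_dirs.T @ world_dirs                 # 3 x 3 correlation, sum of c_i w_i^T
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T


def center_from_pointmap(points, world_dirs):
    """Least-squares camera center: the point closest to every ray through points[i]."""
    w = world_dirs / np.linalg.norm(world_dirs, axis=1, keepdims=True)
    # Projector onto the plane orthogonal to each ray: I - w w^T.
    P = np.eye(3)[None, :, :] - w[:, :, None] * w[:, None, :]
    A = P.sum(axis=0)
    b = np.einsum("nij,nj->i", P, points)
    return np.linalg.solve(A, b)


def pose_from_geometry(cam_dirs, world_dirs, points):
    """Return (R, c): camera-to-world rotation and camera center in world coordinates."""
    return rotation_from_rays(cam_dirs, world_dirs), center_from_pointmap(points, world_dirs)
```

Both steps use only an SVD and a linear solve, operations that autodiff frameworks can differentiate through, and rotation and translation are estimated from separate predictions, which mirrors the decoupling described above.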

Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation

Authors:Sofia Jamil, Kotla Sai Charan, Sriparna Saha, Koustava Goswami, Joseph K J

Indian poetry, known for its linguistic complexity and deep cultural resonance, has a rich and varied heritage spanning thousands of years. However, its layered meanings, cultural allusions, and sophisticated grammatical constructions often pose challenges for comprehension, especially for non-native speakers or readers unfamiliar with its context and language. Despite its cultural significance, existing works on poetry have largely overlooked Indian language poems. In this paper, we propose the Translation and Image Generation (TAI) framework, leveraging Large Language Models (LLMs) and Latent Diffusion Models through appropriate prompt tuning. Our framework supports the United Nations Sustainable Development Goals of Quality Education (SDG 4) and Reduced Inequalities (SDG 10) by enhancing the accessibility of culturally rich Indian-language poetry to a global audience. It includes (1) a translation module that uses an Odds Ratio Preference Alignment Algorithm to accurately translate morphologically rich poetry into English, and (2) an image generation module that employs a semantic graph to capture tokens, dependencies, and semantic relationships between metaphors and their meanings, to create visually meaningful representations of Indian poems. Our comprehensive experimental evaluation, including both human and quantitative assessments, demonstrates the superiority of TAI Diffusion in poem image generation tasks, outperforming strong baselines. To further address the scarcity of resources for Indian-language poetry, we introduce the Morphologically Rich Indian Language Poems MorphoVerse Dataset, comprising 1,570 poems across 21 low-resource Indian languages. By addressing the gap in poetry translation and visual comprehension, this work aims to broaden accessibility and enrich the reader’s experience.

Paper and project links

PDF

Summary

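This paper proposes the Translation and Image Generation (TAI) framework, which leverages large language models (LLMs) and latent diffusion models through appropriate prompt tuning and supports the United Nations Sustainable Development Goals of Quality Education and Reduced Inequalities. The framework comprises a translation module and an image generation module: the former uses an Odds Ratio Preference Alignment Algorithm to accurately translate morphologically rich poetry, and the latter uses a semantic graph to capture tokens, dependencies, and the semantic relationships between metaphors and their meanings, creating visually meaningful representations of Indian poems. Experimental evaluation shows that TAI Diffusion outperforms strong baselines on poem image generation. The paper also introduces the Morphologically Rich Indian Language Poems MorphoVerse Dataset, comprising 1,570 poems across 21 low-resource Indian languages.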

Key Takeaways

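Indian poetry has complex language and deep cultural roots, which makes it hard to understand and appreciate for non-native speakers and readers unfamiliar with its context and language.
Existing work on poetry has largely overlooked Indian-language poems.
The proposed Translation and Image Generation (TAI) framework uses large language models and latent diffusion models to make Indian poetry more accessible to a global audience.
TAI comprises a translation module that accurately translates morphologically rich poetry into English and an image generation module that creates visually meaningful representations of poems.
Human and quantitative evaluations show that TAI Diffusion outperforms strong baselines on poem image generation.
The new MorphoVerse dataset contains 1,570 poems across 21 low-resource Indian languages, helping address the scarcity of resources for Indian-language poetry.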
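
The summary gives no details of the Odds Ratio Preference Alignment Algorithm. Assuming it follows the published ORPO (odds ratio preference optimization) objective, the per-example loss could be sketched as below: a supervised term on the preferred translation plus a penalty on the log odds ratio between the preferred and rejected translations. The function names and the weight `lam` are illustrative assumptions, and inputs are length-normalized log-probabilities that a model would supply.

```python
import math


def log_odds(avg_logp):
    """log(p / (1 - p)) for p = exp(avg_logp); avg_logp must be negative."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """ORPO-style loss for one (preferred, rejected) translation pair.

    avg_logp_*: mean per-token log-probability of each translation under the model.
    lam:        assumed weight of the odds-ratio term (hypothetical default).
    """
    nll = -avg_logp_chosen                              # supervised fine-tuning term
    log_or = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    odds_ratio_term = math.log1p(math.exp(-log_or))     # equals -log(sigmoid(log_or))
    return nll + lam * odds_ratio_term
```

The loss falls as the model assigns higher odds to the preferred translation than to the rejected one, so no separate reference model is needed.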

MAVias: Mitigate any Visual Bias

Authors:Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

Mitigating biases in computer vision models is an essential step towards the trustworthiness of artificial intelligence models. Existing bias mitigation methods focus on a small set of predefined biases, limiting their applicability in visual datasets where multiple, possibly unknown biases exist. To address this limitation, we introduce MAVias, an open-set bias mitigation approach leveraging foundation models to discover spurious associations between visual attributes and target classes. MAVias first captures a wide variety of visual features in natural language via a foundation image tagging model, and then leverages a large language model to select those visual features defining the target class, resulting in a set of language-coded potential visual biases. We then translate this set of potential biases into vision-language embeddings and introduce an in-processing bias mitigation approach to prevent the model from encoding information related to them. Our experiments on diverse datasets, including CelebA, Waterbirds, ImageNet, and UrbanCars, show that MAVias effectively detects and mitigates a wide range of biases in visual recognition tasks outperforming current state-of-the-art.

Paper and project links

PDF

Summary

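Mitigating bias in computer vision models is a key step toward trustworthy artificial intelligence. Existing bias mitigation methods focus on a small set of predefined biases, which limits their use on visual datasets containing multiple, possibly unknown biases. This paper introduces MAVias, an open-set bias mitigation approach that uses foundation models to discover spurious associations between visual attributes and target classes. MAVias first describes a wide variety of visual features in natural language with a foundation image tagging model, then uses a large language model to select the visual features that define the target class, resulting in a set of language-coded potential visual biases. These potential biases are translated into vision-language embeddings, and an in-processing bias mitigation approach prevents the model from encoding information related to them. Experiments on CelebA, Waterbirds, ImageNet, and UrbanCars show that MAVias effectively detects and mitigates a wide range of biases in visual recognition tasks and outperforms the current state of the art.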

Key Takeaways