Published: 2025-09-28

Updated: 2025-11-27

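⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings. They are only suitable for initial screening before reading a paper.
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Free to try on HuggingFace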

2025-09-28 update

A co-evolving agentic AI system for medical imaging analysis

Authors:Songhao Li, Jonathan Xu, Tiancheng Bao, Yuxuan Liu, Yuchen Liu, Yihang Liu, Lilin Wang, Wenhui Lei, Sheng Wang, Yinuo Xu, Yan Cui, Jialu Yao, Shunsuke Koga, Zhi Huang

Agentic AI is rapidly advancing in healthcare and biomedical research. However, in medical image analysis, their performance and adoption remain limited due to the lack of a robust ecosystem, insufficient toolsets, and the absence of real-time interactive expert feedback. Here we present “TissueLab”, a co-evolving agentic AI system that allows researchers to ask direct questions, automatically plan and generate explainable workflows, and conduct real-time analyses where experts can visualize intermediate results and refine them. TissueLab integrates tool factories across pathology, radiology, and spatial omics domains. By standardizing inputs, outputs, and capabilities of diverse tools, the system determines when and how to invoke them to address research and clinical questions. Across diverse tasks with clinically meaningful quantifications that inform staging, prognosis, and treatment planning, TissueLab achieves state-of-the-art performance compared with end-to-end vision-language models (VLMs) and other agentic AI systems such as GPT-5. Moreover, TissueLab continuously learns from clinicians, evolving toward improved classifiers and more effective decision strategies. With active learning, it delivers accurate results in unseen disease contexts within minutes, without requiring massive datasets or prolonged retraining. Released as a sustainable open-source ecosystem, TissueLab aims to accelerate computational research and translational adoption in medical imaging while establishing a foundation for the next generation of medical AI.

Paper and project links

PDF

Summary

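TissueLab is an agentic AI system for medical image analysis that integrates tool factories across pathology, radiology, and spatial omics. It reports state-of-the-art performance and supports real-time analysis in which experts can visualize and refine intermediate results. By standardizing the inputs, outputs, and capabilities of its tools, the system decides when and how to invoke them to address research and clinical questions. With active learning, it delivers accurate results in unseen disease contexts within minutes, without massive datasets or prolonged retraining. Released as a sustainable open-source ecosystem, TissueLab aims to accelerate computational research and translational adoption in medical imaging and to lay a foundation for the next generation of medical AI.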

Key Takeaways

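TissueLab is a co-evolving agentic AI system that lets researchers ask direct questions and automatically plans explainable workflows.
It integrates tool factories across pathology, radiology, and spatial omics.
It reports state-of-the-art performance compared with end-to-end VLMs and other agentic AI systems, and supports real-time analysis in which experts can visualize and refine intermediate results.
By standardizing tool inputs, outputs, and capabilities, the system determines when and how to invoke tools to address research and clinical questions.
With active learning, it delivers accurate results in unseen disease contexts within minutes.
TissueLab aims to accelerate computational research and translational adoption in medical imaging.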
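
To make the idea of standardized tool inputs and outputs concrete, here is a minimal sketch of how a planner could chain tools by matching declared types. It is not TissueLab's actual API: `ToolSpec`, `ToolRegistry`, the type labels, and the two toy tools are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical standardized description of one analysis tool."""
    name: str
    domain: str        # e.g. "pathology", "radiology", "spatial_omics"
    input_type: str    # label for what the tool consumes
    output_type: str   # label for what the tool produces
    run: Callable[[object], object]


class ToolRegistry:
    """Toy registry that chains tools by matching output types to input types."""

    def __init__(self) -> None:
        self._tools: List[ToolSpec] = []

    def register(self, tool: ToolSpec) -> None:
        self._tools.append(tool)

    def plan(self, have: str, want: str) -> List[ToolSpec]:
        """Breadth-first search for the shortest tool chain from `have` to `want`."""
        frontier = [(have, [])]
        seen = {have}
        while frontier:
            current, path = frontier.pop(0)
            if current == want:
                return path
            for tool in self._tools:
                if tool.input_type == current and tool.output_type not in seen:
                    seen.add(tool.output_type)
                    frontier.append((tool.output_type, path + [tool]))
        raise LookupError(f"no tool chain from {have!r} to {want!r}")

    def execute(self, data: object, have: str, want: str) -> object:
        """Run the planned chain, feeding each tool's output to the next tool."""
        for tool in self.plan(have, want):
            data = tool.run(data)
        return data


# Toy usage: a "slide" is a list of intensities, and both tools are stand-ins.
registry = ToolRegistry()
registry.register(ToolSpec("segment_nuclei", "pathology", "slide", "nuclei_mask",
                           lambda slide: [px > 0.5 for px in slide]))
registry.register(ToolSpec("count_nuclei", "pathology", "nuclei_mask", "nuclei_count",
                           lambda mask: sum(mask)))

plan_names = [t.name for t in registry.plan("slide", "nuclei_count")]
result = registry.execute([0.1, 0.9, 0.7], "slide", "nuclei_count")
print(plan_names, result)
```

The point of the sketch is only that once every tool declares what it consumes and produces, deciding which tools to invoke and in what order becomes a search problem. How TissueLab actually plans workflows is not described in the abstract.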

A Versatile Foundation Model for AI-enabled Mammogram Interpretation

Authors:Fuxiang Huang, Jiayi Zhu, Yunfang Yu, Yu Xie, Yuan Guo, Qingcong Kong, Mingxiang Wu, Xinrui Jiang, Shu Yang, Jiabo Ma, Ziyi Liu, Zhe Xu, Zhixuan Chen, Yujie Tan, Zifan He, Luhui Mao, Xi Wang, Junlin Hou, Lei Zhang, Qiong Luo, Zhenhui Li, Herui Yao, Hao Chen

Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related mortality in women globally. Mammography is essential for the early detection and diagnosis of breast lesions. Despite recent progress in foundation models (FMs) for mammogram analysis, their clinical translation remains constrained by several fundamental limitations, including insufficient diversity in training data, limited model generalizability, and a lack of comprehensive evaluation across clinically relevant tasks. Here, we introduce VersaMammo, a versatile foundation model for mammograms, designed to overcome these limitations. We curated the largest multi-institutional mammogram dataset to date, comprising 706,239 images from 21 sources. To improve generalization, we propose a two-stage pre-training strategy to develop VersaMammo, a mammogram foundation model. First, a teacher model is trained via self-supervised learning to extract transferable features from unlabeled mammograms. Then, supervised learning combined with knowledge distillation transfers both features and clinical knowledge into VersaMammo. To ensure a comprehensive evaluation, we established a benchmark comprising 92 specific tasks, including 68 internal tasks and 24 external validation tasks, spanning 5 major clinical task categories: lesion detection, segmentation, classification, image retrieval, and visual question answering. VersaMammo achieves state-of-the-art performance, ranking first in 50 out of 68 specific internal tasks and 20 out of 24 external validation tasks, with average ranks of 1.5 and 1.2, respectively. These results demonstrate its superior generalization and clinical utility, offering a substantial advancement toward reliable and scalable breast cancer screening and diagnosis.

Paper and project links

PDF 64 pages, 7 figures, 40 tables

Summary
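Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related mortality in women globally, and mammography is essential for early detection and diagnosis of breast lesions. Despite progress in foundation models (FMs) for mammogram analysis, clinical translation remains limited by insufficient training-data diversity, limited generalizability, and a lack of comprehensive evaluation across clinically relevant tasks. VersaMammo is a versatile mammogram foundation model designed to address these limitations. The authors curated the largest multi-institutional mammogram dataset to date, comprising 706,239 images from 21 sources. VersaMammo is built with a two-stage pre-training strategy: a teacher model is first trained with self-supervised learning to extract transferable features from unlabeled mammograms, and supervised learning combined with knowledge distillation then transfers both features and clinical knowledge into VersaMammo. For evaluation, the authors established a benchmark of 92 specific tasks (68 internal and 24 external validation tasks) spanning five clinical task categories: lesion detection, segmentation, classification, image retrieval, and visual question answering. VersaMammo ranks first in 50 of the 68 internal tasks and 20 of the 24 external validation tasks, with average ranks of 1.5 and 1.2, respectively.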

Key Takeaways

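Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related death in women globally; mammography is key to early detection.
Existing mammogram foundation models are limited by insufficient data diversity, limited generalizability, and a lack of comprehensive clinical evaluation.
VersaMammo is designed to overcome these limitations using a two-stage pre-training strategy: self-supervised teacher training followed by supervised learning with knowledge distillation.
Training uses the largest multi-institutional mammogram dataset to date: 706,239 images from 21 sources.
On a 92-task benchmark, VersaMammo ranks first in 50 of 68 internal tasks and 20 of 24 external validation tasks.
The results indicate strong generalization and clinical utility.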
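
The abstract does not give the loss used in the second pre-training stage, so the following is only a generic sketch of "supervised learning combined with knowledge distillation". It assumes a cross-entropy term on labels plus a mean-squared-error term that pulls student features toward frozen teacher features. The function names and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Supervised term: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def feature_mse(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Distillation term: mean squared error between student and teacher features."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def stage_two_loss(student_logits: Sequence[float], label: int,
                   student_feat: Sequence[float], teacher_feat: Sequence[float],
                   alpha: float = 0.5) -> float:
    """Weighted sum of the supervised and distillation terms (alpha is an assumed knob)."""
    supervised = cross_entropy(student_logits, label)
    distill = feature_mse(student_feat, teacher_feat)
    return (1 - alpha) * supervised + alpha * distill


# When student features already match the teacher, only the supervised term remains.
matched = stage_two_loss([0.0, 0.0], 0, [1.0, 2.0], [1.0, 2.0])
# A feature mismatch adds a distillation penalty on top.
mismatched = stage_two_loss([0.0, 0.0], 0, [0.0, 0.0], [1.0, 2.0])
print(matched, mismatched)
```

In a real training loop the teacher would be the self-supervised model from stage one, kept frozen, and both terms would be computed per batch. The relative weighting is a design choice this sketch leaves open.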

Intermediate Domain-guided Adaptation for Unsupervised Chorioallantoic Membrane Vessel Segmentation

Authors:Pengwu Song, Zhiping Wang, Peng Yao, Liang Xu, Shuwei Shen, Pengfei Shao, Mingzhai Sun, Ronald X. Xu

The chorioallantoic membrane (CAM) model is a widely used in vivo platform for studying angiogenesis, especially in relation to tumor growth, drug delivery, and vascular biology. Since the topology and morphology of developing blood vessels is a key evaluation metric, accurate vessel segmentation is essential for quantitative analysis of angiogenesis. However, manual segmentation is extremely time-consuming, labor-intensive, and prone to inconsistency due to its subjective nature. Moreover, research on CAM vessel segmentation algorithms remains limited, and the lack of public datasets contributes to poor prediction performance. To address these challenges, we propose an innovative Intermediate Domain-guided Adaptation (IDA) method, which utilizes the similarity between CAM images and retinal images, along with existing public retinal datasets, to perform unsupervised training on CAM images. Specifically, we introduce a Multi-Resolution Asymmetric Translation (MRAT) strategy to generate intermediate images to promote image-level interaction. Then, an Intermediate Domain-guided Contrastive Learning (IDCL) module is developed to disentangle cross-domain feature representations. This method overcomes the limitations of existing unsupervised domain adaptation (UDA) approaches, which primarily concentrate on directly source-target alignment while neglecting intermediate domain information. Notably, we create the first CAM dataset to validate the proposed algorithm. Extensive experiments on this dataset show that our method outperforms compared approaches. Moreover, it achieves superior performance in UDA tasks across retinal datasets, highlighting its strong generalization capability. The CAM dataset and source codes are available at https://github.com/Light-47/IDA.

Paper and project links

PDF

Summary

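The Intermediate Domain-guided Adaptation (IDA) method addresses vessel segmentation in chorioallantoic membrane (CAM) images. It exploits the similarity between CAM images and retinal images, together with existing public retinal datasets, to train on CAM images without supervision. A Multi-Resolution Asymmetric Translation (MRAT) strategy generates intermediate images to promote image-level interaction, and an Intermediate Domain-guided Contrastive Learning (IDCL) module disentangles cross-domain feature representations. The authors created the first CAM dataset to validate the algorithm; the method outperforms the compared approaches on it and also shows strong generalization on unsupervised domain adaptation (UDA) tasks across retinal datasets.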

Key Takeaways

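The chorioallantoic membrane (CAM) model is widely used to study angiogenesis, especially in relation to tumor growth, drug delivery, and vascular biology.
Accurate vessel segmentation is essential for quantitative analysis of angiogenesis in the CAM model.
Manual segmentation is time-consuming, labor-intensive, and prone to inconsistency.
Research on CAM vessel segmentation algorithms is limited, and the lack of public datasets contributes to poor prediction performance.
The proposed IDA method uses the similarity between CAM and retinal images, together with public retinal datasets, for unsupervised training on CAM images.
The Multi-Resolution Asymmetric Translation (MRAT) strategy generates intermediate images to promote image-level interaction.
The authors created the first CAM dataset to validate the method, which outperforms the compared approaches and generalizes well on UDA tasks across retinal datasets.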
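
The abstract does not specify how the IDCL module forms its contrastive objective. As a generic illustration of contrastive learning only, the sketch below uses the standard InfoNCE loss, which is swapped in here and is not the authors' module. The toy feature vectors and the `temperature` value are assumptions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """InfoNCE: low when the anchor is close to the positive and far from the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D features: the anchor matches the positive and opposes the negative.
well_separated = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
# Swapping the roles makes the anchor resemble the negative, so the loss grows.
confused = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(well_separated, confused)
```

In a domain adaptation setting, which features count as positives and negatives (for example, features from source, intermediate, and target images) is exactly the design decision the paper's module makes, and this sketch leaves it open.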

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Authors:Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang

Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence, including the future. We address this limitation by adapting a pretrained bidirectional diffusion transformer to an autoregressive transformer that generates frames on-the-fly. To further reduce latency, we extend distribution matching distillation (DMD) to videos, distilling 50-step diffusion model into a 4-step generator. To enable stable and high-quality distillation, we introduce a student initialization scheme based on teacher’s ODE trajectories, as well as an asymmetric distillation strategy that supervises a causal student model with a bidirectional teacher. This approach effectively mitigates error accumulation in autoregressive generation, allowing long-duration video synthesis despite training on short clips. Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models. It enables fast streaming generation of high-quality videos at 9.4 FPS on a single GPU thanks to KV caching. Our approach also enables streaming video-to-video translation, image-to-video, and dynamic prompting in a zero-shot manner.

Paper and project links

PDF CVPR 2025. Project Page: https://causvid.github.io/

Summary

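Video diffusion models achieve impressive generation quality but struggle in interactive applications because bidirectional attention forces the model to process the entire sequence, including future frames, to generate a single frame. The authors adapt a pretrained bidirectional diffusion transformer into an autoregressive transformer that generates frames on the fly, and extend distribution matching distillation (DMD) to video, distilling a 50-step diffusion model into a 4-step generator. A student initialization scheme based on the teacher's ODE trajectories and an asymmetric distillation strategy, in which a bidirectional teacher supervises a causal student, stabilize distillation and mitigate error accumulation, allowing long-duration video synthesis despite training on short clips. The model scores 84.27 in total on the VBench-Long benchmark and, thanks to KV caching, streams high-quality video at 9.4 FPS on a single GPU. The approach also supports streaming video-to-video translation, image-to-video, and dynamic prompting in a zero-shot manner.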

Key Takeaways

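Current video diffusion models struggle in interactive applications because bidirectional attention requires processing the entire sequence, including future frames, to generate a single frame.
The authors adapt a pretrained bidirectional diffusion transformer into an autoregressive transformer that generates frames on the fly.
Distribution matching distillation (DMD) is extended to video, distilling a 50-step diffusion model into a 4-step generator.
The student is initialized from the teacher's ODE trajectories, and an asymmetric distillation strategy in which a bidirectional teacher supervises a causal student mitigates error accumulation in long-duration synthesis.
The model achieves a total score of 84.27 on VBench-Long, which the authors report surpasses all previous video generation models.
Thanks to KV caching, the model streams high-quality video at 9.4 FPS on a single GPU, and it supports streaming video-to-video translation and image-to-video generation.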
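
The difference between bidirectional and autoregressive attention can be shown with a toy attention mask. This is a sketch under simple assumptions, not the paper's implementation: tokens are grouped into frames, and in the causal case a token may attend only to tokens in its own frame or earlier frames. The function names are hypothetical.

```python
from typing import List


def attention_mask(num_frames: int, tokens_per_frame: int, causal: bool) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k."""
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        row = []
        for k in range(total):
            k_frame = k // tokens_per_frame
            # Bidirectional: every token sees every frame, including future ones.
            # Frame-level causal: only the current frame and earlier frames are visible.
            row.append(k_frame <= q_frame if causal else True)
        mask.append(row)
    return mask


def visible_frames(mask: List[List[bool]], frame: int, tokens_per_frame: int) -> List[int]:
    """Frames that the first token of `frame` is allowed to attend to."""
    row = mask[frame * tokens_per_frame]
    return sorted({k // tokens_per_frame for k, allowed in enumerate(row) if allowed})


bidirectional = attention_mask(num_frames=3, tokens_per_frame=2, causal=False)
autoregressive = attention_mask(num_frames=3, tokens_per_frame=2, causal=True)

# With bidirectional attention, frame 0 depends on all three frames.
print(visible_frames(bidirectional, 0, 2))
# With causal attention, frame 0 depends only on itself, so it can be emitted immediately.
print(visible_frames(autoregressive, 0, 2))
```

Because causal frames never look ahead, the keys and values computed for earlier frames stay valid as later frames are generated, which is what makes KV caching applicable. Separately, distilling from 50 steps to 4 steps reduces the number of network evaluations per generated sample by a factor of 50 / 4 = 12.5, ignoring any other overhead.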

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-28/I2I%20Translation/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting.

I2I Translation
