Published: 2025-09-09

Updated: 2025-10-07

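⚠️ All summaries below are generated by a large language model and may contain errors. Treat them as reference only and use them with caution.
🔴 Note: do not use these summaries in any serious academic context; they are only suitable for a first-pass screening before reading the papers.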

2025-09-09 Update

PRIM: Towards Practical In-Image Multilingual Machine Translation

Authors:Yanzhi Tian, Zeming Liu, Zhengyang Liu, Chong Feng, Xin Li, Heyan Huang, Yuhang Guo

In-Image Machine Translation (IIMT) aims to translate images containing text from one language to another. Current research on end-to-end IIMT is mainly conducted on synthetic data with simple backgrounds, a single font, fixed text positions, and bilingual translation, which cannot fully reflect the real world and leaves a significant gap between research and practical conditions. To facilitate research on IIMT in real-world scenarios, we explore Practical In-Image Multilingual Machine Translation (IIMMT). To address the lack of publicly available data, we annotate the PRIM dataset, which contains one-line text images captured in the real world with complex backgrounds, various fonts, and diverse text positions, and which supports multilingual translation directions. We propose an end-to-end model, VisTrans, to handle the challenges of the practical conditions in PRIM. It processes the visual text and the background information in the image separately, ensuring multilingual translation capability while improving visual quality. Experimental results indicate that VisTrans achieves better translation quality and visual effect than other models. The code and dataset are available at: https://github.com/BITHLP/PRIM.

Paper and project links

PDF Accepted to EMNLP 2025 Main Conference

Summary
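This paper addresses the challenges of In-Image Machine Translation (IIMT) in real-world scenarios and explores Practical In-Image Multilingual Machine Translation (IIMMT). Because no public data was available, the authors annotated PRIM, a dataset of one-line text images captured in the real world, with complex backgrounds, various fonts, and diverse text positions, supporting multilingual translation directions. To handle these practical conditions, they propose VisTrans, an end-to-end model that processes the visual text and the background information in an image separately, preserving multilingual translation capability while improving visual quality. Experiments show that VisTrans achieves better translation quality and visual effect than other models. The code and dataset are available at https://github.com/BITHLP/PRIM.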

Key Takeaways

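IIMT aims to translate images containing text from one language to another.
Current end-to-end research relies mainly on synthetic data, which does not fully reflect real-world scenes and leaves a gap between research and practice.
To close this gap, the authors explore Practical In-Image Multilingual Machine Translation (IIMMT).
Because no public dataset existed, they annotated PRIM, a real-world dataset with complex backgrounds, various fonts, diverse text positions, and multilingual translation directions.
They propose VisTrans, an end-to-end model that processes the visual text and the background information in an image separately.
VisTrans preserves multilingual translation capability while improving visual quality, and outperforms the compared models on both translation quality and visual effect.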

Label-Free Whole Slide Virtual Multi-Staining Using Dual-Excitation Photon Absorption Remote Sensing Microscopy

Authors:James E. D. Tweel, Benjamin R. Ecclestone, James A. Tummon Simmons, Parsin Haji Reza

Histochemical staining is essential for visualizing tissue architecture and cellular morphology but is destructive and limited by the availability of tissue for multiple stains. Virtual staining with label-free microscopy offers a non-destructive alternative, enabling multiple stains to be generated from the same section while reducing stain variability and preserving tissue for downstream assays. Here, a new dual-excitation Photon Absorption Remote Sensing (PARS) system is presented, representing the first application of long-wave ultraviolet A (UVA) 355 nm excitation alongside the established UVC 266 nm source. The addition of 355 nm extends PARS contrast beyond 266 nm, enhancing stromal visualization (e.g., collagen, elastin) and capturing red blood cells, melanin, and other features through complementary radiative and non-radiative absorption. The 266 nm and 355 nm pulses interrogate the sample in an interlaced fashion, enabling concurrent acquisition without compromising imaging speed. Using the RegGAN image-translation framework, this work presents the first demonstration of PARS virtual staining across multiple specialized stains, including Masson’s trichrome, periodic acid-Schiff (PAS), and Jones’ silver, in addition to hematoxylin and eosin (H&E), across diverse human and murine tissues. A masked evaluation by expert pathologists showed that virtual stains achieved the same diagnostic quality as their chemical counterparts, and pathologists could not reliably distinguish real from virtual stains. By providing label-free multi-stain outputs from a single scan, dual-excitation PARS virtual staining could integrate into digital pathology workflows, expanding diagnostic utility. Real and virtual whole-slide image (WSI) pairs are publicly available at the BioImage Archive (https://doi.org/10.6019/S-BIAD2232).

Paper and project links

PDF 31 pages, 10 figures

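Summary
This paper presents a new dual-excitation Photon Absorption Remote Sensing (PARS) system for label-free virtual staining that adds 355 nm ultraviolet A (UVA) excitation to the established 266 nm UVC source. The added wavelength improves visualization of stromal components such as collagen and elastin and captures red blood cells, melanin, and other features. The two pulses interrogate the sample in an interlaced fashion, so both channels are acquired concurrently without reducing imaging speed. In a masked evaluation, expert pathologists judged the virtual stains to match the diagnostic quality of their chemical counterparts. The authors suggest that the approach could integrate into digital pathology workflows. Real and virtual whole-slide image pairs are publicly available at the BioImage Archive.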

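Key Takeaways

The dual-excitation PARS system combines two excitation wavelengths, the established 266 nm UVC source and a new 355 nm UVA source. The added wavelength extends PARS contrast and makes stromal structures such as collagen and elastin easier to see.
The system also captures red blood cells, melanin, and other features through complementary radiative and non-radiative absorption.
The 266 nm and 355 nm pulses interrogate the sample in an interlaced fashion, enabling concurrent acquisition of both channels without compromising imaging speed.
Using the RegGAN image-translation framework, the work demonstrates virtual staining for Masson's trichrome, periodic acid-Schiff (PAS), and Jones' silver, in addition to H&E, across diverse human and murine tissues.
In a masked evaluation, expert pathologists found that the virtual stains achieved the same diagnostic quality as chemical stains and could not reliably distinguish real from virtual stains.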
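
The interlaced acquisition mentioned above can be pictured with a minimal sketch. It assumes, purely for illustration, that the detector produces one flat stream of samples in which 266 nm and 355 nm pulses strictly alternate, starting with 266 nm. The function name `split_interlaced` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def split_interlaced(samples: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a strictly alternating pulse stream into two excitation channels.

    Assumption (not from the paper): even indices hold 266 nm responses and
    odd indices hold 355 nm responses.
    """
    uvc_266 = list(samples[0::2])  # every second sample, starting at index 0
    uva_355 = list(samples[1::2])  # every second sample, starting at index 1
    return uvc_266, uva_355


# Made-up sample values, used only to show the de-interleaving step.
stream = [0.9, 0.2, 0.8, 0.3, 0.7, 0.4]
uvc, uva = split_interlaced(stream)
```

Because both channels come from a single pass over the sample, each scan position ends up with a response for each wavelength, which is consistent with the abstract's claim of concurrent acquisition without a second scan.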

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-09/I2I%20Translation/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting.

I2I Translation
