GAN

Published: 2025-09-13

Updated: 2025-10-07

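⚠️ All summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are only suitable for initial screening before reading a paper.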

Updated 2025-09-13

VRAE: Vertical Residual Autoencoder for License Plate Denoising and Deblurring

Authors:Cuong Nguyen, Dung T. Tran, Hong Nguyen, Xuan-Vu Phan, Nam-Phong Nguyen

In real-world traffic surveillance, vehicle images captured under adverse weather, poor lighting, or high-speed motion often suffer from severe noise and blur. Such degradations significantly reduce the accuracy of license plate recognition systems, especially when the plate occupies only a small region within the full vehicle image. Restoring these degraded images in a fast, real-time manner is thus a crucial pre-processing step to enhance recognition performance. In this work, we propose a Vertical Residual Autoencoder (VRAE) architecture designed for the image enhancement task in traffic surveillance. The method incorporates an enhancement strategy that employs an auxiliary block, which injects input-aware features at each encoding stage to guide the representation learning process, enabling better general information preservation throughout the network compared to conventional autoencoders. Experiments on a vehicle image dataset with visible license plates demonstrate that our method consistently outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches. Compared with AE at the same depth, it improves PSNR by about 20%, reduces NMSE by around 50%, and enhances SSIM by 1%, while requiring only a marginal increase of roughly 1% in parameters.

Paper and project links

PDF

Summary

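In traffic surveillance, noise and blur in vehicle images caused by adverse weather, poor lighting, or high-speed motion severely reduce the accuracy of license plate recognition systems. The paper proposes the Vertical Residual Autoencoder (VRAE), an architecture for image enhancement in traffic surveillance. An auxiliary block injects input-aware features to guide representation learning, which preserves general information better than a conventional autoencoder. Experiments on a vehicle image dataset show that the method outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches. Compared with an AE of the same depth, it improves PSNR by about 20%, reduces NMSE by about 50%, and improves SSIM by 1%, with only a marginal increase in parameters (roughly 1%).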

Key Takeaways

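Noise and blur in vehicle images captured under adverse weather, poor lighting, or high-speed motion reduce the accuracy of license plate recognition systems.
The paper proposes a new Vertical Residual Autoencoder (VRAE) architecture designed specifically for image enhancement in traffic surveillance.
VRAE uses an auxiliary block to inject input-aware features that guide the representation learning process.
On a vehicle image dataset, VRAE outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches.
Compared with an autoencoder of the same depth, VRAE performs better on PSNR, NMSE, and SSIM.
VRAE adds only a marginal number of parameters.
The method is an effective pre-processing step that can improve the performance of license plate recognition systems.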
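
The PSNR and NMSE figures quoted above are standard image-restoration metrics. The following is a minimal sketch of how they are commonly computed, not the paper's evaluation code: the function names, the flat pixel lists, and the 8-bit peak value of 255 are assumptions made for illustration, and NMSE is taken here as squared error divided by the energy of the reference image, which is one common definition that the paper may or may not use.

```python
import math


def mse(reference, restored):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    max_value is the assumed peak pixel intensity (255 for 8-bit images).
    """
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def nmse(reference, restored):
    """Squared error normalized by the reference energy (lower is better)."""
    if len(reference) != len(restored):
        raise ValueError("inputs must be the same length")
    energy = sum(r ** 2 for r in reference)
    if energy == 0:
        raise ValueError("reference has zero energy")
    return sum((r - x) ** 2 for r, x in zip(reference, restored)) / energy


# Hypothetical toy data: a 4-pixel "clean" image and a degraded copy.
clean = [0.0, 64.0, 128.0, 255.0]
noisy = [10.0, 54.0, 138.0, 245.0]

print(f"PSNR: {psnr(clean, noisy):.2f} dB")
print(f"NMSE: {nmse(clean, noisy):.6f}")
```

Because PSNR is logarithmic while NMSE is linear in the squared error, the two metrics move by different relative amounts for the same change in reconstruction error, so percentage changes in each are not directly comparable.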

Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net

Authors:Dongyi He, Shiyang Li, Bin Jiang, He Yan

High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity; however, it remains costly and logistically challenging. If comparable volumes could be generated directly from widely available scalp electroencephalography (EEG), advanced neuroimaging would become significantly more accessible. Existing EEG-to-fMRI generators rely on plain Convolutional Neural Networks (CNNs) that fail to capture cross-channel time-frequency cues or on heavy transformer/Generative Adversarial Network (GAN) decoders that strain memory and stability. To address these limitations, we propose Spec2VolCAMU-Net, a lightweight architecture featuring a Multi-directional Time-Frequency Convolutional Attention Encoder for rich feature extraction and a Vision-Mamba U-Net decoder that uses linear-time state-space blocks for efficient long-range spatial modelling. We frame the goal of this work as establishing a new state of the art in the spatial fidelity of single-volume reconstruction, a foundational prerequisite for the ultimate aim of generating temporally coherent fMRI time series. Trained end-to-end with a hybrid SSI-MSE loss, Spec2VolCAMU-Net achieves state-of-the-art fidelity on three public benchmarks, recording Structural Similarity Index (SSIM) of 0.693 on NODDI, 0.725 on Oddball and 0.788 on CN-EPFL, representing improvements of 14.5%, 14.9%, and 16.9% respectively over previous best SSIM scores. Furthermore, it achieves competitive Peak Signal-to-Noise Ratio (PSNR) scores, particularly excelling on the CN-EPFL dataset with a 4.6% improvement over the previous best PSNR, thus striking a better balance in reconstruction quality. The proposed model is lightweight and efficient, making it suitable for real-time applications in clinical and research settings. The code is available at https://github.com/hdy6438/Spec2VolCAMU-Net.

Paper and project links

PDF

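Summary

The paper introduces Spec2VolCAMU-Net, a model that combines a Multi-directional Time-Frequency Convolutional Attention Encoder with a Vision-Mamba U-Net decoder built on linear-time state-space blocks. Its goal is to generate high-resolution functional magnetic resonance imaging (fMRI) volumes from widely available scalp electroencephalography (EEG). The model achieves state-of-the-art spatial fidelity in single-volume reconstruction on three public benchmarks and is suitable for real-time use in clinical and research settings. The code is available at https://github.com/hdy6438/Spec2VolCAMU-Net.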

Key Takeaways