⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: do not rely on these summaries for serious academic work; they are only meant as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-02-12
ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models
Authors:Ehsan Zeraatkar, Salah Faroughi, Jelena Tesic
Purpose: Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate the state of regional and global climate under a wide variety of conditions. ESMs are highly complex, and thus deep neural network architectures are used to model this complexity and store the down-sampled data. In this paper, we propose the Vision Transformer Sinusoidal Representation Networks (ViSIR) to improve the single-image super-resolution (SR) reconstruction task for ESM data. Methods: ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to address the spectral bias observed in SR tasks. Results: ViSIR outperforms ViT by 4.1 dB, SIREN by 7.5 dB, and SR Generative Adversarial Networks (SR-GANs) by 7.1 dB PSNR on average across three different measurements. Conclusion: The proposed ViSIR is evaluated against state-of-the-art methods and outperforms them in terms of Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM).
Paper and project links
Summary: This paper proposes Vision Transformer Sinusoidal Representation Networks (ViSIR), which combine the super-resolution (SR) capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to improve single-image SR reconstruction of Earth system model data. Averaged over three different measurements, ViSIR's PSNR exceeds ViT by 4.1 dB, SIREN by 7.5 dB, and SR-GANs by 7.1 dB.
Key Takeaways:
- Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate regional and global climate under a wide variety of conditions.
- ESMs are highly complex, so deep neural network architectures are used to model this complexity and store the down-sampled data.
- Vision Transformer Sinusoidal Representation Networks (ViSIR) are proposed to improve single-image super-resolution reconstruction of ESM data.
- ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN).
- ViSIR performs strongly on the SR task, clearly outperforming ViT, SIREN, and SR-GANs.
- Averaged over three different measurements, ViSIR's PSNR demonstrates its advantage (a toy sketch of such a ViT-plus-SIREN architecture follows this list).
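The digest only names the two building blocks, so here is a minimal, hypothetical PyTorch sketch of what a ViT-encoder-plus-SIREN-head SR model could look like. The patch size, dimensions, upscaling factor, and the way the sine-activated head decodes tokens into pixels are all our assumptions, not details from the paper.

```python
# Hypothetical ViSIR-style sketch: a ViT encoder followed by a SIREN (sine-activated)
# head that decodes each token into an upscaled patch. All hyperparameters are assumed.
import torch
import torch.nn as nn

class Sine(nn.Module):
    """SIREN-style activation sin(w0 * x); w0=30 follows the original SIREN paper."""
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)

class ViSIRSketch(nn.Module):
    def __init__(self, in_ch=3, patch=8, dim=128, depth=4, heads=4, scale=4):
        super().__init__()
        self.patch, self.scale = patch, scale
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)  # patchify
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.vit = nn.TransformerEncoder(layer, num_layers=depth)  # pos. enc. omitted for brevity
        self.siren = nn.Sequential(  # sine MLP to retain high-frequency detail
            nn.Linear(dim, dim), Sine(),
            nn.Linear(dim, dim), Sine(),
            nn.Linear(dim, in_ch * (patch * scale) ** 2),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.vit(tokens)                          # global self-attention
        out = self.siren(tokens)                           # one upscaled patch per token
        gh, gw, p = h // self.patch, w // self.patch, self.patch * self.scale
        out = out.view(b, gh, gw, c, p, p).permute(0, 3, 1, 4, 2, 5)
        return out.reshape(b, c, gh * p, gw * p)           # (B, C, H*scale, W*scale)

lr = torch.randn(1, 3, 32, 32)                             # toy low-resolution input
print(ViSIRSketch()(lr).shape)                             # torch.Size([1, 3, 128, 128])
```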




Provably Near-Optimal Federated Ensemble Distillation with Negligible Overhead
Authors:Won-Jun Jang, Hyeon-Seo Park, Si-Hyeon Lee
Federated ensemble distillation addresses client heterogeneity by generating pseudo-labels for an unlabeled server dataset based on client predictions and training the server model using the pseudo-labeled dataset. The unlabeled server dataset can either be pre-existing or generated through a data-free approach. The effectiveness of this approach critically depends on the method of assigning weights to client predictions when creating pseudo-labels, especially in highly heterogeneous settings. Inspired by theoretical results from GANs, we propose a provably near-optimal weighting method that leverages client discriminators trained with a server-distributed generator and local datasets. Our experiments on various image classification tasks demonstrate that the proposed method significantly outperforms baselines. Furthermore, we show that the additional communication cost, client-side privacy leakage, and client-side computational overhead introduced by our method are negligible, both in scenarios with and without a pre-existing server dataset.
Paper and project links
Summary
Federated ensemble distillation tackles client heterogeneity by generating pseudo-labels for an unlabeled server dataset from client predictions and then training the server model on the pseudo-labeled data. Its effectiveness hinges on how weights are assigned to client predictions when creating pseudo-labels, especially in highly heterogeneous settings. Inspired by theoretical results from GANs, the authors propose a provably near-optimal weighting method that leverages client discriminators trained with a server-distributed generator and local datasets. Experiments on various image classification tasks show the method significantly outperforms baselines, while the additional communication cost, client-side privacy leakage, and client-side computational overhead it introduces are negligible, with or without a pre-existing server dataset.
Key Takeaways
- Federated ensemble distillation addresses client heterogeneity by generating pseudo-labels.
- Pseudo-labels are generated for an unlabeled server dataset based on client predictions.
- The method's effectiveness hinges on how client predictions are weighted when creating pseudo-labels.
- A GAN-theory-inspired weighting method is proposed, combining client discriminators with a server-distributed generator.
- The method significantly outperforms baselines on various image classification tasks.
- The additional communication cost, client-side privacy leakage, and computational overhead it introduces are negligible (a toy weighting sketch follows this list).
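As a concrete illustration of discriminator-based weighting, here is a toy sketch of turning per-client predictions and per-client discriminator scores into soft pseudo-labels. The simple normalization below is our assumption; the paper derives its own provably near-optimal weights from GAN theory.

```python
# Toy sketch: weight each client's prediction on a server sample by how strongly that
# client's discriminator believes the sample resembles its local data. The plain
# normalization rule below is an assumption, not the paper's near-optimal weighting.
import torch

def pseudo_labels(client_logits: torch.Tensor, disc_scores: torch.Tensor) -> torch.Tensor:
    """client_logits: (K, N, C) logits from K clients on N unlabeled server samples.
    disc_scores: (K, N) discriminator outputs in (0, 1). Returns (N, C) soft labels."""
    weights = disc_scores / disc_scores.sum(dim=0, keepdim=True)  # normalize over clients
    probs = client_logits.softmax(dim=-1)
    return (weights.unsqueeze(-1) * probs).sum(dim=0)             # weighted mixture

K, N, C = 5, 8, 10
soft = pseudo_labels(torch.randn(K, N, C), torch.rand(K, N).clamp_min(1e-6))
print(soft.shape, soft.sum(dim=-1))  # torch.Size([8, 10]); each row sums to 1
```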





Beyond Batch Learning: Global Awareness Enhanced Domain Adaptation
Authors:Lingkun Luo, Shiqiang Hu, Liming Chen
In domain adaptation (DA), the effectiveness of deep learning-based models is often constrained by batch learning strategies that fail to fully apprehend the global statistical and geometric characteristics of data distributions. Addressing this gap, we introduce ‘Global Awareness Enhanced Domain Adaptation’ (GAN-DA), a novel approach that transcends traditional batch-based limitations. GAN-DA integrates a unique predefined feature representation (PFR) to facilitate the alignment of cross-domain distributions, thereby achieving a comprehensive global statistical awareness. This representation is innovatively expanded to encompass orthogonal and common feature aspects, which enhances the unification of global manifold structures and refines decision boundaries for more effective DA. Our extensive experiments, encompassing 27 diverse cross-domain image classification tasks, demonstrate GAN-DA’s remarkable superiority, outperforming 24 established DA methods by a significant margin. Furthermore, our in-depth analyses shed light on the decision-making processes, revealing insights into the adaptability and efficiency of GAN-DA. This approach not only addresses the limitations of existing DA methodologies but also sets a new benchmark in the realm of domain adaptation, offering broad implications for future research and applications in this field.
Paper and project links
Summary
GAN-DA (Global Awareness Enhanced Domain Adaptation) is a new domain adaptation (DA) method that addresses a limitation of deep models trained with batch learning: such strategies fail to fully capture the global statistical and geometric characteristics of data distributions. GAN-DA introduces a predefined feature representation (PFR) that facilitates the alignment of cross-domain distributions and achieves comprehensive global statistical awareness. In extensive experiments covering 27 cross-domain image classification tasks, GAN-DA significantly outperforms 24 established DA methods, addressing limitations of existing approaches and setting a new benchmark for the field.
Key Takeaways
- GAN-DA addresses a limitation of deep-learning-based domain adaptation: batch learning strategies fail to fully capture the global statistical and geometric characteristics of data distributions.
- GAN-DA aligns cross-domain distributions via a predefined feature representation (PFR), achieving global statistical awareness.
- The PFR is innovatively expanded to cover orthogonal and common feature aspects, which helps unify global manifold structures and refine decision boundaries for more effective DA.
- Across extensive experiments on 27 cross-domain image classification tasks, GAN-DA significantly outperforms 24 established DA methods.
- GAN-DA both addresses limitations of existing DA methods and sets a new benchmark for research and applications in the field.
- In-depth analyses shed light on GAN-DA's decision-making process and offer insights into its adaptability and efficiency (a toy PFR-alignment sketch follows this list).
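To make the PFR idea more tangible, here is a toy sketch of pulling features from both domains toward predefined, mutually orthogonal per-class anchors. The anchor construction and the MSE pull below are our assumptions; GAN-DA's actual formulation, including its common-feature components and optimization, differs in detail.

```python
# Toy sketch: predefined orthogonal class anchors act as a globally fixed target
# geometry that both source and target features are pulled toward. Details assumed.
import torch
import torch.nn.functional as F

def pfr_anchors(num_classes: int, dim: int) -> torch.Tensor:
    eye = torch.eye(num_classes)
    return F.pad(eye, (0, dim - num_classes))  # (C, dim); rows are mutually orthogonal

def pfr_alignment_loss(features, labels, anchors):
    """Pull each (source or pseudo-labeled target) feature toward its class anchor."""
    return F.mse_loss(F.normalize(features, dim=1), anchors[labels])

anchors = pfr_anchors(num_classes=10, dim=64)
feats, labels = torch.randn(32, 64), torch.randint(0, 10, (32,))
print(pfr_alignment_loss(feats, labels, anchors))  # scalar alignment loss
```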




Towards Consistent and Controllable Image Synthesis for Face Editing
Authors:Mengting Wei, Tuomas Varanka, Yante Li, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao
Face editing methods, essential for tasks like virtual avatars, digital human synthesis and identity preservation, have traditionally been built upon GAN-based techniques, while recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in controlling specific attributes and preserving the consistency of other unchanged attributes, especially the identity characteristics. To address these issues and facilitate more convenient editing of face images, we propose a novel approach that leverages the power of Stable-Diffusion (SD) models and crude 3D face models to control the lighting, facial expression and head pose of a portrait photo. We observe that this task essentially involves combinations of the target background, the identity, and the face attributes to be edited. We strive to sufficiently disentangle the control of these factors to enable consistent face editing. Specifically, our method, coined RigFace, contains: 1) A Spatial Attribute Encoder that provides precise and decoupled conditions of background, pose, expression and lighting; 2) A high-consistency FaceFusion method that transfers identity features from the Identity Encoder to the denoising UNet of a pre-trained SD model; 3) An Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models. Code is publicly available at https://github.com/weimengting/RigFace.
Paper and project links
Summary
Face editing is essential for tasks such as virtual avatars, digital human synthesis, and identity preservation. Methods traditionally built on GAN techniques are giving way to diffusion models because of their success in image reconstruction; however, diffusion models still struggle to control specific attributes while keeping unchanged attributes, especially identity, consistent. To address this, the authors propose a new approach that combines Stable-Diffusion models with crude 3D face models to control the lighting, facial expression, and head pose of a portrait photo, sufficiently disentangling the control of background, identity, and face attributes to keep edits consistent. Experiments show the method matches or surpasses existing face editing models in both identity preservation and photorealism.
Key Takeaways
- Face editing is important for tasks such as virtual avatars, digital human synthesis, and identity preservation.
- Traditional GAN-based face editing methods are being superseded by diffusion models, which excel at image reconstruction.
- Diffusion models still face challenges in controlling specific attributes while keeping unchanged attributes consistent, especially identity.
- The proposed method combines Stable-Diffusion models with crude 3D face models to control the lighting, facial expression, and head pose of a portrait photo.
- It aims to keep face edits consistent through precise, decoupled control of background, identity, and face attributes.
- The method performs strongly in identity preservation and photorealism, surpassing several existing face editing models (a toy conditioning sketch follows this list).
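For intuition, here is a toy sketch of the kind of zero-initialized condition injection an "Attribute Rigger" could perform on a denoising-UNet feature map. The layer shapes and the 1x1-projection design are our assumptions, not the paper's exact architecture.

```python
# Toy sketch of injecting spatial condition maps (background, pose, expression,
# lighting) into a UNet feature map. Zero-init keeps the pre-trained SD behavior
# intact at the start of fine-tuning; all shapes here are assumptions.
import torch
import torch.nn as nn

class AttributeRiggerSketch(nn.Module):
    def __init__(self, cond_ch: int, feat_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(cond_ch, feat_ch, kernel_size=1)
        nn.init.zeros_(self.proj.weight)  # start as a no-op on the frozen UNet
        nn.init.zeros_(self.proj.bias)

    def forward(self, unet_feat, cond_maps):
        return unet_feat + self.proj(cond_maps)  # additive conditioning

feat = torch.randn(1, 320, 64, 64)   # a hypothetical UNet feature map
conds = torch.randn(1, 12, 64, 64)   # e.g. four 3-channel condition maps, concatenated
print(AttributeRiggerSketch(12, 320)(feat, conds).shape)  # torch.Size([1, 320, 64, 64])
```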






Weakly-Supervised PET Anomaly Detection using Implicitly-Guided Attention-Conditional Counterfactual Diffusion Modeling: a Multi-Center, Multi-Cancer, and Multi-Tracer Study
Authors:Shadab Ahamed, Arman Rahmim
Minimizing the need for pixel-level annotated data to train PET lesion detection and segmentation networks is highly desired and can be transformative, given the time and cost constraints associated with expert annotations. Current un-/weakly-supervised anomaly detection methods rely on autoencoders or generative adversarial networks trained only on healthy data; however, GAN-based networks are more challenging to train due to issues such as the simultaneous optimization of two competing networks and mode collapse. In this paper, we present the weakly-supervised Implicitly guided COuNterfactual diffusion model for Detecting Anomalies in PET images (IgCONDA-PET). The solution is developed and validated using PET scans from six retrospective cohorts comprising a total of 2652 cases spanning both local and public datasets. Training is conditioned on image class labels (healthy vs. unhealthy) via attention modules, and we employ implicit diffusion guidance. We perform counterfactual generation, which facilitates “unhealthy-to-healthy” domain translation by generating a synthetic, healthy version of an unhealthy input image, enabling the detection of anomalies through the calculated differences. The performance of our method was compared against several other deep learning-based weakly-supervised or unsupervised methods as well as traditional methods like 41% SUVmax thresholding. We also highlight the importance of incorporating attention modules in our network for the detection of small anomalies. The code is publicly available at: https://github.com/ahxmeds/IgCONDA-PET.git.
Paper and project links
PDF 32 pages, 6 figures, 4 tables
Summary
This paper presents IgCONDA-PET, a weakly-supervised anomaly detection model for PET images. Training is conditioned on image class labels (healthy vs. unhealthy) via attention modules with implicit diffusion guidance, and counterfactual generation produces a synthetic healthy version of an unhealthy input so that anomalies can be detected from the computed differences. The method outperforms other weakly-supervised or unsupervised deep learning approaches as well as traditional methods. Code is publicly available.
Key Takeaways
- A new weakly-supervised PET anomaly detection model, IgCONDA-PET, is proposed.
- Training is conditioned on healthy vs. unhealthy image class labels via attention modules.
- Implicit diffusion guidance and counterfactual generation produce a healthy version of the input, enabling “unhealthy-to-healthy” domain translation.
- Anomalies are detected from the differences between the input and its healthy counterfactual, improving detection performance.
- The model outperforms other weakly-supervised or unsupervised deep learning methods and traditional approaches such as 41% SUVmax thresholding.
- Attention modules are important for detecting small anomalies (a toy counterfactual-guidance sketch follows this list).
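The core inference idea, counterfactual generation with implicit guidance followed by a difference map, can be sketched roughly as below. The partial-noising level, the simplified update rule, and the stand-in denoiser are all placeholders of ours; the actual sampler lives in the authors' repository.

```python
# Toy sketch of counterfactual "unhealthy-to-healthy" anomaly mapping with a
# class-conditional denoiser and classifier-free-style implicit guidance. The update
# rule is deliberately simplified; see the authors' repo for the real sampler.
import torch

@torch.no_grad()
def counterfactual_anomaly_map(denoiser, x_unhealthy, steps=50, guidance=3.0):
    x = x_unhealthy + 0.5 * torch.randn_like(x_unhealthy)  # partially noise the input
    healthy = torch.zeros(x.shape[0], dtype=torch.long)    # target class: healthy
    for t in reversed(range(steps)):
        eps_cond = denoiser(x, t, healthy)                 # healthy-conditioned estimate
        eps_uncond = denoiser(x, t, None)                  # unconditional estimate
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)  # implicit guidance
        x = x - eps / steps                                # simplified denoising step
    return (x_unhealthy - x).abs()                         # per-pixel anomaly map

toy_denoiser = lambda x, t, y: 0.1 * x                     # stand-in for a smoke test
print(counterfactual_anomaly_map(toy_denoiser, torch.randn(1, 1, 64, 64)).shape)
```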

