⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution
🔴 Please note: never rely on them in serious academic settings; they are only meant for a first-pass screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-06
NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation
Authors:Serkan Ozturk, Samet Hicsonmez, Pinar Duygulu
Current text conditioned image generation methods output realistic looking images, but they fail to capture specific styles. Simply finetuning them on the target style datasets still struggles to grasp the style features. In this work, we present a novel contrastive learning framework to improve the stylization capability of large text-to-image diffusion models. Motivated by the astonishing advance in image generation models that makes synthetic data an intrinsic part of model training in various computer vision tasks, we exploit synthetic image generation in our approach. Usually, the generated synthetic data is dependent on the task, and most of the time it is used to enlarge the available real training dataset. With NSYNC, alternatively, we focus on generating negative synthetic sets to be used in a novel contrastive training scheme along with real positive images. In our proposed training setup, we forward negative data along with positive data and obtain negative and positive gradients, respectively. We then refine the positive gradient by subtracting its projection onto the negative gradient to get the orthogonal component, based on which the parameters are updated. This orthogonal component eliminates the trivial attributes that are present in both positive and negative data and directs the model towards capturing a more unique style. Experiments on various styles of painters and illustrators show that our approach improves the performance over the baseline methods both quantitatively and qualitatively. Our code is available at https://github.com/giddyyupp/NSYNC.
Paper and project links
PDF Under review
Summary
This paper presents a new contrastive learning framework for improving the stylization capability of large text-to-image diffusion models. The framework generates negative synthetic sets and uses them in a contrastive training scheme together with real positive images. By removing attributes shared by the positive and negative data, it steers the model towards capturing a more distinctive style.
Key Takeaways
- Current text-conditioned image generation methods struggle to capture specific styles; even fine-tuning on target-style datasets fails to fully grasp the style features.
- A new contrastive learning framework is proposed to improve the stylization capability of large text-to-image diffusion models.
- Synthetic image generation is exploited to build negative synthetic sets for contrastive training.
- Negative and positive data are forwarded together to obtain negative and positive gradients, and the parameters are updated from the orthogonal component of the positive gradient (see the sketch below).
- The orthogonal component removes the trivial attributes shared by positive and negative data, directing the model towards a more unique style.
- Experiments on various painter and illustrator styles show that the method outperforms the baselines both quantitatively and qualitatively.
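The orthogonal-gradient step described in the abstract amounts to: given a positive gradient g+ (from real style images) and a negative gradient g- (from negative synthetic images), update with g+ minus its projection onto g-, i.e. g+ - (⟨g+, g-⟩ / ‖g-‖²) g-. Below is a minimal, illustrative PyTorch sketch of such an update. It is not the released NSYNC code; the helper names (`project_out`, `orthogonal_update`) are hypothetical, and whether the projection is applied per parameter tensor or over the concatenated gradient vector is an assumption (the sketch uses the per-parameter form).

```python
import torch

def project_out(pos_grad: torch.Tensor, neg_grad: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Component of pos_grad orthogonal to neg_grad:
    pos_grad - neg_grad * <pos_grad, neg_grad> / ||neg_grad||^2."""
    p, n = pos_grad.flatten(), neg_grad.flatten()
    coeff = torch.dot(p, n) / (torch.dot(n, n) + eps)
    return (p - coeff * n).view_as(pos_grad)

def orthogonal_update(model, loss_pos, loss_neg, optimizer):
    """One hypothetical training step: compute gradients from the positive
    (real) and negative (synthetic) batches, keep only the component of the
    positive gradient orthogonal to the negative one, then step."""
    params = [p for p in model.parameters() if p.requires_grad]
    pos_grads = torch.autograd.grad(loss_pos, params, retain_graph=True)
    neg_grads = torch.autograd.grad(loss_neg, params)
    optimizer.zero_grad()
    for p, g_pos, g_neg in zip(params, pos_grads, neg_grads):
        p.grad = project_out(g_pos, g_neg)
    optimizer.step()
```

In this reading, `loss_pos` and `loss_neg` would presumably be the usual diffusion denoising loss evaluated on the real-style batch and the negative synthetic batch, respectively.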
Click here to view paper screenshots
Chasing the storm: Investigating the application of high-contrast imaging techniques in producing precise exoplanet light curves
Authors:Ben J. Sutlieff, David S. Doelman, Jayne L. Birkby, Matthew A. Kenworthy, Jordan M. Stone, Frans Snik, Steve Ertel, Beth A. Biller, Charles E. Woodward, Andrew J. Skemer, Jarron M. Leisenring, Alexander J. Bohn, Luke T. Parker
Substellar companions such as exoplanets and brown dwarfs exhibit changes in brightness arising from top-of-atmosphere inhomogeneities, providing insights into their atmospheric structure and dynamics. This variability can be measured in the light curves of high-contrast companions from the ground by combining differential spectrophotometric monitoring techniques with high-contrast imaging. However, ground-based observations are sensitive to the effects of turbulence in Earth’s atmosphere, and while adaptive optics (AO) systems and bespoke data processing techniques help to mitigate these, residual systematics can limit photometric precision. Here, we inject artificial companions to data obtained with an AO system and a vector Apodizing Phase Plate coronagraph to test the level to which telluric and other systematics contaminate such light curves, and thus how well their known variability signals can be recovered. We find that varying companions are distinguishable from non-varying companions, but that variability amplitudes and periods cannot be accurately recovered when observations cover only a small number of periods. Residual systematics remain above the photon noise in the light curves but have not yet reached a noise floor. We also simulate observations to assess how specific systematic sources, such as non-common path aberrations and AO residuals, can impact aperture photometry as a companion moves through pupil-stabilised data. We show that only the lowest-order aberrations are likely to affect flux measurements, but that thermal background noise is the dominant source of scatter in raw companion photometry. Predictive control and focal-plane wavefront sensing techniques will help to further reduce systematics in data of this type.
Paper and project links
PDF 19 pages, 12 figures, accepted for publication in MNRAS
Summary
Substellar companions such as exoplanets and brown dwarfs show brightness variability arising from top-of-atmosphere inhomogeneities, which probes their atmospheric structure and dynamics. Their light curves can be measured from the ground by combining differential spectrophotometric monitoring with high-contrast imaging, but ground-based data are affected by turbulence in Earth's atmosphere; adaptive optics (AO) systems and bespoke data processing mitigate this, yet residual systematics can still limit photometric precision. The authors inject artificial companions into data taken with an AO system and a vector Apodizing Phase Plate coronagraph to test how strongly telluric and other systematics contaminate such light curves and how well known variability signals can be recovered. Varying companions can be distinguished from non-varying ones, but amplitudes and periods are not accurately recovered when only a few periods are observed; residual systematics remain above the photon noise without yet reaching a noise floor. Simulated observations show that, of specific systematic sources such as non-common path aberrations and AO residuals, only the lowest-order aberrations are likely to affect aperture photometry as a companion moves through pupil-stabilised data, while thermal background noise dominates the scatter in raw companion photometry. Predictive control and focal-plane wavefront sensing will help to further reduce such systematics.
Key Takeaways
- The atmospheric structure and dynamics of substellar companions can be studied through their brightness variability, which arises from top-of-atmosphere inhomogeneities.
- Ground-based observations are affected by turbulence in Earth's atmosphere, which introduces errors into the photometry.
- Adaptive optics systems and bespoke data processing mitigate the effects of atmospheric turbulence, but residual systematics can still limit photometric precision.
- Injection tests with artificial companions show that varying companions can be distinguished from non-varying ones, but variability properties (amplitude and period) are hard to recover accurately when only a few periods are observed (see the illustrative sketch below).
- Residual systematics remain above the photon noise but have not yet reached a noise floor.
- Simulated observations show that only the lowest-order aberrations are likely to affect the flux measurements, while thermal background noise is the dominant source of scatter in the raw companion photometry.
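The central experiment is an injection-recovery test: companions with known, artificially injected variability are placed into the real imaging data, and the light curves extracted from those data are compared with the injected signal. As a loose, self-contained illustration of why amplitudes and periods become hard to pin down when the observing baseline covers only a fraction of a period, the NumPy toy below injects a sinusoid into a noisy light curve and runs a brute-force period search; all numbers are made up and none of this reproduces the authors' pipeline or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inject a sinusoidal variability signal into a noisy "companion" light curve.
# Amplitude, period, noise level and baseline are arbitrary illustrative values.
period_true, amp_true = 8.0, 0.02          # hours, fractional amplitude
baseline_hours = 3.5                        # baseline much shorter than the period
t = np.linspace(0.0, baseline_hours, 200)   # time stamps (hours)
flux = 1.0 + amp_true * np.sin(2 * np.pi * t / period_true)
flux += rng.normal(0.0, 0.01, t.size)       # photometric scatter

# Brute-force search: best-fitting sinusoid amplitude at each trial period.
trial_periods = np.linspace(2.0, 20.0, 500)
power = []
for P in trial_periods:
    phase = 2 * np.pi * t / P
    design = np.column_stack([np.sin(phase), np.cos(phase), np.ones_like(t)])
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    power.append(np.hypot(coeffs[0], coeffs[1]))
best = trial_periods[int(np.argmax(power))]
print(f"injected period: {period_true} h, recovered: {best:.1f} h")
# With so few cycles observed, the recovered period and amplitude are
# typically biased or poorly constrained, echoing the paper's finding.
```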
Click here to view paper screenshots
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Authors:Yifan Pu, Jixuan Ying, Qixiu Li, Tianzhu Ye, Dongchen Han, Xiaochen Wang, Ziyi Wang, Xinyu Shao, Gao Huang, Xiu Li
Vision Transformers (ViTs) have become a universal backbone for both image recognition and image generation. Yet their Multi-Head Self-Attention (MHSA) layer still performs a quadratic query-key interaction for every token pair, spending the bulk of computation on visually weak or redundant correlations. We introduce Visual-Contrast Attention (VCA), a drop-in replacement for MHSA that injects an explicit notion of discrimination while reducing the theoretical complexity from O(N^2 C) to O(N n C) with n << N. VCA first distils each head’s dense query field into a handful of spatially pooled visual-contrast tokens, then splits them into a learnable positive and negative stream whose differential interaction highlights what truly separates one region from another. The module adds fewer than 0.3M parameters to a DeiT-Tiny backbone, requires no extra FLOPs, and is wholly architecture-agnostic. Empirically, VCA lifts DeiT-Tiny top-1 accuracy on ImageNet-1K from 72.2% to 75.6% (+3.4) and improves three strong hierarchical ViTs by up to 3.1%, while in class-conditional ImageNet generation it lowers FID-50K by 2.1 to 5.2 points across both diffusion (DiT) and flow (SiT) models. Extensive ablations confirm that (i) spatial pooling supplies low-variance global cues, (ii) dual positional embeddings are indispensable for contrastive reasoning, and (iii) combining the two in both stages yields the strongest synergy. VCA therefore offers a simple path towards faster and sharper Vision Transformers. The source code is available at https://github.com/LeapLabTHU/LinearDiff.
Paper and project links
PDF NeurIPS 2025
Summary
Vision Transformers (ViTs) have become a universal backbone for image recognition and generation, but their Multi-Head Self-Attention (MHSA) layer performs a quadratic query-key interaction for every token pair, spending much of its computation on visually weak or redundant correlations. Visual-Contrast Attention (VCA) is proposed as a drop-in replacement for MHSA: it distils each head's dense query field into a handful of spatially pooled visual-contrast tokens, splits them into a learnable positive and a negative stream, and lets their differential interaction highlight what truly separates one region from another, reducing the theoretical complexity from O(N^2 C) to O(N n C) with n << N. The module adds fewer than 0.3M parameters to a DeiT-Tiny backbone, requires no extra FLOPs, and is architecture-agnostic. VCA lifts DeiT-Tiny top-1 accuracy on ImageNet-1K from 72.2% to 75.6%, improves three strong hierarchical ViTs by up to 3.1%, and lowers FID-50K by 2.1 to 5.2 points in class-conditional ImageNet generation with both diffusion (DiT) and flow (SiT) models. Ablations confirm that (i) spatial pooling supplies low-variance global cues, (ii) dual positional embeddings are indispensable for contrastive reasoning, and (iii) combining the two in both stages yields the strongest synergy. The source code is available at https://github.com/LeapLabTHU/LinearDiff.
Key Takeaways
- Vision Transformers (ViTs) have become a universal architecture for image recognition and generation.
- The Multi-Head Self-Attention (MHSA) layer still leaves room for optimization, particularly the computation spent on visually weak or redundant correlations.
- Visual-Contrast Attention (VCA) is introduced to improve on MHSA while reducing its computational complexity.
- VCA distils the query field into pooled visual-contrast tokens and uses positive and negative streams to sharpen contrasts between regions (see the sketch below).
- With only a small increase in parameters, VCA raises the accuracy of DeiT-Tiny on ImageNet and also improves other ViT models.
- VCA also improves performance in image generation tasks, notably in FID scores.
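The mechanism sketched in the abstract, pooling the dense token field into n << N visual-contrast tokens, splitting them into a positive and a negative stream, and letting their differential interaction with the N queries replace dense N x N attention, can be illustrated with a simplified single-head PyTorch module like the one below. This is a hypothetical sketch in the spirit of the description, not the released LinearDiff implementation; the pooling size, the way the two streams are parameterised, and the omission of the dual positional embeddings are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualContrastAttentionSketch(nn.Module):
    """Simplified single-head sketch of a visual-contrast attention idea:
    the N queries attend to n << N pooled tokens split into a positive and
    a negative stream, and the two attention outputs are differenced,
    giving O(N n C) cost instead of O(N^2 C)."""

    def __init__(self, dim: int, pool_hw: int = 4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv_pos = nn.Linear(dim, 2 * dim)       # positive-stream K, V
        self.kv_neg = nn.Linear(dim, 2 * dim)       # negative-stream K, V
        self.pool = nn.AdaptiveAvgPool2d(pool_hw)   # N tokens -> n = pool_hw^2
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, hw: tuple) -> torch.Tensor:
        # x: (B, N, C) tokens on an H x W grid with H * W == N
        B, N, C = x.shape
        H, W = hw
        pooled = self.pool(x.transpose(1, 2).reshape(B, C, H, W))
        pooled = pooled.flatten(2).transpose(1, 2)               # (B, n, C)
        q = self.q(x) * self.scale                               # (B, N, C)
        k_pos, v_pos = self.kv_pos(pooled).chunk(2, dim=-1)
        k_neg, v_neg = self.kv_neg(pooled).chunk(2, dim=-1)
        attn_pos = F.softmax(q @ k_pos.transpose(1, 2), dim=-1)  # (B, N, n)
        attn_neg = F.softmax(q @ k_neg.transpose(1, 2), dim=-1)
        out = attn_pos @ v_pos - attn_neg @ v_neg                # differential interaction
        return self.proj(out)

# Usage on a 14 x 14 token grid from a DeiT-Tiny-like backbone (C = 192):
# vca = VisualContrastAttentionSketch(dim=192)
# y = vca(torch.randn(2, 196, 192), hw=(14, 14))                # -> (2, 196, 192)
```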
Click here to view paper screenshots