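⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on these summaries in serious academic settings; they are suitable only for initial screening before reading a paper.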
Updated 2025-01-06
Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement
Authors:Z. Zhang, B. Liu, J. Bao, L. Chen, S. Zhu, J. Yu
Recent text-to-image generation favors various forms of spatial conditions, e.g., masks, bounding boxes, and key points. However, the majority of the prior art requires form-specific annotations to fine-tune the original model, leading to poor test-time generalizability. Meanwhile, existing training-free methods work well only with simplified prompts and spatial conditions. In this work, we propose a novel yet generic test-time controllable generation method that targets natural text prompts and complex conditions. Specifically, we decouple spatial conditions into semantic and geometric conditions and then enforce their consistency individually during the image-generation process. For the former, we aim to bridge the gap between the semantic condition and the text prompt, as well as the gap between that condition and the attention maps of diffusion models. To achieve this, we propose to first complete the prompt w.r.t. the semantic condition, and then remove the negative impact of distracting prompt words by measuring their statistics in attention maps as well as their distances in word space w.r.t. this condition. To further cope with complex geometric conditions, we introduce a geometric transform module, in which Regions of Interest (RoIs) are identified in attention maps and then used to translate category-wise latents w.r.t. the geometric condition. More importantly, we propose a diffusion-based latents-refill method to explicitly remove the impact of latents at the RoI, reducing artifacts in the generated images. Experiments on the COCO-Stuff dataset show a 30% relative boost over SOTA training-free methods on layout-consistency evaluation metrics.
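To make the distracting-word step more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the attention statistics or any thresholds, so the mean-attention score, the cosine distance, both thresholds, and every name used here (`find_distracting_words`, `attn_maps`, `embeddings`) are assumptions chosen for illustration, and the data is a toy example.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_attention_in_mask(attn, mask):
    """Average attention of one token over the cells selected by a binary mask."""
    total, count = 0.0, 0
    for attn_row, mask_row in zip(attn, mask):
        for a, m in zip(attn_row, mask_row):
            if m:
                total += a
                count += 1
    return total / count if count else 0.0

def find_distracting_words(words, attn_maps, embeddings, target_word, mask,
                           attn_thresh=0.5, dist_thresh=0.5):
    """Assumed rule: a word is distracting if it attends strongly inside the
    target region but is far from the target class in word space."""
    distracting = []
    for word in words:
        if word == target_word:
            continue
        score = mean_attention_in_mask(attn_maps[word], mask)
        dist = cosine_distance(embeddings[word], embeddings[target_word])
        if score > attn_thresh and dist > dist_thresh:
            distracting.append(word)
    return distracting

# Toy data (hypothetical): a 2x2 attention map per word; the mask marks the "dog" region.
mask = [[1, 1], [0, 0]]
attn_maps = {
    "dog":   [[0.9, 0.8], [0.1, 0.1]],
    "grass": [[0.1, 0.2], [0.9, 0.8]],
    "car":   [[0.7, 0.8], [0.1, 0.0]],
}
embeddings = {"dog": [1.0, 0.0], "grass": [0.6, 0.8], "car": [0.0, 1.0]}

flagged = find_distracting_words(["dog", "grass", "car"], attn_maps, embeddings, "dog", mask)
print(flagged)
```

In this toy setup only "car" is flagged, because it attends inside the dog region while being far from "dog" in the embedding space. A real pipeline would read the cross-attention maps from the diffusion model and the embeddings from its text encoder.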
Key Takeaways
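- Proposes a new test-time controllable generation method that handles natural text prompts and complex conditions.
- Decouples spatial conditions into semantic and geometric conditions, making image generation more flexible.
- For the semantic condition, completes the prompt and then suppresses distracting prompt words by measuring their attention-map statistics and word-space distances.
- Introduces a geometric transform module for complex geometric conditions: it identifies regions of interest and translates the corresponding category-wise latents.
- Proposes a diffusion-based latents-refill method that reduces artifacts in generated images.
- Experiments on the COCO-Stuff dataset show a 30% relative improvement over SOTA training-free methods on layout-consistency metrics.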
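The geometric transform and refill steps can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the paper's method: the RoI is taken as the bounding box of thresholded attention, the latents are a plain 2D grid, and the vacated cells are refilled with Gaussian noise as a stand-in for the diffusion-based refill. The names `roi_from_attention` and `translate_latents` are hypothetical.

```python
import random

def roi_from_attention(attn, threshold):
    """Bounding box (top, left, bottom, right) of cells whose attention
    exceeds threshold; bottom and right are exclusive. None if empty."""
    rows = [r for r, row in enumerate(attn) if any(a > threshold for a in row)]
    cols = [c for c in range(len(attn[0])) if any(row[c] > threshold for row in attn)]
    if not rows or not cols:
        return None
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1

def translate_latents(latents, roi, target_top_left, rng):
    """Move the latent patch inside roi to target_top_left.
    Assumes the patch fits inside the grid at the target position."""
    top, left, bottom, right = roi
    patch = [row[left:right] for row in latents[top:bottom]]
    out = [row[:] for row in latents]  # copy so the input stays untouched
    # Stand-in for the refill step: overwrite the vacated RoI with fresh noise.
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = rng.gauss(0.0, 1.0)
    # Paste the patch at the location required by the geometric condition.
    t_top, t_left = target_top_left
    for dr, patch_row in enumerate(patch):
        for dc, value in enumerate(patch_row):
            out[t_top + dr][t_left + dc] = value
    return out

# Toy 4x4 example (hypothetical data): the object attends to the top-left 2x2 block.
attn = [[0.9 if r < 2 and c < 2 else 0.1 for c in range(4)] for r in range(4)]
latents = [[float(10 * r + c) for c in range(4)] for r in range(4)]

roi = roi_from_attention(attn, threshold=0.5)
moved = translate_latents(latents, roi, target_top_left=(2, 2), rng=random.Random(0))
```

Pasting after the refill means the moved patch survives even if the source and target boxes overlap. In an actual diffusion pipeline, the refilled region would be denoised again rather than left as raw noise.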
Conditional Consistency Guided Image Translation and Enhancement
Authors:A. V. Subramanyam, Amil Bhagat, Milind Jain
Consistency models have emerged as a promising alternative to diffusion models, offering high-quality generative capabilities through single-step sample generation. However, their application to multi-domain image translation tasks, such as cross-modal translation and low-light image enhancement, remains largely unexplored. In this paper, we introduce Conditional Consistency Models (CCMs) for multi-domain image translation by incorporating additional conditional inputs. We implement these modifications by introducing task-specific conditional inputs that guide the denoising process, ensuring that the generated outputs retain structural and contextual information from the corresponding input domain. We evaluate CCMs on 10 different datasets, demonstrating their effectiveness in producing high-quality translated images across multiple domains. Code is available at https://github.com/amilbhagat/Conditional-Consistency-Models.
PDF 6 pages, 5 figures, 4 tables, ICME conference 2025
Key Takeaways
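- Conditional Consistency Models (CCMs) are proposed for multi-domain image translation, achieving high-quality generation by incorporating additional conditional inputs.
- CCMs support single-step sample generation, which improves generation efficiency.
- CCMs retain structural and contextual information, keeping generated images consistent with the input domain.
- CCMs are validated on multiple datasets, showing broad application prospects.
- Applications include cross-modal translation and low-light image enhancement.
- The code is publicly available, making further research and experiments easier.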