TTS

发布日期: 2025-08-21

更新日期: 2025-09-08

文章字数: 3.2k

阅读时长: 12 分

阅读次数:

⚠️ 以下所有内容总结都来自于大语言模型的能力，如有错误，仅供参考，谨慎使用
🔴 请注意：千万不要用于严肃的学术场景，只能用于论文阅读前的初筛！
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ，还请您给我们一些鼓励！⭐️ HuggingFace免费体验

2025-08-21 更新

DiffIER: Optimizing Diffusion Models with Iterative Error Reduction

Authors:Ao Chen, Lihe Ding, Tianfan Xue

Diffusion models have demonstrated remarkable capabilities in generating high-quality samples and enhancing performance across diverse domains through Classifier-Free Guidance (CFG). However, the quality of generated samples is highly sensitive to the selection of the guidance weight. In this work, we identify a critical ``training-inference gap’’ and we argue that it is the presence of this gap that undermines the performance of conditional generation and renders outputs highly sensitive to the guidance weight. We quantify this gap by measuring the accumulated error during the inference stage and establish a correlation between the selection of guidance weight and minimizing this gap. Furthermore, to mitigate this gap, we propose DiffIER, an optimization-based method for high-quality generation. We demonstrate that the accumulated error can be effectively reduced by an iterative error minimization at each step during inference. By introducing this novel plug-and-play optimization framework, we enable the optimization of errors at every single inference step and enhance generation quality. Empirical results demonstrate that our proposed method outperforms baseline approaches in conditional generation tasks. Furthermore, the method achieves consistent success in text-to-image generation, image super-resolution, and text-to-speech generation, underscoring its versatility and potential for broad applications in future research.

扩散模型已经显示出在生成高质量样本和提高跨不同领域的性能方面的显著能力，这主要是通过无分类器引导（CFG）实现的。然而，生成样本的质量对引导权重的选择非常敏感。在这项工作中，我们识别出了一个关键的“训练-推理差距”，我们认为正是这个差距的存在破坏了条件生成的表现，并使输出高度依赖于引导权重。我们通过测量推理阶段的累积误差来量化这个差距，并建立了引导权重的选择与最小化这个差距之间的相关性。此外，为了缓解这种差距，我们提出了DiffIER，这是一种基于优化的高质量生成方法。我们证明，通过推理过程中每一步的迭代误差最小化，可以有效地减少累积误差。通过引入这种新颖的即插即用优化框架，我们能够在每一个单独的推理步骤中优化误差，提高生成质量。经验结果表明，我们所提出的方法在条件生成任务上优于基准方法。此外，该方法在文本到图像生成、图像超分辨率和文本到语音生成方面取得了持续的成功，这突显了其在未来研究中的通用性和潜力。

论文及项目相关链接

PDF

摘要

扩散模型通过无分类器引导（CFG）展现了生成高质量样本和增强跨域性能的显著能力。然而，生成样本的质量对引导权重的选择非常敏感。在这项工作中，我们识别出了一个关键的“训练-推理差距”，我们认为正是这个差距影响了条件生成的表现，并使输出对引导权重高度敏感。我们通过测量推理阶段的累积误差来量化这一差距，并建立引导权重选择与缩小这一差距之间的关联。此外，为了缓解这一差距，我们提出了DiffIER，一种基于优化的高质量生成方法。我们证明通过每一步推理时的迭代误差最小化，可以有效地减少累积误差。通过引入这种新颖的即插即用优化框架，我们能够在每一步推理中优化误差，提高生成质量。经验结果表明，我们的方法在条件生成任务上优于基准方法。此外，该方法在文本到图像生成、图像超分辨率和文本到语音生成方面取得了持续的成功，证明了其通用性和在未来研究中广泛应用的潜力。

关键见解

扩散模型在生成高质量样本和跨域性能增强方面表现出显著能力，但通过无分类器引导（CFG）实现。
生成样本的质量对引导权重的选择非常敏感。
存在一个关键的“训练-推理差距”，影响条件生成的表现。
通过测量推理阶段的累积误差来量化这一差距。
提出DiffIER方法，通过迭代误差最小化在每一步推理中优化误差，提高生成质量。
DiffIER方法在条件生成任务上优于基准方法。
DiffIER方法在文本到图像生成、图像超分辨率和文本到语音生成方面取得了显著成功，显示出其广泛的应用潜力。

Cool Papers

点此查看论文截图

MrMARTIAN: A Multi-resolution Mass Reconstruction Algorithm Combining Free-form and Analytic Components

Authors:Sangjun Cha, M. James Jee

We present ${\tt MrMARTIAN}$ (Multi-resolution MAximum-entropy Reconstruction Technique Integrating Analytic Node), a new hybrid strong lensing (SL) modeling algorithm. By incorporating physically motivated analytic nodes into the free-form method ${\tt MARS}$, ${\tt MrMARTIAN}$ enables stable and flexible mass reconstructions while mitigating oversmoothing in the inner mass profile. Its multi-resolution framework increases the degrees of freedom in regions with denser strong lensing constraints, thereby enhancing computational efficiency for a fixed number of free parameters. We evaluate the performance of ${\tt MrMARTIAN}$ using publicly available simulated SL data and find that it consistently outperforms ${\tt MARS}$ in recovering both mass and magnification. In particular, it delivers significantly more stable reconstructions when multiple images are sparsely distributed. Finally, we apply ${\tt MrMARTIAN}$ to the galaxy cluster MACS J0416.1–2403, incorporating two analytic nodes centered on the northeastern and southwestern BCGs. Our mass model, constrained by 412 multiple images, achieves an image-plane rms scatter of ~0”.11, the smallest to date for this dataset.

我们提出了MrMARTIAN（多分辨率最大熵重建技术集成解析节点），这是一种新的混合强透镜（SL）建模算法。通过将物理驱动的解析节点融入到自由形态方法MARS中，MrMARTIAN能够实现稳定和灵活的质量重建，同时缓解内质量分布的过平滑问题。其多分辨率框架在强透镜约束更密集的区域增加了自由度，从而在固定数量的自由参数下提高了计算效率。我们使用公开发布的模拟SL数据评估了MrMARTIAN的性能，发现它在恢复质量和放大率方面一直优于MARS。尤其当多个图像稀疏分布时，它提供了更稳定的重建结果。最后，我们对星系团MACS J0416.1–2403应用了MrMARTIAN，以东北部和西南部BCG为中心，融入了两个解析节点。我们的质量模型受412个多重图像的约束，实现了图像平面rms散射约为0”.11，这是迄今为止该数据集的最小值。

论文及项目相关链接

PDF 16 pages, 10 figures, submitted to ApJ

Summary
中国研究者提出了一种新的混合强透镜（SL）建模算法，名为${\tt MrMARTIAN}$。该算法将基于物理原理的分析节点纳入自由形式方法${\tt MARS}$中，可实现稳定和灵活的质量重建，同时缓解内部质量分布的过平滑问题。其多分辨率框架提高了具有密集强透镜约束区域的自由度，使用固定数量的自由参数增强了计算效率。研究团队用模拟的SL数据评估了${\tt MrMARTIAN}$的性能，发现其在恢复质量和放大倍数方面均优于${\tt MARS}$。特别是在多个图像稀疏分布的情况下，它提供了更稳定的重建结果。最后，研究团队将其应用于MACS J0416.1–2403星系团，以东北部和西南部BCGs为中心设置两个分析节点，通过约束的412个多重图像得到的图像平面rms散射约为0”.11，这是迄今为止该数据集的最小值。

Key Takeaways

${\tt MrMARTIAN}$是一种新的强透镜建模算法，结合了自由形式方法${\tt MARS}$和分析节点技术。
该算法通过引入分析节点实现了稳定和灵活的质量重建，解决了内部质量分布过平滑的问题。
${\tt MrMARTIAN}$的多分辨率框架提高了计算效率，同时增加了在密集约束区域的自由度。
对比模拟数据测试，${\tt MrMARTIAN}$在恢复质量和放大倍数方面表现出优于${\tt MARS}$的性能。
在处理多个图像稀疏分布的情况下，${\tt MrMARTIAN}$提供了更稳定的重建结果。
${\tt MrMARTIAN}$成功应用于MACS J0416.1–2403星系团，通过约束的412个多重图像得到较小的图像平面rms散射值。
该研究展示了${\tt MrMARTIAN}$算法在实际数据集上的有效性和优越性。

Cool Papers

点此查看论文截图

Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning

Authors:Can Jin, Hongwu Peng, Qixin Zhang, Yujin Tang, Dimitris N. Metaxas, Tong Che

Multi-agent systems (MAS) built on large language models (LLMs) offer a promising path toward solving complex, real-world tasks that single-agent systems often struggle to manage. While recent advancements in test-time scaling (TTS) have significantly improved single-agent performance on challenging reasoning tasks, how to effectively scale collaboration and reasoning in MAS remains an open question. In this work, we introduce an adaptive multi-agent framework designed to enhance collaborative reasoning through both model-level training and system-level coordination. We construct M500, a high-quality dataset containing 500 multi-agent collaborative reasoning traces, and fine-tune Qwen2.5-32B-Instruct on this dataset to produce M1-32B, a model optimized for multi-agent collaboration. To further enable adaptive reasoning, we propose a novel CEO agent that dynamically manages the discussion process, guiding agent collaboration and adjusting reasoning depth for more effective problem-solving. Evaluated in an open-source MAS across a range of tasks-including general understanding, mathematical reasoning, and coding-our system significantly outperforms strong baselines. For instance, M1-32B achieves 12% improvement on GPQA-Diamond, 41% on AIME2024, and 10% on MBPP-Sanitized, matching the performance of state-of-the-art models like DeepSeek-R1 on some tasks. These results highlight the importance of both learned collaboration and adaptive coordination in scaling multi-agent reasoning. Code is available at https://github.com/jincan333/MAS-TTS

基于大型语言模型（LLM）的多智能体系统（MAS）为解决复杂的现实世界任务提供了一条有前景的道路，这些任务往往是单一智能体系统难以应对的。虽然最近在测试时缩放（TTS）方面的进展大大提高了单一智能体在具有挑战性的推理任务上的性能，但如何在MAS中有效地扩展协作和推理仍然是一个悬而未决的问题。

在这项工作中，我们引入了一个自适应多智能体框架，旨在通过模型级别的训练和系统级别的协调来增强协作推理。我们构建了M500，这是一个包含500个多智能体协作推理轨迹的高质量数据集，并在此基础上对Qwen2.5-32B-Instruct进行了微调，产生了M1-32B，一个针对多智能体协作优化的模型。

论文及项目相关链接

PDF

Summary

该文介绍了一种基于大型语言模型的多智能体系统框架，旨在通过模型级训练和系统级协调来增强协作推理能力。文章提出了M500数据集，用于构建适用于多智能体协作的模型M1-32B。此外，引入了一种动态管理讨论过程的首席执行官智能体，以实现自适应推理。评估结果表明，该系统在多种任务上的性能显著优于强基线，如GPQA-Diamond任务上提高12%，AIME2024任务上提高41%，MBPP-Sanitized任务上提高10%。这凸显了学习协作和自适应协调在扩展多智能体推理中的重要性。

Key Takeaways

多智能体系统（MAS）结合大型语言模型（LLM）为解决复杂现实世界任务提供了有效途径，尤其在单智能体系统难以应对的场合中表现突出。
通过模型级训练和系统级协调，增强了智能体之间的协作推理能力。
引入M500数据集用于训练适用于多智能体协作的模型M1-32B。
首席执行官智能体的概念被提出，用于动态管理讨论过程并调整推理深度，从而实现自适应推理。
系统在多种任务上的性能评估超过了强基线，包括通用理解、数学推理和编码任务。
与最新技术相比，如DeepSeek-R1，该系统在某些任务上的性能相匹配。
代码已公开，可供进一步研究使用。

Cool Papers

点此查看论文截图

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-08-21/TTS/

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 Kedreamix !

TTS

Interactive

Interactive 方向最新论文已更新，请持续关注 Update in 2025-08-21 EEG Blink Artifacts Can Identify Read Music in Listening and Imagery

2025-08-21 Interactive

Interactive

医学图像

医学图像方向最新论文已更新，请持续关注 Update in 2025-08-21 ASDFormer A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery

2025-08-21 医学图像

医学图像