Interactive


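⚠️ All summaries below are generated by a large language model and may contain errors. Treat them as reference only and use them with caution.
🔴 Note: never use them in serious academic settings. They are only meant for preliminary screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace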

Updated 2025-02-21

FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems

Authors: Borui Liao, Yulong Xu, Jiao Ou, Kaiyuan Yang, Weihua Jian, Pengfei Wan, Di Zhang

Full-Duplex Speech Dialogue Systems (Full-Duplex SDS) have significantly enhanced the naturalness of human-machine interaction by enabling real-time bidirectional communication. However, existing approaches face challenges such as difficulties in independent module optimization and contextual noise interference due to highly coupled architectural designs and oversimplified binary state modeling. This paper proposes FlexDuo, a flexible full-duplex control module that decouples duplex control from spoken dialogue systems through a plug-and-play architectural design. Furthermore, inspired by human information-filtering mechanisms in conversations, we introduce an explicit Idle state. On one hand, the Idle state filters redundant noise and irrelevant audio to enhance dialogue quality. On the other hand, it establishes a semantic integrity-based buffering mechanism, reducing the risk of mutual interruptions while ensuring accurate response transitions. Experimental results on the Fisher corpus demonstrate that FlexDuo reduces the false interruption rate by 24.9% and improves response accuracy by 7.6% compared to integrated full-duplex dialogue system baselines. It also outperforms voice activity detection (VAD) controlled baseline systems in both Chinese and English dialogue quality. The proposed modular architecture and state-based dialogue model provide a novel technical pathway for building flexible and efficient duplex dialogue systems.
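The abstract contrasts binary speak/listen state modeling with a model that adds an explicit Idle state. The sketch below illustrates how such a three-state controller might behave. It is not the authors' implementation: the state names (`IDLE`, `LISTEN`, `SPEAK`), the per-chunk labels in `ChunkLabel`, and the transition rules are all assumptions made for illustration. It assumes some upstream model already labels each audio chunk, that only a semantically complete user turn triggers a response, and that noise or backchannels neither interrupt the system nor end the user's turn.

```python
from enum import Enum, auto
from typing import List


class DuplexState(Enum):
    """Hypothetical three-state model: binary speak/listen plus an explicit Idle state."""
    IDLE = auto()    # audio is not addressed to the system and is kept out of the dialogue context
    LISTEN = auto()  # the user is talking to the system; audio is buffered for the dialogue system
    SPEAK = auto()   # the system is producing its response


class ChunkLabel(Enum):
    """Hypothetical per-chunk labels from some upstream classifier (an assumption)."""
    NOISE = auto()          # silence, background noise, or speech not directed at the system
    BACKCHANNEL = auto()    # short feedback such as "uh-huh" that should not take the floor
    USER_SPEECH = auto()    # user speech whose meaning is not yet complete
    USER_TURN_END = auto()  # the chunk that makes the user's turn semantically complete


def next_state(state: DuplexState, label: ChunkLabel,
               response_finished: bool = False) -> DuplexState:
    """Return the controller state after one audio chunk."""
    if label is ChunkLabel.USER_TURN_END:
        return DuplexState.SPEAK   # respond only once the turn is semantically complete
    if label is ChunkLabel.USER_SPEECH:
        return DuplexState.LISTEN  # start or continue listening; from SPEAK this is a barge-in
    # NOISE and BACKCHANNEL neither interrupt the system nor end the user's turn.
    if state is DuplexState.LISTEN:
        return DuplexState.LISTEN  # a mid-turn pause is buffered instead of answered
    if state is DuplexState.SPEAK and not response_finished:
        return DuplexState.SPEAK
    return DuplexState.IDLE


def forward_to_dialogue(new_state: DuplexState, label: ChunkLabel) -> bool:
    """Idle filtering: only audio that belongs to the user's turn reaches the dialogue context."""
    return new_state is DuplexState.LISTEN or label is ChunkLabel.USER_TURN_END


def trace(labels: List[ChunkLabel], start: DuplexState = DuplexState.IDLE) -> List[DuplexState]:
    """Run next_state over a chunk sequence, assuming any response is still playing."""
    states = []
    state = start
    for label in labels:
        state = next_state(state, label)
        states.append(state)
    return states
```

For example, the chunk sequence noise, user speech, pause (noise), user speech, turn end, backchannel yields IDLE, LISTEN, LISTEN, LISTEN, SPEAK, SPEAK: the mid-turn pause is buffered rather than answered, and the backchannel does not cut the response off.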


Paper and project links

PDF

Summary

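FlexDuo is a flexible full-duplex control module that addresses the challenges facing existing full-duplex dialogue systems through a plug-and-play architecture. FlexDuo introduces an Idle state that filters out redundant noise and irrelevant audio, improving dialogue quality. It also uses a semantic-integrity-based buffering mechanism to ensure accurate response transitions and reduce the risk of mutual interruptions. Experiments on the Fisher corpus show that FlexDuo outperforms integrated full-duplex dialogue system baselines (24.9% lower false interruption rate, 7.6% higher response accuracy), and it outperforms voice activity detection (VAD) controlled baseline systems in both Chinese and English dialogue quality. This modular, state-driven dialogue model offers a new technical pathway for building flexible and efficient full-duplex dialogue systems.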
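To make the plug-and-play idea concrete, here is a minimal sketch, assuming a controller that emits one of three hypothetical actions per audio chunk (`"drop"`, `"forward"`, `"respond"`) and a dialogue backend exposing two hypothetical hooks (`append_user_audio`, `start_response`). Neither interface comes from the paper; the point is only that the duplex decision lives outside the dialogue system, so either side can be replaced or optimized without touching the other. `ListBackend` and `toy_controller` are toy stand-ins for illustration.

```python
from typing import Callable, List, Protocol


class DialogueBackend(Protocol):
    """Hypothetical hooks an existing (half-duplex) dialogue system would need to expose."""

    def append_user_audio(self, chunk: bytes) -> None: ...

    def start_response(self) -> None: ...


# Hypothetical controller signature: audio chunk -> "drop" | "forward" | "respond".
Controller = Callable[[bytes], str]


class PluggableDuplex:
    """Keeps duplex control outside the dialogue system; either side can be swapped independently."""

    def __init__(self, controller: Controller, backend: DialogueBackend) -> None:
        self.controller = controller
        self.backend = backend

    def on_audio(self, chunk: bytes) -> str:
        action = self.controller(chunk)
        if action in ("forward", "respond"):
            self.backend.append_user_audio(chunk)  # only turn-relevant audio enters the context
        if action == "respond":
            self.backend.start_response()
        # "drop": noise or irrelevant audio never reaches the backend
        return action


class ListBackend:
    """Toy stand-in backend that records what it receives."""

    def __init__(self) -> None:
        self.context: List[bytes] = []
        self.responses = 0

    def append_user_audio(self, chunk: bytes) -> None:
        self.context.append(chunk)

    def start_response(self) -> None:
        self.responses += 1


def toy_controller(chunk: bytes) -> str:
    """Toy rule standing in for a learned controller: drop tagged noise, respond at a full stop."""
    if chunk.startswith(b"<noise>"):
        return "drop"
    return "respond" if chunk.endswith(b".") else "forward"
```

Because `PluggableDuplex` depends only on the `DialogueBackend` protocol, the toy controller could be swapped for a trained model, or the backend for a different dialogue system, without changing the wrapper.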

Key Takeaways

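  1. FlexDuo is a flexible full-duplex control module designed to make human-machine interaction more natural.
  2. It targets challenges faced by existing full-duplex dialogue systems, such as the difficulty of optimizing modules independently and contextual noise interference.
  3. FlexDuo introduces an Idle state that filters out redundant noise and irrelevant audio, improving dialogue quality.
  4. The Idle state also ensures accurate response transitions through a semantic-integrity-based buffering mechanism.
  5. Experiments on the Fisher corpus show that FlexDuo outperforms integrated full-duplex baselines, reducing the false interruption rate by 24.9% and improving response accuracy by 7.6%.
  6. FlexDuo outperforms voice activity detection (VAD) based control systems in both Chinese and English dialogue quality.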
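For contrast, a VAD-controlled system typically decides turns from acoustic activity alone. The sketch below is a generic silence-timeout scheme, not the specific VAD baseline used in the paper; the class name `VadTurnTaker`, the energy threshold, and the three-frame timeout are hypothetical. It shows the two failure modes that an explicit Idle state and semantic buffering are meant to address: any loud frame (noise or an "uh-huh") interrupts the system, and a short pause ends the user's turn even when the sentence is unfinished.

```python
from dataclasses import dataclass, field


@dataclass
class VadTurnTaker:
    """Generic VAD-style turn-taking with silence-timeout endpointing (illustrative only)."""
    energy_threshold: float = 0.01  # hypothetical: frames at or above this energy count as speech
    silence_frames_to_end: int = 3  # hypothetical: silent frames that end the user's turn
    system_speaking: bool = False
    _silent: int = field(default=0, init=False, repr=False)
    _heard_speech: bool = field(default=False, init=False, repr=False)

    def step(self, frame_energy: float) -> str:
        """Consume one frame's energy and return the action taken."""
        is_speech = frame_energy >= self.energy_threshold
        if is_speech:
            self._heard_speech, self._silent = True, 0
            if self.system_speaking:
                self.system_speaking = False  # any loud frame barges in, even noise or "uh-huh"
                return "interrupt"
            return "listen"
        if self.system_speaking:
            return "speaking"
        if not self._heard_speech:
            return "wait"
        self._silent += 1
        if self._silent < self.silence_frames_to_end:
            return "listen"
        # Silence alone ends the turn, even if the user's sentence is unfinished.
        self._heard_speech, self._silent = False, 0
        self.system_speaking = True
        return "respond"
```

With the default values, the frame energies 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.2 produce wait, listen, listen, listen, respond, speaking, interrupt: the final 0.2 frame interrupts the response whether it was a real request or background noise.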

Cool Papers


Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting!