Published: 2025-11-17

Updated: 2025-11-27

Word count: 3.8k

Reading time: 15 min

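⚠️ All summaries below were generated by a large language model. They may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not use them in serious academic settings. They are intended only for initial screening before reading a paper.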

2025-11-17 update

Normality and the Turing Test

Authors: Alexandre Kabbach

This paper proposes to revisit the Turing test through the concept of normality. Its core argument is that the Turing test is a test of normal intelligence as assessed by a normal judge. First, in the sense that the Turing test targets normal/average rather than exceptional human intelligence, so that successfully passing the test requires machines to “make mistakes” and display imperfect behavior just like normal/average humans. Second, in the sense that the Turing test is a statistical test where judgments of intelligence are never carried out by a single “average” judge (understood as non-expert) but always by a full jury. As such, the notion of “average human interrogator” that Turing talks about in his original paper should be understood primarily as referring to a mathematical abstraction made of the normalized aggregate of individual judgments of multiple judges. Its conclusions are twofold. First, it argues that large language models such as ChatGPT are unlikely to pass the Turing test as those models precisely target exceptional rather than normal/average human intelligence. As such, they constitute models of what it proposes to call artificial smartness rather than artificial intelligence, insofar as they deviate from the original goal of Turing for the modeling of artificial minds. Second, it argues that the objectivization of normal human behavior in the Turing test fails due to the game configuration of the test which ends up objectivizing normative ideals of normal behavior rather than normal behavior per se.

Paper and project links

PDF

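Summary

This paper proposes rethinking the Turing test through the concept of normality. It argues that the Turing test is a test of normal intelligence, so passing it requires a machine to make mistakes and behave imperfectly, as a normal person would. The Turing test is also a statistical test: the judgment of intelligence is made not by a single ordinary judge but by a full jury. The "average human interrogator" in Turing's paper should therefore be understood as an abstraction, the normalized aggregate of multiple judges' judgments. The paper argues that large language models such as ChatGPT are unlikely to pass the Turing test because they target exceptional rather than normal human intelligence; they are better described as models of artificial smartness than of artificial intelligence. Finally, the objectivization of normal human behavior in the Turing test fails because of the test's game configuration, which ends up objectivizing normative ideals of normal behavior rather than normal behavior itself.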

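Key Takeaways

The Turing test is a test of normal intelligence; a machine must display imperfect behavior to imitate a normal person.
The Turing test is a statistical test based on the aggregated judgments of multiple judges.
Large language models such as ChatGPT target exceptional rather than normal human intelligence, and so exhibit artificial smartness rather than artificial intelligence.
The objectivization of normal behavior in the Turing test breaks down because of the test's game configuration.
What the test ends up objectivizing is normative ideals of behavior rather than actual normal behavior.
The Turing test should be revisited and reinterpreted through the concept of normality.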
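
The reading of the "average interrogator" as a normalized aggregate of many judges can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 0/1 encoding of votes, and the pass rule. The 0.7 default echoes the 70 percent figure from Turing's 1950 paper and is not a value taken from the paper summarized above.

```python
from statistics import mean

def identification_rate(votes_by_judge):
    """Average, over judges, of each judge's own rate of correct identifications.

    votes_by_judge: one list per judge; 1 = the judge correctly spotted the
    machine in that conversation, 0 = the judge was fooled.
    """
    per_judge = [mean(votes) for votes in votes_by_judge]  # normalize per judge
    return mean(per_judge)  # aggregate across the jury

def machine_passes(rate, threshold=0.7):
    # Assumed pass rule: the jury-level identification rate stays at or below
    # the threshold. The default is illustrative, not a value from the paper.
    return rate <= threshold

jury = [
    [1, 1, 0, 1],  # judge A: 3 of 4 correct
    [0, 1, 0, 0],  # judge B: 1 of 4 correct
    [1, 0, 1, 1],  # judge C: 3 of 4 correct
]
rate = identification_rate(jury)
print(round(rate, 3), machine_passes(rate))
```

No single judge in this toy jury has the aggregate rate; the "average interrogator" exists only as the statistic computed over all of them.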

DIFFA: Large Language Diffusion Models Can Listen and Understand

Authors: Jiaming Zhou, Hongjie Chen, Shiwan Zhao, Jian Kang, Jie Li, Enzhi Wang, Yujie Guo, Haoqin Sun, Hui Wang, Aobo Kong, Yong Qin, Xuelong Li

Recent advances in large language models (LLMs) have shown remarkable capabilities across textual and multimodal domains. In parallel, diffusion-based language models have emerged as a promising alternative to the autoregressive paradigm, offering improved controllability, bidirectional context modeling, and robust generation. However, their application to the audio modality remains underexplored. In this work, we introduce DIFFA, the first diffusion-based large audio-language model designed to perform spoken language understanding. DIFFA integrates a frozen diffusion language model with a lightweight dual-adapter architecture that bridges speech understanding and natural language reasoning. We employ a two-stage training pipeline: first, aligning semantic representations via an ASR objective; then, learning instruction-following abilities through synthetic audio-caption pairs automatically generated by prompting LLMs. Despite being trained on only 960 hours of ASR and 127 hours of synthetic instruction data, DIFFA demonstrates competitive performance on major benchmarks, including MMSU, MMAU, and VoiceBench, outperforming several autoregressive open-source baselines. Our results reveal the potential of diffusion-based language models for efficient and scalable audio understanding, opening a new direction for speech-driven AI. Our code will be available at https://github.com/NKU-HLT/DIFFA.git.

Paper and project links

PDF Accepted by AAAI 2026

Summary

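DIFFA is the first diffusion-based large audio-language model, built for spoken language understanding. It combines a frozen diffusion language model with a lightweight dual-adapter architecture that bridges speech understanding and natural language reasoning, and it is trained with a two-stage pipeline. Trained on only 960 hours of ASR data and 127 hours of synthetic instruction data, DIFFA is competitive on major benchmarks including MMSU, MMAU, and VoiceBench, and outperforms several open-source autoregressive baselines. The results show the potential of diffusion-based language models for efficient, scalable audio understanding and open a new direction for speech-driven AI.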

Key Takeaways

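DIFFA is the first diffusion-based large audio-language model, designed for spoken language understanding.
DIFFA combines a frozen diffusion language model with a lightweight dual-adapter architecture.
A two-stage training pipeline first aligns semantic representations through an ASR objective, then teaches instruction following.
DIFFA is competitive on major benchmarks, including MMSU, MMAU, and VoiceBench.
DIFFA was trained on only 960 hours of ASR data and 127 hours of synthetic instruction data.
DIFFA outperforms several open-source autoregressive baselines.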
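
The general idea of training a lightweight adapter in front of a frozen model can be shown with a deliberately tiny sketch. This is not DIFFA's architecture or training code: the scalar "model", the single affine adapter, and all names are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import random

def frozen_model(h):
    # Stand-in for the frozen language model: fixed weights, never updated.
    return 2.0 * h + 1.0

def train_adapter(pairs, lr=0.01, steps=3000, seed=0):
    """Fit an affine adapter h = a * x + b placed in front of frozen_model."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0  # the only trainable parameters
    for _ in range(steps):
        x, y = rng.choice(pairs)
        err = frozen_model(a * x + b) - y
        grad_h = err * 2.0  # gradient passed back through the frozen model (slope 2.0)
        a -= lr * grad_h * x
        b -= lr * grad_h
    return a, b

# Toy targets produced by an ideal adapter h = 3x - 1 feeding the frozen model.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
pairs = [(x, frozen_model(3.0 * x - 1.0)) for x in xs]
a, b = train_adapter(pairs)
print(round(a, 3), round(b, 3))
```

Gradients flow through the frozen model to reach the adapter, but only `a` and `b` are ever updated, which is what keeps the trainable part small.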

MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction

Authors: Yunkee Chae, Kyogu Lee

We present MGE-LDM, a unified latent diffusion framework for simultaneous music generation, source imputation, and query-driven source separation. Unlike prior approaches constrained to fixed instrument classes, MGE-LDM learns a joint distribution over full mixtures, submixtures, and individual stems within a single compact latent diffusion model. At inference, MGE-LDM enables (1) complete mixture generation, (2) partial generation (i.e., source imputation), and (3) text-conditioned extraction of arbitrary sources. By formulating both separation and imputation as conditional inpainting tasks in the latent space, our approach supports flexible, class-agnostic manipulation of arbitrary instrument sources. Notably, MGE-LDM can be trained jointly across heterogeneous multi-track datasets (e.g., Slakh2100, MUSDB18, MoisesDB) without relying on predefined instrument categories. Audio samples are available at our project page: https://yoongi43.github.io/MGELDM_Samples/.

Paper and project links

PDF Accepted by NeurIPS 2025

Summary

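MGE-LDM is a unified latent diffusion framework for music generation, source imputation, and query-driven source separation. Unlike prior approaches constrained to fixed instrument classes, it learns a joint distribution over full mixtures, submixtures, and individual stems within a single compact latent diffusion model. At inference it supports complete mixture generation, partial generation (source imputation), and text-conditioned extraction of arbitrary sources. Because separation and imputation are both formulated as conditional inpainting tasks in the latent space, the approach allows flexible, class-agnostic manipulation of arbitrary instrument sources. MGE-LDM can be trained jointly across heterogeneous multi-track datasets (e.g., Slakh2100, MUSDB18, MoisesDB) without predefined instrument categories.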

Key Takeaways

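MGE-LDM is a unified latent diffusion framework for music generation, source imputation, and query-driven source separation.
Unlike prior approaches, MGE-LDM learns a joint distribution over full mixtures, submixtures, and individual stems within a single model.
At inference, MGE-LDM supports complete mixture generation, partial generation (source imputation), and text-conditioned extraction of arbitrary sources.
Separation and imputation are both cast as conditional inpainting in the latent space, which allows flexible, class-agnostic manipulation of instrument sources.
MGE-LDM can be trained jointly on multiple heterogeneous datasets without predefined instrument categories.
MGE-LDM handles multi-track audio drawn from different music datasets.
Audio samples are available on the project page for reference and evaluation.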
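
The framing of the three tasks as conditional inpainting over a joint latent can be sketched as follows. This is a rough illustration under stated assumptions, not MGE-LDM's implementation: the slot names, the mapping from task to given slots, the `toy_denoise` placeholder, and the simple clamping of known slots are all hypothetical, and text conditioning is left out entirely.

```python
import random

SLOTS = ("mixture", "submixture", "source")

# Assumed reading of the three tasks: which latent slots are given as
# conditioning. Every slot not listed is generated.
TASK_GIVEN = {
    "generation": set(),           # nothing given, generate all slots
    "imputation": {"submixture"},  # existing tracks given, add a new source
    "extraction": {"mixture"},     # full mix given, pull out a source
}

def toy_denoise(state):
    # Placeholder for the diffusion model's denoising step (hypothetical).
    return {slot: [0.5 * v for v in vec] for slot, vec in state.items()}

def inpaint(task, given, dim=4, steps=3, seed=0, denoise=toy_denoise):
    known = TASK_GIVEN[task]
    if set(given) != known:
        raise ValueError(f"{task} expects exactly these slots: {sorted(known)}")
    rng = random.Random(seed)
    # Known slots start from the provided latents, unknown slots from noise.
    state = {
        slot: list(given[slot]) if slot in known
        else [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for slot in SLOTS
    }
    for _ in range(steps):
        state = denoise(state)
        for slot in known:  # re-impose the conditioning after every step
            state[slot] = list(given[slot])
    return state

out = inpaint("extraction", {"mixture": [1.0, 2.0, 3.0, 4.0]})
print(out["mixture"], len(out["source"]))
```

One sampler serves all three tasks; only the set of clamped slots changes. A real inpainting sampler would typically also noise the known slots to match the current diffusion step rather than clamp them to clean values.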

Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting

Authors: Yuansan Liu, Sudanthi Wijewickrema, Dongting Hu, Christofer Bester, Stephen O’Leary, James Bailey

Recent innovations in diffusion probabilistic models have paved the way for significant progress in image, text and audio generation, leading to their applications in generative time series forecasting. However, leveraging such abilities to model highly stochastic time series data remains a challenge. In this paper, we propose a novel Stochastic Diffusion (StochDiff) model which learns data-driven prior knowledge at each time step by utilizing the representational power of the stochastic latent spaces to model the variability of the multivariate time series data. The learnt prior knowledge helps the model to capture complex temporal dynamics and the inherent uncertainty of the data. This improves its ability to model highly stochastic time series data. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of our proposed model on stochastic time series forecasting. Additionally, we showcase an application of our model for real-world surgical guidance, highlighting its potential to benefit the medical community.

Paper and project links

PDF 15 pages, 4 figures. SIGKDD 2025

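Summary

This paper proposes Stochastic Diffusion (StochDiff), a diffusion probabilistic model that uses the representational power of stochastic latent spaces to model the variability of multivariate time series. It learns data-driven prior knowledge at each time step, which helps it capture complex temporal dynamics and the inherent uncertainty of the data and improves its ability to model highly stochastic time series. Experiments on real-world datasets demonstrate its effectiveness for stochastic time series forecasting, and an application to real-world surgical guidance highlights its potential benefit to the medical community.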
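
For orientation, the forward (noising) process of a standard denoising diffusion probabilistic model can be written as below. This is the textbook formulation, not an equation taken from the StochDiff paper. The notation is assumed: $x_0$ is a clean observation, $\beta_n$ is the noise schedule, and $n$ indexes the diffusion step so that it does not clash with the time-series index.

```latex
q(x_n \mid x_{n-1}) = \mathcal{N}\!\left(x_n;\ \sqrt{1-\beta_n}\,x_{n-1},\ \beta_n I\right),
\qquad
q(x_n \mid x_0) = \mathcal{N}\!\left(x_n;\ \sqrt{\bar\alpha_n}\,x_0,\ (1-\bar\alpha_n) I\right),
\qquad
\bar\alpha_n = \prod_{s=1}^{n} (1-\beta_s).
```

Standard samplers start the reverse process from $\mathcal{N}(0, I)$. How StochDiff's learned per-time-step prior enters the model is not specified in the summary above.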

Key Takeaways