Published: 2025-09-16

Updated: 2025-10-07


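⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings. They are suitable only for screening papers before reading them.
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace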

Updated 2025-09-16

Multi-Intent Recognition in Dialogue Understanding: A Comparison Between Smaller Open-Source LLMs

Authors: Adnan Ahmad, Philine Kowol, Stefan Hillmann, Sebastian Möller

In this paper, we provide an extensive analysis of multi-label intent classification using Large Language Models (LLMs) that are open-source, publicly available, and can be run on consumer hardware. We use the MultiWOZ 2.1 dataset, a benchmark in the dialogue system domain, to investigate the efficacy of three popular open-source pre-trained LLMs, namely LLama2-7B-hf, Mistral-7B-v0.1, and Yi-6B. We perform the classification task in a few-shot setup, giving 20 examples in the prompt with some instructions. Our approach focuses on the differences in performance of these models across several performance metrics by methodically assessing these models on multi-label intent classification tasks. Additionally, we compare the performance of the instruction-based fine-tuning approach with supervised learning using the smaller transformer model BertForSequenceClassification as a baseline. To evaluate the performance of the models, we use evaluation metrics like accuracy, precision, and recall as well as micro, macro, and weighted F1 score. We also report the inference time, VRAM requirements, etc. Mistral-7B-v0.1 outperforms the two other generative models on 11 intent classes out of 14 in terms of F-Score, with a weighted average of 0.50. It also has relatively lower Hamming Loss and higher Jaccard Similarity, making it the winning model in the few-shot setting. We find that the BERT-based supervised classifier outperforms the best-performing few-shot generative LLM. The study provides a framework for small open-source LLMs in detecting complex multi-intent dialogues, enhancing the Natural Language Understanding aspect of task-oriented chatbots.


Paper and project links

PDF

Summary
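This paper analyzes multi-label intent classification with open-source large language models (LLMs). Using the MultiWOZ 2.1 dataset, it compares three popular open-source pre-trained LLMs in a few-shot setting and contrasts the instruction-based approach with supervised learning on a smaller transformer model. Evaluation metrics include accuracy, precision, recall, and micro, macro, and weighted F1. Mistral-7B-v0.1 beats the other two generative models on F-score and also has lower Hamming loss and higher Jaccard similarity in the few-shot setting. The BERT-based supervised classifier nevertheless outperforms the best few-shot generative LLM. The study offers a framework for using small open-source LLMs to detect complex multi-intent dialogues, strengthening the natural language understanding of task-oriented chatbots.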
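
The few-shot setup described above (instructions plus labeled examples in the prompt, then a free-text completion that has to be mapped back to labels) can be sketched roughly as follows. This is only an illustration, not the authors' prompt: the intent names in `INTENTS`, the prompt wording, and the helper names `build_prompt` and `parse_intents` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical intent labels, for illustration only. The paper's 14 intent
# classes are not listed in this digest.
INTENTS = ["find_hotel", "book_hotel", "find_restaurant", "book_restaurant"]


def build_prompt(examples: List[Tuple[str, List[str]]], utterance: str) -> str:
    """Assemble instructions, labeled examples, and the query utterance."""
    lines = [
        "Classify the user utterance. Answer with a comma-separated list "
        "of intents chosen from: " + ", ".join(INTENTS) + ".",
        "",
    ]
    for text, labels in examples:
        lines.append(f"Utterance: {text}")
        lines.append("Intents: " + ", ".join(labels))
        lines.append("")
    # The prompt ends with an open "Intents:" slot for the model to complete.
    lines.append(f"Utterance: {utterance}")
    lines.append("Intents:")
    return "\n".join(lines)


def parse_intents(completion: str) -> List[str]:
    """Keep only known labels from the first line of the model's completion."""
    stripped = completion.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    found = [part.strip().lower() for part in first_line.split(",")]
    return [label for label in INTENTS if label in found]
```

Filtering the completion against a fixed label list is one simple way to turn generated text into a multi-label prediction. Anything the model writes outside that list is dropped rather than counted as a new class.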

Key Takeaways

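The study uses the MultiWOZ 2.1 dataset to analyze multi-label intent classification with three open-source pre-trained LLMs: LLama2-7B-hf, Mistral-7B-v0.1, and Yi-6B.
Classification is performed in a few-shot setting (20 examples in the prompt), and the models' performance differences are assessed methodically.
Beyond accuracy, precision, and recall, the evaluation reports micro, macro, and weighted F1 scores.
Mistral-7B-v0.1 achieves the best F-score among the generative models (weighted average 0.50), with relatively low Hamming loss and high Jaccard similarity.
The BERT-based supervised classifier outperforms the best few-shot generative LLM.
The study provides a framework for applying small open-source LLMs to complex multi-intent dialogue detection.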
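
For readers unfamiliar with the multi-label metrics named above, here is a minimal pure-Python sketch of Hamming loss, sample-averaged Jaccard similarity, and micro, macro, and weighted F1. It uses the standard textbook definitions and toy data. The function names and the set-based label representation are choices made for this example, not taken from the paper, and the code assumes every label in the data appears in `labels`.

```python
from typing import Dict, List, Set


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]) -> float:
    """Fraction of (sample, label) slots where prediction and truth disagree."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(labels))


def jaccard_similarity(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Mean over samples of |true & pred| / |true | pred| (1.0 if both are empty)."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def f1_scores(y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]) -> Dict[str, float]:
    """Return micro, macro, and support-weighted F1 over the given labels."""
    per_label, supports = [], []
    tp_all = fp_all = fn_all = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        per_label.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)  # number of true occurrences of this label
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    micro_denom = 2 * tp_all + fp_all + fn_all
    total = sum(supports)
    return {
        "micro": 2 * tp_all / micro_denom if micro_denom else 0.0,
        "macro": sum(per_label) / len(labels),
        "weighted": sum(f * s for f, s in zip(per_label, supports)) / total if total else 0.0,
    }


# Toy data: two utterances, three made-up labels.
labels = ["a", "b", "c"]
y_true = [{"a", "b"}, {"c"}]
y_pred = [{"a"}, {"b", "c"}]
scores = f1_scores(y_true, y_pred, labels)
```

Lower Hamming loss and higher Jaccard similarity both indicate predicted label sets that sit closer to the reference sets. Weighted F1 differs from macro F1 only in weighting each class by how often it truly occurs.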


Comparative Studies of Quantum Annealing, Digital Annealing, and Classical Solvers for Reaction Network Pathway Analysis and mRNA Codon Selection

Authors: Milind Upadhyay, Mark Nicholas Jones

For various optimization problems, the classical time to solution is super-polynomial and intractable to solve with classical bit-based computing hardware to date. Digital and quantum annealers have the potential to identify near-optimal solutions for such optimization problems using a quadratic unconstrained binary optimization (QUBO) problem formulation. This work benchmarks two use cases to evaluate the utility of QUBO solvers for combinatorial optimization problems, in order to determine if a QUBO formulation and annealing-based algorithms have an advantage over classical mixed-integer programming (MIP) and constraint programming (CP) solvers. Various QUBO and solver metrics such as problem mapping, quantitative interconnectivity, penalty structure, solver minimum cost (obtained optimal value) and solver time to solution have been applied to evaluate different QUBO problems. Constrained and unconstrained QUBO solvers are compared including the Fujitsu digital annealer (DA), various D-Wave hybrid quantum annealing solvers (QA, HQA), and the classical MIP/CP solvers HiGHS, Gurobi, SCIP, and CP-SAT. The two industrially relevant use cases are reaction network pathway analysis and mRNA codon selection. For reaction pathway analysis, classical MIP/CP solvers are observed to solve the problem to optimality in reasonable time frames while the DA is not able to do so. For mRNA codon selection, CP-SAT displayed the best performance for standard and large protein datasets (under 1500 amino acids). For the extra-large protein dataset (11000 to 14000 amino acids), the D-Wave Nonlinear HQA solver performed comparably to CP-SAT, outperforming it in minimum cost in 2 out of the 4 problems.


Paper and project links

PDF 56 pages, 9 figures, 13 tables, 56 references

Summary
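For many optimization problems, the time to solution on classical computing hardware grows super-polynomially, which makes them intractable. Digital and quantum annealers may find near-optimal solutions to such problems through a quadratic unconstrained binary optimization (QUBO) formulation. To assess the utility of QUBO solvers for combinatorial optimization, the paper benchmarks two use cases and evaluates the QUBO problems with metrics such as problem mapping, quantitative interconnectivity, penalty structure, solver minimum cost, and time to solution. Constrained and unconstrained QUBO solvers, including the Fujitsu digital annealer (DA) and several D-Wave hybrid quantum annealing solvers, are compared with classical mixed-integer programming (MIP) and constraint programming (CP) solvers. Results differ across the two industrially relevant use cases. For reaction network pathway analysis, classical MIP/CP solvers reach optimality in reasonable time while the DA does not. For mRNA codon selection, CP-SAT performs best on datasets under 1500 amino acids, and the D-Wave Nonlinear HQA solver performs comparably to CP-SAT on the extra-large dataset.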

Key Takeaways

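Classical computing hardware struggles with optimization problems whose time to solution grows super-polynomially.
Digital and quantum annealers have the potential to find near-optimal solutions to such problems via a QUBO formulation.
QUBO solvers are benchmarked on combinatorial optimization problems using metrics such as problem mapping, quantitative interconnectivity, and penalty structure.
The comparison covers the Fujitsu digital annealer (DA), D-Wave hybrid quantum annealing solvers, and the classical MIP/CP solvers HiGHS, Gurobi, SCIP, and CP-SAT.
For reaction network pathway analysis, classical MIP/CP solvers reach optimality in reasonable time, whereas the DA does not.
For mRNA codon selection, CP-SAT performs best on the standard and large protein datasets (under 1500 amino acids).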
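
To make "QUBO formulation" and "penalty structure" concrete, the sketch below encodes a toy "pick exactly one option per group" problem, loosely analogous to choosing one codon per amino acid, as a QUBO and solves it by brute force. It is not the paper's formulation: the costs, the grouping, the penalty weight, and the function names are made up for illustration, and exhaustive search stands in for an annealer. Each one-hot constraint is folded into the objective as `penalty * (sum(x_i) - 1)**2`, which for binary variables expands to `-penalty` on each diagonal entry and `+2 * penalty` on each pair within the group, plus a constant that is dropped.

```python
from itertools import product
from typing import Dict, List, Tuple

Qubo = Dict[Tuple[int, int], float]


def build_qubo(costs: List[float], groups: List[List[int]], penalty: float) -> Qubo:
    """Linear costs on the diagonal plus one-hot penalties for each group."""
    q: Qubo = {(i, i): c for i, c in enumerate(costs)}
    for group in groups:
        for pos, i in enumerate(group):
            q[(i, i)] -= penalty  # from the -penalty * x_i term
            for j in group[pos + 1:]:
                q[(i, j)] = q.get((i, j), 0.0) + 2 * penalty  # pair term
    return q


def energy(q: Qubo, x: Tuple[int, ...]) -> float:
    """Evaluate x^T Q x for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force(q: Qubo, n: int) -> Tuple[int, ...]:
    """Exhaustive search over all 2^n assignments (only viable for tiny n)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))


# Toy instance: variables 0-2 form one group, variables 3-4 another.
costs = [3.0, 1.0, 2.0, 5.0, 4.0]
groups = [[0, 1, 2], [3, 4]]
penalty = 10.0  # assumed larger than any single cost so constraints dominate
q = build_qubo(costs, groups, penalty)
best = brute_force(q, len(costs))
```

Adding back the dropped constant (`penalty` per group) to the energy of a feasible assignment recovers its plain cost. Choosing the penalty weight is itself a tuning problem: too small and the minimum violates constraints, too large and cost differences become numerically insignificant next to the penalty terms.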


Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings

Authors: Liangqi Yuan, Dong-Jun Han, Shiqiang Wang, Christopher G. Brinton

Compared to traditional machine learning models, recent large language models (LLMs) can exhibit multi-task-solving capabilities through multiple dialogues and multi-modal data sources. These unique characteristics of LLMs, together with their large model size, make their deployment more challenging. Specifically, (i) deploying LLMs on local devices faces computational, memory, and energy resource issues, while (ii) deploying them in the cloud cannot guarantee real-time service and incurs communication/usage costs. In this paper, we design TMO, a local-cloud LLM inference system with Three-M Offloading: Multi-modal, Multi-task, and Multi-dialogue. TMO incorporates (i) a lightweight local LLM that can process simple tasks at high speed and (ii) a large-scale cloud LLM that can handle multi-modal data sources. We develop a resource-constrained reinforcement learning (RCRL) strategy for TMO that optimizes the inference location (i.e., local vs. cloud) and multi-modal data sources to use for each task/dialogue, aiming to maximize the long-term reward (response quality, latency, and usage cost) while adhering to resource constraints. We also contribute M4A1, a new dataset we curated that contains reward and cost metrics across multiple modality, task, dialogue, and LLM configurations, enabling evaluation of offloading decisions. We demonstrate the effectiveness of TMO compared to several exploration-decision and LLM-as-Agent baselines, showing significant improvements in latency, cost, and response quality.


Paper and project links

PDF

Summary

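The multi-task capabilities of LLMs, exercised over multiple dialogues and multi-modal data sources, make deployment challenging. The paper designs TMO, a local-cloud LLM inference system in which a lightweight local LLM handles simple tasks and a large-scale cloud LLM handles multi-modal data sources. A resource-constrained reinforcement learning (RCRL) strategy optimizes the inference location and the data sources used, aiming to maximize long-term reward. The paper also contributes a new dataset, M4A1, for evaluating offloading decisions.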

Key Takeaways

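LLMs can solve multiple tasks across multiple dialogues and multi-modal data sources.
Deploying LLMs on local devices runs into compute, memory, and energy limits, while cloud deployment cannot guarantee real-time service and incurs communication and usage costs.
TMO combines a local LLM and a cloud LLM to match different task requirements.
A resource-constrained reinforcement learning (RCRL) strategy optimizes the inference location and the choice of data sources.
The M4A1 dataset contains reward and cost metrics across modality, task, dialogue, and LLM configurations.
Compared with the baselines, TMO shows significant improvements in latency, cost, and response quality.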
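
The trade-off behind the offloading decision (response quality against latency and usage cost, subject to a resource limit) can be illustrated with a deliberately simple greedy rule. This is not the paper's RCRL strategy, which learns a policy for long-term reward. The `Option` fields, the reward weights, the budget check, and all numbers below are assumptions invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    name: str       # hypothetical action name, e.g. "local" or "cloud"
    quality: float  # estimated response quality in [0, 1]
    latency: float  # estimated latency in seconds
    cost: float     # estimated usage cost in arbitrary units


def reward(opt: Option, w_latency: float = 0.1, w_cost: float = 0.5) -> float:
    """Scalar reward: quality minus weighted latency and cost (assumed weights)."""
    return opt.quality - w_latency * opt.latency - w_cost * opt.cost


def choose(options: List[Option], budget_left: float) -> Optional[Option]:
    """Pick the highest-reward option whose cost fits the remaining budget."""
    feasible = [o for o in options if o.cost <= budget_left]
    return max(feasible, key=reward) if feasible else None


# Made-up estimates for one request.
options = [
    Option("local", quality=0.5, latency=0.5, cost=0.0),
    Option("cloud", quality=0.9, latency=2.0, cost=0.3),
]
with_budget = choose(options, budget_left=1.0)
without_budget = choose(options, budget_left=0.1)
```

With enough budget the higher-quality cloud option wins despite its latency and cost. Once the budget is nearly exhausted the rule falls back to the local model. A learned policy would additionally account for how spending now affects later dialogue turns, which this one-step rule ignores.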


Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-16/Interactive/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!


Interactive

Updated 2025-09-16

Multi-Intent Recognition in Dialogue Understanding: A Comparison Between Smaller Open-Source LLMs

Comparative Studies of Quantum Annealing, Digital Annealing, and Classical Solvers for Reaction Network Pathway Analysis and mRNA Codon Selection

Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings