Published: 2025-10-25

Updated: 2025-11-27

Word count: 7.5k

Reading time: 30 min

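⚠️ All summaries below are generated by a large language model; they may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not rely on them in serious academic settings; they are intended only for initial screening before reading the papers.
💗 If you find our project, ChatPaperFree, helpful, please consider giving it a star ⭐️. A free demo is available on HuggingFace.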

2025-10-25 Update

Stoichiometrically-informed symbolic regression for extracting chemical reaction mechanisms from data

Authors:Manuel Palma Banos, Joel D. Kress, Rigoberto Hernandez, Galen T. Craven

A data-driven computational method is introduced to extract chemical reaction mechanisms from time series chemical concentration data. It is realized through the use of dynamic symbolic regression in which a sparse analytical form for a dynamical system is discoverable from the underlying data. We specifically develop the stoichiometrically-informed symbolic regression (SISR) method to address a standing challenge in complex chemical reaction networks: Given a time-series dataset of concentrations of several components, what is the mechanism and the associated rate constants? SISR finds the optimal mechanism, kinetic equations and rate constants by combining differential optimization with a genetic optimization approach that searches a symbolic space of possible reaction mechanisms. Use of SISR in several paradigmatic examples spanning linear and nonlinear reaction schemes results in excellent agreement between true and predicted mechanisms, including when the method is applied to noisy data. The advantages of a stoichiometrically-informed approach such as SISR to address reaction discovery are illustrated through comparison with the use of generic state-of-the-art data-driven approaches.

Paper and project links

PDF

Summary

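A data-driven computational method extracts chemical reaction mechanisms from time-series concentration data. It is realized through dynamic symbolic regression, which recovers a sparse analytical form of the dynamical system from the data. The stoichiometrically-informed symbolic regression (SISR) method targets a standing challenge in complex chemical reaction networks: given a time-series dataset of concentrations of several components, what are the mechanism and the associated rate constants? SISR finds the optimal mechanism, kinetic equations, and rate constants by combining differential optimization with a genetic optimization approach that searches a symbolic space of possible reaction mechanisms. Applied to several paradigmatic linear and nonlinear reaction schemes, SISR yields excellent agreement between true and predicted mechanisms, including on noisy data. Comparison with generic state-of-the-art data-driven approaches illustrates the advantages of a stoichiometrically-informed approach such as SISR for reaction discovery.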

Key Takeaways

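A data-driven computational method is introduced that extracts chemical reaction mechanisms from time-series concentration data.
The method is realized through dynamic symbolic regression, which can discover a sparse analytical form of the dynamical system.
The SISR method is developed specifically for mechanism discovery in complex chemical reaction networks.
SISR finds the optimal reaction mechanism, kinetic equations, and rate constants.
Across linear and nonlinear reaction schemes, the mechanisms SISR predicts agree closely with the true ones.
SISR retains this agreement when applied to noisy data.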
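
To make the idea of a stoichiometrically constrained candidate concrete, here is a minimal sketch; it is not the authors' SISR implementation. It assumes mass-action kinetics, encodes a hypothetical candidate mechanism A → B → C through its stoichiometry, integrates the resulting rate equations with a hand-written RK4 step, and scores rate constants against synthetic data. The names `mass_action_rhs`, `simulate`, and `score`, the rate constants, and the grid are all illustrative assumptions. A crude grid search stands in for both the genetic search over mechanisms and the differential optimization of rate constants, neither of which is implemented here.

```python
# Species order: A, B, C. Each reaction is (reactant orders, net stoichiometry).
# Hypothetical candidate mechanism: A -> B, then B -> C.
MECHANISM = [
    ((1, 0, 0), (-1, +1, 0)),  # A -> B
    ((0, 1, 0), (0, -1, +1)),  # B -> C
]


def mass_action_rhs(conc, rate_constants, mechanism):
    """dc/dt under mass-action kinetics: sum of nu * k * prod(c ** order)."""
    dcdt = [0.0] * len(conc)
    for k, (orders, net) in zip(rate_constants, mechanism):
        rate = k
        for c, order in zip(conc, orders):
            rate *= c ** order
        for i, nu in enumerate(net):
            dcdt[i] += nu * rate
    return dcdt


def simulate(c0, rate_constants, mechanism, dt, n_steps):
    """Integrate the rate equations with classical RK4; returns the trajectory."""
    def f(c):
        return mass_action_rhs(c, rate_constants, mechanism)

    c = list(c0)
    trajectory = [c]
    for _ in range(n_steps):
        k1 = f(c)
        k2 = f([ci + 0.5 * dt * ki for ci, ki in zip(c, k1)])
        k3 = f([ci + 0.5 * dt * ki for ci, ki in zip(c, k2)])
        k4 = f([ci + dt * ki for ci, ki in zip(c, k3)])
        c = [ci + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for ci, p, q, r, s in zip(c, k1, k2, k3, k4)]
        trajectory.append(c)
    return trajectory


def score(data, c0, rate_constants, mechanism, dt):
    """Sum of squared residuals between data and the simulated candidate."""
    model = simulate(c0, rate_constants, mechanism, dt, len(data) - 1)
    return sum((x - y) ** 2
               for row_d, row_m in zip(data, model)
               for x, y in zip(row_d, row_m))


# Synthetic, noise-free "measurements" generated with assumed rate constants.
C0 = (1.0, 0.0, 0.0)
DT = 0.01
data = simulate(C0, (1.0, 0.5), MECHANISM, DT, 200)

# Crude grid search over rate constants (stand-in for a real optimizer).
grid = [0.25, 0.5, 1.0, 2.0]
best = min(((k1, k2) for k1 in grid for k2 in grid),
           key=lambda ks: score(data, C0, ks, MECHANISM, DT))
print(best)
```

Because every reaction's net stoichiometry sums to zero here, the total concentration is conserved along the trajectory, which is the kind of physical constraint a generic regression over arbitrary right-hand sides would not enforce by construction.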

MR-UBi: Mixed Reality-Based Underwater Robot Arm Teleoperation System with Reaction Torque Indicator via Bilateral Control

Authors:Kohei Nishi, Masato Kobayashi, Yuki Uranishi

We present a mixed reality-based underwater robot arm teleoperation system with a reaction torque indicator via bilateral control (MR-UBi). The reaction torque indicator (RTI) overlays a color and length-coded torque bar in the MR-HMD, enabling seamless integration of visual and haptic feedback during underwater robot arm teleoperation. User studies with sixteen participants compared MR-UBi against a bilateral-control baseline. MR-UBi significantly improved grasping-torque control accuracy, increasing the time within the optimal torque range and reducing both low and high grasping torque range during lift and pick-and-place tasks with objects of different stiffness. Subjective evaluations further showed higher usability (SUS) and lower workload (NASA–TLX). Overall, the results confirm that MR-UBi enables more stable, accurate, and user-friendly underwater robot-arm teleoperation through the integration of visual and haptic feedback. For additional material, please check: https://mertcookimg.github.io/mr-ubi

Paper and project links

PDF

Summary

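This paper presents MR-UBi, a mixed reality-based teleoperation system for underwater robot arms that adds a reaction torque indicator (RTI) to bilateral control. The RTI overlays a color- and length-coded torque bar in the mixed-reality head-mounted display (MR-HMD), integrating visual and haptic feedback during teleoperation. In a user study with sixteen participants, MR-UBi significantly improved grasping-torque control accuracy over a bilateral-control baseline: it increased the time within the optimal torque range and reduced both the low and high grasping-torque ranges during lift and pick-and-place tasks with objects of different stiffness. Subjective evaluations showed higher usability (SUS) and lower workload (NASA-TLX). Overall, MR-UBi enables more stable, accurate, and user-friendly underwater robot-arm teleoperation by integrating visual and haptic feedback.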

Key Takeaways

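MR-UBi is a mixed reality-based teleoperation system for underwater robot arms.
It combines bilateral control with a reaction torque indicator (RTI).
The RTI shows a color- and length-coded torque bar in the MR-HMD, integrating visual and haptic feedback.
MR-UBi improved grasping-torque control accuracy and increased the time within the optimal torque range.
It reduced both the low and high grasping-torque ranges during lift and pick-and-place tasks with objects of different stiffness.
Subjective evaluations showed higher usability (SUS) and lower workload (NASA-TLX).
MR-UBi enables more stable, accurate, and user-friendly underwater robot-arm teleoperation.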
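
The color- and length-coded torque bar can be pictured with a small sketch. This is an illustrative assumption, not the MR-UBi implementation: the torque thresholds, the maximum torque, the color names, and the functions `torque_bar` and `time_in_optimal_range` are hypothetical. The sketch maps a measured grasping torque to a normalized bar length and to one of three colors for the low, optimal, and high ranges, and computes the fraction of samples that fall in the optimal range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TorqueBar:
    length: float  # fraction of the full bar, in [0, 1]
    color: str     # "blue" (too low), "green" (optimal), "red" (too high)


def torque_bar(torque, optimal_min=0.3, optimal_max=0.6, torque_max=1.0):
    """Map a torque reading to bar length and color (all limits hypothetical)."""
    magnitude = abs(torque)
    length = min(magnitude / torque_max, 1.0)  # clamp so the bar never overflows
    if magnitude < optimal_min:
        color = "blue"
    elif magnitude <= optimal_max:
        color = "green"
    else:
        color = "red"
    return TorqueBar(length=length, color=color)


def time_in_optimal_range(torques, optimal_min=0.3, optimal_max=0.6):
    """Fraction of uniformly sampled readings inside the optimal range."""
    if not torques:
        return 0.0
    inside = sum(1 for t in torques if optimal_min <= abs(t) <= optimal_max)
    return inside / len(torques)


samples = [0.1, 0.4, 0.5, 0.9]
print([torque_bar(t).color for t in samples])
print(time_in_optimal_range(samples))
```

Encoding the same quantity redundantly in length and color is a common display choice: length conveys magnitude continuously, while color signals which range the operator is in at a glance.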

Two Quantum Algorithms for Nonlinear Reaction-Diffusion Equation using Chebyshev Approximation Method

Authors:Manish Kumar

We present two new quantum algorithms for reaction-diffusion equations that employ the truncated Chebyshev polynomial approximation. This method is employed to numerically solve the ordinary differential equation emerging from the linearization of the associated nonlinear differential equation. In the first algorithm, we use the matrix exponentiation method (Patel et al., 2018), while in the second algorithm, we repurpose the quantum spectral method (Childs et al., 2020). Our main technical contribution is to derive the sufficient conditions for the diagonalization of the Carleman embedding matrix, which is indispensable for designing both quantum algorithms. We supplement this with an efficient iterative algorithm to diagonalize the Carleman matrix. Our first algorithm has gate complexity of $O(d \log d + T\,\mathrm{polylog}(T/\varepsilon))$. Here $d$ is the size of the Carleman matrix, $T$ is the simulation time, and $\varepsilon$ is the approximation error. The second algorithm is polynomial in $\log d$, $T$, and $\log(1/\varepsilon)$; its gate complexity scales as $O(\mathrm{polylog}(d)\, T\, \mathrm{polylog}(T/\varepsilon))$. In terms of $T$ and $\varepsilon$, this is comparable to the speedup gained by the current best known quantum algorithm for this problem, the truncated Taylor series method (Costa et al., 2025). Our approach has two shortcomings. First, we have not provided an upper bound, in terms of $d$, on the condition number of the Carleman matrix. Second, the success of the diagonalization is based on a conjecture that a specific trigonometric equation has no integral solution. However, we provide strategies to mitigate these shortcomings in most practical cases.

Paper and project links

PDF

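Summary

This paper presents two new quantum algorithms for reaction-diffusion equations that employ the truncated Chebyshev polynomial approximation. The approximation is used to numerically solve the ordinary differential equation that arises from linearizing the associated nonlinear differential equation. The first algorithm uses the matrix exponentiation method (Patel et al., 2018); the second repurposes the quantum spectral method (Childs et al., 2020). The main technical contribution is a derivation of sufficient conditions for diagonalizing the Carleman embedding matrix, which both algorithms require, supplemented by an efficient iterative algorithm that diagonalizes it. The first algorithm has gate complexity $O(d \log d + T\,\mathrm{polylog}(T/\varepsilon))$, where $d$ is the size of the Carleman matrix, $T$ is the simulation time, and $\varepsilon$ is the approximation error. The second is polynomial in $\log d$, $T$, and $\log(1/\varepsilon)$, with gate complexity $O(\mathrm{polylog}(d)\, T\, \mathrm{polylog}(T/\varepsilon))$. In terms of $T$ and $\varepsilon$, this is comparable to the speedup of the best known quantum algorithm for this problem, the truncated Taylor series method (Costa et al., 2025). The approach has two shortcomings: no upper bound on the condition number of the Carleman matrix is given in terms of $d$, and the success of the diagonalization rests on a conjecture that a specific trigonometric equation has no integral solution. The authors provide strategies to mitigate both in most practical cases.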

Key Takeaways

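Two new quantum algorithms for reaction-diffusion equations are proposed, both using the truncated Chebyshev polynomial approximation.
The first algorithm is based on the matrix exponentiation method; the second on the quantum spectral method.
The main contribution is a set of sufficient conditions for diagonalizing the Carleman embedding matrix, plus an efficient iterative diagonalization algorithm.
The first algorithm has gate complexity $O(d \log d + T\,\mathrm{polylog}(T/\varepsilon))$.
The second algorithm is polynomial in $\log d$, $T$, and $\log(1/\varepsilon)$; in terms of $T$ and $\varepsilon$ it is comparable to the best known quantum algorithm for this problem.
Two shortcomings remain: no upper bound on the condition number of the Carleman matrix in terms of $d$, and reliance on a conjecture about a specific trigonometric equation.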
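
The linearization step can be illustrated classically with a minimal sketch; it is not the paper's quantum algorithm and involves no Chebyshev approximation. It assumes a scalar reaction term only (no diffusion), $du/dt = a u + b u^2$, and applies Carleman embedding with $y_k = u^k$, which gives the linear system $dy_k/dt = k a\, y_k + k b\, y_{k+1}$, truncated at a chosen order. The parameter values, the truncation order, and the function names are illustrative assumptions; the linear system is integrated with a plain RK4 loop and compared with the closed-form solution of this Bernoulli equation.

```python
import math


def carleman_matrix(a, b, order):
    """Truncated Carleman matrix for du/dt = a*u + b*u**2 with y_k = u**k."""
    matrix = [[0.0] * order for _ in range(order)]
    for k in range(1, order + 1):
        matrix[k - 1][k - 1] = k * a      # coupling of y_k to itself
        if k < order:
            matrix[k - 1][k] = k * b      # coupling of y_k to y_{k+1}
    return matrix


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def integrate_linear(matrix, y0, t_end, n_steps):
    """Classical RK4 for the linear system dy/dt = matrix @ y."""
    h = t_end / n_steps
    y = list(y0)
    for _ in range(n_steps):
        k1 = matvec(matrix, y)
        k2 = matvec(matrix, [yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
        k3 = matvec(matrix, [yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
        k4 = matvec(matrix, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6.0 * (p + 2 * q + 2 * r + s)
             for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


# Assumed parameters: dissipative linear part and a weak quadratic term.
a, b, u0, t_end, order = -1.0, 0.5, 0.5, 1.0, 8

A = carleman_matrix(a, b, order)
y0 = [u0 ** k for k in range(1, order + 1)]
u_carleman = integrate_linear(A, y0, t_end, 1000)[0]

# Closed form for a = -1: with v = 1/u, dv/dt = v - b.
u_exact = 1.0 / (b + (1.0 / u0 - b) * math.exp(t_end))
print(u_carleman, u_exact)
```

In this scalar toy case the truncated matrix is upper triangular with distinct diagonal entries $ka$, so it is diagonalizable; the paper's sufficient conditions concern the much larger Carleman matrix of the discretized reaction-diffusion problem, which this sketch does not attempt to reproduce.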

The Regulated GeAs Cycles with the New $^{63}$Ga(p,$γ$)$^{64}$Ge and $^{64}$Ge(p,$γ$)$^{65}$As Reaction Rates and Their Impact on the GS 1826$-$24 Clocked Bursts and SAX J1808.4$-$3658 Photospheric Radius Expansion Bursts

Authors:Yi Hua Lam, Ning Lu, Alexander Heger, Zi Xin Liu, Zac Johnston, Hidetoshi Yamaguchi

The $^{63}$Ga(p,$\gamma$)$^{64}$Ge and $^{64}$Ge(p,$\gamma$)$^{65}$As thermonuclear reactions connect the ZnGa and GeAs cycles by diverting the flow of the rapid proton capture process from $^{63}$Ga to $^{65}$As. Changes in these two reaction rates regulate the ZnGa and GeAs cycles and may affect the modeled properties matching with the observed counterparts of a type I X-ray burster. We implement the latest $^{63}$Ga(p,$\gamma$)$^{64}$Ge and $^{64}$Ge(p,$\gamma$)$^{65}$As reaction rates to the state-of-the-art self-consistent one-dimensional multi-zone thermo-hydrodynamic code, KEPLER, to study the influence of these new reaction rates on the models of the GS 1826$-$24 clocked burster and SAX J1808.4$-$3658 photospheric radius expansion burster. Both new reaction rates obtained by Lu et al. [Phys. Rev. C 110, 065804 (2024)] are determined from complementing the experimental input with the nuclear spectroscopic information deduced from the full pf-shell space configuration-interaction shell-model calculations. By constraining the models on reproducing the observed burst peak, light-curve profile, fluence, and recurrence time, we find that the impact of the newly measured proton thresholds and respective proton-capture reactions on the burst light-curve profile of the GS 1826$-$24 clocked burster is, in fact, not as significant as claimed by Zhou et al. [Nat. Phys. 19, 1091 (2023)]. With or without the inclusion of the newly determined reaction rate of the highly influential $^{22}$Mg($\alpha$,p)$^{25}$Al reaction, the impact of the new $^{63}$Ga(p,$\gamma$)$^{64}$Ge and $^{64}$Ge(p,$\gamma$)$^{65}$As reaction rates on SAX J1808.4$-$3658 photospheric radius expansion bursts is evident. Our finding indicates that the models reproducing the 2002 October epoch of SAX J1808.4$-$3658 photospheric radius expansion burster are more sensitive to the uncertainties of thermonuclear reaction rates.

Paper and project links

PDF 12 pages, 6 figures (colorblind-friendly colors), 1 table, finalized version; first version was submitted on 29 Nov 2023, the only revised version was submitted on 28 Apr 2025; accepted by The Astrophysical Journal on 25 May 2025. To appear at The Astrophysical Journal (Open Access) on 28 Oct 2025

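Summary

This work studies how the $^{63}$Ga(p,$\gamma$)$^{64}$Ge and $^{64}$Ge(p,$\gamma$)$^{65}$As thermonuclear reactions affect the ZnGa and GeAs cycles, which they connect by diverting the flow of the rapid proton capture process from $^{63}$Ga to $^{65}$As. Changes in these two reaction rates regulate both cycles and may affect how well modeled properties match the observations of a type I X-ray burster. The latest rates are implemented in the one-dimensional multi-zone thermo-hydrodynamic code KEPLER to study their influence on models of the GS 1826$-$24 clocked burster and the SAX J1808.4$-$3658 photospheric radius expansion burster. The new rates, obtained by Lu et al. [Phys. Rev. C 110, 065804 (2024)], complement experimental input with nuclear spectroscopic information deduced from full pf-shell space configuration-interaction shell-model calculations. With the models constrained to reproduce the observed burst peak, light-curve profile, fluence, and recurrence time, the impact of the newly measured proton thresholds and respective proton-capture reactions on the burst light-curve profile of GS 1826$-$24 is not as significant as claimed by Zhou et al. [Nat. Phys. 19, 1091 (2023)]. With or without the newly determined rate of the highly influential $^{22}$Mg($\alpha$,p)$^{25}$Al reaction, the impact of the new rates on SAX J1808.4$-$3658 photospheric radius expansion bursts is evident. The models reproducing the 2002 October epoch of SAX J1808.4$-$3658 are more sensitive to the uncertainties of thermonuclear reaction rates.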

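Key Takeaways

The $^{63}$Ga(p,$\gamma$)$^{64}$Ge and $^{64}$Ge(p,$\gamma$)$^{65}$As reactions connect the ZnGa and GeAs cycles by diverting the flow of the rapid proton capture process.
Changes in these reaction rates regulate the ZnGa and GeAs cycles and may affect how well modeled properties match observations.
The latest reaction rates are implemented in the KEPLER code to study their influence on models of the GS 1826$-$24 clocked burster and the SAX J1808.4$-$3658 photospheric radius expansion burster.
The new rates complement experimental input with nuclear spectroscopic information from shell-model calculations.
For the GS 1826$-$24 clocked burster, the impact of the new proton-capture reactions is not as significant as previously claimed.
For SAX J1808.4$-$3658 photospheric radius expansion bursts, the new rates have an evident impact, and the models reproducing the 2002 October epoch are more sensitive to reaction-rate uncertainties.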

Identifying the Catalytic Descriptor of Single-Atom Catalysts in Nitrate Reduction Reaction: An Interpretable Machine-Learning Method

Authors:Zhen Zhu, Shan Gao, Jing Zhang, Xuxin Kang, Shunfang Li, Xiangmei Duan

Elucidating the catalytic descriptor that accurately characterizes the structure-activity relationships of typical catalysts for various important heterogeneous catalytic reactions is pivotal for designing highly efficient catalytic systems. Here, an interpretable machine learning technique was employed to identify the key determinants governing the nitrate reduction reaction ($\rm NO_3RR$) performance across 286 single-atom catalysts (SACs) with the active sites anchored on double-vacancy $\rm BC_3$ monolayers. Through Shapley Additive Explanations (SHAP) analysis with reliable predictive accuracy, we quantitatively demonstrated that favorable $\rm NO_3RR$ activity stems from a delicate balance among three critical factors: low $\rm N_V$, moderate $\rm D_N$, and specific doping patterns. Building upon these insights, we established a descriptor ($\psi$) that integrates the intrinsic catalytic properties and the intermediate O-N-H angle ($\theta$), effectively capturing the underlying structure-activity relationship. Guided by this, we further identified 16 promising catalysts with predicted low limiting potential ($U_{\rm L}$). Importantly, these catalysts are composed of cost-effective non-precious metal elements and are predicted to surpass most reported catalysts, with the best-performing one, Ti-V-1N1, predicted to have an ultra-low $U_{\rm L}$ of $-0.10$ V.

Paper and project links

PDF 10 pages, 8 figures, 74 references

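Summary

Using an interpretable machine-learning method, the authors identify the key factors governing nitrate reduction reaction ($\rm NO_3RR$) performance across 286 single-atom catalysts (SACs) anchored on double-vacancy $\rm BC_3$ monolayers, and establish a descriptor for the structure-activity relationship. SHAP analysis shows that favorable $\rm NO_3RR$ activity stems from a delicate balance among three factors: low $\rm N_V$, moderate $\rm D_N$, and specific doping patterns. The resulting descriptor ($\psi$) integrates intrinsic catalytic properties with the intermediate O-N-H angle ($\theta$). Guided by it, the study identifies 16 promising catalysts with predicted low limiting potential ($U_{\rm L}$); they are composed of cost-effective non-precious metal elements and are predicted to surpass most reported catalysts.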

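Key Takeaways

An interpretable machine-learning method is used to study catalyst performance.
The key factors governing the $\rm NO_3RR$ performance of single-atom catalysts are analyzed.
These factors are low $\rm N_V$, moderate $\rm D_N$, and specific doping patterns.
A descriptor ($\psi$) capturing the structure-activity relationship is established.
Sixteen promising catalysts composed of cost-effective non-precious metal elements are identified.
The best performer, Ti-V-1N1, is predicted to have an ultra-low $U_{\rm L}$ of $-0.10$ V.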
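
The attribution idea behind SHAP can be shown with a small sketch that computes exact Shapley values by enumerating feature orderings. This is not the study's pipeline and does not use the `shap` library: the model `toy_activity`, its coefficients, the feature names, and the input values are hypothetical stand-ins chosen only to make the arithmetic visible. Exact enumeration is feasible here because there are only three features.

```python
from itertools import permutations
from math import factorial

FEATURES = ("n_v", "d_n", "doping")  # hypothetical feature names


def toy_activity(x):
    """Hypothetical stand-in for a trained model; not fitted to any data."""
    n_v, d_n, doping = x
    return -0.4 * n_v - (d_n - 1.0) ** 2 + 0.3 * doping * n_v


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(baseline)
        previous = model(z)
        for i in order:
            z[i] = x[i]                # switch feature i from baseline to x
            current = model(z)
            phi[i] += current - previous
            previous = current
    return [value / factorial(n) for value in phi]


x = (2.0, 1.5, 1.0)
baseline = (0.0, 0.0, 0.0)
phi = shapley_values(toy_activity, x, baseline)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the additivity property that makes such values usable for ranking feature importance. The interaction term in the toy model is split evenly between the two features involved.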

Multi-Faceted Evaluation of Tool-Augmented Dialogue Systems

Authors:Zhaoyi Joey Hou, Tanya Shourya, Yingfan Wang, Shamik Roy, Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah

Evaluating conversational AI systems that use external tools is challenging, as errors can arise from complex interactions among user, agent, and tools. While existing evaluation methods assess either user satisfaction or agents’ tool-calling capabilities, they fail to capture critical errors in multi-turn tool-augmented dialogues, such as when agents misinterpret tool results yet appear satisfactory to users. We introduce TRACE, a benchmark of systematically synthesized tool-augmented conversations covering diverse error cases, and SCOPE, an evaluation framework that automatically discovers diverse error patterns and evaluation rubrics in tool-augmented dialogues. Experiments show SCOPE significantly outperforms the baseline, particularly on challenging cases where user satisfaction signals are misleading.

Paper and project links

PDF The first two authors contributed equally. Manuscript under submission

Summary

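This paper addresses the challenge of evaluating conversational AI systems that use external tools and the shortcomings of existing evaluation methods. It introduces TRACE, a benchmark of systematically synthesized tool-augmented conversations covering diverse error cases, and SCOPE, an evaluation framework that automatically discovers diverse error patterns and evaluation rubrics in tool-augmented dialogues. Experiments show that SCOPE significantly outperforms the baseline, particularly on challenging cases where user satisfaction signals are misleading.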

Key Takeaways

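Evaluating conversational AI systems that use external tools is challenging because errors can arise from complex interactions among user, agent, and tools.
Existing methods assess either user satisfaction or agents' tool-calling capabilities, and miss critical errors in multi-turn tool-augmented dialogues.
TRACE is a benchmark of systematically synthesized tool-augmented conversations covering diverse error cases.
SCOPE is an evaluation framework that automatically discovers error patterns and evaluation rubrics in tool-augmented dialogues.
SCOPE is particularly effective on challenging cases, such as when an agent misinterprets tool results yet appears satisfactory to the user.
Experiments show that SCOPE significantly outperforms the baseline.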
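
The error class highlighted above, an agent misreading a tool result while the user remains satisfied, can be illustrated with a toy check. This sketch is not SCOPE and does not reflect how TRACE is structured: the `Turn` record, its fields, the example flight data, and the rule-based comparison are all hypothetical, and they assume the agent's statement has already been parsed into fields.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    tool_result: dict     # what the tool actually returned
    agent_claim: dict     # what the agent told the user, parsed into fields
    user_satisfied: bool  # surface-level satisfaction signal


def misinterpreted_fields(turn):
    """Fields where the agent's claim disagrees with the tool result."""
    return sorted(
        key for key, value in turn.agent_claim.items()
        if key in turn.tool_result and turn.tool_result[key] != value
    )


def is_hidden_error(turn):
    """True when the user is satisfied although the agent misread the tool."""
    return turn.user_satisfied and bool(misinterpreted_fields(turn))


turn = Turn(
    tool_result={"flight": "UA12", "status": "delayed"},
    agent_claim={"flight": "UA12", "status": "on time"},
    user_satisfied=True,
)
print(misinterpreted_fields(turn), is_hidden_error(turn))
```

A satisfaction-only metric would score this turn as a success, which is why the satisfaction signal alone can be misleading in tool-augmented dialogues.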

LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation

Authors:Weikang Yuan, Kaisong Song, Zhuoren Jiang, Junjie Cao, Yujie Zhang, Jun Lin, Kun Kuang, Ji Zhang, Xiaozhong Liu

Legal consultation is essential for safeguarding individual rights and ensuring access to justice, yet remains costly and inaccessible to many individuals due to the shortage of professionals. While recent advances in Large Language Models (LLMs) offer a promising path toward scalable, low-cost legal assistance, current systems fall short in handling the interactive and knowledge-intensive nature of real-world consultations. To address these challenges, we introduce LeCoDe, a real-world multi-turn benchmark dataset comprising 3,696 legal consultation dialogues with 110,008 dialogue turns, designed to evaluate and improve LLMs’ legal consultation capability. With LeCoDe, we innovatively collect live-streamed consultations from short-video platforms, providing authentic multi-turn legal consultation dialogues. The rigorous annotation by legal experts further enhances the dataset with professional insights and expertise. Furthermore, we propose a comprehensive evaluation framework that assesses LLMs’ consultation capabilities in terms of (1) clarification capability and (2) professional advice quality. This unified framework incorporates 12 metrics across two dimensions. Through extensive experiments on various general and domain-specific LLMs, our results reveal significant challenges in this task, with even state-of-the-art models like GPT-4 achieving only 39.8% recall for clarification and 59% overall score for advice quality, highlighting the complexity of professional consultation scenarios. Based on these findings, we further explore several strategies to enhance LLMs’ legal consultation abilities. Our benchmark contributes to advancing research in legal domain dialogue systems, particularly in simulating more real-world user-expert interactions.

Paper and project links

PDF

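Summary

Legal consultation is essential for safeguarding individual rights and ensuring access to justice, but cost and a shortage of professionals put it out of reach for many people. Large Language Models (LLMs) offer a promising path toward scalable, low-cost legal assistance, yet current systems fall short in handling the interactive, knowledge-intensive nature of real-world consultations. LeCoDe is a real-world multi-turn benchmark dataset of 3,696 legal consultation dialogues with 110,008 dialogue turns, collected from live-streamed consultations on short-video platforms and annotated by legal experts, designed to evaluate and improve LLMs' legal consultation capability. The accompanying evaluation framework assesses clarification capability and professional advice quality using 12 metrics across these two dimensions. Experiments show that even state-of-the-art models such as GPT-4 reach only 39.8% recall for clarification and a 59% overall score for advice quality. The benchmark helps advance research on legal-domain dialogue systems, particularly in simulating more realistic user-expert interactions.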

Key Takeaways