Published: 2025-06-24

Updated: 2025-07-06

Word count: 2.3k

Reading time: 9 min

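⚠️ All of the summaries below were generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: never use them in serious academic settings. They are only meant for initial screening before you read the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace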

Updated 2025-06-24

Authors: Kai Huang, Jian Zhang, Xiaofei Xie, Chunyang Chen

Large language model (LLM)-based automated program repair (APR) techniques have shown promising results in resolving real-world GitHub issue tasks. Existing APR systems are primarily evaluated in unimodal settings (e.g., SWE-bench). However, these autonomous systems struggle to resolve multimodal problem scenarios (e.g., SWE-bench M) due to limitations in interpreting and leveraging visual information. In multimodal scenarios, LLMs need to rely on visual information in the graphical user interface (GUI) to understand bugs and generate fixes. To bridge this gap, we propose GUIRepair, a cross-modal reasoning approach for resolving multimodal issue scenarios by understanding and capturing visual information. Specifically, GUIRepair integrates two key components, Image2Code and Code2Image, to enhance fault comprehension and patch validation. Image2Code extracts relevant project documents based on the issue report, then applies this domain knowledge to generate the reproduced code responsible for the visual symptoms, effectively translating GUI images into executable context for better fault comprehension. Code2Image replays the visual issue scenario using the reproduced code and captures GUI renderings of the patched program to assess whether the fix visually resolves the issue, providing feedback for patch validation. We evaluate GUIRepair on SWE-bench M, and the approach demonstrates significant effectiveness. When utilizing GPT-4o as the base model, GUIRepair solves 157 instances, outperforming the best open-source baseline by 26 instances. Furthermore, when using o4-mini as the base model, GUIRepair can achieve even better results and solve 175 instances, outperforming the top commercial system by 22 instances. This emphasizes the success of our new perspective on incorporating cross-modal reasoning by understanding and capturing visual information to resolve multimodal issues.

Paper and project links

PDF

Summary
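Large language model (LLM)-based automated program repair (APR) techniques have shown good results on real-world GitHub issue tasks. However, existing APR systems struggle with multimodal issue scenarios because they have difficulty interpreting and leveraging visual information. To address this, the authors propose GUIRepair, a cross-modal reasoning approach for resolving multimodal issue scenarios. By understanding and capturing visual information, it integrates two key components, Image2Code and Code2Image, which strengthen fault comprehension and patch validation, respectively. Evaluation on SWE-bench M shows that GUIRepair is highly effective: with GPT-4o as the base model it solves 157 instances, 26 more than the best open-source baseline. With o4-mini as the base model it performs even better, solving 175 instances, 22 more than the top commercial system.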

Key Takeaways

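Large language models have shown good results in automated program repair, particularly on GitHub issue tasks.
Existing APR systems struggle with multimodal issue scenarios and need to get better at interpreting and using visual information.
GUIRepair is a cross-modal reasoning approach for multimodal issue scenarios that improves fault comprehension and patch validation by understanding and capturing visual information.
GUIRepair integrates two key components: Image2Code extracts relevant information from the project documentation and generates executable reproduction code, while Code2Image replays the issue scenario and captures GUI renderings of the patched program to verify whether the patch works.
Evaluation on SWE-bench M shows that GUIRepair is highly effective.
With GPT-4o as the base model, GUIRepair solves 157 instances (26 more than the best open-source baseline); with o4-mini it solves 175 instances (22 more than the top commercial system).
The new perspective of this approach is to bring cross-modal reasoning into APR, resolving multimodal issues by understanding and capturing visual information.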
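To make the division of labour between the two components concrete, here is a toy Python sketch of an Image2Code then Code2Image validation loop. It is not GUIRepair's implementation, and every name in it (`Patch`, `image2code`, `code2image`, `validate_patches`, `looks_fixed`) is hypothetical. Screenshots are reduced to strings, Image2Code is reduced to keyword matching over documentation snippets (the real component also reads the issue's images and uses an LLM to write reproduction code), and the multimodal model's visual judgement is replaced by a plain predicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Toy stand-in for a GUI screenshot: a textual description of what is rendered.
Screenshot = str


@dataclass
class Patch:
    """Hypothetical candidate patch, modelled as 'reproduction script -> rendered screenshot'."""
    name: str
    render: Callable[[str], Screenshot]


def image2code(issue: str, project_docs: Dict[str, str]) -> str:
    """Hypothetical Image2Code step: keep the documentation snippets whose topic
    appears in the issue text and join them into a toy reproduction script."""
    text = issue.lower()
    relevant = [snippet for topic, snippet in project_docs.items() if topic in text]
    return "\n".join(relevant)


def code2image(patch: Patch, repro_script: str) -> Screenshot:
    """Hypothetical Code2Image step: replay the reproduction script against the
    patched program and capture what it renders."""
    return patch.render(repro_script)


def validate_patches(
    patches: List[Patch],
    repro_script: str,
    looks_fixed: Callable[[Screenshot], bool],
) -> Optional[Patch]:
    """Return the first patch whose rendering the judge accepts, or None.

    `looks_fixed` stands in for the visual judgement a multimodal model would make
    when comparing the new screenshot with the one attached to the issue.
    """
    for patch in patches:
        if looks_fixed(code2image(patch, repro_script)):
            return patch
    return None


if __name__ == "__main__":
    docs = {
        "legend": "chart.legend(position='top')",
        "tooltip": "chart.tooltip(show=True)",
    }
    repro = image2code("The legend overlaps the plot area", docs)
    candidates = [
        Patch("p1", lambda script: "legend overlaps plot"),
        Patch("p2", lambda script: "legend drawn above plot"),
    ]
    chosen = validate_patches(candidates, repro, lambda shot: "overlaps" not in shot)
    print(repro)
    print(chosen.name if chosen else "no patch validated")
```

The point the sketch illustrates is that a candidate is accepted based on what the patched program renders for the reproduced scenario, which is the kind of feedback the abstract attributes to Code2Image.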

Cool Papers

Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting

Authors: Sergei Kholkin, Grigoriy Ksenofontov, David Li, Nikita Kornilov, Nikita Gushchin, Alexandra Suvorikova, Alexey Kroshnin, Evgeny Burnaev, Alexander Korotin

The Iterative Markovian Fitting (IMF) procedure, which iteratively projects onto the space of Markov processes and the reciprocal class, successfully solves the Schrödinger Bridge (SB) problem. However, an efficient practical implementation requires a heuristic modification: alternating between fitting forward and backward time diffusion at each iteration. This modification is crucial for stabilizing training and achieving reliable results in applications such as unpaired domain translation. Our work reveals a close connection between the modified version of IMF and the Iterative Proportional Fitting (IPF) procedure, a foundational method for the SB problem also known as Sinkhorn’s algorithm. Specifically, we demonstrate that the heuristic modification of the IMF effectively integrates both IMF and IPF procedures. We refer to this combined approach as the Iterative Proportional Markovian Fitting (IPMF) procedure. Through theoretical and empirical analysis, we establish the convergence of the IPMF procedure under various settings, contributing to the development of a unified framework for solving SB problems. Moreover, from a practical standpoint, the IPMF procedure enables a flexible trade-off between image similarity and generation quality, offering a new mechanism for tailoring models to specific tasks.
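As background for the abstract above, the two procedures it connects are usually written as follows in the diffusion Schrödinger bridge literature. This is a hedged summary in our own notation, not necessarily the paper's: we assume a reference Brownian motion $W^{\epsilon}$ with volatility $\epsilon$ on $t \in [0,1]$, target marginals $p_0$ and $p_1$, and write $\operatorname{proj}_{\mathcal{R}}$ and $\operatorname{proj}_{\mathcal{M}}$ for the projections onto the reciprocal class of $W^{\epsilon}$ and onto Markov processes.

```latex
% IPF (Sinkhorn) half-steps: alternately enforce the initial and the final marginal
P^{2k+1} = \operatorname*{arg\,min}_{P:\,P_0 = p_0} \mathrm{KL}\left(P \,\|\, P^{2k}\right),
\qquad
P^{2k+2} = \operatorname*{arg\,min}_{P:\,P_1 = p_1} \mathrm{KL}\left(P \,\|\, P^{2k+1}\right)

% IMF, reciprocal projection: keep the endpoint coupling, refill with Brownian bridges
\operatorname{proj}_{\mathcal{R}}(P) = \int W^{\epsilon}_{|x_0,x_1}\, dP_{0,1}(x_0,x_1)

% IMF, forward Markovian projection of a reciprocal process \Pi
dX_t = v(X_t,t)\,dt + \sqrt{\epsilon}\,dW_t,\quad X_0 \sim p_0,
\qquad
v(x,t) = \mathbb{E}_{\Pi}\!\left[\frac{X_1 - X_t}{1-t} \,\middle|\, X_t = x\right]
```

Each IPF half-step pins down one end marginal. The forward Markovian projection above pins $p_0$ by construction, and its time-reversed counterpart started from $X_1 \sim p_1$ pins $p_1$; per the abstract, alternating the two directions is what makes the modified IMF also behave like IPF.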

Paper and project links

PDF

Summary

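The IMF procedure solves the Schrödinger Bridge (SB) problem by iteratively projecting onto the space of Markov processes and the reciprocal class. However, an efficient practical implementation requires a heuristic modification: alternating between fitting forward and backward time diffusions at each iteration. This modification is crucial for stabilizing training and obtaining reliable results in applications such as unpaired domain translation. This work reveals a close connection between the modified IMF and the Iterative Proportional Fitting (IPF) procedure, a foundational method for the SB problem also known as Sinkhorn's algorithm. Specifically, the authors show that the heuristic modification of IMF effectively combines the IMF and IPF procedures, and they call the combined approach the Iterative Proportional Markovian Fitting (IPMF) procedure. Through theoretical and empirical analysis, they establish the convergence of IPMF under various settings, providing a unified framework for solving SB problems. In addition, from a practical standpoint, IPMF enables a flexible trade-off between image similarity and generation quality, offering a new mechanism for tailoring models to specific tasks.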
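Since the paper identifies IPF with Sinkhorn's algorithm, a minimal discrete sketch may help. It assumes finitely many support points and uses the standard fact that, with a Brownian reference of volatility $\epsilon$, the static SB problem is entropic optimal transport with cost $c(x_0, x_1) = \|x_0 - x_1\|^2 / 2$ and regularization $\epsilon$. The function name `sinkhorn` and its parameters are our own; this is plain IPF on a probability matrix, not the paper's IPMF, which works with learned diffusion models.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(mu: List[float], nu: List[float], cost: Matrix,
             eps: float = 1.0, n_iters: int = 500) -> Matrix:
    """Discrete IPF / Sinkhorn iterations for entropic optimal transport.

    Returns a coupling pi[i][j] = u[i] * K[i][j] * v[j] with K = exp(-cost / eps).
    After every sweep the column sums equal nu exactly; the row sums converge to mu.
    """
    n, m = len(mu), len(nu)
    # Gibbs kernel of the reference coupling.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Half-step 1 (KL projection onto couplings with first marginal mu): rescale rows.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Half-step 2 (KL projection onto couplings with second marginal nu): rescale columns.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # Two points on a line with cost |x - y|^2 / 2, as in a static SB with Brownian reference.
    xs = [0.0, 1.0]
    ys = [0.0, 1.0]
    cost = [[(x - y) ** 2 / 2 for y in ys] for x in xs]
    pi = sinkhorn([0.5, 0.5], [0.25, 0.75], cost, eps=0.5)
    for row in pi:
        print([round(p, 4) for p in row])
```

Each loop iteration is one pair of IPF half-steps: the row rescaling enforces the first marginal and the column rescaling the second, which loosely mirrors how the forward and backward fits in IPMF each anchor one end of the bridge. Increasing `eps` pushes the coupling toward the independent product of the marginals.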

Key Takeaways