Published: 2025-03-04

Updated: 2025-05-14

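⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are suitable only for initial screening before reading the papers.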

Update of 2025-03-04

GUIDE: LLM-Driven GUI Generation Decomposition for Automated Prototyping

Authors: Kristian Kolthoff, Felix Kretzer, Christian Bartelt, Alexander Maedche, Simone Paolo Ponzetto

GUI prototyping serves as one of the most valuable techniques for enhancing the elicitation of requirements and facilitating the visualization and refinement of customer needs. While GUI prototyping has a positive impact on the software development process, it simultaneously demands significant effort and resources. The emergence of Large Language Models (LLMs) with their impressive code generation capabilities offers a promising approach for automating GUI prototyping. Despite their potential, there is a gap between current LLM-based prototyping solutions and traditional user-based GUI prototyping approaches which provide visual representations of the GUI prototypes and direct editing functionality. In contrast, LLMs and related generative approaches merely produce text sequences or non-editable image output, which lacks both mentioned aspects and therefore impede supporting GUI prototyping. Moreover, minor changes requested by the user typically lead to an inefficient regeneration of the entire GUI prototype when using LLMs directly. In this work, we propose GUIDE, a novel LLM-driven GUI generation decomposition approach seamlessly integrated into the popular prototyping framework Figma. Our approach initially decomposes high-level GUI descriptions into fine-granular GUI requirements, which are subsequently translated into Material Design GUI prototypes, enabling higher controllability and more efficient adaption of changes. To efficiently conduct prompting-based generation of Material Design GUI prototypes, we propose a retrieval-augmented generation approach to integrate the component library. Our preliminary evaluation demonstrates the effectiveness of GUIDE in bridging the gap between LLM generation capabilities and traditional GUI prototyping workflows, offering a more effective and controlled user-based approach to LLM-driven GUI prototyping. Video: https://youtu.be/C9RbhMxqpTU

Summary

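GUI prototyping is an important technique for improving requirements elicitation and for visualizing and refining customer needs. The emergence of large language models (LLMs) offers a promising route to automating it. However, existing LLM-based prototyping solutions fall short of traditional user-based GUI prototyping, which provides a visual representation of the prototype and direct editing. This paper proposes GUIDE, a novel LLM-driven GUI generation decomposition approach integrated into the popular prototyping framework Figma. GUIDE decomposes high-level GUI descriptions into fine-grained GUI requirements and translates them into Material Design GUI prototypes, which improves controllability and makes changes easier to apply. To integrate the component library, it uses a retrieval-augmented generation approach. A preliminary evaluation indicates that GUIDE bridges the gap between LLM generation capabilities and traditional GUI prototyping workflows, offering a more effective and controlled user-based approach to LLM-driven GUI prototyping. A demo video is linked at the end of the abstract above.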

Key Takeaways

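GUI prototyping is a key technique for eliciting requirements and visualizing customer needs during software development.
Large language models (LLMs) offer a promising way to automate GUI prototyping.
Current LLM-based prototyping solutions lack the visual representation and direct editing that traditional user-based prototyping provides.
The paper proposes GUIDE, a novel LLM-driven GUI generation decomposition approach integrated into Figma.
GUIDE decomposes high-level GUI descriptions into fine-grained requirements and translates them into Material Design GUI prototypes.
GUIDE improves controllability and the adaptation of changes, particularly for minor changes requested by the user.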
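
The retrieval-augmented step can be pictured with a small sketch. It is not GUIDE's implementation: the component library, the `retrieve` and `build_prompt` helpers, and the word-overlap scoring (a stand-in for whatever retriever the authors use) are all hypothetical, and the requirements are written by hand rather than produced by an LLM. It only illustrates the idea of fetching a few relevant components per fine-grained requirement and putting them into that requirement's prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    description: str


# Hypothetical miniature library; not the real Material Design catalogue.
LIBRARY = [
    Component("TextField", "single line text input field for typing email name password"),
    Component("Button", "clickable button to submit confirm or trigger an action"),
    Component("Checkbox", "checkbox toggle to select or remember an option"),
    Component("TopAppBar", "top bar showing the screen title and navigation"),
]

STOPWORDS = {"the", "for", "to", "a", "an", "or", "and"}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return set(words) - STOPWORDS


def retrieve(requirement, library, k=2):
    """Return up to k components ranked by word overlap with the requirement.

    Word overlap is an assumption made for this sketch; an embedding-based
    retriever would slot in at the same place.
    """
    query = tokenize(requirement)
    scored = [(len(query & tokenize(c.name + " " + c.description)), c) for c in library]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [c for _, c in scored[:k]]


def build_prompt(requirement, components):
    """Assemble the text that would be sent to an LLM for one requirement."""
    lines = ["Requirement: " + requirement, "Available components:"]
    lines += [f"- {c.name}: {c.description}" for c in components]
    lines.append("Generate a prototype fragment using only these components.")
    return "\n".join(lines)


# Requirements as they might come out of the decomposition step (hand-written here).
requirements = [
    "Text field for typing the email",
    "Button to submit the login form",
]
prompts = {req: build_prompt(req, retrieve(req, LIBRARY)) for req in requirements}
```

Because each requirement has its own prompt, editing one requirement in this sketch means rebuilding only that entry of `prompts`, which mirrors the motivation for avoiding regeneration of the whole prototype.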

Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation

Authors: Shaharukh Khan, Ayush Tarun, Ali Faraz, Palash Kamble, Vivek Dahiya, Praveen Pokala, Ashish Kulkarni, Chandra Khatri, Abhinav Ravi, Shubham Agarwal

In this work, we provide the system description of our submission as part of the English to Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that effectively integrates a multilingual LLM and a vision module for Multimodal Translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which are projected to the LLM space by an adapter layer, and generates the translation in an autoregressive fashion. We participated in all three tracks (Image Captioning, Text only and Multimodal translation tasks) for Indic languages (i.e., English translation to Hindi, Bengali and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set while remaining competitive for the other languages in the shared task.

Summary

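This paper is the system description of a submission to the English-to-low-resource Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). It introduces Chitranuvad, a multimodal model that integrates a multilingual LLM and a vision module for multimodal translation. A ViT image encoder extracts visual representations as visual token embeddings, an adapter layer projects them into the LLM space, and the translation is generated autoregressively. The authors participated in all three tracks (Image Captioning, Text-only and Multimodal translation) for Indic languages (English to Hindi, Bengali and Malayalam), achieving state-of-the-art results for Hindi in all three on the Challenge set while remaining competitive for the other languages in the shared task.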
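
The projection step can be made concrete with a toy example. This is a sketch under stated assumptions, not Chitranuvad's code: the adapter is modelled as a single linear layer, the visual tokens are placed before the text tokens, the dimensions are tiny, and all values are random. It only shows how patch embeddings of one width become tokens of the LLM's width so that both kinds of token can sit in one input sequence.

```python
import random


def linear(x, weight, bias):
    """Apply y = x W + b to each row of x (x: n x d_in, weight: d_in x d_out)."""
    columns = list(zip(*weight))  # d_out columns, each of length d_in
    return [
        [sum(xi * wi for xi, wi in zip(row, col)) + b for col, b in zip(columns, bias)]
        for row in x
    ]


random.seed(0)
D_VISION, D_LLM = 4, 6  # toy sizes; real models are far wider
NUM_PATCHES, NUM_TEXT_TOKENS = 3, 2

# Stand-ins for the image encoder output and the LLM's text token embeddings.
patch_embeddings = [[random.gauss(0, 1) for _ in range(D_VISION)] for _ in range(NUM_PATCHES)]
text_embeddings = [[random.gauss(0, 1) for _ in range(D_LLM)] for _ in range(NUM_TEXT_TOKENS)]

# Hypothetical adapter parameters (a single linear layer is an assumption).
adapter_weight = [[random.gauss(0, 0.1) for _ in range(D_LLM)] for _ in range(D_VISION)]
adapter_bias = [0.0] * D_LLM

# Project each patch embedding into the LLM embedding space.
visual_tokens = linear(patch_embeddings, adapter_weight, adapter_bias)

# Visual tokens followed by text tokens form the sequence the LLM would decode from.
llm_input = visual_tokens + text_embeddings
```

In this sketch `llm_input` holds five rows of width `D_LLM`; an autoregressive decoder would condition on that sequence and emit the translation one token at a time.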

Key Takeaways