
Few-Shot


⚠️ All summaries below are produced by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never use them in serious academic settings — they are only meant as an initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-11

Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach

Authors:Quang-Dung Nguyen, Tri-Dung Tran, Thanh-Hieu Chu, Hoang-Loc Tran, Xiangwei Cheng, Dirk Slama

The emergence of Software-Defined Vehicles (SDVs) marks a paradigm shift in the automotive industry, where software now plays a pivotal role in defining vehicle functionality, enabling rapid innovation of modern vehicles. Developing SDV-specific applications demands advanced tools to streamline code generation and improve development efficiency. In recent years, general-purpose large language models (LLMs) have demonstrated transformative potential across domains. Still, restricted access to proprietary model architectures hinders their adaptation to specific tasks such as SDV code generation. In this study, we propose using prompts, a common and basic strategy for interacting with LLMs and steering their responses. Using only system prompts, with an appropriate and efficient prompt structure designed through advanced prompt engineering techniques, LLMs can be adapted without requiring a training session or access to their base design. This research conducts extensive experiments on different models, applying various prompting techniques (including bare models) and using a benchmark created specifically to evaluate LLMs’ performance in generating SDV code. The results reveal that the model with a few-shot prompting strategy outperforms the others at steering LLM answers toward the expected outcomes, as measured by quantitative metrics.


Paper and project links

PDF 6 pages, 3 figures

Summary

The emergence of Software-Defined Vehicles (SDVs) marks a paradigm shift in the automotive industry, with software playing a key role in defining vehicle functionality and enabling rapid innovation in modern vehicles. Developing SDV-specific applications requires advanced tools to streamline code generation and improve productivity. This study explores the potential of large language models (LLMs) for SDV code generation through prompting. Using system prompts with an efficient prompt structure built with advanced prompt engineering techniques, LLM responses can be steered without a training session or access to the base design. The study conducts extensive experiments applying various prompting techniques, including bare models, on a benchmark created specifically to evaluate LLM performance in generating SDV code. The results show that the model with a few-shot prompting strategy performs best at aligning LLM answers with the expected outcomes on quantitative metrics.

Key Takeaways

  1. Software-Defined Vehicles (SDVs) represent a new paradigm in the automotive industry, emphasizing the key role of software in defining vehicle functionality.
  2. Developing SDV-specific applications requires advanced tools to make code generation and development more efficient.
  3. Large language models (LLMs) show transformative potential across many domains, but restricted access to proprietary model architectures hinders their application to specific tasks such as SDV code generation.
  4. Prompting is a common, basic strategy for interacting with LLMs and steering their responses.
  5. With system prompts and an efficient prompt structure, LLMs can be leveraged without training or access to their base design.
  6. The study shows that a few-shot prompting strategy performs best for SDV code generation, aligning answers with expectations on quantitative metrics.
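The few-shot prompting idea above can be sketched as a system prompt that embeds worked request/code pairs, so the model is steered without any fine-tuning or access to its internals. The sketch below is illustrative only: the OpenAI-style chat message layout, the vehicle-signal API calls, and the example pairs are assumptions, not the paper's actual prompt structure or benchmark.

```python
# Hypothetical few-shot system prompt for SDV code generation.
# Example pairs and the vehicle-signal API are invented for illustration.
FEW_SHOT_EXAMPLES = [
    ("Read the current vehicle speed.",
     "speed = await vehicle.Speed.get()"),
    ("Set the driver seat position to 50 percent.",
     "await vehicle.Cabin.Seat.Row1.DriverSide.Position.set(50)"),
]

def build_messages(user_request: str) -> list[dict]:
    """Assemble an OpenAI-style chat payload with few-shot examples
    embedded in the system prompt."""
    shots = "\n\n".join(
        f"Request: {req}\nCode:\n{code}" for req, code in FEW_SHOT_EXAMPLES
    )
    system = (
        "You are an SDV application developer. Generate Python code "
        "against a vehicle signal API. Follow the style of these examples:\n\n"
        + shots
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_request},
    ]

messages = build_messages("Turn on the left turn indicator.")
```

The key design point matching the paper's claim: only the system prompt changes between strategies (bare, few-shot, etc.), so different prompting techniques can be compared on the same base model.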


GEMMA-SQL: A Novel Text-to-SQL Model Based on Large Language Models

Authors:Hari Mohan Pandey, Anshul Gupta, Subham Sarkar, Minakshi Tomer, Schneider Johannes, Yan Gong

Text-to-SQL systems enable users to interact with structured databases using natural language, eliminating the need for specialized programming knowledge. In this work, we introduce GEMMA-SQL, a lightweight and efficient text-to-SQL model built upon the open-source Gemma 2B architecture. Unlike many large language models (LLMs), GEMMA-SQL is fine-tuned in a resource-efficient, iterative manner and can be deployed on low-cost hardware. Leveraging the SPIDER benchmark for training and evaluation, GEMMA-SQL combines multiple prompting strategies, including few-shot learning, to enhance SQL query generation accuracy. The instruction-tuned variant, GEMMA-SQL Instruct, achieves 66.8% Test-Suite accuracy and 63.3% Exact Set Match accuracy, outperforming several state-of-the-art baselines such as IRNet, RYANSQL, and CodeXDavinci. The proposed approach demonstrates that effective prompt design and targeted instruction tuning can significantly boost performance while maintaining high scalability and adaptability. These results position GEMMA-SQL as a practical, open-source alternative for robust and accessible text-to-SQL systems.


Paper and project links

PDF

Summary
Text-to-SQL systems let users interact with structured databases in natural language, without specialized programming knowledge. This work introduces GEMMA-SQL, a lightweight, efficient text-to-SQL model built on the open-source Gemma 2B architecture. Unlike many large language models, GEMMA-SQL is fine-tuned in a resource-efficient manner and can be deployed on low-cost hardware. Trained and evaluated on the SPIDER benchmark, GEMMA-SQL combines multiple prompting strategies, including few-shot learning, to improve SQL query generation accuracy. The instruction-tuned variant, GEMMA-SQL Instruct, achieves 66.8% Test-Suite accuracy and 63.3% Exact Set Match accuracy, outperforming several state-of-the-art baselines such as IRNet, RYANSQL, and CodeXDavinci. The study shows that effective prompt design and targeted instruction tuning can significantly boost performance while maintaining high scalability and adaptability, establishing GEMMA-SQL as a practical, open-source alternative for robust and accessible text-to-SQL systems.

Key Takeaways

  1. GEMMA-SQL is a lightweight text-to-SQL model built on the open-source Gemma 2B architecture.
  2. Unlike many large language models, GEMMA-SQL is fine-tuned resource-efficiently and suits deployment on low-cost hardware.
  3. GEMMA-SQL combines multiple prompting strategies, including few-shot learning, to improve SQL query generation accuracy.
  4. GEMMA-SQL Instruct exceeds several state-of-the-art baselines in both Test-Suite accuracy and Exact Set Match accuracy.
  5. Effective prompt design and targeted instruction tuning significantly boost performance.
  6. GEMMA-SQL is a robust, accessible, open-source alternative for practical text-to-SQL systems.
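The few-shot text-to-SQL strategy described above can be illustrated with a simple prompt builder that serializes the database schema and prepends example question/SQL pairs. The schema layout and examples below are invented for illustration; GEMMA-SQL's actual prompt template is not given in the abstract.

```python
# Hypothetical few-shot text-to-SQL prompt assembly, SPIDER-style.
def serialize_schema(tables: dict) -> str:
    """Flatten a {table: [columns]} schema into one prompt line per table."""
    return "\n".join(
        f"Table {name}({', '.join(cols)})" for name, cols in tables.items()
    )

def build_prompt(schema: dict, shots: list, question: str) -> str:
    """Schema first, then (question, gold SQL) demonstrations, then the
    target question with an open 'SQL:' slot for the model to complete."""
    parts = [
        "Translate the question into a SQL query for this schema.",
        serialize_schema(schema),
        "",
    ]
    for q, sql in shots:
        parts += [f"Question: {q}", f"SQL: {sql}", ""]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)

schema = {"singer": ["singer_id", "name", "country", "age"]}
shots = [("How many singers are there?", "SELECT count(*) FROM singer")]
prompt = build_prompt(schema, shots, "List the names of singers from France.")
```

Ending the prompt with a bare `SQL:` lets the model continue the established pattern, which is the usual reason few-shot demonstrations improve exact-match accuracy on benchmarks like SPIDER.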


ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts

Authors:Sangbum Choi, Kyeongryeol Go, Taewoong Jang

Foundation models have revolutionized AI, yet they struggle with zero-shot deployment in real-world industrial settings due to a lack of high-quality, domain-specific datasets. To bridge this gap, Superb AI introduces ZERO, an industry-ready vision foundation model that leverages multi-modal prompting (textual and visual) for generalization without retraining. Trained on a compact yet representative 0.9 million annotated samples from a proprietary billion-scale industrial dataset, ZERO demonstrates competitive performance on academic benchmarks like LVIS-Val and significantly outperforms existing models across 37 diverse industrial datasets. Furthermore, ZERO achieved 2nd place in the CVPR 2025 Object Instance Detection Challenge and 4th place in the Foundational Few-shot Object Detection Challenge, highlighting its practical deployability and generalizability with minimal adaptation and limited data. To the best of our knowledge, ZERO is the first vision foundation model explicitly built for domain-specific, zero-shot industrial applications.


Paper and project links

PDF 9 pages, 2 figures

Summary: Superb AI introduces ZERO, an industry-ready vision foundation model that generalizes across domains without retraining. Leveraging multi-modal prompts (textual and visual), the model achieves competitive performance from a compact training set and significantly outperforms existing models across many industrial datasets. It also took 2nd place in the CVPR 2025 Object Instance Detection Challenge and 4th place in the Foundational Few-shot Object Detection Challenge, demonstrating practical deployability and generalizability with limited data. ZERO is the first vision foundation model built specifically for domain-specific, zero-shot industrial applications.

Key Takeaways

  1. ZERO is a domain-specific vision foundation model suited to zero-shot deployment in industrial settings.
  2. It uses multi-modal prompts (textual and visual) to generalize without retraining.
  3. ZERO performs well from a compact yet representative set of training samples.
  4. It significantly outperforms existing models across many industrial datasets.
  5. ZERO placed highly in CVPR 2025 challenges, validating its practical deployability and generalizability.
  6. ZERO is the first vision foundation model built specifically for domain-specific, zero-shot industrial applications.
  7. Its success illustrates the potential of foundation models in AI, especially for addressing domain-specific dataset gaps.


Scalable Medication Extraction and Discontinuation Identification from Electronic Health Records Using Large Language Models

Authors:Chong Shao, Douglas Snyder, Chiran Li, Bowen Gu, Kerry Ngan, Chun-Ting Yang, Jiageng Wu, Richard Wyss, Kueiyu Joshua Lin, Jie Yang

Identifying medication discontinuations in electronic health records (EHRs) is vital for patient safety but is often hindered by information being buried in unstructured notes. This study aims to evaluate the capabilities of advanced open-sourced and proprietary large language models (LLMs) in extracting medications and classifying their medication status from EHR notes, focusing on their scalability on medication information extraction without human annotation. We collected three EHR datasets from diverse sources to build the evaluation benchmark. We evaluated 12 advanced LLMs and explored multiple LLM prompting strategies. Performance on medication extraction, medication status classification, and their joint task (extraction then classification) was systematically compared across all experiments. We found that LLMs showed promising performance on the medication extraction and discontinuation classification from EHR notes. GPT-4o consistently achieved the highest average F1 scores in all tasks under zero-shot setting - 94.0% for medication extraction, 78.1% for discontinuation classification, and 72.7% for the joint task. Open-sourced models followed closely, Llama-3.1-70B-Instruct achieved the highest performance in medication status classification on the MIV-Med dataset (68.7%) and in the joint task on both the Re-CASI (76.2%) and MIV-Med (60.2%) datasets. Medical-specific LLMs demonstrated lower performance compared to advanced general-domain LLMs. Few-shot learning generally improved performance, while CoT reasoning showed inconsistent gains. LLMs demonstrate strong potential for medication extraction and discontinuation identification on EHR notes, with open-sourced models offering scalable alternatives to proprietary systems and few-shot can further improve LLMs’ capability.


Paper and project links

PDF

Summary

This work evaluates advanced open-sourced and proprietary large language models (LLMs) on extracting medications from electronic health record (EHR) notes and classifying their medication status. The experiments show that LLMs perform well on medication extraction and discontinuation classification, with GPT-4o performing best under the zero-shot setting. Open-sourced models such as Llama also perform strongly, while medical-specific LLMs lag behind. Few-shot learning generally improves performance, whereas gains from CoT reasoning are inconsistent. LLMs show strong potential for medication extraction and discontinuation identification on EHR notes, with open-sourced models offering scalable alternatives to proprietary systems.

Key Takeaways

  1. Large language models (LLMs) can extract medications from electronic health records (EHRs) and classify their status.
  2. GPT-4o performs best on medication extraction and discontinuation classification under the zero-shot setting.
  3. Open-sourced models such as Llama approach proprietary models on specific datasets.
  4. Medical-specific LLMs underperform advanced general-domain LLMs.
  5. Few-shot learning improves LLM performance.
  6. Gains from CoT reasoning are inconsistent.
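The F1 scores reported above compare extracted medication mentions against gold annotations. A common set-based way to compute such scores is sketched below; the exact-match-on-normalized-names rule and the example drug lists are assumptions for illustration, and the study's actual matching criteria may differ.

```python
# Set-based precision/recall/F1 for an extraction task, assuming
# exact match on normalized medication names (illustrative only).
def extraction_f1(predicted: set, gold: set) -> float:
    """F1 over exact-matched medication mentions."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # true positives: matched mentions
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)     # fraction of predictions that are correct
    recall = tp / len(gold)             # fraction of gold mentions recovered
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 2 of 3 predictions match the gold set.
gold = {"metformin", "lisinopril", "atorvastatin"}
pred = {"metformin", "lisinopril", "aspirin"}
score = extraction_f1(pred, gold)  # P = R = 2/3, so F1 = 2/3
```

For the joint task (extraction then classification), the same formula applies with matches defined on (medication, status) pairs instead of bare names, which is why joint scores in the abstract sit below the extraction-only scores.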


LEME: Open Large Language Models for Ophthalmology with Advanced Reasoning and Clinical Validation

Authors:Hyunjae Kim, Xuguang Ai, Sahana Srinivasan, Aidan Gilson, Maxwell B. Singer, Krithi Pushpanathan, Qianqian Xie, Jungwoo Park, Serina Applebaum, Gabriel Dawei Yang, Minjie Zou, David Ziyou Chen, Ke Zou, Soshian Sarrafpour, Ji Liu, Yu Yin, Jimin Huang, Quang Ngoc Nguyen, Erping Long, Peixing Wan, Dianbo Liu, Richard Hintz, W. Jim Zheng, Sophia Y. Wang, Lucila Ohno-Machado, Hua Xu, Ron A. Adelman, Luciano V. Del Priore, Yih-Chung Tham, Qingyu Chen

The rising prevalence of eye diseases poses a growing public health burden. Large language models (LLMs) offer a promising path to reduce documentation workload and support clinical decision-making. However, few have been tailored for ophthalmology, and most evaluations focus mainly on knowledge-based QA without clinically relevant benchmarks or real-world validation. Here, we present LEME, a suite of open-weight LLMs developed through a two-stage process: (1) instruction tuning on 200,000 samples from clinical guidelines, textbooks, and case reports to enhance reasoning and task-following, and (2) reinforcement learning with ~30,000 preference labels to enhance accuracy and informativeness. LEME was evaluated on five curated zero-shot benchmarks spanning tasks such as patient QA, consultation, and treatment planning. It outperformed all seven baselines (all p < 0.004), exceeding GPT-4o by 3.32% (absolute ROUGE-L gain). It was further evaluated on three downstream tasks using deidentified patient data, reviewed by clinicians. In patient QA, LEME received the highest ratings from attending clinicians in 3 out of 4 criteria, with scores of 4.67 for factuality, 4.77 for specificity, 4.79 for completeness, and 4.88 for safety (1-5 scale). Its completeness score surpassed that of expert-written answers (4.79 vs. 4.56; p = 0.015). In visual acuity extraction, LEME achieved the highest F1, outperforming LLaMA-3 by 14.1% and Eye-LLaMA by 59.0%. In a pilot evaluation on assessment and treatment planning for diabetic retinopathy, AMD, and glaucoma, LEME received scores of 4.36 for factuality, 4.55 for specificity, 4.42 for completeness, and 4.36 for safety, approaching attending-level performance. All models, data, and code will be released to support further development and clinical translation, laying the groundwork for improved efficiency and patient care.


Paper and project links

PDF

Summary

Large language models hold great promise in ophthalmology. LEME, developed in two stages of instruction tuning and reinforcement learning, shows excellent performance across multiple benchmarks, particularly in patient QA, consultation, and treatment planning. Its performance on real, deidentified patient data was also rated highly by clinicians. This work lays the groundwork for clinical decision support and efficiency gains in eye care.

Key Takeaways

  • The rising prevalence of eye diseases burdens patients and public health; large language models (LLMs) offer hope for reducing documentation workload and supporting clinical decision-making.
  • LEME was developed in two stages: instruction tuning to strengthen reasoning and task-following, and reinforcement learning to improve accuracy and informativeness.
  • LEME outperforms other models across multiple benchmarks, particularly in patient QA, consultation, and treatment planning.
  • On real patient data, LEME was rated highly by clinicians, especially for factuality, specificity, completeness, and safety.



Author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!