⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: do not rely on these summaries in serious academic settings; use them only as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-20
The Open Source Advantage in Large Language Models (LLMs)
Authors:Jiya Manchanda, Laura Boettcher, Matheus Westphalen, Jasser Jasser
Large language models (LLMs) have rapidly advanced natural language processing, driving significant breakthroughs in tasks such as text generation, machine translation, and domain-specific reasoning. The field now faces a critical dilemma in its approach: closed-source models like GPT-4 deliver state-of-the-art performance but restrict reproducibility, accessibility, and external oversight, while open-source frameworks like LLaMA and Mixtral democratize access, foster collaboration, and support diverse applications, achieving competitive results through techniques like instruction tuning and LoRA. Hybrid approaches address challenges like bias mitigation and resource accessibility by combining the scalability of closed-source systems with the transparency and inclusivity of open-source frameworks. However, in this position paper, we argue that open-source remains the most robust path for advancing LLM research and ethical deployment.
Paper and project links
PDF 9 pages, 1 figure
Summary
Large language models (LLMs) have advanced natural language processing rapidly, delivering major breakthroughs in tasks such as text generation, machine translation, and domain-specific reasoning. The field now faces a key dilemma: closed-source models such as GPT-4 offer excellent performance but limit reproducibility, accessibility, and external oversight, while open-source frameworks such as LLaMA and Mixtral democratize access, foster collaboration, and support diverse applications, achieving competitive results through techniques like instruction tuning and LoRA. Hybrid approaches attempt to combine the scalability of closed systems with the transparency and inclusivity of open-source frameworks, yet this position paper argues that open source remains the most robust path for advancing LLM research and ethical deployment.
Key Takeaways
- LLMs have progressed rapidly in NLP, with major breakthroughs in text generation, machine translation, and domain-specific reasoning.
- Closed-source models such as GPT-4 deliver excellent performance but restrict reproducibility, accessibility, and external oversight.
- Open-source frameworks such as LLaMA and Mixtral democratize access to LLMs, foster collaboration, and support diverse applications.
- Techniques such as instruction tuning and LoRA allow open-source frameworks to achieve competitive results (a minimal LoRA sketch follows this list).
- Hybrid approaches combine the scalability of closed-source systems with the transparency of open-source frameworks, aiming to address issues such as bias mitigation and resource accessibility.
- The paper argues that open source is the most robust path for advancing LLM research and ethical deployment.
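As a concrete illustration of the LoRA technique mentioned above, here is a minimal, self-contained PyTorch sketch of the core idea: a frozen pretrained linear layer augmented with a small trainable low-rank update. The class name, rank, and scaling values are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of the LoRA idea in plain PyTorch: a frozen linear layer
# plus a trainable low-rank update B @ A (rank r << hidden size).
# All names and hyperparameters here are illustrative, not from the paper.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen projection plus the low-rank, trainable correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(512, 512))
x = torch.randn(4, 512)
print(layer(x).shape)                         # torch.Size([4, 512])
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")  # only the A/B factors train
```

Because only the low-rank factors are updated, instruction tuning touches a small fraction of the parameters, which is one reason fine-tuning open-weight models is feasible on modest hardware.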


OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
Authors:Yongqiang Yao, Jingru Tan, Feizhao Zhang, Jiahao Hu, Yazhe Niu, Xin Jin, Bo Li, Pengfei Liu, Ruihao Gong, Dahua Lin, Ningyi Xu
Vision-language instruction-tuning models have recently achieved significant performance improvements. In this work, we discover that large-scale 3D parallel training on those models leads to an imbalanced computation load across different devices. The vision and language parts are inherently heterogeneous: their data distribution and model architecture differ significantly, which affects distributed training efficiency. To address this issue, we rebalance the computational load from data, model, and memory perspectives, achieving more balanced computation across devices. Specifically, for the data, instances are grouped into new balanced mini-batches within and across devices. A search-based method is employed for the model to achieve a more balanced partitioning. For memory optimization, we adaptively adjust the re-computation strategy for each partition to utilize the available memory fully. These three perspectives are not independent but are closely connected, forming an omniverse balanced training framework. Extensive experiments are conducted to validate the effectiveness of our method. Compared with the open-source training code of InternVL-Chat, training time is reduced greatly, achieving about 1.8$\times$ speed-up. Our method’s efficacy and generalizability are further validated across various models and datasets. Codes will be released at https://github.com/ModelTC/OmniBal.
Paper and project links
PDF Accepted in ICML 2025
Summary
Large-scale 3D parallel training of vision-language instruction-tuning models suffers from an imbalanced computational load across devices. To address this, the paper rebalances computation from the data, model, and memory perspectives: instances are grouped into balanced mini-batches, a search-based method produces a more balanced model partitioning, and the re-computation strategy of each partition is adjusted adaptively to optimize memory use. Extensive experiments validate the method; compared with the open-source training code of InternVL-Chat, training time is reduced substantially, with about a 1.8x speed-up. Its effectiveness and generality are further validated across various models and datasets. Code will be released at https://github.com/ModelTC/OmniBal.
Key Takeaways
- Large-scale 3D parallel training of vision-language instruction-tuning models suffers from an imbalanced computational load across devices.
- Differences in data distribution and model architecture between the vision and language parts reduce distributed training efficiency.
- The imbalance is addressed by rebalancing computation from the data, model, and memory perspectives.
- Instances are grouped into balanced mini-batches, and a search-based method yields a more balanced model partitioning (see the sketch after this list).
- The re-computation strategy is adjusted adaptively to make full use of available memory.
- The method achieves a more balanced computational load across devices and improves training efficiency.
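To make the data-side rebalancing idea concrete, here is a hedged sketch of one way to group samples into per-device mini-batches with roughly equal estimated cost, using a generic longest-processing-time heuristic. The cost model (token count plus a fixed per-image cost) and function names are illustrative assumptions; OmniBal's actual grouping algorithm is described in the paper.

```python
# Illustrative sketch of the data-balancing idea only: greedily pack samples
# into per-device mini-batches so the estimated compute cost is roughly equal
# across devices. This is a generic heuristic, not OmniBal's actual algorithm.
import heapq
from typing import Dict, List

def balanced_minibatches(samples: List[Dict], num_devices: int,
                         image_cost: int = 256) -> List[List[Dict]]:
    """Assign samples to `num_devices` bins with near-equal total cost."""
    def cost(s: Dict) -> int:
        # Assumed cost model: text tokens plus a fixed per-image cost.
        return s["num_text_tokens"] + image_cost * s["num_images"]

    # Process the most expensive samples first (longest-processing-time rule).
    ordered = sorted(samples, key=cost, reverse=True)
    heap = [(0, d) for d in range(num_devices)]   # (current load, device id)
    heapq.heapify(heap)
    bins = [[] for _ in range(num_devices)]

    for s in ordered:
        load, d = heapq.heappop(heap)             # lightest device so far
        bins[d].append(s)
        heapq.heappush(heap, (load + cost(s), d))
    return bins

# Example: 8 samples of varying size spread across 2 devices.
data = [{"num_text_tokens": n, "num_images": n % 2}
        for n in (120, 800, 64, 512, 300, 96, 700, 256)]
for device_id, batch in enumerate(balanced_minibatches(data, num_devices=2)):
    total = sum(s["num_text_tokens"] + 256 * s["num_images"] for s in batch)
    print(f"device {device_id}: {len(batch)} samples, estimated cost {total}")
```

Always filling the currently lightest bin with the next most expensive sample keeps per-device loads close, which reflects the balance property the paper targets to reduce idle time in 3D-parallel training.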




