LLM

Published: 2025-10-06

Updated: 2025-11-27

Word count: 9.4k

Reading time: 38 min

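⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are only suitable for initial screening before reading a paper.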

2025-10-06 update

LitterBox+: An Extensible Framework for LLM-enhanced Scratch Static Code Analysis

Authors:Benedikt Fein, Florian Obermüller, Gordon Fraser

Large language models (LLMs) have become an essential tool to support developers using traditional text-based programming languages, but the graphical notation of the block-based Scratch programming environment inhibits the use of LLMs. To overcome this limitation, we propose the LitterBox+ framework that extends the Scratch static code analysis tool LitterBox with the generative abilities of LLMs. By converting block-based code to a textual representation suitable for LLMs, LitterBox+ allows users to query LLMs about their programs, about quality issues reported by LitterBox, and it allows generating code fixes. Besides offering a programmatic API for these functionalities, LitterBox+ also extends the Scratch user interface to make these functionalities available directly in the environment familiar to learners. The framework is designed to be easily extensible with other prompts, LLM providers, and new features combining the program analysis capabilities of LitterBox with the generative features of LLMs. We provide a screencast demonstrating the tool at https://youtu.be/RZ6E0xgrIgQ.

Paper and project links

PDF ASE 2025 Tool Demonstration Track

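Summary

LLMs have become an important tool for developers working in traditional text-based programming languages, but the graphical notation of the Scratch programming environment limits their use. To overcome this limitation, the paper proposes LitterBox+, a framework that extends the Scratch static code analysis tool LitterBox with the generative abilities of LLMs. LitterBox+ converts block-based code into a textual representation suitable for LLMs, so users can query an LLM about their programs and about quality issues reported by LitterBox, and can generate code fixes. Besides a programmatic API for these functions, LitterBox+ extends the Scratch user interface so that they are available in the environment learners already know. The framework is designed to be easily extended with other prompts, LLM providers, and new features that combine LitterBox's program analysis with LLM generation. A screencast demonstrating the tool is available at https://youtu.be/RZ6E0xgrIgQ.

Key Takeaways

LLMs have become an important support tool for development in traditional text-based programming languages.
The graphical notation of the Scratch programming environment limits the use of LLMs.
LitterBox+ extends LitterBox with the generative abilities of LLMs to overcome this limitation.
LitterBox+ converts block-based code into a textual format suitable for LLMs.
Users can query an LLM about their programs and about reported quality issues, and can generate code fixes.
LitterBox+ offers a programmatic API and an extended Scratch user interface for these functions.
The framework is designed to be extensible with other prompts, LLM providers, and new features.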
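
The block-to-text idea can be illustrated with a minimal sketch. The block structure, the function names (`block_to_text`, `script_to_text`, `build_prompt`), and the prompt wording are all hypothetical; they are not LitterBox+'s actual data format or API.

```python
# Hypothetical, simplified block model; not LitterBox+'s real format or API.
def block_to_text(block, indent=0):
    """Render one block (and any nested blocks) as indented pseudo-code lines."""
    pad = "    " * indent
    args = " ".join(str(a) for a in block.get("args", []))
    lines = [f"{pad}{block['opcode']} {args}".rstrip()]
    for child in block.get("body", []):
        lines.extend(block_to_text(child, indent + 1))
    return lines


def script_to_text(blocks):
    """Render a whole script (a list of top-level blocks) as one string."""
    lines = []
    for block in blocks:
        lines.extend(block_to_text(block))
    return "\n".join(lines)


def build_prompt(script_text, issue, question):
    """Combine the textual script, an analysis finding, and a user question."""
    return (
        "The following Scratch script is shown as text:\n"
        f"{script_text}\n\n"
        f"Static analysis finding: {issue}\n"
        f"Question: {question}"
    )


script = [
    {"opcode": "when green flag clicked"},
    {"opcode": "forever", "body": [
        {"opcode": "move", "args": [10, "steps"]},
    ]},
]
text = script_to_text(script)
prompt = build_prompt(text, "forever loop without a stop condition",
                      "How can I fix this?")
print(prompt)
```

The resulting string could then be sent to whichever LLM provider is configured; that call is left out because it depends on the provider.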

Probabilistic Reasoning with LLMs for k-anonymity Estimation

Authors:Jonathan Zheng, Sauvik Das, Alan Ritter, Wei Xu

Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a new numerical reasoning task under uncertainty for large language models, focusing on estimating the privacy risk of user-generated documents containing privacy-sensitive information. We propose BRANCH, a new LLM methodology that estimates the k-privacy value of a text, that is, the size of the population matching the given information. BRANCH factorizes a joint probability distribution of personal information as random variables. The probability of each factor in a population is estimated separately using a Bayesian network and combined to compute the final k-value. Our experiments show that this method successfully estimates the k-value 73% of the time, a 13% increase compared to o3-mini with chain-of-thought reasoning. We also find that LLM uncertainty is a good indicator for accuracy, as high-variance predictions are 37.47% less accurate on average.

Paper and project links

PDF 10 pages, Accepted to NeurIPS 2025

Summary

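The paper presents probabilistic reasoning as a key aspect of both human and artificial intelligence, since it allows decisions to be made under uncertainty and ambiguity. It introduces a new numerical reasoning task for large language models: estimating the privacy risk of user-generated documents that contain privacy-sensitive information. The proposed method, BRANCH, estimates the k-privacy value of a text, that is, the size of the population matching the given information. BRANCH factorizes the joint probability distribution of the personal information, treating each piece of information as a random variable. The probability of each factor in a population is estimated separately using a Bayesian network, and the estimates are combined to compute the final k-value. In experiments the method estimates the k-value successfully 73% of the time, a 13% increase over o3-mini with chain-of-thought reasoning. LLM uncertainty is also a good indicator of accuracy: high-variance predictions are 37.47% less accurate on average.

Key Takeaways

Probabilistic reasoning is a key aspect of how both humans and AI handle uncertainty and ambiguity in decision-making.
The paper introduces a new numerical reasoning task for large language models, focused on estimating the privacy risk of user-generated documents.
BRANCH is a new LLM methodology that estimates the k-privacy value of a text.
BRANCH factorizes the joint probability distribution of personal information, estimates the probability of each factor separately, and combines the estimates to compute the k-value.
In experiments, BRANCH estimates the k-value successfully 73% of the time.
This is a 13% increase over o3-mini with chain-of-thought reasoning.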
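
The arithmetic behind a k-value can be shown with a small worked example. Assuming the joint probability factorizes as a product of conditional probabilities, the estimate is k ≈ N · Π P(x_i | parents(x_i)), where N is the population size. The attributes and numbers below are invented for illustration, and `estimate_k` is a hypothetical helper, not the paper's code.

```python
from math import prod


def estimate_k(population, factor_probs):
    """k = population size times the product of the per-factor probabilities."""
    return population * prod(factor_probs.values())


# Hypothetical numbers, for illustration only.
population = 1_000_000
factors = {
    "lives in city X": 0.1,               # P(city)
    "aged 30-39 | city": 0.2,             # P(age | city)
    "works as a nurse | city, age": 0.01, # P(job | city, age)
}

k = estimate_k(population, factors)
print(round(k))
```

A small k means few people match the disclosed details, so the text carries a higher re-identification risk.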

Forget Forgetting: Continual Learning in a World of Abundant Memory

Authors:Dongkyu Cho, Taesup Moon, Rumi Chunara, Kyunghyun Cho, Sungmin Cha

Continual learning (CL) has traditionally focused on minimizing exemplar memory, a constraint often misaligned with modern systems where GPU time, not storage, is the primary bottleneck. This paper challenges this paradigm by investigating a more realistic regime: one where memory is abundant enough to mitigate forgetting, but full retraining from scratch remains prohibitively expensive. In this practical “middle ground”, we find that the core challenge shifts from stability to plasticity, as models become biased toward prior tasks and struggle to learn new ones. Conversely, improved stability allows simple replay baselines to outperform the state-of-the-art methods at a fraction of the GPU cost. To address this newly surfaced trade-off, we propose Weight Space Consolidation, a lightweight method that combines (1) rank-based parameter resets to restore plasticity with (2) weight averaging to enhance stability. Validated on both class-incremental learning with image classifiers and continual instruction tuning with large language models, our approach outperforms strong baselines while matching the low computational cost of replay, offering a scalable alternative to expensive full-retraining. These findings challenge long-standing CL assumptions and establish a new, cost-efficient baseline for real-world CL systems where exemplar memory is no longer the limiting factor.

Paper and project links

PDF 24 pages, 11 figures

Summary

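Continual learning (CL) has traditionally focused on minimizing exemplar memory, a constraint that fits poorly with modern systems where GPU time, not storage, is the main bottleneck. In the regime where memory is abundant but full retraining from scratch is still too expensive, the core challenge shifts from stability to plasticity: models become biased toward prior tasks and struggle to learn new ones. The paper proposes Weight Space Consolidation, which combines rank-based parameter resets (to restore plasticity) with weight averaging (to enhance stability). The method is validated on class-incremental learning with image classifiers and on continual instruction tuning with large language models, where it outperforms strong baselines at the low computational cost of replay. These findings challenge long-standing CL assumptions and establish a new, cost-efficient baseline for real-world CL systems.

Key Takeaways

Traditional continual learning (CL) focuses on minimizing exemplar memory, which is a poor fit when memory is abundant and GPU time is the bottleneck.
In this regime the main challenge shifts from stability to plasticity, because models become biased toward prior tasks and struggle to learn new ones.
To address this trade-off, the paper proposes Weight Space Consolidation, which combines rank-based parameter resets with weight averaging.
The method is validated on class-incremental learning with image classifiers and on continual instruction tuning with large language models; it outperforms strong baselines while matching the low computational cost of replay.
The findings challenge long-standing CL assumptions and provide a new, cost-efficient baseline for real-world CL systems.
The method offers a scalable alternative to expensive full retraining.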
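
The two ingredients named above can be sketched on a flat list of weights. The summary does not say how parameters are ranked or what they are reset to, so the importance score (absolute value), the reset target (a reference copy), and both function names are assumptions made for illustration only.

```python
def rank_based_reset(weights, reference, scores, reset_fraction):
    """Reset the lowest-scoring fraction of parameters to reference values.

    Assumption: `scores` is some per-parameter importance measure.
    """
    n_reset = int(len(weights) * reset_fraction)
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    to_reset = set(order[:n_reset])  # indices of the least important weights
    return [reference[i] if i in to_reset else w
            for i, w in enumerate(weights)]


def running_average(avg, weights, step):
    """Update a running mean that already covers `step` earlier snapshots."""
    return [(a * step + w) / (step + 1) for a, w in zip(avg, weights)]


weights = [0.9, -0.2, 0.05, 1.5]
reference = [0.0, 0.0, 0.0, 0.0]          # hypothetical reset target
scores = [abs(w) for w in weights]        # assumed importance score

reset = rank_based_reset(weights, reference, scores, reset_fraction=0.5)
averaged = running_average(weights, reset, step=1)
print(reset, averaged)
```

The reset step frees some parameters for new tasks, while the average smooths the trajectory of the weights; how the paper balances the two is not described in the summary.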

Adapting Large Language Models for Character-based Augmentative and Alternative Communication

Authors:Dylan Gaines, Keith Vertanen

Users of Augmentative and Alternative Communication (AAC) may write letter-by-letter via an interface that uses a character language model. However, most state-of-the-art large pretrained language models predict subword tokens of variable length. We investigate how to practically use such models to make accurate and efficient character predictions. Our algorithm for producing character predictions from a subword large language model (LLM) provides more accurate predictions than using a classification layer, a byte-level LLM, or an n-gram model. Additionally, we investigate a domain adaptation procedure based on a large dataset of sentences we curated based on scoring how useful each sentence might be for spoken or written AAC communication. We find our procedure further improves model performance on simple, conversational text.

Paper and project links

PDF To appear in Findings of EMNLP 2025

Summary

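Practical methods for making character predictions with large pretrained language models matter for users of Augmentative and Alternative Communication (AAC), who may write letter by letter. The study presents an algorithm that produces character predictions from a subword large language model (LLM) and is more accurate than using a classification layer, a byte-level LLM, or an n-gram model. It also investigates a domain adaptation procedure based on a large dataset of sentences, curated by scoring how useful each sentence might be for spoken or written AAC communication. This procedure further improves model performance on simple, conversational text.

Key Takeaways

Large pretrained language models can be used for character prediction for AAC users, even though most of them predict variable-length subword tokens.
The proposed algorithm produces character predictions from a subword large language model (LLM).
It is more accurate than a classification layer, a byte-level LLM, or an n-gram model.
Domain adaptation on a large curated sentence dataset further improves performance on simple, conversational text.
The dataset was curated by scoring how useful each sentence might be for spoken or written AAC communication.
The study focuses on the practical use of large language models for AAC text entry.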
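
One simple way to turn subword probabilities into character probabilities is to marginalize over every token that is consistent with what the user has typed. The sketch below does this for a single step; it ignores continuations that span several tokens, and the token probabilities are made up. It is an illustration of the general idea, not the authors' algorithm.

```python
from collections import defaultdict


def next_char_distribution(token_probs, typed_prefix):
    """Sum the probability of each token onto the character it would add next."""
    char_mass = defaultdict(float)
    for token, p in token_probs.items():
        # Keep tokens that extend the prefix by at least one character.
        if token.startswith(typed_prefix) and len(token) > len(typed_prefix):
            char_mass[token[len(typed_prefix)]] += p
    total = sum(char_mass.values())
    if total == 0:
        return {}
    return {c: m / total for c, m in char_mass.items()}


# Hypothetical next-token distribution from a subword model.
token_probs = {"the": 0.4, "then": 0.2, "this": 0.2, "cat": 0.2}
dist = next_char_distribution(token_probs, "th")
print(dist)
```

Here "the" and "then" both vote for the letter e and "this" votes for i, so e receives three quarters of the renormalized mass.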

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Authors:Xi Ye, Fangcong Yin, Yinghui He, Joie Zhang, Howard Yen, Tianyu Gao, Greg Durrett, Danqi Chen

Existing benchmarks for evaluating long-context language models (LCLMs) primarily focus on long-context recall, requiring models to produce short responses based on a few critical snippets while processing thousands of irrelevant tokens. We introduce LongProc (Long Procedural Generation), a new benchmark that requires both the integration of highly dispersed information and long-form generation. LongProc consists of six diverse procedural generation tasks, such as extracting structured information from HTML pages into a TSV format and executing complex search procedures to create travel plans. These tasks challenge LCLMs by testing their ability to follow detailed procedural instructions, synthesize and reason over dispersed information, and generate structured, long-form outputs (up to 8K tokens). Furthermore, as these tasks adhere to deterministic procedures and yield structured outputs, they enable reliable rule-based evaluation. We evaluated 23 LCLMs, including instruction-tuned models and recent reasoning models, on LongProc at three difficulty levels, with the maximum number of output tokens set at 500, 2K, and 8K. Notably, while all tested models claim a context window size above 32K tokens, open-weight models typically falter on 2K-token tasks, and closed-source models like GPT-4o show significant degradation on 8K-token tasks. Reasoning models achieve stronger overall performance in long-form generation, benefiting from long CoT training. Further analysis reveals that LCLMs struggle to maintain long-range coherence in long-form generations. These findings highlight critical limitations in current LCLMs and suggest substantial room for improvement. Data and code available at: https://princeton-pli.github.io/LongProc.

Paper and project links

PDF COLM 2025. Data and code available at: https://princeton-pli.github.io/LongProc

Summary

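The paper introduces LongProc, a new benchmark for long-context language models (LCLMs) that requires both integrating highly dispersed information and long-form generation. LongProc consists of six diverse procedural generation tasks that test a model's ability to follow detailed procedural instructions, synthesize and reason over dispersed information, and generate structured long-form outputs of up to 8K tokens. Because the tasks follow deterministic procedures and yield structured outputs, they can be evaluated reliably with rules. In an evaluation of 23 LCLMs at output budgets of 500, 2K, and 8K tokens, all models claim a context window above 32K tokens, yet open-weight models typically falter on 2K-token tasks and closed-source models such as GPT-4o degrade significantly on 8K-token tasks. Reasoning models achieve stronger overall performance in long-form generation, benefiting from long CoT training. The findings point to substantial room for improvement in current LCLMs. Data and code are available at https://princeton-pli.github.io/LongProc.

Key Takeaways

LongProc is a new benchmark for long-context language models (LCLMs) that covers six diverse procedural generation tasks.
Its tasks require integrating highly dispersed information and generating long-form output, testing instruction following, information synthesis, and reasoning.
Current LCLMs show clear limitations in long-form generation, and results vary with task difficulty.
Open-weight models typically falter on 2K-token tasks, while closed-source models such as GPT-4o degrade significantly on 8K-token tasks.
All tested models claim a context window above 32K tokens; reasoning models perform better overall in long-form generation, benefiting from long CoT training.
Further analysis shows that LCLMs struggle to maintain long-range coherence in long-form generations.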
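
Rule-based evaluation of structured output can be as simple as comparing rows. The sketch below scores a predicted TSV against a gold TSV with row-level F1; the metric choice and the function names are assumptions for illustration and are not taken from the LongProc scoring code.

```python
def parse_tsv(text):
    """Turn TSV text into a set of rows, each row a tuple of stripped cells."""
    return {
        tuple(cell.strip() for cell in line.split("\t"))
        for line in text.strip().splitlines()
        if line.strip()
    }


def row_f1(predicted, gold):
    """Row-level F1 between predicted and gold TSV text."""
    pred_rows, gold_rows = parse_tsv(predicted), parse_tsv(gold)
    true_positives = len(pred_rows & gold_rows)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_rows)
    recall = true_positives / len(gold_rows)
    return 2 * precision * recall / (precision + recall)


gold = "Alice\t30\nBob\t25\nCara\t41"
pred = "Alice\t30\nBob\t26\nCara\t41"   # one wrong cell in the second row
score = row_f1(pred, gold)
print(score)
```

Because the expected rows are fully determined by the input, a check like this needs no human or model judge.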

CART: Compositional Auto-Regressive Transformer for Image Generation

Authors:Siddharth Roheda, Rohit Chowdhury, Aniruddha Bala, Rohan Jaiswal

We propose a novel Auto-Regressive (AR) image generation approach that models images as hierarchical compositions of interpretable visual layers. While AR models have achieved transformative success in language modeling, replicating this success in vision tasks has presented unique challenges due to inherent spatial dependencies in images. Addressing the unique challenges of vision tasks, our method (CART) adds image details iteratively via semantically meaningful decompositions. We demonstrate the flexibility and generality of CART by applying it across three distinct decomposition strategies: (i) Base-Detail Decomposition (Mumford-Shah smoothness), (ii) Intrinsic Decomposition (albedo/shading), and (iii) Specularity Decomposition (diffuse/specular). This “next-detail” strategy outperforms traditional “next-token” and “next-scale” approaches, improving controllability, semantic interpretability, and resolution scalability. Experiments show CART generates visually compelling results while enabling structured image manipulation, opening new directions for controllable generative modeling via physically or perceptually motivated image factorization.

Paper and project links

PDF figures compressed to meet arxiv size limit

Summary

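The paper proposes a novel Auto-Regressive (AR) image generation approach that models images as hierarchical compositions of interpretable visual layers. AR models have achieved transformative success in language modeling, but replicating that success in vision tasks is difficult because of the inherent spatial dependencies in images. The method, CART, adds image details iteratively via semantically meaningful decompositions, and its flexibility and generality are shown across three distinct decomposition strategies. Experiments show that CART generates visually compelling results while enabling structured image manipulation, opening new directions for controllable generative modeling.

Key Takeaways

The paper proposes an AR image generation approach that models images as hierarchical compositions of interpretable visual layers.
AR models face unique challenges in vision tasks because of the inherent spatial dependencies in images.
CART adds image details iteratively via semantically meaningful decompositions.
CART is applied with three decomposition strategies: Base-Detail Decomposition (Mumford-Shah smoothness), Intrinsic Decomposition (albedo/shading), and Specularity Decomposition (diffuse/specular).
The "next-detail" strategy outperforms traditional "next-token" and "next-scale" approaches, improving controllability, semantic interpretability, and resolution scalability.
Experiments show that CART generates visually compelling results and enables structured image manipulation.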
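
The notion of a base layer plus successively added detail layers can be illustrated on a one-dimensional signal. The sketch below uses a box filter as a stand-in smoother (the paper's base-detail variant refers to Mumford-Shah smoothness, which is not implemented here) and contains no generative model; it only shows that the layers sum back to the original.

```python
def smooth(signal, radius):
    """Box-filter smoothing with windows clipped at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, radii):
    """Split a signal into a base layer and detail layers, coarsest first."""
    details = []
    current = signal
    for radius in radii:
        base = smooth(current, radius)
        details.append([c - b for c, b in zip(current, base)])
        current = base
    return current, details[::-1]


def compose(base, details):
    """Add detail layers to the base one at a time."""
    image = list(base)
    for layer in details:
        image = [x + y for x, y in zip(image, layer)]
    return image


signal = [0.0, 1.0, 4.0, 2.0, 3.0, 5.0]
base, details = decompose(signal, radii=[1, 2])
rebuilt = compose(base, details)
print(rebuilt)
```

Calling `compose(base, details[:1])` gives an intermediate, coarser result, which is the kind of partial output a next-detail generator would refine step by step.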

PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context

Authors:Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, Barbara De Salvo

Following their success in natural language processing (NLP), there has been a shift towards transformer models in computer vision. While transformers perform well and offer promising multi-tasking performance, due to their high compute requirements, many resource-constrained applications still rely on convolutional or hybrid models that combine the benefits of convolution and attention layers and achieve the best results in the sub 100M parameter range. Simultaneously, task adaptation techniques that allow for the use of one shared transformer backbone for multiple downstream tasks, resulting in great storage savings at negligible cost in performance, have not yet been adopted for hybrid transformers. In this work, we investigate how to achieve the best task-adaptation performance and introduce PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. We further combine PETAH adaptation with pruning to achieve highly performant and storage friendly models for multi-tasking. In our extensive evaluation on classification and other vision tasks, we demonstrate that our PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and being more efficient on mobile hardware.

Paper and project links

PDF Published in CVPRW 2025

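Summary

Following their success in natural language processing (NLP), transformer models are increasingly used in computer vision. Transformers perform well and offer promising multi-tasking performance, but because of their high compute requirements many resource-constrained applications still rely on convolutional or hybrid models. The paper investigates how to achieve the best task-adaptation performance and introduces PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. PETAH adaptation is further combined with pruning to obtain highly performant, storage-friendly models for multi-tasking. In an extensive evaluation on classification and other vision tasks, PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and running more efficiently on mobile hardware.

Key Takeaways

Transformer models are increasingly used in computer vision.
Transformers offer good multi-tasking performance, but their high compute requirements mean that resource-constrained applications still rely on convolutional or hybrid models.
PETAH aims to achieve the best task-adaptation performance for hybrid transformers.
Combining PETAH with pruning yields highly performant and storage-friendly models.
On classification and other vision tasks, PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs.
PETAH-adapted hybrid models require fewer parameters and are more efficient on mobile hardware.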
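
The storage argument for sharing one backbone across tasks is simple arithmetic. The parameter counts below are hypothetical and do not come from the paper; they only show how the saving scales with the number of tasks.

```python
def storage_mb(num_params, bytes_per_param=4):
    """Storage in megabytes for parameters stored as 32-bit floats."""
    return num_params * bytes_per_param / 1e6


# Hypothetical sizes, chosen only for illustration.
backbone_params = 50_000_000   # shared backbone
adapter_params = 500_000       # extra parameters per task
num_tasks = 10

# One fully fine-tuned copy of the model per task.
full_finetune_mb = num_tasks * storage_mb(backbone_params)

# One shared backbone plus a small set of task-specific parameters per task.
shared_mb = storage_mb(backbone_params) + num_tasks * storage_mb(adapter_params)

print(full_finetune_mb, shared_mb)
```

Under these assumed numbers the shared setup needs roughly a ninth of the storage, and the gap widens as more tasks are added.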

M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic algorithms, Probabilistic methods and GPT Models in any Progression and Time Signature

Authors:Jakub Poćwiardowski, Mateusz Modrzejewski, Marek S. Tatara

This work introduces the M6(GPT)3 composer system, capable of generating complete, multi-minute musical compositions with complex structures in any time signature, in the MIDI domain from input descriptions in natural language. The system utilizes an autoregressive transformer language model to map natural language prompts to composition parameters in JSON format. The defined structure includes time signature, scales, chord progressions, and valence-arousal values, from which accompaniment, melody, bass, motif, and percussion tracks are created. We propose a genetic algorithm for the generation of melodic elements. The algorithm incorporates mutations with musical significance and a fitness function based on normal distribution and predefined musical feature values. The values adaptively evolve, influenced by emotional parameters and distinct playing styles. The system for generating percussion in any time signature utilises probabilistic methods, including Markov chains. Through both human and objective evaluations, we demonstrate that our music generation approach outperforms baselines on specific, musically meaningful metrics, offering a viable alternative to purely neural network-based systems.

Paper and project links

PDF Published in 2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

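Summary

This work introduces the M6(GPT)3 composer system, which generates complete, multi-minute musical compositions with complex structures in any time signature, in the MIDI domain, from natural language descriptions. The system uses an autoregressive transformer language model to map natural language prompts to composition parameters in JSON format. Melodic elements are generated with a genetic algorithm that combines musically meaningful mutations with a fitness function based on the normal distribution and predefined musical feature values. Percussion in any time signature is generated with probabilistic methods, including Markov chains. Human and objective evaluations show that the approach outperforms baselines on specific, musically meaningful metrics, making it a viable alternative to purely neural network-based systems.

Key Takeaways

The M6(GPT)3 composer system generates complete compositions with complex structures from natural language descriptions.
An autoregressive transformer language model maps natural language prompts to composition parameters in JSON format.
Melodic elements are generated by a genetic algorithm that combines musically meaningful mutations with a fitness function.
The system can generate compositions in any time signature.
Percussion is generated with probabilistic methods, including Markov chains.
The approach outperforms baselines on specific, musically meaningful metrics.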
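
A genetic algorithm with a normal-distribution-shaped fitness function can be sketched in a few lines. The musical feature (mean interval between consecutive notes), the target value, the mutation, and the selection scheme are all assumptions chosen for illustration; the paper's actual features and operators are not described in the summary.

```python
import math
import random

TARGET_MEAN_INTERVAL = 2.0  # hypothetical target feature value
SIGMA = 1.0                 # hypothetical tolerance around the target


def mean_interval(melody):
    """Average absolute step between consecutive notes (scale degrees)."""
    steps = [abs(b - a) for a, b in zip(melody, melody[1:])]
    return sum(steps) / len(steps)


def fitness(melody):
    """Gaussian-shaped score that equals 1.0 when the feature hits the target."""
    diff = mean_interval(melody) - TARGET_MEAN_INTERVAL
    return math.exp(-(diff ** 2) / (2 * SIGMA ** 2))


def mutate(melody, rng):
    """Shift one note by one or two scale steps (an illustrative mutation)."""
    child = list(melody)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-2, -1, 1, 2])
    return child


def evolve(population, generations, rng):
    """Keep the better half each generation and refill with mutated copies."""
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: len(population) // 2]
        population = parents + [mutate(p, rng) for p in parents]
    return max(population, key=fitness)


rng = random.Random(0)
start = [[rng.randrange(0, 8) for _ in range(8)] for _ in range(10)]
best_start = max(fitness(m) for m in start)
best = evolve([list(m) for m in start], generations=30, rng=rng)
print(best, fitness(best))
```

Because the best half always survives, the top fitness can never drop from one generation to the next.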

StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training

Authors:Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Kai Wang, Xuanlei Zhao, James Demmel, Yang You

Training Transformer models on long sequences in a distributed setting poses significant challenges in terms of efficiency and scalability. Current methods are either constrained by the number of attention heads or excessive communication overheads. To address this problem, we propose StarTrail, a multi-dimensional concentric distributed training system for long sequences, fostering an efficient communication paradigm and providing additional tuning flexibility for communication arrangements. Specifically, StarTrail introduces an extra parallel dimension and divides the peer-to-peer communication into sub-rings to substantially reduce communication volume and avoid bandwidth bottlenecks. Through comprehensive experiments across diverse hardware environments and on both Natural Language Processing (NLP) and Computer Vision (CV) tasks, we demonstrate that our approach significantly surpasses state-of-the-art methods that support Long sequence lengths, achieving performance improvements of up to 77.12% on GPT-style models and up to 114.33% on DiT (Diffusion Transformer) models without affecting the computations results.

Paper and project links

PDF

Summary

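Training Transformer models on long sequences in a distributed setting is challenging in terms of efficiency and scalability. To address this, the paper proposes StarTrail, a multi-dimensional concentric distributed training system for long sequences that fosters an efficient communication paradigm and provides additional tuning flexibility for communication arrangements. StarTrail introduces an extra parallel dimension and divides the peer-to-peer communication into sub-rings, which substantially reduces communication volume and avoids bandwidth bottlenecks. Comprehensive experiments across diverse hardware environments, on both Natural Language Processing (NLP) and Computer Vision (CV) tasks, show that StarTrail significantly surpasses state-of-the-art methods that support long sequence lengths, with performance improvements of up to 77.12% on GPT-style models and up to 114.33% on DiT (Diffusion Transformer) models, without affecting the computation results.

Key Takeaways

Training Transformer models on long sequences in a distributed setting raises efficiency and scalability problems.
Current methods are constrained either by the number of attention heads or by excessive communication overhead.
StarTrail addresses this with a multi-dimensional concentric distributed training system.
StarTrail introduces an extra parallel dimension and divides peer-to-peer communication into sub-rings to reduce communication volume and avoid bandwidth bottlenecks.
Its communication paradigm provides additional tuning flexibility for communication arrangements.
Experiments show improvements over state-of-the-art long-sequence methods of up to 77.12% on GPT-style models and up to 114.33% on DiT models.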
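
Splitting one communication ring into several smaller ones can be pictured with a toy rank layout. The grouping below (contiguous ranks per sub-ring) and the helper names are assumptions; this is not StarTrail's actual device placement, and it does not model the extra communication a real scheme needs between sub-rings.

```python
def sub_rings(world_size, num_rings):
    """Partition ranks 0..world_size-1 into equally sized contiguous rings."""
    if world_size % num_rings != 0:
        raise ValueError("world_size must be divisible by num_rings")
    size = world_size // num_rings
    return [list(range(r * size, (r + 1) * size)) for r in range(num_rings)]


def ring_neighbors(ring, rank):
    """Return the (previous, next) peers of a rank inside its ring."""
    i = ring.index(rank)
    return ring[i - 1], ring[(i + 1) % len(ring)]


rings = sub_rings(world_size=8, num_rings=2)
prev_rank, next_rank = ring_neighbors(rings[1], 4)

# Peer-to-peer steps for a block to travel once around a ring.
steps_single_ring = 8 - 1
steps_per_sub_ring = len(rings[0]) - 1
print(rings, prev_rank, next_rank, steps_single_ring, steps_per_sub_ring)
```

With eight ranks, a single ring needs seven hops per pass while each four-rank sub-ring needs three, which is the intuition behind shortening the peer-to-peer path.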

PaECTER: Patent-level Representation Learning using Citation-informed Transformers

Authors:Mainak Ghosh, Michael E. Rose, Sebastian Erhardt, Erik Buunk, Dietmar Harhoff

PaECTER is an open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain. More specifically, our model outperforms the patent specific pre-trained language model (BERT for Patents) and general-purpose text embedding models (e.g., E5, GTE, and BGE) on our patent citation prediction test dataset on different rank evaluation metrics. PaECTER predicts at least one most similar patent at a rank of 1.32 on average when compared against 25 irrelevant patents. Numerical representations generated by PaECTER from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant in the context of prior art search for both inventors and patent examiners.

Paper and project links

PDF 8 pages, 3 figures, 4 tables

Summary

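PaECTER is an open-source document-level encoder specific to patents. It is obtained by fine-tuning BERT for Patents with examiner-added citation information to generate numerical representations of patent documents. On similarity tasks, PaECTER performs better than the current state-of-the-art models used in the patent domain. On the authors' patent citation prediction test dataset, it outperforms the patent-specific pre-trained language model (BERT for Patents) and general-purpose text embedding models (e.g., E5, GTE, and BGE) across different rank evaluation metrics. When compared against 25 irrelevant patents, PaECTER predicts at least one most similar patent at a rank of 1.32 on average. The numerical representations that PaECTER generates from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant for prior art search by inventors and patent examiners.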
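
A rank-based evaluation of this kind can be sketched with cosine similarity over embedding vectors. The two-dimensional vectors and the identifiers below are invented; a real setup would use the encoder's embeddings and many more candidates, and averaging the returned rank over many queries would give a mean-rank figure comparable to the one quoted above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_of_first_relevant(query, candidates, relevant_ids):
    """Rank (1-based) of the best-placed relevant candidate, or None."""
    ranked = sorted(candidates,
                    key=lambda cid: cosine(query, candidates[cid]),
                    reverse=True)
    for position, cid in enumerate(ranked, start=1):
        if cid in relevant_ids:
            return position
    return None


# Hypothetical embeddings: one cited patent among unrelated ones.
query = [1.0, 0.0]
candidates = {
    "cited": [0.9, 0.1],
    "other_a": [0.0, 1.0],
    "other_b": [1.0, 0.05],
}
rank = rank_of_first_relevant(query, candidates, relevant_ids={"cited"})
print(rank)
```

In this toy case one unrelated vector happens to sit closer to the query than the cited one, so the cited patent lands at rank 2.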

Key Takeaways