LLM

Published: 2025-11-09

Updated: 2025-11-27

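⚠️ All summaries below were generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: do not use these summaries in serious academic settings. They are only suitable for a first-pass screening before reading the papers.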

Updated 2025-11-09

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Authors: Tianyu Yu, Haoye Zhang, Qiming Li, Qixin Xu, Yuan Yao, Da Chen, Xiaoman Lu, Ganqu Cui, Yunkai Dang, Taiwen He, Xiaocheng Feng, Jun Song, Bo Zheng, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge about how to build high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation for preference learning and self-feedback guidance for inference-time scaling. Extensive experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models at both preference learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Remarkably, RLAIF-V 12B further reveals the self-alignment potential of open-source MLLMs, where the model can learn from feedback of itself to achieve super GPT-4V trustworthiness.

Paper and project links

PDF Project Website: https://github.com/RLHF-V/RLAIF-V

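Summary

This paper introduces RLAIF-V, a framework that aligns multimodal large language models (MLLMs) in a fully open-source paradigm. RLAIF-V explores open-source MLLMs from two angles: generating high-quality feedback data for preference learning, and self-feedback guidance for inference-time scaling. Experiments on six benchmarks, with both automatic and human evaluation, show that RLAIF-V substantially improves model trustworthiness at both preference learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Notably, RLAIF-V 12B reveals the self-alignment potential of open-source MLLMs: the model can learn from its own feedback and reach trustworthiness beyond GPT-4V.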

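Key Takeaways

RLAIF-V is a new framework for aligning multimodal large language models (MLLMs) in a fully open-source paradigm.
The framework explores open-source MLLMs from two angles: high-quality feedback data generation and self-feedback guidance.
For preference learning, RLAIF-V improves trustworthiness by generating high-quality feedback data.
At inference time, RLAIF-V improves trustworthiness through self-feedback guidance.
RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%.
RLAIF-V 12B shows the self-alignment potential of open-source MLLMs: the model can learn from its own feedback.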
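
The two uses of open-source feedback described for RLAIF-V can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RLAIF-V implementation: `score_response` is a hypothetical stand-in for an open-source MLLM acting as a judge, and here it only counts words from an assumed set of objects that are not in the image. The first function turns scored candidates into (chosen, rejected) pairs for preference learning; the second reuses the same scorer for best-of-N selection at inference time.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical: objects assumed NOT to be in the image (toy stand-in for a judge model).
HALLUCINATED = {"dog", "umbrella"}


def score_response(response: str) -> float:
    """Toy trustworthiness score: fewer unsupported objects gives a higher score."""
    words = response.lower().replace(".", "").split()
    return -float(sum(w in HALLUCINATED for w in words))


def build_preference_pairs(
    candidates: List[str],
    scorer: Callable[[str], float],
    min_gap: float = 1.0,
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs whose scores differ by at least min_gap."""
    pairs = []
    for a, b in combinations(candidates, 2):
        sa, sb = scorer(a), scorer(b)
        if abs(sa - sb) >= min_gap:
            pairs.append((a, b) if sa > sb else (b, a))
    return pairs


def best_of_n(candidates: List[str], scorer: Callable[[str], float]) -> str:
    """Inference-time selection: keep the candidate the scorer trusts most."""
    return max(candidates, key=scorer)


candidates = [
    "A cat sits on a sofa.",
    "A cat and a dog sit on a sofa.",
    "A cat holds an umbrella next to a dog.",
]
pairs = build_preference_pairs(candidates, score_response)
best = best_of_n(candidates, score_response)
```

In practice the scorer would be a model call and the pairs would feed a preference-optimization objective; both are outside the scope of this sketch.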

SST: Multi-Scale Hybrid Mamba-Transformer Experts for Time Series Forecasting

Authors: Xiongxiao Xu, Canyu Chen, Yueqing Liang, Baixiang Huang, Guangji Bai, Liang Zhao, Kai Shu

Time series forecasting has made significant advances, including with Transformer-based models. The attention mechanism in Transformer effectively captures temporal dependencies by attending to all past inputs simultaneously. However, its quadratic complexity with respect to sequence length limits the scalability for long-range modeling. Recent state space models (SSMs) such as Mamba offer a promising alternative by achieving linear complexity without attention. Yet, Mamba compresses historical information into a fixed-size latent state, potentially causing information loss and limiting representational effectiveness. This raises a key research question: Can we design a hybrid Mamba-Transformer architecture that is both effective and efficient for time series forecasting? To address it, we adapt a hybrid Mamba-Transformer architecture Mambaformer, originally proposed for language modeling, to the time series domain. Preliminary experiments reveal that naively stacking Mamba and Transformer layers in Mambaformer is suboptimal for time series forecasting, due to an information interference problem. To mitigate this issue, we introduce a new time series decomposition strategy that separates time series into long-range patterns and short-range variations. Then we show that Mamba excels at capturing long-term structures, while Transformer is more effective at modeling short-term dynamics. Building on this insight, we propose State Space Transformer (SST), a multi-scale hybrid model with expert modules: a Mamba expert for long-range patterns and a Transformer expert for short-term variations. SST also employs a multi-scale patching mechanism to adaptively adjust time series resolution: low resolution for long-term patterns and high resolution for short-term variations. Experiments show that SST obtains SOTA performance with linear scalability. The code is at https://github.com/XiongxiaoXu/SST.

Paper and project links

PDF CIKM 2025

Summary

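This paper studies hybrid models for time series forecasting. Transformer-based models capture temporal dependencies well, but their quadratic complexity in sequence length limits scalability for long-range modeling. Recent state space models (SSMs) such as Mamba achieve linear complexity without attention, yet Mamba compresses historical information into a fixed-size latent state, which can lose information and limit representational power. The authors adapt Mambaformer, a hybrid Mamba-Transformer architecture originally proposed for language modeling, to the time series domain. Preliminary experiments show that naively stacking Mamba and Transformer layers is suboptimal for time series forecasting because of an information interference problem. To mitigate this, they introduce a time series decomposition strategy that separates a series into long-range patterns and short-range variations. Building on this, they propose State Space Transformer (SST), a multi-scale hybrid model with a Mamba expert for long-range patterns and a Transformer expert for short-range variations. SST also uses multi-scale patching to adjust time series resolution: low resolution for long-term patterns and high resolution for short-term variations. Experiments show that SST achieves state-of-the-art performance with linear scalability. The code is publicly available at https://github.com/XiongxiaoXu/SST.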
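
The complexity contrast between attention and state space models can be made concrete with a toy example. The sketch below is a plain scalar linear state-space recurrence, not Mamba's selective SSM, and the coefficients `a`, `b`, and `c` are arbitrary assumptions. It shows that the recurrence visits each input once while carrying a single fixed-size state, whereas causal self-attention scores every position against all earlier ones.

```python
from typing import List


def ssm_scan(x: List[float], a: float = 0.9, b: float = 0.1, c: float = 1.0) -> List[float]:
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = 0.0  # the entire history is compressed into this one value
    ys = []
    for x_t in x:  # one pass over the sequence: O(L) time, O(1) state
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def causal_attention_pairs(length: int) -> int:
    """Number of (query, key) pairs scored by causal self-attention: L * (L + 1) / 2."""
    return length * (length + 1) // 2


series = [1.0, 0.0, 0.0, 0.0]
outputs = ssm_scan(series)  # the impulse at step 0 decays geometrically
```

The decaying output illustrates the information-loss concern: older inputs survive only through the single state value `h`, while attention keeps direct access to every past position at quadratic cost.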

Key Takeaways

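Time series forecasting faces a trade-off between model complexity and information loss.
The Transformer's quadratic complexity in sequence length limits its scalability for long-range modeling.
Mamba, a state space model (SSM), achieves linear complexity but may lose information by compressing history into a fixed-size state.
Mambaformer combines Mamba and Transformer layers, but naively stacking them is suboptimal for time series forecasting because of information interference.
A new time series decomposition strategy separates long-range patterns from short-range variations.
Mamba excels at capturing long-term structure, while the Transformer is better at modeling short-term dynamics.
SST is a multi-scale hybrid model with a Mamba expert for long-range patterns and a Transformer expert for short-range variations, and it uses multi-scale patching to adapt the resolution for each.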
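
The decomposition and multi-scale patching ideas can be sketched in a few lines. This is an illustration under assumptions: the text here does not specify how SST splits a series, so a centered moving average stands in for the long-range component, and the window size, patch lengths, and strides are arbitrary choices. Long patches with a large stride give the coarse view intended for a long-range expert; short overlapping patches give the fine view intended for a short-range expert.

```python
from typing import List, Tuple


def moving_average(x: List[float], window: int) -> List[float]:
    """Centered moving average with edge clamping (assumed stand-in for the long-range pattern)."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def decompose(x: List[float], window: int = 5) -> Tuple[List[float], List[float]]:
    """Split a series into a smooth trend and the short-range residual around it."""
    trend = moving_average(x, window)
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return trend, residual


def patchify(x: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Cut a series into fixed-length patches; incomplete trailing patches are dropped."""
    return [x[i:i + patch_len] for i in range(0, len(x) - patch_len + 1, stride)]


# Toy series: a linear ramp plus an alternating +1/-1 wiggle.
series = [float(t) + (1.0 if t % 2 == 0 else -1.0) for t in range(16)]
trend, residual = decompose(series, window=5)

long_patches = patchify(trend, patch_len=8, stride=8)      # coarse resolution: 2 patches
short_patches = patchify(residual, patch_len=4, stride=2)  # fine resolution: 7 patches
```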

MobilityGPT: Enhanced Human Mobility Modeling with a GPT model

Authors: Ammar Haydari, Dongjie Chen, Zhengfeng Lai, Michael Zhang, Chen-Nee Chuah

Generative models have shown promising results in capturing human mobility characteristics and generating synthetic trajectories. However, it remains challenging to ensure that the generated geospatial mobility data is semantically realistic, including consistent location sequences, and reflects real-world characteristics, such as constraining on geospatial limits. We reformat human mobility modeling as an autoregressive generation task to address these issues, leveraging the Generative Pre-trained Transformer (GPT) architecture. To ensure its controllable generation to alleviate the above challenges, we propose a geospatially-aware generative model, MobilityGPT. We propose a gravity-based sampling method to train a transformer for semantic sequence similarity. Then, we constrained the training process via a road connectivity matrix that provides the connectivity of sequences in trajectory generation, thereby keeping generated trajectories in geospatial limits. Lastly, we proposed to construct a preference dataset for fine-tuning MobilityGPT via Reinforcement Learning from Trajectory Feedback (RLTF) mechanism, which minimizes the travel distance between training and the synthetically generated trajectories. Experiments on real-world datasets demonstrate MobilityGPT’s superior performance over state-of-the-art methods in generating high-quality mobility trajectories that are closest to real data in terms of origin-destination similarity, trip length, travel radius, link, and gravity distributions. We release the source code and reference links to datasets at https://github.com/ammarhydr/MobilityGPT.

Paper and project links

PDF

Summary

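Building on the Generative Pre-trained Transformer (GPT) architecture, the authors propose MobilityGPT, a geospatially-aware generative model for human mobility modeling. The model casts mobility modeling as an autoregressive generation task, capturing human mobility characteristics and generating synthetic trajectories. To make the generated geospatial data semantically realistic, the authors train with a gravity-based sampling method and constrain training with a road connectivity matrix so that generated trajectories stay within geospatial limits. They then fine-tune MobilityGPT on a preference dataset built through the Reinforcement Learning from Trajectory Feedback (RLTF) mechanism, so that it generates high-quality mobility trajectories that are closest to real data.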
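
One way to picture the road connectivity constraint is as a mask over the next-link distribution of an autoregressive model. The sketch below is an assumption-based illustration, not MobilityGPT's code: the 4-link network in `CONNECTIVITY`, the logits, and the function name are all hypothetical, and the summary describes the matrix as constraining training, whereas this example simply applies it to one prediction step.

```python
import math
from typing import List

# Hypothetical 4-link road network: CONNECTIVITY[i][j] == 1 means link j can follow link i.
# Assumes every link has at least one allowed successor.
CONNECTIVITY = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]


def masked_next_link_probs(logits: List[float], current_link: int) -> List[float]:
    """Softmax over next-link logits, with unreachable links forced to probability 0."""
    allowed = CONNECTIVITY[current_link]
    masked = [l if ok else -math.inf for l, ok in zip(logits, allowed)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The raw logits favor links 0 and 3, but neither is reachable from link 0.
logits = [2.0, 0.5, 0.5, 3.0]
probs = masked_next_link_probs(logits, current_link=0)
```

After masking, all probability mass falls on links 1 and 2, so a sampled trajectory cannot jump between road links that are not connected.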

Key Takeaways