
R1_Reasoning


⚠️ All of the summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Please note: never use them for serious academic work; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated on 2025-10-07

Self-Anchor: Large Language Model Reasoning via Step-by-step Attention Alignment

Authors:Hongxiang Zhang, Yuan Tian, Tianyi Zhang

To solve complex reasoning tasks for Large Language Models (LLMs), prompting-based methods offer a lightweight alternative to fine-tuning and reinforcement learning. However, as reasoning chains extend, critical intermediate steps and the original prompt will be buried in the context, receiving insufficient attention and leading to errors. In this paper, we propose Self-Anchor, a novel pipeline that leverages the inherent structure of reasoning to steer LLM attention. Self-Anchor decomposes reasoning trajectories into structured plans and automatically aligns the model’s attention to the most relevant inference steps, allowing the model to maintain focus throughout generation. Our experiment shows that Self-Anchor outperforms SOTA prompting methods across six benchmarks. Notably, Self-Anchor significantly reduces the performance gap between "non-reasoning" models and specialized reasoning models, with the potential to enable most LLMs to tackle complex reasoning tasks without retraining.

Paper and Project Links

PDF

Summary
For complex reasoning tasks, prompting-based methods offer a lightweight alternative to fine-tuning Large Language Models (LLMs). However, as reasoning chains grow longer, critical intermediate steps and the original prompt get buried in the context and receive insufficient attention, leading to errors. This paper proposes Self-Anchor, a new method that leverages the inherent structure of reasoning to steer LLM attention. It decomposes reasoning trajectories into structured plans and automatically aligns the model's attention to the most relevant inference steps, keeping the model focused throughout generation. Experiments show that Self-Anchor outperforms state-of-the-art prompting methods on six benchmarks. Notably, it significantly narrows the performance gap between "non-reasoning" models and specialized reasoning models, with the potential to let most LLMs handle complex reasoning tasks without retraining.

Key Takeaways

  • Prompting-based methods offer a lightweight alternative for complex reasoning with large language models.
  • As reasoning chains grow longer, key intermediate steps and the original prompt tend to be overlooked, leading to errors.
  • Self-Anchor leverages the inherent structure of reasoning to steer LLM attention.
  • Self-Anchor decomposes reasoning trajectories into structured plans, keeping the model focused during generation.
  • Experiments show that Self-Anchor performs strongly across multiple benchmarks.
  • Self-Anchor narrows the performance gap between non-reasoning models and specialized reasoning models.

Cool Papers

Click here to view paper screenshots

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Authors:Guanhua Huang, Tingqiang Xu, Mingze Wang, Qi Yi, Xue Gong, Siheng Li, Ruibin Xiong, Kejiao Li, Yuhao Jiang, Bo Zhou

Reinforcement Learning with Verifiable Rewards (RLVR) has propelled Large Language Models in complex reasoning, yet its scalability is often hindered by a training bottleneck where performance plateaus as policy entropy collapses, signaling a loss of exploration. Previous methods typically address this by maintaining high policy entropy, yet the precise mechanisms that govern meaningful exploration have remained underexplored. Our analysis suggests that an unselective focus on entropy risks amplifying irrelevant tokens and destabilizing training. This paper investigates the exploration dynamics within RLVR and identifies a key issue: the gradual elimination of valuable low-probability exploratory tokens, which we term "reasoning sparks". We find that while abundant in pre-trained models, these sparks are systematically extinguished during RLVR due to over-penalization, leading to a degeneracy in exploration. To address this, we introduce Low-probability Regularization (Lp-Reg). Its core mechanism regularizes the policy towards a heuristic proxy distribution. This proxy is constructed by filtering out presumed noise tokens and re-normalizing the distribution over the remaining candidates. The result is a less-noisy proxy where the probability of reasoning sparks is amplified, which then serves as a soft regularization target to shield these valuable tokens from elimination via KL divergence. Experiments show that Lp-Reg enables stable on-policy training for around 1,000 steps, a regime where baseline entropy-control methods collapse. This sustained exploration leads to state-of-the-art performance, achieving a 60.17% average accuracy on five math benchmarks, an improvement of 2.66% over prior methods. Code is available at https://github.com/CarlanLark/Lp-Reg.

Paper and Project Links

PDF

Summary
Within the Reinforcement Learning with Verifiable Rewards (RLVR) framework, large language models have advanced in complex reasoning, but scalability is limited by a training bottleneck. Our analysis traces this bottleneck to the gradual elimination of valuable low-probability exploratory tokens ("reasoning sparks") during training. To address this, we introduce Low-probability Regularization (Lp-Reg), which regularizes the policy toward a de-noised proxy distribution that amplifies the probability of reasoning sparks and thereby protects these valuable tokens from elimination. Experiments show that Lp-Reg sustains stable on-policy training for around 1,000 steps and reaches a 60.17% average accuracy on five math benchmarks, a 2.66% improvement over prior methods.

Key Takeaways

  1. The RLVR framework is widely used for complex reasoning in large language models, but it faces a training bottleneck.
  2. Existing methods mainly address the bottleneck by keeping policy entropy high, yet the precise mechanism behind meaningful exploration is poorly understood.
  3. The study finds that the key issue is the gradual elimination of valuable low-probability exploratory tokens ("reasoning sparks") during training.
  4. Low-probability Regularization (Lp-Reg) is proposed to protect and amplify reasoning sparks via a de-noised proxy distribution.
  5. Lp-Reg keeps on-policy training stable and clearly improves average accuracy on five math benchmarks.
  6. Lp-Reg uses KL divergence to shield low-probability tokens from elimination, enabling more sustained exploration.
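
The abstract describes a simple core mechanism: filter out presumed noise tokens, renormalize the remaining probabilities into a proxy distribution, and use a KL term to keep the policy close to it. The sketch below is a rough illustration of that idea only; the threshold, coefficient, and KL direction are my assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lp_reg_loss(logits: torch.Tensor, noise_threshold: float = 1e-4, beta: float = 0.1) -> torch.Tensor:
    """Hypothetical sketch of Low-probability Regularization (Lp-Reg).

    logits: [batch, vocab] policy logits for the next token.
    Tokens whose probability falls below `noise_threshold` are treated as presumed
    noise and removed; the survivors are renormalized, which amplifies the remaining
    low-probability "reasoning sparks". The loss is a KL term toward that proxy.
    """
    probs = F.softmax(logits, dim=-1)                      # policy distribution
    keep = probs >= noise_threshold                        # drop presumed noise tokens
    proxy = probs * keep                                   # zero out the noise mass
    proxy = proxy / proxy.sum(dim=-1, keepdim=True)        # renormalize the survivors
    # KL(proxy || policy): shields surviving low-probability tokens from elimination.
    kl = (proxy * (torch.log(proxy + 1e-12) - torch.log(probs + 1e-12))).sum(dim=-1)
    return beta * kl.mean()

# Toy usage: regularize a random policy over a 50k-token vocabulary.
print(float(lp_reg_loss(torch.randn(2, 50_000))))
```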

Cool Papers

Click here to view paper screenshots

PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning

Authors:Wanjia Zhao, Qinwei Ma, Jingzhe Shi, Shirley Wu, Jiaqi Han, Yijia Xiao, Si-Yuan Chen, Xiao Luo, Ludwig Schmidt, James Zou

Benchmarks for competition-style reasoning have advanced evaluation in mathematics and programming, yet physics remains comparatively underexplored. Most existing physics benchmarks evaluate only final answers, which fail to capture reasoning processes, while recent stepwise methods rely on heuristic LLM-as-judge scoring or restrictive linear assumptions, limiting reliability and diagnostic validity. We introduce PRISM-Physics, a process-level evaluation framework and benchmark for complex physics reasoning problems. Solutions are represented as directed acyclic graphs (DAGs) of formulas, explicitly encoding causal dependencies among intermediate steps to enable fine-grained, interpretable, and theoretically grounded scoring. We prove the optimality of the DAG representation and the corresponding scoring policy. Combined with a fully rule-based method for symbolic formula equivalence matching that we developed, we ensure consistent validation across diverse formulations without heuristic judgments. Results show that our evaluation framework is more aligned with human experts’ scoring. Experiments on state-of-the-art LLMs reveal persistent reasoning failures in physics, while step-level scoring offers both diagnostic insight and rich signals for later training. By combining structural rigor, theoretical guarantees, and symbolic validation, PRISM-Physics provides a principled foundation for advancing process-level evaluation and guiding the development of models with deeper scientific reasoning capabilities.

Paper and Project Links

PDF

Summary

Process-level evaluation frameworks and benchmarks are essential for assessing complex physics reasoning. Most existing physics benchmarks focus only on final answers and ignore the reasoning process. PRISM-Physics, a new benchmark, represents solutions as directed acyclic graphs (DAGs) of formulas that explicitly encode the causal dependencies among intermediate steps, enabling fine-grained, interpretable, and theoretically grounded scoring. The framework aligns more closely with expert scoring and more accurately exposes the weaknesses of large language models (LLMs) in physics reasoning.

Key Takeaways

  1. Process-level evaluation in physics remains underexplored; existing benchmarks focus on final answers and ignore the reasoning process.
  2. PRISM-Physics introduces a new evaluation framework and benchmark for complex physics reasoning problems.
  3. Solutions are represented as directed acyclic graphs (DAGs) of formulas that explicitly encode causal dependencies among intermediate steps.
  4. PRISM-Physics enables fine-grained, interpretable, and theoretically grounded scoring that aligns more closely with expert judgments.
  5. The framework evaluates the physics-reasoning performance of large language models (LLMs) more accurately.
  6. Experiments show that state-of-the-art LLMs still exhibit persistent failures in physics reasoning.
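
To make the DAG idea concrete, here is a minimal sketch (my own simplification, not the paper's proven scoring policy): a solution is a DAG of formula steps, and a step earns credit only if its own formula is judged correct and every step it causally depends on also earned credit.

```python
from typing import Dict, List

def dag_step_credit(parents: Dict[str, List[str]], step_correct: Dict[str, bool]) -> Dict[str, bool]:
    """Credit a step only if the step itself and all of its ancestors are correct.

    parents: maps each step id to the ids of steps it causally depends on (a DAG).
    step_correct: per-step verdict, e.g. from symbolic formula-equivalence matching.
    """
    credited: Dict[str, bool] = {}

    def credit(step: str) -> bool:
        if step not in credited:  # memoize so shared ancestors are checked only once
            credited[step] = step_correct[step] and all(credit(p) for p in parents.get(step, []))
        return credited[step]

    for step in step_correct:
        credit(step)
    return credited

# Toy example: s3 depends on s1 and s2; an incorrect s2 invalidates s3 as well.
parents = {"s1": [], "s2": [], "s3": ["s1", "s2"]}
verdicts = {"s1": True, "s2": False, "s3": True}
credited = dag_step_credit(parents, verdicts)
print(credited, "process score:", sum(credited.values()) / len(credited))
```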

Cool Papers

Click here to view paper screenshots

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus

Authors:Ming Zhao, Wenhui Dong, Yang Zhang, Xiang Zheng, Zhonghao Zhang, Zian Zhou, Yunzhi Guan, Liukun Xu, Wei Peng, Zhaoyang Gong, Zhicheng Zhang, Dachuan Li, Xiaosheng Ma, Yuli Ma, Jianing Ni, Changjiang Jiang, Lixia Tian, Qixin Chen, Kaishun Xia, Pingping Liu, Tongshun Zhang, Zhiqiang Liu, Zhongan Bi, Chenyang Si, Tiansheng Sun, Caifeng Shan

Spine disorders affect 619 million people globally and are a leading cause of disability, yet AI-assisted diagnosis remains limited by the lack of level-aware, multimodal datasets. Clinical decision-making for spine disorders requires sophisticated reasoning across X-ray, CT, and MRI at specific vertebral levels. However, progress has been constrained by the absence of traceable, clinically-grounded instruction data and standardized, spine-specific benchmarks. To address this, we introduce SpineMed, an ecosystem co-designed with practicing spine surgeons. It features SpineMed-450k, the first large-scale dataset explicitly designed for vertebral-level reasoning across imaging modalities with over 450,000 instruction instances, and SpineBench, a clinically-grounded evaluation framework. SpineMed-450k is curated from diverse sources, including textbooks, guidelines, open datasets, and ~1,000 de-identified hospital cases, using a clinician-in-the-loop pipeline with a two-stage LLM generation method (draft and revision) to ensure high-quality, traceable data for question-answering, multi-turn consultations, and report generation. SpineBench evaluates models on clinically salient axes, including level identification, pathology assessment, and surgical planning. Our comprehensive evaluation of several recently advanced large vision-language models (LVLMs) on SpineBench reveals systematic weaknesses in fine-grained, level-specific reasoning. In contrast, our model fine-tuned on SpineMed-450k demonstrates consistent and significant improvements across all tasks. Clinician assessments confirm the diagnostic clarity and practical utility of our model’s outputs.

Paper and Project Links

PDF

Summary
AI-assisted diagnosis of spine disorders is limited by the lack of level-aware, multimodal datasets. To address this, the paper introduces the SpineMed ecosystem, comprising SpineMed-450k, the first large-scale dataset designed for vertebral-level reasoning across imaging modalities, and SpineBench, a clinically grounded evaluation framework. SpineMed-450k is curated from diverse sources, including textbooks, guidelines, open datasets, and about 1,000 de-identified hospital cases, using a clinician-in-the-loop, two-stage LLM generation pipeline (draft and revision) to ensure high-quality, traceable data for question answering, multi-turn consultations, and report generation. Evaluation on clinically salient axes reveals weaknesses of advanced vision-language models in fine-grained, level-specific reasoning, whereas a model fine-tuned on SpineMed-450k shows consistent and significant improvements across all tasks and is validated by clinicians.

Key Takeaways

  1. Spine disorders affect 619 million people worldwide and are a leading cause of disability, yet AI-assisted diagnosis remains limited.
  2. The lack of level-aware, multimodal datasets restricts the application of AI to spine-disorder diagnosis.
  3. The SpineMed ecosystem includes SpineMed-450k, a dataset designed for vertebral-level reasoning, and SpineBench, a clinically grounded evaluation framework.
  4. SpineMed-450k combines data from diverse sources and uses a clinician-in-the-loop generation pipeline to ensure data quality.
  5. Current large vision-language models show weaknesses in clinical evaluation, especially in fine-grained, level-specific reasoning.
  6. A model fine-tuned on SpineMed-450k shows significant improvements on clinical tasks.

Cool Papers

Click here to view paper screenshots

MM-Nav: Multi-View VLA Model for Robust Visual Navigation via Multi-Expert Learning

Authors:Tianyu Xu, Jiawei Chen, Jiazhao Zhang, Wenyao Zhang, Zekun Qi, Minghan Li, Zhizheng Zhang, He Wang

Visual navigation policy is widely regarded as a promising direction, as it mimics humans by using egocentric visual observations for navigation. However, optical information of visual observations is difficult to be explicitly modeled like LiDAR point clouds or depth maps, which subsequently requires intelligent models and large-scale data. To this end, we propose to leverage the intelligence of the Vision-Language-Action (VLA) model to learn diverse navigation capabilities from synthetic expert data in a teacher-student manner. Specifically, we implement the VLA model, MM-Nav, as a multi-view VLA (with 360 observations) based on pretrained large language models and visual foundation models. For large-scale navigation data, we collect expert data from three reinforcement learning (RL) experts trained with privileged depth information in three challenging tailor-made environments for different navigation capabilities: reaching, squeezing, and avoiding. We iteratively train our VLA model using data collected online from RL experts, where the training ratio is dynamically balanced based on performance on individual capabilities. Through extensive experiments in synthetic environments, we demonstrate that our model achieves strong generalization capability. Moreover, we find that our student VLA model outperforms the RL teachers, demonstrating the synergistic effect of integrating multiple capabilities. Extensive real-world experiments further confirm the effectiveness of our method.

Paper and Project Links

PDF Project page: https://pku-epic.github.io/MM-Nav-Web/

Summary

Visual navigation policies mimic humans by navigating from egocentric visual observations, but the optical information in such observations is hard to model explicitly. This work uses a Vision-Language-Action (VLA) model to learn diverse navigation capabilities from synthetic expert data in a teacher-student manner. The multi-view VLA model, MM-Nav, is built on pretrained large language models and visual foundation models. Navigation data are collected from reinforcement learning (RL) experts trained in three tailor-made challenging environments targeting different capabilities: reaching, squeezing, and avoiding. Extensive experiments in synthetic environments show strong generalization, and the student VLA model outperforms its RL teachers, demonstrating the synergistic effect of integrating multiple capabilities.

Key Takeaways

  1. Visual navigation policies mimic humans by navigating from egocentric visual observations; modeling the optical information is a key challenge in this area.
  2. A Vision-Language-Action model learns diverse navigation capabilities and benefits from synthetic expert data.
  3. The proposed MM-Nav is a multi-view VLA model built on pretrained large language models and visual foundation models.
  4. The RL expert data covers multiple navigation capabilities, such as reaching, squeezing, and avoiding.
  5. Experiments in synthetic environments confirm strong generalization, with training dynamically balanced across capabilities.
  6. The student VLA model outperforms its RL teachers, showing the synergistic effect of integrating multiple capabilities.
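
The abstract mentions dynamically balancing the training ratio across capabilities according to current performance. A minimal sketch of one plausible rule (an assumption for illustration, not the authors' exact schedule) is to weight each RL expert's data inversely to the student's current success rate on that capability:

```python
def capability_sampling_weights(success_rates: dict, eps: float = 0.05) -> dict:
    """Give more training weight to capabilities the student currently handles worst.

    success_rates: e.g. {"reaching": 0.9, "squeezing": 0.4, "avoiding": 0.7}
    Returns normalized sampling weights over the RL experts' datasets.
    """
    raw = {cap: 1.0 - rate + eps for cap, rate in success_rates.items()}  # worse -> heavier
    total = sum(raw.values())
    return {cap: w / total for cap, w in raw.items()}

print(capability_sampling_weights({"reaching": 0.9, "squeezing": 0.4, "avoiding": 0.7}))
# -> "squeezing" gets the largest share of the next round of online training data
```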

Cool Papers

Click here to view paper screenshots

A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem

Authors:Mingfeng Fan, Jiaqi Cheng, Yaoxin Wu, Yifeng Zhang, Yibin Yang, Guohua Wu, Guillaume Sartoretti

In recent years, deep reinforcement learning (DRL) has gained traction for solving the NP-hard traveling salesman problem (TSP). However, limited attention has been given to the close-enough TSP (CETSP), primarily due to the challenge introduced by its neighborhood-based visitation criterion, wherein a node is considered visited if the agent enters a compact neighborhood around it. In this work, we formulate a Markov decision process (MDP) for CETSP using a discretization scheme and propose a novel unified dual-decoder DRL (UD3RL) framework that separates decision-making into node selection and waypoint determination. Specifically, an adapted encoder is employed for effective feature extraction, followed by a node-decoder and a loc-decoder to handle the two sub-tasks, respectively. A k-nearest neighbors subgraph interaction strategy is further introduced to enhance spatial reasoning during location decoding. Furthermore, we customize the REINFORCE algorithm to train UD3RL as a unified model capable of generalizing across different problem sizes and varying neighborhood radius types (i.e., constant and random radii). Experimental results show that UD3RL outperforms conventional methods in both solution quality and runtime, while exhibiting strong generalization across problem scales, spatial distributions, and radius ranges, as well as robustness to dynamic environments.

Paper and Project Links

PDF

Summary

In recent years, deep reinforcement learning (DRL) has been applied to the NP-hard traveling salesman problem (TSP), but the close-enough TSP (CETSP) has received little attention because of the challenge posed by its neighborhood-based visitation criterion. This paper formulates a Markov decision process (MDP) for CETSP using a discretization scheme and proposes a unified dual-decoder DRL (UD3RL) framework that separates decision-making into node selection and waypoint determination. An adapted encoder extracts features, a node-decoder and a loc-decoder handle the two sub-tasks, and a k-nearest-neighbors subgraph interaction strategy enhances spatial reasoning during location decoding. A customized REINFORCE algorithm trains UD3RL as a unified model that generalizes across problem sizes and neighborhood radius types (constant and random). Experiments show that UD3RL outperforms conventional methods in both solution quality and runtime, generalizes well across problem scales, spatial distributions, and radius ranges, and is robust to dynamic environments.

Key Takeaways

  1. Deep reinforcement learning (DRL) has been used to solve the TSP.
  2. The CETSP has received less attention because of its neighborhood-based visitation criterion.
  3. This work formulates an MDP for CETSP and proposes the UD3RL framework, which improves performance by separating the decision process.
  4. UD3RL uses an adapted encoder for feature extraction, with a node-decoder and a loc-decoder handling the two sub-tasks.
  5. A k-nearest-neighbors subgraph interaction strategy enhances spatial reasoning.
  6. A customized REINFORCE algorithm trains UD3RL to generalize across problem sizes and radius types.

Cool Papers

Click here to view paper screenshots

FR-LUX: Friction-Aware, Regime-Conditioned Policy Optimization for Implementable Portfolio Management

Authors:Jian’an Zhang

Transaction costs and regime shifts are major reasons why paper portfolios fail in live trading. We introduce FR-LUX (Friction-aware, Regime-conditioned Learning under eXecution costs), a reinforcement learning framework that learns after-cost trading policies and remains robust across volatility-liquidity regimes. FR-LUX integrates three ingredients: (i) a microstructure-consistent execution model combining proportional and impact costs, directly embedded in the reward; (ii) a trade-space trust region that constrains changes in inventory flow rather than logits, yielding stable low-turnover updates; and (iii) explicit regime conditioning so the policy specializes to LL/LH/HL/HH states without fragmenting the data. On a 4 x 5 grid of regimes and cost levels with multiple random seeds, FR-LUX achieves the top average Sharpe ratio with narrow bootstrap confidence intervals, maintains a flatter cost-performance slope than strong baselines, and attains superior risk-return efficiency for a given turnover budget. Pairwise scenario-level improvements are strictly positive and remain statistically significant after multiple-testing corrections. We provide formal guarantees on optimality under convex frictions, monotonic improvement under a KL trust region, long-run turnover bounds and induced inaction bands due to proportional costs, positive value advantage for regime-conditioned policies, and robustness to cost misspecification. The methodology is implementable: costs are calibrated from standard liquidity proxies, scenario-level inference avoids pseudo-replication, and all figures and tables are reproducible from released artifacts.

Paper and Project Links

PDF 19 pages, 7 figures, includes theoretical guarantees and empirical evaluation, submitted to AI/ML in Finance track

Summary
Transaction costs and regime shifts are major reasons why paper portfolios fail in live trading. FR-LUX (Friction-aware, Regime-conditioned Learning under eXecution costs) is a reinforcement learning framework that learns after-cost trading policies and remains robust across volatility-liquidity regimes. It combines three ingredients: a microstructure-consistent execution model with proportional and impact costs embedded directly in the reward; a trade-space trust region that constrains changes in inventory flow rather than logits, yielding stable low-turnover updates; and explicit regime conditioning so the policy specializes to LL/LH/HL/HH states without fragmenting the data. Across a grid of regimes and cost levels, FR-LUX achieves the top average Sharpe ratio, maintains a flatter cost-performance slope than strong baselines, and attains superior risk-return efficiency for a given turnover budget. The method comes with formal guarantees and is implementable: costs are calibrated from standard liquidity proxies, scenario-level inference avoids pseudo-replication, and all figures and tables are reproducible from released artifacts.

Key Takeaways

  1. Transaction costs and regime shifts are major reasons why paper portfolios fail in live trading.
  2. FR-LUX is a reinforcement learning framework that copes with transaction costs and changing market regimes.
  3. FR-LUX embeds a microstructure-consistent execution model with proportional and impact costs directly in the reward.
  4. A trade-space trust region constrains changes in inventory flow, yielding stable low-turnover updates.
  5. Explicit regime conditioning lets FR-LUX adapt to different market states (LL/LH/HL/HH).
  6. Across tests over multiple cost and regime levels, FR-LUX delivers strong results, including the top Sharpe ratio, a flat cost-performance slope, and superior risk-return efficiency.
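
As an illustration of embedding proportional and impact costs directly in the reward (a simplified one-step sketch under standard microstructure assumptions; the paper's calibrated execution model is richer), an after-cost reward might look like this:

```python
def after_cost_reward(price_change: float, position: float, trade: float,
                      c_prop: float = 0.0005, c_impact: float = 0.01) -> float:
    """One-step after-cost reward for a trading policy.

    price_change: price move over the step.
    position:     inventory held over the step (after the trade).
    trade:        change in inventory this step (delta q).
    c_prop:       proportional cost per unit traded (spread/fees); illustrative value.
    c_impact:     quadratic impact coefficient; illustrative value.
    """
    pnl = position * price_change
    proportional_cost = c_prop * abs(trade)
    impact_cost = c_impact * trade ** 2
    return pnl - proportional_cost - impact_cost

# Proportional costs make tiny trades not worth taking (inaction bands), while the
# quadratic impact term penalizes large trades and discourages high turnover.
print(after_cost_reward(0.02, position=10.0, trade=10.0))
print(after_cost_reward(0.02, position=10.0, trade=1.0))
```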

Cool Papers

Click here to view paper screenshots

RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning

Authors:Aleksei Arzhantsev, Otmane Sakhi, Flavian Vasile

Reinforcement learning (RL) is central to improving reasoning in large language models (LLMs) but typically requires ground-truth rewards. Test-Time Reinforcement Learning (TTRL) removes this need by using majority-vote rewards, but relies on heavy online RL and incurs substantial computational cost. We propose RoiRL: Reasoning with offline iterative Reinforcement Learning, a family of lightweight offline learning alternatives that can target the same regularized optimal policies. Unlike TTRL, RoiRL eliminates the need to maintain a reference model and instead optimizes weighted log-likelihood objectives, enabling stable training with significantly lower memory and compute requirements. Experimental results show that RoiRL trains to 2.5x faster and consistently outperforms TTRL on reasoning benchmarks, establishing a scalable path to self-improving LLMs without labels.

Paper and Project Links

PDF Accepted to the Efficient Reasoning Workshop at NeurIPS 2025

Summary

Reinforcement learning plays a central role in improving the reasoning abilities of large language models but typically requires ground-truth rewards. Test-Time Reinforcement Learning (TTRL) removes this need by using majority-vote rewards, but it relies on heavy online RL and incurs substantial computational cost. RoiRL (Reasoning with offline iterative Reinforcement Learning) is a family of lightweight offline learning alternatives that target the same regularized optimal policies. Unlike TTRL, RoiRL does not maintain a reference model; it optimizes weighted log-likelihood objectives, enabling stable training with much lower memory and compute requirements. Experiments show that RoiRL trains up to 2.5x faster than TTRL and consistently outperforms it on reasoning benchmarks, establishing a scalable path toward self-improving LLMs without labels.

Key Takeaways

  1. Reinforcement learning is important for improving language-model reasoning but usually requires ground-truth rewards.
  2. Test-Time Reinforcement Learning (TTRL) uses majority-vote rewards but is computationally expensive.
  3. RoiRL is a lightweight offline learning method that targets the same regularized optimal policies.
  4. RoiRL does not maintain a reference model; it optimizes weighted log-likelihood objectives instead.
  5. RoiRL enables stable training with lower memory and compute requirements.
  6. RoiRL trains up to 2.5x faster than TTRL.
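
A minimal sketch of the offline objective described above: sample several completions, use majority vote over their final answers as a label-free reward, and minimize a reward-weighted negative log-likelihood, with no reference model involved. The answer strings and sequence log-probabilities below are toy inputs; in practice they would come from the model being trained.

```python
from collections import Counter
from typing import List

def majority_vote_weights(answers: List[str]) -> List[float]:
    """Weight each sampled completion by whether its answer matches the majority vote."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

def weighted_nll(seq_logprobs: List[float], weights: List[float]) -> float:
    """Reward-weighted negative log-likelihood over the sampled completions."""
    total_w = sum(weights) or 1.0
    return -sum(w * lp for w, lp in zip(weights, seq_logprobs)) / total_w

# Toy example: 4 sampled completions, 3 of which agree on the answer "42".
answers = ["42", "42", "17", "42"]
seq_logprobs = [-12.3, -10.1, -15.7, -11.0]   # log p_theta(completion | prompt)
w = majority_vote_weights(answers)
print(w, weighted_nll(seq_logprobs, w))       # minimizing this pushes up agreeing completions
```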

Cool Papers

Click here to view paper screenshots

Reward Model Routing in Alignment

Authors:Xinle Wu, Yao Lu

Reinforcement learning from human or AI feedback (RLHF / RLAIF) has become the standard paradigm for aligning large language models (LLMs). However, most pipelines rely on a single reward model (RM), limiting alignment quality and risking overfitting. Recent work explores RM routing–dynamically selecting an RM from a candidate pool to exploit complementary strengths while maintaining $O(1)$ RM calls–but existing methods suffer from cold-start and insufficient exploration. We propose BayesianRouter, a hybrid routing framework that combines offline RM strengths learning with online Bayesian selection. In the offline stage, a multi-task router is trained on preference data to estimate per-RM reliability. In the online stage, a Bayesian Thompson sampling router performs per-query RM selection, initializing RM-specific weight vectors with offline embeddings as Gaussian priors and adaptively updating their posteriors with online rewards to adapt to the evolving policy distribution. Extensive experiments on instruction-following (AlpacaEval-2, Arena-Hard, MT-Bench) and reasoning (GSM8K, MMLU) benchmarks show that BayesianRouter consistently outperforms individual RMs, RM ensembling, and existing routing methods.

Paper and Project Links

PDF

Summary
Reinforcement learning from human or AI feedback (RLHF/RLAIF) has become the standard paradigm for aligning large language models (LLMs), but most pipelines rely on a single reward model (RM), which limits alignment quality and risks overfitting. Recent work explores RM routing, which dynamically selects an RM from a candidate pool to exploit complementary strengths while keeping O(1) RM calls, but existing methods suffer from cold start and insufficient exploration. BayesianRouter is a hybrid routing framework that combines offline learning of RM strengths with online Bayesian selection. In the offline stage, a multi-task router is trained on preference data to estimate per-RM reliability. In the online stage, a Bayesian Thompson sampling router selects an RM per query, initializing RM-specific weight vectors with the offline embeddings as Gaussian priors and adaptively updating their posteriors with online rewards to track the evolving policy distribution. Extensive experiments on instruction-following (AlpacaEval-2, Arena-Hard, MT-Bench) and reasoning (GSM8K, MMLU) benchmarks show that BayesianRouter consistently outperforms individual RMs, RM ensembling, and existing routing methods.

Key Takeaways

  1. RLHF/RLAIF has become the standard approach for aligning large language models.
  2. Current methods mostly rely on a single reward model (RM), limiting alignment quality and risking overfitting.
  3. RM routing dynamically selects an RM to exploit the complementary strengths of multiple RMs.
  4. Existing RM routing methods face cold-start and insufficient-exploration problems.
  5. The proposed BayesianRouter combines offline learning of RM strengths with online Bayesian selection to improve performance.
  6. BayesianRouter outperforms alternative methods in experiments on instruction-following and reasoning tasks.
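
A minimal sketch of the online stage: per-query Thompson sampling over reward models with Gaussian beliefs. The priors here are generic constants rather than the offline-learned embeddings the paper uses, and the reward simulation is purely illustrative.

```python
import random

class GaussianThompsonRouter:
    """Per-query reward-model selection via Thompson sampling with Gaussian beliefs."""

    def __init__(self, rm_names, prior_mean=0.0, prior_var=1.0, obs_var=0.25):
        self.mean = {rm: prior_mean for rm in rm_names}   # posterior mean per RM
        self.var = {rm: prior_var for rm in rm_names}     # posterior variance per RM
        self.obs_var = obs_var                            # assumed observation noise

    def select(self) -> str:
        # Sample a plausible quality for each RM and route the query to the best draw.
        draws = {rm: random.gauss(self.mean[rm], self.var[rm] ** 0.5) for rm in self.mean}
        return max(draws, key=draws.get)

    def update(self, rm: str, reward: float) -> None:
        # Conjugate Gaussian update of the chosen RM's belief with the online reward.
        precision = 1.0 / self.var[rm] + 1.0 / self.obs_var
        new_var = 1.0 / precision
        self.mean[rm] = new_var * (self.mean[rm] / self.var[rm] + reward / self.obs_var)
        self.var[rm] = new_var

router = GaussianThompsonRouter(["rm_helpful", "rm_reasoning", "rm_safety"])
for _ in range(200):
    choice = router.select()
    reward = random.gauss(0.6 if choice == "rm_reasoning" else 0.3, 0.1)  # simulated feedback
    router.update(choice, reward)
print(router.mean)  # the belief concentrates on the RM that yields higher rewards
```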

Cool Papers

Click here to view paper screenshots

StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering

Authors:Tengjun Ni, Xin Yuan, Shenghong Li, Kai Wu, Ren Ping Liu, Wei Ni, Wenjie Zhang

Recent progress in retrieval-augmented generation (RAG) has led to more accurate and interpretable multi-hop question answering (QA). Yet, challenges persist in integrating iterative reasoning steps with external knowledge retrieval. To address this, we introduce StepChain GraphRAG, a framework that unites question decomposition with a Breadth-First Search (BFS) Reasoning Flow for enhanced multi-hop QA. Our approach first builds a global index over the corpus; at inference time, only retrieved passages are parsed on-the-fly into a knowledge graph, and the complex query is split into sub-questions. For each sub-question, a BFS-based traversal dynamically expands along relevant edges, assembling explicit evidence chains without overwhelming the language model with superfluous context. Experiments on MuSiQue, 2WikiMultiHopQA, and HotpotQA show that StepChain GraphRAG achieves state-of-the-art Exact Match and F1 scores. StepChain GraphRAG lifts average EM by 2.57% and F1 by 2.13% over the SOTA method, achieving the largest gain on HotpotQA (+4.70% EM, +3.44% F1). StepChain GraphRAG also fosters enhanced explainability by preserving the chain-of-thought across intermediate retrieval steps. We conclude by discussing how future work can mitigate the computational overhead and address potential hallucinations from large language models to refine efficiency and reliability in multi-hop QA.

Paper and Project Links

PDF

Summary

StepChain GraphRAG is a new framework for multi-hop question answering that unites question decomposition with a Breadth-First Search (BFS) reasoning flow. It builds a global index over the corpus, parses retrieved passages on-the-fly into a knowledge graph at inference time, and splits the complex query into sub-questions. For each sub-question, a BFS-based traversal expands dynamically along relevant edges, assembling explicit evidence chains without overwhelming the language model with superfluous context. Experiments show that StepChain GraphRAG achieves state-of-the-art Exact Match and F1 scores on several datasets.

Key Takeaways

  • StepChain GraphRAG is a new multi-hop QA framework that combines question decomposition with a BFS reasoning flow.
  • The framework builds a global index over the corpus and parses retrieved passages on-the-fly to improve accuracy and explainability.
  • StepChain GraphRAG achieves state-of-the-art Exact Match and F1 scores, with particularly large gains on some datasets.
  • By expanding dynamically along relevant edges, the method assembles explicit evidence chains, improving answer reliability.
  • StepChain GraphRAG also improves efficiency by shielding the language model from superfluous context.
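
A minimal sketch of the BFS reasoning flow over an on-the-fly knowledge graph. The toy graph, seed entities, and depth limit are illustrative; in the real pipeline the graph is parsed per query from the retrieved passages and one traversal is run per sub-question.

```python
from collections import deque

def bfs_evidence_chains(graph: dict, seeds: list, max_depth: int = 2) -> list:
    """Expand breadth-first from seed entities, recording the evidence chain to each node.

    graph: adjacency map {entity: [(relation, neighbor), ...]} built from retrieved passages.
    Returns relation chains such as "Alice --born_in--> Paris --capital_of--> France".
    """
    chains, visited = [], set(seeds)
    queue = deque((seed, seed) for seed in seeds)        # (current node, chain so far)
    depth = {seed: 0 for seed in seeds}
    while queue:
        node, chain = queue.popleft()
        if depth[node] >= max_depth:
            continue
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                depth[neighbor] = depth[node] + 1
                new_chain = f"{chain} --{relation}--> {neighbor}"
                chains.append(new_chain)                  # explicit evidence handed to the LLM
                queue.append((neighbor, new_chain))
    return chains

graph = {"Alice": [("born_in", "Paris")], "Paris": [("capital_of", "France")]}
print(bfs_evidence_chains(graph, ["Alice"]))
```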

Cool Papers

Click here to view paper screenshots

Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval

Authors:Lanyun Zhu, Deyi Ji, Tianrun Chen, Haiyang Wu, Shiqi Wang

The success of DeepSeek-R1 demonstrates the immense potential of using reinforcement learning (RL) to enhance LLMs’ reasoning capabilities. This paper introduces Retrv-R1, the first R1-style MLLM specifically designed for multimodal universal retrieval, achieving higher performance by employing step-by-step reasoning to produce more accurate retrieval results. We find that directly applying the methods of DeepSeek-R1 to retrieval tasks is not feasible, mainly due to (1) the high computational cost caused by the large token consumption required for multiple candidates with reasoning processes, and (2) the instability and suboptimal results when directly applying RL to train for retrieval tasks. To address these issues, Retrv-R1 introduces an information compression module with a details inspection mechanism, which enhances computational efficiency by reducing the number of tokens while ensuring that critical information for challenging candidates is preserved. Furthermore, a new training paradigm is proposed, including an activation stage using a retrieval-tailored synthetic CoT dataset for more effective optimization, followed by RL with a novel curriculum reward to improve both performance and efficiency. Incorporating these novel designs, Retrv-R1 achieves SOTA performance, high efficiency, and strong generalization ability, as demonstrated by experiments across multiple benchmarks and tasks.

Paper and Project Links

PDF NeurIPS 2025

Summary
Reinforcement learning has great potential for strengthening the reasoning abilities of large language models. Retrv-R1 achieves higher retrieval performance through step-by-step reasoning and an information compression module. It addresses the high computational cost and instability of directly applying the DeepSeek-R1 recipe to retrieval, and proposes a new training paradigm and reward design, achieving high efficiency and strong generalization.

Key Takeaways

  • Retrv-R1 is the first R1-style multimodal LLM (MLLM) designed for universal multimodal retrieval, improving retrieval accuracy through step-by-step reasoning.
  • Directly applying DeepSeek-R1's methods to retrieval tasks is infeasible, mainly because of high computational cost and unstable results.
  • Retrv-R1 introduces an information compression module with a details inspection mechanism, reducing the number of tokens while preserving critical information.
  • The model uses a new training paradigm: an activation stage on a retrieval-tailored synthetic CoT dataset, followed by RL with a novel curriculum reward.

Cool Papers

Click here to view paper screenshots

IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context

Authors:Santhosh G S, Akshay Govind S, Gokul S Krishnan, Balaraman Ravindran, Sriraam Natarajan

Large Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. However, their increasing deployment in high stakes applications necessitates rigorous evaluation of embedded biases, particularly in culturally diverse contexts like India where existing embedding-based bias assessment methods often fall short in capturing nuanced stereotypes. We propose an evaluation framework based on a encoder trained using contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset - IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes) comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. Our evaluation of multiple open-weight LLMs reveals that all models exhibit some degree of stereotypical bias, with disability related biases being notably persistent, and religion bias generally lower likely due to global debiasing efforts demonstrating the need for fairer model development.

Paper and Project Links

PDF Accepted at 8th AAAI/ACM Conference on AI, Ethics, and Society (AIES) 2025

Summary

Large Language Models (LLMs) have gained significant traction in critical domains thanks to their impressive contextual understanding and generative capabilities. However, their deployment in high-stakes applications requires rigorous evaluation of embedded biases, particularly in culturally diverse contexts such as India, where existing embedding-based bias assessments often fail to capture nuanced stereotypes. This work proposes an evaluation framework based on an encoder trained with contrastive learning that captures fine-grained bias through embedding similarity, and introduces IndiCASA, a dataset of 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. An evaluation of several open-weight LLMs shows that all models exhibit some degree of stereotypical bias, with disability-related biases notably persistent and religion bias generally lower, likely owing to global debiasing efforts, underscoring the need for fairer model development.

Key Takeaways

  • Large Language Models (LLMs) have attracted wide attention for their strong performance across many domains.
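
As a rough sketch of measuring bias through embedding similarity (the paper trains its own contrastive encoder; the off-the-shelf sentence encoder below is a stand-in assumed purely for illustration), one can compare how close a generated sentence sits to a stereotype versus its anti-stereotype counterpart:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed stand-in encoder

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def bias_gap(encoder, generated: str, stereotype: str, anti_stereotype: str) -> float:
    """Positive gap: the generated text sits closer to the stereotype than to its counterpart."""
    g, s, a = encoder.encode([generated, stereotype, anti_stereotype])
    return cosine(g, s) - cosine(g, a)

encoder = SentenceTransformer("all-MiniLM-L6-v2")
gap = bias_gap(
    encoder,
    generated="The new manager handled the budget carefully.",
    stereotype="Managers from this community are careless with money.",
    anti_stereotype="Managers from this community are careful with money.",
)
print(f"bias gap: {gap:+.3f}")  # in practice, aggregated over IndiCASA-style sentence pairs
```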

Cool Papers

Click here to view paper screenshots

Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks

Authors:Yubo Li, Ramayya Krishnan, Rema Padman

Large Language Models (LLMs) have revolutionized conversational AI, yet their robustness in extended multi-turn dialogues remains poorly understood. Existing evaluation frameworks focus on static benchmarks and single-turn assessments, failing to capture the temporal dynamics of conversational degradation that characterize real-world interactions. In this work, we present the first comprehensive survival analysis of conversational AI robustness, analyzing 36,951 conversation turns across 9 state-of-the-art LLMs to model failure as a time-to-event process. Our survival modeling framework-employing Cox proportional hazards, Accelerated Failure Time, and Random Survival Forest approaches-reveals extraordinary temporal dynamics. We find that abrupt, prompt-to-prompt(P2P) semantic drift is catastrophic, dramatically increasing the hazard of conversational failure. In stark contrast, gradual, cumulative drift is highly protective, vastly reducing the failure hazard and enabling significantly longer dialogues. AFT models with interactions demonstrate superior performance, achieving excellent discrimination and exceptional calibration. These findings establish survival analysis as a powerful paradigm for evaluating LLM robustness, offer concrete insights for designing resilient conversational agents, and challenge prevailing assumptions about the necessity of semantic consistency in conversational AI Systems.

Paper and Project Links

PDF

Summary
Large language models (LLMs) have revolutionized conversational AI, yet their robustness in extended multi-turn dialogues is poorly understood. Existing evaluation frameworks focus on static benchmarks and single-turn assessments and fail to capture the temporal dynamics of conversational degradation that characterize real-world interactions. This work presents the first comprehensive survival analysis of conversational AI robustness, analyzing 36,951 conversation turns across nine state-of-the-art LLMs and modeling failure as a time-to-event process. The survival modeling framework (Cox proportional hazards, Accelerated Failure Time, and Random Survival Forest) reveals striking temporal dynamics: abrupt prompt-to-prompt semantic drift is catastrophic and sharply increases the hazard of conversational failure, whereas gradual cumulative drift is highly protective, greatly reducing the failure hazard and enabling much longer dialogues. AFT models with interactions perform best, with excellent discrimination and calibration. These findings establish survival analysis as a powerful paradigm for evaluating LLM robustness, offer concrete insights for designing resilient conversational agents, and challenge prevailing assumptions about the necessity of semantic consistency in conversational AI systems.

Key Takeaways

  1. The robustness of large language models (LLMs) in multi-turn dialogues is not yet well understood.
  2. Existing evaluation frameworks fail to capture the temporal dynamics of conversational degradation in real interactions.
  3. This study is the first to evaluate conversational AI robustness with survival analysis, revealing striking temporal dynamics.
  4. The survival modeling framework captures the effects of abrupt prompt-to-prompt semantic drift and gradual cumulative drift on robustness.
  5. The analysis shows that different types of semantic drift have markedly different effects on the hazard of conversational failure.
  6. AFT models with interactions show superior performance and strong calibration.
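
A minimal sketch of the survival-analysis setup using the lifelines library. The toy data and covariates below are invented for illustration; the study fits Cox proportional hazards, AFT, and Random Survival Forest models to 36,951 real conversation turns.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Toy conversations: duration = turns until the first inconsistency, event = 1 if a
# failure was observed (0 = censored); the drift covariates are illustrative only.
df = pd.DataFrame({
    "turns_to_failure": [4, 12, 7, 20, 3, 15, 9, 18],
    "failure_observed": [1, 0, 1, 0, 1, 1, 0, 1],
    "p2p_drift":        [0.8, 0.3, 0.2, 0.6, 0.9, 0.4, 0.5, 0.3],   # abrupt prompt-to-prompt drift
    "cumulative_drift": [0.2, 0.7, 0.6, 0.3, 0.1, 0.8, 0.5, 0.4],   # gradual drift over the dialogue
})

cph = CoxPHFitter(penalizer=0.1)   # small penalizer keeps the toy fit stable
cph.fit(df, duration_col="turns_to_failure", event_col="failure_observed")
cph.print_summary()  # a positive coefficient means the covariate raises the failure hazard
```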

Cool Papers

Click here to view paper screenshots

SoT: Structured-of-Thought Prompting Guides Multilingual Reasoning in Large Language Models

Authors:Rui Qi, Zhibo Man, Yufeng Chen, Fengran Mo, Jinan Xu, Kaiyu Huang

Recent developments have enabled Large Language Models (LLMs) to engage in complex reasoning tasks through deep thinking. However, the capacity of reasoning has not been successfully transferred to non-high-resource languages due to resource constraints, which struggles with multilingual reasoning tasks. To this end, we propose Structured-of-Thought (SoT), a training-free method that improves the performance on multilingual reasoning through a multi-step transformation: Language Thinking Transformation and Structured Knowledge Transformation. The SoT method converts language-specific semantic information into language-agnostic structured representations, enabling the models to understand the query in different languages more sophisticated. Besides, SoT effectively guides LLMs toward more concentrated reasoning to maintain consistent underlying reasoning pathways when handling cross-lingual variations in expression. Experimental results demonstrate that SoT outperforms several strong baselines on multiple multilingual reasoning benchmarks when adapting to various backbones of LLMs. It can also be integrated with other training-free strategies for further improvements. Our code is available at https://github.com/Cherry-qwq/SoT.

Paper and Project Links

PDF EMNLP 2025 (findings)

Summary

Recent advances allow Large Language Models (LLMs) to tackle complex reasoning tasks through deep thinking, but this reasoning capacity has not transferred well to non-high-resource languages because of resource constraints. To this end, Structured-of-Thought (SoT), a training-free method, improves multilingual reasoning through a multi-step transformation: Language Thinking Transformation and Structured Knowledge Transformation. SoT converts language-specific semantic information into language-agnostic structured representations, enabling models to understand queries in different languages more deeply. It also guides LLMs toward more concentrated reasoning, maintaining consistent underlying reasoning pathways across cross-lingual variations in expression. Experiments show that SoT outperforms several strong baselines on multiple multilingual reasoning benchmarks across various LLM backbones and can be combined with other training-free strategies for further gains.

Key Takeaways

  1. Large language models (LLMs) can now handle complex reasoning tasks through deep thinking.
  2. Transferring reasoning capability to non-high-resource languages remains challenging.
  3. Structured-of-Thought (SoT) is a training-free method designed to improve multilingual reasoning.
  4. SoT handles queries in different languages by converting them into language-agnostic structured representations.
  5. SoT guides LLMs toward more concentrated reasoning and keeps the underlying reasoning pathways consistent.
  6. Experiments show that SoT performs well on multiple multilingual reasoning benchmarks.
  7. SoT can be combined with other training-free strategies for further improvements.

Cool Papers

Click here to view paper screenshots

Evaluation Framework for Highlight Explanations of Context Utilisation in Language Models

Authors:Jingyi Sun, Pepa Atanasova, Sagnik Ray Choudhury, Sekh Mainul Islam, Isabelle Augenstein

Context utilisation, the ability of Language Models (LMs) to incorporate relevant information from the provided context when generating responses, remains largely opaque to users, who cannot determine whether models draw from parametric memory or provided context, nor identify which specific context pieces inform the response. Highlight explanations (HEs) offer a natural solution as they can point the exact context pieces and tokens that influenced model outputs. However, no existing work evaluates their effectiveness in accurately explaining context utilisation. We address this gap by introducing the first gold standard HE evaluation framework for context attribution, using controlled test cases with known ground-truth context usage, which avoids the limitations of existing indirect proxy evaluations. To demonstrate the framework’s broad applicability, we evaluate four HE methods – three established techniques and MechLight, a mechanistic interpretability approach we adapt for this task – across four context scenarios, four datasets, and five LMs. Overall, we find that MechLight performs best across all context scenarios. However, all methods struggle with longer contexts and exhibit positional biases, pointing to fundamental challenges in explanation accuracy that require new approaches to deliver reliable context utilisation explanations at scale.

Paper and Project Links

PDF

Summary

Context utilisation, the ability of language models to draw on relevant information from the provided context when generating responses, remains opaque to users. Highlight explanations (HEs) are a natural solution because they can point to the exact context pieces and tokens that influenced the output, yet no existing work evaluates how accurately they explain context utilisation. This study closes the gap by introducing the first gold-standard HE evaluation framework for context attribution, using controlled test cases with known ground-truth context usage and thereby avoiding the limitations of indirect proxy evaluations. Evaluating four HE methods across four context scenarios, four datasets, and five language models, the study finds that MechLight performs best across all context scenarios. However, all methods struggle with longer contexts and exhibit positional biases, pointing to fundamental challenges in explanation accuracy and the need for new approaches that deliver reliable context-utilisation explanations at scale.

Key Takeaways

  1. How language models use the provided context when generating responses remains opaque to users.
  2. Highlight explanations (HEs) are a natural way to explain how language models use context.
  3. Until now there has been no evaluation of how effectively HE methods explain context utilisation.
  4. A gold-standard HE evaluation framework is introduced that uses controlled test cases to assess HE methods.
  5. MechLight is found to perform best among the four evaluated HE methods.
  6. All HE methods struggle with longer contexts and exhibit positional biases.

Cool Papers

Click here to view paper screenshots

On the Role of Temperature Sampling in Test-Time Scaling

Authors:Yuheng Wu, Azalia Mirhoseini, Thierry Tambe

Large language models (LLMs) can improve reasoning at inference time through test-time scaling (TTS), where multiple reasoning traces are generated and the best one is selected. Prior work shows that increasing the number of samples K steadily improves accuracy. In this paper, we demonstrate that this trend does not hold indefinitely: at large K, further scaling yields no gains, and certain hard questions remain unsolved regardless of the number of traces. Interestingly, we find that different sampling temperatures solve different subsets of problems, implying that single-temperature scaling explores only part of a model’s potential. We therefore propose scaling along the temperature dimension, which enlarges the reasoning boundary of LLMs. Averaged over Qwen3 (0.6B, 1.7B, 4B, 8B) and five representative reasoning benchmarks (AIME 2024/2025, MATH500, LiveCodeBench, Hi-ToM), temperature scaling yields an additional 7.3 points over single-temperature TTS. Temperature scaling also enables base models to reach performance comparable to reinforcement learning (RL)-trained counterparts, without additional post-training. We further provide a comprehensive analysis of this phenomenon and design a multi-temperature voting method that reduces the overhead of temperature scaling. Overall, our findings suggest that TTS is more powerful than previously thought, and that temperature scaling offers a simple and effective way to unlock the latent potential of base models.

Paper and Project Links

PDF

Summary

Large language models (LLMs) can improve reasoning at inference time through test-time scaling (TTS). This paper shows that the steady accuracy gains from increasing the number of samples K do not continue indefinitely: at large K, further scaling yields no gains, and certain hard questions remain unsolved regardless of the number of traces. Different sampling temperatures solve different subsets of problems, so single-temperature scaling explores only part of a model's potential. The paper therefore proposes scaling along the temperature dimension, which enlarges the reasoning boundary of LLMs. Averaged over several models and benchmarks, temperature scaling yields an additional 7.3 points over single-temperature TTS and lets base models reach performance comparable to reinforcement-learning-trained counterparts, without additional post-training.

Key Takeaways

  1. Large language models (LLMs) can improve reasoning through test-time scaling (TTS).
  2. The benefit of TTS does not keep growing as the number of samples K increases.
  3. At large K, further scaling brings no additional accuracy.
  4. Different sampling temperatures solve different subsets of problems.
  5. Single-temperature scaling explores only part of a model's potential.
  6. Scaling along the temperature dimension enlarges the reasoning boundary of LLMs.
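
A minimal sketch of scaling along the temperature dimension: sample reasoning traces at several temperatures and vote over the extracted answers. The `generate` callable is a hypothetical stand-in for whatever sampling API is in use, and answer extraction and voting are deliberately simplified (the paper also designs a multi-temperature voting method that reduces overhead).

```python
import random
from collections import Counter
from typing import Callable, Sequence

def multi_temperature_vote(generate: Callable[[str, float], str],
                           prompt: str,
                           temperatures: Sequence[float] = (0.3, 0.7, 1.0, 1.3),
                           samples_per_temp: int = 4) -> str:
    """Sample traces at several temperatures and majority-vote the final answers."""
    answers = []
    for temp in temperatures:
        for _ in range(samples_per_temp):
            trace = generate(prompt, temp)                    # hypothetical sampling call
            answers.append(trace.strip().splitlines()[-1])    # crude final-answer extraction
    best, count = Counter(answers).most_common(1)[0]
    print(f"selected '{best}' with {count}/{len(answers)} votes")
    return best

# Toy stand-in generator: low temperatures repeat one answer, higher ones explore more.
def fake_generate(prompt: str, temperature: float) -> str:
    pool = ["Answer: 12"] if temperature < 0.8 else ["Answer: 12", "Answer: 15", "Answer: 9"]
    return "...reasoning...\n" + random.choice(pool)

multi_temperature_vote(fake_generate, "What is 3 * 4?")
```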

Cool Papers

Click here to view paper screenshots

Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback

Authors:Derek Shi, Ruben Glatt, Christine Klymko, Shubham Mohole, Hongjun Choi, Shashank Kushwaha, Sam Sakla, Felipe Leno da Silva

Recent advances in large video-language models (VLMs) rely on extensive fine-tuning techniques that strengthen alignment between textual and visual comprehension. Leading pipelines typically pair supervised fine-tuning (SFT) with reinforcement learning from preference data to enhance video comprehension. However, as VLMs scale in parameter size, so does the cost of gathering enough human feedback. To make fine-tuning more cost-effective, recent frameworks explore reinforcement learning with AI feedback (RLAIF), which replace human preference with AI as a judge. Current RLAIF frameworks rely on a specialized reward model trained with video narratives to create calibrated scalar rewards – an expensive and restrictive pipeline. We propose Oracle-RLAIF, a novel framework that replaces the trained reward model with a more general Oracle ranker which acts as a drop-in model ranking candidate model responses rather than scoring them. Alongside Oracle-RLAIF, we introduce $GRPO_{rank}$, a novel rank-based loss function based on Group Relative Policy Optimization (GRPO) that directly optimizes ordinal feedback with rank-aware advantages. Empirically, we demonstrate that Oracle-RLAIF consistently outperforms leading VLMs using existing fine-tuning methods when evaluated across various video comprehension benchmarks. Oracle-RLAIF paves the path to creating flexible and data-efficient frameworks for aligning large multi-modal video models with reinforcement learning from rank rather than score.

Paper and Project Links

PDF Proceedings of the 39th Annual Conference on Neural Information Processing Systems, ARLET Workshop (Aligning Reinforcement Learning Experimentalists and Theorists)

Summary

Recent advances in large video-language models (VLMs) rely on extensive fine-tuning to strengthen the alignment between textual and visual comprehension. Leading pipelines typically pair supervised fine-tuning (SFT) with reinforcement learning from preference data, but as VLMs grow, gathering enough human feedback becomes costly. Recent frameworks therefore explore reinforcement learning with AI feedback (RLAIF), replacing human preference with an AI judge. Oracle-RLAIF is a new framework that replaces the trained reward model with a more general Oracle ranker, which ranks candidate model responses rather than scoring them. Alongside it, GRPO_rank is introduced, a rank-based loss function built on Group Relative Policy Optimization (GRPO) that directly optimizes ordinal feedback with rank-aware advantages. Empirically, Oracle-RLAIF consistently outperforms leading VLMs fine-tuned with existing methods across various video-comprehension benchmarks, paving the way for flexible, data-efficient alignment of large multimodal video models with reinforcement learning from ranks rather than scores.

Key Takeaways

  1. Large video-language models (VLMs) improve video comprehension by combining supervised fine-tuning (SFT) with reinforcement learning.
  2. As VLMs scale, the cost of human feedback rises, calling for more economical fine-tuning methods.
  3. Existing frameworks reduce fine-tuning cost via reinforcement learning with AI feedback (RLAIF) but depend on a specialized reward model.
  4. Oracle-RLAIF replaces the trained reward model with a general Oracle ranker that ranks candidate model responses.
  5. GRPO_rank is a rank-based loss function built on Group Relative Policy Optimization (GRPO) that directly optimizes ordinal feedback with rank-aware advantages.
  6. Oracle-RLAIF performs strongly across video-comprehension benchmarks, outperforming existing fine-tuning methods.
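
A rough sketch of turning an ordinal ranking of grouped responses into rank-aware advantages in the spirit of GRPO. This is my own simplification for illustration; the paper's GRPO_rank loss has its own exact form.

```python
from typing import List

def rank_advantages(ranking: List[int]) -> List[float]:
    """Map Oracle ranks (1 = best) to zero-mean, unit-spread advantages.

    ranking[i] is the rank the Oracle ranker assigned to candidate i within its group.
    Better-ranked candidates receive positive advantages; worse ones receive negative.
    """
    n = len(ranking)
    if n < 2:
        return [0.0] * n
    scores = [(n - r) / (n - 1) for r in ranking]          # rank 1 -> 1.0, rank n -> 0.0
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5 or 1.0
    return [(s - mean) / std for s in scores]

# A group of 4 candidate video answers, ranked by the Oracle (candidate 2 is best).
adv = rank_advantages([3, 1, 4, 2])
print(adv)
# These advantages would then weight each candidate's log-probability in a
# GRPO-style policy loss, e.g. loss = -mean(adv[i] * logprob[i]).
```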

Cool Papers

Click here to view paper screenshots

Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research Challenge

Authors:Charlie Masters, Advaith Vellanki, Jiangbo Shangguan, Bart Kultys, Jonathan Gilmore, Alastair Moore, Stefano V. Albrecht

While agentic AI has advanced in automating individual tasks, managing complex multi-agent workflows remains a challenging problem. This paper presents a research vision for autonomous agentic systems that orchestrate collaboration within dynamic human-AI teams. We propose the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. We formalize workflow management as a Partially Observable Stochastic Game and identify four foundational challenges: (1) compositional reasoning for hierarchical decomposition, (2) multi-objective optimization under shifting preferences, (3) coordination and planning in ad hoc teams, and (4) governance and compliance by design. To advance this agenda, we release MA-Gym, an open-source simulation and evaluation framework for multi-agent workflow orchestration. Evaluating GPT-5-based Manager Agents across 20 workflows, we find they struggle to jointly optimize for goal completion, constraint adherence, and workflow runtime - underscoring workflow management as a difficult open problem. We conclude with organizational and ethical implications of autonomous management systems.

Paper and Project Links

PDF Accepted as an oral paper for the conference for Distributed Artificial Intelligence (DAI 2025). 8 pages, 2 figures

Summary
Managing complex multi-agent workflows autonomously remains a challenging problem in agentic AI. This paper presents a research vision for systems that orchestrate collaboration within dynamic human-AI teams and proposes the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. Workflow management is formalized as a Partially Observable Stochastic Game, with four foundational challenges: compositional reasoning for hierarchical decomposition, multi-objective optimization under shifting preferences, coordination and planning in ad hoc teams, and governance and compliance by design. Experiments with the open-source MA-Gym framework show that GPT-5-based Manager Agents struggle to jointly optimize goal completion, constraint adherence, and workflow runtime, and the paper concludes with the organizational and ethical implications of autonomous management systems.

Key Takeaways

  1. Autonomous management systems are an important challenge in AI, requiring complex multi-agent workflow management.
  2. The Manager Agent is at the core of this vision: it decomposes tasks, allocates work, and monitors progress.
  3. Workflow management is formalized as a Partially Observable Stochastic Game with four foundational challenges.
  4. Compositional reasoning for hierarchical decomposition is one major challenge, handling complex task structures.
  5. Multi-objective optimization under shifting preferences is another difficulty, balancing different needs and priorities.
  6. Coordination and planning in autonomous management systems involve collaboration within ad hoc teams.

Cool Papers

Click here to view paper screenshots

Do AI Models Perform Human-like Abstract Reasoning Across Modalities?

Authors:Claas Beger, Ryan Yi, Shuhao Fu, Arseny Moskvichev, Sarah W. Tsai, Sivasankaran Rajamanickam, Melanie Mitchell

OpenAI’s o3-preview reasoning model exceeded human accuracy on the ARC-AGI benchmark, but does that mean state-of-the-art models recognize and reason with the abstractions that the task creators intended? We investigate models’ abstraction abilities on ConceptARC. We evaluate models under settings that vary the input modality (textual vs. visual), whether the model is permitted to use external Python tools, and, for reasoning models, the amount of reasoning effort. In addition to measuring output accuracy, we perform fine-grained evaluation of the natural-language rules that models generate to explain their solutions. This dual evaluation lets us assess whether models solve tasks using the abstractions ConceptARC was designed to elicit, rather than relying on surface-level patterns. Our results show that, while some models using text-based representations match human output accuracy, the best models’ rules are often based on surface-level ``shortcuts’’ and capture intended abstractions far less often than humans. Thus their capabilities for general abstract reasoning may be overestimated by evaluations based on accuracy alone. In the visual modality, AI models’ output accuracy drops sharply, yet our rule-level analysis reveals that models might be underestimated, as they still exhibit a substantial share of rules that capture intended abstractions, but are often unable to correctly apply these rules. In short, our results show that models still lag humans in abstract reasoning, and that using accuracy alone to evaluate abstract reasoning on ARC-like tasks may overestimate abstract-reasoning capabilities in textual modalities and underestimate it in visual modalities. We believe that our evaluation framework offers a more faithful picture of multimodal models’ abstract reasoning abilities and a more principled way to track progress toward human-like, abstraction-centered intelligence.

Paper and Project Links

PDF 10 pages, 4 figures

Summary

This report examines the performance of OpenAI's o3-preview reasoning model on the ARC-AGI benchmark. It notes that although the model exceeds human accuracy on some tasks, it still falls short when handling abstract concepts. Through an in-depth study on ConceptARC, the report finds that in some settings models rely heavily on surface-level patterns and miss the deeper abstractions the tasks were designed to elicit. It also proposes an evaluation framework that assesses models' abstract-reasoning abilities more accurately.

Key Takeaways

  1. OpenAI's o3-preview reasoning model exceeds human accuracy on some benchmarks.
  2. Its ability to handle abstract concepts may not reach the level that headline accuracy suggests.
  3. In some settings, models rely on surface-level patterns rather than the intended deeper abstractions.
  4. Model behaviour in the visual modality is more complex than in the textual modality and calls for finer, rule-level evaluation.
  5. Relying on a single metric such as accuracy can overestimate or underestimate abstract-reasoning ability.
  6. A new evaluation framework is proposed that assesses abstract-reasoning ability more faithfully.

Cool Papers

Click here to view paper screenshots

Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

Authors:Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao

Reward models (RMs) play a critical role in enhancing the reasoning performance of LLMs. For example, they can provide training signals to finetune LLMs during reinforcement learning (RL) and help select the best answer from multiple candidates during inference. In this paper, we provide a systematic introduction to RMs, along with a comprehensive survey of their applications in LLM reasoning. We first review fundamental concepts of RMs, including their architectures, training methodologies, and evaluation techniques. Then, we explore their key applications: (1) guiding generation and selecting optimal outputs during LLM inference, (2) facilitating data synthesis and iterative self-improvement for LLMs, and (3) providing training signals in RL-based finetuning. Finally, we discuss critical open questions regarding the selection, generalization, evaluation, and enhancement of RMs, based on existing research and our own empirical findings. Our analysis aims to provide actionable insights for the effective deployment and advancement of RMs for LLM reasoning.

Paper and Project Links

PDF

Summary: Reward models (RMs) play a critical role in improving the reasoning performance of large language models (LLMs). This paper systematically introduces RMs and surveys their applications in LLM reasoning, covering fundamental concepts, architectures, training methodologies, and evaluation techniques, as well as key applications such as guiding generation and selecting optimal outputs, facilitating data synthesis and iterative self-improvement, and providing training signals for RL-based fine-tuning. Based on existing research and the authors' own empirical findings, it also discusses open questions on the selection, generalization, evaluation, and enhancement of RMs, aiming to offer actionable insights for the effective deployment and advancement of RMs in LLM reasoning.

Key Takeaways

  1. Reward models (RMs) play a key role in enhancing the reasoning performance of large language models (LLMs).
  2. RMs can provide training signals for fine-tuning LLMs and help select the best answer during inference.
  3. The paper comprehensively reviews the fundamental concepts, architectures, training methodologies, and evaluation techniques of RMs.
  4. Key applications of RMs include guiding generation and selecting optimal outputs, and facilitating data synthesis and iterative self-improvement of LLMs.
  5. RMs provide training signals in RL-based fine-tuning, an important means of improving LLM performance.
  6. Open questions remain on the selection, generalization, evaluation, and enhancement of RMs, which require further research.
  7. The paper aims to provide actionable insights for the effective deployment and advancement of RMs in LLM reasoning.
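
As a concrete example of the inference-time use described above, best-of-N selection with a reward model: sample several candidate answers and keep the one the RM scores highest. The `generate` and `reward_model` callables below are hypothetical stand-ins so the sketch runs end to end.

```python
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward_model: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate answers and return the one the reward model scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx]

# Toy stand-ins for an LLM sampler and a trained reward model.
def toy_generate(prompt: str) -> str:
    return random.choice(["The answer is 7.", "The answer is 9.", "I am not sure."])

def toy_reward_model(prompt: str, answer: str) -> float:
    return 1.0 if "7" in answer else 0.1   # pretend the RM prefers the correct answer

print(best_of_n("What is 3 + 4?", toy_generate, toy_reward_model))
```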

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!