Published: 2025-11-21

Updated: 2025-11-27

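⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use these summaries in serious academic settings; they are intended only for initial screening before reading a paper.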

Updated 2025-11-21

Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation

Authors: Firdavs Nasriddinov, Rafal Kocielnik, Anima Anandkumar, Andrew J. Hung

High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. We contribute by (1) mining Instrument-Action-Target (IAT) triplets from real-world feedback text and clustering surface forms into normalized categories, (2) fine-tuning a video-to-IAT model that leverages the surgical procedure and task contexts as well as fine-grained temporal instrument motion, and (3) demonstrating how to effectively use IAT triplet representations to guide GPT-4o in generating clinically grounded, trainer-style feedback. We show that, on Task 1: Video-to-IAT recognition, our context injection and temporal tracking deliver consistent AUC gains (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Tissue: 0.74 to 0.79). For Task 2: feedback text generation (rated on a 1-5 fidelity rubric where 1 = opposite/unsafe, 3 = admissible, and 5 = perfect match to a human trainer), GPT-4o from video alone scores 2.17, while IAT conditioning reaches 2.44 (+12.4%), doubling the share of admissible generations with score >= 3 from 21% to 42%. Traditional text-similarity metrics also improve: word error rate decreases by 15-31% and ROUGE (phrase/substring overlap) increases by 9-64%. Grounding generation in explicit IAT structure improves fidelity and yields clinician-verifiable rationales, supporting auditable use in surgical training.
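
The relative gain quoted in the abstract can be checked with simple arithmetic, and word error rate follows its standard definition (word-level edit distance divided by the number of reference words). The sketch below is illustrative only: the function names and the example sentences are assumptions made here, not the authors' evaluation code.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    return (improved - baseline) / baseline * 100.0


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Fidelity scores quoted in the abstract: 2.17 (video only) vs. 2.44 (IAT-conditioned).
print(round(relative_gain(2.17, 2.44), 1))  # 12.4

# Invented sentences, used only to exercise the function.
print(word_error_rate("pull the tissue toward you", "pull tissue away"))  # 0.6
```

Lower WER means the generated feedback is closer, word for word, to the trainer's original wording, which is why the abstract reports it as a decrease.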

Paper and project links

PDF Accepted as proceedings paper for ML4H 2025

Summary

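The paper highlights the importance of intraoperative feedback for improving surgical trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback could provide timely, accessible guidance at scale. The authors propose a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. They mine Instrument-Action-Target (IAT) triplets from real feedback text and cluster surface forms into normalized categories, fine-tune a video-to-IAT model that uses surgical procedure and task context together with fine-grained temporal instrument motion, and show how IAT triplet representations can guide GPT-4o toward clinically grounded, trainer-style feedback. On video-to-IAT recognition, context injection and temporal tracking yield consistent AUC gains; on feedback text generation, IAT conditioning scores higher than generation from video alone. Overall, the work improves the quality of generated feedback for surgical training.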

Key Takeaways

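High-quality intraoperative feedback from a trainer is pivotal for improving trainee performance and long-term skill acquisition.
Automating natural, trainer-style feedback can provide timely, accessible guidance at scale.
The authors propose a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts and uses it to condition feedback generation.
Instrument-Action-Target (IAT) triplets are mined from real feedback text, and their surface forms are clustered into normalized categories that serve as the conditioning signal.
Context injection and temporal tracking raise AUC on video-to-IAT recognition (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Tissue: 0.74 to 0.79).
IAT-conditioned GPT-4o produces higher-fidelity trainer-style feedback than GPT-4o from video alone (2.44 vs. 2.17 on a 1-5 rubric), and word error rate decreases by 15-31%.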
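
The idea of mapping free-text surface forms onto normalized IAT categories can be sketched as follows. This is a minimal illustration under loud assumptions: the paper clusters surface forms mined from transcripts, whereas this sketch swaps in small hand-written lookup tables, and every vocabulary entry, class name, and the text hint format is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical surface-form tables standing in for learned clusters.
INSTRUMENTS: Dict[str, str] = {
    "needle driver": "needle_driver",
    "driver": "needle_driver",
    "scissors": "scissors",
}
ACTIONS: Dict[str, str] = {
    "pull": "retract",
    "retract": "retract",
    "cut": "cut",
    "snip": "cut",
}
TARGETS: Dict[str, str] = {
    "tissue": "tissue",
    "suture": "suture",
    "stitch": "suture",
}


@dataclass(frozen=True)
class IATTriplet:
    """Normalized Instrument-Action-Target triplet; None marks an unknown slot."""
    instrument: Optional[str]
    action: Optional[str]
    target: Optional[str]


def _lookup(form: str, table: Dict[str, str]) -> Optional[str]:
    """Map one surface form to its normalized category, or None if unknown."""
    return table.get(form.strip().lower())


def normalize_triplet(instrument: str, action: str, target: str) -> IATTriplet:
    """Normalize the three raw surface forms independently."""
    return IATTriplet(
        _lookup(instrument, INSTRUMENTS),
        _lookup(action, ACTIONS),
        _lookup(target, TARGETS),
    )


def to_prompt_hint(triplet: IATTriplet) -> str:
    """Render a triplet as a one-line text hint (hypothetical format)."""
    slots = (
        ("instrument", triplet.instrument),
        ("action", triplet.action),
        ("target", triplet.target),
    )
    return "IAT: " + ", ".join(f"{name}={value or 'unknown'}" for name, value in slots)


triplet = normalize_triplet("Driver", " snip", "Stitch")
print(to_prompt_hint(triplet))  # IAT: instrument=needle_driver, action=cut, target=suture
```

Keeping unknown slots as `None` rather than guessing a category makes it explicit when the structured representation is incomplete, which matters if the hint is later shown to a clinician for verification.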

SceneEdited: A City-Scale Benchmark for 3D HD Map Updating via Image-Guided Change Detection

Authors: Chun-Jung Lin, Tat-Jun Chin, Sourav Garg, Feras Dayoub

Accurate, up-to-date High-Definition (HD) maps are critical for urban planning, infrastructure monitoring, and autonomous navigation. However, these maps quickly become outdated as environments evolve, creating a need for robust methods that not only detect changes but also incorporate them into updated 3D representations. While change detection techniques have advanced significantly, there remains a clear gap between detecting changes and actually updating 3D maps, particularly when relying on 2D image-based change detection. To address this gap, we introduce SceneEdited, the first city-scale dataset explicitly designed to support research on HD map maintenance through 3D point cloud updating. SceneEdited contains over 800 up-to-date scenes covering 73 km of driving and approximately 3 $\text{km}^2$ of urban area, with more than 23,000 synthesized object changes created both manually and automatically across 2000+ out-of-date versions, simulating realistic urban modifications such as missing roadside infrastructure, buildings, overpasses, and utility poles. Each scene includes calibrated RGB images, LiDAR scans, and detailed change masks for training and evaluation. We also provide baseline methods using a foundational image-based structure-from-motion pipeline for updating outdated scenes, as well as a comprehensive toolkit supporting scalability, trackability, and portability for future dataset expansion and unification of out-of-date object annotations. Both the dataset and the toolkit are publicly available at https://github.com/ChadLin9596/ScenePoint-ETK, establishing a standardized benchmark for 3D map updating research.

Paper and project links

PDF accepted by WACV 2026

Summary

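This paper introduces SceneEdited, a city-scale dataset designed to support research on HD map maintenance through 3D point cloud updating. SceneEdited contains over 800 up-to-date scenes covering 73 km of driving and approximately 3 $\text{km}^2$ of urban area, with more than 23,000 synthesized object changes. Each scene provides calibrated RGB images, LiDAR scans, and detailed change masks for training and evaluation. The authors also provide baseline methods built on an image-based structure-from-motion pipeline for updating outdated scenes, along with a comprehensive toolkit that supports future dataset expansion and the unification of out-of-date object annotations.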

Key Takeaways