⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never use them in serious academic settings; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-11
A million-solar-mass object detected at cosmological distance using gravitational imaging
Authors:D. M. Powell, J. P. McKean, S. Vegetti, C. Spingola, S. D. M. White, C. D. Fassnacht
Structure on sub-galactic scales provides important tests of galaxy formation models and the nature of dark matter. However, such objects are typically too faint to provide robust mass constraints. Here, we report the discovery of an extremely low-mass object detected via its gravitational perturbation to a thin lensed arc observed with milli-arcsecond-resolution very long baseline interferometry (VLBI). The object was identified using a non-parametric gravitational imaging technique and confirmed using independent parametric modelling. It contains a mass of $m_{\rm 80}=(1.13 \pm 0.04)\times 10^6{M_\odot}$ within a projected radius of 80 parsecs at an assumed redshift of 0.881. This detection is extremely robust and precise, with a statistical significance of 26$\sigma$, a 3.3 per cent fractional uncertainty on $m_{\rm 80}$, and an astrometric uncertainty of 194 $\mu$as. This is the lowest-mass object known to us, by two orders of magnitude, to be detected at a cosmological distance by its gravitational effect. This work demonstrates the observational feasibility of using gravitational imaging to probe the million-solar-mass regime far beyond our local Universe.
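A minimal sketch, not from the paper, assuming astropy's built-in Planck18 flat ΛCDM cosmology: it converts the 80-parsec aperture at the assumed lens redshift of 0.881 into an angular scale, which comes out at roughly 10 milli-arcseconds and illustrates why milli-arcsecond VLBI resolution is needed for such a detection.

```python
# Minimal sketch (not from the paper): angular scale of an 80 pc aperture at z = 0.881,
# assuming astropy's built-in Planck18 flat-LambdaCDM cosmology.
import astropy.units as u
from astropy.cosmology import Planck18 as cosmo

z_lens = 0.881                                 # assumed lens redshift from the abstract
r_proj = 80.0 * u.pc                           # projected aperture radius within which m_80 is measured

d_a = cosmo.angular_diameter_distance(z_lens)  # angular diameter distance to the lens
theta = (r_proj / d_a).to(u.dimensionless_unscaled) * u.rad  # small-angle approximation

print(f"D_A(z={z_lens}) = {d_a:.0f}")
print(f"80 pc subtends ~{theta.to(u.mas):.1f}")  # roughly 10 milli-arcseconds
```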
Links to the paper and project
PDF Published in Nature Astronomy. See companion paper by McKean et al. also posted today
Summary:
Using gravitational imaging with very-high-angular-resolution radio interferometry (VLBI), the researchers detect an extremely low-mass object through its gravitational perturbation of a thin lensed arc. The object has a mass of about one million solar masses and is, by two orders of magnitude, the lowest-mass object known to be detected at a cosmological distance via its gravitational effect. The work demonstrates the observational feasibility of using gravitational imaging to probe the million-solar-mass regime far beyond the local Universe.
Key Takeaways:
Click here to view paper screenshots

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation
Authors:Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu
Instance segmentation demands costly per-pixel annotations and computationally expensive models. We introduce CAST, a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFM) into compact experts using limited labeled and abundant unlabeled data. CAST unfolds in three stages: (1) domain adaptation of the VFM(s) via self-training with contrastive calibration, (2) knowledge transfer through a unified multi-objective loss, and (3) student refinement to mitigate residual pseudo-label bias. Central to CAST is an \emph{instance-aware pixel-wise contrastive loss} that fuses mask and class scores to extract informative negatives and enforce clear inter-instance margins. By maintaining this contrastive signal across both adaptation and distillation, we align teacher and student embeddings and fully leverage unlabeled images. On Cityscapes and ADE20K, our ~11x smaller student improves over its zero-shot VFM teacher(s) by +8.5 and +7.1 AP, surpasses adapted teacher(s) by +3.4 and +1.5 AP, and further outperforms state-of-the-art SSKD methods on both benchmarks.
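To make the core idea tangible, here is a generic, self-contained sketch of an InfoNCE-style pixel-wise contrastive loss in which negatives are weighted by a fused mask-and-class score; the fusion rule, tensor shapes, and temperature are assumptions made for illustration and do not reproduce CAST's actual loss.

```python
# Illustrative sketch only: a generic InfoNCE-style pixel-wise contrastive loss in which
# negatives are weighted by fused mask/class scores. This is NOT the CAST implementation;
# shapes, the fusion rule, and the temperature are assumptions for the example.
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeds, instance_ids, mask_scores, class_scores, tau=0.1):
    """
    embeds:       (N, D) pixel embeddings sampled from the feature map
    instance_ids: (N,)   instance label of each sampled pixel (same id => positive pair)
    mask_scores:  (N,)   mask confidence of the instance each pixel belongs to
    class_scores: (N,)   class confidence of that instance
    """
    z = F.normalize(embeds, dim=1)
    sim = z @ z.t() / tau                                   # (N, N) scaled cosine similarities
    pos = instance_ids.unsqueeze(0) == instance_ids.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)

    # "Fused" score: pixels from confidently segmented and classified instances are
    # treated as more informative negatives (an illustrative weighting choice).
    weight = (mask_scores * class_scores).unsqueeze(0).expand_as(sim)

    exp_sim = torch.exp(sim)
    neg_term = (exp_sim * weight * (~pos)).sum(dim=1)       # score-weighted negatives
    pos_term = (exp_sim * (pos & ~eye)).sum(dim=1)          # positives, excluding self-pairs
    loss = -torch.log(pos_term / (pos_term + neg_term + 1e-8) + 1e-8)
    return loss.mean()
```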
Links to the paper and project
Summary
This paper introduces CAST, a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact expert models using limited labeled data and abundant unlabeled data. CAST has three main stages: domain adaptation of the VFM, knowledge transfer through a unified multi-objective loss, and refinement of the student model. At its core is an instance-aware pixel-wise contrastive loss that fuses mask and class scores to mine informative negatives and enforce clear inter-instance margins. On Cityscapes and ADE20K, the much smaller student outperforms its zero-shot VFM teachers, surpasses the adapted teachers, and exceeds state-of-the-art SSKD methods on both benchmarks.
Key Takeaways
- CAST is a semi-supervised knowledge distillation framework designed to exploit limited labeled data together with abundant unlabeled data.
- CAST compresses pre-trained vision foundation models (VFMs) into compact expert models.
- CAST consists of three stages: domain adaptation of the VFM, knowledge transfer, and student refinement.
- An instance-aware pixel-wise contrastive loss, which fuses mask and class scores, is central to CAST.
- This loss mines informative negatives and enforces clear inter-instance margins.
Click here to view paper screenshots



Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
Authors:Qishuai Wen, Chun-Guang Li
State-of-the-art methods for Transformer-based semantic segmentation typically adopt Transformer decoders that are used to extract additional embeddings from image embeddings via cross-attention, refine either or both types of embeddings via self-attention, and project image embeddings onto the additional embeddings via dot-product. Despite their remarkable success, these empirical designs still lack theoretical justifications or interpretations, thus hindering potentially principled improvements. In this paper, we argue that there are fundamental connections between semantic segmentation and compression, especially between the Transformer decoders and Principal Component Analysis (PCA). From such a perspective, we derive a white-box, fully attentional DEcoder for PrIncipled semantiC segmenTation (DEPICT), with the interpretations as follows: 1) the self-attention operator refines image embeddings to construct an ideal principal subspace that aligns with the supervision and retains most information; 2) the cross-attention operator seeks to find a low-rank approximation of the refined image embeddings, which is expected to be a set of orthonormal bases of the principal subspace and corresponds to the predefined classes; 3) the dot-product operation yields compact representation for image embeddings as segmentation masks. Experiments conducted on the ADE20K dataset find that DEPICT consistently outperforms its black-box counterpart, Segmenter, and it is lightweight and more robust.
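The three interpretations can be illustrated with a minimal, self-contained decoder sketch of this general kind: self-attention refines the pixel embeddings, cross-attention lets a set of class queries (the low-rank bases in the PCA reading) read from them, and a final dot-product projects pixels onto those queries to form per-class mask logits. The single-layer structure and the layer sizes below are illustrative assumptions, not DEPICT's exact architecture.

```python
# Illustrative sketch (not the exact DEPICT architecture): a one-layer, fully attentional
# decoder in which self-attention refines pixel embeddings, cross-attention updates class
# queries from them, and a dot-product projects pixels onto the queries as mask logits.
import torch
import torch.nn as nn

class MinimalMaskDecoder(nn.Module):
    def __init__(self, dim=256, num_classes=150, num_heads=8):     # 150 classes as in ADE20K
        super().__init__()
        self.class_queries = nn.Parameter(torch.randn(num_classes, dim) * dim ** -0.5)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_pix = nn.LayerNorm(dim)
        self.norm_cls = nn.LayerNorm(dim)

    def forward(self, pixel_embeds):                                # (B, H*W, dim) image embeddings
        # 1) self-attention refines pixel embeddings ("principal subspace" in the PCA reading)
        x, _ = self.self_attn(pixel_embeds, pixel_embeds, pixel_embeds)
        x = self.norm_pix(pixel_embeds + x)

        # 2) cross-attention: class queries attend to the refined pixels (low-rank bases)
        q = self.class_queries.unsqueeze(0).expand(x.size(0), -1, -1)
        c, _ = self.cross_attn(q, x, x)
        c = self.norm_cls(q + c)

        # 3) dot-product yields per-class mask logits, to be reshaped upstream to (B, C, H, W)
        mask_logits = torch.einsum("bnd,bcd->bnc", x, c)            # (B, H*W, num_classes)
        return mask_logits
```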
Links to the paper and project
PDF NeurIPS 2024. Code: https://github.com/QishuaiWen/DEPICT/
Summary: State-of-the-art Transformer-based semantic segmentation methods typically adopt Transformer decoders that extract additional embeddings from image embeddings via cross-attention, refine one or both types of embeddings via self-attention, and project image embeddings onto the additional embeddings via dot-product. Despite their notable success, these designs lack theoretical justification or interpretation, which hinders further principled improvements. Starting from the fundamental connection between semantic segmentation and compression, in particular between Transformer decoders and Principal Component Analysis (PCA), this paper derives a new decoder, DEPICT, whose main components, including the self-attention and cross-attention operators, are theoretically grounded. Experiments show that DEPICT outperforms its black-box counterpart Segmenter on the ADE20K dataset while being lightweight and more robust.
Key Takeaways:
- In these Transformer decoders, cross-attention extracts additional embeddings from the image embeddings and self-attention refines them.
- Current designs lack theoretical justification or interpretation, which hinders further improvement.
- This paper interprets the decoder design through the connection between semantic segmentation and compression, in particular Principal Component Analysis (PCA).
- DEPICT's decoder design is theoretically grounded, with interpretations of both the self-attention and cross-attention operators.
- DEPICT outperforms the black-box model Segmenter on the ADE20K dataset.
Click here to view paper screenshots




Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO
Authors:Julian Moosmann, Pietro Bonazzi, Yawei Li, Sizhen Bian, Philipp Mayer, Luca Benini, Michele Magno
Smart glasses are rapidly gaining advanced functions thanks to cutting-edge computing technologies, especially accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. However, integrating AI into smart glasses featuring a small form factor and limited battery capacity remains challenging for a satisfactory user experience. To this end, this paper proposes the design of a smart glasses platform for always-on on-device object detection with an all-day battery lifetime. The proposed platform is based on GAP9, a novel multi-core RISC-V processor from Greenwaves Technologies. Additionally, a family of sub-million-parameter TinyissimoYOLO networks is proposed. They are benchmarked on established datasets, capable of differentiating up to 80 classes on MS-COCO. Evaluations on the smart glasses prototype demonstrate TinyissimoYOLO's inference latency of only 17 ms and an energy consumption of 1.59 mJ per inference. An end-to-end latency of 56 ms is achieved, which is equivalent to 18 frames per second (FPS), with a total power consumption of 62.9 mW. This ensures continuous system runtime of up to 9.3 hours on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS, while the 18 FPS achieved in this paper even include image capturing, network inference, and detection post-processing. The algorithm's code is released openly with this paper and can be found here: https://github.com/ETH-PBL/TinyissimoYOLO
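The headline numbers can be cross-checked with simple arithmetic. The sketch below recovers the 18 FPS figure from the 56 ms end-to-end latency and estimates the battery runtime, assuming a nominal 3.7 V Li-Po cell voltage and lossless power conversion (both assumptions are not stated in the abstract).

```python
# Back-of-the-envelope check of the reported figures. The 3.7 V nominal battery voltage
# and the lossless power conversion are assumptions made for this illustration only.
latency_s = 0.056                     # end-to-end latency: 56 ms
fps = 1.0 / latency_s
print(f"{fps:.1f} FPS")               # ~17.9 FPS, quoted as 18 FPS

power_mw = 62.9                       # total system power
battery_mah = 154.0
battery_voltage = 3.7                 # assumed nominal Li-Po cell voltage
energy_mwh = battery_mah * battery_voltage
runtime_h = energy_mwh / power_mw
print(f"~{runtime_h:.1f} h runtime")  # ~9.1 h, roughly consistent with the quoted 9.3 h

energy_per_inference_mj = 1.59        # from the abstract
inference_latency_s = 0.017
inference_power_mw = energy_per_inference_mj / inference_latency_s
print(f"~{inference_power_mw:.0f} mW during inference")   # ~94 mW during the 17 ms inference
```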
Links to the paper and project
PDF This paper has been accepted for publication at ECCV 2024 Workshops, Milan, 2024
Summary
Smart glasses are rapidly gaining advanced functions thanks to cutting-edge computing technologies, in particular accelerated hardware architectures and tiny AI algorithms. However, delivering a satisfactory user experience within a small form factor and limited battery capacity remains a challenge. To this end, this paper presents a smart glasses platform built around GAP9, a novel multi-core RISC-V processor from Greenwaves Technologies, that performs always-on on-device object detection. It also proposes the TinyissimoYOLO family of networks, which can differentiate up to 80 classes on MS-COCO. Evaluations show an inference latency of only 17 ms at 1.59 mJ per inference, an end-to-end latency of 56 ms (18 FPS) at a total power consumption of 62.9 mW, and up to 9.3 hours of continuous runtime on a 154 mAh battery, outperforming MCUNet. The algorithm's code is openly released at https://github.com/ETH-PBL/TinyissimoYOLO. Overall, the proposed design substantially improves the smart-glasses user experience, offering all-day battery life and fast on-device detection.
Key Takeaways
The key insights extracted from the text are:
- Smart glasses are gaining advanced functions thanks to cutting-edge computing technologies, including accelerated hardware architectures and tiny AI algorithms.
- Achieving an excellent user experience within a small form factor and limited battery capacity remains a challenge for smart glasses.
- A smart glasses platform based on the GAP9 processor from Greenwaves Technologies is proposed, enabling always-on on-device object detection with an all-day battery lifetime.
- The TinyissimoYOLO network family is proposed and benchmarked on MS-COCO, where it can recognize up to 80 object classes.
- TinyissimoYOLO achieves an inference latency of only 17 ms with low energy consumption, demonstrating highly efficient performance.
Click here to view paper screenshots


