[SOT] "Transformers in Single Object Tracking: An Experimental Survey"

1、Background
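A survey of Transformer-based methods for single object tracking (SOT).
The methods fall into two main categories: CNN-Transformer based and Fully-Transformer based.

Summary of existing SOT surveys

The Transformer was first proposed in NLP.

Applications of the Transformer in computer vision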


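In general, training a Transformer architecture requires a large number of samples. In visual object tracking (VOT), however, the target is specified only in the first frame of a sequence, so it is hard to obtain enough training data for it. Because of this limitation, both fully-Transformer and CNN-Transformer trackers use such networks as backbone modules, which are then optimized for tracking.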

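All CNN-Transformer based methods are two-stream two-stage.
The target template and the search region are processed in separate streams (two-stream). Feature extraction and feature fusion happen in two explicitly separate stages (two-stage).
An example of feature fusion is the correlation operation in SiamRPN.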
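The sketch below makes the "linear" fusion step concrete by computing a depthwise cross-correlation between template and search features in NumPy. It uses the depthwise variant popularized by SiamRPN++. SiamRPN itself uses an up-channel correlation. The function name `depthwise_xcorr` is hypothetical, and the plain sliding-window loop favors readability over speed.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Slide each template channel over the matching search channel (valid mode).

    search:   (C, Hx, Wx) feature map of the search region
    template: (C, Hz, Wz) feature map of the target template
    returns:  (C, Hx - Hz + 1, Wx - Wz + 1) per-channel response maps
    """
    C, Hx, Wx = search.shape
    Cz, Hz, Wz = template.shape
    assert C == Cz, "template and search features must have the same channels"
    Ho, Wo = Hx - Hz + 1, Wx - Wz + 1
    out = np.zeros((C, Ho, Wo), dtype=search.dtype)
    for i in range(Ho):
        for j in range(Wo):
            window = search[:, i:i + Hz, j:j + Wz]
            # per-channel dot product between the template and the current window
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out
```

The peak of the channel-summed response marks where the search region best matches the template. The step itself has no learnable parameters, because the template features simply act as the kernel.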

2、CNN-Transformer based trackers
The first CNN-Transformer SOT methods:
Wang N, Zhou W, Wang J, et al. Transformer meets tracker: Exploiting temporal context for robust visual tracking[C]//Proceedings of the IEEE/CVF CVPR. 2021.

Chen X, Yan B, Zhu J, et al. Transformer tracking[C]//Proceedings of the IEEE/CVF CVPR. 2021: 8126-8135.

Uses a Transformer for feature fusion.
It proposes an ego-context augment module (ECA) and a cross-feature augment module (CFA), which strengthen self-attention and cross-attention, respectively.
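Below is a minimal sketch of attention-based feature fusion in the spirit of ECA and CFA. It assumes single-head scaled dot-product attention with identity projections and a residual connection. The real modules also use learned projections, multi-head attention, feed-forward layers, and normalization. All function names here are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: q (Nq, d), k and v (Nk, d) -> (Nq, d)."""
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v

def ego_context_augment(x):
    """ECA-like step: self-attention over one branch plus a residual."""
    return x + attention(x, x, x)

def cross_feature_augment(x, y):
    """CFA-like step: tokens of x query tokens of y, plus a residual."""
    return x + attention(x, y, y)

def fuse(template_tokens, search_tokens):
    # 1) enhance each branch with its own context
    z = ego_context_augment(template_tokens)
    s = ego_context_augment(search_tokens)
    # 2) each branch attends to the other (both computed from the same z, s)
    z2 = cross_feature_augment(z, s)
    s2 = cross_feature_augment(s, z)
    return z2, s2
```

Unlike the fixed correlation, the attention weights here depend on the content of both branches. With learned projections added, this content dependence is what makes the fusion learnable.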



Trackers built on CNN-Transformer architectures clearly outperform Siamese trackers. The main reason is that they replace the conventional linear cross-correlation with a learnable Transformer.
3、Fully-Transformer based trackers
CNN-Transformer trackers have difficulty capturing global feature representations.
3.1、Two-stream two-stage trackers
These trackers use two identical Transformer streams that separately extract features from the target template and the search region. A subsequent Transformer network then models the relationship between the two sets of features. Finally, a prediction head localizes the target.
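The two-stream two-stage structure can be sketched as follows. Everything here is a toy assumption:

- `self_attention_layer` stands in for a full Transformer encoder.
- `w_backbone` is a hypothetical weight matrix shared by both streams.
- The "head" scores search tokens against the mean template token instead of regressing a box.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_layer(x, w):
    """Toy stand-in for a Transformer encoder layer (hypothetical weights w)."""
    h = x @ w
    attn = softmax(h @ h.T / np.sqrt(h.shape[-1]))
    return h + attn @ h

def two_stream_two_stage(template, search, w_backbone):
    # Stage 1: feature extraction; the two streams never see each other,
    # but they share the same weights (Siamese-style).
    z = self_attention_layer(template, w_backbone)
    x = self_attention_layer(search, w_backbone)
    # Stage 2: a separate fusion step relates the two feature sets
    # (here: cross-attention from search tokens to template tokens).
    attn = softmax(x @ z.T / np.sqrt(x.shape[-1]))
    fused = x + attn @ z
    # Toy prediction head: score each search token against the mean template token
    scores = fused @ z.mean(axis=0)
    return scores, int(scores.argmax())
```

The point is the structure. The template features `z` are computed without ever looking at the search region, and relation modeling only happens afterwards.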
The first one:
Xie F, Wang C, Wang G, et al. Learning tracking representations via dual-branch fully transformer networks[C]//Proceedings of the IEEE/CVF ICCV. 2021: 2688-2697.

Lin L, Fan H, Zhang Z, et al. SwinTrack: A simple and strong baseline for transformer tracking[J]. Advances in Neural Information Processing Systems, 2022, 35: 16743-16754.

3.2、One-stream one-stage trackers

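There is no separate correlation (fusion) operation; the network outputs the result directly.
The MAM (Mixed Attention Module) is computationally inefficient, so the tracker is relatively slow.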
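One-stream one-stage trackers instead run attention over the concatenated template and search tokens. Feature extraction and relation modeling therefore happen in a single operation. The sketch below assumes single-head attention without learned projections, and it is not MixFormer's actual MAM.

The helper `attention_entries` (a hypothetical name) only counts score-matrix entries. It illustrates one general reason joint attention can be costly: the score matrix grows quadratically with the total number of tokens.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(template_tokens, search_tokens):
    """One-stream step: attention over the concatenated template + search tokens.

    Template and search tokens attend to each other inside the backbone,
    so no separate correlation / fusion stage is needed afterwards.
    """
    n_z = template_tokens.shape[0]
    tokens = np.concatenate([template_tokens, search_tokens], axis=0)
    d = tokens.shape[-1]
    weights = softmax(tokens @ tokens.T / np.sqrt(d))
    out = tokens + weights @ tokens
    return out[:n_z], out[n_z:]

def attention_entries(n_z, n_x):
    """Number of attention scores: joint attention vs. two separate streams."""
    joint = (n_z + n_x) ** 2
    separate = n_z ** 2 + n_x ** 2
    return joint, separate
```

Consider an assumed setting of 64 template tokens and 256 search tokens, for example 128x128 and 256x256 crops with 16x16 patches. Here `attention_entries(64, 256)` returns `(102400, 69632)`. The joint score matrix is about 1.47 times larger because it also contains both cross-attention blocks.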

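Ye B, Chang H, Ma B, et al. Joint feature learning and relation modeling for tracking: A one-stream framework[C]//European Conference on Computer Vision (ECCV). 2022.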

It eliminates unnecessary background features.
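Background removal can be sketched as follows: rank search tokens by how strongly the template attends to them, then keep only the top fraction. This is loosely modeled on OSTrack's early candidate elimination. The averaging over template tokens, the `keep_ratio` value, and the function name are assumptions for illustration.

```python
import numpy as np

def eliminate_background(template_tokens, search_tokens, keep_ratio=0.7):
    """Keep the search tokens the template attends to most; drop the rest.

    Tokens that receive little template-to-search attention are treated
    as background and removed from later layers.
    """
    d = search_tokens.shape[-1]
    logits = template_tokens @ search_tokens.T / np.sqrt(d)  # (N_z, N_x)
    logits -= logits.max(axis=1, keepdims=True)               # stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    score = attn.mean(axis=0)                                  # average over template tokens
    k = max(1, int(round(keep_ratio * search_tokens.shape[0])))
    keep = np.sort(np.argsort(score)[::-1][:k])                # top-k, original order
    return search_tokens[keep], keep
```

Dropping tokens shrinks the attention matrices in the remaining layers. It also stops clearly irrelevant background from interacting with the template.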
Lan J P, Cheng Z Q, He J Y, et al. ProContEXT: Exploring progressive context transformer for tracking[C]//ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023: ....

Compared with OSTrack, it additionally introduces temporal information.
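One common way to inject temporal information is to keep a static first-frame template alongside dynamic templates taken from recent confident frames. The class below is a hypothetical illustration of that idea, not ProContEXT's actual update strategy. `interval`, `threshold`, and `max_dynamic` are made-up defaults.

```python
from collections import deque

class TemplateBank:
    """Static first-frame template plus a small queue of dynamic templates.

    Hypothetical update rule: every `interval` frames, if the tracker's
    confidence reaches `threshold`, the current target crop is pushed as a
    new dynamic template and the oldest one falls out of the queue.
    """

    def __init__(self, first_template, max_dynamic=2, interval=10, threshold=0.7):
        self.static = first_template
        self.dynamic = deque(maxlen=max_dynamic)  # oldest entry dropped automatically
        self.interval = interval
        self.threshold = threshold

    def update(self, frame_idx, crop, confidence):
        if frame_idx % self.interval == 0 and confidence >= self.threshold:
            self.dynamic.append(crop)

    def templates(self):
        # static template first, newest dynamic template last
        return [self.static, *self.dynamic]
```

The confidence gate keeps occluded or drifting frames out of the bank. The static template guarantees that the original target appearance is never forgotten.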


4、Benchmark datasets and evaluation metrics



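On the OTB dataset, fully-Transformer trackers perform worse than CNN-based trackers and CNN-Transformer trackers.
Most OTB videos have a low frame rate, and in these sequences the target's appearance changes little across scenes. CNN-based feature extraction and matching therefore already track well.
Overall, a tracker's performance depends mainly on how well it learns temporal cues and captures global features.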
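For reference, the two OTB-style metrics can be sketched as below:

- Success: the area under the IoU success plot.
- Precision: the fraction of frames whose center location error is within 20 pixels.

Boxes are assumed to be `(x, y, w, h)` arrays. The 21 IoU thresholds follow the common 0:0.05:1 convention, and official toolkits may differ in details.

```python
import numpy as np

def iou(a, b):
    """IoU of boxes in (x, y, w, h) format; a and b have shape (N, 4)."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / union

def success_auc(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Area under the success plot: mean fraction of frames with IoU > t."""
    overlaps = iou(pred, gt)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))

def precision_at(pred, gt, pixels=20):
    """Fraction of frames whose center location error is at most `pixels`."""
    centers_p = pred[:, :2] + pred[:, 2:] / 2
    centers_g = gt[:, :2] + gt[:, 2:] / 2
    err = np.linalg.norm(centers_p - centers_g, axis=1)
    return float((err <= pixels).mean())
```

Because success uses a strict `IoU > t`, even a perfect tracker scores 20/21 under these thresholds, since no frame passes t = 1.0.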
5、Tracking efficiency analysis

6、Conclusion (own)
A promising direction is to treat object tracking as a sequence learning problem rather than as template matching.
The extensive experiments in the comparative study show that one-stream one-stage fully-Transformer trackers clearly outperform the other types of trackers. They are expected to be the leading approach in single object tracking for the next several years.
7、References
Kugarajeevan J, Kokul T, Ramanan A, et al. Transformers in single object tracking: An experimental survey[J]. IEEE Access, 2023.
10,000-word long read: a comprehensive analysis of the latest advances in object tracking, covering the evolution of tracking algorithms, future trends, and application prospects.
