
RWTH-PHOENIX Weather Dataset: Description and Download


About the RWTH-PHOENIX Weather 2014 T dataset:

Between 2009 and 2011, the German public TV station PHOENIX broadcast daily news and weather forecasts accompanied by sign language interpretation, and 386 editions of the weather forecast were annotated using sign-language gloss notation.

In addition, the original spoken German was transcribed using automatic speech recognition combined with manual post-processing. Together, the gloss annotations and transcriptions support building sign language translation systems that go from sign language video input to spoken-language output.
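The annotations therefore pair gloss sequences with German sentences. As a minimal sketch of how one might inspect them, assuming the '|'-delimited CSV layout and the column names `orth` (gloss) and `translation` (German) of the v3 release (verify against your copy):

```python
# Sketch: print the gloss sequence and German translation of the first
# training sample. Path and column names are assumptions from the v3 release.
import csv

path = ("PHOENIX-2014-T-release-v3/PHOENIX-2014-T/annotations/manual/"
        "PHOENIX-2014-T.train.corpus.csv")

with open(path, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="|"):
        print(row["name"], "|", row["orth"], "->", row["translation"])
        break  # just the first sample
```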

This post briefly explains how to use the dataset and provides a direct download link.

Directory structure and notes:

The files of the RWTH-PHOENIX Weather 2014 T dataset are organized as follows:
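The listing below is a sketch of the v3 archive layout as I recall it; exact folder and file names may differ, so verify against your extracted copy:

```
PHOENIX-2014-T-release-v3/
└── PHOENIX-2014-T/
    ├── annotations/
    │   └── manual/
    │       ├── PHOENIX-2014-T.train.corpus.csv
    │       ├── PHOENIX-2014-T.dev.corpus.csv
    │       └── PHOENIX-2014-T.test.corpus.csv
    ├── evaluation/
    ├── features/
    │   └── fullFrame-210x260px/
    │       ├── train/
    │       ├── dev/
    │       └── test/
    └── models/
```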

If you use the dataset in your research, please cite: Necati Cihan Camgöz et al., "Neural Sign Language Translation", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, 2018.

The sign language interpretation was recorded with a stationary color camera placed in front of the interpreter, who wears dark clothing in front of an artificial background. All videos were recorded at 25 frames per second, each frame measures 210 by 260 pixels, and each frame shows only the interpreter window.
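Given that format (25 fps, 210x260 frames), here is a minimal sketch for loading one extracted frame sequence; the `fullFrame-210x260px` sub-path, the PNG frame format, and the `<sequence_name>` placeholder are assumptions based on the v3 layout:

```python
# Sketch: load all frames of one sequence as PIL images.
from glob import glob
from PIL import Image

frames_dir = ("PHOENIX-2014-T-release-v3/PHOENIX-2014-T/features/"
              "fullFrame-210x260px/train/<sequence_name>")

frames = [Image.open(p) for p in sorted(glob(frames_dir + "/*.png"))]
print(len(frames), frames[0].size)  # frame count and (width, height)
```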

Dataset download: https://www-i6.informatik.rwth-aachen.de/ftp/pub/rwth-phoenix/2016/phoenix-2014-T.v3.tar.gz
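A minimal sketch for fetching and unpacking the archive; the file is large, so a resumable downloader may be preferable in practice:

```python
# Sketch: download the tarball and extract it into data/.
import tarfile
import urllib.request

URL = ("https://www-i6.informatik.rwth-aachen.de/ftp/pub/rwth-phoenix/"
       "2016/phoenix-2014-T.v3.tar.gz")

urllib.request.urlretrieve(URL, "phoenix-2014-T.v3.tar.gz")
with tarfile.open("phoenix-2014-T.v3.tar.gz", "r:gz") as tar:
    tar.extractall("data/")
```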

Below are several papers that use this dataset, for reference:

Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation

Existing studies on Sign Language Translation have shown that incorporating a mid-level sign gloss representation significantly improves translation performance; notably, the current state of the art in translation requires gloss-level tokenization in order to work. We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation in an end-to-end manner. This is achieved with a Connectionist Temporal Classification (CTC) loss that binds the recognition and translation problems into a single unified architecture. The joint approach does not require any ground-truth timing information, simultaneously solves two co-dependent sequence-to-sequence learning problems, and leads to significant performance gains. We evaluate the recognition and translation performance of our approach on the RWTH-PHOENIX-Weather-2014T dataset and report state-of-the-art results on both tasks. Our translation networks outperform both sign-video-to-spoken-language and gloss-to-spoken-language translation models, in some cases more than doubling the performance (9.58 vs. 21.80 BLEU-4 score). We also share new baseline results using transformer networks for several other text-to-text sign language translation tasks.
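As a conceptual illustration (not the authors' code), the joint objective can be sketched as a CTC loss over gloss sequences for recognition plus a cross-entropy loss over spoken-language tokens for translation; all function names, shapes, and weights below are illustrative assumptions:

```python
# Sketch of a joint recognition + translation objective in PyTorch.
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
xent_loss = nn.CrossEntropyLoss(ignore_index=0)  # 0 = padding id (assumption)

def joint_loss(gloss_logits, gloss_targets, input_lens, target_lens,
               word_logits, word_targets,
               recognition_weight=1.0, translation_weight=1.0):
    # gloss_logits: (time, batch, gloss_vocab); CTC expects log-probs.
    recognition = ctc_loss(gloss_logits.log_softmax(-1), gloss_targets,
                           input_lens, target_lens)
    # word_logits: (batch, seq, vocab) from the autoregressive decoder.
    translation = xent_loss(word_logits.flatten(0, 1), word_targets.flatten())
    return recognition_weight * recognition + translation_weight * translation
```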

Paper: https://arxiv.org/pdf/2003.13830v1.pdf

Gloss Attention for Gloss-free Sign Language Translation

Most current sign language translation (SLT) approaches rely on gloss annotations to provide extra supervision, but obtaining gloss labels is not easy. To address this, we first analyze existing models to investigate how gloss annotations make SLT easier. We find that they provide two kinds of information to the model: 1) they help it implicitly learn the location of semantic boundaries in continuous sign language videos, and 2) they help it understand the sign language video globally. We then propose gloss attention, which enables the model to keep its attention within video segments that share the same local semantics, just as gloss annotations help existing models. In addition, we transfer knowledge of sentence-to-sentence similarity from natural language models into our gloss attention SLT network (GASLT) to help it understand sign language videos at the sentence level. Experimental results on multiple large-scale sign language datasets show that our proposed GASLT model significantly outperforms existing methods. Code is available at https://github.com/YinAoXiong/GASLT.
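The core locality idea can be illustrated with a simplified window-masked attention; the actual gloss attention mechanism in GASLT is more involved, and `window_size` here is purely an illustrative parameter:

```python
# Sketch: each query attends only to keys within a fixed temporal window,
# mimicking the segment-level focus that gloss supervision provides.
import torch

def local_attention(q, k, v, window_size=7):
    # q, k, v: (batch, seq_len, dim)
    seq_len = q.size(1)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    idx = torch.arange(seq_len)
    outside = (idx[None, :] - idx[:, None]).abs() > window_size // 2
    scores = scores.masked_fill(outside, float("-inf"))
    return scores.softmax(-1) @ v
```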

Paper: https://arxiv.org/pdf/2307.07361v1.pdf

Temporal Lift Pooling for Continuous Sign Language Recognition

In modern neural networks, pooling methods are essential for enlarging receptive fields and reducing computational cost; this paper derives a pooling operator from the lifting scheme of signal processing.
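As a rough sketch of the lifting idea (split the signal into even/odd samples, predict one from the other, update with the residual), not the paper's exact implementation; the conv-based predictor and updater are illustrative assumptions:

```python
# Sketch: lifting-style temporal pooling that halves the time dimension.
import torch
import torch.nn as nn

class LiftPool1d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.predict = nn.Conv1d(channels, channels, 3, padding=1)
        self.update = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x):                      # x: (batch, channels, time)
        even, odd = x[..., ::2], x[..., 1::2]  # assumes even temporal length
        detail = odd - self.predict(even)      # high-frequency residual
        approx = even + self.update(detail)    # smoothed, downsampled signal
        return approx                          # time halved, like stride-2 pooling
```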

Paper: https://arxiv.org/pdf/2207.08734v1.pdf
