Deep Learning Paper: An End-to-End Trainable Neural Network for Image-based Sequence Recognition

Deep Learning Paper: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition, with a PyTorch Implementation
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
PDF: https://arxiv.org/pdf/1507.05717.pdf
PyTorch code: https://github.com/shanglianlm0525/CvPytorch
PyTorch code: https://github.com/shanglianlm0525/PyTorch-Networks

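1 Overview

CRNN integrates feature extraction, sequence modeling, and transcription into a single unified framework.

Compared with previous scene text recognition systems, CRNN has four distinctive properties:

The components of most existing algorithms are trained and tuned separately; CRNN, by contrast, is end-to-end trainable.
It naturally handles text of arbitrary length, with no character segmentation or horizontal scale normalization.
It is not confined to any predefined lexicon, and it achieves good results in both lexicon-free and lexicon-based modes.
It uses an effective yet much smaller model, which makes it more practical for real-world applications.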

2 CRNN

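The CRNN architecture is as follows. The model consists of three parts: the convolutional layers, the recurrent layers, and the transcription layer. At the bottom of CRNN, the convolutional layers automatically extract a feature sequence from each input image. On top of the convolutional network, a recurrent network makes a prediction for each frame of the feature sequence output by the convolutional layers. The transcription layer at the top of CRNN converts the per-frame predictions of the recurrent layers into a label sequence. Although CRNN is composed of different kinds of network architectures (CNN and RNN), it can be trained jointly with a single loss function.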

2-1 Feature Sequence Extraction

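The convolutional component of CRNN is built from a series of convolutional, pooling, and BN (batch normalization) layers. As in other CNN models, it turns the input image into feature maps that carry the image's feature information and serve as the input to the subsequent recurrent layers. So that the extracted feature maps are comparable in size, input images must be scaled beforehand; in the paper only the height has to be the same for all images, while the width may vary.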

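Because its convolutional and max-pooling layers operate on local regions, a convolutional neural network is translation invariant. The receptive field is the region of the original input image that corresponds to one position in the feature map output by the convolutional layers. Receptive fields and feature-map positions correspond one to one, in the same left-to-right, top-to-bottom order, as shown in the figure below. The feature map can therefore be treated as a representation of the image's features.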
[Figure: receptive fields in the input image and their corresponding feature-map positions]
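The correspondence between feature-map positions and image regions can be checked with a little arithmetic. The sketch below uses the standard receptive-field recurrence for a chain of convolution and pooling layers along one axis. The layer list `TOY_STACK` is a made-up example for illustration, not the CRNN configuration, and the function name is hypothetical.

```python
def receptive_field(layers):
    """Return (size, jump) of the receptive field along one axis.

    `layers` is a list of (kernel, stride) pairs ordered from input to output.
    `size` is how many input pixels one output position sees; `jump` is how far
    apart (in input pixels) the fields of two neighbouring outputs are.
    """
    size, jump = 1, 1  # an input pixel sees only itself; neighbours are 1 px apart
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each extra kernel tap reaches `jump` px further
        jump *= stride               # striding spreads neighbouring outputs apart
    return size, jump


# Hypothetical stack: conv 3x3, max-pool 2x2 (stride 2), conv 3x3, max-pool 2x2 (stride 2)
TOY_STACK = [(3, 1), (2, 2), (3, 1), (2, 2)]

size, jump = receptive_field(TOY_STACK)
print(size, jump)  # 10 4
```

With this toy stack, each output column covers a 10-pixel-wide strip of the input, and moving one column to the right shifts that strip by 4 pixels. This is why the columns keep the left-to-right order of the image regions they describe.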

2-2 Sequence Labeling

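After the two BiLSTM layers of the recurrent component process the feature sequence produced by the convolutional layers, the sequence also carries contextual information, which helps recognize the text in the image more accurately. In addition, because the dimensions of the convolutional output do not exactly match the input expected by the LSTM, a linear layer is needed to convert the dimensions; this step is Map-to-Sequence.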
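As a rough illustration of the reshaping part of Map-to-Sequence, the sketch below turns feature maps into one feature vector per column, ordered left to right. It uses plain nested lists so that it runs without any framework. The function name `map_to_sequence` and the `[channel][row][column]` layout are assumptions made for this example, and the linear projection mentioned above is left out.

```python
def map_to_sequence(feature_maps):
    """Convert feature maps indexed [channel][row][column] into a list of
    per-column feature vectors, ordered left to right.

    Each vector concatenates every channel and row of one column, so the
    sequence has one frame per column of the feature maps.
    """
    channels = len(feature_maps)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    return [
        [feature_maps[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]


# Two channels, one row, three columns
maps = [
    [[1, 2, 3]],     # channel 0
    [[10, 20, 30]],  # channel 1
]
print(map_to_sequence(maps))  # [[1, 10], [2, 20], [3, 30]]
```

In a tensor library the same step is usually a reshape plus a transpose; the nested-list version only makes the index order explicit.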

2-3 Transcription

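The transcription layer converts the per-frame predictions produced by the CNN and RNN layers into a label sequence, which is the final recognition result. Put simply, it takes the symbol with the highest probability in each frame's prediction and assembles these symbols into the final recognized sequence. The conversion relies on the conditional probability defined by CTC (Connectionist Temporal Classification).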
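The "pick the most probable symbol per frame" description corresponds to greedy (best-path) CTC decoding: take the argmax of each frame, merge consecutive repeats, then drop the blank symbol. Below is a minimal sketch in plain Python. The names `collapse` and `greedy_decode`, the use of "-" as the blank, and the toy probabilities are all assumptions for illustration; this is not the code from the linked repositories.

```python
BLANK = "-"


def collapse(path, blank=BLANK):
    """CTC many-to-one mapping: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most probable symbol in every frame, then collapse the path.

    frame_probs: one list of probabilities per frame, aligned with `alphabet`.
    """
    path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return collapse(path, blank)


print(collapse("--hh-e-l-ll-oo--"))  # hello

alphabet = ["-", "a", "b"]
frame_probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
]
print(greedy_decode(frame_probs, alphabet))  # aab
```

The blank is what lets genuinely doubled letters survive: the two l's in "hello" are kept only because a blank sits between them in the path.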

2-4 Network configuration

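The convolutional layers are based on the VGG-VeryDeep architecture. To make it suitable for recognizing English text, the third and fourth max-pooling layers use rectangular 1×2 pooling windows instead of the conventional square ones. This adjustment yields wider feature maps and hence longer feature sequences. For example, an image containing 10 characters is typically 100×32 in size, from which a feature sequence of 25 frames can be generated. This length exceeds that of most English words. Most importantly, rectangular pooling windows produce rectangular receptive fields, which helps in recognizing narrow characters such as "i" and "l". Batch normalization is added after the 5th and 6th convolutional layers, which helps the model converge quickly.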
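The 25-frame figure follows from the pooling arithmetic. The sketch below tracks only the four max-pooling layers, under two simplifying assumptions: the first two pools halve both width and height, the third and fourth (1×2 window) halve only the height, and every convolution preserves spatial size. Border and padding effects, and any later layer that collapses the remaining height, are ignored, so a real implementation may differ by a frame or so. The function name is hypothetical.

```python
def feature_map_size(width, height):
    """Track (width, height) through four max-pooling layers.

    Assumes the convolutions keep the spatial size unchanged, so only the
    pooling layers shrink the feature maps.
    """
    # (width_factor, height_factor) per pooling layer; the 3rd and 4th pools
    # use a 1x2 window, so they leave the width untouched.
    pools = [(2, 2), (2, 2), (1, 2), (1, 2)]
    for width_factor, height_factor in pools:
        width //= width_factor
        height //= height_factor
    return width, height


print(feature_map_size(100, 32))  # (25, 2)
```

With four square 2×2 pools the width would instead shrink to 100 // 16 = 6 columns, too few frames for a 10-character word, which is the motivation for the rectangular windows.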

3 Experiments
