Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes


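Introduction

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes, ECCV 2018
Authors: Megvii (Face++) and Prof. Xiang Bai's group at Huazhong University of Science and Technology
Paper link
Code (PyTorch implementation)
Code (Caffe2 implementation)

Problem addressed: scene text detection and recognition. The paper handles both tasks with a single end-to-end trainable neural network, named Mask TextSpotter, which uses semantic segmentation to spot text instances of arbitrary shape. Earlier methods ([27] and [3]) tuned the two tasks separately, ignoring that detection and recognition are highly correlated and complementary, and could not be trained end to end.
Figure 3 illustrates the proposed method: the green bounding boxes are the detection results and the red text is the recognition result.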

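Main Contributions

Proposes Mask TextSpotter, a text spotting model with a simple and smooth end-to-end training procedure
The model can detect and recognize text of arbitrary shape, whether horizontal, oriented, or curved
Unlike previous methods, it performs both text detection and recognition through semantic segmentation
It achieves state-of-the-art results on several datasets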

Pipeline

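Pipeline: four components, namely FPN, RPN, Fast R-CNN, and the mask branch
ResNet+FPN: a ResNet-50 backbone with FPN extracts and fuses features at different scales to improve accuracy
RPN: generates text proposals using anchors of different scales; RoI Align then extracts the region features of each proposal
Fast R-CNN: performs classification and regression, providing more accurate bounding boxes
mask branch: performs text instance segmentation and character segmentation. The input is a 16*64 RoI feature map; after four convolutional layers and one deconvolutional layer, the branch outputs 38 prediction maps: 1 global text instance map (the precise location of text of arbitrary shape), 10 digit maps, 26 letter maps, and 1 background map (used in post-processing).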
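
To make the 38-channel output concrete, here is a small sketch of the channel layout and of how a deconvolution could upsample the 16*64 RoI. The channel order (global map first, background last), the names in `mask_branch_channels`, and the kernel-2, stride-2 deconvolution are assumptions for illustration; the notes above only give the channel counts.

```python
import string

# Assumed character set: 10 digits followed by 26 lowercase letters.
CHARSET = string.digits + string.ascii_lowercase


def mask_branch_channels():
    """Return hypothetical names for the 38 output maps of the mask branch."""
    return (
        ["global_text_instance"]              # 1 global text instance map
        + [f"char_{c}" for c in CHARSET]      # 36 character maps
        + ["char_background"]                 # 1 background map
    )


def deconv_output_size(size, kernel=2, stride=2, padding=0):
    """Standard transposed-convolution output size (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


channels = mask_branch_channels()
out_h, out_w = deconv_output_size(16), deconv_output_size(64)
```

Under these assumptions the branch would produce 38 maps of size 32*128 from a 16*64 RoI.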

Loss Function

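Multi-task loss (Eqs. 3-7 in the paper)
It combines the RPN loss, the Fast R-CNN branch loss, and the mask branch loss. The mask branch loss consists of a global text instance segmentation loss and a character segmentation loss: the former is an average binary cross-entropy loss, the latter a weighted spatial softmax loss.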
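
The structure described above can be written out as follows. This is a reconstruction from the description rather than a verbatim copy of the paper's equations, so the symbols are assumptions: $N$ is the number of pixels in a map, $S$ the sigmoid function, $T$ the number of character classes including background, $W_n$ a per-pixel weight, and $\alpha_1$, $\alpha_2$, $\beta$ balancing weights.

```latex
\begin{aligned}
L &= L_{rpn} + \alpha_1 L_{rcnn} + \alpha_2 L_{mask}, \\
L_{mask} &= L_{global} + \beta L_{char}, \\
L_{global} &= -\frac{1}{N}\sum_{n=1}^{N}\Bigl[\, y_n \log S(x_n) + (1 - y_n)\log\bigl(1 - S(x_n)\bigr) \Bigr], \\
L_{char} &= -\frac{1}{N}\sum_{n=1}^{N} W_n \sum_{t=0}^{T-1} Y_{n,t}\,
            \log\frac{e^{X_{n,t}}}{\sum_{k=0}^{T-1} e^{X_{n,k}}}.
\end{aligned}
```

Here $x_n$ and $y_n$ are the predicted value and binary label of pixel $n$ in the global map, while $X_{n,t}$ and $Y_{n,t}$ are the predicted score and one-hot label of pixel $n$ for class $t$ in the character maps.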
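Pixel voting
This step decodes the predicted character maps into a character sequence. First, the background map is binarized and the connected regions corresponding to characters are found. For each region, the mean value over that region is computed on every character map, and the character whose map has the highest mean is assigned to the region as the predicted character. Finally, the regions are ordered from left to right to form the character sequence of the text instance.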
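
The decoding steps above can be sketched in plain Python. The threshold value, the rule that a pixel belongs to a character when its background probability is below the threshold, the 4-connectivity, and the channel order in `CHARSET` are all assumptions made for this illustration, not details taken from the notes.

```python
from collections import deque

CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed channel order


def connected_regions(mask):
    """Return 4-connected components of True cells in a 2D boolean grid."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def pixel_voting(background, char_maps, threshold=0.5):
    """Decode character maps into (text, per-character mean probabilities).

    background: 2D list of background probabilities in [0, 1].
    char_maps:  list of 36 2D lists, one probability map per character.
    threshold:  hypothetical binarization threshold.
    """
    # Assumption: character pixels are those with low background probability.
    mask = [[p < threshold for p in row] for row in background]
    decoded = []
    for region in connected_regions(mask):
        # Mean value of every character map over this region.
        means = [sum(cm[y][x] for y, x in region) / len(region) for cm in char_maps]
        best = max(range(len(means)), key=means.__getitem__)
        left = min(x for _, x in region)
        decoded.append((left, CHARSET[best], means[best]))
    decoded.sort(key=lambda item: item[0])  # read regions left to right
    text = "".join(ch for _, ch, _ in decoded)
    return text, [p for _, _, p in decoded]
```

Returning the per-character mean probabilities alongside the text is a design choice of this sketch: those values are what a probability-aware lexicon lookup would need afterwards.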
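Weighted edit distance
The paper also proposes a newly defined weighted edit distance for finding the best match for the predicted sequence in a given lexicon. With the plain edit distance, several words may share the same minimum distance, so the best one cannot be decided. The weighted version assigns different costs to insertion, deletion, and substitution, and picks the word with the lowest weighted distance. The idea is to derive the cost of each of the three operations from the character probabilities obtained by pixel voting.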
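
A minimal sketch of the idea, assuming simplified cost definitions that are not the paper's exact formulas: deleting a predicted character costs its own probability, substituting costs one minus the probability of the target character at that position, and inserting has a constant cost. The names `INSERT_COST`, `weighted_edit_distance`, and `best_lexicon_match` are hypothetical.

```python
INSERT_COST = 1.0  # hypothetical constant cost; the paper derives it from probabilities


def weighted_edit_distance(pred, probs, word):
    """Probability-weighted edit distance between a prediction and a lexicon word.

    pred:  predicted string.
    probs: list of dicts, one per predicted position, mapping char -> probability.
    word:  candidate word from the lexicon.
    """
    n, m = len(pred), len(word)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Assumed deletion cost: confidence of the character being deleted.
        dist[i][0] = dist[i - 1][0] + probs[i - 1].get(pred[i - 1], 0.0)
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + INSERT_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delete = dist[i - 1][j] + probs[i - 1].get(pred[i - 1], 0.0)
            insert = dist[i][j - 1] + INSERT_COST
            if pred[i - 1] == word[j - 1]:
                substitute = dist[i - 1][j - 1]  # match, no cost
            else:
                # Assumed substitution cost: cheap if the target char was likely here.
                substitute = dist[i - 1][j - 1] + 1.0 - probs[i - 1].get(word[j - 1], 0.0)
            dist[i][j] = min(delete, insert, substitute)
    return dist[n][m]


def best_lexicon_match(pred, probs, lexicon):
    """Return the lexicon word with the smallest weighted edit distance."""
    return min(lexicon, key=lambda w: weighted_edit_distance(pred, probs, w))
```

For example, take the prediction "he1lo" where the third position has probability 0.5 for "1" and 0.45 for "l". The plain edit distance to both "hello" and "hewlo" is 1, a tie. Under the costs assumed here, "hello" costs about 0.55 and "hewlo" costs 1.0, so "hello" is selected.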

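Experimental Results

To validate the proposed method, it is evaluated on three datasets covering horizontal text (ICDAR2013), oriented text (ICDAR2015), and curved text (Total-Text).

These notes reflect my own understanding of the paper; if you spot any mistakes, corrections are welcome.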
