Deep Learning Paper: MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning, with a PyTorch Implementation


MCUNetV2: a tiny deep learning network built on patch-based inference, and its memory-efficient implementation

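1 Overview

The model sets a record for ImageNet classification accuracy on MCUs (71.8%). More importantly, it also opens up the possibility of running dense prediction tasks on MCUs.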

2 MCUNetV2

2-1 Breaking the Memory Bottleneck with Patch-based Inference

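With layer-by-layer inference, for each convolutional layer the inference engine first allocates buffers in SRAM for the input and output activations, then releases the input buffer once the layer has finished so that the space can be reused by the next layer. This approach is simple to implement. Its drawback is that SRAM must hold the complete input buffer and the complete output buffer of the current layer at the same time.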
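The memory requirement described above can be sketched as a small calculation: if a layer needs its full input and full output resident together, peak activation memory is the largest input-plus-output sum over all layers. The activation shapes below are hypothetical and chosen only for illustration, and the sketch ignores residual connections, weights, and any in-place tricks an actual inference engine may use.

```python
def chw(channels, height, width):
    """Number of elements in a channels x height x width activation tensor."""
    return channels * height * width


def layerwise_peak_memory(activation_sizes):
    """Peak activation memory (in elements) for layer-by-layer inference.

    activation_sizes[i] is the size of tensor i. Layer i reads tensor i and
    writes tensor i + 1, so both must be in memory at the same time.
    """
    return max(a + b for a, b in zip(activation_sizes, activation_sizes[1:]))


# Hypothetical activation shapes (not taken from the paper).
acts = [
    chw(3, 160, 160),
    chw(16, 80, 80),
    chw(24, 40, 40),
    chw(40, 20, 20),
    chw(96, 10, 10),
]

peak = layerwise_peak_memory(acts)
print(peak)  # elements; with int8 activations this is also the byte count
```

In this toy configuration the peak is set by the first layer, where the spatial resolution is highest. The later layers need far less memory, which is the imbalance that patch-based inference targets.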

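Patch-based inference, in contrast, runs the memory-intensive stage patch by patch. At any moment the model operates only on a small spatial region (about one-tenth of the full area), which significantly lowers peak memory usage. Once this stage is finished, the remaining layers, whose peak memory is small, are executed in the usual layer-by-layer fashion.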

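However, to produce the same result as layer-by-layer inference, the non-overlapping output patches require overlapping input regions. This repeated computation noticeably increases the network's total computation (by about 10%-17%).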
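A minimal sketch of this trade-off, assuming a toy stack of two 3x3 convolutions (stride 1, no padding) on a single-channel 12x12 input, is shown below. It is not the paper's implementation: the helper names, sizes, and kernels are made up for illustration. It checks that tiling the output into non-overlapping 4x4 patches, each computed from an overlapping 8x8 input patch, reproduces the full result exactly, and it counts the extra multiply-accumulates.

```python
import random


def conv3x3_valid(x, k):
    """3x3 convolution with stride 1 and no padding on a 2D list."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] * k[di][dj] for di in range(3) for dj in range(3))
            for j in range(w - 2)
        ]
        for i in range(h - 2)
    ]


def two_layers(x, k1, k2):
    """Toy patch stage: two stacked 3x3 convolutions."""
    return conv3x3_valid(conv3x3_valid(x, k1), k2)


def crop(x, top, left, size):
    """Square crop of side `size` starting at (top, left)."""
    return [row[left:left + size] for row in x[top:top + size]]


def macs(in_size):
    """MACs of the two-layer stack on an in_size x in_size input."""
    return 9 * (in_size - 2) ** 2 + 9 * (in_size - 4) ** 2


random.seed(0)
N = 12            # input side; the output side is N - 4 = 8
P = 4             # output patch side, giving a 2 x 2 grid of patches
HALO = 4          # receptive field is 5, so each input patch has side P + 4

x = [[random.randint(-3, 3) for _ in range(N)] for _ in range(N)]
k1 = [[random.randint(-2, 2) for _ in range(3)] for _ in range(3)]
k2 = [[random.randint(-2, 2) for _ in range(3)] for _ in range(3)]

full = two_layers(x, k1, k2)

patched = [[0] * (N - 4) for _ in range(N - 4)]
for top in range(0, N - 4, P):
    for left in range(0, N - 4, P):
        in_patch = crop(x, top, left, P + HALO)   # overlaps its neighbours
        out_patch = two_layers(in_patch, k1, k2)  # P x P, no overlap
        for i in range(P):
            for j in range(P):
                patched[top + i][left + j] = out_patch[i][j]

full_macs = macs(N)
patch_macs = 4 * macs(P + HALO)
overhead = patch_macs / full_macs - 1

print(patched == full, full_macs, patch_macs, round(overhead, 3))
```

The overhead in this toy case is larger than the 10%-17% quoted above because here the whole network runs per patch and the patches are tiny relative to the receptive field. In the setting described in the text, only the initial memory-intensive stage runs per patch, so the repeated work is a smaller share of the total.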

2-2 Reducing Computation Overhead by Redistributing the Receptive Field

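The computation overhead is tied to the receptive field of the initial per-patch stage. For a given output of the patch stage, a larger receptive field requires a larger input resolution per patch, which leads to more overlap between patches and more redundant computation.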
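The relationship can be written down with the standard receptive-field recurrence. The symbols are introduced here for illustration and are not notation from the paper: the patch stage has n layers with kernel sizes k_i and strides s_i, border padding is ignored, and p_out is the side length of one output patch.

```latex
% Receptive field r and total stride S of the per-patch stage
r = 1 + \sum_{i=1}^{n} (k_i - 1) \prod_{j=1}^{i-1} s_j ,
\qquad
S = \prod_{i=1}^{n} s_i

% Input side length needed to produce an output patch of side p_out
p_{\text{in}} = (p_{\text{out}} - 1)\, S + r

% Adjacent input patches start p_out * S pixels apart, so they overlap by
p_{\text{in}} - p_{\text{out}}\, S = r - S
```

Under these assumptions the overlap between neighbouring input patches grows linearly with the receptive field r, which is why shrinking r in the patch stage reduces the repeated computation.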

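The paper therefore proposes redistributing the receptive field to reduce the computation overhead:

Reduce the receptive field of the per-patch stage;
Increase the receptive field of the per-layer stage.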

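The core idea is to shrink the receptive field of the initial stage so that each patch needs a smaller input and less computation is repeated. On some tasks, however, a receptive field that is too small can hurt accuracy. Enlarging the receptive field of the per-layer stage compensates for that loss.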
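The effect of a smaller patch-stage receptive field can be estimated with a short calculation. The sketch below assumes the usual receptive-field recurrence and ignores padding. Both layer lists are hypothetical (kernel size, stride) configurations invented for this example, not the architectures from the paper, and the redundancy figure counts input pixels read rather than exact MACs, so it is only a rough proxy.

```python
def receptive_field(layers):
    """Return (receptive field, total stride) for a list of (kernel, stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


def input_patch_size(layers, out_patch):
    """Input side length needed to produce an output patch of side out_patch."""
    rf, jump = receptive_field(layers)
    return (out_patch - 1) * jump + rf


def input_redundancy(layers, out_size, n_patches):
    """Input pixels read by all patches divided by pixels in the full input."""
    out_patch = out_size // n_patches
    full_in = input_patch_size(layers, out_size)
    patch_in = input_patch_size(layers, out_patch)
    return (n_patches * patch_in) ** 2 / full_in ** 2


# Hypothetical patch stages with the same total stride (8).
before = [(3, 2), (3, 1), (5, 2), (5, 1), (7, 2)]   # large kernels
after = [(3, 2), (1, 1), (3, 2), (3, 1), (3, 2)]    # smaller kernels

for name, cfg in (("before", before), ("after", after)):
    rf, stride = receptive_field(cfg)
    size = input_patch_size(cfg, out_patch=5)
    ratio = input_redundancy(cfg, out_size=20, n_patches=4)
    print(name, rf, stride, size, round(ratio, 2))
```

With a 4 x 4 patch grid, the configuration with smaller kernels needs a noticeably smaller input per patch and reads far fewer redundant pixels. The calculation says nothing about accuracy, which is what the enlarged per-layer stage is meant to recover.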

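The accompanying figure compares a MobileNetV2-based architecture before and after redistribution, showing the configuration of each version. The per-patch stage uses smaller kernels and fewer blocks to reduce computation, while the per-layer stage adds more blocks to recover model accuracy.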

3 Experimental Results
