Paper Digest - Neural Architecture Search: A Survey

Table of Contents

    • Introduction
    • Search Space
    • Search Strategy
    • Performance Estimation Strategy

Original paper link: https://arxiv.org/pdf/1808.05377.pdf


Introduction

NAS is a rather broad topic. The authors of the paper categorize NAS methods along three dimensions:

  • search space - which architectures can be represented, i.e., what structures to search over
  • search strategy - how to explore the search space
  • performance estimation strategy - how to reduce the cost of estimating an architecture's performance

Search Space

A simple search space is the space of chain-structured neural networks. This space is then parametrized by (a sampling sketch follows the list):

  • (i) the (maximum) number of layers n (possibly unbounded);
  • (ii) the type of operation every layer executes, e.g., pooling, convolution, or more advanced operations like depthwise separable convolutions (Chollet, 2016) or dilated convolutions (Yu and Koltun, 2016);
  • (iii) hyperparameters associated with the operation, e.g., number of filters, kernel size and strides for a convolutional layer (Baker et al., 2017a; Suganuma et al., 2017; Cai et al., 2018a), or simply number of units for fully-connected networks (Mendoza et al., 2016).
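
As a concrete illustration, here is a minimal sketch that samples one architecture description from such a chain-structured space. The operation names, hyperparameter ranges, and the `sample_chain_architecture` helper are illustrative assumptions, not taken from the paper.

```python
import random

# Illustrative candidate operations per layer (assumed, not from the paper).
OPS = ["conv", "depthwise_sep_conv", "dilated_conv", "max_pool"]

def sample_chain_architecture(max_layers=10):
    """Sample one chain-structured architecture from the parametrized space."""
    n = random.randint(1, max_layers)  # (i) number of layers (bounded here)
    layers = []
    for _ in range(n):
        op = random.choice(OPS)        # (ii) type of operation
        hparams = {                    # (iii) operation hyperparameters (assumed ranges)
            "filters": random.choice([32, 64, 128]),
            "kernel_size": random.choice([3, 5]),
            "stride": random.choice([1, 2]),
        }
        layers.append({"op": op, **hparams})
    return layers

if __name__ == "__main__":
    print(sample_chain_architecture())
```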

In addition, elements of modern proven architectures, such as skip connections and multi-branch structures, can be added to the search space.

Motivated by hand-crafted architectures consisting of repeated motifs (Szegedy et al., 2016; He et al., 2016; Huang et al., 2017), Zoph et al. (2018) and Zhong et al. (2018a) propose to search for such motifs, dubbed cells or blocks, respectively, rather than for whole architectures.
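
A minimal sketch of the cell-based idea, assuming a PyTorch setup: a single searched cell (here a placeholder residual block, not the paper's actual cell) is stacked repeatedly to form the full network, so the search only has to discover the cell rather than the whole architecture.

```python
import torch.nn as nn

class Cell(nn.Module):
    """Placeholder for a searched cell; a real cell would come from NAS."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection, a common repeated motif

class CellStackedNet(nn.Module):
    """Full network built by repeating the same cell, as in cell-based spaces."""
    def __init__(self, channels=64, num_cells=8, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.cells = nn.Sequential(*[Cell(channels) for _ in range(num_cells)])
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, num_classes))

    def forward(self, x):
        return self.head(self.cells(self.stem(x)))
```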

Search Strategy

The main search strategies include:

  • Random search
  • Bayesian optimization - searches conditional spaces using tree-based models such as random forests.
  • Evolutionary methods - evolve a population of architectures through mutation and selection.
  • Reinforcement learning (RL) - an agent (e.g., an RNN controller) generates architectures as actions and is rewarded based on the estimated performance of the trained architecture.
  • Gradient-based methods - apply a continuous relaxation of the search space to enable direct gradient-based optimization (a sketch follows this list).
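
To make the gradient-based idea concrete, here is a hedged, DARTS-style sketch of a continuous relaxation in PyTorch: each candidate operation gets a learnable architecture weight, and the layer's output is the softmax-weighted mixture of all candidates, so the architecture parameters can be optimized by gradient descent alongside the network weights. The candidate set and shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One relaxed layer: a softmax-weighted mixture over candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        # Architecture parameters: one logit per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # The mixture is differentiable in alpha, enabling gradient-based search.
        return sum(w * op(x) for w, op in zip(weights, self.candidates))
```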

Performance Estimation Strategy

The main focus of performance estimation strategies is on reducing the time needed to estimate an architecture's performance. There are four main approaches.

  • Lower fidelity estimates - Training time is reduced by training for fewer epochs, on a subset of the data, with down-scaled models, down-scaled data, … (see the sketch after this list).
  • Learning Curve Extrapolation - Training time is reduced because performance can be extrapolated after just a few epochs of training.
  • Weight Inheritance/Network Morphisms - Instead of training models from scratch, they are warm-started by inheriting the weights of, e.g., a parent model.
  • One-Shot Models/Weight Sharing - Only the one-shot model needs to be trained; its weights are then shared across different architectures that are just sub-graphs of the one-shot model.
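
As an example of the first approach, the sketch below scores a candidate cheaply by training it for only a few epochs on a small subset of the data, assuming a standard PyTorch classification setup; the model is built from a sampled architecture beforehand, and the subset size, epoch count, and use of training loss as the proxy score are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset

def low_fidelity_estimate(model, train_set, epochs=2, subset_size=5000):
    """Cheap proxy score: brief training on a data subset instead of full training."""
    subset = Subset(train_set, range(subset_size))   # down-scaled training data
    loader = DataLoader(subset, batch_size=128, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    epoch_loss, batches = 0.0, 0
    for _ in range(epochs):                          # far fewer epochs than usual
        epoch_loss, batches = 0.0, 0
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            batches += 1
    # Lower loss after brief training serves as the low-fidelity quality proxy;
    # a real system would typically evaluate on held-out validation data instead.
    return epoch_loss / max(batches, 1)
```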
