
Activation Functions in Neural Networks


This article draws inspiration from two sources: here and here.

The main purpose of an activation function is to introduce non-linearity into a neural network. In neuroscience terms, it models the firing mechanism of a single neuron.

A neural network without activation functions is essentially equivalent to a linear regression model. Neural networks are known as universal function approximators: they can compute and learn functions of widely varying complexity, so nearly any process we might want to model can be represented as a functional computation within a network.
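To make the first point concrete, here is a minimal NumPy sketch (layer sizes and random values are arbitrary, chosen only for illustration) showing that two stacked linear layers with no activation in between compute exactly the same function as a single linear layer:

```python
# Stacking linear layers *without* activation functions collapses into a
# single linear map, i.e. nothing more expressive than linear regression.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))          # a batch of 5 inputs with 4 features

W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)

# Two "layers" with no activation in between.
h = x @ W1 + b1
y = h @ W2 + b2

# The same result from a single equivalent linear layer.
W = W1 @ W2
b = b1 @ W2 + b2
y_single = x @ W + b

print(np.allclose(y, y_single))      # True: the extra depth added no expressive power
```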

- Activation functions must be differentiable so that gradient descent can be applied.

- Some common activation functions (sketched in code after the Maxout discussion below):

  1. Sigmoid (largely deprecated). Output in [0, 1], which is not zero-centered and hinders optimization. It suffers from the vanishing gradient problem: gradients get smaller and smaller during back-propagation, so early layers cannot learn.
  2. Tanh (largely deprecated). Output in [-1, 1]. Generally preferable to the sigmoid, but it suffers from the same vanishing gradient problem.
  3. ReLU (Rectified Linear Unit). f(x) = max(0, x). A very simple and effective remedy for the vanishing gradient problem. It should not be used in the output layer (use a linear unit or softmax there), and it can produce dead neurons (when x < 0, the neuron never activates). Leaky ReLU or Maxout can be used to address this problem.
    1. Leaky ReLU: introduces a small slope for negative inputs to keep the updates alive.
    2. Benefits of ReLU: cheap to compute, converges faster, and can output a true zero, which allows representational sparsity.
  4. Maxout: a generalization of the ReLU and leaky ReLU functions.

This activation function is adaptive: the shape of the non-linearity is itself learned from the data.

It is a piecewise linear function that returns the maximum of its inputs, and it was designed to be used alongside dropout regularization, where units are randomly dropped (set to zero) with a certain probability.

Both Rectified Linear Unit (ReLU) and leaky ReLU are special cases of this activation function. The Maxout neuron benefits from the advantages of ReLU units (linear regime of operation without saturation), while avoiding their drawbacks (dying ReLU).

However, this approach doubles the number of parameters per neuron, so the network has more parameters to train.
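Below is a small NumPy sketch of the activation functions discussed above (my own illustration, not code from the referenced posts); the leaky-ReLU slope of 0.01 and the choice of k = 2 maxout pieces are common defaults rather than values fixed by this article:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1); saturates for large |x|

def tanh(x):
    return np.tanh(x)                      # squashes to (-1, 1); zero-centered but still saturates

def relu(x):
    return np.maximum(0.0, x)              # zero for x < 0, identity otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope keeps gradients alive for x < 0

def maxout(x, W, b):
    """Maxout: take the element-wise max over k affine pieces.

    x: (batch, d_in); W: (k, d_in, d_out); b: (k, d_out).
    With k = 2 and suitable W, b this reproduces ReLU or leaky ReLU.
    """
    z = np.einsum('bi,kio->bko', x, W) + b  # (batch, k, d_out)
    return z.max(axis=1)                    # max over the k pieces

# Toy usage: maxout with k = 2 pieces.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(2, 3, 5))              # twice the parameters of one linear layer
b = np.zeros((2, 5))
print(maxout(x, W, b).shape)                # (4, 5)
```

Note how the maxout weight tensor W holds k times the parameters of a single linear layer, which is exactly the extra cost mentioned above.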

Conclusion

- Use ReLU for hidden-layer activations, but pay attention to learning rates and track dead neurons (a small monitoring sketch follows this list).
- If ReLU is causing issues, consider alternatives such as Leaky ReLU, PReLU, or Maxout; avoid using the sigmoid.
- Normalize the data to maximize validation accuracy; standardize it when expediency is key.
- Sigmoid and tanh are unsuitable for deep networks because their gradients diminish during backpropagation.
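As an example of what "tracking dead neurons" might look like in practice, here is a hypothetical helper (not from the referenced posts) that reports the fraction of ReLU units in a layer that never fire over a batch of pre-activation values:

```python
import numpy as np

def dead_relu_fraction(pre_activations):
    """pre_activations: array of shape (num_samples, num_units)."""
    ever_active = (pre_activations > 0).any(axis=0)   # unit fired at least once
    return 1.0 - ever_active.mean()

# Toy usage: column 2 is forced negative everywhere, so it counts as dead.
z = np.random.default_rng(0).normal(size=(256, 4))
z[:, 2] = -np.abs(z[:, 2])
print(dead_relu_fraction(z))                          # 0.25
```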
