Linear functions have a constant derivative and therefore a constant rate of change, so the gradient descent step is the same at every iteration. Using a linear activation function, the NN can learn only fixed patterns that are common across all iterations; unusual data characteristics cannot be captured by linear activation functions.
With the advent of large bodies of data, the need to study the non-linear characteristics of data has grown, and attention has shifted to methods that can learn them. This is where non-linear activation functions come into use. Non-linear activation functions are curved in nature, with many gradients, and functional optima exist along these curves. The main design principle of the NN is to minimize the error, which is the difference between the actual value and the predicted value. Supporting this principle, NNs use activation functions that can converge to a local minimum. On the non-linear boundary of such an activation function, the NN can learn unusual data patterns.
Non-linear activation functions form curved boundaries that can adapt to non-linear changes in the output, so the NN model can learn complex features of the input that are mapped onto this boundary.
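As a minimal sketch of this error-minimization idea (plain Python; the single weight, learning rate, and target value are illustrative assumptions, not part of any particular network), gradient descent nudges a weight so that a sigmoid neuron's prediction converges toward the actual value:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 2.0, 0.9   # one assumed training example (input, actual value)
w = 0.1                # illustrative initial weight
lr = 0.5               # illustrative learning rate

for step in range(50):
    y = sigmoid(w * x)              # predicted value
    error = y - target              # difference from the actual value
    grad = error * y * (1 - y) * x  # d(error^2 / 2)/dw via the chain rule
    w -= lr * grad                  # descend toward a (local) minimum

print(f"w = {w:.3f}, prediction = {sigmoid(w * x):.3f}")  # prediction approaches 0.9
```

Because the sigmoid's gradient changes along the curve, the update shrinks as the prediction nears the target, which is exactly the adaptive behaviour a constant-slope linear activation cannot provide.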
For example, consider the linear activation function given by
Y = F(x) = x        (1)
The input-output mapping of the linear function in equation (1) is:
X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 10 |
Y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 10 |
Here, for inputs from 0 to 10, the output boundary is also 0 to 10. The linear activation function, trained on inputs from 0 to 10, has learned to produce outputs identical to its inputs. The predictive power of the NN is therefore limited: since the output equals the input, an error present at, say, x = 3 passes through unchanged as Y = 3, and the NN cannot reduce it. This is a major drawback of linear activation functions.
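A quick sketch of this behaviour in plain Python (illustrative only): the identity activation of equation (1) passes every input through unchanged, so the output row simply mirrors the input row of the table above.

```python
def linear_activation(x):
    return x  # Y = F(x) = x, equation (1)

inputs = [0, 1, 2, 3, 4, 5, 6, 10]
outputs = [linear_activation(x) for x in inputs]
print(outputs)  # [0, 1, 2, 3, 4, 5, 6, 10] -- identical to the inputs
```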
On the other hand, consider a non-linear activation function of the sigmoid type, given by equation (2):
F(x) = 1 / (1 + e^(-x))        (2)
The input-output mapping of the sigmoid function is:
X | -∞ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ∞ |
Y | 0 | 0.5 | 0.731 | 0.881 | 0.953 | 0.982 | 0.993 | 0.998 | 1 |
Whatever input is given, the output varies between 0 and 1. While the model trains on inputs such as 0, 1, etc., it adapts to learn new outputs such as 0.5, 0.731, etc.; for a given input boundary, the model has learned a new output boundary. This shows the non-linearity of the sigmoid function: even though the input is linear, the output is non-linear. Because of this non-linear characteristic, the sigmoid function is widely used in NNs.
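Reproducing the sigmoid mapping of equation (2) in plain Python (an illustrative check, not a full NN): inputs of any size are squashed into the (0, 1) interval, and the printed values match the rounded table above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # equation (2)

for x in [0, 1, 2, 3, 4, 5, 6]:
    print(x, round(sigmoid(x), 3))
# 0 0.5, 1 0.731, 2 0.881, 3 0.953, 4 0.982, 5 0.993, 6 0.998
```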