Problem Detail: Many activation functions in neural networks (sigmoid, tanh, softmax) are monotonic, continuous, and differentiable (except perhaps at a few points where the derivative does not exist). I understand the reason for continuity and differentiability, but I cannot really understand the reason for monotonicity.
Asked By : Salvador Dali
Answered By : Kyle Jones
During the training phase, backpropagation informs each neuron how much it should influence each neuron in the next layer. If the activation function isn't monotonic, then increasing a neuron's weight might cause it to have less influence, the opposite of what was intended. The result would be chaotic behavior during training, with the network unlikely to converge to a state that yields an accurate classifier.
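To illustrate the point above, here is a minimal sketch (not from the original answer) comparing a monotonic activation (sigmoid) with a non-monotonic one (using `sin` as a stand-in): with sigmoid, increasing the weight always increases the output for a positive input, while with `sin` a larger weight can produce a smaller output.

```python
import math

def neuron_output(w, x, activation):
    # Single-input neuron: output = activation(w * x)
    return activation(w * x)

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
non_monotonic = math.sin  # a simple non-monotonic activation for illustration

x = 1.0
weights = [0.5 * i for i in range(1, 9)]  # weights 0.5, 1.0, ..., 4.0

sig_outputs = [neuron_output(w, x, sigmoid) for w in weights]
sin_outputs = [neuron_output(w, x, non_monotonic) for w in weights]

# Monotonic activation: larger weight -> larger output (for x > 0)
print(all(a < b for a, b in zip(sig_outputs, sig_outputs[1:])))  # True

# Non-monotonic activation: increasing the weight can DECREASE the output
# (e.g. sin(1.5) > sin(2.0)), so the check fails
print(all(a < b for a, b in zip(sin_outputs, sin_outputs[1:])))  # False
```

This is exactly the situation the answer warns about: a gradient step that increases a weight can move the output in the wrong direction when the activation is not monotonic.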
Best Answer from StackExchange
Question Source : http://cs.stackexchange.com/questions/45281