Problem Detail: I have been going through the description of the backpropagation algorithm found here, and I am having a bit of trouble getting my head around some of the linear algebra.

Say I have a final output layer $L$ consisting of two visible units, and layer $L-1$ consisting of four hidden units (this is just an example to illustrate my problem). My understanding is that the weight matrix for this final layer ($w^L$) should be a 4×2 matrix.

The reference says to calculate the output error $\delta^{x,L}$ given by:

$\delta^{x,L} = \nabla_a C_x \odot \sigma'(z^{x,L})$

where $z^{x,L} = w^L a^{x,L-1} + b^L$, $a^{x,L} = \sigma(z^{x,L})$, and $\odot$ is the Hadamard product. Evaluating $\delta^{x,L}$ gives a 1×2 vector, as it should given there are two output units.

My problem is when calculating the hidden layer errors (e.g. for layer $L-1$), given by:

$\delta^{x,L-1} = ((w^L)^T \delta^{x,L}) \odot \sigma'(z^{x,L-1})$

Now, if $w^L$ is a 4×2 matrix and $\delta^{x,L}$ is a 1×2 vector, then wouldn't $(w^L)^T \delta^{x,L}$ be a multiplication of a 2×4 matrix by a 1×2 matrix, which is impossible?

I feel like I have missed something vital in my understanding, but I can't work out what it is. Is it as simple as making it $\delta^{x,L}(w^L)^T$? This would be a 1×2 matrix multiplied by a 2×4 matrix, which is perfectly legal, but the formula has it the other way around.

Can anyone see where my understanding is flawed? Any help would be greatly appreciated.
Asked By : guskenny83
Answered By : Kyle Jones
You’ve transposed the sizes of both $w^L$ and $\delta^{x,L}$: $w^L$ should be 2×4 and $\delta^{x,L}$ should be 2×1. Then $(w^L)^T$ is a 4×2 matrix that is multiplied by a 2×1 vector, yielding a 4×1 vector suitable for the next step of backpropagation. In general, for neural nets the activation units are represented as column vectors and the weights are matrices of dimension $|L+1| \times |L|$, where $L$ is the current layer and $L+1$ is the next layer (in the forward direction).
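To make the dimensions concrete, here is a minimal NumPy sketch of the two-output, four-hidden-unit example, using the column-vector convention from the answer. The weights, biases, and activations are arbitrary placeholder values, and a quadratic cost is assumed so that $\nabla_a C = a^L - y$; only the shapes matter here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer L-1 has 4 units, layer L has 2 units, so w^L is 2 x 4.
w_L = rng.standard_normal((2, 4))    # weights into the output layer
b_L = rng.standard_normal((2, 1))    # output-layer biases, 2 x 1
a_Lm1 = rng.standard_normal((4, 1))  # activations of layer L-1, column vector

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Forward pass through the last layer: z^L = w^L a^{L-1} + b^L
z_L = w_L @ a_Lm1 + b_L              # (2x4)(4x1) + (2x1) -> 2 x 1
a_L = sigmoid(z_L)

# Output error, assuming a quadratic cost: delta^L = (a^L - y) ⊙ σ'(z^L)
y = np.zeros((2, 1))                 # dummy target
delta_L = (a_L - y) * sigmoid_prime(z_L)              # 2 x 1

# Backpropagated error: delta^{L-1} = ((w^L)^T delta^L) ⊙ σ'(z^{L-1})
z_Lm1 = rng.standard_normal((4, 1))  # placeholder pre-activations of layer L-1
delta_Lm1 = (w_L.T @ delta_L) * sigmoid_prime(z_Lm1)  # (4x2)(2x1) -> 4 x 1

print(delta_L.shape)    # (2, 1)
print(delta_Lm1.shape)  # (4, 1)
```

With $w^L$ stored as 2×4 and the errors as column vectors, $(w^L)^T \delta^{x,L}$ multiplies a 4×2 matrix by a 2×1 vector with no dimension mismatch, which is exactly the resolution described above.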
Best Answer from Stack Exchange
Question Source : http://cs.stackexchange.com/questions/30785