Convolutional Neural Network Example in TensorFlow

Problem Detail: I want to ask how the dimensions change across the convolution and max-pooling layers. I am referring to the example in the TensorFlow tutorial: http://tensorflow.org/tutorials/mnist/pros/index.html#deep-mnist-for-experts

The original image is 28x28x1.

The first convolutional layer:

  1. apply convolution to a 5×5 patch with 32 features -> 24x24x32
  2. apply max-pooling 2×2 -> 12x12x32

Second convolutional layer:

  1. apply convolution to a 5×5 patch with 64 features -> 8x8x64
  2. apply max-pooling 2×2 -> 4x4x64

But the tutorial says “Now that the image size has been reduced to 7×7”, while my calculation gives 4×4. Did I miss some concept? I am new to CNNs, so this may be a beginner question. Thanks.

Asked By : LKS

Answered By : Wandering Logic

Your calculation would be correct if the example followed the “usual” approach of letting the convolution chop off the edges. Instead, the example you pointed to says:

“How do we handle the boundaries? What is our stride size? In this example, we’re always going to choose the vanilla version. Our convolutions uses a stride of one and are zero padded so that the output is the same size as the input.”

So they are:

  1. zero-padding the 28x28x1 image to 32x32x1
  2. applying 5x5x1x32 convolution to get 28x28x32
  3. max-pooling down to 14x14x32
  4. zero-padding the 14x14x32 to 18x18x32
  5. applying 5x5x32x64 convolution to get 14x14x64
  6. max-pooling down to 7x7x64.
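
The six steps above can be checked with a little arithmetic. The following is only a sketch in plain Python, not TensorFlow code: the helper names `conv_same`, `max_pool` and `trace_shapes` are made up for illustration, and it assumes a stride of one for the convolutions and non-overlapping 2x2 pooling.

```python
def conv_same(size, kernel):
    """Zero-pad, then convolve with stride 1, so the size is preserved."""
    padded = size + (kernel - 1)   # 5x5 kernel: 2 zeros per side, 28 -> 32
    return padded - kernel + 1     # the convolution trims kernel - 1: 32 -> 28


def max_pool(size, window=2):
    """Non-overlapping max-pooling (stride equal to the window size)."""
    return size // window


def trace_shapes(size=28, kernel=5, channels=(32, 64)):
    """Return the (height, width, depth) after every conv and pool step."""
    shapes = []
    for depth in channels:
        size = conv_same(size, kernel)
        shapes.append((size, size, depth))
        size = max_pool(size)
        shapes.append((size, size, depth))
    return shapes


print(trace_shapes())
# [(28, 28, 32), (14, 14, 32), (14, 14, 64), (7, 7, 64)]
```

The last entry is the 7x7x64 volume that the tutorial refers to.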

They probably have an option to turn the zero padding off. In other infrastructures I’ve used, zero padding is not the default. (In several of them, zero padding isn’t even possible.)
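
For what it's worth, TensorFlow exposes this choice as the `padding` argument of `tf.nn.conv2d`, which takes `'SAME'` (zero-padded) or `'VALID'` (no padding). The sketch below does not call TensorFlow; it only mimics the size arithmetic of the two modes for a stride of one, with hypothetical helpers `conv_out` and `final_size`, to show where the 4x4 and the 7x7 come from.

```python
def conv_out(size, kernel, padding):
    """Spatial output size of a stride-1 convolution for each padding mode."""
    if padding == "SAME":
        return size                  # zero-padded: output matches the input
    if padding == "VALID":
        return size - kernel + 1     # no padding: the edges are chopped off
    raise ValueError("padding must be 'SAME' or 'VALID'")


def final_size(size, kernel, padding, layers=2, pool=2):
    """Apply `layers` rounds of convolution followed by 2x2 max-pooling."""
    for _ in range(layers):
        size = conv_out(size, kernel, padding) // pool
    return size


print(final_size(28, 5, "VALID"))  # 4: 28 -> 24 -> 12 -> 8 -> 4
print(final_size(28, 5, "SAME"))   # 7: 28 -> 28 -> 14 -> 14 -> 7
```

Without padding the result is the 4x4 from the question; with zero padding it is the 7x7 from the tutorial.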

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/49658