conv – Ops for convolutional neural nets¶
Note
Two similar implementations exist for conv2d: signal.conv2d and nnet.conv2d.
The former implements a traditional 2D convolution, while the latter implements the convolutional layers present in convolutional neural networks (where filters are 3D and pool over several input channels).
TODO: Give examples for how to use these things! They are pretty complicated.
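As a first example of the distinction drawn in the note above, here is a minimal pure-NumPy sketch (not Theano code; both function names are made up for illustration) of the two flavours: a plain 2D ‘valid’ convolution of one image with one kernel, and the CNN-style version where filters are 3D and the result sums over the input channels.

```python
import numpy as np

def conv2d_valid(img, kern):
    """Plain 2D 'valid' convolution of one 2D image with one 2D kernel
    (the signal.conv2d flavour, restricted to a single channel).
    The kernel is flipped, as in a true convolution."""
    kr, kc = kern.shape
    flipped = kern[::-1, ::-1]
    out = np.empty((img.shape[0] - kr + 1, img.shape[1] - kc + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kr, j:j + kc] * flipped).sum()
    return out

def nnet_conv2d_valid(images, filters):
    """CNN-style convolution (the nnet.conv2d flavour): one 3D filter
    per output channel, and the result pools (sums) over input channels.
    images:  (batch, in_channels, rows, cols)
    filters: (out_channels, in_channels, f_rows, f_cols)"""
    b, ic, r, c = images.shape
    oc, ic2, fr, fc = filters.shape
    assert ic == ic2, "input channel counts must match"
    out = np.zeros((b, oc, r - fr + 1, c - fc + 1))
    for n in range(b):
        for o in range(oc):
            for k in range(ic):          # sum over input channels
                out[n, o] += conv2d_valid(images[n, k], filters[o, k])
    return out

imgs = np.random.rand(2, 3, 6, 6)    # batch of 2, 3 channels, 6x6 images
filts = np.random.rand(4, 3, 4, 4)   # 4 filters over 3 channels, 4x4
print(nnet_conv2d_valid(imgs, filts).shape)   # (2, 4, 3, 3)
```

This is only a readable reference for the semantics, not an efficient implementation.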
Implemented convolution ops:
- conv2d_fft: a GPU-only version of conv2d that uses an FFT transform to perform the work. You can enable it by setting ‘THEANO_FLAGS=optimizer_including=conv_fft_valid:conv_fft_full’ in your environment. It is not enabled by default because it has some restrictions on the input and uses more memory. Also note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and PyCUDA to run.
- conv3D: doesn’t work on the GPU.
- conv3d2d: another conv3d implementation that uses conv2d with data reshaping. It is faster in some cases than conv3D, specifically on the GPU.
- A Pylearn2 implementation: it lives in Pylearn2, is not very well documented, and uses a different memory layout for the input. It is important to keep the input in the native memory layout and not use dimshuffle on the inputs, otherwise you lose most of the speed-up; so this is not a drop-in replacement for conv2d. Normally those are called from the linear transform implementation. Also, there are restrictions on which shapes are supported.
theano.tensor.nnet.conv.conv2d(input, filters, image_shape=None, filter_shape=None, border_mode='valid', subsample=(1, 1), **kargs)¶
This function will build the symbolic graph for convolving a stack of input images with a set of filters. The implementation is modelled after Convolutional Neural Networks (CNN). It is simply a wrapper to the ConvOp but provides a much cleaner interface.
Parameters: - input (symbolic 4D tensor) – mini-batch of feature map stacks, of shape (batch size, stack size, nb row, nb col); see the optional parameter image_shape
- filters (symbolic 4D tensor) – set of filters used in the CNN layer, of shape (nb filters, stack size, nb row, nb col); see the optional parameter filter_shape
- border_mode –
  - ‘valid’ – only apply the filter to complete patches of the image. Generates output of shape: image_shape - filter_shape + 1
  - ‘full’ – zero-pads the image to a multiple of the filter shape to generate output of shape: image_shape + filter_shape - 1
- subsample (tuple of len 2) – factor by which to subsample the output
- image_shape (None, tuple/list of len 4 of int or Constant variable) – The shape of the input parameter. Optional, used for optimizations like loop unrolling. You can put None for any element of the list to indicate that this element is not constant.
- filter_shape (None, tuple/list of len 4 of int or Constant variable) – The shape of the filters parameter. Optional, used for optimizations like loop unrolling. You can put None for any element of the list to indicate that this element is not constant.
- kwargs – passed on to ConvOp. Can be used to set the following: unroll_batch, unroll_kern, unroll_patch, openmp (see the ConvOp doc).
  - openmp: by default has the same value as config.openmp. For small image, filter, batch, nkern and stack sizes, it can be faster to disable openmp manually. A quick and incomplete test showed that with image size 6x6, filter size 4x4, batch size 1, nkern 1 and stack size 1, it is faster to disable it in valid mode; but if the batch size grows to 10, it is faster with openmp on a Core 2 Duo.
Return type: symbolic 4D tensor
Returns: set of feature maps generated by convolutional layer. Tensor is of shape (batch size, nb filters, output row, output col)
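The border_mode shape rules quoted above can be checked with a small helper (a sketch only; conv2d_output_shape is a hypothetical name, not a Theano function):

```python
def conv2d_output_shape(image_shape, filter_shape, border_mode='valid'):
    """Shape inference matching conv2d's documented rules.
    image_shape:  (batch size, stack size, rows, cols)
    filter_shape: (nb filters, stack size, f_rows, f_cols)"""
    b, st, r, c = image_shape
    nf, st2, fr, fc = filter_shape
    assert st == st2, "stack sizes must match"
    if border_mode == 'valid':      # image_shape - filter_shape + 1
        return (b, nf, r - fr + 1, c - fc + 1)
    elif border_mode == 'full':     # image_shape + filter_shape - 1
        return (b, nf, r + fr - 1, c + fc - 1)
    raise ValueError("unknown border_mode: %r" % (border_mode,))

print(conv2d_output_shape((32, 3, 28, 28), (16, 3, 5, 5)))          # (32, 16, 24, 24)
print(conv2d_output_shape((32, 3, 28, 28), (16, 3, 5, 5), 'full'))  # (32, 16, 32, 32)
```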
theano.tensor.nnet.Conv3D.conv3D(V, W, b, d)¶
3D “convolution” of multiple filters on a minibatch (does not flip the kernel, moves the kernel with a user-specified stride)
Parameters: - V – Visible unit, input. dimensions: (batch, row, column, time, in channel)
- W – Weights, filter. dimensions: (out channel, row, column, time, in channel)
- b – bias, shape == (W.shape[0],)
- d – strides when moving the filter over the input (dx, dy, dt)
Note: The order of dimensions does not correspond to the one in conv2d. This is for optimization.
Note: The GPU implementation is very slow. You should use conv3d2d for a GPU graph instead.
See: Someone made a script that shows how to swap the axes between both 3D convolution implementations in Theano. See the last attachment.
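Since the referenced script is not reproduced here, the axis swap itself can be sketched in NumPy. The permutation (0, 3, 4, 1, 2) maps conv3D’s (batch, row, column, time, in channel) layout to conv3d2d’s (batch, time, in channel, row, column), and it happens to be its own inverse. Note that for numerically identical results you would also have to reverse the filters along the row, column and time axes, since conv3d2d flips the kernel while conv3D does not. Function names below are illustrative only:

```python
import numpy as np

# conv3D layout:   (batch, row, column, time, in channel)
# conv3d2d layout: (batch, time, in channel, row, column)
PERM = (0, 3, 4, 1, 2)   # this permutation is its own inverse

def conv3D_to_conv3d2d(V):
    """Reorder a conv3D-style array into conv3d2d's layout."""
    return V.transpose(PERM)

def conv3d2d_to_conv3D(signals):
    """Reorder a conv3d2d-style array back into conv3D's layout."""
    return signals.transpose(PERM)

V = np.zeros((2, 5, 6, 7, 3))            # batch 2, 5x6 frames, 7 steps, 3 channels
print(conv3D_to_conv3d2d(V).shape)       # (2, 7, 3, 5, 6)
print(conv3d2d_to_conv3D(conv3D_to_conv3d2d(V)).shape)  # (2, 5, 6, 7, 3)
```

In a real Theano graph you would apply the same permutation with dimshuffle rather than NumPy transpose.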
theano.tensor.nnet.conv3d2d.conv3d(signals, filters, signals_shape=None, filters_shape=None, border_mode='valid')¶
Convolve spatio-temporal filters with a movie.
It flips the filters.
Parameters: - signals – timeseries of images whose pixels have color channels. shape: [Ns, Ts, C, Hs, Ws]
- filters – spatio-temporal filters shape: [Nf, Tf, C, Hf, Wf]
- signals_shape – None or a tuple/list with the shape of signals
- filters_shape – None or a tuple/list with the shape of filters
- border_mode – The only one tested is ‘valid’.
Note: Works on the GPU. Another way to describe the signals layout: (batch, time, in channel, row, column). Another way to describe the filters layout: (out channel, time, in channel, row, column).
See: Someone made a script that shows how to swap the axes between both 3d convolution implementations in Theano. See the last attachment.
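A shape helper for valid mode, assuming (this layout is an inference from the parameter descriptions above, not stated explicitly here) that the output keeps the signals ordering with the filter count in the channel slot, i.e. [Ns, Ts - Tf + 1, Nf, Hs - Hf + 1, Ws - Wf + 1]; conv3d2d_output_shape is a hypothetical name:

```python
def conv3d2d_output_shape(signals_shape, filters_shape):
    """Valid-mode output shape for conv3d2d-style inputs.
    signals_shape: [Ns, Ts, C, Hs, Ws]
    filters_shape: [Nf, Tf, C, Hf, Wf]
    Assumes the output layout (batch, time, out channel, row, column)."""
    Ns, Ts, C, Hs, Ws = signals_shape
    Nf, Tf, C2, Hf, Wf = filters_shape
    assert C == C2, "color/channel dimensions must match"
    return (Ns, Ts - Tf + 1, Nf, Hs - Hf + 1, Ws - Wf + 1)

# batch 4, 10 frames, 3 channels, 32x32; 8 filters spanning 3 frames, 5x5
print(conv3d2d_output_shape((4, 10, 3, 32, 32), (8, 3, 3, 5, 5)))  # (4, 8, 8, 28, 28)
```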
theano.sandbox.cuda.fftconv.conv2d_fft(input, filters, image_shape=None, filter_shape=None, border_mode='valid', pad_last_dim=False)¶
Perform a convolution through FFT.
Only supports input whose last dimension (width) is even. All other dimensions can be anything and the filters can have an even or odd width.
If you must use input with an odd width, you can either pad it yourself or use the pad_last_dim argument, which will do it for you and take care to strip the padding before returning. Don’t use this argument unless you are sure the input width is odd, since the padding is unconditional and would make even input odd, leading to problems.
In valid mode the filters must be smaller than the input.
input: (b, ic, i0, i1) filters: (oc, ic, f0, f1)
border_mode: ‘valid’ or ‘full’
- pad_last_dim: Unconditionally pad the last dimension of the input to turn it from odd to even. The padding will be stripped before the result is returned.
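The manual-padding alternative mentioned above can be sketched in NumPy (hypothetical helper names; this only demonstrates the even/odd width bookkeeping, not the FFT convolution itself):

```python
import numpy as np

def pad_last_dim_even(x):
    """Zero-pad the last (width) dimension by one column when it is odd,
    so conv2d_fft's even-width requirement is met.  Returns the possibly
    padded array and a flag saying whether padding was applied."""
    if x.shape[-1] % 2 == 1:
        pad = [(0, 0)] * (x.ndim - 1) + [(0, 1)]   # pad only the last axis
        return np.pad(x, pad), True
    return x, False

def strip_last_dim_padding(y, was_padded):
    """Remove the extra column added by pad_last_dim_even, if any."""
    return y[..., :-1] if was_padded else y

x = np.ones((2, 3, 8, 5))                       # odd width: 5
p, padded = pad_last_dim_even(x)
print(p.shape, padded)                          # (2, 3, 8, 6) True
print(strip_last_dim_padding(p, padded).shape)  # (2, 3, 8, 5)
```

Unlike the pad_last_dim flag, which pads unconditionally, this helper pads only when the width is actually odd, so it is safe to call on any input.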