sandbox.cuda – List of CUDA GPU Ops implemented

Normally you should not call these Ops directly! Theano automatically transforms CPU ops into their GPU equivalents, so this list is mainly useful to let people know what is implemented on the GPU.
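As a quick illustration (a minimal sketch, assuming the old device=gpu / floatX=float32 configuration and a working CUDA setup), you build an ordinary CPU graph and let the optimizer do the rewriting:

    import theano
    import theano.tensor as T

    # An ordinary symbolic graph built from CPU ops.
    x = T.matrix('x')
    y = T.matrix('y')
    f = theano.function([x, y], T.exp(x + y))

    # With device=gpu and floatX=float32, the optimizer inserts
    # GpuFromHost/HostFromGpu transfers and replaces the Elemwise node
    # with GpuElemwise; debugprint shows the resulting graph.
    theano.printing.debugprint(f)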

Basic Op

class theano.sandbox.cuda.basic_ops.GpuAdvancedIncSubtensor1(inplace=False, set_instead_of_inc=False)

Implement AdvancedIncSubtensor1 on the gpu.

class theano.sandbox.cuda.basic_ops.GpuAdvancedIncSubtensor1_dev20(inplace=False, set_instead_of_inc=False)

Implement AdvancedIncSubtensor1 on the gpu, but use functions only available on devices of compute capability 2.0 and newer.

make_node(x, y, ilist)

It differs from GpuAdvancedIncSubtensor1 in that it makes sure the indices are of type long.

class theano.sandbox.cuda.basic_ops.GpuAdvancedSubtensor1(sparse_grad=False)

Implement AdvancedSubtensor1 on the gpu.

class theano.sandbox.cuda.basic_ops.GpuAlloc(memset_0=False)

Implement Alloc on the gpu.

The memset_0 param is an optimization. When True, we zero-fill the output with cudaMemset, which is faster.

class theano.sandbox.cuda.basic_ops.GpuCAReduce(reduce_mask, scalar_op, pre_scalar_op=None)

GpuCAReduce is a Reduction along some dimensions by a scalar op.

The dimensions along which to reduce are specified by the reduce_mask that you pass to the constructor. The reduce_mask is a tuple of booleans (actually integers 0 or 1) that specifies, for each input dimension, whether to reduce it (1) or not (0); a short sketch of this correspondence follows the example list below.

For example, when scalar_op is a theano.scalar.basic.Add instance:

  • reduce_mask == (1,) sums a vector to a scalar
  • reduce_mask == (1,0) computes the sum of each column in a matrix
  • reduce_mask == (0,1) computes the sum of each row in a matrix
  • reduce_mask == (1,1,1) computes the sum of all elements in a 3-tensor.
Note: any reduce_mask of all zeros is a sort of ‘copy’, and may be removed during graph optimization.
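As a rough CPU-side sketch of this correspondence (an illustration only, not part of the Op's API), each reduce_mask pattern matches a tensor.sum over the axes flagged with 1:

    import numpy
    import theano
    import theano.tensor as T

    m = T.matrix('m')
    col_sums = T.sum(m, axis=0)   # same reduction as reduce_mask == (1, 0)
    row_sums = T.sum(m, axis=1)   # same reduction as reduce_mask == (0, 1)
    total = T.sum(m)              # same reduction as reduce_mask == (1, 1)

    f = theano.function([m], [col_sums, row_sums, total])
    print(f(numpy.ones((2, 3), dtype=theano.config.floatX)))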

This Op is a work in progress.

This op was recently upgraded from just GpuSum to a general CAReduce. Not many code cases are supported yet for scalar_op being anything other than scal.Add instances.

Important note: if you implement new cases for this op, be sure to benchmark them and make sure that they actually result in a speedup. GPUs are not especially well-suited to reduction operations so it is quite possible that the GPU might be slower for some cases.

pre_scalar_op: if present, must be a scalar op with only 1 input. We will execute it on the input value before reduction.

c_code_reduce_01X(sio, node, name, x, z, fail, N)
Parameters:N – the number of 1s in the pattern: N=1 -> 01, N=2 -> 011, N=3 -> 0111. Works for N=1, 2, 3.
c_code_reduce_ccontig(sio, node, name, x, z, fail)

Based on how this is called in c_code, this appears to handle the case where we are reducing over all axes and x is C contiguous.

supports_c_code(inputs)

Returns True if the current op and reduce pattern have functioning C code.

class theano.sandbox.cuda.basic_ops.GpuContiguous(use_c_code='g++')

Always return a C-contiguous output. Copy the input only if it is not already C contiguous.

class theano.sandbox.cuda.basic_ops.GpuDimShuffle(input_broadcastable, new_order)

Implement DimShuffle on the gpu.

class theano.sandbox.cuda.basic_ops.GpuElemwise(scalar_op, inplace_pattern=None, sync=None)

Implement a generic elemwise on the gpu.

class theano.sandbox.cuda.basic_ops.GpuFlatten(outdim=1)

Implement Flatten on the gpu.

class theano.sandbox.cuda.basic_ops.GpuFromHost(use_c_code='g++')

Implement the transfer from cpu to the gpu.

class theano.sandbox.cuda.basic_ops.GpuIncSubtensor(idx_list, inplace=False, set_instead_of_inc=False, destroyhandler_tolerate_aliased=None)

Implement IncSubtensor on the gpu.

Note: The optimization to make this inplace is in tensor/opt.
The same optimization handles IncSubtensor and GpuIncSubtensor. This Op has c_code too; it inherits tensor.IncSubtensor’s c_code. The helper methods like do_type_checking, copy_of_x, etc. specialize the c_code for this Op.
copy_into(view, source)

Parameters:
  • view – string, C code expression for an array
  • source – string, C code expression for an array

Returns:C code expression to copy source into view, and return 0 on success

copy_of_x(x)
Parameters:x – a string giving the name of a C variable pointing to an array
Returns:C code expression to make a copy of x

Base class uses PyArrayObject *, subclasses may override for different types of arrays.

do_type_checking(node)

Should raise NotImplementedError if c_code does not support the types involved in this node.

get_helper_c_code_args()

Return a dictionary of arguments to use with helper_c_code

make_view_array(x, view_ndim)
Parameters:
  • x – a string identifying an array to be viewed
  • view_ndim – a string specifying the number of dimensions to have in the view

This doesn’t need to actually set up the view with the right indexing; we’ll do that manually later.

class theano.sandbox.cuda.basic_ops.GpuJoin(use_c_code='g++')

Implement Join on the gpu.

class theano.sandbox.cuda.basic_ops.GpuReshape(ndim, name=None)

Implement Reshape on the gpu.

class theano.sandbox.cuda.basic_ops.GpuShape(use_c_code='g++')

Implement Shape on the gpu.

class theano.sandbox.cuda.basic_ops.GpuSubtensor(idx_list)

Implement subtensor on the gpu.

class theano.sandbox.cuda.basic_ops.HostFromGpu(use_c_code='g++')

Implement the transfer from gpu to the cpu.
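These transfer ops usually appear only in optimized graphs, but they can also be applied by hand. A minimal sketch, assuming the gpu_from_host / host_from_gpu instances exported alongside these classes and a working CUDA setup:

    import theano
    import theano.tensor as T
    from theano.sandbox.cuda import gpu_from_host, host_from_gpu

    x = T.fmatrix('x')            # float32 input living on the host
    x_gpu = gpu_from_host(x)      # GpuFromHost: copy the data to device memory
    y = host_from_gpu(x_gpu)      # HostFromGpu: copy it back to a numpy ndarray

    f = theano.function([x], y)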

theano.sandbox.cuda.basic_ops.col(name=None, dtype=None)

Return a symbolic column variable (ndim=2, broadcastable=[False, True]).
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)

theano.sandbox.cuda.basic_ops.matrix(name=None, dtype=None)

Return a symbolic matrix variable.
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)

theano.sandbox.cuda.basic_ops.row(name=None, dtype=None)

Return a symbolic row variable (ndim=2, broadcastable=[True, False]).
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)

theano.sandbox.cuda.basic_ops.scalar(name=None, dtype=None)

Return a symbolic scalar variable.
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)

theano.sandbox.cuda.basic_ops.tensor3(name=None, dtype=None)

Return a symbolic 3-D variable.
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)

theano.sandbox.cuda.basic_ops.tensor4(name=None, dtype=None)

Return a symbolic 4-D variable.
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)

theano.sandbox.cuda.basic_ops.vector(name=None, dtype=None)

Return a symbolic vector variable.
Parameters:
  • name – a name to attach to this variable
  • dtype – numeric type (None means to use theano.config.floatX)
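A short usage sketch for the helpers above (assuming the CUDA backend is initialized; with dtype left as None they use theano.config.floatX, which must be float32 for the GPU):

    from theano.sandbox.cuda import basic_ops as B

    # Symbolic variables that live directly in GPU memory (CudaNdarrayType).
    v = B.vector('v')          # ndim=1
    m = B.matrix('m')          # ndim=2
    r = B.row('r')             # ndim=2, broadcastable=[True, False]
    c = B.col('c')             # ndim=2, broadcastable=[False, True]
    t = B.tensor4('images')    # ndim=4, e.g. a batch of images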

Blas Op

class theano.sandbox.cuda.blas.GpuConv(border_mode, subsample=(1, 1), logical_img_hw=None, logical_kern_hw=None, logical_kern_align_top=True, version=-1, verbose=0, kshp=None, imshp=None, max_threads_dim0=None)

Implement the batched and stacked 2d convolution on the gpu.

flops(inputs, outputs)

Useful with the hack in profilemode to print the MFlops
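GpuConv is normally introduced by the optimizer when a CPU convolution is compiled for the GPU, not built by hand. A rough sketch of the usual entry point (assuming the classic theano.tensor.nnet.conv2d interface and device=gpu, floatX=float32):

    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d

    images = T.tensor4('images')     # (batch, channels, rows, cols)
    filters = T.tensor4('filters')   # (n_filters, channels, rows, cols)
    out = conv2d(images, filters, border_mode='valid', subsample=(1, 1))

    # Under device=gpu the CPU convolution op is replaced by GpuConv.
    f = theano.function([images, filters], out)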

class theano.sandbox.cuda.blas.GpuDot22(use_c_code='g++')

Implement dot(2d, 2d) on the gpu.

class theano.sandbox.cuda.blas.GpuDot22Scalar(use_c_code='g++')

Implement dot(2d, 2d) * scalar on the gpu.

class theano.sandbox.cuda.blas.GpuDownsampleFactorMax(ds, ignore_border=False)

Implement downsample with max on the gpu.

class theano.sandbox.cuda.blas.GpuDownsampleFactorMaxGrad(ds, ignore_border)

Implement the grad of downsample with max on the gpu.

class theano.sandbox.cuda.blas.GpuGemm(inplace)

Implement gemm on the gpu.

class theano.sandbox.cuda.blas.GpuGemv(inplace)

Implement gemv on the gpu.

class theano.sandbox.cuda.blas.GpuGer(inplace)

Implement ger on the gpu.
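Like the other ops in this list, these BLAS ops are introduced by the optimizer rather than called directly; for example, a matrix product with an added term is typically fused into a single gemm node. A minimal sketch (assuming device=gpu and floatX=float32):

    import theano
    import theano.tensor as T

    x = T.matrix('x')
    w = T.matrix('w')
    b = T.matrix('b')

    # dot(2d, 2d) alone maps to GpuDot22; the pattern b + dot(x, w) is
    # normally rewritten into a single GpuGemm node by the optimizer.
    out = b + T.dot(x, w)
    f = theano.function([x, w, b], out)
    theano.printing.debugprint(f)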

Nnet Op

Curand Op

Random generator based on the CURAND library. It is not inserted automatically.

Define CURAND_RandomStreams - backed by CURAND

class theano.sandbox.cuda.rng_curand.CURAND_Base(output_type, seed, destructive)

Base class for a random number generator implemented in CURAND.

The random number generator itself is an opaque reference managed by CURAND. This Op uses a generic-typed shared variable to point to a CObject that encapsulates this opaque reference.

Each random variable is created with a generator of False. The actual random number generator is allocated from the seed, on the first call to allocate random numbers (see c_code).

Note: One caveat is that the random number state is simply not serializable. Consequently, attempts to serialize functions compiled with these random numbers will fail.
as_destructive()

Return a destructive version of self.

classmethod new_auto_update(generator, ndim, dtype, size, seed)

Return a symbolic sample from generator.

cls dictates the random variable (e.g. uniform, normal)

class theano.sandbox.cuda.rng_curand.CURAND_Normal(output_type, seed, destructive)

Op to draw normal numbers using CURAND

class theano.sandbox.cuda.rng_curand.CURAND_RandomStreams(seed)

RandomStreams instance that creates CURAND-based random variables.

One caveat is that generators are not serializable.

next_seed()

Return a unique seed for initializing a random variable.

normal(size=None, avg=0.0, std=1.0, ndim=None, dtype='float64')

Return symbolic tensor of normally-distributed numbers.

Parameters:size – can be a list of integers or a Theano variable (e.g. the shape of another Theano variable)
uniform(size, low=0.0, high=1.0, ndim=None, dtype='float64')

Return symbolic tensor of uniform numbers.

updates()

List of all (old, new) generator update pairs created by this instance.
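A short usage sketch (a minimal example assuming a CUDA-enabled build; rng_curand must be imported explicitly since these ops are not inserted automatically):

    import theano
    from theano.sandbox.cuda.rng_curand import CURAND_RandomStreams

    rng = CURAND_RandomStreams(seed=1234)
    u = rng.uniform(size=(10, 10), low=0.0, high=1.0, dtype='float32')
    n = rng.normal(size=(10, 10), avg=0.0, std=1.0, dtype='float32')

    # The (old, new) generator pairs must be passed as updates so the
    # CURAND state advances on every call.
    f = theano.function([], [u, n], updates=rng.updates())
    samples_u, samples_n = f()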

class theano.sandbox.cuda.rng_curand.CURAND_Uniform(output_type, seed, destructive)

Op to draw uniform numbers using CURAND