Module¶
What is a Theano Module¶
A Theano Module is a structure which implements what could be called a
“theano class”. A Module can contain Members, which act like instance
variables (“state”). It can also contain an arbitrary number of Methods,
which are functions that share the same Members in addition to their own
inputs. Last but not least, Modules can be nested (explanations and examples
follow). Module is meant to:
- ease the sharing of variables between several Theano functions,
- streamline automatic naming, and
- allow a hierarchy of “modules” whose states can interact.
Imports¶
All examples assume that you have done the following imports:
#!/usr/bin/env python
import theano
import numpy as N
from theano import tensor as T
from theano.tensor import nnet as NN
from theano.compile import module as M
Module¶
A Module can contain Members, Methods and inner Modules. Each type has a
special meaning.
module = M.Module()
Member¶
Usage:
#module.state = variable
module.state = T.scalar()
A Member represents a state variable (i.e., one whose value persists after a
Method is called). It is named automatically after the field it is assigned
to, and it is an implicit input of all Methods of the Module. Its storage
(i.e., where the value is stored) is shared by all Methods of the Module.
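For instance, here is a minimal sketch (using only the constructs introduced
in this document) of two Methods sharing one Member:
m = M.Module()
m.c = T.scalar()  # a Member: an implicit input to every Method
m.double = M.Method([], [], updates={m.c: 2 * m.c})  # uses c without listing it as an input
m.read = M.Method([], m.c)  # reads the same shared storage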
A Variable which is the result of a previous computation (as opposed to
being updated) is not a Member. Internally this is called an External; you
should not need to care about this.
To share state between modules, see the Inner Module section.
Method¶
Usage:
module.method = M.Method(inputs, outputs, **updates)
Each key in the updates dictionary must be an existing Member of the Module
(or its name), and the value associated with that key is the update
expression for that state. When called on a ModuleInstance produced by the
Module, the method will compute the outputs from the inputs and will update
all the states as specified by the update expressions. See the basic example
below.
Inner Module¶
To share a Member between modules, the modules must be linked through the
inner module mechanism.
Usage:
module2.submodule = module
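For example, here is a minimal sketch of sharing an inner module's Member
with an outer module (the names inner, outer, sub and twice are illustrative):
inner = M.Module()
inner.c = T.scalar()
outer = M.Module()
outer.sub = inner  # link the modules; outer now shares inner's state
outer.twice = M.Method([], 2 * outer.sub.c)  # a Method of outer can use the inner Member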
ModuleInstance¶
A Module can produce a ModuleInstance with its make method. Think of the
Module as a class and the ModuleInstance as an object in C++/Java. If an
attribute was a Member, it becomes read/write access to the actual data for
that state. If it was a M.Method, a function is compiled with the proper
signature and semantics.
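A minimal sketch of this correspondence (a hedged example in the style of
the basic example below):
m = M.Module()
m.state = T.scalar()
x = T.scalar('x')
m.add = M.Method(x, [], updates={m.state: m.state + x})
inst = m.make(state=0)  # compiles add and allocates storage for state
inst.add(5)             # a real compiled function that updates the shared storage
print inst.state        # read/write access to the data; prints 5.0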
Module Interface¶
def make(self, mode = {'FAST_COMPILE', 'FAST_RUN', ... }, **init)
make compiles all Methods and allocates storage for all Members into a
ModuleInstance object, which is returned. The init dictionary can be used to
provide initial values for the members.
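For example (a sketch assuming a module m with a member c, as in the basic
example below):
inst = m.make(mode='FAST_RUN', c=0)  # compile the Methods in FAST_RUN mode; initialize member c to 0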
def resolve(self, symbol, filter = None)
Resolves a symbol in this module. The symbol can be a string or a Variable.
If the string contains dots (e.g. "x.y"), the module will resolve the symbol
hierarchically in its inner modules. The filter argument is None or a class;
it can be used to restrict the search to Member or Method instances, for
example.
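For instance, reusing the outer/sub sketch from the Inner Module section
(assuming, per the description above, that the resolved variable is returned):
var = outer.resolve('sub.c')  # resolves hierarchically through the inner module sub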
def _instance_initialize(self, inst, **init)
The inst argument is a ModuleInstance. For each key, value pair in init,
setattr(inst, key, value) is called. This can easily be overridden by Module
subclasses to initialize an instance in different ways. If you don't know
what to put there, don't define it and a default version will be executed.
If you want to call the parent version, call: M.default_initialize(inst, **init)
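A minimal sketch of overriding it (the Counter class is illustrative; the
advanced example below uses the same pattern):
class Counter(M.Module):
    def __init__(self):
        super(Counter, self).__init__()
        self.c = T.scalar()
    def _instance_initialize(self, inst, **init):
        inst.c = 0                          # default initial value for the state
        M.default_initialize(inst, **init)  # let make(...) keyword arguments override it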
Basic example¶
The problem here is to create two functions, inc and dec, and a shared state
c such that inc(n) increases c by n and dec(n) decreases c by n. We also
want a third function, plus10, which returns 10 + the current state without
changing it. Using the function interface, this can be implemented as
follows:
n, c = T.scalars('nc')
inc = theano.function([n, ((c, c + n), 0)], [])
dec = theano.function([n, ((c, c - n), inc.container[c])], []) # we need to pass inc's container in order to share
plus10 = theano.function([(c, inc.container[c])], c + 10)
assert inc[c] == 0
inc(2)
assert inc[c] == 2 and dec[c] == inc[c]
dec(3)
assert inc[c] == -1 and dec[c] == inc[c]
assert plus10() == 9
Now, using Module
:
m = M.Module()
n = T.scalar('n')
m.c = T.scalar() # state variables
m.inc = M.Method(n, [], updates = {m.c: m.c + n}) # m.c <= m.c + n
m.dec = M.Method(n, [], updates = {m.c: m.c - n}) # m.c <= m.c - n
# m.dec = M.Method(n, [], updates = {c: m.c - n}) # wrong: there is no global c
# m.plus10 does not update the state
m.plus10 = M.Method([], m.c + 10) # m.c is always accessible since it is a member of this module
inst = m.make(c = 0) # here, we make an "instance" of the module with c initialized to 0
assert inst.c == 0
inst.inc(2)
assert inst.c == 2
inst.dec(3)
assert inst.c == -1
assert inst.plus10() == 9
Benefits of Module over function in this example:
- There is no need to manipulate the containers directly.
- The fact that inc and dec share a state is more obvious syntactically;
  Method does not require the states to appear anywhere in the input list.
- The interface of the instance produced by m.make() is simple and coherent,
  extremely similar to that of a normal Python object, and directly usable
  by any user.
Nesting example¶
The problem now is to create two pairs of inc/dec functions and a function
sum that adds the shared states of the first and second pairs.
Using the function interface:
def make_incdec_function():
    n, c = T.scalars('nc')
    inc = theano.function([n, ((c, c + n), 0)], [])
    dec = theano.function([n, ((c, c - n), inc.container[c])], [])  # inc and dec share the same state
    return inc, dec
inc1, dec1 = make_incdec_function()
inc2, dec2 = make_incdec_function()
a, b = T.scalars('ab')
sum = theano.function([(a, inc1.container['c']), (b, inc2.container['c'])], a + b)
inc1(2)
dec1(4)
inc2(6)
assert inc1['c'] == -2 and inc2['c'] == 6
assert sum() == 4 # -2 + 6
Using Module:
def make_incdec_module():
    m = M.Module()
    n = T.scalar('n')
    m.c = T.scalar()  # state variable
    m.inc = M.Method(n, [], updates = {m.c: m.c + n})  # m.c <= m.c + n
    m.dec = M.Method(n, [], updates = {m.c: m.c - n})  # m.c <= m.c - n
    return m
m = M.Module()
m.incdec1 = make_incdec_module()
m.incdec2 = make_incdec_module()
m.sum = M.Method([], m.incdec1.c + m.incdec2.c)
inst = m.make(incdec1 = dict(c=0), incdec2 = dict(c=0))
inst.incdec1.inc(2)
inst.incdec1.dec(4)
inst.incdec2.inc(6)
assert inst.incdec1.c == -2 and inst.incdec2.c == 6
assert inst.sum() == 4 # -2 + 6
Here, we make a new Module and give it two inner Modules like the one
defined in the basic example. Each inner module has methods inc and dec as
well as a state c, and their states are directly accessible from the outer
module, which means that it can define methods using them. The instance
(inst) we make from the Module (m) reflects the hierarchy that we created.
Unlike the version using function, there is no need to manipulate any
containers directly.
Advanced example¶
Complex models can be implemented by subclassing Module (though that is not
mandatory). Here is a complete, extensible (and working) regression model
implemented using this system:
import theano
import numpy as N
from theano import tensor as T
from theano.tensor import nnet as NN
from theano.compile import module as M
class RegressionLayer(M.Module):
    def __init__(self, input = None, target = None, regularize = True):
        super(RegressionLayer, self).__init__()  # boilerplate
        # MODEL CONFIGURATION
        self.regularize = regularize
        # ACQUIRE/MAKE INPUT AND TARGET
        if not input:
            input = T.matrix('input')
        if not target:
            target = T.matrix('target')
        # HYPER-PARAMETERS
        self.stepsize = T.scalar()  # a stepsize for gradient descent
        # PARAMETERS
        self.w = T.matrix()  # the linear transform to apply to our input points
        self.b = T.vector()  # a vector of biases, which make our transform affine instead of linear
        # REGRESSION MODEL
        self.activation = T.dot(input, self.w) + self.b
        self.prediction = self.build_prediction()
        # CLASSIFICATION COST
        self.classification_cost = self.build_classification_cost(target)
        # REGULARIZATION COST
        self.regularization = self.build_regularization()
        # TOTAL COST
        self.cost = self.classification_cost
        if self.regularize:
            self.cost = self.cost + self.regularization
        # GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS
        self.grad_w, self.grad_b, grad_act = T.grad(self.cost, [self.w, self.b, self.prediction])
        print 'grads', self.grad_w, self.grad_b
        # INTERFACE METHODS
        self.update = M.Method([input, target],
                               [self.cost, self.grad_w, self.grad_b, grad_act],
                               updates={self.w: self.w - self.stepsize * self.grad_w,
                                        self.b: self.b - self.stepsize * self.grad_b})
        self.apply = M.Method(input, self.prediction)

    def params(self):
        return self.w, self.b

    def _instance_initialize(self, obj, input_size = None, target_size = None,
                             seed = 1827, **init):
        # obj is an "instance" of this module holding values for each member
        # and functions for each method
        if input_size and target_size:
            # initialize w and b in a special way using input_size and target_size
            sz = (input_size, target_size)
            rng = N.random.RandomState(seed)
            obj.w = rng.uniform(size = sz, low = -0.5, high = 0.5)
            obj.b = N.zeros(target_size)
            obj.stepsize = 0.01
        # here we call the default_initialize method, which takes all the
        # name: value pairs in init and sets the property with that name to
        # the provided value; this covers setting stepsize and l2_coef (w and
        # b can be set that way too). We call it last so that explicitly
        # passed parameters supersede the default values.
        M.default_initialize(obj, **init)

    def build_regularization(self):
        return T.zero()  # no regularization!

class SpecifiedRegressionLayer(RegressionLayer):
    """ XE mean cross entropy"""
    def build_prediction(self):
        # return NN.softmax(self.activation)  # use this line to expose a slow subtensor implementation
        return NN.sigmoid(self.activation)

    def build_classification_cost(self, target):
        self.classification_cost_matrix = (target - self.prediction)**2
        #print self.classification_cost_matrix.type
        self.classification_costs = T.sum(self.classification_cost_matrix, axis=1)
        return T.sum(self.classification_costs)

    def build_regularization(self):
        self.l2_coef = T.scalar()  # we can add a hyper-parameter if we need to
        return self.l2_coef * T.sum(self.w * self.w)

class PrintEverythingMode(theano.Mode):
    def __init__(self, linker, optimizer=None):
        def print_eval(i, node, fn):
            print i, node, [input[0] for input in fn.inputs],
            fn()
            print [output[0] for output in fn.outputs]
        wrap_linker = theano.gof.WrapLinkerMany([linker], [print_eval])
        super(PrintEverythingMode, self).__init__(wrap_linker, optimizer)

def test_module_advanced_example():
    # pick one mode: the ProfileMode here, or the PrintEverythingMode defined above
    profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
    #profmode = PrintEverythingMode(theano.gof.OpWiseCLinker(), 'fast_run')
    data_x = N.random.randn(4, 10)
    data_y = [[int(x)] for x in (N.random.randn(4) > 0)]
    model = SpecifiedRegressionLayer(regularize = False).make(input_size = 10,
                                                              target_size = 1,
                                                              stepsize = 0.1,
                                                              mode=profmode)
    for i in xrange(1000):
        xe, gw, gb, ga = model.update(data_x, data_y)
        if i % 100 == 0:
            print i, xe
    #for inputs, targets in my_training_set():
    #    print "cost:", model.update(inputs, targets)
    print "final weights:", model.w
    print "final biases:", model.b
    profmode.print_summary()
Here is how we use the model:
data_x = N.random.randn(4, 10)
data_y = [[int(x)] for x in N.random.randn(4) > 0]
model = SpecifiedRegressionLayer(regularize = False).make(input_size = 10,
                                                          target_size = 1,
                                                          stepsize = 0.1)
for i in xrange(1000):
    xe, gw, gb, ga = model.update(data_x, data_y)
    if i % 100 == 0:
        print i, xe
#for inputs, targets in my_training_set():
#    print "cost:", model.update(inputs, targets)
print "final weights:", model.w
print "final biases:", model.b
Extending Methods¶
Methods can be extended to update more parameters. For example, if we wanted
to add a variable holding the sum of all costs encountered so far to
SpecifiedRegressionLayer, we could proceed like this:
model_module = SpecifiedRegressionLayer(regularize = False)
model_module.sum = T.scalar()  # we add a module member to hold the sum
model_module.update.updates.update(sum = model_module.sum + model_module.cost)  # now update will also update sum!
model = model_module.make(input_size = 4,
                          target_size = 2,
                          stepsize = 0.1,
                          sum = 0)  # we must not forget to initialize the sum
test = model.update([[0,0,1,0]], [[0,1]])[0] + model.update([[0,1,0,0]], [[1,0]])[0]  # add the two costs
assert model.sum == test
The inputs and outputs lists of a Method can be modified as well, but that
is trickier, arguably less useful, and not fully supported at the moment.