When should I use Module vs. Op?

An Op can:

  • compute one or more values from multiple inputs
  • produce multiple outputs
  • resize its outputs as necessary (tensors are generally resizable in the Theano runtime; the one exception is when a vector of integers is used as the new shape of some tensor: since the rank of a tensor is fixed, that vector of integers must have a fixed length)

Python expressions provide a convenient syntax for sticking Ops together, so that they feel more like PLearn Variables.
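For instance, here is a minimal sketch of a custom Op with two inputs and two outputs, written against the standard make_node/perform interface; the name SumDiff is hypothetical and the arithmetic is only there to keep the example short:

    import theano
    import theano.tensor as T

    class SumDiff(theano.Op):
        """Toy Op: two inputs, two outputs (hypothetical example)."""

        def __eq__(self, other):
            return type(self) == type(other)

        def __hash__(self):
            return hash(type(self))

        def make_node(self, a, b):
            a = T.as_tensor_variable(a)
            b = T.as_tensor_variable(b)
            # two output variables, each of the same type as the first input
            return theano.Apply(self, [a, b], [a.type(), a.type()])

        def perform(self, node, inputs, output_storage):
            a, b = inputs
            output_storage[0][0] = a + b
            output_storage[1][0] = a - b

    x = T.dvector('x')
    y = T.dvector('y')
    s, d = SumDiff()(x, y)   # applying the Op builds graph nodes
    z = s * d + 1            # ordinary Python expressions glue Ops together
    f = theano.function([x, y], z)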

But all the links that you have between PLearn modules are still present with Ops, and they can be inspected and manipulated explicitly. Theano’s graph optimizations do exactly this: they look at small sections of the graph (so you can handle inputs that come from certain kinds of Ops differently) and re-wire the graph.
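Concretely, those links are plain attributes on the graph objects (a variable’s owner is the Apply node that produced it, and that node lists its op and inputs), so you can walk them yourself. Registering an actual optimization goes through a more structured interface, but it rests on exactly these links. A small inspection example:

    import theano.tensor as T

    x = T.dmatrix('x')
    y = T.exp(x) + 1

    node = y.owner                  # the Apply node that produced y (the addition)
    print(node.op)                  # the Op applied at that node
    print(node.inputs)              # its inputs; the first is the exp(x) variable
    print(node.inputs[0].owner.op)  # ...which was itself produced by the exp Op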

So that’s the Op for you. I think that if you only ever wanted to compile one Theano function per program execution, the Ops would be enough, and there would be no need for Modules. Modules exist to make it syntactically easier for multiple compiled Theano functions to collaborate by sharing physical variables. When you compile a Method in a Module, it automatically gets access to all of the Module’s Member [symbolic] variables, and when that Method is compiled to a function, the function gets access to all of the ModuleInstance’s [physical] variables.
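As a rough illustration of that wording, here is a sketch in the spirit of the historical Module API (Module, Member, Method, make). The exact names and call signatures are reconstructed from old documentation and may not match any released version exactly, so read it as pseudocode for the idea rather than a working recipe:

    from theano import tensor as T
    from theano.compile import module   # historical API; details below are a reconstruction

    m = module.Module()
    m.state = T.dscalar()               # a Member: per-instance physical storage
    m.inc = T.dscalar()

    # two Methods that both read and update the same Member
    m.add = module.Method([m.inc], m.state + m.inc, updates={m.state: m.state + m.inc})
    m.sub = module.Method([m.inc], m.state - m.inc, updates={m.state: m.state - m.inc})

    inst = m.make(state=0.0)            # the ModuleInstance allocates the physical variables
    inst.add(2.0)                       # both compiled functions share inst.state
    inst.sub(0.5)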

So I don’t think that there is an analog of Modules in PLearn, because PLearn doesn’t distinguish between symbolic functions and their instantiations.

Modules can also be arranged into graphs: Modules can contain Modules. The meaning of this, though, is not computational... it’s still about sharing variables. Variables are shared throughout the entire Module graph: when you add a DAA to a parent Module, for example, the parent Module’s Methods gain access to the variables in the DAA. This makes it possible to identify each Module with one particular USE of member variables. Complex behaviour can be built up by adding a few Modules that each contribute something new (a new algorithm or capability) to a common set of variables.

For example, you could have one Module that has weight and bias Members and comes with some [symbolic] methods for using them to compute the hidden-unit activation of a single layer, given some input. Then, you could add a second Module that handles pre-training by RBM. You would tell the second Module to use the weight matrix and bias of the first Module, and it would allocate its own Member for the visible-unit bias. This Module would come with [symbolic] methods for doing CD (contrastive divergence). Then, you could add a third Module that handles pre-training by DAA! You would tell this Module about the original weights (maybe the bias too, maybe not), and it would allocate the downward weights and yet another visible bias, and would add [symbolic] methods for pre-training that way.

When you compile the parent Module, all of the methods associated with each algorithm will be compiled so that they update the same weight matrix that the first Module uses to compute its hidden-unit activation.
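The Module syntax for this has long since been removed, but the sharing pattern itself can be shown with theano.shared, the mechanism that later replaced Modules. In the sketch below, two independently compiled functions read and update the very same physical W and b; the pretraining step uses a plain reconstruction-error gradient as a stand-in for a real CD or DAA update:

    import numpy as np
    import theano
    import theano.tensor as T

    rng = np.random.RandomState(0)
    W = theano.shared(0.01 * rng.randn(784, 100), name='W')  # shared weight matrix
    b = theano.shared(np.zeros(100), name='b')               # hidden-unit bias

    x = T.dmatrix('x')
    lr = T.dscalar('lr')
    hid = T.nnet.sigmoid(T.dot(x, W) + b)

    # "first module": compute the hidden-unit activation of the layer
    activation = theano.function([x], hid)

    # "second module": a pretraining step that reuses W and b and owns its own
    # visible bias; the cost is a simple reconstruction error standing in for CD/DAA
    vbias = theano.shared(np.zeros(784), name='vbias')
    recon = T.nnet.sigmoid(T.dot(hid, W.T) + vbias)
    cost = T.mean((recon - x) ** 2)
    gW, gb, gv = T.grad(cost, [W, b, vbias])
    pretrain_step = theano.function(
        [x, lr], cost,
        updates=[(W, W - lr * gW), (b, b - lr * gb), (vbias, vbias - lr * gv)])

Calling pretrain_step(batch, 0.1) modifies W in place, and activation(batch) immediately sees the new weights; that is exactly the kind of collaboration the Module/Method machinery is meant to express.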

This ability of Modules to bring algorithms to bear on existing member variables is what lets the minimization algorithm be separated from the layer and regression Modules. There is thus a stochastic gradient descent Module that has a single learning-rate Member and basically just works with the parameter Members of other Modules.
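In the same spirit, the core of that minimization Module reduces to something like the helper below: it owns nothing but a learning rate and builds update pairs for parameters owned elsewhere (sgd_updates is an illustrative name, not a Theano API):

    import theano.tensor as T

    def sgd_updates(params, cost, lr):
        """One SGD step per parameter: param <- param - lr * grad."""
        grads = T.grad(cost, params)
        return [(p, p - lr * g) for p, g in zip(params, grads)]

    # e.g., reusing the variables from the previous sketch:
    # trainer = theano.function([x, lr], cost,
    #                           updates=sgd_updates([W, b, vbias], cost, lr))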

The Module system would also make it possible to include several minimization algorithms at once, so you could start off using SGD and then later switch to conjugate gradient for really fine tuning.

So, Ops are for computations and Modules are for sharing variables between Theano functions.