Iterables¶
CodeSnippet
You can download all the code on this page from the code snippets directory
Python allows user-defined classes to support iteration,
as described in the Python docs. A class is a
so called iterable if it defines a method __iter__
that returns an
iterator instance. An iterable is typically some kind of container or
collection (e.g. list
and tuple
are iterables).
The iterator instance must have a next
method that returns the next
element in the iteration. In Python an iterable also has to have an
__iter__
method itself that returns self
instead of a new iterator.
It is important to understand that an iterator only manages a single iteration.
After this iteration it is spend and cannot be used for a second iteration
(it cannot be restarted). An iterable on the other hand can create as many
iterators as needed and therefore supports multiple iterations. Even though
both iterables and iterators have an __iter__
method they are
semantically very different (duck-typing can be misleading in this case).
In the context of MDP this means that an iterator can only be used for a single training phase, while iterables also support multiple training phases. So if you use a node with multiple training phases and train it in a flow make sure that you provide an iterable for this node (otherwise an exception will be raised). For nodes with a single training phase you can use either an iterable or an iterator.
A convenient implementation of the iterator protocol is provided by generators: see this article for an introduction, and the official PEP 255 for a complete description.
Let us define two bogus node classes to be used as examples of nodes
>>> class BogusNode(mdp.Node):
... """This node does nothing."""
... def _train(self, x):
... pass
>>> class BogusNode2(mdp.Node):
... """This node does nothing. But it's neither trainable nor invertible.
... """
... def is_trainable(self): return False
... def is_invertible(self): return False
This generator generates blocks
input blocks to be used as training set.
In this example one block is a 2-dimensional time series. The first variable
is [2,4,6,....,1000] and the second one [0,1,3,5,...,999].
All blocks are equal, this of course would not be the case in a real-life
example.
In this example we use a progress bar to get progress information.
>>> def gen_data(blocks):
... for i in mdp.utils.progressinfo(xrange(blocks)):
... block_x = np.atleast_2d(np.arange(2.,1001,2))
... block_y = np.atleast_2d(np.arange(1.,1001,2))
... # put variables on columns and observations on rows
... block = np.transpose(np.concatenate([block_x,block_y]))
... yield block
The progressinfo
function is a fully configurable text-mode
progress info box tailored to the command-line die-hards. Have a look
at its doc-string and prepare to be amazed!
Let’s define a bogus flow consisting of 2 BogusNode
s
>>> flow = mdp.Flow([BogusNode(),BogusNode()], verbose=1)
Train the first node with 5000 blocks and the second node with 3000 blocks.
Note that the only allowed argument to train
is a sequence (list or
tuple) of iterables or iterators. In case you don’t want or need to use
incremental learning and want to do a one-shot training, you can use as
argument to train
a single array of data.
Block-mode training¶
>>> flow.train([gen_data(5000),gen_data(3000)])
Training node #0 (BogusNode)
[===================================100%==================================>]
Training finished
Training node #1 (BogusNode)
[===================================100%==================================>]
Training finished
Close the training phase of the last node
One-shot training using one single set of data for both nodes¶
>>> flow = BogusNode() + BogusNode()
>>> block_x = np.atleast_2d(np.arange(2.,1001,2))
>>> block_y = np.atleast_2d(np.arange(1.,1001,2))
>>> single_block = np.transpose(np.concatenate([block_x,block_y]))
>>> flow.train(single_block)
If your flow contains non-trainable nodes, you must specify a None
for the non-trainable nodes
>>> flow = mdp.Flow([BogusNode2(),BogusNode()], verbose=1)
>>> flow.train([None, gen_data(5000)])
Training node #0 (BogusNode2)
Training finished
Training node #1 (BogusNode)
[===================================100%==================================>]
Training finished
Close the training phase of the last node
You can use the one-shot training
>>> flow = mdp.Flow([BogusNode2(),BogusNode()], verbose=1)
>>> flow.train(single_block)
Training node #0 (BogusNode2)
Training finished
Training node #1 (BogusNode)
Training finished
Close the training phase of the last node
Iterators can always be safely used for execution and inversion, since only a single iteration is needed
>>> flow = mdp.Flow([BogusNode(),BogusNode()], verbose=1)
>>> flow.train([gen_data(1), gen_data(1)])
Training node #0 (BogusNode)
Training finished
Training node #1 (BosgusNode)
[===================================100%==================================>]
Training finished
Close the training phase of the last node
>>> output = flow.execute(gen_data(1000))
[===================================100%==================================>]
>>> output = flow.inverse(gen_data(1000))
[===================================100%==================================>]
Execution and inversion can be done in one-shot mode also. Note that since training is finished you are not going to get a warning
>>> output = flow(single_block)
>>> output = flow.inverse(single_block)
If a node requires multiple training phases (e.g.,
GaussianClassifierNode
), Flow
automatically takes care of using the
iterable multiple times. In this case generators (and iterators) are not
allowed, since they are spend after yielding the last data block.
However, it is fairly easy to wrap a generator in a simple iterable if you need to
>>> class SimpleIterable(object):
... def __init__(self, blocks):
... self.blocks = blocks
... def __iter__(self):
... # this is a generator
... for i in range(self.blocks):
... yield generate_some_data()
Note that if you use random numbers within the generator, you usually would like to reset the random number generator to produce the same sequence every time
>>> class RandomIterable(object):
... def __init__(self):
... self.state = None
... def __iter__(self):
... if self.state is None:
... self.state = np.random.get_state()
... else:
... np.random.set_state(self.state)
... for i in range(2):
... yield np.random.random((1,4))
>>> iterable = RandomIterable()
>>> for x in iterable:
... print x
[[ 0.5488135 0.71518937 0.60276338 0.54488318]]
[[ 0.4236548 0.64589411 0.43758721 0.891773 ]]
>>> for x in iterable:
... print x
[[ 0.5488135 0.71518937 0.60276338 0.54488318]]
[[ 0.4236548 0.64589411 0.43758721 0.891773 ]]