Package mdp

Modular toolkit for Data Processing (MDP) is a data processing
framework written in Python.

From the user's perspective, MDP consists of a collection of trainable
supervised and unsupervised algorithms or other data processing units
(nodes) that can be combined into data processing flows and more 
complex feed-forward network architectures. Given a
sequence of input data, MDP takes care of successively training or
executing all nodes in the network. This structure makes it possible to
specify complex algorithms as a sequence of simpler data processing
steps in a natural way. Training can be performed on small chunks of
input data, which makes very large data sets usable while keeping
memory requirements low. Memory usage can be reduced further by
configuring the nodes to use single precision internally.
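
The chunked training of a sequence of nodes described above can be sketched as follows. This is a minimal illustration in plain Python, not MDP's implementation: the classes MeanRemovalNode, ScaleNode and SimpleFlow are hypothetical names introduced only for this example, and the data are one-dimensional lists rather than arrays.

```python
class MeanRemovalNode:
    """Hypothetical node: learns the mean from chunks, then subtracts it."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.mean = None

    def train(self, chunk):
        # Only running sums are kept; the chunk itself is discarded.
        self.total += sum(chunk)
        self.count += len(chunk)

    def stop_training(self):
        self.mean = self.total / self.count

    def execute(self, chunk):
        return [v - self.mean for v in chunk]


class ScaleNode:
    """Hypothetical node: learns the largest absolute value, then rescales."""

    def __init__(self):
        self.max_abs = 0.0

    def train(self, chunk):
        self.max_abs = max([self.max_abs] + [abs(v) for v in chunk])

    def stop_training(self):
        pass

    def execute(self, chunk):
        return [v / self.max_abs for v in chunk]


class SimpleFlow:
    """Hypothetical flow: trains the nodes one after the other."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, chunks):
        # `chunks` must be re-iterable, since each node needs its own pass.
        for i, node in enumerate(self.nodes):
            for chunk in chunks:
                # Feed the chunk through the nodes that are already trained.
                for trained in self.nodes[:i]:
                    chunk = trained.execute(chunk)
                node.train(chunk)
            node.stop_training()

    def execute(self, chunk):
        for node in self.nodes:
            chunk = node.execute(chunk)
        return chunk


chunks = [[1.0, 2.0], [3.0, 4.0], [5.0]]
flow = SimpleFlow([MeanRemovalNode(), ScaleNode()])
flow.train(chunks)
result = flow.execute([1.0, 3.0, 5.0])
print(result)  # [-1.0, 0.0, 1.0]
```

Because each node stores only summary statistics, the full data set never has to be held in memory at once.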

The set of readily available algorithms includes Principal Component
Analysis (PCA and NIPALS), four flavors of Independent Component
Analysis (CuBICA, FastICA, TDSEP, and JADE), Slow Feature Analysis,
Independent Slow Feature Analysis, Gaussian Classifiers, Growing
Neural Gas, Fisher Discriminant Analysis, Factor Analysis, Restricted
Boltzmann Machine, and many more.

From the developer's perspective, MDP is a framework to make the
implementation of new algorithms easier. The basic class 'Node' takes
care of tedious tasks like numerical type and dimensionality checking,
leaving the developer free to concentrate on the implementation of the
training and execution phases. The node then automatically integrates
with the rest of the library and can be used in a flow together with
other nodes. A node can have multiple training phases, and even a
number of phases that is not known in advance. This makes it possible,
for example, to implement algorithms that need to collect some
statistics on the whole input before proceeding with the actual
training, or others that need to iterate over a training phase until a
convergence criterion is satisfied. The ability to train each phase
using chunks
of input data is maintained if the chunks are generated with
iterators. Moreover, crash recovery is optionally available: in case
of failure, the current state of the flow is saved for later
inspection.
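
The idea of a node with several training phases can be sketched as follows. This is an illustration under stated assumptions, not MDP's actual 'Node' class: BaseNode, StandardizeNode, the method _get_train_phases and the generator chunk_iterator are all hypothetical names. The first phase collects the mean over the whole input, and the second phase needs that mean to compute the standard deviation.

```python
class BaseNode:
    """Hypothetical base class: does the bookkeeping for training phases."""

    def __init__(self):
        self._phase = 0
        self._phases = self._get_train_phases()

    def _get_train_phases(self):
        # Subclasses return a list of (train, stop) function pairs.
        raise NotImplementedError

    def is_training(self):
        return self._phase < len(self._phases)

    def train(self, chunk):
        if not self.is_training():
            raise RuntimeError("the training phase is closed")
        self._phases[self._phase][0](chunk)

    def stop_training(self):
        self._phases[self._phase][1]()
        self._phase += 1

    def execute(self, chunk):
        if self.is_training():
            raise RuntimeError("the node is still training")
        return self._execute(chunk)


class StandardizeNode(BaseNode):
    """Phase 1 learns the mean; phase 2 learns the standard deviation."""

    def __init__(self):
        super().__init__()
        self._sum = 0.0
        self._sq_dev = 0.0
        self._n = 0
        self.mean = None
        self.std = None

    def _get_train_phases(self):
        return [(self._train_mean, self._stop_mean),
                (self._train_std, self._stop_std)]

    def _train_mean(self, chunk):
        self._sum += sum(chunk)
        self._n += len(chunk)

    def _stop_mean(self):
        self.mean = self._sum / self._n

    def _train_std(self, chunk):
        self._sq_dev += sum((v - self.mean) ** 2 for v in chunk)

    def _stop_std(self):
        self.std = (self._sq_dev / self._n) ** 0.5

    def _execute(self, chunk):
        return [(v - self.mean) / self.std for v in chunk]


def chunk_iterator():
    # A fresh iterator is created for every phase.
    yield [2.0, 4.0]
    yield [4.0, 6.0]


node = StandardizeNode()
while node.is_training():
    for chunk in chunk_iterator():
        node.train(chunk)
    node.stop_training()
out = node.execute([4.0, 6.0])
```

Generating the chunks with an iterator factory lets every phase make its own pass over the data without loading it all at once.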

MDP has been written in the context of theoretical research in
neuroscience, but it has been designed to be helpful in any context
where trainable data processing algorithms are used. Its simplicity on
the user side, together with the reusability of the implemented nodes,
also makes it a useful educational tool.

As its base of users and contributors is steadily growing, MDP is a
good candidate to become a common repository of user-supplied, freely
available data processing algorithms implemented in Python.

http://mdp-toolkit.sourceforge.net


Version: 2.3

Author: Pietro Berkes, Niko Wilbert, and Tiziano Zito

Contact: mdp-toolkit-users AT lists.sourceforge.net

Copyright: (c) 2003-2008 Pietro Berkes, Niko Wilbert, Tiziano Zito

License: LGPL v3, http://www.gnu.org/licenses/lgpl.html

Submodules

Classes
CheckpointFlow
Subclass of Flow class that allows user-supplied checkpoint functions to be executed at the end of each phase, for example to save the internal structures of a node for later analysis.
CheckpointFunction
Base class for checkpoint functions.
CheckpointSaveFunction
This checkpoint function saves the node in pickle format.
CrashRecoveryException
Class to handle crash recovery
Cumulator
A Cumulator is a Node whose training phase simply cumulates all input data.
Flow
A Flow consists of a linear sequence of Nodes.
FlowException
Base class for exceptions in Flow subclasses.
FlowExceptionCR
Class to handle flow-crash recovery
IsNotInvertibleException
Raised when the 'inverse' function is called although the node is not invertible.
IsNotTrainableException
Raised when the 'train' function is called although the node is not trainable.
MDPException
Base class for exceptions in MDP.
MDPWarning
Base class for warnings in MDP.
Node
Node is the basic unit in MDP. It represents a data processing element such as a learning algorithm, a filter, or a visualization step.
NodeException
Base class for exceptions in Node subclasses.
TrainingException
Base class for exceptions in the training phase.
TrainingFinishedException
Raised when the 'train' function is called although the training phase is closed.
Functions
 
cubica(x, **kwargs)
Perform Independent Component Analysis on input data using the CuBICA algorithm by Tobias Blaschke.
 
factor_analysis(x, **kwargs)
Perform Factor Analysis on the input data and return the Maximum A Posteriori estimate of the latent variables.
 
fastica(x, **kwargs)
Perform Independent Component Analysis on input data using the FastICA algorithm by Aapo Hyvarinen.
 
get_eta(x, **kwargs)
Compute eta values (a slowness measure) of the input data.
 
isfa(x, **kwargs)
 
pca(x, **kwargs)
Filter multidimensional input data through its principal components.
 
sfa(x, **kwargs)
Perform Slow Feature Analysis on input data using the SFA algorithm by Laurenz Wiskott.
 
sfa2(x, **kwargs)
Perform quadratic Slow Feature Analysis on input data using the SFA algorithm by Laurenz Wiskott.
 
test(suitename='all', verbosity=2, seed=None, testname=None)
 
whitening(x, **kwargs)
Filter multidimensional input data through its principal components, rescaling the output signals so that they have unit variance.
Variables
  numx_description = 'scipy'
Function Details

cubica(x, **kwargs)

 
Perform Independent Component Analysis on input data using the CuBICA
algorithm by Tobias Blaschke.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node CuBICANode.
If any keyword arguments are specified, they are passed to its constructor.
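
The shortcut pattern described here (create the node, train it on x, then execute it on x, forwarding any keyword arguments to the constructor) can be sketched as follows. CenterNode, its scale option and the function center are hypothetical stand-ins chosen for brevity; they are not part of MDP, and the real shortcut functions may differ in detail.

```python
class CenterNode:
    """Hypothetical node: removes the mean of every column."""

    def __init__(self, scale=1.0):
        # `scale` is a made-up constructor option for this example.
        self.scale = scale
        self.means = None

    def train(self, x):
        # Rows are observations, columns are variables.
        n_rows = len(x)
        self.means = [sum(column) / n_rows for column in zip(*x)]

    def stop_training(self):
        pass

    def execute(self, x):
        return [[(v - m) * self.scale for v, m in zip(row, self.means)]
                for row in x]


def center(x, **kwargs):
    """Shortcut: build the node from kwargs, train on x, execute on x."""
    node = CenterNode(**kwargs)
    node.train(x)
    node.stop_training()
    return node.execute(x)


x = [[1.0, 10.0],
     [3.0, 30.0]]
y = center(x, scale=2.0)
print(y)  # [[-2.0, -20.0], [2.0, 20.0]]
```

A shortcut of this kind is convenient for one-off use; keeping the node object around is preferable when the trained state has to be applied to new data later.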

factor_analysis(x, **kwargs)

 
Perform Factor Analysis on the input data and return the
Maximum A Posteriori estimate of the latent variables.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node FANode.
If any keyword arguments are specified, they are passed to its constructor.

fastica(x, **kwargs)

 
Perform Independent Component Analysis on input data using the FastICA
algorithm by Aapo Hyvarinen.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node FastICANode.
If any keyword arguments are specified, they are passed to its constructor.

get_eta(x, **kwargs)

 
Compute eta values (a slowness measure) of the input data.

The delta value of a signal is a measure of its temporal
variation, and is defined as the mean of the squared derivative,
i.e. delta(x) = mean((dx/dt)^2).  delta(x) is zero if
x is a constant signal, and grows with the temporal variation
of the signal.

The eta value is a more intuitive measure of temporal variation,
defined as
   eta(x) = T/(2*pi) * sqrt(delta(x))
If x is a signal of length T which consists of a sine function
that completes exactly N oscillations, then eta(x)=N.

Input data are normalized to have unit variance, so that the
temporal variation of two signals can be compared independently
of their scaling.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node EtaComputerNode.
If any keyword arguments are specified, they are passed to its constructor.
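
The definition of eta can be checked numerically with a short sketch. The function below is an illustration only, not the code of EtaComputerNode: it works on a one-dimensional list, and it approximates the derivative by differences between consecutive samples, which is an assumption of this example.

```python
import math


def eta(signal):
    """Approximate eta(x) = T/(2*pi) * sqrt(delta(x)) for a 1-D signal."""
    length = len(signal)
    mean = sum(signal) / length
    variance = sum((v - mean) ** 2 for v in signal) / length
    std = math.sqrt(variance)
    # Normalize to zero mean and unit variance before measuring variation.
    normalized = [(v - mean) / std for v in signal]
    # Finite differences stand in for the time derivative (time step of 1).
    diffs = [b - a for a, b in zip(normalized, normalized[1:])]
    delta = sum(d * d for d in diffs) / len(diffs)
    return length / (2 * math.pi) * math.sqrt(delta)


T, N = 1000, 5
sine = [math.sin(2 * math.pi * N * t / T) for t in range(T)]
eta_value = eta(sine)
print(round(eta_value, 2))
```

For this sine with N = 5 oscillations, the result is close to 5, as stated above. Scaling the signal by a constant leaves the value unchanged because of the normalization step.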

isfa(x, **kwargs)

 

pca(x, **kwargs)

 
Filter multidimensional input data through its principal components.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node PCANode. If any
keyword arguments are specified, they are passed to its constructor.

sfa(x, **kwargs)

 
Perform Slow Feature Analysis on input data using the SFA
algorithm by Laurenz Wiskott.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node SFANode.
If any keyword arguments are specified, they are passed to its constructor.

sfa2(x, **kwargs)

 
Perform quadratic Slow Feature Analysis on input data using the SFA
algorithm by Laurenz Wiskott.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node SFA2Node.
If any keyword arguments are specified, they are passed to its constructor.

test(suitename='all', verbosity=2, seed=None, testname=None)

 

whitening(x, **kwargs)

 
Filter multidimensional input data through its principal components,
rescaling the output signals so that they have unit variance.

Each row of x holds one observation and each column holds one
variable.

This is a shortcut function for the corresponding node WhiteningNode.
If any keyword arguments are specified, they are passed to its constructor.
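
What projecting onto the principal components and rescaling to unit variance means can be sketched with NumPy. This is a textbook formulation under stated assumptions, not the implementation of WhiteningNode: the function name whiten is hypothetical, and the example assumes a full-rank covariance matrix.

```python
import numpy as np


def whiten(x):
    """Project centered data onto its principal axes, scale to unit variance."""
    x = np.asarray(x, dtype=float)
    # Rows are observations, columns are variables.
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is suited to symmetric matrices such as a covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Dividing by sqrt(eigenvalue) gives each component unit variance.
    return centered @ eigvecs / np.sqrt(eigvals)


rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.1]])
x = rng.normal(size=(500, 3)) @ mixing
y = whiten(x)
cov_y = np.cov(y, rowvar=False)
```

After whitening, the covariance matrix of the output is the identity up to rounding error. Skipping the division by the square root of the eigenvalues would give a plain projection onto the principal components.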


Variables Details

numx_description

Value:
'scipy'