| 15.05.2008: | MDP 2.3 released (changes since last release). |
| 22.03.2008: | MDP 2.2 released. |
Modular toolkit for Data Processing (MDP) is a Python data processing framework. Implemented algorithms include: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Slow Feature Analysis (SFA), Independent Slow Feature Analysis (ISFA), Growing Neural Gas (GNG), Factor Analysis, Fisher Discriminant Analysis (FDA), Gaussian Classifiers, and Restricted Boltzmann Machines. Read the full list.
Quick start
Using MDP is as easy as:
>>> import mdp
>>> # perform pca on some data x
...
>>> y = mdp.pca(x)
>>> # perform ica on some data x using single precision
...
>>> y = mdp.fastica(x, dtype='float32')
MDP is of course much more than this: it lets you combine different algorithms and other data processing elements (nodes) into data processing sequences (flows) and more general feed-forward architectures (with the new hinet subpackage).
Moreover, it provides a framework that
makes the implementation of new algorithms easy and intuitive.
To learn more about MDP:
- Read the long description
- Take a look at the Tutorial (pdf 290 KB)
- See the presentation given at the Europython conference in Geneva, Switzerland, July 3-5 2006: OpenOffice (603 KB), pdf (250 KB).
- Browse the API
Description
Modular toolkit for Data Processing (MDP) is a data processing framework written in Python.
From the user's perspective, MDP consists of a collection of trainable supervised and unsupervised algorithms or other data processing units (nodes) that can be combined into data processing flows and more complex feed-forward network architectures. Given a sequence of input data, MDP takes care of successively training or executing all nodes in the network. This structure makes it natural to specify complex algorithms as a sequence of simpler data processing steps. Training can be performed using small chunks of input data, so very large data sets can be processed with modest memory requirements. Memory usage can also be reduced by defining the internals of the nodes to be single precision.
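To make the node and flow idea concrete, here is a minimal standalone sketch in plain Python. It does not use MDP and is not MDP's implementation: the classes CenterNode, ScaleNode and Flow, and their method names, are hypothetical and chosen only for illustration. It shows a flow that trains its nodes one after another on small chunks, feeding each chunk through the nodes that are already trained.

```python
import math


class CenterNode:
    """Hypothetical node: learns the mean, then subtracts it."""

    def __init__(self):
        self._sum = 0.0
        self._count = 0
        self.mean = None

    def train(self, chunk):
        # Accumulate statistics chunk by chunk; the chunk itself is not kept.
        self._sum += sum(chunk)
        self._count += len(chunk)

    def stop_training(self):
        self.mean = self._sum / self._count

    def execute(self, chunk):
        return [v - self.mean for v in chunk]


class ScaleNode:
    """Hypothetical node: learns the root mean square, then divides by it."""

    def __init__(self):
        self._sum_sq = 0.0
        self._count = 0
        self.scale = None

    def train(self, chunk):
        self._sum_sq += sum(v * v for v in chunk)
        self._count += len(chunk)

    def stop_training(self):
        self.scale = math.sqrt(self._sum_sq / self._count)

    def execute(self, chunk):
        return [v / self.scale for v in chunk]


class Flow:
    """Hypothetical flow: trains its nodes in order, one at a time."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, chunks):
        for i, node in enumerate(self.nodes):
            for chunk in chunks:
                # Pass the chunk through the already trained nodes first.
                for earlier in self.nodes[:i]:
                    chunk = earlier.execute(chunk)
                node.train(chunk)
            node.stop_training()

    def execute(self, chunk):
        for node in self.nodes:
            chunk = node.execute(chunk)
        return chunk


# The input is a list of chunks, so it can be traversed once per node.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
flow = Flow([CenterNode(), ScaleNode()])
flow.train(chunks)
result = flow.execute([v for chunk in chunks for v in chunk])
```

After training, result has zero mean and unit mean square. At no point does a node need more than one chunk in memory, which is the property that makes chunked training attractive for large data sets.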
The base of readily available algorithms includes Principal Component Analysis (PCA and NIPALS), four flavors of Independent Component Analysis (CuBICA, FastICA, TDSEP, and JADE), Slow Feature Analysis, Independent Slow Feature Analysis, Gaussian Classifiers, Growing Neural Gas, Fisher Discriminant Analysis, Factor Analysis, Restricted Boltzmann Machines, and many more.
From the developer's perspective, MDP is a framework to make the
implementation of new supervised and unsupervised algorithms easier.
The basic class Node takes
care of tedious tasks like numerical type and dimensionality checking,
leaving the developer free to concentrate on the implementation of the
training and execution phases. The node then automatically integrates
with the rest of the library and can be used in a flow together with
other nodes. A node can have multiple training phases and even an
undetermined number of phases. This allows for example the
implementation of algorithms that need to collect some statistics on
the whole input before proceeding with the actual training, or others
that need to iterate over a training phase until a convergence
criterion is satisfied. The ability to train each phase using chunks
of input data is maintained if the chunks are generated with
iterators. Moreover, crash recovery is optionally available: in case
of failure, the current state of the flow is saved for later
inspection.
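The idea of several training phases can be sketched in plain Python as follows. This is a hypothetical standalone illustration, not MDP's Node class: the name TwoPhaseVarianceNode, its methods, and the helper make_chunks are assumptions made for this example. The first phase collects a statistic over the whole input (the mean), and the second phase uses it (to compute the variance). Because a generator is exhausted after one pass, the chunks come from a function that returns a fresh iterator for every phase.

```python
class TwoPhaseVarianceNode:
    """Hypothetical node with two training phases (mean, then variance)."""

    def __init__(self):
        # Each phase is a (train, stop_training) pair of methods.
        self._phases = [
            (self._train_mean, self._stop_mean),
            (self._train_var, self._stop_var),
        ]
        self._phase = 0
        self._sum = 0.0
        self._sum_sq_dev = 0.0
        self._count = 0
        self.mean = None
        self.variance = None

    def is_training(self):
        return self._phase < len(self._phases)

    def train(self, chunk):
        self._phases[self._phase][0](chunk)

    def stop_training(self):
        # Close the current phase and move on to the next one.
        self._phases[self._phase][1]()
        self._phase += 1

    def _train_mean(self, chunk):
        self._sum += sum(chunk)
        self._count += len(chunk)

    def _stop_mean(self):
        self.mean = self._sum / self._count

    def _train_var(self, chunk):
        # Needs the mean from phase one, hence the second pass over the data.
        self._sum_sq_dev += sum((v - self.mean) ** 2 for v in chunk)

    def _stop_var(self):
        self.variance = self._sum_sq_dev / self._count


def make_chunks():
    """Return a fresh generator so every phase sees the whole input."""
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    for start in range(0, len(data), 3):
        yield data[start:start + 3]


node = TwoPhaseVarianceNode()
while node.is_training():
    for chunk in make_chunks():
        node.train(chunk)
    node.stop_training()
```

The driving loop does not need to know how many phases the node has; it simply keeps feeding the input until the node reports that training is finished. The same loop would work for a node that decides at run time whether another pass is needed.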
MDP was written in the context of theoretical research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user side, together with the reusability of the implemented nodes, also makes it a useful educational tool.
As its user and contributor base is steadily increasing, MDP is a good candidate for becoming a common repository of user-supplied, freely available data processing algorithms implemented in Python.
Installation
Requirements: Python ≥ 2.4, and NumPy ≥ 1.0 or SciPy ≥ 0.5.2. The symeig package is automatically used if installed.
Download: Download MDP 2.3 at SourceForge. If you want to live on the bleeding edge, check out the MDP svn repository: you can browse the repository or just check out the trunk with:
svn co https://mdp-toolkit.svn.sourceforge.net/svnroot/mdp-toolkit/mdp/trunk/mdp mdp
Thanks to Yaroslav Halchenko, Debian lenny/sid users can install the python-mdp package.
Installation:
Unpack the archive file, enter the project directory and type:
python setup.py install
If you want to use MDP without installing it on the system Python path:
python setup.py install --prefix=/some_dir_in_PYTHONPATH/
On Debian lenny/sid you can just type:
aptitude update
aptitude install python-mdp
On Windows, the installation of the binary distribution is as easy as executing the installer and following the instructions.
Testing:
If you have successfully installed MDP, you can test your installation in a Python shell
as follows:
>>> import mdp
>>> mdp.test()
Demos:
All the examples shown in the MDP tutorial
can be found in the package installation path in the subdirectory
demo.
Maintainers
MDP was originally written by Pietro Berkes and Tiziano Zito at the Institute for Theoretical Biology of the Humboldt University, Berlin in 2003.
Current maintainers are:
Yaroslav Halchenko maintains the python-mdp Debian package.
For comments, patches, feature requests, support requests, and bug reports (if any) you can use the users mailing list.
If you want to contribute some code or a new algorithm, please do not hesitate to submit it!
How to cite MDP
If you use MDP for scientific purposes, you may want to cite it. This is the official way to do it:
Berkes, P., Wilbert, N., and Zito, T. (2008)
Modular Toolkit for Data Processing (version 2.3)
http://mdp-toolkit.sourceforge.net
If your paper gets published, please send us a reference (and even a copy if you don't mind).
