The Modular toolkit for Data Processing (MDP) package is a library of widely used data processing algorithms that can be combined into pipelines to build more complex data processing software.
MDP has been designed to be used as-is and as a framework for scientific data processing development.
From the user's perspective, MDP consists of a collection of units that process data. These include, for example, algorithms for supervised and unsupervised learning, principal and independent component analysis, and classification.
These units can be chained into data processing flows to create pipelines as well as more complex feed-forward network architectures. Given a set of input data, MDP takes care of training and executing all nodes in the network in the correct order and of passing intermediate data between the nodes. This allows the user to specify complex algorithms as a series of simpler data processing steps.
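For instance, a minimal flow with two of the built-in nodes can be set up as follows (the toy data and the choice of output_dim=5 are only illustrative):

    import mdp
    import numpy as np

    # Toy data: 1000 observations of 20 variables
    # (observations on rows, variables on columns).
    x = np.random.random((1000, 20))

    # Chain two nodes: project onto 5 principal components,
    # then extract independent components from the result.
    flow = mdp.Flow([mdp.nodes.PCANode(output_dim=5),
                     mdp.nodes.FastICANode()])

    flow.train(x)          # trains each node in order, forwarding intermediate data
    y = flow.execute(x)    # run the trained flow on the input data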
The number of available algorithms is steadily increasing and includes signal processing methods (Principal Component Analysis, Independent Component Analysis, Slow Feature Analysis), manifold learning methods ([Hessian] Locally Linear Embedding), several classifiers, probabilistic methods (Factor Analysis, RBM), data pre-processing methods, and many others.
Particular care has been taken to make computations efficient in terms of speed and memory. To reduce the memory footprint, it is possible to perform learning using batches of data. For large data sets, it is also possible to specify that MDP should use single precision floating point numbers rather than double precision ones. Finally, calculations can be parallelised using the parallel subpackage, which offers a parallel implementation of the basic nodes and flows.
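A rough sketch of combining single precision with the parallel subpackage (the chunk size, number of processes, and toy data are illustrative assumptions):

    import mdp
    import numpy as np

    x = np.random.random((10000, 50)).astype('float32')

    # Single-precision node, halving the memory footprint.
    pca = mdp.nodes.PCANode(output_dim=10, dtype='float32')

    # ParallelFlow plus a scheduler distribute the training
    # chunks over several worker processes.
    flow = mdp.parallel.ParallelFlow([pca])
    scheduler = mdp.parallel.ProcessScheduler(n_processes=4)
    try:
        # one iterable of data chunks per node in the flow
        flow.train([[x[i:i + 1000] for i in range(0, len(x), 1000)]],
                   scheduler=scheduler)
    finally:
        scheduler.shutdown()
    y = flow.execute(x)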
From the developer's perspective, MDP is a framework that makes the implementation of new supervised and unsupervised learning algorithms easy and straightforward. The basic class, Node, takes care of tedious tasks like numerical type and dimensionality checking, leaving the developer free to concentrate on the implementation of the learning and execution phases. Because of the common interface, the node then automatically integrates with the rest of the library and can be used in a network together with other nodes.
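As an illustration, a minimal custom node might look like the following sketch; MeanRemovalNode and its internal attributes are invented for this example:

    import mdp

    class MeanRemovalNode(mdp.Node):
        """Toy node that learns the mean of the training data
        and subtracts it during execution."""

        def __init__(self, input_dim=None, dtype=None):
            super(MeanRemovalNode, self).__init__(input_dim=input_dim,
                                                  dtype=dtype)
            self._sum = None
            self._tlen = 0

        def _train(self, x):
            # By the time _train is called, Node has already checked
            # the numerical type and dimensionality of x.
            if self._sum is None:
                self._sum = x.sum(axis=0)
            else:
                self._sum += x.sum(axis=0)
            self._tlen += len(x)

        def _stop_training(self):
            self.mean = self._sum / self._tlen

        def _execute(self, x):
            return x - self.mean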
A node can have multiple training phases and even an undetermined number of phases. Multiple training phases mean that the training data is presented multiple times to the same node. This allows the implementation of algorithms that need to collect some statistics on the whole input before proceeding with the actual training, and others that need to iterate over a training phase until a convergence criterion is satisfied. It is possible to train each phase using chunks of input data if the chunks are given as an iterable. Moreover, crash recovery can be optionally enabled, which will save the state of the flow in case of a failure for later inspection.
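A sketch of chunked training with generators follows; the data and chunk sizes are made up, and note that a plain generator can be consumed only once, so a node with multiple training phases needs an iterable that can be restarted:

    import mdp
    import numpy as np

    def chunks(n_chunks=10, chunk_size=100, dim=20):
        # Yield the training data one batch at a time, so the whole
        # data set never has to be in memory at once.
        for _ in range(n_chunks):
            yield np.random.random((chunk_size, dim))

    flow = mdp.Flow([mdp.nodes.PCANode(output_dim=5),
                     mdp.nodes.SFANode()])
    flow.set_crash_recovery(True)  # dump the flow's state on failure

    # One iterable of chunks per node; data for the second node is
    # automatically passed through the already trained first node.
    flow.train([chunks(), chunks()])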
MDP is distributed under the open source BSD license. It has been written in the context of theoretical research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user's side, the variety of readily available algorithms, and the reusability of the implemented nodes also make it a useful educational tool.
http://mdp-toolkit.sourceforge.net
Version: 3.5
Author: MDP Developers
Contact: mdp-toolkit-users@lists.sourceforge.net
Copyright: (c) 2003-2016 mdp-toolkit-devel@lists.sourceforge.net
License: BSD License, see COPYRIGHT
Classes:

CheckpointFlow
    Subclass of Flow that allows user-supplied checkpoint functions to be executed at the end of each phase, for example to save the internal structures of a node for later analysis.

CheckpointFunction
    Base class for checkpoint functions.

CheckpointSaveFunction
    This checkpoint function saves the node in pickle format. The pickle dump can be done either before the training phase is finished or right after that. In this way it is possible, for example, to reload the node in successive sessions and continue the training.

ClassifierCumulator
    A ClassifierCumulator is a Node whose training phase simply collects all input data and labels. In this way it is possible to easily implement batch-mode learning.

ClassifierNode
    A ClassifierNode can be used for classification tasks that should not interfere with the normal execution flow. This is because the labels used for classification do not form a vector space, and so they do not make much sense in a flow.

CrashRecoveryException
    Class to handle crash recovery.

Cumulator
    A specialized version of VariadicCumulator which only fills the field self.data.

ExtensionNode
    Base class for extension nodes.

ExtensionNodeMetaclass
    This is the metaclass for node extension superclasses.

Flow
    A Flow is a sequence of nodes that are trained and executed together to form a more complex algorithm. Input data is sent to the first node and is successively processed by the subsequent nodes along the sequence.

FlowException
    Base class for exceptions in Flow subclasses.

FlowExceptionCR
    Class to handle flow crash recovery.

IsNotInvertibleException
    Raised when the Node.inverse method is called although the node is not invertible.

IsNotTrainableException
    Raised when the Node.train method is called although the node is not trainable.

MDPDeprecationWarning
    Warn about deprecated MDP API.

MDPException
    Base class for exceptions in MDP.

MDPWarning
    Base class for warnings in MDP.

Node
    A Node is the basic building block of an MDP application.

NodeException
    Base class for exceptions in Node subclasses.

NodeMetaclass
    A metaclass which copies docstrings from private to public methods.

TrainingException
    Base class for exceptions in the training phase.

TrainingFinishedException
    Raised when the Node.train method is called although the training phase is closed.

config
    Provides information about optional dependencies.

extension
    Context manager for an MDP extension (see the sketch after this list).
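As a rough sketch of the extension mechanism (the extension name 'parallel' is an assumption for illustration; mdp.get_extensions() lists the names actually registered):

    import mdp

    # Names of all currently registered extensions.
    print(mdp.get_extensions().keys())

    # The extension is active only inside the with-block and is
    # deactivated automatically on exit, even if an exception is raised.
    with mdp.extension('parallel'):  # 'parallel' is an assumed example name
        node = mdp.nodes.PCANode()
        # ... extension-specific behaviour is available here ...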
Variables:

    __homepage__
    __medium_description__
    __package__
    __revision__
    __short_description__
    _pp_needs_monkeypatching = False
    numx_description
Functions:

VariadicCumulator(*fields)
    A VariadicCumulator is a Node whose training phase simply collects all input data. In this way it is possible to easily implement batch-mode learning. The data is accessible in the attributes given with the VariadicCumulator's constructor after the beginning of the Node._stop_training phase. self.tlen contains the number of data points collected.

activate_extensions(extension_names)
    Activate all the extensions for the given names. extension_names is a sequence of extension names.

deactivate_extensions(extension_names)
    Deactivate all the extensions for the given names. extension_names is a sequence of extension names.

extension_method(...)
    Returns a decorator to register a function as an extension method. Note that it is possible to directly call other extension functions, call extension methods in other node classes, or use super in the normal way (the function will be called as a method of the node class).

extension_setup(...)
    Returns a decorator to register a setup function for an extension. The decorated function will be called when the extension is activated. Note that there is also the extension_teardown decorator, which should probably be defined as well if there is a setup procedure.

extension_teardown(...)
    Returns a decorator to register a teardown function for an extension. The decorated function will be called when the extension is deactivated.

fastica(x, **kwargs)
    Perform Independent Component Analysis on input data using the FastICA algorithm by Aapo Hyvarinen. Observations of the same variable are stored on rows, different variables are stored on columns. This is a shortcut function for the corresponding node nodes.FastICANode. If any keyword arguments are specified, they are passed to its constructor. This is equivalent to mdp.nodes.FastICANode(**kwargs)(x). (See the example after this list.)

get_extensions()
    Return a dictionary of the currently registered extensions. Note that this is not a copy, so if you change anything in this dict the whole extension mechanism will be affected. If you just want the names of the available extensions, use get_extensions().keys().

pca(x, **kwargs)
    Filter multidimensional input data through its principal components. Observations of the same variable are stored on rows, different variables are stored on columns. This is a shortcut function for the corresponding node nodes.PCANode. If any keyword arguments are specified, they are passed to its constructor. This is equivalent to mdp.nodes.PCANode(**kwargs)(x). (See the example after this list.)

with_extension(extension_name)
    Return a wrapper function to activate and deactivate the extension. This function is intended to be used with the decorator syntax. The deactivation happens only if the extension was activated by the decorator (not if it was already active before), so the decorator ensures that the extension is active while preventing unintended side effects. If the wrapped function is a generator function, the extension will be in effect only when the generator object is created (that is, when the function is called, but before its body is actually executed). When the function body is executed (after next is called on the generator object), the extension might not be in effect anymore. Therefore, it is better to use the extension context manager with a generator function.
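For example, the pca and fastica shortcuts can be applied directly to a data array (toy data; output_dim=3 is an arbitrary illustrative choice):

    import mdp
    import numpy as np

    x = np.random.random((500, 10))

    # One-call shortcuts: create the node, train it on x, execute it on x.
    y_pca = mdp.pca(x, output_dim=3)  # same as mdp.nodes.PCANode(output_dim=3)(x)
    y_ica = mdp.fastica(x)            # same as mdp.nodes.FastICANode()(x)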