
Class NIPALSNode


Perform Principal Component Analysis using the NIPALS algorithm. This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to a fraction of the total variance (e.g. output_dim=0.9 for 90%) may help.
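The iteration described above can be sketched in plain NumPy. This is only an illustration of the general NIPALS scheme (power iteration on the centered data followed by deflation), not the code used by NIPALSNode. The function name nipals_pca, the choice of starting column, and the exact convergence test are assumptions made for the example. The conv and max_it arguments mirror the constructor arguments of the same name.

```python
import numpy as np

def nipals_pca(x, n_components, conv=1e-8, max_it=100000):
    """Illustrative NIPALS PCA (hypothetical helper, not part of MDP).

    Returns (avg, d, v): the data mean, the variance along each
    component, and the components stored as columns of v.
    """
    x = np.asarray(x, dtype=float)
    avg = x.mean(axis=0)
    residual = x - avg                      # centered data, deflated below
    n_obs, n_vars = residual.shape
    d = np.zeros(n_components)
    v = np.zeros((n_vars, n_components))
    for k in range(n_components):
        # Assumption: start from the residual column with the largest variance.
        t = residual[:, np.argmax(residual.var(axis=0))].copy()
        for _ in range(max_it):
            p = residual.T @ t
            p /= np.linalg.norm(p)          # loading vector, unit length
            t_new = residual @ p            # score vector
            converged = np.linalg.norm(t_new - t) < conv
            t = t_new
            if converged:
                break
        d[k] = t @ t / (n_obs - 1)          # variance along this component
        v[:, k] = p
        residual -= np.outer(t, p)          # deflation: remove the component
    return avg, d, v
```

Note that the full covariance matrix is never formed: each component costs a few matrix-vector products with the data, which is why the approach pays off when only a handful of components is needed.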

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.d
Variance corresponding to the PCA components.
self.v
Transpose of the projection matrix (available after training).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. In David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp. 411-444 (1966).

More information about Principal Component Analysis, also known as the discrete Karhunen-Loeve transform, can be found in, among other sources, I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).

Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, conv=1e-08, max_it=100000)
The number of principal components to be kept can be specified as 'output_dim' directly (e.g. 'output_dim=10' means 10 components are kept) or by the fraction of variance to be explained (e.g. 'output_dim=0.95' means that as many components as necessary will be kept in order to explain 95% of the input variance).
 
_stop_training(self, debug=False)
Concatenate the collected data into a single array.
 
_train(self, x)
Collect all input data in a list.
 
stop_training(self, debug=False)
Concatenate the collected data into a single array.
 
train(self, x)
Collect all input data in a list.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from PCANode
 
_adjust_output_dim(self)
Return the eigenvector range and set the output dim if required.
 
_check_output(self, y)
 
_execute(self, x, n=None)
Project the input on the first 'n' principal components. If 'n' is not set, use all available components.
 
_inverse(self, y, n=None)
Project 'y' to the input space using the first 'n' components. If 'n' is not set, use all available components.
 
_set_output_dim(self, n)
 
execute(self, x, n=None)
Project the input on the first 'n' principal components. If 'n' is not set, use all available components.
 
get_explained_variance(self)
Return the fraction of the original variance that can be explained by self._output_dim PCA components. If, for example, output_dim has been set to 0.95, the explained variance could be something like 0.958... Note that if output_dim was explicitly set to a fixed number of components, there is no way to calculate the explained variance.
 
get_projmatrix(self, transposed=1)
Return the projection matrix.
 
get_recmatrix(self, transposed=1)
Return the back-projection matrix (i.e. the reconstruction matrix).
 
inverse(self, y, n=None)
Project 'y' to the input space using the first 'n' components. If 'n' is not set, use all available components.
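The projection and back-projection described for execute and inverse can be illustrated as follows. This is a minimal sketch, assuming that avg is the training mean and that v stores the principal components as columns. The function names project and reconstruct are hypothetical stand-ins and are not MDP functions.

```python
import numpy as np

def project(x, avg, v, n=None):
    """Project rows of x onto the first n columns of v (all columns if n is None)."""
    v_n = v if n is None else v[:, :n]
    return (x - avg) @ v_n

def reconstruct(y, avg, v, n=None):
    """Map rows of y back to the input space using the first n components."""
    if n is None:
        n = y.shape[1]                      # use every available component
    return y[:, :n] @ v[:, :n].T + avg
```

When all components are kept and v is orthonormal, reconstruct(project(x, avg, v), avg, v) recovers x up to rounding. With fewer components the result is an approximation of x restricted to the retained subspace.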
    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_train_args(self, x, *args, **kwargs)
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node.
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
_set_input_dim(self, n)
 
copy(self, protocol=None)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of dtype objects.
 
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Static Methods
    Inherited from Node
 
is_invertible()
Return True if the node can be inverted, False otherwise.
 
is_trainable()
Return True if the node can be trained, False otherwise.
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples:
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, conv=1e-08, max_it=100000)
(Constructor)

 

The number of principal components to be kept can be specified as 'output_dim' directly (e.g. 'output_dim=10' means 10 components are kept) or by the fraction of variance to be explained (e.g. 'output_dim=0.95' means that as many components as necessary will be kept in order to explain 95% of the input variance).

Other Arguments:
conv - convergence threshold for the residual error.
max_it - maximum number of iterations.
Overrides: object.__init__
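To make the fractional form of output_dim concrete, the following sketch shows one way a fraction of explained variance can be turned into a number of components. The helper components_for_fraction is hypothetical and its stopping rule is an assumption for illustration, not a copy of the node's code. It expects the component variances in descending order together with the total variance of the data.

```python
def components_for_fraction(variances, total_variance, fraction):
    """Return (n_components, explained_fraction) for a target fraction.

    Hypothetical helper: accumulates per-component variance until the
    requested fraction of the total variance is reached.
    """
    explained = 0.0
    for n, var in enumerate(variances, start=1):
        explained += var / total_variance
        if explained >= fraction:
            return n, explained
    # Target not reached: every available component is kept.
    return len(variances), explained
```

Because components are added one at a time, the fraction actually explained usually overshoots the request slightly, which is why a value such as self.explained_variance can differ from the requested output_dim.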

_stop_training(self, debug=False)

 
Concatenate the collected data into a single array.
Overrides: Node._stop_training

_train(self, x)

 
Collect all input data in a list.
Overrides: Node._train

stop_training(self, debug=False)

 
Concatenate the collected data into a single array.
Overrides: Node.stop_training

train(self, x)

 
Collect all input data in a list.
Overrides: Node.train
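The collect-then-concatenate pattern described for train and stop_training can be sketched as below. DataCollector is a hypothetical stand-in written for this example and does not reproduce the node's implementation. It only shows why all training data must be held in memory before the components are computed.

```python
import numpy as np

class DataCollector:
    """Hypothetical illustration of batch collection during training."""

    def __init__(self):
        self._chunks = []                   # one array per call to train
        self.data = None                    # filled in by stop_training

    def train(self, x):
        # Store each chunk of observations (rows) as it arrives.
        self._chunks.append(np.asarray(x, dtype=float))

    def stop_training(self):
        # Stack all chunks row-wise into a single 2-D array.
        self.data = np.concatenate(self._chunks, axis=0)
        self._chunks = []
```

A caller would invoke train once per chunk of observations and stop_training once at the end, after which data holds every observation in arrival order.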