Package mdp :: Package nodes :: Class PCANode

Class PCANode

Filter the input data through its most significant principal components.

Internal variables of interest

self.avg

Mean of the input data (available after training).

self.v

Transpose of the projection matrix (available after training).

self.d

Variance corresponding to the PCA components (eigenvalues of the covariance matrix).

self.explained_variance

When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

More information about Principal Component Analysis, also known as the discrete Karhunen-Loève transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
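The internal variables above can be illustrated with a small NumPy sketch. It is not PCANode's implementation: it assumes an unbiased covariance estimate and uses numpy.linalg.eigh, then sorts the eigenpairs so that avg, v and d play the roles of self.avg, self.v and self.d. The function name pca_fit is hypothetical.

```python
import numpy as np


def pca_fit(x):
    """Return (avg, v, d) for data x with observations in rows.

    avg: column means; d: covariance eigenvalues in decreasing order;
    v: matching unit eigenvectors stored as columns (so x @ v projects).
    """
    x = np.asarray(x, dtype=float)
    avg = x.mean(axis=0)
    centered = x - avg
    # Unbiased covariance estimate (divide by n - 1); an assumption here.
    cov = centered.T @ centered / (x.shape[0] - 1)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    d, v = np.linalg.eigh(cov)
    order = np.argsort(d)[::-1]  # largest variance first
    return avg, v[:, order], d[order]


rng = np.random.default_rng(0)
# Three variables with clearly different spreads.
x = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.1])
avg, v, d = pca_fit(x)
print(d)  # eigenvalues, largest variance first
```

Each column of v is an eigenvector of the covariance matrix, and the matching entry of d is the variance of the data along it.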

Instance Methods

__init__(self, input_dim=None, output_dim=None, dtype=None, svd=False, reduce=False, var_rel=1e-12, var_abs=1e-15, var_part=None)
The number of principal components to be kept can be specified as 'output_dim' directly (e.g. 'output_dim=10' means 10 components are kept) or by the fraction of variance to be explained (e.g. 'output_dim=0.95' means that as many components as necessary will be kept in order to explain 95% of the input variance).

_adjust_output_dim(self)
Return the eigenvector range and set the output dim if required.

_check_output(self, y)

_execute(self, x, n=None)
Project the input on the first 'n' principal components. If 'n' is not set, use all available components.

_inverse(self, y, n=None)
Project 'y' to the input space using the first 'n' components. If 'n' is not set, use all available components.

_set_output_dim(self, n)

_stop_training(self, debug=False)
Stop the training phase.

_train(self, x)

execute(self, x, n=None)
Project the input on the first 'n' principal components. If 'n' is not set, use all available components.

get_explained_variance(self)
Return the fraction of the original variance that can be explained by self._output_dim PCA components. If, for example, output_dim has been set to 0.95, the explained variance could be something like 0.958. Note that if output_dim was explicitly set to a fixed number of components, there is no way to calculate the explained variance.

get_projmatrix(self, transposed=1)
Return the projection matrix.

get_recmatrix(self, transposed=1)
Return the back-projection matrix (i.e. the reconstruction matrix).

inverse(self, y, n=None)
Project 'y' to the input space using the first 'n' components. If 'n' is not set, use all available components.

stop_training(self, debug=False)
Stop the training phase.

train(self, x)
Update the internal structures according to the input data x.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Inherited from Node

__add__(self, other)

__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.

__repr__(self)
repr(x)

__str__(self)
str(x)

_check_input(self, x)

_check_train_args(self, x, *args, **kwargs)

_get_supported_dtypes(self)
Return the list of dtypes supported by this node.

_get_train_seq(self)

_if_training_stop_training(self)

_pre_execution_checks(self, x)
This method contains all pre-execution checks.

_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.

_refcast(self, x)
Helper function to cast arrays to the internal dtype.

_set_dtype(self, t)

_set_input_dim(self, n)

copy(self, protocol=None)
Return a deep copy of the node.

get_current_train_phase(self)
Return the index of the current training phase.

get_dtype(self)
Return dtype.

get_input_dim(self)
Return input dimensions.

get_output_dim(self)
Return output dimensions.

get_remaining_train_phase(self)
Return the number of training phases still to accomplish.

get_supported_dtypes(self)
Return dtypes supported by the node as a list of dtype objects.

has_multiple_training_phases(self)
Return True if the node has multiple training phases.

is_training(self)
Return True if the node is in the training phase, False otherwise.

save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.

set_dtype(self, t)
Set internal structures' dtype.

set_input_dim(self, n)
Set input dimensions.

set_output_dim(self, n)
Set output dimensions.

Static Methods

Inherited from Node

is_invertible()
Return True if the node can be inverted, False otherwise.

is_trainable()
Return True if the node can be trained, False otherwise.

Properties

Inherited from object: __class__

Inherited from Node

_train_seq
List of tuples:

dtype
dtype

input_dim
Input dimensions

output_dim
Output dimensions

supported_dtypes
Supported dtypes

Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, svd=False, reduce=False, var_rel=1e-12, var_abs=1e-15, var_part=None)
(Constructor)

The number of principal components to be kept can be specified as 'output_dim' directly (e.g. 'output_dim=10' means 10 components are kept) or by the fraction of variance to be explained (e.g. 'output_dim=0.95' means that as many components as necessary will be kept in order to explain 95% of the input variance).
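When output_dim is a fraction, the number of components follows from the sorted eigenvalues of the covariance matrix. The following is a minimal sketch of that rule. It assumes components are kept until their cumulative share of the total variance reaches the requested fraction; how PCANode resolves exact ties is not stated here. n_components_for_fraction is a hypothetical helper.

```python
import numpy as np


def n_components_for_fraction(d, fraction):
    """Smallest k whose k largest eigenvalues explain >= fraction of the variance."""
    d = np.sort(np.asarray(d, dtype=float))[::-1]  # decreasing order
    cumulative = np.cumsum(d) / d.sum()
    # First index where the cumulative share reaches the target, plus one
    # because indices start at zero.
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, d.size)  # guard against rounding when fraction is ~1.0


eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]  # shares: 0.5, 0.75, 0.875, 0.9375, 1.0
print(n_components_for_fraction(eigenvalues, 0.9))  # 4
```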

Other Keyword Arguments:

svd -- if True, use Singular Value Decomposition instead of the standard eigenvalue problem solver. Use it when PCANode complains about singular covariance matrices.
reduce -- if True, keep only those principal components that have a variance larger than 'var_abs', a variance relative to the first principal component larger than 'var_rel', and a variance relative to the total variance larger than 'var_part' (set var_part to None or 0 for no filtering). Note: when the 'reduce' switch is enabled, the actual number of principal components (self.output_dim) may differ from the one set when creating the instance.
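The three 'reduce' criteria can be read as one test per component. The following is a minimal pure-Python sketch. It assumes d is sorted in decreasing order with a positive first entry. reduce_mask is a hypothetical name, and PCANode's actual code may order or combine the tests differently.

```python
def reduce_mask(d, var_abs=1e-15, var_rel=1e-12, var_part=None):
    """Return one boolean per component: True if it survives the 'reduce' filter.

    d holds the component variances in decreasing order, so d[0] belongs to
    the first principal component; d[0] is assumed to be positive.
    """
    total = sum(d)
    keep = []
    for var in d:
        ok = var > var_abs and var / d[0] > var_rel
        # var_part=None or 0 disables the filter on the share of total variance.
        if var_part:
            ok = ok and var / total > var_part
        keep.append(ok)
    return keep


variances = [4.0, 2.0, 1e-3, 1e-14, 0.0]
print(reduce_mask(variances))                 # [True, True, True, False, False]
print(reduce_mask(variances, var_part=0.01))  # [True, True, False, False, False]
```

Here 1e-14 passes the absolute threshold but fails the relative one (1e-14 / 4 is below 1e-12), which is why it is dropped even with the defaults.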

Overrides: object.__init__

_adjust_output_dim(self)

Return the eigenvector range and set the output dim if required.

This is used if the output dimension is smaller than the input dimension (so that only the eigenvectors with the largest eigenvalues have to be kept).
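A symmetric eigensolver that returns eigenvalues in ascending order places the largest ones at the end. Keeping output_dim components therefore amounts to selecting an index range at the top of the spectrum. The sketch below shows the idea with numpy.linalg.eigh. Because eigh has no range argument, the sketch computes all eigenpairs and slices them. top_k_eigh is a hypothetical name, not the method above.

```python
import numpy as np


def top_k_eigh(cov, k):
    """Return the k largest eigenvalues (decreasing) and their eigenvectors."""
    n = cov.shape[0]
    d, v = np.linalg.eigh(cov)  # ascending eigenvalues, eigenvectors as columns
    # The k largest eigenvalues sit at indices n-k .. n-1; reverse them so
    # the first column belongs to the largest variance.
    return d[n - k:][::-1], v[:, n - k:][:, ::-1]


cov = np.diag([1.0, 5.0, 3.0, 0.5])
d, v = top_k_eigh(cov, 2)
print(d)  # the two largest eigenvalues, largest first
```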

_check_output(self, y)

Overrides: Node._check_output

_execute(self, x, n=None)

Project the input on the first 'n' principal components. If 'n' is not set, use all available components.

Overrides: Node._execute

_inverse(self, y, n=None)

Project 'y' to the input space using the first 'n' components. If 'n' is not set, use all available components.

Overrides: Node._inverse

_set_output_dim(self, n)

Overrides: Node._set_output_dim

_stop_training(self, debug=False)

Stop the training phase.

Keyword arguments:

debug -- if True and stop_training fails because of singular covariance matrices, the singular matrices themselves are stored in self.cov_mtx and self.dcov_mtx so that they can be examined.
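Singular covariance matrices typically arise when input columns are linearly dependent, or when there are fewer observations than variables. The NumPy sketch below does not call PCANode. It builds such a case and detects it through the rank and the smallest eigenvalue of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=(50, 2))
# The third column repeats the first, so the columns are linearly dependent.
x = np.column_stack([a, a[:, 0]])

cov = np.cov(x, rowvar=False)          # 3 x 3, one row/column per variable
rank = np.linalg.matrix_rank(cov)      # fewer than 3 means cov is singular
smallest = np.linalg.eigvalsh(cov)[0]  # eigvalsh returns ascending values
print(rank, smallest)  # rank 2; smallest eigenvalue is zero up to rounding
```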

Overrides: Node._stop_training

_train(self, x)

Overrides: Node._train

execute(self, x, n=None)

Project the input on the first 'n' principal components. If 'n' is not set, use all available components.
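Projecting onto the first n components amounts to centering the data and multiplying it by the leading columns of v. The following is a minimal NumPy sketch. It assumes v holds unit eigenvectors as columns, sorted by decreasing variance. pca_execute is a hypothetical helper, not the method documented here.

```python
import numpy as np


def pca_execute(x, avg, v, n=None):
    """Project rows of x onto the first n principal components (all if n is None)."""
    if n is None:
        n = v.shape[1]
    return (np.asarray(x, dtype=float) - avg) @ v[:, :n]


rng = np.random.default_rng(1)
x = rng.normal(size=(200, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
avg = x.mean(axis=0)
d, v = np.linalg.eigh(np.cov(x, rowvar=False))
d, v = d[::-1], v[:, ::-1]  # reorder so the largest variance comes first

y = pca_execute(x, avg, v)        # all 4 components
y2 = pca_execute(x, avg, v, n=2)  # only the first 2
print(y.shape, y2.shape)  # (200, 4) (200, 2)
```

The variance of each output column equals the corresponding eigenvalue, so the first columns of the result carry most of the variance of the input.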

Overrides: Node.execute

get_explained_variance(self)

Return the fraction of the original variance that can be explained by self._output_dim PCA components. If, for example, output_dim has been set to 0.95, the explained variance could be something like 0.958. Note that if output_dim was explicitly set to a fixed number of components, there is no way to calculate the explained variance.
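Given all the covariance eigenvalues, the explained variance of k components is their share of the total. The following is a minimal sketch. It assumes the full spectrum d is available. As noted above, PCANode does not report this figure when output_dim is a fixed number of components. explained_variance is a hypothetical helper.

```python
import numpy as np


def explained_variance(d, k):
    """Fraction of the total variance captured by the k largest eigenvalues."""
    d = np.sort(np.asarray(d, dtype=float))[::-1]  # decreasing order
    return float(d[:k].sum() / d.sum())


print(explained_variance([4.0, 2.0, 1.0, 0.5, 0.5], 3))  # 0.875
```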

get_projmatrix(self, transposed=1)

Return the projection matrix.

get_recmatrix(self, transposed=1)

Return the back-projection matrix (i.e. the reconstruction matrix).
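Because the columns of v are orthonormal, the reconstruction matrix can be taken as the transpose of the projection matrix. The sketch below follows the description of self.v as the transposed projection matrix. It assumes that transposed=1 yields v (input_dim x output_dim) for projection and v.T for reconstruction. The helper names are hypothetical, and PCANode's exact conventions should be checked against its source.

```python
import numpy as np


def projmatrix(v, transposed=1):
    # transposed=1: shape (input_dim, k), for row-vector data as in x @ P.
    return v if transposed else v.T


def recmatrix(v, transposed=1):
    # transposed=1: shape (k, input_dim), for row-vector data as in y @ R.
    return v.T if transposed else v


rng = np.random.default_rng(3)
x = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.5])
_, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
v = vecs[:, ::-1][:, :2]  # two leading components as columns

P, R = projmatrix(v), recmatrix(v)
print(np.allclose(R @ P, np.eye(2)))  # True: projecting a reconstruction gives back y
```

In the other order, P @ R is the orthogonal projector onto the kept subspace of the input space. Reconstruction is therefore lossy whenever components are dropped.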

inverse(self, y, n=None)

Project 'y' to the input space using the first 'n' components. If 'n' is not set, use all available components.
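Mapping back to the input space means multiplying by the transposed leading eigenvectors and adding back the mean. The following is a minimal NumPy sketch. It assumes v holds orthonormal eigenvectors as columns, sorted by decreasing variance. pca_inverse is a hypothetical helper.

```python
import numpy as np


def pca_inverse(y, avg, v, n=None):
    """Map projected rows y back to input space using the first n components."""
    y = np.asarray(y, dtype=float)
    if n is None:
        n = y.shape[1]
    return y[:, :n] @ v[:, :n].T + avg


rng = np.random.default_rng(4)
x = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.5])
avg = x.mean(axis=0)
_, v = np.linalg.eigh(np.cov(x, rowvar=False))
v = v[:, ::-1]  # largest variance first
y = (x - avg) @ v  # projection on all components

x_full = pca_inverse(y, avg, v)      # exact reconstruction
x_two = pca_inverse(y, avg, v, n=2)  # drops the smallest component
print(np.allclose(x_full, x))  # True
```

With all components the round trip is exact. With fewer, the residual x - x_two is orthogonal to the components that were kept.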

Overrides: Node.inverse

stop_training(self, debug=False)

Stop the training phase.

Keyword arguments:

debug -- if True and stop_training fails because of singular covariance matrices, the singular matrices themselves are stored in self.cov_mtx and self.dcov_mtx so that they can be examined.

Overrides: Node.stop_training

train(self, x)

Update the internal structures according to the input data x.

x is a matrix with the variables in its columns and the observations in its rows.
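Since train updates internal structures rather than replacing them, data can be supplied in several chunks. One way to support this is to accumulate the row count, the column sums and the sum of outer products, then form the mean and covariance only at the end. The sketch below assumes these running sums are enough. RunningCovariance is a hypothetical class, not part of the documented API.

```python
import numpy as np


class RunningCovariance:
    """Accumulate statistics over chunks (rows = observations, columns = variables)."""

    def __init__(self, dim):
        self.n = 0
        self.sum_x = np.zeros(dim)
        self.sum_xx = np.zeros((dim, dim))

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += x.shape[0]
        self.sum_x += x.sum(axis=0)
        self.sum_xx += x.T @ x  # sum of outer products of the rows

    def finalize(self):
        """Return the mean and the unbiased covariance of all rows seen."""
        avg = self.sum_x / self.n
        # sum_i (x_i - avg)(x_i - avg)^T == sum_xx - n * avg avg^T
        cov = (self.sum_xx - self.n * np.outer(avg, avg)) / (self.n - 1)
        return avg, cov


rng = np.random.default_rng(5)
x = rng.normal(size=(300, 3))
acc = RunningCovariance(3)
for chunk in (x[:100], x[100:250], x[250:]):
    acc.update(chunk)
avg, cov = acc.finalize()
print(np.allclose(cov, np.cov(x, rowvar=False)))  # True
```

This textbook sum-of-squares formula can lose precision when the mean is large relative to the spread of the data. Treat it as an illustration rather than production code.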

By default, subclasses should override _train to implement their training phase. The docstring of the _train method overrides this docstring.

Note: a subclass supporting multiple training phases should implement the same signature for all the training phases and document the meaning of the arguments in the _train method doc-string. Having consistent signatures is a requirement to use the node in a flow.

Overrides: Node.train
