Package mdp :: Package nodes :: Class FeatureAgglomerationScikitsLearnNode

Class FeatureAgglomerationScikitsLearnNode



Agglomerate features.

This node has been automatically generated by wrapping the ``sklearn.cluster.hierarchical.FeatureAgglomeration`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

Similar to AgglomerativeClustering, but recursively merges features
instead of samples.

Read more in the :ref:`User Guide <hierarchical_clustering>`.

**Parameters**

n_clusters : int, default 2
    The number of clusters to find.

connectivity : array-like or callable, optional
    Connectivity matrix. Defines for each feature the neighboring
    features following a given structure of the data.
    This can be a connectivity matrix itself or a callable that transforms
    the data into a connectivity matrix, such as derived from
    kneighbors_graph. Default is None, i.e., the
    hierarchical clustering algorithm is unstructured.

affinity : string or callable, default "euclidean"
    Metric used to compute the linkage. Can be "euclidean", "l1", "l2",
    "manhattan", "cosine", or "precomputed".
    If linkage is "ward", only "euclidean" is accepted.

memory : Instance of joblib.Memory or string, optional
    Used to cache the output of the computation of the tree.
    By default, no caching is done. If a string is given, it is the
    path to the caching directory.

n_components : int (optional)
    Number of connected components. If None the number of connected
    components is estimated from the connectivity matrix.
    NOTE: This parameter is now directly determined from the connectivity
    matrix and will be removed in 0.18.

compute_full_tree : bool or 'auto', optional, default "auto"
    Whether to build the full tree. When this is False, construction of
    the tree stops early at n_clusters, which decreases computation time
    if the number of clusters is not small compared to the number of
    features. This option is useful only when specifying a connectivity
    matrix. Note also that when varying the number of clusters and using
    caching, it may be advantageous to compute the full tree.

linkage : {"ward", "complete", "average"}, optional, default "ward"
    Which linkage criterion to use. The linkage criterion determines which
    distance to use between sets of features. The algorithm will merge
    the pairs of clusters that minimize this criterion.

    - ward minimizes the variance of the clusters being merged.
    - average uses the average of the distances of each feature of
      the two sets.
    - complete or maximum linkage uses the maximum distances between
      all features of the two sets.

pooling_func : callable, default np.mean
    This combines the values of agglomerated features into a single
    value, and should accept an array of shape [M, N] and the keyword
    argument `axis=1`, and reduce it to an array of size [M].
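
As a rough illustration of the "complete" and "average" criteria listed
under ``linkage``, the sketch below computes both between two small sets
of features using plain NumPy. The helper name
``pairwise_feature_distances`` and the toy data are assumptions made for
this example; it is not how the wrapped scikit-learn class computes
linkage internally.

```python
import numpy as np


def pairwise_feature_distances(X, set_a, set_b):
    """Euclidean distances between every feature (column) in set_a and set_b.

    Hypothetical helper for illustration only.
    """
    A = X[:, set_a].T  # one row per feature in set_a
    B = X[:, set_b].T  # one row per feature in set_b
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Toy data: 2 samples, 3 features (columns).
X = np.array([[0.0, 1.0, 4.0],
              [0.0, 1.0, 4.0]])

# Distances between the feature set {0, 1} and the feature set {2}.
d = pairwise_feature_distances(X, [0, 1], [2])

complete_linkage = d.max()   # maximum distance between the two sets
average_linkage = d.mean()   # average distance between the two sets
```

Under either criterion, the pair of clusters with the smallest value is
the one merged next.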

**Attributes**

``labels_`` : array-like, (n_features,)
    Cluster labels for each feature.

``n_leaves_`` : int
    Number of leaves in the hierarchical tree.

``n_components_`` : int
    The estimated number of connected components in the graph.

``children_`` : array-like, shape (n_nodes-1, 2)
    The children of each non-leaf node. Values less than `n_features`
    correspond to leaves of the tree, which are the original features.
    A node `i` greater than or equal to `n_features` is a non-leaf
    node and has children `children_[i - n_features]`. Alternatively,
    at the i-th iteration, `children_[i][0]` and `children_[i][1]`
    are merged to form node `n_features + i`.
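
The node numbering used by ``children_`` can be easier to follow with a
small worked example. The sketch below walks a hand-written merge
history and records which original features end up under each node. The
function name ``members_of_each_node`` and the ``children`` values are
made up for illustration; a fitted estimator would supply its own
``children_`` array.

```python
def members_of_each_node(children, n_features):
    """Map each node id (leaf or merged node) to the features it contains."""
    # Leaves 0 .. n_features-1 each contain exactly one original feature.
    members = {i: [i] for i in range(n_features)}
    for i, (left, right) in enumerate(children):
        # The i-th merge creates node n_features + i from its two children.
        members[n_features + i] = sorted(members[left] + members[right])
    return members


# Hypothetical merge history for 4 features:
# merge 0: features 0 and 1 -> node 4
# merge 1: features 2 and 3 -> node 5
# merge 2: nodes 4 and 5    -> node 6 (the root)
children = [(0, 1), (2, 3), (4, 5)]
members = members_of_each_node(children, n_features=4)
```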

Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
Agglomerate features.
 
_execute(self, x)
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
 
_stop_training(self, **kwargs)
Concatenate the collected data in a single array.
 
execute(self, x)
Transform a new matrix using the built clustering
 
stop_training(self, **kwargs)
Fit the hierarchical clustering on the data

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from Cumulator
 
_train(self, *args)
Collect all input data in a list.
 
train(self, *args)
Collect all input data in a list.
    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_output(self, y)
 
_check_train_args(self, x, *args, **kwargs)
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_inverse(self, x)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
_set_input_dim(self, n)
 
_set_output_dim(self, n)
 
copy(self, protocol=None)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of dtype objects.
 
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
 
inverse(self, y, *args, **kwargs)
Invert y.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Static Methods
 
is_invertible()
Return True if the node can be inverted, False otherwise.
 
is_trainable()
Return True if the node can be trained, False otherwise.
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples:
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
(Constructor)

 

Agglomerate features.

Overrides: object.__init__

_execute(self, x)

 
Overrides: Node._execute

_get_supported_dtypes(self)

 
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
Overrides: Node._get_supported_dtypes

_stop_training(self, **kwargs)

 
Concatenate the collected data in a single array.
Overrides: Node._stop_training
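
The collect-then-concatenate behaviour described for ``train`` and
``_stop_training`` can be pictured with the following minimal sketch.
``TinyCumulator`` is a hypothetical stand-in written for this example;
it is not MDP's ``Cumulator`` implementation and omits all of the
checks a real node performs.

```python
import numpy as np


class TinyCumulator:
    """Hypothetical illustration of the collect-then-concatenate pattern."""

    def __init__(self):
        self._chunks = []  # data chunks gathered during training
        self.data = None   # single array, available after stop_training()

    def train(self, x):
        # Collect each incoming chunk of data in a list.
        self._chunks.append(np.asarray(x))

    def stop_training(self):
        # Concatenate the collected chunks into a single array (row-wise).
        self.data = np.concatenate(self._chunks, axis=0)
        self._chunks = []


acc = TinyCumulator()
acc.train(np.zeros((3, 4)))  # first chunk: 3 observations, 4 features
acc.train(np.ones((2, 4)))   # second chunk: 2 observations, 4 features
acc.stop_training()
```

A batch estimator can then be fitted once on the concatenated array,
which is why fitting happens only when training stops.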

execute(self, x)

 

Transform a new matrix using the built clustering

**Parameters**

X : array-like, shape = [n_samples, n_features] or [n_features]
    An M by N array of M observations in N dimensions or a length M
    array of M one-dimensional observations.

pooling_func : callable, default=np.mean
    This combines the values of agglomerated features into a single
    value, and should accept an array of shape [M, N] and the keyword
    argument `axis=1`, and reduce it to an array of size [M].

**Returns**

Y : array, shape = [n_samples, n_clusters] or [n_clusters]
    The pooled values for each feature cluster.
Overrides: Node.execute
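
To make the pooling step concrete, here is a minimal NumPy sketch of
how per-feature cluster labels and a ``pooling_func`` could turn an
``[n_samples, n_features]`` array into an ``[n_samples, n_clusters]``
array. The function ``pool_features`` and the ``labels`` values are
assumptions made for this example; the wrapped class may organise the
computation differently.

```python
import numpy as np


def pool_features(X, labels, pooling_func=np.mean):
    """Pool the columns of X that share a cluster label (illustrative only)."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    # One pooled column per cluster, in increasing label order.
    pooled = [pooling_func(X[:, labels == k], axis=1)
              for k in np.unique(labels)]
    return np.column_stack(pooled)


# 2 samples, 3 features; features 0 and 1 belong to cluster 0, feature 2 to cluster 1.
X = np.array([[1.0, 3.0, 10.0],
              [2.0, 4.0, 20.0]])
labels = [0, 0, 1]

Y_mean = pool_features(X, labels)                      # default: np.mean
Y_max = pool_features(X, labels, pooling_func=np.max)  # any reducer accepting axis=1
```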

is_invertible()
Static Method

 
Return True if the node can be inverted, False otherwise.
Overrides: Node.is_invertible
(inherited documentation)

is_trainable()
Static Method

 
Return True if the node can be trained, False otherwise.
Overrides: Node.is_trainable

stop_training(self, **kwargs)

 

Fit the hierarchical clustering on the data

**Parameters**

X : array-like, shape = [n_samples, n_features]
    The data.

**Returns**

self

Overrides: Node.stop_training
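
Because this node is documented as a wrapper around scikit-learn's
``FeatureAgglomeration``, the fit-then-transform flow can be sketched
with that class directly. The example assumes a scikit-learn version
that exposes ``sklearn.cluster.FeatureAgglomeration``; the synthetic
data and variable names are made up for illustration. In MDP the
corresponding steps would presumably be ``train``, ``stop_training``
and ``execute``, but that mapping is not exercised here.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

# Synthetic data: features 0 and 1 are near-duplicates, as are features 2 and 3.
rng = np.random.RandomState(0)
a = rng.randn(50)
b = rng.randn(50)
X = np.column_stack([
    a,
    a + 0.01 * rng.randn(50),
    b,
    b + 0.01 * rng.randn(50),
])

# Fit the hierarchical clustering on the features (columns) of X.
agglo = FeatureAgglomeration(n_clusters=2)
agglo.fit(X)

# Pool each cluster of features into a single column (np.mean by default).
X_reduced = agglo.transform(X)
```

After fitting, ``agglo.labels_`` holds one cluster label per input
feature, and ``X_reduced`` has one column per cluster.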