Package mdp :: Package nodes :: Class MeanShiftScikitsLearnNode

Class MeanShiftScikitsLearnNode



Mean shift clustering using a flat kernel.

This node has been automatically generated by wrapping the ``sklearn.cluster.mean_shift_.MeanShift`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

Mean shift clustering aims to discover "blobs" in a smooth density of
samples. It is a centroid-based algorithm, which works by updating
candidates for centroids to be the mean of the points within a given
region. These candidates are then filtered in a post-processing stage to
eliminate near-duplicates to form the final set of centroids.

Seeding is performed using a binning technique for scalability.

Read more in the :ref:`User Guide <mean_shift>`.
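
The snippet below is a minimal usage sketch, not part of the generated
documentation: it assumes that keyword arguments passed to the node
constructor are forwarded to the wrapped ``MeanShift`` estimator (as the
``**kwargs`` signature suggests), and it uses toy random data and
illustrative variable names.

::

    import numpy as np
    import mdp

    x = np.random.random((200, 2))   # samples as rows, features as columns

    # Keyword arguments are assumed to be forwarded to sklearn's MeanShift.
    node = mdp.nodes.MeanShiftScikitsLearnNode(bin_seeding=True)

    node.train(x)             # collect the input data (Cumulator behaviour)
    node.stop_training()      # perform the actual clustering
    labels = node.execute(x)  # closest cluster index for each sample

    # The fitted sklearn estimator is reachable through ``scikits_alg``.
    centers = node.scikits_alg.cluster_centers_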

**Parameters**

bandwidth : float, optional
    Bandwidth used in the flat kernel.

    If not given, the bandwidth is estimated using
    sklearn.cluster.estimate_bandwidth; see the documentation for that
    function for hints on scalability (see also the Notes, below).

seeds : array, shape=[n_samples, n_features], optional
    Seeds used to initialize kernels. If not set,
    the seeds are calculated by clustering.get_bin_seeds
    with bandwidth as the grid size and default values for
    other parameters.

bin_seeding : boolean, optional
    If true, initial kernel locations are not locations of all
    points, but rather the location of the discretized version of
    points, where points are binned onto a grid whose coarseness
    corresponds to the bandwidth. Setting this option to True will speed
    up the algorithm because fewer seeds will be initialized.
    default value: False
    Ignored if seeds argument is not None.

min_bin_freq : int, optional
   To speed up the algorithm, accept only those bins with at least
   min_bin_freq points as seeds. If not defined, set to 1.

cluster_all : boolean, default True
    If true, then all points are clustered, even those orphans that are
    not within any kernel. Orphans are assigned to the nearest kernel.
    If false, then orphans are given cluster label -1.

n_jobs : int
    The number of jobs to use for the computation. This works by running
    the mean shift search for each seed in parallel.

    If -1 all CPUs are used. If 1 is given, no parallel computing code is
    used at all, which is useful for debugging. For n_jobs below -1,
    (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
    are used.

**Attributes**

``cluster_centers_`` : array, [n_clusters, n_features]
    Coordinates of cluster centers.

``labels_`` :
    Labels of each point.


**Notes**


Scalability:


Because this implementation uses a flat kernel and
a Ball Tree to look up members of each kernel, the complexity tends
towards O(T*n*log(n)) in lower dimensions, with n the number of samples
and T the number of points. In higher dimensions the complexity will
tend towards O(T*n^2).

Scalability can be boosted by using fewer seeds, for example by using
a higher value of min_bin_freq in the get_bin_seeds function.

Note that the estimate_bandwidth function is much less scalable than the
mean shift algorithm and will be the bottleneck if it is used.
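
As a hedged sketch of these scalability hints (parameter values are
illustrative, and ``estimate_bandwidth`` is assumed to be importable from
``sklearn.cluster`` as documented above), the bandwidth can be estimated once
on a subsample and then passed in explicitly, together with bin seeding and a
``min_bin_freq`` threshold to reduce the number of seeds:

::

    import numpy as np
    import mdp
    from sklearn.cluster import estimate_bandwidth

    x = np.random.random((5000, 3))

    # Estimate the bandwidth on a subsample to keep this step cheap.
    bw = estimate_bandwidth(x, quantile=0.2, n_samples=500)

    node = mdp.nodes.MeanShiftScikitsLearnNode(
        bandwidth=bw, bin_seeding=True, min_bin_freq=5)
    node.train(x)
    node.stop_training()
    labels = node.execute(x)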

**References**


Dorin Comaniciu and Peter Meer, "Mean Shift: A robust approach toward
feature space analysis". IEEE Transactions on Pattern Analysis and
Machine Intelligence. 2002. pp. 603-619.

Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
Mean shift clustering using a flat kernel.
 
_execute(self, x)
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
 
_stop_training(self, **kwargs)
Concatenate the collected data in a single array.
 
execute(self, x)
Predict the closest cluster each sample in X belongs to.
 
stop_training(self, **kwargs)
Perform clustering.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from Cumulator
 
_train(self, *args)
Collect all input data in a list.
 
train(self, *args)
Collect all input data in a list.
    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_output(self, y)
 
_check_train_args(self, x, *args, **kwargs)
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_inverse(self, x)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
_set_input_dim(self, n)
 
_set_output_dim(self, n)
 
copy(self, protocol=None)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of dtype objects.
 
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
 
inverse(self, y, *args, **kwargs)
Invert y.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Static Methods
 
is_invertible()
Return True if the node can be inverted, False otherwise.
 
is_trainable()
Return True if the node can be trained, False otherwise.
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples:
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
(Constructor)

 

Mean shift clustering using a flat kernel.

This node has been automatically generated by wrapping the ``sklearn.cluster.mean_shift_.MeanShift`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

Mean shift clustering aims to discover "blobs" in a smooth density of
samples. It is a centroid-based algorithm, which works by updating
candidates for centroids to be the mean of the points within a given
region. These candidates are then filtered in a post-processing stage to
eliminate near-duplicates to form the final set of centroids.

Seeding is performed using a binning technique for scalability.

Read more in the :ref:`User Guide <mean_shift>`.

**Parameters**

bandwidth : float, optional
    Bandwidth used in the flat kernel.

    If not given, the bandwidth is estimated using
    sklearn.cluster.estimate_bandwidth; see the documentation for that
    function for hints on scalability (see also the Notes, below).

seeds : array, shape=[n_samples, n_features], optional
    Seeds used to initialize kernels. If not set,
    the seeds are calculated by clustering.get_bin_seeds
    with bandwidth as the grid size and default values for
    other parameters.

bin_seeding : boolean, optional
    If true, initial kernel locations are not locations of all
    points, but rather the location of the discretized version of
    points, where points are binned onto a grid whose coarseness
    corresponds to the bandwidth. Setting this option to True will speed
    up the algorithm because fewer seeds will be initialized.
    default value: False
    Ignored if seeds argument is not None.

min_bin_freq : int, optional
   To speed up the algorithm, accept only those bins with at least
   min_bin_freq points as seeds. If not defined, set to 1.

cluster_all : boolean, default True
    If true, then all points are clustered, even those orphans that are
    not within any kernel. Orphans are assigned to the nearest kernel.
    If false, then orphans are given cluster label -1.

n_jobs : int
    The number of jobs to use for the computation. This works by running
    the mean shift search for each seed in parallel.

    If -1 all CPUs are used. If 1 is given, no parallel computing code is
    used at all, which is useful for debugging. For n_jobs below -1,
    (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
    are used.

**Attributes**

``cluster_centers_`` : array, [n_clusters, n_features]
    Coordinates of cluster centers.

``labels_`` :
    Labels of each point.


**Notes**


Scalability:


Because this implementation uses a flat kernel and
a Ball Tree to look up members of each kernel, the complexity tends
towards O(T*n*log(n)) in lower dimensions, with n the number of samples
and T the number of points. In higher dimensions the complexity will
tend towards O(T*n^2).

Scalability can be boosted by using fewer seeds, for example by using
a higher value of min_bin_freq in the get_bin_seeds function.

Note that the estimate_bandwidth function is much less scalable than the
mean shift algorithm and will be the bottleneck if it is used.

**References**


Dorin Comaniciu and Peter Meer, "Mean Shift: A robust approach toward
feature space analysis". IEEE Transactions on Pattern Analysis and
Machine Intelligence. 2002. pp. 603-619.

Overrides: object.__init__

_execute(self, x)

 
Overrides: Node._execute

_get_supported_dtypes(self)

 
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
Overrides: Node._get_supported_dtypes

_stop_training(self, **kwargs)

 
Concatenate the collected data in a single array.
Overrides: Node._stop_training

execute(self, x)

 

Predict the closest cluster each sample in X belongs to.

This node has been automatically generated by wrapping the ``sklearn.cluster.mean_shift_.MeanShift`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

**Parameters**

X : {array-like, sparse matrix}, shape=[n_samples, n_features]
    New data to predict.

**Returns**

labels : array, shape [n_samples,]
    Index of the cluster each sample belongs to.
Overrides: Node.execute

is_invertible()
Static Method

 
Return True if the node can be inverted, False otherwise.
Overrides: Node.is_invertible
(inherited documentation)

is_trainable()
Static Method

 
Return True if the node can be trained, False otherwise.
Overrides: Node.is_trainable

stop_training(self, **kwargs)

 

Perform clustering.

This node has been automatically generated by wrapping the ``sklearn.cluster.mean_shift_.MeanShift`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

**Parameters**

X : array-like, shape=[n_samples, n_features]
    Samples to cluster.
Overrides: Node.stop_training