mdp.nodes.MiniBatchKMeansScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.cluster.k_means_.MiniBatchKMeans class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
(Constructor)

Mini-Batch K-Means clustering

Parameters

n_clusters : int, optional, default: 8

The number of clusters to form as well as the number of centroids to generate.

max_iter : int, optional

Maximum number of iterations over the complete dataset before stopping, independently of any early stopping heuristic.

max_no_improvement : int, default: 10

Control early stopping based on the number of consecutive mini batches that do not yield an improvement on the smoothed inertia.

To disable convergence detection based on inertia, set max_no_improvement to None.
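The inertia-based stopping rule can be pictured with a small patience counter. The following is a simplified sketch, not the wrapped library's implementation: the function name `batches_until_stop` and the smoothing factor `alpha` are assumptions introduced only for illustration.

```python
def batches_until_stop(batch_inertias, max_no_improvement=10, alpha=0.3):
    """Return the index of the mini batch at which training would stop,
    or None if the patience counter never runs out.

    alpha is a hypothetical smoothing factor for an exponentially
    weighted average of the per-batch inertia.
    """
    if max_no_improvement is None:
        return None  # convergence detection on inertia is disabled

    smoothed = None
    best = float("inf")
    no_improvement = 0
    for i, inertia in enumerate(batch_inertias):
        # Exponentially weighted average of the batch inertia.
        if smoothed is None:
            smoothed = inertia
        else:
            smoothed = (1 - alpha) * smoothed + alpha * inertia

        if smoothed < best:
            best = smoothed
            no_improvement = 0  # improvement seen: reset the counter
        else:
            no_improvement += 1

        if no_improvement >= max_no_improvement:
            return i
    return None
```

With `max_no_improvement=None` the function never reports a stop, which mirrors how the parameter disables this criterion.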

tol : float, default: 0.0

Control early stopping based on the relative center changes, as measured by a smoothed, variance-normalized estimate of the mean squared center position changes. This early stopping heuristic is closer to the one used for the batch variant of the algorithm but induces a slight computational and memory overhead over the inertia heuristic.

To disable convergence detection based on normalized center change, set tol to 0.0 (default).

batch_size : int, optional, default: 100

Size of the mini batches.

init_size : int, optional, default: 3 * batch_size

Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.

init : {'k-means++', 'random' or an ndarray}, default: 'k-means++'

Method for initialization, defaults to 'k-means++':

'k-means++' : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

'random': choose k observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
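To make the difference between the two string options concrete, here is a minimal pure-Python sketch. It assumes the textbook form of k-means++ (each new center is sampled with probability proportional to its squared distance from the closest center chosen so far); the wrapped library's version may differ in details. The helper names `sq_dist`, `kmeans_pp_init` and `random_init` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, n_clusters, rng):
    """Pick n_clusters initial centers using squared-distance sampling.

    Assumes points contains at least n_clusters distinct rows.
    """
    centers = [rng.choice(points)]
    while len(centers) < n_clusters:
        # Squared distance from each point to its closest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # Points far from all current centers are more likely to be picked;
        # already chosen points have weight 0 and cannot be picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def random_init(points, n_clusters, rng):
    """Pick n_clusters observations uniformly at random, without replacement."""
    return rng.sample(points, n_clusters)
```

For example, `kmeans_pp_init(points, 2, random.Random(0))` returns two distinct rows of `points`, biased toward rows that lie far apart.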

n_init : int, default=3

Number of random initializations that are tried. In contrast to KMeans, the algorithm is only run once, using the best of the n_init initializations as measured by inertia.
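The "best of n_init" selection can be sketched as follows. This is an illustration under simplifying assumptions: candidates are drawn with plain random initialization and scored on the full data, whereas the wrapped library may score them on a subset. The names `inertia` and `best_of_n_init` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(points, centers):
    """Sum of squared distances from each point to its closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def best_of_n_init(points, n_clusters, n_init, rng):
    """Draw n_init candidate initializations and keep the one with the
    lowest inertia; the optimization itself would then run only once."""
    candidates = [rng.sample(points, n_clusters) for _ in range(n_init)]
    return min(candidates, key=lambda centers: inertia(points, centers))
```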

compute_labels : boolean, default=True

Compute label assignment and inertia for the complete dataset once the minibatch optimization has converged in fit.

random_state : integer or numpy.random.RandomState, optional

The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

reassignment_ratio : float, default: 0.01

Control the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low-count centers are more easily reassigned, which means that the model will take longer to converge but should converge to a better clustering.
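One way to read this parameter is as a threshold on per-center sample counts. The sketch below assumes exactly that rule (a center is a reassignment candidate when its count falls below reassignment_ratio times the largest count); the wrapped library may apply further conditions. The function name `centers_to_reassign` is hypothetical.

```python
def centers_to_reassign(counts, reassignment_ratio=0.01):
    """Return the indices of centers whose sample count is below
    reassignment_ratio * max(counts) (assumed rule, for illustration)."""
    threshold = reassignment_ratio * max(counts)
    return [i for i, c in enumerate(counts) if c < threshold]


# Per-center counts of samples assigned so far (made-up numbers).
counts = [1000, 5, 500, 20]
low = centers_to_reassign(counts, reassignment_ratio=0.01)   # threshold 10
high = centers_to_reassign(counts, reassignment_ratio=0.05)  # threshold 50
```

Raising the ratio from 0.01 to 0.05 widens the set of candidates from one center to two in this example, which is the "more easily reassigned" effect described above.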

verbose : boolean, optional

Verbosity mode.

Attributes

cluster_centers_ : array, [n_clusters, n_features]: Coordinates of cluster centers

labels_ : Labels of each point (if compute_labels is set to True).

inertia_ : float: The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their nearest cluster center.
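Reading the inertia as the sum of squared distances from each sample to its closest cluster center, labels and inertia can be computed together as in this minimal pure-Python sketch. The function names are hypothetical and the data is made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_labels_and_inertia(points, centers):
    """Return (labels, inertia) for the given points and centers."""
    labels = []
    inertia = 0.0
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        labels.append(k)      # index of the closest center
        inertia += dists[k]   # squared distance to that center
    return labels, inertia


points = [(0, 0), (0, 2), (10, 0)]
centers = [(0, 1), (10, 0)]
labels, inertia = assign_labels_and_inertia(points, centers)
# labels == [0, 0, 1]; inertia == 1 + 1 + 0 == 2.0
```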

Notes

See http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
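The core update described in the referenced paper gives each center its own learning rate, equal to the inverse of the number of samples assigned to it so far. The sketch below illustrates one mini-batch step in pure Python; it is not the wrapped library's code, and the names `sq_dist` and `minibatch_update` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minibatch_update(centers, counts, batch):
    """Apply one mini-batch step in place.

    centers: list of lists of floats, counts: per-center sample counts.
    """
    # First cache the closest center for every sample in the batch.
    nearest = [
        min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
        for x in batch
    ]
    # Then move each center toward its samples with rate 1 / count.
    for x, k in zip(batch, nearest):
        counts[k] += 1
        eta = 1.0 / counts[k]
        centers[k] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[k], x)]


centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [0, 0]
minibatch_update(centers, counts, [(1, 1), (3, 3), (9, 9)])
# centers == [[2.0, 2.0], [9.0, 9.0]]; counts == [2, 1]
```

Because the rate shrinks as a center accumulates samples, each center ends up at the running mean of the samples assigned to it.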

Overrides: object.__init__

execute(self, x)

Transform X to a cluster-distance space.

In the new space, each dimension is the distance to one of the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]: New data to transform.

Returns

X_new : array, shape [n_samples, k]: X transformed in the new space.

Overrides: Node.execute
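The mapping performed by execute can be illustrated without the library. This sketch assumes Euclidean distance and uses a hypothetical helper name, `to_cluster_distance_space`; it only shows the shape of the result, with one output column per cluster center.

```python
import math


def to_cluster_distance_space(X, centers):
    """Map each sample to its vector of distances to every center.

    The result has shape [n_samples, k], where k = len(centers).
    """
    return [[math.dist(x, c) for c in centers] for x in X]


X = [(0, 0), (3, 4)]
centers = [(0, 0), (3, 4), (0, 4)]
X_new = to_cluster_distance_space(X, centers)
# X_new has 2 rows (samples) and 3 columns (one per center).
```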

Class MiniBatchKMeansScikitsLearnNode

__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
(Constructor)

_execute(self, x)

_get_supported_dtypes(self)

_stop_training(self, **kwargs)

execute(self, x)

is_invertible()
Static Method

is_trainable()
Static Method

stop_training(self, **kwargs)
