Node List

Full API documentation: nodes

class mdp.nodes.PCANode

Filter the input data through the most significant of its principal components.

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.v
Transpose of the projection matrix (available after training).
self.d
Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
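The relation between a requested variance fraction and self.explained_variance can be illustrated with a small pure-Python sketch. It assumes that components are kept in order of decreasing variance until the requested fraction is reached; `select_components` is a hypothetical helper, not part of MDP, and the exact selection rule used by PCANode may differ.

```python
def select_components(variances, desired_fraction):
    """Return (number of components kept, fraction of variance explained).

    Hypothetical helper for illustration only; not part of the MDP API.
    """
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= desired_fraction:
            break
    return count, cumulative / total

# Requesting 80% of the variance keeps three components here, which
# together explain more than the requested fraction.
n_kept, explained = select_components([4.0, 3.0, 2.0, 1.0], 0.8)
```

The explained fraction is generally larger than the requested one, because whole components are kept.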

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Full API documentation: PCANode

class mdp.nodes.WhiteningNode

Whiten the input data by filtering it through the most significant of its principal components. All output signals have zero mean and unit variance, and are decorrelated.

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.v
Transpose of the projection matrix (available after training).
self.d
Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

Full API documentation: WhiteningNode

class mdp.nodes.NIPALSNode

Perform Principal Component Analysis using the NIPALS algorithm. This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to a certain fraction of the total variance, say 90%, may help.

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.d
Variance corresponding to the PCA components.
self.v
Transpose of the projection matrix (available after training).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. in David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp 411-444 (1966).

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).

Full API documentation: NIPALSNode

class mdp.nodes.FastICANode

Perform Independent Component Analysis using the FastICA algorithm. Note that FastICA is a batch algorithm. This means that it needs all input data before it can start computing the ICs. The algorithm is provided here as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).

Reference: Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis IEEE Transactions on Neural Networks, 10(3):626-634.

Internal variables of interest

self.white
The whitening node used for preprocessing.
self.filters
The ICA filters matrix (this is the transpose of the projection matrix after whitening).
self.convergence
The value of the convergence threshold.

History:

  • 1.4.1998 created for Matlab by Jarmo Hurri, Hugo Gavert, Jaakko Sarela, and Aapo Hyvarinen
  • 7.3.2003 modified for Python by Thomas Wendler
  • 3.6.2004 rewritten and adapted for scipy and MDP by MDP’s authors
  • 25.5.2005 now independent from scipy. Requires Numeric or numarray
  • 26.6.2006 converted to numpy
  • 14.9.2007 updated to Matlab version 2.5

Full API documentation: FastICANode

class mdp.nodes.CuBICANode

Perform Independent Component Analysis using the CuBICA algorithm. Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start computing the ICs. The algorithm is provided here as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).

Reference: Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.

Internal variables of interest

self.white
The whitening node used for preprocessing.
self.filters
The ICA filters matrix (this is the transpose of the projection matrix after whitening).
self.convergence
The value of the convergence threshold.

Full API documentation: CuBICANode

class mdp.nodes.TDSEPNode

Perform Independent Component Analysis using the TDSEP algorithm. Note that TDSEP, as implemented in this Node, is an online algorithm, i.e. it is suited to training on huge data sets, provided that training is done by sending small chunks of data each time.

Reference: Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP an efficient algorithm for blind separation using time structure. in Niklasson, L, Boden, M, and Ziemke, T (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).

Internal variables of interest

self.white
The whitening node used for preprocessing.
self.filters
The ICA filters matrix (this is the transpose of the projection matrix after whitening).
self.convergence
The value of the convergence threshold.

Full API documentation: TDSEPNode

class mdp.nodes.JADENode

Perform Independent Component Analysis using the JADE algorithm. Note that JADE is a batch algorithm. This means that it needs all input data before it can start computing the ICs. The algorithm is provided here as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

JADE does not support the telescope mode.

Main references:

  • Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.
  • Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.

Original code contributed by: Gabriel Beckers (2008).

History:

  • May 2005 version 1.8 for MATLAB released by Jean-Francois Cardoso
  • Dec 2007 MATLAB version 1.8 ported to Python/NumPy by Gabriel Beckers
  • Feb 15 2008 Python/NumPy version adapted for MDP by Gabriel Beckers

Full API documentation: JADENode

class mdp.nodes.SFANode

Extract the slowly varying components from the input data. More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).

Instance variables of interest

self.avg
Mean of the input data (available after training)
self.sf
Matrix of the SFA filters (available after training)
self.d
Delta values corresponding to the SFA components (generalized eigenvalues). [See the docs of the get_eta_values method for more information]

Special arguments for constructor

include_last_sample

If False, the train method discards the last sample in every chunk during training when calculating the covariance matrix. In this case the last sample is only used for calculating the covariance matrix of the derivatives. The switch should be set to False if you plan to train with several small chunks. For example, we can split a sequence (index is time):

x_1 x_2 x_3 x_4

in smaller parts like this:

x_1 x_2
x_2 x_3
x_3 x_4

The SFANode will see 3 derivatives for the temporal covariance matrix, and the first 3 points for the spatial covariance matrix. Of course you will need to use a generator that connects the small chunks (the last sample needs to be sent again in the next chunk). If include_last_sample was True, depending on the generator you use, you would either get:

x_1 x_2
x_2 x_3
x_3 x_4

in which case the last sample of every chunk would be used twice when calculating the covariance matrix, or:

x_1 x_2
x_3 x_4

in which case you lose the derivative between x_3 and x_2.

If you plan to train with a single big chunk, leave include_last_sample at its default value, i.e. True.

You can even change this behaviour during training. Just set the corresponding switch in the train method.
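The chunking scheme described above for include_last_sample=False can be sketched with a small generator. This is an illustration only: `overlapping_chunks` is a hypothetical helper, not part of MDP, and the bookkeeping at the end merely mimics which samples would enter each covariance matrix according to the description above.

```python
def overlapping_chunks(sequence, chunk_len):
    """Yield chunks of up to `chunk_len` samples in which each chunk starts
    with the last sample of the previous one (hypothetical helper)."""
    if chunk_len < 2:
        raise ValueError("chunk_len must be at least 2")
    step = chunk_len - 1
    for start in range(0, len(sequence) - 1, step):
        yield sequence[start:start + chunk_len]

chunks = list(overlapping_chunks(["x_1", "x_2", "x_3", "x_4"], 2))

# With include_last_sample=False every chunk contributes all but its last
# sample to the covariance matrix, and every consecutive pair of samples
# to the covariance matrix of the derivatives.
cov_samples = [s for chunk in chunks for s in chunk[:-1]]
derivative_pairs = [pair for chunk in chunks for pair in zip(chunk, chunk[1:])]
```

Here the covariance matrix sees the first three points and the derivative covariance matrix sees all three derivatives, with no sample counted twice.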

Full API documentation: SFANode

class mdp.nodes.SFA2Node

Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components. The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.

More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).

Full API documentation: SFA2Node

class mdp.nodes.ISFANode

Perform Independent Slow Feature Analysis on the input data.

Internal variables of interest

self.RP
The global rotation-permutation matrix. This is the filter applied on input_data to get output_data
self.RPC
The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the ‘outer space’ is retained)
self.covs

A mdp.utils.MultipleCovarianceMatrices instance containing the current time-delayed covariance matrices of the input_data. After convergence the uppermost output_dim x output_dim submatrices should be almost diagonal.

self.covs[n-1] is the covariance matrix relative to the n-th time-lag

Note: they are not cleared after convergence. If you need to free some memory, you can safely delete them with:

>>> del self.covs
self.initial_contrast
A dictionary with the starting contrast and the SFA and ICA parts of it.
self.final_contrast
Like the above but after convergence.

Note: If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.

References: Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation 19(4):994-1021. http://itb.biologie.hu-berlin.de/~wiskott/Publications/BlasZitoWisk2007-ISFA-NeurComp.pdf

Full API documentation: ISFANode

class mdp.nodes.XSFANode

Perform Non-linear Blind Source Separation using Slow Feature Analysis.

This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).

The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:

>>> import mdp
>>> flow = mdp.Flow([mdp.nodes.XSFANode()])
>>> flow.train(x)

Doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info).

If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.

If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:

>>> import mdp, bimdp
>>> flow = mdp.Flow([mdp.nodes.XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)

This will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.

References: Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskott-Cogprints-2010.pdf

Full API documentation: XSFANode

class mdp.nodes.FDANode

Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.

FDANode has two training phases and is supervised so make sure to pay attention to the following points when you train it:

  • call the train method with two arguments: the input data and the labels (see the doc string of the train method for details).
  • if you are training the node by hand, call the train method twice.
  • if you are training the node using a flow (recommended), the only argument to Flow.train must be a list of (data_point, label) tuples or an iterator returning lists of such tuples, not a generator. The Flow.train function can be called just once as usual, since it takes care of rewinding the iterator to perform the second training step.

More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.

Internal variables of interest

self.avg
Mean of the input data (available after training)
self.v
Transpose of the projection matrix, so that output = dot(input-self.avg, self.v) (available after training).

Full API documentation: FDANode

class mdp.nodes.FANode

Perform Factor Analysis.

The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.

The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.

Internal variables of interest

self.mu
Mean of the input data (available after training)
self.A
Generating weights (available after training)
self.E_y_mtx
Weights for Maximum A Posteriori inference
self.sigma
Vector of estimated variance of the noise for all input components

More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.

Full API documentation: FANode

class mdp.nodes.RBMNode

Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite, split into observed (visible) and hidden (latent) variables.

By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.

Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.

The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800

Internal variables of interest

self.w
Generative weights between hidden and observed variables
self.bv
bias vector of the observed variables
self.bh
bias vector of the hidden variables

For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668

Full API documentation: RBMNode

class mdp.nodes.RBMWithLabelsNode

Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels.

By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.

Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.

The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800

Internal variables of interest:

self.w
Generative weights between hidden and observed variables
self.bv
bias vector of the observed variables
self.bh
bias vector of the hidden variables

For more information on RBMs with labels, see

  • Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
  • Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

Full API documentation: RBMWithLabelsNode

class mdp.nodes.GrowingNeuralGasNode

Learn the topological structure of the input data by building a corresponding graph approximation.

The algorithm expands on the original Neural Gas algorithm (see mdp.nodes.NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.

More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.

Attributes and methods of interest

  • graph – The corresponding mdp.graph.Graph object

Full API documentation: GrowingNeuralGasNode

class mdp.nodes.LLENode

Perform a Locally Linear Embedding analysis on the data.

Internal variables of interest

self.training_projection
The LLE projection of the training data (defined when training finishes).
self.desired_variance
variance limit used to compute intrinsic dimensionality.

Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. deRidder and R.P.W. Duin.

References: Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.

Original code contributed by: Jake VanderPlas, University of Washington

Full API documentation: LLENode

class mdp.nodes.HLLENode

Perform a Hessian Locally Linear Embedding analysis on the data.

Internal variables of interest

self.training_projection
the HLLE projection of the training data (defined when training finishes)
self.desired_variance
variance limit used to compute intrinsic dimensionality.

Implementation based on algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.

Original code contributed by: Jake Vanderplas, University of Washington

Full API documentation: HLLENode

class mdp.nodes.LinearRegressionNode

Compute a least-squares, multivariate linear regression on the input data, i.e., learn coefficients b_j so that:

y_i = b_0 + b_1 x_1 + ... + b_N x_N ,

for i = 1 ... M, minimizes the squared error given the training x's and y's.

This is a supervised learning node, and requires input data x and target data y to be supplied during training (see train docstring).
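For intuition, the least-squares criterion above has a closed-form solution in the simplest case of a single input variable (N = 1). The sketch below is pure Python and does not use MDP; `fit_line` is a hypothetical name, and the multivariate case handled by LinearRegressionNode requires solving a linear system instead.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b_0 + b_1 * x for scalar x (the N = 1 case).

    Hypothetical helper for illustration only; not part of the MDP API.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Noise-free data generated from y = 1 + 2 * x.
b0, b1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```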

Internal variables of interest

self.beta
The coefficients of the linear regression

Full API documentation: LinearRegressionNode

class mdp.nodes.QuadraticExpansionNode

Perform expansion in the space formed by all linear and quadratic monomials. QuadraticExpansionNode() is equivalent to a PolynomialExpansionNode(2)
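As an illustration of what "all linear and quadratic monomials" means for a single sample, here is a minimal pure-Python sketch. The function name and the ordering of the output monomials are assumptions made for this example and are not necessarily those used by MDP.

```python
from itertools import combinations_with_replacement

def quadratic_expansion(x):
    """Return all linear and quadratic monomials of one sample `x`.

    Hypothetical helper; the monomial ordering is an assumption.
    """
    linear = list(x)
    quadratic = [a * b for a, b in combinations_with_replacement(x, 2)]
    return linear + quadratic

# For x = (x1, x2) the monomials are x1, x2, x1*x1, x1*x2, x2*x2.
expanded = quadratic_expansion([2.0, 3.0])
```

For an input of dimension n the expanded dimension is n + n(n+1)/2.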

Full API documentation: QuadraticExpansionNode

class mdp.nodes.PolynomialExpansionNode

Perform expansion in a polynomial space.

Full API documentation: PolynomialExpansionNode

class mdp.nodes.RBFExpansionNode

Expand input space with Gaussian Radial Basis Functions (RBFs).

The input data is filtered through a set of unnormalized Gaussian filters, i.e.:

y_j = exp(-0.5/s_j * ||x - c_j||^2)

for isotropic RBFs, or more in general:

y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))

for anisotropic RBFs.
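The isotropic formula above can be evaluated directly. The following pure-Python sketch computes the outputs for one input point; `isotropic_rbf` and its argument names are hypothetical and only mirror the symbols x, c_j and s_j of the formula.

```python
import math

def isotropic_rbf(x, centers, sizes):
    """Evaluate y_j = exp(-0.5 / s_j * ||x - c_j||^2) for every center c_j.

    Hypothetical helper for illustration only; not part of the MDP API.
    """
    outputs = []
    for c, s in zip(centers, sizes):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        outputs.append(math.exp(-0.5 / s * sq_dist))
    return outputs

# The first center coincides with x, so its output is exactly 1.
y = isotropic_rbf([0.0, 0.0], centers=[[0.0, 0.0], [1.0, 1.0]], sizes=[1.0, 2.0])
```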

Full API documentation: RBFExpansionNode

class mdp.nodes.GeneralExpansionNode

Expands the input signal x according to a list [f_0, ... f_k] of functions.

Each function f_i should take the whole two-dimensional array x as input and output another two-dimensional array. Moreover the output dimension should depend only on the input dimension. The output of the node is [f_0[x], ... f_k[x]], that is, the concatenation of each one of the outputs f_i[x].
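The concatenation described above can be sketched in pure Python, with a two-dimensional array represented as a list of rows. The helper `general_expansion` and the two example functions are hypothetical and only illustrate the idea; they are not the MDP implementation.

```python
def general_expansion(x, funcs):
    """Apply every function to the whole input and concatenate the
    outputs row by row (hypothetical helper, not the MDP API)."""
    outputs = [f(x) for f in funcs]
    expanded = []
    for i in range(len(x)):
        row = []
        for out in outputs:
            row.extend(out[i])
        expanded.append(row)
    return expanded

def identity(x):
    return [list(row) for row in x]

def squares(x):
    return [[v * v for v in row] for row in x]

x = [[1.0, 2.0], [3.0, 4.0]]
y = general_expansion(x, [identity, squares])
```

Each output row holds the original values followed by their squares, so the output dimension (4) depends only on the input dimension (2).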

Original code contributed by Alberto Escalante.

Full API documentation: GeneralExpansionNode

class mdp.nodes.GrowingNeuralGasExpansionNode

Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.

positions of RBFs
position of the nodes of the neural gas
sizes of the RBFs
mean distance to the neighbouring nodes.

Important: Adjust the maximum number of nodes to control the dimension of the expansion.

More information on this expansion type can be found in: B. Fritzke. Growing cell structures-a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).

Full API documentation: GrowingNeuralGasExpansionNode

class mdp.nodes.NeuralGasNode

Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).

The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland., 1991.

Attributes and methods of interest

  • graph – The corresponding mdp.graph.Graph object
  • max_epochs - maximum number of epochs until which to train.

Full API documentation: NeuralGasNode

class mdp.nodes.SignumClassifier

This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if the sum is negative.

Full API documentation: SignumClassifier

class mdp.nodes.PerceptronClassifier

A simple perceptron with input_dim input nodes.

Full API documentation: PerceptronClassifier

class mdp.nodes.SimpleMarkovClassifier

A simple version of a Markov classifier. It can be trained on a vector of tuples, the label being the next element in the testing data.

Full API documentation: SimpleMarkovClassifier

class mdp.nodes.DiscreteHopfieldClassifier

Node for simulating a simple discrete Hopfield model

Full API documentation: DiscreteHopfieldClassifier

class mdp.nodes.KMeansClassifier

Employs K-Means Clustering for a given number of centroids.

Full API documentation: KMeansClassifier

class mdp.nodes.NormalizeNode

Make the input signal mean-free with unit variance.
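A minimal pure-Python sketch of this normalization for a one-dimensional signal is shown below. The function name is hypothetical, and the use of the population variance (division by n) is an assumption, since the text does not state which estimator the node uses.

```python
import math

def normalize(signal):
    """Subtract the mean and divide by the standard deviation.

    Hypothetical helper for illustration only; not part of the MDP API.
    """
    n = len(signal)
    mean = sum(signal) / n
    # Population variance (divide by n): an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    return [(v - mean) / std for v in signal]

z = normalize([1.0, 2.0, 3.0, 4.0])
```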

Full API documentation: NormalizeNode

class mdp.nodes.GaussianClassifier

Perform a supervised Gaussian classification.

Given a set of labelled data, the node fits a Gaussian distribution to each class.

Full API documentation: GaussianClassifier

class mdp.nodes.NearestMeanClassifier

Nearest-Mean classifier.

Full API documentation: NearestMeanClassifier

class mdp.nodes.KNNClassifier

K-Nearest-Neighbour Classifier.

Full API documentation: KNNClassifier

class mdp.nodes.EtaComputerNode

Compute the eta values of the normalized training data.

The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e. delta(x) = mean(dx/dt(t)^2). delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.

The eta value is a more intuitive measure of temporal variation, defined as:

eta(x) = T/(2*pi) * sqrt(delta(x))

If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.

EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
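The definitions above can be checked numerically. The sketch below is pure Python, approximates the derivative by finite differences between consecutive samples, and normalizes the signal to unit variance first; `eta_value` is a hypothetical helper and not the node's implementation.

```python
import math

def eta_value(signal):
    """eta(x) = T/(2*pi) * sqrt(delta(x)) for a unit-variance copy of x.

    Hypothetical helper for illustration only; not part of the MDP API.
    """
    T = len(signal)
    mean = sum(signal) / T
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / T)
    x = [(v - mean) / std for v in signal]
    # delta(x): mean squared derivative, using finite differences.
    diffs = [b - a for a, b in zip(x, x[1:])]
    delta = sum(d * d for d in diffs) / len(diffs)
    return T / (2 * math.pi) * math.sqrt(delta)

# A sine with exactly N oscillations over T samples.
T, N = 1000, 5
sine = [math.sin(2 * math.pi * N * t / T) for t in range(T)]
eta = eta_value(sine)  # close to N, up to discretization error
```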

Reference: Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.

Important: if a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.

This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.

Full API documentation: EtaComputerNode

class mdp.nodes.HitParadeNode

Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.

This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.

Full API documentation: HitParadeNode

class mdp.nodes.NoiseNode

Inject multiplicative or additive noise into the input data.

Original code contributed by Mathias Franzius.

Full API documentation: NoiseNode

class mdp.nodes.NormalNoiseNode

Special version of NoiseNode for Gaussian additive noise.

Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.

Full API documentation: NormalNoiseNode

class mdp.nodes.TimeFramesNode

Copy delayed versions of the input signal onto the space dimensions.

For example, for time_frames=3 and gap=2:

[ X(1) Y(1)        [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
  X(2) Y(2)          X(2) Y(2) X(4) Y(4) X(6) Y(6)
  X(3) Y(3)   -->    X(3) Y(3) X(5) Y(5) X(7) Y(7)
  X(4) Y(4)          X(4) Y(4) X(6) Y(6) X(8) Y(8)
  X(5) Y(5)          ...  ...  ...  ...  ...  ... ]
  X(6) Y(6)
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]

It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
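The embedding shown in the diagram can be reproduced with a few lines of pure Python, representing the signal as a list of rows. `embed_future` is a hypothetical helper written for this illustration, not the node's implementation.

```python
def embed_future(x, time_frames, gap):
    """Row t of the output concatenates x[t], x[t + gap], ...,
    x[t + (time_frames - 1) * gap] (hypothetical helper)."""
    span = (time_frames - 1) * gap
    out = []
    for t in range(len(x) - span):
        row = []
        for k in range(time_frames):
            row.extend(x[t + k * gap])
        out.append(row)
    return out

# Eight time steps of a two-dimensional signal, labelled as in the diagram.
x = [["X(%d)" % i, "Y(%d)" % i] for i in range(1, 9)]
y = embed_future(x, time_frames=3, gap=2)
```

The output has (time_frames - 1) * gap fewer rows than the input, because the last rows have no complete set of future samples.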

Full API documentation: TimeFramesNode

class mdp.nodes.TimeDelayNode

Copy delayed versions of the input signal onto the space dimensions.

For example, for time_frames=3 and gap=2:

[ X(1) Y(1)        [ X(1) Y(1)   0    0    0    0
  X(2) Y(2)          X(2) Y(2)   0    0    0    0
  X(3) Y(3)   -->    X(3) Y(3) X(1) Y(1)   0    0
  X(4) Y(4)          X(4) Y(4) X(2) Y(2)   0    0
  X(5) Y(5)          X(5) Y(5) X(3) Y(3) X(1) Y(1)
  X(6) Y(6)          ...  ...  ...  ...  ...  ... ]
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]

This node provides functionality similar to that of the TimeFramesNode, except that it performs a time embedding into the past rather than into the future.
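The embedding into the past shown in the diagram can be sketched in pure Python as follows. `embed_past` is a hypothetical helper for illustration; it pads with zeros wherever the delayed sample lies before the start of the signal, as in the diagram.

```python
def embed_past(x, time_frames, gap, fill=0):
    """Row t of the output concatenates x[t], x[t - gap], ...,
    x[t - (time_frames - 1) * gap], padding with `fill` before the
    start of the signal (hypothetical helper)."""
    dim = len(x[0])
    out = []
    for t in range(len(x)):
        row = []
        for k in range(time_frames):
            src = t - k * gap
            row.extend(x[src] if src >= 0 else [fill] * dim)
        out.append(row)
    return out

x = [["X(%d)" % i, "Y(%d)" % i] for i in range(1, 9)]
y = embed_past(x, time_frames=3, gap=2)
```

Unlike the embedding into the future, the output here has as many rows as the input.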

See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.

Original code contributed by Sebastian Hoefer. Dec 31, 2010

Full API documentation: TimeDelayNode

class mdp.nodes.TimeDelaySlidingWindowNode

TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.

Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.

Original code contributed by Sebastian Hoefer. Dec 31, 2010

Full API documentation: TimeDelaySlidingWindowNode

class mdp.nodes.CutoffNode

Node to cut off values at specified bounds.

Works similarly to numpy.clip, but also works when only a lower or upper bound is specified.
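The behaviour with optional bounds can be sketched in pure Python. The function and argument names below are hypothetical and chosen for this illustration; see the full API documentation for the actual constructor arguments.

```python
def cutoff(values, lower_bound=None, upper_bound=None):
    """Clip `values` to the given bounds; a bound of None is not applied.

    Hypothetical helper for illustration only; not part of the MDP API.
    """
    result = []
    for v in values:
        if lower_bound is not None and v < lower_bound:
            v = lower_bound
        if upper_bound is not None and v > upper_bound:
            v = upper_bound
        result.append(v)
    return result

only_lower = cutoff([-2, 0, 3], lower_bound=-1)
only_upper = cutoff([-2, 0, 3], upper_bound=1)
both = cutoff([-2, 0, 3], lower_bound=-1, upper_bound=1)
```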

Full API documentation: CutoffNode

class mdp.nodes.AdaptiveCutoffNode

Node which uses the data history during training to learn cutoff values.

As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.

When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
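To illustrate how a cutoff fraction translates into a bound for one coordinate, here is a rough pure-Python sketch. It sorts the stored training values and picks the value below the requested upper fraction; the helper name and the exact rounding are assumptions, and the node's own computation on the histogram data may differ in detail.

```python
def upper_cutoff_bound(column, upper_cutoff_fraction):
    """Return a bound such that roughly the given fraction of `column`
    lies strictly above it (hypothetical helper)."""
    ordered = sorted(column)
    n_clipped = int(round(upper_cutoff_fraction * len(ordered)))
    return ordered[len(ordered) - n_clipped - 1]

data = list(range(1, 101))  # 100 training values for one coordinate
bound = upper_cutoff_bound(data, 0.05)
clipped = [v for v in data if v > bound]
```

With 100 distinct values and a fraction of 0.05, five values lie above the bound and would have been clipped.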

Full API documentation: AdaptiveCutoffNode

class mdp.nodes.HistogramNode

Node which stores a history of the data during its training phase.

The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.

Note that data is only stored during training.

Full API documentation: HistogramNode

class mdp.nodes.IdentityNode

Execute returns the input data and the node is not trainable.

This node can be instantiated and is for example useful in complex network layouts.

Full API documentation: IdentityNode

class mdp.nodes.Convolution2DNode

Convolve input data with filter banks.

The filters argument specifies a set of 2D filters that are convolved with the input data during execution. Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.

Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.

This node depends on scipy.

Full API documentation: Convolution2DNode

class mdp.nodes.ShogunSVMClassifier

The ShogunSVMClassifier works as a wrapper class for accessing the SHOGUN machine learning toolbox for support vector machines.

Most kernel machines and linear classifiers should work with this class.

Distance machines such as the K-means classifier are not yet supported.

Information about parameters and additional options can be found at http://www.shogun-toolbox.org/

Note that some parts in this classifier might receive some refinement in the future.

This node depends on shogun.

Full API documentation: ShogunSVMClassifier

class mdp.nodes.LibSVMClassifier

The LibSVMClassifier class acts as a wrapper around the LibSVM library for support vector machines.

Information about the parameters can be found at http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The class lets you change the kernel and SVM type with a text string.

Additionally, self.parameter is exposed, which allows all other SVM parameters to be changed directly.

This node depends on libsvm.

Full API documentation: LibSVMClassifier

class mdp.nodes.SGDRegressorScikitsLearnNode

Linear model fitted by minimizing a regularized empirical loss with SGD

This node has been automatically generated by wrapping the sklearn.linear_model.sparse.stochastic_gradient.SGDRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

This implementation works with data represented as dense numpy arrays of floating point values for the features.
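As a rough illustration of the per-sample update described above, the following NumPy sketch runs plain SGD for the squared loss with an L2 penalty and the 'invscaling' schedule. It is a simplified stand-in, not the wrapped sklearn code: sgd_regressor is a hypothetical name, and details such as the intercept update and the absence of shuffling are assumptions.

```python
import numpy as np

def sgd_regressor(X, y, alpha=0.0001, eta0=0.01, power_t=0.25, n_iter=5):
    """Plain SGD for squared loss with an L2 penalty (illustrative only)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    t = 1
    for _ in range(n_iter):                  # one pass over the data per epoch
        for i in range(n_samples):
            eta = eta0 / t ** power_t        # 'invscaling' learning rate
            error = np.dot(w, X[i]) + b - y[i]
            # gradient of 0.5 * error**2 + 0.5 * alpha * ||w||^2 with respect to w
            w -= eta * (error * X[i] + alpha * w)
            b -= eta * error
            t += 1
    return w, b

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X.dot(true_w) + 0.3                      # noise-free linear target
w, b = sgd_regressor(X, y, n_iter=50)
print(w, b)
```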

Parameters

loss : str, ‘squared_loss’ or ‘huber’
The loss function to be used. Defaults to ‘squared_loss’ which refers to the ordinary least squares fit. ‘huber’ is an epsilon insensitive loss function for robust regression.
penalty : str, ‘l2’ or ‘l1’ or ‘elasticnet’
The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.
alpha : float
Constant that multiplies the regularization term. Defaults to 0.0001
rho : float
The Elastic Net mixing parameter, with 0 < rho <= 1. Defaults to 0.85.
fit_intercept: bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
n_iter: int
The number of passes over the training data (aka epochs). Defaults to 5.
shuffle: bool
Whether or not the training data should be shuffled after each epoch. Defaults to False.
seed: int, optional
The seed of the pseudo random number generator to use when shuffling the data.
verbose: integer, optional
The verbosity level
p : float
Epsilon in the epsilon insensitive huber loss function; only if loss==’huber’.
learning_rate : string, optional

The learning rate:

  • constant: eta = eta0
  • optimal: eta = 1.0/(t+t0)
  • invscaling: eta = eta0 / pow(t, power_t) [default]
eta0 : double, optional
The initial learning rate [default 0.01].
power_t : double, optional
The exponent for inverse scaling learning rate [default 0.25].
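The three learning rate schedules listed above can be written out directly. In this sketch learning_rate is a hypothetical helper, and t0 is not defined in the text above, so its default value here is only a placeholder.

```python
def learning_rate(schedule, t, eta0=0.01, power_t=0.25, t0=1.0):
    """Return eta at step t (t >= 1) for the documented schedules.

    t0 is an assumed placeholder; the text above does not define it.
    """
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (t + t0)
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError("unknown schedule: %r" % (schedule,))

for name in ("constant", "optimal", "invscaling"):
    print(name, [round(learning_rate(name, t), 5) for t in (1, 10, 100)])
```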

Attributes

coef_ : array, shape = [n_features]
Weights assigned to the features.
intercept_ : array, shape = [1]
The intercept term.

Examples

>>> import numpy as np
>>> from sklearn import linear_model
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = linear_model.sparse.SGDRegressor()
>>> clf.fit(X, y)
SGDRegressor(alpha=0.0001, eta0=0.01, fit_intercept=True,
       learning_rate='invscaling', loss='squared_loss', n_iter=5, p=0.1,
       penalty='l2', power_t=0.25, rho=1.0, seed=0, shuffle=False,
       verbose=0)

See also

RidgeRegression, ElasticNet, Lasso, SVR

Full API documentation: SGDRegressorScikitsLearnNode

class mdp.nodes.RFEScikitsLearnNode

Feature ranking with recursive feature elimination.

This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFE class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and weights are assigned to each one of them. Then, features whose absolute weights are the smallest are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
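The elimination loop can be sketched with an ordinary least-squares fit standing in for the external estimator. The helper name rfe_ranking and the choice of np.linalg.lstsq are assumptions made for illustration; the wrapped RFE class accepts any estimator exposing coef_.

```python
import numpy as np

def rfe_ranking(X, y, n_features_to_select, step=1):
    """Recursive feature elimination with a least-squares linear model."""
    n_features = X.shape[1]
    remaining = list(range(n_features))
    ranking = np.ones(n_features, dtype=int)
    while len(remaining) > n_features_to_select:
        coef = np.linalg.lstsq(X[:, remaining], y, rcond=None)[0]
        n_drop = min(step, len(remaining) - n_features_to_select)
        # positions (within `remaining`) of the smallest absolute weights
        weakest = set(np.argsort(np.abs(coef))[:n_drop].tolist())
        remaining = [f for i, f in enumerate(remaining) if i not in weakest]
        # every feature eliminated so far moves one rank further down
        eliminated = [f for f in range(n_features) if f not in remaining]
        ranking[eliminated] += 1
    return ranking

rng = np.random.RandomState(0)
X = rng.randn(100, 6)
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.randn(100)
ranking = rfe_ranking(X, y, n_features_to_select=2)
print(ranking)
```

Selected features keep rank 1, and features dropped earlier end up with larger rank values.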

Parameters

estimator : object

A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. The first dimension of the coef_ array must be equal to the number of features of the input dataset of the estimator. Important features must correspond to high absolute values in the coef_ array.

For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.

n_features_to_select : int
The number of features to select.
step : int or float, optional (default=1)
If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.

Attributes

n_features_ : int
The number of selected features.
support_ : array of shape [n_features]
The mask of selected features.
ranking_ : array of shape [n_features]
The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

Examples

The following example shows how to retrieve the 5 informative features in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, 5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_ 
array([ True,  True,  True,  True,  True,
        False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

References

[1]Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

Full API documentation: RFEScikitsLearnNode

class mdp.nodes.NMFScikitsLearnNode

Non-Negative matrix factorization by Projected Gradient (NMF)

This node has been automatically generated by wrapping the sklearn.decomposition.nmf.NMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X: array, [n_samples, n_features]
Data the model will be fit to.
n_components: int or None
Number of components, if n_components is not set all components are kept
init: ‘nndsvd’ | ‘nndsvda’ | ‘nndsvdar’ | int | RandomState

Method used to initialize the procedure. Default: ‘nndsvdar’ Valid options:

- 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
- 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
- 'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
- int seed or RandomState: non-negative random matrices
sparseness: ‘data’ | ‘components’ | None, default: None
Where to enforce sparsity in the model.
beta: double, default: 1
Degree of sparseness, if sparseness is not None. Larger values mean more sparseness.
eta: double, default: 0.1
Degree of correctness to maintain, if sparsity is not None. Smaller values mean larger error.
tol: double, default: 1e-4
Tolerance value used in stopping conditions.
max_iter: int, default: 200
Number of iterations to compute.
nls_max_iter: int, default: 2000
Number of iterations in NLS subproblem.

Attributes

components_: array, [n_components, n_features]
Non-negative components of the data
reconstruction_err_: number
Frobenius norm of the matrix difference between the training data and the reconstructed data from the fit produced by the model. || X - WH ||_2
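The reconstruction error can be computed by hand for any pair of factors. In the sketch below W and H are hypothetical hand-picked matrices, not the output of the NMF solver; reconstruction_error is likewise an illustrative helper name.

```python
import numpy as np

def reconstruction_error(X, W, H):
    """Frobenius norm of the difference between X and the product WH."""
    return np.linalg.norm(X - np.dot(W, H), "fro")

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
H = np.eye(2)            # hypothetical components, shape [n_components, n_features]
W_exact = X.copy()       # with H = I this reproduces X exactly
W_rough = np.round(X)    # a coarser, still non-negative code

err_exact = reconstruction_error(X, W_exact, H)
err_rough = reconstruction_error(X, W_rough, H)
print(err_exact, err_rough)
```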

Examples

>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X) 
ProjectedGradientNMF(beta=1, eta=0.1,
           init=<mtrand.RandomState object at 0x...>, max_iter=200,
           n_components=2, nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744,  0.11118662],
       [ 0.38526873,  0.38228063]])
>>> model.reconstruction_err_ 
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
...                              sparseness='components')
>>> model.fit(X) 
ProjectedGradientNMF(beta=1, eta=0.1,
           init=<mtrand.RandomState object at 0x...>, max_iter=200,
           n_components=2, nls_max_iter=2000, sparseness='components',
           tol=0.0001)
>>> model.components_
array([[ 1.67481991,  0.29614922],
       [-0.        ,  0.4681982 ]])
>>> model.reconstruction_err_ 
0.513...

Notes

This implements C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/

NNDSVD is introduced in C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf

Full API documentation: NMFScikitsLearnNode

class mdp.nodes.SelectFprScikitsLearnNode

Filter: select the p-values below alpha based on an FPR (False Positive Rate) test.

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFpr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: SelectFprScikitsLearnNode

class mdp.nodes.VBGMMScikitsLearnNode

Variational Inference for the Gaussian Mixture Model

This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.VBGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Variational inference for a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a gaussian mixture model with a fixed number of components.

Initialization is with normally-distributed means and identity covariance, for proper convergence.

Parameters

n_components: int, optional
Number of mixture components. Defaults to 1.
cvtype: string (read-only), optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
alpha: float, optional
Real number representing the concentration parameter of the dirichlet distribution. Intuitively, the higher the value of alpha the more likely the variational mixture of gaussians model will use all components it can. Defaults to 1.

Attributes

cvtype : string (read-only)
String describing the type of covariance parameters used by the DP-GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
n_features : int
Dimensionality of the Gaussians.
n_components : int (read-only)
Number of mixture components.
weights : array, shape (n_components,)
Mixing weights for each mixture component.
means : array, shape (n_components, n_features)
Mean parameters for each mixture component.
precisions : array

Precision (inverse covariance) parameters for each mixture component. The shape depends on cvtype:

  • (n_components,) if ‘spherical’,
  • (n_features, n_features) if ‘tied’,
  • (n_components, n_features) if ‘diag’,
  • (n_components, n_features, n_features) if ‘full’
converged_ : bool
True when convergence was reached in fit(), False otherwise.
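The dependence of the precisions shape on cvtype can be summarized in a small lookup. The function precisions_shape is a hypothetical helper written for this illustration and is not part of the wrapped class.

```python
def precisions_shape(cvtype, n_components, n_features):
    """Expected shape of the precisions array for a given covariance type."""
    shapes = {
        "spherical": (n_components,),
        "tied": (n_features, n_features),
        "diag": (n_components, n_features),
        "full": (n_components, n_features, n_features),
    }
    if cvtype not in shapes:
        raise ValueError("cvtype must be one of %s" % sorted(shapes))
    return shapes[cvtype]

for cvtype in ("spherical", "tied", "diag", "full"):
    print(cvtype, precisions_shape(cvtype, n_components=3, n_features=2))
```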

Methods

decode(X)
Find most likely mixture components for each point in X.
eval(X)
Compute a lower-bound of the log likelihood of X under the model and an approximate posterior distribution over mixture components.
fit(X)
Estimate the posterior of the model parameters from X using the variational mean-field algorithm.
predict(X)
Like decode, find most likely mixture components for each observation in X.
rvs(n=1)
Generate n samples from the posterior for the model.
score(X)
Compute the log likelihood of X under the model.

See Also

GMM : Finite gaussian mixture model fit with EM

DPGMM : Infinite gaussian mixture model, using the dirichlet process, fit with a variational algorithm

Full API documentation: VBGMMScikitsLearnNode

class mdp.nodes.SparseBaseLibSVMScikitsLearnNode

Full API documentation: SparseBaseLibSVMScikitsLearnNode

class mdp.nodes.VectorizerScikitsLearnNode

Convert a collection of raw documents to a matrix

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.Vectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Equivalent to CountVectorizer followed by TfidfTransformer.

Full API documentation: VectorizerScikitsLearnNode

class mdp.nodes.DPGMMScikitsLearnNode

Variational Inference for the Infinite Gaussian Mixture Model.

This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.DPGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

DPGMM stands for Dirichlet Process Gaussian Mixture Model, and it is an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but almost always the number of components actually used depends on the data.

Stick-breaking Representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).

Initialization is with normally-distributed means and identity covariance, for proper convergence.

Parameters

n_components: int, optional
Number of mixture components. Defaults to 1.
cvtype: string (read-only), optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
alpha: float, optional
Real number representing the concentration parameter of the dirichlet process. Intuitively, the Dirichlet Process is as likely to start a new cluster for a point as it is to add that point to a cluster with alpha elements. A higher alpha means more clusters, as the expected number of clusters is alpha*log(N). Defaults to 1.
thresh : float, optional
Convergence threshold.
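The intuition given for alpha corresponds to the sequential "Chinese restaurant" view of the Dirichlet process: a new point opens a cluster with weight alpha and joins an existing cluster with weight equal to its size. The simulation below only illustrates that prior and is unrelated to the variational inference performed by the node; chinese_restaurant_process is a hypothetical helper name.

```python
import random

def chinese_restaurant_process(n_points, alpha, seed=0):
    """Sample cluster sizes from the Dirichlet process prior."""
    rng = random.Random(seed)
    sizes = []
    for n in range(n_points):                # n points are already assigned
        r = rng.uniform(0, n + alpha)
        if r < alpha or not sizes:
            sizes.append(1)                  # start a new cluster (weight alpha)
            continue
        r -= alpha
        for k, size in enumerate(sizes):     # join cluster k with weight sizes[k]
            if r < size:
                sizes[k] += 1
                break
            r -= size
        else:
            sizes[-1] += 1                   # guard against floating point round-off
    return sizes

low = chinese_restaurant_process(1000, alpha=0.1)
high = chinese_restaurant_process(1000, alpha=10.0)
print(len(low), len(high))
```

Larger values of alpha lead to more clusters for the same number of points.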

Attributes

cvtype : string (read-only)
String describing the type of covariance parameters used by the DP-GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
n_features : int
Dimensionality of the Gaussians.
n_components : int (read-only)
Number of mixture components.
weights : array, shape (n_components,)
Mixing weights for each mixture component.
means : array, shape (n_components, n_features)
Mean parameters for each mixture component.
precisions : array

Precision (inverse covariance) parameters for each mixture component. The shape depends on cvtype:

  • (n_components,) if ‘spherical’,
  • (n_features, n_features) if ‘tied’,
  • (n_components, n_features) if ‘diag’,
  • (n_components, n_features, n_features) if ‘full’
converged_ : bool
True when convergence was reached in fit(), False otherwise.

Methods

decode(X)
Find most likely mixture components for each point in X.
eval(X)
Compute a lower-bound of the log likelihood of X under the model and an approximate posterior distribution over mixture components.
fit(X)
Estimate the posterior of the model parameters from X using the variational mean-field algorithm.
predict(X)
Like decode, find most likely mixture components for each observation in X.
rvs(n=1)
Generate n samples from the posterior for the model.
score(X)
Compute the log likelihood of X under the model.

See Also

GMM : Finite gaussian mixture model fit with EM

VBGMM : Finite gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.

Full API documentation: DPGMMScikitsLearnNode

class mdp.nodes.LassoLarsScikitsLearnNode

Lasso model fit with Least Angle Regression a.k.a. Lars

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

It is a linear model trained with an L1 prior as regularizer (a.k.a. the lasso).

Parameters

fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform.
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation)
intercept_ : float
independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.LassoLars(alpha=0.01)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1, 0, -1]) 
LassoLars(alpha=0.01, eps=..., fit_intercept=True,
     max_iter=500, normalize=True, overwrite_X=False, precompute='auto',
     verbose=False)
>>> print clf.coef_ 
[ 0.         -0.963257...]

References

http://en.wikipedia.org/wiki/Least_angle_regression

See also

lars_path, Lasso

Full API documentation: LassoLarsScikitsLearnNode

class mdp.nodes.LinearModelCVScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LinearModelCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: LinearModelCVScikitsLearnNode

class mdp.nodes.NormalizerScikitsLearnNode

Normalize samples individually to unit norm

This node has been automatically generated by wrapping the sklearn.preprocessing.Normalizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.

This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).

Scaling inputs to unit norms is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
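The relation between l2 normalization and cosine similarity can be checked with a few lines of NumPy. The function normalize_rows is a hypothetical dense-array sketch; leaving all-zero rows untouched follows the description above.

```python
import numpy as np

def normalize_rows(X, norm="l2"):
    """Rescale each non-zero row of X to unit l1 or l2 norm."""
    X = np.asarray(X, dtype=float)
    if norm == "l1":
        norms = np.abs(X).sum(axis=1)
    elif norm == "l2":
        norms = np.sqrt((X ** 2).sum(axis=1))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    norms[norms == 0.0] = 1.0            # all-zero rows are left untouched
    return X / norms[:, np.newaxis]

X = np.array([[3.0, 4.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
X_l2 = normalize_rows(X, "l2")
X_l1 = normalize_rows(X, "l1")

# cosine similarity of the first two rows, computed from the raw vectors
cosine = np.dot(X[0], X[1]) / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
print(np.dot(X_l2[0], X_l2[1]), cosine)
```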

Parameters

norm : ‘l1’ or ‘l2’, optional (‘l2’ by default)
The norm to use to normalize each non zero sample.
copy : boolean, optional, default is True
set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Note

This estimator is stateless (besides constructor parameters), the fit method does nothing but is useful when used in a pipeline.

See also

sklearn.preprocessing.normalize() equivalent function without the object oriented API

Full API documentation: NormalizerScikitsLearnNode

class mdp.nodes.DictionaryLearningScikitsLearnNode

Dictionary learning

This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.DictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*, V^*) = argmin_{(U,V)} 0.5 || Y - U V ||_2^2 + alpha * || U ||_1

with || V_k ||_2 = 1 for all 0 <= k < n_atoms
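For concreteness, the objective can be evaluated for a given code U and dictionary V. The sketch assumes that the squared norm of the residual is the sum of squared entries (Frobenius norm) and that the rows of V are the atoms; dict_learning_objective is a hypothetical helper name.

```python
import numpy as np

def dict_learning_objective(Y, U, V, alpha):
    """0.5 * ||Y - UV||^2 + alpha * ||U||_1 (entrywise norms assumed)."""
    residual = Y - np.dot(U, V)
    return 0.5 * np.sum(residual ** 2) + alpha * np.sum(np.abs(U))

V = np.array([[3.0, 4.0], [1.0, 0.0]])
V /= np.linalg.norm(V, axis=1)[:, np.newaxis]   # enforce ||V_k||_2 = 1 per atom
U = np.array([[1.0, 0.0], [0.0, -2.0], [0.0, 0.0]])  # sparse code
Y = np.dot(U, V)                                 # data that U and V reproduce exactly

value = dict_learning_objective(Y, U, V, alpha=0.5)
print(value)
```

With an exact reconstruction only the sparsity term remains, so the value equals alpha times the sum of the absolute entries of U.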

Parameters

n_atoms: int,
number of dictionary elements to extract
alpha: int,
sparsity controlling parameter
max_iter: int,
maximum number of iterations to perform
tol: float,
tolerance for numerical error
fit_algorithm: {‘lars’, ‘cd’}
  • lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)
  • cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
transform_algorithm: {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}

Algorithm used to transform the data:

  • lars: uses the least angle regression method (linear_model.lars_path)
  • lasso_lars: uses Lars to compute the Lasso solution
  • lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
  • omp: uses orthogonal matching pursuit to estimate the sparse solution
  • threshold: squashes to zero all coefficients less than alpha from the projection X.T * Y
transform_n_nonzero_coefs: int, 0.1 * n_features by default
Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
transform_alpha: float, 1. by default
If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
n_jobs: int,
number of parallel jobs to run
code_init: array of shape (n_samples, n_atoms),
initial value for the code, for warm restart
dict_init: array of shape (n_atoms, n_features),
initial values for the dictionary, for warm restart

verbose:
degree of verbosity of the printed output
random_state: int or RandomState
Pseudo-random number generator state used for random sampling.

Attributes

components_: array, [n_atoms, n_features]
dictionary atoms extracted from the data
error_: array
vector of errors at each iteration

References

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)

See also

sklearn.decomposition.SparsePCA which solves the transposed problem, finding sparse components to represent data.

Full API documentation: DictionaryLearningScikitsLearnNode

class mdp.nodes.CountVectorizerScikitsLearnNode

Convert a collection of raw documents to a matrix of token counts

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.CountVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This implementation produces a sparse representation of the counts using scipy.sparse.coo_matrix.

If you do not provide an a-priori dictionary and you do not use an analyzer that does some kind of feature selection then the number of features will be equal to the vocabulary size found by analysing the data. The default analyzer does simple stop word filtering for English.

Parameters

analyzer: WordNGramAnalyzer or CharNGramAnalyzer, optional

vocabulary: dict or iterable, optional

Either a dictionary where keys are tokens and values are indices in the matrix, or an iterable over terms (in which case the indices are determined by the iteration order as per enumerate).

This is useful in order to fix the vocabulary in advance.

max_df : float in range [0.0, 1.0], optional, 1.0 by default

When building the vocabulary ignore terms that have a term frequency strictly higher than the given threshold (corpus specific stop words).

This parameter is ignored if vocabulary is not None.

max_features : optional, None by default

If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.

This parameter is ignored if vocabulary is not None.

dtype: type, optional
Type of the matrix returned by fit_transform() or transform().
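How a fixed vocabulary maps tokens to matrix columns can be sketched in plain Python. The helpers build_vocabulary and count_tokens are hypothetical, whitespace tokenization is an assumption that ignores the real analyzers, and the result is a dense list of rows rather than the sparse matrix the node produces.

```python
def build_vocabulary(terms):
    """Map each term to a column index following iteration order."""
    return dict((term, index) for index, term in enumerate(terms))

def count_tokens(documents, vocabulary):
    """Dense token-count rows; tokens outside the vocabulary are ignored."""
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for token in doc.lower().split():    # assumed whitespace tokenizer
            if token in vocabulary:
                row[vocabulary[token]] += 1
        rows.append(row)
    return rows

vocabulary = build_vocabulary(["cat", "dog", "fish"])
counts = count_tokens(["Cat dog cat", "fish and dog"], vocabulary)
print(vocabulary)
print(counts)
```

Fixing the vocabulary in advance keeps the number and order of columns independent of the documents that are counted.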

Full API documentation: CountVectorizerScikitsLearnNode

class mdp.nodes.ElasticNetCVScikitsLearnNode

Elastic Net model with iterative fitting along a regularization path

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The best model is selected by cross-validation.

Parameters

rho : float, optional
float between 0 and 1 passed to ElasticNet (scaling between l1 and l2 penalties). For rho = 0 the penalty is an L2 penalty. For rho = 1 it is an L1 penalty. For 0 < rho < 1, the penalty is a combination of L1 and L2.
eps : float, optional
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
n_alphas : int, optional
Number of alphas along the regularization path
alphas : numpy array, optional
List of alphas where to compute the models. If None alphas are set automatically
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: int, optional
The maximum number of iterations
tol: float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
cv : integer or crossvalidation generator, optional
If an integer is passed, it is the number of folds (default 3). Specific crossvalidation objects can be passed, see sklearn.cross_validation module for the list of possible objects
verbose : bool or integer
amount of verbosity

Notes

See examples/linear_model/lasso_path_with_crossvalidation.py for an example.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the penalty is:

alpha*rho*L1 + alpha*(1-rho)*L2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a*L1 + b*L2

for:

alpha = a + b and rho = a/(a+b)
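The equivalence between the two parametrizations is easy to verify numerically. In this sketch to_alpha_rho and penalty are hypothetical helpers, and l1 and l2 are arbitrary numbers standing in for the values of the L1 and L2 terms.

```python
def to_alpha_rho(a, b):
    """Convert separate L1 and L2 weights (a, b) into (alpha, rho)."""
    alpha = a + b
    rho = a / (a + b)
    return alpha, rho

def penalty(alpha, rho, l1, l2):
    """alpha*rho*L1 + alpha*(1-rho)*L2, as in the formula above."""
    return alpha * rho * l1 + alpha * (1 - rho) * l2

a, b = 0.3, 0.1
alpha, rho = to_alpha_rho(a, b)
l1, l2 = 2.5, 4.0        # placeholder values for the L1 and L2 terms

combined = penalty(alpha, rho, l1, l2)
separate = a * l1 + b * l2
print(alpha, rho, combined, separate)
```

Setting rho to 1 leaves only the L1 term, and setting it to 0 leaves only the L2 term.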

Full API documentation: ElasticNetCVScikitsLearnNode

class mdp.nodes.KernelPCAScikitsLearnNode

Kernel Principal component analysis (KPCA)

This node has been automatically generated by wrapping the sklearn.decomposition.kernel_pca.KernelPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Non-linear dimensionality reduction through the use of kernels.

Parameters

n_components: int or None
Number of components. If None, all non-zero components are kept.
kernel: “linear” | “poly” | “rbf” | “sigmoid” | “precomputed”
Kernel. Default: “linear”
degree : int, optional
Degree for poly, rbf and sigmoid kernels. Default: 3.
gamma : float, optional
Kernel coefficient for rbf and poly kernels. Default: 1/n_features.
coef0 : float, optional
Independent term in poly and sigmoid kernels.
alpha: int
Hyperparameter of the ridge regression that learns the inverse transform (when fit_inverse_transform=True). Default: 1.0
fit_inverse_transform: bool
Learn the inverse transform. (i.e. learn to find the pre-image of a point) Default: False
eigen_solver: string [‘auto’|’dense’|’arpack’]
Select eigensolver to use. If n_components is much less than the number of training samples, arpack may be more efficient than the dense eigensolver.
tol: float
convergence tolerance for arpack. Default: 0 (optimal value will be chosen by arpack)
max_iter : int
maximum number of iterations for arpack Default: None (optimal value will be chosen by arpack)

Attributes

lambdas_, alphas_:
Eigenvalues and eigenvectors of the centered kernel matrix
dual_coef_:
Inverse transform matrix
X_transformed_fit_:
Projection of the fitted data on the kernel principal components
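The role of the centered kernel matrix can be illustrated with a linear kernel, for which kernel PCA reduces to ordinary PCA. This is a minimal NumPy sketch under that assumption; kernel_pca_linear is a hypothetical name and the code does not reproduce the wrapped class (no arpack solver, no inverse transform).

```python
import numpy as np

def kernel_pca_linear(X, n_components):
    """Kernel PCA with a linear kernel, via the centered kernel matrix."""
    n = X.shape[0]
    K = np.dot(X, X.T)                               # linear kernel matrix
    one_n = np.ones((n, n)) / n
    K_centered = K - one_n.dot(K) - K.dot(one_n) + one_n.dot(K).dot(one_n)
    eigvals, eigvecs = np.linalg.eigh(K_centered)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[order], eigvecs[:, order]
    # projection of the training data on the kernel principal components
    X_transformed = alphas * np.sqrt(lambdas)
    return lambdas, alphas, X_transformed

rng = np.random.RandomState(0)
X = rng.randn(20, 4)
lambdas, alphas, X_transformed = kernel_pca_linear(X, n_components=2)

# reference: ordinary PCA projections from the SVD of the centered data
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
pca_projection = X_centered.dot(Vt[:2].T)
print(np.allclose(np.abs(X_transformed), np.abs(pca_projection)))
```

The comparison uses absolute values because eigenvectors are only defined up to sign.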

Reference

Kernel PCA was introduced in:

Bernhard Schoelkopf, Alexander J. Smola, and Klaus-Robert Mueller. 1999. Kernel principal component analysis. In Advances in kernel methods, MIT Press, Cambridge, MA, USA 327-352.

Full API documentation: KernelPCAScikitsLearnNode

class mdp.nodes.SelectPercentileScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectPercentile class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: SelectPercentileScikitsLearnNode

class mdp.nodes.ScalerScikitsLearnNode

Standardize features by removing the mean and scaling to unit variance

This node has been automatically generated by wrapping the sklearn.preprocessing.Scaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).

For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
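The fit and transform steps can be sketched in NumPy as follows. SimpleScaler is a hypothetical stand-in for the wrapped class; replacing a zero standard deviation by 1 is an assumption made here to avoid division by zero.

```python
import numpy as np

class SimpleScaler(object):
    """Per-feature standardization using training-set statistics."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        std[std == 0.0] = 1.0          # assumed handling of constant features
        self.std_ = std
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

rng = np.random.RandomState(0)
# two features whose scales differ by three orders of magnitude
train = rng.randn(100, 2) * np.array([1.0, 1000.0]) + np.array([0.0, 50.0])

scaler = SimpleScaler().fit(train)
scaled = scaler.transform(train)
new_point = scaler.transform(np.array([[0.0, 50.0]]))   # reuses the stored statistics
print(scaled.mean(axis=0), scaled.std(axis=0))
```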

Parameters

with_mean : boolean, True by default
If True, center the data before scaling.
with_std : boolean, True by default
If True, scale the data to unit variance (or equivalently, unit standard deviation).
copy : boolean, optional, default is True
set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix and if axis is 1).

Attributes

mean_ : array of floats with shape [n_features]
The mean value for each feature in the training set.
std_ : array of floats with shape [n_features]
The standard deviation for each feature in the training set.

See also

sklearn.preprocessing.scale() to perform centering and scaling without using the Transformer object oriented API

sklearn.decomposition.RandomizedPCA with whiten=True to further remove the linear correlation across features.

Full API documentation: ScalerScikitsLearnNode

class mdp.nodes.RidgeScikitsLearnNode

Ridge regression.

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.Ridge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

alpha : float
Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
tol: float
Precision of the solution.

Attributes

coef_: array, shape = [n_features] or [n_responses, n_features]
Weight vector(s).

Examples

>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge(alpha=1.0, fit_intercept=True, normalize=False, overwrite_X=False,
   tol=0.001)
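
To see how alpha shrinks the estimate, here is a minimal sketch for the single-feature case without an intercept, assuming the objective sum((y - w*x)^2) + alpha * w^2. The helper ridge_1d is hypothetical and is not part of sklearn or MDP.

```python
def ridge_1d(x, y, alpha):
    """Closed-form ridge weight for one feature and no intercept:
    w = sum(x*y) / (sum(x*x) + alpha)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2 * x
# Larger alpha pulls the weight from the least-squares value towards zero.
weights = [ridge_1d(x, y, alpha) for alpha in (0.0, 1.0, 14.0)]
```

The denominator sum(x*x) + alpha also shows why a positive alpha improves conditioning: it stays away from zero even when the feature carries almost no signal.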

Full API documentation: RidgeScikitsLearnNode

class mdp.nodes.CCAScikitsLearnNode

CCA Canonical Correlation Analysis. CCA inherits from PLS with mode="B" and deflation_mode="canonical".

This node has been automatically generated by wrapping the sklearn.pls.CCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X: array-like of predictors, shape = [n_samples, p]
Training vectors, where n_samples is the number of samples and p is the number of predictors.
Y: array-like of response, shape = [n_samples, q]
Training vectors, where n_samples is the number of samples and q is the number of response variables.
n_components: int, (default 2).
number of components to keep.
scale: boolean, (default True)
Whether to scale the data.
algorithm: str, “nipals” or “svd”
The algorithm used to estimate the weights. It will be called n_components times, i.e. once for each iteration of the outer loop.
max_iter: an integer, (default 500)
the maximum number of iterations of the NIPALS inner loop (used only if algorithm="nipals")
tol: non-negative real, default 1e-06.
the tolerance used in the iterative algorithm
copy: boolean
Whether the deflation should be done on a copy. Leave the default value of True unless you don't care about side effects.

Attributes

x_weights_: array, [p, n_components]
X block weights vectors.
y_weights_: array, [q, n_components]
Y block weights vectors.
x_loadings_: array, [p, n_components]
X block loadings vectors.
y_loadings_: array, [q, n_components]
Y block loadings vectors.
x_scores_: array, [n_samples, n_components]
X scores.
y_scores_: array, [n_samples, n_components]
Y scores.
x_rotations_: array, [p, n_components]
X block to latents rotations.
y_rotations_: array, [q, n_components]
Y block to latents rotations.

Notes

For each component k, find the weights u, v that maximize corr(Xk u, Yk v), such that |u| = |v| = 1.

Note that it maximizes only the correlations between the scores.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score.
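
The quantities in these notes can be sketched in plain Python: scores are projections of a block onto a weight vector, the objective is the correlation between the two score vectors, and deflation removes from each column its regression on the current score. The weights u and v below are hypothetical values chosen for illustration, not the result of the NIPALS iteration, and the deflation formula is the standard PLS one, assumed here rather than taken from the wrapped code.

```python
import math

def project(matrix, weights):
    """Scores: one value per sample, each row dotted with the weights."""
    return [sum(a * w for a, w in zip(row, weights)) for row in matrix]

def correlation(a, b):
    """Pearson correlation between two equally long score vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def deflate(matrix, score):
    """Subtract from each column its regression on the score vector."""
    norm = sum(t * t for t in score)
    n_cols = len(matrix[0])
    loading = [
        sum(row[j] * t for row, t in zip(matrix, score)) / norm
        for j in range(n_cols)
    ]
    return [[row[j] - t * loading[j] for j in range(n_cols)]
            for row, t in zip(matrix, score)]

X = [[1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
Y = [[2.0], [4.0], [6.0]]
u, v = [1.0, 0.0], [1.0]  # hypothetical unit-norm weights
x_score, y_score = project(X, u), project(Y, v)
r = correlation(x_score, y_score)
X_next = deflate(X, x_score)  # residual block used for the next component
```

Every column of X_next is orthogonal to x_score, which is what lets the next component capture structure not already explained.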

Examples

>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> cca = CCA(n_components=1)
>>> cca.fit(X, Y)
CCA(algorithm='nipals', copy=True, max_iter=500, n_components=1, scale=True,
  tol=1e-06)
>>> X_c, Y_c = cca.transform(X, Y)

References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

In French, but still a reference:

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.

See also

PLSCanonical, PLSSVD

Full API documentation: CCAScikitsLearnNode

class mdp.nodes.LarsCVScikitsLearnNode

Cross-validated Least Angle Regression model

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform.
cv : crossvalidation generator, optional
see sklearn.cross_validation module. If None is passed, default to a 5-fold strategy
n_jobs : integer, optional
Number of CPUs to use during the cross validation. If ‘-1’, use all the CPUs
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation)
intercept_ : float
independent term in decision function.
coef_path: array, shape = [n_features, n_alpha]
the varying values of the coefficients along the path

See also

lars_path, LassoLARS, LassoLarsCV

Full API documentation: LarsCVScikitsLearnNode

class mdp.nodes.ProjectedGradientNMFScikitsLearnNode

Non-Negative matrix factorization by Projected Gradient (NMF)

This node has been automatically generated by wrapping the sklearn.decomposition.nmf.ProjectedGradientNMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X: array, [n_samples, n_features]
Data the model will be fit to.
n_components: int or None
Number of components, if n_components is not set all components are kept
init: ‘nndsvd’ | ‘nndsvda’ | ‘nndsvdar’ | int | RandomState

Method used to initialize the procedure. Default: ‘nndsvdar’. Valid options:

  • 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
  • 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
  • 'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
  • int seed or RandomState: non-negative random matrices
sparseness: ‘data’ | ‘components’ | None, default: None
Where to enforce sparsity in the model.
beta: double, default: 1
Degree of sparseness, if sparseness is not None. Larger values mean more sparseness.
eta: double, default: 0.1
Degree of correctness to maintain, if sparseness is not None. Smaller values mean larger error.
tol: double, default: 1e-4
Tolerance value used in stopping conditions.
max_iter: int, default: 200
Number of iterations to compute.
nls_max_iter: int, default: 2000
Number of iterations in NLS subproblem.

Attributes

components_: array, [n_components, n_features]
Non-negative components of the data
reconstruction_err_: number
Frobenius norm of the matrix difference between the training data and the reconstructed data from the fit produced by the model. || X - WH ||_2

Examples

>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X) 
ProjectedGradientNMF(beta=1, eta=0.1,
           init=<mtrand.RandomState object at 0x...>, max_iter=200,
           n_components=2, nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744,  0.11118662],
       [ 0.38526873,  0.38228063]])
>>> model.reconstruction_err_ 
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
...                              sparseness='components')
>>> model.fit(X) 
ProjectedGradientNMF(beta=1, eta=0.1,
           init=<mtrand.RandomState object at 0x...>, max_iter=200,
           n_components=2, nls_max_iter=2000, sparseness='components',
           tol=0.0001)
>>> model.components_
array([[ 1.67481991,  0.29614922],
       [-0.        ,  0.4681982 ]])
>>> model.reconstruction_err_ 
0.513...
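
As a rough illustration of what reconstruction_err_ measures, the sketch below computes the Frobenius norm of X - WH in plain Python. The factors W and H are hypothetical hand-picked matrices, not the output of the projected gradient solver.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def reconstruction_error(X, W, H):
    """Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return math.sqrt(sum((x - wh) ** 2
                         for x_row, wh_row in zip(X, WH)
                         for x, wh in zip(x_row, wh_row)))

W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # hypothetical non-negative factors
H = [[1.0, 2.0], [3.0, 0.0]]
X_exact = matmul(W, H)                    # data that W H reproduces exactly
X_noisy = [row[:] for row in X_exact]
X_noisy[0][0] += 3.0                      # perturb two entries
X_noisy[2][1] += 4.0
err_exact = reconstruction_error(X_exact, W, H)
err_noisy = reconstruction_error(X_noisy, W, H)
```

With the two perturbations of size 3 and 4, the error is the length of the vector of entry-wise differences.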

Notes

This implements C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/

NNDSVD is introduced in C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf

Full API documentation: ProjectedGradientNMFScikitsLearnNode

class mdp.nodes.KernelCentererScikitsLearnNode

Center a kernel matrix

This node has been automatically generated by wrapping the sklearn.preprocessing.KernelCenterer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This is equivalent to centering phi(X) with sklearn.preprocessing.Scaler(with_std=False).
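
The equivalence can be checked on a small case. The sketch below assumes the usual double-centering formula (subtract row means and column means, add back the grand mean) and uses a linear kernel, for which phi(X) is simply X. The helper names are hypothetical.

```python
def center_kernel(K):
    """Double-center a square kernel matrix given as a list of rows."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def linear_kernel(X):
    """Gram matrix of pairwise dot products."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
means = [sum(col) / len(X) for col in zip(*X)]
X_centered = [[v - m for v, m in zip(row, means)] for row in X]

# Centering the kernel gives the same matrix as the kernel of centered data.
K_centered = center_kernel(linear_kernel(X))
K_of_centered = linear_kernel(X_centered)
```

For non-linear kernels phi(X) is usually not available explicitly, which is why the centering has to be done on the kernel matrix itself.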

Full API documentation: KernelCentererScikitsLearnNode

class mdp.nodes.RidgeClassifierScikitsLearnNode

Classifier using Ridge regression

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

alpha : float
Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized

Attributes

coef_: array, shape = [n_features] or [n_classes, n_features]
Weight vector(s).

Note

For multi-class classification, n_class classifiers are trained in a one-versus-all approach.
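
A one-versus-all scheme can be sketched as follows. The +1/-1 target encoding and both helper names are assumptions made for illustration; the scores passed to predict_from_scores stand in for the outputs of the per-class regressors.

```python
def one_versus_all_targets(labels):
    """Build one +1/-1 target vector per class."""
    classes = sorted(set(labels))
    targets = {c: [1.0 if label == c else -1.0 for label in labels]
               for c in classes}
    return classes, targets

def predict_from_scores(classes, scores):
    """Pick the class whose binary model gives the largest score."""
    best = max(range(len(classes)), key=lambda k: scores[k])
    return classes[best]

classes, targets = one_versus_all_targets(["a", "b", "c", "a"])
# Hypothetical decision values of the three binary models for one sample.
winner = predict_from_scores(classes, [-0.3, 0.8, 0.1])
```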

Full API documentation: RidgeClassifierScikitsLearnNode

class mdp.nodes.BinarizerScikitsLearnNode

Binarize data (set feature values to 0 or 1) according to a threshold

This node has been automatically generated by wrapping the sklearn.preprocessing.Binarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The default threshold is 0.0, so that any positive values are set to 1.0 and zeros are left untouched.

Binarization is a common operation on text count data, where the analyst can decide to consider only the presence or absence of a feature rather than, for instance, a quantified number of occurrences.

It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).

Parameters

threshold : float, optional (0.0 by default)
The lower bound that triggers feature values to be replaced by 1.0.
copy : boolean, optional, default is True
set to False to perform inplace binarization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Notes

If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.

This estimator is stateless (besides constructor parameters); the fit method does nothing but is useful when used in a pipeline.
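
A minimal sketch of thresholding on dense data, assuming that values strictly greater than the threshold map to 1.0 and everything else to 0.0. The function binarize below is a stand-alone illustration, not the sklearn function of the same name.

```python
def binarize(rows, threshold=0.0):
    """Return a copy where values above the threshold become 1.0, others 0.0."""
    return [[1.0 if value > threshold else 0.0 for value in row]
            for row in rows]

counts = [[0.0, 3.0, 1.0], [2.0, 0.0, 0.0]]  # e.g. word counts per document
presence = binarize(counts)                  # presence/absence of each word
frequent = binarize(counts, threshold=1.5)   # words seen at least twice
```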

Full API documentation: BinarizerScikitsLearnNode

class mdp.nodes.LinearSVCScikitsLearnNode

Linear Support Vector Classification, Sparse Version

This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.LinearSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Similar to SVC with parameter kernel=’linear’, but uses liblinear internally rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should be faster for huge datasets.

See sklearn.svm.SVC for a complete list of parameters

Notes

For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).

Full API documentation: LinearSVCScikitsLearnNode

class mdp.nodes.NeighborsClassifierScikitsLearnNode

Classifier implementing the nearest neighbors vote. (Deprecated)

This node has been automatically generated by wrapping the sklearn.neighbors.classification.NeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

DEPRECATED IN VERSION 0.9; WILL BE REMOVED IN VERSION 0.11. Please use KNeighborsClassifier or RadiusNeighborsClassifier instead.

Samples participating in the vote are either the k-nearest neighbors (for some k) or all neighbors within some fixed radius around the sample to classify.

Parameters

n_neighbors : int, optional (default = 5)
Number of neighbors to use by default for k_neighbors() queries.
radius : float, optional (default = 1.0)
Range of parameter space to use by default for radius_neighbors() queries.
algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use scipy.spatial.cKDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
classification_type : {‘knn_vote’, ‘radius_vote’}, optional
Type of fit to use: ‘knn_vote’ specifies a k-NN classification. ‘radius_vote’ specifies a r-NN classification. Default is ‘knn_vote’.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import NeighborsClassifier
>>> neigh = NeighborsClassifier(n_neighbors=2)
>>> neigh.fit(X, y)
NeighborsClassifier(algorithm='auto', classification_type='knn_vote',
          leaf_size=30, n_neighbors=2, radius=1.0)
>>> print neigh.predict([[1.5]])
[0]

See also

NearestNeighbors, NeighborsRegressor

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

References

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: NeighborsClassifierScikitsLearnNode

class mdp.nodes.MultinomialNBScikitsLearnNode

Naive Bayes classifier for multinomial models

This node has been automatically generated by wrapping the sklearn.naive_bayes.MultinomialNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Parameters

alpha: float, optional (default=1.0)
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
fit_prior: boolean
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

Methods

fit(X, y) : self
Fit the model
predict(X) : array
Predict using the model.
predict_proba(X) : array
Predict the probability of each class using the model.
predict_log_proba(X) : array
Predict the log probability of each class using the model.

Attributes

intercept_, class_log_prior_ : array, shape = [n_classes]
Log probability of each class (smoothed).
feature_log_prob_, coef_ : array, shape = [n_classes, n_features]

Empirical log probability of features given a class, P(x_i|y).

(intercept_ and coef_ are properties referring to class_log_prior_ and feature_log_prob_, respectively.)

Examples

>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, Y)
MultinomialNB(alpha=1.0, fit_prior=True)
>>> print clf.predict(X[2])
[3]
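
The smoothed estimates behind feature_log_prob_ can be sketched in plain Python. This is a simplified illustration with hypothetical helper names: it assumes additive smoothing of the per-class feature counts and empirical class frequencies for the prior, and it needs alpha > 0 whenever a feature count is zero.

```python
import math

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed log feature probabilities."""
    classes = sorted(set(y))
    n_features = len(X[0])
    class_log_prior = []
    feature_log_prob = []
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        class_log_prior.append(math.log(len(rows) / float(len(y))))
        # Add alpha to every feature count before normalizing.
        counts = [sum(row[j] for row in rows) + alpha
                  for j in range(n_features)]
        total = sum(counts)
        feature_log_prob.append([math.log(cnt / total) for cnt in counts])
    return classes, class_log_prior, feature_log_prob

def predict(x, classes, class_log_prior, feature_log_prob):
    """Return the class with the largest joint log likelihood."""
    scores = [prior + sum(v * lp for v, lp in zip(x, log_prob))
              for prior, log_prob in zip(class_log_prior, feature_log_prob)]
    return classes[scores.index(max(scores))]

X = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]]  # toy word counts
y = [1, 1, 2, 2]
model = fit_multinomial_nb(X, y)
label = predict([1, 0, 0], *model)
```

The score is linear in the counts x, with class_log_prior as the intercept and feature_log_prob as the coefficients, which is the reading behind the names intercept_ and coef_.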

References

For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.

Full API documentation: MultinomialNBScikitsLearnNode

class mdp.nodes.LassoScikitsLearnNode

Linear Model trained with L1 prior as regularizer (aka the Lasso)

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.Lasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Technically the Lasso model is optimizing the same objective function as the Elastic Net with rho=1.0 (no L2 penalty).

Parameters

alpha : float, optional
Constant that multiplies the L1 term. Defaults to 1.0
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: int, optional
The maximum number of iterations
tol: float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation)
intercept_ : float
independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.Lasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=0.1, fit_intercept=True, max_iter=1000, normalize=False,
   overwrite_X=False, precompute='auto', tol=0.0001)
>>> print clf.coef_
[ 0.85  0.  ]
>>> print clf.intercept_
0.15

See also

LassoLars, decomposition.sparse_encode, decomposition.sparse_encode_parallel

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
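
Coordinate descent for the Lasso can be sketched with a soft-thresholding update per coefficient. The sketch assumes the objective (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1 with an intercept handled by centering, and it runs a fixed number of sweeps instead of checking the dual gap. Function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Fit Lasso weights and intercept by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / float(n) for j in range(p)]
    y_mean = sum(y) / float(n)
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j.
                others = sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - others)
                norm += Xc[i][j] ** 2
            if norm == 0.0:
                w[j] = 0.0  # constant feature carries no information
            else:
                w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    intercept = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return w, intercept

coef, intercept = lasso_coordinate_descent(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], [0.0, 1.0, 2.0], alpha=0.1)
```

On the data of the example above, the two identical features compete for the same signal: the first one absorbs it and the second is thresholded to zero, which is the sparsity effect of the L1 penalty.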

Full API documentation: LassoScikitsLearnNode

class mdp.nodes.SelectFdrScikitsLearnNode

Filter: select the p-values corresponding to an estimated false discovery rate.

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFdr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: SelectFdrScikitsLearnNode

class mdp.nodes.NuSVRScikitsLearnNode

NuSVR for sparse matrices (csr)

This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

See sklearn.svm.NuSVC for a complete list of parameters

Notes

For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).

Examples

>>> from sklearn.svm.sparse import NuSVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = NuSVR(nu=0.1, C=1.0)
>>> clf.fit(X, y)
NuSVR(C=1.0, coef0=0.0, degree=3, epsilon=0.1, gamma=0.2, kernel='rbf',
   nu=0.1, probability=False, shrinking=True, tol=0.001)

Full API documentation: NuSVRScikitsLearnNode

class mdp.nodes.WardAgglomerationScikitsLearnNode

Feature agglomeration based on Ward hierarchical clustering

This node has been automatically generated by wrapping the sklearn.cluster.hierarchical.WardAgglomeration class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_clusters : int or ndarray
The number of clusters.
connectivity : sparse matrix
Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data. Default is None, i.e., the hierarchical agglomeration algorithm is unstructured.
memory : Instance of joblib.Memory or string
Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
copy : bool
Copy the connectivity matrix or work inplace.
n_components : int (optional)
The number of connected components in the graph defined by the connectivity matrix. If not set, it is estimated.

Methods

fit:

  • Compute the clustering of features

Attributes

children_ : array-like, shape = [n_nodes, 2]
List of the children of each node. Leaves of the tree do not appear.
labels_ : array [n_points]
cluster labels for each point
n_leaves_ : int
Number of leaves in the hierarchical tree.

Full API documentation: WardAgglomerationScikitsLearnNode

class mdp.nodes.SparseBaseLibLinearScikitsLearnNode

Full API documentation: SparseBaseLibLinearScikitsLearnNode

class mdp.nodes.LassoLARSScikitsLearnNode

Full API documentation: LassoLARSScikitsLearnNode

class mdp.nodes.RandomizedPCAScikitsLearnNode

Principal component analysis (PCA) using randomized SVD

This node has been automatically generated by wrapping the sklearn.decomposition.pca.RandomizedPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Linear dimensionality reduction using approximated Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

This implementation uses a randomized SVD implementation and can handle both scipy.sparse and numpy dense arrays as input.

Parameters

n_components: int
Maximum number of components to keep: default is 50.
copy: bool
If False, data passed to fit are overwritten
iterated_power: int, optional
Number of iterations for the power method. 3 by default.
whiten: bool, optional

When True (False by default) the components_ vectors are divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

Attributes

components_: array, [n_components, n_features]
Components with maximum variance.
explained_variance_ratio_: array, [n_components]
Percentage of variance explained by each of the selected components. If k is not set, then all components are stored and the sum of explained variances is equal to 1.0.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import RandomizedPCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = RandomizedPCA(n_components=2)
>>> pca.fit(X)
RandomizedPCA(copy=True, iterated_power=3, n_components=2, whiten=False)
>>> print pca.explained_variance_ratio_
[ 0.99244289  0.00755711]
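
For two features the explained variance ratios can be checked by hand, because the eigenvalues of a 2x2 covariance matrix follow from its trace and determinant. The sketch below is an exact computation in plain Python, not the randomized SVD used by the wrapped class, and the function name is hypothetical.

```python
import math

def explained_variance_ratio_2d(rows):
    """Explained variance ratios for two-feature data, largest first."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / float(n)
    mean_y = sum(r[1] for r in rows) / float(n)
    # Entries of the (unnormalized) 2x2 covariance matrix.
    sxx = sum((r[0] - mean_x) ** 2 for r in rows)
    syy = sum((r[1] - mean_y) ** 2 for r in rows)
    sxy = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from trace and determinant.
    half_trace = (sxx + syy) / 2.0
    det = sxx * syy - sxy ** 2
    gap = math.sqrt(max(half_trace ** 2 - det, 0.0))
    eigenvalues = [half_trace + gap, half_trace - gap]
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

X = [[-1.0, -1.0], [-2.0, -1.0], [-3.0, -2.0],
     [1.0, 1.0], [2.0, 1.0], [3.0, 2.0]]
ratios = explained_variance_ratio_2d(X)
```

On this data the ratios agree with the values printed in the example above to the displayed precision.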

See also

PCA, ProbabilisticPCA

Notes

References:

  • Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions Halko, et al., 2009 (arXiv:0909.4061)
  • A randomized algorithm for the decomposition of matrices Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert

Full API documentation: RandomizedPCAScikitsLearnNode

class mdp.nodes.KNeighborsClassifierScikitsLearnNode

Classifier implementing the k-nearest neighbors vote.

This node has been automatically generated by wrapping the sklearn.neighbors.classification.KNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_neighbors : int, optional (default = 5)
Number of neighbors to use by default for k_neighbors() queries.
weights : str or callable

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use scipy.spatial.cKDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=2)
>>> neigh.fit(X, y) 
KNeighborsClassifier(...)
>>> print neigh.predict([[1.5]])
[0]
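
The effect of the weights option can be sketched with a brute-force search in plain Python. The function knn_predict is hypothetical, and its tie-breaking rule (smallest label wins) is an arbitrary choice made for this illustration.

```python
def knn_predict(train_X, train_y, query, n_neighbors=5, weights="uniform"):
    """Classify one query point by a (possibly weighted) neighbor vote."""
    # Euclidean distance from the query to every training point.
    distances = [
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
        for x, label in zip(train_X, train_y)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:n_neighbors]
    votes = {}
    for distance, label in nearest:
        if weights == "uniform":
            weight = 1.0
        elif distance == 0.0:
            return label  # an exact match decides the vote outright
        else:
            weight = 1.0 / distance  # 'distance' weighting
        votes[label] = votes.get(label, 0.0) + weight
    # Ties are broken in favour of the smallest label (arbitrary choice).
    return min(votes, key=lambda label: (-votes[label], label))

X = [[0.0], [1.0], [2.0]]
y = [0, 0, 1]
uniform_vote = knn_predict(X, y, [1.9], n_neighbors=3)
distance_vote = knn_predict(X, y, [1.9], n_neighbors=3, weights="distance")
```

With uniform weights the two points of class 0 outvote the single point of class 1, while with distance weights the very close point of class 1 wins.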

See also

RadiusNeighborsClassifier, KNeighborsRegressor, RadiusNeighborsRegressor, NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

References

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: KNeighborsClassifierScikitsLearnNode

class mdp.nodes.KNeighborsRegressorScikitsLearnNode

Regression based on k-nearest neighbors.

This node has been automatically generated by wrapping the sklearn.neighbors.regression.KNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Parameters

n_neighbors : int, optional (default = 5)
Number of neighbors to use by default for k_neighbors() queries.
weights : str or callable

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use scipy.spatial.cKDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y) 
KNeighborsRegressor(...)
>>> print neigh.predict([[1.5]])
[ 0.5]
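
With uniform weights, the prediction is simply the mean target of the nearest training points. A brute-force sketch in plain Python, with the hypothetical helper knn_regress:

```python
def knn_regress(train_X, train_y, query, n_neighbors=5):
    """Predict a target as the mean target of the nearest training points."""
    # Indices of the training points, ordered by squared distance to the query.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / float(len(nearest))

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
prediction = knn_regress(X, y, [1.5], n_neighbors=2)
```

For the query 1.5 the two nearest points are 1 and 2, with targets 0 and 1, so the prediction is their average.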

See also

NearestNeighbors, RadiusNeighborsRegressor, KNeighborsClassifier, RadiusNeighborsClassifier

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

References

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: KNeighborsRegressorScikitsLearnNode

class mdp.nodes.SparsePCAScikitsLearnNode

Sparse Principal Components Analysis (SparsePCA)

This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.SparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Parameters

n_components: int,
Number of sparse atoms to extract.
alpha: float,
Sparsity controlling parameter. Higher values lead to sparser components.
ridge_alpha: float,
Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.
max_iter: int,
Maximum number of iterations to perform.
tol: float,
Tolerance for the stopping condition.
method: {‘lars’, ‘cd’}
lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path). cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
n_jobs: int,
Number of parallel jobs to run.
U_init: array of shape (n_samples, n_atoms),
Initial values for the loadings for warm restart scenarios.
V_init: array of shape (n_atoms, n_features),
Initial values for the components for warm restart scenarios.

verbose:
Degree of verbosity of the printed output.
random_state: int or RandomState
Pseudo-random number generator state used for random sampling.

Attributes

components_: array, [n_components, n_features]
Sparse components extracted from the data.
error_: array
Vector of errors at each iteration.

See also

PCA

Full API documentation: SparsePCAScikitsLearnNode

class mdp.nodes.LDAScikitsLearnNode

Linear Discriminant Analysis (LDA)

This node has been automatically generated by wrapping the sklearn.lda.LDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_components: int
Number of components (< n_classes - 1)
priors : array, optional, shape = [n_classes]
Priors on classes

Attributes

means_ : array-like, shape = [n_classes, n_features]
Class means
xbar_ : float, shape = [n_features]
Overall mean
priors_ : array-like, shape = [n_classes]
Class priors (sum to 1)
covariance_ : array-like, shape = [n_features, n_features]
Covariance matrix (shared by all classes)

Examples

>>> import numpy as np
>>> from sklearn.lda import LDA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LDA()
>>> clf.fit(X, y)
LDA(n_components=None, priors=None)
>>> print clf.predict([[-0.8, -1]])
[1]

See also

QDA

Full API documentation: LDAScikitsLearnNode

class mdp.nodes.SGDClassifierScikitsLearnNode

Linear model fitted by minimizing a regularized empirical loss with SGD.

This node has been automatically generated by wrapping the sklearn.linear_model.stochastic_gradient.SGDClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

This implementation works with data represented as dense numpy arrays of floating point values for the features.

Parameters

loss : str, ‘hinge’ or ‘log’ or ‘modified_huber’
The loss function to be used. Defaults to ‘hinge’. The hinge loss is a margin loss used by standard linear SVM models. The ‘log’ loss is the loss of logistic regression models and can be used for probability estimation in binary classifiers. ‘modified_huber’ is another smooth loss that brings tolerance to outliers.
penalty : str, ‘l2’ or ‘l1’ or ‘elasticnet’
The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.
alpha : float
Constant that multiplies the regularization term. Defaults to 0.0001
rho : float
The Elastic Net mixing parameter, with 0 < rho <= 1. Defaults to 0.85.
fit_intercept: bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
n_iter: int, optional
The number of passes over the training data (aka epochs). Defaults to 5.
shuffle: bool, optional
Whether or not the training data should be shuffled after each epoch. Defaults to False.
seed: int, optional
The seed of the pseudo random number generator to use when shuffling the data.
verbose: integer, optional
The verbosity level
n_jobs: integer, optional
The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. -1 means ‘all CPUs’. Defaults to 1.
learning_rate : string, optional

The learning rate:

  • constant: eta = eta0
  • optimal: eta = 1.0/(t+t0) [default]
  • invscaling: eta = eta0 / pow(t, power_t)
eta0 : double
The initial learning rate [default 0.01].
power_t : double
The exponent for inverse scaling learning rate [default 0.25].
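The three learning_rate schedules listed above can be written out as a small function. This is a sketch only: the function name learning_rate is hypothetical, and t0 is treated as a free offset because this documentation does not say how the wrapped class chooses it.

```python
def learning_rate(schedule, t, eta0=0.01, power_t=0.25, t0=1.0):
    """Return eta at step t (t >= 1) for the three documented schedules.

    t0 is a hypothetical offset; its real value is chosen by the library.
    """
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (t + t0)
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError("unknown schedule: %r" % (schedule,))


# With power_t = 0.25 the rate halves between t = 1 and t = 16.
etas = [learning_rate("invscaling", t) for t in (1, 16, 81)]
```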

Attributes

coef_ : array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]
Weights assigned to the features.
intercept_ : array, shape = [1] if n_classes == 2 else [n_classes]
Constants in decision function.

Examples

>>> import numpy as np
>>> from sklearn import linear_model
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> clf = linear_model.SGDClassifier()
>>> clf.fit(X, Y)
SGDClassifier(alpha=0.0001, eta0=0.0, fit_intercept=True,
       learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1,
       penalty='l2', power_t=0.5, rho=1.0, seed=0, shuffle=False,
       verbose=0)
>>> print clf.predict([[-0.8, -1]])
[ 1.]

See also

LinearSVC, LogisticRegression

Full API documentation: SGDClassifierScikitsLearnNode

class mdp.nodes.MiniBatchSparsePCAScikitsLearnNode

Mini-batch Sparse Principal Components Analysis

This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.MiniBatchSparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Parameters

n_components: int,
number of sparse atoms to extract
alpha: int,
Sparsity controlling parameter. Higher values lead to sparser components.
ridge_alpha: float,
Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.
n_iter: int,
number of iterations to perform for each mini batch
callback: callable,
callable that gets invoked every five iterations
chunk_size: int,
the number of features to take in each mini batch

verbose:
degree of output the procedure will print
shuffle: boolean,
whether to shuffle the data before splitting it in batches
n_jobs: int,
number of parallel jobs to run, or -1 to autodetect.
method: {‘lars’, ‘cd’}

  • lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)
  • cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso)

Lars will be faster if the estimated components are sparse.
random_state: int or RandomState
Pseudo-random number generator state used for random sampling.

Full API documentation: MiniBatchSparsePCAScikitsLearnNode

class mdp.nodes.TfidfTransformerScikitsLearnNode

Transform a count matrix to a normalized tf or tf–idf representation

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Tf means term-frequency while tf–idf means term-frequency times inverse document-frequency. This is a common term weighting scheme in information retrieval, that has also found good use in document classification.

The goal of using tf–idf instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.

In the SMART notation used in IR, this class implements several tf–idf variants. Tf is always “n” (natural), idf is “t” iff use_idf is given, “n” otherwise, and normalization is “c” iff norm=’l2’, “n” iff norm=None.

Parameters

norm : ‘l1’, ‘l2’ or None, optional
Norm used to normalize term vectors. None for no normalization.
use_idf : boolean, optional
Enable inverse-document-frequency reweighting.
smooth_idf : boolean, optional
Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.
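The interplay of use_idf, smooth_idf and norm can be sketched on a dense count matrix. This is plain Python, not the wrapped class; the function name tfidf_rows is hypothetical, and the idf formula log(n_docs / df) + 1 is an assumption made for this sketch, since the text above does not spell it out.

```python
import math


def tfidf_rows(counts, use_idf=True, smooth_idf=True, norm="l2"):
    """Weight a dense count matrix (list of rows) by idf and normalize rows.

    The idf formula log(n_docs / df) + 1 is an assumption of this sketch.
    Without smoothing, a term that occurs in no document divides by zero.
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    if smooth_idf:
        # Pretend one extra document contains every term exactly once.
        df = [d + 1 for d in df]
        n_docs += 1
    if use_idf:
        idf = [math.log(float(n_docs) / d) + 1.0 for d in df]
    else:
        idf = [1.0] * n_terms
    result = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        if norm == "l2":
            scale = math.sqrt(sum(v * v for v in weighted))
        elif norm == "l1":
            scale = sum(abs(v) for v in weighted)
        else:
            scale = 1.0
        result.append([v / scale if scale else 0.0 for v in weighted])
    return result


# The second term never occurs; smoothing keeps its idf finite.
counts = [[3, 0, 1], [2, 0, 0], [3, 0, 0]]
tfidf = tfidf_rows(counts)
```

In the first row the rare third term is weighted up relative to its raw count, while the term that appears in every document keeps an idf of 1.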

References

R. Baeza-Yates and B. Ribeiro-Neto (2011). Modern Information Retrieval. Addison Wesley, pp. 68–74.

C.D. Manning, H. Schütze and P. Raghavan (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 121–125.

Full API documentation: TfidfTransformerScikitsLearnNode

class mdp.nodes.PCAScikitsLearnNode

Principal component analysis (PCA)

This node has been automatically generated by wrapping the sklearn.decomposition.pca.PCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and does not scale to high-dimensional data.

The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.

Parameters

n_components: int, None or string

Number of components to keep. If n_components is not set, all components are kept:

  • n_components == min(n_samples, n_features)

If n_components == ‘mle’, Minka’s MLE is used to guess the dimension.

If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the fraction specified by n_components.
copy: bool
If False, data passed to fit are overwritten
whiten: bool, optional

When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
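The rule for a fractional n_components described above can be sketched as a cumulative sum over explained variance ratios. The function name components_for_fraction is hypothetical, and the ratios are assumed to be sorted in decreasing order; the wrapped class may implement the selection differently.

```python
def components_for_fraction(explained_variance_ratio, fraction):
    """Return the smallest k whose leading k ratios sum to more than fraction.

    Ratios are assumed to be sorted in decreasing order and to sum to 1.0.
    """
    total = 0.0
    for k, ratio in enumerate(explained_variance_ratio, 1):
        total += ratio
        if total > fraction:
            return k
    return len(explained_variance_ratio)


# Ratios of a two-dimensional data set dominated by one direction.
ratios = [0.99244289, 0.00755711]
k_90 = components_for_fraction(ratios, 0.90)    # one component is enough
k_999 = components_for_fraction(ratios, 0.999)  # both components are needed
```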

Attributes

components_: array, [n_components, n_features]
Components with maximum variance.
explained_variance_ratio_: array, [n_components]
Percentage of variance explained by each of the selected components. If n_components is not set, all components are stored and the sum of explained variances is equal to 1.0.

Notes

For n_components=’mle’, this class uses the method of Thomas P. Minka:

Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604

Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
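The sign ambiguity mentioned above follows from the fact that if v is a principal direction, so is -v. The sketch below checks this on a hand-picked 2x2 covariance matrix in plain Python; the matrix, the point and the helper matvec are illustrative assumptions and do not involve the wrapped class.

```python
import math


def matvec(matrix, vector):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


cov = [[2.0, 1.0], [1.0, 2.0]]     # toy covariance matrix
v = [1.0 / math.sqrt(2.0)] * 2     # unit eigenvector with eigenvalue 3
v_flipped = [-c for c in v]        # same axis, opposite sign

# Both vectors satisfy cov * v = 3 * v, so both are valid components,
# but the projections of a data point onto them have opposite signs.
point = [1.0, 3.0]
proj = sum(p * c for p, c in zip(point, v))
proj_flipped = sum(p * c for p, c in zip(point, v_flipped))
```

Data transformed with one fitted estimator is therefore only comparable with other data transformed by that same estimator.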

Examples

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print pca.explained_variance_ratio_
[ 0.99244289  0.00755711]

See also

ProbabilisticPCA, RandomizedPCA

Full API documentation: PCAScikitsLearnNode

class mdp.nodes.RFECVScikitsLearnNode

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.

This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFECV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

estimator : object

A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. The first dimension of the coef_ array must be equal to the number of features of the input dataset of the estimator. Important features must correspond to high absolute values in the coef_ array.

For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.

step : int or float, optional (default=1)
If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.
cv : int or cross-validation generator, optional (default=None)
If int, it is the number of folds. If None, 3-fold cross-validation is performed by default. Specific cross-validation objects can also be passed, see the sklearn.cross_validation module for details.
loss_function : function, optional (default=None)
The loss function to minimize by cross-validation. If None, then the score function of the estimator is maximized.
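How the step parameter translates into features removed per iteration can be sketched as follows. Both function names are hypothetical, and removing at least one feature when the rounded percentage is zero is an assumption made here so that the elimination always progresses; a real run would also refit the estimator after every round.

```python
def features_to_remove(step, n_features):
    """Translate the step parameter into a number of features per iteration."""
    if step >= 1:
        return int(step)
    if 0.0 < step < 1.0:
        # Percentage of the features, rounded down. Removing at least one
        # feature is an assumption of this sketch, not documented behaviour.
        return max(1, int(step * n_features))
    raise ValueError("step must be positive")


def elimination_schedule(n_features, step, n_features_to_keep=1):
    """List how many features remain after each elimination round."""
    per_round = features_to_remove(step, n_features)
    remaining = n_features
    schedule = [remaining]
    while remaining > n_features_to_keep:
        remaining = max(n_features_to_keep, remaining - per_round)
        schedule.append(remaining)
    return schedule


schedule_int = elimination_schedule(10, step=3)      # three features per round
schedule_frac = elimination_schedule(10, step=0.25)  # 25% of 10, rounded down
```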

Attributes

n_features_ : int
The number of selected features with cross-validation.
support_ : array of shape [n_features]
The mask of selected features.
ranking_ : array of shape [n_features]
The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.
cv_scores_: array of shape [n_subsets_of_features]
The cross-validation scores such that cv_scores_[i] corresponds to the CV score of the i-th subset of features.

Examples

The following example shows how to retrieve the 5 informative features, which are not known a priori, in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_ 
array([ True,  True,  True,  True,  True,
        False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

References

[1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

Full API documentation: RFECVScikitsLearnNode

class mdp.nodes.LassoCVScikitsLearnNode

Lasso linear model with iterative fitting along a regularization path

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The best model is selected by cross-validation.

Parameters

eps : float, optional
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
n_alphas : int, optional
Number of alphas along the regularization path
alphas : numpy array, optional
List of alphas where to compute the models. If None alphas are set automatically
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as argument.
max_iter: int, optional
The maximum number of iterations
tol: float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
cv : integer or cross-validation generator, optional
If an integer is passed, it is the number of folds (default 3). Specific cross-validation objects can be passed, see the sklearn.cross_validation module for the list of possible objects.
verbose : bool or integer
amount of verbosity
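The relation between eps, n_alphas and the resulting path can be sketched as a grid of alphas. The function name alpha_grid is hypothetical; logarithmic spacing and a caller-supplied alpha_max are assumptions of this sketch, since the text above only states that alpha_min / alpha_max equals eps.

```python
import math


def alpha_grid(alpha_max, eps, n_alphas):
    """Return n_alphas values from alpha_max down to alpha_max * eps.

    Logarithmic spacing is an assumption of this sketch.
    """
    if n_alphas == 1:
        return [alpha_max]
    log_max = math.log10(alpha_max)
    log_min = math.log10(alpha_max * eps)
    step = (log_max - log_min) / (n_alphas - 1)
    return [10.0 ** (log_max - i * step) for i in range(n_alphas)]


# Roughly 2.0, 0.2, 0.02 and 0.002: the path spans three orders of magnitude.
alphas = alpha_grid(alpha_max=2.0, eps=1e-3, n_alphas=4)
```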

Notes

See examples/linear_model/lasso_path_with_crossvalidation.py for an example.

To avoid unnecessary memory duplication, the X argument of the fit method should be passed directly as a Fortran-contiguous numpy array.

Full API documentation: LassoCVScikitsLearnNode

class mdp.nodes.SelectFweScikitsLearnNode

Filter : Select the p-values corresponding to Family-wise error rate: a

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFwe class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: SelectFweScikitsLearnNode

class mdp.nodes.BayesianRidgeScikitsLearnNode

Bayesian ridge regression

This node has been automatically generated by wrapping the sklearn.linear_model.bayes.BayesianRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Fit a Bayesian ridge model and optimize the regularization parameters lambda (precision of the weights) and alpha (precision of the noise).
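The roles of the two precisions can be illustrated for a single weight without intercept. In this sketch alpha and lambda are held fixed rather than optimized, and the function name posterior_mean_weight is hypothetical; it shows only how a larger weight precision pulls the estimate towards zero.

```python
def posterior_mean_weight(xs, ys, alpha, lambda_):
    """Posterior mean of a single weight, no intercept.

    alpha is the precision of the noise, lambda_ the precision of the
    zero-mean Gaussian prior on the weight; both are held fixed here,
    whereas the wrapped class estimates them from the data.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return alpha * sxy / (alpha * sxx + lambda_)


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
# A nearly flat prior recovers the least-squares slope of about 1.
w_weak_prior = posterior_mean_weight(xs, ys, alpha=1.0, lambda_=1e-6)
# A strong prior shrinks the slope towards zero.
w_strong_prior = posterior_mean_weight(xs, ys, alpha=1.0, lambda_=5.0)
```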

Parameters

X : array, shape = (n_samples, n_features)
Training vectors.
y : array, shape = (length)
Target values for training vectors
n_iter : int, optional
Maximum number of iterations. Default is 300.
tol : float, optional
Stop the algorithm if w has converged. Default is 1.e-3.
alpha_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
alpha_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
lambda_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
lambda_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
compute_score : boolean, optional
If True, compute the objective function at each step of the model. Default is False.
fit_intercept : boolean, optional
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered). Default is True.
normalize : boolean, optional
If True, the regressors X are normalized. Default is False.
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
verbose : boolean, optional
Verbose mode when fitting the model. Default is False.

Attributes

coef_ : array, shape = (n_features)
Coefficients of the regression model (mean of distribution)
alpha_ : float
estimated precision of the noise.
lambda_ : array, shape = (n_features)
estimated precisions of the weights.
scores_ : float
if computed, value of the objective function (to be maximized)

Methods

fit(X, y) : self
Fit the model
predict(X) : array
Predict using the model.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.BayesianRidge()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
       normalize=False, overwrite_X=False, tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])

Notes

See examples/linear_model/plot_bayesian_ridge.py for an example.

Full API documentation: BayesianRidgeScikitsLearnNode

class mdp.nodes.RidgeCVScikitsLearnNode

Ridge regression with built-in cross-validation.

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation. Currently, only the n_features > n_samples case is handled efficiently.

Parameters

alphas: numpy array of shape [n_alpha]
Array of alpha values to try. Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
loss_func: callable, optional
Function that takes 2 arguments and compares them in order to evaluate the performance of prediction (small is good). If None is passed, the score of the estimator is maximized.
score_func: callable, optional
Function that takes 2 arguments and compares them in order to evaluate the performance of prediction (big is good). If None is passed, the score of the estimator is maximized.
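Choosing alpha by leave-one-out cross-validation can be sketched for a single feature without intercept. This sketch uses an explicit loop over held-out samples, not the efficient Generalized Cross-Validation of the wrapped class; the helper names and the toy data are hypothetical.

```python
def ridge_weight(xs, ys, alpha):
    """Closed-form ridge solution for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def loo_error(xs, ys, alpha):
    """Mean squared leave-one-out prediction error for a given alpha."""
    total = 0.0
    for i in range(len(xs)):
        # Fit on every sample except i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        w = ridge_weight(train_x, train_y, alpha)
        total += (ys[i] - w * xs[i]) ** 2
    return total / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
alphas = [0.01, 0.1, 1.0, 10.0]
errors = [loo_error(xs, ys, a) for a in alphas]
# Keep the alpha with the smallest held-out error.
best_alpha = alphas[errors.index(min(errors))]
```

On this nearly linear toy data a heavy penalty shrinks the slope too much, so the largest alpha has a clearly worse held-out error than the smallest one.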

See also

Ridge

Full API documentation: RidgeCVScikitsLearnNode

class mdp.nodes.GMMScikitsLearnNode

Gaussian Mixture Model

This node has been automatically generated by wrapping the sklearn.mixture.gmm.GMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.

Initializes parameters such that every mixture component has zero mean and identity covariance.

Parameters

n_components : int, optional
Number of mixture components. Defaults to 1.
cvtype : string (read-only), optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
rng : numpy.random object, optional
Must support the full numpy random number generator API.
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold.

Attributes

cvtype : string (read-only)
String describing the type of covariance parameters used by the GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
n_features : int
Dimensionality of the Gaussians.
n_states : int (read-only)
Number of mixture components.
weights : array, shape (n_states,)
Mixing weights for each mixture component.
means : array, shape (n_states, n_features)
Mean parameters for each mixture component.
covars : array

Covariance parameters for each mixture component. The shape depends on cvtype:

  • (n_states,) if ‘spherical’,
  • (n_features, n_features) if ‘tied’,
  • (n_states, n_features) if ‘diag’,
  • (n_states, n_features, n_features) if ‘full’
converged_ : bool
True when convergence was reached in fit(), False otherwise.

Methods

decode(X)
Find most likely mixture components for each point in X.
eval(X)
Compute the log likelihood of X under the model and the posterior distribution over mixture components.
fit(X)
Estimate model parameters from X using the EM algorithm.
predict(X)
Like decode, find most likely mixture components for each observation in X.
rvs(n=1, random_state=None)
Generate n samples from the model.
score(X)
Compute the log likelihood of X under the model.
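What score and predict compute can be sketched for a one-dimensional mixture in plain Python. The function names mirror the methods above but are standalone helpers written for this illustration, and the mixture parameters are hand-picked rather than fitted.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_log_probs(x, weights, means, variances):
    """Log of weight * density for every mixture component."""
    return [math.log(w) + log_gaussian(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def score(x, weights, means, variances):
    """Log likelihood of x under the mixture (log-sum-exp over components)."""
    logs = component_log_probs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def predict(x, weights, means, variances):
    """Index of the most likely mixture component for x."""
    logs = component_log_probs(x, weights, means, variances)
    return logs.index(max(logs))


# Hand-picked parameters: a heavy mode at 10 and a lighter mode at 0.
weights, means, variances = [0.75, 0.25], [10.0, 0.0], [1.0, 1.0]
predictions = [predict(x, weights, means, variances) for x in (0, 2, 9, 10)]
```

Points near 0 are assigned to the second component and points near 10 to the first.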

See also

DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm

VBGMM : Finite Gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.

Examples

>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.75,  0.25])
>>> np.round(g.means, 2)
array([[ 10.05],
       [  0.06]])
>>> np.round(g.covars, 2) 
array([[[ 1.02]],
       [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] +  20 * [[10]])
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.5,  0.5])

Full API documentation: GMMScikitsLearnNode

class mdp.nodes.PatchExtractorScikitsLearnNode

Extracts patches from a collection of images

This node has been automatically generated by wrapping the sklearn.feature_extraction.image.PatchExtractor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

patch_size: tuple of ints (patch_height, patch_width)
the dimensions of one patch
max_patches: integer or float, optional default is None
The maximum number of patches per image to extract. If max_patches is a float in (0, 1), it is taken to mean a proportion of the total number of patches.
random_state: int or RandomState
Pseudo-random number generator state used for random sampling.
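How max_patches limits the number of patches per image can be sketched as below. The function name patches_per_image is hypothetical; counting every fully contained patch position and capping an integer max_patches at that total are assumptions of this sketch.

```python
def patches_per_image(image_shape, patch_size, max_patches=None):
    """Number of patches extracted from one image.

    Assumes every fully contained patch position counts (one-pixel stride).
    """
    image_height, image_width = image_shape
    patch_height, patch_width = patch_size
    total = (image_height - patch_height + 1) * (image_width - patch_width + 1)
    if max_patches is None:
        return total
    if isinstance(max_patches, float) and 0.0 < max_patches < 1.0:
        # A float in (0, 1) is a proportion of the total number of patches.
        return int(max_patches * total)
    # An integer is an upper bound; capping at total is an assumption.
    return min(int(max_patches), total)


n_all = patches_per_image((8, 8), (3, 3))                     # 6 * 6 positions
n_half = patches_per_image((8, 8), (3, 3), max_patches=0.5)   # half of them
n_ten = patches_per_image((8, 8), (3, 3), max_patches=10)     # at most ten
```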

Full API documentation: PatchExtractorScikitsLearnNode

class mdp.nodes.ARDRegressionScikitsLearnNode

Bayesian ARD regression.

This node has been automatically generated by wrapping the sklearn.linear_model.bayes.ARDRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to follow Gaussian distributions. The parameters lambda (precisions of the distributions of the weights) and alpha (precision of the distribution of the noise) are also estimated. The estimation is done by an iterative procedure (Evidence Maximization).

Parameters

X : array, shape = (n_samples, n_features)
Training vectors.
y : array, shape = (n_samples)
Target values for training vectors
n_iter : int, optional
Maximum number of iterations. Default is 300.
tol : float, optional
Stop the algorithm if w has converged. Default is 1.e-3.
alpha_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
alpha_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
lambda_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
lambda_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
compute_score : boolean, optional
If True, compute the objective function at each step of the model. Default is False.
threshold_lambda : float, optional
Threshold for removing (pruning) weights with high precision from the computation. Default is 1.e+4.
fit_intercept : boolean, optional
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered). Default is True.
normalize : boolean, optional
If True, the regressors X are normalized.
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
verbose : boolean, optional
Verbose mode when fitting the model. Default is False.

Attributes

coef_ : array, shape = (n_features)
Coefficients of the regression model (mean of distribution)
alpha_ : float
estimated precision of the noise.
lambda_ : array, shape = (n_features)
estimated precisions of the weights.
sigma_ : array, shape = (n_features, n_features)
estimated variance-covariance matrix of the weights
scores_ : float
if computed, value of the objective function (to be maximized)

Methods

fit(X, y) : self
Fit the model
predict(X) : array
Predict using the model.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.ARDRegression()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
       normalize=False, overwrite_X=False, threshold_lambda=10000.0,
       tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])

Notes

See examples/linear_model/plot_ard.py for an example.

Full API documentation: ARDRegressionScikitsLearnNode

class mdp.nodes.GenericUnivariateSelectScikitsLearnNode

Full API documentation: GenericUnivariateSelectScikitsLearnNode

class mdp.nodes.BernoulliNBScikitsLearnNode

Naive Bayes classifier for multivariate Bernoulli models.

This node has been automatically generated by wrapping the sklearn.naive_bayes.BernoulliNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

Note: this class does not check whether features are actually boolean.

Parameters

alpha: float, optional (default=1.0)
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
binarize: float or None, optional
Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.
fit_prior: boolean
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
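The effect of binarize and alpha can be sketched in plain Python. Both helper names are hypothetical, and the smoothed estimate (count + alpha) / (n_c + 2 * alpha) is the usual textbook form for binary features, assumed here without checking it against the wrapped class.

```python
import math


def binarize(rows, threshold):
    """Map every feature to 1 if it exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]


def feature_log_prob(rows, labels, target, alpha=1.0):
    """Smoothed log P(x_i = 1 | y = target) for every feature i.

    The denominator n_c + 2 * alpha (two outcomes per binary feature) is
    an assumption of this sketch.
    """
    selected = [row for row, label in zip(rows, labels) if label == target]
    n_c = len(selected)
    n_features = len(rows[0])
    return [
        math.log((sum(row[i] for row in selected) + alpha) / (n_c + 2.0 * alpha))
        for i in range(n_features)
    ]


# Real-valued inputs are thresholded at 0.0 before counting.
X = binarize([[0.2, 1.5], [0.0, 3.0], [2.0, 0.0]], threshold=0.0)
log_prob = feature_log_prob(X, labels=[1, 1, 2], target=1, alpha=1.0)
```

With alpha=1.0, a feature seen in both samples of class 1 gets probability 3/4 rather than 1, so unseen combinations never receive zero probability.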

Methods

fit(X, y) : self
Fit the model
predict(X) : array
Predict using the model.
predict_proba(X) : array
Predict the probability of each class using the model.
predict_log_proba(X) : array
Predict the log probability of each class using the model.

Attributes

class_log_prior_ : array, shape = [n_classes]
Log probability of each class (smoothed).
feature_log_prob_ : array, shape = [n_classes, n_features]
Empirical log probability of features given a class, P(x_i|y).

Examples

>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True)
>>> print clf.predict(X[2])
[3]

References

C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234–265.

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41–48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Full API documentation: BernoulliNBScikitsLearnNode

class mdp.nodes.LarsScikitsLearnNode

Least Angle Regression model a.k.a. LAR

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.Lars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_nonzero_coefs : int, optional
Target number of non-zero coefficients. Use np.inf for no limit.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as argument.
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation formula)
intercept_ : float
independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.Lars(n_nonzero_coefs=1)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111]) 
Lars(eps=..., fit_intercept=True, n_nonzero_coefs=1,
   normalize=True, overwrite_X=False, precompute='auto', verbose=False)
>>> print clf.coef_ 
[ 0. -1.11...]

References

http://en.wikipedia.org/wiki/Least_angle_regression

See also

lars_path, LassoLARS, LarsCV, LassoLarsCV, decomposition.sparse_encode, decomposition.sparse_encode_parallel

Full API documentation: LarsScikitsLearnNode

class mdp.nodes.SelectKBestScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectKBest class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: SelectKBestScikitsLearnNode

class mdp.nodes.OneClassSVMScikitsLearnNode

Unsupervised Outlier Detection.

This node has been automatically generated by wrapping the sklearn.svm.classes.OneClassSVM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Estimate the support of a high-dimensional distribution.

Parameters

kernel : string, optional
Specifies the kernel type to be used in the algorithm. Can be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given ‘rbf’ will be used.
nu : float, optional
An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.
degree : int, optional
Degree of kernel function. Significant only in poly, rbf, sigmoid.
gamma : float, optional (default=0.0)
Kernel coefficient for rbf and poly. If gamma is 0.0, then 1/n_features will be taken.
coef0 : float, optional
Independent term in kernel function. It is only significant in poly/sigmoid.
tol: float, optional
Tolerance for stopping criterion.
shrinking: boolean, optional
Whether to use the shrinking heuristic.
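The gamma default described above can be made concrete with the standard RBF kernel exp(-gamma * ||x - y||^2). The helper rbf_kernel below is a standalone sketch written for this illustration and is not the routine used by the wrapped class.

```python
import math


def rbf_kernel(x, y, gamma=0.0):
    """RBF kernel exp(-gamma * ||x - y||^2) with the documented gamma default."""
    if gamma == 0.0:
        # As documented above, a gamma of 0.0 stands for 1 / n_features.
        gamma = 1.0 / len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points give 1.0
k_far = rbf_kernel([0.0, 0.0], [2.0, 0.0])    # gamma = 1/2, distance^2 = 4
```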

Attributes

support_ : array-like, shape = [n_SV]
Index of support vectors.
support_vectors_ : array-like, shape = [n_SV, n_features]
Support vectors.
dual_coef_ : array, shape = [n_classes-1, n_SV]
Coefficient of the support vector in the decision function.
coef_ : array, shape = [n_classes-1, n_features]
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of linear kernel.
intercept_ : array, shape = [n_classes-1]
Constants in decision function.

Full API documentation: OneClassSVMScikitsLearnNode

class mdp.nodes.LogisticRegressionScikitsLearnNode

Logistic Regression.

This node has been automatically generated by wrapping the sklearn.linear_model.logistic.LogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Implements L1 and L2 regularized logistic regression.

Parameters

penalty : string, ‘l1’ or ‘l2’
Used to specify the norm used in the penalization
dual : boolean
Dual or primal formulation. Dual formulation is only implemented for l2 penalty.
C : float
Specifies the strength of the regularization. The smaller it is, the stronger the regularization.
fit_intercept : bool, default: True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
intercept_scaling : float, default: 1
When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
tol: float, optional
tolerance for stopping criteria
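
The intercept_scaling description can be illustrated with a small sketch. It assumes a plain linear decision value w . [x, intercept_scaling]; the helper names below are hypothetical and only mirror the wording of the parameter description, not the wrapped implementation.

```python
def augment(x, intercept_scaling=1.0):
    """Append the constant "synthetic" feature to the instance vector."""
    return list(x) + [intercept_scaling]


def decision_value(weights, x, intercept_scaling=1.0):
    """Linear decision value; the last weight belongs to the synthetic feature."""
    x_aug = augment(x, intercept_scaling)
    if len(weights) != len(x_aug):
        raise ValueError("weights must have len(x) + 1 entries")
    return sum(w * v for w, v in zip(weights, x_aug))


def effective_intercept(weights, intercept_scaling=1.0):
    """Intercept = intercept_scaling * synthetic feature weight."""
    return intercept_scaling * weights[-1]


weights = [0.5, -1.0, 0.2]          # two feature weights + synthetic weight
value = decision_value(weights, [2.0, 1.0], intercept_scaling=10.0)
bias = effective_intercept(weights, intercept_scaling=10.0)
```

A larger intercept_scaling lets a small (and therefore lightly penalized) synthetic weight produce the same intercept.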

Attributes

coef_ : array, shape = [n_classes-1, n_features]
Coefficient of the features in the decision function.
intercept_ : array, shape = [n_classes-1]
Intercept (a.k.a. bias) added to the decision function. It is available only when the parameter fit_intercept is set to True.

See also

LinearSVC

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to get slightly different results for the same input data. If that happens, try a smaller tol parameter.

References

LIBLINEAR – A Library for Large Linear Classification http://www.csie.ntu.edu.tw/~cjlin/liblinear/

Full API documentation: LogisticRegressionScikitsLearnNode

class mdp.nodes.SVRScikitsLearnNode

epsilon-Support Vector Regression.

This node has been automatically generated by wrapping the sklearn.svm.classes.SVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The free parameters in the model are C and epsilon.

Parameters

C : float, optional (default=1.0)
penalty parameter C of the error term.
epsilon : float, optional (default=0.1)
epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
kernel : string, optional (default=’rbf’)
Specifies the kernel type to be used in the algorithm. one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given ‘rbf’ will be used.
degree : int, optional (default=3)
Degree of the kernel function. Significant only for the ‘poly’ kernel.
gamma : float, optional (default=0.0)
kernel coefficient for rbf and poly, if gamma is 0.0 then 1/n_features will be taken.
coef0 : float, optional (default=0.0)
independent term in kernel function. It is only significant in poly/sigmoid.
probability: boolean, optional (default=False)
Whether to enable probability estimates. This must be enabled prior to calling prob_predict.
shrinking: boolean, optional (default=True)
Whether to use the shrinking heuristic.
tol: float, optional (default=1e-3)
Tolerance for stopping criterion.
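
The epsilon-tube mentioned for the epsilon parameter corresponds to the epsilon-insensitive loss. The sketch below shows that loss for a single prediction; the function name is hypothetical and the snippet does not reproduce the wrapped solver.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


inside_tube = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)   # no penalty
outside_tube = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)   # penalized
```

Predictions within epsilon of the target contribute nothing to the training loss; only the part of the error exceeding epsilon is penalized.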

Attributes

support_ : array-like, shape = [n_SV]
Index of support vectors.
support_vectors_ : array-like, shape = [nSV, n_features]
Support vectors.
dual_coef_ : array, shape = [n_classes-1, n_SV]
Coefficients of the support vector in the decision function.
coef_ : array, shape = [n_classes-1, n_features]
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of linear kernel.
intercept_ : array, shape = [n_class * (n_class-1) / 2]
Constants in decision function.

Examples

>>> from sklearn.svm import SVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = SVR(C=1.0, epsilon=0.2)
>>> clf.fit(X, y)
SVR(C=1.0, coef0=0.0, degree=3, epsilon=0.2, gamma=0.2, kernel='rbf',
  probability=False, shrinking=True, tol=0.001)

See also

NuSVR

Full API documentation: SVRScikitsLearnNode

class mdp.nodes.NuSVCScikitsLearnNode

NuSVC for sparse matrices (csr).

This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

See sklearn.svm.NuSVC for a complete list of parameters

Notes

For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm.sparse import NuSVC
>>> clf = NuSVC()
>>> clf.fit(X, y)
NuSVC(coef0=0.0, degree=3, gamma=0.5, kernel='rbf', nu=0.5, probability=False,
   shrinking=True, tol=0.001)
>>> print(clf.predict([[-0.8, -1]]))
[ 1.]

Full API documentation: NuSVCScikitsLearnNode

class mdp.nodes.GaussianProcessScikitsLearnNode

The Gaussian Process model class.

This node has been automatically generated by wrapping the sklearn.gaussian_process.gaussian_process.GaussianProcess class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

regr : string or callable, optional

A regression function returning an array of outputs of the linear regression functional basis. The number of observations n_samples should be greater than the size p of this basis. Default assumes a simple constant regression trend. Here is the list of built-in regression models:

  • ‘constant’, ‘linear’, ‘quadratic’
corr : string or callable, optional

A stationary autocorrelation function returning the autocorrelation between two points x and x’. Default assumes a squared-exponential autocorrelation model. Here is the list of built-in correlation models:

  • ‘absolute_exponential’, ‘squared_exponential’,
  • ‘generalized_exponential’, ‘cubic’, ‘linear’
beta0 : double array_like, optional
The regression weight vector to perform Ordinary Kriging (OK). Default assumes Universal Kriging (UK) so that the vector beta of regression weights is estimated using the maximum likelihood principle.
storage_mode : string, optional
A string specifying whether the Cholesky decomposition of the correlation matrix should be stored in the class (storage_mode = ‘full’) or not (storage_mode = ‘light’). Default assumes storage_mode = ‘full’, so that the Cholesky decomposition of the correlation matrix is stored. This might be a useful parameter when one is not interested in the MSE and only plans to estimate the BLUP, for which the correlation matrix is not required.
verbose : boolean, optional
A boolean specifying the verbose level. Default is verbose = False.
theta0 : double array_like, optional
An array with shape (n_features, ) or (1, ). The parameters in the autocorrelation model. If thetaL and thetaU are also specified, theta0 is considered as the starting point for the maximum likelihood estimation of the best set of parameters. Default assumes isotropic autocorrelation model with theta0 = 1e-1.
thetaL : double array_like, optional
An array with shape matching theta0’s. Lower bound on the autocorrelation parameters for maximum likelihood estimation. Default is None, so that it skips maximum likelihood estimation and it uses theta0.
thetaU : double array_like, optional
An array with shape matching theta0’s. Upper bound on the autocorrelation parameters for maximum likelihood estimation. Default is None, so that it skips maximum likelihood estimation and it uses theta0.
normalize : boolean, optional
Input X and observations y are centered and reduced wrt means and standard deviations estimated from the n_samples observations provided. Default is normalize = True so that data is normalized to ease maximum likelihood estimation.
nugget : double, optional
Introduce a nugget effect to allow smooth predictions from noisy data. Default assumes a nugget close to machine precision for the sake of robustness (nugget = 10. * MACHINE_EPSILON).
optimizer : string, optional

A string specifying the optimization algorithm to be used. Default uses ‘fmin_cobyla’ algorithm from scipy.optimize. Here is the list of available optimizers:

  • ‘fmin_cobyla’, ‘Welch’

The ‘Welch’ optimizer is due to Welch et al., see reference [2]. It consists in iterating over several one-dimensional optimizations instead of running one single multi-dimensional optimization.

random_start : int, optional
The number of times the Maximum Likelihood Estimation should be performed from a random starting point. The first MLE always uses the specified starting point (theta0), the next starting points are picked at random according to an exponential distribution (log-uniform on [thetaL, thetaU]). Default does not use random starting point (random_start = 1).
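
The random_start behaviour can be sketched for a scalar theta as follows. This is a hedged illustration of "log-uniform on [thetaL, thetaU]" using only the standard library; the function name and the seed argument are assumptions, not the wrapped implementation.

```python
import math
import random


def starting_points(theta0, thetaL, thetaU, random_start=1, seed=0):
    """Return the MLE starting points: theta0 first, then log-uniform draws."""
    rng = random.Random(seed)
    points = [theta0]  # the first MLE always starts from theta0
    log_low, log_high = math.log10(thetaL), math.log10(thetaU)
    for _ in range(random_start - 1):
        # Uniform in log space, i.e. log-uniform on [thetaL, thetaU].
        points.append(10.0 ** rng.uniform(log_low, log_high))
    return points


starts = starting_points(theta0=0.1, thetaL=1e-3, thetaU=1.0, random_start=5)
```

With random_start = 1 only theta0 is used, which matches the default described above.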

Example

>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcess
>>> X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
>>> y = (X * np.sin(X)).ravel()
>>> gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1.)
>>> gp.fit(X, y) 
GaussianProcess(beta0=None, corr=...,
        normalize=..., nugget=...,
        ...

Implementation details

The present implementation is based on a translation of the DACE Matlab toolbox, see reference [1].

References

[1] S.N. Lophaven, H.B. Nielsen and J. Sondergaard (2002).
DACE - A MATLAB Kriging Toolbox. http://www2.imm.dtu.dk/~hbn/dace/dace.pdf
[2] W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, and M.D.
Morris (1992). Screening, predicting, and computer experiments. Technometrics, 34(1) 15–25. http://www.jstor.org/pss/1269548

Full API documentation: GaussianProcessScikitsLearnNode

class mdp.nodes.ElasticNetScikitsLearnNode

Linear Model trained with L1 and L2 prior as regularizer

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

rho=1 is the lasso penalty. Currently, rho <= 0.01 is not reliable, unless you supply your own sequence of alpha.

Parameters

alpha : float
Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter.
rho : float
The ElasticNet mixing parameter, with 0 < rho <= 1. For rho = 0 the penalty is an L2 penalty. For rho = 1 it is an L1 penalty. For 0 < rho < 1, the penalty is a combination of L1 and L2.
fit_intercept: bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: int, optional
The maximum number of iterations
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.
tol: float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

Notes

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the penalty is:

alpha*rho*L1 + alpha*(1-rho)*L2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a*L1 + b*L2

for:

alpha = a + b and rho = a/(a+b)
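
The conversion between the two parameterizations can be written out directly. The sketch below follows the formulas in the notes above; the function names are hypothetical.

```python
def to_alpha_rho(a, b):
    """Map separate penalty weights a*L1 + b*L2 to (alpha, rho)."""
    if a + b <= 0:
        raise ValueError("a + b must be positive")
    alpha = a + b
    rho = a / (a + b)
    return alpha, rho


def to_l1_l2_weights(alpha, rho):
    """Inverse mapping: alpha*rho*L1 + alpha*(1 - rho)*L2."""
    return alpha * rho, alpha * (1.0 - rho)


alpha, rho = to_alpha_rho(0.3, 0.1)
a, b = to_l1_l2_weights(alpha, rho)
```

Here a = 0.3 and b = 0.1 give alpha = 0.4 and rho = 0.75, and mapping back recovers the original weights up to floating-point rounding.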

Full API documentation: ElasticNetScikitsLearnNode

class mdp.nodes.NeighborsRegressorScikitsLearnNode

Regression based on nearest neighbors. (Deprecated)

This node has been automatically generated by wrapping the sklearn.neighbors.regression.NeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

DEPRECATED IN VERSION 0.9; WILL BE REMOVED IN VERSION 0.11 Please use KNeighborsRegressor or RadiusNeighborsRegressor instead.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set. Samples used for the regression are either the k-nearest points, or all points within some fixed radius.
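
As a rough illustration of the k-nearest variant, the following sketch averages the targets of the k closest training points using a brute-force search. It assumes Euclidean distance and uniform averaging; knn_regress is a hypothetical name, not the wrapped class.

```python
import math


def knn_regress(X, y, query, n_neighbors=5):
    """Predict by averaging the targets of the n_neighbors closest samples."""
    distances = []
    for row, target in zip(X, y):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query)))
        distances.append((d, target))
    distances.sort()                       # brute-force nearest-neighbor search
    nearest = distances[:n_neighbors]
    return sum(target for _, target in nearest) / len(nearest)


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
prediction = knn_regress(X, y, [1.5], n_neighbors=2)
```

For the query 1.5 the two nearest samples are 1 and 2, with targets 0 and 1, so the prediction is their mean.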

Parameters

n_neighbors : int, optional (default = 5)
Number of neighbors to use by default for k_neighbors() queries.
radius : float, optional (default = 1.0)
Range of parameter space to use by default for radius_neighbors() queries.
algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use scipy.spatial.cKDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
classification_type : {‘knn_vote’, ‘radius_vote’}, optional
Type of fit to use: ‘knn_vote’ specifies a k-NN classification. ‘radius_vote’ specifies a r-NN classification. Default is ‘knn_vote’.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import NeighborsRegressor
>>> neigh = NeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
NeighborsRegressor(algorithm='auto', classification_type='knn_vote',
          leaf_size=30, n_neighbors=2, radius=1.0)
>>> print(neigh.predict([[1.5]]))
[ 0.5]

See also

NearestNeighbors KNeighborsRegressor RadiusNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

References

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: NeighborsRegressorScikitsLearnNode

class mdp.nodes.OrthogonalMatchingPursuitScikitsLearnNode

Orthogonal Matching Pursuit model (OMP)

This node has been automatically generated by wrapping the sklearn.linear_model.omp.OrthogonalMatchingPursuit class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_nonzero_coefs: int, optional
Desired number of non-zero entries in the solution. If None (by default) this value is set to 10% of n_features.
tol: float, optional
Maximum norm of the residual. If not None, overrides n_nonzero_coefs.
fit_intercept: boolean, optional
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize: boolean, optional
If False, the regressors X are assumed to be already normalized.
precompute_gram: {True, False, ‘auto’},
Whether to use a precomputed Gram and Xy matrix to speed up calculations. Improves performance when n_targets or n_samples is very large. Note that if you already have such matrices, you can pass them directly to the fit method.
overwrite_X: bool,
Whether the design matrix X can be overwritten by the algorithm. This is only helpful if X is already Fortran-ordered, otherwise a copy is made anyway.
overwrite_gram: bool,
Whether the gram matrix can be overwritten by the algorithm. This is only helpful if it is already Fortran-ordered, otherwise a copy is made anyway.
overwrite_Xy: bool,
Whether the covariance vector Xy can be overwritten by the algorithm.

Attributes

coef_: array, shape = (n_features,) or (n_features, n_targets)
parameter vector (w in the formulation)
intercept_: float or array, shape =(n_targets,)
independent term in decision function.

Notes

Orthogonal matching pursuit was introduced in G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, Vol. 41, No. 12. (December 1993), pp. 3397-3415. (http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf)

This implementation is based on Rubinstein, R., Zibulevsky, M. and Elad, M., Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit Technical Report - CS Technion, April 2008. http://www.cs.technion.ac.il/~ronrubin/Publications/KSVX-OMP-v2.pdf

See also

orthogonal_mp orthogonal_mp_gram lars_path Lars LassoLars decomposition.sparse_encode decomposition.sparse_encode_parallel

Full API documentation: OrthogonalMatchingPursuitScikitsLearnNode

class mdp.nodes.PLSRegressionScikitsLearnNode

PLS regression

This node has been automatically generated by wrapping the sklearn.pls.PLSRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

PLSRegression inherits from PLS with mode=”A” and deflation_mode=”regression”. Also known as PLS2, or PLS in the case of a one-dimensional response.

Parameters

X: array-like of predictors, shape = [n_samples, p]
Training vectors, where n_samples is the number of samples and p is the number of predictors.
Y: array-like of response, shape = [n_samples, q]
Training vectors, where n_samples is the number of samples and q is the number of response variables.
n_components: int, (default 2)
Number of components to keep.
scale: boolean, (default True)
whether to scale the data
algorithm: string, “nipals” or “svd”
The algorithm used to estimate the weights. It will be called n_components times, i.e. once for each iteration of the outer loop.
max_iter: an integer, (default 500)
the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)
tol: non-negative real
Tolerance used in the iterative algorithm default 1e-06.
copy: boolean, default True
Whether the deflation should be done on a copy. Leave the default value of True unless you don’t care about side effects.

Attributes

x_weights_: array, [p, n_components]
X block weights vectors.
y_weights_: array, [q, n_components]
Y block weights vectors.
x_loadings_: array, [p, n_components]
X block loadings vectors.
y_loadings_: array, [q, n_components]
Y block loadings vectors.
x_scores_: array, [n_samples, n_components]
X scores.
y_scores_: array, [n_samples, n_components]
Y scores.
x_rotations_: array, [p, n_components]
X block to latents rotations.
y_rotations_: array, [q, n_components]
Y block to latents rotations.
coefs: array, [p, q]
The coefficients of the linear model: Y = X coefs + Err

Notes

For each component k, find weights u, v that optimize:

max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = |v| = 1

Note that it maximizes both the correlations between the scores and the intra-block variances.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current X score. This performs the PLS regression known as PLS2. This mode is prediction oriented.

Examples

>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> pls2 = PLSRegression(n_components=2)
>>> pls2.fit(X, Y)
PLSRegression(algorithm='nipals', copy=True, max_iter=500, n_components=2,
       scale=True, tol=1e-06)
>>> Y_pred = pls2.predict(X)

References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

In French, but still a reference:

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

Full API documentation: PLSRegressionScikitsLearnNode

class mdp.nodes.PLSCanonicalScikitsLearnNode

PLS canonical. PLSCanonical inherits from PLS with mode=”A” and deflation_mode=”canonical”.

This node has been automatically generated by wrapping the sklearn.pls.PLSCanonical class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X: array-like of predictors, shape = [n_samples, p]
Training vectors, where n_samples is the number of samples and p is the number of predictors.
Y: array-like of response, shape = [n_samples, q]
Training vectors, where n_samples is the number of samples and q is the number of response variables.
n_components: int, (default 2)
Number of components to keep.
scale: boolean, (default True)
Whether to scale the data.
algorithm: string, “nipals” or “svd”
The algorithm used to estimate the weights. It will be called n_components times, i.e. once for each iteration of the outer loop.
max_iter: an integer, (default 500)
the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)
tol: non-negative real, default 1e-06
the tolerance used in the iterative algorithm
copy: boolean, default True
Whether the deflation should be done on a copy. Leave the default value of True unless you don’t care about side effects.

Attributes

x_weights_: array, shape = [p, n_components]
X block weights vectors.
y_weights_: array, shape = [q, n_components]
Y block weights vectors.
x_loadings_: array, shape = [p, n_components]
X block loadings vectors.
y_loadings_: array, shape = [q, n_components]
Y block loadings vectors.
x_scores_: array, shape = [n_samples, n_components]
X scores.
y_scores_: array, shape = [n_samples, n_components]
Y scores.
x_rotations_: array, shape = [p, n_components]
X block to latents rotations.
y_rotations_: array, shape = [q, n_components]
Y block to latents rotations.

Notes

For each component k, find weights u, v that optimize:

max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = |v| = 1

Note that it maximizes both the correlations between the scores and the intra-block variances.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score. This performs a canonical symmetric version of the PLS regression, which is slightly different from CCA. This mode is mostly used for modeling.

Examples

>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> plsca = PLSCanonical(n_components=2)
>>> plsca.fit(X, Y)
PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2,
       scale=True, tol=1e-06)
>>> X_c, Y_c = plsca.transform(X, Y)

References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

See also

CCA PLSSVD

Full API documentation: PLSCanonicalScikitsLearnNode

class mdp.nodes.ProbabilisticPCAScikitsLearnNode

Additional layer on top of PCA that adds a probabilistic evaluation

This node has been automatically generated by wrapping the sklearn.decomposition.pca.ProbabilisticPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Principal component analysis (PCA)

Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.

The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.

Parameters

n_components: int, None or string

Number of components to keep. If n_components is not set, all components are kept:

  • n_components == min(n_samples, n_features)

If n_components == ‘mle’, Minka’s MLE is used to guess the dimension.

If 0 < n_components < 1, the number of components is selected such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
copy: bool
If False, data passed to fit are overwritten
whiten: bool, optional

When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
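
The fractional n_components rule can be sketched as a cumulative sum over the explained variance ratios. This is only an illustration of the rule as worded above (strictly "greater than"); the wrapped library may treat the boundary case differently, and the function name is hypothetical.

```python
def components_for_fraction(explained_variance_ratio, fraction):
    """Smallest number of leading components whose cumulative ratio exceeds fraction."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative > fraction:
            return count
    # Fall back to all components if rounding keeps the sum below the target.
    return len(explained_variance_ratio)


n_keep = components_for_fraction([0.6, 0.3, 0.1], 0.8)
n_keep_doc = components_for_fraction([0.99244289, 0.00755711], 0.95)
```

With ratios 0.6, 0.3 and 0.1, asking for 80% of the variance keeps two components, since the first alone explains only 60%.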

Attributes

components_: array, [n_components, n_features]
Components with maximum variance.
explained_variance_ratio_: array, [n_components]
Percentage of variance explained by each of the selected components. If n_components is not set, all components are stored and the sum of explained variances is equal to 1.0.

Notes

For n_components=’mle’, this class uses the method of Thomas P. Minka:

Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604

Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244289  0.00755711]

See also

ProbabilisticPCA RandomizedPCA

Full API documentation: ProbabilisticPCAScikitsLearnNode

class mdp.nodes.LinearRegressionScikitsLearnNode

Ordinary least squares Linear Regression.

This node has been automatically generated by wrapping the sklearn.linear_model.base.LinearRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Attributes

coef_ : array
Estimated coefficients for the linear regression problem.
intercept_ : array
Independent term in the linear model.

Notes

From the implementation point of view, this is just plain Ordinary Least Squares (numpy.linalg.lstsq) wrapped as a predictor object.

Full API documentation: LinearRegressionScikitsLearnNode

class mdp.nodes.RadiusNeighborsRegressorScikitsLearnNode

Regression based on neighbors within a fixed radius.

This node has been automatically generated by wrapping the sklearn.neighbors.regression.RadiusNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Parameters

radius : float, optional (default = 1.0)
Range of parameter space to use by default for radius_neighbors() queries.
weights : str or callable

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use scipy.spatial.cKDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
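
The effect of the radius and weights parameters can be illustrated with a brute-force sketch. It assumes Euclidean distance and a weighted mean of the neighbor targets; radius_regress and its handling of exact matches are assumptions made for this example, not the wrapped implementation.

```python
import math


def radius_regress(X, y, query, radius=1.0, weights="uniform"):
    """Weighted mean of the targets of all samples within radius of query."""
    neighbors = []
    for row, target in zip(X, y):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query)))
        if d <= radius:
            neighbors.append((d, target))
    if not neighbors:
        raise ValueError("no training point lies within the radius")
    if weights == "uniform":
        w = [1.0] * len(neighbors)
    elif weights == "distance":
        # Assumption: an exact match dominates, which avoids dividing by zero.
        exact = [t for d, t in neighbors if d == 0.0]
        if exact:
            return sum(exact) / len(exact)
        w = [1.0 / d for d, _ in neighbors]
    elif callable(weights):
        w = list(weights([d for d, _ in neighbors]))
    else:
        raise ValueError("weights must be 'uniform', 'distance' or a callable")
    return sum(wi * t for wi, (_, t) in zip(w, neighbors)) / sum(w)


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
uniform_pred = radius_regress(X, y, [1.5], radius=1.0)
distance_pred = radius_regress(X, y, [1.2], radius=1.0, weights="distance")
```

For the query 1.2 only the samples at 1 and 2 fall within the radius; with ‘distance’ weights the closer sample (target 0) pulls the prediction towards 0.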

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsRegressor
>>> neigh = RadiusNeighborsRegressor(radius=1.0)
>>> neigh.fit(X, y) 
RadiusNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[ 0.5]

See also

NearestNeighbors KNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

References

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: RadiusNeighborsRegressorScikitsLearnNode

class mdp.nodes.LabelBinarizerScikitsLearnNode

Binarize labels in a one-vs-all fashion

This node has been automatically generated by wrapping the sklearn.preprocessing.LabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Several regression and binary classification algorithms are available in the scikit. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
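
The one-vs-all scheme described above can be sketched without any library. The helper names are hypothetical; binarize plays the role of transform and most_confident_class the role of inverse_transform for single-label input.

```python
def fit_classes(labels):
    """Collect the sorted set of distinct labels."""
    return sorted(set(labels))


def binarize(labels, classes):
    """One row per label, with 1.0 in the column of its class and 0.0 elsewhere."""
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]


def most_confident_class(scores, classes):
    """Pick, for each row of per-class scores, the class with the largest score."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in scores]


classes = fit_classes([1, 2, 6, 4, 2])
indicator = binarize([1, 6], classes)
predicted = most_confident_class([[0.1, 0.7, 0.2, 0.0], [0.0, 0.1, 0.3, 0.6]], classes)
```

Each column of the indicator matrix is the binary target for one per-class model; at prediction time the per-class confidences are mapped back to a label by taking the largest one.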

Attributes

classes_ : array of shape [n_class]
Holds the label for each class.

Examples

>>> from sklearn import preprocessing
>>> clf = preprocessing.LabelBinarizer()
>>> clf.fit([1, 2, 6, 4, 2])
LabelBinarizer()
>>> clf.classes_
array([1, 2, 4, 6])
>>> clf.transform([1, 6])
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
>>> clf.fit_transform([(1, 2), (3,)])
array([[ 1.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> clf.classes_
array([1, 2, 3])

Full API documentation: LabelBinarizerScikitsLearnNode

class mdp.nodes.MiniBatchDictionaryLearningScikitsLearnNode

Mini-batch dictionary learning

This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.MiniBatchDictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
(U,V)

with || V_k ||_2 = 1 for all 0 <= k < n_atoms
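
To make the objective concrete, the sketch below evaluates it for small matrices given as nested lists. It assumes that the squared norm of the residual is the sum of squared entries and that U holds one code row per sample; the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def objective(Y, U, V, alpha):
    """0.5 * ||Y - U V||^2 + alpha * ||U||_1 (sum of squares, sum of absolutes)."""
    UV = matmul(U, V)
    residual = sum(
        (y - r) ** 2 for y_row, r_row in zip(Y, UV) for y, r in zip(y_row, r_row)
    )
    l1 = sum(abs(u) for row in U for u in row)
    return 0.5 * residual + alpha * l1


def atoms_have_unit_norm(V, tol=1e-12):
    """Check the constraint || V_k ||_2 = 1 for every atom (row of V)."""
    return all(abs(math.sqrt(sum(v * v for v in row)) - 1.0) <= tol for row in V)


Y = [[1.0, 0.0], [0.0, 2.0]]
V = [[1.0, 0.0], [0.0, 1.0]]          # two unit-norm atoms
exact_code = [[1.0, 0.0], [0.0, 2.0]]  # reconstructs Y exactly
zero_code = [[0.0, 0.0], [0.0, 0.0]]   # maximally sparse, no reconstruction
cost_exact = objective(Y, exact_code, V, alpha=0.5)
cost_zero = objective(Y, zero_code, V, alpha=0.5)
```

The exact code pays only the sparsity penalty while the zero code pays only the reconstruction error, which shows how alpha trades one against the other.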

Parameters

n_atoms: int,
number of dictionary elements to extract
alpha: int,
sparsity controlling parameter
n_iter: int,
total number of iterations to perform
fit_algorithm: {‘lars’, ‘cd’}
lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path) cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
transform_algorithm: {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}
Algorithm used to transform the data:

  • ‘lars’: uses the least angle regression method (linear_model.lars_path)
  • ‘lasso_lars’: uses Lars to compute the Lasso solution
  • ‘lasso_cd’: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). ‘lasso_lars’ will be faster if the estimated components are sparse.
  • ‘omp’: uses orthogonal matching pursuit to estimate the sparse solution
  • ‘threshold’: squashes to zero all coefficients less than alpha from the projection X.T * Y
transform_n_nonzero_coefs: int, 0.1 * n_features by default
Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
transform_alpha: float, 1. by default
If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
n_jobs: int,
number of parallel jobs to run
dict_init: array of shape (n_atoms, n_features),
initial value of the dictionary for warm restart scenarios

verbose:
degree of verbosity of the printed output
chunk_size: int,
number of samples in each mini-batch
shuffle: bool,
whether to shuffle the samples before forming batches
random_state: int or RandomState
Pseudo-random number generator state used for random sampling.

Attributes

components_: array, [n_atoms, n_features]
components extracted from the data

References

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)

See also

sklearn.decomposition.SparsePCA which solves the transposed problem, finding sparse components to represent data.

Full API documentation: MiniBatchDictionaryLearningScikitsLearnNode

class mdp.nodes.SVCScikitsLearnNode

C-Support Vector Classification.

This node has been automatically generated by wrapping the sklearn.svm.classes.SVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

C : float, optional (default=1.0)
Penalty parameter C of the error term.
kernel : string, optional (default=’rbf’)
Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given, ‘rbf’ will be used.
degree : int, optional (default=3)
Degree of the kernel function. It is significant only for ‘poly’.
gamma : float, optional (default=0.0)
Kernel coefficient for ‘rbf’ and ‘poly’. If gamma is 0.0 then 1/n_features will be used instead.
coef0 : float, optional (default=0.0)
Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
probability: boolean, optional (default=False)
Whether to enable probability estimates. This must be enabled prior to calling predict_proba.
shrinking: boolean, optional (default=True)
Whether to use the shrinking heuristic.
tol: float, optional (default=1e-3)
Tolerance for stopping criterion.
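
As an illustration of how gamma enters the default ‘rbf’ kernel, the sketch below uses the usual form exp(-gamma * ||x - y||^2) together with the fallback to 1/n_features described above. The function rbf_kernel is a hypothetical helper written for this note, not part of the wrapped class.

```python
import math


def rbf_kernel(x, y, gamma=0.0):
    """Return exp(-gamma * ||x - y||^2); gamma == 0.0 falls back to 1 / n_features."""
    if gamma == 0.0:
        gamma = 1.0 / len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same_point = rbf_kernel([-1.0, -1.0], [-1.0, -1.0])   # identical inputs
far_apart = rbf_kernel([-1.0, -1.0], [1.0, 1.0])      # squared distance 8, gamma 0.5
```

With two features the fallback gives gamma = 0.5, which matches the value reported in the example for this node.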

Attributes

support_ : array-like, shape = [n_SV]
Index of support vectors.
support_vectors_ : array-like, shape = [n_SV, n_features]
Support vectors.
n_support_ : array-like, dtype=int32, shape = [n_class]
Number of support vectors for each class.
dual_coef_ : array, shape = [n_class-1, n_SV]
Coefficients of the support vectors in the decision function.
coef_ : array, shape = [n_class-1, n_features]
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
intercept_ : array, shape = [n_class * (n_class-1) / 2]
Constants in decision function.
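
The shape given for intercept_, n_class * (n_class-1) / 2, equals the number of unordered pairs of classes, which is consistent with one constant per pairwise (one-vs-one) decision function. The short sketch below only enumerates those pairs; class_pairs is a hypothetical helper and the ordering of pairs inside the wrapped class is not implied.

```python
from itertools import combinations


def class_pairs(classes):
    """Return every unordered pair of distinct class labels."""
    return list(combinations(sorted(classes), 2))


pairs = class_pairs([1, 2, 3, 4])
n_class = 4
expected_count = n_class * (n_class - 1) // 2
```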

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> clf.fit(X, y)
SVC(C=1.0, coef0=0.0, degree=3, gamma=0.5, kernel='rbf', probability=False,
  shrinking=True, tol=0.001)
>>> print clf.predict([[-0.8, -1]])
[ 1.]

See also

SVR, LinearSVC

Full API documentation: SVCScikitsLearnNode

class mdp.nodes.PLSSVDScikitsLearnNode

Partial Least Square SVD

This node has been automatically generated by wrapping the sklearn.pls.PLSSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Simply performs an SVD on the cross-covariance matrix X’Y. There is no iterative deflation here.
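
The idea can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions: the data are random, only centering is applied (the optional scaling is left out), and the variable names merely mirror the attribute names listed below; it is not the code of the wrapped class.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 4)   # n_samples = 20, p = 4 predictors
Y = rng.randn(20, 3)   # n_samples = 20, q = 3 responses

# Center both blocks before forming the cross-covariance matrix X'Y.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

n_components = 2
U, s, Vt = np.linalg.svd(np.dot(Xc.T, Yc), full_matrices=False)

x_weights = U[:, :n_components]      # shape (p, n_components)
y_weights = Vt.T[:, :n_components]   # shape (q, n_components)
x_scores = np.dot(Xc, x_weights)     # shape (n_samples, n_components)
y_scores = np.dot(Yc, y_weights)     # shape (n_samples, n_components)
```

Because a single SVD yields all weight vectors at once, no component has to be removed from the data before the next one is computed.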

Parameters

X: array-like of predictors, shape = [n_samples, p]
Training vector, where n_samples is the number of samples and p is the number of predictors. X will be centered before any analysis.
Y: array-like of response, shape = [n_samples, q]
Training vector, where n_samples is the number of samples and q is the number of response variables. Y will be centered before any analysis.
n_components: int, (default 2).
number of components to keep.
scale: boolean, (default True)
scale X and Y

Attributes

x_weights_: array, [p, n_components]
X block weights vectors.
y_weights_: array, [q, n_components]
Y block weights vectors.
x_scores_: array, [n_samples, n_components]
X scores.
y_scores_: array, [n_samples, n_components]
Y scores.

See also

PLSCanonical, CCA

Full API documentation: PLSSVDScikitsLearnNode

class mdp.nodes.GaussianNBScikitsLearnNode

Gaussian Naive Bayes (GaussianNB)

This node has been automatically generated by wrapping the sklearn.naive_bayes.GaussianNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array, shape = [n_samples]
Target vector relative to X

Attributes

class_prior : array, shape = [n_classes]
probability of each class.
theta : array, shape = [n_classes, n_features]
mean of each feature per class
sigma : array, shape = [n_classes, n_features]
variance of each feature per class
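
The role of these three quantities can be illustrated with a small pure-Python sketch of Gaussian naive Bayes. It assumes every feature has nonzero variance within each class and applies no smoothing; fit_gaussian_nb and predict_one are hypothetical names, not methods of the wrapped class.

```python
import math


def fit_gaussian_nb(X, y):
    """Return {class: (prior, means, variances)} estimated from the training data."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n = float(len(rows))
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(columns, means)]
        model[c] = (n / len(y), means, variances)
    return model


def predict_one(model, x):
    """Return the class with the largest log prior plus Gaussian log-likelihood."""
    def log_posterior(c):
        prior, means, variances = model[c]
        log_likelihood = sum(
            -0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
            for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
Y = [1, 1, 1, 2, 2, 2]
model = fit_gaussian_nb(X, Y)
label = predict_one(model, [-0.8, -1])
```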

Methods

fit(X, y) : self
Fit the model
predict(X) : array
Predict using the model.
predict_proba(X) : array
Predict the probability of each class using the model.
predict_log_proba(X) : array
Predict the log-probability of each class using the model.

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print clf.predict([[-0.8, -1]])
[1]

Full API documentation: GaussianNBScikitsLearnNode

class mdp.nodes.RadiusNeighborsClassifierScikitsLearnNode

Classifier implementing a vote among neighbors within a given radius

This node has been automatically generated by wrapping the sklearn.neighbors.classification.RadiusNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

radius : float, optional (default = 1.0)
Range of parameter space to use by default for radius_neighbors queries.
weights : str or callable

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.
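
The weighting options can be illustrated with a brute-force sketch in plain Python. It assumes Euclidean distance and breaks ties in favour of the smallest label, which is an arbitrary choice made here; radius_vote is a hypothetical helper, not the wrapped classifier.

```python
def radius_vote(X, y, query, radius=1.0, weights="uniform"):
    """Predict the label of `query` by a weighted vote among points within `radius`."""
    neighbors = []
    for point, label in zip(X, y):
        distance = sum((a - b) ** 2 for a, b in zip(point, query)) ** 0.5
        if distance <= radius:
            neighbors.append((distance, label))
    if not neighbors:
        raise ValueError("no training point within the given radius")

    distances = [d for d, _ in neighbors]
    if weights == "uniform":
        w = [1.0] * len(distances)
    elif weights == "distance":
        # An exact match gets an infinite weight and therefore dominates the vote.
        w = [1.0 / d if d > 0 else float("inf") for d in distances]
    else:
        w = weights(distances)   # user-defined: list of distances -> list of weights

    totals = {}
    for (_, label), weight in zip(neighbors, w):
        totals[label] = totals.get(label, 0.0) + weight
    # Highest total wins; ties go to the smallest label.
    return min(totals, key=lambda label: (-totals[label], label))


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
uniform_vote = radius_vote(X, y, [1.6])                       # one neighbor per class
distance_vote = radius_vote(X, y, [1.6], weights="distance")  # the closer point wins
```

For the query 1.6 both classes have exactly one neighbor within the radius, so only the distance weighting lets the closer point (at 2) decide the vote.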

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use scipy.spatial.cKDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> neigh = RadiusNeighborsClassifier(radius=1.0)
>>> neigh.fit(X, y) 
RadiusNeighborsClassifier(...)
>>> print neigh.predict([[1.5]])
[0]

See also

KNeighborsClassifier, RadiusNeighborsRegressor, KNeighborsRegressor, NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

References

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: RadiusNeighborsClassifierScikitsLearnNode

class mdp.nodes.LassoLarsCVScikitsLearnNode

Cross-validated Lasso, using the LARS algorithm

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform.
cv : crossvalidation generator, optional
See the sklearn.cross_validation module. If None is passed, defaults to a 5-fold strategy.
n_jobs : integer, optional
Number of CPUs to use during the cross validation. If ‘-1’, use all the CPUs
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems.
overwrite_X : boolean, optional
If True, X will not be copied. Default is False.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the problem formulation)
intercept_ : float
independent term in decision function.
coef_path: array, shape = [n_features, n_alpha]
the varying values of the coefficients along the path
alphas_: array, shape = [n_alpha]
the different values of alpha along the path
cv_alphas: array, shape = [n_cv_alphas]
all the values of alpha along the path for the different folds
cv_mse_path_: array, shape = [n_folds, n_cv_alphas]
the mean square error on left-out for each fold along the path (alpha values given by cv_alphas)
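
One plausible way to read cv_alphas together with cv_mse_path_ is to average the error over folds and look for the alpha with the smallest mean. The sketch below does exactly that on made-up numbers; best_alpha is a hypothetical helper, and whether the wrapped estimator selects its final alpha in precisely this way is an assumption.

```python
def best_alpha(cv_alphas, cv_mse_path):
    """Return (alpha, mean_mse) for the alpha with the smallest fold-averaged error."""
    n_folds = float(len(cv_mse_path))
    mean_mse = [sum(fold[i] for fold in cv_mse_path) / n_folds
                for i in range(len(cv_alphas))]
    best_index = min(range(len(cv_alphas)), key=mean_mse.__getitem__)
    return cv_alphas[best_index], mean_mse[best_index]


# Illustrative values only: 2 folds, 3 candidate alphas.
cv_alphas = [1.0, 0.1, 0.01]
cv_mse_path = [[4.0, 1.0, 2.0],
               [2.0, 1.0, 4.0]]
alpha, mse = best_alpha(cv_alphas, cv_mse_path)
```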

Notes

The object solves the same problem as the LassoCV object. However, unlike LassoCV, it finds the relevant alpha values by itself. In general, because of this property, it will be more stable. However, it is more fragile to heavily multicollinear datasets.

It is more efficient than the LassoCV if only a small number of features are selected compared to the total number, for instance if there are very few samples compared to the number of features.

See also

lars_path, LassoLars, LarsCV, LassoCV

Full API documentation: LassoLarsCVScikitsLearnNode

class mdp.nodes.LARSScikitsLearnNode

Full API documentation: LARSScikitsLearnNode

class mdp.nodes.LassoLarsICScikitsLearnNode

Lasso model fit with Lars using BIC or AIC for model selection

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsIC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

AIC is the Akaike information criterion and BIC is the Bayesian information criterion. Such criteria are useful to select the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model. A good model should explain the data well while being simple.
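
The trade-off can be made concrete with the textbook definitions of the two criteria, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of samples and L the maximized likelihood. The helpers aic and bic below are hypothetical and only evaluate these formulas; how the wrapped class computes its criterion internally is not shown here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Illustrative values only: same fit quality, 3 parameters, 100 samples.
aic_value = aic(-10.0, 3)
bic_value = bic(-10.0, 3, 100)
```

Lower values are better for both criteria. BIC penalizes each extra parameter by ln(n) instead of 2, so it favours simpler models once n exceeds about 7 samples.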

Parameters

criterion: ‘bic’ | ‘aic’
The type of criterion to use.
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
overwrite_X : boolean, optional
Default is False. If True, X will be overwritten
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform. Can be used for early stopping.
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the problem formulation)
intercept_ : float
independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.LassoLarsIC(criterion='bic')
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111]) 
LassoLarsIC(criterion='bic', eps=..., fit_intercept=True,
      max_iter=500, normalize=True, overwrite_X=False, precompute='auto',
      verbose=False)
>>> print clf.coef_ 
[ 0.  -1.11...]

References

The estimation of the number of degrees of freedom is given by:

“On the degrees of freedom of the lasso” Hui Zou, Trevor Hastie, and Robert Tibshirani Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.

http://en.wikipedia.org/wiki/Akaike_information_criterion
http://en.wikipedia.org/wiki/Bayesian_information_criterion

See also

lars_path, LassoLars, LassoLarsCV

Full API documentation: LassoLarsICScikitsLearnNode

class mdp.nodes.QDAScikitsLearnNode

Quadratic Discriminant Analysis (QDA)

This node has been automatically generated by wrapping the sklearn.qda.QDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array, shape = [n_samples]
Target vector relative to X
priors : array, optional, shape = [n_classes]
Priors on classes

Attributes

means_ : array-like, shape = [n_classes, n_features]
Class means
priors_ : array-like, shape = [n_classes]
Class priors (sum to 1)
covariances_ : list of array-like, shape = [n_features, n_features]
Covariance matrices of each class

Examples

>>> from sklearn.qda import QDA
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QDA()
>>> clf.fit(X, y)
QDA(priors=None)
>>> print clf.predict([[-0.8, -1]])
[1]

See also

LDA

Full API documentation: QDAScikitsLearnNode

class mdp.nodes.RidgeClassifierCVScikitsLearnNode

Full API documentation: RidgeClassifierCVScikitsLearnNode