# Node List

Full API documentation: nodes

class mdp.nodes.PCANode

Filter the input data through the most significant of its principal components.

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.v
Transpose of the projection matrix (available after training).
self.d
Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
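
As an illustration of how a fractional output_dim relates to self.explained_variance, the following sketch keeps the smallest number of leading components whose cumulative variance reaches the requested fraction. The function name and the eigenvalues are hypothetical, and the exact selection rule used inside PCANode may differ in detail.

```python
def select_components(eigenvalues, desired_fraction):
    """Return (n_components, explained_fraction) for a variance target.

    eigenvalues: variances of the principal components (any order).
    desired_fraction: requested fraction of the total variance, in (0, 1].
    """
    variances = sorted(eigenvalues, reverse=True)
    total = sum(variances)
    cumulative = 0.0
    for n, var in enumerate(variances, start=1):
        cumulative += var
        if cumulative / total >= desired_fraction:
            return n, cumulative / total
    return len(variances), 1.0


# Hypothetical eigenvalues of a covariance matrix.
n_components, explained = select_components([5.0, 3.0, 1.5, 0.5], 0.9)
print(n_components, explained)
```

Because components are kept whole, the explained fraction is usually somewhat larger than the requested one.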

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Full API documentation: PCANode

class mdp.nodes.WhiteningNode

Whiten the input data by filtering it through the most significant of its principal components. All output signals have zero mean, unit variance and are decorrelated.

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.v
Transpose of the projection matrix (available after training).
self.d
Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

Full API documentation: WhiteningNode

class mdp.nodes.NIPALSNode

Perform Principal Component Analysis using the NIPALS algorithm. This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case setting output_dim to a certain fraction of the total variance, say 90%, may be of some help.

Internal variables of interest

self.avg
Mean of the input data (available after training).
self.d
Variance corresponding to the PCA components.
self.v
Transpose of the projection matrix (available after training).
self.explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. in David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp 411-444 (1966).

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).

Full API documentation: NIPALSNode

class mdp.nodes.FastICANode

Perform Independent Component Analysis using the FastICA algorithm. Note that FastICA is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).

Reference: Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis IEEE Transactions on Neural Networks, 10(3):626-634.

Internal variables of interest

self.white
The whitening node used for preprocessing.
self.filters
The ICA filters matrix (this is the transpose of the projection matrix after whitening).
self.convergence
The value of the convergence threshold.

History:

• 1.4.1998 created for Matlab by Jarmo Hurri, Hugo Gavert, Jaakko Sarela, and Aapo Hyvarinen
• 7.3.2003 modified for Python by Thomas Wendler
• 3.6.2004 rewritten and adapted for scipy and MDP by MDP’s authors
• 25.5.2005 now independent from scipy. Requires Numeric or numarray
• 26.6.2006 converted to numpy
• 14.9.2007 updated to Matlab version 2.5

Full API documentation: FastICANode

class mdp.nodes.CuBICANode

Perform Independent Component Analysis using the CuBICA algorithm. Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).

Reference: Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.

Internal variables of interest

self.white
The whitening node used for preprocessing.
self.filters
The ICA filters matrix (this is the transpose of the projection matrix after whitening).
self.convergence
The value of the convergence threshold.

Full API documentation: CuBICANode

class mdp.nodes.TDSEPNode

Perform Independent Component Analysis using the TDSEP algorithm. Note that TDSEP, as implemented in this Node, is an online algorithm, i.e. it is suited to be trained on huge data sets, provided that the training is done by sending small chunks of data each time.

Reference: Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP an efficient algorithm for blind separation using time structure. in Niklasson, L, Boden, M, and Ziemke, T (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).

Internal variables of interest

self.white
The whitening node used for preprocessing.
self.filters
The ICA filters matrix (this is the transpose of the projection matrix after whitening).
self.convergence
The value of the convergence threshold.

Full API documentation: TDSEPNode

class mdp.nodes.JADENode

Perform Independent Component Analysis using the JADE algorithm. Note that JADE is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

JADE does not support the telescope mode.

Main references:

• Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.
• Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.

Original code contributed by: Gabriel Beckers (2008).

History:

• May 2005 version 1.8 for MATLAB released by Jean-Francois Cardoso
• Dec 2007 MATLAB version 1.8 ported to Python/NumPy by Gabriel Beckers
• Feb 15 2008 Python/NumPy version adapted for MDP by Gabriel Beckers

class mdp.nodes.SFANode

Extract the slowly varying components from the input data. More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).

Instance variables of interest

self.avg
Mean of the input data (available after training)
self.sf
Matrix of the SFA filters (available after training)
self.d
Delta values corresponding to the SFA components (generalized eigenvalues). [See the docs of the get_eta_values method for more information]

Special arguments for constructor

include_last_sample

If False the train method discards the last sample in every chunk during training when calculating the covariance matrix. The last sample is in this case only used for calculating the covariance matrix of the derivatives. The switch should be set to False if you plan to train with several small chunks. For example we can split a sequence (index is time):

x_1 x_2 x_3 x_4

in smaller parts like this:

x_1 x_2
x_2 x_3
x_3 x_4

The SFANode will see 3 derivatives for the temporal covariance matrix, and the first 3 points for the spatial covariance matrix. Of course you will need to use a generator that connects the small chunks (the last sample needs to be sent again in the next chunk). If include_last_sample was True, depending on the generator you use, you would either get:

x_1 x_2
x_2 x_3
x_3 x_4

in which case the last sample of every chunk would be used twice when calculating the covariance matrix, or:

x_1 x_2
x_3 x_4

in which case you lose the derivative between x_3 and x_2.

If you plan to train with a single big chunk, leave include_last_sample at its default value, i.e. True.

You can even change this behaviour during training. Just set the corresponding switch in the train method.
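
The kind of generator described above, one that connects the small chunks by sending the last sample of each chunk again as the first sample of the next, could be sketched as follows. The helper name overlapping_chunks is hypothetical and not part of MDP; it only illustrates the chunking scheme intended for use with include_last_sample=False.

```python
def overlapping_chunks(sequence, chunk_size):
    """Yield chunks in which each chunk starts with the last sample
    of the previous one, so that no derivative is lost."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    step = chunk_size - 1
    for start in range(0, len(sequence) - 1, step):
        yield sequence[start:start + chunk_size]


samples = ["x_1", "x_2", "x_3", "x_4"]
chunks = list(overlapping_chunks(samples, chunk_size=2))
print(chunks)

# Samples that enter the covariance matrix when the last sample of
# every chunk is discarded, and the number of derivatives available.
used_for_covariance = [s for chunk in chunks for s in chunk[:-1]]
n_derivatives = sum(len(chunk) - 1 for chunk in chunks)
```

With this scheme every sample except the very last one is counted exactly once, and all three derivatives are available.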

Full API documentation: SFANode

class mdp.nodes.SFA2Node

Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components. The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.

More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).

Full API documentation: SFA2Node

class mdp.nodes.ISFANode

Perform Independent Slow Feature Analysis on the input data.

Internal variables of interest

self.RP
The global rotation-permutation matrix. This is the filter applied on input_data to get output_data
self.RPC
The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the ‘outer space’ is retained)
self.covs

A mdp.utils.MultipleCovarianceMatrices instance containing the current time-delayed covariance matrices of the input_data. After convergence the uppermost output_dim x output_dim submatrices should be almost diagonal.

self.covs[n-1] is the covariance matrix relative to the n-th time-lag

Note: they are not cleared after convergence. If you need to free some memory, you can safely delete them with:

>>> del self.covs

self.initial_contrast
A dictionary with the starting contrast and the SFA and ICA parts of it.
self.final_contrast
Like the above but after convergence.

Note: If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.

Reference: Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation 19(4):994-1021. http://itb.biologie.hu-berlin.de/~wiskott/Publications/BlasZitoWisk2007-ISFA-NeurComp.pdf

Full API documentation: ISFANode

class mdp.nodes.XSFANode

Perform Non-linear Blind Source Separation using Slow Feature Analysis.

This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).

The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:

>>> flow = mdp.Flow([XSFANode()])
>>> flow.train(x)


Doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info).

If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.

If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:

>>> flow = mdp.Flow([XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)


This will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.

References: Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskott-Cogprints-2010.pdf

Full API documentation: XSFANode

class mdp.nodes.FDANode

Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.

FDANode has two training phases and is supervised so make sure to pay attention to the following points when you train it:

• call the train method with two arguments: the input data and the labels (see the doc string of the train method for details).
• if you are training the node by hand, call the train method twice.
• if you are training the node using a flow (recommended), the only argument to Flow.train must be a list of (data_point, label) tuples or an iterator returning lists of such tuples, not a generator. The Flow.train function can be called just once as usual, since it takes care of rewinding the iterator to perform the second training step.

More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.

Internal variables of interest

self.avg
Mean of the input data (available after training)
self.v
Transpose of the projection matrix, so that output = dot(input-self.avg, self.v) (available after training).

Full API documentation: FDANode

class mdp.nodes.FANode

Perform Factor Analysis.

The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.

The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.

Internal variables of interest

self.mu
Mean of the input data (available after training)
self.A
Generating weights (available after training)
self.E_y_mtx
Weights for Maximum A Posteriori inference
self.sigma
Vector of estimated variance of the noise for all input components

More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.

Full API documentation: FANode

class mdp.nodes.RBMNode

Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite into observed (visible) and hidden (latent) variables.

By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.

Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.

The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800.

Internal variables of interest

self.w
Generative weights between hidden and observed variables
self.bv
bias vector of the observed variables
self.bh
bias vector of the hidden variables
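
As a rough illustration of what the execute method returns, the sketch below evaluates the standard RBM conditional p(h_j = 1 | v) = sigmoid(bh_j + sum_i v_i w_ij). The function name and the assumed layout of the weight matrix (visible units along rows, hidden units along columns) are assumptions made for this example, not taken from the RBMNode source.

```python
import math


def hidden_probabilities(v, w, bh):
    """Return p(h_j = 1 | v) for every hidden unit j.

    v:  list of binary visible states, length n_visible
    w:  weights as a nested list, shape n_visible x n_hidden (assumed layout)
    bh: hidden biases, length n_hidden
    """
    probs = []
    for j, bias in enumerate(bh):
        activation = bias + sum(v_i * w[i][j] for i, v_i in enumerate(v))
        probs.append(1.0 / (1.0 + math.exp(-activation)))
    return probs


# Hypothetical parameters for 3 visible and 2 hidden units.
v = [1, 0, 1]
w = [[0.5, -1.0], [0.3, 0.2], [-0.5, 1.0]]
bh = [0.0, 1.0]
probs = hidden_probabilities(v, w, bh)
print(probs)
```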

For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668

Full API documentation: RBMNode

class mdp.nodes.RBMWithLabelsNode

Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels.

By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.

Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.

The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800.

Internal variables of interest:

self.w
Generative weights between hidden and observed variables
self.bv
bias vector of the observed variables
self.bh
bias vector of the hidden variables

For more information on RBMs with labels, see:

• Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
• Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

Full API documentation: RBMWithLabelsNode

class mdp.nodes.GrowingNeuralGasNode

Learn the topological structure of the input data by building a corresponding graph approximation.

The algorithm expands on the original Neural Gas algorithm (see mdp.nodes.NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.

More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.

Attributes and methods of interest

• graph – The corresponding mdp.graph.Graph object

Full API documentation: GrowingNeuralGasNode

class mdp.nodes.LLENode

Perform a Locally Linear Embedding analysis on the data.

Internal variables of interest

self.training_projection
The LLE projection of the training data (defined when training finishes).
self.desired_variance
variance limit used to compute intrinsic dimensionality.

Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. deRidder and R.P.W. Duin.

References: Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.

Original code contributed by: Jake VanderPlas, University of Washington.

Full API documentation: LLENode

class mdp.nodes.HLLENode

Perform a Hessian Locally Linear Embedding analysis on the data.

Internal variables of interest

self.training_projection
the HLLE projection of the training data (defined when training finishes)
self.desired_variance
variance limit used to compute intrinsic dimensionality.

Implementation based on algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.

Original code contributed by: Jake VanderPlas, University of Washington.

Full API documentation: HLLENode

class mdp.nodes.LinearRegressionNode

Compute least-square, multivariate linear regression on the input data, i.e., learn coefficients b_j so that:

y_i = b_0 + b_1 x_1 + ... + b_N x_N,

for i = 1 ... M, minimizes the squared error given the training x's and y's.

This is a supervised learning node, and requires input data x and target data y to be supplied during training (see train docstring).
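
The least-squares problem stated above can be sketched in plain Python by solving the normal equations. This is only an illustration under simplifying assumptions (a single target variable, a well-conditioned problem); the function name is hypothetical and the node itself may use a different numerical method.

```python
def fit_linear_regression(xs, ys):
    """Least-squares fit of y = b_0 + b_1 x_1 + ... + b_N x_N.

    xs: list of M input rows, each of length N
    ys: list of M target values
    Returns the coefficients [b_0, b_1, ..., b_N].
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend the constant term
    n = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, ys))]
           for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Noise-free data generated from y = 1 + 2*x_1 - 3*x_2.
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1 + 2 * a - 3 * b for a, b in xs]
beta = fit_linear_regression(xs, ys)
print(beta)
```

On noise-free data the recovered coefficients match the generating ones up to rounding error.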

Internal variables of interest

self.beta
The coefficients of the linear regression

Full API documentation: LinearRegressionNode

class mdp.nodes.QuadraticExpansionNode

Perform expansion in the space formed by all linear and quadratic monomials. QuadraticExpansionNode() is equivalent to a PolynomialExpansionNode(2).
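
For a single input vector, the space of all linear and quadratic monomials can be enumerated as in the sketch below. The function name is hypothetical and the ordering of the output components is an assumption; the node may arrange them differently.

```python
from itertools import combinations_with_replacement


def quadratic_expansion(x):
    """Return all linear and quadratic monomials of the vector x."""
    linear = list(x)
    quadratic = [a * b for a, b in combinations_with_replacement(x, 2)]
    return linear + quadratic


expanded = quadratic_expansion([2, 3])
print(expanded)
```

An input of dimension N yields N linear terms plus N(N+1)/2 quadratic terms.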

class mdp.nodes.PolynomialExpansionNode

Perform expansion in a polynomial space.

Full API documentation: PolynomialExpansionNode

class mdp.nodes.RBFExpansionNode

Expand input space with Gaussian Radial Basis Functions (RBFs).

The input data is filtered through a set of unnormalized Gaussian filters, i.e.:

y_j = exp(-0.5/s_j * ||x - c_j||^2)

for isotropic RBFs, or, more generally:

y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))

for anisotropic RBFs.
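
A minimal sketch of the isotropic case, written for a single input vector, is given below. The function and variable names are hypothetical; only the formula y_j = exp(-0.5/s_j * ||x - c_j||^2) is taken from the text above.

```python
import math


def rbf_expand(x, centers, sizes):
    """Isotropic Gaussian RBF expansion of a single input vector.

    Computes y_j = exp(-0.5 / s_j * ||x - c_j||^2) for each center c_j.
    """
    outputs = []
    for center, size in zip(centers, sizes):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        outputs.append(math.exp(-0.5 / size * sq_dist))
    return outputs


centers = [[0.0, 0.0], [1.0, 1.0]]
sizes = [1.0, 2.0]
y = rbf_expand([0.0, 0.0], centers, sizes)
print(y)
```

Since the filters are unnormalized, an input that coincides with a center produces an output of exactly 1 for that filter.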

Full API documentation: RBFExpansionNode

class mdp.nodes.GeneralExpansionNode

Expands the input signal x according to a list [f_0, ... f_k] of functions.

Each function f_i should take the whole two-dimensional array x as input and output another two-dimensional array. Moreover, the output dimension should depend only on the input dimension. The output of the node is [f_0(x), ... f_k(x)], that is, the concatenation of the outputs f_i(x).
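
The concatenation can be illustrated with nested lists standing in for two-dimensional arrays. All names in this sketch (general_expansion and the three example functions) are hypothetical and chosen only for the example.

```python
def general_expansion(x, funcs):
    """Concatenate the outputs of each function along the feature axis.

    x: two-dimensional data as a list of rows
    funcs: functions mapping the whole array to another list of rows
    """
    outputs = [f(x) for f in funcs]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def identity(x):
    return [list(row) for row in x]


def squares(x):
    return [[v * v for v in row] for row in x]


def row_sum(x):
    return [[sum(row)] for row in x]


x = [[1, 2], [3, 4]]
expanded = general_expansion(x, [identity, squares, row_sum])
print(expanded)
```

Here the output dimension is 2 + 2 + 1 = 5, which depends only on the input dimension, as required.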

Original code contributed by Alberto Escalante.

Full API documentation: GeneralExpansionNode

class mdp.nodes.GrowingNeuralGasExpansionNode

Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.

positions of RBFs
position of the nodes of the neural gas
sizes of the RBFs
mean distance to the neighbouring nodes.

Important: Adjust the maximum number of nodes to control the dimension of the expansion.

More information on this expansion type can be found in: B. Fritzke. Growing cell structures-a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).

Full API documentation: GrowingNeuralGasExpansionNode

class mdp.nodes.NeuralGasNode

Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).

The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland, 1991.

Attributes and methods of interest

• graph – The corresponding mdp.graph.Graph object
• max_epochs – The maximum number of epochs for which to train.

Full API documentation: NeuralGasNode

class mdp.nodes.SignumClassifier

This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if the sum is negative.

Full API documentation: SignumClassifier

class mdp.nodes.PerceptronClassifier

A simple perceptron with input_dim input nodes.

Full API documentation: PerceptronClassifier

class mdp.nodes.SimpleMarkovClassifier

A simple version of a Markov classifier. It can be trained on a vector of tuples, the label being the next element in the testing data.

Full API documentation: SimpleMarkovClassifier

class mdp.nodes.DiscreteHopfieldClassifier

Node for simulating a simple discrete Hopfield model.

Full API documentation: DiscreteHopfieldClassifier

class mdp.nodes.KMeansClassifier

Employs K-Means Clustering for a given number of centroids.

Full API documentation: KMeansClassifier

class mdp.nodes.NormalizeNode

Make the input signal mean-free with unit variance.

Full API documentation: NormalizeNode

class mdp.nodes.GaussianClassifier

Perform a supervised Gaussian classification.

Given a set of labelled data, the node fits a Gaussian distribution to each class.

Full API documentation: GaussianClassifier

class mdp.nodes.NearestMeanClassifier

Nearest-Mean classifier.

Full API documentation: NearestMeanClassifier

class mdp.nodes.KNNClassifier

K-Nearest-Neighbour Classifier.

Full API documentation: KNNClassifier

class mdp.nodes.EtaComputerNode

Compute the eta values of the normalized training data.

The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e. delta(x) = mean(dx/dt(t)^2). delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.

The eta value is a more intuitive measure of temporal variation, defined as:

eta(x) = T/(2*pi) * sqrt(delta(x))


If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.

EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
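
The definitions above can be checked numerically with a short sketch. It assumes unit time steps, approximates the derivative by the difference of consecutive samples, and, as a simplification, normalizes using the variance of all samples. The function name eta_value is hypothetical.

```python
import math


def eta_value(signal):
    """Eta value of a one-dimensional signal sampled at unit time steps.

    The signal is first normalized to zero mean and unit variance; the
    derivative is approximated by the difference of consecutive samples.
    """
    t_len = len(signal)
    mean = sum(signal) / t_len
    var = sum((s - mean) ** 2 for s in signal) / t_len
    normalized = [(s - mean) / math.sqrt(var) for s in signal]
    diffs = [b - a for a, b in zip(normalized, normalized[1:])]
    delta = sum(d * d for d in diffs) / len(diffs)
    return t_len / (2 * math.pi) * math.sqrt(delta)


# A sine wave completing exactly N = 5 oscillations over T = 1000 samples.
T, N = 1000, 5
sine = [math.sin(2 * math.pi * N * t / T) for t in range(T)]
eta = eta_value(sine)
print(eta)
```

The result is close to N, with a small deviation caused by the finite-difference approximation of the derivative. Rescaling the signal leaves the value unchanged because of the normalization.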

Reference: Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.

Important: if a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.

This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.

Full API documentation: EtaComputerNode

class mdp.nodes.HitParadeNode

Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.

This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.

class mdp.nodes.NoiseNode

Inject multiplicative or additive noise into the input data.

Original code contributed by Mathias Franzius.

Full API documentation: NoiseNode

class mdp.nodes.NormalNoiseNode

Special version of NoiseNode for Gaussian additive noise.

Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.

Full API documentation: NormalNoiseNode

class mdp.nodes.TimeFramesNode

Copy delayed version of the input signal on the space dimensions.

For example, for time_frames=3 and gap=2:

[ X(1) Y(1)        [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
  X(2) Y(2)          X(2) Y(2) X(4) Y(4) X(6) Y(6)
  X(3) Y(3)   -->    X(3) Y(3) X(5) Y(5) X(7) Y(7)
  X(4) Y(4)          X(4) Y(4) X(6) Y(6) X(8) Y(8)
  X(5) Y(5)          ...  ...  ...  ...  ...  ... ]
  X(6) Y(6)
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]

It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
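
The embedding into the future shown in the example can be reproduced with a few lines of plain Python. The function name embed_time_frames is hypothetical, and nested lists stand in for arrays.

```python
def embed_time_frames(rows, time_frames, gap=1):
    """Append future, gap-spaced copies of the signal to each row."""
    n_out = len(rows) - (time_frames - 1) * gap
    return [
        [value for k in range(time_frames) for value in rows[t + k * gap]]
        for t in range(n_out)
    ]


# Eight samples of a two-dimensional signal: X(t) = t, Y(t) = 10 * t.
signal = [[t, 10 * t] for t in range(1, 9)]
embedded = embed_time_frames(signal, time_frames=3, gap=2)
print(embedded[0])
print(len(embedded))
```

Note that the output has (time_frames - 1) * gap fewer rows than the input, because the last rows have no complete set of future samples.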

Full API documentation: TimeFramesNode

class mdp.nodes.TimeDelayNode

Copy delayed version of the input signal on the space dimensions.

For example, for time_frames=3 and gap=2:

[ X(1) Y(1)        [ X(1) Y(1)   0    0    0    0
  X(2) Y(2)          X(2) Y(2)   0    0    0    0
  X(3) Y(3)   -->    X(3) Y(3) X(1) Y(1)   0    0
  X(4) Y(4)          X(4) Y(4) X(2) Y(2)   0    0
  X(5) Y(5)          X(5) Y(5) X(3) Y(3) X(1) Y(1)
  X(6) Y(6)          ...  ...  ...  ...  ...  ... ]
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]

This node provides functionality similar to that of the TimeFramesNode, except that it performs a time embedding into the past rather than into the future.
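
The embedding into the past, including the zero padding of the first rows, can be sketched as follows. The function name embed_time_delays is hypothetical, and nested lists stand in for arrays.

```python
def embed_time_delays(rows, time_frames, gap=1):
    """Append past, gap-spaced copies of the signal, zero-padded at the start."""
    dim = len(rows[0])
    result = []
    for t in range(len(rows)):
        out_row = []
        for k in range(time_frames):
            src = t - k * gap
            # Samples before the start of the signal are replaced by zeros.
            out_row.extend(rows[src] if src >= 0 else [0] * dim)
        result.append(out_row)
    return result


# Eight samples of a two-dimensional signal: X(t) = t, Y(t) = 10 * t.
signal = [[t, 10 * t] for t in range(1, 9)]
delayed = embed_time_delays(signal, time_frames=3, gap=2)
print(delayed[0])
print(delayed[4])
```

In contrast to the embedding into the future, the output here has as many rows as the input.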

See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.

Original code contributed by Sebastian Hoefer. Dec 31, 2010

Full API documentation: TimeDelayNode

class mdp.nodes.TimeDelaySlidingWindowNode

TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.

Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.

Original code contributed by Sebastian Hoefer. Dec 31, 2010

Full API documentation: TimeDelaySlidingWindowNode

class mdp.nodes.CutoffNode

Node to cut off values at specified bounds.

Works similarly to numpy.clip, but also works when only a lower or upper bound is specified.
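
The behaviour with optional bounds can be sketched in plain Python as follows. The function and argument names are hypothetical and do not necessarily match the constructor arguments of CutoffNode.

```python
def cutoff(values, lower_bound=None, upper_bound=None):
    """Clip values to the given bounds; a bound of None is ignored."""
    clipped = []
    for v in values:
        if lower_bound is not None and v < lower_bound:
            v = lower_bound
        if upper_bound is not None and v > upper_bound:
            v = upper_bound
        clipped.append(v)
    return clipped


data = [-2, 0, 3]
print(cutoff(data, lower_bound=-1))
print(cutoff(data, upper_bound=1))
print(cutoff(data, lower_bound=-1, upper_bound=1))
```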

Full API documentation: CutoffNode

class mdp.nodes.AdaptiveCutoffNode

Node which uses the data history during training to learn cutoff values.

As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.

When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
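
How an upper cutoff bound might be derived from a cutoff fraction is sketched below for a single data coordinate. The function name and the simple sorting approach are assumptions made for illustration; the node works on the stored data history and its exact rule may differ.

```python
def learn_upper_cutoff(training_values, upper_fraction):
    """Return a bound above which roughly upper_fraction of the data lies."""
    if not 0 <= upper_fraction < 1:
        raise ValueError("upper_fraction must be in [0, 1)")
    ordered = sorted(training_values)
    n_clipped = int(round(len(ordered) * upper_fraction))
    return ordered[len(ordered) - n_clipped - 1]


# Training data for one coordinate: the values 1..100.
training = list(range(1, 101))
bound = learn_upper_cutoff(training, 0.05)

# During execution, values above the learned bound are clipped.
clipped = [min(v, bound) for v in training]
n_above = sum(1 for v in training if v > bound)
print(bound, n_above)
```

With a fraction of 0.05, the five largest of the 100 training values lie above the learned bound and would have been clipped.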

class mdp.nodes.HistogramNode

Node which stores a history of the data during its training phase.

The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.

Note that data is only stored during training.

Full API documentation: HistogramNode

class mdp.nodes.IdentityNode

Execute returns the input data and the node is not trainable.

This node can be instantiated and is for example useful in complex network layouts.

Full API documentation: IdentityNode

class mdp.nodes.Convolution2DNode

Convolve input data with filter banks.

The filters argument specifies a set of 2D filters that are convolved with the input data during execution. Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.

Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.

This node depends on scipy.

Full API documentation: Convolution2DNode

class mdp.nodes.ShogunSVMClassifier

The ShogunSVMClassifier works as a wrapper class for accessing the SHOGUN machine learning toolbox for support vector machines.

Most kernel machines and linear classifiers should work with this class.

Distance machines such as the K-means classifier are not supported yet.

Information on parameters and additional options can be found at http://www.shogun-toolbox.org/

Note that some parts in this classifier might receive some refinement in the future.

This node depends on shogun.

Full API documentation: ShogunSVMClassifier

class mdp.nodes.LibSVMClassifier

The LibSVMClassifier class acts as a wrapper around the LibSVM library for support vector machines.

Information on the parameters can be found at http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The class allows the kernel and SVM type to be changed with a text string.

Additionally, self.parameter is exposed, which allows all other SVM parameters to be changed directly.

This node depends on libsvm.

Full API documentation: LibSVMClassifier

class mdp.nodes.SGDRegressorScikitsLearnNode

Full API documentation: SGDRegressorScikitsLearnNode

class mdp.nodes.PatchExtractorScikitsLearnNode

Extracts patches from a collection of images.

This node has been automatically generated by wrapping the sklearn.feature_extraction.image.PatchExtractor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

patch_size: tuple of ints (patch_height, patch_width)
the dimensions of one patch
max_patches: integer or float, optional default is None
The maximum number of patches per image to extract. If max_patches is a float in (0, 1), it is taken to mean a proportion of the total number of patches.
random_state: int or RandomState
Pseudo-random number generator state used for random sampling.

Full API documentation: PatchExtractorScikitsLearnNode

class mdp.nodes.LinearModelCVScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LinearModelCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: LinearModelCVScikitsLearnNode

class mdp.nodes.DictionaryLearningScikitsLearnNode

Dictionary learning

This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.DictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*, V^*) = argmin_{(U,V)} 0.5 || Y - U V ||_2^2 + alpha * || U ||_1,
with || V_k ||_2 = 1 for all 0 <= k < n_atoms
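
To make the objective concrete, here is a minimal sketch that evaluates it for small matrices given as nested lists. The function name dictionary_objective is hypothetical, and the squared norm is read as the sum of squared entries of Y - U V, which is an assumption about the notation.

```python
def dictionary_objective(Y, U, V, alpha):
    """Evaluate 0.5 * ||Y - U V||_2^2 + alpha * ||U||_1 for nested lists."""
    n_rows, n_atoms, n_cols = len(U), len(V), len(V[0])
    squared_error = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Entry (i, j) of the reconstruction U V.
            reconstruction = sum(U[i][k] * V[k][j] for k in range(n_atoms))
            squared_error += (Y[i][j] - reconstruction) ** 2
    # The L1 penalty sums the absolute values of the sparse code U.
    l1_penalty = sum(abs(value) for row in U for value in row)
    return 0.5 * squared_error + alpha * l1_penalty


identity = [[1.0, 0.0], [0.0, 1.0]]
print(dictionary_objective(identity, identity, identity, alpha=0.1))
```

With a perfect reconstruction only the sparsity penalty remains, so a larger alpha pushes the solution toward codes with fewer nonzero entries.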

Parameters

n_atoms : int,
number of dictionary elements to extract
alpha : int,
sparsity controlling parameter
max_iter : int,
maximum number of iterations to perform
tol : float,
tolerance for numerical error
fit_algorithm : {‘lars’, ‘cd’}
• lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)
• cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}
Algorithm used to transform the data:
• lars: uses the least angle regression method (linear_model.lars_path)
• lasso_lars: uses Lars to compute the Lasso solution
• lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
• omp: uses orthogonal matching pursuit to estimate the sparse solution
• threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'
transform_n_nonzero_coefs : int, 0.1 * n_features by default
Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
transform_alpha : float, 1. by default
If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
split_sign : bool, False by default
Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
n_jobs : int,
number of parallel jobs to run
code_init : array of shape (n_samples, n_atoms),
initial value for the code, for warm restart
dict_init : array of shape (n_atoms, n_features),
initial values for the dictionary, for warm restart

verbose :
degree of verbosity of the printed output
random_state : int or RandomState
Pseudo number generator state used for random sampling.
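
The split_sign option can be pictured with the short sketch below, which splits one sparse code vector into non-negative halves. The helper name split_sign_vector and the ordering (positive part first, then negative part) are assumptions for illustration only.

```python
def split_sign_vector(code):
    """Concatenate the positive and negative parts of one sparse code vector."""
    positive = [max(value, 0.0) for value in code]
    negative = [max(-value, 0.0) for value in code]
    # Assumption: positive part first, then the magnitude of the negative part.
    return positive + negative


print(split_sign_vector([1.5, -2.0, 0.0]))
```

The output has twice as many entries as the input and none of them is negative, which is the property some downstream classifiers benefit from.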

Attributes

components_ : array, [n_atoms, n_features]
dictionary atoms extracted from the data
error_ : array
vector of errors at each iteration

Notes

References:

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)

See also

SparseCoder, MiniBatchDictionaryLearning, SparsePCA, MiniBatchSparsePCA

Full API documentation: DictionaryLearningScikitsLearnNode

class mdp.nodes.PerceptronScikitsLearnNode

Perceptron

This node has been automatically generated by wrapping the sklearn.linear_model.perceptron.Perceptron class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

penalty : None, ‘l2’ or ‘l1’ or ‘elasticnet’
The penalty (aka regularization term) to be used. Defaults to None.
alpha : float
Constant that multiplies the regularization term if regularization is used. Defaults to 0.0001
fit_intercept: bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
n_iter: int, optional
The number of passes over the training data (aka epochs). Defaults to 5.
shuffle: bool, optional
Whether or not the training data should be shuffled after each epoch. Defaults to False.
seed: int, optional
The seed of the pseudo random number generator to use when shuffling the data.
verbose: integer, optional
The verbosity level
n_jobs: integer, optional
The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. -1 means ‘all CPUs’. Defaults to 1.
eta0 : double
Constant by which the updates are multiplied. Defaults to 1.
class_weight : dict, {class_label : weight} or “auto” or None, optional

Preset for the class_weight fit parameter.

Weights associated with classes. If not given, all classes are supposed to have weight one.

The “auto” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.

warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

Attributes

coef_ : array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]

Weights assigned to the features.
intercept_ : array, shape = [1] if n_classes == 2 else [n_classes]
Constants in decision function.

Notes

Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss=”perceptron”, eta0=1, learning_rate=”constant”, penalty=None).
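
The mistake-driven update behind the n_iter and eta0 parameters can be sketched in plain Python as follows. This is a simplified illustration with hypothetical function names, without penalty, shuffling or class weights, and it assumes labels coded as -1 and +1; it is not the wrapped implementation.

```python
def train_perceptron(X, y, n_iter=5, eta0=1.0):
    """Run n_iter passes of perceptron updates; labels must be -1 or +1."""
    weights = [0.0] * len(X[0])
    intercept = 0.0
    for _ in range(n_iter):  # one pass over the data per epoch
        for features, label in zip(X, y):
            score = sum(w * v for w, v in zip(weights, features)) + intercept
            if label * score <= 0:  # misclassified, or exactly on the boundary
                weights = [w + eta0 * label * v
                           for w, v in zip(weights, features)]
                intercept += eta0 * label
    return weights, intercept


def perceptron_predict(weights, intercept, features):
    """Return +1 or -1 depending on the side of the decision boundary."""
    score = sum(w * v for w, v in zip(weights, features)) + intercept
    return 1 if score > 0 else -1


X = [[2, 2], [3, 1], [-2, -1], [-1, -3]]
y = [1, 1, -1, -1]
weights, intercept = train_perceptron(X, y)
print([perceptron_predict(weights, intercept, row) for row in X])
```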

See also

SGDClassifier

References

http://en.wikipedia.org/wiki/Perceptron and references therein.

Full API documentation: PerceptronScikitsLearnNode

class mdp.nodes.RidgeClassifierScikitsLearnNode

Classifier using Ridge regression.

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

alpha : float
Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
tol : float
Precision of the solution.
class_weight : dict, optional
Weights associated with classes in the form {class_label : weight}. If not given, all classes are supposed to have weight one.

Attributes

coef_ : array, shape = [n_features] or [n_classes, n_features]
Weight vector(s).

See also

Ridge, RidgeClassifierCV

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Full API documentation: RidgeClassifierScikitsLearnNode

class mdp.nodes.WardAgglomerationScikitsLearnNode

Feature agglomeration based on Ward hierarchical clustering

This node has been automatically generated by wrapping the sklearn.cluster.hierarchical.WardAgglomeration class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_clusters : int or ndarray
The number of clusters.
connectivity : sparse matrix
Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data. Default is None, i.e., the hierarchical agglomeration algorithm is unstructured.
memory : Instance of joblib.Memory or string
Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
copy : bool
Copy the connectivity matrix or work inplace.
n_components : int (optional)
The number of connected components in the graph defined by the connectivity matrix. If not set, it is estimated.
compute_full_tree: bool or ‘auto’ (optional)
Stop the construction of the tree early at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree.

Attributes

children_ : array-like, shape = [n_nodes, 2]
List of the children of each node. Leaves of the tree do not appear.
labels_ : array [n_samples]
cluster labels for each point
n_leaves_ : int
Number of leaves in the hierarchical tree.
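
Once features have been grouped, each sample can be reduced to one value per cluster. The sketch below assumes that the labels assign one cluster index to each input feature and that pooling is done by taking the mean; both are assumptions for illustration, and pool_features is a hypothetical helper.

```python
def pool_features(sample, labels):
    """Average the features of one sample within each cluster (illustrative)."""
    sums, counts = {}, {}
    for value, cluster in zip(sample, labels):
        sums[cluster] = sums.get(cluster, 0.0) + value
        counts[cluster] = counts.get(cluster, 0) + 1
    # One pooled value per cluster, ordered by cluster index.
    return [sums[cluster] / counts[cluster] for cluster in sorted(sums)]


# Three features, the first two merged into cluster 0.
print(pool_features([1.0, 3.0, 10.0], [0, 0, 1]))
```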

Full API documentation: WardAgglomerationScikitsLearnNode

class mdp.nodes.KNeighborsClassifierScikitsLearnNode

Classifier implementing the k-nearest neighbors vote.

This node has been automatically generated by wrapping the sklearn.neighbors.classification.KNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_neighbors : int, optional (default = 5)
Number of neighbors to use by default for k_neighbors() queries.
weights : str or callable

weight function used in prediction. Possible values:

• ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
• ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
• [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

• ‘ball_tree’ will use BallTree
• ‘kd_tree’ will use scipy.spatial.cKDTree
• ‘brute’ will use a brute-force search.
• ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
warn_on_equidistant : boolean, optional. Defaults to True.
Generate a warning if equidistant neighbors are discarded. For classification or regression based on k-neighbors, if neighbor k and neighbor k+1 have identical distances but different labels, then the result will be dependent on the ordering of the training data. If the fit method is 'kd_tree', no warnings will be generated.
p: integer, optional (default = 2)
Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[ 0.66666667  0.33333333]]
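
The interplay of n_neighbors, weights and p can be seen in a brute-force sketch written in plain Python. The function names are hypothetical, and the code ignores the tree-based algorithms and tie handling of the wrapped class.

```python
def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X, y, query, n_neighbors=3, weights="uniform", p=2):
    """Vote among the n_neighbors training points closest to query."""
    distances = sorted((minkowski(row, query, p), label)
                       for row, label in zip(X, y))
    votes = {}
    for distance, label in distances[:n_neighbors]:
        if weights == "uniform":
            weight = 1.0
        else:
            # 'distance': inverse-distance weighting; exact matches dominate.
            weight = 1.0 / distance if distance > 0 else float("inf")
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
print(knn_predict(X, y, [1.1]))
```

On the same toy data as the example above, two of the three nearest neighbors of 1.1 carry label 0, so the vote agrees with the documented prediction.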


Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: KNeighborsClassifierScikitsLearnNode

class mdp.nodes.NuSVRScikitsLearnNode

NuSVR for sparse matrices (csr)

This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

See sklearn.svm.NuSVR for a complete list of parameters.

Notes

For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).

Examples

>>> from sklearn.svm.sparse import NuSVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = NuSVR(nu=0.1, C=1.0)
>>> clf.fit(X, y)
NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
kernel='rbf', nu=0.1, probability=False, shrinking=True, tol=0.001,
verbose=False)


Full API documentation: NuSVRScikitsLearnNode

class mdp.nodes.NearestCentroidScikitsLearnNode

Nearest centroid classifier.

This node has been automatically generated by wrapping the sklearn.neighbors.nearest_centroid.NearestCentroid class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.

Parameters

metric: string, or callable
The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.pairwise_distances for its metric parameter.
shrink_threshold : float, optional (default = None)
Threshold for shrinking centroids to remove features.

Attributes

centroids_ : array-like, shape = [n_classes, n_features]
Centroid of each class

Examples

>>> from sklearn.neighbors.nearest_centroid import NearestCentroid
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = NearestCentroid()
>>> clf.fit(X, y)
NearestCentroid(metric='euclidean', shrink_threshold=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]
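
What the classifier does on this data can be reproduced with a few lines of plain Python. This sketch uses the Euclidean metric and no centroid shrinking, and its function names are hypothetical.

```python
def fit_centroids(X, y):
    """Return a dict mapping each class label to the mean of its samples."""
    grouped = {}
    for row, label in zip(X, y):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()}


def nearest_centroid_predict(centroids, sample):
    """Return the label whose centroid is closest to sample."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(center, sample))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
y = [1, 1, 1, 2, 2, 2]
centroids = fit_centroids(X, y)
print(nearest_centroid_predict(centroids, [-0.8, -1]))
```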


See also

sklearn.neighbors.KNeighborsClassifier: nearest neighbors classifier

Notes

When used for text classification with tf–idf vectors, this classifier is also known as the Rocchio classifier.

References

Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.

Full API documentation: NearestCentroidScikitsLearnNode

class mdp.nodes.ExtraTreeRegressorScikitsLearnNode

An extremely randomized tree regressor.

This node has been automatically generated by wrapping the sklearn.tree.tree.ExtraTreeRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
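
The split selection described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, thresholds are drawn uniformly between the minimum and maximum of each feature, and the split quality is measured by the weighted variance of the targets, all of which are assumptions.

```python
import random


def variance(values):
    """Population variance; an empty list counts as zero."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def random_best_split(X, y, max_features, rng):
    """Draw one random threshold per candidate feature and keep the best."""
    candidates = rng.sample(range(len(X[0])), max_features)
    best = None
    for feature in candidates:
        column = [row[feature] for row in X]
        low, high = min(column), max(column)
        if low == high:
            continue  # a constant feature cannot separate the samples
        threshold = rng.uniform(low, high)
        left = [t for v, t in zip(column, y) if v <= threshold]
        right = [t for v, t in zip(column, y) if v > threshold]
        score = len(left) * variance(left) + len(right) * variance(right)
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y = [0.0, 0.0, 1.0, 1.0]
score, feature, threshold = random_best_split(X, y, 2, random.Random(0))
print(feature)
```

With max_features=1 only one randomly chosen feature and one random threshold are examined, which corresponds to the totally random tree mentioned in the text.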

Warning: Extra-trees should only be used within ensemble methods.

See also

ExtraTreeClassifier : A classifier based on extremely randomized trees
sklearn.ensemble.ExtraTreesClassifier : An ensemble of extra-trees for classification
sklearn.ensemble.ExtraTreesRegressor : An ensemble of extra-trees for regression

References

 [1] P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Full API documentation: ExtraTreeRegressorScikitsLearnNode

class mdp.nodes.ExtraTreesClassifierScikitsLearnNode

An extra-trees classifier.

This node has been automatically generated by wrapping the sklearn.ensemble.forest.ExtraTreesClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters

n_estimators : integer, optional (default=10)
The number of trees in the forest.
criterion : string, optional (default=”gini”)
The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Note: this parameter is tree-specific.
max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. Note: this parameter is tree-specific.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node. Note: this parameter is tree-specific.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples in newly created leaves. A split is discarded if, after the split, one of the leaves would contain fewer than min_samples_leaf samples. Note: this parameter is tree-specific.
min_density : float, optional (default=0.1)
This parameter controls a trade-off in an optimization heuristic. It controls the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold the mask is recomputed and the input data is packed, which results in data copying. If min_density equals one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (aka sample masks). Note: this parameter is tree-specific.
max_features : int, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split.
• If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
• If “sqrt”, then max_features=sqrt(n_features).
• If “log2”, then max_features=log2(n_features).
• If None, then max_features=n_features.

Note: this parameter is tree-specific.

bootstrap : boolean, optional (default=False)
Whether bootstrap samples are used when building trees.
compute_importances : boolean, optional (default=True)
Whether feature importances are computed and stored into the feature_importances_ attribute when calling fit.
oob_score : bool
Whether to use out-of-bag samples to estimate the generalization error.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0)
Controls the verbosity of the tree building process.
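
The max_features options listed above can be resolved to a concrete number of features as in this sketch. The helper name resolve_max_features is hypothetical, and rounding down with a minimum of one feature is an assumption.

```python
import math


def resolve_max_features(max_features, n_features, classification=True):
    """Translate the max_features setting into a number of features."""
    if max_features is None:
        return n_features
    if max_features == "auto":
        # sqrt(n_features) for classification, all features for regression.
        if classification:
            return max(1, int(math.sqrt(n_features)))
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    return int(max_features)  # an explicit integer is used as given


for setting in ("auto", "sqrt", "log2", None, 3):
    print(setting, resolve_max_features(setting, 16))
```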

Attributes

estimators_: list of DecisionTreeClassifier
The collection of fitted sub-estimators.
feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.
oob_decision_function_ : array, shape = [n_samples, n_classes]
Decision function computed with out-of-bag estimate on the training set.

References

 [1] P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

See also

sklearn.tree.ExtraTreeClassifier : Base classifier for this ensemble.
RandomForestClassifier : Ensemble classifier based on trees with optimal splits.

Full API documentation: ExtraTreesClassifierScikitsLearnNode

class mdp.nodes.LassoCVScikitsLearnNode

Lasso linear model with iterative fitting along a regularization path

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The best model is selected by cross-validation.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
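
The objective can be evaluated directly for a candidate weight vector, as in the following sketch. The function name lasso_objective is hypothetical and no intercept is included.

```python
def lasso_objective(X, y, w, alpha):
    """Evaluate (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1."""
    n_samples = len(y)
    residuals = [target - sum(c * v for c, v in zip(w, row))
                 for row, target in zip(X, y)]
    squared_error = sum(r ** 2 for r in residuals)
    l1_norm = sum(abs(c) for c in w)
    return squared_error / (2.0 * n_samples) + alpha * l1_norm


X = [[1.0], [2.0]]
y = [1.0, 2.0]
print(lasso_objective(X, y, [1.0], alpha=0.5))  # perfect fit, penalty only
print(lasso_objective(X, y, [0.0], alpha=0.5))  # no penalty, data term only
```

Comparing the two calls shows the trade-off that alpha controls: the exact fit pays the L1 penalty, while the all-zero solution pays only the data term.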

Parameters

eps : float, optional
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
n_alphas : int, optional
Number of alphas along the regularization path
alphas : numpy array, optional
List of alphas where to compute the models. If None alphas are set automatically
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: int, optional
The maximum number of iterations
tol: float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
cv : integer or crossvalidation generator, optional
If an integer is passed, it is the number of folds (default 3). Specific crossvalidation objects can be passed, see sklearn.cross_validation module for the list of possible objects
verbose : bool or integer
amount of verbosity
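
How eps and n_alphas could define the regularization path is sketched below. The logarithmic spacing and the default n_alphas=100 are assumptions; the source only states that alpha_min / alpha_max equals eps. The helper name alpha_grid is hypothetical and n_alphas must be at least 2.

```python
def alpha_grid(alpha_max, eps=1e-3, n_alphas=100):
    """Log-spaced alphas from alpha_max down to alpha_max * eps."""
    return [alpha_max * eps ** (i / (n_alphas - 1.0)) for i in range(n_alphas)]


grid = alpha_grid(1.0, eps=1e-3, n_alphas=4)
print(grid)
```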

Attributes

alpha_: float
The amount of penalization chosen by cross validation
coef_ : array, shape = (n_features,)
parameter vector (w in the cost function formula)
intercept_ : float
independent term in decision function.
mse_path_: array, shape = (n_alphas, n_folds)
mean square error for the test set on each fold, varying alpha
alphas_: numpy array
The grid of alphas used for fitting

Notes

See examples/linear_model/lasso_path_with_crossvalidation.py for an example.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

See also

lars_path, lasso_path, LassoLars, Lasso, LassoLarsCV

Full API documentation: LassoCVScikitsLearnNode

class mdp.nodes.OneClassSVMScikitsLearnNode

Unsupervised Outliers Detection.

This node has been automatically generated by wrapping the sklearn.svm.classes.OneClassSVM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Estimate the support of a high-dimensional distribution.

The implementation is based on libsvm.

Parameters

kernel : string, optional
Specifies the kernel type to be used in the algorithm. Can be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given ‘rbf’ will be used.
nu : float, optional
An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.
degree : int, optional
Degree of the kernel function. Significant only for the poly kernel.
gamma : float, optional (default=0.0)
kernel coefficient for rbf and poly, if gamma is 0.0 then 1/n_features will be taken.
coef0 : float, optional
Independent term in kernel function. It is only significant in poly/sigmoid.
tol: float, optional
Tolerance for stopping criterion.
shrinking: boolean, optional
Whether to use the shrinking heuristic.
cache_size: float, optional
Specify the size of the kernel cache (in MB)
verbose : bool, default: False
Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
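
As an illustration of the gamma default described above, the sketch below evaluates an RBF kernel in which gamma=0.0 is replaced by 1/n_features. The helper name rbf_kernel is hypothetical and this is not the libsvm code.

```python
import math


def rbf_kernel(x, y, gamma=0.0):
    """Compute exp(-gamma * ||x - y||^2); gamma=0.0 means 1 / n_features."""
    if gamma == 0.0:
        gamma = 1.0 / len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


print(rbf_kernel([1.0, 0.0], [0.0, 0.0]))  # two features, so gamma = 0.5
```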

Attributes

support_ : array-like, shape = [n_SV]
Index of support vectors.
support_vectors_ : array-like, shape = [nSV, n_features]
Support vectors.
dual_coef_ : array, shape = [n_classes-1, n_SV]
Coefficient of the support vector in the decision function.
coef_ : array, shape = [n_classes-1, n_features]

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a read-only property derived from dual_coef_ and support_vectors_.

intercept_ : array, shape = [n_classes-1]
Constants in decision function.

Full API documentation: OneClassSVMScikitsLearnNode

class mdp.nodes.RidgeCVScikitsLearnNode

Ridge regression with built-in cross-validation.

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.
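
Why Leave-One-Out can be made efficient for ridge regression is easiest to see in one dimension without an intercept: the leave-one-out residual equals the ordinary residual divided by 1 - h_i, where h_i = x_i^2 / (sum of x^2 + alpha). The sketch below checks this identity against brute-force refitting. The function names are hypothetical, and the one-dimensional setting is a simplification, not the wrapped implementation.

```python
def ridge_1d(x, y, alpha):
    """Closed-form ridge weight for a single feature without intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def loo_errors_brute_force(x, y, alpha):
    """Refit once per sample, leaving that sample out each time."""
    errors = []
    for i in range(len(x)):
        weight = ridge_1d(x[:i] + x[i + 1:], y[:i] + y[i + 1:], alpha)
        errors.append(y[i] - x[i] * weight)
    return errors


def loo_errors_shortcut(x, y, alpha):
    """Same errors from a single fit, rescaling each residual by 1 - h_i."""
    weight = ridge_1d(x, y, alpha)
    scale = sum(a * a for a in x) + alpha
    return [(b - a * weight) / (1.0 - a * a / scale) for a, b in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]
print(loo_errors_brute_force(x, y, alpha=0.5))
print(loo_errors_shortcut(x, y, alpha=0.5))
```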

Parameters

alphas: numpy array of shape [n_alphas]
Array of alpha values to try. Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
score_func: callable, optional
Function that takes 2 arguments and compares them in order to evaluate the performance of prediction (big is good). If None is passed, the score of the estimator is maximized.
loss_func: callable, optional
Function that takes 2 arguments and compares them in order to evaluate the performance of prediction (small is good). If None is passed, the score of the estimator is maximized.
cv : cross-validation generator, optional
If None, Generalized Cross-Validation (efficient Leave-One-Out) will be used.
gcv_mode : {None, ‘auto’, ‘svd’, ‘eigen’}, optional

Flag indicating which strategy to use when performing Generalized Cross-Validation. Options are:

'auto' : use svd if n_samples > n_features, otherwise use eigen
'svd' : force computation via singular value decomposition of X
'eigen' : force computation via eigendecomposition of X^T X

The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending upon the shape of the training data.

store_cv_values : boolean, default=False
Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute (see below). This flag is only compatible with cv=None (i.e. using Generalized Cross-Validation).

Attributes

cv_values_ : array, shape = [n_samples, n_alphas] or shape = [n_samples, n_responses, n_alphas], optional
Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).
coef_ : array, shape = [n_features] or [n_responses, n_features]
Weight vector(s).
alpha_ : float
Estimated regularization parameter.

See also

Ridge: Ridge regression
RidgeClassifier: Ridge classifier
RidgeClassifierCV: Ridge classifier with built-in cross validation

Full API documentation: RidgeCVScikitsLearnNode

class mdp.nodes.PriorProbabilityEstimatorScikitsLearnNode

An estimator predicting the probability of each class in the training data.

This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.PriorProbabilityEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: PriorProbabilityEstimatorScikitsLearnNode

class mdp.nodes.ARDRegressionScikitsLearnNode

Bayesian ARD regression.

This node has been automatically generated by wrapping the sklearn.linear_model.bayes.ARDRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to follow Gaussian distributions. Also estimate the parameters lambda (precisions of the distributions of the weights) and alpha (precision of the distribution of the noise). The estimation is done by an iterative procedure (Evidence Maximization).

Parameters

X : array, shape = (n_samples, n_features)
Training vectors.
y : array, shape = (n_samples)
Target values for training vectors
n_iter : int, optional
Maximum number of iterations. Default is 300
tol : float, optional
Stop the algorithm if w has converged. Default is 1.e-3.
alpha_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
alpha_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
lambda_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
lambda_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
compute_score : boolean, optional
If True, compute the objective function at each step of the model. Default is False.
threshold_lambda : float, optional
threshold for removing (pruning) weights with high precision from the computation. Default is 1.e+4.
fit_intercept : boolean, optional
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered). Default is True.
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True.
If True, X will be copied; else, it may be overwritten.
verbose : boolean, optional, default False
Verbose mode when fitting the model.

Attributes

coef_ : array, shape = (n_features)
Coefficients of the regression model (mean of distribution)
alpha_ : float
estimated precision of the noise.
lambda_ : array, shape = (n_features)
estimated precisions of the weights.
sigma_ : array, shape = (n_features, n_features)
estimated variance-covariance matrix of the weights
scores_ : float
if computed, value of the objective function (to be maximized)

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.ARDRegression()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
...
ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
copy_X=True, fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06,
n_iter=300, normalize=False, threshold_lambda=10000.0, tol=0.001,
verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])


Notes

See examples/linear_model/plot_ard.py for an example.

Full API documentation: ARDRegressionScikitsLearnNode

class mdp.nodes.GradientBoostingRegressorScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.GradientBoostingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
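
The forward stage-wise idea can be illustrated with a small least-squares booster on one-dimensional inputs. For the squared loss the negative gradient is simply the residual, so each stage fits a one-split tree (a stump) to the current residuals and adds it, scaled by learn_rate. All function names are hypothetical, and this sketch omits subsampling, deeper trees and the other losses.

```python
def fit_stump(x, residuals):
    """Find the single threshold split that minimizes the squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # split between distinct values
        left = [r for v, r in zip(x, residuals) if v <= threshold]
        right = [r for v, r in zip(x, residuals) if v > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_estimators=100, learn_rate=0.1):
    """Start from the mean, then add one shrunken stump per stage."""
    baseline = sum(y) / len(y)
    predictions = [baseline] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of the squared loss is the residual.
        residuals = [t - p for t, p in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learn_rate * (left_mean if v <= threshold else right_mean)
            for v, p in zip(x, predictions)]
    return baseline, stumps, learn_rate


def boost_predict(model, value):
    """Sum the baseline and every shrunken stump output for one input."""
    baseline, stumps, learn_rate = model
    return baseline + learn_rate * sum(
        left_mean if value <= threshold else right_mean
        for threshold, left_mean, right_mean in stumps)


model = boost([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
print(boost_predict(model, 0.0), boost_predict(model, 3.0))
```

A smaller learn_rate shrinks each stage's contribution, so more stages are needed to reach the same fit, which is the trade-off with n_estimators noted in the parameter list.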

Parameters

loss : {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, optional (default=’ls’)
Loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile).
learn_rate : float, optional (default=0.1)
learning rate shrinks the contribution of each tree by learn_rate. There is a trade-off between learn_rate and n_estimators.
n_estimators : int (default=100)
The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
max_depth : integer, optional (default=3)
maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples required to be at a leaf node.
subsample : float, optional (default=1.0)
The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
max_features : int, None, optional (default=None)
The number of features to consider when looking for the best split. Features are chosen randomly at each split point. If None, then max_features=n_features. Choosing max_features < n_features leads to a reduction of variance and an increase in bias.
alpha : float (default=0.9)
The alpha-quantile of the huber loss function and the quantile loss function. Only used if loss='huber' or loss='quantile'.
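To make the four loss options more concrete, the following sketch writes down the negative gradient that each stage would fit for a single sample. It is an illustration under assumptions: the function name `negative_gradient` is hypothetical, and the Huber threshold `delta` is passed in explicitly here, whereas the parameter description above derives it from the alpha-quantile.

```python
def sign(value):
    return (value > 0) - (value < 0)


def negative_gradient(loss, y, prediction, alpha=0.9, delta=1.0):
    """Negative gradient of the loss with respect to the prediction."""
    residual = y - prediction
    if loss == "ls":          # 0.5 * residual**2
        return residual
    if loss == "lad":         # |residual|
        return float(sign(residual))
    if loss == "huber":       # quadratic near zero, linear beyond delta
        return residual if abs(residual) <= delta else delta * sign(residual)
    if loss == "quantile":    # pinball loss for the alpha-quantile
        return alpha if residual > 0 else alpha - 1.0
    raise ValueError("unknown loss: %r" % loss)
```

Only the least-squares gradient grows with the size of the residual; the other three are bounded, which is where their robustness to outliers comes from.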

Attributes

feature_importances_ : array, shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : array, shape = [n_estimators]
Score of the training dataset obtained using an out-of-bag estimate. The i-th score oob_score_[i] is the deviance (= loss) of the model at iteration i on the out-of-bag sample.
train_score_ : array, shape = [n_estimators]
The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.

Examples

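>>> samples = [[0, 0, 2], [1, 0, 0]]
>>> labels = [0, 1]
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> gb = GradientBoostingRegressor().fit(samples, labels)
>>> print(gb.predict([[0, 0, 0]]))
[  1.32806...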


See also

DecisionTreeRegressor, RandomForestRegressor

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999.

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

class mdp.nodes.PLSCanonicalScikitsLearnNode

PLSCanonical implements the 2 blocks canonical PLS of the original Wold algorithm [Tenenhaus 1998] p.204, referred to as PLS-C2A in [Wegelin 2000].

This node has been automatically generated by wrapping the sklearn.pls.PLSCanonical class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This class inherits from PLS with mode=”A” and deflation_mode=”canonical”, norm_y_weights=True and algorithm=”nipals”, but svd should provide similar results up to numerical errors.

Parameters

X : array-like of predictors, shape = [n_samples, p]
Training vectors, where n_samples is the number of samples and p is the number of predictors.
Y : array-like of response, shape = [n_samples, q]
Training vectors, where n_samples is the number of samples and q is the number of response variables.

n_components : int, number of components to keep. (default 2).

scale : boolean, scale data? (default True)

algorithm : string, “nipals” or “svd”
The algorithm used to estimate the weights. It will be called n_components times, i.e. once for each iteration of the outer loop.
max_iter : an integer, (default 500)
the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)
tol : non-negative real, default 1e-06
the tolerance used in the iterative algorithm
copy : boolean, default True
Whether the deflation should be done on a copy. Leave the default value of True unless you don’t care about side effects.

Attributes

x_weights_ : array, shape = [p, n_components]
X block weights vectors.
y_weights_ : array, shape = [q, n_components]
Y block weights vectors.
x_scores_ : array, shape = [n_samples, n_components]
X scores.
y_scores_ : array, shape = [n_samples, n_components]
Y scores.
x_rotations_ : array, shape = [p, n_components]
X block to latents rotations.
y_rotations_ : array, shape = [q, n_components]
Y block to latents rotations.

Notes

For each component k, find weights u, v that optimize:

max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = |v| = 1

Note that it maximizes both the correlations between the scores and the intra-block variances.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of the Y block (Yk+1) is obtained by deflation on the current Y score. This performs a canonical symmetric version of the PLS regression, which differs slightly from CCA. This mode is mostly used for modeling.
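The weight search and the two deflation steps described in these notes can be sketched in plain Python. This is a simplified illustration, not the wrapped implementation: it only centers the data (no scaling), extracts a single component, and all helper names (`nipals_component`, `deflate`, and so on) are hypothetical.

```python
import math


def center(M):
    means = [sum(col) / len(col) for col in zip(*M)]
    return [[value - mean for value, mean in zip(row, means)] for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def times(M, w):
    """Matrix-vector product M w."""
    return [dot(row, w) for row in M]


def t_times(M, s):
    """Transposed product M^T s."""
    return [dot(col, s) for col in zip(*M)]


def unit(w):
    norm = math.sqrt(dot(w, w))
    return [x / norm for x in w]


def nipals_component(X, Y, max_iter=500, tol=1e-06):
    """Weights u, v and scores for one component (NIPALS inner loop)."""
    y_score = [row[0] for row in Y]          # start from the first Y column
    u_old = None
    for _ in range(max_iter):
        u = unit(t_times(X, y_score))        # X weights, |u| = 1
        x_score = times(X, u)
        v = unit(t_times(Y, x_score))        # Y weights, |v| = 1
        y_score = times(Y, v)
        if u_old is not None:
            diff = [a - b for a, b in zip(u, u_old)]
            if dot(diff, diff) < tol:
                break
        u_old = u
    return u, v, x_score, y_score


def deflate(M, score):
    """Remove from M the part explained by its own score vector."""
    loadings = [c / dot(score, score) for c in t_times(M, score)]
    return [[value - s * l for value, l in zip(row, loadings)]
            for row, s in zip(M, score)]


X = center([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = center([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])
u, v, x_score, y_score = nipals_component(X, Y)
X_next = deflate(X, x_score)   # residual matrix Xk+1
Y_next = deflate(Y, y_score)   # residual matrix Yk+1 (canonical deflation)
```

After deflation the residual matrix is orthogonal to the score it was deflated on, so the next component is extracted from information the first one did not capture.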

This implementation provides the same results as the “plspm” package provided in the R language (R-project), using the function plsca(X, Y). Results are equal or collinear with the function pls(..., mode = "canonical") of the “mixOmics” package. The difference lies in the fact that the mixOmics implementation does not exactly implement the Wold algorithm, since it does not normalize y_weights to one.

Examples

>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> plsca = PLSCanonical(n_components=2)
>>> plsca.fit(X, Y)
...
PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2,
scale=True, tol=1e-06)
>>> X_c, Y_c = plsca.transform(X, Y)


References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

See also

CCA, PLSSVD

Full API documentation: PLSCanonicalScikitsLearnNode

class mdp.nodes.SelectPercentileScikitsLearnNode

Filter: Select the best percentile of the p_values

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectPercentile class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

score_func: callable
Function taking two arrays X and y, and returning two arrays: the scores and the p-values.
percentile: int, optional
Percent of features to keep
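The selection step itself is simple once the scores are available. The sketch below keeps the indices of the highest-scoring features; the function name `select_percentile` is hypothetical, and the rounding and tie-breaking choices are assumptions that may differ from the wrapped class.

```python
import math


def select_percentile(scores, percentile=10):
    """Return the sorted indices of the top `percentile` percent of features."""
    # Assumption: round up and always keep at least one feature.
    n_keep = max(1, int(math.ceil(len(scores) * percentile / 100.0)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


kept = select_percentile([0.1, 5.0, 0.3, 2.0], percentile=50)
```

Here `scores` plays the role of the first array returned by score_func; the p-values are not needed for the ranking in this sketch.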

Full API documentation: SelectPercentileScikitsLearnNode

class mdp.nodes.RandomForestRegressorScikitsLearnNode

A random forest regressor.

This node has been automatically generated by wrapping the sklearn.ensemble.forest.RandomForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

A random forest is a meta estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
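The combination of sub-sampling and averaging can be illustrated with a small plain-Python sketch. It is deliberately simplified and is not the wrapped implementation: inputs are one-dimensional, each "tree" is a depth-1 stump, and the names `fit_stump`, `fit_forest` and `predict_forest` are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Depth-1 regression tree: (threshold, left_mean, right_mean)."""
    mean = sum(ys) / len(ys)
    best = (float("inf"), xs[0], mean, mean)  # constant fallback
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_forest(xs, ys, n_estimators=10, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_estimators):
        # bootstrap sample: draw n rows with replacement
        rows = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in rows], [ys[i] for i in rows]))
    return forest


def predict_forest(forest, x):
    # average the predictions of all trees
    outputs = [left if x <= threshold else right
               for threshold, left, right in forest]
    return sum(outputs) / len(outputs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
forest = fit_forest(xs, ys)
low, high = predict_forest(forest, 0.0), predict_forest(forest, 5.0)
```

Each stump sees a different bootstrap sample and therefore picks a slightly different split; averaging smooths out these differences.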

Parameters

n_estimators : integer, optional (default=10)
The number of trees in the forest.
criterion : string, optional (default=”mse”)
The function to measure the quality of a split. The only supported criterion is “mse” for the mean squared error. Note: this parameter is tree-specific.
max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Note: this parameter is tree-specific.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node. Note: this parameter is tree-specific.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples in newly created leaves. A split is discarded if, after the split, one of the leaves would contain fewer than min_samples_leaf samples. Note: this parameter is tree-specific.
min_density : float, optional (default=0.1)
This parameter controls a trade-off in an optimization heuristic. It controls the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold, the mask is recomputed and the input data is packed, which results in data copying. If min_density equals one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (aka sample masks). Note: this parameter is tree-specific.
max_features : int, string or None, optional (default=”auto”)

The number of features to consider when looking for the best split:

• If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
• If “sqrt”, then max_features=sqrt(n_features).
• If “log2”, then max_features=log2(n_features).
• If None, then max_features=n_features.

Note: this parameter is tree-specific.

bootstrap : boolean, optional (default=True)
Whether bootstrap samples are used when building trees.
compute_importances : boolean, optional (default=True)
Whether feature importances are computed and stored into the feature_importances_ attribute when calling fit.
oob_score : bool
whether to use out-of-bag samples to estimate the generalization error.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0)
Controls the verbosity of the tree building process.

Attributes

estimators_: list of DecisionTreeRegressor
The collection of fitted sub-estimators.
feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.
oob_prediction_ : array, shape = [n_samples]
Prediction computed with out-of-bag estimate on the training set.

References

 [1] Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.

See also

DecisionTreeRegressor, ExtraTreesRegressor

Full API documentation: RandomForestRegressorScikitsLearnNode

class mdp.nodes.GaussianNBScikitsLearnNode

Gaussian Naive Bayes (GaussianNB)

This node has been automatically generated by wrapping the sklearn.naive_bayes.GaussianNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array, shape = [n_samples]
Target vector relative to X

Attributes

class_prior_ : array, shape = [n_classes]
probability of each class.
theta_ : array, shape = [n_classes, n_features]
mean of each feature per class
sigma_ : array, shape = [n_classes, n_features]
variance of each feature per class
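The three attributes above are all that is needed to classify a new point. The following plain-Python sketch estimates them and applies Bayes' rule with independent Gaussian features. It is an illustration only: the function names are hypothetical, and the small constant added to the variances is an assumption made here to avoid division by zero.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {label: (class_prior, theta, sigma)} estimated from the data."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        theta = [sum(col) / n for col in zip(*rows)]             # per-feature mean
        sigma = [sum((v - m) ** 2 for v in col) / n + 1e-9       # per-feature variance
                 for col, m in zip(zip(*rows), theta)]
        model[label] = (n / float(len(X)), theta, sigma)
    return model


def predict_gaussian_nb(model, x):
    def log_posterior(label):
        prior, theta, sigma = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2.0 * math.pi * s) - (v - m) ** 2 / (2.0 * s)
            for v, m, s in zip(x, theta, sigma))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
y = [1, 1, 1, 2, 2, 2]
model = fit_gaussian_nb(X, y)
predicted = predict_gaussian_nb(model, [-0.8, -1])
```

The data are the same as in the example that follows, so the sketch can be compared directly with the output of the wrapped class.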

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print(clf.predict([[-0.8, -1]]))
[1]


Full API documentation: GaussianNBScikitsLearnNode

class mdp.nodes.GaussianHMMScikitsLearnNode

Hidden Markov Model with Gaussian emissions

This node has been automatically generated by wrapping the sklearn.hmm.GaussianHMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Representation of a hidden Markov model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a HMM.

Parameters

n_components : int
Number of states.
_covariance_type : string
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.

Attributes

_covariance_type : string
String describing the type of covariance parameters used by the model. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
n_features : int
Dimensionality of the Gaussian emissions.
n_components : int
Number of states in the model.
transmat : array, shape (n_components, n_components)
Matrix of transition probabilities between states.
startprob : array, shape (n_components,)
Initial state occupation distribution.
means : array, shape (n_components, n_features)
Mean parameters for each state.
covars : array

Covariance parameters for each state. The shape depends on _covariance_type:

(n_components,)                   if 'spherical',
(n_features, n_features)              if 'tied',
(n_components, n_features)           if 'diag',
(n_components, n_features, n_features)  if 'full'
random_state: RandomState or an int seed (0 by default)
A random number generator instance
n_iter : int, optional
Number of iterations to perform.
thresh : float, optional
Convergence threshold.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars, etc. Defaults to all parameters.
init_params : string, optional
Controls which parameters are initialized prior to training. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars, etc. Defaults to all parameters.
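To show how startprob, transmat, means and covars fit together when evaluating a sequence, here is a minimal forward-algorithm sketch. It assumes one-dimensional observations with one variance per state, works with plain probabilities rather than log probabilities (so it would underflow on long sequences), and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_likelihood(obs, startprob, transmat, means, covars):
    """Likelihood of `obs`, summed over all hidden state paths."""
    n = len(startprob)
    # alpha[i]: joint density of the observations so far and being in state i
    alpha = [startprob[i] * gaussian_pdf(obs[0], means[i], covars[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [sum(alpha[i] * transmat[i][j] for i in range(n))
                 * gaussian_pdf(x, means[j], covars[j])
                 for j in range(n)]
    return sum(alpha)


startprob = [0.6, 0.4]
transmat = [[0.7, 0.3], [0.2, 0.8]]
means = [0.0, 3.0]
covars = [1.0, 0.5]
obs = [0.1, 2.9, 3.2]
likelihood = forward_likelihood(obs, startprob, transmat, means, covars)
```

The recursion costs O(T * n_components^2) instead of enumerating all n_components^T state paths.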

Examples

>>> from sklearn.hmm import GaussianHMM
>>> GaussianHMM(n_components=2)
...
GaussianHMM(algorithm='viterbi',...


See also

GMM : Gaussian mixture model

Full API documentation: GaussianHMMScikitsLearnNode

class mdp.nodes.LabelSpreadingScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.semi_supervised.label_propagation.LabelSpreading class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This model is similar to the basic Label Propagation algorithm, but uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.

Parameters

kernel : {‘knn’, ‘rbf’}
String identifier for kernel function to use. Only ‘rbf’ and ‘knn’ kernels are currently supported.
gamma : float
parameter for rbf kernel
n_neighbors : integer > 0
parameter for knn kernel
alpha : float
clamping factor
max_iters : float
maximum number of iterations allowed
tol : float
Convergence tolerance: threshold to consider the system at steady state
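How these parameters interact can be seen in a small plain-Python sketch of the spreading iteration, following the formulation of Zhou et al. cited in the references (F = alpha * S * F + (1 - alpha) * Y with a symmetrically normalized rbf affinity S). This is an assumption-laden illustration: the function names are hypothetical, and the exact meaning of alpha in the wrapped class may differ from the one used here.

```python
import math


def rbf_affinity(points, gamma):
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-affinity
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-gamma * d2)
    return W


def label_spreading(points, labels, n_classes, gamma=20.0, alpha=0.8,
                    max_iters=100, tol=1e-3):
    """Labels are class indices, with -1 marking unlabeled points."""
    n = len(points)
    W = rbf_affinity(points, gamma)
    degree = [sum(row) for row in W]
    # symmetric normalization D^-1/2 W D^-1/2
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(max_iters):
        F_new = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
                  + (1.0 - alpha) * Y[i][c]          # soft clamping
                  for c in range(n_classes)] for i in range(n)]
        change = max(abs(a - b) for ra, rb in zip(F_new, F) for a, b in zip(ra, rb))
        F = F_new
        if change < tol:
            break
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


points = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = [0, -1, -1, 1, -1, -1]
predicted = label_spreading(points, labels, n_classes=2)
```

With two well-separated groups and one labeled point in each, the labels spread to the unlabeled neighbours within each group.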

Examples

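>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelSpreading
>>> label_prop_model = LabelSpreading()
>>> iris = datasets.load_iris()
>>> random_unlabeled_points = np.where(np.random.random_integers(0, 1,
...    size=len(iris.target)))
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
...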


References

Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schölkopf. Learning with local and global consistency (2004) http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3219

See also

LabelPropagation : Unregularized graph based semi-supervised learning

class mdp.nodes.NMFScikitsLearnNode

Non-Negative matrix factorization by Projected Gradient (NMF)

This node has been automatically generated by wrapping the sklearn.decomposition.nmf.NMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X: {array-like, sparse matrix}, shape = [n_samples, n_features]
Data the model will be fit to.
n_components: int or None
Number of components, if n_components is not set all components are kept
init: ‘nndsvd’ | ‘nndsvda’ | ‘nndsvdar’ | int | RandomState

Method used to initialize the procedure. Default: ‘nndsvdar’. Valid options:

• 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
• 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
• 'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
• int seed or RandomState: non-negative random matrices
sparseness: ‘data’ | ‘components’ | None, default: None
Where to enforce sparsity in the model.
beta: double, default: 1
Degree of sparseness, if sparseness is not None. Larger values mean more sparseness.
eta: double, default: 0.1
Degree of correctness to maintain, if sparsity is not None. Smaller values mean larger error.
tol: double, default: 1e-4
Tolerance value used in stopping conditions.
max_iter: int, default: 200
Number of iterations to compute.
nls_max_iter: int, default: 2000
Number of iterations in NLS subproblem.

Attributes

components_ : array, [n_components, n_features]
Non-negative components of the data
reconstruction_err_ : number
Frobenius norm of the matrix difference between the training data and the reconstructed data from the fit produced by the model: || X - WH ||_2. Not computed for sparse input matrices because it is too expensive in terms of memory.
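As a concrete reading of the reconstruction error, the sketch below computes the Frobenius norm of X - WH for small dense matrices given as lists of rows. The function names are hypothetical; W stands for the transformed data and H for the components.

```python
import math


def matmul(W, H):
    return [[sum(w * h for w, h in zip(row, col)) for col in zip(*H)]
            for row in W]


def reconstruction_error(X, W, H):
    """Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return math.sqrt(sum((x - r) ** 2
                         for x_row, r_row in zip(X, WH)
                         for x, r in zip(x_row, r_row)))


X = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
exact = reconstruction_error(X, W, [[1.0, 2.0], [3.0, 4.0]])
off_by_one = reconstruction_error(X, W, [[1.0, 2.0], [3.0, 3.0]])
```

A perfect factorization gives an error of zero; changing a single entry of the product by one gives an error of one.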

Examples

>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init=0)
>>> model.fit(X)
NMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744,  0.11118662],
[ 0.38526873,  0.38228063]])
>>> model.reconstruction_err_
0.00746...
>>> model = NMF(n_components=2, init=0,
...             sparseness='components')
>>> model.fit(X)
NMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
nls_max_iter=2000, sparseness='components', tol=0.0001)
>>> model.components_
array([[ 1.67481991,  0.29614922],
[-0.        ,  0.4681982 ]])
>>> model.reconstruction_err_
0.513...


Notes

This implements

C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/

P. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research 2004.

NNDSVD is introduced in

C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf

Full API documentation: NMFScikitsLearnNode

class mdp.nodes.SparseBaseLibSVMScikitsLearnNode

Full API documentation: SparseBaseLibSVMScikitsLearnNode

class mdp.nodes.DPGMMScikitsLearnNode

Variational Inference for the Infinite Gaussian Mixture Model.

This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.DPGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

DPGMM stands for Dirichlet Process Gaussian Mixture Model, and it is an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but almost always the number of components actually used depends on the data.

Stick-breaking Representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).
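The stick-breaking construction with truncation can be sketched as follows. This only draws mixing weights from the prior and does not perform any inference; the function name `stick_breaking_weights` is hypothetical, and giving the leftover stick to the last component is one common truncation convention assumed here.

```python
import random


def stick_breaking_weights(alpha, n_components, seed=0):
    """Draw truncated stick-breaking mixing weights that sum to one."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0                      # length of the stick still unbroken
    for _ in range(n_components - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)            # truncation: last piece takes the rest
    return weights


few_clusters = stick_breaking_weights(alpha=0.1, n_components=5)
many_clusters = stick_breaking_weights(alpha=10.0, n_components=5)
```

A small alpha tends to break off most of the stick early, concentrating the weight on few components, while a large alpha spreads it over more of them.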

Initialization is with normally-distributed means and identity covariance, for proper convergence.

Parameters

n_components: int, optional
Number of mixture components. Defaults to 1.
covariance_type: string, optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
alpha: float, optional
Real number representing the concentration parameter of the Dirichlet process. Intuitively, the Dirichlet Process is as likely to start a new cluster for a point as it is to add that point to a cluster with alpha elements. A higher alpha means more clusters, as the expected number of clusters is alpha*log(N). Defaults to 1.
thresh : float, optional
Convergence threshold.
n_iter : int, optional
Maximum number of iterations to perform before convergence.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.

Attributes

covariance_type : string
String describing the type of covariance parameters used by the DP-GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
n_components : int
Number of mixture components.
weights_ : array, shape (n_components,)
Mixing weights for each mixture component.
means_ : array, shape (n_components, n_features)
Mean parameters for each mixture component.
precisions_ : array

Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:

(n_components, n_features)                if 'spherical',
(n_features, n_features)                  if 'tied',
(n_components, n_features)                if 'diag',
(n_components, n_features, n_features)  if 'full'
converged_ : bool
True when convergence was reached in fit(), False otherwise.

See also

GMM : Finite Gaussian mixture model fit with EM

VBGMM : Finite Gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.

Full API documentation: DPGMMScikitsLearnNode

class mdp.nodes.SVCScikitsLearnNode

C-Support Vector Classification.

This node has been automatically generated by wrapping the sklearn.svm.classes.SVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a few tens of thousands of samples.

The multiclass support is handled according to a one-vs-one scheme.
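A one-vs-one scheme trains one binary classifier per pair of classes and lets them vote. The sketch below shows only the voting logic; the pairwise decision function `closer_center` is a hypothetical stand-in for trained binary SVMs, and the tie-breaking rule is an assumption.

```python
from collections import Counter
from itertools import combinations


def predict_one_vs_one(x, classes, decide):
    """Majority vote over all pairwise decisions decide(a, b, x) -> a or b."""
    votes = Counter(decide(a, b, x) for a, b in combinations(sorted(classes), 2))
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for trained binary classifiers: pick the closer center.
centers = {0: 0.0, 1: 5.0, 2: 10.0}


def closer_center(a, b, x):
    return a if abs(x - centers[a]) <= abs(x - centers[b]) else b


n_pairs = len(list(combinations(centers, 2)))   # n_class * (n_class - 1) / 2
label = predict_one_vs_one(4.2, centers, closer_center)
```

With three classes there are three pairwise classifiers, which matches the n_class * (n_class-1) / 2 shape given for intercept_.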

For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each, see the corresponding section in the narrative documentation: svm_kernels.

Parameters

C : float, optional (default=1.0)
Penalty parameter C of the error term.
kernel : string, optional (default=’rbf’)
Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given, ‘rbf’ will be used.
degree : int, optional (default=3)
Degree of the kernel function. It is significant only for ‘poly’.
gamma : float, optional (default=0.0)
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If gamma is 0.0 then 1/n_features will be used instead.
coef0 : float, optional (default=0.0)
Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
probability: boolean, optional (default=False)
Whether to enable probability estimates. This must be enabled prior to calling predict_proba.
shrinking: boolean, optional (default=True)
Whether to use the shrinking heuristic.
tol: float, optional (default=1e-3)
Tolerance for stopping criterion.
cache_size: float, optional
Specify the size of the kernel cache (in MB)
class_weight : {dict, ‘auto’}, optional
Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The ‘auto’ mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
verbose : bool, default: False
Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
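The roles of gamma, coef0 and degree are easiest to see in the kernel formulas themselves. The sketch below writes out the four non-precomputed kernels in the form used by libsvm; the function name `kernel` and the default values chosen here are for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kernel(u, v, kind="rbf", gamma=0.5, coef0=0.0, degree=3):
    if kind == "linear":
        return dot(u, v)
    if kind == "poly":
        return (gamma * dot(u, v) + coef0) ** degree
    if kind == "rbf":
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
    if kind == "sigmoid":
        return math.tanh(gamma * dot(u, v) + coef0)
    raise ValueError("unknown kernel: %r" % kind)


u, v = [1.0, 2.0], [3.0, 4.0]
poly_value = kernel(u, v, "poly", gamma=1.0, coef0=1.0, degree=2)
```

Note that degree appears only in the polynomial kernel, coef0 in the polynomial and sigmoid kernels, and gamma in all but the linear kernel.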

Attributes

support_ : array-like, shape = [n_SV]
Index of support vectors.
support_vectors_ : array-like, shape = [n_SV, n_features]
Support vectors.
n_support_ : array-like, dtype=int32, shape = [n_class]
Number of support vectors for each class.
dual_coef_ : array, shape = [n_class-1, n_SV]
Coefficients of the support vector in the decision function. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the section about multi-class classification in the SVM section of the User Guide for details.
coef_ : array, shape = [n_class-1, n_features]

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a read-only property derived from dual_coef_ and support_vectors_.

intercept_ : array, shape = [n_class * (n_class-1) / 2]
Constants in decision function.

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
gamma=0.0, kernel='rbf', probability=False, shrinking=True,
tol=0.001, verbose=False)
>>> print(clf.predict([[-0.8, -1]]))
[ 1.]


See also

SVR
Support Vector Machine for Regression implemented using libsvm.
LinearSVC
Scalable Linear Support Vector Machine for classification implemented using liblinear. Check the See also section of LinearSVC for further comparison.

Full API documentation: SVCScikitsLearnNode

class mdp.nodes.VBGMMScikitsLearnNode

Variational Inference for the Gaussian Mixture Model

This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.VBGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Variational inference for a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a fixed number of components.

Initialization is with normally-distributed means and identity covariance, for proper convergence.

Parameters

n_components: int, optional
Number of mixture components. Defaults to 1.
covariance_type: string, optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
alpha: float, optional
Real number representing the concentration parameter of the Dirichlet distribution. Intuitively, the higher the value of alpha, the more likely the variational mixture of Gaussians model will use all components it can. Defaults to 1.

Attributes

covariance_type : string
String describing the type of covariance parameters used by the DP-GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
n_features : int
Dimensionality of the Gaussians.
n_components : int
Number of mixture components.
weights_ : array, shape (n_components,)
Mixing weights for each mixture component.
means_ : array, shape (n_components, n_features)
Mean parameters for each mixture component.
precisions_ : array

Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:

(n_components, n_features)                if 'spherical',
(n_features, n_features)                  if 'tied',
(n_components, n_features)                if 'diag',
(n_components, n_features, n_features)  if 'full'
converged_ : bool
True when convergence was reached in fit(), False otherwise.

See also

GMM : Finite Gaussian mixture model fit with EM

DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm

Full API documentation: VBGMMScikitsLearnNode

class mdp.nodes.DictVectorizerScikitsLearnNode

Transforms lists of feature-value mappings to vectors.

This node has been automatically generated by wrapping the sklearn.feature_extraction.dict_vectorizer.DictVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This transformer turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators.

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.

Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
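The encoding rules described above can be sketched in plain Python. This is an illustration of the idea rather than the wrapped implementation: it produces dense lists of floats only, and the function names `fit_vocabulary` and `transform` are hypothetical.

```python
def fit_vocabulary(samples, separator="="):
    """Collect the sorted list of output feature names."""
    names = set()
    for sample in samples:
        for key, value in sample.items():
            # string values become one boolean feature per (key, value) pair
            names.add(key + separator + value if isinstance(value, str) else key)
    return sorted(names)


def transform(samples, vocabulary, separator="="):
    index = {name: i for i, name in enumerate(vocabulary)}
    rows = []
    for sample in samples:
        row = [0.0] * len(vocabulary)      # absent features stay at zero
        for key, value in sample.items():
            if isinstance(value, str):
                name, number = key + separator + value, 1.0
            else:
                name, number = key, float(value)
            if name in index:              # features unseen during fit are ignored
                row[index[name]] = number
        rows.append(row)
    return rows


D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
vocabulary = fit_vocabulary(D)
X = transform(D, vocabulary)

mail = [{'f': 'ham'}, {'f': 'spam'}]
mail_vocabulary = fit_vocabulary(mail)
mail_X = transform(mail, mail_vocabulary)
```

The first data set is the one used in the example below; the second shows the one-hot coding of the string-valued feature “f”.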

Parameters

dtype : callable, optional
The type of feature values. Passed to Numpy array/scipy.sparse matrix constructors as the dtype argument.
separator: string, optional
Separator string used when constructing new features for one-hot coding.
sparse: boolean, optional.
Whether transform should produce scipy.sparse matrices. True by default.

Examples

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> X
array([[ 2.,  0.,  1.],
[ 0.,  1.,  3.]])
>>> v.inverse_transform(X) ==         [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
True
>>> v.transform({'foo': 4, 'unseen_feature': 3})
array([[ 0.,  0.,  4.]])


Full API documentation: DictVectorizerScikitsLearnNode

class mdp.nodes.LinearSVCScikitsLearnNode

Full API documentation: LinearSVCScikitsLearnNode

class mdp.nodes.RandomizedLassoScikitsLearnNode

Randomized Lasso

This node has been automatically generated by wrapping the sklearn.linear_model.randomized_l1.RandomizedLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Randomized Lasso works by resampling the train data and computing a Lasso on each resampling. In short, the features selected more often are good features. It is also known as stability selection.
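The resample-and-count idea can be sketched in plain Python. Everything here is an assumption made for illustration: the Lasso is solved with a basic coordinate descent, each feature is rescaled by either 1 or `scaling` with equal probability, and the function names and default values are hypothetical rather than those of the wrapped class.

```python
import random


def soft_threshold(value, threshold):
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, alpha, n_sweeps=100):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of feature j with the residual that ignores feature j
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k]
                                            for k in range(p) if k != j))
                      for i in range(n))
            norm = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return w


def stability_scores(X, y, alpha=0.1, scaling=0.5, sample_fraction=0.75,
                     n_resampling=50, seed=0):
    """Fraction of resamplings in which each feature gets a non-zero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resampling):
        rows = rng.sample(range(n), int(sample_fraction * n))
        scale = [1.0 if rng.random() < 0.5 else scaling for _ in range(p)]
        X_sub = [[X[i][j] * scale[j] for j in range(p)] for i in rows]
        y_sub = [y[i] for i in rows]
        w = lasso(X_sub, y_sub, alpha)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / float(n_resampling) for c in counts]


data_rng = random.Random(1)
# feature 0 drives the target, feature 1 is pure noise
X = [[(i - 9.5) / 9.5, data_rng.uniform(-1.0, 1.0)] for i in range(20)]
y = [2.0 * row[0] for row in X]
scores = stability_scores(X, y)
```

Features that are selected in most resamplings receive a score close to 1, which is the sense in which frequently selected features are considered good.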

Parameters

alpha : float, ‘aic’, or ‘bic’
The regularization parameter alpha of the Lasso. Warning: this is not the alpha parameter of the stability selection article, which corresponds to scaling.
scaling : float
The alpha parameter in the stability selection article used to randomly scale the features. Should be between 0 and 1.
sample_fraction : float
The fraction of samples to be used in each randomized design. Should be between 0 and 1. If 1, all samples are used.
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as argument.
max_iter : integer, optional
Maximum number of iterations to perform in the Lars algorithm.
eps : float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.
n_jobs : integer, optional
Number of CPUs to use during the resampling. If ‘-1’, use all the CPUs
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
pre_dispatch : int, or string, optional

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

• None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
• An int, giving the exact number of total jobs that are spawned
• A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
memory : Instance of joblib.Memory or string
Used for internal caching. By default, no caching is done. If a string is given, it is the path to the caching directory.

Attributes

scores_ : array, shape = [n_features]
Feature scores between 0 and 1.
all_scores_ : array, shape = [n_features, n_reg_parameter]
Feature scores between 0 and 1 for all values of the regularization parameter. The reference article suggests scores_ is the max of all_scores_.
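
The relation between the two attributes can be sketched as follows. This is a minimal illustration with made-up values, assuming all_scores_ is laid out as one row per feature and one column per regularization value; it does not call the wrapped estimator.

```python
# Hypothetical all_scores_ table: one row per feature, one column per
# value of the regularization parameter (the numbers are made up).
all_scores = [
    [0.10, 0.80, 0.95],
    [0.00, 0.05, 0.20],
    [0.60, 0.55, 0.30],
]

# Per-feature score: the best score reached over all regularization values.
scores = [max(row) for row in all_scores]
print(scores)
```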

Examples

>>> from sklearn.linear_model import RandomizedLasso
>>> randomized_lasso = RandomizedLasso()


Notes

See examples/linear_model/plot_sparse_recovery.py for an example.

References

Stability selection. Nicolai Meinshausen, Peter Buhlmann. Journal of the Royal Statistical Society: Series B, Volume 72, Issue 4, pages 417-473, September 2010. DOI: 10.1111/j.1467-9868.2010.00740.x

See also: RandomizedLogisticRegression, LogisticRegression

Full API documentation: RandomizedLassoScikitsLearnNode

class mdp.nodes.MultinomialNBScikitsLearnNode

Naive Bayes classifier for multinomial models

This node has been automatically generated by wrapping the sklearn.naive_bayes.MultinomialNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Parameters

alpha: float, optional (default=1.0)
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
fit_prior: boolean
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

Attributes

intercept_, class_log_prior_ : array, shape = [n_classes]
Smoothed empirical log probability for each class.
feature_log_prob_, coef_ : array, shape = [n_classes, n_features]

Empirical log probability of features given a class, P(x_i|y).

(intercept_ and coef_ are properties referring to class_log_prior_ and feature_log_prob_, respectively.)
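
How additive smoothing enters the feature log probabilities can be sketched in plain Python. This is an illustration under simplifying assumptions, not the wrapped implementation: the helper name multinomial_nb_log_probs is hypothetical, and the class prior here is the unsmoothed empirical frequency.

```python
import math

def multinomial_nb_log_probs(X, y, alpha=1.0):
    """Return (class_log_prior, feature_log_prob) dicts keyed by class label."""
    class_log_prior = {}
    feature_log_prob = {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # Empirical class frequency (this sketch does not smooth the prior).
        class_log_prior[c] = math.log(len(rows) / len(y))
        # Per-feature counts within the class, plus additive smoothing.
        counts = [sum(row[j] for row in rows) + alpha for j in range(n_features)]
        total = sum(counts)
        feature_log_prob[c] = [math.log(v / total) for v in counts]
    return class_log_prior, feature_log_prob

# Two tiny "documents" with three count features each (made-up data).
X = [[2, 1, 0], [0, 1, 3]]
y = [0, 1]
log_prior, log_prob = multinomial_nb_log_probs(X, y, alpha=1.0)
print(log_prior[0], log_prob[0])
```

With alpha=0 a feature that never occurs in a class would have probability zero, and taking its logarithm fails; this is the case the smoothing parameter guards against.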

Examples

>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, Y)
MultinomialNB(alpha=1.0, fit_prior=True)
>>> print(clf.predict(X[2:3]))
[3]


Notes

For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.

Full API documentation: MultinomialNBScikitsLearnNode

class mdp.nodes.LassoScikitsLearnNode

Linear Model trained with L1 prior as regularizer (aka the Lasso)

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.Lasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Technically the Lasso model is optimizing the same objective function as the Elastic Net with rho=1.0 (no L2 penalty).
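
The objective above can be evaluated directly for a given weight vector. The following is a minimal sketch in plain Python; the helper name lasso_objective is hypothetical, and the intercept is left out for brevity.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1, without intercept."""
    n_samples = len(X)
    # Residuals y - Xw, one per sample.
    residuals = [
        yi - sum(xij * wj for xij, wj in zip(xi, w))
        for xi, yi in zip(X, y)
    ]
    squared_error = sum(r * r for r in residuals)
    l1_norm = sum(abs(wj) for wj in w)
    return squared_error / (2 * n_samples) + alpha * l1_norm

X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]
# A perfect fit has zero residual, so only the L1 penalty remains.
print(lasso_objective(X, y, [1.0, 0.0], alpha=0.1))
# The all-zero vector pays no penalty but leaves the full squared error.
print(lasso_objective(X, y, [0.0, 0.0], alpha=0.1))
```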

Parameters

alpha : float, optional
Constant that multiplies the L1 term. Defaults to 1.0
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as an argument. For sparse input this option is always True to preserve sparsity.
max_iter: int, optional
The maximum number of iterations
tol : float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
positive : bool, optional
When set to True, forces the coefficients to be positive.

Attributes

coef_ : array, shape = (n_features,)
parameter vector (w in the cost function formula)
sparse_coef_ : scipy.sparse matrix, shape = (n_features, 1)
sparse_coef_ is a readonly property derived from coef_
intercept_ : float
independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.Lasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute='auto', tol=0.0001,
warm_start=False)
>>> print(clf.coef_)
[ 0.85  0.  ]
>>> print(clf.intercept_)
0.15


See also: lars_path, lasso_path, LassoLars, LassoCV, LassoLarsCV, sklearn.decomposition.sparse_encode

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

Full API documentation: LassoScikitsLearnNode

class mdp.nodes.LocallyLinearEmbeddingScikitsLearnNode

Locally Linear Embedding

This node has been automatically generated by wrapping the sklearn.manifold.locally_linear.LocallyLinearEmbedding class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_neighbors : integer
number of neighbors to consider for each point.
n_components : integer
number of coordinates for the manifold
reg : float
regularization constant, multiplies the trace of the local covariance matrix of the distances.
eigen_solver : string, {‘auto’, ‘arpack’, ‘dense’}

• auto : the algorithm will attempt to choose the best method for the input data.
• arpack : use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
• dense : use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.
tol : float, optional
Tolerance for the ‘arpack’ method. Not used if eigen_solver == ‘dense’.
max_iter : integer
maximum number of iterations for the arpack solver. Not used if eigen_solver == ‘dense’.
method : string [‘standard’ | ‘hessian’ | ‘modified’ | ‘ltsa’]
• standard : use the standard locally linear embedding algorithm; see reference [1].
• hessian : use the Hessian eigenmap method. This method requires n_neighbors > n_components * (1 + (n_components + 1) / 2); see reference [2].
• modified : use the modified locally linear embedding algorithm; see reference [3].
• ltsa : use the local tangent space alignment algorithm; see reference [4].
hessian_tol : float, optional
Tolerance for Hessian eigenmapping method. Only used if method == ‘hessian’
modified_tol : float, optional
Tolerance for modified LLE method. Only used if method == ‘modified’
neighbors_algorithm : string [‘auto’ | ‘brute’ | ‘kd_tree’ | ‘ball_tree’]
algorithm to use for nearest neighbors search, passed to neighbors.NearestNeighbors instance
random_state: numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.
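
The neighbor requirement of the ‘hessian’ method can be checked before fitting. The sketch below only restates the inequality n_neighbors > n_components * (1 + (n_components + 1) / 2), which simplifies to n_components * (n_components + 3) / 2; the helper name is hypothetical and not part of the wrapped class.

```python
def hessian_lle_min_neighbors(n_components):
    """Smallest n_neighbors satisfying the 'hessian' method requirement."""
    # n_components * (n_components + 3) is always even, so // is exact here.
    bound = n_components * (n_components + 3) // 2
    # The requirement is a strict inequality, hence the + 1.
    return bound + 1

for n_components in (1, 2, 3):
    print(n_components, hessian_lle_min_neighbors(n_components))
```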

Attributes

embedding_vectors_ : array-like, shape [n_components, n_samples]
Stores the embedding vectors
reconstruction_error_ : float
Reconstruction error associated with embedding_vectors_
nbrs_ : NearestNeighbors object
Stores nearest neighbors instance, including BallTree or KDtree if applicable.

References

 [1] Roweis, S. & Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323 (2000).
 [2] Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003).
 [3] Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382
 [4] Zhang, Z. & Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai Univ. 8:406 (2004)

Full API documentation: LocallyLinearEmbeddingScikitsLearnNode

class mdp.nodes.LarsCVScikitsLearnNode

Cross-validated Least Angle Regression model

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the choice is made automatically. The Gram matrix can also be passed as an argument.
max_iter: integer, optional
Maximum number of iterations to perform.
cv : crossvalidation generator, optional
see sklearn.cross_validation module. If None is passed, default to a 5-fold strategy
max_n_alphas : integer, optional
The maximum number of points on the path used to compute the residuals in the cross-validation
n_jobs : integer, optional
Number of CPUs to use during the cross validation. If ‘-1’, use all the CPUs
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation)
intercept_ : float
independent term in decision function
coef_path_: array, shape = [n_features, n_alpha]
the varying values of the coefficients along the path
alpha_: float
the estimated regularization parameter alpha
alphas_: array, shape = [n_alpha]
the different values of alpha along the path
cv_alphas_: array, shape = [n_cv_alphas]
all the values of alpha along the path for the different folds
cv_mse_path_: array, shape = [n_folds, n_cv_alphas]
the mean squared error on the left-out data for each fold along the path (alpha values given by cv_alphas_)

See also: lars_path, LassoLars, LassoLarsCV

Full API documentation: LarsCVScikitsLearnNode

class mdp.nodes.LDAScikitsLearnNode

Linear Discriminant Analysis (LDA)

This node has been automatically generated by wrapping the sklearn.lda.LDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input, by projecting it to the most discriminative directions.

Parameters

n_components: int
Number of components (< n_classes - 1) for dimensionality reduction
priors : array, optional, shape = [n_classes]
Priors on classes

Attributes

means_ : array-like, shape = [n_classes, n_features]
Class means
xbar_ : float, shape = [n_features]
Overall mean
priors_ : array-like, shape = [n_classes]
Class priors (sum to 1)
covariance_ : array-like, shape = [n_features, n_features]
Covariance matrix (shared by all classes)

Examples

>>> import numpy as np
>>> from sklearn.lda import LDA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LDA()
>>> clf.fit(X, y)
LDA(n_components=None, priors=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]


Full API documentation: LDAScikitsLearnNode

class mdp.nodes.QuantileEstimatorScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.QuantileEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: QuantileEstimatorScikitsLearnNode

class mdp.nodes.CountVectorizerScikitsLearnNode

Convert a collection of raw documents to a matrix of token counts

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.CountVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This implementation produces a sparse representation of the counts using scipy.sparse.coo_matrix.

If you do not provide an a-priori dictionary and you do not use an analyzer that does some kind of feature selection then the number of features will be equal to the vocabulary size found by analysing the data. The default analyzer does simple stop word filtering for English.

Parameters

input: string {‘filename’, ‘file’, ‘content’}

If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.

If ‘file’, the sequence items must have a ‘read’ method (file-like object), which is called to fetch the bytes in memory.

Otherwise, the input is expected to be a sequence of string or bytes items that are analyzed directly.

charset: string, ‘utf-8’ by default.
If bytes or files are given to analyze, this charset is used to decode.
charset_error: {‘strict’, ‘ignore’, ‘replace’}
Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given charset. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.
strip_accents: {‘ascii’, ‘unicode’, None}
Remove accents during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.
analyzer: string, {‘word’, ‘char’, ‘char_wb’} or callable

Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries.

If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.

preprocessor: callable or None (default)
Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps.
tokenizer: callable or None (default)
Override the string tokenization step while preserving the preprocessing and n-grams generation steps.
ngram_range: tuple (min_n, max_n)
The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used.
stop_words: string {‘english’}, list, or None (default)

If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value.

If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens.

If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.

lowercase: boolean, default True
Convert all characters to lowercase before tokenizing.
token_pattern: string
Regular expression denoting what constitutes a “token”, only used if analyzer == ‘word’. The default regexp selects tokens of 2 or more letter characters (punctuation is completely ignored and always treated as a token separator).
max_df : float in range [0.0, 1.0] or int, optional, 1.0 by default
When building the vocabulary, ignore terms that have a term frequency strictly higher than the given threshold (corpus-specific stop words). If a float, the parameter represents a proportion of documents; if an integer, absolute counts. This parameter is ignored if vocabulary is not None.
min_df : float in range [0.0, 1.0] or int, optional, 2 by default
When building the vocabulary, ignore terms that have a term frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If a float, the parameter represents a proportion of documents; if an integer, absolute counts. This parameter is ignored if vocabulary is not None.
max_features : optional, None by default

If not None, build a vocabulary that only considers the top max_features terms, ordered by term frequency across the corpus.

This parameter is ignored if vocabulary is not None.

vocabulary: Mapping or iterable, optional
Either a Mapping (e.g., a dict) where keys are terms and values are indices in the feature matrix, or an iterable over terms. If not given, a vocabulary is determined from the input documents.
binary: boolean, False by default.
If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts.
dtype: type, optional
Type of the matrix returned by fit_transform() or transform().
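
The effect of ngram_range can be sketched without the vectorizer. The helper word_ngrams below is a hypothetical illustration, assuming the text has already been tokenized and that n-grams are joined with a single space; it is not the wrapped implementation.

```python
def word_ngrams(tokens, ngram_range=(1, 1)):
    """Return all word n-grams with min_n <= n <= max_n, shortest first."""
    min_n, max_n = ngram_range
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the sequence.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams

tokens = ["the", "quick", "fox"]
print(word_ngrams(tokens, ngram_range=(1, 2)))
```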

Full API documentation: CountVectorizerScikitsLearnNode

class mdp.nodes.ExtraTreesRegressorScikitsLearnNode

An extra-trees regressor.

This node has been automatically generated by wrapping the sklearn.ensemble.forest.ExtraTreesRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters

n_estimators : integer, optional (default=10)
The number of trees in the forest.
criterion : string, optional (default=”mse”)
The function to measure the quality of a split. The only supported criterion is “mse” for the mean squared error. Note: this parameter is tree-specific.
max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. Note: this parameter is tree-specific.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node. Note: this parameter is tree-specific.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples in newly created leaves. A split is discarded if, after the split, one of the leaves would contain fewer than min_samples_leaf samples. Note: this parameter is tree-specific.
min_density : float, optional (default=0.1)
This parameter controls a trade-off in an optimization heuristic. It controls the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold, the mask is recomputed and the input data is packed, which results in data copying. If min_density equals one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (aka sample masks). Note: this parameter is tree-specific.
max_features : int, string or None, optional (default=”auto”)

The number of features to consider when looking for the best split:

• If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
• If “sqrt”, then max_features=sqrt(n_features).
• If “log2”, then max_features=log2(n_features).
• If None, then max_features=n_features.

Note: this parameter is tree-specific.

bootstrap : boolean, optional (default=False)
Whether bootstrap samples are used when building trees. Note: this parameter is tree-specific.
compute_importances : boolean, optional (default=True)
Whether feature importances are computed and stored into the feature_importances_ attribute when calling fit.
oob_score : bool
Whether to use out-of-bag samples to estimate the generalization error.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0)
Controls the verbosity of the tree building process.

Attributes

estimators_: list of DecisionTreeRegressor
The collection of fitted sub-estimators.
feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.
oob_prediction_ : array, shape = [n_samples]
Prediction computed with out-of-bag estimate on the training set.

References

 [1] P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

See also: sklearn.tree.ExtraTreeRegressor (base estimator for this ensemble), RandomForestRegressor (ensemble regressor using trees with optimal splits).

Full API documentation: ExtraTreesRegressorScikitsLearnNode

class mdp.nodes.MultinomialHMMScikitsLearnNode

Hidden Markov Model with multinomial (discrete) emissions

This node has been automatically generated by wrapping the sklearn.hmm.MultinomialHMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Attributes

n_components : int
Number of states in the model.
n_symbols : int
Number of possible symbols emitted by the model (in the observations).
transmat : array, shape (n_components, n_components)
Matrix of transition probabilities between states.
startprob : array, shape (n_components,)
Initial state occupation distribution.
emissionprob : array, shape (n_components, n_symbols)
Probability of emitting a given symbol when in each state.
random_state: RandomState or an int seed (0 by default)
A random number generator instance
n_iter : int, optional
Number of iterations to perform.
thresh : float, optional
Convergence threshold.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars, etc. Defaults to all parameters.
init_params : string, optional
Controls which parameters are initialized prior to training. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars, etc. Defaults to all parameters.
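
How startprob, transmat and emissionprob combine can be sketched with the forward recursion, which computes the likelihood of an observed symbol sequence. This is a plain-Python illustration with made-up probabilities; the function name sequence_likelihood is hypothetical and the wrapped class is not used.

```python
def sequence_likelihood(obs, startprob, transmat, emissionprob):
    """Likelihood of a symbol sequence under a discrete HMM (forward recursion)."""
    n_components = len(startprob)
    # alpha[i]: probability of the symbols seen so far, ending in state i.
    alpha = [startprob[i] * emissionprob[i][obs[0]] for i in range(n_components)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * transmat[i][j] for i in range(n_components))
            * emissionprob[j][symbol]
            for j in range(n_components)
        ]
    return sum(alpha)

# Two states, two symbols; every row sums to 1 (values are made up).
startprob = [0.6, 0.4]
transmat = [[0.7, 0.3], [0.4, 0.6]]
emissionprob = [[0.9, 0.1], [0.2, 0.8]]
print(sequence_likelihood([0, 1, 0], startprob, transmat, emissionprob))
```

For long sequences the products underflow, so practical implementations usually work with scaled or log-domain values.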

Examples

>>> from sklearn.hmm import MultinomialHMM
>>> MultinomialHMM(n_components=2)
...
MultinomialHMM(algorithm='viterbi',...


See also: GaussianHMM (HMM with Gaussian emissions)

Full API documentation: MultinomialHMMScikitsLearnNode

class mdp.nodes.LabelPropagationScikitsLearnNode

Label Propagation classifier

This node has been automatically generated by wrapping the sklearn.semi_supervised.label_propagation.LabelPropagation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

kernel : {‘knn’, ‘rbf’}
String identifier for the kernel function to use. Only ‘rbf’ and ‘knn’ kernels are currently supported.
gamma : float
parameter for rbf kernel
n_neighbors : integer > 0
parameter for knn kernel
alpha : float
clamping factor
max_iters : float
maximum number of iterations allowed
tol : float
Convergence tolerance: threshold to consider the system at steady state

Examples

>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelPropagation
>>> label_prop_model = LabelPropagation()
>>> iris = datasets.load_iris()
>>> # pick a random subset of samples and mark them as unlabeled (-1)
>>> random_unlabeled_points = np.where(np.random.randint(0, 2,
...    size=len(iris.target)))
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
...
LabelPropagation(...)


References

Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf

See also: LabelSpreading (alternate label propagation strategy, more robust to noise)

Full API documentation: LabelPropagationScikitsLearnNode

class mdp.nodes.GaussianProcessScikitsLearnNode

The Gaussian Process model class.

This node has been automatically generated by wrapping the sklearn.gaussian_process.gaussian_process.GaussianProcess class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

regr : string or callable, optional

A regression function returning an array of outputs of the linear regression functional basis. The number of observations n_samples should be greater than the size p of this basis. Default assumes a simple constant regression trend. Available built-in regression models are:

'constant', 'linear', 'quadratic'

corr : string or callable, optional

A stationary autocorrelation function returning the autocorrelation between two points x and x’. Default assumes a squared-exponential autocorrelation model. Built-in correlation models are:

'absolute_exponential', 'squared_exponential',
'generalized_exponential', 'cubic', 'linear'

beta0 : double array_like, optional
The regression weight vector to perform Ordinary Kriging (OK). Default assumes Universal Kriging (UK) so that the vector beta of regression weights is estimated using the maximum likelihood principle.
storage_mode : string, optional
A string specifying whether the Cholesky decomposition of the correlation matrix should be stored in the class (storage_mode = ‘full’) or not (storage_mode = ‘light’). Default assumes storage_mode = ‘full’, so that the Cholesky decomposition of the correlation matrix is stored. This might be a useful parameter when one is not interested in the MSE and only plans to estimate the BLUP, for which the correlation matrix is not required.
verbose : boolean, optional
A boolean specifying the verbose level. Default is verbose = False.
theta0 : double array_like, optional
An array with shape (n_features, ) or (1, ). The parameters in the autocorrelation model. If thetaL and thetaU are also specified, theta0 is considered as the starting point for the maximum likelihood estimation of the best set of parameters. Default assumes isotropic autocorrelation model with theta0 = 1e-1.
thetaL : double array_like, optional
An array with shape matching theta0’s. Lower bound on the autocorrelation parameters for maximum likelihood estimation. Default is None, so that it skips maximum likelihood estimation and it uses theta0.
thetaU : double array_like, optional
An array with shape matching theta0’s. Upper bound on the autocorrelation parameters for maximum likelihood estimation. Default is None, so that it skips maximum likelihood estimation and it uses theta0.
normalize : boolean, optional
Input X and observations y are centered and scaled with respect to the means and standard deviations estimated from the n_samples observations provided. Default is normalize = True so that data is normalized to ease maximum likelihood estimation.
nugget : double or ndarray, optional
Introduce a nugget effect to allow smooth predictions from noisy data. If nugget is an ndarray, it must be the same length as the number of data points used for the fit. The nugget is added to the diagonal of the assumed training covariance; in this way it acts as a Tikhonov regularization in the problem. In the special case of the squared exponential correlation function, the nugget mathematically represents the variance of the input values. Default assumes a nugget close to machine precision for the sake of robustness (nugget = 10. * MACHINE_EPSILON).
optimizer : string, optional

A string specifying the optimization algorithm to be used. Default uses ‘fmin_cobyla’ algorithm from scipy.optimize. Available optimizers are:

'fmin_cobyla', 'Welch'


The ‘Welch’ optimizer is due to Welch et al., see reference [WBSWM1992]. It consists of iterating over several one-dimensional optimizations instead of running one single multi-dimensional optimization.

random_start : int, optional
The number of times the Maximum Likelihood Estimation should be performed from a random starting point. The first MLE always uses the specified starting point (theta0), the next starting points are picked at random according to an exponential distribution (log-uniform on [thetaL, thetaU]). Default does not use random starting point (random_start = 1).
random_state: integer or numpy.RandomState, optional
The generator used to shuffle the sequence of coordinates of theta in the Welch optimizer. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
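
The random restarts described for random_start draw starting points log-uniformly on [thetaL, thetaU]. A minimal sketch of such a draw, assuming scalar bounds; the helper name sample_log_uniform is hypothetical and the wrapped class is not used:

```python
import math
import random

def sample_log_uniform(theta_l, theta_u, rng):
    """Draw one value whose logarithm is uniform on [log(theta_l), log(theta_u)]."""
    return math.exp(rng.uniform(math.log(theta_l), math.log(theta_u)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_log_uniform(1e-3, 1.0, rng) for _ in range(1000)]
# Roughly a third of the draws should land in each decade between 1e-3 and 1.
print(sum(1 for s in samples if s < 1e-2) / len(samples))
```

Sampling in log space gives small and large length scales the same chance of being tried, which a plain uniform draw on [thetaL, thetaU] would not.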

Attributes

theta_: array
Specified theta OR the best set of autocorrelation parameters (the sought maximizer of the reduced likelihood function).
reduced_likelihood_function_value_: array
The optimal reduced likelihood function value.

Examples

>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcess
>>> X = np.array([[1., 3., 5., 6., 7., 8.]]).T
>>> y = (X * np.sin(X)).ravel()
>>> gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1.)
>>> gp.fit(X, y)
GaussianProcess(beta0=None...
...


Notes

The present implementation is based on a translation of the DACE Matlab toolbox, see reference [NLNS2002].

References

 [NLNS2002] S.N. Lophaven, H.B. Nielsen and J. Sondergaard. DACE - A MATLAB Kriging Toolbox. (2002) http://www2.imm.dtu.dk/~hbn/dace/dace.pdf
 [WBSWM1992] W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, and M.D. Morris (1992). Screening, predicting, and computer experiments. Technometrics, 34(1) 15–25. http://www.jstor.org/pss/1269548

Full API documentation: GaussianProcessScikitsLearnNode

class mdp.nodes.MeanEstimatorScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.MeanEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: MeanEstimatorScikitsLearnNode

class mdp.nodes.RadiusNeighborsRegressorScikitsLearnNode

Regression based on neighbors within a fixed radius.

This node has been automatically generated by wrapping the sklearn.neighbors.regression.RadiusNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Parameters

radius : float, optional (default = 1.0)
Range of parameter space to use by default for radius_neighbors queries.
weights : str or callable

weight function used in prediction. Possible values:

• ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
• ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
• [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

• ‘ball_tree’ will use BallTree
• ‘kd_tree’ will use scipy.spatial.cKDTree
• ‘brute’ will use a brute-force search.
• ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
p: integer, optional (default = 2)
Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
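To illustrate how radius, weights and p interact, here is a minimal pure-Python sketch of radius-based regression. It is not the wrapped scikit-learn code: the helper names minkowski and radius_predict are hypothetical, a brute-force search is assumed, and the handling of exact matches under distance weighting is a choice made for this sketch only.

```python
def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def radius_predict(X, y, query, radius=1.0, weights="uniform", p=2):
    """Weighted average of the targets of all training points within `radius`."""
    pairs = [(minkowski(x, query, p), t) for x, t in zip(X, y)]
    near = [(d, t) for d, t in pairs if d <= radius]
    if not near:
        raise ValueError("no training point lies within the radius")
    # An exact match makes inverse-distance weights undefined; in this sketch
    # the query then takes the mean target of the exact matches.
    exact = [t for d, t in near if d == 0.0]
    if weights == "distance" and exact:
        return sum(exact) / len(exact)
    if weights == "uniform":
        w = [1.0] * len(near)
    elif weights == "distance":
        w = [1.0 / d for d, _ in near]
    else:
        # user-defined callable: list of distances in, list of weights out
        w = weights([d for d, _ in near])
    return sum(wi * t for wi, (_, t) in zip(w, near)) / sum(w)


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
uniform = radius_predict(X, y, [1.5], radius=1.0)
by_distance = radius_predict(X, y, [1.2], radius=1.0, weights="distance")
```

With uniform weights the two neighbors of 1.5 contribute equally; with distance weights the query at 1.2 is pulled towards the target of the closer point at 1.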

Examples

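>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsRegressor
>>> neigh = RadiusNeighborsRegressor(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[ 0.5]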


Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

class mdp.nodes.PLSSVDScikitsLearnNode

Partial Least Square SVD

This node has been automatically generated by wrapping the sklearn.pls.PLSSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Simply performs an SVD on the cross-covariance matrix X’Y. There is no iterative deflation here.

Parameters

X : array-like of predictors, shape = [n_samples, p]
Training vector, where n_samples is the number of samples and p is the number of predictors. X will be centered before any analysis.
Y : array-like of response, shape = [n_samples, q]
Training vector, where n_samples is the number of samples and q is the number of response variables. Y will be centered before any analysis.
n_components : int, (default 2).
number of components to keep.
scale : boolean, (default True)
scale X and Y

Attributes

x_weights_ : array, [p, n_components]
X block weights vectors.
y_weights_ : array, [q, n_components]
Y block weights vectors.
x_scores_ : array, [n_samples, n_components]
X scores.
y_scores_ : array, [n_samples, n_components]
Y scores.
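The relationship between the cross-covariance SVD and the attributes listed above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the wrapped implementation: the function name pls_svd is hypothetical, the scale option is left out, and the data are random.

```python
import numpy as np


def pls_svd(X, Y, n_components=2):
    """SVD of the cross-covariance matrix X'Y, with no iterative deflation."""
    Xc = X - X.mean(axis=0)              # center both blocks first
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc                        # cross-covariance up to a 1/n factor, shape [p, q]
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    x_weights = U[:, :n_components]      # shape [p, n_components]
    y_weights = Vt.T[:, :n_components]   # shape [q, n_components]
    x_scores = Xc @ x_weights            # shape [n_samples, n_components]
    y_scores = Yc @ y_weights
    return x_weights, y_weights, x_scores, y_scores


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
Y = X[:, :3] + 0.1 * rng.standard_normal((20, 3))
x_w, y_w, x_s, y_s = pls_svd(X, Y, n_components=2)
```

Because the weights come from a single SVD, the weight vectors of each block are orthonormal by construction.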

PLSCanonical, CCA

Full API documentation: PLSSVDScikitsLearnNode

class mdp.nodes.LassoLarsCVScikitsLearnNode

Cross-validated Lasso, using the LARS algorithm

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
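As a worked illustration, the objective can be evaluated directly for a candidate weight vector. The helper name lasso_objective and the toy data are hypothetical and only serve to make the formula concrete.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = len(X)
    residuals = [yi - sum(xij * wj for xij, wj in zip(xi, w))
                 for xi, yi in zip(X, y)]
    squared_error = sum(r * r for r in residuals)
    l1_norm = sum(abs(wj) for wj in w)
    return squared_error / (2 * n_samples) + alpha * l1_norm


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
exact_fit = lasso_objective(X, y, w=[1.0, 2.0], alpha=0.5)   # zero residual, penalty only
shrunk_fit = lasso_objective(X, y, w=[1.0, 0.0], alpha=0.5)  # smaller penalty, larger residual
```

Comparing the two values shows the trade-off that alpha controls: the exact fit pays only the L1 penalty, while the sparser weight vector pays mostly through its residual.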

Parameters

fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform.
cv : crossvalidation generator, optional
see sklearn.cross_validation module. If None is passed, default to a 5-fold strategy
max_n_alphas : integer, optional
The maximum number of points on the path used to compute the residuals in the cross-validation
n_jobs : integer, optional
Number of CPUs to use during the cross validation. If ‘-1’, use all the CPUs
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems.
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation formula)
intercept_ : float
independent term in decision function.
coef_path_: array, shape = [n_features, n_alpha]
the varying values of the coefficients along the path
alpha_: float
the estimated regularization parameter alpha
alphas_: array, shape = [n_alpha]
the different values of alpha along the path
cv_alphas_: array, shape = [n_cv_alphas]
all the values of alpha along the path for the different folds
cv_mse_path_: array, shape = [n_folds, n_cv_alphas]
the mean square error on left-out for each fold along the path (alpha values given by cv_alphas)
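A plausible way to read cv_alphas_ and cv_mse_path_ together is to average the error over folds and keep the alpha with the lowest mean. The sketch below assumes that selection rule; the function name select_alpha and all numbers are made up for illustration.

```python
def select_alpha(cv_alphas, cv_mse_path):
    """Return the alpha with the lowest mean squared error averaged over folds.

    cv_mse_path has one row per fold and one column per entry of cv_alphas,
    matching the [n_folds, n_cv_alphas] shape described above.
    """
    n_folds = len(cv_mse_path)
    mean_mse = [sum(fold[j] for fold in cv_mse_path) / n_folds
                for j in range(len(cv_alphas))]
    best = min(range(len(cv_alphas)), key=mean_mse.__getitem__)
    return cv_alphas[best], mean_mse


# Made-up numbers for illustration only: 3 folds, 4 candidate alphas.
cv_alphas = [1.0, 0.5, 0.1, 0.01]
cv_mse_path = [
    [4.0, 2.0, 1.0, 1.5],
    [3.0, 2.5, 1.2, 1.1],
    [5.0, 1.5, 0.8, 1.6],
]
alpha, mean_mse = select_alpha(cv_alphas, cv_mse_path)
```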

Notes

The object solves the same problem as the LassoCV object. However, unlike LassoCV, it finds the relevant alpha values by itself. In general, because of this property, it will be more stable. However, it is more fragile to heavily multicollinear datasets.

It is more efficient than the LassoCV if only a small number of features are selected compared to the total number, for instance if there are very few samples compared to the number of features.

lars_path, LassoLars, LarsCV, LassoCV

Full API documentation: LassoLarsCVScikitsLearnNode

class mdp.nodes.KNeighborsRegressorScikitsLearnNode

Regression based on k-nearest neighbors.

This node has been automatically generated by wrapping the sklearn.neighbors.regression.KNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Parameters

n_neighbors : int, optional (default = 5)
Number of neighbors to use by default for k_neighbors() queries.
weights : str or callable

weight function used in prediction. Possible values:

• ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
• ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
• [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

• ‘ball_tree’ will use BallTree
• ‘kd_tree’ will use scipy.spatial.cKDTree
• ‘brute’ will use a brute-force search.
• ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
warn_on_equidistant : boolean, optional. Defaults to True.
Generate a warning if equidistant neighbors are discarded. For classification or regression based on k-neighbors, if neighbor k and neighbor k+1 have identical distances but different labels, then the result will be dependent on the ordering of the training data. If the fit method is 'kd_tree', no warnings will be generated.
p: integer, optional (default = 2)
Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
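The ordering dependence mentioned for warn_on_equidistant can be seen in a small brute-force sketch. This is not the wrapped implementation; the function name knn_predict is hypothetical, and uniform weights with the Euclidean metric are assumed.

```python
def knn_predict(X, y, query, n_neighbors=2):
    """Mean target of the n_neighbors closest training points (brute force)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    # sorted() is stable, so equidistant points keep their training order
    order = sorted(range(len(X)), key=lambda i: dist(X[i], query))
    nearest = order[:n_neighbors]
    return sum(y[i] for i in nearest) / len(nearest)


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
two_neighbors = knn_predict(X, y, [1.5], n_neighbors=2)

# With a single neighbor, [1] and [2] are equidistant from 1.5 but carry
# different targets, so the answer follows the order of the training data.
one_neighbor = knn_predict(X, y, [1.5], n_neighbors=1)
one_neighbor_reversed = knn_predict(X[::-1], y[::-1], [1.5], n_neighbors=1)
```

Here the same data give 0.0 or 1.0 for a single neighbor depending only on the order in which the training points are presented, which is the situation the warning is meant to flag.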

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[ 0.5]


Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: KNeighborsRegressorScikitsLearnNode

class mdp.nodes.RandomForestClassifierScikitsLearnNode

A random forest classifier.

This node has been automatically generated by wrapping the sklearn.ensemble.forest.RandomForestClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters

n_estimators : integer, optional (default=10)
The number of trees in the forest.
criterion : string, optional (default=”gini”)
The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Note: this parameter is tree-specific.
max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. Note: this parameter is tree-specific.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node. Note: this parameter is tree-specific.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples in newly created leaves. A split is discarded if after the split, one of the leaves would contain fewer than min_samples_leaf samples. Note: this parameter is tree-specific.
min_density : float, optional (default=0.1)
This parameter controls a trade-off in an optimization heuristic. It controls the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold the mask is recomputed and the input data is packed which results in data copying. If min_density equals to one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (aka sample masks). Note: this parameter is tree-specific.
max_features : int, string or None, optional (default=”auto”)

The number of features to consider when looking for the best split:

• If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
• If “sqrt”, then max_features=sqrt(n_features).
• If “log2”, then max_features=log2(n_features).
• If None, then max_features=n_features.

Note: this parameter is tree-specific.

bootstrap : boolean, optional (default=True)
Whether bootstrap samples are used when building trees.
compute_importances : boolean, optional (default=True)
Whether feature importances are computed and stored into the feature_importances_ attribute when calling fit.
oob_score : bool
Whether to use out-of-bag samples to estimate the generalization error.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0)
Controls the verbosity of the tree building process.
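The two split criteria and the max_features rules can be made concrete with a few lines of plain Python. The helper names gini, entropy and resolve_max_features are hypothetical; truncating to an integer and keeping at least one feature are assumptions of this sketch rather than documented behavior.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; information gain is the drop in this value."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def resolve_max_features(max_features, n_features, classification=True):
    """Translate a max_features setting into a number of features."""
    if max_features is None:
        return n_features
    if max_features == "auto":
        max_features = "sqrt" if classification else None
        return resolve_max_features(max_features, n_features)
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    return int(max_features)
```

Both impurity measures are zero for a pure node and largest for an even class mix, so either can rank candidate splits; they differ only in how strongly they penalize mixed nodes.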

Attributes

estimators_: list of DecisionTreeClassifier
The collection of fitted sub-estimators.
feature_importances_ : array, shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.
oob_decision_function_ : array, shape = [n_samples, n_classes]
Decision function computed with out-of-bag estimate on the training set.

References

 [1] Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.

DecisionTreeClassifier, ExtraTreesClassifier

Full API documentation: RandomForestClassifierScikitsLearnNode

class mdp.nodes.ForestRegressorScikitsLearnNode

Base class for forest of trees-based regressors.

This node has been automatically generated by wrapping the sklearn.ensemble.forest.ForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Warning: This class should not be used directly. Use derived classes instead.

Full API documentation: ForestRegressorScikitsLearnNode

Least Angle Regression model a.k.a. LAR

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.Lars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_nonzero_coefs : int, optional
Target number of non-zero coefficients. Use np.inf for no limit.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument.
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.
fit_path : boolean
If True the full path is stored in the coef_path_ attribute. If you compute the solution for a large problem or many targets, setting fit_path to False will lead to a speedup, especially with a small alpha.

Attributes

coef_path_ : array, shape = [n_features, n_alpha]
The varying values of the coefficients along the path. It is not present if the fit_path parameter is False.
coef_ : array, shape = [n_features]
Parameter vector (w in the formulation formula).
intercept_ : float
Independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.Lars(n_nonzero_coefs=1)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
Lars(copy_X=True, eps=..., fit_intercept=True, fit_path=True,
   n_nonzero_coefs=1, normalize=True, precompute='auto', verbose=False)
>>> print(clf.coef_)
[ 0. -1.11...]


lars_path, LarsCV, sklearn.decomposition.sparse_encode

http://en.wikipedia.org/wiki/Least_angle_regression

class mdp.nodes.ElasticNetScikitsLearnNode

Linear Model trained with L1 and L2 prior as regularizer

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Minimizes the objective function:

1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a * L1 + b * L2


where:

alpha = a + b and rho = a / (a + b)


The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. Specifically, rho = 1 is the lasso penalty. Currently, rho <= 0.01 is not reliable, unless you supply your own sequence of alpha.
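The conversion between separate penalty weights and the (alpha, rho) pair can be checked numerically. In this sketch the function names are hypothetical, and L2 is read as 0.5 * ||w||^2_2 so that it lines up with the objective function given earlier; that reading is an interpretation, not something the source states.

```python
def elastic_net_penalty(w, alpha, rho):
    """alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2"""
    l1 = sum(abs(wj) for wj in w)
    l2_squared = sum(wj * wj for wj in w)
    return alpha * rho * l1 + 0.5 * alpha * (1.0 - rho) * l2_squared


def from_separate_penalties(a, b):
    """Map separate weights (a on L1, b on L2) to (alpha, rho)."""
    alpha = a + b
    rho = a / (a + b)
    return alpha, rho


w = [1.0, -2.0, 0.0]          # ||w||_1 = 3, ||w||^2_2 = 5
a, b = 0.3, 0.1
alpha, rho = from_separate_penalties(a, b)

# Direct evaluation of a * L1 + b * L2 with L2 = 0.5 * ||w||^2_2.
direct = a * 3.0 + b * 0.5 * 5.0
combined = elastic_net_penalty(w, alpha, rho)
```

Both routes give the same penalty, since alpha * rho = a and alpha * (1 - rho) = b. Setting b = 0 yields rho = 1, the pure lasso case.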

Parameters

alpha : float
Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter.
rho : float
The ElasticNet mixing parameter, with 0 <= rho <= 1. For rho = 0 the penalty is an L2 penalty. For rho = 1 it is an L1 penalty. For 0 < rho < 1, the penalty is a combination of L1 and L2.
fit_intercept: bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.
max_iter: int, optional
The maximum number of iterations
copy_X : boolean, optional, default False
If True, X will be copied; else, it may be overwritten.
tol: float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
positive: bool, optional
When set to True, forces the coefficients to be positive.

Attributes

coef_ : array, shape = (n_features,)
parameter vector (w in the cost function formula)
sparse_coef_ : scipy.sparse matrix, shape = (n_features, 1)
sparse_coef_ is a readonly property derived from coef_
intercept_ : float | array, shape = (n_targets,)
independent term in decision function.

Notes

To avoid unnecessary memory duplication, the X argument of the fit method should be passed directly as a Fortran-contiguous numpy array.

Full API documentation: ElasticNetScikitsLearnNode

class mdp.nodes.IsomapScikitsLearnNode

Isomap Embedding

This node has been automatically generated by wrapping the sklearn.manifold.isomap.Isomap class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Non-linear dimensionality reduction through Isometric Mapping

Parameters

n_neighbors : integer
number of neighbors to consider for each point.
n_components : integer
number of coordinates for the manifold
eigen_solver : [‘auto’|’arpack’|’dense’]

• ‘auto’ : attempt to choose the most efficient solver for the given problem.
• ‘arpack’ : use Arnoldi decomposition to find the eigenvalues and eigenvectors. Note that arpack can handle both dense and sparse data efficiently.
• ‘dense’ : use a direct solver (i.e. LAPACK) for the eigenvalue decomposition.

tol : float
Convergence tolerance passed to arpack or lobpcg. Not used if eigen_solver == ‘dense’.
max_iter : integer
Maximum number of iterations for the arpack solver. Not used if eigen_solver == ‘dense’.
path_method : string [‘auto’|’FW’|’D’]

Method to use in finding shortest path:

• ‘auto’ : attempt to choose the best algorithm automatically
• ‘FW’ : Floyd-Warshall algorithm
• ‘D’ : Dijkstra algorithm with Fibonacci Heaps

neighbors_algorithm : string [‘auto’|’brute’|’kd_tree’|’ball_tree’]
algorithm to use for nearest neighbors search, passed to neighbors.NearestNeighbors instance

Attributes

embedding_ : array-like, shape (n_samples, n_components)
Stores the embedding vectors
kernel_pca_ : object
KernelPCA object used to implement the embedding
training_data_ : array-like, shape (n_samples, n_features)
Stores the training data
nbrs_ : sklearn.neighbors.NearestNeighbors instance
Stores nearest neighbors instance, including BallTree or KDtree if applicable.
dist_matrix_ : array-like, shape (n_samples, n_samples)
Stores the geodesic distance matrix of training data
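The geodesic distance matrix is the part of Isomap that differs from a plain Euclidean embedding. Here is a minimal sketch of how such a matrix can be built from a neighbor graph using the Floyd-Warshall option. The function name geodesic_distances is hypothetical, the graph is assumed to be symmetrized, and the final embedding step is left out.

```python
import math


def geodesic_distances(points, n_neighbors):
    """k-nearest-neighbor graph followed by Floyd-Warshall shortest paths."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Start with no edges except the zero-length path from a point to itself.
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in nearest[:n_neighbors]:
            g[i][j] = g[j][i] = d[i][j]   # treat the graph as undirected
    # Floyd-Warshall: allow each point k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][k] + g[k][j] < g[i][j]:
                    g[i][j] = g[i][k] + g[k][j]
    return g


# Four corners of a unit square, listed in order around the perimeter.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist_matrix = geodesic_distances(corners, n_neighbors=2)
```

Opposite corners are sqrt(2) apart in a straight line, but the graph only links adjacent corners, so their geodesic distance is 2.0 along the perimeter.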

References

[1] Tenenbaum, J.B.; De Silva, V.; & Langford, J.C. A global geometric
framework for nonlinear dimensionality reduction. Science 290 (5500)

Full API documentation: IsomapScikitsLearnNode

class mdp.nodes.BinarizerScikitsLearnNode

Binarize data (set feature values to 0 or 1) according to a threshold

This node has been automatically generated by wrapping the sklearn.preprocessing.Binarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

With the default threshold of 0.0, positive values are set to 1.0 and all other values to 0.0, so zeros are left untouched.

Binarization is a common operation on text count data, where the analyst can decide to consider only the presence or absence of a feature rather than, for instance, a quantified number of occurrences.

It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).

Parameters

threshold : float, optional (0.0 by default)
The lower bound that triggers feature values to be replaced by 1.0.
copy : boolean, optional, default is True
set to False to perform inplace binarization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).
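The thresholding rule is simple enough to sketch in plain Python. The function name binarize is used here for illustration on nested lists, and the strict comparison (values must exceed the threshold to become 1.0) is stated as the assumption behind it.

```python
def binarize(rows, threshold=0.0):
    """Return a copy of rows with values above threshold set to 1.0, others to 0.0."""
    return [[1.0 if value > threshold else 0.0 for value in row] for row in rows]


counts = [[0, 3, 1], [2, 0, 0]]          # e.g. term counts per document
presence = binarize(counts)              # presence or absence of each term
strict = binarize(counts, threshold=1.5) # only terms seen at least twice
```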

Notes

If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.

This estimator is stateless (besides constructor parameters), the fit method does nothing but is useful when used in a pipeline.

Full API documentation: BinarizerScikitsLearnNode

class mdp.nodes.MiniBatchDictionaryLearningScikitsLearnNode

Mini-batch dictionary learning

This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.MiniBatchDictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*, V^*) = argmin_{(U,V)} 0.5 * || Y - U V ||_2^2 + alpha * || U ||_1
with || V_k ||_2 = 1 for all 0 <= k < n_atoms
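To make the roles of U (the sparse code) and V (the dictionary) concrete, the objective can be evaluated with NumPy for a given pair. This sketch assumes the squared norm is the sum of squared entries and that atoms are the rows of V; the function names and the toy matrices are hypothetical.

```python
import numpy as np


def dictionary_objective(Y, U, V, alpha):
    """0.5 * ||Y - U V||_2^2 + alpha * ||U||_1 for a code U and a dictionary V."""
    residual = Y - U @ V
    return 0.5 * np.sum(residual ** 2) + alpha * np.sum(np.abs(U))


def normalize_atoms(V):
    """Rescale every atom (row of V) to unit L2 norm, as the constraint requires."""
    return V / np.linalg.norm(V, axis=1, keepdims=True)


V = normalize_atoms(np.array([[3.0, 4.0], [1.0, 0.0]]))   # 2 atoms, 2 features
U = np.array([[5.0, 0.0], [0.0, 2.0]])                     # one active atom per sample
Y = U @ V                                                   # data reconstructed exactly
value = dictionary_objective(Y, U, V, alpha=1.0)
```

Since the reconstruction is exact in this toy case, the value consists of the sparsity penalty alone, alpha times the sum of absolute code entries.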

Parameters

n_atoms : int,
number of dictionary elements to extract
alpha : int,
sparsity controlling parameter
n_iter : int,
total number of iterations to perform
fit_algorithm : {‘lars’, ‘cd’}

• lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)
• cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso)

Lars will be faster if the estimated components are sparse.

transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}

Algorithm used to transform the data:

• lars: uses the least angle regression method (linear_model.lars_path)
• lasso_lars: uses Lars to compute the Lasso solution
• lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
• omp: uses orthogonal matching pursuit to estimate the sparse solution
• threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X’

transform_n_nonzero_coefs : int, 0.1 * n_features by default
Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
transform_alpha : float, 1. by default
If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
split_sign : bool, False by default
Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
n_jobs : int,
number of parallel jobs to run
dict_init : array of shape (n_atoms, n_features),
initial value of the dictionary for warm restart scenarios

verbose :
degree of verbosity of the printed output
chunk_size : int,
number of samples in each mini-batch
shuffle : bool,
whether to shuffle the samples before forming batches
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.

Attributes

components_ : array, [n_atoms, n_features]
components extracted from the data

Notes

References:

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)

SparseCoder, DictionaryLearning, SparsePCA, MiniBatchSparsePCA

Full API documentation: MiniBatchDictionaryLearningScikitsLearnNode

class mdp.nodes.TfidfVectorizerScikitsLearnNode

Convert a collection of raw documents to a matrix of TF-IDF features.

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Equivalent to CountVectorizer followed by TfidfTransformer.
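The counting half of this pipeline can be sketched in plain Python: tokenize each document, build word n-grams, and count them. This is an illustration only; the function names are hypothetical, and the regular expression is an assumed pattern that keeps tokens of two or more word characters.

```python
import re
from collections import Counter


def word_ngrams(text, ngram_range=(1, 1), lowercase=True):
    """Tokenize into words of 2+ characters and emit all n-grams in the range."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"\b\w\w+\b", text)
    min_n, max_n = ngram_range
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def count_documents(docs, ngram_range=(1, 1)):
    """One Counter of term occurrences per document (the counting step)."""
    return [Counter(word_ngrams(doc, ngram_range)) for doc in docs]


docs = ["The cat sat", "the cat ate a fish"]
counts = count_documents(docs, ngram_range=(1, 2))
```

The single-letter word "a" is dropped by the token pattern, so the second document yields the bigram "ate fish". The resulting counts are what a TF-IDF weighting step would then rescale.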

Parameters

input: string {‘filename’, ‘file’, ‘content’}

If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.

If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.

Otherwise the input is expected to be a sequence of strings or bytes items that are analyzed directly.

charset: string, ‘utf-8’ by default.
If bytes or files are given to analyze, this charset is used to decode.
charset_error: {‘strict’, ‘ignore’, ‘replace’}
Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given charset. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.
strip_accents: {‘ascii’, ‘unicode’, None}
Remove accents during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.
analyzer: string, {‘word’, ‘char’} or callable

Whether the feature should be made of word or character n-grams.

If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.

preprocessor: callable or None (default)
Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps.
tokenizer: callable or None (default)
Override the string tokenization step while preserving the preprocessing and n-grams generation steps.
ngram_range: tuple (min_n, max_n)
The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used.
stop_words: string {‘english’}, list, or None (default)

If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value.

If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens.

If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.

lowercase: boolean, default True
Convert all characters to lowercase before tokenizing.
token_pattern: string
Regular expression denoting what constitutes a “token”, only used if analyzer == ‘word’. The default regexp selects tokens of 2 or more letter characters (punctuation is completely ignored and always treated as a token separator).
max_df : float in range [0.0, 1.0] or int, optional, 1.0 by default
When building the vocabulary, ignore terms that have a term frequency strictly higher than the given threshold (corpus specific stop words). If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
min_df : float in range [0.0, 1.0] or int, optional, 2 by default
When building the vocabulary, ignore terms that have a term frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
max_features : optional, None by default

If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.

This parameter is ignored if vocabulary is not None.

vocabulary: Mapping or iterable, optional
Either a Mapping (e.g., a dict) where keys are terms and values are indices in the feature matrix, or an iterable over terms. If not given, a vocabulary is determined from the input documents.
binary: boolean, False by default.
If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts.
dtype: type, optional
Type of the matrix returned by fit_transform() or transform().
norm : ‘l1’, ‘l2’ or None, optional
Norm used to normalize term vectors. None for no normalization.
use_idf : boolean, optional
Enable inverse-document-frequency reweighting.
smooth_idf : boolean, optional
Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.
sublinear_tf : boolean, optional
Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).
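The effect of use_idf, smooth_idf, sublinear_tf and norm can be illustrated on per-document term counts. This sketch is not the wrapped implementation: the function name tfidf is hypothetical, counts are plain dicts, and the exact idf formula (a logarithm plus one) is an assumption that may differ between scikit-learn versions.

```python
import math


def tfidf(counts, smooth_idf=True, sublinear_tf=False, norm="l2"):
    """Weight per-document term counts (a list of dicts) by TF-IDF."""
    n_docs = len(counts)
    df = {}
    for doc in counts:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    extra = 1 if smooth_idf else 0      # the imaginary document containing every term
    # The trailing "+ 1.0" keeps terms present in every document from vanishing;
    # treat the exact idf formula as an assumption of this sketch.
    idf = {t: math.log((n_docs + extra) / (d + extra)) + 1.0 for t, d in df.items()}

    weighted = []
    for doc in counts:
        row = {}
        for term, tf in doc.items():
            if sublinear_tf:
                tf = 1.0 + math.log(tf)
            row[term] = tf * idf[term]
        if norm == "l2":
            scale = math.sqrt(sum(v * v for v in row.values()))
        elif norm == "l1":
            scale = sum(abs(v) for v in row.values())
        else:
            scale = 1.0
        weighted.append({t: v / scale for t, v in row.items()} if scale else row)
    return weighted


counts = [{"cat": 2, "sat": 1}, {"cat": 1, "fish": 3}]
rows = tfidf(counts)
```

In this toy corpus "cat" appears in every document and so receives the smallest idf, while "fish" and "sat" are boosted relative to their raw counts.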

CountVectorizer
Tokenize the documents and count the occurrences of token and return them as a sparse matrix
TfidfTransformer
Apply Term Frequency Inverse Document Frequency normalization to a sparse matrix of occurrence counts.

Full API documentation: TfidfVectorizerScikitsLearnNode

class mdp.nodes.RandomizedPCAScikitsLearnNode

Principal component analysis (PCA) using randomized SVD

This node has been automatically generated by wrapping the sklearn.decomposition.pca.RandomizedPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Linear dimensionality reduction using approximated Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

This implementation uses a randomized SVD implementation and can handle both scipy.sparse and numpy dense arrays as input.
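The general idea of a randomized SVD with power iterations can be sketched with NumPy as follows. This follows the random range-finder scheme in broad strokes and is not claimed to match the wrapped code; the function name randomized_svd and the n_oversamples and seed parameters are assumptions of the sketch.

```python
import numpy as np


def randomized_svd(A, n_components, iterated_power=3, n_oversamples=2, seed=0):
    """Approximate truncated SVD via a random range finder with power iterations."""
    rng = np.random.default_rng(seed)
    n_random = n_components + n_oversamples
    # Sample the range of A with a random Gaussian test matrix.
    Y = A @ rng.standard_normal((A.shape[1], n_random))
    # Power iterations sharpen the separation between large and small singular values.
    for _ in range(iterated_power):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for the sampled range
    B = Q.T @ A                          # small matrix, cheap to decompose exactly
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :n_components], s[:n_components], Vt[:n_components]


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Xc = X - X.mean(axis=0)                  # PCA works on centered data
U, s, Vt = randomized_svd(Xc, n_components=2)
explained_variance_ratio = s ** 2 / np.sum(s ** 2)
```

On this small full-rank example the approximation is essentially exact; the benefit of the random projection shows up on large matrices, where only a few components are wanted.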

Parameters

n_components : int
Maximum number of components to keep: default is 50.
copy : bool
If False, data passed to fit are overwritten
iterated_power : int, optional
Number of iterations for the power method. 3 by default.
whiten : bool, optional

When True (False by default) the components_ vectors are divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

random_state : int or RandomState instance or None (default)
Pseudo Random Number generator seed control. If None, use the numpy.random singleton.
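What the whiten option changes can be seen by comparing the covariance of the projected data with and without it. The sketch below uses an exact SVD for clarity; the function name pca_transform is hypothetical and the sqrt(n - 1) rescaling is an assumed convention chosen so that the sample variance comes out as one.

```python
import numpy as np


def pca_transform(X, whiten=False):
    """Project centered data onto its principal axes, optionally whitening."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                   # decorrelated, variances s**2 / (n - 1)
    if whiten:
        # Dividing by the singular values (rescaled by sqrt(n - 1), an assumed
        # convention) gives every component unit sample variance.
        scores = scores / s * np.sqrt(X.shape[0] - 1)
    return scores


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
plain = pca_transform(X)
white = pca_transform(X, whiten=True)
```

Both outputs are uncorrelated, but only the whitened one has equal variance in every component, which is the information loss the description of whiten refers to.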

Attributes

components_ : array, [n_components, n_features]
Components with maximum variance.
explained_variance_ratio_ : array, [n_components]
Percentage of variance explained by each of the selected components. If n_components is not set, then all components are stored and the sum of explained variances is equal to 1.0.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import RandomizedPCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = RandomizedPCA(n_components=2)
>>> pca.fit(X)
RandomizedPCA(copy=True, iterated_power=3, n_components=2,
random_state=<mtrand.RandomState object at 0x...>, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244...  0.00755...]


PCA, ProbabilisticPCA

References

 [Halko2009] Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions Halko, et al., 2009 (arXiv:909)
 [MRT] A randomized algorithm for the decomposition of matrices Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert

Full API documentation: RandomizedPCAScikitsLearnNode

class mdp.nodes.MiniBatchSparsePCAScikitsLearnNode

Mini-batch Sparse Principal Components Analysis

This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.MiniBatchSparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Parameters

n_components : int,
number of sparse atoms to extract
alpha : int,
Sparsity controlling parameter. Higher values lead to sparser components.
ridge_alpha : float,
Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.
n_iter : int,
number of iterations to perform for each mini batch
callback : callable,
callable that gets invoked every five iterations
chunk_size : int,
the number of features to take in each mini batch

verbose :
degree of output the procedure will print
shuffle : boolean,
whether to shuffle the data before splitting it in batches
n_jobs : int,
number of parallel jobs to run, or -1 to autodetect.
method : {‘lars’, ‘cd’}
lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path). cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.

Attributes

components_ : array, [n_components, n_features]
Sparse components extracted from the data.
error_ : array
Vector of errors at each iteration.

PCA SparsePCA DictionaryLearning

Full API documentation: MiniBatchSparsePCAScikitsLearnNode

class mdp.nodes.ProjectedGradientNMFScikitsLearnNode

Non-Negative matrix factorization by Projected Gradient (NMF)

This node has been automatically generated by wrapping the sklearn.decomposition.nmf.ProjectedGradientNMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X: {array-like, sparse matrix}, shape = [n_samples, n_features]
Data the model will be fit to.
n_components: int or None
Number of components, if n_components is not set all components are kept
init: ‘nndsvd’ | ‘nndsvda’ | ‘nndsvdar’ | int | RandomState

Method used to initialize the procedure. Default: ‘nndsvdar’. Valid options:

'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
int seed or RandomState: non-negative random matrices
sparseness: ‘data’ | ‘components’ | None, default: None
Where to enforce sparsity in the model.
beta: double, default: 1
Degree of sparseness, if sparseness is not None. Larger values mean more sparseness.
eta: double, default: 0.1
Degree of correctness to maintain, if sparsity is not None. Smaller values mean larger error.
tol: double, default: 1e-4
Tolerance value used in stopping conditions.
max_iter: int, default: 200
Number of iterations to compute.
nls_max_iter: int, default: 2000
Number of iterations in NLS subproblem.

Attributes

components_ : array, [n_components, n_features]
Non-negative components of the data
reconstruction_err_ : number
Frobenius norm of the matrix difference between the training data and the reconstructed data from the fit produced by the model: || X - WH ||_2. Not computed for sparse input matrices because it is too expensive in terms of memory.
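
As a rough illustration of the reconstruction error described above, the following pure-Python sketch computes the Frobenius norm of X - WH for small matrices given as lists of rows. The helper names (matmul, reconstruction_error) and the toy factorization are assumptions made for this example and are not part of the wrapped class.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def reconstruction_error(X, W, H):
    """Frobenius norm of the difference X - W*H."""
    WH = matmul(W, H)
    return math.sqrt(sum((x - r) ** 2
                         for x_row, r_row in zip(X, WH)
                         for x, r in zip(x_row, r_row)))


# Hypothetical rank-1 factorization: W is (n_samples, 1), H is (1, n_features).
W = [[1.0], [2.0]]
H = [[1.0, 2.0]]
err_exact = reconstruction_error([[1.0, 2.0], [2.0, 4.0]], W, H)
err_off = reconstruction_error([[1.0, 2.0], [2.0, 5.0]], W, H)
```

Here err_exact is zero because X equals WH exactly, while err_off reflects a single entry that differs by one.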

Examples

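>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
                     nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744,  0.11118662],
       [ 0.38526873,  0.38228063]])
>>> model.reconstruction_err_
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
...                              sparseness='components')
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
                     nls_max_iter=2000, sparseness='components', tol=0.0001)
>>> model.components_
array([[ 1.67481991,  0.29614922],
       [-0.        ,  0.4681982 ]])
>>> model.reconstruction_err_
0.513...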


Notes

This implements

C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/

P. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research 2004.

NNDSVD is introduced in

C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf

class mdp.nodes.ElasticNetCVScikitsLearnNode

Elastic Net model with iterative fitting along a regularization path

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The best model is selected by cross-validation.

Parameters

rho : float, optional
Float between 0 and 1 passed to ElasticNet (scaling between the L1 and L2 penalties). For rho = 0 the penalty is an L2 penalty. For rho = 1 it is an L1 penalty. For 0 < rho < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of values for rho is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1].
eps : float, optional
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
n_alphas : int, optional
Number of alphas along the regularization path
alphas : numpy array, optional
List of alphas where to compute the models. If None alphas are set automatically
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the implementation decides. The Gram matrix can also be passed as argument.
max_iter : int, optional
The maximum number of iterations
tol : float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
cv : integer or crossvalidation generator, optional
If an integer is passed, it is the number of folds (default 3). Specific cross-validation objects can be passed; see the sklearn.cross_validation module for the list of possible objects.
verbose : bool or integer
amount of verbosity
n_jobs : integer, optional
Number of CPUs to use during the cross validation. If ‘-1’, use all the CPUs. Note that this is used only if multiple values for rho are given.

Attributes

alpha_ : float
The amount of penalization chosen by cross-validation
rho_ : float
The compromise between L1 and L2 penalization chosen by cross-validation
coef_ : array, shape = (n_features,)
parameter vector (w in the cost function formula)
intercept_ : float
independent term in decision function.
mse_path_ : array, shape = (n_rho, n_alpha, n_folds)
mean square error for the test set on each fold, varying rho and alpha

Notes

See examples/linear_model/lasso_path_with_crossvalidation.py for an example.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the optimization objective is:

1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a * L1 + b * L2


for:

alpha = a + b and rho = a / (a + b)
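
The mapping above can be checked numerically. The sketch below is only an illustration under the assumption that L1 stands for ||w||_1 and L2 for 0.5 * ||w||^2_2, which is the reading that makes the two forms agree with the objective given earlier; the function names are hypothetical.

```python
def to_alpha_rho(a, b):
    """Map separate penalty weights (a, b) to the (alpha, rho) pair."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("a and b must be non-negative and not both zero")
    alpha = a + b
    return alpha, a / alpha


def penalty(w, alpha, rho):
    """Penalty term of the objective for a coefficient vector w."""
    l1 = sum(abs(wi) for wi in w)
    l2_squared = sum(wi * wi for wi in w)
    return alpha * rho * l1 + 0.5 * alpha * (1 - rho) * l2_squared


a, b = 0.3, 0.1
alpha, rho = to_alpha_rho(a, b)
w = [1.0, -2.0, 0.5]
# Same penalty written directly as a * L1 + b * L2 (assumed L2 = 0.5 * ||w||^2_2).
direct = a * sum(abs(wi) for wi in w) + b * 0.5 * sum(wi * wi for wi in w)
combined = penalty(w, alpha, rho)
```

With a = 0.3 and b = 0.1 this gives alpha = 0.4 and rho = 0.75, and both ways of writing the penalty produce the same value.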


enet_path ElasticNet

Full API documentation: ElasticNetCVScikitsLearnNode

class mdp.nodes.LassoLarsICScikitsLearnNode

Lasso model fit with Lars using BIC or AIC for model selection

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsIC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

AIC is the Akaike information criterion and BIC is the Bayesian information criterion. Such criteria are useful for selecting the value of the regularization parameter by making a trade-off between goodness of fit and model complexity. A good model should explain the data well while being simple.
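
To make the trade-off concrete, here is a minimal sketch of one common form of these criteria, n_samples * log(mean squared error) + K * degrees_of_freedom, with K = 2 for AIC and K = log(n_samples) for BIC. This particular expression and the function name are assumptions for illustration; this section does not state the exact formula used by the wrapped class.

```python
import math


def information_criterion(residuals, n_nonzero, criterion="bic"):
    """Score a fitted model from its residuals and number of nonzero coefficients.

    Lower is better. The mean squared error must be strictly positive.
    """
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    if criterion == "aic":
        k = 2.0
    elif criterion == "bic":
        k = math.log(n)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    return n * math.log(mse) + k * n_nonzero


# Two hypothetical candidates on 8 samples: the second fits slightly
# better but uses four coefficients instead of one.
simple_score = information_criterion([0.5] * 8, n_nonzero=1)
complex_score = information_criterion([0.45] * 8, n_nonzero=4)
```

In this toy comparison the simpler model gets the lower (better) BIC, because the small gain in fit does not offset the penalty for three extra coefficients.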

Parameters

criterion: ‘bic’ | ‘aic’
The type of criterion to use.
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the implementation decides. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform. Can be used for early stopping.
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.

Attributes

coef_ : array, shape = [n_features]
parameter vector (w in the formulation above)
intercept_ : float
independent term in decision function.
alpha_ : float
the alpha parameter chosen by the information criterion

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.LassoLarsIC(criterion='bic')
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
...
LassoLarsIC(copy_X=True, criterion='bic', eps=..., fit_intercept=True,
max_iter=500, normalize=True, precompute='auto',
verbose=False)
>>> print(clf.coef_)
[ 0.  -1.11...]


Notes

The estimation of the number of degrees of freedom is given by:

“On the degrees of freedom of the lasso” Hui Zou, Trevor Hastie, and Robert Tibshirani Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.

lars_path, LassoLars, LassoLarsCV

Full API documentation: LassoLarsICScikitsLearnNode

class mdp.nodes.RFEScikitsLearnNode

Feature ranking with recursive feature elimination.

This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFE class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and a weight is assigned to each of them. Then, the features whose absolute weights are the smallest are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is reached.
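
The elimination loop described above can be sketched in a few lines of pure Python. This is not the wrapped implementation: weight_fn is a hypothetical stand-in for an estimator's coef_, only integer step values are handled, and covariance_weights is a deliberately simple weighting chosen for the example.

```python
def rfe(X, y, weight_fn, n_features_to_select, step=1):
    """Return the indices of the features that survive elimination.

    weight_fn(X_sub, y) must return one weight per column of X_sub.
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_features_to_select:
        X_sub = [[row[j] for j in remaining] for row in X]
        weights = weight_fn(X_sub, y)
        # Positions ordered by absolute weight, smallest first.
        order = sorted(range(len(remaining)), key=lambda k: abs(weights[k]))
        n_drop = min(step, len(remaining) - n_features_to_select)
        dropped = set(order[:n_drop])
        remaining = [f for k, f in enumerate(remaining) if k not in dropped]
    return remaining


def covariance_weights(X, y):
    """Hypothetical weighting: covariance of each column with the target."""
    n = len(y)
    y_mean = sum(y) / n
    weights = []
    for col in zip(*X):
        c_mean = sum(col) / n
        weights.append(
            sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)) / n)
    return weights


X = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
y = [2, 4, 6, 8]
selected = rfe(X, y, covariance_weights, n_features_to_select=1)
```

On this toy data the constant column is removed first, then the weakly related column, leaving feature 0.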

Parameters

estimator : object

A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array.

For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.

n_features_to_select : int or None (default=None)
The number of features to select. If None, half of the features are selected.
step : int or float, optional (default=1)
If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.

Attributes

n_features_ : int
The number of selected features.
support_ : array of shape [n_features]
The mask of selected features.
ranking_ : array of shape [n_features]
The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

Examples

The following example shows how to retrieve the 5 informative features in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, 5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True,
False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])


References

 [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

Full API documentation: RFEScikitsLearnNode

class mdp.nodes.PCAScikitsLearnNode

Principal component analysis (PCA)

This node has been automatically generated by wrapping the sklearn.decomposition.pca.PCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.

The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.
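
For intuition about what the decomposition yields, the sketch below computes the fraction of variance carried by each principal component for two-feature data. It is an assumption-laden simplification: instead of an SVD it uses the closed-form eigenvalues of the 2 x 2 scatter matrix of the centered data, which equal the squared singular values. The function name is hypothetical.

```python
import math


def explained_variance_ratio_2d(X):
    """Fraction of total variance along each principal axis (2 features only)."""
    n = len(X)
    mx = sum(r[0] for r in X) / n
    my = sum(r[1] for r in X) / n
    sxx = sum((r[0] - mx) ** 2 for r in X)
    syy = sum((r[1] - my) ** 2 for r in X)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in X)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    total = lam1 + lam2  # must be positive, i.e. the data is not constant
    return lam1 / total, lam2 / total


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
first, second = explained_variance_ratio_2d(X)
```

For this small data set almost all of the variance lies along the first component, so the second ratio is close to zero and the two ratios sum to one.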

Parameters

n_components : int, None or string

Number of components to keep. if n_components is not set all components are kept:

n_components == min(n_samples, n_features)


If n_components == ‘mle’, Minka’s MLE is used to guess the dimension. If 0 < n_components < 1, the number of components is selected such that the amount of variance explained is greater than the fraction specified by n_components.

copy : bool
If False, data passed to fit are overwritten
whiten : bool, optional

When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

Attributes

components_ : array, [n_components, n_features]
Components with maximum variance.
explained_variance_ratio_ : array, [n_components]
Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0.

Notes

For n_components=’mle’, this class uses the method of Thomas P. Minka:

Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604

Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244...  0.00755...]


ProbabilisticPCA RandomizedPCA KernelPCA SparsePCA

Full API documentation: PCAScikitsLearnNode

class mdp.nodes.MultiTaskLassoScikitsLearnNode

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21

Where:

||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}

i.e. the sum of the Euclidean norms of the rows.
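
The mixed norm defined above is straightforward to compute directly. The following sketch, with a hypothetical function name and a made-up matrix, takes W as a list of rows.

```python
import math


def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


W = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
value = l21_norm(W)  # row norms are 5, 0 and 13
```

A row that is entirely zero contributes nothing, which is why this penalty tends to switch whole rows off together.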

Parameters

alpha : float, optional
Constant that multiplies the L1/L2 term. Defaults to 1.0
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
max_iter : int, optional
The maximum number of iterations
tol : float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

Attributes

coef_ : array, shape = (n_tasks, n_features)
parameter vector (W in the cost function formula)
intercept_ : array, shape = (n_tasks,)
independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskLasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
MultiTaskLasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
        normalize=False, tol=0.0001, warm_start=False)
>>> print(clf.coef_)
[[ 0.89393398  0.        ]
 [ 0.89393398  0.        ]]
>>> print(clf.intercept_)
[ 0.10606602  0.10606602]


Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

class mdp.nodes.RandomizedLogisticRegressionScikitsLearnNode

Randomized Logistic Regression

This node has been automatically generated by wrapping the sklearn.linear_model.randomized_l1.RandomizedLogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Randomized Regression works by resampling the train data and computing a LogisticRegression on each resampling. In short, the features selected more often are good features. It is also known as stability selection.
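
The resampling idea can be sketched as follows. This is a simplified illustration, not the wrapped algorithm: it omits the random feature rescaling, and select_fn is a hypothetical stand-in for fitting a sparse LogisticRegression and reading off its nonzero coefficients.

```python
import random


def stability_scores(X, y, select_fn, n_resampling=200,
                     sample_fraction=0.75, seed=0):
    """Fraction of random subsamples in which each feature is selected."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    n_sub = max(1, int(sample_fraction * n_samples))
    counts = [0] * n_features
    for _ in range(n_resampling):
        idx = rng.sample(range(n_samples), n_sub)
        chosen = select_fn([X[i] for i in idx], [y[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    return [c / n_resampling for c in counts]


def top_covariance_feature(X, y):
    """Hypothetical selector: the single feature that covaries most with y."""
    n = len(y)
    y_mean = sum(y) / n
    best, best_val = 0, -1.0
    for j, col in enumerate(zip(*X)):
        c_mean = sum(col) / n
        val = abs(sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)))
        if val > best_val:
            best, best_val = j, val
    return [best]


# Feature 0 tracks the label closely; feature 1 is unrelated noise.
X = [[0.0, 1.0], [1.0, 0.0], [10.0, 1.0], [11.0, 0.0], [0.5, 1.0], [10.5, 0.0]]
y = [0, 0, 1, 1, 0, 1]
scores = stability_scores(X, y, top_covariance_feature)
```

Features that are picked in most subsamples end up with scores near 1, which is the sense in which frequently selected features are considered good.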

Parameters

C : float
The regularization parameter C in the LogisticRegression.
scaling : float
The alpha parameter in the stability selection article used to randomly scale the features. Should be between 0 and 1.
sample_fraction : float
The fraction of samples to be used in each randomized design. Should be between 0 and 1. If 1, all samples are used.
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
tol : float, optional
tolerance for stopping criteria of LogisticRegression
n_jobs : integer, optional
Number of CPUs to use during the resampling. If ‘-1’, use all the CPUs
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
pre_dispatch : int, or string, optional

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

• None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
• An int, giving the exact number of total jobs that are spawned
• A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
memory : Instance of joblib.Memory or string
Used for internal caching. By default, no caching is done. If a string is given, it is the path to the caching directory.

Attributes

scores_ : array, shape = [n_features]
Feature scores between 0 and 1.
all_scores_ : array, shape = [n_features, n_reg_parameter]
Feature scores between 0 and 1 for all values of the regularization parameter. The reference article suggests scores_ is the max of all_scores_.

Examples

>>> from sklearn.linear_model import RandomizedLogisticRegression
>>> randomized_logistic = RandomizedLogisticRegression()


Notes

See examples/linear_model/plot_randomized_lasso.py for an example.

References

Stability selection Nicolai Meinshausen, Peter Buhlmann Journal of the Royal Statistical Society: Series B Volume 72, Issue 4, pages 417-473, September 2010 DOI: 10.1111/j.1467-9868.2010.00740.x

RandomizedLasso, Lasso, ElasticNet

Full API documentation: RandomizedLogisticRegressionScikitsLearnNode

class mdp.nodes.SelectFweScikitsLearnNode

Filter: Select the p-values corresponding to Family-wise error rate

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFwe class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

score_func: callable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues).
alpha: float, optional
The highest uncorrected p-value for features to keep

Full API documentation: SelectFweScikitsLearnNode

class mdp.nodes.MultiTaskElasticNetScikitsLearnNode

Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer

This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The optimization objective for MultiTaskElasticNet is:

(1 / (2 * n_samples)) * ||Y - XW||^2_Fro
+ alpha * rho * ||W||_21 + 0.5 * alpha * (1 - rho) * ||W||^2_Fro

Where:

||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}

i.e. the sum of the Euclidean norms of the rows.

Parameters

alpha : float, optional
Constant that multiplies the L1/L2 term. Defaults to 1.0
rho : float
The ElasticNet mixing parameter, with 0 < rho <= 1. For rho = 1 the penalty is an L1/L2 penalty. For 0 < rho < 1, the penalty is a combination of L1/L2 and L2; the L2 part dominates as rho approaches 0.
fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
max_iter : int, optional
The maximum number of iterations
tol : float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

Attributes

intercept_ : array, shape = (n_tasks,)
Independent term in decision function.
coef_ : array, shape = (n_tasks, n_features)
Parameter vector (W in the cost function formula). If a 1D y is passed in at fit (non multi-task usage), coef_ is then a 1D array

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskElasticNet(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
...
MultiTaskElasticNet(alpha=0.1, copy_X=True, fit_intercept=True,
        max_iter=1000, normalize=False, rho=0.5, tol=0.0001,
        warm_start=False)
>>> print(clf.coef_)
[[ 0.45663524  0.45612256]
 [ 0.45663524  0.45612256]]
>>> print(clf.intercept_)
[ 0.0872422  0.0872422]


Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.

class mdp.nodes.SparseCoderScikitsLearnNode

Sparse coding

This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.SparseCoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds a sparse representation of data against a fixed, precomputed dictionary.

Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that:

X ~= code * dictionary

Parameters

dictionary : array, [n_atoms, n_features]
The dictionary atoms used for sparse coding. Lines are assumed to be normalized to unit norm.
transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}

Algorithm used to transform the data:

• lars: uses the least angle regression method (linear_model.lars_path)
• lasso_lars: uses Lars to compute the Lasso solution
• lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
• omp: uses orthogonal matching pursuit to estimate the sparse solution
• threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'
transform_n_nonzero_coefs : int, 0.1 * n_features by default
Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
transform_alpha : float, 1. by default
If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
split_sign : bool, False by default
Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
n_jobs : int,
number of parallel jobs to run

Attributes

components_ : array, [n_atoms, n_features]
The unchanged dictionary atoms
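
As a small illustration of coding against a fixed dictionary, the sketch below follows a literal reading of the ‘threshold’ option: project each sample onto every atom and zero any coefficient whose absolute value is below alpha. Treating the threshold as a hard cut-off is an assumption (an implementation may shrink coefficients instead), and both function names are hypothetical.

```python
def threshold_code(X, dictionary, alpha):
    """Project each sample on each atom and zero the small coefficients."""
    code = []
    for x in X:
        row = []
        for atom in dictionary:
            proj = sum(xi * di for xi, di in zip(x, atom))
            row.append(proj if abs(proj) >= alpha else 0.0)
        code.append(row)
    return code


def reconstruct(code, dictionary):
    """Approximate X as code * dictionary."""
    n_features = len(dictionary[0])
    return [[sum(c * atom[j] for c, atom in zip(row, dictionary))
             for j in range(n_features)] for row in code]


# Two unit-norm atoms (rows) in a 2-feature space.
dictionary = [[1.0, 0.0], [0.0, 1.0]]
X = [[2.0, 0.1], [-0.05, 3.0]]
code = threshold_code(X, dictionary, alpha=0.5)
approx = reconstruct(code, dictionary)
```

Each sample keeps only its dominant coefficient here, so the reconstruction drops the small entries of X while preserving the large ones.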

DictionaryLearning MiniBatchDictionaryLearning SparsePCA MiniBatchSparsePCA sparse_encode

Full API documentation: SparseCoderScikitsLearnNode

class mdp.nodes.GMMScikitsLearnNode

Gaussian Mixture Model

This node has been automatically generated by wrapping the sklearn.mixture.gmm.GMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.

Initializes parameters such that every mixture component has zero mean and identity covariance.
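
To show what evaluating such a model involves, here is a minimal sketch of the log-density of one point under a mixture with diagonal covariances, using the log-sum-exp trick for numerical stability. The function name and argument layout are assumptions for illustration and do not mirror the wrapped class's methods.

```python
import math


def diag_gmm_logpdf(x, weights, means, covars):
    """Log-density of point x under a diagonal-covariance Gaussian mixture.

    weights: one mixing weight per component (summing to 1)
    means, covars: one list per component, each with one value per feature
    """
    log_terms = []
    for w, mu, var in zip(weights, means, covars):
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        quad = -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_norm + quad)
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


# A single standard normal component, evaluated at its mean.
single = diag_gmm_logpdf([0.0], [1.0], [[0.0]], [[1.0]])
# Two well-separated modes with hypothetical weights 0.25 and 0.75.
mixed = diag_gmm_logpdf([0.0], [0.25, 0.75], [[0.0], [10.0]], [[1.0], [1.0]])
```

At x = 0 the distant component contributes almost nothing, so the mixture log-density is close to log(0.25) plus the single-component value.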

Parameters

n_components : int, optional
Number of mixture components. Defaults to 1.
covariance_type : string, optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
random_state: RandomState or an int seed (0 by default)
A random number generator instance
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold.
n_iter : int, optional
Number of EM iterations to perform.
n_init : int, optional
Number of initializations to perform. The best result is kept.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.

Attributes

weights_ : array, shape (n_components,)
This attribute stores the mixing weights for each mixture component.
means_ : array, shape (n_components, n_features)
Mean parameters for each mixture component.
covars_ : array

Covariance parameters for each mixture component. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
converged_ : bool
True when convergence was reached in fit(), False otherwise.

DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm
VBGMM : Finite Gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.

Examples

>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=2, n_init=1, n_iter=100, params='wmc',
random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.75,  0.25])
>>> np.round(g.means_, 2)
array([[ 10.05],
[  0.06]])
>>> np.round(g.covars_, 2)
array([[[ 1.02]],
[[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0]...)
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] +  20 * [[10]])
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=2, n_init=1, n_iter=100, params='wmc',
random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.5,  0.5])


Full API documentation: GMMScikitsLearnNode

class mdp.nodes.DecisionTreeClassifierScikitsLearnNode

A decision tree classifier.

This node has been automatically generated by wrapping the sklearn.tree.tree.DecisionTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

criterion : string, optional (default=”gini”)
The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.
max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples required to be at a leaf node.
min_density : float, optional (default=0.1)
This parameter controls a trade-off in an optimization heuristic. It controls the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold the mask is recomputed and the input data is packed which results in data copying. If min_density equals to one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (aka sample masks).
max_features : int, string or None, optional (default=None)
The number of features to consider when looking for the best split. If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems. If “sqrt”, then max_features=sqrt(n_features). If “log2”, then max_features=log2(n_features). If None, then max_features=n_features.
compute_importances : boolean, optional (default=True)
Whether feature importances are computed and stored into the feature_importances_ attribute when calling fit.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
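
The two split criteria named above can be written out for a list of class labels. This is only a sketch: the base-2 logarithm for entropy and the helper impurity_decrease are assumptions made for the example, not a description of the wrapped tree code.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, measure):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    return (measure(parent)
            - len(left) / n * measure(left)
            - len(right) / n * measure(right))


parent = [0, 0, 1, 1]
gini_gain = impurity_decrease(parent, [0, 0], [1, 1], gini)
info_gain = impurity_decrease(parent, [0, 0], [1, 1], entropy)
```

A split that separates the two classes perfectly removes all impurity, so the decrease equals the parent's impurity under either measure.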

Attributes

tree_ : Tree object
The underlying Tree object.
feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature). The importance I(f) of a feature f is computed as the (normalized) total reduction of error brought by that feature. It is also known as the Gini importance [4].

See also

DecisionTreeRegressor

References

 [2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.
 [3] T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009.
 [4] L. Breiman, and A. Cutler, “Random Forests”, http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier

>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()

>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])


Full API documentation: DecisionTreeClassifierScikitsLearnNode

class mdp.nodes.PipelineScikitsLearnNode

Pipeline of transforms with a final estimator.

This node has been automatically generated by wrapping the sklearn.pipeline.Pipeline class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
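
To make the ‘__’ naming convention concrete, here is a minimal sketch of how names of the form step__param could be grouped by step before being handed to the individual steps. The function split_step_params is hypothetical and only illustrates the convention; it is not the code used by sklearn.pipeline.Pipeline.

```python
def split_step_params(params):
    """Group 'step__param' keyword names by step name."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        step, separator, name = key.partition('__')
        if not separator:
            raise ValueError("expected 'step__param', got %r" % key)
        grouped.setdefault(step, {})[name] = value
    return grouped


routed = split_step_params({'anova__k': 10, 'svc__C': 0.1})
print(routed)
```

Each inner dictionary holds the parameters intended for the step with that name, so that every step can be configured from a single flat set of keyword arguments.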

Parameters

steps: list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.

Attributes

steps : list of (name, object)
List of the named objects that compose the pipeline, in the order that they are applied on the data.

Examples

>>> from sklearn import svm
>>> from sklearn.datasets import samples_generator
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from sklearn.pipeline import Pipeline

>>> # generate some data to play with
>>> X, y = samples_generator.make_classification(
...     n_informative=5, n_redundant=0, random_state=42)

>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])

>>> # You can set the parameters using the names issued
>>> # For instance, fit using a k of 10 in the SelectKBest
>>> # and a parameter 'C' of the svm
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
...
Pipeline(steps=[...])

>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.75


Full API documentation: PipelineScikitsLearnNode

class mdp.nodes.GenericUnivariateSelectScikitsLearnNode

Univariate feature selector with configurable strategy

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.GenericUnivariateSelect class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

score_func: callable
Function taking two arrays X and y, and returning two arrays: scores and p-values.
mode: {‘percentile’, ‘k_best’, ‘fpr’, ‘fdr’, ‘fwe’}
Feature selection mode
param: float or int depending on the feature selection mode
Parameter of the corresponding mode

Full API documentation: GenericUnivariateSelectScikitsLearnNode

class mdp.nodes.BernoulliNBScikitsLearnNode

Naive Bayes classifier for multivariate Bernoulli models.

This node has been automatically generated by wrapping the sklearn.naive_bayes.BernoulliNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

Parameters

alpha: float, optional (default=1.0)
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
binarize: float or None, optional
Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.
fit_prior: boolean
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

Attributes

class_log_prior_ : array, shape = [n_classes]
Log probability of each class (smoothed).
feature_log_prob_ : array, shape = [n_classes, n_features]
Empirical log probability of features given a class, P(x_i|y).
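
The following sketch shows, under stated assumptions, how the alpha and binarize parameters could enter the estimate behind feature_log_prob_. It assumes that a feature is mapped to 1 when its value is strictly greater than the threshold, and that smoothing adds alpha to the count and 2 * alpha to the class size (one alpha per binary outcome). The function bernoulli_feature_log_prob is hypothetical and is not the sklearn implementation.

```python
from math import log


def bernoulli_feature_log_prob(X, y, alpha=1.0, binarize=0.0):
    """Return {class: [log P(x_i = 1 | class), ...]} with additive smoothing."""
    result = {}
    for label in sorted(set(y)):
        # Keep the rows of this class and threshold every feature to 0 or 1.
        rows = [[1 if value > binarize else 0 for value in row]
                for row, target in zip(X, y) if target == label]
        n_rows = len(rows)
        n_features = len(rows[0])
        result[label] = [
            # (count + alpha) / (n_rows + 2 * alpha): two outcomes per feature
            log((sum(row[i] for row in rows) + alpha) / (n_rows + 2.0 * alpha))
            for i in range(n_features)
        ]
    return result


X = [[0.0, 2.0], [0.0, 0.5], [3.0, 0.0]]
y = [1, 1, 2]
log_prob = bernoulli_feature_log_prob(X, y)
print(log_prob)
```

With alpha greater than zero, a feature that never fires for a class still receives a non-zero probability, so its logarithm stays finite.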

Examples

>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True)
>>> print(clf.predict(X[2]))
[3]


References

C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234–265.

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41–48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Full API documentation: BernoulliNBScikitsLearnNode

class mdp.nodes.LogisticRegressionScikitsLearnNode

Logistic Regression (aka logit, MaxEnt) classifier.

This node has been automatically generated by wrapping the sklearn.linear_model.logistic.LogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

In the multiclass case, the training algorithm uses a one-vs.-all (OvA) scheme, rather than the “true” multinomial LR.

This class implements L1 and L2 regularized logistic regression using the liblinear library. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

Parameters

penalty : string, ‘l1’ or ‘l2’
Used to specify the norm used in the penalization
dual : boolean
Dual or primal formulation. Dual formulation is only implemented for l2 penalty. Prefer dual=False when n_samples > n_features.
C : float, optional (default=1.0)
Specifies the strength of the regularization. The smaller it is the bigger is the regularization.
fit_intercept : bool, default: True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
intercept_scaling : float, default: 1
When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
class_weight : {dict, ‘auto’}, optional
Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The ‘auto’ mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
tol: float, optional
tolerance for stopping criteria
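
The description of intercept_scaling can be sketched in a few lines of plain Python. The helpers augment, decision_value and intercept are hypothetical names chosen for this illustration, and the weight values are made up; the sketch only mirrors the arithmetic described for the parameter, not the liblinear code.

```python
def augment(x, intercept_scaling=1.0):
    """Append the constant synthetic feature to an instance vector."""
    return list(x) + [intercept_scaling]


def decision_value(weights, x, intercept_scaling=1.0):
    """Dot product of the augmented instance with the full weight vector."""
    return sum(w * v for w, v in zip(weights, augment(x, intercept_scaling)))


def intercept(weights, intercept_scaling=1.0):
    """The intercept is intercept_scaling times the synthetic feature weight."""
    return intercept_scaling * weights[-1]


weights = [0.5, -1.0, 0.2]   # last entry is the synthetic feature weight
x = [2.0, 1.0]
value = decision_value(weights, x, intercept_scaling=10.0)
bias = intercept(weights, intercept_scaling=10.0)
print(value, bias)
```

A larger intercept_scaling lets a small (and therefore lightly penalized) synthetic weight produce the same intercept, which is why increasing it reduces the effect of regularization on the intercept.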

Attributes

coef_ : array, shape = [n_classes-1, n_features]

Coefficient of the features in the decision function.

coef_ is a read-only property derived from raw_coef_ that follows the internal memory layout of liblinear.

intercept_ : array, shape = [n_classes-1]
Intercept (a.k.a. bias) added to the decision function. It is available only when the fit_intercept parameter is set to True.

See also

LinearSVC

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

References:

LIBLINEAR – A Library for Large Linear Classification. http://www.csie.ntu.edu.tw/~cjlin/liblinear/

Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning 85(1-2):41-75. http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf

Full API documentation: LogisticRegressionScikitsLearnNode

class mdp.nodes.NuSVCScikitsLearnNode

NuSVC for sparse matrices (csr).

This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

See sklearn.svm.NuSVC for a complete list of parameters

Notes

For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm.sparse import NuSVC
>>> clf = NuSVC()
>>> clf.fit(X, y)
NuSVC(cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', nu=0.5, probability=False, shrinking=True, tol=0.001,
verbose=False)
>>> print(clf.predict([[-0.8, -1]]))
[ 1.]


Full API documentation: NuSVCScikitsLearnNode

class mdp.nodes.SparsePCAScikitsLearnNode

Sparse Principal Components Analysis (SparsePCA)

This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.SparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Parameters

n_components : int,
Number of sparse atoms to extract.
alpha : float,
Sparsity controlling parameter. Higher values lead to sparser components.
ridge_alpha : float,
Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.
max_iter : int,
Maximum number of iterations to perform.
tol : float,
Tolerance for the stopping condition.
method : {‘lars’, ‘cd’}
lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path). cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
n_jobs : int,
Number of parallel jobs to run.
U_init : array of shape (n_samples, n_atoms),
V_init : array of shape (n_atoms, n_features),
Initial values for the components for warm restart scenarios.

verbose :
Degree of verbosity of the printed output.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.

Attributes

components_ : array, [n_components, n_features]
Sparse components extracted from the data.
error_ : array
Vector of errors at each iteration.

See also

PCA, MiniBatchSparsePCA, DictionaryLearning

Full API documentation: SparsePCAScikitsLearnNode

class mdp.nodes.OrthogonalMatchingPursuitScikitsLearnNode

Orthogonal Matching Pursuit model (OMP)

This node has been automatically generated by wrapping the sklearn.linear_model.omp.OrthogonalMatchingPursuit class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

n_nonzero_coefs : int, optional
Desired number of non-zero entries in the solution. If None (by default) this value is set to 10% of n_features.
tol : float, optional
Maximum norm of the residual. If not None, overrides n_nonzero_coefs.
fit_intercept : boolean, optional
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If False, the regressors X are assumed to be already normalized.
precompute_gram : {True, False, ‘auto’},
Whether to use a precomputed Gram and Xy matrix to speed up calculations. Improves performance when n_targets or n_samples is very large. Note that if you already have such matrices, you can pass them directly to the fit method.
copy_X : bool, optional
Whether the design matrix X must be copied by the algorithm. A false value is only helpful if X is already Fortran-ordered, otherwise a copy is made anyway.
copy_Gram : bool, optional
Whether the gram matrix must be copied by the algorithm. A false value is only helpful if X is already Fortran-ordered, otherwise a copy is made anyway.
copy_Xy : bool, optional
Whether the covariance vector Xy must be copied by the algorithm. If False, it may be overwritten.

Attributes

coef_ : array, shape = (n_features,) or (n_features, n_targets)
parameter vector (w in the formulation)
intercept_ : float or array, shape =(n_targets,)
independent term in decision function.

Notes

Orthogonal matching pursuit was introduced in S. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, Vol. 41, No. 12. (December 1993), pp. 3397-3415. (http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf)

This implementation is based on Rubinstein, R., Zibulevsky, M. and Elad, M., Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit Technical Report - CS Technion, April 2008. http://www.cs.technion.ac.il/~ronrubin/Publications/KSVD-OMP-v2.pdf

See also

orthogonal_mp, orthogonal_mp_gram, lars_path, Lars, LassoLars, decomposition.sparse_encode

Full API documentation: OrthogonalMatchingPursuitScikitsLearnNode

class mdp.nodes.SelectFprScikitsLearnNode

Filter: Select the pvalues below alpha based on a FPR test.

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFpr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

FPR test stands for False Positive Rate test. It controls the total amount of false detections.

Parameters

score_func: callable
Function taking two arrays X and y, and returning two arrays: scores and p-values.
alpha: float, optional
The highest p-value for features to be kept
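
The selection rule itself is simple enough to sketch directly. The function select_fpr below is a hypothetical helper that works on a precomputed list of p-values; it assumes a strict comparison against alpha, which may differ from the boundary behaviour of the wrapped class.

```python
def select_fpr(pvalues, alpha=0.05):
    """Return the indices of the features whose p-value is below alpha."""
    return [index for index, p in enumerate(pvalues) if p < alpha]


pvalues = [0.001, 0.20, 0.04, 0.05]
kept = select_fpr(pvalues, alpha=0.05)
print(kept)
```

Every feature is tested against the same threshold, independently of how many features there are.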

Full API documentation: SelectFprScikitsLearnNode

class mdp.nodes.LabelEncoderScikitsLearnNode

Encode labels with value between 0 and n_classes-1.

This node has been automatically generated by wrapping the sklearn.preprocessing.LabelEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Attributes

classes_: array of shape [n_class]
Holds the label for each class.

Examples

LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])


It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
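
The behaviour shown in the examples above can be approximated in plain Python: the classes are the sorted unique labels, and the code of a label is its position in that list. The functions fit_classes, encode and decode are hypothetical names for this sketch and are not the LabelEncoder implementation.

```python
def fit_classes(labels):
    """Sorted unique labels; the position of a label is its code."""
    return sorted(set(labels))


def encode(classes, labels):
    """Map every label to its integer code."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels]


def decode(classes, codes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]


classes = fit_classes(["paris", "paris", "tokyo", "amsterdam"])
codes = encode(classes, ["tokyo", "tokyo", "paris"])
print(classes, codes, decode(classes, codes))
```

Building the set requires hashable labels and sorting requires comparable ones, which matches the requirement stated above for non-numerical labels.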


Full API documentation: LabelEncoderScikitsLearnNode

class mdp.nodes.QDAScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.qda.QDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

The model fits a Gaussian density to each class.
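
A one-dimensional sketch may help to see how fitting one Gaussian per class and applying Bayes' rule yields a classifier. The functions fit_gaussians and predict are hypothetical; the sketch assumes scalar features, priors estimated from class frequencies, and the unbiased variance estimate, and it is not the QDA implementation.

```python
from math import exp, pi, sqrt


def fit_gaussians(xs, ys):
    """Estimate (prior, mean, variance) for every class."""
    params = {}
    for label in sorted(set(ys)):
        values = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        params[label] = (len(values) / float(len(xs)), mean, var)
    return params


def predict(params, x):
    """Pick the class maximizing prior * Gaussian density (Bayes' rule)."""
    def score(label):
        prior, mean, var = params[label]
        density = exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)
        return prior * density
    return max(params, key=score)


xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [1, 1, 1, 2, 2, 2]
params = fit_gaussians(xs, ys)
print(predict(params, -0.8), predict(params, 0.5))
```

Because every class keeps its own variance, the logarithm of the score is a different quadratic function of x for each class, which is where the quadratic decision boundary comes from.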

Parameters

priors : array, optional, shape = [n_classes]
Priors on classes

Attributes

means_ : array-like, shape = [n_classes, n_features]
Class means
priors_ : array-like, shape = [n_classes]
Class priors (sum to 1)
covariances_ : list of array-like, shape = [n_features, n_features]
Covariance matrices of each class

Examples

>>> from sklearn.qda import QDA
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QDA()
>>> clf.fit(X, y)
QDA(priors=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]


See also

sklearn.lda.LDA: Linear discriminant analysis

Full API documentation: QDAScikitsLearnNode

class mdp.nodes.LogOddsEstimatorScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.LogOddsEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: LogOddsEstimatorScikitsLearnNode

class mdp.nodes.VectorizerScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.Vectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Full API documentation: VectorizerScikitsLearnNode

class mdp.nodes.SGDClassifierScikitsLearnNode

Linear model fitted by minimizing a regularized empirical loss with SGD.

This node has been automatically generated by wrapping the sklearn.linear_model.stochastic_gradient.SGDClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated each sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
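
The truncation at zero can be illustrated for a single weight and an L1 penalty. The function truncated_l1_step is a hypothetical, simplified sketch: it applies the loss gradient step first and then clips the shrinkage so that the weight never changes sign. The wrapped SGDClassifier may apply the penalty in a different (for example cumulative) way.

```python
def truncated_l1_step(weight, gradient, eta, alpha):
    """One SGD step on a single weight with an L1 penalty.

    The loss gradient step is applied first; the L1 shrinkage then moves the
    weight towards zero and is clipped so that it never crosses 0.0.
    """
    updated = weight - eta * gradient
    shrink = eta * alpha
    if updated > 0.0:
        return max(0.0, updated - shrink)
    if updated < 0.0:
        return min(0.0, updated + shrink)
    return 0.0


# A small weight is set exactly to zero instead of flipping sign.
print(truncated_l1_step(0.05, 0.0, eta=0.1, alpha=1.0))
# A large weight is only shrunk.
print(truncated_l1_step(1.0, 0.0, eta=0.1, alpha=1.0))
```

Weights that end up at exactly zero drop out of the decision function, which is how the penalty performs feature selection during training.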

This implementation works with data represented as dense or sparse arrays of floating point values for the features.

Parameters

loss : str, ‘hinge’ or ‘log’ or ‘modified_huber’
The loss function to be used. Defaults to ‘hinge’. The hinge loss is a margin loss used by standard linear SVM models. The ‘log’ loss is the loss of logistic regression models and can be used for probability estimation in binary classifiers. ‘modified_huber’ is another smooth loss that brings tolerance to outliers.
penalty : str, ‘l2’ or ‘l1’ or ‘elasticnet’
The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.
alpha : float
Constant that multiplies the regularization term. Defaults to 0.0001
rho : float
The Elastic Net mixing parameter, with 0 < rho <= 1. Defaults to 0.85.
fit_intercept: bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
n_iter: int, optional
The number of passes over the training data (aka epochs). Defaults to 5.
shuffle: bool, optional
Whether or not the training data should be shuffled after each epoch. Defaults to False.
seed: int, optional
The seed of the pseudo random number generator to use when shuffling the data.
verbose: integer, optional
The verbosity level
n_jobs: integer, optional
The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. -1 means ‘all CPUs’. Defaults to 1.
learning_rate : string, optional

The learning rate:

• constant: eta = eta0
• optimal: eta = 1.0/(t+t0) [default]
• invscaling: eta = eta0 / pow(t, power_t)
eta0 : double
The initial learning rate [default 0.01].
power_t : double
The exponent for inverse scaling learning rate [default 0.25].
class_weight : dict, {class_label : weight} or “auto” or None, optional

Preset for the class_weight fit parameter.

Weights associated with classes. If not given, all classes are supposed to have weight one.

The “auto” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.

warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
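
The three learning_rate schedules listed above can be written out as a small function. The name learning_rate and the default t0=1.0 are assumptions made for this sketch; the value of t0 used by the wrapped class is not given in this documentation.

```python
def learning_rate(schedule, t, eta0=0.01, power_t=0.25, t0=1.0):
    """Return the step size eta at iteration t (t >= 1)."""
    if schedule == 'constant':
        return eta0
    if schedule == 'optimal':
        return 1.0 / (t + t0)
    if schedule == 'invscaling':
        return eta0 / pow(t, power_t)
    raise ValueError('unknown schedule: %r' % schedule)


rates = [learning_rate('invscaling', t) for t in (1, 16, 81)]
print(rates)
```

Both ‘optimal’ and ‘invscaling’ decrease with t, so later samples change the model less than early ones; ‘constant’ keeps the same step size throughout.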

Attributes

coef_ : array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]

Weights assigned to the features.
intercept_ : array, shape = [1] if n_classes == 2 else [n_classes]
Constants in decision function.

Examples

>>> import numpy as np
>>> from sklearn import linear_model
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> clf = linear_model.SGDClassifier()
>>> clf.fit(X, Y)
...
SGDClassifier(alpha=0.0001, class_weight=None, epsilon=0.1, eta0=0.0,
fit_intercept=True, learning_rate='optimal', loss='hinge',
n_iter=5, n_jobs=1, penalty='l2', power_t=0.5, rho=0.85, seed=0,
shuffle=False, verbose=0, warm_start=False)
>>> print(clf.predict([[-0.8, -1]]))
[1]


LinearSVC, LogisticRegression, Perceptron

Full API documentation: SGDClassifierScikitsLearnNode

class mdp.nodes.LassoLarsScikitsLearnNode

Lasso model fit with Least Angle Regression a.k.a. Lars

This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

It is a Linear Model trained with an L1 prior as regularizer.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
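
For readers who prefer code to formulas, the objective above can be evaluated in plain Python. The function lasso_objective is a hypothetical helper that ignores the intercept; it only evaluates the objective for a given w and does not perform the Lars fit.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = len(X)
    residuals = [yi - sum(xij * wj for xij, wj in zip(row, w))
                 for row, yi in zip(X, y)]
    squared_error = sum(r * r for r in residuals)
    l1_norm = sum(abs(wj) for wj in w)
    return squared_error / (2.0 * n_samples) + alpha * l1_norm


X = [[-1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
y = [-1.0, 0.0, -1.0]

print(lasso_objective(X, y, [0.0, 0.0], alpha=0.01))
print(lasso_objective(X, y, [0.0, -1.0], alpha=0.01))
```

The second weight vector reproduces y exactly, so only the L1 term remains; a larger alpha makes such a fit more expensive relative to a sparser, smaller w.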

Parameters

fit_intercept : boolean
whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
verbose : boolean or integer, optional
Sets the verbosity amount
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the estimator decides. The Gram matrix can also be passed as argument.
max_iter: integer, optional
Maximum number of iterations to perform.
eps: float, optional
The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.
fit_path : boolean
If True the full path is stored in the coef_path_ attribute. If you compute the solution for a large problem or many targets, setting fit_path to False will lead to a speedup, especially with a small alpha.

Attributes

coef_path_ : array, shape = [n_features, n_alpha]
The varying values of the coefficients along the path. It is not present if fit_path parameter is False.
coef_ : array, shape = [n_features]
Parameter vector (w in the formulation).
intercept_ : float
Independent term in decision function.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.LassoLars(alpha=0.01)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1, 0, -1])
...
LassoLars(alpha=0.01, copy_X=True, eps=..., fit_intercept=True,
fit_path=True, max_iter=500, normalize=True, precompute='auto',
verbose=False)
>>> print(clf.coef_)
[ 0.         -0.963257...]


See also

lars_path, lasso_path, Lasso, LassoCV, LassoLarsCV, sklearn.decomposition.sparse_encode

http://en.wikipedia.org/wiki/Least_angle_regression

Full API documentation: LassoLarsScikitsLearnNode

class mdp.nodes.KernelPCAScikitsLearnNode

Kernel Principal component analysis (KPCA)

This node has been automatically generated by wrapping the sklearn.decomposition.kernel_pca.KernelPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Non-linear dimensionality reduction through the use of kernels.

Parameters

n_components: int or None
Number of components. If None, all non-zero components are kept.
kernel: “linear” | “poly” | “rbf” | “sigmoid” | “precomputed”
Kernel. Default: “linear”
degree : int, optional
Degree for poly, rbf and sigmoid kernels. Default: 3.
gamma : float, optional
Kernel coefficient for rbf and poly kernels. Default: 1/n_features.
coef0 : float, optional
Independent term in poly and sigmoid kernels.
alpha: int
Hyperparameter of the ridge regression that learns the inverse transform (when fit_inverse_transform=True). Default: 1.0
fit_inverse_transform: bool
Learn the inverse transform for non-precomputed kernels. (i.e. learn to find the pre-image of a point) Default: False
eigen_solver: string [‘auto’|‘dense’|‘arpack’]
Select eigensolver to use. If n_components is much less than the number of training samples, arpack may be more efficient than the dense eigensolver.
tol: float
Convergence tolerance for arpack. Default: 0 (optimal value will be chosen by arpack).
max_iter : int
Maximum number of iterations for arpack. Default: None (optimal value will be chosen by arpack).

Attributes

lambdas_, alphas_:
Eigenvalues and eigenvectors of the centered kernel matrix
dual_coef_:
Inverse transform matrix
X_transformed_fit_:
Projection of the fitted data on the kernel principal components

References

Kernel PCA was introduced in:

Bernhard Schoelkopf, Alexander J. Smola, and Klaus-Robert Mueller. 1999. Kernel principal component analysis. In Advances in kernel methods, MIT Press, Cambridge, MA, USA 327-352.

Full API documentation: KernelPCAScikitsLearnNode

class mdp.nodes.ScalerScikitsLearnNode

Standardize features by removing the mean and scaling to unit variance

This node has been automatically generated by wrapping the sklearn.preprocessing.Scaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
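
The fit-then-transform behaviour described here can be sketched without any library. The functions fit_scaler and transform are hypothetical; the sketch assumes the population standard deviation (division by the number of samples) and leaves constant features unscaled, which may not match the wrapped class in every detail.

```python
from math import sqrt


def fit_scaler(rows):
    """Per-feature mean and (population) standard deviation."""
    n = float(len(rows))
    columns = list(zip(*rows))
    means = [sum(column) / n for column in columns]
    stds = [sqrt(sum((v - m) ** 2 for v in column) / n)
            for column, m in zip(columns, means)]
    return means, stds


def transform(rows, means, stds):
    """Center and scale each feature; constant features are only centered."""
    return [[(v - m) / (s if s > 0.0 else 1.0)
             for v, m, s in zip(row, means, stds)]
            for row in rows]


train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, stds = fit_scaler(train)
scaled = transform(train, means, stds)
print(means, stds)
print(scaled)
```

The statistics come from the training set only; the same means and stds are then reused for any later data passed to transform.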

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

Parameters

with_mean : boolean, True by default
If True, center the data before scaling.
with_std : boolean, True by default
If True, scale the data to unit variance (or equivalently, unit standard deviation).
copy : boolean, optional, default is True
set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix and if axis is 1).

Attributes

mean_ : array of floats with shape [n_features]
The mean value for each feature in the training set.
std_ : array of floats with shape [n_features]
The standard deviation for each feature in the training set.

See also

sklearn.preprocessing.scale() to perform centering and scaling without using the Transformer object-oriented API

sklearn.decomposition.RandomizedPCA with whiten=True to further remove the linear correlation across features.

Full API documentation: ScalerScikitsLearnNode

class mdp.nodes.CCAScikitsLearnNode

CCA Canonical Correlation Analysis. CCA inherits from PLS with mode=”B” and deflation_mode=”canonical”.

This node has been automatically generated by wrapping the sklearn.pls.CCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

X : array-like of predictors, shape = [n_samples, p]
Training vectors, where n_samples is the number of samples and p is the number of predictors.
Y : array-like of response, shape = [n_samples, q]
Training vectors, where n_samples is the number of samples and q is the number of response variables.
n_components : int, (default 2).
number of components to keep.
scale : boolean, (default True)
Whether to scale the data.
max_iter : an integer, (default 500)
the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)
tol : non-negative real, default 1e-06.
the tolerance used in the iterative algorithm
copy : boolean
Whether the deflation should be done on a copy. Leave the default value of True unless you don’t care about side effects.

Attributes

x_weights_ : array, [p, n_components]
X block weights vectors.
y_weights_ : array, [q, n_components]
Y block weights vectors.
x_scores_ : array, [n_samples, n_components]
X scores.
y_scores_ : array, [n_samples, n_components]
Y scores.
x_rotations_ : array, [p, n_components]
X block to latents rotations.
y_rotations_ : array, [q, n_components]
Y block to latents rotations.

Notes

For each component k, find the weights u, v that maximize corr(Xk u, Yk v), such that |u| = |v| = 1.

Note that it maximizes only the correlations between the scores.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score.

Examples

>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> cca = CCA(n_components=1)
>>> cca.fit(X, Y)
...
CCA(copy=True, max_iter=500, n_components=1, scale=True, tol=1e-06)
>>> X_c, Y_c = cca.transform(X, Y)


References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

In French but still a reference:

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technip.

See also

PLSCanonical, PLSSVD

Full API documentation: CCAScikitsLearnNode

class mdp.nodes.KernelCentererScikitsLearnNode

Center a kernel matrix

This node has been automatically generated by wrapping the sklearn.preprocessing.KernelCenterer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This is equivalent to centering phi(X) with sklearn.preprocessing.Scaler(with_std=False).
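
The equivalence can be checked numerically for a linear kernel, where phi(X) is X itself. The functions center_kernel and linear_kernel are hypothetical helpers for this sketch; the centering uses the usual identity K_c[i][j] = K[i][j] - row_mean[i] - col_mean[j] + total_mean, which is assumed here rather than taken from the wrapped class.

```python
def center_kernel(K):
    """Center a square kernel matrix given as a list of rows."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def linear_kernel(rows):
    """Gram matrix of dot products between all pairs of rows."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
feature_means = [sum(column) / len(X) for column in zip(*X)]
X_centered = [[v - m for v, m in zip(row, feature_means)] for row in X]

K_centered = center_kernel(linear_kernel(X))   # center in kernel space
K_expected = linear_kernel(X_centered)         # center the features first
print(K_centered)
print(K_expected)
```

The two matrices agree up to rounding, which is the point of the node: the kernel can be centered without ever forming phi(X) explicitly.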

Full API documentation: KernelCentererScikitsLearnNode

class mdp.nodes.SelectFdrScikitsLearnNode

Filter: Select the p-values for an estimated false discovery rate

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFdr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This uses the Benjamini-Hochberg procedure. alpha is the target false discovery rate.

Parameters

score_func: callable
Function taking two arrays X and y, and returning two arrays: scores and p-values.
alpha: float, optional
The highest uncorrected p-value for features to keep
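
A textbook version of the Benjamini-Hochberg procedure is sketched below for a precomputed list of p-values. The function select_fdr is hypothetical: it sorts the p-values, finds the largest rank k with p_(k) <= alpha * k / m, and keeps the k smallest. The wrapped SelectFdr class may differ from this sketch in its details.

```python
def select_fdr(pvalues, alpha=0.05):
    """Return sorted indices of features kept by Benjamini-Hochberg."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_kept = 0
    for rank, index in enumerate(order, start=1):
        # Remember the largest rank k with p_(k) <= alpha * k / m.
        if pvalues[index] <= alpha * rank / m:
            n_kept = rank
    return sorted(order[:n_kept])


pvalues = [0.001, 0.30, 0.012, 0.04, 0.80]
kept = select_fdr(pvalues, alpha=0.05)
print(kept)
```

Note that the feature with p-value 0.04 is dropped even though it is below alpha: the threshold for the third smallest p-value is 0.05 * 3 / 5 = 0.03.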

Full API documentation: SelectFdrScikitsLearnNode

class mdp.nodes.ExtraTreeClassifierScikitsLearnNode

An extremely randomized tree classifier.

This node has been automatically generated by wrapping the sklearn.tree.tree.ExtraTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
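
The split search described above can be sketched as follows. This is a simplified illustration in plain Python, not the wrapped implementation: the function names are hypothetical, thresholds are drawn uniformly between the minimum and maximum of each candidate feature, and weighted Gini impurity is assumed as the quality measure.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / float(n)) ** 2 for c in set(labels))


def random_split(X, y, max_features, rng):
    """Draw one random threshold per candidate feature and keep the best."""
    n_features = len(X[0])
    best = None
    for f in rng.sample(range(n_features), max_features):
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # a constant feature cannot separate the samples
        threshold = rng.uniform(lo, hi)
        left = [label for v, label in zip(column, y) if v <= threshold]
        right = [label for v, label in zip(column, y) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / float(len(y))
        if best is None or score < best[0]:
            best = (score, f, threshold)
    return best  # (weighted impurity, feature index, threshold)


X = [[0.0, 1.0, 5.0], [1.0, 0.5, 4.0], [2.0, 0.2, 6.0], [3.0, 0.9, 5.5]]
y = [0, 0, 1, 1]
best = random_split(X, y, max_features=2, rng=random.Random(0))
print(best)
```

With max_features=1 a single feature and a single threshold are drawn, so no comparison between candidate splits takes place.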

Warning: Extra-trees should only be used within ensemble methods.

See also

ExtraTreeRegressor, ExtraTreesClassifier, ExtraTreesRegressor

References

 [1] P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Full API documentation: ExtraTreeClassifierScikitsLearnNode

class mdp.nodes.SelectKBestScikitsLearnNode

Filter: Select the k lowest p-values.

This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectKBest class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

score_func: callable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues).
k: int, optional
Number of top features to select.

Notes

Ties between features with equal p-values will be broken in an unspecified way.

Full API documentation: SelectKBestScikitsLearnNode

class mdp.nodes.NormalizerScikitsLearnNode

Normalize samples individually to unit norm

This node has been automatically generated by wrapping the sklearn.preprocessing.Normalizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.

This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).

Scaling inputs to unit norms is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
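
As a rough illustration of row-wise normalization and of the cosine-similarity remark, here is a plain Python sketch. It does not use the wrapped class; normalize_rows and dot are hypothetical helper names, and rows that are entirely zero are left untouched, as the description above suggests.

```python
import math


def normalize_rows(rows, norm="l2"):
    """Rescale each non-zero row to unit l1 or l2 norm."""
    out = []
    for row in rows:
        if norm == "l1":
            size = sum(abs(v) for v in row)
        elif norm == "l2":
            size = math.sqrt(sum(v * v for v in row))
        else:
            raise ValueError("norm must be 'l1' or 'l2'")
        out.append([v / size for v in row] if size > 0 else list(row))
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


X = [[3.0, 4.0], [0.0, 0.0], [1.0, -1.0]]
X_l2 = normalize_rows(X, "l2")
X_l1 = normalize_rows(X, "l1")

# The dot product of two l2-normalized rows is their cosine similarity.
cosine = dot(X_l2[0], X_l2[2])
print(X_l2, X_l1, cosine)
```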

Parameters

norm : ‘l1’ or ‘l2’, optional (‘l2’ by default)
The norm to use to normalize each non zero sample.
copy : boolean, optional, default is True
Set to False to perform in-place row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Notes

This estimator is stateless (besides constructor parameters). The fit method does nothing, but it is useful when the estimator is used in a pipeline.

See also

sklearn.preprocessing.normalize(): equivalent function without the object-oriented API.

Full API documentation: NormalizerScikitsLearnNode

class mdp.nodes.TfidfTransformerScikitsLearnNode

Transform a count matrix to a normalized tf or tf–idf representation

This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Tf means term-frequency while tf–idf means term-frequency times inverse document-frequency. This is a common term weighting scheme in information retrieval, that has also found good use in document classification.

The goal of using tf–idf instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.

In the SMART notation used in IR, this class implements several tf–idf variants. Tf is always “n” (natural), idf is “t” iff use_idf is given, “n” otherwise, and normalization is “c” iff norm=’l2’, “n” iff norm=None.

Parameters

norm : ‘l1’, ‘l2’ or None, optional
Norm used to normalize term vectors. None for no normalization.
use_idf : boolean, optional
Enable inverse-document-frequency reweighting.
smooth_idf : boolean, optional
Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.
sublinear_tf : boolean, optional
Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).
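
The weighting scheme can be sketched in plain Python as follows. This is one common formulation, shown for illustration only: the exact constants used by the wrapped class may differ, the function name tfidf is hypothetical, and only l2 normalization is implemented here.

```python
import math


def tfidf(counts, use_idf=True, smooth_idf=True, sublinear_tf=False):
    """Weight per-document term counts and l2-normalize each document."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for doc in counts if doc[j] > 0) for j in range(n_terms)]
    extra = 1 if smooth_idf else 0  # pretend one extra document holds every term
    if use_idf:
        idf = [math.log(float(n_docs + extra) / (df[j] + extra)) + 1.0
               for j in range(n_terms)]
    else:
        idf = [1.0] * n_terms
    weighted = []
    for doc in counts:
        row = []
        for j, tf in enumerate(doc):
            if sublinear_tf and tf > 0:
                tf = 1.0 + math.log(tf)
            row.append(tf * idf[j])
        norm = math.sqrt(sum(v * v for v in row))
        weighted.append([v / norm for v in row] if norm > 0 else row)
    return weighted


# Term 0 occurs in every document; terms 1 and 2 occur in one document each.
counts = [[3, 0, 1], [2, 0, 0], [3, 1, 0]]
weights = tfidf(counts)
print(weights)
```

In the last document the rare term 1 receives a higher weight per occurrence than the ubiquitous term 0, which is the down-weighting effect described above.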

References

 [Yates2011] R. Baeza-Yates and B. Ribeiro-Neto (2011). Modern Information Retrieval. Addison Wesley, pp. 68–74.
 [MSR2008] C.D. Manning, H. Schütze and P. Raghavan (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 121–125.

Full API documentation: TfidfTransformerScikitsLearnNode

class mdp.nodes.GradientBoostingClassifierScikitsLearnNode

This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.GradientBoostingClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.

Parameters

loss : {‘deviance’}, optional (default=’deviance’)
loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs.
learn_rate : float, optional (default=0.1)
learning rate shrinks the contribution of each tree by learn_rate. There is a trade-off between learn_rate and n_estimators.
n_estimators : int (default=100)
The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
max_depth : integer, optional (default=3)
maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples required to be at a leaf node.
subsample : float, optional (default=1.0)
The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
max_features : int, None, optional (default=None)
The number of features to consider when looking for the best split. Features are chosen randomly at each split point. If None, then max_features=n_features. Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

Attributes

feature_importances_ : array, shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : array, shape = [n_estimators]
Score of the training dataset obtained using an out-of-bag estimate. The i-th score oob_score_[i] is the deviance (= loss) of the model at iteration i on the out-of-bag sample.
train_score_ : array, shape = [n_estimators]
The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.
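
The forward stage-wise idea and the role of learn_rate can be illustrated with a deliberately simplified sketch. Note the assumptions: it uses squared-error loss on a single feature with one-split regression stumps rather than the deviance loss and full trees described above, and all names (fit_stump, boost) are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump (one threshold) on a 1-D feature."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lmean = sum(left) / float(len(left))
        rmean = sum(right) / float(len(right))
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean


def boost(x, y, n_estimators=100, learn_rate=0.1):
    """Additive model built stage by stage on the negative gradient."""
    init = sum(y) / float(len(y))
    pred = [init] * len(y)
    stumps, train_score = [], []
    for _ in range(n_estimators):
        # For squared error the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink the contribution of each new learner by learn_rate.
        pred = [pi + learn_rate * stump(xi) for xi, pi in zip(x, pred)]
        train_score.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

    def predict(xi):
        return init + learn_rate * sum(s(xi) for s in stumps)

    return predict, train_score


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.0, 0.2, 1.0, 0.9, 1.1]
predict, train_score = boost(x, y, n_estimators=50, learn_rate=0.1)
print(train_score[0], train_score[-1], predict(4.5))
```

A smaller learn_rate makes each stage contribute less, so more stages are needed to reach the same training loss; this is the trade-off with n_estimators mentioned above.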

Examples

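>>> samples = [[0, 0, 2], [1, 0, 0]]
>>> labels = [0, 1]
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> gb = GradientBoostingClassifier().fit(samples, labels)
>>> print(gb.predict([[0.5, 0, 0]]))
[0]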


See also

sklearn.tree.DecisionTreeClassifier, RandomForestClassifier

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999.

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

class mdp.nodes.GMMHMMScikitsLearnNode

Hidden Markov Model with Gaussian mixture emissions

This node has been automatically generated by wrapping the sklearn.hmm.GMMHMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Attributes

init_params : string, optional
Controls which parameters are initialized prior to training. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars, etc. Defaults to all parameters.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars, etc. Defaults to all parameters.
n_components : int
Number of states in the model.
transmat : array, shape (n_components, n_components)
Matrix of transition probabilities between states.
startprob : array, shape (n_components,)
Initial state occupation distribution.
gmms : array of GMM objects, length n_components
GMM emission distributions for each state.
random_state : RandomState or an int seed (0 by default)
A random number generator instance
n_iter : int, optional
Number of iterations to perform.
thresh : float, optional
Convergence threshold.

Examples

>>> from sklearn.hmm import GMMHMM
>>> GMMHMM(n_components=2, n_mix=10, covariance_type='diag')
...
GMMHMM(algorithm='viterbi', covariance_type='diag',...


See also

GaussianHMM : HMM with Gaussian emissions

Full API documentation: GMMHMMScikitsLearnNode

class mdp.nodes.DecisionTreeRegressorScikitsLearnNode

A tree regressor.

This node has been automatically generated by wrapping the sklearn.tree.tree.DecisionTreeRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

criterion : string, optional (default=”mse”)
The function to measure the quality of a split. The only supported criterion is “mse” for the mean squared error.
max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples required to be at a leaf node.
min_density : float, optional (default=0.1)
This parameter controls a trade-off in an optimization heuristic. It controls the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold, the mask is recomputed and the input data is packed, which results in data copying. If min_density equals one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (aka sample masks).
max_features : int, string or None, optional (default=None)
The number of features to consider when looking for the best split. If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems. If “sqrt”, then max_features=sqrt(n_features). If “log2”, then max_features=log2(n_features). If None, then max_features=n_features.
compute_importances : boolean, optional (default=True)
Whether feature importances are computed and stored into the feature_importances_ attribute when calling fit.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
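
The max_features rules listed above can be summarized by a small helper. This is only a sketch: the function name resolve_max_features and the is_classification flag are hypothetical, and truncating the square root and logarithm to an integer is an assumption about rounding.

```python
import math


def resolve_max_features(max_features, n_features, is_classification=False):
    """Translate a max_features setting into a number of features."""
    if max_features is None:
        return n_features
    if max_features == "auto":
        # sqrt(n_features) for classification, all features for regression
        return int(math.sqrt(n_features)) if is_classification else n_features
    if max_features == "sqrt":
        return int(math.sqrt(n_features))
    if max_features == "log2":
        return int(math.log2(n_features))
    return int(max_features)  # an explicit integer is used as given


for setting in (None, "auto", "sqrt", "log2", 3):
    print(setting, resolve_max_features(setting, 16))
```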

Attributes

tree_ : Tree object
The underlying Tree object.
feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature). The importance I(f) of a feature f is computed as the (normalized) total reduction of error brought by that feature. It is also known as the Gini importance [4].

See also

DecisionTreeClassifier

References

 [2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.
 [3] T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009.
 [4] L. Breiman, and A. Cutler, “Random Forests”, http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Examples

>>> from sklearn.datasets import load_boston
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor

>>> boston = load_boston()
>>> regressor = DecisionTreeRegressor(random_state=0)


R2 scores (a.k.a. coefficient of determination) over 10-fold CV:

>>> cross_val_score(regressor, boston.data, boston.target, cv=10)
...
...
array([ 0.61..., 0.57..., -0.34..., 0.41..., 0.75...,
0.07..., 0.29..., 0.33..., -1.42..., -1.77...])


Full API documentation: DecisionTreeRegressorScikitsLearnNode

class mdp.nodes.RidgeScikitsLearnNode

Linear least squares with l2 regularization.

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.Ridge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape [n_samples, n_responses]).

Parameters

alpha : float
Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
tol : float
Precision of the solution.
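
To illustrate how alpha shrinks the estimate, here is a minimal closed-form sketch for the special case of a single feature with an unpenalized intercept. It is plain Python rather than the wrapped solver, and the name ridge_fit_1d is hypothetical.

```python
def ridge_fit_1d(x, y, alpha=1.0):
    """Minimize sum((y - w*x - b)**2) + alpha * w**2 for one feature."""
    n = float(len(x))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    coef = sxy / (sxx + alpha)         # alpha inflates the denominator
    intercept = y_mean - coef * x_mean
    return coef, intercept


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
ols = ridge_fit_1d(x, y, alpha=0.0)    # alpha = 0 gives ordinary least squares
ridge = ridge_fit_1d(x, y, alpha=5.0)  # a larger alpha pulls the slope toward 0
print(ols, ridge)
```

With several features the same idea applies, with alpha added to the diagonal of the normal equations.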

Attributes

coef_ : array, shape = [n_features] or [n_responses, n_features]
Weight vector(s).

See also

RidgeClassifier, RidgeCV

Examples

>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, normalize=False,
tol=0.001)


Full API documentation: RidgeScikitsLearnNode

class mdp.nodes.SVRScikitsLearnNode

epsilon-Support Vector Regression.

This node has been automatically generated by wrapping the sklearn.svm.classes.SVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The free parameters in the model are C and epsilon.

The implementation is based on libsvm.

Parameters

C : float, optional (default=1.0)
penalty parameter C of the error term.
epsilon : float, optional (default=0.1)
epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
kernel : string, optional (default=’rbf’)
Specifies the kernel type to be used in the algorithm. One of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given, ‘rbf’ will be used.
degree : int, optional (default=3)
Degree of the polynomial kernel function (‘poly’). It is ignored by the other kernels.
gamma : float, optional (default=0.0)
kernel coefficient for rbf and poly, if gamma is 0.0 then 1/n_features will be taken.
coef0 : float, optional (default=0.0)
independent term in kernel function. It is only significant in poly/sigmoid.
probability: boolean, optional (default=False)
Whether to enable probability estimates. This must be enabled prior to calling predict_proba.
shrinking: boolean, optional (default=True)
Whether to use the shrinking heuristic.
tol: float, optional (default=1e-3)
Tolerance for stopping criterion.
cache_size: float, optional
Specify the size of the kernel cache (in MB)
verbose : bool, default: False
Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
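
The role of epsilon can be illustrated with the epsilon-insensitive loss itself. The sketch below is plain Python and only evaluates the loss for given predictions; it does not train a model, and the function name is hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon-tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

The first prediction lies within epsilon of its target and contributes no loss; the other two are penalized only for the part of the error that exceeds epsilon.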

Attributes

support_ : array-like, shape = [n_SV]
Index of support vectors.
support_vectors_ : array-like, shape = [n_SV, n_features]
Support vectors.
dual_coef_ : array, shape = [n_classes-1, n_SV]
Coefficients of the support vector in the decision function.
coef_ : array, shape = [n_classes-1, n_features]

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of linear kernel.

coef_ is a read-only property derived from dual_coef_ and support_vectors_.

intercept_ : array, shape = [n_class * (n_class-1) / 2]
Constants in decision function.

Examples

>>> from sklearn.svm import SVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = SVR(C=1.0, epsilon=0.2)
>>> clf.fit(X, y)
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.2, gamma=0.0,
kernel='rbf', probability=False, shrinking=True, tol=0.001,
verbose=False)


See also

NuSVR
Support Vector Machine for regression implemented using libsvm, with a parameter to control the number of support vectors.

Full API documentation: SVRScikitsLearnNode

class mdp.nodes.RFECVScikitsLearnNode

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.

This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFECV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

estimator : object

A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array.

For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.

step : int or float, optional (default=1)
If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.
cv : int or cross-validation generator, optional (default=None)
If int, it is the number of folds. If None, 3-fold cross-validation is performed by default. Specific cross-validation objects can also be passed, see sklearn.cross_validation module for details.
loss_function : function, optional (default=None)
The loss function to minimize by cross-validation. If None, then the score function of the estimator is maximized.
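
The elimination loop and the meaning of step can be sketched as follows. This illustrates plain recursive feature elimination only, without the cross-validated choice of the number of features. The names rfe_ranking, importance and n_to_select are hypothetical, the importance callback stands in for refitting the estimator and reading the absolute values of coef_, and removing at least one feature per round for fractional step values is an assumption.

```python
def rfe_ranking(n_features, importance, n_to_select=1, step=1):
    """Repeatedly drop the least important features; return (selected, ranking)."""
    if 0.0 < step < 1.0:
        n_remove = max(1, int(step * n_features))  # fraction, rounded down
    else:
        n_remove = int(step)
    active = list(range(n_features))
    ranking = [1] * n_features
    while len(active) > n_to_select:
        scores = importance(active)  # one score per feature still active
        k = min(n_remove, len(active) - n_to_select)
        order = sorted(range(len(active)), key=lambda i: scores[i])
        dropped = set(active[i] for i in order[:k])
        active = [f for f in active if f not in dropped]
        # Every feature eliminated so far moves one rank further down.
        for f in range(n_features):
            if f not in active:
                ranking[f] += 1
    return active, ranking


# Fixed weights standing in for the coef_ values of a refitted estimator.
weights = [0.9, 0.1, 0.5, 0.05, 0.7]
selected, ranking = rfe_ranking(
    5, lambda active: [abs(weights[i]) for i in active], n_to_select=2, step=1)
print(selected, ranking)
```

Selected features keep rank 1, and features eliminated earlier end up with larger rank values.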

Attributes

n_features_ : int
The number of selected features with cross-validation.
support_ : array of shape [n_features]
The mask of selected features.
ranking_ : array of shape [n_features]
The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.
cv_scores_ : array of shape [n_subsets_of_features]
The cross-validation scores such that cv_scores_[i] corresponds to the CV score of the i-th subset of features.

Examples

The following example shows how to retrieve the 5 informative features, which are not known a priori, in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True,
False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])


References

 [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

Full API documentation: RFECVScikitsLearnNode

class mdp.nodes.BayesianRidgeScikitsLearnNode

Bayesian ridge regression

This node has been automatically generated by wrapping the sklearn.linear_model.bayes.BayesianRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Fit a Bayesian ridge model and optimize the regularization parameters lambda (precision of the weights) and alpha (precision of the noise).

Parameters

X : array, shape = (n_samples, n_features)
Training vectors.
y : array, shape = (length)
Target values for training vectors
n_iter : int, optional
Maximum number of iterations. Default is 300.
tol : float, optional
Stop the algorithm if w has converged. Default is 1.e-3.
alpha_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter. Default is 1.e-6
alpha_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
lambda_1 : float, optional
Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
lambda_2 : float, optional
Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter. Default is 1.e-6
compute_score : boolean, optional
If True, compute the objective function at each step of the model. Default is False
fit_intercept : boolean, optional
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered). Default is True.
normalize : boolean, optional, default False
If True, the regressors X are normalized
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
verbose : boolean, optional, default False
Verbose mode when fitting the model.

Attributes

coef_ : array, shape = (n_features)
Coefficients of the regression model (mean of distribution)
alpha_ : float
estimated precision of the noise.
lambda_ : array, shape = (n_features)
estimated precisions of the weights.
scores_ : float
if computed, value of the objective function (to be maximized)

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.BayesianRidge()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
...
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
copy_X=True, fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06,
n_iter=300, normalize=False, tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])


Notes

See examples/linear_model/plot_bayesian_ridge.py for an example.

Full API documentation: BayesianRidgeScikitsLearnNode

class mdp.nodes.PLSRegressionScikitsLearnNode

PLS regression

This node has been automatically generated by wrapping the sklearn.pls.PLSRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

PLSRegression implements the PLS 2 blocks regression known as PLS2 or PLS1 in case of one dimensional response. This class inherits from _PLS with mode=”A”, deflation_mode=”regression”, norm_y_weights=False and algorithm=”nipals”.

Parameters

X : array-like of predictors, shape = [n_samples, p]
Training vectors, where n_samples is the number of samples and p is the number of predictors.
Y : array-like of response, shape = [n_samples, q]
Training vectors, where n_samples is the number of samples and q is the number of response variables.
n_components : int, (default 2)
Number of components to keep.
scale : boolean, (default True)
whether to scale the data
max_iter : an integer, (default 500)
the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)
tol : non-negative real
Tolerance used in the iterative algorithm (default 1e-06).
copy : boolean, default True
Whether the deflation should be done on a copy. Leave the default value of True unless you don’t care about side effects.

Attributes

x_weights_ : array, [p, n_components]
X block weights vectors.
y_weights_ : array, [q, n_components]
Y block weights vectors.
x_scores_ : array, [n_samples, n_components]
X scores.
y_scores_ : array, [n_samples, n_components]
Y scores.
x_rotations_ : array, [p, n_components]
X block to latents rotations.
y_rotations_ : array, [q, n_components]
Y block to latents rotations.
coefs: array, [p, q]
The coefficients of the linear model: Y = X coefs + Err

Notes

For each component k, find weights u, v that optimize:

max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = 1

Note that it maximizes both the correlations between the scores and the intra-block variances.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current X score. This performs the PLS regression known as PLS2. This mode is prediction oriented.

This implementation provides the same results as three PLS packages available in the R language (R-project):

• “mixOmics” with function pls(X, Y, mode = “regression”)
• “plspm” with function plsreg2(X, Y)
• “pls” with function oscorespls.fit(X, Y)

Examples

>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> pls2 = PLSRegression(n_components=2)
>>> pls2.fit(X, Y)
...
PLSRegression(copy=True, max_iter=500, n_components=2, scale=True,
tol=1e-06)
>>> Y_pred = pls2.predict(X)


References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

In French, but still a reference:

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

Full API documentation: PLSRegressionScikitsLearnNode

class mdp.nodes.ProbabilisticPCAScikitsLearnNode

Additional layer on top of PCA that adds a probabilistic evaluation

Principal component analysis (PCA)

This node has been automatically generated by wrapping the sklearn.decomposition.pca.ProbabilisticPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.

The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.

Parameters

n_components : int, None or string

Number of components to keep. If n_components is not set, all components are kept:

n_components == min(n_samples, n_features)

If n_components == ‘mle’, Minka’s MLE is used to guess the dimension.

If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.

copy : bool
If False, data passed to fit are overwritten
whiten : bool, optional

When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

Attributes

components_ : array, [n_components, n_features]
Components with maximum variance.
explained_variance_ratio_ : array, [n_components]
Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0.

Notes

For n_components=’mle’, this class uses the method of Thomas P. Minka:

Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604

Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244...  0.00755...]
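
For this two-feature dataset the explained variance ratios can be cross-checked by hand. The sketch below is plain Python and does not use the wrapped class; the function name is hypothetical. It relies on the closed-form eigenvalues of the 2x2 scatter matrix, which is an alternative route to the SVD used by the implementation.

```python
import math


def explained_variance_ratio_2d(X):
    """Variance ratios of the two principal components of 2-feature data."""
    n = float(len(X))
    mx = sum(row[0] for row in X) / n
    my = sum(row[1] for row in X) / n
    sxx = sum((row[0] - mx) ** 2 for row in X)
    syy = sum((row[1] - my) ** 2 for row in X)
    sxy = sum((row[0] - mx) * (row[1] - my) for row in X)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = (trace - disc) / 2.0
    return [lam1 / trace, lam2 / trace]


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
ratios = explained_variance_ratio_2d(X)
print(ratios)
```

The ratios do not depend on whether the scatter matrix is divided by n or n - 1, since the normalization cancels.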


See also

ProbabilisticPCA, RandomizedPCA, KernelPCA, SparsePCA

Full API documentation: ProbabilisticPCAScikitsLearnNode

class mdp.nodes.LinearRegressionScikitsLearnNode

Ordinary least squares Linear Regression.

This node has been automatically generated by wrapping the sklearn.linear_model.base.LinearRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Attributes

coef_ : array
Estimated coefficients for the linear regression problem.
intercept_ : array
Independent term in the linear model.

Parameters

fit_intercept : boolean, optional
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized

Notes

From the implementation point of view, this is just plain Ordinary Least Squares (numpy.linalg.lstsq) wrapped as a predictor object.

Full API documentation: LinearRegressionScikitsLearnNode

class mdp.nodes.LabelBinarizerScikitsLearnNode

Binarize labels in a one-vs-all fashion

This node has been automatically generated by wrapping the sklearn.preprocessing.LabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Several regression and binary classification algorithms are available in the scikit. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
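
The one-vs-all scheme described above can be sketched in plain Python. This mimics the idea of transform and inverse_transform without using the wrapped class; the function names and the confidence scores are hypothetical.

```python
def fit_classes(labels):
    """Sorted list of the distinct labels seen during fitting."""
    return sorted(set(labels))


def binarize(labels, classes, neg_label=0, pos_label=1):
    """One column per class: pos_label where the sample belongs to it."""
    return [[pos_label if c == y else neg_label for c in classes]
            for y in labels]


def inverse(scores, classes):
    """Pick, for each row, the class whose model gave the largest score."""
    return [classes[max(range(len(row)), key=row.__getitem__)]
            for row in scores]


classes = fit_classes([1, 2, 6, 4, 2])
encoded = binarize([1, 6], classes)

# Hypothetical per-class confidences from four binary models.
confidences = [[0.1, 0.7, 0.2, 0.0], [0.3, 0.1, 0.2, 0.4]]
decoded = inverse(confidences, classes)
print(classes, encoded, decoded)
```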

Parameters

neg_label: int (default: 0)
Value with which negative labels must be encoded.
pos_label: int (default: 1)
Value with which positive labels must be encoded.

Attributes

classes_: array of shape [n_class]
Holds the label for each class.

Examples

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> lb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> lb.classes_
array([1, 2, 3])


Full API documentation: LabelBinarizerScikitsLearnNode

class mdp.nodes.RadiusNeighborsClassifierScikitsLearnNode

Classifier implementing a vote among neighbors within a given radius

This node has been automatically generated by wrapping the sklearn.neighbors.classification.RadiusNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Parameters

radius : float, optional (default = 1.0)
Range of parameter space to use by default for radius_neighbors queries.
weights : str or callable

weight function used in prediction. Possible values:

• ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
• ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
• [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

• ‘ball_tree’ will use BallTree
• ‘kd_tree’ will use scipy.spatial.cKDTree
• ‘brute’ will use a brute-force search.
• ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
p: integer, optional (default = 2)
Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
outlier_label: int, optional (default = None)
Label assigned to outlier samples (samples with no neighbors within the given radius). If set to None, a ValueError is raised when an outlier is detected.
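How radius, weights, p and outlier_label interact can be sketched with a brute-force vote in plain Python. This is a simplified illustration, not the wrapped implementation: the function names are hypothetical, a callable weights argument receives one distance at a time rather than an array, and ties are broken in favor of the smallest label by assumption.

```python
from collections import defaultdict

def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def radius_vote(X, y, query, radius=1.0, weights="uniform", p=2,
                outlier_label=None):
    """Predict a label for query by a weighted vote of the points within radius."""
    votes = defaultdict(float)
    for point, label in zip(X, y):
        dist = minkowski(point, query, p)
        if dist > radius:
            continue
        if weights == "uniform":
            votes[label] += 1.0
        elif weights == "distance":
            # Simplification: a neighbor at distance zero gets infinite weight.
            votes[label] += 1.0 / dist if dist > 0 else float("inf")
        else:
            # Assumed here: the callable maps a single distance to a weight.
            votes[label] += weights(dist)
    if not votes:
        if outlier_label is None:
            raise ValueError("no neighbors found within the given radius")
        return outlier_label
    # Highest total weight wins; ties go to the smallest label.
    return min(votes, key=lambda label: (-votes[label], label))

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
tie = radius_vote(X, y, [1.5])                      # points 1 and 2 vote equally
near = radius_vote(X, y, [1.8], weights="distance") # point 2 is closer than point 1
far = radius_vote(X, y, [10], outlier_label=-1)     # no neighbors within the radius
```

With uniform weights the query at 1.5 sees one vote per class, so the tie-breaking rule decides; with distance weights the query at 1.8 is dominated by its closest neighbor.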

Examples

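>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> neigh = RadiusNeighborsClassifier(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsClassifier(...)
>>> print(neigh.predict([[1.5]]))
[0]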

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

class mdp.nodes.RidgeClassifierCVScikitsLearnNode

Ridge classifier with built-in cross-validation.

This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifierCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation. Currently, only the n_features > n_samples case is handled efficiently.
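The idea behind efficient leave-one-out can be illustrated on plain ridge regression without an intercept: the held-out residual of sample i equals its ordinary residual divided by 1 - H_ii, where H is the hat matrix, so a single fit per alpha suffices. The sketch below is an assumption-laden illustration (function names, synthetic data and the alpha grid are made up), not the code path used by the wrapped class, which additionally handles classification targets.

```python
import numpy as np

def ridge_loo_errors(X, y, alpha):
    """Leave-one-out residuals of ridge regression (no intercept) from one fit."""
    n_features = X.shape[1]
    # Hat matrix H maps targets to fitted values: y_hat = H @ y.
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)
    residuals = y - H @ y
    return residuals / (1.0 - np.diag(H))

def ridge_loo_errors_brute(X, y, alpha):
    """Same quantity, obtained by refitting once per held-out sample."""
    n_samples, n_features = X.shape
    errors = np.empty(n_samples)
    for i in range(n_samples):
        keep = np.arange(n_samples) != i
        A = X[keep].T @ X[keep] + alpha * np.eye(n_features)
        w = np.linalg.solve(A, X[keep].T @ y[keep])
        errors[i] = y[i] - X[i] @ w
    return errors

# Synthetic regression problem, made up for this example.
rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(20)

# Pick the alpha with the smallest mean squared leave-one-out error.
alphas = [0.01, 0.1, 1.0, 10.0]
mse = [np.mean(ridge_loo_errors(X, y, a) ** 2) for a in alphas]
best_alpha = alphas[int(np.argmin(mse))]
```

The brute-force variant is included only to show that both routes give the same residuals; the shortcut avoids refitting the model n_samples times.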

Parameters

alphas: numpy array of shape [n_alphas]
Array of alpha values to try. Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
normalize : boolean, optional
If True, the regressors X are normalized.
score_func: callable, optional
Function that takes 2 arguments and compares them in order to evaluate the performance of prediction (big is good). If None is passed, the score of the estimator is maximized.
loss_func: callable, optional
Function that takes 2 arguments and compares them in order to evaluate the performance of prediction (small is good). If None is passed, the score of the estimator is maximized.
cv : cross-validation generator, optional
If None, Generalized Cross-Validation (efficient Leave-One-Out) will be used.
class_weight : dict, optional
Weights associated with classes in the form {class_label : weight}. If not given, all classes are supposed to have weight one.
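One plausible reading of the class_weight dictionary is that it expands to one weight per training sample, with unlisted classes defaulting to one. The helper below is a hypothetical sketch of that expansion, not the library's own routine.

```python
def sample_weights(labels, class_weight=None):
    """Expand {class_label: weight} into one weight per sample (default 1.0)."""
    class_weight = class_weight or {}
    return [class_weight.get(label, 1.0) for label in labels]

# Three samples of class 0 and one of class 1; class 1 is weighted three times
# as heavily so that both classes contribute equally in total.
labels = [0, 0, 0, 1]
weights = sample_weights(labels, {1: 3.0})
```

Weighting a rare class more heavily in this way is a common remedy for imbalanced training sets.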

Attributes

cv_values_ : array, shape = [n_samples, n_alphas] or shape = [n_samples, n_responses, n_alphas], optional
Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).

coef_ : array, shape = [n_features] or [n_responses, n_features]
Weight vector(s).
alpha_ : float
Estimated regularization parameter.