Full API documentation: nodes
Filter the input data through the most significant of its principal components.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.v
- Transpose of the projection matrix (available after training).
- self.d
- Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
More information about Principal Component Analysis, a.k.a. discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
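The quantities listed above can be illustrated with a minimal NumPy sketch. This is not the PCANode implementation; the function name `pca_sketch` and its return layout are assumptions made purely for illustration.

```python
import numpy as np

def pca_sketch(x, output_dim):
    """Illustrative PCA; x has shape (n_samples, n_features)."""
    avg = x.mean(axis=0)                      # mean of the input data
    cov = np.cov(x - avg, rowvar=False)       # covariance matrix
    d, v = np.linalg.eigh(cov)                # eigenvalues in ascending order
    order = np.argsort(d)[::-1][:output_dim]  # indices of the largest eigenvalues
    d, v = d[order], v[:, order]
    explained_variance = d.sum() / np.trace(cov)
    return avg, v, d, explained_variance

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
avg, v, d, explained = pca_sketch(x, output_dim=2)
y = (x - avg) @ v                             # projected data, shape (500, 2)
```

In this sketch the projected signals are decorrelated, and their variances equal the retained eigenvalues `d`.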
Full API documentation: PCANode
Whiten the input data by filtering it through the most significant of its principal components. All output signals have zero mean, unit variance and are decorrelated.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.v
- Transpose of the projection matrix (available after training).
- self.d
- Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
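As a rough illustration of what whitening means, the following NumPy sketch rescales each principal direction by the inverse square root of its eigenvalue. It is a sketch under stated assumptions, not the WhiteningNode code; `whiten_sketch` is a hypothetical helper.

```python
import numpy as np

def whiten_sketch(x, output_dim):
    """Illustrative whitening; x has shape (n_samples, n_features)."""
    avg = x.mean(axis=0)
    cov = np.cov(x - avg, rowvar=False)
    d, e = np.linalg.eigh(cov)
    order = np.argsort(d)[::-1][:output_dim]
    d, e = d[order], e[:, order]
    v = e / np.sqrt(d)      # scale each principal direction to unit variance
    return avg, v, d

rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.2]])
x = rng.normal(size=(1000, 3)) @ mixing       # correlated input signals
avg, v, d = whiten_sketch(x, output_dim=3)
y = (x - avg) @ v                             # whitened output
```

The covariance matrix of `y` is then the identity up to numerical error, which is the "zero mean, unit variance, decorrelated" property described above.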
Full API documentation: WhiteningNode
Perform Principal Component Analysis using the NIPALS algorithm. This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It’s also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case setting output_dim to a certain fraction of the total variance, say 90%, may be of some help.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.d
- Variance corresponding to the PCA components.
- self.v
- Transpose of the projection matrix (available after training).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. in David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp 411-444 (1966).
More information about Principal Component Analysis, a.k.a. discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).
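The idea behind NIPALS, extracting one component at a time without ever forming the full covariance matrix, can be sketched as follows. This is a textbook-style illustration and not the NIPALSNode source; the function name, the starting vector and the convergence test are assumptions.

```python
import numpy as np

def nipals_sketch(x, n_components, tol=1e-10, max_iter=1000):
    """Illustrative NIPALS PCA; returns the mean, loadings and variances."""
    avg = x.mean(axis=0)
    resid = x - avg
    loadings, variances = [], []
    for _component in range(n_components):
        # start from the column with the largest remaining variance
        t = resid[:, np.argmax(resid.var(axis=0))].copy()
        for _iteration in range(max_iter):
            p = resid.T @ t
            p /= np.linalg.norm(p)            # unit-length loading vector
            t_new = resid @ p                 # scores along that direction
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        resid = resid - np.outer(t, p)        # deflate: remove this component
        loadings.append(p)
        variances.append(t @ t / (len(x) - 1))
    return avg, np.array(loadings).T, np.array(variances)

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
avg, v, d = nipals_sketch(x, n_components=2)
```

Only matrix-vector products with the data are needed, which is why the approach pays off when few components are wanted from many variables.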
Full API documentation: NIPALSNode
Perform Independent Component Analysis using the FastICA algorithm. Note that FastICA is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).
Reference: Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis IEEE Transactions on Neural Networks, 10(3):626-634.
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
History:
Full API documentation: FastICANode
Perform Independent Component Analysis using the CuBICA algorithm. Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).
Reference: Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: CuBICANode
Perform Independent Component Analysis using the TDSEP algorithm. Note that TDSEP, as implemented in this Node, is an online algorithm, i.e. it is suited to be trained on huge data sets, provided that the training is done by sending small chunks of data at a time.
Reference: Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP an efficient algorithm for blind separation using time structure. in Niklasson, L, Boden, M, and Ziemke, T (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: TDSEPNode
Perform Independent Component Analysis using the JADE algorithm. Note that JADE is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
JADE does not support the telescope mode.
Main references:
- Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.
- Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.
Original code contributed by: Gabriel Beckers (2008).
History:
Full API documentation: JADENode
Extract the slowly varying components from the input data. More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Instance variables of interest
- self.avg
- Mean of the input data (available after training)
- self.sf
- Matrix of the SFA filters (available after training)
- self.d
- Delta values corresponding to the SFA components (generalized eigenvalues). [See the docs of the get_eta_values method for more information]
Special arguments for constructor
- include_last_sample
If False, the train method discards the last sample in every chunk during training when calculating the covariance matrix. The last sample is in this case only used for calculating the covariance matrix of the derivatives. The switch should be set to False if you plan to train with several small chunks. For example, we can split a sequence (index is time):
x_1 x_2 x_3 x_4
into smaller parts like this:
x_1 x_2
x_2 x_3
x_3 x_4
The SFANode will see 3 derivatives for the temporal covariance matrix, and the first 3 points for the spatial covariance matrix. Of course you will need to use a generator that connects the small chunks (the last sample needs to be sent again in the next chunk). If include_last_sample were True, depending on the generator you use, you would either get:
x_1 x_2
x_2 x_3
x_3 x_4
in which case the last sample of every chunk would be used twice when calculating the covariance matrix, or:
x_1 x_2
x_3 x_4
in which case you lose the derivative between x_3 and x_2.
If you plan to train with a single big chunk, leave include_last_sample at its default value, i.e. True.
You can even change this behaviour during training. Just set the corresponding switch in the train method.
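A generator that connects the small chunks, as required when include_last_sample is False, might look like the sketch below. The name `overlapping_chunks` is hypothetical and the helper is not part of MDP; it only shows the overlap pattern described above.

```python
def overlapping_chunks(x, chunk_len):
    """Yield chunks of x; each chunk starts with the previous chunk's last sample.

    Assumes chunk_len >= 2 so that every chunk contains at least one derivative.
    """
    step = chunk_len - 1
    for start in range(0, len(x) - 1, step):
        yield x[start:start + chunk_len]

sequence = ["x_1", "x_2", "x_3", "x_4"]
chunks = list(overlapping_chunks(sequence, chunk_len=2))
# chunks == [["x_1", "x_2"], ["x_2", "x_3"], ["x_3", "x_4"]]
n_derivatives = sum(len(chunk) - 1 for chunk in chunks)
```

With this overlap no derivative is lost across chunk boundaries, and discarding the last sample of each chunk keeps every point from being counted twice in the spatial covariance matrix.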
Full API documentation: SFANode
Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components. The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.
More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Full API documentation: SFA2Node
Perform Independent Slow Feature Analysis on the input data.
Internal variables of interest
- self.RP
- The global rotation-permutation matrix. This is the filter applied on input_data to get output_data
- self.RPC
- The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the ‘outer space’ is retained)
- self.covs
An mdp.utils.MultipleCovarianceMatrices instance containing the current time-delayed covariance matrices of the input_data. After convergence the uppermost output_dim x output_dim submatrices should be almost diagonal.
self.covs[n-1] is the covariance matrix relative to the n-th time-lag.
Note: they are not cleared after convergence. If you need to free some memory, you can safely delete them with:
>>> del self.covs
- self.initial_contrast
- A dictionary with the starting contrast and the SFA and ICA parts of it.
- self.final_contrast
- Like the above but after convergence.
Note: If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.
References: Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation 19(4):994-1021. http://itb.biologie.hu-berlin.de/~wiskott/Publications/BlasZitoWisk2007-ISFA-NeurComp.pdf
Full API documentation: ISFANode
Perform Non-linear Blind Source Separation using Slow Feature Analysis.
This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).
The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:
>>> flow = mdp.Flow([XSFANode()])
>>> flow.train(x)
Doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info).
If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.
If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:
>>> flow = mdp.Flow([XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)
This will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.
References: Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskott-Cogprints-2010.pdf
Full API documentation: XSFANode
Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.
FDANode has two training phases and is supervised so make sure to pay attention to the following points when you train it:
More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.
Internal variables of interest
- self.avg
- Mean of the input data (available after training)
- self.v
- Transpose of the projection matrix, so that output = dot(input-self.avg, self.v) (available after training).
Full API documentation: FDANode
Perform Factor Analysis.
The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.
The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.
Internal variables of interest
- self.mu
- Mean of the input data (available after training)
- self.A
- Generating weights (available after training)
- self.E_y_mtx
- Weights for Maximum A Posteriori inference
- self.sigma
- Vector of estimated variance of the noise for all input components
More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.
Full API documentation: FANode
Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite into observed (visible) and hidden (latent) variables.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.
Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800
Internal variables of interest
- self.w
- Generative weights between hidden and observed variables
- self.bv
- bias vector of the observed variables
- self.bh
- bias vector of the hidden variables
For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668
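The conditional probability and the energy mentioned above follow the standard formulas for a binary RBM. The sketch below illustrates them with NumPy; the function names and the assumed weight shape (visible units by hidden units) are illustrative and are not taken from the RBMNode source.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_probability(v, w, bh):
    """P(h_j = 1 | v) for each hidden unit; v has shape (n_samples, n_visible)."""
    return sigmoid(bh + v @ w)

def energy(v, h, w, bv, bh):
    """Energy of joint configurations (v, h), one value per row."""
    return -(v @ bv) - (h @ bh) - np.sum((v @ w) * h, axis=1)

rng = np.random.default_rng(3)
w = rng.normal(scale=0.1, size=(4, 2))            # assumed shape: (n_visible, n_hidden)
bv, bh = np.zeros(4), np.zeros(2)
v = rng.integers(0, 2, size=(5, 4)).astype(float) # binary visible states
p_h = hidden_probability(v, w, bh)
h = (rng.random(p_h.shape) < p_h).astype(float)   # sample binary hidden states
e = energy(v, h, w, bv, bh)
```

Sampling the hidden units as in the last lines is the basic step that Contrastive Divergence alternates with sampling the visible units.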
Full API documentation: RBMNode
Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.
Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800
Internal variables of interest:
- self.w
- Generative weights between hidden and observed variables
- self.bv
- bias vector of the observed variables
- self.bh
- bias vector of the hidden variables
For more information on RBMs with labels, see
- Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
- Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Full API documentation: RBMWithLabelsNode
Learn the topological structure of the input data by building a corresponding graph approximation.
The algorithm expands on the original Neural Gas algorithm (see mdp.nodes NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.
More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.
Attributes and methods of interest
Full API documentation: GrowingNeuralGasNode
Perform a Locally Linear Embedding analysis on the data.
Internal variables of interest
- self.training_projection
- The LLE projection of the training data (defined when training finishes).
- self.desired_variance
- variance limit used to compute intrinsic dimensionality.
Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. deRidder and R.P.W. Duin.
References: Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.
Original code contributed by: Jake VanderPlas, University of Washington,
Full API documentation: LLENode
Perform a Hessian Locally Linear Embedding analysis on the data.
Internal variables of interest
- self.training_projection
- the HLLE projection of the training data (defined when training finishes)
- self.desired_variance
- variance limit used to compute intrinsic dimensionality.
Implementation based on algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.
Original code contributed by: Jake Vanderplas, University of Washington
Full API documentation: HLLENode
Compute least-square, multivariate linear regression on the input data, i.e., learn coefficients b_j so that:
y_i = b_0 + b_1 x_1 + ... b_N x_N ,
for i = 1 ... M, minimizes the square error given the training x's and y's.
This is a supervised learning node, and requires input data x and target data y to be supplied during training (see train docstring).
Internal variables of interest
- self.beta
- The coefficients of the linear regression
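The least-squares fit described above can be reproduced with plain NumPy. This sketch is independent of LinearRegressionNode; the variable names and the convention of storing b_0 first are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(100, 3))
true_beta = np.array([0.5, 2.0, -1.0, 3.0])      # b_0 (intercept), b_1 ... b_3
y = true_beta[0] + x @ true_beta[1:]             # noise-free targets

# Prepend a column of ones so that b_0 is learned with the other coefficients.
design = np.hstack([np.ones((len(x), 1)), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Because the targets here are noise-free, the recovered `beta` matches `true_beta` up to numerical error.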
Full API documentation: LinearRegressionNode
Perform expansion in the space formed by all linear and quadratic monomials. QuadraticExpansionNode() is equivalent to a PolynomialExpansionNode(2).
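For a concrete picture of "all linear and quadratic monomials", consider the following NumPy sketch. The helper `quadratic_expansion` is hypothetical, and the column order it produces is an assumption that may differ from the order used by the node.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_expansion(x):
    """Return all linear and quadratic monomials of the columns of x."""
    n = x.shape[1]
    linear = [x[:, i] for i in range(n)]
    quadratic = [x[:, i] * x[:, j]
                 for i, j in combinations_with_replacement(range(n), 2)]
    return np.column_stack(linear + quadratic)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
expanded = quadratic_expansion(x)   # columns: x1, x2, x1*x1, x1*x2, x2*x2
```

For an input of dimension n the output has n + n(n+1)/2 columns, i.e. 5 columns for the two-dimensional input above.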
Full API documentation: QuadraticExpansionNode
Perform expansion in a polynomial space.
Full API documentation: PolynomialExpansionNode
Expand input space with Gaussian Radial Basis Functions (RBFs).
The input data is filtered through a set of unnormalized Gaussian filters, i.e.:
y_j = exp(-0.5/s_j * ||x - c_j||^2)
for isotropic RBFs, or more in general:
y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))
for anisotropic RBFs.
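The isotropic formula above translates directly into NumPy. In this sketch the function name `rbf_expansion` and the argument names `centers` and `sizes` (for c_j and s_j) are assumptions chosen for illustration.

```python
import numpy as np

def rbf_expansion(x, centers, sizes):
    """Isotropic case: y_j = exp(-0.5 / s_j * ||x - c_j||^2)."""
    diff = x[:, None, :] - centers[None, :, :]   # shape (n_samples, n_rbf, dim)
    sq_dist = np.sum(diff ** 2, axis=2)          # squared distances to each center
    return np.exp(-0.5 / sizes * sq_dist)

centers = np.array([[0.0, 0.0], [1.0, 1.0]])     # c_j
sizes = np.array([1.0, 2.0])                     # s_j
x = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
y = rbf_expansion(x, centers, sizes)
```

Each output column is the response of one unnormalized Gaussian filter, equal to 1 when the input coincides with that filter's center.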
Full API documentation: RBFExpansionNode
Expands the input signal x according to a list [f_0, ... f_k] of functions.
Each function f_i should take the whole two-dimensional array x as input and output another two-dimensional array. Moreover the output dimension should depend only on the input dimension. The output of the node is [f_0[x], ... f_k[x]], that is, the concatenation of each one of the outputs f_i[x].
Original code contributed by Alberto Escalante.
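The concatenation of function outputs described above can be sketched in a few lines. The helper `general_expansion` and the example functions are hypothetical; they only illustrate that each function maps a two-dimensional array to another two-dimensional array with the same number of rows.

```python
import numpy as np

def general_expansion(x, funcs):
    """Concatenate the outputs f_0(x), ..., f_k(x) along the feature axis."""
    return np.hstack([f(x) for f in funcs])

funcs = [
    lambda a: a,                              # identity
    np.square,                                # element-wise squares
    lambda a: a.sum(axis=1, keepdims=True),   # one extra column: row sums
]
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = general_expansion(x, funcs)
```

The output dimension here is 2 + 2 + 1 = 5 and depends only on the input dimension, as the description requires.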
Full API documentation: GeneralExpansionNode
Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.
- positions of RBFs
- position of the nodes of the neural gas
- sizes of the RBFs
- mean distance to the neighbouring nodes.
Important: Adjust the maximum number of nodes to control the dimension of the expansion.
More information on this expansion type can be found in: B. Fritzke. Growing cell structures-a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).
Full API documentation: GrowingNeuralGasExpansionNode
Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).
The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland., 1991.
Attributes and methods of interest
Full API documentation: NeuralGasNode
This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if the sum is negative.
Full API documentation: SignumClassifier
A simple perceptron with input_dim input nodes.
Full API documentation: PerceptronClassifier
A simple version of a Markov classifier. It can be trained on a vector of tuples, the label being the next element in the testing data.
Full API documentation: SimpleMarkovClassifier
Node for simulating a simple discrete Hopfield model
Full API documentation: DiscreteHopfieldClassifier
Employs K-Means Clustering for a given number of centroids.
Full API documentation: KMeansClassifier
Make the input signal mean-free with unit variance.
Full API documentation: NormalizeNode
Perform a supervised Gaussian classification.
Given a set of labelled data, the node fits a Gaussian distribution to each class.
Full API documentation: GaussianClassifier
Nearest-Mean classifier.
Full API documentation: NearestMeanClassifier
K-Nearest-Neighbour Classifier.
Full API documentation: KNNClassifier
Compute the eta values of the normalized training data.
The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e. delta(x) = mean(dx/dt(t)^2). delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.
The eta value is a more intuitive measure of temporal variation, defined as:
eta(x) = T/(2*pi) * sqrt(delta(x))
If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.
EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
Reference: Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.
Important: if a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.
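The definition of the eta value can be checked numerically on a sine wave. The sketch below approximates the derivative by finite differences and normalizes over all samples, which is a simplification of what the node does with chunks; `eta_sketch` is a hypothetical helper, not the node's get_eta method.

```python
import numpy as np

def eta_sketch(x):
    """eta value of a 1-D signal sampled at unit time steps."""
    x = (x - x.mean()) / x.std()          # normalize to zero mean, unit variance
    delta = np.mean(np.diff(x) ** 2)      # mean squared finite-difference derivative
    return len(x) / (2 * np.pi) * np.sqrt(delta)

T, N = 1000, 5
t = np.arange(T)
signal = np.sin(2 * np.pi * N * t / T)    # exactly N oscillations in T samples
eta = eta_sketch(signal)                  # close to N
eta_scaled = eta_sketch(3.0 * signal + 7.0)
```

The result is close to N rather than exactly N because of the finite-difference approximation, and it is unchanged when the signal is rescaled or shifted.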
Full API documentation: EtaComputerNode
Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.
Full API documentation: HitParadeNode
Inject multiplicative or additive noise into the input data.
Original code contributed by Mathias Franzius.
Full API documentation: NoiseNode
Special version of NoiseNode for Gaussian additive noise.
Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.
Full API documentation: NormalNoiseNode
Copy delayed version of the input signal on the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1)        [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
  X(2) Y(2)          X(2) Y(2) X(4) Y(4) X(6) Y(6)
  X(3) Y(3)   -->    X(3) Y(3) X(5) Y(5) X(7) Y(7)
  X(4) Y(4)          X(4) Y(4) X(6) Y(6) X(8) Y(8)
  X(5) Y(5)          ...  ...  ...  ...  ...  ... ]
  X(6) Y(6)
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]
It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
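The embedding shown in the diagram can be reproduced with a short NumPy sketch. The function `time_frames_sketch` and its parameter name `n_frames` are assumptions for illustration and do not mirror the node's implementation.

```python
import numpy as np

def time_frames_sketch(x, n_frames, gap=1):
    """Stack n_frames copies of x, each shifted forward by gap rows."""
    n_rows = x.shape[0] - (n_frames - 1) * gap   # rows with a complete set of frames
    return np.hstack([x[k * gap:k * gap + n_rows] for k in range(n_frames)])

# X(t) = t and Y(t) = 10 * t for t = 1 ... 8
x = np.column_stack([np.arange(1, 9), 10 * np.arange(1, 9)])
y = time_frames_sketch(x, n_frames=3, gap=2)
# first row: X(1) Y(1) X(3) Y(3) X(5) Y(5)
```

The output has fewer rows than the input, because the last rows lack future samples to pair with.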
Full API documentation: TimeFramesNode
Copy delayed version of the input signal on the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1)        [ X(1) Y(1)   0    0    0    0
  X(2) Y(2)          X(2) Y(2)   0    0    0    0
  X(3) Y(3)   -->    X(3) Y(3) X(1) Y(1)   0    0
  X(4) Y(4)          X(4) Y(4) X(2) Y(2)   0    0
  X(5) Y(5)          X(5) Y(5) X(3) Y(3) X(1) Y(1)
  X(6) Y(6)          ...  ...  ...  ...  ...  ... ]
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]
This node provides functionality similar to that of the TimeFramesNode, except that it performs a time embedding into the past rather than into the future.
See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.
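The embedding into the past shown in the diagram can be sketched with NumPy as follows. `time_delay_sketch` and the parameter name `n_frames` are hypothetical; the zero padding for missing past samples follows the diagram.

```python
import numpy as np

def time_delay_sketch(x, n_frames, gap=1):
    """Append n_frames - 1 delayed copies of x, padding missing past samples with zeros."""
    n_rows, dim = x.shape
    out = np.zeros((n_rows, dim * n_frames))
    for k in range(n_frames):
        shift = k * gap
        # delayed copy: row t receives the sample from row t - shift
        out[shift:, k * dim:(k + 1) * dim] = x[:max(n_rows - shift, 0)]
    return out

# X(t) = t and Y(t) = 10 * t for t = 1 ... 8
x = np.column_stack([np.arange(1, 9), 10 * np.arange(1, 9)])
y = time_delay_sketch(x, n_frames=3, gap=2)
# fifth row: X(5) Y(5) X(3) Y(3) X(1) Y(1)
```

Unlike the embedding into the future, the output keeps the same number of rows as the input.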
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelayNode
TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.
Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelaySlidingWindowNode
Node to cut off values at specified bounds.
Works similarly to numpy.clip, but also works when only a lower or upper bound is specified.
Full API documentation: CutoffNode
Node which uses the data history during training to learn cutoff values.
As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.
When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
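The meaning of a cutoff fraction can be illustrated with quantiles. Note that this sketch uses np.quantile on the raw training data instead of the histogram mechanism described above, and `learn_cutoffs` is a hypothetical helper rather than part of the node.

```python
import numpy as np

def learn_cutoffs(train, lower_fraction=None, upper_fraction=None):
    """Per-coordinate bounds that would clip the given fractions of the training data."""
    lower = None if lower_fraction is None else np.quantile(train, lower_fraction, axis=0)
    upper = None if upper_fraction is None else np.quantile(train, 1.0 - upper_fraction, axis=0)
    return lower, upper

rng = np.random.default_rng(5)
train = rng.normal(size=(10000, 2)) * np.array([1.0, 10.0])
lower, upper = learn_cutoffs(train, upper_fraction=0.05)

# Fraction of training points above the learned bound, per dimension (about 5%).
clipped_fraction = (train > upper).mean(axis=0)

# Apply the learned upper bounds to new data.
new_data = rng.normal(size=(5, 2)) * np.array([1.0, 10.0])
clipped = np.minimum(new_data, upper)
```

Each coordinate gets its own bound, so the second dimension, which has ten times the spread, is cut at a correspondingly larger value.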
Full API documentation: AdaptiveCutoffNode
Node which stores a history of the data during its training phase.
The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.
Note that data is only stored during training.
Full API documentation: HistogramNode
Execute returns the input data and the node is not trainable.
This node can be instantiated and is for example useful in complex network layouts.
Full API documentation: IdentityNode
Convolve input data with filter banks.
The filters argument specifies a set of 2D filters that are convolved with the input data during execution. Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.
Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.
This node depends on scipy.
Full API documentation: Convolution2DNode
The ShogunSVMClassifier works as a wrapper class for accessing the SHOGUN machine learning toolbox for support vector machines.
Most kernel machines and linear classifiers should work with this class.
Distance machines such as the K-means classifier are not supported yet.
Information on parameters and additional options can be found at http://www.shogun-toolbox.org/
Note that some parts in this classifier might receive some refinement in the future.
This node depends on shogun.
Full API documentation: ShogunSVMClassifier
The LibSVMClassifier class acts as a wrapper around the LibSVM library for support vector machines.
Information on the parameters can be found at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
The class allows changing the kernel and SVM type with a text string.
Additionally, self.parameter is exposed, which allows changing all other SVM parameters directly.
This node depends on libsvm.
Full API documentation: LibSVMClassifier
Linear model fitted by minimizing a regularized empirical loss with SGD.
This node has been automatically generated by wrapping the sklearn.linear_model.sparse.stochastic_gradient.SGDRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
This implementation works with data represented as dense numpy arrays of floating point values for the features.
Parameters
The learning rate:
Attributes
Examples
>>> import numpy as np
>>> from sklearn import linear_model
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = linear_model.sparse.SGDRegressor()
>>> clf.fit(X, y)
SGDRegressor(alpha=0.0001, eta0=0.01, fit_intercept=True,
learning_rate='invscaling', loss='squared_loss', n_iter=5, p=0.1,
penalty='l2', power_t=0.25, rho=1.0, seed=0, shuffle=False,
verbose=0)
See also
RidgeRegression, ElasticNet, Lasso, SVR
Full API documentation: SGDRegressorScikitsLearnNode
Feature ranking with recursive feature elimination.
This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFE class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and weights are assigned to each one of them. Then, features whose absolute weights are the smallest are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
Parameters
A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. The first dimension of the coef_ array must be equal to the number of features of the input dataset of the estimator. Important features must correspond to high absolute values in the coef_ array.
For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
Attributes
Examples
The following example shows how to retrieve the 5 truly informative features in the Friedman #1 dataset.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, 5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True,
False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
References
[1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.
Full API documentation: RFEScikitsLearnNode
Non-Negative matrix factorization by Projected Gradient (NMF)
This node has been automatically generated by wrapping the sklearn.decomposition.nmf.NMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Method used to initialize the procedure. Default: ‘nndsvdar’ Valid options:
- 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
- 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
- 'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
- int seed or RandomState: non-negative random matrices
Attributes
Examples
>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1,
init=<mtrand.RandomState object at 0x...>, max_iter=200,
n_components=2, nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744, 0.11118662],
[ 0.38526873, 0.38228063]])
>>> model.reconstruction_err_
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
... sparseness='components')
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1,
init=<mtrand.RandomState object at 0x...>, max_iter=200,
n_components=2, nls_max_iter=2000, sparseness='components',
tol=0.0001)
>>> model.components_
array([[ 1.67481991, 0.29614922],
[-0. , 0.4681982 ]])
>>> model.reconstruction_err_
0.513...
Notes
This implements C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/
NNDSVD is introduced in C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf
Full API documentation: NMFScikitsLearnNode
Filter: select the p-values below alpha based on an FPR (false positive rate) test.
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFpr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: SelectFprScikitsLearnNode
Variational Inference for the Gaussian Mixture Model
This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.VBGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Variational inference for a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a gaussian mixture model with a fixed number of components.
Initialization is with normally-distributed means and identity covariance, for proper convergence.
Parameters
Attributes
Precision (inverse covariance) parameters for each mixture component. The shape depends on cvtype:
- (n_components,) if ‘spherical’,
- (n_features, n_features) if ‘tied’,
- (n_components, n_features) if ‘diag’,
- (n_components, n_features, n_features) if ‘full’
Methods
See Also
GMM : Finite Gaussian mixture model fit with EM
DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm
Full API documentation: VBGMMScikitsLearnNode
Full API documentation: SparseBaseLibSVMScikitsLearnNode
Convert a collection of raw documents to a matrix
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.Vectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Equivalent to CountVectorizer followed by TfidfTransformer.
Full API documentation: VectorizerScikitsLearnNode
Variational Inference for the Infinite Gaussian Mixture Model.
This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.DPGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
DPGMM stands for Dirichlet Process Gaussian Mixture Model, and it is an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but almost always the number of components actually used depends on the data.
Stick-breaking Representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).
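A minimal NumPy sketch of the truncated stick-breaking construction may help to picture how the mixture weights arise. The function name and the concentration parameter alpha are assumptions made for illustration; this draws weights from the prior only and is not the variational inference performed by the wrapped class.

```python
import numpy as np

def stick_breaking_weights(alpha, n_components, rng):
    """Draw mixture weights from a truncated stick-breaking process."""
    # Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    betas = rng.beta(1.0, alpha, size=n_components)
    betas[-1] = 1.0  # truncation: the last component takes whatever is left
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.RandomState(0)
weights = stick_breaking_weights(alpha=1.0, n_components=10, rng=rng)
```

With a small alpha most of the mass tends to land on the first few components, which is why the number of components effectively used can be smaller than the truncation level.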
Initialization is with normally-distributed means and identity covariance, for proper convergence.
Parameters
Attributes
Precision (inverse covariance) parameters for each mixture component. The shape depends on cvtype:
- (`n_components`,) if 'spherical',
- (`n_features`, `n_features`) if 'tied',
- (`n_components`, `n_features`) if 'diag',
- (`n_components`, `n_features`, `n_features`) if 'full'
Methods
See Also
GMM : Finite gaussian mixture model fit with EM
VBGMM : Finite gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.
Full API documentation: DPGMMScikitsLearnNode
Lasso model fit with Least Angle Regression a.k.a. Lars
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
It is a linear model trained with an L1 prior as regularizer (a.k.a. the Lasso).
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.LassoLars(alpha=0.01)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1, 0, -1])
LassoLars(alpha=0.01, eps=..., fit_intercept=True,
max_iter=500, normalize=True, overwrite_X=False, precompute='auto',
verbose=False)
>>> print clf.coef_
[ 0. -0.963257...]
References
http://en.wikipedia.org/wiki/Least_angle_regression
See also
lars_path, Lasso
Full API documentation: LassoLarsScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LinearModelCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: LinearModelCVScikitsLearnNode
Normalize samples individually to unit norm
This node has been automatically generated by wrapping the sklearn.preprocessing.Normalizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).
Scaling inputs to unit norms is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
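As a rough illustration of row-wise normalization and of the cosine-similarity remark, here is a small NumPy sketch. The helper normalize_rows is hypothetical and only mimics the behaviour described above for dense arrays.

```python
import numpy as np

def normalize_rows(X, norm="l2"):
    """Rescale each row independently so that its l1 or l2 norm equals one."""
    X = np.asarray(X, dtype=float)
    if norm == "l1":
        norms = np.abs(X).sum(axis=1)
    elif norm == "l2":
        norms = np.sqrt((X ** 2).sum(axis=1))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    norms[norms == 0.0] = 1.0  # rows without non-zero components stay as they are
    return X / norms[:, np.newaxis]

X = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]])
X_l2 = normalize_rows(X)
# After l2 normalization a plain dot product is the cosine similarity.
cosine = X_l2[0] @ X_l2[1]
```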
Parameters
Note
This estimator is stateless (besides constructor parameters); the fit method does nothing but is useful when used in a pipeline.
See also
sklearn.preprocessing.normalize() equivalent function without the object oriented API
Full API documentation: NormalizerScikitsLearnNode
Dictionary learning
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.DictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*, V^*) = argmin_(U,V) 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
with || V_k ||_2 = 1 for all 0 <= k < n_atoms
Parameters
verbose:
- degree of verbosity of the printed output
Attributes
References
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also
sklearn.decomposition.SparsePCA which solves the transposed problem, finding sparse components to represent data.
Full API documentation: DictionaryLearningScikitsLearnNode
Convert a collection of raw documents to a matrix of token counts
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.CountVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This implementation produces a sparse representation of the counts using scipy.sparse.coo_matrix.
If you do not provide an a-priori dictionary and you do not use an analyzer that does some kind of feature selection then the number of features will be equal to the vocabulary size found by analysing the data. The default analyzer does simple stop word filtering for English.
Parameters
analyzer: WordNGramAnalyzer or CharNGramAnalyzer, optional
Either a dictionary where keys are tokens and values are indices in the matrix, or an iterable over terms (in which case the indices are determined by the iteration order as per enumerate).
This is useful in order to fix the vocabulary in advance.
When building the vocabulary ignore terms that have a term frequency strictly higher than the given threshold (corpus specific stop words).
This parameter is ignored if vocabulary is not None.
If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.
This parameter is ignored if vocabulary is not None.
Full API documentation: CountVectorizerScikitsLearnNode
Elastic Net model with iterative fitting along a regularization path
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The best model is selected by cross-validation.
Parameters
Notes
See examples/linear_model/lasso_path_with_crossvalidation.py for an example.
To avoid unnecessary memory duplication, the X argument of the fit method should be passed directly as a Fortran-contiguous numpy array.
The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the penalty is:
alpha*rho*L1 + alpha*(1-rho)*L2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a*L1 + b*L2
for:
alpha = a + b and rho = a/(a+b)
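The conversion between the two parameterizations can be written out as a short Python sketch. The helper names are hypothetical; only the formulas come from the text above.

```python
def to_alpha_rho(a, b):
    """Map separate L1/L2 penalty weights (a, b) to (alpha, rho)."""
    alpha = a + b
    rho = a / (a + b)
    return alpha, rho

def to_l1_l2(alpha, rho):
    """Map (alpha, rho) back to the separate L1 and L2 penalty weights."""
    return alpha * rho, alpha * (1.0 - rho)

alpha, rho = to_alpha_rho(0.3, 0.1)
a, b = to_l1_l2(alpha, rho)  # round trip back to the original weights
```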
Full API documentation: ElasticNetCVScikitsLearnNode
Kernel Principal component analysis (KPCA)
This node has been automatically generated by wrapping the sklearn.decomposition.kernel_pca.KernelPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Non-linear dimensionality reduction through the use of kernels.
Parameters
Attributes
lambdas_, alphas_:
- Eigenvalues and eigenvectors of the centered kernel matrix
- Inverse transform matrix
- Projection of the fitted data on the kernel principal components
Reference
Kernel PCA was introduced in:
Bernhard Schoelkopf, Alexander J. Smola, and Klaus-Robert Mueller. 1999. Kernel principal component analysis. In Advances in kernel methods, MIT Press, Cambridge, MA, USA 327-352.
Full API documentation: KernelPCAScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectPercentile class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: SelectPercentileScikitsLearnNode
Standardize features by removing the mean and scaling to unit variance
This node has been automatically generated by wrapping the sklearn.preprocessing.Scaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
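The fit/transform split described here can be sketched in a few lines of NumPy. SimpleScaler is a hypothetical stand-in written for illustration, not the wrapped class; it stores the training statistics and reuses them on later data.

```python
import numpy as np

class SimpleScaler:
    """Store per-feature mean and standard deviation, then reuse them."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        std[std == 0.0] = 1.0  # constant features are only centered
        self.std_ = std
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_

X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaler = SimpleScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
# New data is scaled with the statistics learned on the training set.
X_new_scaled = scaler.transform([[2.0, 400.0]])
```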
Parameters
Attributes
See also
sklearn.preprocessing.scale() to perform centering and scaling without using the Transformer object oriented API
sklearn.decomposition.RandomizedPCA with whiten=True to further remove the linear correlation across features.
Full API documentation: ScalerScikitsLearnNode
Ridge regression.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.Ridge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge(alpha=1.0, fit_intercept=True, normalize=False, overwrite_X=False,
tol=0.001)
Full API documentation: RidgeScikitsLearnNode
Canonical Correlation Analysis (CCA). CCA inherits from PLS with mode="B" and deflation_mode="canonical".
This node has been automatically generated by wrapping the sklearn.pls.CCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Notes
For each component k, find the weights u, v that maximize corr(Xk u, Yk v), such that |u| = |v| = 1.
Note that it maximizes only the correlations between the scores.
The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.
The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score.
Examples
>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> cca = CCA(n_components=1)
>>> cca.fit(X, Y)
CCA(algorithm='nipals', copy=True, max_iter=500, n_components=1, scale=True,
tol=1e-06)
>>> X_c, Y_c = cca.transform(X, Y)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
In French but still a reference:
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.
See also
PLSCanonical PLSSVD
Full API documentation: CCAScikitsLearnNode
Cross-validated Least Angle Regression model
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
See also
lars_path, LassoLARS, LassoLarsCV
Full API documentation: LarsCVScikitsLearnNode
Non-Negative matrix factorization by Projected Gradient (NMF)
This node has been automatically generated by wrapping the sklearn.decomposition.nmf.ProjectedGradientNMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Method used to initialize the procedure. Default: ‘nndsvdar’ Valid options:
- 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
- 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
- 'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
- int seed or RandomState: non-negative random matrices
Attributes
Examples
>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1,
init=<mtrand.RandomState object at 0x...>, max_iter=200,
n_components=2, nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744, 0.11118662],
[ 0.38526873, 0.38228063]])
>>> model.reconstruction_err_
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
... sparseness='components')
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1,
init=<mtrand.RandomState object at 0x...>, max_iter=200,
n_components=2, nls_max_iter=2000, sparseness='components',
tol=0.0001)
>>> model.components_
array([[ 1.67481991, 0.29614922],
[-0. , 0.4681982 ]])
>>> model.reconstruction_err_
0.513...
Notes
This implements C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/
NNDSVD is introduced in C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf
Full API documentation: ProjectedGradientNMFScikitsLearnNode
Center a kernel matrix
This node has been automatically generated by wrapping the sklearn.preprocessing.KernelCenterer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This is equivalent to centering phi(X) with sklearn.preprocessing.Scaler(with_std=False).
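The equivalence can be checked numerically for a linear kernel, where phi(X) is simply X. The NumPy sketch below uses a hypothetical helper, center_kernel, applied to a square training kernel matrix.

```python
import numpy as np

def center_kernel(K):
    """Center a square kernel matrix as if the feature map had zero mean."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

X = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
K = X @ X.T                    # linear kernel, so phi(X) is X itself
K_centered = center_kernel(K)

# Centering the features first and then taking the kernel gives the same matrix.
Xc = X - X.mean(axis=0)
K_from_centered_features = Xc @ Xc.T
```

For non-linear kernels phi(X) is not available explicitly, which is why the centering has to be done on the kernel matrix.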
Full API documentation: KernelCentererScikitsLearnNode
Classifier using Ridge regression
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Note
For multi-class classification, n_class classifiers are trained in a one-versus-all approach.
Full API documentation: RidgeClassifierScikitsLearnNode
Binarize data (set feature values to 0 or 1) according to a threshold
This node has been automatically generated by wrapping the sklearn.preprocessing.Binarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The default threshold is 0.0 so that any non-zero values are set to 1.0 and zeros are left untouched.
Binarization is a common operation on text count data, where the analyst can decide to consider only the presence or absence of a feature rather than, for instance, a quantified number of occurrences.
It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).
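A minimal NumPy sketch of thresholding count data into presence/absence indicators follows. It assumes that values strictly greater than the threshold map to 1 and everything else to 0; the helper name binarize is hypothetical.

```python
import numpy as np

def binarize(X, threshold=0.0):
    """Return 1.0 where X exceeds the threshold and 0.0 elsewhere."""
    return (np.asarray(X, dtype=float) > threshold).astype(float)

counts = np.array([[0, 3, 1], [2, 0, 0]])
presence = binarize(counts)                      # presence/absence of each term
frequent = binarize(counts, threshold=1.0)       # only terms seen more than once
```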
Parameters
Notes
If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.
This estimator is stateless (besides constructor parameters); the fit method does nothing but is useful when used in a pipeline.
Full API documentation: BinarizerScikitsLearnNode
Linear Support Vector Classification, Sparse Version
This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.LinearSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Similar to SVC with parameter kernel='linear', but uses liblinear internally rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should be faster for huge datasets.
See sklearn.svm.SVC for a complete list of parameters
Notes
For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).
Full API documentation: LinearSVCScikitsLearnNode
Classifier implementing the nearest neighbors vote. (Deprecated)
This node has been automatically generated by wrapping the sklearn.neighbors.classification.NeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
DEPRECATED IN VERSION 0.9; WILL BE REMOVED IN VERSION 0.11. Please use KNeighborsClassifier or RadiusNeighborsClassifier instead.
Samples participating in the vote are either the k-nearest neighbors (for some k) or all neighbors within some fixed radius around the sample to classify.
Parameters
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import NeighborsClassifier
>>> neigh = NeighborsClassifier(n_neighbors=2)
>>> neigh.fit(X, y)
NeighborsClassifier(algorithm='auto', classification_type='knn_vote',
leaf_size=30, n_neighbors=2, radius=1.0)
>>> print neigh.predict([[1.5]])
[0]
See also
NearestNeighbors NeighborsRegressor
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
References
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: NeighborsClassifierScikitsLearnNode
Naive Bayes classifier for multinomial models
This node has been automatically generated by wrapping the sklearn.naive_bayes.MultinomialNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
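To make the role of the counts concrete, here is a small NumPy sketch of a multinomial naive Bayes fit with additive smoothing. The function names are hypothetical and the smoothing scheme (adding alpha to every per-class feature count) is an assumption made for illustration.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate class priors and smoothed per-class feature log probabilities."""
    classes = np.unique(y)
    # Total count of each feature within each class, plus the smoothing term.
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    feature_log_prob = np.log(counts) - np.log(counts.sum(axis=1, keepdims=True))
    class_log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, class_log_prior, feature_log_prob

def predict_multinomial_nb(model, X):
    classes, class_log_prior, feature_log_prob = model
    # The decision function is linear in the counts.
    joint = X @ feature_log_prob.T + class_log_prior
    return classes[np.argmax(joint, axis=1)]

X = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
y = np.array([0, 0, 1, 1])
model = fit_multinomial_nb(X, y)
pred = predict_multinomial_nb(model, np.array([[2, 0, 1], [0, 2, 1]]))
```

The linear form of the decision function is what motivates exposing the log probabilities under names such as coef_ and intercept_.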
Parameters
Methods
Attributes
Empirical log probability of features given a class, P(x_i|y).
(intercept_ and coef_ are properties referring to class_log_prior_ and feature_log_prob_, respectively.)
Examples
>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, Y)
MultinomialNB(alpha=1.0, fit_prior=True)
>>> print clf.predict(X[2])
[3]
References
For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.
Full API documentation: MultinomialNBScikitsLearnNode
Linear Model trained with L1 prior as regularizer (aka the Lasso)
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.Lasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Technically the Lasso model is optimizing the same objective function as the Elastic Net with rho=1.0 (no L2 penalty).
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.Lasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=0.1, fit_intercept=True, max_iter=1000, normalize=False,
overwrite_X=False, precompute='auto', tol=0.0001)
>>> print clf.coef_
[ 0.85 0. ]
>>> print clf.intercept_
0.15
See also
LassoLars decomposition.sparse_encode decomposition.sparse_encode_parallel
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication, the X argument of the fit method should be passed directly as a Fortran-contiguous numpy array.
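Coordinate descent for an L1-penalized least-squares problem can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes the objective (1 / (2 * n_samples)) * ||y - Xw - b||^2 + alpha * ||w||_1, runs a fixed number of sweeps, and handles the intercept by centering. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    # Centering the data lets the intercept be recovered at the end.
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    coef = np.zeros(n_features)
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with the contribution of feature j added back in.
            residual = yc - Xc @ coef + Xc[:, j] * coef[j]
            rho = Xc[:, j] @ residual / n_samples
            z = Xc[:, j] @ Xc[:, j] / n_samples
            coef[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    intercept = y_mean - X_mean @ coef
    return coef, intercept

coef, intercept = lasso_coordinate_descent(
    [[0, 0], [1, 1], [2, 2]], [0, 1, 2], alpha=0.1)
```

Each coordinate update is a one-dimensional problem with a closed-form solution, and the soft-thresholding step is what sets some coefficients exactly (or numerically) to zero.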
Full API documentation: LassoScikitsLearnNode
Filter: select the p-values corresponding to an estimated false discovery rate (FDR).
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFdr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
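One common way to control an estimated false discovery rate is the Benjamini-Hochberg step-up procedure. The NumPy sketch below illustrates that procedure and contrasts it with a plain per-feature threshold on the p-values; it is an assumption that this matches the wrapped SelectFdr class, and the function name select_fdr is hypothetical.

```python
import numpy as np

def select_fdr(pvalues, alpha=0.05):
    """Benjamini-Hochberg selection: boolean mask of the retained features."""
    pvalues = np.asarray(pvalues, dtype=float)
    n = pvalues.size
    order = np.argsort(pvalues)
    # The i-th smallest p-value is compared against alpha * i / n.
    thresholds = alpha * np.arange(1, n + 1) / n
    passed = np.flatnonzero(pvalues[order] <= thresholds)
    support = np.zeros(n, dtype=bool)
    if passed.size:
        # Keep every feature up to the largest rank that passes.
        support[order[: passed[-1] + 1]] = True
    return support

pvalues = [0.041, 0.001, 0.205, 0.008, 0.039, 0.06, 0.074, 0.042]
support = select_fdr(pvalues, alpha=0.05)
# A plain threshold on each p-value keeps more features.
plain_support = np.asarray(pvalues) < 0.05
```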
Full API documentation: SelectFdrScikitsLearnNode
NuSVR for sparse matrices (csr)
This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See sklearn.svm.NuSVR for a complete list of parameters
Notes
For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).
Examples
>>> from sklearn.svm.sparse import NuSVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = NuSVR(nu=0.1, C=1.0)
>>> clf.fit(X, y)
NuSVR(C=1.0, coef0=0.0, degree=3, epsilon=0.1, gamma=0.2, kernel='rbf',
nu=0.1, probability=False, shrinking=True, tol=0.001)
Full API documentation: NuSVRScikitsLearnNode
Feature agglomeration based on Ward hierarchical clustering
This node has been automatically generated by wrapping the sklearn.cluster.hierarchical.WardAgglomeration class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Methods
fit:
- Compute the clustering of features
Attributes
Full API documentation: WardAgglomerationScikitsLearnNode
Full API documentation: SparseBaseLibLinearScikitsLearnNode
Full API documentation: LassoLARSScikitsLearnNode
Principal component analysis (PCA) using randomized SVD
This node has been automatically generated by wrapping the sklearn.decomposition.pca.RandomizedPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using approximated Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
This implementation uses a randomized SVD implementation and can handle both scipy.sparse and numpy dense arrays as input.
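The idea behind a randomized SVD can be sketched with NumPy: project the data onto a few random directions, orthonormalize the result, and run an exact SVD on the much smaller projected matrix. This is a simplified illustration for dense arrays; the parameter names n_oversamples and n_power_iter are hypothetical and do not refer to the wrapped class.

```python
import numpy as np

def randomized_svd(X, n_components, n_oversamples=5, n_power_iter=2, seed=0):
    """Approximate truncated SVD using a random range finder."""
    rng = np.random.RandomState(seed)
    n_random = n_components + n_oversamples
    # Sample the range of X with random Gaussian test vectors.
    Y = X @ rng.randn(X.shape[1], n_random)
    # A few power iterations sharpen the captured subspace.
    for _ in range(n_power_iter):
        Y = X @ (X.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Exact SVD of the small projected matrix.
    U_small, S, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_components], S[:n_components], Vt[:n_components]

rng = np.random.RandomState(0)
X = rng.randn(50, 3) @ rng.randn(3, 20)   # rank-3 data in 20 dimensions
Xc = X - X.mean(axis=0)                   # PCA works on centered data
U, S, Vt = randomized_svd(Xc, n_components=3)
X_reduced = Xc @ Vt.T                     # projection on the leading components
```

On data that is exactly low rank, as here, the approximation agrees with the exact SVD up to rounding; on general data the accuracy depends on how quickly the singular values decay.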
Parameters
When True (False by default) the components_ vectors are divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
Attributes
Examples
>>> import numpy as np
>>> from sklearn.decomposition import RandomizedPCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = RandomizedPCA(n_components=2)
>>> pca.fit(X)
RandomizedPCA(copy=True, iterated_power=3, n_components=2, whiten=False)
>>> print pca.explained_variance_ratio_
[ 0.99244289 0.00755711]
See also
PCA ProbabilisticPCA
Notes
References:
Full API documentation: RandomizedPCAScikitsLearnNode
Classifier implementing the k-nearest neighbors vote.
This node has been automatically generated by wrapping the sklearn.neighbors.classification.KNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print neigh.predict([[1.5]])
[0]
See also
RadiusNeighborsClassifier KNeighborsRegressor RadiusNeighborsRegressor NearestNeighbors
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
References
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: KNeighborsClassifierScikitsLearnNode
Regression based on k-nearest neighbors.
This node has been automatically generated by wrapping the sklearn.neighbors.regression.KNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.
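With uniform weights, local interpolation amounts to averaging the targets of the nearest training samples. The brute-force NumPy sketch below illustrates this under that assumption; knn_regress is a hypothetical helper and uses Euclidean distances.

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, n_neighbors=2):
    """Predict each query target as the mean target of its nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[:n_neighbors]
        predictions.append(y_train[nearest].mean())
    return np.array(predictions)

pred = knn_regress([[0], [1], [2], [3]], [0, 0, 1, 1], [[1.5]])
```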
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print neigh.predict([[1.5]])
[ 0.5]
See also
NearestNeighbors RadiusNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
References
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: KNeighborsRegressorScikitsLearnNode
Sparse Principal Components Analysis (SparsePCA)
This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.SparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Parameters
verbose:
- Degree of verbosity of the printed output.
Attributes
See also
PCA
Full API documentation: SparsePCAScikitsLearnNode
Linear Discriminant Analysis (LDA)
This node has been automatically generated by wrapping the sklearn.lda.LDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> import numpy as np
>>> from sklearn.lda import LDA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LDA()
>>> clf.fit(X, y)
LDA(n_components=None, priors=None)
>>> print clf.predict([[-0.8, -1]])
[1]
See also
QDA
Full API documentation: LDAScikitsLearnNode
Linear model fitted by minimizing a regularized empirical loss with SGD.
This node has been automatically generated by wrapping the sklearn.linear_model.stochastic_gradient.SGDClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
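The per-sample update can be illustrated with a short NumPy sketch for the hinge loss with an L2 penalty. To keep it short it assumes a constant learning rate eta rather than a decreasing schedule, labels coded as -1 and +1, and no shuffling; the function name sgd_hinge is hypothetical.

```python
import numpy as np

def sgd_hinge(X, y, alpha=0.0001, eta=0.1, n_iter=20):
    """Linear classifier trained with per-sample hinge-loss updates."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        for x_i, y_i in zip(X, y):
            # The L2 penalty shrinks the weights towards the zero vector.
            w *= 1.0 - eta * alpha
            # The hinge loss only contributes when the margin is violated.
            if y_i * (x_i @ w + b) < 1.0:
                w += eta * y_i * x_i
                b += eta * y_i
    return w, b

X = np.array([[-1.0, -1.0], [-2.0, -1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = sgd_hinge(X, y)
train_pred = np.sign(X @ w + b)
query_pred = np.sign(np.array([-0.8, -1.0]) @ w + b)
```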
This implementation works with data represented as dense numpy arrays of floating point values for the features.
Parameters
The learning rate:
Attributes
coef_ : array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]
Weights assigned to the features.
Examples
>>> import numpy as np
>>> from sklearn import linear_model
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> clf = linear_model.SGDClassifier()
>>> clf.fit(X, Y)
SGDClassifier(alpha=0.0001, eta0=0.0, fit_intercept=True,
learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1,
penalty='l2', power_t=0.5, rho=1.0, seed=0, shuffle=False,
verbose=0)
>>> print clf.predict([[-0.8, -1]])
[ 1.]
See also
LinearSVC, LogisticRegression
Full API documentation: SGDClassifierScikitsLearnNode
Mini-batch Sparse Principal Components Analysis
This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.MiniBatchSparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Parameters
verbose:
- degree of output the procedure will print
Full API documentation: MiniBatchSparsePCAScikitsLearnNode
Transform a count matrix to a normalized tf or tf–idf representation
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Tf means term-frequency while tf–idf means term-frequency times inverse document-frequency. This is a common term weighting scheme in information retrieval, that has also found good use in document classification.
The goal of using tf–idf instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.
In the SMART notation used in IR, this class implements several tf–idf variants. Tf is always “n” (natural), idf is “t” iff use_idf is given, “n” otherwise, and normalization is “c” iff norm=’l2’, “n” iff norm=None.
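As a rough illustration of the "ntc" variant (natural tf, idf, cosine normalization), here is a minimal pure-Python sketch. The function name `tfidf` and the textbook idf formula `log(N / df)` are assumptions; the wrapped class may use a smoothed idf, so the numbers need not match its output.

```python
import math


def tfidf(counts, use_idf=True, norm="l2"):
    """Weight a count matrix (list of rows, one row per document)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    if use_idf:
        # Document frequency: in how many documents each term occurs.
        df = [sum(1 for doc in counts if doc[j] > 0) for j in range(n_terms)]
        # Textbook idf (an assumption, libraries often add smoothing).
        idf = [math.log(float(n_docs) / d) if d else 0.0 for d in df]
    else:
        idf = [1.0] * n_terms
    rows = []
    for doc in counts:
        row = [tf * weight for tf, weight in zip(doc, idf)]
        if norm == "l2":
            length = math.sqrt(sum(v * v for v in row))
            if length > 0:
                row = [v / length for v in row]
        rows.append(row)
    return rows


# Term 0 occurs in every document, so its idf (and weight) is zero.
counts = [[3, 0, 1], [2, 0, 0], [3, 2, 0]]
weights = tfidf(counts)
```

With `use_idf=False` and `norm=None` the function returns the raw counts, which corresponds to the "nnn" case.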
Parameters
References
Addison Wesley, pp. 68–74.
Full API documentation: TfidfTransformerScikitsLearnNode
Principal component analysis (PCA)
This node has been automatically generated by wrapping the sklearn.decomposition.pca.PCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.
The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.
Parameters
Number of components to keep. if n_components is not set all components are kept:
- n_components == min(n_samples, n_features)
if n_components == ‘mle’, Minka’s MLE is used to guess the dimension
When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
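The effect of projecting with an SVD and of whitening can be sketched with plain NumPy. This is an illustration under assumptions, not the wrapped class: the helper name `pca_svd` is hypothetical, and whitening is implemented here by dividing each output by its standard deviation (variance normalized by n_samples).

```python
import numpy as np


def pca_svd(X, n_components, whiten=False):
    """Project X on its leading principal directions using an SVD."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions; S holds the singular values.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = S ** 2 / X.shape[0]
    ratio = explained_variance / explained_variance.sum()
    projected = np.dot(centered, Vt[:n_components].T)
    if whiten:
        # Assumed scaling: divide each output by its standard deviation.
        projected = projected / np.sqrt(explained_variance[:n_components])
    return projected, ratio[:n_components]


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
plain, ratio = pca_svd(X, n_components=2)
white, _ = pca_svd(X, n_components=2, whiten=True)
```

On this data `ratio` agrees with the `explained_variance_ratio_` values printed in the Examples section, and the columns of `white` are uncorrelated with unit variance, while `plain` keeps the relative variance scales.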
Attributes
Notes
For n_components=’mle’, this class uses the method of Thomas P. Minka:
Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print pca.explained_variance_ratio_
[ 0.99244289 0.00755711]
See also
ProbabilisticPCA RandomizedPCA
Full API documentation: PCAScikitsLearnNode
Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFECV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. The first dimension of the coef_ array must be equal to the number of features of the input dataset of the estimator. Important features must correspond to high absolute values in the coef_ array.
For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
Attributes
Examples
The following example shows how to retrieve the 5 informative features of the Friedman #1 dataset, which are not known a priori.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True,
False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
References
| [1] | Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002. |
Full API documentation: RFECVScikitsLearnNode
Lasso linear model with iterative fitting along a regularization path
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The best model is selected by cross-validation.
Parameters
Notes
See examples/linear_model/lasso_path_with_crossvalidation.py for an example.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
Full API documentation: LassoCVScikitsLearnNode
Filter : Select the p-values corresponding to Family-wise error rate: a
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFwe class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: SelectFweScikitsLearnNode
Bayesian ridge regression
This node has been automatically generated by wrapping the sklearn.linear_model.bayes.BayesianRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Fit a Bayesian ridge model and optimize the regularization parameters lambda (precision of the weights) and alpha (precision of the noise).
Parameters
Attributes
Methods
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.BayesianRidge()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
normalize=False, overwrite_X=False, tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])
Notes
See examples/linear_model/plot_bayesian_ridge.py for an example.
Full API documentation: BayesianRidgeScikitsLearnNode
Ridge regression with built-in cross-validation.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation. Currently, only the n_features > n_samples case is handled efficiently.
Parameters
See also
Ridge
Full API documentation: RidgeCVScikitsLearnNode
Gaussian Mixture Model
This node has been automatically generated by wrapping the sklearn.mixture.gmm.GMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters
Attributes
Covariance parameters for each mixture component. The shape depends on cvtype:
- (n_states,) if ‘spherical’,
- (n_features, n_features) if ‘tied’,
- (n_states, n_features) if ‘diag’,
- (n_states, n_features, n_features) if ‘full’
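The shape rules listed above can be written down as a small lookup. This is only a sketch for checking array shapes; the helper names `covars_shape` and `n_covariance_values` are hypothetical and not part of the wrapped class.

```python
def covars_shape(cvtype, n_states, n_features):
    """Return the expected shape of the covariance parameters."""
    shapes = {
        "spherical": (n_states,),
        "tied": (n_features, n_features),
        "diag": (n_states, n_features),
        "full": (n_states, n_features, n_features),
    }
    if cvtype not in shapes:
        raise ValueError("unknown cvtype: %r" % (cvtype,))
    return shapes[cvtype]


def n_covariance_values(cvtype, n_states, n_features):
    """Count how many numbers are stored for the covariances."""
    count = 1
    for size in covars_shape(cvtype, n_states, n_features):
        count *= size
    return count


# Two mixture components in three dimensions.
full_shape = covars_shape("full", 2, 3)
diag_count = n_covariance_values("diag", 2, 3)
```

Comparing the counts for a given problem size gives a quick idea of how much data each covariance type needs to be estimated reliably.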
Methods
See Also
DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm
VBGMM : Finite Gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.
Examples
>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
... 10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.75, 0.25])
>>> np.round(g.means, 2)
array([[ 10.05],
[ 0.06]])
>>> np.round(g.covars, 2)
array([[[ 1.02]],
[[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.5, 0.5])
Full API documentation: GMMScikitsLearnNode
Extracts patches from a collection of images
This node has been automatically generated by wrapping the sklearn.feature_extraction.image.PatchExtractor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Full API documentation: PatchExtractorScikitsLearnNode
Bayesian ARD regression.
This node has been automatically generated by wrapping the sklearn.linear_model.bayes.ARDRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to follow Gaussian distributions. Also estimate the parameters lambda (precisions of the distributions of the weights) and alpha (precision of the distribution of the noise). The estimation is done by an iterative procedure (Evidence Maximization).
Parameters
Attributes
Methods
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.ARDRegression()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
normalize=False, overwrite_X=False, threshold_lambda=10000.0,
tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])
Notes
See examples/linear_model/plot_ard.py for an example.
Full API documentation: ARDRegressionScikitsLearnNode
Full API documentation: GenericUnivariateSelectScikitsLearnNode
Naive Bayes classifier for multivariate Bernoulli models.
This node has been automatically generated by wrapping the sklearn.naive_bayes.BernoulliNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.
Note: this class does not check whether features are actually boolean.
Parameters
Methods
Attributes
Examples
>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True)
>>> print clf.predict(X[2])
[3]
References
C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234–265.
A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41–48.
V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).
Full API documentation: BernoulliNBScikitsLearnNode
Least Angle Regression model a.k.a. LAR
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.Lars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.Lars(n_nonzero_coefs=1)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
Lars(eps=..., fit_intercept=True, n_nonzero_coefs=1,
normalize=True, overwrite_X=False, precompute='auto', verbose=False)
>>> print clf.coef_
[ 0. -1.11...]
References
http://en.wikipedia.org/wiki/Least_angle_regression
See also
lars_path, LassoLars, LarsCV, LassoLarsCV, decomposition.sparse_encode, decomposition.sparse_encode_parallel
Full API documentation: LarsScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectKBest class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: SelectKBestScikitsLearnNode
Unsupervised Outliers Detection.
This node has been automatically generated by wrapping the sklearn.svm.classes.OneClassSVM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Estimate the support of a high-dimensional distribution.
Parameters
Attributes
Full API documentation: OneClassSVMScikitsLearnNode
Logistic Regression.
This node has been automatically generated by wrapping the sklearn.linear_model.logistic.LogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Implements L1 and L2 regularized logistic regression.
Parameters
Attributes
See also
LinearSVC
Notes
The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to get slightly different results for the same input data. If that happens, try a smaller tol parameter.
References
LIBLINEAR – A Library for Large Linear Classification http://www.csie.ntu.edu.tw/~cjlin/liblinear/
Full API documentation: LogisticRegressionScikitsLearnNode
epsilon-Support Vector Regression.
This node has been automatically generated by wrapping the sklearn.svm.classes.SVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The free parameters in the model are C and epsilon.
Parameters
Attributes
Examples
>>> from sklearn.svm import SVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = SVR(C=1.0, epsilon=0.2)
>>> clf.fit(X, y)
SVR(C=1.0, coef0=0.0, degree=3, epsilon=0.2, gamma=0.2, kernel='rbf',
probability=False, shrinking=True, tol=0.001)
See also
NuSVR
Full API documentation: SVRScikitsLearnNode
NuSVC for sparse matrices (csr).
This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See sklearn.svm.NuSVC for a complete list of parameters
Notes
For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm.sparse import NuSVC
>>> clf = NuSVC()
>>> clf.fit(X, y)
NuSVC(coef0=0.0, degree=3, gamma=0.5, kernel='rbf', nu=0.5, probability=False,
shrinking=True, tol=0.001)
>>> print clf.predict([[-0.8, -1]])
[ 1.]
Full API documentation: NuSVCScikitsLearnNode
The Gaussian Process model class.
This node has been automatically generated by wrapping the sklearn.gaussian_process.gaussian_process.GaussianProcess class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
A regression function returning an array of outputs of the linear regression functional basis. The number of observations n_samples should be greater than the size p of this basis. Default assumes a simple constant regression trend. Here is the list of built-in regression models:
- ‘constant’, ‘linear’, ‘quadratic’
A stationary autocorrelation function returning the autocorrelation between two points x and x’. Default assumes a squared-exponential autocorrelation model. Here is the list of built-in correlation models:
- ‘absolute_exponential’, ‘squared_exponential’,
- ‘generalized_exponential’, ‘cubic’, ‘linear’
A string specifying the optimization algorithm to be used. Default uses ‘fmin_cobyla’ algorithm from scipy.optimize. Here is the list of available optimizers:
- ‘fmin_cobyla’, ‘Welch’
The ‘Welch’ optimizer is due to Welch et al., see reference [2]. It consists of iterating over several one-dimensional optimizations instead of running one single multi-dimensional optimization.
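The idea of replacing one multi-dimensional optimization by a sequence of one-dimensional ones can be sketched generically. The code below is not the wrapped optimizer: it minimizes a hypothetical convex test function with a ternary search along each axis, and every name in it is an assumption.

```python
def minimize_1d(f, lo, hi, n_iter=100):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(n_iter):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def coordinate_wise_minimize(f, x0, bounds, n_sweeps=30):
    """Cycle over the coordinates, optimizing one of them at a time."""
    x = list(x0)
    for _ in range(n_sweeps):
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(t, i=i):
                trial = list(x)
                trial[i] = t
                return f(trial)
            x[i] = minimize_1d(along_axis, lo, hi)
    return x


def objective(p):
    """Hypothetical convex test function with a coupling term."""
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.5 * x * y


best = coordinate_wise_minimize(objective, [0.0, 0.0], [(-10, 10), (-10, 10)])
```

For this test function the sweeps converge to the minimum at (1.6, -2.4). Each sweep only needs one-dimensional searches, which is the trade-off the description above refers to.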
Example
>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcess
>>> X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
>>> y = (X * np.sin(X)).ravel()
>>> gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1.)
>>> gp.fit(X, y)
GaussianProcess(beta0=None, corr=...,
normalize=..., nugget=...,
...
Implementation details
The present implementation is based on a translation of the DACE Matlab toolbox, see reference [1].
References
Full API documentation: GaussianProcessScikitsLearnNode
Linear Model trained with L1 and L2 prior as regularizer
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
rho=1 is the lasso penalty. Currently, rho <= 0.01 is not reliable, unless you supply your own sequence of alpha.
Parameters
Notes
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the penalty is:
alpha*rho*L1 + alpha*(1-rho)*L2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a*L1 + b*L2
for:
alpha = a + b and rho = a/(a+b)
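The conversion between the two parametrizations can be checked numerically. This is a small sketch with hypothetical helper names (`to_alpha_rho`, `penalty`); `l1` and `l2` stand for the values of the L1 and L2 terms of some coefficient vector.

```python
def to_alpha_rho(a, b):
    """Convert separate L1 and L2 strengths (a, b) into (alpha, rho)."""
    alpha = a + b
    rho = a / float(a + b)
    return alpha, rho


def penalty(alpha, rho, l1, l2):
    """Elastic net penalty in the (alpha, rho) parametrization."""
    return alpha * rho * l1 + alpha * (1 - rho) * l2


a, b = 0.3, 0.1
alpha, rho = to_alpha_rho(a, b)
l1, l2 = 2.0, 5.0
combined = penalty(alpha, rho, l1, l2)
separate = a * l1 + b * l2
```

Both expressions give the same value, so choosing `a` and `b` directly and converting them is equivalent to tuning `alpha` and `rho`.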
Full API documentation: ElasticNetScikitsLearnNode
Regression based on nearest neighbors. (Deprecated)
This node has been automatically generated by wrapping the sklearn.neighbors.regression.NeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
DEPRECATED IN VERSION 0.9; WILL BE REMOVED IN VERSION 0.11. Please use KNeighborsRegressor or RadiusNeighborsRegressor instead.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set. Samples used for the regression are either the k-nearest points, or all points within some fixed radius.
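The k-nearest variant of this local interpolation can be sketched in a few lines of pure Python. The function name `knn_regress` is hypothetical, and the sketch uses a brute-force search with a plain average, which is only one possible choice.

```python
def knn_regress(X, y, query, n_neighbors=2):
    """Predict the target of query as the mean target of its nearest points."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Brute-force search: sort all training points by distance to the query.
    order = sorted(range(len(X)), key=lambda i: squared_distance(X[i], query))
    nearest = order[:n_neighbors]
    return sum(y[i] for i in nearest) / float(len(nearest))


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
prediction = knn_regress(X, y, [1.5], n_neighbors=2)
```

The two nearest points to 1.5 are 1 and 2, with targets 0 and 1, so the prediction is 0.5.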
Parameters
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import NeighborsRegressor
>>> neigh = NeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
NeighborsRegressor(algorithm='auto', classification_type='knn_vote',
leaf_size=30, n_neighbors=2, radius=1.0)
>>> print neigh.predict([[1.5]])
[ 0.5]
See also
NearestNeighbors KNeighborsRegressor RadiusNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
References
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: NeighborsRegressorScikitsLearnNode
Orthogonal Matching Pursuit model (OMP)
This node has been automatically generated by wrapping the sklearn.linear_model.omp.OrthogonalMatchingPursuit class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Notes
Orthogonal matching pursuit was introduced in G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, Vol. 41, No. 12. (December 1993), pp. 3397-3415. (http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf)
This implementation is based on Rubinstein, R., Zibulevsky, M. and Elad, M., Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit Technical Report - CS Technion, April 2008. http://www.cs.technion.ac.il/~ronrubin/Publications/KSVX-OMP-v2.pdf
See also
orthogonal_mp orthogonal_mp_gram lars_path Lars LassoLars decomposition.sparse_encode decomposition.sparse_encode_parallel
Full API documentation: OrthogonalMatchingPursuitScikitsLearnNode
PLS regression
This node has been automatically generated by wrapping the sklearn.pls.PLSRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
PLSRegression inherits from PLS with mode=”A” and deflation_mode=”regression”. Also known as PLS2, or PLS in the case of a one-dimensional response.
Parameters
Attributes
Notes
For each component k, find weights u, v that optimize:
max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = |v| = 1
Note that it maximizes both the correlations between the scores and the intra-block variances.
The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.
The residual matrix of Y (Yk+1) block is obtained by deflation on the current X score. This performs the PLS regression known as PLS2. This mode is prediction oriented.
Examples
>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> pls2 = PLSRegression(n_components=2)
>>> pls2.fit(X, Y)
PLSRegression(algorithm='nipals', copy=True, max_iter=500, n_components=2,
scale=True, tol=1e-06)
>>> Y_pred = pls2.predict(X)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
In French, but still a reference:
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris:
Editions Technic.
Full API documentation: PLSRegressionScikitsLearnNode
PLS canonical. PLSCanonical inherits from PLS with mode=”A” and deflation_mode=”canonical”.
This node has been automatically generated by wrapping the sklearn.pls.PLSCanonical class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
n_components: int, number of components to keep. (default 2).
scale: boolean, scale data? (default True)
Attributes
Notes
For each component k, find weights u, v that optimize:
max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = |v| = 1
Note that it maximizes both the correlations between the scores and the intra-block variances.
The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.
The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score. This performs a canonical symmetric version of the PLS regression, which is slightly different from CCA. This mode is mostly used for modeling.
Examples
>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> plsca = PLSCanonical(n_components=2)
>>> plsca.fit(X, Y)
PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2,
scale=True, tol=1e-06)
>>> X_c, Y_c = plsca.transform(X, Y)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris:
Editions Technic.
See also
CCA PLSSVD
Full API documentation: PLSCanonicalScikitsLearnNode
Additional layer on top of PCA that adds a probabilistic evaluation
This node has been automatically generated by wrapping the sklearn.decomposition.pca.ProbabilisticPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Principal component analysis (PCA)
Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.
The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.
Parameters
Number of components to keep. if n_components is not set all components are kept:
- n_components == min(n_samples, n_features)
if n_components == ‘mle’, Minka’s MLE is used to guess the dimension
When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
Attributes
Notes
For n_components=’mle’, this class uses the method of Thomas P. Minka:
Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print pca.explained_variance_ratio_
[ 0.99244289 0.00755711]
See also
ProbabilisticPCA RandomizedPCA
Full API documentation: ProbabilisticPCAScikitsLearnNode
Ordinary least squares Linear Regression.
This node has been automatically generated by wrapping the sklearn.linear_model.base.LinearRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Attributes
Notes
From the implementation point of view, this is just plain Ordinary Least Squares (numpy.linalg.lstsq) wrapped as a predictor object.
Full API documentation: LinearRegressionScikitsLearnNode
Regression based on neighbors within a fixed radius.
This node has been automatically generated by wrapping the sklearn.neighbors.regression.RadiusNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
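The role of the weight function can be sketched with a brute-force radius search. This is an illustration only: the function name `radius_regress` and the two weighting modes `"uniform"` and `"distance"` (inverse distance) are assumptions, not a description of the wrapped class.

```python
def radius_regress(X, y, query, radius=1.0, weights="uniform"):
    """Average the targets of all training points within radius of query."""
    total = 0.0
    weight_sum = 0.0
    for point, target in zip(X, y):
        dist = sum((p - q) ** 2 for p, q in zip(point, query)) ** 0.5
        if dist > radius:
            continue  # outside the neighborhood
        if weights == "distance":
            if dist == 0:
                return float(target)  # exact match dominates
            w = 1.0 / dist  # closer points count more
        else:
            w = 1.0  # uniform weights
        total += w * target
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no training point within the given radius")
    return total / weight_sum


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
uniform = radius_regress(X, y, [1.2], radius=1.0)
weighted = radius_regress(X, y, [1.2], radius=1.0, weights="distance")
```

For the query 1.2 the neighbors are the points 1 and 2. Uniform weights give 0.5, while inverse-distance weights pull the prediction to 0.2, towards the target of the closer point.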
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsRegressor
>>> neigh = RadiusNeighborsRegressor(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsRegressor(...)
>>> print neigh.predict([[1.5]])
[ 0.5]
See also
NearestNeighbors KNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
References
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: RadiusNeighborsRegressorScikitsLearnNode
Binarize labels in a one-vs-all fashion
This node has been automatically generated by wrapping the sklearn.preprocessing.LabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Several regression and binary classification algorithms are available in the scikit. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.
At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belongs or does not belong to the class). LabelBinarizer makes this process easy with the transform method.
At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
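The two conversions used by the one-vs-all scheme can be sketched in pure Python. The function names below are hypothetical stand-ins for the fit, transform and inverse_transform steps, and the sketch only handles one label per sample.

```python
def fit_classes(labels):
    """Collect the sorted set of distinct class labels."""
    return sorted(set(labels))


def binarize(labels, classes):
    """One column per class: 1.0 where the sample belongs to it, else 0.0."""
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]


def inverse(scores, classes):
    """Assign each sample to the class whose model gave the greatest score."""
    result = []
    for row in scores:
        best = max(range(len(row)), key=lambda j: row[j])
        result.append(classes[best])
    return result


classes = fit_classes([1, 2, 6, 4, 2])
indicator = binarize([1, 6], classes)
# Hypothetical confidence scores produced by four binary models.
decoded = inverse([[0.1, 0.7, 0.2, 0.0], [0.0, 0.1, 0.3, 0.9]], classes)
```

Each column of `indicator` is the binary target for one per-class model, and `inverse` maps the scores of those models back to a single class label.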
Attributes
Examples
>>> from sklearn import preprocessing
>>> clf = preprocessing.LabelBinarizer()
>>> clf.fit([1, 2, 6, 4, 2])
LabelBinarizer()
>>> clf.classes_
array([1, 2, 4, 6])
>>> clf.transform([1, 6])
array([[ 1., 0., 0., 0.],
[ 0., 0., 0., 1.]])
>>> clf.fit_transform([(1, 2), (3,)])
array([[ 1., 1., 0.],
[ 0., 0., 1.]])
>>> clf.classes_
array([1, 2, 3])
Full API documentation: LabelBinarizerScikitsLearnNode
Mini-batch dictionary learning
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.MiniBatchDictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*, V^*) = argmin_(U,V) 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
with || V_k ||_2 = 1 for all 0 <= k < n_atoms
Parameters
verbose:
- degree of verbosity of the printed output
Attributes
References
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also
sklearn.decomposition.SparsePCA which solves the transposed problem, finding sparse components to represent data.
Full API documentation: MiniBatchDictionaryLearningScikitsLearnNode
C-Support Vector Classification.
This node has been automatically generated by wrapping the sklearn.svm.classes.SVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> clf.fit(X, y)
SVC(C=1.0, coef0=0.0, degree=3, gamma=0.5, kernel='rbf', probability=False,
  shrinking=True, tol=0.001)
>>> print(clf.predict([[-0.8, -1]]))
[ 1.]
See also
SVR, LinearSVC
Full API documentation: SVCScikitsLearnNode
Partial Least Square SVD
This node has been automatically generated by wrapping the sklearn.pls.PLSSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Simply performs an SVD on the cross-covariance matrix X’Y. There is no iterative deflation here.
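A minimal NumPy sketch of this idea follows. It assumes the data are only centered (no scaling), and the function name `pls_svd` and the toy data are made up for illustration; it is not the wrapped PLSSVD implementation.

```python
import numpy as np

def pls_svd(X, Y, n_components):
    """Project X and Y on the leading singular vectors of X'Y (after centering)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc                                   # cross-covariance, up to a 1/n factor
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    x_weights = U[:, :n_components]                 # left singular vectors
    y_weights = Vt[:n_components].T                 # right singular vectors
    return Xc @ x_weights, Yc @ y_weights, x_weights, y_weights, s[:n_components]

rng = np.random.RandomState(0)
X = rng.randn(20, 4)
Y = X[:, :3] + 0.1 * rng.randn(20, 3)               # toy targets correlated with X
x_scores, y_scores, x_weights, y_weights, s = pls_svd(X, Y, n_components=2)
```

Because all components come from a single decomposition, the weight vectors are orthonormal and no deflation step is needed between components.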
Parameters
Attributes
See also
PLSCanonical, CCA
Full API documentation: PLSSVDScikitsLearnNode
Gaussian Naive Bayes (GaussianNB)
This node has been automatically generated by wrapping the sklearn.naive_bayes.GaussianNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Methods
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print(clf.predict([[-0.8, -1]]))
[1]
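For readers who want to see what a Gaussian naive Bayes classifier computes, here is a hedged NumPy sketch on the same toy data. It assumes per-class, per-feature Gaussians with maximum-likelihood variances and no variance smoothing; `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical names, not part of the wrapped class.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class feature means, variances and class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors

def predict_gaussian_nb(model, X):
    """Pick the class maximizing log prior + sum of per-feature Gaussian log densities."""
    classes, means, variances, priors = model
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]              # (n_samples, n_classes, n_features)
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances)[None]
                            + diff ** 2 / variances[None], axis=2)
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Y = np.array([1, 1, 1, 2, 2, 2])
model = fit_gaussian_nb(X, Y)
pred = predict_gaussian_nb(model, [[-0.8, -1], [2, 1]])
```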
Full API documentation: GaussianNBScikitsLearnNode
Classifier implementing a vote among neighbors within a given radius
This node has been automatically generated by wrapping the sklearn.neighbors.classification.RadiusNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
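weights:
- weight function used in prediction. Possible values: 'uniform', 'distance', or a user-defined callable. Uniform weights are used by default.
algorithm:
- Algorithm used to compute the nearest neighbors: 'ball_tree', 'kd_tree', 'brute' or 'auto'.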
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> neigh = RadiusNeighborsClassifier(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsClassifier(...)
>>> print(neigh.predict([[1.5]]))
[0]
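The vote itself can be sketched in a few lines of NumPy. This sketch assumes uniform weights, Euclidean distance, and a tie-break in favour of the smallest label; the function name `radius_vote` and the error raised when no neighbor is found are choices made for this example only.

```python
import numpy as np

def radius_vote(X_train, y_train, x, radius):
    """Majority vote among the training samples lying within `radius` of x."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dist = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    inside = dist <= radius
    if not inside.any():
        raise ValueError("no training sample within the given radius")
    labels, counts = np.unique(y_train[inside], return_counts=True)
    return labels[np.argmax(counts)]   # on a tie, argmax keeps the smallest label

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
label_a = radius_vote(X, y, [1.5], radius=1.0)   # neighbors at 1 and 2: one vote each
label_b = radius_vote(X, y, [2.6], radius=1.0)   # neighbors at 2 and 3: both labelled 1
```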
See also
KNeighborsClassifier, RadiusNeighborsRegressor, KNeighborsRegressor, NearestNeighbors
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
References
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: RadiusNeighborsClassifierScikitsLearnNode
Cross-validated Lasso, using the LARS algorithm
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Notes
The object solves the same problem as the LassoCV object. However, unlike LassoCV, it finds the relevant alpha values by itself. In general, because of this property, it will be more stable. However, it is more fragile on heavily multicollinear datasets.
It is more efficient than the LassoCV if only a small number of features are selected compared to the total number, for instance if there are very few samples compared to the number of features.
See also
lars_path, LassoLars, LarsCV, LassoCV
Full API documentation: LassoLarsCVScikitsLearnNode
Full API documentation: LARSScikitsLearnNode
Lasso model fit with Lars using BIC or AIC for model selection
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsIC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
AIC is the Akaike information criterion and BIC is the Bayesian information criterion. Such criteria are useful for selecting the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model. A good model should explain the data well while being simple.
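The trade-off can be made concrete with a small sketch. It assumes the common Gaussian-noise form n * log(RSS / n) + K * df, with K = 2 for AIC and K = log(n) for BIC, which may differ by constants from what the wrapped class computes. The function name `information_criterion` and the candidate values are hypothetical.

```python
import math

def information_criterion(rss, n_samples, df, criterion="aic"):
    """n * log(RSS / n) + K * df, with K = 2 (AIC) or K = log(n) (BIC)."""
    if criterion == "aic":
        penalty = 2.0
    elif criterion == "bic":
        penalty = math.log(n_samples)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    return n_samples * math.log(rss / n_samples) + penalty * df

n_samples = 100
# Hypothetical candidate models: (residual sum of squares, degrees of freedom).
candidates = [(50.0, 1), (20.0, 3), (19.0, 5)]

best_aic = min(candidates, key=lambda c: information_criterion(c[0], n_samples, c[1], "aic"))
best_bic = min(candidates, key=lambda c: information_criterion(c[0], n_samples, c[1], "bic"))
```

With these numbers, BIC penalizes the two extra degrees of freedom more heavily than AIC does, so the two criteria select different candidates.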
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.LassoLarsIC(criterion='bic')
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
LassoLarsIC(criterion='bic', eps=..., fit_intercept=True,
      max_iter=500, normalize=True, overwrite_X=False, precompute='auto',
      verbose=False)
>>> print(clf.coef_)
[ 0. -1.11...]
References
The estimation of the number of degrees of freedom is given by:
“On the degrees of freedom of the lasso”, Hui Zou, Trevor Hastie, and Robert Tibshirani, Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.
http://en.wikipedia.org/wiki/Akaike_information_criterion
http://en.wikipedia.org/wiki/Bayesian_information_criterion
See also
lars_path, LassoLars, LassoLarsCV
Full API documentation: LassoLarsICScikitsLearnNode
Quadratic Discriminant Analysis (QDA)
This node has been automatically generated by wrapping the sklearn.qda.QDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> from sklearn.qda import QDA
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QDA()
>>> clf.fit(X, y)
QDA(priors=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]
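As a sketch of what quadratic discriminant analysis does on this toy data, the code below fits one Gaussian with a full covariance matrix per class and picks the class with the highest log posterior. It assumes unbiased covariance estimates and empirical priors; `fit_qda` and `predict_qda` are hypothetical helper names, not the wrapped implementation.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate mean, covariance and prior for each class."""
    classes = np.unique(y)
    params = []
    for c in classes:
        Xc = X[y == c]
        params.append((Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X)))
    return classes, params

def predict_qda(model, X):
    """Pick the class with the largest Gaussian log density plus log prior."""
    classes, params = model
    X = np.asarray(X, dtype=float)
    scores = []
    for mean, cov, prior in params:
        diff = X - mean
        maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (maha + logdet) + np.log(prior))
    return classes[np.argmax(np.array(scores), axis=0)]

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])
model = fit_qda(X, y)
pred = predict_qda(model, [[-0.8, -1], [2, 1]])
```

The decision boundary is quadratic in general because each class keeps its own covariance matrix; when the covariances are equal, as they happen to be for this symmetric toy data, the boundary becomes linear.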
See also
LDA
Full API documentation: QDAScikitsLearnNode
Full API documentation: RidgeClassifierCVScikitsLearnNode