Full API documentation: nodes
Filter the input data through the most significant of its principal components.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.v
- Transpose of the projection matrix (available after training).
- self.d
- Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
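The relation between a fractional output_dim, self.d and self.explained_variance can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name select_components and the sample eigenvalues are hypothetical, and the code does not reproduce MDP's internal implementation.

```python
def select_components(eigenvalues, desired_fraction):
    """Pick the smallest number of components reaching the desired variance fraction.

    Hypothetical helper: returns (number of components, fraction actually explained).
    """
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= desired_fraction:
            return count, cumulative / total
    return len(ordered), 1.0


# Made-up eigenvalues of a covariance matrix; 85% of the variance is requested.
n_components, explained = select_components([4.0, 3.0, 2.0, 1.0], 0.85)
print(n_components, explained)
```

Because components are kept whole, the fraction actually explained is usually somewhat larger than the fraction requested.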
Full API documentation: PCANode
Whiten the input data by filtering it through the most significant of its principal components. All output signals have zero mean, unit variance and are decorrelated.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.v
- Transpose of the projection matrix (available after training).
- self.d
- Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Full API documentation: WhiteningNode
Perform Principal Component Analysis using the NIPALS algorithm. This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to a certain fraction of the total variance, say 90%, may help.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.d
- Variance corresponding to the PCA components.
- self.v
- Transpose of the projection matrix (available after training).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. in David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp 411-444 (1966).
More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).
Full API documentation: NIPALSNode
Perform Independent Component Analysis using the FastICA algorithm. Note that FastICA is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).
Reference: Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis IEEE Transactions on Neural Networks, 10(3):626-634.
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: FastICANode
Perform Independent Component Analysis using the CuBICA algorithm. Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).
Reference: Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: CuBICANode
Perform Independent Component Analysis using the TDSEP algorithm. Note that TDSEP, as implemented in this Node, is an online algorithm, i.e. it is suited to be trained on huge data sets, provided that training is done by sending small chunks of data at a time.
Reference: Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP an efficient algorithm for blind separation using time structure. in Niklasson, L, Boden, M, and Ziemke, T (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: TDSEPNode
Perform Independent Component Analysis using the JADE algorithm. Note that JADE is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
JADE does not support the telescope mode.
Main references:
- Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.
- Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.
Original code contributed by: Gabriel Beckers (2008).
Full API documentation: JADENode
Extract the slowly varying components from the input data. More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Instance variables of interest
- self.avg
- Mean of the input data (available after training)
- self.sf
- Matrix of the SFA filters (available after training)
- self.d
- Delta values corresponding to the SFA components (generalized eigenvalues). [See the docs of the get_eta_values method for more information]
Special arguments for constructor
- include_last_sample
If False, the train method discards the last sample in every chunk during training when calculating the covariance matrix. The last sample is in this case only used for calculating the covariance matrix of the derivatives. The switch should be set to False if you plan to train with several small chunks. For example, we can split a sequence (index is time):
    x_1 x_2 x_3 x_4
into smaller parts like this:
    x_1 x_2    x_2 x_3    x_3 x_4
The SFANode will see 3 derivatives for the temporal covariance matrix, and the first 3 points for the spatial covariance matrix. Of course you will need to use a generator that connects the small chunks (the last sample needs to be sent again in the next chunk). If include_last_sample were True, depending on the generator you use, you would either get:
    x_1 x_2    x_2 x_3    x_3 x_4
in which case the last sample of every chunk would be used twice when calculating the covariance matrix, or:
    x_1 x_2    x_3 x_4
in which case you lose the derivative between x_3 and x_2.
If you plan to train with a single big chunk leave include_last_sample to the default value, i.e. True.
You can even change this behaviour during training. Just set the corresponding switch in the train method.
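A generator that connects small chunks by resending the last sample could look like the sketch below. The function name overlapping_chunks is hypothetical and the samples are placeholder strings; in real use each chunk would be an array passed to the train method with include_last_sample set to False.

```python
def overlapping_chunks(sequence, chunk_size):
    """Yield consecutive chunks; each one starts with the last sample of the previous chunk."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 to contain one derivative")
    step = chunk_size - 1  # advance by one less than the chunk size to create the overlap
    for start in range(0, len(sequence) - 1, step):
        yield sequence[start:start + chunk_size]


chunks = list(overlapping_chunks(["x_1", "x_2", "x_3", "x_4"], 2))
# Each chunk of length n contributes n - 1 derivatives.
n_derivatives = sum(len(chunk) - 1 for chunk in chunks)
print(chunks, n_derivatives)
```

With this chunking no derivative is lost, and discarding the last sample of each chunk leaves every point except the final one counted exactly once in the spatial covariance matrix.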
Full API documentation: SFANode
Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components. The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.
More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Full API documentation: SFA2Node
Perform Independent Slow Feature Analysis on the input data.
Internal variables of interest
- self.RP
- The global rotation-permutation matrix. This is the filter applied on input_data to get output_data
- self.RPC
- The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the ‘outer space’ is retained)
- self.covs
- An mdp.utils.MultipleCovarianceMatrices instance containing the current time-delayed covariance matrices of the input_data. After convergence the uppermost output_dim x output_dim submatrices should be almost diagonal.
self.covs[n-1] is the covariance matrix relative to the n-th time-lag.
Note: they are not cleared after convergence. If you need to free some memory, you can safely delete them with:
>>> del self.covs
- self.initial_contrast
- A dictionary with the starting contrast and the SFA and ICA parts of it.
- self.final_contrast
- Like the above but after convergence.
Note: If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.
References: Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation 19(4):994-1021. http://itb.biologie.hu-berlin.de/~wiskott/Publications/BlasZitoWisk2007-ISFA-NeurComp.pdf
Full API documentation: ISFANode
Perform Non-linear Blind Source Separation using Slow Feature Analysis.
This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).
The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:
>>> flow = mdp.Flow([XSFANode()])
>>> flow.train(x)
doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info).
If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.
If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:
>>> flow = mdp.Flow([XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)
this will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.
References: Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskott-Cogprints-2010.pdf
Full API documentation: XSFANode
Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.
FDANode has two training phases and is supervised so make sure to pay attention to the following points when you train it:
More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.
Internal variables of interest
- self.avg
- Mean of the input data (available after training)
- self.v
- Transpose of the projection matrix, so that output = dot(input-self.avg, self.v) (available after training).
Full API documentation: FDANode
Perform Factor Analysis.
The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.
The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.
Internal variables of interest
- self.mu
- Mean of the input data (available after training)
- self.A
- Generating weights (available after training)
- self.E_y_mtx
- Weights for Maximum A Posteriori inference
- self.sigma
- Vector of estimated variance of the noise for all input components
More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.
Full API documentation: FANode
Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite into observed (visible) and hidden (latent) variables.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.
Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800
Internal variables of interest
- self.w
- Generative weights between hidden and observed variables
- self.bv
- bias vector of the observed variables
- self.bh
- bias vector of the hidden variables
For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668
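For a binary RBM, the probability that a hidden unit is on given the visible units is the logistic function of its bias plus the weighted input. The sketch below illustrates that standard formula in plain Python. The function name hidden_probabilities, the example values, and the assumption that w is laid out as a visible-by-hidden matrix are all illustrative and not taken from the node's source.

```python
import math


def hidden_probabilities(v, w, bh):
    """Return P(h_j = 1 | v) for every hidden unit j of a binary RBM.

    v  : list of visible unit states
    w  : weights, assumed here to be indexed as w[visible][hidden]
    bh : bias of each hidden unit
    """
    probs = []
    for j in range(len(bh)):
        activation = bh[j] + sum(v[i] * w[i][j] for i in range(len(v)))
        probs.append(1.0 / (1.0 + math.exp(-activation)))  # logistic sigmoid
    return probs


v = [1, 0, 1]
w = [[0.5, -1.0], [0.3, 0.2], [-0.5, 1.0]]
bh = [0.0, 1.0]
probs = hidden_probabilities(v, w, bh)
print(probs)
```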
Full API documentation: RBMNode
Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.
Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800
Internal variables of interest:
- self.w
- Generative weights between hidden and observed variables
- self.bv
- bias vector of the observed variables
- self.bh
- bias vector of the hidden variables
For more information on RBMs with labels, see
- Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
- Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Full API documentation: RBMWithLabelsNode
Learn the topological structure of the input data by building a corresponding graph approximation.
The algorithm expands on the original Neural Gas algorithm (see mdp.nodes.NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.
More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.
Attributes and methods of interest
Full API documentation: GrowingNeuralGasNode
Perform a Locally Linear Embedding analysis on the data.
Internal variables of interest
- self.training_projection
- The LLE projection of the training data (defined when training finishes).
- self.desired_variance
- variance limit used to compute intrinsic dimensionality.
Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. deRidder and R.P.W. Duin.
References: Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.
Original code contributed by: Jake VanderPlas, University of Washington
Full API documentation: LLENode
Perform a Hessian Locally Linear Embedding analysis on the data.
Internal variables of interest
- self.training_projection
- the HLLE projection of the training data (defined when training finishes)
- self.desired_variance
- variance limit used to compute intrinsic dimensionality.
Implementation based on algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.
Original code contributed by: Jake VanderPlas, University of Washington
Full API documentation: HLLENode
Compute least-square, multivariate linear regression on the input data, i.e., learn coefficients b_j so that:
y_i = b_0 + b_1 x_1 + ... + b_N x_N,
for i = 1 ... M, minimizes the square error given the training x's and y's.
This is a supervised learning node, and requires input data x and target data y to be supplied during training (see train docstring).
Internal variables of interest
- self.beta
- The coefficients of the linear regression
Full API documentation: LinearRegressionNode
Perform expansion in the space formed by all linear and quadratic monomials. QuadraticExpansionNode() is equivalent to PolynomialExpansionNode(2).
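To make the expansion concrete, the sketch below lists all linear and quadratic monomials of a single input vector in plain Python. The function name quadratic_expansion is hypothetical, and the ordering of the monomials is an assumption that may differ from the ordering the node produces.

```python
from itertools import combinations_with_replacement


def quadratic_expansion(x):
    """Return the linear terms of x followed by all products x_i * x_j with i <= j."""
    linear = list(x)
    quadratic = [a * b for a, b in combinations_with_replacement(x, 2)]
    return linear + quadratic


# For x = (x_1, x_2) the result is [x_1, x_2, x_1^2, x_1*x_2, x_2^2].
expanded = quadratic_expansion([2.0, 3.0])
print(expanded)
```

An input of dimension n expands to n + n(n+1)/2 output dimensions, so the output grows quadratically with the input dimension.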
Full API documentation: QuadraticExpansionNode
Perform expansion in a polynomial space.
Full API documentation: PolynomialExpansionNode
Expand input space with Gaussian Radial Basis Functions (RBFs).
The input data is filtered through a set of unnormalized Gaussian filters, i.e.:
y_j = exp(-0.5/s_j * ||x - c_j||^2)
for isotropic RBFs, or, more generally:
y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))
for anisotropic RBFs.
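The isotropic formula can be evaluated directly, as in the plain-Python sketch below. The function name isotropic_rbf and the example centers and sizes are hypothetical; the node itself operates on arrays of many input points at once.

```python
import math


def isotropic_rbf(x, centers, sizes):
    """Evaluate y_j = exp(-0.5 / s_j * ||x - c_j||^2) for every center c_j."""
    outputs = []
    for center, size in zip(centers, sizes):
        squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        outputs.append(math.exp(-0.5 / size * squared_distance))
    return outputs


# One input point, two RBFs: the first is centered exactly on the point.
y = isotropic_rbf([1.0, 0.0], centers=[[1.0, 0.0], [0.0, 0.0]], sizes=[1.0, 2.0])
print(y)
```

Because the filters are unnormalized, a point lying exactly on a center always yields an output of 1, whatever the size s_j.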
Full API documentation: RBFExpansionNode
Expands the input signal x according to a list [f_0, ... f_k] of functions.
Each function f_i should take the whole two-dimensional array x as input and output another two-dimensional array. Moreover the output dimension should depend only on the input dimension. The output of the node is [f_0[x], ... f_k[x]], that is, the concatenation of each one of the outputs f_i[x].
Original code contributed by Alberto Escalante.
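The concatenation of the function outputs can be sketched with nested lists standing in for two-dimensional arrays. The names general_expansion, identity and squares are hypothetical and only illustrate the idea; they are not part of the node's interface.

```python
def general_expansion(x, funcs):
    """Apply every function to the whole input and concatenate the results row by row."""
    outputs = [f(x) for f in funcs]  # each output has one row per input row
    return [sum((out[row] for out in outputs), []) for row in range(len(x))]


def identity(x):
    return [list(row) for row in x]


def squares(x):
    return [[value ** 2 for value in row] for row in x]


x = [[1.0, 2.0], [3.0, 4.0]]
expanded = general_expansion(x, [identity, squares])
print(expanded)
```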
Full API documentation: GeneralExpansionNode
Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.
- positions of RBFs
- position of the nodes of the neural gas
- sizes of the RBFs
- mean distance to the neighbouring nodes.
Important: Adjust the maximum number of nodes to control the dimension of the expansion.
More information on this expansion type can be found in: B. Fritzke. Growing cell structures-a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).
Full API documentation: GrowingNeuralGasExpansionNode
Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).
The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland, 1991.
Attributes and methods of interest
Full API documentation: NeuralGasNode
This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if the sum is negative.
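The rule amounts to taking the sign of the sum of each data point, as in this plain-Python sketch. The function name signum_labels is hypothetical, and returning 0 for a sum of exactly zero is an assumption, since the description does not cover that case.

```python
def signum_labels(data):
    """Label each data point with the sign of the sum of its components."""
    labels = []
    for point in data:
        total = sum(point)
        # (total > 0) - (total < 0) gives 1, -1 or 0 (assumed behaviour for a zero sum)
        labels.append((total > 0) - (total < 0))
    return labels


labels = signum_labels([[1.0, 2.0], [-3.0, 1.0], [0.5, -0.5]])
print(labels)
```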
Full API documentation: SignumClassifier
A simple perceptron with input_dim input nodes.
Full API documentation: PerceptronClassifier
A simple version of a Markov classifier. It can be trained on a vector of tuples, the label being the next element in the testing data.
Full API documentation: SimpleMarkovClassifier
Node for simulating a simple discrete Hopfield model.
Full API documentation: DiscreteHopfieldClassifier
Employs K-Means Clustering for a given number of centroids.
Full API documentation: KMeansClassifier
Make the input signal mean-free with unit variance.
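For a single one-dimensional signal the normalization can be sketched as below. The function name normalize is hypothetical, and using the population standard deviation is an assumption; the node may use a different estimator.

```python
import statistics


def normalize(signal):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(signal)
    std = statistics.pstdev(signal)  # assumption: population estimator
    return [(value - mean) / std for value in signal]


normalized = normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(normalized)
```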
Full API documentation: NormalizeNode
Perform a supervised Gaussian classification.
Given a set of labelled data, the node fits a Gaussian distribution to each class.
Full API documentation: GaussianClassifier
Nearest-Mean classifier.
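A nearest-mean classifier stores the mean of each class and assigns a new point to the class whose mean is closest. The sketch below shows the idea in plain Python. The function names train_nearest_mean and classify and the use of the Euclidean distance are assumptions made for illustration, not the node's interface.

```python
def train_nearest_mean(points, labels):
    """Return a dictionary mapping each label to the mean of its training points."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(point, means):
    """Return the label whose class mean has the smallest squared Euclidean distance."""
    def squared_distance(mean):
        return sum((p - m) ** 2 for p, m in zip(point, mean))
    return min(means, key=lambda label: squared_distance(means[label]))


means = train_nearest_mean(
    [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]],
    ["a", "a", "b", "b"],
)
predicted = classify([1.0, 1.0], means)
print(means, predicted)
```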
Full API documentation: NearestMeanClassifier
K-Nearest-Neighbour Classifier.
Full API documentation: KNNClassifier
Compute the eta values of the normalized training data.
The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e. delta(x) = mean(dx/dt(t)^2). delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.
The eta value is a more intuitive measure of temporal variation, defined as:
eta(x) = T/(2*pi) * sqrt(delta(x))
If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.
EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
Reference: Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.
Important: if a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.
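The definitions above can be checked numerically with a sine wave, for which eta should be close to the number of oscillations. The sketch below uses plain Python, approximates the derivative with forward differences, and uses only the first T-1 points for the variance, following the description of the node. The function name eta_value is hypothetical, and the node's exact numerical details may differ.

```python
import math


def eta_value(signal):
    """Estimate eta = T / (2*pi) * sqrt(delta) for a signal rescaled to unit variance."""
    head = signal[:-1]  # first T-1 points
    diffs = [b - a for a, b in zip(signal[:-1], signal[1:])]  # forward differences
    mean = sum(head) / len(head)
    variance = sum((value - mean) ** 2 for value in head) / len(head)
    # Dividing by the variance is equivalent to normalizing the signal first.
    delta = sum(d * d for d in diffs) / len(diffs) / variance
    return len(signal) / (2 * math.pi) * math.sqrt(delta)


T, N = 1000, 5  # signal length and number of full oscillations
signal = [math.sin(2 * math.pi * N * t / T) for t in range(T)]
eta = eta_value(signal)
print(eta)
```

The result is close to N but not exactly equal to it, because the derivative is approximated by finite differences.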
Full API documentation: EtaComputerNode
Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.
Full API documentation: HitParadeNode
Inject multiplicative or additive noise into the input data.
Original code contributed by Mathias Franzius.
Full API documentation: NoiseNode
Special version of NoiseNode for Gaussian additive noise.
Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.
Full API documentation: NormalNoiseNode
Copy delayed version of the input signal on the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1)        [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
  X(2) Y(2)          X(2) Y(2) X(4) Y(4) X(6) Y(6)
  X(3) Y(3)   -->    X(3) Y(3) X(5) Y(5) X(7) Y(7)
  X(4) Y(4)          X(4) Y(4) X(6) Y(6) X(8) Y(8)
  X(5) Y(5)          ...  ...  ...  ...  ...  ... ]
  X(6) Y(6)
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]
It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
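The forward embedding for time_frames=3 and gap=2 can be reproduced with nested lists, as in the sketch below. The function name embed_time_frames is hypothetical and the samples are placeholder strings; the node itself works on two-dimensional arrays.

```python
def embed_time_frames(x, time_frames, gap):
    """Append to each row the rows that follow it at multiples of `gap` time steps."""
    n_rows = len(x) - (time_frames - 1) * gap  # rows for which all frames exist
    return [
        [value for k in range(time_frames) for value in x[t + k * gap]]
        for t in range(n_rows)
    ]


x = [[f"X({t})", f"Y({t})"] for t in range(1, 9)]
framed = embed_time_frames(x, time_frames=3, gap=2)
for row in framed:
    print(" ".join(row))
```

With 8 input rows only 4 output rows can be formed, because each output row needs samples up to 4 time steps ahead.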
Full API documentation: TimeFramesNode
Copy delayed version of the input signal on the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1)        [ X(1) Y(1)  0    0    0    0
  X(2) Y(2)          X(2) Y(2)  0    0    0    0
  X(3) Y(3)   -->    X(3) Y(3) X(1) Y(1)  0    0
  X(4) Y(4)          X(4) Y(4) X(2) Y(2)  0    0
  X(5) Y(5)          X(5) Y(5) X(3) Y(3) X(1) Y(1)
  X(6) Y(6)          ...  ...  ...  ...  ...  ... ]
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]
This node provides functionality similar to that of TimeFramesNode, except that it performs a time embedding into the past rather than into the future.
See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
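The embedding into the past, with zeros where no earlier sample exists, can be sketched in plain Python as follows. The function name embed_time_delay is hypothetical and the data is made up; the node itself works on two-dimensional arrays.

```python
def embed_time_delay(x, time_frames, gap):
    """Append to each row the rows that precede it at multiples of `gap` time steps."""
    dim = len(x[0])
    rows = []
    for t in range(len(x)):
        row = []
        for k in range(time_frames):
            source = t - k * gap
            # Pad with zeros when the delayed sample lies before the start of the data.
            row.extend(x[source] if source >= 0 else [0] * dim)
        rows.append(row)
    return rows


x = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
delayed = embed_time_delay(x, time_frames=3, gap=2)
for row in delayed:
    print(row)
```

Unlike the forward embedding, the output here has as many rows as the input, since missing past samples are replaced by zeros.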
Full API documentation: TimeDelayNode
TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.
Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelaySlidingWindowNode
Node to cut off values at specified bounds.
Works similarly to numpy.clip, but also works when only a lower or upper bound is specified.
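Clipping with optional bounds can be sketched in plain Python as follows. The function name cutoff and the parameter names lower_bound and upper_bound are hypothetical and chosen only for illustration.

```python
def cutoff(values, lower_bound=None, upper_bound=None):
    """Clip values to the given bounds; a bound of None is simply not applied."""
    clipped = []
    for value in values:
        if lower_bound is not None and value < lower_bound:
            value = lower_bound
        if upper_bound is not None and value > upper_bound:
            value = upper_bound
        clipped.append(value)
    return clipped


only_upper = cutoff([-2.0, 0.5, 3.0], upper_bound=1.0)
both = cutoff([-2.0, 0.5, 3.0], lower_bound=-1.0, upper_bound=1.0)
print(only_upper, both)
```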
Full API documentation: CutoffNode
Node which uses the data history during training to learn cutoff values.
As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.
When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
Full API documentation: AdaptiveCutoffNode
Node which stores a history of the data during its training phase.
The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.
Note that data is only stored during training.
Full API documentation: HistogramNode
Execute returns the input data and the node is not trainable.
This node can be instantiated and is for example useful in complex network layouts.
Full API documentation: IdentityNode
Convolve input data with filter banks.
The filters argument specifies a set of 2D filters that are convolved with the input data during execution. Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.
Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.
This node depends on scipy.
Full API documentation: Convolution2DNode
The ShogunSVMClassifier works as a wrapper class for accessing the SHOGUN machine learning toolbox for support vector machines.
Most kernel machines and linear classifiers should work with this class.
Distance machines such as the K-means classifier are not yet supported.
Information on parameters and additional options can be found at http://www.shogun-toolbox.org/
Note that some parts in this classifier might receive some refinement in the future.
This node depends on shogun.
Full API documentation: ShogunSVMClassifier
The LibSVMClassifier class acts as a wrapper around the LibSVM library for support vector machines.
Information on the parameters can be found at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
The class allows the kernel and SVM type to be changed with a text string.
Additionally, self.parameter is exposed, which allows all other SVM parameters to be changed directly.
This node depends on libsvm.
Full API documentation: LibSVMClassifier
Full API documentation: SGDRegressorScikitsLearnNode
Extracts patches from a collection of images.
This node has been automatically generated by wrapping the sklearn.feature_extraction.image.PatchExtractor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Full API documentation: PatchExtractorScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LinearModelCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: LinearModelCVScikitsLearnNode
Dictionary learning
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.DictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
(U,V)
with || V_k ||_2 = 1 for all 0 <= k < n_atoms
Parameters
verbose :
- degree of verbosity of the printed output
Attributes
Notes
References:
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also
SparseCoder MiniBatchDictionaryLearning SparsePCA MiniBatchSparsePCA
Full API documentation: DictionaryLearningScikitsLearnNode
Perceptron
This node has been automatically generated by wrapping the sklearn.linear_model.perceptron.Perceptron class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes are supposed to have weight one.
The “auto” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
Attributes
coef_ : array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]
Weights assigned to the features.
Notes
Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
See also
SGDClassifier
References
http://en.wikipedia.org/wiki/Perceptron and references therein.
Full API documentation: PerceptronScikitsLearnNode
Classifier using Ridge regression.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
See also
Ridge, RidgeClassifierCV
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
Full API documentation: RidgeClassifierScikitsLearnNode
Feature agglomeration based on Ward hierarchical clustering
This node has been automatically generated by wrapping the sklearn.cluster.hierarchical.WardAgglomeration class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Full API documentation: WardAgglomerationScikitsLearnNode
Classifier implementing the k-nearest neighbors vote.
This node has been automatically generated by wrapping the sklearn.neighbors.classification.KNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[ 0.66666667 0.33333333]]
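The vote itself can be sketched without the library. The following standalone example assumes uniform weights, Euclidean distance and a brute-force search; the helper names are hypothetical. It uses the same data as the example above.

```python
from collections import Counter

def knn_predict_proba(X_train, y_train, x, n_neighbors=3):
    """Return {label: fraction of the k nearest neighbors with that label}."""
    # Squared Euclidean distance from x to every training sample (brute force).
    distances = [
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(X_train, y_train)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:n_neighbors]
    votes = Counter(label for _, label in nearest)
    return {label: votes.get(label, 0) / n_neighbors
            for label in sorted(set(y_train))}

def knn_predict(X_train, y_train, x, n_neighbors=3):
    """Return the label with the most votes among the nearest neighbors."""
    proba = knn_predict_proba(X_train, y_train, x, n_neighbors)
    return max(proba, key=proba.get)

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
label = knn_predict(X, y, [1.1])
proba = knn_predict_proba(X, y, [0.9])
```

For the query 0.9 the three nearest training points are 1, 0 and 2, so two of the three votes go to class 0.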
See also
RadiusNeighborsClassifier, KNeighborsRegressor, RadiusNeighborsRegressor, NearestNeighbors
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: KNeighborsClassifierScikitsLearnNode
NuSVR for sparse matrices (csr)
This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See sklearn.svm.NuSVR for a complete list of parameters
Notes
For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).
Examples
>>> from sklearn.svm.sparse import NuSVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = NuSVR(nu=0.1, C=1.0)
>>> clf.fit(X, y)
NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
kernel='rbf', nu=0.1, probability=False, shrinking=True, tol=0.001,
verbose=False)
Full API documentation: NuSVRScikitsLearnNode
Nearest centroid classifier.
This node has been automatically generated by wrapping the sklearn.neighbors.nearest_centroid.NearestCentroid class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.
Parameters
Attributes
Examples
>>> from sklearn.neighbors.nearest_centroid import NearestCentroid
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = NearestCentroid()
>>> clf.fit(X, y)
NearestCentroid(metric='euclidean', shrink_threshold=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]
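The classification rule is simple enough to sketch in pure Python. The version below assumes the Euclidean metric and no centroid shrinking; fit_centroids and predict_nearest_centroid are hypothetical helper names.

```python
def fit_centroids(X, y):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

def predict_nearest_centroid(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
y = [1, 1, 1, 2, 2, 2]
centroids = fit_centroids(X, y)
label = predict_nearest_centroid(centroids, [-0.8, -1])
```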
See also
sklearn.neighbors.KNeighborsClassifier: nearest neighbors classifier
Notes
When used for text classification with tf–idf vectors, this classifier is also known as the Rocchio classifier.
References
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.
Full API documentation: NearestCentroidScikitsLearnNode
An extremely randomized tree regressor.
This node has been automatically generated by wrapping the sklearn.tree.tree.ExtraTreeRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
Warning: Extra-trees should only be used within ensemble methods.
See also
ExtraTreeClassifier : A classifier based on extremely randomized trees
sklearn.ensemble.ExtraTreesClassifier : An ensemble of extra-trees for classification
References
| [1] | P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006. |
Full API documentation: ExtraTreeRegressorScikitsLearnNode
An extra-trees classifier.
This node has been automatically generated by wrapping the sklearn.ensemble.forest.ExtraTreesClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Parameters
Note: this parameter is tree-specific.
Attributes
References
| [1] | P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006. |
See also
sklearn.tree.ExtraTreeClassifier : Base classifier for this ensemble.
RandomForestClassifier : Ensemble Classifier based on trees with optimal splits.
Full API documentation: ExtraTreesClassifierScikitsLearnNode
Lasso linear model with iterative fitting along a regularization path
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The best model is selected by cross-validation.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Parameters
Attributes
Notes
See examples/linear_model/lasso_path_with_crossvalidation.py for an example.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
See also
lars_path, lasso_path, LassoLars, Lasso, LassoLarsCV
Full API documentation: LassoCVScikitsLearnNode
Unsupervised Outlier Detection.
This node has been automatically generated by wrapping the sklearn.svm.classes.OneClassSVM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Estimate the support of a high-dimensional distribution.
The implementation is based on libsvm.
Parameters
Attributes
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
coef_ is a read-only property derived from dual_coef_ and support_vectors_
Full API documentation: OneClassSVMScikitsLearnNode
Ridge regression with built-in cross-validation.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.
Parameters
Flag indicating which strategy to use when performing Generalized Cross-Validation. Options are:
'auto' : use svd if n_samples > n_features, otherwise use eigen
'svd' : force computation via singular value decomposition of X
'eigen' : force computation via eigendecomposition of X^T X
The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending upon the shape of the training data.
Attributes
See also
Ridge: Ridge regression
RidgeClassifier: Ridge classifier
RidgeClassifierCV: Ridge classifier with built-in cross validation
Full API documentation: RidgeCVScikitsLearnNode
An estimator predicting the probability of each class in the training data.
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.PriorProbabilityEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
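As a rough illustration of what a prior-probability estimator computes, the sketch below returns the relative frequency of each label. It assumes the priors are plain empirical class frequencies; fit_class_priors is a hypothetical helper, not part of the wrapped class.

```python
from collections import Counter

def fit_class_priors(y):
    """Return {label: relative frequency of that label in y}."""
    counts = Counter(y)
    n_samples = len(y)
    return {label: counts[label] / n_samples for label in sorted(counts)}

# Three of the four training labels are "a".
priors = fit_class_priors(["a", "b", "a", "a"])
```

Such an estimator ignores the input features entirely, which is why it is typically used only as the initial model of a boosting ensemble.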
Full API documentation: PriorProbabilityEstimatorScikitsLearnNode
Bayesian ARD regression.
This node has been automatically generated by wrapping the sklearn.linear_model.bayes.ARDRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to follow Gaussian distributions. Also estimate the parameters lambda (precisions of the distributions of the weights) and alpha (precision of the distribution of the noise). The estimation is done by an iterative procedure (Evidence Maximization).
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.ARDRegression()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
...
ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
copy_X=True, fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06,
n_iter=300, normalize=False, threshold_lambda=10000.0, tol=0.001,
verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])
Notes
See examples/linear_model/plot_ard.py for an example.
Full API documentation: ARDRegressionScikitsLearnNode
Gradient Boosting for regression.
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.GradientBoostingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
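The stage-wise procedure can be sketched for the special case of squared loss, where the negative gradient is simply the residual. This is a simplified illustration, not the wrapped implementation: it assumes a single input feature, uses one-split trees (stumps), and the function names and default values are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, n_stages=50, learning_rate=0.1):
    """Boost stumps for squared loss; returns a prediction function."""
    init = sum(y) / len(y)  # stage 0: a constant model
    stumps = []

    def predict(xi):
        return init + learning_rate * sum(stump(xi) for stump in stumps)

    for _ in range(n_stages):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict

x_train = [1, 2, 3, 4, 5, 6]
y_train = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gradient_boost(x_train, y_train)
```

Each stage fits the current residuals and adds a shrunken copy of that fit to the model, so the training error decreases gradually rather than in one step.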
Parameters
Attributes
Examples
>>> samples = [[0, 0, 2], [1, 0, 0]]
>>> labels = [0, 1]
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> gb = GradientBoostingRegressor().fit(samples, labels)
>>> print(gb.predict([[0, 0, 0]]))
[ 1.32806...
See also
DecisionTreeRegressor, RandomForestRegressor
References
J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.
Full API documentation: GradientBoostingRegressorScikitsLearnNode
PLSCanonical implements the 2 blocks canonical PLS of the original Wold algorithm [Tenenhaus 1998] p.204, referred to as PLS-C2A in [Wegelin 2000].
This node has been automatically generated by wrapping the sklearn.pls.PLSCanonical class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This class inherits from PLS with mode="A" and deflation_mode="canonical", norm_y_weights=True and algorithm="nipals", but svd should provide similar results up to numerical errors.
Parameters
n_components : int, number of components to keep. (default 2).
scale : boolean, scale data? (default True)
Attributes
Notes
For each component k, find weights u, v that optimize:
max corr(Xk u, Yk v) * var(Xk u) * var(Yk v), such that |u| = |v| = 1
Note that it maximizes both the correlations between the scores and the intra-block variances.
The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.
The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score. This performs a canonical symmetric version of the PLS regression, which is slightly different from CCA. This mode is mostly used for modeling.
This implementation provides the same results as the “plspm” package provided in the R language (R-project), using the function plsca(X, Y). Results are equal or collinear with the function pls(..., mode = "canonical") of the “mixOmics” package. The difference lies in the fact that the mixOmics implementation does not exactly implement the Wold algorithm, since it does not normalize y_weights to one.
Examples
>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> plsca = PLSCanonical(n_components=2)
>>> plsca.fit(X, Y)
...
PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2,
scale=True, tol=1e-06)
>>> X_c, Y_c = plsca.transform(X, Y)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.
See also
CCA, PLSSVD
Full API documentation: PLSCanonicalScikitsLearnNode
Filter: Select the best percentile of the p_values
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectPercentile class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
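Percentile-based filtering can be sketched as follows. The example assumes that higher scores are better (equivalently, smaller p-values) and that the number of kept features is rounded up; select_percentile and its tie handling are hypothetical choices for this illustration.

```python
import math

def select_percentile(scores, percentile=10):
    """Return the sorted indices of the features in the top score percentile."""
    n_keep = max(1, math.ceil(len(scores) * percentile / 100))
    # Rank feature indices from the highest score to the lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])

scores = [0.1, 5.0, 0.3, 2.5, 0.2, 4.0, 0.05, 1.0, 0.6, 0.4]
kept = select_percentile(scores, percentile=30)
```

With ten features and percentile=30, the three highest-scoring features (indices 1, 3 and 5) are kept.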
Parameters
Function taking two arrays X and y, and returning 2 arrays:
Full API documentation: SelectPercentileScikitsLearnNode
A random forest regressor.
This node has been automatically generated by wrapping the sklearn.ensemble.forest.RandomForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A random forest is a meta estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Parameters
The number of features to consider when looking for the best split:
- If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: this parameter is tree-specific.
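The rules in the list above can be written as a small resolver function. This is a sketch only: resolve_max_features is a hypothetical helper, and rounding down with a minimum of one feature is an assumption rather than documented behaviour.

```python
import math

def resolve_max_features(max_features, n_features, is_classification):
    """Translate a max_features setting into a concrete number of features."""
    if max_features == "auto":
        # "auto" depends on the task: sqrt for classification, all for regression.
        max_features = "sqrt" if is_classification else None
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    raise ValueError("unsupported max_features: %r" % (max_features,))

n_for_regression = resolve_max_features("auto", 16, is_classification=False)
n_for_classification = resolve_max_features("auto", 16, is_classification=True)
```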
Attributes
References
See also
DecisionTreeRegressor, ExtraTreesRegressor
Full API documentation: RandomForestRegressorScikitsLearnNode
Gaussian Naive Bayes (GaussianNB)
This node has been automatically generated by wrapping the sklearn.naive_bayes.GaussianNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print(clf.predict([[-0.8, -1]]))
[1]
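For readers who want to see the model behind the example, here is a minimal pure-Python sketch of Gaussian naive Bayes. It assumes maximum-likelihood means and variances per class and feature, plus a small variance floor (var_floor, an assumption of this sketch) to avoid division by zero; the function names are hypothetical.

```python
import math

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate per-class priors, feature means and feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [xi for xi, yi in zip(X, y) if yi == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior for sample x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # Features are treated as independent Gaussians given the class.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
Y = [1, 1, 1, 2, 2, 2]
nb_model = fit_gaussian_nb(X, Y)
label = predict_gaussian_nb(nb_model, [-0.8, -1])
```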
Full API documentation: GaussianNBScikitsLearnNode
Hidden Markov Model with Gaussian emissions
This node has been automatically generated by wrapping the sklearn.hmm.GaussianHMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Representation of a hidden Markov model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a HMM.
Parameters
Attributes
Covariance parameters for each state. The shape depends on _covariance_type:
(`n_components`,) if 'spherical',
(`n_features`, `n_features`) if 'tied',
(`n_components`, `n_features`) if 'diag',
(`n_components`, `n_features`, `n_features`) if 'full'
Examples
>>> from sklearn.hmm import GaussianHMM
>>> GaussianHMM(n_components=2)
...
GaussianHMM(algorithm='viterbi',...
See Also
GMM : Gaussian mixture model
Full API documentation: GaussianHMMScikitsLearnNode
LabelSpreading model for semi-supervised learning
This node has been automatically generated by wrapping the sklearn.semi_supervised.label_propagation.LabelSpreading class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This model is similar to the basic Label Propagation algorithm, but uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.
Parameters
Examples
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelSpreading
>>> label_prop_model = LabelSpreading()
>>> iris = datasets.load_iris()
>>> random_unlabeled_points = np.where(np.random.random_integers(0, 1,
... size=len(iris.target)))
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
...
LabelSpreading(...)
References
Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schölkopf. Learning with local and global consistency (2004) http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3219
See Also
LabelPropagation : Unregularized graph based semi-supervised learning
Full API documentation: LabelSpreadingScikitsLearnNode
Non-Negative matrix factorization by Projected Gradient (NMF)
This node has been automatically generated by wrapping the sklearn.decomposition.nmf.NMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Method used to initialize the procedure. Default: ‘nndsvdar’ Valid options:
'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD)
initialization (better for sparseness)
'nndsvda': NNDSVD with zeros filled with the average of X
(better when sparsity is not desired)
'nndsvdar': NNDSVD with zeros filled with small random values
(generally faster, less accurate alternative to NNDSVDa
for when sparsity is not desired)
int seed or RandomState: non-negative random matrices
Attributes
Examples
>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744, 0.11118662],
[ 0.38526873, 0.38228063]])
>>> model.reconstruction_err_
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
... sparseness='components')
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
nls_max_iter=2000, sparseness='components', tol=0.0001)
>>> model.components_
array([[ 1.67481991, 0.29614922],
[-0. , 0.4681982 ]])
>>> model.reconstruction_err_
0.513...
Notes
This implements
C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/
P. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research 2004.
NNDSVD is introduced in
C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf
Full API documentation: NMFScikitsLearnNode
Full API documentation: SparseBaseLibSVMScikitsLearnNode
Variational Inference for the Infinite Gaussian Mixture Model.
This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.DPGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
DPGMM stands for Dirichlet Process Gaussian Mixture Model, and it is an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but almost always the number of components actually used depends on the data.
Stick-breaking Representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).
Initialization is with normally-distributed means and identity covariance, for proper convergence.
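The stick-breaking representation mentioned above turns a sequence of break fractions into mixture weights. The sketch below uses fixed fractions for clarity; in the actual model they would be random variables, and stick_breaking_weights is a hypothetical helper name.

```python
def stick_breaking_weights(fractions):
    """Turn break fractions v_k in [0, 1] into mixture weights.

    w_k = v_k * prod_{j<k} (1 - v_j). The returned remainder is the length
    of stick left over after the last break (the truncated tail).
    """
    weights = []
    remaining = 1.0
    for v in fractions:
        weights.append(v * remaining)  # break off a fraction of what is left
        remaining *= 1.0 - v
    return weights, remaining

weights, remainder = stick_breaking_weights([0.5, 0.5, 0.5])
```

The weights and the remainder always sum to one, which is why truncating at a fixed maximum number of components loses only the small leftover mass.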
Parameters
Attributes
Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:
(`n_components`, `n_features`) if 'spherical',
(`n_features`, `n_features`) if 'tied',
(`n_components`, `n_features`) if 'diag',
(`n_components`, `n_features`, `n_features`) if 'full'
See Also
GMM : Finite Gaussian mixture model fit with EM
Full API documentation: DPGMMScikitsLearnNode
C-Support Vector Classification.
This node has been automatically generated by wrapping the sklearn.svm.classes.SVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a couple of tens of thousands of samples.
The multiclass support is handled according to a one-vs-one scheme.
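A one-vs-one scheme trains one binary classifier per pair of classes and lets them vote. The sketch below illustrates only the voting step; pairwise_decide is a hypothetical callable standing in for the trained binary classifiers, and the tie-breaking rule is an assumption.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, pairwise_decide, x):
    """Predict by majority vote over all pairwise classifiers.

    pairwise_decide(a, b, x) is a hypothetical callable returning either a
    or b, the winner of the binary classifier trained on classes a and b.
    """
    votes = Counter(pairwise_decide(a, b, x)
                    for a, b in combinations(classes, 2))
    # Ties are broken in favor of the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])

# Toy pairwise rule (an assumption for illustration): the class whose
# 1-D prototype is closer to x wins each duel.
prototypes = {"a": 0.0, "b": 5.0, "c": 10.0}

def nearest_prototype(a, b, x):
    return a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b

classes = sorted(prototypes)
n_pairs = len(list(combinations(classes, 2)))
winner = one_vs_one_predict(classes, nearest_prototype, 6.0)
```

For n classes this requires n * (n - 1) / 2 binary classifiers, three in the toy example.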
For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each, see the corresponding section in the narrative documentation: svm_kernels.
Parameters
Attributes
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
coef_ is a read-only property derived from dual_coef_ and support_vectors_
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
gamma=0.0, kernel='rbf', probability=False, shrinking=True,
tol=0.001, verbose=False)
>>> print(clf.predict([[-0.8, -1]]))
[ 1.]
See also
Full API documentation: SVCScikitsLearnNode
Variational Inference for the Gaussian Mixture Model
This node has been automatically generated by wrapping the sklearn.mixture.dpgmm.VBGMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Variational inference for a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a fixed number of components.
Initialization is with normally-distributed means and identity covariance, for proper convergence.
Parameters
Attributes
Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:
(`n_components`, `n_features`) if 'spherical',
(`n_features`, `n_features`) if 'tied',
(`n_components`, `n_features`) if 'diag',
(`n_components`, `n_features`, `n_features`) if 'full'
See Also
GMM : Finite Gaussian mixture model fit with EM
DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm
Full API documentation: VBGMMScikitsLearnNode
Transforms lists of feature-value mappings to vectors.
This node has been automatically generated by wrapping the sklearn.feature_extraction.dict_vectorizer.DictVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This transformer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators.
When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.
Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
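The encoding described above can be sketched in pure Python. This illustration produces dense lists rather than arrays or sparse matrices, orders columns alphabetically by feature name, and uses the hypothetical helpers fit_vocabulary and transform.

```python
def fit_vocabulary(dicts):
    """Map each feature name (or 'name=value' for strings) to a column index."""
    names = set()
    for d in dicts:
        for key, value in d.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    return {name: index for index, name in enumerate(sorted(names))}

def transform(dicts, vocabulary):
    """Encode each mapping as a dense row; missing features stay at zero."""
    rows = []
    for d in dicts:
        row = [0.0] * len(vocabulary)
        for key, value in d.items():
            if isinstance(value, str):
                # One-hot coding: the string value becomes part of the name.
                name, value = f"{key}={value}", 1.0
            else:
                name = key
            if name in vocabulary:  # features unseen during fitting are ignored
                row[vocabulary[name]] = float(value)
        rows.append(row)
    return rows

D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
vocabulary = fit_vocabulary(D)
X = transform(D, vocabulary)
string_vocabulary = fit_vocabulary([{'f': 'ham'}, {'f': 'spam'}])
```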
Parameters
Examples
>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> X
array([[ 2., 0., 1.],
[ 0., 1., 3.]])
>>> v.inverse_transform(X) == [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
True
>>> v.transform({'foo': 4, 'unseen_feature': 3})
array([[ 0., 0., 4.]])
Full API documentation: DictVectorizerScikitsLearnNode
Full API documentation: LinearSVCScikitsLearnNode
Randomized Lasso
This node has been automatically generated by wrapping the sklearn.linear_model.randomized_l1.RandomizedLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Randomized Lasso works by resampling the training data and computing a Lasso on each resampling. In short, the features selected more often are good features. It is also known as stability selection.
Parameters
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
- An int, giving the exact number of total jobs that are spawned
- A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
Attributes
Examples
>>> from sklearn.linear_model import RandomizedLasso
>>> randomized_lasso = RandomizedLasso()
Notes
See examples/linear_model/plot_sparse_recovery.py for an example.
References
Nicolai Meinshausen, Peter Buhlmann. Stability selection. Journal of the Royal Statistical Society: Series B, Volume 72, Issue 4, pages 417-473, September 2010. DOI: 10.1111/j.1467-9868.2010.00740.x
See also
RandomizedLogisticRegression, LogisticRegression
Full API documentation: RandomizedLassoScikitsLearnNode
Naive Bayes classifier for multinomial models
This node has been automatically generated by wrapping the sklearn.naive_bayes.MultinomialNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
Parameters
Attributes
Empirical log probability of features given a class, P(x_i|y).
(intercept_ and coef_ are properties referring to class_log_prior_ and feature_log_prob_, respectively.)
Examples
>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, Y)
MultinomialNB(alpha=1.0, fit_prior=True)
>>> print(clf.predict(X[2]))
[3]
Notes
For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.
Full API documentation: MultinomialNBScikitsLearnNode
Linear Model trained with L1 prior as regularizer (aka the Lasso)
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.Lasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Technically the Lasso model is optimizing the same objective function as the Elastic Net with rho=1.0 (no L2 penalty).
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.Lasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute='auto', tol=0.0001,
warm_start=False)
>>> print(clf.coef_)
[ 0.85 0. ]
>>> print(clf.intercept_)
0.15
See also
lars_path, lasso_path, LassoLars, LassoCV, LassoLarsCV, sklearn.decomposition.sparse_encode
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
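Coordinate descent for the Lasso objective stated above can be sketched in a few lines. This is a textbook cyclic variant with soft-thresholding and a fixed number of sweeps, assuming the intercept is handled by centering the data; it is not necessarily the update order or stopping rule of the wrapped class, and the function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / (2 * n)) * ||y - Xw - b||^2_2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    # Center the data so that the intercept can be recovered afterwards.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]

    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j removed.
            partial = [
                yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(Xc[i][j] * partial[i] for i in range(n)) / n
            norm = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / norm if norm > 0 else 0.0
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept

coef, intercept = lasso_coordinate_descent(
    [[0, 0], [1, 1], [2, 2]], [0, 1, 2], alpha=0.1)
```

On the data from the example above, this sketch arrives at approximately the same coefficients (0.85 and 0) and intercept (0.15). The second, perfectly correlated feature is driven to zero by the L1 penalty.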
Full API documentation: LassoScikitsLearnNode
Locally Linear Embedding
This node has been automatically generated by wrapping the sklearn.manifold.locally_linear.LocallyLinearEmbedding class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
auto : algorithm will attempt to choose the best method for input data
Attributes
References
| [1] | Roweis, S. & Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323 (2000). |
| [2] | Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003). |
| [3] | Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382 |
| [4] | Zhang, Z. & Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai Univ. 8:406 (2004) |
Full API documentation: LocallyLinearEmbeddingScikitsLearnNode
Cross-validated Least Angle Regression model
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
See also
lars_path, LassoLars, LassoLarsCV
Full API documentation: LarsCVScikitsLearnNode
Linear Discriminant Analysis (LDA)
This node has been automatically generated by wrapping the sklearn.lda.LDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.
The fitted model can also be used to reduce the dimensionality of the input, by projecting it to the most discriminative directions.
Parameters
Attributes
Examples
>>> import numpy as np
>>> from sklearn.lda import LDA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LDA()
>>> clf.fit(X, y)
LDA(n_components=None, priors=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]
See also
sklearn.qda.QDA: Quadratic discriminant analysis
Full API documentation: LDAScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.QuantileEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: QuantileEstimatorScikitsLearnNode
Convert a collection of raw documents to a matrix of token counts
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.CountVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This implementation produces a sparse representation of the counts using scipy.sparse.coo_matrix.
If you do not provide an a-priori dictionary and you do not use an analyzer that does some kind of feature selection then the number of features will be equal to the vocabulary size found by analysing the data. The default analyzer does simple stop word filtering for English.
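The core of the transformation, building a vocabulary and counting tokens per document, can be sketched with the standard library. This illustration is deliberately simplified: it returns dense lists instead of a sparse matrix, applies no stop word filtering, and the token pattern (words of two or more characters) and helper name count_tokens are assumptions.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"\b\w\w+\b")  # words of two or more characters

def count_tokens(documents):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in documents]
    terms = sorted({token for tokens in tokenized for token in tokens})
    vocabulary = {term: index for index, term in enumerate(terms)}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts.get(term, 0) for term in terms])
    return vocabulary, rows

vocabulary, counts = count_tokens(["the cat sat", "the cat saw the dog"])
```

Without an a-priori dictionary, the number of columns equals the number of distinct tokens found, five in this toy corpus.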
Parameters
If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.
If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.
Otherwise the input is expected to be a sequence of strings or bytes items that are analyzed directly.
Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries.
If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.
If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value.
If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens.
If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.
If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.
This parameter is ignored if vocabulary is not None.
Full API documentation: CountVectorizerScikitsLearnNode
An extra-trees regressor.
This node has been automatically generated by wrapping the sklearn.ensemble.forest.ExtraTreesRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Parameters
The number of features to consider when looking for the best split:
- If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: this parameter is tree-specific.
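The options above can be read as a small lookup rule. The following is a minimal sketch of that rule; `resolve_max_features` is a hypothetical helper, and rounding down to an integer (with a floor of 1) is an assumption of this example rather than documented behaviour.

```python
import math


def resolve_max_features(max_features, n_features, is_classifier):
    """Translate a max_features setting into a feature count (illustrative)."""
    if max_features is None:
        return n_features
    if max_features == "auto":
        # sqrt for classification tasks, all features for regression problems.
        if is_classifier:
            return max(1, int(math.sqrt(n_features)))
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    raise ValueError("unsupported max_features: %r" % (max_features,))


print(resolve_max_features("auto", 16, is_classifier=True))
print(resolve_max_features("auto", 16, is_classifier=False))
```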
Attributes
References
| [1] | P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006. |
See also
sklearn.tree.ExtraTreeRegressor: Base estimator for this ensemble.
RandomForestRegressor: Ensemble regressor using trees with optimal splits.
Full API documentation: ExtraTreesRegressorScikitsLearnNode
Hidden Markov Model with multinomial (discrete) emissions
This node has been automatically generated by wrapping the sklearn.hmm.MultinomialHMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Attributes
Examples
>>> from sklearn.hmm import MultinomialHMM
>>> MultinomialHMM(n_components=2)
...
MultinomialHMM(algorithm='viterbi',...
See Also
GaussianHMM : HMM with Gaussian emissions
Full API documentation: MultinomialHMMScikitsLearnNode
Label Propagation classifier
This node has been automatically generated by wrapping the sklearn.semi_supervised.label_propagation.LabelPropagation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Examples
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelPropagation
>>> label_prop_model = LabelPropagation()
>>> iris = datasets.load_iris()
>>> random_unlabeled_points = np.where(np.random.random_integers(0, 1,
... size=len(iris.target)))
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
...
LabelPropagation(...)
References
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf
See Also
LabelSpreading : Alternate label propagation strategy more robust to noise
Full API documentation: LabelPropagationScikitsLearnNode
The Gaussian Process model class.
This node has been automatically generated by wrapping the sklearn.gaussian_process.gaussian_process.GaussianProcess class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
A regression function returning an array of outputs of the linear regression functional basis. The number of observations n_samples should be greater than the size p of this basis. Default assumes a simple constant regression trend. Available built-in regression models are:
'constant', 'linear', 'quadratic'
A stationary autocorrelation function returning the autocorrelation between two points x and x’. Default assumes a squared-exponential autocorrelation model. Built-in correlation models are:
'absolute_exponential', 'squared_exponential',
'generalized_exponential', 'cubic', 'linear'
A string specifying the optimization algorithm to be used. Default uses ‘fmin_cobyla’ algorithm from scipy.optimize. Available optimizers are:
'fmin_cobyla', 'Welch'
The ‘Welch’ optimizer is due to Welch et al., see reference [WBSWM1992]. It consists of iterating over several one-dimensional optimizations instead of running one single multi-dimensional optimization.
Attributes
Examples
>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcess
>>> X = np.array([[1., 3., 5., 6., 7., 8.]]).T
>>> y = (X * np.sin(X)).ravel()
>>> gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1.)
>>> gp.fit(X, y)
GaussianProcess(beta0=None...
...
Notes
The present implementation is based on a translation of the DACE Matlab toolbox, see reference [NLNS2002].
References
| [NLNS2002] | S.N. Lophaven, H.B. Nielsen and J. Sondergaard. DACE - A MATLAB Kriging Toolbox. (2002) http://www2.imm.dtu.dk/~hbn/dace/dace.pdf |
| [WBSWM1992] | W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, and M.D. Morris (1992). Screening, predicting, and computer experiments. Technometrics, 34(1) 15–25. http://www.jstor.org/pss/1269548 |
Full API documentation: GaussianProcessScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.MeanEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: MeanEstimatorScikitsLearnNode
Regression based on neighbors within a fixed radius.
This node has been automatically generated by wrapping the sklearn.neighbors.regression.RadiusNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.
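As a rough illustration of fixed-radius interpolation with uniform weights, the sketch below averages the targets of all training points within the radius of a query. The `radius_neighbors_predict` helper is hypothetical, handles one-dimensional inputs only, and is not the wrapped class's implementation.

```python
def radius_neighbors_predict(X, y, query, radius):
    """Average the targets of all 1-D training points within `radius`."""
    neighbors = [t for x, t in zip(X, y) if abs(x[0] - query[0]) <= radius]
    if not neighbors:
        raise ValueError("no training point lies within the radius")
    return sum(neighbors) / len(neighbors)


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
prediction = radius_neighbors_predict(X, y, [1.5], radius=1.0)
print(prediction)  # 0.5
```

Only the points at 1 and 2 lie within distance 1.0 of the query 1.5, so the prediction is the mean of their targets.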
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsRegressor
>>> neigh = RadiusNeighborsRegressor(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[ 0.5]
See also
NearestNeighbors KNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: RadiusNeighborsRegressorScikitsLearnNode
Partial Least Square SVD
This node has been automatically generated by wrapping the sklearn.pls.PLSSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Simply performs an SVD on the cross-covariance matrix X’Y. There is no iterative deflation here.
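A minimal NumPy sketch of this idea follows: center both data blocks, form the cross-covariance matrix, and take a single SVD. The random data, the variable names, and the choice to center without scaling are assumptions of this example; the wrapped class may preprocess differently.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 4)
Y = rng.randn(20, 3)

# Center both blocks, then form the cross-covariance matrix X'Y.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
C = Xc.T @ Yc  # shape (4, 3)

# One SVD gives the X weights (columns of U) and the Y weights (rows of Vt).
U, s, Vt = np.linalg.svd(C, full_matrices=False)

n_components = 2
x_scores = Xc @ U[:, :n_components]
y_scores = Yc @ Vt[:n_components].T
print(x_scores.shape, y_scores.shape)
```

Because all components come from the same decomposition, no deflation loop is needed; the inner product of the first pair of score vectors equals the largest singular value.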
Parameters
Attributes
See also
PLSCanonical CCA
Full API documentation: PLSSVDScikitsLearnNode
Cross-validated Lasso, using the LARS algorithm
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
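The objective can be evaluated directly for a candidate weight vector. The sketch below does so in plain Python; the `lasso_objective` helper and the toy data are made up for illustration and play no part in the wrapped estimator.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = len(X)
    residuals = [yi - sum(xij * wj for xij, wj in zip(xi, w))
                 for xi, yi in zip(X, y)]
    squared_error = sum(r * r for r in residuals)
    l1_norm = sum(abs(wj) for wj in w)
    return squared_error / (2 * n_samples) + alpha * l1_norm


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
exact_fit = lasso_objective(X, y, w=[1.0, 2.0], alpha=0.1)
zero_fit = lasso_objective(X, y, w=[0.0, 0.0], alpha=0.1)
print(exact_fit, zero_fit)
```

With w = [1, 2] the residuals vanish and only the L1 penalty remains; with w = 0 the penalty vanishes and only the squared error remains.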
Parameters
Attributes
Notes
The object solves the same problem as the LassoCV object. However, unlike LassoCV, it finds the relevant alpha values by itself. In general, because of this property, it will be more stable. However, it is more fragile to heavily multicollinear datasets.
It is more efficient than the LassoCV if only a small number of features are selected compared to the total number, for instance if there are very few samples compared to the number of features.
See also
lars_path, LassoLars, LarsCV, LassoCV
Full API documentation: LassoLarsCVScikitsLearnNode
Regression based on k-nearest neighbors.
This node has been automatically generated by wrapping the sklearn.neighbors.regression.KNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.
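The interpolation step can be sketched as a brute-force search followed by a uniform average. The `knn_predict` helper below is hypothetical and ignores the tree-based search structures the wrapped class can use.

```python
import math


def knn_predict(X, y, query, n_neighbors):
    """Average the targets of the n_neighbors closest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    nearest = order[:n_neighbors]
    return sum(y[i] for i in nearest) / n_neighbors


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
prediction = knn_predict(X, y, [1.5], n_neighbors=2)
print(prediction)  # 0.5
```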
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[ 0.5]
See also
NearestNeighbors RadiusNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: KNeighborsRegressorScikitsLearnNode
A random forest classifier.
This node has been automatically generated by wrapping the sklearn.ensemble.forest.RandomForestClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Parameters
The number of features to consider when looking for the best split:
- If “auto”, then max_features=sqrt(n_features) on classification tasks and max_features=n_features on regression problems.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: this parameter is tree-specific.
Attributes
References
See also
DecisionTreeClassifier, ExtraTreesClassifier
Full API documentation: RandomForestClassifierScikitsLearnNode
Base class for forest of trees-based regressors.
This node has been automatically generated by wrapping the sklearn.ensemble.forest.ForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Warning: This class should not be used directly. Use derived classes instead.
Full API documentation: ForestRegressorScikitsLearnNode
Least Angle Regression model a.k.a. LAR
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.Lars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.Lars(n_nonzero_coefs=1)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
...
Lars(copy_X=True, eps=..., fit_intercept=True, fit_path=True,
n_nonzero_coefs=1, normalize=True, precompute='auto', verbose=False)
>>> print(clf.coef_)
[ 0. -1.11...]
See also
lars_path, LarsCV, sklearn.decomposition.sparse_encode
http://en.wikipedia.org/wiki/Least_angle_regression
Full API documentation: LarsScikitsLearnNode
Linear Model trained with L1 and L2 prior as regularizer
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Minimizes the objective function:
1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a * L1 + b * L2
where:
alpha = a + b and rho = a / (a + b)
The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. Specifically, rho = 1 is the lasso penalty. Currently, rho <= 0.01 is not reliable, unless you supply your own sequence of alpha.
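The conversion between the two parametrizations can be checked numerically. In the sketch below, the helper names are hypothetical, and "L2" is assumed to stand for 0.5 * ||w||^2_2, which is the reading under which both forms of the penalty agree.

```python
def to_alpha_rho(a, b):
    """Map separate penalty weights (a, b) to the (alpha, rho) parametrization."""
    alpha = a + b
    rho = a / (a + b)
    return alpha, rho


def penalty_alpha_rho(w, alpha, rho):
    l1 = sum(abs(wj) for wj in w)
    sq_l2 = sum(wj * wj for wj in w)
    return alpha * rho * l1 + 0.5 * alpha * (1 - rho) * sq_l2


def penalty_a_b(w, a, b):
    # Assumption: "L2" means 0.5 * ||w||^2_2 in "a * L1 + b * L2".
    l1 = sum(abs(wj) for wj in w)
    sq_l2 = sum(wj * wj for wj in w)
    return a * l1 + b * 0.5 * sq_l2


w = [1.0, -2.0, 0.5]
alpha, rho = to_alpha_rho(a=0.3, b=0.1)
print(alpha, rho)
print(penalty_alpha_rho(w, alpha, rho), penalty_a_b(w, 0.3, 0.1))
```

Setting b = 0 gives rho = 1, which recovers the pure lasso penalty mentioned above.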
Parameters
Attributes
Notes
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
Full API documentation: ElasticNetScikitsLearnNode
Isomap Embedding
This node has been automatically generated by wrapping the sklearn.manifold.isomap.Isomap class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Non-linear dimensionality reduction through Isometric Mapping
Parameters
Attributes
kernel_pca_ : KernelPCA object used to implement the embedding
References
Full API documentation: IsomapScikitsLearnNode
Binarize data (set feature values to 0 or 1) according to a threshold
This node has been automatically generated by wrapping the sklearn.preprocessing.Binarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The default threshold is 0.0 so that any non-zero values are set to 1.0 and zeros are left untouched.
Binarization is a common operation on text count data where the analyst can decide to only consider the presence or absence of a feature rather than a quantified number of occurrences, for instance.
It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).
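The thresholding rule is simple enough to sketch in a few lines. The `binarize` helper below is hypothetical and assumes that values strictly greater than the threshold become 1.0 while all other values become 0.0; it works on nested lists rather than arrays or sparse matrices.

```python
def binarize(rows, threshold=0.0):
    """Map values strictly greater than `threshold` to 1.0, all others to 0.0."""
    return [[1.0 if value > threshold else 0.0 for value in row] for row in rows]


counts = [[0, 3, 1], [2, 0, 0]]
print(binarize(counts))       # [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
print(binarize(counts, 1.5))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

With the default threshold, count data is reduced to presence or absence of each feature.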
Parameters
Notes
If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.
This estimator is stateless (besides constructor parameters); the fit method does nothing but is useful when the estimator is used in a pipeline.
Full API documentation: BinarizerScikitsLearnNode
Mini-batch dictionary learning
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.MiniBatchDictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*, V^*) = argmin_{(U,V)} 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
with || V_k ||_2 = 1 for all 0 <= k < n_atoms
Parameters
verbose :
- degree of verbosity of the printed output
Attributes
Notes
References:
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also
SparseCoder DictionaryLearning SparsePCA MiniBatchSparsePCA
Full API documentation: MiniBatchDictionaryLearningScikitsLearnNode
Convert a collection of raw documents to a matrix of TF-IDF features.
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Equivalent to CountVectorizer followed by TfidfTransformer.
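The two-stage structure (count first, then reweight) can be sketched with the standard library. The `tfidf` helper below is hypothetical and uses the textbook weighting idf = log(n_docs / document_frequency) without smoothing or normalization; the wrapped class may weight and normalize differently.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, weights) using textbook TF-IDF (illustrative)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Stage 1: raw token counts per document (the counting stage).
    counts = [Counter(doc) for doc in tokenized]
    # Stage 2: reweight counts by inverse document frequency.
    doc_freq = {tok: sum(1 for c in counts if tok in c) for tok in vocabulary}
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocabulary}
    weights = [[c[tok] * idf[tok] for tok in vocabulary] for c in counts]
    return vocabulary, weights


vocabulary, weights = tfidf(["the cat sat", "the dog sat down"])
print(vocabulary)
print(weights)
```

Tokens that occur in every document, such as "the" and "sat" here, receive a weight of zero under this textbook formula.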
Parameters
If filename, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.
If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.
Otherwise the input is expected to be a sequence of strings or bytes items that are analyzed directly.
Whether the feature should be made of word or character n-grams.
If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.
If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value.
If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens.
If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.
If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.
This parameter is ignored if vocabulary is not None.
See also
Full API documentation: TfidfVectorizerScikitsLearnNode
Principal component analysis (PCA) using randomized SVD
This node has been automatically generated by wrapping the sklearn.decomposition.pca.RandomizedPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using approximated Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
This implementation uses a randomized SVD implementation and can handle both scipy.sparse and numpy dense arrays as input.
Parameters
When True (False by default) the components_ vectors are divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
Attributes
Examples
>>> import numpy as np
>>> from sklearn.decomposition import RandomizedPCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = RandomizedPCA(n_components=2)
>>> pca.fit(X)
RandomizedPCA(copy=True, iterated_power=3, n_components=2,
random_state=<mtrand.RandomState object at 0x...>, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244... 0.00755...]
See also
PCA ProbabilisticPCA
References
| [Halko2009] | Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions Halko, et al., 2009 (arXiv:909) |
| [MRT] | A randomized algorithm for the decomposition of matrices Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert |
Full API documentation: RandomizedPCAScikitsLearnNode
Mini-batch Sparse Principal Components Analysis
This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.MiniBatchSparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Parameters
verbose :
- degree of output the procedure will print
Attributes
See also
PCA SparsePCA DictionaryLearning
Full API documentation: MiniBatchSparsePCAScikitsLearnNode
Non-Negative matrix factorization by Projected Gradient (NMF)
This node has been automatically generated by wrapping the sklearn.decomposition.nmf.ProjectedGradientNMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Method used to initialize the procedure. Default: ‘nndsvdar’ Valid options:
'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD)
initialization (better for sparseness)
'nndsvda': NNDSVD with zeros filled with the average of X
(better when sparsity is not desired)
'nndsvdar': NNDSVD with zeros filled with small random values
(generally faster, less accurate alternative to NNDSVDa
for when sparsity is not desired)
int seed or RandomState: non-negative random matrices
Attributes
Examples
>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import ProjectedGradientNMF
>>> model = ProjectedGradientNMF(n_components=2, init=0)
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
nls_max_iter=2000, sparseness=None, tol=0.0001)
>>> model.components_
array([[ 0.77032744, 0.11118662],
[ 0.38526873, 0.38228063]])
>>> model.reconstruction_err_
0.00746...
>>> model = ProjectedGradientNMF(n_components=2, init=0,
... sparseness='components')
>>> model.fit(X)
ProjectedGradientNMF(beta=1, eta=0.1, init=0, max_iter=200, n_components=2,
nls_max_iter=2000, sparseness='components', tol=0.0001)
>>> model.components_
array([[ 1.67481991, 0.29614922],
[-0. , 0.4681982 ]])
>>> model.reconstruction_err_
0.513...
Notes
This implements
C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19(2007), 2756-2779. http://www.csie.ntu.edu.tw/~cjlin/nmf/
P. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research 2004.
NNDSVD is introduced in
C. Boutsidis, E. Gallopoulos: SVD based initialization: A head start for nonnegative matrix factorization - Pattern Recognition, 2008 http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf
Full API documentation: ProjectedGradientNMFScikitsLearnNode
Elastic Net model with iterative fitting along a regularization path
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The best model is selected by cross-validation.
Parameters
Attributes
Notes
See examples/linear_model/lasso_path_with_crossvalidation.py for an example.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the optimization objective is:
1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a * L1 + b * L2
for:
alpha = a + b and rho = a / (a + b)
See also
enet_path ElasticNet
Full API documentation: ElasticNetCVScikitsLearnNode
Lasso model fit with Lars using BIC or AIC for model selection
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsIC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
AIC is the Akaike information criterion and BIC is the Bayesian information criterion. Such criteria are useful to select the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model. A good model should explain the data well while being simple.
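The trade-off can be made concrete with one common Gaussian-likelihood form of the two criteria, score = n * log(MSE) + K * df, where K = 2 for AIC and K = log(n) for BIC. The sketch below assumes this form and approximates the degrees of freedom by the number of non-zero coefficients; the `information_criterion` helper is hypothetical and the wrapped class may compute its scores differently.

```python
import math


def information_criterion(residuals, n_nonzero_coefs, criterion="bic"):
    """Score = n * log(MSE) + K * df, with K = 2 (AIC) or log(n) (BIC)."""
    n_samples = len(residuals)
    mse = sum(r * r for r in residuals) / n_samples
    if criterion == "aic":
        k = 2.0
    elif criterion == "bic":
        k = math.log(n_samples)
    else:
        raise ValueError("criterion should be either 'aic' or 'bic'")
    return n_samples * math.log(mse) + k * n_nonzero_coefs


residuals = [0.5, -0.5] * 5
aic = information_criterion(residuals, n_nonzero_coefs=2, criterion="aic")
bic = information_criterion(residuals, n_nonzero_coefs=2, criterion="bic")
print(aic, bic)
```

Lower scores are better under this convention: a model with more non-zero coefficients must reduce the error enough to pay for its added complexity.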
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.LassoLarsIC(criterion='bic')
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
...
LassoLarsIC(copy_X=True, criterion='bic', eps=..., fit_intercept=True,
max_iter=500, normalize=True, precompute='auto',
verbose=False)
>>> print(clf.coef_)
[ 0. -1.11...]
Notes
The estimation of the number of degrees of freedom is given by:
“On the degrees of freedom of the lasso” Hui Zou, Trevor Hastie, and Robert Tibshirani Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.
http://en.wikipedia.org/wiki/Akaike_information_criterion http://en.wikipedia.org/wiki/Bayesian_information_criterion
See also
lars_path, LassoLars, LassoLarsCV
Full API documentation: LassoLarsICScikitsLearnNode
Feature ranking with recursive feature elimination.
This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFE class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and weights are assigned to each one of them. Then, features whose absolute weights are the smallest are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
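The procedure described above can be sketched with ordinary least squares standing in for the external estimator. The `rfe_least_squares` helper, the synthetic data, and the choice to prune one feature per iteration are assumptions of this example, not the wrapped class's code.

```python
import numpy as np


def rfe_least_squares(X, y, n_features_to_select):
    """Recursive feature elimination with ordinary least squares weights."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        # Train on the current feature set and read off one weight per feature.
        coef, _, _, _ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        # Prune the feature whose absolute weight is smallest (one per step).
        weakest = int(np.argmin(np.abs(coef)))
        del remaining[weakest]
    return remaining


rng = np.random.RandomState(0)
X = rng.randn(50, 5)
# Only the first two features carry signal; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.randn(50)
selected = rfe_least_squares(X, y, n_features_to_select=2)
print(selected)
```

Refitting after every pruning step matters because the weights of the surviving features can change once a correlated feature is removed.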
Parameters
A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array.
For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
Attributes
Examples
The following example shows how to retrieve the 5 most informative features in the Friedman #1 dataset.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, 5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True,
False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
References
| [1] | Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002. |
Full API documentation: RFEScikitsLearnNode
Principal component analysis (PCA)
This node has been automatically generated by wrapping the sklearn.decomposition.pca.PCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.
The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.
Parameters
Number of components to keep. If n_components is not set, all components are kept:
n_components == min(n_samples, n_features)
If n_components == ‘mle’, Minka’s MLE is used to guess the dimension.
If 0 < n_components < 1, the number of components is selected such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
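The effect of whitening can be seen in a small NumPy sketch of SVD-based PCA. The variable names are made up for this example, and dividing by n_samples (rather than n_samples - 1) when computing variances is an assumption; the explained variance ratios do not depend on that choice.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
n_samples = X.shape[0]

# Center the data and take its singular value decomposition.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance explained by each component, and its share of the total.
explained_variance = S ** 2 / n_samples
explained_variance_ratio = explained_variance / explained_variance.sum()

# A plain projection keeps the per-component variance scales ...
projected = Xc @ Vt.T
# ... while whitening rescales every component to unit variance.
whitened = projected / np.sqrt(explained_variance)

print(explained_variance_ratio)
print(whitened.T @ whitened / n_samples)
```

The last line prints (up to rounding) the identity matrix: the whitened outputs are uncorrelated and have unit variance, at the cost of discarding the relative scales of the components.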
Attributes
Notes
For n_components=’mle’, this class uses the method of Thomas P. Minka: Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604.
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244... 0.00755...]
See also
ProbabilisticPCA RandomizedPCA KernelPCA SparsePCA
Full API documentation: PCAScikitsLearnNode
Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21
Where:
||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}
i.e. the sum of the norms of each row.
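The mixed norm is straightforward to compute by hand. The `l21_norm` helper below is a hypothetical plain-Python illustration of the formula, using a small made-up coefficient matrix.

```python
import math


def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(w_ij ** 2 for w_ij in row)) for row in W)


W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l21_norm(W))  # 6.0
```

Because each row enters through its Euclidean norm, the penalty is smallest when entire rows are zero, which is what pushes the model toward selecting the same features across all tasks.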
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskLasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
MultiTaskLasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, tol=0.0001, warm_start=False)
>>> print(clf.coef_)
[[ 0.89393398 0. ]
[ 0.89393398 0. ]]
>>> print(clf.intercept_)
[ 0.10606602 0.10606602]
See also
Lasso, MultiTaskElasticNet
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
Full API documentation: MultiTaskLassoScikitsLearnNode
Randomized Logistic Regression
This node has been automatically generated by wrapping the sklearn.linear_model.randomized_l1.RandomizedLogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Randomized Regression works by resampling the training data and computing a LogisticRegression on each resampling. In short, the features selected more often are good features. It is also known as stability selection.
Parameters
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
- An int, giving the exact number of total jobs that are spawned
- A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
Attributes
Examples
>>> from sklearn.linear_model import RandomizedLogisticRegression
>>> randomized_logistic = RandomizedLogisticRegression()
Notes
See examples/linear_model/plot_randomized_lasso.py for an example.
References
Stability selection. Nicolai Meinshausen, Peter Buhlmann. Journal of the Royal Statistical Society: Series B, Volume 72, Issue 4, pages 417-473, September 2010. DOI: 10.1111/j.1467-9868.2010.00740.x
See also
RandomizedLasso, Lasso, ElasticNet
Full API documentation: RandomizedLogisticRegressionScikitsLearnNode
Filter: Select the p-values corresponding to Family-wise error rate
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFwe class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Function taking two arrays X and y, and returning 2 arrays:
Full API documentation: SelectFweScikitsLearnNode
Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for MultiTaskElasticNet is:
(1 / (2 * n_samples)) * ||Y - XW||^2_Fro
+ alpha * rho * ||W||_21 + 0.5 * alpha * (1 - rho) * ||W||_Fro^2
Where:
||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}
i.e. the sum of the norms of each row.
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskElasticNet(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
...
MultiTaskElasticNet(alpha=0.1, copy_X=True, fit_intercept=True,
max_iter=1000, normalize=False, rho=0.5, tol=0.0001,
warm_start=False)
>>> print(clf.coef_)
[[ 0.45663524 0.45612256]
[ 0.45663524 0.45612256]]
>>> print(clf.intercept_)
[ 0.0872422 0.0872422]
See also
ElasticNet, MultiTaskLasso
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
Full API documentation: MultiTaskElasticNetScikitsLearnNode
Sparse coding
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.SparseCoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a sparse representation of data against a fixed, precomputed dictionary.
Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that:
X ~= code * dictionary
Parameters
Algorithm used to transform the data:
Attributes
See also
DictionaryLearning, MiniBatchDictionaryLearning, SparsePCA, MiniBatchSparsePCA, sparse_encode
Full API documentation: SparseCoderScikitsLearnNode
Gaussian Mixture Model
This node has been automatically generated by wrapping the sklearn.mixture.gmm.GMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters
Attributes
Covariance parameters for each mixture component. The shape depends on covariance_type:
(n_components,) if 'spherical',
(n_features, n_features) if 'tied',
(n_components, n_features) if 'diag',
(n_components, n_features, n_features) if 'full'
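The shape rules above can be restated as a small lookup, which may be handy when validating user-supplied covariances. The helper name is hypothetical and the function only encodes the four cases listed here.

```python
def gmm_covars_shape(covariance_type, n_components, n_features):
    """Return the covariance array shape documented for each covariance_type."""
    shapes = {
        'spherical': (n_components,),
        'tied': (n_features, n_features),
        'diag': (n_components, n_features),
        'full': (n_components, n_features, n_features),
    }
    if covariance_type not in shapes:
        raise ValueError("unknown covariance_type: %r" % (covariance_type,))
    return shapes[covariance_type]

for kind in ('spherical', 'tied', 'diag', 'full'):
    print(kind, gmm_covars_shape(kind, n_components=2, n_features=3))
```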
See Also
Examples
>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
... 10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=2, n_init=1, n_iter=100, params='wmc',
random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.75, 0.25])
>>> np.round(g.means_, 2)
array([[ 10.05],
[ 0.06]])
>>> np.round(g.covars_, 2)
array([[[ 1.02]],
[[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0]...)
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=2, n_init=1, n_iter=100, params='wmc',
random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.5, 0.5])
Full API documentation: GMMScikitsLearnNode
A decision tree classifier.
This node has been automatically generated by wrapping the sklearn.tree.tree.DecisionTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
See also
DecisionTreeRegressor
References
| [1] | http://en.wikipedia.org/wiki/Decision_tree_learning |
| [2] | L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984. |
| [3] | T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009. |
| [4] | L. Breiman, and A. Cutler, “Random Forests”, http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm |
Examples
>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
...
...
array([ 1. , 0.93..., 0.86..., 0.93..., 0.93...,
0.93..., 0.93..., 1. , 0.93..., 1. ])
Full API documentation: DecisionTreeClassifierScikitsLearnNode
Pipeline of transforms with a final estimator.
This node has been automatically generated by wrapping the sklearn.pipeline.Pipeline class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
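The naming convention can be illustrated with a short sketch that groups keyword arguments of the form step__param by step name. The helper route_params is hypothetical and only demonstrates the convention; it is not how the wrapped class is implemented.

```python
def route_params(**params):
    """Group 'step__param' keyword arguments by step name."""
    routed = {}
    for key, value in params.items():
        # Split on the first double underscore only, so that nested
        # names such as 'outer__inner__param' keep their remainder.
        step, sep, param = key.partition('__')
        if not sep:
            raise ValueError("expected '<step>__<param>', got %r" % key)
        routed.setdefault(step, {})[param] = value
    return routed

print(route_params(anova__k=10, svc__C=0.1))
```

Each inner dictionary would then be handed to the step of the same name.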
Parameters
Attributes
Examples
>>> from sklearn import svm
>>> from sklearn.datasets import samples_generator
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from sklearn.pipeline import Pipeline
>>> # generate some data to play with
>>> X, y = samples_generator.make_classification(
... n_informative=5, n_redundant=0, random_state=42)
>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
>>> # You can set the parameters using the names of the steps.
>>> # For instance, fit using a k of 10 in the SelectKBest
>>> # and a parameter 'C' of the SVC
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
...
Pipeline(steps=[...])
>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.75
Full API documentation: PipelineScikitsLearnNode
Univariate feature selector with configurable strategy
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.GenericUnivariateSelect class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Function taking two arrays X and y, and returning 2 arrays:
Full API documentation: GenericUnivariateSelectScikitsLearnNode
Naive Bayes classifier for multivariate Bernoulli models.
This node has been automatically generated by wrapping the sklearn.naive_bayes.BernoulliNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.
Parameters
Attributes
Examples
>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True)
>>> print(clf.predict(X[2]))
[3]
References
C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234–265.
A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41–48.
V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).
Full API documentation: BernoulliNBScikitsLearnNode
Logistic Regression (aka logit, MaxEnt) classifier.
This node has been automatically generated by wrapping the sklearn.linear_model.logistic.LogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
In the multiclass case, the training algorithm uses a one-vs.-all (OvA) scheme, rather than the “true” multinomial LR.
This class implements L1 and L2 regularized logistic regression using the liblinear library. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).
Parameters
Attributes
Coefficient of the features in the decision function.
coef_ is a read-only property derived from raw_coef_ that follows the internal memory layout of liblinear.
See also
LinearSVC
Notes
The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to get slightly different results for the same input data. If that happens, try a smaller tol parameter.
References:
Full API documentation: LogisticRegressionScikitsLearnNode
NuSVC for sparse matrices (csr).
This node has been automatically generated by wrapping the sklearn.svm.sparse.classes.NuSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See sklearn.svm.NuSVC for a complete list of parameters
Notes
For best results, this accepts a matrix in csr format (scipy.sparse.csr), but should be able to convert from any array-like object (including other sparse representations).
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm.sparse import NuSVC
>>> clf = NuSVC()
>>> clf.fit(X, y)
NuSVC(cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', nu=0.5, probability=False, shrinking=True, tol=0.001,
verbose=False)
>>> print(clf.predict([[-0.8, -1]]))
[ 1.]
Full API documentation: NuSVCScikitsLearnNode
Sparse Principal Components Analysis (SparsePCA)
This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.SparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Parameters
verbose :
- Degree of verbosity of the printed output.
Attributes
See also
PCA, MiniBatchSparsePCA, DictionaryLearning
Full API documentation: SparsePCAScikitsLearnNode
Orthogonal Matching Pursuit model (OMP)
This node has been automatically generated by wrapping the sklearn.linear_model.omp.OrthogonalMatchingPursuit class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Notes
Orthogonal matching pursuit was introduced in G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, Vol. 41, No. 12. (December 1993), pp. 3397-3415. (http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf)
This implementation is based on Rubinstein, R., Zibulevsky, M. and Elad, M., Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit Technical Report - CS Technion, April 2008. http://www.cs.technion.ac.il/~ronrubin/Publications/KSVD-OMP-v2.pdf
See also
orthogonal_mp, orthogonal_mp_gram, lars_path, Lars, LassoLars, decomposition.sparse_encode
Full API documentation: OrthogonalMatchingPursuitScikitsLearnNode
Filter: Select the p-values below alpha based on a FPR test.
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFpr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
FPR test stands for False Positive Rate test. It controls the total amount of false detections.
Parameters
Function taking two arrays X and y, and returning 2 arrays:
Full API documentation: SelectFprScikitsLearnNode
Encode labels with value between 0 and n_classes-1.
This node has been automatically generated by wrapping the sklearn.preprocessing.LabelEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Attributes
Examples
LabelEncoder can be used to normalize labels.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
Full API documentation: LabelEncoderScikitsLearnNode
Quadratic Discriminant Analysis (QDA)
This node has been automatically generated by wrapping the sklearn.qda.QDA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class.
Parameters
Attributes
Examples
>>> from sklearn.qda import QDA
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QDA()
>>> clf.fit(X, y)
QDA(priors=None)
>>> print(clf.predict([[-0.8, -1]]))
[1]
See also
sklearn.lda.LDA: Linear discriminant analysis
Full API documentation: QDAScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.LogOddsEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: LogOddsEstimatorScikitsLearnNode
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.Vectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: VectorizerScikitsLearnNode
Linear model fitted by minimizing a regularized empirical loss with SGD.
This node has been automatically generated by wrapping the sklearn.linear_model.stochastic_gradient.SGDClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
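The truncation rule can be sketched for a single weight as follows. This is a simplified illustration under the assumption of a plain L1 penalty with step size eta; the function name is hypothetical and the wrapped implementation may apply the penalty differently.

```python
def l1_truncated_step(w, grad, eta, alpha):
    """One gradient step on a single weight followed by an L1 shrink.

    The shrink moves the weight towards zero by eta * alpha; if that would
    carry it across zero, the weight is clipped to exactly 0.0.
    """
    w = w - eta * grad      # plain gradient step on the loss
    shrink = eta * alpha    # size of the L1 penalty step
    if w > 0.0:
        return max(0.0, w - shrink)
    if w < 0.0:
        return min(0.0, w + shrink)
    return 0.0

# A small weight is set to exactly zero, a large one is merely shrunk.
print(l1_truncated_step(0.05, grad=0.0, eta=0.1, alpha=1.0))
print(l1_truncated_step(1.0, grad=0.0, eta=0.1, alpha=1.0))
```

Clipping at zero is what produces exact zeros in the weight vector, and hence sparse models.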
This implementation works with data represented as dense or sparse arrays of floating point values for the features.
Parameters
The learning rate:
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes are supposed to have weight one.
The “auto” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
Attributes
coef_ : array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]
Weights assigned to the features.
Examples
>>> import numpy as np
>>> from sklearn import linear_model
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> clf = linear_model.SGDClassifier()
>>> clf.fit(X, Y)
...
SGDClassifier(alpha=0.0001, class_weight=None, epsilon=0.1, eta0=0.0,
fit_intercept=True, learning_rate='optimal', loss='hinge',
n_iter=5, n_jobs=1, penalty='l2', power_t=0.5, rho=0.85, seed=0,
shuffle=False, verbose=0, warm_start=False)
>>> print(clf.predict([[-0.8, -1]]))
[1]
See also
LinearSVC, LogisticRegression, Perceptron
Full API documentation: SGDClassifierScikitsLearnNode
Lasso model fit with Least Angle Regression a.k.a. Lars
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
It is a Linear Model trained with an L1 prior as regularizer.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.LassoLars(alpha=0.01)
>>> clf.fit([[-1, 1], [0, 0], [1, 1]], [-1, 0, -1])
...
LassoLars(alpha=0.01, copy_X=True, eps=..., fit_intercept=True,
fit_path=True, max_iter=500, normalize=True, precompute='auto',
verbose=False)
>>> print(clf.coef_)
[ 0. -0.963257...]
See also
lars_path, lasso_path, Lasso, LassoCV, LassoLarsCV, sklearn.decomposition.sparse_encode
http://en.wikipedia.org/wiki/Least_angle_regression
Full API documentation: LassoLarsScikitsLearnNode
Kernel Principal component analysis (KPCA)
This node has been automatically generated by wrapping the sklearn.decomposition.kernel_pca.KernelPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Non-linear dimensionality reduction through the use of kernels.
Parameters
Attributes
lambdas_, alphas_:
- Eigenvalues and eigenvectors of the centered kernel matrix
dual_coef_:
- Inverse transform matrix
X_transformed_fit_:
- Projection of the fitted data on the kernel principal components
References
Kernel PCA was introduced in:
Bernhard Schoelkopf, Alexander J. Smola, and Klaus-Robert Mueller. 1999. Kernel principal component analysis. In Advances in kernel methods, MIT Press, Cambridge, MA, USA 327-352.
Full API documentation: KernelPCAScikitsLearnNode
Standardize features by removing the mean and scaling to unit variance
This node has been automatically generated by wrapping the sklearn.preprocessing.Scaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
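The fit-then-transform behaviour described above can be sketched in plain Python. The helper names are hypothetical, and the handling of constant features (left unscaled) is an assumption made to keep the sketch free of divisions by zero.

```python
import math

def fit_scaler(rows):
    """Compute per-feature mean and standard deviation on training rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    stds = []
    for col, mean in zip(zip(*rows), means):
        var = sum((v - mean) ** 2 for v in col) / n  # population variance
        # Assumption: constant features are left unscaled (divide by 1.0).
        stds.append(math.sqrt(var) or 1.0)
    return means, stds

def transform(rows, means, stds):
    """Apply previously stored statistics to (possibly new) rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two features on very different scales end up directly comparable.
train = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
means, stds = fit_scaler(train)
scaled = transform(train, means, stds)
print(scaled)
```

After scaling, both columns contain the same values even though the raw second feature is two orders of magnitude larger.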
Parameters
Attributes
See also
sklearn.preprocessing.scale() to perform centering and scaling without using the Transformer object oriented API
sklearn.decomposition.RandomizedPCA with whiten=True to further remove the linear correlation across features.
Full API documentation: ScalerScikitsLearnNode
CCA Canonical Correlation Analysis. CCA inherits from PLS with mode="B" and deflation_mode="canonical".
This node has been automatically generated by wrapping the sklearn.pls.CCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
Notes
For each component k, find the weights u, v that maximize corr(Xk u, Yk v), such that |u| = |v| = 1
Note that it maximizes only the correlations between the scores.
The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.
The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score.
Examples
>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> cca = CCA(n_components=1)
>>> cca.fit(X, Y)
...
CCA(copy=True, max_iter=500, n_components=1, scale=True, tol=1e-06)
>>> X_c, Y_c = cca.transform(X, Y)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
In French but still a reference:
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technip.
See also
PLSCanonical, PLSSVD
Full API documentation: CCAScikitsLearnNode
Center a kernel matrix
This node has been automatically generated by wrapping the sklearn.preprocessing.KernelCenterer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This is equivalent to centering phi(X) with sklearn.preprocessing.Scaler(with_std=False).
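The stated equivalence can be checked numerically for a linear kernel, where phi(X) is simply X. The sketch below uses the standard double-centering formula; the helper names are hypothetical and the code is an illustration rather than the wrapped implementation.

```python
def center_kernel(K):
    """Double-center a square kernel matrix given as a list of lists."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def linear_kernel(X):
    """Gram matrix of plain dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
# Center the features explicitly (mean removal only, no scaling) ...
means = [sum(col) / len(X) for col in zip(*X)]
X_centered = [[v - m for v, m in zip(row, means)] for row in X]
K_from_centered = linear_kernel(X_centered)
# ... and compare with centering the kernel matrix directly.
K_centered = center_kernel(linear_kernel(X))
print(K_centered)
print(K_from_centered)
```

Both matrices agree up to rounding, which is the point of centering in kernel space: phi(X) never has to be formed explicitly.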
Full API documentation: KernelCentererScikitsLearnNode
Filter: Select the p-values for an estimated false discovery rate
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFdr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This uses the Benjamini-Hochberg procedure. alpha is the target false discovery rate.
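For reference, the textbook Benjamini-Hochberg step-up procedure can be sketched as follows. The function name is hypothetical, and details of the wrapped implementation (for example tie handling) may differ.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return a boolean mask of the hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    selected = [False] * m
    for idx in order[:cutoff_rank]:
        selected[idx] = True
    return selected

pvals = [0.001, 0.045, 0.025, 0.2, 0.012]
mask = benjamini_hochberg(pvals, alpha=0.05)
print(mask)
```

Note that 0.045 is below alpha but is still not selected, because its rank-dependent threshold is 0.04.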
Parameters
Function taking two arrays X and y, and returning 2 arrays:
Full API documentation: SelectFdrScikitsLearnNode
An extremely randomized tree classifier.
This node has been automatically generated by wrapping the sklearn.tree.tree.ExtraTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
Warning: Extra-trees should only be used within ensemble methods.
See also
ExtraTreeRegressor, ExtraTreesClassifier, ExtraTreesRegressor
References
| [1] | P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006. |
Full API documentation: ExtraTreeClassifierScikitsLearnNode
Filter: Select the k lowest p-values.
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectKBest class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Function taking two arrays X and y, and returning 2 arrays:
Notes
Ties between features with equal p-values will be broken in an unspecified way.
Full API documentation: SelectKBestScikitsLearnNode
Normalize samples individually to unit norm
This node has been automatically generated by wrapping the sklearn.preprocessing.Normalizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).
Scaling inputs to unit norms is a common operation for text classification or clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
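The link between l2 normalization and cosine similarity can be seen in a few lines of plain Python. The helper names are hypothetical; rows without any non-zero component are returned unchanged, as described above.

```python
import math

def l2_normalize(rows):
    """Rescale each row to unit Euclidean norm; all-zero rows are left as-is."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0.0 else list(row))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = [[3.0, 4.0, 0.0], [6.0, 8.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
unit = l2_normalize(docs)
# Parallel vectors have cosine similarity 1, orthogonal ones 0.
print(dot(unit[0], unit[1]))
print(dot(unit[0], unit[2]))
```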
Parameters
Notes
This estimator is stateless (besides constructor parameters): the fit method does nothing but is useful when used in a pipeline.
See also
sklearn.preprocessing.normalize() equivalent function without the object oriented API
Full API documentation: NormalizerScikitsLearnNode
Transform a count matrix to a normalized tf or tf–idf representation
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Tf means term-frequency while tf–idf means term-frequency times inverse document-frequency. This is a common term weighting scheme in information retrieval, that has also found good use in document classification.
The goal of using tf–idf instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.
In the SMART notation used in IR, this class implements several tf–idf variants. Tf is always “n” (natural), idf is “t” iff use_idf is given, “n” otherwise, and normalization is “c” iff norm=’l2’, “n” iff norm=None.
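A minimal sketch of the "ntc" variant (natural tf, idf weighting, cosine normalization) is given below. It assumes the textbook definition idf = log(n_docs / df); the wrapped class may use a smoothed or shifted idf, so the exact numbers are not expected to match. The function name is hypothetical.

```python
import math

def tfidf(counts):
    """Weight a document-term count matrix by idf and l2-normalize each row."""
    n_docs = len(counts)
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for c in col if c > 0) for col in zip(*counts)]
    # Textbook idf (an assumption); terms that never occur get weight 0.
    idf = [math.log(float(n_docs) / d) if d else 0.0 for d in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    result = []
    for row in weighted:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm else row)
    return result

# Term 0 occurs in every document, terms 1 and 2 in one document each.
counts = [[3, 0, 1],
          [2, 0, 0],
          [3, 1, 0]]
weights = tfidf(counts)
print(weights)
```

Although term 0 has the highest raw counts, it receives zero weight because it appears in every document and therefore carries no discriminative information.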
Parameters
References
| [Yates2011] | R. Baeza-Yates and B. Ribeiro-Neto (2011). Modern Information Retrieval. Addison Wesley, pp. 68–74. |
| [MSR2008] | C.D. Manning, H. Schütze and P. Raghavan (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 121–125. |
Full API documentation: TfidfTransformerScikitsLearnNode
Gradient Boosting for classification.
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.GradientBoostingClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.
Parameters
Attributes
Examples
>>> samples = [[0, 0, 2], [1, 0, 0]]
>>> labels = [0, 1]
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> gb = GradientBoostingClassifier().fit(samples, labels)
>>> print(gb.predict([[0.5, 0, 0]]))
[0]
See also
sklearn.tree.DecisionTreeClassifier, RandomForestClassifier
References
J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.
Full API documentation: GradientBoostingClassifierScikitsLearnNode
Hidden Markov Model with Gaussian mixture emissions
This node has been automatically generated by wrapping the sklearn.hmm.GMMHMM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Attributes
Examples
>>> from sklearn.hmm import GMMHMM
>>> GMMHMM(n_components=2, n_mix=10, covariance_type='diag')
...
GMMHMM(algorithm='viterbi', covariance_type='diag',...
See Also
GaussianHMM : HMM with Gaussian emissions
Full API documentation: GMMHMMScikitsLearnNode
A tree regressor.
This node has been automatically generated by wrapping the sklearn.tree.tree.DecisionTreeRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
Attributes
See also
DecisionTreeClassifier
References
| [1] | http://en.wikipedia.org/wiki/Decision_tree_learning |
| [2] | L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984. |
| [3] | T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009. |
| [4] | L. Breiman, and A. Cutler, “Random Forests”, http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm |
Examples
>>> from sklearn.datasets import load_boston
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> boston = load_boston()
>>> regressor = DecisionTreeRegressor(random_state=0)
R2 scores (a.k.a. coefficient of determination) over 10-folds CV:
>>> cross_val_score(regressor, boston.data, boston.target, cv=10)
...
...
array([ 0.61..., 0.57..., -0.34..., 0.41..., 0.75...,
0.07..., 0.29..., 0.33..., -1.42..., -1.77...])
Full API documentation: DecisionTreeRegressorScikitsLearnNode
Linear least squares with l2 regularization.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.Ridge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape [n_samples, n_responses]).
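The effect of the l2 penalty is easiest to see with a single feature and no intercept, where minimizing sum((y_i - w * x_i)**2) + alpha * w**2 has a closed-form solution. The function below is a hypothetical illustration of that special case, not the solver used by the wrapped class.

```python
def ridge_1d(x, y, alpha):
    """Minimize sum((y_i - w * x_i)**2) + alpha * w**2 for a single feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Setting the derivative to zero gives w = sxy / (sxx + alpha).
    return sxy / (sxx + alpha)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w_ols = ridge_1d(x, y, alpha=0.0)     # ordinary least squares
w_ridge = ridge_1d(x, y, alpha=14.0)  # penalty as large as sum(x**2)
print(w_ols, w_ridge)
```

Increasing alpha shrinks the coefficient towards zero: here the penalty equals sum(x**2), which halves the least-squares slope.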
Parameters
Attributes
See also
RidgeClassifier, RidgeCV
Examples
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, normalize=False,
tol=0.001)
Full API documentation: RidgeScikitsLearnNode
epsilon-Support Vector Regression.
This node has been automatically generated by wrapping the sklearn.svm.classes.SVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The free parameters in the model are C and epsilon.
The implementation is based on libsvm.
Parameters
Attributes
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of linear kernel.
coef_ is a read-only property derived from dual_coef_ and support_vectors_
Examples
>>> from sklearn.svm import SVR
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = SVR(C=1.0, epsilon=0.2)
>>> clf.fit(X, y)
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.2, gamma=0.0,
kernel='rbf', probability=False, shrinking=True, tol=0.001,
verbose=False)
See also
Full API documentation: SVRScikitsLearnNode
Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFECV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array.
For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
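The elimination loop that relies on these weights can be sketched as follows. The sketch removes one feature per round and leaves out the cross-validated choice of the number of features; fit_weights is a hypothetical callable standing in for refitting the estimator and reading its coef_.

```python
def recursive_feature_elimination(fit_weights, n_features, n_keep):
    """Rank features by repeatedly dropping the one with the smallest |weight|.

    fit_weights(active) is a hypothetical callable that refits a model on the
    feature indices in `active` and returns one weight per active feature.
    """
    active = list(range(n_features))
    ranking = [1] * n_features  # surviving features keep rank 1
    while len(active) > n_keep:
        weights = fit_weights(active)
        # Position (within `active`) of the least important feature.
        worst = min(range(len(active)), key=lambda i: abs(weights[i]))
        # Features dropped earlier receive a larger (worse) rank.
        ranking[active[worst]] = len(active) - n_keep + 1
        del active[worst]
    return active, ranking

# Stand-in "model" whose weights do not change between refits.
fixed = [5.0, -4.0, 0.1, 3.0, -0.5]
selected, ranking = recursive_feature_elimination(
    lambda active: [fixed[f] for f in active], n_features=5, n_keep=2)
print(selected, ranking)
```

With a real estimator the weights change after every refit, which is why the model is retrained in each round instead of being ranked once.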
Attributes
Examples
The following example shows how to recover the 5 informative features, which are not known a priori, in the Friedman #1 dataset.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True,
False, False, False, False, False], dtype=bool)
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
References
| [1] | Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002. |
Full API documentation: RFECVScikitsLearnNode
Bayesian ridge regression
This node has been automatically generated by wrapping the sklearn.linear_model.bayes.BayesianRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Fit a Bayesian ridge model and optimize the regularization parameters lambda (precision of the weights) and alpha (precision of the noise).
Parameters
Attributes
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.BayesianRidge()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
...
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
copy_X=True, fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06,
n_iter=300, normalize=False, tol=0.001, verbose=False)
>>> clf.predict([[1, 1]])
array([ 1.])
Notes
See examples/linear_model/plot_bayesian_ridge.py for an example.
Full API documentation: BayesianRidgeScikitsLearnNode
PLS regression
This node has been automatically generated by wrapping the sklearn.pls.PLSRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
PLSRegression implements the two-block PLS regression known as PLS2, or PLS1 in the case of a one-dimensional response. This class inherits from _PLS with mode="A", deflation_mode="regression", norm_y_weights=False and algorithm="nipals".
Parameters
Attributes
Notes
For each component k, find weights u, v that optimize:
max corr(Xk u, Yk v) * var(Xk u) var(Yk v), such that |u| = 1
Note that it maximizes both the correlations between the scores and the intra-block variances.
The residual matrix of the X block (Xk+1) is obtained by deflation on the current X score: x_score.
The residual matrix of the Y block (Yk+1) is obtained by deflation on the current X score. This performs the PLS regression known as PLS2. This mode is prediction oriented.
This implementation provides the same results as 3 PLS packages available in the R language (R-project):
- "mixOmics" with function pls(X, Y, mode = "regression")
- "plspm" with function plsreg2(X, Y)
- "pls" with function oscorespls.fit(X, Y)
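The weight search and the two deflation steps described in these notes can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the wrapped code: the function name pls2_nipals is hypothetical, the data are centered but not scaled, and u and v denote the X and Y weights.

```python
import numpy as np

def pls2_nipals(X, Y, n_components, max_iter=500, tol=1e-6):
    """Hypothetical sketch of PLS2 (regression mode) with NIPALS iterations."""
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    X = X - X.mean(axis=0)  # center only; scaling is skipped in this sketch
    Y = Y - Y.mean(axis=0)
    x_scores = []
    for _ in range(n_components):
        y_score = Y[:, 0].copy()  # start from the first response column
        u_old = np.zeros(X.shape[1])
        for _ in range(max_iter):
            u = X.T @ y_score
            u /= np.linalg.norm(u)                   # constraint |u| = 1
            x_score = X @ u
            v = Y.T @ x_score / (x_score @ x_score)  # Y weights, not normalized
            y_score = Y @ v / (v @ v)
            if np.linalg.norm(u - u_old) < tol:
                break
            u_old = u
        # Both blocks are deflated on the current X score (regression mode).
        x_loadings = X.T @ x_score / (x_score @ x_score)
        y_loadings = Y.T @ x_score / (x_score @ x_score)
        X = X - np.outer(x_score, x_loadings)
        Y = Y - np.outer(x_score, y_loadings)
        x_scores.append(x_score)
    return np.column_stack(x_scores)

X_demo = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
Y_demo = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
T = pls2_nipals(X_demo, Y_demo, n_components=2)
```

Because X is deflated on each X score before the next component is extracted, the columns of T are mutually orthogonal.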
Examples
>>> from sklearn.pls import PLSCanonical, PLSRegression, CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> pls2 = PLSRegression(n_components=2)
>>> pls2.fit(X, Y)
...
PLSRegression(copy=True, max_iter=500, n_components=2, scale=True,
tol=1e-06)
>>> Y_pred = pls2.predict(X)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
In French but still a reference:
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technip.
Full API documentation: PLSRegressionScikitsLearnNode
Additional layer on top of PCA that adds a probabilistic evaluation
Principal component analysis (PCA)
This node has been automatically generated by wrapping the sklearn.decomposition.pca.ProbabilisticPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.
This implementation uses the scipy.linalg implementation of the singular value decomposition. It only works for dense arrays and is not scalable to large dimensional data.
The time complexity of this implementation is O(n ** 3) assuming n ~ n_samples ~ n_features.
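A minimal sketch of PCA through the singular value decomposition, including the selection of components by a variance fraction, might look like this. It assumes numpy.linalg.svd in place of the scipy.linalg routine mentioned above, and the function name pca_svd is hypothetical.

```python
import numpy as np

def pca_svd(X, n_components=None):
    """Hypothetical sketch: PCA of a dense array via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)  # explained variance ratio per component
    if n_components is None:
        n_components = len(S)        # keep min(n_samples, n_features) components
    elif 0 < n_components < 1:
        # Smallest number of components whose cumulative ratio reaches the fraction.
        k = int(np.searchsorted(np.cumsum(ratio), n_components)) + 1
        n_components = min(k, len(S))
    return mean, Vt[:n_components], ratio[:n_components]

X_demo = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
mean, components, ratio = pca_svd(X_demo)
_, top, _ = pca_svd(X_demo, n_components=0.9)  # a single component suffices here
```

The rows of components are the principal directions; projecting new data means subtracting mean and multiplying by the transpose of components.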
Parameters
Number of components to keep. If n_components is not set, all components are kept:
n_components == min(n_samples, n_features)
If n_components == 'mle', Minka's MLE is used to guess the dimension. If 0 < n_components < 1, the number of components is selected such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
When True (False by default) the components_ vectors are divided by n_samples times singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
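The effect of whitening can be checked numerically. The sketch below assumes variances normalized by n_samples, so each projected component is divided by its singular value over sqrt(n_samples); the exact scaling used by the wrapped class may differ.

```python
import numpy as np

rng = np.random.RandomState(0)
# Correlated synthetic data with very different variance scales.
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.2]])
X = rng.randn(200, 3) @ mixing
n_samples = X.shape[0]

Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)

projected = Xc @ Vt.T                            # component variances: S**2 / n_samples
whitened = projected / (S / np.sqrt(n_samples))  # rescale to unit variance

cov = whitened.T @ whitened / n_samples          # close to the identity matrix
```

The covariance of the whitened output is the identity up to rounding, which is exactly the information about relative variance scales that whitening discards.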
Attributes
Notes
For n_components='mle', this class uses the method of Thomas P. Minka: Automatic Choice of Dimensionality for PCA. NIPS 2000: 598-604.
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
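The sign ambiguity can be seen directly: flipping the signs of matching left and right singular vectors leaves the decomposition unchanged. The sketch below also shows one possible convention for making signs reproducible; fix_signs is a hypothetical helper and is not something the wrapped class is documented to do.

```python
import numpy as np

def fix_signs(components):
    """Flip each row so that its largest-magnitude entry is positive."""
    components = np.asarray(components, dtype=float)
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), idx])
    return components * signs[:, np.newaxis]

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# An equally valid decomposition with every component flipped.
alt_U, alt_Vt = -U, -Vt
same_reconstruction = np.allclose((U * S) @ Vt, (alt_U * S) @ alt_Vt)

# After applying the convention, both decompositions agree.
consistent = np.allclose(fix_signs(Vt), fix_signs(alt_Vt))
```

Reusing the same fitted estimator, as recommended above, avoids the issue without any such convention.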
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244... 0.00755...]
See also
ProbabilisticPCA RandomizedPCA KernelPCA SparsePCA
Full API documentation: ProbabilisticPCAScikitsLearnNode
Ordinary least squares Linear Regression.
This node has been automatically generated by wrapping the sklearn.linear_model.base.LinearRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Attributes
Parameters
Notes
From the implementation point of view, this is just plain Ordinary Least Squares (numpy.linalg.lstsq) wrapped as a predictor object.
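As an illustration of what wrapping numpy.linalg.lstsq as a predictor object can look like, here is a minimal sketch. The class name LstsqRegressor is hypothetical, and the intercept is handled by appending a column of ones, which is an assumption rather than a description of the wrapped class.

```python
import numpy as np

class LstsqRegressor:
    """Hypothetical minimal predictor built on numpy.linalg.lstsq."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Append a constant column so the last coefficient acts as the intercept.
        A = np.column_stack([X, np.ones(X.shape[0])])
        solution, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
        self.coef_ = solution[:-1]
        self.intercept_ = solution[-1]
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

model = LstsqRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
prediction = model.predict([[3.0]])  # data follow y = 2 * x + 1 exactly
```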
Full API documentation: LinearRegressionScikitsLearnNode
Binarize labels in a one-vs-all fashion
This node has been automatically generated by wrapping the sklearn.preprocessing.LabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Several regression and binary classification algorithms are available in the scikit. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.
At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belongs or does not belong to the class). LabelBinarizer makes this process easy with the transform method.
At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
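The two directions of the one-vs-all scheme can be sketched in plain NumPy. This only mirrors the idea behind transform and inverse_transform; the score values are made up for illustration.

```python
import numpy as np

labels = np.array([1, 2, 6, 4, 2])
classes = np.unique(labels)  # sorted distinct labels: 1, 2, 4, 6

# Learning time: one binary column per class (belongs / does not belong).
binary = (labels[:, np.newaxis] == classes).astype(int)

# Prediction time: hypothetical per-class confidences from the binary models;
# each sample is assigned the class with the greatest confidence.
scores = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.2, 0.1, 0.1, 0.6]])
predicted = classes[np.argmax(scores, axis=1)]
```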
Parameters
Attributes
Examples
>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
[0, 0, 0, 1]])
>>> lb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
[0, 0, 1]])
>>> lb.classes_
array([1, 2, 3])
Full API documentation: LabelBinarizerScikitsLearnNode
Classifier implementing a vote among neighbors within a given radius
This node has been automatically generated by wrapping the sklearn.neighbors.classification.RadiusNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
weight function used in prediction. Possible values:
Uniform weights are used by default.
Algorithm used to compute the nearest neighbors:
Note: fitting on sparse input will override the setting of this parameter, using brute force.
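A brute-force version of the vote with uniform weights can be sketched as follows. The function name radius_vote is hypothetical, Euclidean distance is assumed, and ties are resolved toward the smallest label in this sketch.

```python
import numpy as np

def radius_vote(X_train, y_train, query, radius=1.0):
    """Hypothetical brute-force majority vote among neighbors within a radius."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point.
    dist = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    inside = dist <= radius
    if not inside.any():
        raise ValueError("no training point within the given radius")
    # Uniform weights: every neighbor inside the radius counts once.
    found, counts = np.unique(y_train[inside], return_counts=True)
    return found[np.argmax(counts)]

X_demo = [[0], [1], [2], [3]]
y_demo = [0, 0, 1, 1]
low = radius_vote(X_demo, y_demo, [0.6])   # neighbors are the points 0 and 1
high = radius_vote(X_demo, y_demo, [2.6])  # neighbors are the points 2 and 3
```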
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> neigh = RadiusNeighborsClassifier(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsClassifier(...)
>>> print(neigh.predict([[1.5]]))
[0]
See also
KNeighborsClassifier RadiusNeighborsRegressor KNeighborsRegressor NearestNeighbors
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: RadiusNeighborsClassifierScikitsLearnNode
Ridge classifier with built-in cross-validation.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifierCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation. Currently, only the n_features > n_samples case is handled efficiently.
Parameters
Attributes
cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).
See also
Ridge: Ridge regression
RidgeClassifier: Ridge classifier
RidgeCV: Ridge regression with built-in cross validation
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
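The one-versus-all construction through a multi-variate response can be sketched with a closed-form ridge solution. This is an illustration under assumptions: the function names are hypothetical, targets are coded as +1/-1, alpha is fixed rather than chosen by cross-validation, and the intercept is left unpenalized.

```python
import numpy as np

def ridge_classifier_fit(X, y, alpha=1.0):
    """Hypothetical sketch: one ridge regression per class, solved jointly."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    # One column of +1 / -1 targets per class (one-versus-all coding).
    T = np.where(np.asarray(y)[:, np.newaxis] == classes, 1.0, -1.0)
    x_mean, t_mean = X.mean(axis=0), T.mean(axis=0)
    Xc = X - x_mean
    # Multi-variate response: a single solve yields all per-class weight vectors.
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xc.T @ (T - t_mean))
    b = t_mean - x_mean @ W
    return classes, W, b

def ridge_classifier_predict(model, X):
    classes, W, b = model
    scores = np.asarray(X, dtype=float) @ W + b
    return classes[np.argmax(scores, axis=1)]

X_train = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
y_train = [0, 1, 2]
model = ridge_classifier_fit(X_train, y_train, alpha=1.0)
predicted = ridge_classifier_predict(model, [[0.5, 0.5], [4.0, 0.5], [0.5, 4.0]])
```

Each query point is assigned the class whose regression output is largest, which is the one-versus-all decision rule described above.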
Full API documentation: RidgeClassifierCVScikitsLearnNode