Package mdp :: Package nodes :: Class TheilSenRegressorScikitsLearnNode

Class TheilSenRegressorScikitsLearnNode

Theil-Sen Estimator: robust multivariate regression model.

This node has been automatically generated by wrapping the ``sklearn.linear_model.theil_sen.TheilSenRegressor`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

The algorithm computes least squares solutions on subsets of size
n_subsamples drawn from the samples in X. Any value of n_subsamples between
the number of features and the number of samples yields an estimator that
trades robustness against efficiency. Because the number of least squares
solutions is "n_samples choose n_subsamples", it can be extremely large,
so it can be limited with max_subpopulation. If this limit is reached, the
subsets are chosen randomly. In a final step, the spatial median (or L1
median) of all least squares solutions is calculated.
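
As a rough illustration of the steps above, the following pure-Python sketch handles only the simplest case: one feature, an intercept, and subsets of size two, so that each least squares solution is the line through a pair of points. The names ``spatial_median`` and ``theil_sen_1d`` are hypothetical and are not part of ``sklearn`` or MDP. The Weiszfeld iteration used for the spatial median is an assumption made for this sketch, not a statement about the wrapped implementation.

```python
import itertools
import math


def _distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def spatial_median(points, max_iter=300, tol=1e-3):
    """Approximate the spatial (L1) median with Weiszfeld iterations."""
    dim = len(points[0])
    # Start from the coordinate-wise mean of all points.
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(max_iter):
        # Each point is weighted by the inverse of its distance to the
        # current estimate; the floor avoids division by zero.
        weights = [1.0 / max(_distance(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        if _distance(updated, current) < tol:
            return updated
        current = updated
    return current


def theil_sen_1d(x, y):
    """Fit y = intercept + slope * x from all two-point subsets."""
    solutions = []
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair has no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        solutions.append((intercept, slope))
    # The final estimate is the spatial median of all pairwise solutions.
    return spatial_median(solutions)


x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[7] = 100.0  # one gross outlier

intercept, slope = theil_sen_1d(x, y)
print(intercept, slope)  # approximately 1.0 and 2.0
```

Nine of the ten points lie on y = 1 + 2x, so most pairwise solutions coincide and the spatial median stays close to that line despite the outlier.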

Read more in the scikit-learn User Guide (section ``theil_sen_regression``).


Parameters

fit_intercept : boolean, optional, default True
    Whether to calculate the intercept for this model. If set
    to False, no intercept will be used in calculations.

copy_X : boolean, optional, default True
    If True, X will be copied; else, it may be overwritten.

max_subpopulation : int, optional, default 1e4
    Instead of computing with a set of cardinality 'n choose k', where n is
    the number of samples and k is the number of subsamples (at least the
    number of features), consider only a stochastic subpopulation of the
    given maximal size if 'n choose k' is larger than max_subpopulation.
    Except for small problems, this parameter determines memory usage and
    runtime if n_subsamples is not changed.

n_subsamples : int, optional, default None
    Number of samples used to calculate the parameters. This is at least
    the number of features (plus 1 if fit_intercept=True) and at most the
    number of samples. A lower number leads to a higher breakdown point and
    a lower efficiency, while a higher number leads to a lower breakdown
    point and a higher efficiency. If None, take the minimum number of
    subsamples, which leads to maximal robustness. If n_subsamples is set
    to n_samples, Theil-Sen is identical to least squares.

max_iter : int, optional, default 300
    Maximum number of iterations for the calculation of the spatial median.

tol : float, optional, default 1.e-3
    Tolerance when calculating the spatial median.

random_state : RandomState or an int seed, optional, default None
    A random number generator instance to define the state of the
    random permutations generator.

n_jobs : integer, optional, default 1
    Number of CPUs to use when computing the least squares solutions
    during fitting. If ``-1``, use all the CPUs.

verbose : boolean, optional, default False
    Verbose mode when fitting the model.
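
To make the interplay of n_subsamples and max_subpopulation concrete, the sketch below counts how many subsets would be considered for a given problem size. The helper ``subpopulation_size`` is hypothetical. It follows the parameter descriptions above (smallest admissible subset size when n_subsamples is None, cap at max_subpopulation) and is not a copy of the wrapped implementation.

```python
import math


def subpopulation_size(n_samples, n_features, fit_intercept=True,
                       n_subsamples=None, max_subpopulation=10000):
    """Return (n_subsamples, number of subsets actually considered)."""
    if n_subsamples is None:
        # Smallest admissible subset size: one sample per fitted parameter.
        n_subsamples = n_features + 1 if fit_intercept else n_features
    # Total number of distinct subsets, "n_samples choose n_subsamples".
    all_subsets = math.comb(n_samples, n_subsamples)
    # Above the cap, only a random subpopulation of that size is used.
    return n_subsamples, min(all_subsets, max_subpopulation)


print(subpopulation_size(20, 2))    # (3, 1140): every subset is used
print(subpopulation_size(200, 2))   # (3, 10000): 1313400 subsets, capped
```

Setting n_subsamples equal to n_samples leaves a single subset, which is the least squares case mentioned in the n_subsamples description.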


Attributes

``coef_`` : array, shape = (n_features,)
    Coefficients of the regression model (median of distribution).

``intercept_`` : float
    Estimated intercept of regression model.

``breakdown_`` : float
    Approximated breakdown point.

``n_iter_`` : int
    Number of iterations needed for the spatial median.

``n_subpopulation_`` : int
    Number of combinations taken into account from 'n choose k', where n is
    the number of samples and k is the number of subsamples.


References

- Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang,
  Theil-Sen Estimators in a Multiple Linear Regression Model, 2009

Instance Methods
__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
Theil-Sen Estimator: robust multivariate regression model.
_execute(self, x)
_get_supported_dtypes(self)
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
_stop_training(self, **kwargs)
Concatenate the collected data in a single array.
execute(self, x)
Predict using the linear model
stop_training(self, **kwargs)
Fit linear model.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from Cumulator
_train(self, *args)
Collect all input data in a list.
train(self, *args)
Collect all input data in a list.
    Inherited from Node
__add__(self, other)
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
_check_input(self, x)
_check_output(self, y)
_check_train_args(self, x, *args, **kwargs)
_inverse(self, x)
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
_set_dtype(self, t)
_set_input_dim(self, n)
_set_output_dim(self, n)
copy(self, protocol=None)
Return a deep copy of the node.
get_current_train_phase(self)
Return the index of the current training phase.
get_dtype(self)
Return dtype.
get_input_dim(self)
Return input dimensions.
get_output_dim(self)
Return output dimensions.
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
get_supported_dtypes(self)
Return dtypes supported by the node as a list of dtype objects.
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
inverse(self, y, *args, **kwargs)
Invert y.
is_training(self)
Return True if the node is in the training phase, False otherwise.
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
set_dtype(self, t)
Set internal structures' dtype.
set_input_dim(self, n)
Set input dimensions.
set_output_dim(self, n)
Set output dimensions.
Static Methods
is_invertible()
Return True if the node can be inverted, False otherwise.
is_trainable()
Return True if the node can be trained, False otherwise.
Properties

Inherited from object: __class__

    Inherited from Node
_train_seq
List of tuples:
input_dim
Input dimensions
output_dim
Output dimensions
supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)

Theil-Sen Estimator: robust multivariate regression model.


Overrides: object.__init__

_execute(self, x)

Overrides: Node._execute


_get_supported_dtypes(self)

Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
Overrides: Node._get_supported_dtypes

_stop_training(self, **kwargs)

Concatenate the collected data in a single array.
Overrides: Node._stop_training

execute(self, x)


Predict using the linear model.

Parameters

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Returns

C : array, shape = (n_samples,)
    Returns predicted values.
Overrides: Node.execute

is_invertible()
Static Method

Return True if the node can be inverted, False otherwise.
Overrides: Node.is_invertible
(inherited documentation)

is_trainable()
Static Method

Return True if the node can be trained, False otherwise.
Overrides: Node.is_trainable

stop_training(self, **kwargs)


Fit linear model.

Parameters

X : numpy array of shape [n_samples, n_features]
    Training data
y : numpy array of shape [n_samples]
    Target values

Returns

self : returns an instance of self.

Overrides: Node.stop_training