Package mdp :: Package nodes :: Class LogisticRegressionCVScikitsLearnNode

Class LogisticRegressionCVScikitsLearnNode



Logistic Regression CV (aka logit, MaxEnt) classifier.

This node has been automatically generated by wrapping the ``sklearn.linear_model.logistic.LogisticRegressionCV`` class
from the ``sklearn`` library.  The wrapped instance can be accessed
through the ``scikits_alg`` attribute.

This class implements logistic regression using the liblinear, newton-cg, sag
or lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
regularization with primal formulation. The liblinear solver supports both
L1 and L2 regularization, with a dual formulation only for the L2 penalty.

For the grid of Cs values (that are set by default to be ten values in
a logarithmic scale between 1e-4 and 1e4), the best hyperparameter is
selected by the cross-validator StratifiedKFold, but it can be changed
using the cv parameter. In the case of newton-cg and lbfgs solvers,
we warm start along the path, i.e., guess the initial coefficients of the
present fit to be the coefficients obtained after convergence in the previous
fit, so it is supposed to be faster for high-dimensional dense data.

For a multiclass problem, the hyperparameters for each class are computed
using the best scores obtained by doing a one-vs-rest in parallel across all
folds and classes. Hence this is not the true multinomial loss.

Read more in the :ref:`User Guide <logistic_regression>`.

**Parameters**

Cs : list of floats | int
    Each of the values in Cs describes the inverse of regularization
    strength. If Cs is an int, then a grid of Cs values is chosen
    in a logarithmic scale between 1e-4 and 1e4.
    Like in support vector machines, smaller values specify stronger
    regularization.
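
As a rough illustration of the integer case, the sketch below builds a
logarithmically spaced grid between 1e-4 and 1e4 in plain Python. The helper
name ``log_grid`` is hypothetical and not part of the wrapped class, which
builds its grid internally.

```python
def log_grid(n_values, low_exp=-4.0, high_exp=4.0):
    """Return n_values points evenly spaced on a log10 scale."""
    if n_values == 1:
        return [10.0 ** low_exp]
    step = (high_exp - low_exp) / (n_values - 1)
    return [10.0 ** (low_exp + i * step) for i in range(n_values)]


# Ten candidate values of C, from strong to weak regularization.
cs_grid = log_grid(10)
smallest, largest = cs_grid[0], cs_grid[-1]
```

Smaller entries of the grid correspond to stronger regularization, as stated
above.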

fit_intercept : bool, default: True
    Specifies if a constant (a.k.a. bias or intercept) should be
    added to the decision function.

class_weight : dict or 'balanced', optional
    Weights associated with classes in the form ``{class_label: weight}``.
    If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust
    weights inversely proportional to class frequencies in the input data
    as ``n_samples / (n_classes * np.bincount(y))``

    Note that these weights will be multiplied with sample_weight (passed
    through the fit method) if sample_weight is specified.

    .. versionadded:: 0.17
       class_weight == 'balanced'
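
The "balanced" formula can be checked by hand on a toy label list. This is a
minimal pure-Python sketch of ``n_samples / (n_classes * np.bincount(y))``;
the function name ``balanced_class_weights`` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy labels: class 0 appears three times, class 1 once.
weights = balanced_class_weights([0, 0, 0, 1])
```

The rare class receives the larger weight, so both classes contribute equally
to the loss in aggregate.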

cv : integer or cross-validation generator
    The default cross-validation generator used is Stratified K-Folds.
    If an integer is provided, then it is the number of folds used.
    See the :mod:`sklearn.cross_validation` module for the
    list of possible cross-validation objects.

penalty : str, 'l1' or 'l2'
    Used to specify the norm used in the penalization. The newton-cg,
    sag and lbfgs solvers support only l2 penalties.

dual : bool
    Dual or primal formulation. Dual formulation is only implemented for
    l2 penalty with liblinear solver. Prefer dual=False when
    n_samples > n_features.

scoring : callable
    Scoring function to use as cross-validation criterion. For a list of
    scoring functions that can be used, look at :mod:`sklearn.metrics`.
    The default scoring option used is accuracy_score.

solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag'}
    Algorithm to use in the optimization problem.

    - For small datasets, 'liblinear' is a good choice, whereas 'sag' is
        faster for large ones.
    - For multiclass problems, only 'newton-cg' and 'lbfgs' handle
        multinomial loss; 'sag' and 'liblinear' are limited to
        one-versus-rest schemes.
    - 'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty.
    - 'liblinear' might be slower in LogisticRegressionCV because it does
        not handle warm-starting.

tol : float, optional
    Tolerance for stopping criteria.

max_iter : int, optional
    Maximum number of iterations of the optimization algorithm.

n_jobs : int, optional
    Number of CPU cores used during the cross-validation loop. If given
    a value of -1, all cores are used.

verbose : int
    For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any
    positive number for verbosity.

refit : bool
    If set to True, the scores are averaged across all folds, and the
    coefs and the C that correspond to the best score are taken, and a
    final refit is done using these parameters.
    Otherwise the coefs, intercepts and C that correspond to the
    best scores across folds are averaged.
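
The difference between the two modes can be sketched for the choice of C
alone. The code below is an illustration under simplifying assumptions: the
function ``select_c`` and the toy score grid are hypothetical, and the final
refit and the averaging of coefficients and intercepts are not shown.

```python
def select_c(scores, cs, refit=True):
    """scores[fold][i] is the validation score of cs[i] on that fold."""
    n_folds = len(scores)
    if refit:
        # Average over folds, then take the single best C.
        mean_scores = [sum(fold[i] for fold in scores) / n_folds
                       for i in range(len(cs))]
        best = max(range(len(cs)), key=mean_scores.__getitem__)
        return cs[best]
    # Otherwise pick the best C per fold and average those values.
    best_per_fold = [cs[max(range(len(cs)), key=fold.__getitem__)]
                     for fold in scores]
    return sum(best_per_fold) / n_folds


cs = [0.1, 1.0, 10.0]
scores = [
    [0.70, 0.80, 0.75],  # fold 0: best at C = 1.0
    [0.60, 0.85, 0.88],  # fold 1: best at C = 10.0
]
c_refit = select_c(scores, cs, refit=True)
c_averaged = select_c(scores, cs, refit=False)
```

With these toy numbers the averaged-score winner is C = 1.0, while averaging
the per-fold winners gives a value that is not on the original grid.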

multi_class : str, {'ovr', 'multinomial'}
    Multiclass option can be either 'ovr' or 'multinomial'. If the option
    chosen is 'ovr', then a binary problem is fit for each label. Else
    the loss minimised is the multinomial loss fit across
    the entire probability distribution. Works only for 'lbfgs' and
    'newton-cg' solvers.
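
To make the two options concrete, the sketch below turns the same per-class
linear scores into probabilities under each scheme: independent sigmoids
renormalised to sum to one for 'ovr', and a softmax for 'multinomial'. The
function names and the score values are hypothetical, and the fitting step
itself is not shown.

```python
import math


def ovr_probabilities(scores):
    """Sigmoid per class, then renormalise so the values sum to one."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]


def multinomial_probabilities(scores):
    """Softmax over all class scores at once."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_scores = [2.0, 0.5, -1.0]  # hypothetical per-class linear scores
p_ovr = ovr_probabilities(class_scores)
p_multinomial = multinomial_probabilities(class_scores)
```

Both schemes rank the classes identically here, but the probability values
differ because only the softmax models all classes jointly.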

intercept_scaling : float, default 1.
    Useful only when the solver 'liblinear' is used and
    self.fit_intercept is set to True. In this case, x becomes
    [x, self.intercept_scaling],
    i.e. a "synthetic" feature with constant value equal to
    intercept_scaling is appended to the instance vector.
    The intercept becomes intercept_scaling * synthetic feature weight.
    Note! the synthetic feature weight is subject to l1/l2 regularization
    as all other features.
    To lessen the effect of regularization on synthetic feature weight
    (and therefore on the intercept) intercept_scaling has to be increased.
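
The augmentation described above can be written out directly. This is a small
sketch with made-up numbers; ``decision_value`` is a hypothetical helper, not
a method of the node or of the wrapped class.

```python
def decision_value(x, weights, synthetic_weight, intercept_scaling=1.0):
    """Linear score for x after appending the constant synthetic feature."""
    augmented_x = list(x) + [intercept_scaling]
    augmented_w = list(weights) + [synthetic_weight]
    return sum(xi * wi for xi, wi in zip(augmented_x, augmented_w))


x = [2.0, -1.0]
weights = [0.5, 0.25]
synthetic_weight = 0.1
intercept_scaling = 10.0

score = decision_value(x, weights, synthetic_weight, intercept_scaling)
# The effective intercept is intercept_scaling * synthetic_weight.
intercept = intercept_scaling * synthetic_weight
plain_score = sum(xi * wi for xi, wi in zip(x, weights)) + intercept
```

A larger ``intercept_scaling`` lets a small, and therefore lightly penalised,
synthetic weight produce the same intercept.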

random_state : int seed, RandomState instance, or None (default)
    The seed of the pseudo random number generator to use when
    shuffling the data.

**Attributes**

``coef_`` : array, shape (1, n_features) or (n_classes, n_features)
    Coefficient of the features in the decision function.

    `coef_` is of shape (1, n_features) when the given problem
    is binary.
    `coef_` is a read-only property derived from `raw_coef_` that
    follows the internal memory layout of liblinear.

``intercept_`` : array, shape (1,) or (n_classes,)
    Intercept (a.k.a. bias) added to the decision function.
    It is available only when the fit_intercept parameter is set to True
    and is of shape (1,) when the problem is binary.

``Cs_`` : array
    Array of C i.e. inverse of regularization parameter values used
    for cross-validation.

``coefs_paths_`` : array, shape ``(n_folds, len(Cs_), n_features)`` or ``(n_folds, len(Cs_), n_features + 1)``
    dict with classes as the keys, and the path of coefficients obtained
    during cross-validating across each fold and then across each Cs
    after doing an OvR for the corresponding class as values.
    If the 'multi_class' option is set to 'multinomial', then
    the coefs_paths are the coefficients corresponding to each class.
    Each dict value has shape ``(n_folds, len(Cs_), n_features)`` or
    ``(n_folds, len(Cs_), n_features + 1)`` depending on whether the
    intercept is fit or not.

``scores_`` : dict
    dict with classes as the keys, and the values as the
    grid of scores obtained during cross-validating each fold, after doing
    an OvR for the corresponding class. If the 'multi_class' option
    given is 'multinomial' then the same scores are repeated across
    all classes, since this is the multinomial class.
    Each dict value has shape (n_folds, len(Cs)).

``C_`` : array, shape (n_classes,) or (n_classes - 1,)
    Array of C that maps to the best scores across every class. If refit is
    set to False, then for each class, the best C is the average of the
    C's that correspond to the best scores for each fold.

``n_iter_`` : array, shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs)
    Actual number of iterations for all classes, folds and Cs.
    In the binary or multinomial cases, the first dimension is equal to 1.

See also

LogisticRegression

Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
Logistic Regression CV (aka logit, MaxEnt) classifier.
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
 
_label(self, x)
 
_stop_training(self, **kwargs)
Transform the data and labels lists to array objects and reshape them.
 
label(self, x)
Predict class labels for samples in X.
 
stop_training(self, **kwargs)
Fit the model according to the given training data.

Inherited from PreserveDimNode (private): _set_input_dim, _set_output_dim

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from ClassifierCumulator
 
_check_train_args(self, x, labels)
 
_train(self, x, labels)
Cumulate all input data in a one dimensional list.
 
train(self, x, labels)
Cumulate all input data in a one dimensional list.
    Inherited from ClassifierNode
 
_execute(self, x)
 
_prob(self, x, *args, **kargs)
 
execute(self, x)
Process the data contained in x.
 
prob(self, x, *args, **kwargs)
Predict probability for each possible outcome.
 
rank(self, x, threshold=None)
Returns ordered list with all labels ordered according to prob(x) (e.g., [[3 1 2], [2 1 3], ...]).
    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_output(self, y)
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_inverse(self, x)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
copy(self, protocol=None)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of dtype objects.
 
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
 
inverse(self, y, *args, **kwargs)
Invert y.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Static Methods
 
is_invertible()
Return True if the node can be inverted, False otherwise.
 
is_trainable()
Return True if the node can be trained, False otherwise.
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples:
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
(Constructor)

 

Logistic Regression CV (aka logit, MaxEnt) classifier.

Overrides: object.__init__

_get_supported_dtypes(self)

 
Return the list of dtypes supported by this node. The types can be specified in any format allowed by numpy.dtype.
Overrides: Node._get_supported_dtypes

_label(self, x)

 
Overrides: ClassifierNode._label

_stop_training(self, **kwargs)

 
Transform the data and labels lists to array objects and reshape them.

Overrides: Node._stop_training

is_invertible()
Static Method

 
Return True if the node can be inverted, False otherwise.
Overrides: Node.is_invertible
(inherited documentation)

is_trainable()
Static Method

 
Return True if the node can be trained, False otherwise.
Overrides: Node.is_trainable

label(self, x)

 

Predict class labels for samples in X.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]
Samples.

Returns

C : array, shape = [n_samples]
Predicted class label per sample.
Overrides: ClassifierNode.label

stop_training(self, **kwargs)

 

Fit the model according to the given training data.

Parameters

X : {array-like, sparse matrix}, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape (n_samples,)
Target vector relative to X.
sample_weight : array-like, shape (n_samples,), optional
Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns

self : object
Returns self.
Overrides: Node.stop_training