| Home | Trees | Indices | Help |
|
|---|
|
|
Encode categorical integer features using a one-hot aka one-of-K scheme.
This node has been automatically generated by wrapping the ``sklearn.preprocessing.data.OneHotEncoder`` class
from the ``sklearn`` library. The wrapped instance can be accessed
through the ``scikits_alg`` attribute.
The input to this transformer should be a matrix of integers, denoting
the values taken on by categorical (discrete) features. The output will be
a sparse matrix where each column corresponds to one possible value of one
feature. It is assumed that input features take on values in the range
[0, n_values).
This encoding is needed for feeding categorical data to many scikit-learn
estimators, notably linear models and SVMs with the standard kernels.
Read more in the :ref:`User Guide <preprocessing_categorical_features>`.
**Parameters**
n_values : 'auto', int or array of ints
Number of values per feature.
- 'auto' : determine value range from training data.
- int : maximum value for all features.
- array : maximum value per feature.
categorical_features: "all" or array of indices or mask
Specify what features are treated as categorical.
- 'all' (default): All features are treated as categorical.
- array of indices: Array of categorical feature indices.
- mask: Array of length n_features and with dtype=bool.
Non-categorical features are always stacked to the right of the matrix.
dtype : number type, default=np.float
Desired dtype of output.
sparse : boolean, default=True
Will return sparse matrix if set True else will return an array.
handle_unknown : str, 'error' or 'ignore'
Whether to raise an error or ignore if a unknown categorical feature is
present during transform.
**Attributes**
``active_features_`` : array
Indices for active features, meaning values that actually occur
in the training set. Only available when n_values is ``'auto'``.
``feature_indices_`` : array of shape (n_features,)
Indices to feature ranges.
Feature ``i`` in the original data is mapped to features
from ``feature_indices_[i]`` to ``feature_indices_[i+1]``
(and then potentially masked by `active_features_` afterwards)
``n_values_`` : array of shape (n_features,)
Maximum number of values per feature.
**Examples**
Given a dataset with three features and two samples, we let the encoder
find the maximum value per feature and transform the data to a binary
one-hot encoding.
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) # doctest: +ELLIPSIS
OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])
See also
sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of
dictionary items (also handles string-valued features).
sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot
encoding of dictionary items or strings.
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
Inherited from Inherited from |
|||
| Inherited from Cumulator | |||
|---|---|---|---|
|
|||
|
|||
| Inherited from Node | |||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
Inherited from |
|||
| Inherited from Node | |||
|---|---|---|---|
|
_train_seq List of tuples: |
|||
|
dtype dtype |
|||
|
input_dim Input dimensions |
|||
|
output_dim Output dimensions |
|||
|
supported_dtypes Supported dtypes |
|||
|
|||
Encode categorical integer features using a one-hot aka one-of-K scheme.
This node has been automatically generated by wrapping the ``sklearn.preprocessing.data.OneHotEncoder`` class
from the ``sklearn`` library. The wrapped instance can be accessed
through the ``scikits_alg`` attribute.
The input to this transformer should be a matrix of integers, denoting
the values taken on by categorical (discrete) features. The output will be
a sparse matrix where each column corresponds to one possible value of one
feature. It is assumed that input features take on values in the range
[0, n_values).
This encoding is needed for feeding categorical data to many scikit-learn
estimators, notably linear models and SVMs with the standard kernels.
Read more in the :ref:`User Guide <preprocessing_categorical_features>`.
**Parameters**
n_values : 'auto', int or array of ints
Number of values per feature.
- 'auto' : determine value range from training data.
- int : maximum value for all features.
- array : maximum value per feature.
categorical_features: "all" or array of indices or mask
Specify what features are treated as categorical.
- 'all' (default): All features are treated as categorical.
- array of indices: Array of categorical feature indices.
- mask: Array of length n_features and with dtype=bool.
Non-categorical features are always stacked to the right of the matrix.
dtype : number type, default=np.float
Desired dtype of output.
sparse : boolean, default=True
Will return sparse matrix if set True else will return an array.
handle_unknown : str, 'error' or 'ignore'
Whether to raise an error or ignore if a unknown categorical feature is
present during transform.
**Attributes**
``active_features_`` : array
Indices for active features, meaning values that actually occur
in the training set. Only available when n_values is ``'auto'``.
``feature_indices_`` : array of shape (n_features,)
Indices to feature ranges.
Feature ``i`` in the original data is mapped to features
from ``feature_indices_[i]`` to ``feature_indices_[i+1]``
(and then potentially masked by `active_features_` afterwards)
``n_values_`` : array of shape (n_features,)
Maximum number of values per feature.
**Examples**
Given a dataset with three features and two samples, we let the encoder
find the maximum value per feature and transform the data to a binary
one-hot encoding.
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) # doctest: +ELLIPSIS
OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])
See also
sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of
dictionary items (also handles string-valued features).
sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot
encoding of dictionary items or strings.
|
|
|
|
Transform X using one-hot encoding. This node has been automatically generated by wrapping the sklearn.preprocessing.data.OneHotEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Parameters
Returns
|
|
|
Fit OneHotEncoder to X. This node has been automatically generated by wrapping the sklearn.preprocessing.data.OneHotEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Parameters
Returns self
|
| Home | Trees | Indices | Help |
|
|---|
| Generated by Epydoc 3.0.1 on Tue Mar 8 12:39:48 2016 | http://epydoc.sourceforge.net |