Home | Trees | Indices | Help |
|
---|
|
Encode categorical integer features using a one-hot aka one-of-K scheme. This node has been automatically generated by wrapping the ``sklearn.preprocessing.data.OneHotEncoder`` class from the ``sklearn`` library. The wrapped instance can be accessed through the ``scikits_alg`` attribute. The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features. The output will be a sparse matrix where each column corresponds to one possible value of one feature. It is assumed that input features take on values in the range [0, n_values). This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels. Read more in the :ref:`User Guide <preprocessing_categorical_features>`. **Parameters** n_values : 'auto', int or array of ints Number of values per feature. - 'auto' : determine value range from training data. - int : maximum value for all features. - array : maximum value per feature. categorical_features: "all" or array of indices or mask Specify what features are treated as categorical. - 'all' (default): All features are treated as categorical. - array of indices: Array of categorical feature indices. - mask: Array of length n_features and with dtype=bool. Non-categorical features are always stacked to the right of the matrix. dtype : number type, default=np.float Desired dtype of output. sparse : boolean, default=True Will return sparse matrix if set True else will return an array. handle_unknown : str, 'error' or 'ignore' Whether to raise an error or ignore if a unknown categorical feature is present during transform. **Attributes** ``active_features_`` : array Indices for active features, meaning values that actually occur in the training set. Only available when n_values is ``'auto'``. ``feature_indices_`` : array of shape (n_features,) Indices to feature ranges. Feature ``i`` in the original data is mapped to features from ``feature_indices_[i]`` to ``feature_indices_[i+1]`` (and then potentially masked by `active_features_` afterwards) ``n_values_`` : array of shape (n_features,) Maximum number of values per feature. **Examples** Given a dataset with three features and two samples, we let the encoder find the maximum value per feature and transform the data to a binary one-hot encoding. >>> from sklearn.preprocessing import OneHotEncoder >>> enc = OneHotEncoder() >>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) # doctest: +ELLIPSIS OneHotEncoder(categorical_features='all', dtype=<... 'float'>, handle_unknown='error', n_values='auto', sparse=True) >>> enc.n_values_ array([2, 3, 4]) >>> enc.feature_indices_ array([0, 2, 5, 9]) >>> enc.transform([[0, 1, 1]]).toarray() array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]]) See also sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of dictionary items (also handles string-valued features). sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot encoding of dictionary items or strings.
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
Inherited from Inherited from |
|||
Inherited from Cumulator | |||
---|---|---|---|
|
|||
|
|||
Inherited from Node | |||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|
|||
|
|||
|
|
|||
Inherited from |
|||
Inherited from Node | |||
---|---|---|---|
_train_seq List of tuples: |
|||
dtype dtype |
|||
input_dim Input dimensions |
|||
output_dim Output dimensions |
|||
supported_dtypes Supported dtypes |
|
Encode categorical integer features using a one-hot aka one-of-K scheme. This node has been automatically generated by wrapping the ``sklearn.preprocessing.data.OneHotEncoder`` class from the ``sklearn`` library. The wrapped instance can be accessed through the ``scikits_alg`` attribute. The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features. The output will be a sparse matrix where each column corresponds to one possible value of one feature. It is assumed that input features take on values in the range [0, n_values). This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels. Read more in the :ref:`User Guide <preprocessing_categorical_features>`. **Parameters** n_values : 'auto', int or array of ints Number of values per feature. - 'auto' : determine value range from training data. - int : maximum value for all features. - array : maximum value per feature. categorical_features: "all" or array of indices or mask Specify what features are treated as categorical. - 'all' (default): All features are treated as categorical. - array of indices: Array of categorical feature indices. - mask: Array of length n_features and with dtype=bool. Non-categorical features are always stacked to the right of the matrix. dtype : number type, default=np.float Desired dtype of output. sparse : boolean, default=True Will return sparse matrix if set True else will return an array. handle_unknown : str, 'error' or 'ignore' Whether to raise an error or ignore if a unknown categorical feature is present during transform. **Attributes** ``active_features_`` : array Indices for active features, meaning values that actually occur in the training set. Only available when n_values is ``'auto'``. ``feature_indices_`` : array of shape (n_features,) Indices to feature ranges. Feature ``i`` in the original data is mapped to features from ``feature_indices_[i]`` to ``feature_indices_[i+1]`` (and then potentially masked by `active_features_` afterwards) ``n_values_`` : array of shape (n_features,) Maximum number of values per feature. **Examples** Given a dataset with three features and two samples, we let the encoder find the maximum value per feature and transform the data to a binary one-hot encoding. >>> from sklearn.preprocessing import OneHotEncoder >>> enc = OneHotEncoder() >>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) # doctest: +ELLIPSIS OneHotEncoder(categorical_features='all', dtype=<... 'float'>, handle_unknown='error', n_values='auto', sparse=True) >>> enc.n_values_ array([2, 3, 4]) >>> enc.feature_indices_ array([0, 2, 5, 9]) >>> enc.transform([[0, 1, 1]]).toarray() array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]]) See also sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of dictionary items (also handles string-valued features). sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot encoding of dictionary items or strings.
|
|
|
|
Transform X using one-hot encoding. This node has been automatically generated by wrapping the sklearn.preprocessing.data.OneHotEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Parameters
Returns
|
|
|
Fit OneHotEncoder to X. This node has been automatically generated by wrapping the sklearn.preprocessing.data.OneHotEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Parameters
Returns self
|
Home | Trees | Indices | Help |
|
---|
Generated by Epydoc 3.0.1 on Tue Mar 8 12:39:48 2016 | http://epydoc.sourceforge.net |