sklearn.preprocessing.robust_scale
sklearn.preprocessing.robust_scale(X, *, axis=0, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True, unit_variance=False)
Standardize a dataset along any axis.
Center to the median and scale component-wise according to the interquartile range.
Read more in the User Guide.
- Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The data to center and scale.
- axis : int, default=0
Axis along which the medians and IQR are computed. If 0, independently scale each feature, otherwise (if 1) scale each sample.
- with_centering : bool, default=True
If True, center the data before scaling.
- with_scaling : bool, default=True
If True, scale the data by the quantile range (the interquartile range by default).
- quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0, default=(25.0, 75.0)
Quantile range used to calculate scale_. The default (25.0, 75.0) corresponds to the 1st and 3rd quartiles, i.e. the IQR.
New in version 0.18.
- copy : bool, default=True
Set to False to perform in-place scaling and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix and if axis is 1).
- unit_variance : bool, default=False
If True, scale data so that normally distributed features have a variance of 1. In general, if the difference between the x-values of q_max and q_min for a standard normal distribution is greater than 1, the dataset will be scaled down. If less than 1, the dataset will be scaled up.
New in version 0.24.
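The effect of unit_variance can be illustrated with a small sketch. It assumes that the scale is divided by the distance between the q_max and q_min quantiles of a standard normal distribution (computed here with scipy.stats.norm.ppf); that reading of the parameter is an assumption, and the variable names (adjust, X_iqr, X_unit) are illustrative only. It requires a version that supports unit_variance (0.24 or later).

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import robust_scale

# Synthetic, normally distributed feature (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(10_000, 1))

q_min, q_max = 25.0, 75.0

# Distance between the two quantiles for a standard normal distribution.
# Assumption: this is the adjustment factor applied when unit_variance=True.
adjust = norm.ppf(q_max / 100.0) - norm.ppf(q_min / 100.0)

X_iqr = robust_scale(X, quantile_range=(q_min, q_max))
X_unit = robust_scale(X, quantile_range=(q_min, q_max), unit_variance=True)

# Under the assumption above, the two outputs differ by the constant factor.
print(np.allclose(X_unit, X_iqr * adjust))

# For normally distributed data the result should have a standard
# deviation close to 1.
print(X_unit.std())
```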
- Returns
- X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)
The transformed data.
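As a minimal sketch of what the default call computes, the example below compares robust_scale against a manual median and IQR computation with NumPy. The sample array is made up for illustration, and the manual formula is a simplified reading of the description above, not the library's actual code path.

```python
import numpy as np
from sklearn.preprocessing import robust_scale

# Toy data: the first feature contains an outlier (100.0).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0],
              [100.0, 50.0]])

# Default: axis=0, so each feature (column) is scaled independently.
X_tr = robust_scale(X)

# Manual equivalent: subtract the median, divide by the interquartile range.
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25.0, 75.0], axis=0)
X_manual = (X - median) / (q75 - q25)

print(np.allclose(X_tr, X_manual))

# axis=1 scales each sample (row) instead; this matches scaling the
# transposed data along axis 0 and transposing back.
X_rows = robust_scale(X, axis=1)
print(np.allclose(X_rows, robust_scale(X.T).T))
```

Because the median and IQR are used instead of the mean and standard deviation, the outlier in the first column has little influence on how the remaining values are scaled.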
See also
RobustScaler
Performs centering and scaling using the Transformer API (e.g. as part of a preprocessing Pipeline).
Notes
This implementation will refuse to center scipy.sparse matrices since it would make them non-sparse and would potentially crash the program with memory exhaustion problems.
Instead, the caller is expected either to set with_centering=False explicitly (in that case, only scaling will be performed on the features of the CSR matrix) or to call X.toarray() if they expect the materialized dense array to fit in memory.
To avoid a memory copy, the caller should pass a CSR matrix.
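The two options for sparse input described above can be sketched as follows. The small matrix is illustrative only, and the example assumes the refusal to center surfaces as a ValueError.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import robust_scale

X_dense = np.array([[0.0, 1.0],
                    [2.0, 0.0],
                    [4.0, 3.0],
                    [6.0, 5.0]])
X_sparse = sparse.csr_matrix(X_dense)

# Centering a sparse matrix is refused (assumed here to raise ValueError).
try:
    robust_scale(X_sparse)
    centering_refused = False
except ValueError:
    centering_refused = True
print(centering_refused)

# Option 1: keep the data sparse and only scale it.
X_scaled = robust_scale(X_sparse, with_centering=False)
print(sparse.issparse(X_scaled))

# Option 2: densify first, if the dense array fits in memory,
# then center and scale as usual.
X_centered = robust_scale(X_sparse.toarray())
print(X_centered.shape)
```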
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
Warning
Risk of data leak
Do not use robust_scale unless you know what you are doing. A common mistake is to apply it to the entire data before splitting into training and test sets. This will bias the model evaluation because information would have leaked from the test set to the training set. In general, we recommend using RobustScaler within a Pipeline in order to prevent most risks of data leaking: pipe = make_pipeline(RobustScaler(), LogisticRegression()).
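A fuller sketch of the recommended pattern is shown below. The synthetic dataset from make_classification and the split parameters are assumptions chosen for illustration; the point is only that the scaler is fitted on the training split alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split first, so the test set never influences the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(RobustScaler(), LogisticRegression())
pipe.fit(X_train, y_train)  # medians and IQRs come from X_train only

# At prediction time the test data is transformed with the training statistics.
score = pipe.score(X_test, y_test)
print(score)

# The fitted scaler's center matches the training medians, not those of X.
scaler = pipe.named_steps["robustscaler"]
print(np.allclose(scaler.center_, np.median(X_train, axis=0)))
```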