sklearn.preprocessing.power_transform

sklearn.preprocessing.power_transform(X, method='yeo-johnson', *, standardize=True, copy=True)
Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired.
Currently, power_transform supports the Box-Cox transform and the Yeo-Johnson transform. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.
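As a rough illustration of the estimated parameter, the sketch below fits a PowerTransformer on synthetic log-normal data (an assumption made purely for the example), reads the fitted per-feature parameter from its lambdas_ attribute, and checks that the unstandardized Box-Cox output matches scipy.special.boxcox applied with that parameter.

```python
import numpy as np
from scipy.special import boxcox
from sklearn.preprocessing import PowerTransformer, power_transform

# Synthetic, strictly positive, right-skewed data (illustrative only).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1))

# Fit the estimator to expose the parameter chosen by maximum likelihood.
pt = PowerTransformer(method="box-cox", standardize=False).fit(X)
lam = pt.lambdas_[0]

# The functional API estimates the parameter internally on the same data.
X_trans = power_transform(X, method="box-cox", standardize=False)

# Applying Box-Cox by hand with the fitted parameter gives the same values.
X_manual = boxcox(X[:, 0], lam)
print(lam, np.allclose(X_trans[:, 0], X_manual))
```

power_transform itself does not return the fitted parameter, so the estimator is used here only to inspect it.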
Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both positive and negative data.
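The difference in input requirements can be seen with a small, made-up array that contains a zero and a negative value. This is a minimal sketch: Yeo-Johnson is expected to succeed, while Box-Cox is expected to raise a ValueError.

```python
import numpy as np
from sklearn.preprocessing import power_transform

# Illustrative data containing a negative value and a zero.
X = np.array([[-1.0], [0.0], [2.0], [5.0]])

# Yeo-Johnson accepts non-positive values.
X_yj = power_transform(X, method="yeo-johnson")

# Box-Cox rejects them because the data is not strictly positive.
try:
    power_transform(X, method="box-cox")
    boxcox_failed = False
except ValueError as exc:
    boxcox_failed = True
    print(f"Box-Cox failed: {exc}")
```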
By default, zero-mean, unit-variance normalization is applied to the transformed data.
Read more in the User Guide.
 Parameters
 X : array-like of shape (n_samples, n_features)
The data to be transformed using a power transformation.
 method : {'yeo-johnson', 'box-cox'}, default='yeo-johnson'
The power transform method. Available methods are:
'yeo-johnson' [1], works with positive and negative values
'box-cox' [2], only works with strictly positive values
Changed in version 0.23: The default value of the method parameter changed from 'box-cox' to 'yeo-johnson' in 0.23.
 standardize : bool, default=True
Set to True to apply zero-mean, unit-variance normalization to the transformed output.
 copy : bool, default=True
Set to False to perform in-place computation during transformation.
 Returns
 X_trans : ndarray of shape (n_samples, n_features)
The transformed data.
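The effect of the standardize parameter can be checked directly. The sketch below uses synthetic log-normal data (an assumption for the example) and compares the default output with the unstandardized output rescaled by hand, assuming the population standard deviation (ddof=0) is the one used for scaling.

```python
import numpy as np
from sklearn.preprocessing import power_transform

# Synthetic right-skewed data with two features (illustrative only).
rng = np.random.default_rng(0)
X = rng.lognormal(size=(100, 2))

# Default behaviour: transformed output is centred and scaled per feature.
X_std = power_transform(X, method="yeo-johnson", standardize=True)

# Raw power-transformed values, without centring or scaling.
X_raw = power_transform(X, method="yeo-johnson", standardize=False)

# Standardizing the raw output by hand (ddof=0) reproduces the default.
X_manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
print(np.allclose(X_std, X_manual))
```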
See also
PowerTransformer
Equivalent transformation with the Transformer API (e.g. as part of a preprocessing Pipeline).
quantile_transform
Maps data to a standard normal distribution with the parameter output_distribution='normal'.
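To make the relationship with PowerTransformer concrete, here is a minimal sketch on a small made-up array. The function and the estimator's fit_transform are expected to agree on the same data; the estimator additionally keeps the fitted parameters so they can be applied to new samples (the single new row below is hypothetical).

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, power_transform

# Small, strictly positive example array.
X = np.array([[1.0, 2.0], [3.0, 2.0], [4.0, 5.0]])

# Functional API: fits and transforms in one call, keeps no state.
X_func = power_transform(X, method="box-cox")

# Estimator API: same result, but the fitted parameters are retained.
pt = PowerTransformer(method="box-cox")
X_est = pt.fit_transform(X)
print(np.allclose(X_func, X_est))

# The fitted transformer can be reused on unseen data.
X_new = pt.transform(np.array([[2.0, 3.0]]))
```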
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
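The NaN handling described in the notes above can be sketched as follows, using a small made-up column with one missing value. The missing entry is expected to remain NaN in the output while the other entries are transformed normally.

```python
import numpy as np
from sklearn.preprocessing import power_transform

# One feature with a single missing value (illustrative only).
X = np.array([[1.0], [np.nan], [3.0], [4.0], [10.0]])

X_trans = power_transform(X, method="yeo-johnson")

# NaNs stay in the same positions; the remaining values are finite.
nan_preserved = np.array_equal(np.isnan(X), np.isnan(X_trans))
print(nan_preserved)
```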
References
[1] I.K. Yeo and R.A. Johnson, “A new family of power transformations to improve normality or symmetry.” Biometrika, 87(4), pp. 954-959, (2000).
[2] G.E.P. Box and D.R. Cox, “An Analysis of Transformations”, Journal of the Royal Statistical Society B, 26, 211-252 (1964).
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import power_transform
>>> data = [[1, 2], [3, 2], [4, 5]]
>>> print(power_transform(data, method='box-cox'))
[[-1.332... -0.707...]
 [ 0.256... -0.707...]
 [ 1.076...  1.414...]]
Warning
Risk of data leak. Do not use power_transform unless you know what you are doing. A common mistake is to apply it to the entire data before splitting into training and test sets. This will bias the model evaluation because information would have leaked from the test set to the training set. In general, we recommend using PowerTransformer within a Pipeline in order to prevent most risks of data leaking, e.g.: pipe = make_pipeline(PowerTransformer(), LogisticRegression()).
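A fuller version of the pipeline recommended in the warning might look like the sketch below. The synthetic dataset from make_classification and the default train/test split are assumptions chosen only to make the example runnable. The point is that the power transform parameters are estimated from the training portion alone.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split first, so the test set never influences the fitted transform.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits PowerTransformer on the training data only, then
# applies the already-fitted transform to the test data when scoring.
pipe = make_pipeline(PowerTransformer(), LogisticRegression())
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(accuracy)
```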