`sklearn.covariance`.EllipticEnvelope

class sklearn.covariance.EllipticEnvelope(store_precision=True, assume_centered=False, support_fraction=None, contamination=0.1, random_state=None)

An object for detecting outliers in a Gaussian-distributed dataset.

See also

EmpiricalCovariance, MinCovDet

Notes

Outlier detection from covariance estimation may break or perform poorly in high-dimensional settings. In particular, always take care to work with n_samples > n_features ** 2.

References


Rousseeuw, P.J., Van Driessen, K. “A fast algorithm for the minimum covariance determinant estimator” Technometrics 41(3), 212 (1999)

Examples

>>> import numpy as np
>>> from sklearn.covariance import EllipticEnvelope
>>> true_cov = np.array([[.8, .3],
...                      [.3, .4]])
>>> X = np.random.RandomState(0).multivariate_normal(mean=[0, 0],
...                                                  cov=true_cov,
...                                                  size=500)
>>> cov = EllipticEnvelope(random_state=0).fit(X)
>>> # predict returns 1 for an inlier and -1 for an outlier
>>> cov.predict([[0, 0],
...              [3, 3]])
array([ 1, -1])
>>> cov.covariance_ 
array([[0.7411..., 0.2535...],
       [0.2535..., 0.3053...]])
>>> cov.location_
array([0.0813... , 0.0427...])

Methods

`correct_covariance`(self, data)	Apply a correction to raw Minimum Covariance Determinant estimates.
`decision_function`(self, X[, raw_values])	Compute the decision function of the given observations.
`error_norm`(self, comp_cov[, norm, scaling, …])	Computes the Mean Squared Error between two covariance estimators.
`fit`(self, X[, y])	Fit the EllipticEnvelope model.
`fit_predict`(self, X[, y])	Performs fit on X and returns labels for X.
`get_params`(self[, deep])	Get parameters for this estimator.
`get_precision`(self)	Getter for the precision matrix.
`mahalanobis`(self, X)	Computes the squared Mahalanobis distances of given observations.
`predict`(self, X)	Predict the labels (1 inlier, -1 outlier) of X according to the fitted model.
`reweight_covariance`(self, data)	Re-weight raw Minimum Covariance Determinant estimates.
`score`(self, X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`score_samples`(self, X)	Compute the negative squared Mahalanobis distances.
`set_params`(self, **params)	Set the parameters of this estimator.

__init__(self, store_precision=True, assume_centered=False, support_fraction=None, contamination=0.1, random_state=None)

correct_covariance(self, data)

Apply a correction to raw Minimum Covariance Determinant estimates.

Correction using the empirical correction factor suggested by Rousseeuw and Van Driessen in [RVD].

Parameters:	data : array-like, shape (n_samples, n_features) The data matrix, with p features and n samples. The data set must be the one which was used to compute the raw estimates.
Returns:	covariance_corrected : array-like, shape (n_features, n_features) Corrected robust covariance estimate.

References

[RVD]

Rousseeuw, P.J., Van Driessen, K. “A fast algorithm for the minimum covariance determinant estimator”. Technometrics 41(3), 212 (1999). American Statistical Association and the American Society for Quality.
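The correction step can be illustrated outside scikit-learn with NumPy and SciPy. This is a minimal sketch, assuming the factor is the ratio of the median squared Mahalanobis distance to the median of a chi-squared distribution with n_features degrees of freedom (our understanding of the form scikit-learn's MinCovDet uses). The "raw" estimate below is a hypothetical stand-in (mean and covariance of the half of the points closest to the coordinate-wise median), not the FAST-MCD result.

```python
import numpy as np
from scipy.stats import chi2

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
n_samples, n_features = X.shape

# Hypothetical raw estimate (stand-in for FAST-MCD): mean and covariance of
# the half of the points closest to the coordinate-wise median.
h = n_samples // 2
closest = np.argsort(np.sum((X - np.median(X, axis=0)) ** 2, axis=1))[:h]
raw_location = X[closest].mean(axis=0)
raw_covariance = np.cov(X[closest], rowvar=False)

# Squared Mahalanobis distances of all observations under the raw estimate.
diff = X - raw_location
raw_dist = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(raw_covariance), diff)

# Consistency factor: median squared distance over the chi-squared median.
# A covariance fitted on a central subset is too small, so the factor is > 1.
correction = np.median(raw_dist) / chi2(n_features).ppf(0.5)
covariance_corrected = raw_covariance * correction
print(correction)
```

After rescaling, the median squared distance under covariance_corrected matches the chi-squared median, which is what makes the estimate consistent at the normal model.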

decision_function(self, X, raw_values=None)

Compute the decision function of the given observations.

Parameters:	X : array-like, shape (n_samples, n_features)
	raw_values : bool, optional. Whether or not to consider raw Mahalanobis distances as the decision function. Must be False (default) for compatibility with the other outlier detection tools. Deprecated since version 0.20: `raw_values` has been deprecated in 0.20 and will be removed in 0.22.
Returns:	decision : array-like, shape (n_samples, ). Decision function of the samples. It is equal to the negated squared Mahalanobis distances, shifted so that the threshold for being an outlier is 0, which ensures compatibility with other outlier detection algorithms.
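As a quick check of the shift described above, the sketch below compares decision_function with score_samples minus the fitted threshold. It assumes scikit-learn 0.20 or later, where the fitted `offset_` attribute exists.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
env = EllipticEnvelope(random_state=0).fit(X)

# decision_function is score_samples (negated squared Mahalanobis distances)
# shifted by offset_, so that 0 separates inliers from outliers.
decision = env.decision_function(X)
shifted = env.score_samples(X) - env.offset_
print(np.allclose(decision, shifted))
```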

error_norm(self, comp_cov, norm='frobenius', scaling=True, squared=True)

Computes the Mean Squared Error between two covariance estimators (in the sense of the Frobenius norm).

Parameters:

comp_cov : array-like, shape = [n_features, n_features]: The covariance to compare with.
norm : str: The type of norm used to compute the error. Available error types: 'frobenius' (default): sqrt(tr(A^t.A)); 'spectral': sqrt(max(eigenvalues(A^t.A))), where A is the error (comp_cov - self.covariance_).
scaling : bool: If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.
squared : bool: Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:

The Mean Squared Error (in the sense of the Frobenius norm) between
self and comp_cov covariance estimators.
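The sketch below recomputes error_norm by hand, assuming it follows the definitions listed above (including the division by n_features when scaling=True). Using the covariance that generated the data as comp_cov is an illustrative choice.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
env = EllipticEnvelope(random_state=0).fit(X)

# A is the error matrix (comp_cov - self.covariance_)
A = true_cov - env.covariance_
n_features = A.shape[0]

# 'frobenius' (default): squared Frobenius norm, divided by n_features
frobenius = np.sum(A ** 2) / n_features
# 'spectral': largest eigenvalue of A^T A, i.e. the squared largest singular value of A
spectral = np.linalg.norm(A, ord=2) ** 2 / n_features

print(frobenius, env.error_norm(true_cov))
print(spectral, env.error_norm(true_cov, norm="spectral"))
```

Passing squared=False should return the square root of the same quantity.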

fit(self, X, y=None)

Fit the EllipticEnvelope model.

Parameters:	X : numpy array or sparse matrix, shape (n_samples, n_features). Training data.
	y : Ignored. Not used, present for API consistency by convention.
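A short sketch of fitting the model and inspecting the result. With contamination=0.1, the fitted threshold (the `offset_` attribute, available from scikit-learn 0.20) is chosen so that roughly 10% of the training observations fall outside the envelope; the tolerance used in the check is an assumption, since round-off can move the count by a point or so.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)

env = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)
print(env.location_)    # robust location estimate
print(env.covariance_)  # robust covariance estimate

# Fraction of training observations labelled as outliers (-1)
outlier_fraction = np.mean(env.predict(X) == -1)
print(outlier_fraction)
```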

fit_predict(self, X, y=None)

Performs fit on X and returns labels for X.

Returns -1 for outliers and 1 for inliers.

Parameters:	X : ndarray, shape (n_samples, n_features). Input data.
	y : Ignored. Not used, present for API consistency by convention.
Returns:	y : ndarray, shape (n_samples,) 1 for inliers, -1 for outliers.
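A minimal check that fit_predict is equivalent to fitting and then predicting on the same data, assuming a fixed random_state so both fits are identical.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)

# One-step and two-step labelling with identical settings
labels = EllipticEnvelope(random_state=0).fit_predict(X)
labels_two_step = EllipticEnvelope(random_state=0).fit(X).predict(X)
print(np.unique(labels))
```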

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:	deep : boolean, optional. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params : mapping of string to any. Parameter names mapped to their values.

get_precision(self)

Getter for the precision matrix.

Returns:	precision_ : array-like The precision matrix associated to the current covariance object.
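With the default store_precision=True, the precision matrix should be the inverse of covariance_ (scikit-learn computes it with a pseudo-inverse, to our knowledge). The sketch checks this numerically on a well-conditioned 2x2 example.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
env = EllipticEnvelope(random_state=0).fit(X)

precision = env.get_precision()
# precision_ @ covariance_ should be (numerically) the identity matrix
print(np.round(precision @ env.covariance_, 6))
```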

mahalanobis(self, X)

Computes the squared Mahalanobis distances of given observations.

Parameters:	X : array-like, shape = [n_samples, n_features]. The observations whose Mahalanobis distances we compute. Observations are assumed to be drawn from the same distribution as the data used in fit.
Returns:	dist : array, shape = [n_samples,] Squared Mahalanobis distances of the observations.
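The squared distance of an observation x is (x - location_)^T precision_ (x - location_). The sketch below computes it directly with NumPy and compares it with mahalanobis; the two query points are the ones from the class example.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
env = EllipticEnvelope(random_state=0).fit(X)

X_new = np.array([[0.0, 0.0], [3.0, 3.0]])
diff = X_new - env.location_
# Row-wise quadratic form diff_i^T P diff_i with P = precision_
manual = np.einsum("ij,jk,ik->i", diff, env.get_precision(), diff)
print(manual, env.mahalanobis(X_new))
```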

predict(self, X)

Predict the labels (1 inlier, -1 outlier) of X according to the fitted model.

Parameters:	X : array-like, shape (n_samples, n_features)
Returns:	is_inlier : array, shape (n_samples,) Returns -1 for anomalies/outliers and +1 for inliers.
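A sketch of the labelling rule, assuming predict marks an observation as an inlier when its decision_function value is non-negative and as an outlier otherwise.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
env = EllipticEnvelope(random_state=0).fit(X)

X_new = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = env.predict(X_new)
# Threshold the decision function at 0: >= 0 -> inlier (1), < 0 -> outlier (-1)
manual = np.where(env.decision_function(X_new) >= 0, 1, -1)
print(labels)
```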

reweight_covariance(self, data)

Re-weight raw Minimum Covariance Determinant estimates.

Re-weight observations using Rousseeuw’s method (equivalent to deleting outlying observations from the data set before computing location and covariance estimates) described in [RVDriessen].

Parameters:	data : array-like, shape (n_samples, n_features) The data matrix, with p features and n samples. The data set must be the one which was used to compute the raw estimates.
Returns:	location_reweighted : array-like, shape (n_features, ). Re-weighted robust location estimate.
	covariance_reweighted : array-like, shape (n_features, n_features). Re-weighted robust covariance estimate.
	support_reweighted : array-like, type boolean, shape (n_samples,). A mask of the observations that have been used to compute the re-weighted robust location and covariance estimates.

References

[RVDriessen]

Rousseeuw, P.J., Van Driessen, K. “A fast algorithm for the minimum covariance determinant estimator”. Technometrics 41(3), 212 (1999). American Statistical Association and the American Society for Quality.
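The re-weighting step can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not scikit-learn's code: the starting estimate is a hypothetical stand-in for the corrected raw MCD fit, and the cutoff is the 97.5% quantile of a chi-squared distribution with n_features degrees of freedom (the cutoff we understand scikit-learn to use). Observations beyond the cutoff are dropped and location and covariance are recomputed on the rest.

```python
import numpy as np
from scipy.stats import chi2

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
X[:25] += 6.0  # plant 25 gross outliers (illustrative)
n_samples, n_features = X.shape

# Hypothetical starting estimate: half of the points closest to the
# coordinate-wise median, with distances rescaled by the consistency factor.
h = n_samples // 2
closest = np.argsort(np.sum((X - np.median(X, axis=0)) ** 2, axis=1))[:h]
location = X[closest].mean(axis=0)
covariance = np.cov(X[closest], rowvar=False)
diff = X - location
dist = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covariance), diff)
dist /= np.median(dist) / chi2(n_features).ppf(0.5)

# Re-weighting: keep observations inside the 97.5% chi-squared quantile,
# then recompute location and (maximum-likelihood) covariance on them.
support_reweighted = dist < chi2(n_features).ppf(0.975)
location_reweighted = X[support_reweighted].mean(axis=0)
covariance_reweighted = np.cov(X[support_reweighted], rowvar=False, bias=True)
print(support_reweighted.sum(), location_reweighted)
```

Dropping the flagged observations before re-estimating is what makes this equivalent to deleting outlying observations, as the description above puts it.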

score(self, X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:	X : array-like, shape (n_samples, n_features). Test samples.
	y : array-like, shape (n_samples,) or (n_samples, n_outputs). True labels for X.
	sample_weight : array-like, shape (n_samples,), optional. Sample weights.
Returns:	score : float. Mean accuracy of self.predict(X) with respect to y.
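A sketch of score with labels in the same 1/-1 convention as predict. The planted outliers and the y_true labels are hypothetical, made up for illustration; the check assumes score is the plain accuracy of predict against y.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
X[:25] += 6.0  # plant 25 outliers (illustrative)

# Hypothetical ground truth: 1 for inliers, -1 for the planted outliers
y_true = np.ones(len(X), dtype=int)
y_true[:25] = -1

env = EllipticEnvelope(random_state=0).fit(X)
acc = env.score(X, y_true)
manual = np.mean(env.predict(X) == y_true)
print(acc)
```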

score_samples(self, X)

Compute the negative squared Mahalanobis distances.

Parameters:	X : array-like, shape (n_samples, n_features)
Returns:	negative_mahal_distances : array-like, shape (n_samples, ). Opposite of the squared Mahalanobis distances, as returned by mahalanobis.
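A minimal check of the sign convention: higher scores mean more typical observations, and the scores should equal the negated output of mahalanobis.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

true_cov = np.array([[0.8, 0.3],
                     [0.3, 0.4]])
X = np.random.RandomState(0).multivariate_normal(mean=[0, 0], cov=true_cov, size=500)
env = EllipticEnvelope(random_state=0).fit(X)

X_new = np.array([[0.0, 0.0], [3.0, 3.0]])
scores = env.score_samples(X_new)
print(scores)
```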

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self
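A short sketch of both forms, using get_params to read the values back. The pipeline step names "scale" and "envelope" are arbitrary choices for illustration.

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: set_params returns the estimator itself
env = EllipticEnvelope().set_params(contamination=0.05)
print(env.get_params()["contamination"])

# Nested object: parameters are addressed as <component>__<parameter>
pipe = Pipeline([("scale", StandardScaler()), ("envelope", EllipticEnvelope())])
pipe.set_params(envelope__contamination=0.2)
print(pipe.get_params()["envelope__contamination"])
```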

Examples using `sklearn.covariance.EllipticEnvelope`

Comparing anomaly detection algorithms for outlier detection on toy datasets

Outlier detection on a real data set
