sklearn.linear_model.RANSACRegressor
- class sklearn.linear_model.RANSACRegressor(base_estimator=None, min_samples=None, residual_threshold=None, is_data_valid=None, is_model_valid=None, max_trials=100, max_skips=inf, stop_n_inliers=inf, stop_score=inf, stop_probability=0.99, loss='absolute_loss', random_state=None)
RANSAC (RANdom SAmple Consensus) algorithm.
RANSAC is an iterative algorithm for the robust estimation of parameters from a subset of inliers from the complete data set. More information can be found in the general documentation of linear models.
A detailed description of the algorithm can be found in the documentation of the linear_model sub-package.
Read more in the User Guide.
Parameters: - base_estimator : object, optional
Base estimator object which implements the following methods:
fit(X, y): Fit model to given training data and target values.
score(X, y): Returns the score on the given test data (the coefficient of determination R^2 for scikit-learn regressors), which is used for the stop criterion defined by stop_score. Additionally, the score is used to decide which of two equally large consensus sets is chosen as the better one.
predict(X): Returns predicted values using the linear model, which are used to compute the residual error with the loss function.
If base_estimator is None, then base_estimator=sklearn.linear_model.LinearRegression() is used for target values of dtype float.
Note that the current implementation only supports regression estimators.
- min_samples : int (>= 1) or float ([0, 1]), optional
Minimum number of samples chosen randomly from original data. Treated as an absolute number of samples for min_samples >= 1, and as a relative number ceil(min_samples * X.shape[0]) for min_samples < 1. This is typically chosen as the minimal number of samples necessary to estimate the given base_estimator. By default a sklearn.linear_model.LinearRegression() estimator is assumed and min_samples is chosen as X.shape[1] + 1.
- residual_threshold : float, optional
Maximum residual for a data sample to be classified as an inlier. By default the threshold is chosen as the MAD (median absolute deviation) of the target values y.
- is_data_valid : callable, optional
This function is called with the randomly selected data before the model is fitted to it: is_data_valid(X, y). If its return value is False, the current randomly chosen sub-sample is skipped.
- is_model_valid : callable, optional
This function is called with the estimated model and the randomly selected data: is_model_valid(model, X, y). If its return value is False, the current randomly chosen sub-sample is skipped. Rejecting samples with this function is computationally costlier than with is_data_valid. is_model_valid should therefore only be used if the estimated model is needed for making the rejection decision.
- max_trials : int, optional
Maximum number of iterations for random sample selection.
- max_skips : int, optional
Maximum number of iterations that can be skipped due to finding zero inliers, invalid data defined by is_data_valid, or invalid models defined by is_model_valid.
New in version 0.19.
- stop_n_inliers : int, optional
Stop iteration if at least this number of inliers are found.
- stop_score : float, optional
Stop iteration if the score is greater than or equal to this threshold.
- stop_probability : float in range [0, 1], optional
RANSAC iteration stops once enough trials have been run that, with this probability, at least one outlier-free subset of the training data has been sampled. This requires generating at least N samples (iterations):
N >= log(1 - probability) / log(1 - e**m)
where the probability (confidence) is typically set to a high value such as 0.99 (the default), e is the current fraction of inliers w.r.t. the total number of samples, and m is the number of samples drawn per trial (min_samples).
- loss : string, callable, optional, default "absolute_loss"
String inputs "absolute_loss" and "squared_loss" are supported, which compute the absolute loss and the squared loss per sample, respectively.
If loss is a callable, then it should be a function that takes two arrays as inputs, the true and predicted values, and returns a 1-D array with the i-th value of the array corresponding to the loss on X[i].
If the loss on a sample is greater than the residual_threshold, then this sample is classified as an outlier.
- random_state : int, RandomState instance or None, optional, default None
The generator used to draw the random sub-samples. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
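The bound given for stop_probability can be evaluated directly. The following is a minimal sketch of that formula only, not the library's internal code; the helper name required_trials and the handling of the degenerate cases are assumptions made for illustration.

```python
import math


def required_trials(probability, inlier_fraction, min_samples):
    """Smallest integer N with N >= log(1 - probability) / log(1 - e**m).

    Hypothetical helper: `inlier_fraction` is e and `min_samples` is m
    in the documented formula.
    """
    # Chance that a single random subset of size m contains only inliers.
    p_clean = inlier_fraction ** min_samples
    if p_clean >= 1.0:
        return 1  # every subset is outlier-free, one trial is enough
    if p_clean <= 0.0 or probability >= 1.0:
        return math.inf  # the requested confidence can never be reached
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - p_clean))


# Half of the data are inliers, two samples per trial, 99% confidence.
n_half = required_trials(0.99, 0.5, 2)   # 17
# With 90% inliers far fewer trials are needed.
n_most = required_trials(0.99, 0.9, 2)   # 3
```

The required number of trials grows quickly as the inlier fraction drops or as min_samples increases, which is one reason min_samples is usually kept at the smallest value that still determines the model.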
Attributes: - estimator_ : object
Best fitted model (copy of the base_estimator object).
- n_trials_ : int
Number of random selection trials until one of the stop criteria is met. It is always <= max_trials.
- inlier_mask_ : bool array of shape [n_samples]
Boolean mask of inliers classified as True.
- n_skips_no_inliers_ : int
Number of iterations skipped due to finding zero inliers.
New in version 0.19.
- n_skips_invalid_data_ : int
Number of iterations skipped due to invalid data defined by is_data_valid.
New in version 0.19.
- n_skips_invalid_model_ : int
Number of iterations skipped due to an invalid model defined by is_model_valid.
New in version 0.19.
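The fitted attributes can be inspected after calling fit. The sketch below uses a synthetic data set invented for illustration (a noisy line with ten shifted targets); the threshold of 1.0 and all variable names are assumptions, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Synthetic data: y = 3x + 1 with small noise, plus ten gross outliers.
rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = 3.0 * X.ravel() + 1.0 + rng.normal(scale=0.2, size=100)
y[:10] += 50.0  # shift the first ten targets far away from the line

reg = RANSACRegressor(residual_threshold=1.0, random_state=0).fit(X, y)

# Boolean mask with one entry per training sample.
inlier_count = int(reg.inlier_mask_.sum())
# The final model is stored in estimator_.
slope = float(reg.estimator_.coef_[0])
# Number of random sub-samples drawn before a stop criterion was met.
trials_used = reg.n_trials_
```

On data like this, the shifted samples are expected to be excluded from inlier_mask_, so the slope of estimator_ should stay close to 3.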
References
[1] https://en.wikipedia.org/wiki/RANSAC
[2] https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
[3] http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf
Examples
>>> from sklearn.linear_model import RANSACRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(
...     n_samples=200, n_features=2, noise=4.0, random_state=0)
>>> reg = RANSACRegressor(random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9885...
>>> reg.predict(X[:1,])
array([-31.9417...])
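The documented defaults for residual_threshold (the MAD of y) and loss (absolute loss per sample) can also be spelled out explicitly. The sketch below assumes that the MAD is computed as the median of |y - median(y)| and passes a hypothetical callable absolute_loss; with the same random_state, both estimators are expected to select the same inliers.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=4.0, random_state=0)

# Assumed reading of the default threshold: median absolute deviation of y.
mad = np.median(np.abs(y - np.median(y)))


def absolute_loss(y_true, y_pred):
    # Callable loss: one non-negative value per sample (1-D targets).
    return np.abs(y_true - y_pred)


default_reg = RANSACRegressor(random_state=0).fit(X, y)
explicit_reg = RANSACRegressor(
    residual_threshold=mad, loss=absolute_loss, random_state=0
).fit(X, y)

# Same seed and equivalent settings should yield identical inlier masks.
same_inliers = np.array_equal(default_reg.inlier_mask_, explicit_reg.inlier_mask_)
```

Passing a callable is mainly useful when neither built-in loss fits, for example to weight residuals differently per target; the callable only has to return one loss value per sample.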
Methods
fit(self, X, y[, sample_weight]): Fit estimator using RANSAC algorithm.
get_params(self[, deep]): Get parameters for this estimator.
predict(self, X): Predict using the estimated model.
score(self, X, y): Returns the score of the prediction.
set_params(self, **params): Set the parameters of this estimator.
- __init__(self, base_estimator=None, min_samples=None, residual_threshold=None, is_data_valid=None, is_model_valid=None, max_trials=100, max_skips=inf, stop_n_inliers=inf, stop_score=inf, stop_probability=0.99, loss='absolute_loss', random_state=None)
- fit(self, X, y, sample_weight=None)
Fit estimator using RANSAC algorithm.
Parameters: - X : array-like or sparse matrix, shape [n_samples, n_features]
Training data.
- y : array-like, shape = [n_samples] or [n_samples, n_targets]
Target values.
- sample_weight : array-like, shape = [n_samples]
Individual weights for each sample. Raises an error if sample_weight is passed and the fit method of base_estimator does not support it.
Raises: - ValueError
If no valid consensus set could be found. This occurs if is_data_valid or is_model_valid returns False for all max_trials randomly chosen sub-samples.
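The failure mode described under Raises can be reproduced with a validator that rejects every sub-sample. This is a minimal sketch; the function reject_everything and the toy data are hypothetical and only serve to trigger the error.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 1))
y = 2.0 * X.ravel()


def reject_everything(X_subset, y_subset):
    # Hypothetical validator: every randomly chosen sub-sample is skipped.
    return False


reg = RANSACRegressor(is_data_valid=reject_everything, max_trials=5, random_state=0)

try:
    reg.fit(X, y)
    failed = False
except ValueError as exc:
    # No consensus set was found within max_trials.
    failed = True
    message = str(exc)
```

In practice a validator should reject only degenerate sub-samples, so that at least some trials get through.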
- get_params(self, deep=True)
Get parameters for this estimator.
Parameters: - deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : mapping of string to any
Parameter names mapped to their values.
- predict(self, X)
Predict using the estimated model.
This is a wrapper for estimator_.predict(X).
Parameters: - X : numpy array of shape [n_samples, n_features]
Returns: - y : array, shape = [n_samples] or [n_samples, n_targets]
Returns predicted values.
- score(self, X, y)
Returns the score of the prediction.
This is a wrapper for estimator_.score(X, y).
Parameters: - X : numpy array or sparse matrix of shape [n_samples, n_features]
Training data.
- y : array, shape = [n_samples] or [n_samples, n_targets]
Target values.
Returns: - z : float
Score of the prediction.
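Since predict and score are described as wrappers around the fitted estimator_, their results should agree with calling estimator_ directly. The sketch below checks this on the data set from the Examples section; the variable names are chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=4.0, random_state=0)
reg = RANSACRegressor(random_state=0).fit(X, y)

# Predictions from the wrapper and from the underlying fitted model.
same_predictions = np.allclose(reg.predict(X), reg.estimator_.predict(X))
# Scores are computed by the underlying model as well.
same_score = np.isclose(reg.score(X, y), reg.estimator_.score(X, y))
```

Note that the score is computed on all samples passed in, including any that were classified as outliers during fitting.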
- set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns: - self
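Both forms of set_params can be sketched briefly. The pipeline below, with its step names "scale" and "ransac", is an assumption made for illustration; only the <component>__<parameter> naming scheme comes from the description above.

```python
from sklearn.linear_model import RANSACRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by their plain names.
reg = RANSACRegressor()
returned = reg.set_params(max_trials=50, stop_probability=0.95)

# Nested object: the step name prefixes the parameter name.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ransac", RANSACRegressor(max_trials=100)),
])
pipe.set_params(ransac__max_trials=200)

nested_value = pipe.get_params()["ransac__max_trials"]
```

Because set_params returns the estimator itself, calls can be chained, for example reg.set_params(max_trials=50).fit(X, y).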