sklearn.linear_model.ridge_regression(X, y, alpha, *, sample_weight=None, solver='auto', max_iter=None, tol=0.0001, verbose=0, positive=False, random_state=None, return_n_iter=False, return_intercept=False, check_input=True)[source]#

Solve the ridge equation by the method of normal equations.

Read more in the User Guide.

X{array-like, sparse matrix, LinearOperator} of shape (n_samples, n_features)

Training data.

yarray-like of shape (n_samples,) or (n_samples, n_targets)

Target values.

alphafloat or array-like of shape (n_targets,)

Constant that multiplies the L2 term, controlling regularization strength. alpha must be a non-negative float i.e. in [0, inf).

When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Ridge object is not advised. Instead, you should use the LinearRegression object.

If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.

sample_weightfloat or array-like of shape (n_samples,), default=None

Individual weights for each sample. If given a float, every sample will have the same weight. If sample_weight is not None and solver=’auto’, the solver will be set to ‘cholesky’.

Added in version 0.17.

solver{‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’, ‘lbfgs’}, default=’auto’

Solver to use in the computational routines:

  • ‘auto’ chooses the solver automatically based on the type of data.

  • ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than ‘cholesky’ at the cost of being slower.

  • ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution via a Cholesky decomposition of dot(X.T, X)

  • ‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).

  • ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

  • ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

  • ‘lbfgs’ uses L-BFGS-B algorithm implemented in scipy.optimize.minimize. It can be used only when positive is True.

All solvers except ‘svd’ support both dense and sparse data. However, only ‘lsqr’, ‘sag’, ‘sparse_cg’, and ‘lbfgs’ support sparse input when fit_intercept is True.

Added in version 0.17: Stochastic Average Gradient descent solver.

Added in version 0.19: SAGA solver.

max_iterint, default=None

Maximum number of iterations for conjugate gradient solver. For the ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For ‘sag’ and saga solver, the default value is 1000. For ‘lbfgs’ solver, the default value is 15000.

tolfloat, default=1e-4

Precision of the solution. Note that tol has no effect for solvers ‘svd’ and ‘cholesky’.

Changed in version 1.2: Default value changed from 1e-3 to 1e-4 for consistency with other linear models.

verboseint, default=0

Verbosity level. Setting verbose > 0 will display additional information depending on the solver used.

positivebool, default=False

When set to True, forces the coefficients to be positive. Only ‘lbfgs’ solver is supported in this case.

random_stateint, RandomState instance, default=None

Used when solver == ‘sag’ or ‘saga’ to shuffle the data. See Glossary for details.

return_n_iterbool, default=False

If True, the method also returns n_iter, the actual number of iteration performed by the solver.

Added in version 0.17.

return_interceptbool, default=False

If True and if X is sparse, the method also returns the intercept, and the solver is automatically changed to ‘sag’. This is only a temporary fix for fitting the intercept with sparse data. For dense data, use sklearn.linear_model._preprocess_data before your regression.

Added in version 0.17.

check_inputbool, default=True

If False, the input arrays X and y will not be checked.

Added in version 0.21.

coefndarray of shape (n_features,) or (n_targets, n_features)

Weight vector(s).

n_iterint, optional

The actual number of iteration performed by the solver. Only returned if return_n_iter is True.

interceptfloat or ndarray of shape (n_targets,)

The intercept of the model. Only returned if return_intercept is True and if X is a scipy sparse array.


This function won’t compute the intercept.

Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.


>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import ridge_regression
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(100, 4)
>>> y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
>>> coef, intercept = ridge_regression(X, y, alpha=1.0, return_intercept=True)
>>> list(coef)
[1.9..., -1.0..., -0.0..., -0.0...]
>>> intercept