sklearn.linear_model.TweedieRegressor
- class sklearn.linear_model.TweedieRegressor(*, power=0.0, alpha=1.0, fit_intercept=True, link='auto', solver='lbfgs', max_iter=100, tol=0.0001, warm_start=False, verbose=0)[source]
Generalized Linear Model with a Tweedie distribution.
This estimator can be used to model different GLMs depending on the power parameter, which determines the underlying distribution.
Read more in the User Guide.
New in version 0.23.
- Parameters:
- power : float, default=0
The power determines the underlying target distribution according to the following table (a usage sketch follows this parameter list):
  Power   Distribution
  -----   ----------------------
  0       Normal
  1       Poisson
  (1,2)   Compound Poisson Gamma
  2       Gamma
  3       Inverse Gaussian
For 0 < power < 1, no distribution exists.
- alpha : float, default=1
Constant that multiplies the L2 penalty term and determines the regularization strength. alpha = 0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities). Values of alpha must be in the range [0.0, inf).
- fit_intercept : bool, default=True
Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).
- link : {'auto', 'identity', 'log'}, default='auto'
The link function of the GLM, i.e. the mapping from the linear predictor X @ coef + intercept to the prediction y_pred. Option 'auto' sets the link depending on the chosen power parameter as follows:
  - 'identity' for power <= 0, e.g. for the Normal distribution
  - 'log' for power > 0, e.g. for the Poisson, Gamma and Inverse Gaussian distributions
- solver : {'lbfgs', 'newton-cholesky'}, default='lbfgs'
Algorithm to use in the optimization problem:
  - 'lbfgs'
    Calls scipy's L-BFGS-B optimizer.
  - 'newton-cholesky'
    Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to iterated reweighted least squares) with an inner Cholesky based solver. This solver is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.
    New in version 1.2.
- max_iter : int, default=100
The maximal number of iterations for the solver. Values must be in the range [1, inf).
- tol : float, default=1e-4
Stopping criterion. For the lbfgs solver, the iteration will stop when max{|g_j|, j = 1, ..., d} <= tol, where g_j is the j-th component of the gradient (derivative) of the objective function. Values must be in the range (0.0, inf).
- warm_start : bool, default=False
If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.
- verbose : int, default=0
For the lbfgs solver, set verbose to any positive number for verbosity. Values must be in the range [0, inf).
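As a minimal sketch (not part of the original docstring) of how power, link and solver interact, with purely illustrative synthetic data:
>>> import numpy as np
>>> from sklearn.linear_model import TweedieRegressor
>>> rng = np.random.RandomState(0)
>>> X = rng.uniform(size=(100, 3))
>>> y = rng.gamma(shape=2.0, scale=1.0, size=100)  # strictly positive targets
>>> # power=2 selects the Gamma distribution; link='auto' then resolves to 'log'
>>> gamma_glm = TweedieRegressor(power=2, alpha=0.1).fit(X, y)
>>> # 'newton-cholesky' (version 1.2+) suits n_samples >> n_features
>>> cpg = TweedieRegressor(power=1.5, solver='newton-cholesky').fit(X, y)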
- Attributes:
- coef_ : array of shape (n_features,)
Estimated coefficients for the linear predictor (X @ coef_ + intercept_) in the GLM.
- intercept_ : float
Intercept (a.k.a. bias) added to the linear predictor.
- n_iter_ : int
Actual number of iterations used in the solver.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
See also
PoissonRegressor
Generalized Linear Model with a Poisson distribution.
GammaRegressor
Generalized Linear Model with a Gamma distribution.
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.TweedieRegressor()
>>> X = [[1, 2], [2, 3], [3, 4], [4, 3]]
>>> y = [2, 3.5, 5, 5.5]
>>> clf.fit(X, y)
TweedieRegressor()
>>> clf.score(X, y)
0.839...
>>> clf.coef_
array([0.599..., 0.299...])
>>> clf.intercept_
1.600...
>>> clf.predict([[1, 1], [3, 4]])
array([2.500..., 4.599...])
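A further sketch (not in the original docstring) of a compound Poisson-Gamma fit: a power between 1 and 2 permits exact zeros in y, as is common for insurance claim amounts. The data below is synthetic:
>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.uniform(size=(200, 2))
>>> # zero-inflated positive target: many exact zeros, gamma-distributed otherwise
>>> y = np.where(rng.uniform(size=200) < 0.3, 0.0, rng.gamma(2.0, 1.0, size=200))
>>> glm = linear_model.TweedieRegressor(power=1.5, alpha=0.5).fit(X, y)
>>> y_pred = glm.predict(X)  # non-negative predictions under the log link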
Methods
fit(X, y[, sample_weight])
    Fit a Generalized Linear Model.
get_params([deep])
    Get parameters for this estimator.
predict(X)
    Predict using GLM with feature matrix X.
score(X, y[, sample_weight])
    Compute D^2, the percentage of deviance explained.
set_params(**params)
    Set the parameters of this estimator.
- property family
Ensure backward compatibility for the time of deprecation.
Deprecated since version 1.1: Will be removed in 1.3.
- fit(X, y, sample_weight=None)[source]
Fit a Generalized Linear Model.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data.
- y : array-like of shape (n_samples,)
Target values.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- self : object
Fitted model.
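A brief sketch (an illustrative assumption, not from the docstring) of fitting with sample_weight, e.g. to weight rows by exposure:
>>> import numpy as np
>>> from sklearn.linear_model import TweedieRegressor
>>> X = np.array([[1.0], [2.0], [3.0], [4.0]])
>>> y = np.array([2.0, 3.0, 5.0, 6.0])
>>> w = np.array([1.0, 1.0, 2.0, 2.0])  # e.g. exposure: later rows count double
>>> reg = TweedieRegressor(power=0).fit(X, y, sample_weight=w)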
- get_params(deep=True)[source]
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
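For illustration (a sketch, not from the docstring), get_params returns the constructor arguments as a dict:
>>> from sklearn.linear_model import TweedieRegressor
>>> reg = TweedieRegressor(power=2, alpha=0.5)
>>> params = reg.get_params()
>>> params['power'], params['alpha']
(2, 0.5)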
- predict(X)[source]
Predict using GLM with feature matrix X.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Samples.
- Returns:
- y_pred : array of shape (n_samples,)
Returns predicted values.
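As a sketch of what predict computes (following the link description above; the check itself is illustrative, not docstring material): with the 'log' link, predictions are the exponentiated linear predictor:
>>> import numpy as np
>>> from sklearn.linear_model import TweedieRegressor
>>> X = np.array([[1.0], [2.0], [3.0], [4.0]])
>>> y = np.array([2.0, 3.0, 5.0, 6.0])
>>> reg = TweedieRegressor(power=2).fit(X, y)  # 'auto' link resolves to 'log'
>>> np.allclose(reg.predict(X), np.exp(X @ reg.coef_ + reg.intercept_))
True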
- score(X, y, sample_weight=None)[source]
Compute D^2, the percentage of deviance explained.
D^2 is a generalization of the coefficient of determination R^2. R^2 uses squared error and D^2 uses the deviance of this GLM, see the User Guide.
D^2 is defined as \(D^2 = 1 - \frac{D(y_{true}, y_{pred})}{D_{null}}\), where \(D_{null}\) is the null deviance, i.e. the deviance of a model with intercept alone, which corresponds to \(y_{pred} = \bar{y}\). The mean \(\bar{y}\) is averaged by sample_weight. The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse).
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,)
True values of target.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
D^2 of self.predict(X) w.r.t. y.
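A sketch (illustrative, not from the docstring) of the same quantity computed by hand with sklearn.metrics.mean_tweedie_deviance, matching the formula above:
>>> import numpy as np
>>> from sklearn.linear_model import TweedieRegressor
>>> from sklearn.metrics import mean_tweedie_deviance
>>> X = np.array([[1.0], [2.0], [3.0], [4.0]])
>>> y = np.array([2.0, 3.0, 5.0, 6.0])
>>> reg = TweedieRegressor(power=2).fit(X, y)
>>> dev = mean_tweedie_deviance(y, reg.predict(X), power=2)
>>> dev_null = mean_tweedie_deviance(y, np.full_like(y, y.mean()), power=2)
>>> d2 = 1 - dev / dev_null  # should match reg.score(X, y)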
- set_params(**params)[source]
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
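A sketch (illustrative, not docstring material) of the nested <component>__<parameter> form with a Pipeline:
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.linear_model import TweedieRegressor
>>> pipe = make_pipeline(StandardScaler(), TweedieRegressor())
>>> # step names are lower-cased class names, so this reaches the GLM's alpha
>>> pipe = pipe.set_params(tweedieregressor__alpha=0.5)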
Examples using sklearn.linear_model.TweedieRegressor
Release Highlights for scikit-learn 0.23
Tweedie regression on insurance claims