sklearn.linear_model.PoissonRegressor
- class sklearn.linear_model.PoissonRegressor(*, alpha=1.0, fit_intercept=True, solver='lbfgs', max_iter=100, tol=0.0001, warm_start=False, verbose=0)
Generalized Linear Model with a Poisson distribution.
This regressor uses the ‘log’ link function.
Read more in the User Guide.
New in version 0.23.
- Parameters:
- alpha : float, default=1
Constant that multiplies the L2 penalty term and determines the regularization strength. alpha = 0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities). Values of alpha must be in the range [0.0, inf).
- fit_intercept : bool, default=True
Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).
- solver : {'lbfgs', 'newton-cholesky'}, default='lbfgs'
Algorithm to use in the optimization problem:
- ‘lbfgs’
Calls scipy’s L-BFGS-B optimizer.
- ‘newton-cholesky’
Uses Newton-Raphson steps (in arbitrary precision arithmetic equivalent to iterated reweighted least squares) with an inner Cholesky based solver. This solver is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.
New in version 1.2.
- max_iter : int, default=100
The maximal number of iterations for the solver. Values must be in the range [1, inf).
- tol : float, default=1e-4
Stopping criterion. For the lbfgs solver, the iteration will stop when max{|g_j|, j = 1, ..., d} <= tol, where g_j is the j-th component of the gradient (derivative) of the objective function. Values must be in the range (0.0, inf).
- warm_start : bool, default=False
If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.
- verbose : int, default=0
For the lbfgs solver, set verbose to any positive number to enable verbose output. Values must be in the range [0, inf).
- Attributes:
- coef_ : array of shape (n_features,)
Estimated coefficients for the linear predictor (X @ coef_ + intercept_) in the GLM.
- intercept_ : float
Intercept (a.k.a. bias) added to linear predictor.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
- n_iter_ : int
Actual number of iterations used in the solver.
See also
TweedieRegressor
Generalized Linear Model with a Tweedie distribution.
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.PoissonRegressor()
>>> X = [[1, 2], [2, 3], [3, 4], [4, 3]]
>>> y = [12, 17, 22, 21]
>>> clf.fit(X, y)
PoissonRegressor()
>>> clf.score(X, y)
0.990...
>>> clf.coef_
array([0.121..., 0.158...])
>>> clf.intercept_
2.088...
>>> clf.predict([[1, 1], [3, 4]])
array([10.676..., 21.875...])
Methods
fit(X, y[, sample_weight]): Fit a Generalized Linear Model.
get_params([deep]): Get parameters for this estimator.
predict(X): Predict using GLM with feature matrix X.
score(X, y[, sample_weight]): Compute D^2, the percentage of deviance explained.
set_params(**params): Set the parameters of this estimator.
- property family
Ensure backward compatibility for the time of deprecation.
Deprecated since version 1.1: Will be removed in 1.3
- fit(X, y, sample_weight=None)
Fit a Generalized Linear Model.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data.
- y : array-like of shape (n_samples,)
Target values.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- self : object
Fitted model.
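One way to build intuition for sample_weight in fit is to compare integer weights with physically repeating the rows. The sketch below assumes the fitted objective is a weighted average of per-sample deviances plus the penalty, in which case both fits should reach essentially the same solution; the data and names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic count data and integer weights in {1, 2, 3} (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = rng.poisson(np.exp(0.4 * X[:, 0] + 1.0))
w = rng.integers(1, 4, size=50)

# Fit once with weights ...
weighted = PoissonRegressor(tol=1e-6, max_iter=1000).fit(X, y, sample_weight=w)

# ... and once with each row repeated w[i] times and no weights.
X_rep = np.repeat(X, w, axis=0)
y_rep = np.repeat(y, w)
repeated = PoissonRegressor(tol=1e-6, max_iter=1000).fit(X_rep, y_rep)

# The difference should be at the level of the solver tolerance.
print(np.abs(weighted.coef_ - repeated.coef_).max())
```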
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
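A short sketch of get_params: the returned dictionary reflects the constructor arguments, so it can be used to create another estimator with the same configuration. The specific values are arbitrary.

```python
from sklearn.linear_model import PoissonRegressor

model = PoissonRegressor(alpha=0.5, max_iter=300)
params = model.get_params()
print(params["alpha"], params["max_iter"])

# Build a second, unfitted estimator with the same configuration.
twin = PoissonRegressor(**params)
```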
- predict(X)
Predict using GLM with feature matrix X.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Samples.
- Returns:
- y_pred : array of shape (n_samples,)
Returns predicted values.
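Because this regressor uses the log link, predictions are the exponential of the linear predictor and are therefore always positive. The sketch below checks this by hand, reusing the small dataset from the Examples section; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

X = np.array([[1, 2], [2, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([12, 17, 22, 21])
model = PoissonRegressor().fit(X, y)

X_new = np.array([[1.0, 1.0], [3.0, 4.0]])

# Inverse of the log link applied to the linear predictor.
manual = np.exp(X_new @ model.coef_ + model.intercept_)
pred = model.predict(X_new)
print(pred, manual)
```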
- score(X, y, sample_weight=None)
Compute D^2, the percentage of deviance explained.
D^2 is a generalization of the coefficient of determination R^2. R^2 uses squared error and D^2 uses the deviance of this GLM, see the User Guide.
D^2 is defined as \(D^2 = 1-\frac{D(y_{true},y_{pred})}{D_{null}}\), where \(D_{null}\) is the null deviance, i.e. the deviance of a model with intercept alone, which corresponds to \(y_{pred} = \bar{y}\). The mean \(\bar{y}\) is weighted by sample_weight. The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse).
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,)
True values of target.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
D^2 of self.predict(X) w.r.t. y.
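The definition of D^2 above can be reproduced by hand. This sketch uses sklearn.metrics.mean_poisson_deviance for both the model deviance and the null deviance (a constant prediction equal to the mean of y), without sample weights; the synthetic data is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance

# Synthetic count data (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.poisson(np.exp(0.5 * X[:, 0] + 1.0))

model = PoissonRegressor().fit(X, y)
y_pred = model.predict(X)

# Deviance of the fitted model and of the intercept-only (null) model.
dev_model = mean_poisson_deviance(y, y_pred)
dev_null = mean_poisson_deviance(y, np.full(y.shape, y.mean()))

d2_manual = 1.0 - dev_model / dev_null
d2_score = model.score(X, y)
print(d2_manual, d2_score)
```

The ratio of mean deviances equals the ratio of total deviances here because both are averaged over the same samples.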
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
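A brief sketch of the <component>__<parameter> form inside a Pipeline. The step names "scale" and "glm" are hypothetical labels chosen for this example.

```python
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# "scale" and "glm" are arbitrary step names chosen for this sketch.
pipe = Pipeline([("scale", StandardScaler()), ("glm", PoissonRegressor())])

# Update parameters of the nested PoissonRegressor through the pipeline.
pipe.set_params(glm__alpha=0.1, glm__max_iter=300)
print(pipe.named_steps["glm"].alpha)

# On a plain estimator, set_params returns the estimator itself.
model = PoissonRegressor()
same = model.set_params(alpha=2.0)
```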
Examples using sklearn.linear_model.PoissonRegressor
Release Highlights for scikit-learn 0.23
Poisson regression and non-normal loss
Tweedie regression on insurance claims