sklearn.linear_model.logistic_regression_path

sklearn.linear_model.logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True, max_iter=100, tol=0.0001, verbose=0, solver='lbfgs', coef=None, copy=False, class_weight=None, dual=False, penalty='l2', intercept_scaling=1.0, multi_class='ovr', random_state=None, check_input=True, max_squared_sum=None, sample_weight=None)

Compute a Logistic Regression model for a list of regularization parameters.

This implementation uses the solution for the previous regularization parameter as the starting point for the next one (warm-starting), which makes it faster than calling LogisticRegression separately for each parameter. Note that there is no speedup with the liblinear solver, since it does not support warm-starting.
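The warm-starting idea can be illustrated with a small pure-Python sketch. This is not the library's implementation: it assumes a single feature, no intercept, labels in {-1, +1}, plain gradient descent, and the objective C * sum(log(1 + exp(-y * x * w))) + 0.5 * w**2. The function name fit_path_1d and the toy data are hypothetical.

```python
import math


def fit_path_1d(xs, ys, Cs, tol=1e-6, max_iter=10000, warm_start=True):
    """Fit one weight w per value of C by gradient descent (illustrative only).

    Minimises C * sum(log(1 + exp(-y * x * w))) + 0.5 * w**2.
    Labels in ys must be +1 or -1.
    """
    sum_sq = sum(x * x for x in xs)
    coefs, n_iters = [], []
    w = 0.0
    for C in Cs:
        if not warm_start:
            w = 0.0  # cold start: discard the previous solution
        # Step size 1 / L, where L bounds the curvature of the objective.
        step = 1.0 / (C * sum_sq / 4.0 + 1.0)
        n_iter = 0
        for n_iter in range(1, max_iter + 1):
            grad = w + C * sum(
                -y * x / (1.0 + math.exp(y * x * w)) for x, y in zip(xs, ys)
            )
            if abs(grad) <= tol:
                break
            w -= step * grad
        coefs.append(w)
        n_iters.append(n_iter)  # number of gradient evaluations for this C
    return coefs, n_iters


# Toy, non-separable data (made up for illustration).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 0.3, -0.3]
ys = [-1, -1, -1, 1, 1, 1, -1, 1]
Cs = [0.01, 0.1, 1.0, 10.0]

warm_coefs, warm_iters = fit_path_1d(xs, ys, Cs, warm_start=True)
cold_coefs, cold_iters = fit_path_1d(xs, ys, Cs, warm_start=False)
print("warm-start iterations:", warm_iters)
print("cold-start iterations:", cold_iters)
```

Both runs reach the same solutions up to the tolerance. The warm-started run usually needs fewer iterations for the later values of C, because each fit begins close to its optimum.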

Read more in the User Guide.

Parameters:

X : array-like or sparse matrix, shape (n_samples, n_features)

Input data.

y : array-like, shape (n_samples,)

Target values.

Cs : int | array-like, shape (n_cs,)

List of values for the regularization parameter, or an integer specifying how many regularization parameters to use. If an integer is given, the values are chosen on a logarithmic scale between 1e-4 and 1e4.
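As a rough sketch of what an integer Cs expands to, the following pure-Python helper builds a grid evenly spaced on a log10 scale between 1e-4 and 1e4. The helper name log_spaced_cs is hypothetical, and the sketch assumes the grid is equivalent to numpy.logspace(-4, 4, Cs).

```python
def log_spaced_cs(n_cs, low_exp=-4.0, high_exp=4.0):
    """Return n_cs values evenly spaced on a log10 scale (illustrative helper)."""
    if n_cs == 1:
        return [10.0 ** low_exp]
    step = (high_exp - low_exp) / (n_cs - 1)
    return [10.0 ** (low_exp + i * step) for i in range(n_cs)]


cs_grid = log_spaced_cs(10)  # corresponds to the default Cs=10
print(cs_grid)
```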

pos_class : int, None

The class with respect to which we perform a one-vs-all fit. If None, then it is assumed that the given problem is binary.
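A one-vs-all fit reduces a multiclass target to a binary one. The sketch below shows that reduction under the assumption that the positive class is encoded as +1 and every other class as -1; the helper name one_vs_rest_targets is hypothetical, and the exact encoding used internally may differ by solver.

```python
def one_vs_rest_targets(y, pos_class):
    """Map labels to +1 for pos_class and -1 for everything else (assumed encoding)."""
    return [1 if label == pos_class else -1 for label in y]


y_multiclass = [0, 2, 1, 2, 0]
y_binary = one_vs_rest_targets(y_multiclass, pos_class=2)
print(y_binary)
```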

fit_intercept : bool

Whether to fit an intercept for the model. If True, the returned array has shape (n_cs, n_features + 1).

max_iter : int

Maximum number of iterations for the solver.

tol : float

Stopping criterion. For the newton-cg and lbfgs solvers, the iteration stops when max{|g_i| : i = 1, ..., n} <= tol, where g_i is the i-th component of the gradient.
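The criterion above is a bound on the largest absolute gradient component. A minimal sketch of the check, with a hypothetical helper name and made-up gradient values:

```python
def gradient_converged(gradient, tol=1e-4):
    """Return True when the largest absolute gradient component is at most tol."""
    return max(abs(g) for g in gradient) <= tol


print(gradient_converged([1e-5, -5e-5]))  # every component is within tol
print(gradient_converged([1e-5, -2e-4]))  # one component exceeds tol
```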

verbose : int

For the liblinear and lbfgs solvers, set verbose to any positive number to enable verbose output.

solver : {‘lbfgs’, ‘newton-cg’, ‘liblinear’, ‘sag’}

Numerical solver to use.

coef : array-like, shape (n_features,), default None

Initial value for the coefficients of the logistic regression. Has no effect with the liblinear solver.

copy : bool, default False

Whether or not to produce a copy of the data. A copy is not required anymore. This parameter is deprecated and will be removed in 0.19.

class_weight : dict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Note that these weights are multiplied by sample_weight if sample_weight is specified.
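The "balanced" formula and its combination with sample_weight can be sketched in pure Python. The helper name balanced_class_weights and the data are hypothetical. Unlike np.bincount, this sketch counts only the labels that actually occur in y, so it also works for non-integer labels.

```python
from collections import Counter


def balanced_class_weights(y):
    """Compute n_samples / (n_classes * count) for each class present in y."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


y = [0, 0, 0, 1]
weights = balanced_class_weights(y)

# Per-sample weights are the product of the class weight and sample_weight.
sample_weight = [1.0, 1.0, 2.0, 0.5]
effective = [weights[label] * sw for label, sw in zip(y, sample_weight)]
print(weights)
print(effective)
```

The rarer class 1 receives the larger weight, so both classes contribute equally in total before sample_weight is applied.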

dual : bool

Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

penalty : str, ‘l1’ or ‘l2’

Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties.

intercept_scaling : float, default 1.

Useful only when the ‘liblinear’ solver is used and fit_intercept is set to True. In this case, x becomes [x, intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note: the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
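The relationship between the synthetic feature and the intercept can be checked with a short sketch. All names and numbers here are made up for illustration; the last entry of w_aug plays the role of the synthetic feature weight.

```python
def append_synthetic_feature(x, intercept_scaling=1.0):
    """Return [x, intercept_scaling]: x with one constant feature appended."""
    return list(x) + [intercept_scaling]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


x = [1.0, 2.0]
intercept_scaling = 10.0
w_aug = [0.5, -0.25, 0.1]  # hypothetical weights; the last is the synthetic one

# Decision value computed on the augmented vector.
augmented_value = dot(w_aug, append_synthetic_feature(x, intercept_scaling))

# Same value written as weights . x + intercept.
intercept = intercept_scaling * w_aug[-1]
explicit_value = dot(w_aug[:-1], x) + intercept
print(augmented_value, explicit_value, intercept)
```

Because the penalty acts on the synthetic weight (0.1 here) rather than on the intercept itself, a larger intercept_scaling lets the same intercept be reached with a smaller, less penalized weight.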

multi_class : str, {‘ovr’, ‘multinomial’}

Multiclass option, either ‘ovr’ or ‘multinomial’. If ‘ovr’ is chosen, a binary problem is fit for each label. Otherwise, the loss minimised is the multinomial loss fit across the entire probability distribution. The ‘multinomial’ option works only with the ‘lbfgs’ and ‘newton-cg’ solvers.

random_state : int seed, RandomState instance, or None (default)

The seed of the pseudo random number generator to use when shuffling the data. Used only in solvers ‘sag’ and ‘liblinear’.

check_input : bool, default True

If False, the input arrays X and y will not be checked.

max_squared_sum : float, default None

Maximum squared sum of X over samples. Used only by the ‘sag’ solver. If None, it is computed by going through all the samples. Precompute this value to speed up cross-validation.

sample_weight : array-like, shape (n_samples,), optional

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:

coefs : ndarray, shape (n_cs, n_features) or (n_cs, n_features + 1)

List of coefficients for the Logistic Regression model. If fit_intercept is set to True then the second dimension will be n_features + 1, where the last item represents the intercept.
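When fit_intercept is True, each row of coefs holds the feature weights followed by the intercept. The sketch below splits such an array using plain nested lists with made-up values; with an ndarray the same split would be coefs[:, :-1] and coefs[:, -1].

```python
# Hypothetical result for n_cs = 2 and n_features = 2 with fit_intercept=True.
coefs = [
    [0.1, -0.2, 0.05],
    [0.8, -1.1, 0.30],
]

feature_weights = [row[:-1] for row in coefs]  # shape (n_cs, n_features)
intercepts = [row[-1] for row in coefs]        # shape (n_cs,)
print(feature_weights)
print(intercepts)
```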

Cs : ndarray

Grid of Cs used for cross-validation.

n_iter : array, shape (n_cs,)

Actual number of iterations for each value in Cs.

Notes

You might get slightly different results with the liblinear solver than with the others, since it uses LIBLINEAR, which penalizes the intercept.