
Fitting an Elastic Net with a precomputed Gram Matrix and Weighted Samples

The following example shows how to precompute the Gram matrix when fitting an ElasticNet with weighted samples.

If weighted samples are used, the design matrix must be centered (using the weighted mean) and then rescaled by the square root of the weight vector before the Gram matrix is computed.
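A brief derivation of why this is needed, in our own notation (not taken from the original example): write $w$ for the weight vector, $W = \operatorname{diag}(w)$, $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$ and $\bar{y}_w = \sum_i w_i y_i / \sum_i w_i$ for the weighted means, and $X_c = X - \mathbf{1}\bar{x}_w^\top$, $y_c = y - \bar{y}_w \mathbf{1}$. Substituting the optimal intercept $b = \bar{y}_w - \bar{x}_w^\top \beta$ (obtained by setting the derivative of the loss with respect to $b$ to zero) into the weighted squared loss gives

$$
\sum_{i=1}^{n} w_i \left(y_i - b - x_i^\top \beta\right)^2
= \left\lVert W^{1/2} \left(y_c - X_c \beta\right) \right\rVert_2^2 ,
$$

whose quadratic term in $\beta$ is $\beta^\top G \beta$ with

$$
G = X_c^\top W X_c = \left(W^{1/2} X_c\right)^\top \left(W^{1/2} X_c\right).
$$

Multiplying row $i$ of $X_c$ by $\sqrt{w_i}$ therefore yields exactly the Gram matrix that the weighted problem requires.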

Note

The sample_weight vector is also rescaled internally by fit to sum to n_samples; see the documentation for the sample_weight parameter of fit.
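The effect of that rescaling can be checked with a small sketch on synthetic data (the sizes and the helper `normalize_weights` are hypothetical, chosen only for illustration): normalizing an already-normalized weight vector changes nothing, the weighted mean used for centering does not depend on the overall scale of the weights, but a Gram matrix built from raw weights differs by the factor `sum(raw) / n_samples`.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 1_000, 5
X = rng.normal(size=(n_samples, n_features))
raw_weights = rng.lognormal(size=n_samples)


def normalize_weights(w):
    """Hypothetical helper: rescale weights so that they sum to len(w)."""
    return w * (len(w) / w.sum())


w = normalize_weights(raw_weights)

# Rescaling an already-normalized vector is a no-op.
assert np.allclose(normalize_weights(w), w)

# The weighted mean (the centering offset) ignores the scale of the weights...
offset = np.average(X, axis=0, weights=w)
X_c = X - offset

# ...but the Gram matrix X_c^T W X_c does not: raw weights inflate it
# by sum(raw_weights) / n_samples.
gram_normalized = X_c.T @ (w[:, np.newaxis] * X_c)
gram_raw = X_c.T @ (raw_weights[:, np.newaxis] * X_c)
scale = raw_weights.sum() / n_samples
print(scale, np.allclose(gram_raw, scale * gram_normalized))
```

Normalizing the weights up front therefore keeps the precomputed Gram matrix consistent with the rescaled weights that fit works with.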

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

Let’s start by loading the dataset and creating some sample weights.

import numpy as np

from sklearn.datasets import make_regression

rng = np.random.RandomState(0)

n_samples = int(1e5)
X, y = make_regression(n_samples=n_samples, noise=0.5, random_state=rng)

sample_weight = rng.lognormal(size=n_samples)
# normalize the sample weights
normalized_weights = sample_weight * (n_samples / sample_weight.sum())

To fit the elastic net using the precompute option together with the sample weights, we must first center the design matrix using the weighted mean, and rescale each row by the square root of its normalized weight before computing the Gram matrix.

# center X using the weighted mean of each feature
X_offset = np.average(X, axis=0, weights=normalized_weights)
X_centered = X - X_offset
# rescale each row by the square root of its weight, then form the Gram matrix
X_scaled = X_centered * np.sqrt(normalized_weights)[:, np.newaxis]
gram = np.dot(X_scaled.T, X_scaled)
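As a sanity check, the matrix built from the square-root-scaled rows should equal $X_c^\top W X_c$ written with the weights directly, and the centered matrix should have (numerically) zero weighted column means. The sketch below repeats the preprocessing on a smaller problem (2,000 samples and 20 features, our choice for speed, not the example's sizes).

```python
import numpy as np
from sklearn.datasets import make_regression

rng = np.random.RandomState(0)
n_samples = 2_000
X, y = make_regression(
    n_samples=n_samples, n_features=20, noise=0.5, random_state=rng
)
sample_weight = rng.lognormal(size=n_samples)
normalized_weights = sample_weight * (n_samples / sample_weight.sum())

X_offset = np.average(X, axis=0, weights=normalized_weights)
X_centered = X - X_offset
X_scaled = X_centered * np.sqrt(normalized_weights)[:, np.newaxis]
gram = X_scaled.T @ X_scaled

# Same matrix written as X_c^T W X_c, without taking square roots.
gram_weighted = X_centered.T @ (normalized_weights[:, np.newaxis] * X_centered)
print(np.allclose(gram, gram_weighted))

# After centering, the weighted column means are zero up to rounding error.
print(np.allclose(np.average(X_centered, axis=0, weights=normalized_weights), 0.0))
```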

We can now proceed with fitting. We must pass the centered design matrix to fit; otherwise, the elastic net estimator will detect that it is uncentered and discard the Gram matrix we passed. However, we must not pass the scaled design matrix, because the preprocessing code would then rescale it by the square root of the sample weights a second time.
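The detection described above can be observed directly. This sketch, on the same smaller synthetic problem as before (sizes are our choice), fits once with the uncentered X and once with the centered X, and keeps only warnings whose text mentions "Gram matrix". That filter string is an assumption about the wording of the estimator's warning, and `gram_warnings` is a hypothetical helper.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n_samples = 2_000
X, y = make_regression(
    n_samples=n_samples, n_features=20, noise=0.5, random_state=rng
)
sample_weight = rng.lognormal(size=n_samples)
normalized_weights = sample_weight * (n_samples / sample_weight.sum())

X_centered = X - np.average(X, axis=0, weights=normalized_weights)
X_scaled = X_centered * np.sqrt(normalized_weights)[:, np.newaxis]
gram = X_scaled.T @ X_scaled


def gram_warnings(X_fit):
    """Hypothetical helper: fit with our Gram matrix, collect Gram-related warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ElasticNet(alpha=0.01, precompute=gram).fit(
            X_fit, y, sample_weight=normalized_weights
        )
    return [str(w.message) for w in caught if "Gram matrix" in str(w.message)]


# Uncentered X: the weighted feature means are nonzero, so the estimator
# warns and computes its own Gram matrix instead of using ours.
uncentered_msgs = gram_warnings(X)
# Centered X: the weighted means are ~0, so our Gram matrix is used.
centered_msgs = gram_warnings(X_centered)
print(uncentered_msgs)
print(centered_msgs)
```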

from sklearn.linear_model import ElasticNet

lm = ElasticNet(alpha=0.01, precompute=gram)
lm.fit(X_centered, y, sample_weight=normalized_weights)

ElasticNet(alpha=0.01,
           precompute=array([[ 9.98809919e+04, -4.48938813e+02, -1.03237920e+03, ...,
        -2.25349312e+02, -3.53959628e+02, -1.67451144e+02],
       [-4.48938813e+02,  1.00768662e+05,  1.19112072e+02, ...,
        -1.07963978e+03,  7.47987268e+01, -5.76195467e+02],
       [-1.03237920e+03,  1.19112072e+02,  1.00393284e+05, ...,
        -3.07582983e+02,  6.66670169e+02,  2.65799352e+02],
       ...,
       [-2.25349312e+02, -1.07963978e+03, -3.07582983e+02, ...,
         9.99891212e+04, -4.58195950e+02, -1.58667835e+02],
       [-3.53959628e+02,  7.47987268e+01,  6.66670169e+02, ...,
        -4.58195950e+02,  9.98350372e+04,  5.60836363e+02],
       [-1.67451144e+02, -5.76195467e+02,  2.65799352e+02, ...,
        -1.58667835e+02,  5.60836363e+02,  1.00911944e+05]],
      shape=(100, 100)))
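One way to convince ourselves that the precomputed route solves the same weighted problem is to compare it with an ordinary weighted fit on the uncentered data. In this sketch (smaller synthetic problem; the tight `tol` and larger `max_iter` are our choices so both solvers converge to essentially the same optimum), the coefficients should agree. The intercepts differ because the two models see differently centered inputs, so we compare predictions instead.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n_samples = 2_000
X, y = make_regression(
    n_samples=n_samples, n_features=20, noise=0.5, random_state=rng
)
sample_weight = rng.lognormal(size=n_samples)
normalized_weights = sample_weight * (n_samples / sample_weight.sum())

X_centered = X - np.average(X, axis=0, weights=normalized_weights)
X_scaled = X_centered * np.sqrt(normalized_weights)[:, np.newaxis]
gram = X_scaled.T @ X_scaled

# Precomputed Gram matrix, fitted on the centered (but not scaled) X.
lm_gram = ElasticNet(alpha=0.01, precompute=gram, tol=1e-10, max_iter=10_000)
lm_gram.fit(X_centered, y, sample_weight=normalized_weights)

# Reference: no precomputation, original X; the estimator centers internally.
lm_plain = ElasticNet(alpha=0.01, precompute=False, tol=1e-10, max_iter=10_000)
lm_plain.fit(X, y, sample_weight=normalized_weights)

coef_close = np.allclose(lm_gram.coef_, lm_plain.coef_, rtol=1e-5, atol=1e-6)
pred_close = np.allclose(
    lm_gram.predict(X_centered), lm_plain.predict(X), rtol=1e-5, atol=1e-5
)
print(coef_close, pred_close)
```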

