{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Effect of model regularization on training and test error\n\nIn this example, we evaluate the impact of the regularization parameter in a\nlinear model called :class:`~sklearn.linear_model.ElasticNet`. To carry out this\nevaluation, we use a validation curve using\n:class:`~sklearn.model_selection.ValidationCurveDisplay`. This curve shows the\ntraining and test scores of the model for different values of the regularization\nparameter.\n\nOnce we identify the optimal regularization parameter, we compare the true and\nestimated coefficients of the model to determine if the model is able to recover\nthe coefficients from the noisy input data.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate sample data\n\nWe generate a regression dataset that contains many features relative to the\nnumber of samples. However, only 10% of the features are informative. In this context,\nlinear models exposing L1 penalization are commonly used to recover a sparse\nset of coefficients.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.datasets import make_regression\nfrom sklearn.model_selection import train_test_split\n\nn_samples_train, n_samples_test, n_features = 150, 300, 500\nX, y, true_coef = make_regression(\n n_samples=n_samples_train + n_samples_test,\n n_features=n_features,\n n_informative=50,\n shuffle=False,\n noise=1.0,\n coef=True,\n random_state=42,\n)\nX_train, X_test, y_train, y_test = train_test_split(\n X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model definition\n\nHere, we do not use a model that only exposes an L1 penalty. Instead, we use\nan :class:`~sklearn.linear_model.ElasticNet` model that exposes both L1 and L2\npenalties.\n\nWe fix the `l1_ratio` parameter such that the solution found by the model is still\nsparse. Therefore, this type of model tries to find a sparse solution but at the same\ntime also tries to shrink all coefficients towards zero.\n\nIn addition, we force the coefficients of the model to be positive since we know that\n`make_regression` generates a response with a positive signal. So we use this\npre-knowledge to get a better model.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.linear_model import ElasticNet\n\nenet = ElasticNet(l1_ratio=0.9, positive=True, max_iter=10_000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate the impact of the regularization parameter\n\nTo evaluate the impact of the regularization parameter, we use a validation\ncurve. This curve shows the training and test scores of the model for different\nvalues of the regularization parameter.\n\nThe regularization `alpha` is a parameter applied to the coefficients of the model:\nwhen it tends to zero, no regularization is applied and the model tries to fit the\ntraining data with the least amount of error. However, it leads to overfitting when\nfeatures are noisy. When `alpha` increases, the model coefficients are constrained,\nand thus the model cannot fit the training data as closely, avoiding overfitting.\nHowever, if too much regularization is applied, the model underfits the data and\nis not able to properly capture the signal.\n\nThe validation curve helps in finding a good trade-off between both extremes: the\nmodel is not regularized and thus flexible enough to fit the signal, but not too\nflexible to overfit. The :class:`~sklearn.model_selection.ValidationCurveDisplay`\nallows us to display the training and validation scores across a range of alpha\nvalues.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n\nfrom sklearn.model_selection import ValidationCurveDisplay\n\nalphas = np.logspace(-5, 1, 60)\ndisp = ValidationCurveDisplay.from_estimator(\n enet,\n X_train,\n y_train,\n param_name=\"alpha\",\n param_range=alphas,\n scoring=\"r2\",\n n_jobs=2,\n score_type=\"both\",\n)\ndisp.ax_.set(\n title=r\"Validation Curve for ElasticNet (R$^2$ Score)\",\n xlabel=r\"alpha (regularization strength)\",\n ylabel=\"R$^2$ Score\",\n)\n\ntest_scores_mean = disp.test_scores.mean(axis=1)\nidx_avg_max_test_score = np.argmax(test_scores_mean)\ndisp.ax_.vlines(\n alphas[idx_avg_max_test_score],\n disp.ax_.get_ylim()[0],\n test_scores_mean[idx_avg_max_test_score],\n color=\"k\",\n linewidth=2,\n linestyle=\"--\",\n label=f\"Optimum on test\\n$\\\\alpha$ = {alphas[idx_avg_max_test_score]:.2e}\",\n)\n_ = disp.ax_.legend(loc=\"lower right\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To find the optimal regularization parameter, we can select the value of `alpha`\nthat maximizes the validation score.\n\n## Coefficients comparison\n\nNow that we have identified the optimal regularization parameter, we can compare the\ntrue coefficients and the estimated coefficients.\n\nFirst, let's set the regularization parameter to the optimal value and fit the\nmodel on the training data. In addition, we'll show the test score for this model.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"enet.set_params(alpha=alphas[idx_avg_max_test_score]).fit(X_train, y_train)\nprint(\n f\"Test score: {enet.score(X_test, y_test):.3f}\",\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we plot the true coefficients and the estimated coefficients.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n\nfig, axs = plt.subplots(ncols=2, figsize=(12, 6), sharex=True, sharey=True)\nfor ax, coef, title in zip(axs, [true_coef, enet.coef_], [\"True\", \"Model\"]):\n ax.stem(coef)\n ax.set(\n title=f\"{title} Coefficients\",\n xlabel=\"Feature Index\",\n ylabel=\"Coefficient Value\",\n )\nfig.suptitle(\n \"Comparison of the coefficients of the true generative model and \\n\"\n \"the estimated elastic net coefficients\"\n)\n\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While the original coefficients are sparse, the estimated coefficients are not\nas sparse. The reason is that we fixed the `l1_ratio` parameter to 0.9. We could\nforce the model to get a sparser solution by increasing the `l1_ratio` parameter.\n\nHowever, we observed that for the estimated coefficients that are close to zero in\nthe true generative model, our model shrinks them towards zero. So we don't recover\nthe true coefficients, but we get a sensible outcome in line with the performance\nobtained on the test set.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.20"
}
},
"nbformat": 4,
"nbformat_minor": 0
}