load_diabetes

sklearn.datasets.load_diabetes(*, return_X_y=False, as_frame=False, scaled=True)

Load and return the diabetes dataset (regression).

Samples total	442
Dimensionality	10
Features	real, -0.2 < x < 0.2 (with the default scaled=True)
Targets	integer 25 - 346

Note

The meaning of each feature (i.e. feature_names) may be unclear (especially for ltg), because the documentation of the original dataset is not explicit. The information provided here appears consistent with the scientific literature in this field of research.

Read more in the User Guide.

Parameters:

return_X_y: bool, default=False: If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target objects.

Added in version 0.18.
as_frame: bool, default=False: If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.

Added in version 0.23.
scaled: bool, default=True: If True, the feature variables are mean centered and scaled by the standard deviation times the square root of n_samples. If False, raw data is returned for the feature variables.

Added in version 1.1.

Returns:

data: Bunch

Dictionary-like object, with the following attributes.

data: {ndarray, dataframe} of shape (442, 10): The data matrix. If as_frame=True, data will be a pandas DataFrame.
target: {ndarray, Series} of shape (442,): The regression target. If as_frame=True, target will be a pandas Series.
feature_names: list: The names of the dataset columns.
frame: DataFrame of shape (442, 11): Only present when as_frame=True. DataFrame with data and target.

Added in version 0.23.
DESCR: str: The full description of the dataset.
data_filename: str: The path to the location of the data.
target_filename: str: The path to the location of the target.
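The attributes listed above can be inspected as in the following sketch. It assumes pandas is installed (required for as_frame=True); the variable names bunch and same_object are illustrative, not part of the API.

```python
from sklearn.datasets import load_diabetes

# as_frame=True requires pandas to be installed.
bunch = load_diabetes(as_frame=True)

print(type(bunch.data).__name__)    # pandas object holding the 10 features
print(type(bunch.target).__name__)  # pandas object holding the target
print(bunch.feature_names)          # names of the feature columns
print(bunch.frame.shape)            # features plus the target column

# Bunch is dictionary-like, so key access and attribute access
# refer to the same underlying object.
same_object = bunch["data"] is bunch.data
print(same_object)
```

Attribute access (bunch.data) is usually the more readable form; key access is convenient when the attribute name is held in a variable.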

(data, target): tuple if return_X_y is True

A tuple of two ndarrays. The first is a 2D ndarray of shape (442, 10), with each row representing one sample and each column representing one feature. The second is an ndarray of shape (442,) containing the target values. If as_frame=True, both are pandas objects instead, i.e. X is a DataFrame and y is a Series.

Added in version 0.18.
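A short sketch of the tuple return form follows, with and without as_frame. It assumes pandas is installed for the second call; the names X, y, X_df and y_ser are illustrative only.

```python
from sklearn.datasets import load_diabetes

# Plain NumPy arrays: features and target returned as a tuple.
X, y = load_diabetes(return_X_y=True)
print(X.shape, y.shape)

# Same data as pandas objects (requires pandas).
X_df, y_ser = load_diabetes(return_X_y=True, as_frame=True)
print(type(X_df).__name__, type(y_ser).__name__)

# Both forms carry the same values; only the container type differs.
same_values = bool((X_df.to_numpy() == X).all() and (y_ser.to_numpy() == y).all())
print(same_values)
```

The tuple form is handy when the data is passed straight to an estimator, while the Bunch form keeps the feature names and description alongside the arrays.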

Examples

>>> from sklearn.datasets import load_diabetes
>>> diabetes = load_diabetes()
>>> diabetes.target[:3]
array([151.,  75., 141.])
>>> diabetes.data.shape
(442, 10)

Gallery examples

Model Complexity Influence

Gradient Boosting regression

Plot individual and voting regression predictions

Model-based and sequential feature selection

Imputing missing values before building an estimator

Advanced Plotting With Partial Dependence

Lasso model selection via information criteria

Lasso, Lasso-LARS, and Elastic Net paths

Lasso model selection: AIC-BIC / cross-validation

Ordinary Least Squares and Ridge Regression

Plotting Cross-Validated Predictions

Release Highlights for scikit-learn 1.2