# Statistical learning: the setting and the estimator object in scikit-learn

## Datasets

Scikit-learn deals with learning information from one or more
datasets that are represented as 2D arrays. They can be understood as a
list of multi-dimensional observations. We say that the first axis of
these arrays is the **samples** axis, while the second is the
**features** axis.
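As a minimal sketch of this layout, assuming scikit-learn's bundled iris dataset (loaded with `datasets.load_iris`, no download needed), the array below has one row per observation and one column per measured feature:

```python
from sklearn import datasets

# Load the iris dataset that ships with scikit-learn.
iris = datasets.load_iris()
data = iris.data

# Axis 0 is the samples axis, axis 1 is the features axis.
n_samples, n_features = data.shape
print(n_samples, n_features)  # 150 4

# A single observation is a row: a vector of n_features values.
first_sample = data[0]

# A single feature across all observations is a column
# (column 0 of iris is the sepal length).
sepal_length = data[:, 0]
```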

When the data is not initially in the `(n_samples, n_features)` shape, it
needs to be preprocessed before it can be used by scikit-learn.
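A common case is image data. The sketch below assumes scikit-learn's bundled digits dataset, where each sample is an 8x8 image, so the raw `images` array is 3D. Flattening each image into a 64-value vector with NumPy's `reshape` gives the expected 2D shape:

```python
from sklearn import datasets

digits = datasets.load_digits()

# Raw images: (n_samples, 8, 8), i.e. not yet a 2D array.
print(digits.images.shape)  # (1797, 8, 8)

# Keep axis 0 (samples) and merge the two pixel axes into one features axis.
n_samples = digits.images.shape[0]
data = digits.images.reshape((n_samples, -1))
print(data.shape)  # (1797, 64)
```

The `-1` lets NumPy infer the number of features (8 * 8 = 64). For this particular dataset, scikit-learn also provides the already-flattened array as `digits.data`.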

## Estimator objects

**Fitting data**: the main API implemented by scikit-learn is that of the
`estimator`. An estimator is any object that learns from data;
it may be a classification, regression or clustering algorithm or
a *transformer* that extracts/filters useful features from raw data.
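To make the four kinds concrete, here is a small sketch that instantiates one representative of each. The specific classes chosen (`LogisticRegression`, `LinearRegression`, `KMeans`, `StandardScaler`) are illustrative picks, not the only options:

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

# One example estimator for each kind mentioned above.
estimators = {
    "classification": LogisticRegression(),
    "regression": LinearRegression(),
    "clustering": KMeans(n_clusters=3),
    "transformer": StandardScaler(),
}

for kind, est in estimators.items():
    # All of them share the same fit() entry point.
    print(kind, type(est).__name__, callable(getattr(est, "fit", None)))
```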

All estimator objects expose a `fit` method that takes a dataset
(usually a 2D array):

```
>>> estimator.fit(data)
```
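A concrete version of the call above, sketched with the iris data and two unsupervised estimators (a transformer and a clustering algorithm), so `fit` receives only the 2D data array. Supervised estimators such as classifiers also take the target array, as in `fit(X, y)`. The `n_init=10` and `random_state=0` values are arbitrary choices made here for reproducibility:

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data  # shape (150, 4)

# A transformer: fit() learns per-feature means and scales from X.
scaler = StandardScaler()
returned = scaler.fit(X)

# fit() returns the estimator itself, which allows chaining calls.
print(returned is scaler)  # True

# A clustering estimator fitted on the same 2D array, chained in one line.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_.shape)  # (150,)
```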

**Estimator parameters**: All the parameters of an estimator can be set
when it is instantiated or by modifying the corresponding attribute:

```
>>> estimator = Estimator(param1=1, param2=2)
>>> estimator.param1
1
```
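Replacing the placeholder `Estimator` with a real class, `KMeans` chosen here only as an example, the sketch below sets a parameter at instantiation, then by assigning the attribute, and finally through the `set_params`/`get_params` methods that every scikit-learn estimator provides:

```python
from sklearn.cluster import KMeans

# Set a parameter when the estimator is instantiated.
estimator = KMeans(n_clusters=3)
print(estimator.n_clusters)  # 3

# Or modify the corresponding attribute afterwards.
estimator.n_clusters = 5
print(estimator.n_clusters)  # 5

# set_params/get_params do the same thing by parameter name.
estimator.set_params(n_clusters=4)
print(estimator.get_params()["n_clusters"])  # 4
```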

**Estimated parameters**: When an estimator is fitted to data,
parameters are estimated from the data at hand. All the estimated
parameters are attributes of the estimator object ending with an
underscore:

```
>>> estimator.estimated_param_
```
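A small sketch of this convention, using `LinearRegression` on toy data made up to lie exactly on the line y = 2x + 1. The estimated attributes `coef_` and `intercept_` only exist after `fit`, and their trailing underscore distinguishes them from constructor parameters such as `fit_intercept`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data on y = 2x + 1 (invented for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

estimator = LinearRegression()

# Before fitting, the estimated attribute does not exist yet.
print(hasattr(estimator, "coef_"))  # False

estimator.fit(X, y)

# Estimated parameters end with an underscore.
print(estimator.coef_)       # slope, close to 2
print(estimator.intercept_)  # intercept, close to 1
```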