A tutorial on statistical learning for scientific data processing

Statistical learning

Machine learning is a technique of growing importance, as the datasets that experimental sciences face are rapidly growing in size. The problems it tackles range from building a prediction function that links different observations, to classifying observations, to learning the structure of an unlabeled dataset.

This tutorial explores statistical learning: the use of machine learning techniques for the purpose of statistical inference, that is, drawing conclusions about the data at hand.

Scikit-learn is a Python module that integrates classic machine learning algorithms into the tightly knit world of scientific Python packages (NumPy, SciPy, matplotlib).
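As a first taste of how scikit-learn works on plain NumPy arrays, here is a minimal sketch covering two of the problem types mentioned above: classifying observations (supervised) and finding structure without labels (unsupervised). It assumes a recent scikit-learn release. The iris dataset, KNeighborsClassifier, and KMeans are illustrative choices, not necessarily the ones used later in this tutorial.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Observations are stored as a 2D NumPy array of shape (n_samples, n_features);
# y holds one integer class label per observation.
X, y = load_iris(return_X_y=True)

# Supervised learning: fit an estimator that maps observations to known labels,
# then use it to predict labels for some observations.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)
predicted = clf.predict(X[:5])

# Unsupervised learning: group the observations into clusters
# without looking at the labels at all.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print(type(X), X.shape)
print(predicted)
print(np.bincount(cluster_ids))  # number of observations assigned to each cluster
```

Both estimators share the same `fit` / `predict` style of interface, which is the common pattern the estimator section below is built around.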

Statistical learning: the setting and the estimator object in scikit-learn
- Datasets
- Estimator objects
Supervised learning: predicting an output variable from high-dimensional observations
Model selection: choosing estimators and their parameters
Unsupervised learning: seeking representations of the data
- Clustering: grouping observations together
- Decompositions: from a signal to components and loadings
Putting it all together
Finding help
- The project mailing list
- Q&A communities with machine learning practitioners