- 1. Supervised learning
- 1.1. Generalized Linear Models
- 1.1.1. Ordinary Least Squares
- 1.1.2. Ridge Regression
- 1.1.3. Lasso
- 1.1.4. Elastic Net
- 1.1.5. Multi-task Lasso
- 1.1.6. Least Angle Regression
- 1.1.7. LARS Lasso
- 1.1.8. Orthogonal Matching Pursuit (OMP)
- 1.1.9. Bayesian Regression
- 1.1.10. Logistic regression
- 1.1.11. Stochastic Gradient Descent - SGD
- 1.1.12. Perceptron
- 1.1.13. Passive Aggressive Algorithms
- 1.1.14. Robustness regression: outliers and modeling errors
- 1.1.15. Polynomial regression: extending linear models with basis functions
- 1.2. Linear and quadratic discriminant analysis
- 1.3. Kernel ridge regression
- 1.4. Support Vector Machines
- 1.5. Stochastic Gradient Descent
- 1.6. Nearest Neighbors
- 1.7. Gaussian Processes
- 1.8. Cross decomposition
- 1.9. Naive Bayes
- 1.10. Decision Trees
- 1.11. Ensemble methods
- 1.11.1. Bagging meta-estimator
- 1.11.2. Forests of randomized trees
- 1.11.3. AdaBoost
- 1.11.4. Gradient Tree Boosting
- 1.12. Multiclass and multilabel algorithms
- 1.13. Feature selection
- 1.14. Semi-Supervised
- 1.15. Isotonic regression
- 1.16. Probability calibration
- 2. Unsupervised learning
- 2.1. Gaussian mixture models
- 2.2. Manifold learning
- 2.2.1. Introduction
- 2.2.2. Isomap
- 2.2.3. Locally Linear Embedding
- 2.2.4. Modified Locally Linear Embedding
- 2.2.5. Hessian Eigenmapping
- 2.2.6. Spectral Embedding
- 2.2.7. Local Tangent Space Alignment
- 2.2.8. Multi-dimensional Scaling (MDS)
- 2.2.9. t-distributed Stochastic Neighbor Embedding (t-SNE)
- 2.2.10. Tips on practical use
- 2.3. Clustering
- 2.3.1. Overview of clustering methods
- 2.3.2. K-means
- 2.3.3. Affinity Propagation
- 2.3.4. Mean Shift
- 2.3.5. Spectral clustering
- 2.3.6. Hierarchical clustering
- 2.3.7. DBSCAN
- 2.3.8. Birch
- 2.3.9. Clustering performance evaluation
- 2.4. Biclustering
- 2.5. Decomposing signals in components (matrix factorization problems)
- 2.6. Covariance estimation
- 2.7. Novelty and Outlier Detection
- 2.8. Density Estimation
- 2.9. Neural network models (unsupervised)
- 3. Model selection and evaluation
- 3.1. Cross-validation: evaluating estimator performance
- 3.2. Grid Search: Searching for estimator parameters
- 3.2.1. Exhaustive Grid Search
- 3.2.2. Randomized Parameter Optimization
- 3.2.3. Tips for parameter search
- 3.2.4. Alternatives to brute force parameter search
- 3.2.4.1. Model specific cross-validation
- 3.2.4.1.1. sklearn.linear_model.ElasticNetCV
- 3.2.4.1.2. sklearn.linear_model.LarsCV
- 3.2.4.1.3. sklearn.linear_model.LassoCV
- 3.2.4.1.4. sklearn.linear_model.LassoLarsCV
- 3.2.4.1.5. sklearn.linear_model.LogisticRegressionCV
- 3.2.4.1.6. sklearn.linear_model.MultiTaskElasticNetCV
- 3.2.4.1.7. sklearn.linear_model.MultiTaskLassoCV
- 3.2.4.1.8. sklearn.linear_model.OrthogonalMatchingPursuitCV
- 3.2.4.1.9. sklearn.linear_model.RidgeCV
- 3.2.4.1.10. sklearn.linear_model.RidgeClassifierCV
- 3.2.4.2. Information Criterion
- 3.2.4.3. Out of Bag Estimates
- 3.2.4.3.1. sklearn.ensemble.RandomForestClassifier
- 3.2.4.3.2. sklearn.ensemble.RandomForestRegressor
- 3.2.4.3.3. sklearn.ensemble.ExtraTreesClassifier
- 3.2.4.3.4. sklearn.ensemble.ExtraTreesRegressor
- 3.2.4.3.5. sklearn.ensemble.GradientBoostingClassifier
- 3.2.4.3.6. sklearn.ensemble.GradientBoostingRegressor
- 3.3. Model evaluation: quantifying the quality of predictions
- 3.3.1. The scoring parameter: defining model evaluation rules
- 3.3.2. Classification metrics
- 3.3.2.1. From binary to multiclass and multilabel
- 3.3.2.2. Accuracy score
- 3.3.2.3. Confusion matrix
- 3.3.2.4. Classification report
- 3.3.2.5. Hamming loss
- 3.3.2.6. Jaccard similarity coefficient score
- 3.3.2.7. Precision, recall and F-measures
- 3.3.2.8. Hinge loss
- 3.3.2.9. Log loss
- 3.3.2.10. Matthews correlation coefficient
- 3.3.2.11. Receiver operating characteristic (ROC)
- 3.3.2.12. Zero one loss
- 3.3.3. Multilabel ranking metrics
- 3.3.4. Regression metrics
- 3.3.5. Clustering metrics
- 3.3.6. Dummy estimators
- 3.4. Model persistence
- 3.5. Validation curves: plotting scores to evaluate models
- 4. Dataset transformations
- 4.1. Pipeline and FeatureUnion: combining estimators
- 4.2. Feature extraction
- 4.2.1. Loading features from dicts
- 4.2.2. Feature hashing
- 4.2.3. Text feature extraction
- 4.2.3.1. The Bag of Words representation
- 4.2.3.2. Sparsity
- 4.2.3.3. Common Vectorizer usage
- 4.2.3.4. Tf–idf term weighting
- 4.2.3.5. Decoding text files
- 4.2.3.6. Applications and examples
- 4.2.3.7. Limitations of the Bag of Words representation
- 4.2.3.8. Vectorizing a large text corpus with the hashing trick
- 4.2.3.9. Performing out-of-core scaling with HashingVectorizer
- 4.2.3.10. Customizing the vectorizer classes
- 4.2.4. Image feature extraction
- 4.3. Preprocessing data
- 4.4. Unsupervised dimensionality reduction
- 4.5. Random Projection
- 4.6. Kernel Approximation
- 4.7. Pairwise metrics, Affinities and Kernels
- 4.8. Transforming the prediction target (y)
- 5. Dataset loading utilities
- 5.1. General dataset API
- 5.2. Toy datasets
- 5.3. Sample images
- 5.4. Sample generators
- 5.5. Datasets in svmlight / libsvm format
- 5.6. The Olivetti faces dataset
- 5.7. The 20 newsgroups text dataset
- 5.8. Downloading datasets from the mldata.org repository
- 5.9. The Labeled Faces in the Wild face recognition dataset
- 5.10. Forest covertypes
- 6. Strategies to scale computationally: bigger data
- 7. Computational Performance