Version 0.18
Warning
Scikit-learn 0.18 is the last major release of scikit-learn to support Python 2.6. Later versions of scikit-learn will require Python 2.7 or above.
Version 0.18.2
June 20, 2017
Changelog
Code Contributors
Aman Dalmia, Loic Esteve, Nate Guerin, Sergei Lebedev
Version 0.18.1
November 11, 2016
Changelog
Enhancements
Improved sample_without_replacement speed by utilizing numpy.random.permutation for most cases. As a result, samples may differ in this release for a fixed random state. Affected estimators:
This also affects the datasets.make_classification method.
Bug fixes
Fix issue where the min_grad_norm and n_iter_without_progress parameters were not being utilised by manifold.TSNE. #6497 by Sebastian Säger
Fix bug for svm’s decision values when decision_function_shape is ovr in svm.SVC. svm.SVC’s decision_function was incorrect from versions 0.17.0 through 0.18.0. #7724 by Bing Tian Dai
Attribute explained_variance_ratio_ of discriminant_analysis.LinearDiscriminantAnalysis calculated with the SVD and Eigen solvers is now of the same length. #7632 by JPFrancoia
Fixes issue in univariate feature selection where score functions were not accepting multi-label targets. #7676 by Mohammed Affan
Fixed setting parameters when calling fit multiple times on feature_selection.SelectFromModel. #7756 by Andreas Müller
Fixes issue in the partial_fit method of multiclass.OneVsRestClassifier when the number of classes used in partial_fit was less than the total number of classes in the data. #7786 by Srivatsan Ramesh
Fixes issue in calibration.CalibratedClassifierCV where the predicted class probabilities for a sample did not sum to 1; CalibratedClassifierCV now also handles the case where the training set has fewer classes than the full data. #7799 by Srivatsan Ramesh
Fix a bug where sklearn.feature_selection.SelectFdr did not exactly implement the Benjamini-Hochberg procedure. It formerly may have selected fewer features than it should. #7490 by Peng Meng.
sklearn.manifold.LocallyLinearEmbedding now correctly handles integer inputs. #6282 by Jake Vanderplas.
The min_weight_fraction_leaf parameter of tree-based classifiers and regressors now assumes uniform sample weights by default if the sample_weight argument is not passed to the fit function. Previously, the parameter was silently ignored. #7301 by Nelson Liu.
Fixed a numerical issue with linear_model.RidgeCV on centered data when n_features > n_samples. #6178 by Bertrand Thirion
Tree splitting criterion classes’ cloning/pickling is now memory safe. #7680 by Ibraim Ganiev.
Fixed a bug where decomposition.NMF sets its n_iters_ attribute in transform(). #7553 by Ekaterina Krivich.
sklearn.linear_model.LogisticRegressionCV now correctly handles string labels. #5874 by Raghav RV.
Fixed a bug where sklearn.model_selection.train_test_split raised an error when stratify is a list of string labels. #7593 by Raghav RV.
Fixed a bug where sklearn.model_selection.GridSearchCV and sklearn.model_selection.RandomizedSearchCV were not pickleable because of a pickling bug in np.ma.MaskedArray. #7594 by Raghav RV.
All cross-validation utilities in sklearn.model_selection now permit one-time cross-validation splitters for the cv parameter. Non-deterministic cross-validation splitters (where multiple calls to split produce dissimilar splits) can also be used as the cv parameter. sklearn.model_selection.GridSearchCV will cross-validate each parameter setting on the split produced by the first split call to the cross-validation splitter. #7660 by Raghav RV.
Fix bug where preprocessing.MultiLabelBinarizer.fit_transform returned an invalid CSR matrix. #7750 by CJ Carey.
Fixed a bug where metrics.pairwise.cosine_distances could return a small negative distance. #7732 by Artsion.
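As an illustration of the train_test_split fix above, the following sketch (assuming scikit-learn and NumPy are installed; the toy data and the "spam"/"ham" labels are made up) passes a plain list of string labels as stratify and checks that both halves keep the class balance.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(16).reshape(8, 2)
y = ["spam"] * 4 + ["ham"] * 4  # string labels passed as a plain Python list

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Stratification keeps the class proportions equal in both halves.
train_counts, test_counts = Counter(y_train), Counter(y_test)
print(train_counts, test_counts)
```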
API changes summary
Trees and forests
The min_weight_fraction_leaf parameter of tree-based classifiers and regressors now assumes uniform sample weights by default if the sample_weight argument is not passed to the fit function. Previously, the parameter was silently ignored. #7301 by Nelson Liu.
Tree splitting criterion classes’ cloning/pickling is now memory safe. #7680 by Ibraim Ganiev.
Linear, kernelized and related models
Length of explained_variance_ratio_ of discriminant_analysis.LinearDiscriminantAnalysis changed for both the Eigen and SVD solvers. The attribute now has a length of min(n_components, n_classes - 1). #7632 by JPFrancoia
Fixed a numerical issue with linear_model.RidgeCV on centered data when n_features > n_samples. #6178 by Bertrand Thirion
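A quick way to see the explained_variance_ratio_ length rule is to fit both solvers on a three-class dataset: with n_components left at its default, the expected length is n_classes - 1 = 2. This is a sketch assuming scikit-learn is installed; it uses the bundled iris data.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

lengths = {}
for solver in ("svd", "eigen"):
    lda = LinearDiscriminantAnalysis(solver=solver).fit(X, y)
    # Both solvers should report one ratio per discriminant axis.
    lengths[solver] = len(lda.explained_variance_ratio_)
print(lengths)
```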
Version 0.18
September 28, 2016
Model Selection Enhancements and API Changes
The model_selection module
The new module sklearn.model_selection, which groups together the functionalities of the former sklearn.cross_validation, sklearn.grid_search and sklearn.learning_curve modules, introduces new possibilities such as nested cross-validation and better manipulation of parameter searches with Pandas.
Many things will stay the same but there are some key differences. Read below to know more about the changes.
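A minimal sketch of the nested cross-validation this enables, assuming scikit-learn is installed: a GridSearchCV (inner loop) is itself passed to cross_val_score (outer loop), each with its own data-independent KFold splitter. The grid over C and the fold counts are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: choose C on each outer training set.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: estimate how well the whole tuning procedure generalizes.
outer_cv = KFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=inner_cv)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(nested_scores, nested_scores.mean())
```

Because the splitters hold no data, the same inner_cv object can be reused on every outer training fold.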
Data-independent CV splitters enabling nested cross-validation
The new cross-validation splitters, defined in sklearn.model_selection, are no longer initialized with any data-dependent parameters such as y. Instead, they expose a split method that takes in the data and returns a generator yielding the train/test indices of the different splits.
This change makes it possible to use the cross-validation splitters to perform nested cross-validation, facilitated by the model_selection.GridSearchCV and model_selection.RandomizedSearchCV utilities.
The enhanced cv_results_ attribute
The new cv_results_ attribute (of model_selection.GridSearchCV and model_selection.RandomizedSearchCV), introduced in lieu of the grid_scores_ attribute, is a dict of 1D arrays with elements in each array corresponding to the parameter settings (i.e. search candidates).
The cv_results_ dict can be easily imported into pandas as a DataFrame for exploring the search results.
The cv_results_ arrays include scores for each cross-validation split (with keys such as 'split0_test_score'), as well as their mean ('mean_test_score') and standard deviation ('std_test_score').
The ranks of the search candidates (based on their mean cross-validation score) are available at cv_results_['rank_test_score'].
The values of each parameter are stored separately as numpy masked object arrays. The value for a given search candidate is masked if the corresponding parameter is not applicable to it. Additionally, a list of all the parameter dicts is stored at cv_results_['params'].
Parameters n_folds and n_iter renamed to n_splits
Some parameter names have changed: the n_folds parameter in the new model_selection.KFold, model_selection.GroupKFold (see below for the name change), and model_selection.StratifiedKFold is now renamed to n_splits. The n_iter parameter in model_selection.ShuffleSplit, the new class model_selection.GroupShuffleSplit and model_selection.StratifiedShuffleSplit is now renamed to n_splits.
Rename of splitter classes which accept group labels along with data
The cross-validation splitters LabelKFold, LabelShuffleSplit, LeaveOneLabelOut and LeavePLabelOut have been renamed to model_selection.GroupKFold, model_selection.GroupShuffleSplit, model_selection.LeaveOneGroupOut and model_selection.LeavePGroupsOut respectively.
Note the change from singular to plural form in model_selection.LeavePGroupsOut.
Fit parameter labels renamed to groups
The labels parameter in the split method of the newly renamed splitters model_selection.GroupKFold, model_selection.LeaveOneGroupOut, model_selection.LeavePGroupsOut and model_selection.GroupShuffleSplit is renamed to groups, following the new nomenclature of their class names.
Parameter n_labels renamed to n_groups
The parameter n_labels in the newly renamed model_selection.LeavePGroupsOut is changed to n_groups.
Training scores and Timing information
cv_results_ also includes the training scores for each cross-validation split (with keys such as 'split0_train_score'), as well as their mean ('mean_train_score') and standard deviation ('std_train_score'). To avoid the cost of evaluating training scores, set return_train_score=False.
Additionally, the mean and standard deviation of the times taken to split, train and score the model across all the cross-validation splits are available at the keys 'mean_time' and 'std_time' respectively.
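The sketch below, assuming scikit-learn and NumPy and using the bundled iris data, shows the cv_results_ layout described above: per-candidate mean scores and ranks, the list at cv_results_['params'], a masked column for a parameter (gamma) that only one sub-grid uses, and the training-score keys. The grid itself is an arbitrary example. return_train_score=True is passed explicitly because later releases changed its default to False. pandas is not required here, although pandas.DataFrame(search.cv_results_) would tabulate the same dict.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1], "gamma": [0.1]},  # gamma applies only here
]
search = GridSearchCV(SVC(), param_grid, cv=3, return_train_score=True)
search.fit(X, y)
res = search.cv_results_

for params, mean, rank in zip(res["params"], res["mean_test_score"], res["rank_test_score"]):
    print(rank, round(float(mean), 3), params)

# gamma is masked for the two linear-kernel candidates.
gamma_mask = list(np.ma.getmaskarray(res["param_gamma"]))
print(gamma_mask)
```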
Changelog
New features
Classifiers and Regressors
The Gaussian Process module has been reimplemented and now offers classification and regression estimators through gaussian_process.GaussianProcessClassifier and gaussian_process.GaussianProcessRegressor. Among other things, the new implementation supports kernel engineering, gradient-based hyperparameter optimization, and sampling of functions from the GP prior and GP posterior. Extensive documentation and examples are provided. By Jan Hendrik Metzen.
Added a new supervised learning algorithm: Multi-layer Perceptron. #3204 by Issam H. Laradji
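A short sketch of the new GaussianProcessRegressor, assuming scikit-learn and NumPy: the kernel is composed from ConstantKernel and RBF (kernel engineering), its hyperparameters are tuned during fit by maximizing the log-marginal likelihood, and sample_y draws functions from the posterior. The sine data and the query points are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

X_train = np.linspace(0.0, 5.0, 10).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

# Kernel engineering: a scaled RBF kernel built from two kernel objects.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
# fit() tunes the kernel hyperparameters by maximizing the log-marginal likelihood.
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X_train, y_train)

X_query = np.array([[X_train[3, 0]], [50.0]])  # a training input and a distant point
mean, std = gpr.predict(X_query, return_std=True)
# Three functions sampled from the GP posterior at the query points.
samples = gpr.sample_y(X_query, n_samples=3, random_state=0)
print(gpr.kernel_, mean, std)
```

Far from the training inputs the predictive standard deviation reverts toward that of the prior, which is why std[1] is larger than std[0].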
Added linear_model.HuberRegressor, a linear model robust to outliers. #5291 by Manoj Kumar.
Added the multioutput.MultiOutputRegressor meta-estimator. It converts single-output regressors to multi-output regressors by fitting one regressor per output. By Tim Head.
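The two additions above compose naturally. This sketch (assuming scikit-learn and NumPy; the synthetic data, the true slopes 2 and -1, and the outlier magnitude are made up) wraps HuberRegressor in MultiOutputRegressor, so one robust regressor is fitted per target column, and compares the recovered slopes with ordinary least squares on the same corrupted data.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
x = np.arange(20, dtype=float)
X = x.reshape(-1, 1)

# Two targets with true slopes 2 and -1, plus a little Gaussian noise.
Y = np.column_stack([2 * x + 1, -x + 3]) + rng.normal(scale=0.3, size=(20, 2))
# Corrupt the last three samples of each target with gross outliers.
Y[17:, 0] += 100
Y[17:, 1] -= 100

# MultiOutputRegressor fits one clone of the base regressor per target column.
robust = MultiOutputRegressor(HuberRegressor(max_iter=1000)).fit(X, Y)
ols = MultiOutputRegressor(LinearRegression()).fit(X, Y)

robust_slopes = [est.coef_[0] for est in robust.estimators_]
ols_slopes = [est.coef_[0] for est in ols.estimators_]
print("Huber:", np.round(robust_slopes, 2), "OLS:", np.round(ols_slopes, 2))
```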
Other estimators
New mixture.GaussianMixture and mixture.BayesianGaussianMixture replace former mixture models, employing faster inference for sounder results. #7295 by Wei Xue and Thierry Guillemot.
Class decomposition.RandomizedPCA is now factored into decomposition.PCA and is available by calling it with the parameter svd_solver='randomized'. The default number of n_iter for 'randomized' has changed to 4. The old behavior of PCA is recovered by svd_solver='full'. An additional solver calls arpack and performs truncated (non-randomized) SVD. By default, the best solver is selected depending on the size of the input and the number of components requested. #5299 by Giorgio Patrini.
Added two functions for mutual information estimation: feature_selection.mutual_info_classif and feature_selection.mutual_info_regression. These functions can be used in feature_selection.SelectKBest and feature_selection.SelectPercentile as score functions. By Andrea Bravi and Nikolay Mayorov.
Added the ensemble.IsolationForest class for anomaly detection based on random forests. By Nicolas Goix.
Added algorithm="elkan" to cluster.KMeans, implementing Elkan’s fast K-Means algorithm. By Andreas Müller.
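To illustrate the PCA solver change, this sketch (assuming scikit-learn and NumPy, on the bundled iris data) fits the same two-component PCA with svd_solver='full', 'randomized' and 'arpack'. On a problem this small the three explained variance ratios should agree to numerical precision.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

ratios = {}
for solver in ("full", "randomized", "arpack"):
    # 'randomized' takes over from the former RandomizedPCA class;
    # 'arpack' computes a truncated (non-randomized) SVD.
    pca = PCA(n_components=2, svd_solver=solver, random_state=0).fit(X)
    ratios[solver] = pca.explained_variance_ratio_
print(ratios)
```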
Model selection and evaluation
Added metrics.fowlkes_mallows_score, the Fowlkes Mallows Index which measures the similarity of two clusterings of a set of points. By Arnaud Fouchet and Thierry Guillemot.
Added metrics.calinski_harabaz_score, which computes the Calinski and Harabaz score to evaluate the resulting clustering of a set of points. By Arnaud Fouchet and Thierry Guillemot.
Added new cross-validation splitter model_selection.TimeSeriesSplit to handle time series data. #6586 by YenChen Lin
The cross-validation iterators are replaced by cross-validation splitters available from sklearn.model_selection, allowing for nested cross-validation. See Model Selection Enhancements and API Changes for more information. #4294 by Raghav RV.
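The Fowlkes-Mallows index counts pairs of points. With TP the number of pairs placed in the same cluster by both labelings, FP the pairs grouped together only by the predicted labeling, and FN the pairs grouped together only by the true labeling, FMI = TP / sqrt((TP + FP) * (TP + FN)). The helper below is a hypothetical, pure-Python O(n^2) sketch of that definition, not the library's implementation; like metrics.fowlkes_mallows_score, it returns 0.0 when no pair is shared.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting Fowlkes-Mallows index (hypothetical helper)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # grouped together in both clusterings
        elif same_pred:
            fp += 1  # grouped together only in the prediction
        elif same_true:
            fn += 1  # grouped together only in the ground truth
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


print(fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

In the usage line, 6 pairs are grouped together in the truth and 3 in the prediction, of which 2 agree, so FMI = 2 / sqrt(3 * 6). The index ignores label names, so a relabelled but otherwise identical clustering scores 1.0.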
Enhancements
Trees and ensembles
Added a new splitting criterion for tree.DecisionTreeRegressor, the mean absolute error. This criterion can also be used in ensemble.ExtraTreesRegressor, ensemble.RandomForestRegressor, and the gradient boosting estimators. #6667 by Nelson Liu.
Added a weighted impurity-based early stopping criterion for decision tree growth. #6954 by Nelson Liu
The random forest, extra trees and decision tree estimators now have a method decision_path which returns the decision path of samples in the tree. By Arnaud Joly.
A new example has been added unveiling the decision tree structure. By Arnaud Joly.
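A sketch of decision_path on a single tree, assuming scikit-learn and the bundled iris data: it returns a sparse indicator matrix whose row i marks the nodes sample i traverses, so both the root (node 0) and the leaf reported by apply appear on each path. For the forest estimators, decision_path additionally returns an array of offsets delimiting each tree's nodes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Sparse (n_samples, n_nodes) indicator matrix: entry (i, j) is nonzero
# when sample i passes through node j on its way down the tree.
indicator = clf.decision_path(X[:3])
leaf_ids = clf.apply(X[:3])  # the leaf each sample ends up in

paths = []
for i in range(3):
    # Row i of the CSR matrix lists the node ids visited by sample i.
    nodes = indicator.indices[indicator.indptr[i]:indicator.indptr[i + 1]]
    paths.append([int(n) for n in nodes])
    print(f"sample {i}: nodes {paths[-1]} -> leaf {leaf_ids[i]}")
```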
Random forest, extra trees, decision trees and gradient boosting estimators accept the parameters min_samples_split and min_samples_leaf provided as a fraction (float) of the training samples. By yelite and Arnaud Joly.
Gradient boosting estimators accept the parameter criterion to specify the splitting criterion used in the built decision trees. #6667 by Nelson Liu.
The memory footprint is reduced (sometimes greatly) for ensemble.bagging.BaseBagging and classes that inherit from it, i.e., ensemble.BaggingClassifier, ensemble.BaggingRegressor, and ensemble.IsolationForest, by dynamically generating the attribute estimators_samples_ only when it is needed. By David Staub.
Added n_jobs and sample_weight parameters for ensemble.VotingClassifier to fit underlying estimators in parallel. #5805 by Ibraim Ganiev.
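To make the fractional form of min_samples_leaf concrete: a float is read as a fraction of the training samples, rounded up, so 0.1 on 100 samples requires at least 10 samples per leaf. The sketch below (assuming scikit-learn and NumPy, with made-up data) checks this on the fitted tree_ structure.

```python
import math

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# A float is interpreted as a fraction of the training samples:
# every leaf must hold at least ceil(0.1 * 100) = 10 samples.
clf = DecisionTreeClassifier(min_samples_leaf=0.1, random_state=0).fit(X, y)

tree = clf.tree_
is_leaf = tree.children_left == -1  # leaves have no children
leaf_sizes = tree.n_node_samples[is_leaf]
print(leaf_sizes, math.ceil(0.1 * len(X)))
```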
Linear, kernelized and related models
In linear_model.LogisticRegression, the SAG solver is now available in the multinomial case. #5251 by Tom Dupre la Tour.
linear_model.RANSACRegressor, svm.LinearSVC and svm.LinearSVR now support sample_weight. By Imaculate.
Add parameter loss to linear_model.RANSACRegressor to measure the error on the samples for every trial. By Manoj Kumar.
Prediction of out-of-sample events with Isotonic Regression (isotonic.IsotonicRegression) is now much faster (over 1000x in tests with synthetic data). By Jonathan Arfa.
Isotonic regression (isotonic.IsotonicRegression) now uses a better algorithm to avoid O(n^2) behavior in pathological cases, and is also generally faster (#6691). By Antony Lee.
naive_bayes.GaussianNB now accepts data-independent class-priors through the parameter priors. By Guillaume Lemaitre.
linear_model.ElasticNet and linear_model.Lasso now work with np.float32 input data without converting it into np.float64. This reduces memory consumption. #6913 by YenChen Lin.
semi_supervised.LabelPropagation and semi_supervised.LabelSpreading now accept arbitrary kernel functions in addition to the strings knn and rbf. #5762 by Utkarsh Upadhyay.
Decomposition, manifold learning and clustering
Added an inverse_transform function to decomposition.NMF to compute a data matrix of the original shape. By Anish Shah.
cluster.KMeans and cluster.MiniBatchKMeans now work with np.float32 and np.float64 input data without converting it. This reduces memory consumption when using np.float32. #6846 by Sebastian Säger and YenChen Lin.
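A sketch combining the two KMeans changes, assuming scikit-learn and NumPy with made-up blob data: the float32 input keeps its dtype in cluster_centers_, and algorithm="elkan" reaches the same inertia as the default algorithm on well-separated clusters. Which algorithm the default selects has varied between releases, so the comparison is only a sanity check.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
]).astype(np.float32)  # float32 input is no longer upcast to float64

elkan = KMeans(n_clusters=2, algorithm="elkan", n_init=10, random_state=0).fit(X)
default = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(elkan.cluster_centers_.dtype, elkan.inertia_, default.inertia_)
```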
Preprocessing and feature selection
preprocessing.RobustScaler now accepts the quantile_range parameter. #5929 by Konstantin Podshumok.
feature_extraction.FeatureHasher now accepts string values. #6173 by Ryad Zenine and Devashish Deshpande.
Keyword arguments can now be supplied to func in preprocessing.FunctionTransformer by means of the kw_args parameter. By Brian McFee.
feature_selection.SelectKBest and feature_selection.SelectPercentile now accept score functions that take X, y as input and return only the scores. By Nikolay Mayorov.
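The feature-selection changes work together: mutual_info_classif returns scores only, which SelectKBest now accepts (pvalues_ is then left as None). A sketch assuming scikit-learn and NumPy; the synthetic data has one informative feature and two noise features, and functools.partial is used only to fix the estimator's random_state.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=200)
X = np.column_stack([
    y + 0.1 * rng.normal(size=200),  # informative: a noisy copy of the label
    rng.normal(size=200),            # pure noise
    rng.normal(size=200),            # pure noise
])

# mutual_info_classif returns only scores (no p-values); fixing random_state
# makes its internal randomness reproducible.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func, k=1).fit(X, y)
print(selector.scores_, selector.get_support())
```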
Model evaluation and meta-estimators
multiclass.OneVsOneClassifier and multiclass.OneVsRestClassifier now support partial_fit. By Asish Panda and Philipp Dowling.
Added support for substituting or disabling pipeline.Pipeline and pipeline.FeatureUnion components using the set_params interface that powers sklearn.grid_search. See Selecting dimensionality reduction with Pipeline and GridSearchCV. By Joel Nothman and Robert McGibbon.
The new cv_results_ attribute of model_selection.GridSearchCV (and model_selection.RandomizedSearchCV) can be easily imported into pandas as a DataFrame. See Model Selection Enhancements and API Changes for more information. #6697 by Raghav RV.
Generalization of model_selection.cross_val_predict: one can pass method names such as predict_proba to be used in the cross-validation framework instead of the default predict. By Ori Ziv and Sears Merritt.
The training scores and the time taken for training followed by scoring for each search candidate are now available in the cv_results_ dict. See Model Selection Enhancements and API Changes for more information. #7325 by Eugene Chen and Raghav RV.
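A sketch of the generalized cross_val_predict, assuming scikit-learn and the bundled iris data: method="predict_proba" returns out-of-fold class probabilities, one column per class. LogisticRegression and max_iter=1000 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
# Each row is predicted by a model that did not see that sample during fit.
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)
print(proba.shape, proba[:2].round(3))
```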
Metrics
Added the labels parameter to metrics.log_loss to explicitly provide the labels when the number of classes in y_true and y_pred differ. #7239 by Hong Guangguo with help from Mads Jensen and Nelson Liu.
Support sparse contingency matrices in cluster evaluation (metrics.cluster.supervised) to scale to a large number of clusters. #7419 by Gregory Stupp and Joel Nothman.
Add sample_weight parameter to metrics.matthews_corrcoef. By Jatin Shah and Raghav RV.
Speed up metrics.silhouette_score by using vectorized operations. By Manoj Kumar.
Add sample_weight parameter to metrics.confusion_matrix. By Bernardo Stein.
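A sketch of the labels argument to metrics.log_loss, assuming scikit-learn: here y_true contains only class 0, so the two-column y_pred can only be interpreted once labels=[0, 1] is given. The expected value is the mean negative log-probability assigned to the true class, -(ln 0.9 + ln 0.8) / 2.

```python
import math

from sklearn.metrics import log_loss

y_true = [0, 0]                    # only class 0 occurs in this batch
y_pred = [[0.9, 0.1], [0.8, 0.2]]  # predicted probabilities for classes 0 and 1
# labels tells log_loss which class each column of y_pred refers to.
loss = log_loss(y_true, y_pred, labels=[0, 1])
expected = -(math.log(0.9) + math.log(0.8)) / 2
print(loss, expected)
```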
Miscellaneous
Added n_jobs parameter to feature_selection.RFECV to compute the score on the test folds in parallel. By Manoj Kumar
The codebase no longer contains C/C++ files generated by Cython: they are generated during the build. Distribution packages will still contain the generated C/C++ files. By Arthur Mensch.
Reduced the memory usage for 32-bit float input arrays of utils.sparse_func.mean_variance_axis and utils.sparse_func.incr_mean_variance_axis by supporting Cython fused types. By YenChen Lin.
ignore_warnings now accepts a category argument to ignore only the warnings of a specified type. By Thierry Guillemot.
Added parameter return_X_y and return type (data, target) : tuple option to datasets.load_iris dataset #7049, datasets.load_breast_cancer dataset #7152, datasets.load_digits dataset, datasets.load_diabetes dataset, datasets.load_linnerud dataset, datasets.load_boston dataset #7154 by Manvendra Singh.
Simplified the clone function and deprecated support for estimators that modify parameters in __init__. #5540 by Andreas Müller.
When unpickling a scikit-learn estimator in a different version than the one the estimator was trained with, a UserWarning is raised; see the documentation on model persistence for more details. (#7248) By Andreas Müller.
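A sketch of the return_X_y option, assuming scikit-learn: the loader returns a plain (data, target) tuple instead of a Bunch. load_iris is used here because some loaders listed above (such as load_boston) were removed in later releases.

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # plain (data, target) tuple
bunch = load_iris()                 # default: a Bunch with .data, .target, ...

same = bool((X == bunch.data).all() and (y == bunch.target).all())
print(X.shape, y.shape, same)
```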
Bug fixes
Trees and ensembles
Random forest, extra trees, decision trees and gradient boosting no longer accept min_samples_split=1, as at least 2 samples are required to split a decision tree node. By Arnaud Joly
ensemble.VotingClassifier now raises NotFittedError if predict, transform or predict_proba are called on the non-fitted estimator. By Sebastian Raschka.
Fix bug where ensemble.AdaBoostClassifier and ensemble.AdaBoostRegressor would perform poorly if the random_state was fixed (#7411). By Joel Nothman.
Fix bug in ensembles with randomization where the ensemble would not set random_state on base estimators in a pipeline or similar nesting (#7411). Note, results for ensemble.BaggingClassifier, ensemble.BaggingRegressor, ensemble.AdaBoostClassifier and ensemble.AdaBoostRegressor will now differ from previous versions. By Joel Nothman.
Linear, kernelized and related models
Fixed incorrect gradient computation for loss='squared_epsilon_insensitive' in linear_model.SGDClassifier and linear_model.SGDRegressor (#6764). By Wenhua Yang.
Fix bug in linear_model.LogisticRegressionCV where solver='liblinear' did not accept class_weight='balanced' (#6817). By Tom Dupre la Tour.
Fix bug in neighbors.RadiusNeighborsClassifier where an error occurred when there were outliers being labelled and a weight function specified (#6902). By LeonieBorne.
Fix linear_model.ElasticNet sparse decision function to match the output of the dense case in the multioutput setting.
Decomposition, manifold learning and clustering
decomposition.RandomizedPCA default number of iterated_power is 4 instead of 3. #5141 by Giorgio Patrini.
utils.extmath.randomized_svd performs 4 power iterations by default, instead of 0. In practice this is enough for obtaining a good approximation of the true eigenvalues/vectors in the presence of noise. When n_components is small (< .1 * min(X.shape)), n_iter is set to 7, unless the user specifies a higher number. This improves precision with few components. #5299 by Giorgio Patrini.
Whiten/non-whiten inconsistency between components of decomposition.PCA and decomposition.RandomizedPCA (now factored into PCA, see the New features) is fixed. components_ are stored with no whitening. #5299 by Giorgio Patrini.
Fixed bug in manifold.spectral_embedding where the diagonal of the unnormalized Laplacian matrix was incorrectly set to 1. #4995 by Peter Fischer.
Fixed incorrect initialization of utils.arpack.eigsh on all occurrences. Affects cluster.bicluster.SpectralBiclustering, decomposition.KernelPCA, manifold.LocallyLinearEmbedding, and manifold.SpectralEmbedding (#5012). By Peter Fischer.
Attribute explained_variance_ratio_ calculated with the SVD solver of discriminant_analysis.LinearDiscriminantAnalysis now returns correct results. By JPFrancoia
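A sketch of the whitening fix, assuming scikit-learn and NumPy on the bundled iris data: components_ is the same with and without whitening, and whitening shows up only in transform, whose output columns then have unit variance (with the n - 1 denominator).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
plain = PCA(n_components=2, whiten=False).fit(X)
white = PCA(n_components=2, whiten=True).fit(X)

# Whitening only rescales the projected data; the stored axes are identical.
same_axes = np.allclose(plain.components_, white.components_)
Z = white.transform(X)
print(same_axes, Z.var(axis=0, ddof=1))
```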
Preprocessing and feature selection
preprocessing.data._transform_selected now always passes a copy of X to the transform function when copy=True (#7194). By Caio Oliveira.
Model evaluation and meta-estimators
model_selection.StratifiedKFold now raises an error if all n_labels for individual classes are less than n_folds. #6182 by Devashish Deshpande.
Fixed bug in model_selection.StratifiedShuffleSplit where train and test samples could overlap in some edge cases; see #6121 for more details. By Loic Esteve.
Fix in sklearn.model_selection.StratifiedShuffleSplit to return splits of size train_size and test_size in all cases (#6472). By Andreas Müller.
Cross-validation of multiclass.OneVsOneClassifier and multiclass.OneVsRestClassifier now works with precomputed kernels. #7350 by Russell Smith.
Fix incomplete predict_proba method delegation from model_selection.GridSearchCV to linear_model.SGDClassifier (#7159) by Yichuan Liu.
Metrics
Fix bug in metrics.silhouette_score in which clusters of size 1 were incorrectly scored. They should get a score of 0. By Joel Nothman.
Fix bug in metrics.silhouette_samples so that it now works with arbitrary labels, not just those ranging from 0 to n_clusters - 1.
Fix bug where expected and adjusted mutual information were incorrect if cluster contingency cells exceeded 2**16. By Joel Nothman.
metrics.pairwise_distances now converts arrays to boolean arrays when required in scipy.spatial.distance. #5460 by Tom Dupre la Tour.
Fix sparse input support in metrics.silhouette_score as well as the example examples/text/document_clustering.py. By YenChen Lin.
metrics.roc_curve and metrics.precision_recall_curve no longer round y_score values when creating ROC curves; this was causing problems for users with very small differences in scores (#7353).
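A sketch of the silhouette fixes, assuming scikit-learn and NumPy, with hand-picked 1-D points. The cluster labels 7 and 42 are deliberately not 0..n_clusters-1, and the point labelled 42 forms a cluster of size 1, so its silhouette is 0. For the other two points, s = (b - a) / max(a, b) gives (10 - 1) / 10 = 0.9 and (9 - 1) / 9.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.array([[0.0], [1.0], [10.0]])
labels = np.array([7, 7, 42])  # arbitrary label values; 42 is a singleton cluster

s = silhouette_samples(X, labels)
score = silhouette_score(X, labels)  # mean of the per-sample values
print(s, score)
```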
Miscellaneous
model_selection.tests._search._check_param_grid now works correctly with all types that extend/implement Sequence (except string), including range (Python 3.x) and xrange (Python 2.x). #7323 by Viacheslav Kovalevskyi.
utils.extmath.randomized_range_finder is more numerically stable when many power iterations are requested, since it applies LU normalization by default. If n_iter < 2, numerical issues are unlikely, thus no normalization is applied. Other normalization options are available: 'none', 'LU' and 'QR'. #5141 by Giorgio Patrini.
Fix a bug where some formats of scipy.sparse matrix, and estimators with them as parameters, could not be passed to base.clone. By Loic Esteve.
datasets.load_svmlight_file is now able to read long int QID values. #7101 by Ibraim Ganiev.
API changes summary
Linear, kernelized and related models
residual_metric has been deprecated in linear_model.RANSACRegressor. Use loss instead. By Manoj Kumar.
Access to public attributes .X_ and .y_ has been deprecated in isotonic.IsotonicRegression. By Jonathan Arfa.
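Since residual_metric is replaced by loss, this sketch (assuming scikit-learn and NumPy; the data, the two outliers and the helper abs_loss are made up for illustration) passes a callable loss that returns one error per sample, which RANSACRegressor compares with residual_threshold to decide which samples are inliers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

x = np.arange(20, dtype=float)
X = x.reshape(-1, 1)
y = 3 * x
y[[5, 12]] += 50  # two gross outliers


def abs_loss(y_true, y_pred):
    # Hypothetical helper: per-sample absolute error, compared with residual_threshold.
    return np.abs(y_true - y_pred)


ransac = RANSACRegressor(
    LinearRegression(), loss=abs_loss, residual_threshold=1.0, random_state=0
).fit(X, y)
print(np.flatnonzero(~ransac.inlier_mask_), ransac.estimator_.coef_)
```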
Decomposition, manifold learning and clustering
The old mixture.DPGMM is deprecated in favor of the new mixture.BayesianGaussianMixture (with the parameter weight_concentration_prior_type='dirichlet_process'). The new class solves the computational problems of the old class and computes the Gaussian mixture with a Dirichlet process prior faster than before. #7295 by Wei Xue and Thierry Guillemot.
The old mixture.VBGMM is deprecated in favor of the new mixture.BayesianGaussianMixture (with the parameter weight_concentration_prior_type='dirichlet_distribution'). The new class solves the computational problems of the old class and computes the Variational Bayesian Gaussian mixture faster than before. #6651 by Wei Xue and Thierry Guillemot.
The old mixture.GMM is deprecated in favor of the new mixture.GaussianMixture. The new class computes the Gaussian mixture faster than before and some of the computational problems have been solved. #6666 by Wei Xue and Thierry Guillemot.
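A sketch of the replacement classes, assuming scikit-learn and NumPy with made-up 1-D data drawn from two well-separated Gaussians: GaussianMixture takes the place of GMM, and BayesianGaussianMixture with weight_concentration_prior_type='dirichlet_process' takes the place of DPGMM. n_components=5 for the Bayesian model is only an upper bound; surplus components tend to receive small weights.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-5.0, 1.0, size=(100, 1)),
    rng.normal(5.0, 1.0, size=(100, 1)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

dpgmm = BayesianGaussianMixture(
    n_components=5,  # upper bound on the number of components
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

print(np.sort(gmm.means_.ravel()), np.round(dpgmm.weights_, 3))
```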
Model evaluation and meta-estimators
The sklearn.cross_validation, sklearn.grid_search and sklearn.learning_curve modules have been deprecated and their classes and functions have been reorganized into the sklearn.model_selection module. See Model Selection Enhancements and API Changes for more information. #4294 by Raghav RV.
The grid_scores_ attribute of model_selection.GridSearchCV and model_selection.RandomizedSearchCV is deprecated in favor of the attribute cv_results_. See Model Selection Enhancements and API Changes for more information. #6697 by Raghav RV.
The parameters n_iter or n_folds in old CV splitters are replaced by the new parameter n_splits since it can provide a consistent and unambiguous interface to represent the number of train-test splits. #7187 by YenChen Lin.
The classes parameter was renamed to labels in metrics.hamming_loss. #7260 by Sebastián Vanrell.
The splitter classes LabelKFold, LabelShuffleSplit, LeaveOneLabelOut and LeavePLabelOut are renamed to model_selection.GroupKFold, model_selection.GroupShuffleSplit, model_selection.LeaveOneGroupOut and model_selection.LeavePGroupsOut respectively. Also the parameter labels in the split method of the newly renamed splitters model_selection.LeaveOneGroupOut and model_selection.LeavePGroupsOut is renamed to groups. Additionally in model_selection.LeavePGroupsOut, the parameter n_labels is renamed to n_groups. #6660 by Raghav RV.
Error and loss names for scoring parameters are now prefixed by 'neg_', such as neg_mean_squared_error. The unprefixed versions are deprecated and will be removed in version 0.20. #7261 by Tim Head.
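A sketch of the renamed group-aware splitters, assuming scikit-learn and NumPy with made-up data: the groups argument (formerly labels) keeps every group on one side of each split, and LeavePGroupsOut with n_groups=2 (formerly n_labels) yields one split per pair of the three groups.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeavePGroupsOut

X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3])  # e.g. one group per subject

gkf = GroupKFold(n_splits=3)
disjoint = []
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    # No group may appear on both the training and the test side.
    disjoint.append(set(groups[train_idx]).isdisjoint(groups[test_idx]))

lpgo = LeavePGroupsOut(n_groups=2)
n_lpgo_splits = lpgo.get_n_splits(X, y, groups=groups)  # C(3, 2) = 3
print(disjoint, n_lpgo_splits)
```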
Code Contributors
Aditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander Minyushkin, Alexander Rudy, Alexandre Abadie, Alexandre Abraham, Alexandre Gramfort, Alexandre Saint, alexfields, Alvaro Ulloa, alyssaq, Amlan Kar, Andreas Mueller, andrew giessel, Andrew Jackson, Andrew McCulloh, Andrew Murray, Anish Shah, Arafat, Archit Sharma, Ariel Rokem, Arnaud Joly, Arnaud Rachez, Arthur Mensch, Ash Hoover, asnt, b0noI, Behzad Tabibian, Bernardo, Bernhard Kratzwald, Bhargav Mangipudi, blakeflei, Boyuan Deng, Brandon Carter, Brett Naul, Brian McFee, Caio Oliveira, Camilo Lamus, Carol Willing, Cass, CeShine Lee, Charles Truong, Chyi-Kwei Yau, CJ Carey, codevig, Colin Ni, Dan Shiebler, Daniel, Daniel Hnyk, David Ellis, David Nicholson, David Staub, David Thaler, David Warshaw, Davide Lasagna, Deborah, definitelyuncertain, Didi Bar-Zev, djipey, dsquareindia, edwinENSAE, Elias Kuthe, Elvis DOHMATOB, Ethan White, Fabian Pedregosa, Fabio Ticconi, fisache, Florian Wilhelm, Francis, Francis O’Donovan, Gael Varoquaux, Ganiev Ibraim, ghg, Gilles Louppe, Giorgio Patrini, Giovanni Cherubin, Giovanni Lanzani, Glenn Qian, Gordon Mohr, govin-vatsan, Graham Clenaghan, Greg Reda, Greg Stupp, Guillaume Lemaitre, Gustav Mörtberg, halwai, Harizo Rajaona, Harry Mavroforakis, hashcode55, hdmetor, Henry Lin, Hobson Lane, Hugo Bowne-Anderson, Igor Andriushchenko, Imaculate, Inki Hwang, Isaac Sijaranamual, Ishank Gulati, Issam Laradji, Iver Jordal, jackmartin, Jacob Schreiber, Jake Vanderplas, James Fiedler, James Routley, Jan Zikes, Janna Brettingen, jarfa, Jason Laska, jblackburne, jeff levesque, Jeffrey Blackburne, Jeffrey04, Jeremy Hintz, jeremynixon, Jeroen, Jessica Yung, Jill-Jênn Vie, Jimmy Jia, Jiyuan Qian, Joel Nothman, johannah, John, John Boersma, John Kirkham, John Moeller, jonathan.striebel, joncrall, Jordi, Joseph Munoz, Joshua Cook, JPFrancoia, jrfiedler, JulianKahnert, juliathebrave, kaichogami, KamalakerDadi, Kenneth Lyons, Kevin Wang, kingjr, kjell, Konstantin Podshumok, Kornel Kielczewski, Krishna Kalyan, krishnakalyan3, Kvle Putnam, Kyle Jackson, Lars Buitinck, ldavid, LeiG, LeightonZhang, Leland McInnes, Liang-Chi Hsieh, Lilian Besson, lizsz, Loic Esteve, Louis Tiao, Léonie Borne, Mads Jensen, Maniteja Nandana, Manoj Kumar, Manvendra Singh, Marco, Mario Krell, Mark Bao, Mark Szepieniec, Martin Madsen, MartinBpr, MaryanMorel, Massil, Matheus, Mathieu Blondel, Mathieu Dubois, Matteo, Matthias Ekman, Max Moroz, Michael Scherer, michiaki ariga, Mikhail Korobov, Moussa Taifi, mrandrewandrade, Mridul Seth, nadya-p, Naoya Kanai, Nate George, Nelle Varoquaux, Nelson Liu, Nick James, NickleDave, Nico, Nicolas Goix, Nikolay Mayorov, ningchi, nlathia, okbalefthanded, Okhlopkov, Olivier Grisel, Panos Louridas, Paul Strickland, Perrine Letellier, pestrickland, Peter Fischer, Pieter, Ping-Yao, Chang, practicalswift, Preston Parry, Qimu Zheng, Rachit Kansal, Raghav RV, Ralf Gommers, Ramana.S, Rammig, Randy Olson, Rob Alexander, Robert Lutz, Robin Schucker, Rohan Jain, Ruifeng Zheng, Ryan Yu, Rémy Léone, saihttam, Saiwing Yeung, Sam Shleifer, Samuel St-Jean, Sartaj Singh, Sasank Chilamkurthy, saurabh.bansod, Scott Andrews, Scott Lowe, seales, Sebastian Raschka, Sebastian Saeger, Sebastián Vanrell, Sergei Lebedev, shagun Sodhani, shanmuga cv, Shashank Shekhar, shawpan, shengxiduan, Shota, shuckle16, Skipper Seabold, sklearn-ci, SmedbergM, srvanrell, Sébastien Lerique, Taranjeet, themrmax, Thierry, Thierry Guillemot, Thomas, Thomas Hallock, Thomas Moreau, Tim Head, tKammy, toastedcornflakes, Tom, TomDLT, Toshihiro Kamishima, tracer0tong, Trent Hauck, trevorstephens, Tue Vo, Varun, Varun Jewalikar, Viacheslav, Vighnesh Birodkar, Vikram, Villu Ruusmann, Vinayak Mehta, walter, waterponey, Wenhua Yang, Wenjian Huang, Will Welch, wyseguy7, xyguo, yanlend, Yaroslav Halchenko, yelite, Yen, YenChenLin, Yichuan Liu, Yoav Ram, Yoshiki, Zheng RuiFeng, zivori, Óscar Nájera