Version 0.18.2

June 20, 2017

Changelog

Code Contributors

Aman Dalmia, Loic Esteve, Nate Guerin, Sergei Lebedev

Version 0.18.1

November 11, 2016

Changelog

Enhancements

Bug fixes

API changes summary

Trees and forests

  • The min_weight_fraction_leaf parameter of tree-based classifiers and regressors now assumes uniform sample weights by default if the sample_weight argument is not passed to the fit function. Previously, the parameter was silently ignored. #7301 by Nelson Liu.

  • Tree splitting criterion classes’ cloning/pickling is now memory safe. #7680 by Ibraim Ganiev.
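The min_weight_fraction_leaf change above can be illustrated with a minimal sketch. The synthetic data, the 0.4 fraction and the variable names are assumptions chosen for illustration only; the leaf count is read from the fitted tree's children_left array, where leaves are marked with -1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 100 samples, noisy threshold on the first feature.
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = (X[:, 0] + 0.1 * rng.randn(100) > 0.5).astype(int)

# No sample_weight is passed to fit, so every sample counts with equal
# weight and each leaf must hold at least 40% of the samples.
constrained = DecisionTreeClassifier(
    min_weight_fraction_leaf=0.4, random_state=0
).fit(X, y)
unconstrained = DecisionTreeClassifier(random_state=0).fit(X, y)

# Leaves are the nodes without a left child (marked with -1).
constrained_leaves = constrained.tree_.children_left == -1
n_leaves_constrained = int(constrained_leaves.sum())
n_leaves_unconstrained = int((unconstrained.tree_.children_left == -1).sum())
leaf_sizes = constrained.tree_.n_node_samples[constrained_leaves]

print(n_leaves_constrained, n_leaves_unconstrained, leaf_sizes)
```

With a fraction of 0.4 no more than one split can satisfy the constraint, so the constrained tree cannot grow past two leaves.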

Linear, kernelized and related models

Version 0.18

September 28, 2016

Model Selection Enhancements and API Changes

  • The model_selection module

    The new module sklearn.model_selection groups together the functionality formerly found in sklearn.cross_validation, sklearn.grid_search and sklearn.learning_curve. It introduces new possibilities such as nested cross-validation and better manipulation of parameter searches with Pandas.

    Many things will stay the same but there are some key differences. Read below to know more about the changes.

  • Data-independent CV splitters enabling nested cross-validation

    The new cross-validation splitters, defined in sklearn.model_selection, are no longer initialized with any data-dependent parameters such as y. Instead they expose a split method that takes in the data and returns a generator over the different splits.

    This change makes it possible to use the cross-validation splitters to perform nested cross-validation, facilitated by model_selection.GridSearchCV and model_selection.RandomizedSearchCV utilities.
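A minimal sketch of the data-independent splitters and of nested cross-validation follows. The iris data, the linear SVC and the C grid are assumptions picked for illustration; any estimator and grid would do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The splitters are built without any data; data is only passed to split().
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# split() returns a generator of (train_indices, test_indices) pairs.
first_train, first_test = next(outer_cv.split(X))

# Nested cross-validation: the inner splitter tunes C, the outer splitter
# scores the tuned model on data the search never saw.
search = GridSearchCV(
    SVC(kernel="linear"), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv
)
nested_scores = cross_val_score(search, X, y, cv=outer_cv)

print(nested_scores.mean())
```

Because neither splitter holds a reference to the data, the same inner_cv object can be reused on each outer training subset.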

  • The enhanced cv_results_ attribute

    The new cv_results_ attribute (of model_selection.GridSearchCV and model_selection.RandomizedSearchCV) introduced in lieu of the grid_scores_ attribute is a dict of 1D arrays with elements in each array corresponding to the parameter settings (i.e. search candidates).

    The cv_results_ dict can be easily imported into pandas as a DataFrame for exploring the search results.

    The cv_results_ arrays include scores for each cross-validation split (with keys such as 'split0_test_score'), as well as their mean ('mean_test_score') and standard deviation ('std_test_score').

    The ranks for the search candidates (based on their mean cross-validation score) are available at cv_results_['rank_test_score'].

    The values for each parameter are stored separately as numpy masked object arrays. The value for a given search candidate is masked if the corresponding parameter is not applicable. Additionally, a list of all the parameter dicts is stored at cv_results_['params'].
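As a sketch of how cv_results_ can be explored, the example below loads it into a pandas DataFrame and inspects a masked parameter array. The two-part grid is an assumption chosen so that gamma does not apply to every candidate.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Two sub-grids: 'gamma' only applies to the RBF kernel candidates.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
]
search = GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)

# One row per search candidate, one column per cv_results_ key.
results = pd.DataFrame(search.cv_results_)
summary = results[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
]

# 'param_gamma' is a masked array; the linear candidates are masked.
gamma_mask = np.ma.getmaskarray(search.cv_results_["param_gamma"])

print(summary.sort_values("rank_test_score"))
```

Sorting the frame by rank_test_score gives a quick leaderboard of the candidates.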

  • Parameters n_folds and n_iter renamed to n_splits

    Some parameter names have changed. The n_folds parameter of the new model_selection.KFold, model_selection.GroupKFold (see below for the name change), and model_selection.StratifiedKFold has been renamed to n_splits. The n_iter parameter of model_selection.ShuffleSplit, the new class model_selection.GroupShuffleSplit and model_selection.StratifiedShuffleSplit has also been renamed to n_splits.

  • Rename of splitter classes which accept group labels along with data

    The cross-validation splitters LabelKFold, LabelShuffleSplit, LeaveOneLabelOut and LeavePLabelOut have been renamed to model_selection.GroupKFold, model_selection.GroupShuffleSplit, model_selection.LeaveOneGroupOut and model_selection.LeavePGroupsOut respectively.

    Note the change from singular to plural form in model_selection.LeavePGroupsOut.

  • Fit parameter labels renamed to groups

    The labels parameter in the split method of the newly renamed splitters model_selection.GroupKFold, model_selection.LeaveOneGroupOut, model_selection.LeavePGroupsOut, model_selection.GroupShuffleSplit is renamed to groups following the new nomenclature of their class names.

  • Parameter n_labels renamed to n_groups

    The parameter n_labels in the newly renamed model_selection.LeavePGroupsOut is changed to n_groups.
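The renames described above can be seen together in a short sketch. The toy arrays and the group ids are assumptions made up for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, LeavePGroupsOut

# Hypothetical toy data: 8 samples in 4 groups (e.g. one id per subject).
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3])

# n_splits replaces n_folds; the group ids go to split() as 'groups'
# (formerly 'labels').
gkf = GroupKFold(n_splits=4)
gkf_splits = list(gkf.split(X, y, groups=groups))

# n_groups replaces n_labels: every pair of groups is held out in turn.
lpgo = LeavePGroupsOut(n_groups=2)
lpgo_splits = list(lpgo.split(X, y, groups=groups))

for train_idx, test_idx in gkf_splits:
    print("held-out groups:", np.unique(groups[test_idx]))
```

In every split, samples sharing a group id land on the same side of the train/test boundary.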

  • Training scores and Timing information

    cv_results_ also includes the training scores for each cross-validation split (with keys such as 'split0_train_score'), as well as their mean ('mean_train_score') and standard deviation ('std_train_score'). To avoid the cost of evaluating training score, set return_train_score=False.

    Additionally, the mean and standard deviation of the times taken to split, train and score the model across all the cross-validation splits are available at the keys 'mean_time' and 'std_time' respectively.
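The sketch below shows the effect of return_train_score on cv_results_. The decision tree and its max_depth grid are assumptions for illustration. The timing keys are looked up by suffix rather than hard-coded, since their exact names may differ between releases.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def run_search(return_train_score):
    """Fit a small grid search, with or without training scores."""
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [1, 2, 3]},
        cv=3,
        return_train_score=return_train_score,
    )
    return search.fit(X, y)


with_train = run_search(True)
without_train = run_search(False)

# Training-score keys only exist when return_train_score is enabled.
train_keys = sorted(k for k in with_train.cv_results_ if "train_score" in k)

# Timing keys are discovered by suffix instead of assuming their names.
timing_keys = sorted(k for k in with_train.cv_results_ if k.endswith("_time"))

print(train_keys)
print(timing_keys)
```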

Changelog

New features

Classifiers and Regressors

Other estimators

Model selection and evaluation

Enhancements

Trees and ensembles

Linear, kernelized and related models

Decomposition, manifold learning and clustering

Preprocessing and feature selection

Model evaluation and meta-estimators

Metrics

Miscellaneous

  • Added the n_jobs parameter to feature_selection.RFECV to compute the score on the test folds in parallel. By Manoj Kumar.

  • The codebase no longer contains Cython-generated C/C++ files: they are generated during the build. Distribution packages will still contain the generated C/C++ files. By Arthur Mensch.

  • Reduced the memory usage of utils.sparsefuncs.mean_variance_axis and utils.sparsefuncs.incr_mean_variance_axis for 32-bit float input arrays by supporting Cython fused types. By YenChen Lin.

  • ignore_warnings now accepts a category argument to ignore only the warnings of a specified type. By Thierry Guillemot.

  • Added the return_X_y parameter, which makes the loader return a (data, target) tuple, to the dataset loaders load_iris #7049, load_breast_cancer #7152, load_digits, load_diabetes, load_linnerud and load_boston #7154 by Manvendra Singh.

  • Simplified the clone function and deprecated support for estimators that modify parameters in __init__. #5540 by Andreas Müller.

  • When unpickling a scikit-learn estimator in a different version than the one the estimator was trained with, a UserWarning is raised. See the documentation on model persistence for more details. #7248 by Andreas Müller.
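As a short illustration of the return_X_y option listed above, the sketch below compares the default return value with the tuple form. Only load_iris and load_digits are used here; the variable names are illustrative.

```python
from sklearn.datasets import load_digits, load_iris

# Default behaviour: a Bunch object with .data and .target attributes.
bunch = load_iris()

# With return_X_y=True the loader returns a (data, target) tuple instead.
X, y = load_iris(return_X_y=True)
X_digits, y_digits = load_digits(return_X_y=True)

print(X.shape, y.shape, X_digits.shape)
```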

Bug fixes

Trees and ensembles

Linear, kernelized and related models

Decomposition, manifold learning and clustering

Preprocessing and feature selection

  • preprocessing.data._transform_selected now always passes a copy of X to the transform function when copy=True. #7194 by Caio Oliveira.

Model evaluation and meta-estimators

Metrics

Miscellaneous

  • model_selection.tests._search._check_param_grid now works correctly with all types that extend or implement Sequence (except string), including range (Python 3.x) and xrange (Python 2.x). #7323 by Viacheslav Kovalevskyi.

  • utils.extmath.randomized_range_finder is more numerically stable when many power iterations are requested, since it applies LU normalization by default. If n_iter<2, numerical issues are unlikely, so no normalization is applied. The available normalization options are 'none', 'LU' and 'QR'. #5141 by Giorgio Patrini.

  • Fix a bug where some formats of scipy.sparse matrix, and estimators with them as parameters, could not be passed to base.clone. By Loic Esteve.

  • datasets.load_svmlight_file can now read long int QID values. #7101 by Ibraim Ganiev.
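The normalization option of utils.extmath.randomized_range_finder mentioned above can be tried with a minimal sketch. The rank-5 test matrix and the argument values are assumptions for illustration; the normalizer is passed explicitly through the power_iteration_normalizer keyword rather than relying on the default.

```python
import numpy as np
from sklearn.utils.extmath import randomized_range_finder

# Hypothetical test matrix of exact rank 5.
rng = np.random.RandomState(0)
A = rng.randn(50, 5) @ rng.randn(5, 30)

# Request several power iterations with explicit LU normalization.
Q = randomized_range_finder(
    A, size=5, n_iter=4, power_iteration_normalizer="LU", random_state=0
)

# Q has orthonormal columns that approximately span the range of A,
# so projecting A onto Q should reproduce A.
reconstruction = Q @ (Q.T @ A)

print(Q.shape, np.abs(reconstruction - A).max())
```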

API changes summary

Linear, kernelized and related models

Decomposition, manifold learning and clustering

  • The old mixture.DPGMM is deprecated in favor of the new mixture.BayesianGaussianMixture (with the parameter weight_concentration_prior_type='dirichlet_process'). The new class solves the computational problems of the old class and computes the Gaussian mixture with a Dirichlet process prior faster than before. #7295 by Wei Xue and Thierry Guillemot.

  • The old mixture.VBGMM is deprecated in favor of the new mixture.BayesianGaussianMixture (with the parameter weight_concentration_prior_type='dirichlet_distribution'). The new class solves the computational problems of the old class and computes the Variational Bayesian Gaussian mixture faster than before. #6651 by Wei Xue and Thierry Guillemot.

  • The old mixture.GMM is deprecated in favor of the new mixture.GaussianMixture. The new class computes the Gaussian mixture faster than before and solves some computational problems of the old class. #6666 by Wei Xue and Thierry Guillemot.
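A minimal sketch of the replacement classes named above follows. The two-blob data, the component counts and max_iter are assumptions for illustration; the two BayesianGaussianMixture instances differ only in weight_concentration_prior_type.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

# Hypothetical data: two well-separated blobs of 100 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) - 4, rng.randn(100, 2) + 4])

# Replacement for mixture.GMM.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# Replacement for mixture.DPGMM: Dirichlet process prior on the weights.
dpgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

# Replacement for mixture.VBGMM: Dirichlet distribution prior.
vbgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_distribution",
    max_iter=500,
    random_state=0,
).fit(X)

print(np.round(dpgmm.weights_, 3))
print(np.round(vbgmm.weights_, 3))
```

n_components acts as an upper bound for the Bayesian variants, which can assign near-zero weight to components the data does not support.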

Model evaluation and meta-estimators

Code Contributors

Aditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander Minyushkin, Alexander Rudy, Alexandre Abadie, Alexandre Abraham, Alexandre Gramfort, Alexandre Saint, alexfields, Alvaro Ulloa, alyssaq, Amlan Kar, Andreas Mueller, andrew giessel, Andrew Jackson, Andrew McCulloh, Andrew Murray, Anish Shah, Arafat, Archit Sharma, Ariel Rokem, Arnaud Joly, Arnaud Rachez, Arthur Mensch, Ash Hoover, asnt, b0noI, Behzad Tabibian, Bernardo, Bernhard Kratzwald, Bhargav Mangipudi, blakeflei, Boyuan Deng, Brandon Carter, Brett Naul, Brian McFee, Caio Oliveira, Camilo Lamus, Carol Willing, Cass, CeShine Lee, Charles Truong, Chyi-Kwei Yau, CJ Carey, codevig, Colin Ni, Dan Shiebler, Daniel, Daniel Hnyk, David Ellis, David Nicholson, David Staub, David Thaler, David Warshaw, Davide Lasagna, Deborah, definitelyuncertain, Didi Bar-Zev, djipey, dsquareindia, edwinENSAE, Elias Kuthe, Elvis DOHMATOB, Ethan White, Fabian Pedregosa, Fabio Ticconi, fisache, Florian Wilhelm, Francis, Francis O’Donovan, Gael Varoquaux, Ganiev Ibraim, ghg, Gilles Louppe, Giorgio Patrini, Giovanni Cherubin, Giovanni Lanzani, Glenn Qian, Gordon Mohr, govin-vatsan, Graham Clenaghan, Greg Reda, Greg Stupp, Guillaume Lemaitre, Gustav Mörtberg, halwai, Harizo Rajaona, Harry Mavroforakis, hashcode55, hdmetor, Henry Lin, Hobson Lane, Hugo Bowne-Anderson, Igor Andriushchenko, Imaculate, Inki Hwang, Isaac Sijaranamual, Ishank Gulati, Issam Laradji, Iver Jordal, jackmartin, Jacob Schreiber, Jake Vanderplas, James Fiedler, James Routley, Jan Zikes, Janna Brettingen, jarfa, Jason Laska, jblackburne, jeff levesque, Jeffrey Blackburne, Jeffrey04, Jeremy Hintz, jeremynixon, Jeroen, Jessica Yung, Jill-Jênn Vie, Jimmy Jia, Jiyuan Qian, Joel Nothman, johannah, John, John Boersma, John Kirkham, John Moeller, jonathan.striebel, joncrall, Jordi, Joseph Munoz, Joshua Cook, JPFrancoia, jrfiedler, JulianKahnert, juliathebrave, kaichogami, KamalakerDadi, Kenneth Lyons, Kevin Wang, kingjr, kjell, Konstantin Podshumok, Kornel Kielczewski, Krishna Kalyan, krishnakalyan3, Kvle Putnam, Kyle Jackson, Lars Buitinck, ldavid, LeiG, LeightonZhang, Leland McInnes, Liang-Chi Hsieh, Lilian Besson, lizsz, Loic Esteve, Louis Tiao, Léonie Borne, Mads Jensen, Maniteja Nandana, Manoj Kumar, Manvendra Singh, Marco, Mario Krell, Mark Bao, Mark Szepieniec, Martin Madsen, MartinBpr, MaryanMorel, Massil, Matheus, Mathieu Blondel, Mathieu Dubois, Matteo, Matthias Ekman, Max Moroz, Michael Scherer, michiaki ariga, Mikhail Korobov, Moussa Taifi, mrandrewandrade, Mridul Seth, nadya-p, Naoya Kanai, Nate George, Nelle Varoquaux, Nelson Liu, Nick James, NickleDave, Nico, Nicolas Goix, Nikolay Mayorov, ningchi, nlathia, okbalefthanded, Okhlopkov, Olivier Grisel, Panos Louridas, Paul Strickland, Perrine Letellier, pestrickland, Peter Fischer, Pieter, Ping-Yao, Chang, practicalswift, Preston Parry, Qimu Zheng, Rachit Kansal, Raghav RV, Ralf Gommers, Ramana.S, Rammig, Randy Olson, Rob Alexander, Robert Lutz, Robin Schucker, Rohan Jain, Ruifeng Zheng, Ryan Yu, Rémy Léone, saihttam, Saiwing Yeung, Sam Shleifer, Samuel St-Jean, Sartaj Singh, Sasank Chilamkurthy, saurabh.bansod, Scott Andrews, Scott Lowe, seales, Sebastian Raschka, Sebastian Saeger, Sebastián Vanrell, Sergei Lebedev, shagun Sodhani, shanmuga cv, Shashank Shekhar, shawpan, shengxiduan, Shota, shuckle16, Skipper Seabold, sklearn-ci, SmedbergM, srvanrell, Sébastien Lerique, Taranjeet, themrmax, Thierry, Thierry Guillemot, Thomas, Thomas Hallock, Thomas Moreau, Tim Head, tKammy, toastedcornflakes, Tom, TomDLT, Toshihiro Kamishima, tracer0tong, Trent Hauck, trevorstephens, Tue Vo, Varun, Varun Jewalikar, Viacheslav, Vighnesh Birodkar, Vikram, Villu Ruusmann, Vinayak Mehta, walter, waterponey, Wenhua Yang, Wenjian Huang, Will Welch, wyseguy7, xyguo, yanlend, Yaroslav Halchenko, yelite, Yen, YenChenLin, Yichuan Liu, Yoav Ram, Yoshiki, Zheng RuiFeng, zivori, Óscar Nájera