Version 0.16.1

April 14, 2015

Changelog

Bug fixes

Version 0.16

March 26, 2015

Highlights

  • Speed improvements (notably in cluster.DBSCAN), reduced memory requirements, bug-fixes and better default settings.

  • Multinomial Logistic regression and a path algorithm in linear_model.LogisticRegressionCV.

  • Out-of core learning of PCA via decomposition.IncrementalPCA.

  • Probability calibration of classifiers using calibration.CalibratedClassifierCV.

  • cluster.Birch clustering method for large-scale datasets.

  • Scalable approximate nearest neighbors search with Locality-sensitive hashing forests in neighbors.LSHForest.

  • Improved error messages and better validation when using malformed input data.

  • More robust integration with pandas dataframes.

Changelog

New features

Enhancements

Documentation improvements

Bug fixes

API changes summary

  • GridSearchCV and cross_val_score and other meta-estimators don’t convert pandas DataFrames into arrays any more, allowing DataFrame specific operations in custom estimators.

  • multiclass.fit_ovr, multiclass.predict_ovr, predict_proba_ovr, multiclass.fit_ovo, multiclass.predict_ovo, multiclass.fit_ecoc and multiclass.predict_ecoc are deprecated. Use the underlying estimators instead.

  • Nearest neighbors estimators used to take arbitrary keyword arguments and pass these to their distance metric. This will no longer be supported in scikit-learn 0.18; use the metric_params argument instead.

  • n_jobs parameter of the fit method shifted to the constructor of the

    LinearRegression class.

  • The predict_proba method of multiclass.OneVsRestClassifier now returns two probabilities per sample in the multiclass case; this is consistent with other estimators and with the method’s documentation, but previous versions accidentally returned only the positive probability. Fixed by Will Lamond and Lars Buitinck.

  • Change default value of precompute in linear_model.ElasticNet and linear_model.Lasso to False. Setting precompute to “auto” was found to be slower when n_samples > n_features since the computation of the Gram matrix is computationally expensive and outweighs the benefit of fitting the Gram for just one alpha. precompute="auto" is now deprecated and will be removed in 0.18 By Manoj Kumar.

  • Expose positive option in linear_model.enet_path and linear_model.enet_path which constrains coefficients to be positive. By Manoj Kumar.

  • Users should now supply an explicit average parameter to sklearn.metrics.f1_score, sklearn.metrics.fbeta_score, sklearn.metrics.recall_score and sklearn.metrics.precision_score when performing multiclass or multilabel (i.e. not binary) classification. By Joel Nothman.

  • scoring parameter for cross validation now accepts 'f1_micro', 'f1_macro' or 'f1_weighted'. 'f1' is now for binary classification only. Similar changes apply to 'precision' and 'recall'. By Joel Nothman.

  • The fit_intercept, normalize and return_models parameters in linear_model.enet_path and linear_model.lasso_path have been removed. They were deprecated since 0.14

  • From now onwards, all estimators will uniformly raise NotFittedError when any of the predict like methods are called before the model is fit. By Raghav RV.

  • Input data validation was refactored for more consistent input validation. The check_arrays function was replaced by check_array and check_X_y. By Andreas Müller.

  • Allow X=None in the methods radius_neighbors, kneighbors, kneighbors_graph and radius_neighbors_graph in sklearn.neighbors.NearestNeighbors and family. If set to None, then for every sample this avoids setting the sample itself as the first nearest neighbor. By Manoj Kumar.

  • Add parameter include_self in neighbors.kneighbors_graph and neighbors.radius_neighbors_graph which has to be explicitly set by the user. If set to True, then the sample itself is considered as the first nearest neighbor.

  • thresh parameter is deprecated in favor of new tol parameter in GMM, DPGMM and VBGMM. See Enhancements section for details. By Hervé Bredin.

  • Estimators will treat input with dtype object as numeric when possible. By Andreas Müller

  • Estimators now raise ValueError consistently when fitted on empty data (less than 1 sample or less than 1 feature for 2D input). By Olivier Grisel.

  • The shuffle option of linear_model.SGDClassifier, linear_model.SGDRegressor, linear_model.Perceptron, linear_model.PassiveAggressiveClassifier and linear_model.PassiveAggressiveRegressor now defaults to True.

  • cluster.DBSCAN now uses a deterministic initialization. The random_state parameter is deprecated. By Erich Schubert.

Code Contributors

A. Flaxman, Aaron Schumacher, Aaron Staple, abhishek thakur, Akshay, akshayah3, Aldrian Obaja, Alexander Fabisch, Alexandre Gramfort, Alexis Mignon, Anders Aagaard, Andreas Mueller, Andreas van Cranenburgh, Andrew Tulloch, Andrew Walker, Antony Lee, Arnaud Joly, banilo, Barmaley.exe, Ben Davies, Benedikt Koehler, bhsu, Boris Feld, Borja Ayerdi, Boyuan Deng, Brent Pedersen, Brian Wignall, Brooke Osborn, Calvin Giles, Cathy Deng, Celeo, cgohlke, chebee7i, Christian Stade-Schuldt, Christof Angermueller, Chyi-Kwei Yau, CJ Carey, Clemens Brunner, Daiki Aminaka, Dan Blanchard, danfrankj, Danny Sullivan, David Fletcher, Dmitrijs Milajevs, Dougal J. Sutherland, Erich Schubert, Fabian Pedregosa, Florian Wilhelm, floydsoft, Félix-Antoine Fortin, Gael Varoquaux, Garrett-R, Gilles Louppe, gpassino, gwulfs, Hampus Bengtsson, Hamzeh Alsalhi, Hanna Wallach, Harry Mavroforakis, Hasil Sharma, Helder, Herve Bredin, Hsiang-Fu Yu, Hugues SALAMIN, Ian Gilmore, Ilambharathi Kanniah, Imran Haque, isms, Jake VanderPlas, Jan Dlabal, Jan Hendrik Metzen, Jatin Shah, Javier López Peña, jdcaballero, Jean Kossaifi, Jeff Hammerbacher, Joel Nothman, Jonathan Helmus, Joseph, Kaicheng Zhang, Kevin Markham, Kyle Beauchamp, Kyle Kastner, Lagacherie Matthieu, Lars Buitinck, Laurent Direr, leepei, Loic Esteve, Luis Pedro Coelho, Lukas Michelbacher, maheshakya, Manoj Kumar, Manuel, Mario Michael Krell, Martin, Martin Billinger, Martin Ku, Mateusz Susik, Mathieu Blondel, Matt Pico, Matt Terry, Matteo Visconti dOC, Matti Lyra, Max Linke, Mehdi Cherti, Michael Bommarito, Michael Eickenberg, Michal Romaniuk, MLG, mr.Shu, Nelle Varoquaux, Nicola Montecchio, Nicolas, Nikolay Mayorov, Noel Dawe, Okal Billy, Olivier Grisel, Óscar Nájera, Paolo Puggioni, Peter Prettenhofer, Pratap Vardhan, pvnguyen, queqichao, Rafael Carrascosa, Raghav R V, Rahiel Kasim, Randall Mason, Rob Zinkov, Robert Bradshaw, Saket Choudhary, Sam Nicholls, Samuel Charron, Saurabh Jha, sethdandridge, sinhrks, snuderl, Stefan Otte, Stefan van der Walt, Steve Tjoa, swu, Sylvain Zimmer, tejesh95, terrycojones, Thomas Delteil, Thomas Unterthiner, Tomas Kazmar, trevorstephens, tttthomasssss, Tzu-Ming Kuo, ugurcaliskan, ugurthemaster, Vinayak Mehta, Vincent Dubourg, Vjacheslav Murashkin, Vlad Niculae, wadawson, Wei Xue, Will Lamond, Wu Jiang, x0l, Xinfan Meng, Yan Yi, Yu-Chin