Version 0.15.2

September 4, 2014

Bug fixes

Version 0.15.1

August 1, 2014

Bug fixes

  • Made cross_validation.cross_val_score use cross_validation.KFold instead of cross_validation.StratifiedKFold on multi-output classification problems. By Nikolay Mayorov.
  • Support unseen labels in preprocessing.LabelBinarizer to restore the default behavior of 0.14.1 for backward compatibility. By Hamzeh Alsalhi.
  • Fixed the cluster.KMeans stopping criterion that prevented early convergence detection. By Edward Raff and Gael Varoquaux.
  • Fixed the behavior of multiclass.OneVsOneClassifier in case of ties at the per-class vote level by computing the correct per-class sum of prediction scores. By Andreas Müller.
  • Made cross_validation.cross_val_score and grid_search.GridSearchCV accept Python lists as input data. This is especially useful for cross-validation and model selection of text processing pipelines. By Andreas Müller.
  • Fixed data input checks of most estimators to accept input data that implements the NumPy __array__ protocol. This is the case for pandas.Series and pandas.DataFrame in recent versions of pandas. By Gael Varoquaux.
  • Fixed a regression for linear_model.SGDClassifier with class_weight="auto" on data with non-contiguous labels. By Olivier Grisel.
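The __array__ protocol mentioned in the input-check fix above can be illustrated with a minimal sketch. The ArrayLike class is a hypothetical stand-in (it is not part of scikit-learn or pandas) for any container that exposes its data through __array__, and np.asarray stands in for the conversion step; the actual validation code used by the estimators is not shown in these notes.

```python
import numpy as np


class ArrayLike:
    """Hypothetical container that exposes its data via __array__."""

    def __init__(self, rows):
        self._rows = rows

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when converting the object to an ndarray.
        return np.array(self._rows, dtype=dtype)


# Any object implementing __array__ can be converted like a plain array.
X = np.asarray(ArrayLike([[1.0, 2.0], [3.0, 4.0]]))
print(type(X).__name__, X.shape)
```

pandas.Series and pandas.DataFrame expose the same hook, which is why accepting the protocol is enough to accept them as input.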

Version 0.15

July 15, 2014

Highlights

  • Many speed and memory improvements all across the code.
  • Huge speed and memory improvements to random forests (and extra trees), which also benefit more from parallel computing.
  • Incremental fit to BernoulliRBM.
  • Added cluster.AgglomerativeClustering for hierarchical agglomerative clustering with average linkage, complete linkage and ward strategies.
  • Added linear_model.RANSACRegressor for robust regression models.
  • Added dimensionality reduction with manifold.TSNE which can be used to visualize high-dimensional data.

Changelog

New features

Enhancements

Documentation improvements

  • The Working With Text Data tutorial has now been worked into the main documentation’s tutorial section. Includes exercises and skeletons for tutorial presentation. Original tutorial created by several authors including Olivier Grisel, Lars Buitinck and many others. Tutorial integration into the scikit-learn documentation by Jaques Grobler.
  • Added Computational Performance documentation. Discussion and examples of prediction latency / throughput and different factors that have influence over speed. Additional tips for building faster models and choosing a relevant compromise between speed and predictive power. By Eustache Diemert.

Bug fixes

  • Fixed bug in decomposition.MiniBatchDictionaryLearning: partial_fit was not working properly.
  • Fixed bug in linear_model.stochastic_gradient: l1_ratio was used as (1.0 - l1_ratio).
  • Fixed bug in multiclass.OneVsOneClassifier with string labels.
  • Fixed a bug in LassoCV and ElasticNetCV: they would not pre-compute the Gram matrix with precompute=True or precompute="auto" and n_samples > n_features. By Manoj Kumar.
  • Fixed incorrect estimation of the degrees of freedom in feature_selection.f_regression when variates are not centered. By Virgile Fritsch.
  • Fixed a race condition in parallel processing with pre_dispatch != "all" (for instance, in cross_val_score). By Olivier Grisel.
  • Raise error in cluster.FeatureAgglomeration and cluster.WardAgglomeration when no samples are given, rather than returning a meaningless clustering.
  • Fixed bug in gradient_boosting.GradientBoostingRegressor with loss='huber': gamma might not have been initialized.
  • Fixed feature importances as computed with a forest of randomized trees when fit with sample_weight != None and/or with bootstrap=True. By Gilles Louppe.
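To make the l1_ratio fix above concrete, here is a minimal sketch of an elastic net penalty in which l1_ratio weights the L1 term and (1.0 - l1_ratio) weights the L2 term. The function name elastic_net_penalty and the 0.5 factor on the L2 term are assumptions made for illustration; this is not the code in linear_model.stochastic_gradient.

```python
def elastic_net_penalty(weights, l1_ratio):
    """Mix of L1 and L2 penalties (illustrative helper, not library code).

    l1_ratio=1.0 gives a pure L1 penalty, l1_ratio=0.0 a pure L2 penalty.
    """
    l1 = sum(abs(w) for w in weights)
    l2 = 0.5 * sum(w * w for w in weights)  # 0.5 scaling is an assumption
    return l1_ratio * l1 + (1.0 - l1_ratio) * l2


weights = [1.0, -2.0]
pure_l1 = elastic_net_penalty(weights, l1_ratio=1.0)  # sum of |w|
pure_l2 = elastic_net_penalty(weights, l1_ratio=0.0)  # half the sum of w**2
print(pure_l1, pure_l2)
```

Swapping the two mixing weights, as the bug did, would turn a request for a mostly-L1 penalty into a mostly-L2 one.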

API changes summary

  • sklearn.hmm is deprecated. Its removal is planned for the 0.17 release.
  • Use of covariance.EllipticEnvelop has now been removed after deprecation. Please use covariance.EllipticEnvelope instead.
  • cluster.Ward is deprecated. Use cluster.AgglomerativeClustering instead.
  • cluster.WardClustering is deprecated. Use cluster.AgglomerativeClustering instead.
  • cross_validation.Bootstrap is deprecated. cross_validation.KFold or cross_validation.ShuffleSplit are recommended instead.
  • Direct support for the sequence of sequences (or list of lists) multilabel format is deprecated. To convert to and from the supported binary indicator matrix format, use MultiLabelBinarizer. By Joel Nothman.
  • Add score method to PCA following the model of probabilistic PCA and deprecate ProbabilisticPCA model whose score implementation is not correct. The computation now also exploits the matrix inversion lemma for faster computation. By Alexandre Gramfort.
  • The score method of FactorAnalysis now returns the average log-likelihood of the samples. Use score_samples to get log-likelihood of each sample. By Alexandre Gramfort.
  • Generating boolean masks (the setting indices=False) from cross-validation generators is deprecated. Support for masks will be removed in 0.17. The generators have produced arrays of indices by default since 0.10. By Joel Nothman.
  • 1-d arrays containing strings with dtype=object (as used in Pandas) are now considered valid classification targets. This fixes a regression from version 0.13 in some classifiers. By Joel Nothman.
  • Fix wrong explained_variance_ratio_ attribute in RandomizedPCA. By Alexandre Gramfort.
  • Fit alphas for each l1_ratio instead of mean_l1_ratio in linear_model.ElasticNetCV and linear_model.LassoCV. This changes the shape of alphas_ from (n_alphas,) to (n_l1_ratio, n_alphas) if the l1_ratio provided is a 1-D array-like object of length greater than one. By Manoj Kumar.
  • Fix linear_model.ElasticNetCV and linear_model.LassoCV when fitting intercept and input data is sparse. The automatic grid of alphas was not computed correctly and the scaling with normalize was wrong. By Manoj Kumar.
  • Fix wrong maximal number of features drawn (max_features) at each split for decision trees, random forests and gradient tree boosting. Previously, the count of drawn features started only after one non-constant feature was found in the split. This bug fix will affect computational and generalization performance of those algorithms in the presence of constant features. To get back previous generalization performance, you should modify the value of max_features. By Arnaud Joly.
  • Fix wrong maximal number of features drawn (max_features) at each split for ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor. Previously, only non-constant features in the split were counted as drawn. Now constant features are counted as drawn. Furthermore, at least one feature must be non-constant in order to make a valid split. This bug fix will affect computational and generalization performance of extra trees in the presence of constant features. To get back previous generalization performance, you should modify the value of max_features. By Arnaud Joly.
  • Fix utils.compute_class_weight when class_weight=="auto". Previously it was broken for input of non-integer dtype and the weighted array that was returned was wrong. By Manoj Kumar.
  • Fix cross_validation.Bootstrap to raise ValueError when n_train + n_test > n. By Ronald Phlypo.
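The two multilabel formats named in the deprecation note above (sequence of sequences versus binary indicator matrix) can be related with a short pure-Python sketch. The helpers to_indicator and from_indicator are hypothetical names used only for illustration; they are not the MultiLabelBinarizer implementation, which is the supported way to do this conversion.

```python
def to_indicator(label_sets):
    """Convert a sequence of label sequences to (classes, indicator rows)."""
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: j for j, label in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # mark each label present in this sample
        rows.append(row)
    return classes, rows


def from_indicator(classes, rows):
    """Recover one tuple of labels per sample from the indicator rows."""
    return [tuple(c for c, flag in zip(classes, row) if flag) for row in rows]


# Three samples: two labels, one label, and no label at all.
classes, Y = to_indicator([["news", "sports"], ["sports"], []])
labels_back = from_indicator(classes, Y)
print(classes, Y, labels_back)
```

Each row of the indicator matrix has one column per known class, so a sample with no labels is simply a row of zeros.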

People

List of contributors for release 0.15 by number of commits.

  • 312 Olivier Grisel
  • 275 Lars Buitinck
  • 221 Gael Varoquaux
  • 148 Arnaud Joly
  • 134 Johannes Schönberger
  • 119 Gilles Louppe
  • 113 Joel Nothman
  • 111 Alexandre Gramfort
  • 95 Jaques Grobler
  • 89 Denis Engemann
  • 83 Peter Prettenhofer
  • 83 Alexander Fabisch
  • 62 Mathieu Blondel
  • 60 Eustache Diemert
  • 60 Nelle Varoquaux
  • 49 Michael Bommarito
  • 45 Manoj-Kumar-S
  • 28 Kyle Kastner
  • 26 Andreas Mueller
  • 22 Noel Dawe
  • 21 Maheshakya Wijewardena
  • 21 Brooke Osborn
  • 21 Hamzeh Alsalhi
  • 21 Jake VanderPlas
  • 21 Philippe Gervais
  • 19 Bala Subrahmanyam Varanasi
  • 12 Ronald Phlypo
  • 10 Mikhail Korobov
  • 8 Thomas Unterthiner
  • 8 Jeffrey Blackburne
  • 8 eltermann
  • 8 bwignall
  • 7 Ankit Agrawal
  • 7 CJ Carey
  • 6 Daniel Nouri
  • 6 Chen Liu
  • 6 Michael Eickenberg
  • 6 ugurthemaster
  • 5 Aaron Schumacher
  • 5 Baptiste Lagarde
  • 5 Rajat Khanduja
  • 5 Robert McGibbon
  • 5 Sergio Pascual
  • 4 Alexis Metaireau
  • 4 Ignacio Rossi
  • 4 Virgile Fritsch
  • 4 Sebastian Säger
  • 4 Ilambharathi Kanniah
  • 4 sdenton4
  • 4 Robert Layton
  • 4 Alyssa
  • 4 Amos Waterland
  • 3 Andrew Tulloch
  • 3 murad
  • 3 Steven Maude
  • 3 Karol Pysniak
  • 3 Jacques Kvam
  • 3 cgohlke
  • 3 cjlin
  • 3 Michael Becker
  • 3 hamzeh
  • 3 Eric Jacobsen
  • 3 john collins
  • 3 kaushik94
  • 3 Erwin Marsi
  • 2 csytracy
  • 2 LK
  • 2 Vlad Niculae
  • 2 Laurent Direr
  • 2 Erik Shilts
  • 2 Raul Garreta
  • 2 Yoshiki Vázquez Baeza
  • 2 Yung Siang Liau
  • 2 abhishek thakur
  • 2 James Yu
  • 2 Rohit Sivaprasad
  • 2 Roland Szabo
  • 2 amormachine
  • 2 Alexis Mignon
  • 2 Oscar Carlsson
  • 2 Nantas Nardelli
  • 2 jess010
  • 2 kowalski87
  • 2 Andrew Clegg
  • 2 Federico Vaggi
  • 2 Simon Frid
  • 2 Félix-Antoine Fortin
  • 1 Ralf Gommers
  • 1 t-aft
  • 1 Ronan Amicel
  • 1 Rupesh Kumar Srivastava
  • 1 Ryan Wang
  • 1 Samuel Charron
  • 1 Samuel St-Jean
  • 1 Fabian Pedregosa
  • 1 Skipper Seabold
  • 1 Stefan Walk
  • 1 Stefan van der Walt
  • 1 Stephan Hoyer
  • 1 Allen Riddell
  • 1 Valentin Haenel
  • 1 Vijay Ramesh
  • 1 Will Myers
  • 1 Yaroslav Halchenko
  • 1 Yoni Ben-Meshulam
  • 1 Yury V. Zaytsev
  • 1 adrinjalali
  • 1 ai8rahim
  • 1 alemagnani
  • 1 alex
  • 1 benjamin wilson
  • 1 chalmerlowe
  • 1 dzikie drożdże
  • 1 jamestwebber
  • 1 matrixorz
  • 1 popo
  • 1 samuela
  • 1 François Boulogne
  • 1 Alexander Measure
  • 1 Ethan White
  • 1 Guilherme Trein
  • 1 Hendrik Heuer
  • 1 IvicaJovic
  • 1 Jan Hendrik Metzen
  • 1 Jean Michel Rouly
  • 1 Eduardo Ariño de la Rubia
  • 1 Jelle Zijlstra
  • 1 Eddy L O Jansson
  • 1 Denis
  • 1 John
  • 1 John Schmidt
  • 1 Jorge Cañardo Alastuey
  • 1 Joseph Perla
  • 1 Joshua Vredevoogd
  • 1 José Ricardo
  • 1 Julien Miotte
  • 1 Kemal Eren
  • 1 Kenta Sato
  • 1 David Cournapeau
  • 1 Kyle Kelley
  • 1 Daniele Medri
  • 1 Laurent Luce
  • 1 Laurent Pierron
  • 1 Luis Pedro Coelho
  • 1 DanielWeitzenfeld
  • 1 Craig Thompson
  • 1 Chyi-Kwei Yau
  • 1 Matthew Brett
  • 1 Matthias Feurer
  • 1 Max Linke
  • 1 Chris Filo Gorgolewski
  • 1 Charles Earl
  • 1 Michael Hanke
  • 1 Michele Orrù
  • 1 Bryan Lunt
  • 1 Brian Kearns
  • 1 Paul Butler
  • 1 Paweł Mandera
  • 1 Peter
  • 1 Andrew Ash
  • 1 Pietro Zambelli
  • 1 staubda