August 7, 2013
Missing values with sparse and dense matrices can be imputed with the transformer preprocessing.Imputer. By Nicolas Trésegnie.
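A minimal sketch of mean imputation. Note that in current scikit-learn releases the transformer lives at sklearn.impute.SimpleImputer (preprocessing.Imputer was later removed), so the example below uses the modern name:

```python
import numpy as np
from sklearn.impute import SimpleImputer  # modern successor of preprocessing.Imputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

# replace each NaN with the mean of the observed values in its column
imp = SimpleImputer(strategy="mean")
X_filled = imp.fit_transform(X)
# column 0 has observed values 1 and 7, so the NaN becomes (1 + 7) / 2 = 4
```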
The core implementation of decision trees has been rewritten from scratch, allowing for faster tree induction and lower memory consumption in all tree-based estimators. By Gilles Louppe.
Added grid_search.ParameterSampler for randomized hyperparameter optimization. By Andreas Müller.
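A short sketch of drawing random hyperparameter candidates. In current scikit-learn releases this class lives in sklearn.model_selection rather than grid_search, so the import below uses the modern location:

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler  # grid_search.ParameterSampler in 0.14

# distributions are sampled; plain lists are sampled uniformly
param_dist = {"alpha": uniform(0, 1), "fit_intercept": [True, False]}
samples = list(ParameterSampler(param_dist, n_iter=4, random_state=0))
# yields 4 dicts, each a random candidate parameter setting
```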
Added biclustering algorithms (sklearn.cluster.bicluster.SpectralBiclustering), data generation methods (sklearn.datasets.make_checkerboard), and scoring metrics (sklearn.metrics.consensus_score). By Kemal Eren.
cross_validation.cross_val_score now supports the use of advanced scoring functions such as area under the ROC curve and f-beta scores. See The scoring parameter: defining model evaluation rules for details. By Andreas Müller and Lars Buitinck.
Multi-label classification output is now supported by metrics.recall_score. By Arnaud Joly.
The default value of min_df in feature_extraction.text.TfidfVectorizer, which used to be 2, has been reset to 1 to avoid unpleasant surprises (empty vocabularies) for novice users who try it out on tiny document collections. A value of at least 2 is still recommended for practical use.
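A minimal sketch of the min_df behavior on a tiny corpus, showing why the old default of 2 surprised users:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["apple banana", "apple cherry", "banana durian"]

# min_df=1 (the new default) keeps every term
v1 = TfidfVectorizer(min_df=1).fit(docs)

# min_df=2 (the old default) drops terms appearing in fewer than two documents,
# leaving only "apple" and "banana" here
v2 = TfidfVectorizer(min_df=2).fit(docs)
```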
linear_model.SGDRegressor now has a sparsify method that converts its coef_ into a sparse matrix, meaning stored models trained using this estimator can be made much more compact.
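A brief sketch of sparsify in use, assuming a current scikit-learn install; an L1 penalty is used here only to make many coefficients exactly zero so the sparse representation pays off:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
y = X[:, 0] * 3.0 + rng.randn(100) * 0.1  # only feature 0 matters

est = SGDRegressor(penalty="l1", random_state=0).fit(X, y)
est.sparsify()               # convert coef_ to a scipy sparse matrix in place
pred = est.predict(X[:5])    # prediction still works on the sparsified model
```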
linear_model.SGDClassifier now produces multiclass probability estimates when trained under log loss or modified Huber loss.
Hyperlinks to documentation in example code on the website by Martin Luessi.
A bug that caused ensemble.AdaBoostClassifier to output incorrect probabilities has been fixed.
Feature selectors now share a mixin providing consistent get_support methods. By Joel Nothman.
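A short sketch of the shared get_support interface, using SelectKBest as a representative selector:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(f_classif, k=2).fit(X, y)

mask = selector.get_support()             # boolean mask over the input features
idx = selector.get_support(indices=True)  # or the integer indices of kept features
```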
grid_search.RandomizedSearchCV can now generally be pickled. By Joel Nothman.
The default number of components for sklearn.decomposition.RandomizedPCA is now correctly documented to be n_features. This was the default behavior, so programs using it will continue to work as they did.
Verbose output in sklearn.ensemble.gradient_boosting now uses a column format and prints progress in decreasing frequency. It also shows the remaining time. By Peter Prettenhofer.
sklearn.ensemble.gradient_boosting provides out-of-bag improvement oob_improvement_ rather than the OOB score for model selection. An example that shows how to use OOB estimates to select the number of trees was added. By Peter Prettenhofer.
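A minimal sketch of the idea: with subsample < 1.0, each oob_improvement_[i] estimates the loss improvement from adding tree i, and the cumulative sum suggests where to stop. The stopping rule below is an illustrative heuristic, not the library's built-in procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# subsample < 1.0 enables out-of-bag estimates
clf = GradientBoostingClassifier(n_estimators=50, subsample=0.5,
                                 random_state=0).fit(X, y)

# cumulative OOB improvement; pick the tree count where it peaks
cum_oob = np.cumsum(clf.oob_improvement_)
best_n = int(np.argmax(cum_oob)) + 1
```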
Fixed a bug in sklearn.covariance.GraphLassoCV: the ‘alphas’ parameter now works as expected when given a list of values. By Philippe Gervais.
Fixed an important bug in sklearn.covariance.GraphLassoCV that prevented all folds provided by a CV object from being used (only the first three were used). When providing a CV object, execution time may thus increase significantly compared to the previous version (but results are now correct). By Philippe Gervais.
The grid_search module is now tested with multi-output data. By Arnaud Joly.
neighbors.RadiusNeighborsRegressor and neighbors.RadiusNeighborsClassifier now support multioutput data. By Arnaud Joly.
Random state in LibSVM-based estimators (svm.NuSVR) can now be controlled. This is useful to ensure consistency in the probability estimates for the classifiers trained with probability=True. By Vlad Niculae.
Improved documentation on multi-class, multi-label and multi-output classification by Yannick Schwartz and Arnaud Joly.
Speed optimization of the hmm module by Mikhail Korobov.
API changes summary
Testing scikit-learn with sklearn.test() is deprecated. Use nosetests sklearn from the command line.
Feature importances in tree.DecisionTreeRegressor and all derived ensemble estimators are now computed on the fly when accessing the feature_importances_ attribute. Setting compute_importances=True is no longer required. By Gilles Louppe.
linear_model.enet_path can return its results in the same format as that of linear_model.lars_path. This is done by setting the return_models parameter to False. By Jaques Grobler and Alexandre Gramfort.
grid_search.IterGrid was renamed to grid_search.ParameterGrid.
Fixed a bug in KFold causing imperfect class balance in some cases. By Alexandre Gramfort and Tadej Janež.
sklearn.neighbors.BallTree has been refactored, and a sklearn.neighbors.KDTree has been added which shares the same interface. The Ball Tree now works with a wide variety of distance metrics. Both classes have many new methods, including single-tree and dual-tree queries, breadth-first and depth-first searching, and more advanced queries such as kernel density estimation and 2-point correlation functions. By Jake Vanderplas.
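A minimal sketch of the shared interface: both trees are built from an array of points and answer k-nearest-neighbor queries the same way:

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))

tree = KDTree(X)                     # BallTree(X) exposes the same interface
dist, ind = tree.query(X[:1], k=3)   # 3 nearest neighbors of the first point
# the nearest neighbor of a training point is itself, at distance 0

btree = BallTree(X, metric="manhattan")  # Ball Tree supports many metrics
```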
Support for scipy.spatial.cKDTree within neighbors queries has been removed, and the functionality replaced with the new KDTree class.
sklearn.neighbors.KernelDensity has been added, which performs efficient kernel density estimation with a variety of kernels.
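A short sketch of kernel density estimation on a bimodal sample; the estimated log-density should be higher at the two modes than in the valley between them:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
# 1-D bimodal sample: modes near 0 and 5
X = np.concatenate([rng.normal(0, 1, 200),
                    rng.normal(5, 1, 200)])[:, None]

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)
log_dens = kde.score_samples(np.array([[0.0], [2.5], [5.0]]))
# log-density at each query point; the valley point 2.5 scores lowest
```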
sklearn.decomposition.KernelPCA now always returns output with n_components components, unless the new parameter remove_zero_eig is set to True. This new behavior is consistent with the way kernel PCA was always documented; previously, the removal of components with zero eigenvalues was tacitly performed on all data.
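A minimal sketch of the guarantee, assuming a current scikit-learn install: with the default remove_zero_eig=False, the transform returns exactly n_components columns:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.RandomState(0)
X = rng.randn(30, 5)

# n_components is honored exactly by default (remove_zero_eig=False);
# remove_zero_eig=True may instead drop zero-eigenvalue components
kpca = KernelPCA(n_components=10, kernel="rbf").fit(X)
Xt = kpca.transform(X)
```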
gcv_mode="auto" no longer tries to perform SVD on a densified sparse matrix in linear_model.RidgeCV.
Sparse matrix support in sklearn.decomposition.RandomizedPCA is now deprecated in favor of the new TruncatedSVD class.
cross_validation.KFold and cross_validation.StratifiedKFold now enforce n_folds >= 2; otherwise a ValueError is raised. By Olivier Grisel.
The charset and charset_errors parameters of the text feature extractors were renamed encoding and decode_error.
Attributes in OrthogonalMatchingPursuit have been deprecated (copy_X, Gram, …), and precompute_gram was renamed precompute for consistency. See #2224.
sklearn.preprocessing.StandardScalernow converts integer input to float, and raises a warning. Previously it rounded for dense integer input.
sklearn.multiclass.OneVsRestClassifier now has a decision_function method. This will return the distance of each sample from the decision boundary for each class, as long as the underlying estimators implement the decision_function method. By Kyle Kastner.
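A minimal sketch of the delegated decision_function, using LinearSVC (which implements decision_function) as the underlying estimator:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# LinearSVC implements decision_function, so the wrapper exposes it too
ovr = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)
scores = ovr.decision_function(X[:5])  # one margin per class and per sample
```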
Better input validation, warning on unexpected shapes for y.
List of contributors for release 0.14 by number of commits.
277 Gilles Louppe
245 Lars Buitinck
187 Andreas Mueller
124 Arnaud Joly
112 Jaques Grobler
109 Gael Varoquaux
107 Olivier Grisel
102 Noel Dawe
99 Kemal Eren
79 Joel Nothman
75 Jake VanderPlas
73 Nelle Varoquaux
71 Vlad Niculae
65 Peter Prettenhofer
64 Alexandre Gramfort
54 Mathieu Blondel
38 Nicolas Trésegnie
27 Denis Engemann
25 Yann N. Dauphin
19 Justin Vincent
17 Robert Layton
15 Doug Coleman
14 Michael Eickenberg
13 Robert Marchman
11 Fabian Pedregosa
11 Philippe Gervais
10 Jim Holmström
10 Tadej Janež
9 Mikhail Korobov
9 Steven De Gryze
7 Ben Root
7 Hrishikesh Huilgolkar
6 Kyle Kastner
6 Martin Luessi
6 Rob Speer
5 Federico Vaggi
5 Raul Garreta
5 Rob Zinkov
4 Ken Geis
3 A. Flaxman
3 Denton Cockburn
3 Dougal Sutherland
3 Ian Ozsvald
3 Johannes Schönberger
3 Robert McGibbon
3 Roman Sinayev
3 Szabo Roland
2 Diego Molla
2 Imran Haque
2 Jochen Wersdörfer
2 Sergey Karayev
2 Yannick Schwartz
1 Abhijeet Kolhe
1 Alexander Fabisch
1 Bastiaan van den Berg
1 Benjamin Peterson
1 Daniel Velkov
1 Fazlul Shahriar
1 Felix Brockherde
1 Félix-Antoine Fortin
1 Harikrishnan S
1 Jack Hale
1 James McDermott
1 John Benediktsson
1 John Zwinck
1 Joshua Vredevoogd
1 Justin Pati
1 Kevin Hughes
1 Kyle Kelley
1 Matthias Ekman
1 Miroslav Shubernetskiy
1 Naoki Orii
1 Norbert Crombach
1 Rafael Cunha de Almeida
1 Rolando Espinoza La fuente
1 Seamus Abshere
1 Sergey Feldman
1 Sergio Medina
1 Stefano Lattarini
1 Steve Koch
1 Sturla Molden
1 Thomas Jarosch
1 Yaroslav Halchenko