Version 0.13.1

February 23, 2013

The 0.13.1 release only fixes some bugs and does not add any new functionality.

Changelog

Fixed a testing error caused by the function cross_validation.train_test_split being interpreted as a test, by Yaroslav Halchenko.
Fixed a bug in the reassignment of small clusters in cluster.MiniBatchKMeans, by Gael Varoquaux.
Fixed default value of gamma in decomposition.KernelPCA by Lars Buitinck.
Updated joblib to 0.7.0d by Gael Varoquaux.
Fixed scaling of the deviance in ensemble.GradientBoostingClassifier by Peter Prettenhofer.
Better tie-breaking in multiclass.OneVsOneClassifier by Andreas Müller.
Other small improvements to tests and documentation.

People

List of contributors for release 0.13.1 by number of commits.

16 Lars Buitinck
12 Andreas Müller
8 Gael Varoquaux
5 Robert Marchman
3 Peter Prettenhofer
2 Hrishikesh Huilgolkar
1 Bastiaan van den Berg
1 Diego Molla
1 Gilles Louppe
1 Mathieu Blondel
1 Nelle Varoquaux
1 Rafael Cunha de Almeida
1 Rolando Espinoza La fuente
1 Vlad Niculae
1 Yaroslav Halchenko

Version 0.13

January 21, 2013

New Estimator Classes

dummy.DummyClassifier and dummy.DummyRegressor, two data-independent predictors by Mathieu Blondel. Useful to sanity-check your estimators. See Dummy estimators in the user guide. Multioutput support added by Arnaud Joly.
decomposition.FactorAnalysis, a transformer implementing the classical factor analysis, by Christian Osendorfer and Alexandre Gramfort. See Factor Analysis in the user guide.
feature_extraction.FeatureHasher, a transformer implementing the “hashing trick” for fast, low-memory feature extraction from string fields, by Lars Buitinck, and feature_extraction.text.HashingVectorizer for text documents, by Olivier Grisel. See Feature hashing and Vectorizing a large text corpus with the hashing trick for the documentation and sample usage.
pipeline.FeatureUnion, a transformer that concatenates results of several other transformers by Andreas Müller. See FeatureUnion: composite feature spaces in the user guide.
random_projection.GaussianRandomProjection, random_projection.SparseRandomProjection and the function random_projection.johnson_lindenstrauss_min_dim. The first two are transformers implementing Gaussian and sparse random projection matrices, by Olivier Grisel and Arnaud Joly. See Random Projection in the user guide.
kernel_approximation.Nystroem, a transformer for approximating arbitrary kernels by Andreas Müller. See Nystroem Method for Kernel Approximation in the user guide.
preprocessing.OneHotEncoder, a transformer that computes binary encodings of categorical features by Andreas Müller. See Encoding categorical features in the user guide.
linear_model.PassiveAggressiveClassifier and linear_model.PassiveAggressiveRegressor, predictors implementing an efficient stochastic optimization for linear models by Rob Zinkov and Mathieu Blondel. See Passive Aggressive Algorithms in the user guide.
ensemble.RandomTreesEmbedding, a transformer for creating high-dimensional sparse representations using ensembles of totally random trees by Andreas Müller. See Totally Random Trees Embedding in the user guide.
manifold.SpectralEmbedding and function manifold.spectral_embedding, implementing the “laplacian eigenmaps” transformation for non-linear dimensionality reduction by Wei Li. See Spectral Embedding in the user guide.
isotonic.IsotonicRegression by Fabian Pedregosa, Alexandre Gramfort and Nelle Varoquaux.
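
As a rough illustration of the “hashing trick” mentioned for feature_extraction.FeatureHasher above, the sketch below maps string tokens straight to column indices with a hash function, so no vocabulary has to be kept in memory. It is not scikit-learn's implementation: the helper name hash_features, the use of hashlib.md5 as the hash function and the n_features default are all assumptions made for this example.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map string tokens to a fixed-length count vector (illustrative only).

    Assumption: md5 stands in for the hash function; the real library may
    use a different hash and extra refinements.
    """
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Reduce the hash to a column index; distinct tokens may collide.
        index = int.from_bytes(digest[:8], "big") % n_features
        vector[index] += 1
    return vector


doc_a = ["cat", "sat", "mat", "cat"]
doc_b = ["dog", "sat", "log"]
vec_a = hash_features(doc_a)
vec_b = hash_features(doc_b)

# Both documents land in the same fixed-width feature space.
print(len(vec_a), len(vec_b))
```

The output width depends only on n_features, never on how many distinct tokens were seen. The price is that unrelated tokens can share a column.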

Changelog

metrics.zero_one_loss (formerly metrics.zero_one) now has an option for normalized output that reports the fraction of misclassifications, rather than the raw number of misclassifications. By Kyle Beauchamp.
tree.DecisionTreeClassifier and all derived ensemble models now support sample weighting, by Noel Dawe and Gilles Louppe.
Speedup improvement when using bootstrap samples in forests of randomized trees, by Peter Prettenhofer and Gilles Louppe.
Partial dependence plots for Gradient-boosted trees in ensemble.partial_dependence.partial_dependence by Peter Prettenhofer. See Partial Dependence and Individual Conditional Expectation Plots for an example.
The table of contents on the website has now been made expandable by Jaques Grobler.
feature_selection.SelectPercentile now breaks ties deterministically instead of returning all equally ranked features.
feature_selection.SelectKBest and feature_selection.SelectPercentile are more numerically stable since they use scores, rather than p-values, to rank results. This means that they might sometimes select different features than they did previously.
Ridge regression and ridge classification fitting with sparse_cg solver no longer has quadratic memory complexity, by Lars Buitinck and Fabian Pedregosa.
Ridge regression and ridge classification now support a new fast solver called lsqr, by Mathieu Blondel.
Speed up of metrics.precision_recall_curve by Conrad Lee.
Added support for reading/writing svmlight files with pairwise preference attribute (qid in svmlight file format) in datasets.dump_svmlight_file and datasets.load_svmlight_file by Fabian Pedregosa.
Faster and more robust metrics.confusion_matrix and Clustering performance evaluation by Wei Li.
cross_validation.cross_val_score now works with precomputed kernels and affinity matrices, by Andreas Müller.
LARS algorithm made more numerically stable, with heuristics to drop regressors that are too correlated and to stop the path when numerical noise becomes predominant, by Gael Varoquaux.
New kernel metrics.chi2_kernel by Andreas Müller, often used in computer vision applications.
Fixed a longstanding bug in naive_bayes.BernoulliNB, by Shaun Jackman.
Implemented predict_proba in multiclass.OneVsRestClassifier, by Andrew Winterman.
Improve consistency in gradient boosting: estimators ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier use the estimator tree.DecisionTreeRegressor instead of the tree._tree.Tree data structure by Arnaud Joly.
Fixed a floating point exception in the decision trees module, by Seberg.
Fixed a failure in metrics.roc_curve when y_true has only one class, by Wei Li.
Add the metrics.mean_absolute_error function which computes the mean absolute error. The metrics.mean_squared_error, metrics.mean_absolute_error and metrics.r2_score metrics support multioutput by Arnaud Joly.
Fixed class_weight support in svm.LinearSVC and linear_model.LogisticRegression by Andreas Müller. In earlier releases the meaning of class_weight was reversed: a higher weight erroneously led to fewer positive predictions for the given class.
Improve narrative documentation and consistency in sklearn.metrics for regression and classification metrics by Arnaud Joly.
Fixed a bug in sklearn.svm.SVC when using csr-matrices with unsorted indices by Xinfan Meng and Andreas Müller.
cluster.MiniBatchKMeans: Added random reassignment of cluster centers that have few observations attached to them, by Gael Varoquaux.
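
To make the normalized and raw outputs of metrics.zero_one_loss described above concrete, here is a minimal pure-Python sketch. It only mirrors the behavior stated in these notes (fraction of misclassifications when normalize=True, raw count otherwise). The function name zero_one_loss_sketch and its input checks are assumptions for the example, not the library's code.

```python
def zero_one_loss_sketch(y_true, y_pred, normalize=True):
    """Count misclassifications, optionally as a fraction (illustrative only)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # A sample is an error when the predicted label differs from the true one.
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    if normalize:
        return errors / len(y_true)
    return errors


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]

fraction = zero_one_loss_sketch(y_true, y_pred)                 # 2 of 5 wrong
count = zero_one_loss_sketch(y_true, y_pred, normalize=False)   # raw count
print(fraction, count)
```

Code written against the old metrics.zero_one, which returned the raw count by default, would need normalize=False to keep the same numbers.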

API changes summary

Renamed all occurrences of n_atoms to n_components for consistency. This applies to decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning, decomposition.dict_learning, decomposition.dict_learning_online.
Renamed all occurrences of max_iters to max_iter for consistency. This applies to semi_supervised.LabelPropagation and semi_supervised.label_propagation.LabelSpreading.
Renamed all occurrences of learn_rate to learning_rate for consistency in ensemble.BaseGradientBoosting and ensemble.GradientBoostingRegressor.
The module sklearn.linear_model.sparse is gone. Sparse matrix support was already integrated into the “regular” linear models.
sklearn.metrics.mean_square_error, which incorrectly returned the accumulated error, was removed. Use metrics.mean_squared_error instead.
Passing class_weight parameters to fit methods is no longer supported. Pass them to estimator constructors instead.
GMMs no longer have decode and rvs methods. Use the score, predict or sample methods instead.
The solver fit option in Ridge regression and classification is now deprecated and will be removed in v0.14. Use the constructor option instead.
feature_extraction.DictVectorizer now returns sparse matrices in the CSR format, instead of COO.
Renamed k in cross_validation.KFold and cross_validation.StratifiedKFold to n_folds, renamed n_bootstraps to n_iter in cross_validation.Bootstrap.
Renamed all occurrences of n_iterations to n_iter for consistency. This applies to cross_validation.ShuffleSplit, cross_validation.StratifiedShuffleSplit, utils.extmath.randomized_range_finder and utils.extmath.randomized_svd.
Replaced rho in linear_model.ElasticNet and linear_model.SGDClassifier by l1_ratio. The rho parameter had different meanings; l1_ratio was introduced to avoid confusion. It has the same meaning as previously rho in linear_model.ElasticNet and (1-rho) in linear_model.SGDClassifier.
linear_model.LassoLars and linear_model.Lars now store a list of paths in the case of multiple targets, rather than an array of paths.
The attribute gmm of hmm.GMMHMM was renamed to gmm_ to adhere more strictly to the API.
cluster.spectral_embedding was moved to manifold.spectral_embedding.
Renamed eig_tol to eigen_tol and mode to eigen_solver in manifold.spectral_embedding and cluster.SpectralClustering.
classes_ and n_classes_ attributes of tree.DecisionTreeClassifier and all derived ensemble models are now flat in case of single output problems and nested in case of multi-output problems.
The estimators_ attribute of ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier is now an array of tree.DecisionTreeRegressor.
Renamed chunk_size to batch_size in decomposition.MiniBatchDictionaryLearning and decomposition.MiniBatchSparsePCA for consistency.
svm.SVC and svm.NuSVC now provide a classes_ attribute and support arbitrary dtypes for labels y. Also, the dtype returned by predict now reflects the dtype of y during fit (used to be np.float).
Changed default test_size in cross_validation.train_test_split to None, added possibility to infer test_size from train_size in cross_validation.ShuffleSplit and cross_validation.StratifiedShuffleSplit.
Renamed function sklearn.metrics.zero_one to sklearn.metrics.zero_one_loss. Be aware that the default behavior in sklearn.metrics.zero_one_loss is different from sklearn.metrics.zero_one: normalize=False is changed to normalize=True.
Renamed function metrics.zero_one_score to metrics.accuracy_score.
datasets.make_circles now has the same number of inner and outer points.
In the Naive Bayes classifiers, the class_prior parameter was moved from fit to __init__.
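
For code that still carries old rho values, the rho to l1_ratio mapping described above can be written as a small helper. This is a sketch only: the function name l1_ratio_from_rho, the string keys used to select the estimator and the assumption that rho lies in [0, 1] are made up for the example and are not part of scikit-learn.

```python
def l1_ratio_from_rho(rho, estimator):
    """Translate a pre-0.13 rho value into l1_ratio (illustrative only).

    Per the notes above: l1_ratio equals rho for ElasticNet
    and (1 - rho) for SGDClassifier.
    """
    if not 0.0 <= rho <= 1.0:
        # Assumption: rho is a mixing weight between 0 and 1.
        raise ValueError("rho is expected to lie in [0, 1]")
    if estimator == "ElasticNet":
        return rho
    if estimator == "SGDClassifier":
        return 1.0 - rho
    raise ValueError("unknown estimator: %r" % (estimator,))


# The same rho value meant different things in the two estimators.
print(l1_ratio_from_rho(0.25, "ElasticNet"))
print(l1_ratio_from_rho(0.25, "SGDClassifier"))
```

Keeping the conversion in one place avoids silently flipping the L1/L2 balance when migrating SGDClassifier settings.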

People

List of contributors for release 0.13 by number of commits.

364 Andreas Müller
143 Arnaud Joly
137 Peter Prettenhofer
131 Gael Varoquaux
117 Mathieu Blondel
108 Lars Buitinck
106 Wei Li
101 Olivier Grisel
65 Vlad Niculae
54 Gilles Louppe
40 Jaques Grobler
38 Alexandre Gramfort
30 Rob Zinkov
19 Aymeric Masurelle
18 Andrew Winterman
17 Fabian Pedregosa
17 Nelle Varoquaux
16 Christian Osendorfer
14 Daniel Nouri
13 Virgile Fritsch
13 syhw
12 Satrajit Ghosh
10 Corey Lynch
10 Kyle Beauchamp
9 Brian Cheung
9 Immanuel Bayer
9 mr.Shu
8 Conrad Lee
8 James Bergstra
7 Tadej Janež
6 Brian Cajes
6 Jake Vanderplas
6 Michael
6 Noel Dawe
6 Tiago Nunes
6 cow
5 Anze
5 Shiqiao Du
4 Christian Jauvin
4 Jacques Kvam
4 Richard T. Guy
4 Robert Layton
3 Alexandre Abraham
3 Doug Coleman
3 Scott Dickerson
2 ApproximateIdentity
2 John Benediktsson
2 Mark Veronda
2 Matti Lyra
2 Mikhail Korobov
2 Xinfan Meng
1 Alejandro Weinstein
1 Alexandre Passos
1 Christoph Deil
1 Eugene Nizhibitsky
1 Kenneth C. Arnold
1 Luis Pedro Coelho
1 Miroslav Batchkarov
1 Pavel
1 Sebastian Berg
1 Shaun Jackman
1 Subhodeep Moitra
1 bob
1 dengemann
1 emanuele
1 x006