sklearn.utils.sparsefuncs.incr_mean_variance_axis

sklearn.utils.sparsefuncs.incr_mean_variance_axis(X, *, axis, last_mean, last_var, last_n, weights=None)[source]

Compute incremental mean and variance along an axis on a CSR or CSC matrix.

last_mean, last_var are the statistics computed at the last step by this function. Both must be initialized to 0-arrays of the proper size, i.e. the number of features in X. last_n is the number of samples encountered until now.

Parameters:
XCSR or CSC sparse matrix of shape (n_samples, n_features)

Input data.

axis{0, 1}

Axis along which the axis should be computed.

last_meanndarray of shape (n_features,) or (n_samples,), dtype=floating

Array of means to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1.

last_varndarray of shape (n_features,) or (n_samples,), dtype=floating

Array of variances to update with the new data X. Should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1.

last_nfloat or ndarray of shape (n_features,) or (n_samples,), dtype=floating

Sum of the weights seen so far, excluding the current weights If not float, it should be of shape (n_features,) if axis=0 or (n_samples,) if axis=1. If float it corresponds to having same weights for all samples (or features).

weightsndarray of shape (n_samples,) or (n_features,), default=None

If axis is set to 0 shape is (n_samples,) or if axis is set to 1 shape is (n_features,). If it is set to None, then samples are equally weighted.

New in version 0.24.

Returns:
meansndarray of shape (n_features,) or (n_samples,), dtype=floating

Updated feature-wise means if axis = 0 or sample-wise means if axis = 1.

variancesndarray of shape (n_features,) or (n_samples,), dtype=floating

Updated feature-wise variances if axis = 0 or sample-wise variances if axis = 1.

nndarray of shape (n_features,) or (n_samples,), dtype=integral

Updated number of seen samples per feature if axis=0 or number of seen features per sample if axis=1.

If weights is not None, n is a sum of the weights of the seen samples or features instead of the actual number of seen samples or features.

Notes

NaNs are ignored in the algorithm.

Examples

>>> from sklearn.utils import sparsefuncs
>>> from scipy import sparse
>>> import numpy as np
>>> indptr = np.array([0, 3, 4, 4, 4])
>>> indices = np.array([0, 1, 2, 2])
>>> data = np.array([8, 1, 2, 5])
>>> scale = np.array([2, 3, 2])
>>> csr = sparse.csr_matrix((data, indices, indptr))
>>> csr.todense()
matrix([[8, 1, 2],
        [0, 0, 5],
        [0, 0, 0],
        [0, 0, 0]])
>>> sparsefuncs.incr_mean_variance_axis(
...     csr, axis=0, last_mean=np.zeros(3), last_var=np.zeros(3), last_n=2
... )
(array([1.3..., 0.1..., 1.1...]), array([8.8..., 0.1..., 3.4...]),
array([6., 6., 6.]))