12. Data Interoperability
Scikit-learn handles four kinds of data for X as used in fit(X, y), fit(X),
fit_transform(X) and transform(X) as well as Xt as returned by
transform(X) and fit_transform(X):
array-like objects
In fit(X) and transform(X), array-like X is converted to a numpy ndarray by calling numpy.asarray on it. The returned Xt of transform and fit_transform is also a numpy ndarray, or it is a sparse matrix or sparse array (see the next item).
sparse matrices and sparse arrays
Many estimators can deal with sparse X; some cannot and will raise an error. For instance, linear_model.LogisticRegression can be fit on sparse X, while isotonic.IsotonicRegression cannot.
Some transformers return sparse Xt from transform and fit_transform. Most often, this can be controlled by a sparse_output parameter, as in preprocessing.SplineTransformer.
To control whether a transformer returns a sparse matrix or a sparse array, use sparse_interface in config_context or set_config. This also controls whether sparse attributes are sparse matrices or sparse arrays.
tabular data: pandas and polars dataframes
See Pandas/Polars Output for Transformers with set_output API.
Array API compliant arrays
Importantly, this includes arrays on the GPU; see Array API support (experimental).