sklearn.datasets.dump_svmlight_file

sklearn.datasets.dump_svmlight_file(X, y, f, zero_based=True, comment=None, query_id=None, multilabel=False)

Dump the dataset in svmlight / libsvm file format.

This is a text-based format with one sample per line. It does not store zero-valued features, so it is well suited to sparse datasets.

The first element of each line can be used to store a target variable to predict.
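A minimal sketch of the format described above, using a small made-up dataset and an in-memory binary buffer in place of a file on disk. The data values and variable names are illustrative assumptions, not part of the original documentation.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy data (illustrative only): two samples, three features, mostly zeros.
X = np.array([[0.0, 2.5, 0.0],
              [1.0, 0.0, 3.0]])
y = np.array([1, -1])

# f may be a path or a binary file-like object; BytesIO avoids touching disk.
buf = BytesIO()
dump_svmlight_file(X, y, buf, zero_based=True)

# One text line per sample: the target first, then index:value pairs
# for the non-zero features only.
lines = buf.getvalue().decode("ascii").splitlines()
for line in lines:
    print(line)

# Reading the buffer back should recover the same data as a sparse matrix.
buf.seek(0)
X_loaded, y_loaded = load_svmlight_file(buf, n_features=3, zero_based=True)
```

Passing n_features when loading is a precaution: without it the loader infers the width from the largest index it sees, which can drop trailing all-zero columns.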

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples] or [n_samples, n_labels]

Target values. Class labels must be integers or floats, or, for multilabel classification, array-like objects of integers or floats.

f : string or file-like in binary mode

If string, specifies the path that will contain the data. If file-like, data will be written to f. f should be opened in binary mode.

zero_based : boolean, optional

Whether column indices should be written zero-based (True) or one-based (False).

comment : string, optional

Comment to insert at the top of the file. This should be either a Unicode string, which will be encoded as UTF-8, or an ASCII byte string. If a comment is given, it will be preceded by one that identifies the file as having been dumped by scikit-learn. Note that not all tools understand comments in SVMlight files.

query_id : array-like, shape = [n_samples]

Array containing pairwise preference constraints (qid in svmlight format).

multilabel : boolean, optional

Samples may have several labels each (see http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html).

New in version 0.17: parameter multilabel to support multilabel datasets.
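As a hedged illustration of the multilabel parameter, the sketch below assumes that y is passed as a binary indicator matrix of shape [n_samples, n_labels], in which a non-zero entry in column j means that the sample carries label j. The data and the names Y, ml_buf and label_fields are made up for this example.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file

# Toy data (illustrative only): two samples, two features.
X = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Assumed label indicator matrix: sample 0 has labels 0 and 2,
# sample 1 has label 1 only.
Y = np.array([[1, 0, 1],
              [0, 1, 0]])

ml_buf = BytesIO()
dump_svmlight_file(X, Y, ml_buf, multilabel=True)

# The first field of each line holds the comma-separated label indices.
ml_lines = ml_buf.getvalue().decode("ascii").splitlines()
label_fields = [line.split()[0] for line in ml_lines]
print(label_fields)
```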

Examples using sklearn.datasets.dump_svmlight_file
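The following sketch combines the comment, query_id and zero_based parameters on a made-up ranking dataset. The data, the comment text and the variable names are assumptions chosen for illustration. Because not every tool accepts comment lines, the header is separated from the data lines before inspection.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy ranking data (illustrative only): three samples in two query groups.
X = np.array([[0.5, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.5, 0.0, 0.0]])
y = np.array([1, 0, 1])
qid = np.array([1, 1, 2])

rank_buf = BytesIO()
dump_svmlight_file(
    X, y, rank_buf,
    zero_based=False,            # write column indices starting at 1
    comment="toy ranking data",  # written after the scikit-learn header
    query_id=qid,                # emitted as qid:<id> after the target
)

text = rank_buf.getvalue().decode("utf-8")
header = [ln for ln in text.splitlines() if ln.startswith("#")]
data = [ln for ln in text.splitlines() if not ln.startswith("#")]
print("\n".join(data))

# Load with matching settings so indices and query ids are interpreted
# the same way they were written.
rank_buf.seek(0)
X_back, y_back, qid_back = load_svmlight_file(
    rank_buf, n_features=3, query_id=True, zero_based=False
)
```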