.. _example_datasets_plot_random_multilabel_dataset.py: ============================================== Plot randomly generated multilabel dataset ============================================== This illustrates the `datasets.make_multilabel_classification` dataset generator. Each sample consists of counts of two features (up to 50 in total), which are differently distributed in each of two classes. Points are labeled as follows, where Y means the class is present: ===== ===== ===== ====== 1 2 3 Color ===== ===== ===== ====== Y N N Red N Y N Blue N N Y Yellow Y Y N Purple Y N Y Orange Y Y N Green Y Y Y Brown ===== ===== ===== ====== A star marks the expected sample for each class; its size reflects the probability of selecting that class label. The left and right examples highlight the ``n_labels`` parameter: more of the samples in the right plot have 2 or 3 labels. Note that this two-dimensional example is very degenerate: generally the number of features would be much greater than the "document length", while here we have much larger documents than vocabulary. Similarly, with ``n_classes > n_features``, it is much less likely that a feature distinguishes a particular class. .. image:: images/plot_random_multilabel_dataset_001.png :align: center **Script output**:: The data was generated from (random_state=169): Class P(C) P(w0|C) P(w1|C) red 0.30 0.34 0.66 blue 0.40 0.95 0.05 yellow 0.30 0.83 0.17 **Python source code:** :download:`plot_random_multilabel_dataset.py ` .. literalinclude:: plot_random_multilabel_dataset.py :lines: 36- **Total running time of the example:** 0.19 seconds ( 0 minutes 0.19 seconds)