.. _example_hetero_feature_union.py: ============================================= Feature Union with Heterogeneous Data Sources ============================================= Datasets can often contain components of that require different feature extraction and processing pipelines. This scenario might occur when: 1. Your dataset consists of heterogeneous data types (e.g. raster images and text captions) 2. Your dataset is stored in a Pandas DataFrame and different columns require different processing pipelines. This example demonstrates how to use :class:`sklearn.feature_extraction.FeatureUnion` on a dataset containing different types of features. We use the 20-newsgroups dataset and compute standard bag-of-words features for the subject line and body in separate pipelines as well as ad hoc features on the body. We combine them (with weights) using a FeatureUnion and finally train a classifier on the combined set of features. The choice of features is not particularly helpful, but serves to illustrate the technique. **Python source code:** :download:`hetero_feature_union.py ` .. literalinclude:: hetero_feature_union.py :lines: 25-