.. _sphx_glr_auto_examples_text:

.. _text_examples:

Working with text documents
----------------------------

Examples that demonstrate the :mod:`sklearn.feature_extraction.text` module.



.. raw:: html

    <div class="sphx-glr-thumbnails">


.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="This is an example showing how scikit-learn can be used to classify documents by topics using a...">

.. only:: html

  .. image:: /auto_examples/text/images/thumb/sphx_glr_plot_document_classification_20newsgroups_thumb.png
    :alt:

  :ref:`sphx_glr_auto_examples_text_plot_document_classification_20newsgroups.py`

.. raw:: html

      <div class="sphx-glr-thumbnail-title">Classification of text documents using sparse features</div>
    </div>


.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="This is an example showing how the scikit-learn API can be used to cluster documents by topics ...">

.. only:: html

  .. image:: /auto_examples/text/images/thumb/sphx_glr_plot_document_clustering_thumb.png
    :alt:

  :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`

.. raw:: html

      <div class="sphx-glr-thumbnail-title">Clustering text documents using k-means</div>
    </div>


.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="In this example we illustrate text vectorization, which is the process of representing non-nume...">

.. only:: html

  .. image:: /auto_examples/text/images/thumb/sphx_glr_plot_hashing_vs_dict_vectorizer_thumb.png
    :alt:

  :ref:`sphx_glr_auto_examples_text_plot_hashing_vs_dict_vectorizer.py`

.. raw:: html

      <div class="sphx-glr-thumbnail-title">FeatureHasher and DictVectorizer Comparison</div>
    </div>


.. raw:: html

    </div>


.. toctree::
   :hidden:

   /auto_examples/text/plot_document_classification_20newsgroups
   /auto_examples/text/plot_document_clustering
   /auto_examples/text/plot_hashing_vs_dict_vectorizer