`__.
.. _spectral_biclustering:
Spectral Biclustering
=====================
The :class:`SpectralBiclustering` algorithm assumes that the input
data matrix has a hidden checkerboard structure. The rows and columns
of a matrix with this structure may be partitioned so that the entries
of any bicluster in the Cartesian product of row clusters and column
clusters are approximately constant. For instance, if there are two
row partitions and three column partitions, each row will belong to
three biclusters, and each column will belong to two biclusters.
The algorithm partitions the rows and columns of a matrix so that a
corresponding blockwise-constant checkerboard matrix provides a good
approximation to the original matrix.
Mathematical formulation
------------------------
The input matrix :math:`A` is first normalized to make the
checkerboard pattern more obvious. There are three possible methods:
1. *Independent row and column normalization*, as in Spectral
Co-Clustering. This method makes the rows sum to a constant and the
columns sum to a different constant.
2. **Bistochastization**: repeated row and column normalization until
convergence. This method makes both rows and columns sum to the
same constant.
3. **Log normalization**: the log of the data matrix is computed: :math:`L =
\log A`. Then the column mean :math:`\overline{L_{i \cdot}}`, row mean
:math:`\overline{L_{\cdot j}}`, and overall mean :math:`\overline{L_{\cdot
\cdot}}` of :math:`L` are computed. The final matrix is computed
according to the formula
.. math::
K_{ij} = L_{ij} - \overline{L_{i \cdot}} - \overline{L_{\cdot
j}} + \overline{L_{\cdot \cdot}}
After normalizing, the first few singular vectors are computed, just
as in the Spectral Co-Clustering algorithm.
If log normalization was used, all the singular vectors are
meaningful. However, if independent normalization or bistochastization
were used, the first singular vectors, :math:`u_1` and :math:`v_1`.
are discarded. From now on, the "first" singular vectors refers to
:math:`u_2 \dots u_{p+1}` and :math:`v_2 \dots v_{p+1}` except in the
case of log normalization.
Given these singular vectors, they are ranked according to which can
be best approximated by a piecewise-constant vector. The
approximations for each vector are found using one-dimensional k-means
and scored using the Euclidean distance. Some subset of the best left
and right singular vector are selected. Next, the data is projected to
this best subset of singular vectors and clustered.
For instance, if :math:`p` singular vectors were calculated, the
:math:`q` best are found as described, where :math:`q`__.
.. _biclustering_evaluation:
.. currentmodule:: sklearn.metrics
Biclustering evaluation
=======================
There are two ways of evaluating a biclustering result: internal and
external. Internal measures, such as cluster stability, rely only on
the data and the result themselves. Currently there are no internal
bicluster measures in scikit-learn. External measures refer to an
external source of information, such as the true solution. When
working with real data the true solution is usually unknown, but
biclustering artificial data may be useful for evaluating algorithms
precisely because the true solution is known.
To compare a set of found biclusters to the set of true biclusters,
two similarity measures are needed: a similarity measure for
individual biclusters, and a way to combine these individual
similarities into an overall score.
To compare individual biclusters, several measures have been used. For
now, only the Jaccard index is implemented:
.. math::
J(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
where :math:`A` and :math:`B` are biclusters, :math:`|A \cap B|` is
the number of elements in their intersection. The Jaccard index
achieves its minimum of 0 when the biclusters to not overlap at all
and its maximum of 1 when they are identical.
Several methods have been developed to compare two sets of biclusters.
For now, only :func:`consensus_score` (Hochreiter et. al., 2010) is
available:
1. Compute bicluster similarities for pairs of biclusters, one in each
set, using the Jaccard index or a similar measure.
2. Assign biclusters from one set to another in a one-to-one fashion
to maximize the sum of their similarities. This step is performed
using the Hungarian algorithm.
3. The final sum of similarities is divided by the size of the larger
set.
The minimum consensus score, 0, occurs when all pairs of biclusters
are totally dissimilar. The maximum score, 1, occurs when both sets
are identical.
.. topic:: References:
* Hochreiter, Bodenhofer, et. al., 2010. `FABIA: factor analysis
for bicluster acquisition
`__.