Details on unsupported methods
==============================

There are a number of unsupported or experimental methods part of the UNCURL package. We provide information on them here for the sake of completeness but cannot vouch for their correctness.

Alternative state estimation methods
------------------------------------

We provide implementations of the convex mixture model for the negative binomial (NB) and zero-inflated Poisson (ZIP) distributions. In our experiments they did not work as well as the Poisson model on most datasets, and are substantially less efficient.

We also provide methods based on LDA (latent Dirichlet allocation), using the LightLDA implementation. The outputs of these methods can be interpreted as state estimation with a binomial sampling distribution. See ``lightlda_utils.py``. In practice, they had worse performance than Poisson state estimation in accuracy, runtime, and memory usage, especially on larger datasets.

Alternative clustering methods
------------------------------

As with state estimation, we provide NB and ZIP versions of k-means. The same efficiency considerations apply.

Dimensionality reduction
------------------------

The ``mds`` function performs dimensionality reduction using MDS. This works by running MDS on M to convert it into a projection matrix, and then using that matrix to project W onto 2d space. This is much faster than tSNE or even PCA, at the cost of some fidelity, but it might work as a first pass.

Example:

.. code-block:: python

    import numpy as np
    from uncurl import mds, dim_reduce_data

    data = np.loadtxt('counts.txt')

    # dimensionality reduction using MDS on state estimation means
    M, W, ll = poisson_estimate_state(data, 4)
    # proj is a 2d projection of the data.
    proj = mds(M, W, 2)


Lineage estimation
------------------

The ``lineage`` function performs lineage estimation from the output of ``poisson_estimate_state``. It fits the data to a different 5th degree polynomial for each cell type.

The ``pseudotime`` function calculates the pseudotime for each cell given the output of ``lineage`` and a starting cell.

Example (including visualization):

.. code-block:: python

    import numpy as np
    import matplotlib.pyplot as plt

    from uncurl import poisson_estimate_state, mds, lineage, pseudotime

    data = np.loadtxt('counts.txt')

    # pretend that there are three natural clusters in the dataset.
    M, W = poisson_estimate_state(data, 3)

    curve_params, smoothed_points, edges, cell_assignments = lineage(M, W)

    # assume the "root" is cell 0
    ptime = pseudotime(0, edges, smoothed_points)

    # visualizing the lineage
    proj = mds(M, W, 2)

    plt.scatter(proj[0,:], proj[1,:], s=10, c=cell_assignments, edgecolors='none', alpha=0.7)
    plt.scatter(smoothed_points[0,:], smoothed_points[1,:], s=30, c=cell_assignments, edgecolors='none', alpha=0.7)
    # connect the lines
    for edge in edges:
        plt.plot((smoothed_points[0, edge[0]], smoothed_points[0, edge[1]]),
                 (smoothed_points[1, edge[0]], smoothed_points[1, edge[1]]), 'black', linewidth=2)
    plt.xlabel('dim 1')
    plt.ylabel('dim 2')


Ensemble Methods
----------------

Consensus clustering, consensus clustering-based initialization for uncurl, etc. This requires the `Cluster_Ensembles package <https://github.com/GGiecold/Cluster_Ensembles>`_.


Visualization
-------------

see ``vis.py``