UNCURL public functions

uncurl.max_variance_genes

uncurl.max_variance_genes(data, nbins=5, frac=0.2)

Identifies the genes with the highest variance within each of several bins, where genes are binned after being sorted by mean expression.

Parameters:
    data (array) – genes x cells
    nbins (int) – number of bins to sort genes into by mean expression level. Default: 5
    frac (float) – fraction of genes to return per bin, between 0 and 1. Default: 0.2
Returns:	list of gene indices (list of ints)
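
The binning described above can be illustrated with a short NumPy sketch. It is one plausible reading of the description, not the library's own code: the helper name top_variance_genes_by_bin is hypothetical, and details such as how bin boundaries and rounding are handled are assumptions.

```python
import numpy as np

def top_variance_genes_by_bin(data, nbins=5, frac=0.2):
    """Hypothetical helper: pick the highest-variance genes within mean-sorted bins."""
    data = np.asarray(data, dtype=float)
    means = data.mean(axis=1)        # per-gene mean across cells
    variances = data.var(axis=1)     # per-gene variance across cells
    order = np.argsort(means)        # gene indices sorted by mean expression
    selected = []
    # Assumption: bins are near-equal-sized slices of the mean-sorted gene list.
    for bin_genes in np.array_split(order, nbins):
        n_keep = int(frac * len(bin_genes))
        by_variance = bin_genes[np.argsort(variances[bin_genes])[::-1]]
        selected.extend(int(g) for g in by_variance[:n_keep])
    return selected

# Synthetic data for illustration only: 100 genes x 50 cells.
rng = np.random.default_rng(0)
example_data = rng.poisson(5.0, size=(100, 50))
example_genes = top_variance_genes_by_bin(example_data, nbins=5, frac=0.2)
```

Binning by mean before ranking by variance keeps highly expressed genes from dominating the selection, since count variance tends to grow with the mean.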

uncurl.qualNorm

uncurl.qualNorm(data, qualitative)

Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.

Parameters:
    data (array) – 2d array of genes x cells
    qualitative (array) – 2d array of numerical data, genes x clusters
Returns:	array of starting positions for state estimation or clustering, with shape genes x clusters

uncurl.poisson_cluster

uncurl.poisson_cluster(data, k, init=None, max_iters=100)

Performs Poisson hard EM on the given data.

Parameters:
    data (array) – a 2d array, genes x cells. Can be dense or sparse; for best performance, sparse matrices should be in CSC format.
    k (int) – number of clusters
    init (array, optional) – initial centers, genes x k array. Default: None, use kmeans++
    max_iters (int, optional) – maximum number of iterations. Default: 100
Returns:	a cells x 1 vector of cluster assignments, and a genes x k array of cluster means.
Return type:	a tuple of two arrays
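
To make "Poisson hard EM" concrete, here is a minimal dense NumPy sketch of the general technique. It is not the library's implementation: the function name poisson_hard_em is hypothetical, initial centers must be passed explicitly (the documented default of kmeans++ is not reproduced), and sparse input is not handled.

```python
import numpy as np

def poisson_hard_em(data, k, init, max_iters=100, eps=1e-10):
    """Hypothetical sketch of hard EM under a per-gene Poisson model."""
    data = np.asarray(data, dtype=float)          # genes x cells
    means = np.asarray(init, dtype=float).copy()  # genes x k
    assignments = np.full(data.shape[1], -1)
    for _ in range(max_iters):
        # Hard E-step: Poisson log-likelihood of each cell under each center,
        # dropping the log(x!) term because it does not depend on the cluster.
        ll = data.T @ np.log(means + eps) - means.sum(axis=0)  # cells x k
        new_assignments = ll.argmax(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # assignments stopped changing
        assignments = new_assignments
        # M-step: the Poisson maximum-likelihood mean is the average of assigned cells.
        for c in range(k):
            members = data[:, assignments == c]
            if members.shape[1] > 0:  # keep the old center if a cluster is empty
                means[:, c] = members.mean(axis=1)
    return assignments, means

# Synthetic example: two groups of 30 cells with opposite expression patterns.
rng = np.random.default_rng(0)
group_a = rng.poisson(np.array([[20.0], [20.0], [1.0], [1.0]]), size=(4, 30))
group_b = rng.poisson(np.array([[1.0], [1.0], [20.0], [20.0]]), size=(4, 30))
example_data = np.hstack([group_a, group_b])
rough_init = np.array([[15.0, 2.0], [15.0, 2.0], [2.0, 15.0], [2.0, 15.0]])
example_assignments, example_means = poisson_hard_em(example_data, 2, rough_init)
```

Unlike soft EM, each cell contributes to exactly one cluster mean per iteration, which keeps the update cheap at the cost of a greater sensitivity to initialization.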

uncurl.nb_cluster

uncurl.nb_cluster(data, k, P_init=None, R_init=None, assignments=None, means=None, max_iters=10)

Performs negative binomial clustering on the given data. If some genes have mean > variance, then these genes are fitted to a Poisson distribution.

Parameters:
    data (array) – genes x cells
    k (int) – number of clusters
    P_init (array) – NB success probability parameter, genes x k. Default: random
    R_init (array) – NB stopping parameter, genes x k. Default: random
    assignments (array) – cells x 1 array of integers 0...k-1. Default: kmeans-pp (poisson)
    means (array) – initial cluster means (for use with kmeans-pp to create initial assignments). Default: None
    max_iters (int) – maximum number of iterations. Default: 10
Returns:
    assignments (array) – 1d array of length cells, containing integers 0...k-1
    P (array) – genes x k; value is 0 for genes with mean > var
    R (array) – genes x k; value is inf for genes with mean > var
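
The mean > variance rule and the P = 0, R = inf convention can be illustrated with method-of-moments estimates. This sketch is an illustration only: nb_moment_estimates is a hypothetical name, the parametrization (mean = r·p/(1-p), variance = r·p/(1-p)²) is an assumption that may differ from the library's, and the estimates here are per gene rather than per gene and cluster.

```python
import numpy as np

def nb_moment_estimates(data):
    """Hypothetical helper: per-gene NB moment estimates with a Poisson fallback."""
    data = np.asarray(data, dtype=float)   # genes x cells
    m = data.mean(axis=1)
    v = data.var(axis=1)
    # Genes that are not over-dispersed cannot be fitted by an NB distribution.
    # ">=" is used here (rather than ">") only to avoid dividing by zero.
    poisson_like = m >= v
    p = np.zeros_like(m)                   # p = 0 marks a Poisson-like gene
    r = np.full_like(m, np.inf)            # r = inf marks a Poisson-like gene
    ok = ~poisson_like
    r[ok] = m[ok] ** 2 / (v[ok] - m[ok])
    p[ok] = (v[ok] - m[ok]) / v[ok]
    return poisson_like, p, r

# Gene 0 is constant (variance 0 < mean 5); gene 1 is over-dispersed (variance 75 > mean 5).
example_counts = np.array([[5, 5, 5, 5],
                           [0, 0, 0, 20]])
example_mask, example_p, example_r = nb_moment_estimates(example_counts)
```

Under this parametrization the Poisson distribution is the limit of the negative binomial as r grows without bound and p tends to 0, which is consistent with the sentinel values described in the return values.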

uncurl.poisson_estimate_state

uncurl.poisson_estimate_state(data, clusters, init_means=None, init_weights=None, method='NoLips', max_iters=30, tol=1e-10, disp=False, inner_max_iters=100, normalize=True, initialization='tsvd', parallel=True, threads=4, max_assign_weight=0.75, run_w_first=True, constrain_w=False, regularization=0.0)

Uses a Poisson Convex Mixture model to estimate cell states and cell state mixing weights.

To lower computational costs, use a sparse matrix, set disp to False, and set tol to 0.

Parameters:
    data (array) – genes x cells array or sparse matrix.
    clusters (int) – number of mixture components
    init_means (array, optional) – initial centers, genes x clusters. Default: from Poisson kmeans
    init_weights (array, optional) – initial weights, clusters x cells, or assignments as produced by clustering. Default: from Poisson kmeans
    method (str, optional) – optimization method. Current options are ‘NoLips’ and ‘L-BFGS-B’. Default: ‘NoLips’.
    max_iters (int, optional) – maximum number of iterations. Default: 30
    tol (float, optional) – if both M and W change by less than tol (RMSE), then the iteration is stopped. Default: 1e-10
    disp (bool, optional) – whether or not to display optimization progress. Default: False
    inner_max_iters (int, optional) – number of iterations to run in the optimization subroutine for M and W. Default: 100
    normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
    initialization (str, optional) – if initial means and weights are not provided, this describes how they are initialized. Options: ‘cluster’ (poisson cluster for means and weights), ‘kmpp’ (kmeans++ for means, random weights), ‘km’ (regular k-means), ‘tsvd’ (tsvd(50) + k-means). Default: tsvd.
    parallel (bool, optional) – whether to use parallel updates (sparse NoLips only). Default: True
    threads (int, optional) – how many threads to use in the parallel computation. Default: 4
    max_assign_weight (float, optional) – if using a clustering-based initialization, how much weight to assign to the max weight cluster. Default: 0.75
    run_w_first (bool, optional) – whether or not to optimize W first (if False, M will be optimized first). Default: True
    constrain_w (bool, optional) – if True, then W is normalized after every iteration. Default: False
    regularization (float, optional) – regularization coefficient for M and W. Default: 0 (no regularization).
Returns:
    M (array) – genes x clusters, state means
    W (array) – clusters x cells, state mixing components for each cell
    ll (float) – final log-likelihood
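
The shapes of M and W and the meaning of the normalize option are easier to see in code. The sketch below assumes the usual reading of a Poisson convex mixture, in which the expected counts are the matrix product M·W and each cell's column of W sums to 1. Both helper names are hypothetical, and the log-likelihood omits the log(x!) term, so its value may not match the ll returned by the library.

```python
import numpy as np

def normalize_weights(W):
    """Hypothetical helper: rescale W so each cell's weights sum to 1."""
    W = np.asarray(W, dtype=float)              # clusters x cells
    return W / W.sum(axis=0, keepdims=True)

def poisson_mixture_loglik(data, M, W, eps=1e-10):
    """Poisson log-likelihood of data given rates M @ W, up to an additive constant."""
    rate = M @ W                                # genes x cells
    return float(np.sum(data * np.log(rate + eps) - rate))

# Toy example: 3 genes, 2 states, 3 cells.
M = np.array([[10.0, 1.0],
              [1.0, 10.0],
              [5.0, 5.0]])                      # genes x clusters
W = normalize_weights(np.array([[3.0, 1.0, 1.0],
                                [1.0, 1.0, 3.0]]))  # clusters x cells
X = M @ W                                       # noiseless expected counts
ll_true = poisson_mixture_loglik(X, M, W)
ll_swapped = poisson_mixture_loglik(X, M, W[::-1])  # states swapped for every cell
```

Because the Poisson likelihood is maximized when the rate equals the observed value, ll_true exceeds ll_swapped here; the estimation routine searches for M and W that maximize this kind of objective.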

uncurl.nb_estimate_state

uncurl.nb_estimate_state(data, clusters, R=None, init_means=None, init_weights=None, max_iters=10, tol=0.0001, disp=True, inner_max_iters=400, normalize=True)

Uses a Negative Binomial Mixture model to estimate cell states and cell state mixing weights.

If some genes do not fit a negative binomial distribution (mean > var), those genes are discarded from the analysis.

Parameters:
    data (array) – genes x cells
    clusters (int) – number of mixture components
    R (array, optional) – vector of length genes containing the dispersion estimates for each gene. Default: use nb_fit
    init_means (array, optional) – initial centers, genes x clusters. Default: kmeans++ initializations
    init_weights (array, optional) – initial weights, clusters x cells. Default: random(0,1)
    max_iters (int, optional) – maximum number of iterations. Default: 10
    tol (float, optional) – if both M and W change by less than tol (in RMSE), then the iteration is stopped. Default: 1e-4
    disp (bool, optional) – whether or not to display optimization parameters. Default: True
    inner_max_iters (int, optional) – number of iterations to run in the scipy minimizer for M and W. Default: 400
    normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
Returns:
    M (array) – genes x clusters, state centers
    W (array) – clusters x cells, state mixing components for each cell
    R (array) – 1 x genes, NB dispersion parameter for each gene
    ll (float) – log-likelihood of final iteration

uncurl.mds

uncurl.mds(means, weights, d)

Dimensionality reduction using MDS.

Parameters:
    means (array) – genes x clusters
    weights (array) – clusters x cells
    d (int) – desired dimensionality
Returns:	W_reduced (array) – array of shape (d, cells)

uncurl.lineage

uncurl.lineage(means, weights, curve_function='poly', curve_dimensions=6)

Produces a lineage graph using a minimum spanning tree.

Parameters:
    means (array) – genes x clusters, output of state estimation
    weights (array) – clusters x cells, output of state estimation
    curve_function (string) – either ‘poly’ or ‘fourier’. Default: ‘poly’
    curve_dimensions (int) – number of parameters for the curve. Default: 6
Returns:
    curve parameters – list of lists for each cluster
    smoothed data in 2d space – 2 x cells
    list of edges – pairs of cell indices
    cell cluster assignments – list of ints

uncurl.pseudotime

uncurl.pseudotime(starting_node, edges, fitted_vals)

Parameters:
    starting_node (int) – index of the starting node
    edges (list) – list of tuples (node1, node2)
    fitted_vals (array) – output of lineage (2 x cells)
Returns:	A 1d array containing the pseudotime value of each cell.
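
The reference above does not say how the pseudotime value is defined. One common definition, assumed here purely for illustration, is the path length from the starting node to each cell along the lineage edges, with each edge weighted by the Euclidean distance between its endpoints in the 2d fitted space. The function name tree_pseudotime is hypothetical, and the library's own computation may differ.

```python
import heapq
import math

def tree_pseudotime(starting_node, edges, fitted_vals):
    """Hypothetical sketch: shortest-path distance from starting_node along edges."""
    xs, ys = fitted_vals[0], fitted_vals[1]     # fitted_vals is 2 x cells
    n_cells = len(xs)
    adjacency = {i: [] for i in range(n_cells)}
    for a, b in edges:
        # Edge weight: Euclidean distance between the two cells in 2d.
        w = math.hypot(xs[a] - xs[b], ys[a] - ys[b])
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))
    dist = [math.inf] * n_cells                 # unreachable cells stay at inf
    dist[starting_node] = 0.0
    heap = [(0.0, starting_node)]
    while heap:                                 # Dijkstra's algorithm
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                            # stale heap entry
        for neighbor, w in adjacency[node]:
            candidate = d + w
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Four cells at (0,0), (3,4), (6,8), (3,0), connected as a small tree.
example_vals = [[0.0, 3.0, 6.0, 3.0],
                [0.0, 4.0, 8.0, 0.0]]
example_edges = [(0, 1), (1, 2), (0, 3)]
example_time = tree_pseudotime(0, example_edges, example_vals)
```

On a tree there is exactly one path between any two nodes, so a plain traversal would also work; Dijkstra's algorithm is used so the sketch stays correct if the edge list contains cycles.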