UNCURL public functions
uncurl.max_variance_genes
uncurl.max_variance_genes(data, nbins=5, frac=0.2)
Identifies the genes with the highest variance within each of several bins, where genes are binned by mean expression level.
Parameters: - data (array) – genes x cells
- nbins (int) – number of bins into which genes are sorted by mean expression level. Default: 5.
- frac (float) – fraction of genes to return per bin, between 0 and 1. Default: 0.2.
Returns: list of gene indices (list of ints)
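The selection described above can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the function name `top_variance_genes`, the use of population variance, and the rule of keeping at least one gene per bin are all assumptions made for the example.

```python
from statistics import fmean, pvariance


def top_variance_genes(data, nbins=5, frac=0.2):
    """Hypothetical sketch: bin genes by mean, keep the top-variance fraction per bin.

    data is a list of rows, one row of counts per gene (genes x cells).
    """
    n_genes = len(data)
    means = [fmean(row) for row in data]
    variances = [pvariance(row) for row in data]
    # Sort gene indices by mean expression, then cut into contiguous bins.
    order = sorted(range(n_genes), key=lambda g: means[g])
    bin_size = max(1, -(-n_genes // nbins))  # ceiling division
    selected = []
    for start in range(0, n_genes, bin_size):
        bin_genes = order[start:start + bin_size]
        # Assumption: always keep at least one gene from each bin.
        keep = max(1, int(len(bin_genes) * frac))
        # Within the bin, rank genes from highest to lowest variance.
        bin_genes.sort(key=lambda g: variances[g], reverse=True)
        selected.extend(bin_genes[:keep])
    return sorted(selected)


toy = [
    [1, 1, 1, 1],      # low mean, no variance
    [0, 2, 0, 2],      # low mean, some variance
    [10, 10, 10, 10],  # high mean, no variance
    [5, 15, 5, 15],    # high mean, high variance
]
picked = top_variance_genes(toy, nbins=2, frac=0.5)
```

Binning by mean before ranking keeps highly expressed genes from dominating the selection, since variance tends to grow with mean expression in count data.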
uncurl.qualNorm
uncurl.qualNorm(data, qualitative)
Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.
Parameters: - data (array) – 2d array of genes x cells
- qualitative (array) – 2d array of numerical data - genes x clusters
Returns: Array of starting positions for state estimation or clustering, with shape genes x clusters
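As a rough illustration of the -1 convention for missing genes, the sketch below builds a genes x clusters qualitative matrix from partial prior knowledge. The helper `build_qualitative` and the 0/1 marker encoding are hypothetical choices for this example; the documentation above only requires numerical values, with -1 in every entry of a gene that has no qualitative data.

```python
def build_qualitative(n_genes, n_clusters, known):
    """Hypothetical helper: known maps gene index -> per-cluster values."""
    qualitative = []
    for gene in range(n_genes):
        if gene in known:
            qualitative.append(list(known[gene]))
        else:
            # No prior information for this gene: fill the whole row with -1.
            qualitative.append([-1] * n_clusters)
    return qualitative


# Gene 0 is assumed to mark cluster 0, gene 2 to mark cluster 1; gene 1 is unknown.
qual = build_qualitative(3, 2, {0: [1, 0], 2: [0, 1]})
```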
uncurl.poisson_cluster
uncurl.poisson_cluster(data, k, init=None, max_iters=100)
Performs Poisson hard EM on the given data.
Parameters: - data (array) – 2d array - genes x cells. Can be dense or sparse; for best performance, sparse matrices should be in CSC format.
- k (int) – Number of clusters
- init (array, optional) – Initial centers - genes x k array. Default: None, use kmeans++
- max_iters (int, optional) – Maximum number of iterations. Default: 100
Returns: a cells x 1 vector of cluster assignments, and a genes x k array of cluster means.
Return type: a tuple of two arrays
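For readers unfamiliar with hard EM, a minimal sketch of the idea follows: alternate between assigning each cell to its most likely cluster under a Poisson model and recomputing each cluster's mean. This is not the library's code. The function names are hypothetical, initial means must be supplied (there is no kmeans++ step), and the input is laid out as cells x genes for convenience, which is the transpose of the genes x cells layout documented above.

```python
import math


def poisson_loglik(cell, mean, eps=1e-10):
    """Poisson log-likelihood of one cell, omitting the log(x!) term,
    which is the same for every cluster."""
    return sum(x * math.log(m + eps) - m for x, m in zip(cell, mean))


def poisson_hard_em(cells, init_means, max_iters=100):
    """cells: list of per-cell count vectors; init_means: list of k mean vectors."""
    means = [list(m) for m in init_means]
    assignments = None
    for _ in range(max_iters):
        # Hard E-step: each cell goes to the cluster with the highest likelihood.
        new_assignments = [
            max(range(len(means)), key=lambda c: poisson_loglik(cell, means[c]))
            for cell in cells
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing
        assignments = new_assignments
        # M-step: each cluster mean is the average of its assigned cells.
        for c in range(len(means)):
            members = [cell for cell, a in zip(cells, assignments) if a == c]
            if members:
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, means


cells = [[10, 0], [9, 1], [0, 10], [1, 9]]
labels, centers = poisson_hard_em(cells, init_means=[[5, 1], [1, 5]])
```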
uncurl.nb_cluster
uncurl.nb_cluster(data, k, P_init=None, R_init=None, assignments=None, means=None, max_iters=10)
Performs negative binomial clustering on the given data. Genes with mean > variance are fitted to a Poisson distribution instead.
Parameters: - data (array) – genes x cells
- k (int) – number of clusters
- P_init (array) – NB success prob param - genes x k. Default: random
- R_init (array) – NB stopping param - genes x k. Default: random
- assignments (array) – cells x 1 array of integers 0...k-1. Default: kmeans-pp (poisson)
- means (array) – initial cluster means (for use with kmeans-pp to create initial assignments). Default: None
- max_iters (int) – maximum number of iterations. Default: 10
Returns: - assignments (array) – 1d array of length cells, containing integers 0...k-1
- P (array) – genes x k - value is 0 for genes with mean > var
- R (array) – genes x k - value is inf for genes with mean > var
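The mean > variance condition can be checked per gene before clustering. The sketch below is only an approximation of that test: `split_by_dispersion` is a hypothetical helper, it uses the population variance over all cells, and the library may compute the statistic differently (for example per cluster).

```python
from statistics import fmean, pvariance


def split_by_dispersion(data):
    """Hypothetical helper: data is genes x cells, one row of counts per gene.

    Returns (nb_genes, poisson_genes) as lists of gene indices.
    """
    nb_genes, poisson_genes = [], []
    for gene, row in enumerate(data):
        if fmean(row) > pvariance(row):
            # Under-dispersed relative to a negative binomial: treat as Poisson.
            poisson_genes.append(gene)
        else:
            nb_genes.append(gene)
    return nb_genes, poisson_genes


counts = [
    [0, 0, 10, 10],  # mean 5, variance 25
    [3, 3, 3, 4],    # mean 3.25, variance 0.1875
]
nb_idx, poisson_idx = split_by_dispersion(counts)
```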
uncurl.poisson_estimate_state
uncurl.poisson_estimate_state(data, clusters, init_means=None, init_weights=None, method='NoLips', max_iters=30, tol=1e-10, disp=False, inner_max_iters=100, normalize=True, initialization='tsvd', parallel=True, threads=4, max_assign_weight=0.75, run_w_first=True, constrain_w=False, regularization=0.0)
Uses a Poisson Convex Mixture model to estimate cell states and cell state mixing weights.
To lower computational costs, use a sparse matrix, set disp to False, and set tol to 0.
Parameters: - data (array) – genes x cells array or sparse matrix.
- clusters (int) – number of mixture components
- init_means (array, optional) – initial centers - genes x clusters. Default: from Poisson kmeans
- init_weights (array, optional) – initial weights - clusters x cells, or assignments as produced by clustering. Default: from Poisson kmeans
- method (str, optional) – optimization method. Current options are ‘NoLips’ and ‘L-BFGS-B’. Default: ‘NoLips’.
- max_iters (int, optional) – maximum number of iterations. Default: 30
- tol (float, optional) – if both M and W change by less than tol (RMSE), then the iteration is stopped. Default: 1e-10
- disp (bool, optional) – whether or not to display optimization progress. Default: False
- inner_max_iters (int, optional) – Number of iterations to run in the optimization subroutine for M and W. Default: 100
- normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
- initialization (str, optional) – If initial means and weights are not provided, this describes how they are initialized. Options: ‘cluster’ (poisson cluster for means and weights), ‘kmpp’ (kmeans++ for means, random weights), ‘km’ (regular k-means), ‘tsvd’ (tsvd(50) + k-means). Default: tsvd.
- parallel (bool, optional) – Whether to use parallel updates (sparse NoLips only). Default: True
- threads (int, optional) – How many threads to use in the parallel computation. Default: 4
- max_assign_weight (float, optional) – If using a clustering-based initialization, how much weight to assign to the max weight cluster. Default: 0.75
- run_w_first (bool, optional) – Whether or not to optimize W first (if false, M will be optimized first). Default: True
- constrain_w (bool, optional) – If True, then W is normalized after every iteration. Default: False
- regularization (float, optional) – Regularization coefficient for M and W. Default: 0 (no regularization).
Returns: - M (array) – genes x clusters - state means
- W (array) – clusters x cells - state mixing components for each cell
- ll (float) – final log-likelihood
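To make the shapes of M and W concrete, the sketch below normalizes a clusters x cells weight matrix so that each cell's weights sum to 1 (the effect described for normalize=True) and then forms the product of M and W, which in a convex mixture model gives the expected counts for each gene in each cell. Both helper names are hypothetical, and the example does not call the library.

```python
def normalize_weights(W):
    """Scale each column (cell) of a clusters x cells matrix to sum to 1."""
    col_sums = [sum(col) for col in zip(*W)]
    return [[w / s if s > 0 else 0.0 for w, s in zip(row, col_sums)] for row in W]


def expected_counts(M, W):
    """Matrix product of M (genes x clusters) and W (clusters x cells)."""
    return [
        [sum(m * w for m, w in zip(m_row, w_col)) for w_col in zip(*W)]
        for m_row in M
    ]


M = [[10, 0], [0, 20]]                # 2 genes x 2 clusters
W = normalize_weights([[1, 3], [1, 1]])  # 2 clusters x 2 cells
expected = expected_counts(M, W)
```

With this toy input, the first cell is an even mix of the two states and the second cell leans towards the first state.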
uncurl.nb_estimate_state
uncurl.nb_estimate_state(data, clusters, R=None, init_means=None, init_weights=None, max_iters=10, tol=0.0001, disp=True, inner_max_iters=400, normalize=True)
Uses a Negative Binomial Mixture model to estimate cell states and cell state mixing weights.
Genes that do not fit a negative binomial distribution (mean > var) are discarded from the analysis.
Parameters: - data (array) – genes x cells
- clusters (int) – number of mixture components
- R (array, optional) – vector of length genes containing the dispersion estimates for each gene. Default: use nb_fit
- init_means (array, optional) – initial centers - genes x clusters. Default: kmeans++ initializations
- init_weights (array, optional) – initial weights - clusters x cells. Default: random(0,1)
- max_iters (int, optional) – maximum number of iterations. Default: 10
- tol (float, optional) – if both M and W change by less than tol (in RMSE), then the iteration is stopped. Default: 1e-4
- disp (bool, optional) – whether or not to display optimization parameters. Default: True
- inner_max_iters (int, optional) – Number of iterations to run in the scipy minimizer for M and W. Default: 400
- normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
Returns: - M (array) – genes x clusters - state centers
- W (array) – clusters x cells - state mixing components for each cell
- R (array) – 1 x genes - NB dispersion parameter for each gene
- ll (float) – log-likelihood of final iteration
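The tol stopping rule can be read as the following check between successive iterations. This is a sketch of one plausible reading, with hypothetical helper names; the library's exact RMSE computation is not shown in this reference.

```python
import math


def rmse(a, b):
    """Root-mean-square difference between two equally shaped 2d lists."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return math.sqrt(sum(diffs) / len(diffs))


def converged(M_old, M_new, W_old, W_new, tol=1e-4):
    """Stop only when both M and W changed by less than tol."""
    return rmse(M_old, M_new) < tol and rmse(W_old, W_new) < tol


M_prev, M_next = [[1.0, 2.0]], [[1.0, 2.00001]]
W_prev, W_next = [[0.5], [0.5]], [[0.5], [0.5]]
stop = converged(M_prev, M_next, W_prev, W_next)
```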
uncurl.mds
uncurl.lineage
uncurl.lineage(means, weights, curve_function='poly', curve_dimensions=6)
Lineage graph produced by minimum spanning tree
Parameters: - means (array) – genes x clusters - output of state estimation
- weights (array) – clusters x cells - output of state estimation
- curve_function (string) – either ‘poly’ or ‘fourier’. Default: ‘poly’
- curve_dimensions (int) – number of parameters for the curve. Default: 6
Returns: - curve parameters – list of lists for each cluster
- smoothed data in 2d space – 2 x cells
- list of edges – pairs of cell indices
- cell cluster assignments – list of ints
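For context on the minimum spanning tree step, here is a small self-contained sketch of Prim's algorithm over points in 2d space. It is a generic illustration, not the construction used by the library: which points are connected, and how distances are measured, are assumptions made for this example.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over the given points.

    Returns a list of (i, j) index pairs, one per tree edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the shortest edge that leaves the current tree.
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


points = [(0, 0), (1, 0), (5, 0), (1, 1)]
tree = minimum_spanning_tree(points)
```

Each returned pair holds indices into `points`, which mirrors the "pairs of cell indices" edge format listed in the return values.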