uncurl package¶

Submodules¶

uncurl.preprocessing module¶

Misc functions...

uncurl.preprocessing.cell_normalize(data)[source]¶: Returns the data where the expression is normalized so that the total count per cell is equal.

uncurl.preprocessing.log1p(data)[source]¶: Returns ln(data+1), whether the original data is dense or sparse.
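As an illustration of the two preprocessing steps above, here is a minimal NumPy sketch. It is not the package's implementation: the helper name `normalize_cells` is hypothetical, and the choice of the median total count as the common target is an assumption, since the documentation only says that totals are made equal.

```python
import numpy as np

def normalize_cells(data):
    """Scale each column (cell) so that all cells share the same total count."""
    data = np.asarray(data, dtype=float)
    totals = data.sum(axis=0)                  # total count per cell
    target = np.median(totals)                 # assumed common target total
    safe = np.where(totals > 0, totals, 1.0)   # avoid dividing by zero for empty cells
    return data * (target / safe)

counts = np.array([[1, 4, 0],
                   [3, 4, 2],
                   [0, 8, 2]])                 # genes x cells
normed = normalize_cells(counts)
logged = np.log1p(normed)                      # ln(x + 1), elementwise
```

After normalization every column of `normed` sums to the same value, and `np.log1p` leaves zero entries at zero.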

uncurl.preprocessing.max_variance_genes(data, nbins=5, frac=0.2)[source]¶

This function identifies the genes that have the max variance across a number of bins sorted by mean.

Parameters:

data (array) – genes x cells
nbins (int) – number of bins to sort genes by mean expression level. Default: 5
frac (float) – fraction of genes to return per bin - between 0 and 1. Default: 0.2
Returns:	list of gene indices (list of ints)
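The binning idea can be sketched in plain NumPy as follows. This is an assumption-laden illustration rather than the library's code: the function name `top_variance_genes_by_bin` is hypothetical, bins are formed by splitting the mean-sorted genes into equal-sized groups, and at least one gene is kept per non-empty bin.

```python
import numpy as np

def top_variance_genes_by_bin(data, nbins=5, frac=0.2):
    """Pick the highest-variance genes within each bin of mean expression."""
    data = np.asarray(data, dtype=float)
    means = data.mean(axis=1)
    variances = data.var(axis=1)
    order = np.argsort(means)                  # gene indices sorted by mean
    selected = []
    for bin_genes in np.array_split(order, nbins):
        if len(bin_genes) == 0:
            continue
        n_keep = max(1, int(round(frac * len(bin_genes))))
        # Sort the genes of this bin by decreasing variance and keep the top ones.
        ranked = bin_genes[np.argsort(variances[bin_genes])[::-1]]
        selected.extend(int(g) for g in ranked[:n_keep])
    return selected

rng = np.random.default_rng(0)
rates = rng.uniform(1, 20, size=(50, 1))       # one Poisson rate per gene
data = rng.poisson(lam=rates, size=(50, 30))   # genes x cells
genes = top_variance_genes_by_bin(data, nbins=5, frac=0.2)
```

Binning by mean first keeps highly expressed genes from dominating the selection purely because count variance grows with the mean.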

uncurl.preprocessing.sparse_mean_var(data)[source]¶

Calculates the mean and variance for each row of a sparse matrix, using the relationship Var = E[x^2] - E[x]^2.

Returns:	pair of matrices mean, variance.
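The identity Var = E[x^2] - E[x]^2 can be checked with a short dense NumPy sketch. The helper name `row_mean_var` is hypothetical. With a sparse matrix the same two sums only need to visit the stored nonzero entries, which is why the identity is convenient there.

```python
import numpy as np

def row_mean_var(data):
    """Per-row mean and (population) variance via Var = E[x^2] - E[x]^2."""
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    mean = data.sum(axis=1) / n               # E[x]
    mean_sq = (data ** 2).sum(axis=1) / n     # E[x^2]
    return mean, mean_sq - mean ** 2

x = np.array([[0, 0, 3, 0],
              [1, 2, 3, 4]])
mean, var = row_mean_var(x)
```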

uncurl.run_se module¶

uncurl.run_se.run_state_estimation(data, clusters, dist='Poiss', reps=1, **kwargs)[source]¶

Runs state estimation for multiple initializations, returning the result with the highest log-likelihood. All the arguments are passed to the underlying state estimation functions (poisson_estimate_state, nb_estimate_state, zip_estimate_state).

Parameters:

data (array) – genes x cells
clusters (int) – number of mixture components
dist (str, optional) – Distribution used in state estimation. Options: ‘Poiss’, ‘NB’, ‘ZIP’, ‘LogNorm’, ‘Gaussian’. Default: ‘Poiss’
reps (int, optional) – number of times to run the state estimation, taking the result with the highest log-likelihood.
**kwargs – arguments to pass to the underlying state estimation function.

Returns:

M (array) – genes x clusters - state means
W (array) – clusters x cells - state mixing components for each cell
ll (float) – final log-likelihood
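The "best of several initializations" pattern can be sketched independently of the package. Everything below is illustrative: `best_of_restarts` and `toy_fit` are hypothetical names, and `toy_fit` only draws random M and W and scores them with a Poisson log-likelihood instead of running a real optimizer.

```python
import numpy as np

def best_of_restarts(fit, data, clusters, reps=1, seed=0):
    """Run `fit` `reps` times and keep the result with the highest log-likelihood."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(reps):
        M, W, ll = fit(data, clusters, rng)
        if best is None or ll > best[2]:
            best = (M, W, ll)
    return best

def toy_fit(data, clusters, rng):
    """Hypothetical stand-in for an estimator: random M, W plus their Poisson score."""
    genes, cells = data.shape
    M = rng.uniform(0.5, 5.0, size=(genes, clusters))    # genes x clusters
    W = rng.dirichlet(np.ones(clusters), size=cells).T   # clusters x cells
    rate = M @ W
    # Poisson log-likelihood with the constant log(x!) term dropped.
    ll = float(np.sum(data * np.log(rate) - rate))
    return M, W, ll

data = np.random.default_rng(1).poisson(3.0, size=(6, 8))
M, W, ll = best_of_restarts(toy_fit, data, clusters=2, reps=5)
```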

uncurl.state_estimation module¶

uncurl.state_estimation.initialize_from_assignments(assignments, k, max_assign_weight=0.75)[source]¶

Creates a weight initialization matrix from Poisson clustering assignments.

Parameters:

assignments (array) – 1D array of integers, of length cells
k (int) – number of states/clusters
max_assign_weight (float, optional) – between 0 and 1 - how much weight to assign to the highest cluster. Default: 0.75

Returns:	init_W (array) – k x cells
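A weight matrix of this kind might be built as in the sketch below. The function name `weights_from_assignments` is hypothetical, and spreading the remaining weight evenly over the other clusters is an assumption borrowed from the ‘basic’ option described for nmf_init (0.75 for the chosen cluster, 0.25/(k-1) for the rest).

```python
import numpy as np

def weights_from_assignments(assignments, k, max_assign_weight=0.75):
    """Build a k x cells weight matrix from hard cluster labels."""
    assignments = np.asarray(assignments)
    cells = assignments.shape[0]
    if k == 1:
        return np.ones((1, cells))
    # Remaining weight is shared equally by the k - 1 non-assigned clusters (assumption).
    off_weight = (1.0 - max_assign_weight) / (k - 1)
    W = np.full((k, cells), off_weight)
    W[assignments, np.arange(cells)] = max_assign_weight
    return W

init_W = weights_from_assignments([0, 2, 1, 2], k=3)
```

Each column sums to 1, so the result can serve directly as a normalized starting point for W.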

uncurl.state_estimation.initialize_means(data, clusters, k)[source]¶

Initializes the M matrix given the data and a set of cluster labels. Cluster centers are set to the mean of each cluster.

Parameters:	data (array) – genes x cells clusters (array) – 1d array of ints (0...k-1) k (int) – number of clusters
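Setting each cluster center to the mean of its member cells can be written in a few lines of NumPy. This is a sketch, not the package's code: `cluster_means` is a hypothetical name, and leaving the column of an empty cluster at zero is an assumption.

```python
import numpy as np

def cluster_means(data, clusters, k):
    """Return a genes x k matrix whose column c is the mean of cells labeled c."""
    data = np.asarray(data, dtype=float)
    clusters = np.asarray(clusters)
    M = np.zeros((data.shape[0], k))
    for c in range(k):
        members = data[:, clusters == c]
        if members.shape[1] > 0:           # empty clusters keep a zero column (assumption)
            M[:, c] = members.mean(axis=1)
    return M

data = np.array([[1, 3, 10],
                 [2, 4, 0]])               # genes x cells
M_init = cluster_means(data, clusters=[0, 0, 1], k=2)
```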

uncurl.state_estimation.initialize_means_weights(data, clusters, init_means=None, init_weights=None, initialization='tsvd', max_assign_weight=0.75)[source]¶: Generates initial means and weights for state estimation.

uncurl.state_estimation.initialize_weights_nn(data, means, lognorm=True)[source]¶: Initializes the weights with a nearest-neighbor approach using the means.

uncurl.state_estimation.poisson_estimate_state(data, clusters, init_means=None, init_weights=None, method='NoLips', max_iters=30, tol=1e-10, disp=False, inner_max_iters=100, normalize=True, initialization='tsvd', parallel=True, threads=4, max_assign_weight=0.75, run_w_first=True, constrain_w=False, regularization=0.0)[source]¶

Uses a Poisson Convex Mixture model to estimate cell states and cell state mixing weights.

To lower computational costs, use a sparse matrix, set disp to False, and set tol to 0.

Parameters:

data (array) – genes x cells array or sparse matrix.
clusters (int) – number of mixture components
init_means (array, optional) – initial centers - genes x clusters. Default: from Poisson kmeans
init_weights (array, optional) – initial weights - clusters x cells, or assignments as produced by clustering. Default: from Poisson kmeans
method (str, optional) – optimization method. Current options are ‘NoLips’ and ‘L-BFGS-B’. Default: ‘NoLips’.
max_iters (int, optional) – maximum number of iterations. Default: 30
tol (float, optional) – if both M and W change by less than tol (RMSE), then the iteration is stopped. Default: 1e-10
disp (bool, optional) – whether or not to display optimization progress. Default: False
inner_max_iters (int, optional) – Number of iterations to run in the optimization subroutine for M and W. Default: 100
normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
initialization (str, optional) – If initial means and weights are not provided, this describes how they are initialized. Options: ‘cluster’ (poisson cluster for means and weights), ‘kmpp’ (kmeans++ for means, random weights), ‘km’ (regular k-means), ‘tsvd’ (tsvd(50) + k-means). Default: tsvd.
parallel (bool, optional) – Whether to use parallel updates (sparse NoLips only). Default: True
threads (int, optional) – How many threads to use in the parallel computation. Default: 4
max_assign_weight (float, optional) – If using a clustering-based initialization, how much weight to assign to the max weight cluster. Default: 0.75
run_w_first (bool, optional) – Whether or not to optimize W first (if false, M will be optimized first). Default: True
constrain_w (bool, optional) – If True, then W is normalized after every iteration. Default: False
regularization (float, optional) – Regularization coefficient for M and W. Default: 0 (no regularization).

Returns:

M (array) – genes x clusters - state means
W (array) – clusters x cells - state mixing components for each cell
ll (float) – final log-likelihood

uncurl.nmf_wrapper module¶

uncurl.nmf_wrapper.log_norm_nmf(data, k, normalize_w=True, return_cost=True, init_weights=None, init_means=None, **kwargs)[source]¶

Parameters:

data (array) – dense or sparse array with shape (genes, cells)
k (int) – number of cell types
normalize_w (bool, optional) – True if W should be normalized (so that each column sums to 1). Default: True
return_cost (bool, optional) – True if the NMF objective value (squared error) should be returned. Default: True
init_weights (array, optional) – Initial value for W. Default: None
init_means (array, optional) – Initial value for M. Default: None
**kwargs – misc arguments to NMF

Returns:

Two matrices M of shape (genes, k) and W of shape (k, cells). They correspond to M and W in Poisson state estimation. If return_cost is True (which it is by default), then the cost will also be returned. This might be prohibitively costly.

uncurl.nmf_wrapper.nmf_init(data, clusters, k, init='enhanced')[source]¶

Generates initial M and W given a data set and an array of cluster labels.

There are 3 options for init:

enhanced - uses EIn-NMF from Gong 2013
basic - uses means for M, assigns W such that the chosen cluster for a given cell has value 0.75 and all others have 0.25/(k-1).
nmf - uses means for M, and assigns W using the NMF objective while holding M constant.

uncurl.nmf_wrapper.norm_nmf(data, k, init_weights=None, init_means=None, normalize_w=True, **kwargs)[source]¶

Parameters:

data (array) – dense or sparse array with shape (genes, cells)
k (int) – number of cell types
normalize_w (bool) – True if W should be normalized (so that each column sums to 1)
init_weights (array, optional) – Initial value for W. Default: None
init_means (array, optional) – Initial value for M. Default: None
**kwargs – misc arguments to NMF

Returns:	Two matrices M of shape (genes, k) and W of shape (k, cells)

uncurl.qual2quant module¶

uncurl.qual2quant.binarize(qualitative)[source]¶: binarizes an expression dataset.

uncurl.qual2quant.poisson_test(data1, data2, smoothing=1e-05, return_pval=True)[source]¶

Returns a p-value for the ratio of the means of two poisson-distributed datasets.

Source: http://ncss.wpengine.netdna-cdn.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_Two_Poisson_Means.pdf

Gu, K., Ng, H.K.T., Tang, M.L., and Schucany, W. 2008. ‘Testing the Ratio of Two Poisson Rates.’ Biometrical Journal, 50, 2, 283-298

Based on W2

Parameters:

data1 (array) – 1d array of floats - first distribution
data2 (array) – 1d array of floats - second distribution
smoothing (float) – number to add to each of the datasets
return_pval (bool) – True to return p value; False to return test statistic. Default: True

uncurl.qual2quant.qualNorm(data, qualitative)[source]¶

Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.

Parameters:	data (array) – 2d array of genes x cells qualitative (array) – 2d array of numerical data - genes x clusters
Returns:	Array of starting positions for state estimation or clustering, with shape genes x clusters

uncurl.qual2quant.qualNormGaussian(data, qualitative)[source]¶

Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.

Parameters:	data (array) – 2d array of genes x cells qualitative (array) – 2d array of numerical data - genes x clusters
Returns:	Array of starting positions for state estimation or clustering, with shape genes x clusters

uncurl.qual2quant.qualNorm_filter_genes(data, qualitative, pval_threshold=0.05, smoothing=1e-05, eps=1e-05)[source]¶: Does qualNorm but returns a filtered gene set, based on a p-value threshold.

uncurl.clustering module¶

uncurl.clustering.kmeans_pp(data, k, centers=None)[source]¶

Generates kmeans++ initial centers.

Parameters:	data (array) – A 2d array- genes x cells k (int) – Number of clusters centers (array, optional) – if provided, these are one or more known cluster centers. 2d array of genes x number of centers (<=k).
Returns:	centers - a genes x k array of cluster means. assignments - a cells x 1 array of cluster assignments
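For orientation, k-means++ seeding can be sketched as below. This is a generic version under stated assumptions, not the package's routine: `kmeans_pp_seed` is a hypothetical name, squared Euclidean distance is assumed, and the optional pre-specified centers are not handled.

```python
import numpy as np

def kmeans_pp_seed(data, k, seed=0):
    """Pick k columns of `data` as centers with k-means++ sampling."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    genes, cells = data.shape
    centers = np.zeros((genes, k))
    centers[:, 0] = data[:, rng.integers(cells)]          # first center: uniform draw
    # Squared distance from each cell to its nearest chosen center so far.
    d2 = np.sum((data - centers[:, [0]]) ** 2, axis=0)
    for c in range(1, k):
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(cells, 1.0 / cells)
        centers[:, c] = data[:, rng.choice(cells, p=probs)]
        d2 = np.minimum(d2, np.sum((data - centers[:, [c]]) ** 2, axis=0))
    # Assign every cell to its closest center (k x cells distance table).
    dists = ((data[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
    return centers, dists.argmin(axis=0)

data = np.array([[0, 0, 1, 10, 10, 11],
                 [0, 1, 0, 10, 11, 10]])                  # genes x cells
centers, assignments = kmeans_pp_seed(data, k=2)
```

Sampling proportionally to the squared distance makes far-away cells more likely to become the next center, which tends to spread the initial centers across the data.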

uncurl.clustering.poisson_cluster(data, k, init=None, max_iters=100)[source]¶

Performs Poisson hard EM on the given data.

Parameters:

data (array) – A 2d array- genes x cells. Can be dense or sparse; for best performance, sparse matrices should be in CSC format.
k (int) – Number of clusters
init (array, optional) – Initial centers - genes x k array. Default: None, use kmeans++
max_iters (int, optional) – Maximum number of iterations. Default: 100

Returns:	a cells x 1 vector of cluster assignments, and a genes x k array of cluster means.
Return type:	a tuple of two arrays
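Hard EM under a Poisson model alternates between assigning each cell to its most likely cluster and recomputing cluster means. The dense NumPy sketch below illustrates that loop. The name `poisson_hard_em`, the `eps` smoothing inside the logarithm, and the requirement to pass initial centers explicitly are assumptions made for this example.

```python
import numpy as np

def poisson_hard_em(data, init_centers, max_iters=100, eps=1e-10):
    """Alternate hard assignments and mean updates under a Poisson likelihood."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    k = centers.shape[1]
    assignments = np.full(data.shape[1], -1)
    for _ in range(max_iters):
        # cells x k log-likelihood; log(x!) is dropped because it is the same for all clusters.
        ll = data.T @ np.log(centers + eps) - centers.sum(axis=0)
        new_assignments = ll.argmax(axis=1)
        if np.array_equal(new_assignments, assignments):
            break                                   # assignments stopped changing
        assignments = new_assignments
        for c in range(k):
            members = data[:, assignments == c]
            if members.shape[1] > 0:
                centers[:, c] = members.mean(axis=1)
    return assignments, centers

data = np.array([[10, 12, 11, 1, 0, 2],
                 [1, 0, 2, 9, 11, 10]])             # genes x cells
labels, means = poisson_hard_em(data, init_centers=[[8, 2], [2, 8]])
```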

uncurl.dimensionality_reduction module¶

uncurl.dimensionality_reduction.diffusion_mds(means, weights, d, diffusion_rounds=10)[source]¶

Dimensionality reduction using MDS, while running diffusion on W.

Parameters:	means (array) – genes x clusters weights (array) – clusters x cells d (int) – desired dimensionality
Returns:	W_reduced (array) – array of shape (d, cells)

uncurl.dimensionality_reduction.dim_reduce(means, weights, d)[source]¶

Dimensionality reduction using Poisson distances and MDS.

Parameters:	means (array) – genes x clusters weights (array) – clusters x cells d (int) – desired dimensionality
Returns:	X, a clusters x d matrix representing the reduced dimensions of the cluster centers.

uncurl.dimensionality_reduction.dim_reduce_data(data, d)[source]¶

Does a MDS on the data directly, not on the means.

Parameters:	data (array) – genes x cells d (int) – desired dimensionality
Returns:	X, a cells x d matrix

uncurl.dimensionality_reduction.mds(means, weights, d)[source]¶

Dimensionality reduction using MDS.

Parameters:	means (array) – genes x clusters weights (array) – clusters x cells d (int) – desired dimensionality
Returns:	W_reduced (array) – array of shape (d, cells)

uncurl.evaluation module¶

uncurl.evaluation.mdl(ll, k, data)[source]¶

Returns the minimum description length score of the model given its log-likelihood and k, the number of cell types.

a lower cost is better...

uncurl.evaluation.nne(dim_red, true_labels)[source]¶

Calculates the nearest neighbor accuracy (basically leave-one-out cross validation with a 1NN classifier).

Parameters:	dim_red (array) – dimensions (k, cells) true_labels (array) – 1d array of integers
Returns:	Nearest neighbor accuracy - fraction of points for which the 1NN classifier returns the correct value.
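Leave-one-out 1NN accuracy can be computed directly from pairwise distances, as in this sketch. The function name `nearest_neighbor_accuracy` is hypothetical and Euclidean distance is assumed.

```python
import numpy as np

def nearest_neighbor_accuracy(dim_red, true_labels):
    """Fraction of cells whose nearest other cell carries the same label."""
    X = np.asarray(dim_red, dtype=float).T           # cells x k
    labels = np.asarray(true_labels)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                     # a cell may not be its own neighbor
    nearest = d2.argmin(axis=1)
    return float(np.mean(labels[nearest] == labels))

points = np.array([[0.0, 0.1, 5.0, 5.1],
                   [0.0, 0.0, 5.0, 5.0]])            # k x cells
acc_good = nearest_neighbor_accuracy(points, [0, 0, 1, 1])
acc_bad = nearest_neighbor_accuracy(points, [0, 1, 0, 1])
```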

uncurl.evaluation.purity(labels, true_labels)[source]¶

Calculates the purity score for the given labels.

Parameters:	labels (array) – 1D array of integers true_labels (array) – 1D array of integers - true labels
Returns:	purity score - a float between 0 and 1. Closer to 1 is better.
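Purity is conventionally defined as the fraction of points that belong to the majority true class of their assigned cluster. A minimal sketch of that definition follows; `purity_score` is a hypothetical name and this is not necessarily how the package computes it.

```python
import numpy as np

def purity_score(labels, true_labels):
    """Sum of per-cluster majority-class counts divided by the number of points."""
    labels = np.asarray(labels)
    true_labels = np.asarray(true_labels)
    correct = 0
    for c in np.unique(labels):
        members = true_labels[labels == c]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()                  # size of the majority class in cluster c
    return correct / len(labels)

score = purity_score([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

In this example cluster 0 contains two points of class 0 and one of class 1, and cluster 1 is pure, so five of six points count as correct.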

uncurl.experiment_runner module¶

class uncurl.experiment_runner.Argmax(n_classes, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

run(data)[source]¶

class uncurl.experiment_runner.BasicNMF(return_h=True, return_w=False, return_mds=False, return_wh=False, **params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs NMF on data, returning H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]¶

class uncurl.experiment_runner.Bicluster(n_classes, n_gene_classes=10, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

Spectral Biclustering

run(data)[source]¶

class uncurl.experiment_runner.Cluster(n_classes, **params)[source]¶

Bases: object

Clustering methods take in a matrix of shape k x cells, and return an array of integers in (0, n_classes-1).

They should be able to run on the output of pre-processing...

run(data)[source]¶

class uncurl.experiment_runner.Cocluster(n_classes, n_gene_classes=10, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

Spectral Coclustering

run(data)[source]¶

class uncurl.experiment_runner.DBScan(n_classes, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

dbscan clustering

run(data)[source]¶

class uncurl.experiment_runner.EnsembleClusterPoissonSE(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs Poisson state estimation initialized from the consensus of 10 runs of Poisson KM.

params: k - dimensionality

run(data)[source]¶

class uncurl.experiment_runner.EnsembleNMF(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs Ensemble NMF on log(data+1), returning the consensus results for H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]¶

class uncurl.experiment_runner.EnsembleTSVDPoissonSE(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs Poisson state estimation initialized from 8 runs of tsvd-km.

params: k - dimensionality

run(data)[source]¶

class uncurl.experiment_runner.EnsembleTsneLightLDASE(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs tsne-based LightLDA Poisson state estimation

run(data)[source]¶

class uncurl.experiment_runner.EnsembleTsneNMF(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs tsne-based ensemble NMF

run(data)[source]¶

class uncurl.experiment_runner.EnsembleTsnePoissonSE(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs tsne-based ensemble Poisson state estimation

run(data)[source]¶

class uncurl.experiment_runner.KFoldNMF(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs K-fold ensemble NMF on log(data+1), returning the consensus results for H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]¶

class uncurl.experiment_runner.KM(n_classes, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

k-means clustering

run(data)[source]¶

class uncurl.experiment_runner.LightLDASE(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs LightLDA State Estimation, returning W and MW. Requires a ‘k’ parameter.

run(data)[source]¶

class uncurl.experiment_runner.LoadPreproc(datasets, **params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Takes a preprocessed data matrix and simply returns it when run is called.

run(data)[source]¶

class uncurl.experiment_runner.Log(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Takes the natural log of the data+1.

run(data)[source]¶

class uncurl.experiment_runner.LogNMF(return_h=True, return_w=False, return_mds=False, return_wh=False, **params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs NMF on log(normalize(data)+1), returning H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]¶

class uncurl.experiment_runner.LogNorm(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

First, normalizes the counts per cell, and then takes log(normalized_counts+1).

run(data)[source]¶

class uncurl.experiment_runner.Magic(use_magic=True, use_tsne=False, use_pca=False, **params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

run(data)[source]¶

class uncurl.experiment_runner.PLDASE(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs PLDA State Estimation, returning W and MW. Requires a ‘k’ parameter.

run(data)[source]¶

class uncurl.experiment_runner.Pca(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

PCA preprocessing

run(data)[source]¶

class uncurl.experiment_runner.PcaKm(n_classes, use_log=False, name='pca_km', **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

PCA + kmeans

Requires a parameter k, where k is the dimensionality of PCA.

run(data)[source]¶

class uncurl.experiment_runner.PoissonCluster(n_classes, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

Poisson k-means clustering

run(data)[source]¶

class uncurl.experiment_runner.PoissonSE(return_w=True, return_m=False, return_mw=False, return_mds=False, normalize_data=False, **params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs Poisson State Estimation, returning W and MW.

Requires a ‘k’ parameter.

Optional args: return_m=True: returns M in outputs return_mw=True: returns MW in outputs

run(data)[source]¶

Returns:	list of W, M*W; ll

class uncurl.experiment_runner.Preprocess(**params)[source]¶

Bases: object

Pre-processing methods take in a genes x cells data matrix of integer counts, and return a k x cells matrix, where k <= genes.

Preprocessing methods can return multiple outputs. the outputs are

If k=2, then the method can be used for visualization...

This class represents a ‘blank’ preprocessing.

run(data)[source]¶

should return a list of output matrices of the same length as self.output_names, and an objective value.

data is of shape (genes, cells).

class uncurl.experiment_runner.Simlr(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

run(data)[source]¶

class uncurl.experiment_runner.SimlrKm(n_classes, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

Fast minibatch Kmeans from the simlr library

run(data)[source]¶

class uncurl.experiment_runner.SimlrSmall(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Simlr for small-scale datasets (no PCA preprocessing)

run(data)[source]¶

class uncurl.experiment_runner.TSVD(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

Runs truncated SVD on the data. The input param k is the number of dimensions.

run(data)[source]¶

class uncurl.experiment_runner.Tsne(metric='euclidean', **params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

2d tsne dimensionality reduction - tsne always uses 2d

metric is a string that could be any metric usable with tsne, or ‘kld’ or ‘jensen-shannon’

run(data)[source]¶

class uncurl.experiment_runner.TsneKm(n_classes, use_log=False, name='tsne_km', metric='euclidean', use_exp=False, **params)[source]¶

Bases: uncurl.experiment_runner.Cluster

TSNE(2) + Kmeans

run(data)[source]¶

class uncurl.experiment_runner.Zifa(**params)[source]¶

Bases: uncurl.experiment_runner.Preprocess

ZIFA preprocessing

run(data)[source]¶

uncurl.experiment_runner.generate_visualizations(methods, data, true_labels, base_dir='visualizations', figsize=(18, 10), **scatter_options)[source]¶

Generates visualization scatters for all the methods.

Parameters:

methods – follows same format as run_experiment. List of tuples.
data – genes x cells
true_labels – array of integers
base_dir – base directory to save all the plots
figsize – tuple of ints representing size of figure
scatter_options – options for plt.scatter

uncurl.experiment_runner.run_experiment(methods, data, n_classes, true_labels, n_runs=10, use_purity=True, use_nmi=False, use_ari=False, use_nne=False, consensus=False)[source]¶

runs a pre-processing + clustering experiment...

exactly one of use_purity, use_nmi, or use_ari can be true

Parameters:

methods – list of 2-tuples. The first element is either a single Preprocess object or a list of Preprocess objects, to be applied in sequence to the data. The second element is either a single Cluster object, a list of Cluster objects, or a list of lists, where each list is a sequence of Preprocess objects with the final element being a Cluster object.
data – genes x cells array
true_labels – 1d array of length cells
consensus – if true, runs a consensus on cluster results for each method at the very end.
use_purity, use_nmi, use_ari, use_nne – which error metric to use (at most one can be True)

Returns:

purities (list of lists)
names (list of lists)
other (dict) – keys: timing, preprocessing, clusterings

uncurl.lineage module¶

uncurl.lineage.fourier_series(x, *a)[source]¶

Arbitrary dimensionality fourier series.

The first parameter is a_0, and the second parameter is the interval/scale parameter.

The remaining parameters are alternating sin and cos parameters.

n = (len(a)-2)/2
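One way to read this parameter layout is sketched below. Several details are assumptions, since the description does not pin them down: the sin coefficient is taken to come before the cos coefficient in each pair, and the scale parameter is taken to multiply x inside the trigonometric terms. The name `fourier_series_sketch` is hypothetical.

```python
import numpy as np

def fourier_series_sketch(x, *a):
    """a[0] = constant term, a[1] = scale, then (sin, cos) coefficient pairs."""
    x = np.asarray(x, dtype=float)
    a0, scale = a[0], a[1]
    n_terms = (len(a) - 2) // 2                # n = (len(a) - 2) / 2
    out = np.full_like(x, a0)
    for i in range(n_terms):
        sin_coef, cos_coef = a[2 + 2 * i], a[3 + 2 * i]
        freq = (i + 1) * scale                 # assumed role of the scale parameter
        out = out + sin_coef * np.sin(freq * x) + cos_coef * np.cos(freq * x)
    return out

# One harmonic: 1 + 3*sin(2x) + 4*cos(2x), evaluated at x = 0 and x = pi/4.
values = fourier_series_sketch([0.0, np.pi / 4], 1.0, 2.0, 3.0, 4.0)
```

The `*a` signature matches the calling convention of curve-fitting routines that pass the fitted parameters as positional arguments.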

uncurl.lineage.graph_distances(start, edges, distances)[source]¶

Given an undirected adjacency list and a pairwise distance matrix between all nodes: calculates distances along graph from start node.

Parameters:	start (int) – start node edges (list) – adjacency list of tuples distances (array) – 2d array of distances between nodes
Returns:	dict of node to distance from start
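Accumulating distances along graph edges from a start node can be sketched with a standard shortest-path traversal. The version below uses Dijkstra's algorithm, which also covers the tree case. The function name `distances_along_graph` is hypothetical and the code is not taken from the package.

```python
import heapq
from collections import defaultdict

def distances_along_graph(start, edges, distances):
    """Return {node: path length from start}, walking only along the given edges."""
    neighbors = defaultdict(list)
    for a, b in edges:                         # undirected: register both directions
        neighbors[a].append(b)
        neighbors[b].append(a)
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best[node]:
            continue                           # stale queue entry
        for nxt in neighbors[node]:
            cand = d + float(distances[node][nxt])
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return best

pairwise = [[0, 1, 5],
            [1, 0, 2],
            [5, 2, 0]]
path_lengths = distances_along_graph(0, [(0, 1), (1, 2)], pairwise)
```

Node 2 is reached through node 1, so its distance is 1 + 2 rather than the direct pairwise distance of 5.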

uncurl.lineage.lineage(means, weights, curve_function='poly', curve_dimensions=6)[source]¶

Lineage graph produced by minimum spanning tree

Parameters:

means (array) – genes x clusters - output of state estimation
weights (array) – clusters x cells - output of state estimation
curve_function (string) – either ‘poly’ or ‘fourier’. Default: ‘poly’
curve_dimensions (int) – number of parameters for the curve. Default: 6

Returns:

curve parameters – list of lists for each cluster
smoothed data in 2d space – 2 x cells
list of edges – pairs of cell indices
cell cluster assignments – list of ints

uncurl.lineage.poly_curve(x, *a)[source]¶: Arbitrary dimension polynomial.

uncurl.lineage.pseudotime(starting_node, edges, fitted_vals)[source]¶

Parameters:	starting_node (int) – index of the starting node edges (list) – list of tuples (node1, node2) fitted_vals (array) – output of lineage (2 x cells)
Returns:	A 1d array containing the pseudotime value of each cell.

uncurl.nb_cluster module¶

uncurl.nb_state_estimation module¶

uncurl.nb_state_estimation.nb_estimate_state(data, clusters, R=None, init_means=None, init_weights=None, max_iters=10, tol=0.0001, disp=True, inner_max_iters=400, normalize=True)[source]¶

Uses a Negative Binomial Mixture model to estimate cell states and cell state mixing weights.

If some of the genes do not fit a negative binomial distribution (mean > var), then the genes are discarded from the analysis.

Parameters:

data (array) – genes x cells
clusters (int) – number of mixture components
R (array, optional) – vector of length genes containing the dispersion estimates for each gene. Default: use nb_fit
init_means (array, optional) – initial centers - genes x clusters. Default: kmeans++ initializations
init_weights (array, optional) – initial weights - clusters x cells. Default: random(0,1)
max_iters (int, optional) – maximum number of iterations. Default: 10
tol (float, optional) – if both M and W change by less than tol (in RMSE), then the iteration is stopped. Default: 1e-4
disp (bool, optional) – whether or not to display optimization parameters. Default: True
inner_max_iters (int, optional) – Number of iterations to run in the scipy minimizer for M and W. Default: 400
normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.

Returns:

M (array) – genes x clusters - state centers
W (array) – clusters x cells - state mixing components for each cell
R (array) – 1 x genes - NB dispersion parameter for each gene
ll (float) – Log-likelihood of final iteration

uncurl.pois_ll module¶

uncurl.pois_ll.poisson_dist(p1, p2)[source]¶

Calculates the Poisson distance between two vectors.

p1 can be a sparse matrix, while p2 has to be a dense matrix.

uncurl.pois_ll.poisson_ll(data, means)[source]¶

Calculates the Poisson log-likelihood.

Parameters:	data (array) – 2d numpy array of genes x cells means (array) – 2d numpy array of genes x k
Returns:	cells x k array of log-likelihood for each cell/cluster pair
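A cells x k table of Poisson log-likelihoods can be computed as in the following sketch. It is an illustration under assumptions: the name `poisson_log_likelihood` is hypothetical, a small `eps` guards the logarithm against zero means, and the log(x!) term is included here although an implementation used only for comparing clusters could omit it.

```python
import math
import numpy as np

def poisson_log_likelihood(data, means, eps=1e-10):
    """Entry (i, c) is the sum over genes of log Poisson(data[g, i]; means[g, c])."""
    data = np.asarray(data, dtype=float)
    means = np.asarray(means, dtype=float)
    # log(x!) summed over genes, one value per cell.
    log_fact = np.vectorize(math.lgamma)(data + 1.0).sum(axis=0)
    ll = data.T @ np.log(means + eps) - means.sum(axis=0)    # cells x k
    return ll - log_fact[:, None]

data = np.array([[2], [0]])            # 2 genes x 1 cell
means = np.array([[2.0], [1.0]])       # 2 genes x 1 cluster
ll = poisson_log_likelihood(data, means)
```

For this single cell the value is log P(2; 2) + log P(0; 1) = (log 2 - 2) + (-1).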

uncurl.pois_ll.poisson_ll_2(p1, p2)[source]¶: Calculates Poisson LL(p1|p2).

uncurl.pois_ll.sparse_poisson_ll(data, means)[source]¶

uncurl.simulation module¶

uncurl.simulation.generate_nb_data(P, R, n_cells, assignments=None)[source]¶

Generates negative binomial data

Parameters:

P (array) – genes x clusters
R (array) – genes x clusters
n_cells (int) – number of cells
assignments (list) – cluster assignment of each cell. Default: random uniform

Returns:	data array with shape genes x cells labels - array of cluster labels

uncurl.simulation.generate_nb_state_data(means, weights, R)[source]¶

Generates data according to the Negative Binomial Convex Mixture Model.

Parameters:	means (array) – Cell types- genes x clusters weights (array) – Cell cluster assignments- clusters x cells R (array) – dispersion parameter - 1 x genes
Returns:	data matrix - genes x cells

uncurl.simulation.generate_nb_states(n_states, n_cells, n_genes)[source]¶

Generates means and weights for the Negative Binomial Mixture Model. Weights are distributed Dirichlet(1,1,...), means are rand(0, 1). Returned values can be passed to generate_nb_state_data(M, W, R).

Parameters:	n_states (int) – number of states or clusters n_cells (int) – number of cells n_genes (int) – number of genes
Returns:	M - genes x clusters W - clusters x cells R - genes x 1 - randint(1, 100)

uncurl.simulation.generate_poisson_data(centers, n_cells, cluster_probs=None)[source]¶

Generates poisson-distributed data, given a set of means for each cluster.

Parameters:	centers (array) – genes x clusters matrix n_cells (int) – number of output cells cluster_probs (array) – prior probability for each cluster. Default: uniform.
Returns:	output - array with shape genes x n_cells labels - array of cluster labels

uncurl.simulation.generate_poisson_lineage(n_states, n_cells_per_cluster, n_genes, means=300)[source]¶

Generates a lineage for each state- assumes that each state has a common ancestor.

Returns:	M - genes x clusters W - clusters x cells

uncurl.simulation.generate_poisson_states(n_states, n_cells, n_genes)[source]¶

Generates means and weights for the Poisson Convex Mixture Model. Weights are distributed Dirichlet(1,1,...), means are rand(0, 100). Returned values can be passed to generate_state_data(M, W).

Parameters:	n_states (int) – number of states or clusters n_cells (int) – number of cells n_genes (int) – number of genes
Returns:	M - genes x clusters W - clusters x cells

uncurl.simulation.generate_state_data(means, weights)[source]¶

Generates data according to the Poisson Convex Mixture Model.

Parameters:	means (array) – Cell types- genes x clusters weights (array) – Cell cluster assignments- clusters x cells
Returns:	data matrix - genes x cells
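Read together with generate_poisson_states, the Poisson Convex Mixture Model amounts to drawing each count from a Poisson distribution whose rate is the corresponding entry of M times W. The NumPy sketch below follows the distributions stated in the descriptions (Dirichlet(1,1,...) weights, uniform means on (0, 100)) and is not the package's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_cells, n_genes = 3, 20, 10

# Means: genes x clusters, uniform on (0, 100).
M = rng.uniform(0, 100, size=(n_genes, n_states))
# Weights: clusters x cells, each column drawn from Dirichlet(1, 1, ...).
W = rng.dirichlet(np.ones(n_states), size=n_cells).T
# Data: genes x cells, each entry Poisson with rate (M @ W)[gene, cell].
data = rng.poisson(M @ W)
```

Because every column of W sums to 1, each cell's expected expression profile is a convex combination of the columns of M.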

uncurl.simulation.generate_zip_data(M, L, n_cells, cluster_probs=None)[source]¶

Generates zero-inflated poisson-distributed data, given a set of means and zero probs for each cluster.

Parameters:

M (array) – genes x clusters matrix
L (array) – genes x clusters matrix - zero-inflation parameters
n_cells (int) – number of output cells
cluster_probs (array) – prior probability for each cluster. Default: uniform.

Returns:	output - array with shape genes x n_cells labels - array of cluster labels

uncurl.simulation.generate_zip_state_data(means, weights, z)[source]¶

Generates data according to the Zero-inflated Poisson Convex Mixture Model.

Parameters:	means (array) – Cell types- genes x clusters weights (array) – Cell cluster assignments- clusters x cells z (float) – zero-inflation parameter
Returns:	data matrix - genes x cells

Module contents¶
