uncurl package¶
Submodules¶
uncurl.preprocessing module¶
Misc functions...
-
uncurl.preprocessing.
cell_normalize
(data)[source]¶ Returns the data where the expression is normalized so that the total count per cell is equal.
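The normalization described above can be illustrated with a minimal NumPy sketch. It is not the uncurl implementation: the function name `normalize_cells_sketch` is hypothetical, and using the median per-cell total as the common target is an assumption made only for this example.

```python
import numpy as np

def normalize_cells_sketch(data):
    """Scale each column (cell) so that all cells share the same total count."""
    data = np.asarray(data, dtype=float)
    totals = data.sum(axis=0)                 # total count per cell
    target = np.median(totals)                # assumed common total (hypothetical choice)
    # Guard against empty cells to avoid division by zero.
    safe_totals = np.where(totals > 0, totals, 1.0)
    return data / safe_totals * target

# genes x cells count matrix with per-cell totals 4, 8 and 6
counts = np.array([[1, 4, 0],
                   [3, 4, 6]])
normalized = normalize_cells_sketch(counts)
```

After the call, every column of `normalized` sums to the same value while the relative expression within each cell is unchanged.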
-
uncurl.preprocessing.
log1p
(data)[source]¶ Returns ln(data+1), whether the original data is dense or sparse.
-
uncurl.preprocessing.
max_variance_genes
(data, nbins=5, frac=0.2)[source]¶ Identifies the highest-variance genes within each of a number of bins, where genes are binned by their mean expression.
Parameters: - data (array) – genes x cells
- nbins (int) – number of bins to sort genes by mean expression level. Default: 5.
- frac (float) – fraction of genes to return per bin - between 0 and 1. Default: 0.2
Returns: list of gene indices (list of ints)
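One way to read the binning procedure is sketched below. This is an illustration under stated assumptions, not the library code: the name `max_variance_genes_sketch` is hypothetical, bins are formed by splitting the mean-sorted genes into equal-sized groups, and at least one gene is kept per bin.

```python
import numpy as np

def max_variance_genes_sketch(data, nbins=5, frac=0.2):
    """Pick the top `frac` highest-variance genes within each mean-expression bin."""
    data = np.asarray(data, dtype=float)      # genes x cells
    means = data.mean(axis=1)
    variances = data.var(axis=1)
    order = np.argsort(means)                 # gene indices sorted by mean expression
    selected = []
    for bin_genes in np.array_split(order, nbins):
        if len(bin_genes) == 0:
            continue
        n_keep = max(1, int(frac * len(bin_genes)))      # assumed rounding rule
        by_variance = np.argsort(variances[bin_genes])[::-1]
        selected.extend(int(g) for g in bin_genes[by_variance[:n_keep]])
    return selected

rng = np.random.default_rng(0)
data = rng.poisson(5, size=(20, 30))          # 20 genes, 30 cells
selected = max_variance_genes_sketch(data, nbins=5, frac=0.5)
```

With 20 genes, 5 bins, and `frac=0.5`, each bin holds 4 genes and contributes 2 of them.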
uncurl.run_se module¶
-
uncurl.run_se.
run_state_estimation
(data, clusters, dist='Poiss', reps=1, **kwargs)[source]¶ Runs state estimation for multiple initializations, returning the result with the highest log-likelihood. All the arguments are passed to the underlying state estimation functions (poisson_estimate_state, nb_estimate_state, zip_estimate_state).
Parameters: - data (array) – genes x cells
- clusters (int) – number of mixture components
- dist (str, optional) – Distribution used in state estimation. Options: ‘Poiss’, ‘NB’, ‘ZIP’, ‘LogNorm’, ‘Gaussian’. Default: ‘Poiss’
- reps (int, optional) – number of times to run the state estimation, taking the result with the highest log-likelihood.
- **kwargs – arguments to pass to the underlying state estimation function.
Returns: - M (array) – genes x clusters - state means
- W (array) – clusters x cells - state mixing components for each cell
- ll (float) – final log-likelihood
uncurl.state_estimation module¶
-
uncurl.state_estimation.
initialize_from_assignments
(assignments, k, max_assign_weight=0.75)[source]¶ Creates a weight initialization matrix from Poisson clustering assignments.
Parameters: - assignments (array) – 1D array of integers, of length cells
- k (int) – number of states/clusters
- max_assign_weight (float, optional) – between 0 and 1 - how much weight to assign to the highest cluster. Default: 0.75
Returns: init_W (array) – k x cells
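A minimal sketch of such a weight initialization follows. It assumes k >= 2 and that the weight left over after `max_assign_weight` is spread evenly over the other clusters; the name `weights_from_assignments_sketch` is hypothetical and the library may distribute the remainder differently.

```python
import numpy as np

def weights_from_assignments_sketch(assignments, k, max_assign_weight=0.75):
    """Build a k x cells weight matrix from hard cluster assignments (assumes k >= 2)."""
    assignments = np.asarray(assignments, dtype=int)
    n_cells = assignments.shape[0]
    # Assumption: remaining weight is shared equally by the other k-1 clusters.
    rest = (1.0 - max_assign_weight) / (k - 1)
    init_W = np.full((k, n_cells), rest)
    init_W[assignments, np.arange(n_cells)] = max_assign_weight
    return init_W

init_W = weights_from_assignments_sketch([0, 2, 1, 2], k=3)
```

Each column of `init_W` sums to 1, with the assigned cluster holding the largest weight.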
-
uncurl.state_estimation.
initialize_means
(data, clusters, k)[source]¶ Initializes the M matrix given the data and a set of cluster labels. Cluster centers are set to the mean of each cluster.
Parameters: - data (array) – genes x cells
- clusters (array) – 1d array of ints (0...k-1)
- k (int) – number of clusters
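Setting each cluster center to the mean of its member cells can be sketched as below. The name `cluster_means_sketch` is hypothetical, and leaving empty clusters at zero is an assumption of this example rather than documented behavior.

```python
import numpy as np

def cluster_means_sketch(data, clusters, k):
    """Return a genes x k matrix whose columns are the per-cluster mean profiles."""
    data = np.asarray(data, dtype=float)      # genes x cells
    clusters = np.asarray(clusters)
    means = np.zeros((data.shape[0], k))
    for c in range(k):
        members = data[:, clusters == c]
        if members.shape[1] > 0:              # empty clusters stay at zero (assumption)
            means[:, c] = members.mean(axis=1)
    return means

data = np.array([[1.0, 3.0, 10.0],
                 [2.0, 4.0, 20.0]])
means = cluster_means_sketch(data, [0, 0, 1], k=2)
```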
-
uncurl.state_estimation.
initialize_means_weights
(data, clusters, init_means=None, init_weights=None, initialization='tsvd', max_assign_weight=0.75)[source]¶ Generates initial means and weights for state estimation.
-
uncurl.state_estimation.
initialize_weights_nn
(data, means, lognorm=True)[source]¶ Initializes the weights with a nearest-neighbor approach using the means.
-
uncurl.state_estimation.
poisson_estimate_state
(data, clusters, init_means=None, init_weights=None, method='NoLips', max_iters=30, tol=1e-10, disp=False, inner_max_iters=100, normalize=True, initialization='tsvd', parallel=True, threads=4, max_assign_weight=0.75, run_w_first=True, constrain_w=False, regularization=0.0)[source]¶ Uses a Poisson Convex Mixture model to estimate cell states and cell state mixing weights.
To lower computational costs, use a sparse matrix, set disp to False, and set tol to 0.
Parameters: - data (array) – genes x cells array or sparse matrix.
- clusters (int) – number of mixture components
- init_means (array, optional) – initial centers - genes x clusters. Default: from Poisson kmeans
- init_weights (array, optional) – initial weights - clusters x cells, or assignments as produced by clustering. Default: from Poisson kmeans
- method (str, optional) – optimization method. Current options are ‘NoLips’ and ‘L-BFGS-B’. Default: ‘NoLips’.
- max_iters (int, optional) – maximum number of iterations. Default: 30
- tol (float, optional) – if both M and W change by less than tol (RMSE), then the iteration is stopped. Default: 1e-10
- disp (bool, optional) – whether or not to display optimization progress. Default: False
- inner_max_iters (int, optional) – Number of iterations to run in the optimization subroutine for M and W. Default: 100
- normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
- initialization (str, optional) – If initial means and weights are not provided, this describes how they are initialized. Options: ‘cluster’ (poisson cluster for means and weights), ‘kmpp’ (kmeans++ for means, random weights), ‘km’ (regular k-means), ‘tsvd’ (tsvd(50) + k-means). Default: tsvd.
- parallel (bool, optional) – Whether to use parallel updates (sparse NoLips only). Default: True
- threads (int, optional) – How many threads to use in the parallel computation. Default: 4
- max_assign_weight (float, optional) – If using a clustering-based initialization, how much weight to assign to the max weight cluster. Default: 0.75
- run_w_first (bool, optional) – Whether or not to optimize W first (if false, M will be optimized first). Default: True
- constrain_w (bool, optional) – If True, then W is normalized after every iteration. Default: False
- regularization (float, optional) – Regularization coefficient for M and W. Default: 0 (no regularization).
Returns: - M (array) – genes x clusters - state means
- W (array) – clusters x cells - state mixing components for each cell
- ll (float) – final log-likelihood
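To make the roles of M and W concrete, the sketch below evaluates a Poisson log-likelihood for data whose rates are the product M·W, dropping the log(x!) term that does not depend on M or W. The function name `poisson_log_likelihood_sketch` and the `eps` guard are assumptions for illustration; the exact objective optimized by the library is not shown in this reference.

```python
import numpy as np

def poisson_log_likelihood_sketch(data, M, W, eps=1e-10):
    """Poisson log-likelihood of `data` under rates M @ W, up to an additive constant."""
    rates = np.asarray(M, dtype=float) @ np.asarray(W, dtype=float) + eps
    return float(np.sum(data * np.log(rates) - rates))

M = np.array([[10.0, 1.0],
              [2.0, 8.0]])                    # genes x clusters
W = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])               # clusters x cells, columns sum to 1
data = M @ W                                  # noise-free "data" equal to the rates

ll_true = poisson_log_likelihood_sketch(data, M, W)
ll_scaled = poisson_log_likelihood_sketch(data, 1.5 * M, W)
```

Because each term x·log(r) - r is maximized at r = x, `ll_true` exceeds `ll_scaled` here.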
uncurl.nmf_wrapper module¶
-
uncurl.nmf_wrapper.
log_norm_nmf
(data, k, normalize_w=True, return_cost=True, init_weights=None, init_means=None, **kwargs)[source]¶ Parameters: - data (array) – dense or sparse array with shape (genes, cells)
- k (int) – number of cell types
- normalize_w (bool, optional) – True if W should be normalized (so that each column sums to 1). Default: True
- return_cost (bool, optional) – True if the NMF objective value (squared error) should be returned. Default: True
- init_weights (array, optional) – Initial value for W. Default: None
- init_means (array, optional) – Initial value for M. Default: None
- **kwargs – misc arguments to NMF
Returns: Two matrices M of shape (genes, k) and W of shape (k, cells). They correspond to M and W in Poisson state estimation. If return_cost is True (which it is by default), then the cost will also be returned. Computing the cost might be prohibitively expensive.
-
uncurl.nmf_wrapper.
nmf_init
(data, clusters, k, init='enhanced')[source]¶ Generates initial M and W given a data set and an array of cluster labels.
- There are 3 options for init:
- enhanced - uses EIn-NMF from Gong 2013
- basic - uses means for M, assigns W such that the chosen cluster for a given cell has value 0.75 and all others have 0.25/(k-1).
- nmf - uses means for M, and assigns W using the NMF objective while holding M constant.
-
uncurl.nmf_wrapper.
norm_nmf
(data, k, init_weights=None, init_means=None, normalize_w=True, **kwargs)[source]¶ Parameters: - data (array) – dense or sparse array with shape (genes, cells)
- k (int) – number of cell types
- normalize_w (bool) – True if W should be normalized (so that each column sums to 1)
- init_weights (array, optional) – Initial value for W. Default: None
- init_means (array, optional) – Initial value for M. Default: None
- **kwargs – misc arguments to NMF
Returns: Two matrices M of shape (genes, k) and W of shape (k, cells)
uncurl.qual2quant module¶
-
uncurl.qual2quant.
poisson_test
(data1, data2, smoothing=1e-05, return_pval=True)[source]¶ Returns a p-value for the ratio of the means of two poisson-distributed datasets.
Gu, K., Ng, H.K.T., Tang, M.L., and Schucany, W. 2008. ‘Testing the Ratio of Two Poisson Rates.’ Biometrical Journal, 50, 2, 283-298
Based on the W2 test statistic from that paper.
Parameters: - data1 (array) – 1d array of floats - first distribution
- data2 (array) – 1d array of floats - second distribution
- smoothing (float) – number to add to each of the datasets
- return_pval (bool) – True to return p value; False to return test statistic. Default: True
-
uncurl.qual2quant.
qualNorm
(data, qualitative)[source]¶ Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.
Parameters: - data (array) – 2d array of genes x cells
- qualitative (array) – 2d array of numerical data - genes x clusters
Returns: Array of starting positions for state estimation or clustering, with shape genes x clusters
-
uncurl.qual2quant.
qualNormGaussian
(data, qualitative)[source]¶ Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.
Parameters: - data (array) – 2d array of genes x cells
- qualitative (array) – 2d array of numerical data - genes x clusters
Returns: Array of starting positions for state estimation or clustering, with shape genes x clusters
uncurl.clustering module¶
-
uncurl.clustering.
kmeans_pp
(data, k, centers=None)[source]¶ Generates kmeans++ initial centers.
Parameters: - data (array) – A 2d array- genes x cells
- k (int) – Number of clusters
- centers (array, optional) – if provided, these are one or more known cluster centers. 2d array of genes x number of centers (<=k).
Returns: - centers – a genes x k array of cluster means
- assignments – a cells x 1 array of cluster assignments
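The kmeans++ seeding idea can be sketched as follows. This version uses squared Euclidean distance and a `seed` argument, both of which are assumptions for illustration (the library may use a different distance), and it returns assignments as a 1D array. The name `kmeans_pp_sketch` is hypothetical, and the sketch assumes the cells are not all identical.

```python
import numpy as np

def kmeans_pp_sketch(data, k, seed=0):
    """kmeans++ seeding on a genes x cells matrix using squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_cells = data.shape[1]
    chosen = [int(rng.integers(n_cells))]     # first center: a uniformly random cell
    for _ in range(1, k):
        centers = data[:, chosen]             # genes x (number of centers so far)
        # Squared distance from every cell to its nearest chosen center.
        d2 = ((data[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0).min(axis=1)
        # Sample the next center with probability proportional to that distance.
        chosen.append(int(rng.choice(n_cells, p=d2 / d2.sum())))
    centers = data[:, chosen]
    d2_all = ((data[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)
    assignments = d2_all.argmin(axis=1)       # nearest center for each cell
    return centers, assignments

data = np.array([[0.0, 0.0, 10.0, 10.0],
                 [0.0, 1.0, 10.0, 11.0]])
centers, assignments = kmeans_pp_sketch(data, k=2)
```

A cell that is already a center has distance zero to itself, so it is never drawn twice.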
-
uncurl.clustering.
poisson_cluster
(data, k, init=None, max_iters=100)[source]¶ Performs Poisson hard EM on the given data.
Parameters: - data (array) – A 2d array- genes x cells. Can be dense or sparse; for best performance, sparse matrices should be in CSC format.
- k (int) – Number of clusters
- init (array, optional) – Initial centers - genes x k array. Default: None, use kmeans++
- max_iters (int, optional) – Maximum number of iterations. Default: 100
Returns: a tuple of two arrays - a cells x 1 vector of cluster assignments, and a genes x k array of cluster means.
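Hard EM with a Poisson likelihood alternates between assigning each cell to its most likely cluster and recomputing cluster means. The sketch below illustrates that loop under assumptions: the name `poisson_hard_em_sketch` is hypothetical, initial centers must be supplied explicitly, and a small `eps` guards the logarithm. It is not the library's implementation.

```python
import numpy as np

def poisson_hard_em_sketch(data, k, init_centers, max_iters=100, eps=1e-10):
    """Alternate hard assignments and mean updates under a Poisson likelihood."""
    data = np.asarray(data, dtype=float)                  # genes x cells
    centers = np.asarray(init_centers, dtype=float).copy()
    assignments = np.full(data.shape[1], -1)
    for _ in range(max_iters):
        # Log-likelihood of each cell under each cluster, up to a constant: cells x k.
        ll = data.T @ np.log(centers + eps) - centers.sum(axis=0)
        new_assignments = ll.argmax(axis=1)
        if np.array_equal(new_assignments, assignments):
            break                                         # assignments stopped changing
        assignments = new_assignments
        for c in range(k):
            members = data[:, assignments == c]
            if members.shape[1] > 0:
                centers[:, c] = members.mean(axis=1)
    return assignments, centers

data = np.array([[20.0, 22.0, 1.0, 0.0],
                 [1.0, 0.0, 18.0, 21.0]])
init = np.array([[10.0, 5.0],
                 [5.0, 10.0]])
assignments, centers = poisson_hard_em_sketch(data, 2, init)
```

On this toy input the first two cells end up in cluster 0 and the last two in cluster 1.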
uncurl.dimensionality_reduction module¶
-
uncurl.dimensionality_reduction.
diffusion_mds
(means, weights, d, diffusion_rounds=10)[source]¶ Dimensionality reduction using MDS, while running diffusion on W.
Parameters: - means (array) – genes x clusters
- weights (array) – clusters x cells
- d (int) – desired dimensionality
Returns: W_reduced (array) – array of shape (d, cells)
-
uncurl.dimensionality_reduction.
dim_reduce
(means, weights, d)[source]¶ Dimensionality reduction using Poisson distances and MDS.
Parameters: - means (array) – genes x clusters
- weights (array) – clusters x cells
- d (int) – desired dimensionality
Returns: X, a clusters x d matrix representing the reduced dimensions of the cluster centers.
uncurl.evaluation module¶
-
uncurl.evaluation.
mdl
(ll, k, data)[source]¶ Returns the minimum description length score of the model given its log-likelihood and k, the number of cell types.
A lower cost is better.
-
uncurl.evaluation.
nne
(dim_red, true_labels)[source]¶ Calculates the nearest neighbor accuracy (basically leave-one-out cross validation with a 1NN classifier).
Parameters: - dim_red (array) – dimensions (k, cells)
- true_labels (array) – 1d array of integers
Returns: Nearest neighbor accuracy - fraction of points for which the 1NN classifier returns the correct value.
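Leave-one-out 1NN accuracy can be computed directly from pairwise distances, as in this sketch. The name `nearest_neighbor_accuracy_sketch` is hypothetical and Euclidean distance is an assumption; the library may use a different classifier backend.

```python
import numpy as np

def nearest_neighbor_accuracy_sketch(dim_red, true_labels):
    """Fraction of cells whose nearest other cell carries the same label."""
    points = np.asarray(dim_red, dtype=float).T           # cells x k
    labels = np.asarray(true_labels)
    diffs = points[:, None, :] - points[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)                         # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # a cell is not its own neighbor
    nearest = d2.argmin(axis=1)
    return float(np.mean(labels[nearest] == labels))

dim_red = np.array([[0.0, 0.1, 5.0, 5.2, 0.4],
                    [0.0, 0.0, 5.0, 5.0, 0.0]])           # shape (k, cells)
labels = np.array([0, 0, 1, 1, 1])
accuracy = nearest_neighbor_accuracy_sketch(dim_red, labels)
```

Here the last cell sits next to the label-0 group but is labeled 1, so four of the five cells are classified correctly.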
uncurl.experiment_runner module¶
-
class
uncurl.experiment_runner.
BasicNMF
(return_h=True, return_w=False, return_mds=False, return_wh=False, **params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs NMF on data, returning H and W*H.
Requires a ‘k’ parameter, which is the rank of the matrices.
-
class
uncurl.experiment_runner.
Bicluster
(n_classes, n_gene_classes=10, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
Spectral Biclustering
-
class
uncurl.experiment_runner.
Cluster
(n_classes, **params)[source]¶ Bases:
object
Clustering methods take in a matrix of shape k x cells, and return an array of integers in the range 0 to n_classes-1 (inclusive).
They should be able to run on the output of pre-processing...
-
class
uncurl.experiment_runner.
Cocluster
(n_classes, n_gene_classes=10, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
Spectral Coclustering
-
class
uncurl.experiment_runner.
DBScan
(n_classes, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
dbscan clustering
-
class
uncurl.experiment_runner.
EnsembleClusterPoissonSE
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs Poisson state estimation initialized from the consensus of 10 runs of Poisson KM.
params: k - dimensionality
-
class
uncurl.experiment_runner.
EnsembleNMF
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs Ensemble NMF on log(data+1), returning the consensus results for H and W*H.
Requires a ‘k’ parameter, which is the rank of the matrices.
-
class
uncurl.experiment_runner.
EnsembleTSVDPoissonSE
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs Poisson state estimation initialized from 8 runs of tsvd-km.
params: k - dimensionality
-
class
uncurl.experiment_runner.
EnsembleTsneLightLDASE
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs tsne-based LightLDA Poisson state estimation
-
class
uncurl.experiment_runner.
EnsembleTsneNMF
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs tsne-based ensemble NMF
-
class
uncurl.experiment_runner.
EnsembleTsnePoissonSE
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs tsne-based ensemble Poisson state estimation
-
class
uncurl.experiment_runner.
KFoldNMF
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs K-fold ensemble NMF on log(data+1), returning the consensus results for H and W*H.
Requires a ‘k’ parameter, which is the rank of the matrices.
-
class
uncurl.experiment_runner.
KM
(n_classes, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
k-means clustering
-
class
uncurl.experiment_runner.
LightLDASE
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs LightLDA State Estimation, returning W and MW. Requires a ‘k’ parameter.
-
class
uncurl.experiment_runner.
LoadPreproc
(datasets, **params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Takes a preprocessed data matrix and simply returns it when run is called.
-
class
uncurl.experiment_runner.
Log
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Takes the natural log of the data+1.
-
class
uncurl.experiment_runner.
LogNMF
(return_h=True, return_w=False, return_mds=False, return_wh=False, **params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs NMF on log(normalize(data)+1), returning H and W*H.
Requires a ‘k’ parameter, which is the rank of the matrices.
-
class
uncurl.experiment_runner.
LogNorm
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
First, normalizes the counts per cell, and then takes log(normalized_counts+1).
-
class
uncurl.experiment_runner.
Magic
(use_magic=True, use_tsne=False, use_pca=False, **params)[source]¶
-
class
uncurl.experiment_runner.
PLDASE
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs PLDA State Estimation, returning W and MW. Requires a ‘k’ parameter.
-
class
uncurl.experiment_runner.
Pca
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
PCA preprocessing
-
class
uncurl.experiment_runner.
PcaKm
(n_classes, use_log=False, name='pca_km', **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
PCA + kmeans
Requires a parameter k, where k is the dimensionality of PCA.
-
class
uncurl.experiment_runner.
PoissonCluster
(n_classes, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
Poisson k-means clustering
-
class
uncurl.experiment_runner.
PoissonSE
(return_w=True, return_m=False, return_mw=False, return_mds=False, normalize_data=False, **params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs Poisson State Estimation, returning W and MW.
Requires a ‘k’ parameter.
Optional args: - return_m=True – returns M in outputs
- return_mw=True – returns MW in outputs
-
class
uncurl.experiment_runner.
Preprocess
(**params)[source]¶ Bases:
object
Pre-processing methods take in a genes x cells data matrix of integer counts, and return a k x cells matrix, where k <= genes.
Preprocessing methods can return multiple outputs. the outputs are
If k=2, then the method can be used for visualization...
This class represents a ‘blank’ preprocessing.
-
class
uncurl.experiment_runner.
SimlrKm
(n_classes, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
Fast minibatch Kmeans from the simlr library
-
class
uncurl.experiment_runner.
SimlrSmall
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Simlr for small-scale datasets (no PCA preprocessing)
-
class
uncurl.experiment_runner.
TSVD
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
Runs truncated SVD on the data. The input parameter k is the number of dimensions.
-
class
uncurl.experiment_runner.
Tsne
(metric='euclidean', **params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
2d tsne dimensionality reduction - tsne always uses 2d
metric is a string that could be any metric usable with tsne, or ‘kld’ or ‘jensen-shannon’
-
class
uncurl.experiment_runner.
TsneKm
(n_classes, use_log=False, name='tsne_km', metric='euclidean', use_exp=False, **params)[source]¶ Bases:
uncurl.experiment_runner.Cluster
TSNE(2) + Kmeans
-
class
uncurl.experiment_runner.
Zifa
(**params)[source]¶ Bases:
uncurl.experiment_runner.Preprocess
ZIFA preprocessing
-
uncurl.experiment_runner.
generate_visualizations
(methods, data, true_labels, base_dir='visualizations', figsize=(18, 10), **scatter_options)[source]¶ Generates visualization scatters for all the methods.
Parameters: - methods – follows the same format as run_experiment. List of tuples.
- data – genes x cells
- true_labels – array of integers
- base_dir – base directory to save all the plots
- figsize – tuple of ints representing size of figure
- scatter_options – options for plt.scatter
-
uncurl.experiment_runner.
run_experiment
(methods, data, n_classes, true_labels, n_runs=10, use_purity=True, use_nmi=False, use_ari=False, use_nne=False, consensus=False)[source]¶ runs a pre-processing + clustering experiment...
exactly one of use_purity, use_nmi, or use_ari can be true
Parameters: - methods – list of 2-tuples. The first element is either a single Preprocess object or a list of Preprocess objects, to be applied in sequence to the data. The second element is either a single Cluster object, a list of Cluster objects, or a list of lists, where each list is a sequence of Preprocess objects with the final element being a Cluster object.
- data – genes x cells array
- true_labels – 1d array of length cells
- consensus – if true, runs a consensus on cluster results for each method at the very end.
- use_purity, use_nmi, use_ari, use_nne – which error metric to use (at most one can be True)
Returns: - purities (list of lists)
- names (list of lists)
- other (dict) – keys: timing, preprocessing, clusterings
uncurl.lineage module¶
-
uncurl.lineage.
fourier_series
(x, *a)[source]¶ Arbitrary dimensionality fourier series.
The first parameter is a_0, and the second parameter is the interval/scale parameter.
The remaining parameters are alternating sin and cos parameters.
n = (len(a)-2)/2
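The parameter layout described above could be evaluated as in the sketch below. Two details are assumptions, since the reference does not spell them out: that each pair is ordered sin coefficient then cos coefficient, and that the scale parameter multiplies x inside the trigonometric functions. The name `fourier_series_sketch` is hypothetical.

```python
import math

def fourier_series_sketch(x, *a):
    """Evaluate a_0 + sum_i [s_i sin(i*w*x) + c_i cos(i*w*x)] with w = a[1]."""
    n = (len(a) - 2) // 2          # number of sin/cos pairs
    result = a[0]                  # a_0, the constant term
    scale = a[1]                   # interval/scale parameter (assumed to multiply x)
    for i in range(1, n + 1):
        sin_coef = a[2 * i]        # assumed ordering: sin first, then cos
        cos_coef = a[2 * i + 1]
        result += sin_coef * math.sin(i * scale * x) + cos_coef * math.cos(i * scale * x)
    return result

value_at_zero = fourier_series_sketch(0.0, 1.0, 2.0, 3.0, 4.0)
value_at_half_pi = fourier_series_sketch(math.pi / 2, 0.0, 1.0, 1.0, 0.0)
```

At x = 0 only the constant and cosine terms contribute, so the first call evaluates 1 + 4.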
-
uncurl.lineage.
graph_distances
(start, edges, distances)[source]¶ Given an undirected adjacency list and a pairwise distance matrix between all nodes: calculates distances along graph from start node.
Parameters: - start (int) – start node
- edges (list) – adjacency list of tuples
- distances (array) – 2d array of distances between nodes
Returns: dict of node to distance from start
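Distances along a graph from a start node can be accumulated with a shortest-path search. The sketch below uses Dijkstra's algorithm, which is a choice made for this example (on a tree, such as a minimum spanning tree, any traversal gives the same result); the name `graph_distances_sketch` is hypothetical and `edges` is assumed to be a list of (u, v) tuples.

```python
import heapq

def graph_distances_sketch(start, edges, distances):
    """Return {node: path length from start}, walking only along the given edges."""
    neighbors = {}
    for u, v in edges:                         # build an undirected adjacency map
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue                           # stale heap entry
        for nxt in neighbors.get(node, []):
            candidate = dist + distances[node][nxt]
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return best

edges = [(0, 1), (1, 2), (1, 3)]
distances = [
    [0.0, 1.0, 9.0, 9.0],
    [1.0, 0.0, 2.0, 4.0],
    [9.0, 2.0, 0.0, 9.0],
    [9.0, 4.0, 9.0, 0.0],
]
result = graph_distances_sketch(0, edges, distances)
```

Node 2 is reached through node 1, so its graph distance is 1 + 2 rather than the direct pairwise distance of 9.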
-
uncurl.lineage.
lineage
(means, weights, curve_function='poly', curve_dimensions=6)[source]¶ Lineage graph produced by minimum spanning tree
Parameters: - means (array) – genes x clusters - output of state estimation
- weights (array) – clusters x cells - output of state estimation
- curve_function (string) – either ‘poly’ or ‘fourier’. Default: ‘poly’
- curve_dimensions (int) – number of parameters for the curve. Default: 6
Returns: - curve parameters – list of lists for each cluster
- smoothed data in 2d space – 2 x cells
- list of edges – pairs of cell indices
- cell cluster assignments – list of ints
uncurl.nb_cluster module¶
uncurl.nb_state_estimation module¶
-
uncurl.nb_state_estimation.
nb_estimate_state
(data, clusters, R=None, init_means=None, init_weights=None, max_iters=10, tol=0.0001, disp=True, inner_max_iters=400, normalize=True)[source]¶ Uses a Negative Binomial Mixture model to estimate cell states and cell state mixing weights.
If some of the genes do not fit a negative binomial distribution (mean > var), then the genes are discarded from the analysis.
Parameters: - data (array) – genes x cells
- clusters (int) – number of mixture components
- R (array, optional) – vector of length genes containing the dispersion estimates for each gene. Default: use nb_fit
- init_means (array, optional) – initial centers - genes x clusters. Default: kmeans++ initializations
- init_weights (array, optional) – initial weights - clusters x cells. Default: random(0,1)
- max_iters (int, optional) – maximum number of iterations. Default: 10
- tol (float, optional) – if both M and W change by less than tol (in RMSE), then the iteration is stopped. Default: 1e-4
- disp (bool, optional) – whether or not to display optimization parameters. Default: True
- inner_max_iters (int, optional) – Number of iterations to run in the scipy minimizer for M and W. Default: 400
- normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
Returns: - M (array) – genes x clusters - state centers
- W (array) – clusters x cells - state mixing components for each cell
- R (array) – 1 x genes - NB dispersion parameter for each gene
- ll (float) – log-likelihood of final iteration
uncurl.pois_ll module¶
-
uncurl.pois_ll.
poisson_dist
(p1, p2)[source]¶ Calculates the Poisson distance between two vectors.
p1 can be a sparse matrix, while p2 has to be a dense matrix.
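This reference does not give the formula for the Poisson distance. One natural candidate, shown here purely as an assumption, is the symmetrized KL divergence between Poisson distributions with rates p1 and p2, which simplifies to the sum of (p1 - p2)·log(p1 / p2). The name `poisson_distance_sketch` and the `eps` smoothing are hypothetical.

```python
import math

def poisson_distance_sketch(p1, p2, eps=1e-10):
    """Symmetrized Poisson KL divergence between two rate vectors (assumed definition)."""
    total = 0.0
    for a, b in zip(p1, p2):
        a_, b_ = a + eps, b + eps              # avoid log(0) and division by zero
        total += (a_ - b_) * math.log(a_ / b_)
    return total

same = poisson_distance_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
forward = poisson_distance_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
backward = poisson_distance_sketch([2.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

Under this definition the distance is zero for identical vectors, positive otherwise, and symmetric in its arguments.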
uncurl.simulation module¶
-
uncurl.simulation.
generate_nb_data
(P, R, n_cells, assignments=None)[source]¶ Generates negative binomial data
Parameters: - P (array) – genes x clusters
- R (array) – genes x clusters
- n_cells (int) – number of cells
- assignments (list) – cluster assignment of each cell. Default: random uniform
Returns: - data – array with shape genes x cells
- labels – array of cluster labels
-
uncurl.simulation.
generate_nb_state_data
(means, weights, R)[source]¶ Generates data according to the Negative Binomial Convex Mixture Model.
Parameters: - means (array) – Cell types- genes x clusters
- weights (array) – Cell cluster assignments- clusters x cells
- R (array) – dispersion parameter - 1 x genes
Returns: data matrix - genes x cells
-
uncurl.simulation.
generate_nb_states
(n_states, n_cells, n_genes)[source]¶ Generates means, weights, and dispersion parameters for the Negative Binomial Convex Mixture Model. Weights are distributed Dirichlet(1,1,...), means are rand(0, 1). Returned values can be passed to generate_nb_state_data(M, W, R).
Parameters: - n_states (int) – number of states or clusters
- n_cells (int) – number of cells
- n_genes (int) – number of genes
Returns: - M – genes x clusters
- W – clusters x cells
- R – genes x 1 - randint(1, 100)
-
uncurl.simulation.
generate_poisson_data
(centers, n_cells, cluster_probs=None)[source]¶ Generates poisson-distributed data, given a set of means for each cluster.
Parameters: - centers (array) – genes x clusters matrix
- n_cells (int) – number of output cells
- cluster_probs (array) – prior probability for each cluster. Default: uniform.
Returns: - output – array with shape genes x n_cells
- labels – array of cluster labels
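Sampling clustered Poisson data amounts to drawing a cluster label per cell and then drawing counts from that cluster's mean profile. The sketch below illustrates this; the name `generate_poisson_data_sketch` and the `seed` argument are hypothetical additions for reproducibility and are not part of the documented signature.

```python
import numpy as np

def generate_poisson_data_sketch(centers, n_cells, cluster_probs=None, seed=0):
    """Draw a label per cell, then Poisson counts from that cluster's means."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)            # genes x clusters
    n_clusters = centers.shape[1]
    if cluster_probs is None:
        cluster_probs = np.full(n_clusters, 1.0 / n_clusters)   # uniform prior
    labels = rng.choice(n_clusters, size=n_cells, p=cluster_probs)
    output = rng.poisson(centers[:, labels])              # genes x n_cells
    return output, labels

centers = np.array([[50.0, 1.0],
                    [1.0, 50.0],
                    [10.0, 10.0]])
output, labels = generate_poisson_data_sketch(centers, n_cells=50)
```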
-
uncurl.simulation.
generate_poisson_lineage
(n_states, n_cells_per_cluster, n_genes, means=300)[source]¶ Generates a lineage for each state, assuming that each state has a common ancestor.
Returns: - M – genes x clusters
- W – clusters x cells
-
uncurl.simulation.
generate_poisson_states
(n_states, n_cells, n_genes)[source]¶ Generates means and weights for the Poisson Convex Mixture Model. Weights are distributed Dirichlet(1,1,...), means are rand(0, 100). Returned values can be passed to generate_state_data(M, W).
Parameters: - n_states (int) – number of states or clusters
- n_cells (int) – number of cells
- n_genes (int) – number of genes
Returns: - M – genes x clusters
- W – clusters x cells
-
uncurl.simulation.
generate_state_data
(means, weights)[source]¶ Generates data according to the Poisson Convex Mixture Model.
Parameters: - means (array) – Cell types- genes x clusters
- weights (array) – Cell cluster assignments- clusters x cells
Returns: data matrix - genes x cells
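Under a Poisson convex mixture, each cell's expected expression is a convex combination of the state means, so the rate matrix is the product of means and weights. The sketch below draws M and W as described for generate_poisson_states (uniform means on (0, 100), Dirichlet(1,1,...) weights) and then samples counts. The name `generate_state_data_sketch` and the `seed` argument are hypothetical, and this is an illustration rather than the library code.

```python
import numpy as np

def generate_state_data_sketch(means, weights, seed=0):
    """Sample Poisson counts with rate matrix means @ weights (genes x cells)."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(means, dtype=float) @ np.asarray(weights, dtype=float)
    return rng.poisson(rates)

rng = np.random.default_rng(0)
n_states, n_cells, n_genes = 3, 8, 5
M = rng.uniform(0, 100, size=(n_genes, n_states))         # genes x clusters
W = rng.dirichlet(np.ones(n_states), size=n_cells).T      # clusters x cells
data = generate_state_data_sketch(M, W)
```

Each column of `W` sums to 1, so every cell is a mixture of the states.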
-
uncurl.simulation.
generate_zip_data
(M, L, n_cells, cluster_probs=None)[source]¶ Generates zero-inflated poisson-distributed data, given a set of means and zero probs for each cluster.
Parameters: - M (array) – genes x clusters matrix
- L (array) – genes x clusters matrix - zero-inflation parameters
- n_cells (int) – number of output cells
- cluster_probs (array) – prior probability for each cluster. Default: uniform.
Returns: - output – array with shape genes x n_cells
- labels – array of cluster labels
-
uncurl.simulation.
generate_zip_state_data
(means, weights, z)[source]¶ Generates data according to the Zero-inflated Poisson Convex Mixture Model.
Parameters: - means (array) – Cell types- genes x clusters
- weights (array) – Cell cluster assignments- clusters x cells
- z (float) – zero-inflation parameter
Returns: data matrix - genes x cells