uncurl package

Submodules

uncurl.preprocessing module

Misc functions...

uncurl.preprocessing.cell_normalize(data)[source]

Returns the data where the expression is normalized so that the total count per cell is equal.
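
The sketch below illustrates the idea of equalizing the total count per cell. It is not the package's implementation: the helper name normalize_cells, the use of plain nested lists instead of arrays, and the choice of the median per-cell total as the common target are all assumptions made for the example.

```python
from statistics import median

def normalize_cells(data):
    """Scale each cell (column) of a genes x cells table to a common total.

    Assumption: the common target is the median of the per-cell totals.
    """
    n_genes = len(data)
    n_cells = len(data[0])
    totals = [sum(data[g][c] for g in range(n_genes)) for c in range(n_cells)]
    target = median(totals)
    # Cells with a zero total are left unchanged to avoid dividing by zero.
    scale = [target / t if t > 0 else 1.0 for t in totals]
    return [[data[g][c] * scale[c] for c in range(n_cells)] for g in range(n_genes)]

counts = [
    [1, 4, 0],
    [3, 4, 2],
]  # 2 genes x 3 cells; the column totals are 4, 8 and 2
normalized = normalize_cells(counts)
column_totals = [sum(row[c] for row in normalized) for c in range(3)]
```

After normalization every entry of column_totals is the same value.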

uncurl.preprocessing.log1p(data)[source]

Returns ln(data+1), whether the original data is dense or sparse.

uncurl.preprocessing.max_variance_genes(data, nbins=5, frac=0.2)[source]

This function identifies the genes that have the max variance across a number of bins sorted by mean.

Parameters:
  • data (array) – genes x cells
  • nbins (int) – number of bins to sort genes by mean expression level. Default: 5.
  • frac (float) – fraction of genes to return per bin - between 0 and 1. Default: 0.2
Returns:

list of gene indices (list of ints)
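
A rough sketch of the binning idea described above: sort genes by mean, split them into bins, and keep the highest-variance fraction of each bin. The helper name top_variance_genes, the equal-size bins, the population variance, and the rounding of the per-bin count are assumptions made for the example and may differ from the package.

```python
def top_variance_genes(data, nbins=5, frac=0.2):
    """Return sorted indices of the highest-variance genes within each mean bin."""
    n_genes = len(data)
    stats = []
    for g, row in enumerate(data):
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        stats.append((mean, var, g))
    stats.sort(key=lambda s: s[0])      # order genes by mean expression
    bin_size = -(-n_genes // nbins)     # ceiling division
    selected = []
    for start in range(0, n_genes, bin_size):
        bin_genes = stats[start:start + bin_size]
        n_keep = int(frac * len(bin_genes))
        # Within the bin, rank genes from highest to lowest variance.
        bin_genes.sort(key=lambda s: s[1], reverse=True)
        selected.extend(g for _, _, g in bin_genes[:n_keep])
    return sorted(selected)

data = [
    [1, 1, 1, 1],      # low mean, zero variance
    [0, 4, 0, 4],      # low mean, high variance
    [10, 10, 10, 10],  # high mean, zero variance
    [5, 25, 5, 25],    # high mean, high variance
]
genes = top_variance_genes(data, nbins=2, frac=0.5)
```

Binning by mean keeps highly expressed genes from dominating the selection, since variance tends to grow with the mean in count data.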

uncurl.preprocessing.sparse_mean_var(data)[source]

Calculates the mean and variance for each row of a sparse matrix, using the relationship Var = E[x^2] - E[x]^2.

Returns: pair of matrices (mean, variance).
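
The identity Var = E[x^2] - E[x]^2 needs only the nonzero entries, which is why it suits sparse data. The sketch below shows this on a toy representation (one dict of nonzero entries per row); the helper name and the data layout are assumptions made for the example, and the result is the population variance.

```python
def sparse_row_mean_var(rows, n_cols):
    """rows: list of {column_index: value} dicts holding only nonzero entries."""
    means, variances = [], []
    for row in rows:
        # Zeros contribute nothing to either sum, so only stored values are visited.
        mean = sum(row.values()) / n_cols                     # E[x]
        mean_sq = sum(v * v for v in row.values()) / n_cols   # E[x^2]
        means.append(mean)
        variances.append(mean_sq - mean * mean)
    return means, variances

rows = [{0: 2.0, 3: 2.0}, {1: 4.0}]  # dense form: [2, 0, 0, 2] and [0, 4, 0, 0]
means, variances = sparse_row_mean_var(rows, n_cols=4)
```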

uncurl.run_se module

uncurl.run_se.run_state_estimation(data, clusters, dist='Poiss', reps=1, **kwargs)[source]

Runs state estimation for multiple initializations, returning the result with the highest log-likelihood. All the arguments are passed to the underlying state estimation functions (poisson_estimate_state, nb_estimate_state, zip_estimate_state).

Parameters:
  • data (array) – genes x cells
  • clusters (int) – number of mixture components
  • dist (str, optional) – Distribution used in state estimation. Options: ‘Poiss’, ‘NB’, ‘ZIP’, ‘LogNorm’, ‘Gaussian’. Default: ‘Poiss’
  • reps (int, optional) – number of times to run the state estimation, taking the result with the highest log-likelihood.
  • **kwargs – arguments to pass to the underlying state estimation function.
Returns:
  • M (array) – genes x clusters - state means
  • W (array) – clusters x cells - state mixing components for each cell
  • ll (float) – final log-likelihood
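
The reps behaviour amounts to a best-of-N restart loop. A minimal sketch of that pattern follows; best_of_reps and toy_fit are hypothetical stand-ins (toy_fit returns a random log-likelihood instead of fitting anything), not functions from the package.

```python
import random

def best_of_reps(fit_once, reps=1, seed=0):
    """Run fit_once several times and keep the result with the highest log-likelihood."""
    rng = random.Random(seed)
    best = None
    for _ in range(reps):
        means, weights, ll = fit_once(rng)
        if best is None or ll > best[2]:
            best = (means, weights, ll)
    return best

def toy_fit(rng):
    # Hypothetical stand-in for one randomly initialized state estimation run.
    ll = -100.0 * rng.random()
    return "M", "W", ll

M, W, ll = best_of_reps(toy_fit, reps=5, seed=0)
```

Because each run starts from a different initialization, more repetitions can only keep or improve the best log-likelihood found, at a proportional cost in run time.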

uncurl.state_estimation module

uncurl.state_estimation.initialize_from_assignments(assignments, k, max_assign_weight=0.75)[source]

Creates a weight initialization matrix from Poisson clustering assignments.

Parameters:
  • assignments (array) – 1D array of integers, of length cells
  • k (int) – number of states/clusters
  • max_assign_weight (float, optional) – between 0 and 1 - how much weight to assign to the highest cluster. Default: 0.75
Returns:
  • init_W (array) – k x cells
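
One plausible way to build such a matrix is sketched below: the assigned cluster receives max_assign_weight and the remainder is split evenly over the other clusters, so each column sums to 1. The even split is an assumption (it mirrors the 'basic' option described for nmf_init), as is the helper name; k is assumed to be at least 2.

```python
def weights_from_assignments(assignments, k, max_assign_weight=0.75):
    """Build a k x cells weight matrix from hard cluster assignments."""
    n_cells = len(assignments)
    # Assumption: the leftover weight is shared equally by the other k - 1 clusters.
    other = (1.0 - max_assign_weight) / (k - 1)
    init_w = [[other] * n_cells for _ in range(k)]
    for cell, cluster in enumerate(assignments):
        init_w[cluster][cell] = max_assign_weight
    return init_w

init_w = weights_from_assignments([0, 2, 1, 0], k=3)
column_sums = [sum(init_w[j][c] for j in range(3)) for c in range(4)]
```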

uncurl.state_estimation.initialize_means(data, clusters, k)[source]

Initializes the M matrix given the data and a set of cluster labels. Cluster centers are set to the mean of each cluster.

Parameters:
  • data (array) – genes x cells
  • clusters (array) – 1d array of ints (0...k-1)
  • k (int) – number of clusters
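
Setting each cluster center to the mean of its cells can be sketched as follows. The helper name cluster_means and the use of nested lists are assumptions for the example; leaving the center of an empty cluster at zero is also an assumption.

```python
def cluster_means(data, clusters, k):
    """Return a genes x k table whose column j is the mean of the cells labelled j."""
    n_genes = len(data)
    sizes = [clusters.count(c) for c in range(k)]
    means = [[0.0] * k for _ in range(n_genes)]
    for g in range(n_genes):
        for cell, c in enumerate(clusters):
            means[g][c] += data[g][cell]
        for c in range(k):
            if sizes[c] > 0:  # empty clusters keep a zero center
                means[g][c] /= sizes[c]
    return means

data = [[2, 4, 10], [0, 2, 6]]           # 2 genes x 3 cells
M = cluster_means(data, [0, 0, 1], k=2)  # cells 0 and 1 form cluster 0
```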
uncurl.state_estimation.initialize_means_weights(data, clusters, init_means=None, init_weights=None, initialization='tsvd', max_assign_weight=0.75)[source]

Generates initial means and weights for state estimation.

uncurl.state_estimation.initialize_weights_nn(data, means, lognorm=True)[source]

Initializes the weights with a nearest-neighbor approach using the means.

uncurl.state_estimation.poisson_estimate_state(data, clusters, init_means=None, init_weights=None, method='NoLips', max_iters=30, tol=1e-10, disp=False, inner_max_iters=100, normalize=True, initialization='tsvd', parallel=True, threads=4, max_assign_weight=0.75, run_w_first=True, constrain_w=False, regularization=0.0)[source]

Uses a Poisson Convex Mixture model to estimate cell states and cell state mixing weights.

To lower computational costs, use a sparse matrix, set disp to False, and set tol to 0.

Parameters:
  • data (array) – genes x cells array or sparse matrix.
  • clusters (int) – number of mixture components
  • init_means (array, optional) – initial centers - genes x clusters. Default: from Poisson kmeans
  • init_weights (array, optional) – initial weights - clusters x cells, or assignments as produced by clustering. Default: from Poisson kmeans
  • method (str, optional) – optimization method. Current options are ‘NoLips’ and ‘L-BFGS-B’. Default: ‘NoLips’.
  • max_iters (int, optional) – maximum number of iterations. Default: 30
  • tol (float, optional) – if both M and W change by less than tol (RMSE), then the iteration is stopped. Default: 1e-10
  • disp (bool, optional) – whether or not to display optimization progress. Default: False
  • inner_max_iters (int, optional) – Number of iterations to run in the optimization subroutine for M and W. Default: 100
  • normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
  • initialization (str, optional) – If initial means and weights are not provided, this describes how they are initialized. Options: ‘cluster’ (poisson cluster for means and weights), ‘kmpp’ (kmeans++ for means, random weights), ‘km’ (regular k-means), ‘tsvd’ (tsvd(50) + k-means). Default: tsvd.
  • parallel (bool, optional) – Whether to use parallel updates (sparse NoLips only). Default: True
  • threads (int, optional) – How many threads to use in the parallel computation. Default: 4
  • max_assign_weight (float, optional) – If using a clustering-based initialization, how much weight to assign to the max weight cluster. Default: 0.75
  • run_w_first (bool, optional) – Whether or not to optimize W first (if false, M will be optimized first). Default: True
  • constrain_w (bool, optional) – If True, then W is normalized after every iteration. Default: False
  • regularization (float, optional) – Regularization coefficient for M and W. Default: 0 (no regularization).
Returns:
  • M (array) – genes x clusters - state means
  • W (array) – clusters x cells - state mixing components for each cell
  • ll (float) – final log-likelihood

uncurl.nmf_wrapper module

uncurl.nmf_wrapper.log_norm_nmf(data, k, normalize_w=True, return_cost=True, init_weights=None, init_means=None, **kwargs)[source]
Parameters:
  • data (array) – dense or sparse array with shape (genes, cells)
  • k (int) – number of cell types
  • normalize_w (bool, optional) – True if W should be normalized (so that each column sums to 1). Default: True
  • return_cost (bool, optional) – True if the NMF objective value (squared error) should be returned. Default: True
  • init_weights (array, optional) – Initial value for W. Default: None
  • init_means (array, optional) – Initial value for M. Default: None
  • **kwargs – misc arguments to NMF
Returns:

Two matrices M of shape (genes, k) and W of shape (k, cells). They correspond to M and W in Poisson state estimation. If return_cost is True (which it is by default), then the cost will also be returned. Computing it might be prohibitively costly.

uncurl.nmf_wrapper.nmf_init(data, clusters, k, init='enhanced')[source]

Generates initial M and W given a data set and an array of cluster labels.

There are 3 options for init:
  • enhanced – uses EIn-NMF from Gong 2013
  • basic – uses means for M, and assigns W such that the chosen cluster for a given cell has value 0.75 and all others have 0.25/(k-1)
  • nmf – uses means for M, and assigns W using the NMF objective while holding M constant
uncurl.nmf_wrapper.norm_nmf(data, k, init_weights=None, init_means=None, normalize_w=True, **kwargs)[source]
Parameters:
  • data (array) – dense or sparse array with shape (genes, cells)
  • k (int) – number of cell types
  • normalize_w (bool) – True if W should be normalized (so that each column sums to 1)
  • init_weights (array, optional) – Initial value for W. Default: None
  • init_means (array, optional) – Initial value for M. Default: None
  • **kwargs – misc arguments to NMF
Returns:

Two matrices M of shape (genes, k) and W of shape (k, cells)

uncurl.qual2quant module

uncurl.qual2quant.binarize(qualitative)[source]

Binarizes an expression dataset.

uncurl.qual2quant.poisson_test(data1, data2, smoothing=1e-05, return_pval=True)[source]

Returns a p-value for the ratio of the means of two poisson-distributed datasets.

Source: http://ncss.wpengine.netdna-cdn.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_Two_Poisson_Means.pdf

Gu, K., Ng, H.K.T., Tang, M.L., and Schucany, W. 2008. ‘Testing the Ratio of Two Poisson Rates.’ Biometrical Journal, 50, 2, 283-298

Based on W2

Parameters:
  • data1 (array) – 1d array of floats - first distribution
  • data2 (array) – 1d array of floats - second distribution
  • smoothing (float) – number to add to each of the datasets
  • return_pval (bool) – True to return p value; False to return test statistic. Default: True
uncurl.qual2quant.qualNorm(data, qualitative)[source]

Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.

Parameters:
  • data (array) – 2d array of genes x cells
  • qualitative (array) – 2d array of numerical data - genes x clusters
Returns:

Array of starting positions for state estimation or clustering, with shape genes x clusters

uncurl.qual2quant.qualNormGaussian(data, qualitative)[source]

Generates starting points using binarized data. If qualitative data is missing for a given gene, all of its entries should be -1 in the qualitative matrix.

Parameters:
  • data (array) – 2d array of genes x cells
  • qualitative (array) – 2d array of numerical data - genes x clusters
Returns:

Array of starting positions for state estimation or clustering, with shape genes x clusters

uncurl.qual2quant.qualNorm_filter_genes(data, qualitative, pval_threshold=0.05, smoothing=1e-05, eps=1e-05)[source]

Does qualNorm but returns a filtered gene set, based on a p-value threshold.

uncurl.clustering module

uncurl.clustering.kmeans_pp(data, k, centers=None)[source]

Generates kmeans++ initial centers.

Parameters:
  • data (array) – A 2d array- genes x cells
  • k (int) – Number of clusters
  • centers (array, optional) – if provided, these are one or more known cluster centers. 2d array of genes x number of centers (<=k).
Returns:
  • centers – a genes x k array of cluster means
  • assignments – a cells x 1 array of cluster assignments

uncurl.clustering.poisson_cluster(data, k, init=None, max_iters=100)[source]

Performs Poisson hard EM on the given data.

Parameters:
  • data (array) – A 2d array- genes x cells. Can be dense or sparse; for best performance, sparse matrices should be in CSC format.
  • k (int) – Number of clusters
  • init (array, optional) – Initial centers - genes x k array. Default: None, use kmeans++
  • max_iters (int, optional) – Maximum number of iterations. Default: 100
Returns:

a cells x 1 vector of cluster assignments, and a genes x k array of cluster means.

Return type:

a tuple of two arrays

uncurl.dimensionality_reduction module

uncurl.dimensionality_reduction.diffusion_mds(means, weights, d, diffusion_rounds=10)[source]

Dimensionality reduction using MDS, while running diffusion on W.

Parameters:
  • means (array) – genes x clusters
  • weights (array) – clusters x cells
  • d (int) – desired dimensionality
Returns:
  • W_reduced (array) – array of shape (d, cells)

uncurl.dimensionality_reduction.dim_reduce(means, weights, d)[source]

Dimensionality reduction using Poisson distances and MDS.

Parameters:
  • means (array) – genes x clusters
  • weights (array) – clusters x cells
  • d (int) – desired dimensionality
Returns:

X, a clusters x d matrix representing the reduced dimensions of the cluster centers.

uncurl.dimensionality_reduction.dim_reduce_data(data, d)[source]

Does a MDS on the data directly, not on the means.

Parameters:
  • data (array) – genes x cells
  • d (int) – desired dimensionality
Returns:

X, a cells x d matrix

uncurl.dimensionality_reduction.mds(means, weights, d)[source]

Dimensionality reduction using MDS.

Parameters:
  • means (array) – genes x clusters
  • weights (array) – clusters x cells
  • d (int) – desired dimensionality
Returns:
  • W_reduced (array) – array of shape (d, cells)

uncurl.evaluation module

uncurl.evaluation.mdl(ll, k, data)[source]

Returns the minimum description length score of the model given its log-likelihood and k, the number of cell types.

A lower cost is better.

uncurl.evaluation.nne(dim_red, true_labels)[source]

Calculates the nearest neighbor accuracy (basically leave-one-out cross validation with a 1NN classifier).

Parameters:
  • dim_red (array) – dimensions (k, cells)
  • true_labels (array) – 1d array of integers
Returns:

Nearest neighbor accuracy - fraction of points for which the 1NN classifier returns the correct value.
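
A leave-one-out 1NN accuracy can be sketched as below. The helper name and the use of squared Euclidean distance are assumptions for the example; the input follows the documented (k, cells) layout.

```python
def nearest_neighbor_accuracy(dim_red, true_labels):
    """Fraction of cells whose nearest other cell carries the same label."""
    n_cells = len(true_labels)
    # Transpose (k, cells) into one coordinate list per cell.
    points = [[row[c] for row in dim_red] for c in range(n_cells)]
    correct = 0
    for i, p in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue  # leave the query point out
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:
                best_j, best_d = j, d
        if true_labels[best_j] == true_labels[i]:
            correct += 1
    return correct / n_cells

dim_red = [
    [0.0, 0.1, 5.0, 5.1, 2.6],
    [0.0, 0.0, 5.0, 5.0, 2.6],
]  # 2 dimensions x 5 cells
labels = [0, 0, 1, 1, 0]
accuracy = nearest_neighbor_accuracy(dim_red, labels)
```

In this toy input the last cell sits closer to the label-1 group than to its own group, so it is the only misclassified point.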

uncurl.evaluation.purity(labels, true_labels)[source]

Calculates the purity score for the given labels.

Parameters:
  • labels (array) – 1D array of integers
  • true_labels (array) – 1D array of integers - true labels
Returns:

purity score - a float between 0 and 1. Closer to 1 is better.
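
Purity is commonly defined as follows: for each predicted cluster, count the cells belonging to its most frequent true label, then divide the sum of these counts by the number of cells. The sketch below uses that standard definition; the helper name is an assumption, and the package's implementation may differ in detail.

```python
from collections import Counter

def purity_score(labels, true_labels):
    """Standard clustering purity: majority-label count per cluster over total cells."""
    by_cluster = {}
    for label, truth in zip(labels, true_labels):
        by_cluster.setdefault(label, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)

score = purity_score([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```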

uncurl.experiment_runner module

class uncurl.experiment_runner.Argmax(n_classes, **params)[source]

Bases: uncurl.experiment_runner.Cluster

run(data)[source]
class uncurl.experiment_runner.BasicNMF(return_h=True, return_w=False, return_mds=False, return_wh=False, **params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs NMF on data, returning H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]
class uncurl.experiment_runner.Bicluster(n_classes, n_gene_classes=10, **params)[source]

Bases: uncurl.experiment_runner.Cluster

Spectral Biclustering

run(data)[source]
class uncurl.experiment_runner.Cluster(n_classes, **params)[source]

Bases: object

Clustering methods take in a matrix of shape k x cells, and return an array of integers in the range 0 to n_classes-1.

They should be able to run on the output of pre-processing...

run(data)[source]
class uncurl.experiment_runner.Cocluster(n_classes, n_gene_classes=10, **params)[source]

Bases: uncurl.experiment_runner.Cluster

Spectral Coclustering

run(data)[source]
class uncurl.experiment_runner.DBScan(n_classes, **params)[source]

Bases: uncurl.experiment_runner.Cluster

dbscan clustering

run(data)[source]
class uncurl.experiment_runner.EnsembleClusterPoissonSE(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs Poisson state estimation initialized from the consensus of 10 runs of Poisson KM.

params: k - dimensionality

run(data)[source]
class uncurl.experiment_runner.EnsembleNMF(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs Ensemble NMF on log(data+1), returning the consensus results for H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]
class uncurl.experiment_runner.EnsembleTSVDPoissonSE(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs Poisson state estimation initialized from 8 runs of tsvd-km.

params: k - dimensionality

run(data)[source]
class uncurl.experiment_runner.EnsembleTsneLightLDASE(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs tsne-based LightLDA Poisson state estimation

run(data)[source]
class uncurl.experiment_runner.EnsembleTsneNMF(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs tsne-based ensemble NMF

run(data)[source]
class uncurl.experiment_runner.EnsembleTsnePoissonSE(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs tsne-based ensemble Poisson state estimation

run(data)[source]
class uncurl.experiment_runner.KFoldNMF(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs K-fold ensemble NMF on log(data+1), returning the consensus results for H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]
class uncurl.experiment_runner.KM(n_classes, **params)[source]

Bases: uncurl.experiment_runner.Cluster

k-means clustering

run(data)[source]
class uncurl.experiment_runner.LightLDASE(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs LightLDA State Estimation, returning W and MW. Requires a ‘k’ parameter.

run(data)[source]
class uncurl.experiment_runner.LoadPreproc(datasets, **params)[source]

Bases: uncurl.experiment_runner.Preprocess

Takes a preprocessed data matrix and returns it unchanged when run is called.

run(data)[source]
class uncurl.experiment_runner.Log(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Takes the natural log of the data+1.

run(data)[source]
class uncurl.experiment_runner.LogNMF(return_h=True, return_w=False, return_mds=False, return_wh=False, **params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs NMF on log(normalize(data)+1), returning H and W*H.

Requires a ‘k’ parameter, which is the rank of the matrices.

run(data)[source]
class uncurl.experiment_runner.LogNorm(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

First, normalizes the counts per cell, and then takes log(normalized_counts+1).

run(data)[source]
class uncurl.experiment_runner.Magic(use_magic=True, use_tsne=False, use_pca=False, **params)[source]

Bases: uncurl.experiment_runner.Preprocess

run(data)[source]
class uncurl.experiment_runner.PLDASE(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs PLDA State Estimation, returning W and MW. Requires a ‘k’ parameter.

run(data)[source]
class uncurl.experiment_runner.Pca(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

PCA preprocessing

run(data)[source]
class uncurl.experiment_runner.PcaKm(n_classes, use_log=False, name='pca_km', **params)[source]

Bases: uncurl.experiment_runner.Cluster

PCA + kmeans

Requires a parameter k, where k is the dimensionality of PCA.

run(data)[source]
class uncurl.experiment_runner.PoissonCluster(n_classes, **params)[source]

Bases: uncurl.experiment_runner.Cluster

Poisson k-means clustering

run(data)[source]
class uncurl.experiment_runner.PoissonSE(return_w=True, return_m=False, return_mw=False, return_mds=False, normalize_data=False, **params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs Poisson State Estimation, returning W and MW.

Requires a ‘k’ parameter.

Optional args:
  • return_m=True – returns M in outputs
  • return_mw=True – returns MW in outputs

run(data)[source]
Returns: list of W and M*W, together with the log-likelihood ll
class uncurl.experiment_runner.Preprocess(**params)[source]

Bases: object

Pre-processing methods take in a genes x cells data matrix of integer counts, and return a k x cells matrix, where k <= genes.

Preprocessing methods can return multiple outputs. the outputs are

If k=2, then the method can be used for visualization...

This class represents a ‘blank’ preprocessing.

run(data)[source]

should return a list of output matrices of the same length as self.output_names, and an objective value.

data is of shape (genes, cells).

class uncurl.experiment_runner.Simlr(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

run(data)[source]
class uncurl.experiment_runner.SimlrKm(n_classes, **params)[source]

Bases: uncurl.experiment_runner.Cluster

Fast minibatch Kmeans from the simlr library

run(data)[source]
class uncurl.experiment_runner.SimlrSmall(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Simlr for small-scale datasets (no PCA preprocessing)

run(data)[source]
class uncurl.experiment_runner.TSVD(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

Runs truncated SVD on the data. The input param k is the number of dimensions.

run(data)[source]
class uncurl.experiment_runner.Tsne(metric='euclidean', **params)[source]

Bases: uncurl.experiment_runner.Preprocess

2d tsne dimensionality reduction - tsne always uses 2d

metric is a string that could be any metric usable with tsne, or ‘kld’ or ‘jensen-shannon’

run(data)[source]
class uncurl.experiment_runner.TsneKm(n_classes, use_log=False, name='tsne_km', metric='euclidean', use_exp=False, **params)[source]

Bases: uncurl.experiment_runner.Cluster

TSNE(2) + Kmeans

run(data)[source]
class uncurl.experiment_runner.Zifa(**params)[source]

Bases: uncurl.experiment_runner.Preprocess

ZIFA preprocessing

run(data)[source]
uncurl.experiment_runner.generate_visualizations(methods, data, true_labels, base_dir='visualizations', figsize=(18, 10), **scatter_options)[source]

Generates visualization scatters for all the methods.

Parameters:
  • methods – follows same format as run_experiments. List of tuples.
  • data – genes x cells
  • true_labels – array of integers
  • base_dir – base directory to save all the plots
  • figsize – tuple of ints representing size of figure
  • scatter_options – options for plt.scatter
uncurl.experiment_runner.run_experiment(methods, data, n_classes, true_labels, n_runs=10, use_purity=True, use_nmi=False, use_ari=False, use_nne=False, consensus=False)[source]

runs a pre-processing + clustering experiment...

exactly one of use_purity, use_nmi, or use_ari can be true

Parameters:
  • methods – list of 2-tuples. The first element is either a single Preprocess object or a list of Preprocess objects, to be applied in sequence to the data. The second element is either a single Cluster object, a list of Cluster objects, or a list of lists, where each list is a sequence of Preprocess objects with the final element being a Cluster object.
  • data – genes x cells array
  • true_labels – 1d array of length cells
  • consensus – if true, runs a consensus on cluster results for each method at the very end.
  • use_purity, use_nmi, use_ari, use_nne – which error metric to use (at most one can be True)
Returns:
  • purities (list of lists)
  • names (list of lists)
  • other (dict) – keys: timing, preprocessing, clusterings

uncurl.lineage module

uncurl.lineage.fourier_series(x, *a)[source]

Arbitrary dimensionality fourier series.

The first parameter is a_0, and the second parameter is the interval/scale parameter.

The remaining parameters are alternating sin and cos parameters.

n = (len(a)-2)/2
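
One reading of this parameter layout is sketched below. The exact functional form is an assumption: the i-th harmonic is taken to be sin and cos of i*x/scale, with a[2], a[3] as the first sin/cos pair, a[4], a[5] as the second, and so on. The helper name fourier is hypothetical.

```python
import math

def fourier(x, *a):
    """Evaluate a_0 + sum_i (s_i * sin(i*x/scale) + c_i * cos(i*x/scale))."""
    a0, scale = a[0], a[1]
    n = (len(a) - 2) // 2          # number of sin/cos pairs
    total = a0
    for i in range(n):
        angle = (i + 1) * x / scale  # assumed frequency of the i-th harmonic
        total += a[2 + 2 * i] * math.sin(angle) + a[3 + 2 * i] * math.cos(angle)
    return total

value_at_zero = fourier(0.0, 1.0, 2.0, 0.5, 0.25)  # a_0=1, scale=2, one sin/cos pair
```

Under this reading the curve repeats with period 2*pi*scale, so the scale parameter stretches the series along the x axis.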

uncurl.lineage.graph_distances(start, edges, distances)[source]

Given an undirected adjacency list and a pairwise distance matrix between all nodes: calculates distances along graph from start node.

Parameters:
  • start (int) – start node
  • edges (list) – adjacency list of tuples
  • distances (array) – 2d array of distances between nodes
Returns:

dict of node to distance from start
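
A minimal sketch of accumulating distances along graph edges from a start node. It assumes the graph is a tree (as a minimum spanning tree is), so each node is reached by exactly one path; on a graph with cycles this traversal would not necessarily give shortest paths. The helper name is hypothetical.

```python
def distances_from_start(start, edges, distances):
    """Walk the tree from start, summing pairwise distances along each edge."""
    neighbors = {}
    for a, b in edges:
        # Undirected: register the edge in both directions.
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    result = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, []):
            if nxt not in result:
                result[nxt] = result[node] + distances[node][nxt]
                stack.append(nxt)
    return result

edges = [(0, 1), (1, 2), (1, 3)]
pairwise = [
    [0, 1, 9, 9],
    [1, 0, 2, 4],
    [9, 2, 0, 9],
    [9, 4, 9, 0],
]
from_zero = distances_from_start(0, edges, pairwise)
```

Node 2 ends up at distance 3 (edges 0-1 and 1-2), not at the direct pairwise distance of 9, because only distances along graph edges are summed.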

uncurl.lineage.lineage(means, weights, curve_function='poly', curve_dimensions=6)[source]

Lineage graph produced by minimum spanning tree

Parameters:
  • means (array) – genes x clusters - output of state estimation
  • weights (array) – clusters x cells - output of state estimation
  • curve_function (string) – either ‘poly’ or ‘fourier’. Default: ‘poly’
  • curve_dimensions (int) – number of parameters for the curve. Default: 6
Returns:
  • curve parameters – list of lists for each cluster
  • smoothed data in 2d space – 2 x cells
  • list of edges – pairs of cell indices
  • cell cluster assignments – list of ints

uncurl.lineage.poly_curve(x, *a)[source]

Arbitrary dimension polynomial.
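
A polynomial with a variable number of coefficients can be evaluated as sketched below. The coefficient order (a[0] is the constant term, a[i] multiplies x**i) is an assumption, and the helper name poly is hypothetical.

```python
def poly(x, *a):
    """Evaluate a[0] + a[1]*x + a[2]*x**2 + ... using Horner's rule."""
    result = 0.0
    for coeff in reversed(a):
        result = result * x + coeff
    return result

value = poly(2.0, 1.0, 0.0, 3.0)  # 1 + 0*2 + 3*2**2
```

The *a signature is the shape that curve-fitting routines typically expect: the independent variable first, followed by the free parameters.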

uncurl.lineage.pseudotime(starting_node, edges, fitted_vals)[source]
Parameters:
  • starting_node (int) – index of the starting node
  • edges (list) – list of tuples (node1, node2)
  • fitted_vals (array) – output of lineage (2 x cells)
Returns:

A 1d array containing the pseudotime value of each cell.

uncurl.nb_cluster module

uncurl.nb_state_estimation module

uncurl.nb_state_estimation.nb_estimate_state(data, clusters, R=None, init_means=None, init_weights=None, max_iters=10, tol=0.0001, disp=True, inner_max_iters=400, normalize=True)[source]

Uses a Negative Binomial Mixture model to estimate cell states and cell state mixing weights.

If some of the genes do not fit a negative binomial distribution (mean > var), then the genes are discarded from the analysis.

Parameters:
  • data (array) – genes x cells
  • clusters (int) – number of mixture components
  • R (array, optional) – vector of length genes containing the dispersion estimates for each gene. Default: use nb_fit
  • init_means (array, optional) – initial centers - genes x clusters. Default: kmeans++ initializations
  • init_weights (array, optional) – initial weights - clusters x cells. Default: random(0,1)
  • max_iters (int, optional) – maximum number of iterations. Default: 10
  • tol (float, optional) – if both M and W change by less than tol (in RMSE), then the iteration is stopped. Default: 1e-4
  • disp (bool, optional) – whether or not to display optimization parameters. Default: True
  • inner_max_iters (int, optional) – Number of iterations to run in the scipy minimizer for M and W. Default: 400
  • normalize (bool, optional) – True if the resulting W should sum to 1 for each cell. Default: True.
Returns:
  • M (array) – genes x clusters - state centers
  • W (array) – clusters x cells - state mixing components for each cell
  • R (array) – 1 x genes - NB dispersion parameter for each gene
  • ll (float) – Log-likelihood of final iteration

uncurl.pois_ll module

uncurl.pois_ll.poisson_dist(p1, p2)[source]

Calculates the Poisson distance between two vectors.

p1 can be a sparse matrix, while p2 has to be a dense matrix.

uncurl.pois_ll.poisson_ll(data, means)[source]

Calculates the Poisson log-likelihood.

Parameters:
  • data (array) – 2d numpy array of genes x cells
  • means (array) – 2d numpy array of genes x k
Returns:

cells x k array of log-likelihood for each cell/cluster pair
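
The Poisson log-likelihood of a cell with counts x under cluster means m is the sum over genes of x*log(m) - m - log(x!). The sketch below computes it for every cell/cluster pair. The helper name, the small eps added to the means to avoid log(0), and the inclusion of the log(x!) term are assumptions; an implementation may drop that term, since it does not depend on the cluster.

```python
import math

def poisson_log_likelihood(data, means, eps=1e-10):
    """Return a cells x k table of Poisson log-likelihoods."""
    n_genes, n_cells, k = len(data), len(data[0]), len(means[0])
    ll = [[0.0] * k for _ in range(n_cells)]
    for c in range(n_cells):
        for j in range(k):
            total = 0.0
            for g in range(n_genes):
                x = data[g][c]
                m = means[g][j] + eps  # eps guards against log(0)
                # lgamma(x + 1) equals log(x!)
                total += x * math.log(m) - m - math.lgamma(x + 1)
            ll[c][j] = total
    return ll

data = [[0, 9], [1, 10]]           # 2 genes x 2 cells
means = [[0.5, 9.0], [1.0, 10.0]]  # 2 genes x 2 clusters
ll = poisson_log_likelihood(data, means)
```

Taking the arg-max over each row of ll gives a hard cluster assignment per cell.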

uncurl.pois_ll.poisson_ll_2(p1, p2)[source]

Calculates Poisson LL(p1|p2).

uncurl.pois_ll.sparse_poisson_ll(data, means)[source]

uncurl.simulation module

uncurl.simulation.generate_nb_data(P, R, n_cells, assignments=None)[source]

Generates negative binomial data

Parameters:
  • P (array) – genes x clusters
  • R (array) – genes x clusters
  • n_cells (int) – number of cells
  • assignments (list) – cluster assignment of each cell. Default: random uniform
Returns:
  • data – array with shape genes x cells
  • labels – array of cluster labels

uncurl.simulation.generate_nb_state_data(means, weights, R)[source]

Generates data according to the Negative Binomial Convex Mixture Model.

Parameters:
  • means (array) – Cell types- genes x clusters
  • weights (array) – Cell cluster assignments- clusters x cells
  • R (array) – dispersion parameter - 1 x genes
Returns:

data matrix - genes x cells

uncurl.simulation.generate_nb_states(n_states, n_cells, n_genes)[source]

Generates means and weights for the Negative Binomial Mixture Model. Weights are distributed Dirichlet(1,1,...), means are rand(0, 1). Returned values can be passed to generate_nb_state_data(M, W, R).

Parameters:
  • n_states (int) – number of states or clusters
  • n_cells (int) – number of cells
  • n_genes (int) – number of genes
Returns:
  • M – genes x clusters
  • W – clusters x cells
  • R – genes x 1 - randint(1, 100)

uncurl.simulation.generate_poisson_data(centers, n_cells, cluster_probs=None)[source]

Generates poisson-distributed data, given a set of means for each cluster.

Parameters:
  • centers (array) – genes x clusters matrix
  • n_cells (int) – number of output cells
  • cluster_probs (array) – prior probability for each cluster. Default: uniform.
Returns:
  • output – array with shape genes x n_cells
  • labels – array of cluster labels

uncurl.simulation.generate_poisson_lineage(n_states, n_cells_per_cluster, n_genes, means=300)[source]

Generates a lineage for each state- assumes that each state has a common ancestor.

Returns:
  • M – genes x clusters
  • W – clusters x cells
uncurl.simulation.generate_poisson_states(n_states, n_cells, n_genes)[source]

Generates means and weights for the Poisson Convex Mixture Model. Weights are distributed Dirichlet(1,1,...), means are rand(0, 100). Returned values can be passed to generate_state_data(M, W).

Parameters:
  • n_states (int) – number of states or clusters
  • n_cells (int) – number of cells
  • n_genes (int) – number of genes
Returns:
  • M – genes x clusters
  • W – clusters x cells

uncurl.simulation.generate_state_data(means, weights)[source]

Generates data according to the Poisson Convex Mixture Model.

Parameters:
  • means (array) – Cell types- genes x clusters
  • weights (array) – Cell cluster assignments- clusters x cells
Returns:

data matrix - genes x cells
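
Under the Poisson Convex Mixture Model each count is drawn from a Poisson distribution whose rate is the corresponding entry of the product M*W. The dependency-free sketch below illustrates this, together with Dirichlet(1,...,1) weights and uniform means as described for generate_poisson_states. All helper names are hypothetical, and the samplers (Knuth's method for Poisson draws, normalized exponentials for the Dirichlet draw) are choices made for the example, not the package's code.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate rates."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def random_states(n_states, n_cells, n_genes, rng, max_mean=100.0):
    """Uniform means (genes x states) and Dirichlet(1,...,1) weights (states x cells)."""
    means = [[rng.uniform(0.0, max_mean) for _ in range(n_states)]
             for _ in range(n_genes)]
    weights = [[0.0] * n_cells for _ in range(n_states)]
    for c in range(n_cells):
        # Normalized Exp(1) draws follow a Dirichlet(1,...,1) distribution.
        draws = [rng.expovariate(1.0) for _ in range(n_states)]
        total = sum(draws)
        for j in range(n_states):
            weights[j][c] = draws[j] / total
    return means, weights

def simulate_state_data(means, weights, rng):
    """Draw each genes x cells entry from Poisson((M*W)[gene][cell])."""
    n_genes, k, n_cells = len(means), len(weights), len(weights[0])
    data = []
    for g in range(n_genes):
        row = []
        for c in range(n_cells):
            rate = sum(means[g][j] * weights[j][c] for j in range(k))
            row.append(sample_poisson(rate, rng))
        data.append(row)
    return data

rng = random.Random(1)
M, W = random_states(n_states=2, n_cells=5, n_genes=3, rng=rng, max_mean=20.0)
data = simulate_state_data(M, W, rng)
```

Because every column of W sums to 1, each cell's rate vector is a convex combination of the state means, which is what makes the mixture convex.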

uncurl.simulation.generate_zip_data(M, L, n_cells, cluster_probs=None)[source]

Generates zero-inflated poisson-distributed data, given a set of means and zero probs for each cluster.

Parameters:
  • M (array) – genes x clusters matrix
  • L (array) – genes x clusters matrix - zero-inflation parameters
  • n_cells (int) – number of output cells
  • cluster_probs (array) – prior probability for each cluster. Default: uniform.
Returns:
  • output – array with shape genes x n_cells
  • labels – array of cluster labels

uncurl.simulation.generate_zip_state_data(means, weights, z)[source]

Generates data according to the Zero-inflated Poisson Convex Mixture Model.

Parameters:
  • means (array) – Cell types- genes x clusters
  • weights (array) – Cell cluster assignments- clusters x cells
  • z (float) – zero-inflation parameter
Returns:

data matrix - genes x cells

Module contents