BMatrix¶
Created: 1/29/2024
Updated last on 4/21/2025
Required Packages¶
Python, Numpy, Pandas, Re, Itertools, BooleanNetwork
Installation¶
To install the BMatrix code, download it and import within Python file as ‘BMatrix’
Function Descriptions¶
load_network_from_file(filename, initial_state=None)¶
Input: Requires filename
which is the name of the .txt file that contains the equations. initial_state
is a list of integers that represent the initial state of the network. If no initial state is provided, the code will generate a random initial state.
Functionality: The code reads the .txt file and creates a BooleanNetwork object. If an equation is a constant value (0 or 1), meaning that the gene is set as mutated/perturbed.
Output: A BooleanNetwork object
get_equations(file)¶
Input: A .txt file with the equations formatted GENE = ! ( INHIBITOR | INHIBITOR ) & ( ACTIVATOR | ACTIVIATOR )
Functionality: Reads the .txt file by line, if the format is incorrect, it returns an error. Output: equations
which is a list of strings, where each string represents one line of the .txt file
Example: ['FLT3 = FLT3', 'AKT = FLT3', 'CEBPA = ! FLT3', 'DNMT3A = DNMT3A', 'GSK3B = ! AKT']
get_gene_dict(equations)¶
Input: Requires equations
Functionality: From equations, takes all the values on the left side of the = sign and creates a dictionary with those values starting from 0. Output: gene_dict
a dictionary which includes all the genes (nodes) that are included in the simulation
Example: {'FLT3': 0, 'AKT': 1, 'CEBPA': 2, 'DNMT3A': 3, 'GSK3B': 4}
get_upstream_genes(equations)¶
Input: Requires equations
Functionality: Using equations
, it takes the right side of the equations, and removes all symbols such as | () &
and leaves only the genes. Output: upstream_genes
which is a list of strings, where every string represents the upstream genes of the gene (node) at the same line in equations
Example: ['FLT3', 'FLT3', ' FLT3', 'DNMT3A']
get_connectivity_matrix(equations,gene_dict,upstream_genes)¶
Input: Requires equations
, gene_dict
, and upstream_genes
Functionality: Takes the strings from upstream_genes
and iterates over every gene in an individual string. For every gene in the string, it takes the value of that gene from gene_dict
and appends the value to a result list. After every string is complete, it pads up the result list with -1. Output: connectivity_matrix
which is a np.array
Example: array([[ 0, -1, -1, -1],[ 0, -1, -1, -1],[ 0, -1, -1, -1],[ 3, -1, -1, -1]])
get_truth_table(equations,upstream_genes, show_functions = None)¶
Input: Requires equations
and upstream_genes
. show_functions
is a flag that gives the user the option to print out the functions that the code uses in eval() to create the truth table. It is automatically set to False
.Functionality: Takes the right side of the equations
list per line and evaluates the Boolean function using all possible combinations of gene values. It appends this output as a tuple to a list. After every line is evaluated, -1 is added to make all the tuples the same length. Output: truth_table
which is an np.array
Example: [[ 0 1 -1 -1 -1 -1 -1 -1][ 0 1 -1 -1 -1 -1 -1 -1][ 1 0 -1 -1 -1 -1 -1 -1][ 0 1 -1 -1 -1 -1 -1 -1]]
get_mutation_dict(file)¶
Input: Requires file
which is a .txt file that includes all the genes and their mutations (which is simplified in this case as TSG = 0 and oncogenes = 1). The .txt file must have the format of GENE = MUTATION
or else an error will occur. Functionality: For every line in the .txt file, it splits by the = sign and whatever is on the left is a key (in this case a key) and whatever is on the right of the = sign is the value. Output: mutation_dict
which is a dictionary that has genes as the keys and mutations as the value.
Example: {'FLT3': 1, 'DNMT3A': 0, 'NPM1': 1}
This function can also be used to make perturbed_dict, which is needed when perturbed genes are involved in the simulation
get_knocking_genes(profile, mutation_dict, connectivity_matrix, gene_dict,perturbed_genes=None, perturbed_dict=None)¶
Input: Requires profile
which contains the patient’s mutation profile as a string (ex. ‘FLT3,NPM1,DNMT3A’), connectivity_matrix
, gene_dict
This code does not require perturbed_genes
, perturbed_dict
, and mutation_dict
. For both perturbed_dict
and perturbed_genes
, they are only to be used when perturbed genes are considered during the simulation. In this case, (where the effects of perturbed genes is to be measured) one can decide whether to include mutation_dict
. Depends on the results one wants to get. perturbed_genes
is formatted the same way as profile
and perturbed_dict
is formatted the same way as mutation_dict
Functionality:
The code reads the profile
and splits the profile up into it’s seperate genes. For profile
and/or peturbed_genes
the code splits up the string and removes any repeat values.
Perturbed genes + Mutated genes:
If both perturbed and mutated genes are involved, the code reads the profile
and splits the profile and removes duplicates (which is then renamed mutation_profile
). For every gene in the mutation_profile, if there is no genes, it returns ‘no_mutations’ and does not change the connectivity matrix or the inital state. If there is a gene, the code accesses the genes value in from gene_dict
, and knocks the array in the connectivity matrix out to -1, and sets the gene’s inital value equal to the gene’s value in mutation_dict
. Then, after all the genes in the mutation_profile
are considered, the code does the same except it uses the perturbed genes. The main difference is that the perturbed genes DO NOT depend on the profile
. It knocks in/out the perturbed genes regardless of patient profiles.
Peturbed genes ONLY:
If only perturbed genes are involved, the code reads the perturbed_genes
and splits the profile and removes duplicates. For every gene in the perturbed_genes
, if there is no genes, it returns ‘no perturbed genes’ and does not change the connectivity matrix or the inital state. If there is a gene, the code accesses the genes value in from gene_dict
, and knocks the array in the connectivity matrix out to -1, and sets the gene’s inital value equal to the gene’s value in perturbed_dict
.
Mutated genes ONLY:
If only mutated genes are involved, the code reads the profile
and splits the profile and removes duplicates (renamed mutation_profile
to prevent rewriting the variable). For every gene in the mutation_profile
, if there is no genes, it returns ‘no mutated genes’ and does not change the connectivity matrix or the inital state. If there is a gene, the code accesses the genes value in from gene_dict
, and knocks the array in the connectivity matrix out to -1, and sets the gene’s inital value equal to the gene’s value in mutation_dict
.
The perturbed genes take superiority over mutated genes. For example, if a GENE = 1 in a mutation_dict
and the GENE is perturbed with the definition of the GENE being GENE = 0 in perturbed_dict
for all patients, the value of the GENE will be set to 0.
Output: The output is mutated_connectivity_matrix
which is the individual patient’s connectivity matrix, and x0
which is the individual patient’s unique inital state.
Example:
mutated_connectivity_matrix': [ 0 -1 -1][ -1 -1 -1][ 0 -1 -1][ 3 -1 -1][ 1 -1 -1][ 5 -1 -1]
x0: [0,1,1,1,1]
The calculating equations (which all start with cal_
) are used post-simulation and calculate the phenotype scores and the final network score_¶
get_cal_upstream_genes(equations):¶
Input: Requires equations
(These equations are from a .txt file that has the calculating functions Ex. Apoptosis = GENE) Functionality: Using equations
, it takes the right side of the equations, and removes all symbols such as | () &
and leaves only the genes. (Retains any duplicates) Output: cal_upstream_genes
a list of lists, where every individual list is for one phenotype \
Example: [['BCL2', 'TP53'], ['CEBPA', 'ETV6', 'MEIS1']]
get_cal_functions(equations):¶
Input: Requires equations
Functionality: Takes the right side of the equations
(called cal_functions
in the code) and replaces the Boolean symbols with + or -. In this case ! = -
, | = +
, and & = +
. The code then removes any parantheses in the functions, cleans them up (removing any extra spaces that came from removing the parantheses) and then returns the cleaned cal_functions
.
Output: cal_functions
which are functions that have genes being added/subtracted for the total phenotype score. \
Example: ['TP53 + (-BCL2) ', ' CEBPA + ETV6 + (-MEIS1) '}]
get_calculating_scores is to be used post-simulation, which requires the BooleanNetwork code¶
get_calculating_scores(network_traj, cal_functions, cal_upstream_genes, gene_dict, cal_range=None, scores_dict=None)¶
Input: Requires network_traj
(from the BooleanNetwork simulation), cal_functions
, cal_upstream_genes
and gene_dict
. \
This code doesn’t require cal_range
or scores_dict
. cal_range
specifies a specific range of network_traj
values to be used in calculating the scores, it is automatically set to the last 100,000 steps. scores_dict
is a dictionary, where the keys are the phenotype names + Network
score. It is automatically set to have the keys “Apoptosis”, “Differentiation”, “Proliferation”, and “Network” \
Functionality: The code iterates over every function in cal_functions
and retrieves the function’s associated cal_upstream_genes
. For each function, it extracts the genes and evaluates the function for each row in the cal_range
. The results are stored in the scores_dict under the respective keys. After evaluating scores for every function in cal_functions
, it calculates the ‘Network’ scores based on the formula: Proliferation - (Differentiation + Apoptosis). After calculating the Network scores, it calculates the mean of the ‘Network’ scores called the final score. Output: The output is a scores_dict
which is a dictionary that has the scores for every phenotype + network for all of cal_range
and final_score
which is the mean of all the values of scores_dict['Network']
.
Example:
scores_dict: {'Apoptosis': [0,0,...],'Differentiation': [1,0,...],'Proliferation':[0,1,...],'Network':[0,1,...]}
final_score: 4.45318