minorg.minimum_set module
- class minorg.minimum_set.CollapsedgRNA(name, gRNA_objs)[source]
Bases: minorg.minweight_sc.Set
- __init__(name, gRNA_objs)[source]
Effectively a set object that tracks name and weight.
- Parameters
name (str) – set name
weight (float) – set weight
elements (iterable) – elements in set
- add(gRNA_obj) None [source]
Add a single gRNA object to self.
- Parameters
gRNA_obj (gRNA) – gRNA object
- copy()[source]
Returns a shallow copy of self. (i.e. retains the same gRNA objects, but stores them in a new set)
- Returns
- class minorg.minimum_set.SetOfCollapsedgRNA(*args, **kwargs)[source]
Bases: minorg.minweight_sc.SetOfSets
- add_grna(*gRNAs) None [source]
Add gRNA to a CollapsedgRNA in self with the same coverage. If no CollapsedgRNA has the same coverage as the gRNA to add, the gRNA will be skipped.
- Parameters
gRNA (gRNA) – gRNA object
- all_not_empty() bool [source]
- Returns
True – If no CollapsedgRNA in self is empty
False – If at least one CollapsedgRNA in self is empty
- copy()[source]
Returns shallow copy of self. (i.e. retains same CollapsedgRNA objects but stores them in a new set)
- Returns
- generate_grna_set(prioritise_3prime=False, consume=False)[source]
Outputs a list of 1 gRNA from each CollapsedgRNA in self. By default, the gRNA closest to the 5’ end (of the sense strand, if sense information exists) is selected from each CollapsedgRNA object.
- Parameters
prioritise_3prime (bool) – select gRNA from CollapsedgRNA using proximity to 3’ end instead of 5’ end (default=False)
consume (bool) – remove output gRNA from self’s CollapsedgRNA objects permanently
- Returns
list of gRNA objects
- Return type
list
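The selection described above can be illustrated with a minimal sketch. It does not use minorg's classes: ToyGuide, rel_pos, and pick_one_per_group are hypothetical names, and treating the largest relative position as "closest to the 3’ end" is an assumption made for illustration only.

```python
from typing import Dict, List, NamedTuple


class ToyGuide(NamedTuple):
    # Hypothetical stand-in for a gRNA object; not minorg's gRNA class.
    name: str
    rel_pos: float  # assumed: smaller value = closer to the 5' end


def pick_one_per_group(groups: Dict[str, List[ToyGuide]],
                       prioritise_3prime: bool = False,
                       consume: bool = False) -> List[ToyGuide]:
    """Pick one guide from each non-empty group by relative position."""
    chooser = max if prioritise_3prime else min
    picked = []
    for members in groups.values():
        if not members:
            continue  # an empty group has nothing to contribute
        best = chooser(members, key=lambda g: g.rel_pos)
        picked.append(best)
        if consume:
            members.remove(best)  # permanently drop the chosen guide
    return picked


groups = {
    "cov_1": [ToyGuide("g1", 0.40), ToyGuide("g2", 0.10)],
    "cov_2": [ToyGuide("g3", 0.75), ToyGuide("g4", 0.90)],
}
first = pick_one_per_group(groups, consume=True)   # g2 and g3 are removed
second = pick_one_per_group(groups)                # picks from what remains
```

With consume=True, a second call cannot return a guide that was already handed out, which is one way to keep successive sets mutually exclusive.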
- generate_grna_sets(prioritise_3prime=False, set_num=1, max_set_num=1, manual=True)[source]
Yields lists of 1 gRNA from each CollapsedgRNA in self.
- Parameters
prioritise_3prime (bool) – tie-break with proximity to 3’ end instead of 5’ (default=False)
set_num (int) – number to print for manual check. Also used with ‘max_set_num’ to set limit on number of sets to generate. (default=1)
max_set_num (int) – maximum set number. At most ‘<max_set_num> - <set_num> + 1’ sets will be generated (default=1)
manual (bool) – enable manual screening of each gRNA set for approval at interactive terminal (default=True)
- Returns
generator that yields a set of gRNA in format [<1 gRNA object from each CollapsedgRNA in self>]
- Return type
generator
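The relationship between set_num and max_set_num can be sketched as follows. This is an illustrative generator over plain lists of guide names, not minorg's implementation; iter_sets is a hypothetical name, and stopping as soon as any group runs out of guides is an assumption.

```python
from typing import Dict, Iterator, List


def iter_sets(groups: Dict[str, List[str]], set_num: int = 1,
              max_set_num: int = 1) -> Iterator[List[str]]:
    """Yield up to max_set_num - set_num + 1 sets, one guide per group."""
    pools = {k: list(v) for k, v in groups.items()}  # work on copies
    # Assumed stop condition: the cap is reached or some group is exhausted.
    while set_num <= max_set_num and all(pools.values()):
        yield [pool.pop(0) for pool in pools.values()]
        set_num += 1


groups = {"cov_1": ["g1", "g2", "g5"], "cov_2": ["g3", "g4"]}
# Starting at set number 2 with a maximum of 3 allows 3 - 2 + 1 = 2 sets.
sets = list(iter_sets(groups, set_num=2, max_set_num=3))
```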
- remove_empty() None [source]
Remove CollapsedgRNA from self if the CollapsedgRNA object is empty (i.e. no gRNA in it)
- remove_grna(*gRNAs) None [source]
Remove gRNA from any CollapsedgRNA in self.
- Parameters
gRNA (gRNA) – gRNA object
- write(fout) None [source]
Write collapsed gRNAs to file. Fields are:
coverage group: unique ID given to each collapsed gRNA group
coverage: number of targets covered by coverage group
gRNA id: gRNA ID
gRNA sequence: gRNA sequence
relative pos: relative position to 5’ end, only valid for comparison within a coverage group
- Parameters
fout (str) – path to output file
- minorg.minimum_set.all_best_nr(potential_coverage, all_coverage, covered)[source]
Get all gRNA with equivalent non-redundancy.
This function prioritises gRNA with the least overlap with already covered targets.
- Parameters
potential_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to targets NOT already covered; AND where ONLY as yet unchosen gRNASeq obj are included
all_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> contain hits to all targets REGARDLESS of whether they’ve already been covered; AND where all gRNA’s gRNASeq obj are included REGARDLESS of whether they’ve already been chosen
covered (set) – set of IDs of targets already covered
- Returns
dictionary of {‘<gRNA seq (str)>’: [<list of gRNAHit obj>]} subset of ‘potential_coverage’ with equivalent non-redundancy
- Return type
dict
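A minimal sketch of this kind of non-redundancy filter is shown below. It uses plain target ID strings in place of gRNAHit objects, and least_redundant is a hypothetical name; counting overlap as the number of a candidate's targets that are already covered is an assumption about how "overlap" is measured.

```python
from typing import Dict, List, Set


def least_redundant(potential_coverage: Dict[str, List[str]],
                    all_coverage: Dict[str, List[str]],
                    covered: Set[str]) -> Dict[str, List[str]]:
    """Keep the candidates whose full coverage overlaps least with `covered`."""
    if not potential_coverage:
        return {}
    # Overlap = number of already covered targets that a candidate also hits.
    overlap = {seq: len(set(all_coverage[seq]) & covered)
               for seq in potential_coverage}
    fewest = min(overlap.values())
    return {seq: hits for seq, hits in potential_coverage.items()
            if overlap[seq] == fewest}


covered = {"t1", "t2"}
all_coverage = {
    "AAA": ["t1", "t3"],
    "CCC": ["t1", "t2", "t3"],
    "GGG": ["t2", "t4"],
    "TTT": ["t1", "t2"],  # already chosen, so absent from potential_coverage
}
potential_coverage = {"AAA": ["t3"], "CCC": ["t3"], "GGG": ["t4"]}
best = least_redundant(potential_coverage, all_coverage, covered)
```

Here "CCC" is dropped because it re-covers two targets, while "AAA" and "GGG" each re-cover only one.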
- minorg.minimum_set.all_best_pos(potential_coverage, all_coverage, covered)[source]
Get all gRNA with equivalent closeness to 5’.
This function prioritises gRNA closest to 5’ end.
- Parameters
potential_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to targets NOT already covered; AND where ONLY as yet unchosen gRNASeq obj are included
all_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> contain hits to all targets REGARDLESS of whether they’ve already been covered; AND where all gRNA’s gRNASeq obj are included REGARDLESS of whether they’ve already been chosen
covered (set) – set of IDs of targets already covered
- Returns
dictionary of {‘<gRNA seq (str)>’: [<list of gRNAHit obj>]} subset of ‘potential_coverage’ with equivalent closeness to 5’
- Return type
dict
- class minorg.minimum_set.gRNA(name, gRNASeq_obj)[source]
Bases: minorg.minweight_sc.Set
- __init__(name, gRNASeq_obj)[source]
Effectively a set object that tracks name and weight.
- Parameters
name (str) – set name
weight (float) – set weight
elements (iterable) – elements in set
- closer_to_5prime(other) bool [source]
Takes another gRNA object and returns True if self is closer to 5’ end (of sense strand, if sense information exists) by comparing relative positions between shared targets. If tie, return True.
- Returns
bool
- Raises
Exception – If no common targets between the two gRNA objects (cannot compare across different targets)
- property relative_5prime_pos: float[source]
Returns a value calculated from all hits that is ONLY VALID FOR COMPARISON BETWEEN gRNA with the SAME COVERAGE. The smaller the value, the closer to the 5’ end.
- relative_distance_to_5prime(other) float [source]
Takes another gRNA object and returns self._relative_5prime_pos(<hits in shared targets>) - other._relative_5prime_pos(<hits in shared targets>). If self is closer to the 5’ end, the value returned will be negative.
- Returns
float
- Raises
Exception – If no common targets between the two gRNA objects (cannot compare across different targets)
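The comparison over shared targets can be sketched with plain dictionaries mapping target ID to position. The function names mirror the methods above but are standalone illustrations, and using the mean position across shared targets is an assumption; the actual calculation behind relative_5prime_pos is not specified here.

```python
from statistics import mean
from typing import Dict


def relative_distance(pos_a: Dict[str, float], pos_b: Dict[str, float]) -> float:
    """Difference in mean position, computed only over shared targets."""
    shared = pos_a.keys() & pos_b.keys()
    if not shared:
        # Positions on different targets are not comparable.
        raise ValueError("no common targets between the two guides")
    return mean(pos_a[t] for t in shared) - mean(pos_b[t] for t in shared)


def closer_to_5prime(pos_a: Dict[str, float], pos_b: Dict[str, float]) -> bool:
    """True if guide A is at least as close to the 5' end as guide B (ties count)."""
    return relative_distance(pos_a, pos_b) <= 0


a = {"t1": 10, "t2": 30}
b = {"t2": 50, "t3": 5}
diff = relative_distance(a, b)  # only t2 is shared: 30 - 50
```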
- minorg.minimum_set.limited_minweight_SC(collapsed_grnas, num_sets, targets=None, num_lengths_to_track=None, low_coverage_penalty=0)[source]
Executes enum_approx_order_SC() for a capped number of iterations (max(20, 2*num_sets)), seeding each run with a different CollapsedgRNA, from the CollapsedgRNA with the highest coverage to the one with the lowest. A CollapsedgRNA is removed from the candidate list once all CollapsedgRNA with equivalent coverage have been used as seed. Stops when the next CollapsedgRNA to be seeded has a coverage of less than <total targets>/<num CollapsedgRNA in <num_lengths_to_track>th smallest set cover solution>.
- Parameters
collapsed_grnas (CollapsedgRNA) – CollapsedgRNA object
num_sets (int) – desired number of sets. Used to inform maximum number of iterations.
targets (list or set or tuple) – IDs (str) of targets to be covered
num_lengths_to_track (int) – length of <num_lengths_to_track>th smallest set cover solution will be used to determine whether to terminate search (see function description)
low_coverage_penalty (float) – multiplier for value calculated by minweight_sc(), which will then be multiplied by <number of remaining targets that are not covered by gRNA> and then added to the output of that function. Effectively penalises large sets of many small coverage gRNA. This might make the set less redundant, but will likely reduce set size.
- Returns
set cover solutions (CollapsedgRNA)
- Return type
list
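The two stopping rules in the description (the iteration cap and the coverage threshold) can be sketched in isolation. This is a simplified illustration, not the function's implementation: seed_order is a hypothetical name, and the size of the reference solution is held fixed here, whereas the real search presumably updates it as solutions are found.

```python
from typing import Dict, List


def seed_order(coverages: Dict[str, int], num_sets: int,
               total_targets: int, kth_smallest_size: int) -> List[str]:
    """List seeds from highest to lowest coverage until a stop rule triggers."""
    max_iter = max(20, 2 * num_sets)               # cap on number of runs
    threshold = total_targets / kth_smallest_size  # minimum useful coverage
    seeds: List[str] = []
    for name, cov in sorted(coverages.items(), key=lambda kv: -kv[1]):
        if len(seeds) >= max_iter or cov < threshold:
            break
        seeds.append(name)
    return seeds


coverages = {"c1": 6, "c2": 4, "c3": 2, "c4": 1}
# 6 targets and a reference solution of 3 groups give a threshold of 2.
seeds = seed_order(coverages, num_sets=1, total_targets=6, kth_smallest_size=3)
```

The threshold reflects a simple bound: a solution of k groups covering n targets must contain at least one group covering n/k targets or more.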
- minorg.minimum_set.limited_optimal_SC(U, S, size=1, redundancy=1)[source]
Attempts to find set cover solutions by brute force with a capped maximum set size and redundancy.
- Parameters
U (set) – set of elements (targets) to cover
S (SetOfSets) – SetOfSets (or child class) object containing sets (gRNA coverage) for set cover
size (int) – maximum set cover solution size for optimal search
redundancy (float) – maximum allowable redundancy as fraction of total number of elements to be covered (U)
- Returns
set cover solutions, each a SetOfSets
- Return type
list
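A brute-force search with capped size and redundancy can be sketched as below. It works on a plain dictionary of Python sets rather than a SetOfSets object, and brute_force_covers is a hypothetical name. Measuring redundancy as the number of surplus element hits divided by the number of elements in U is an assumption.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def brute_force_covers(U: Iterable[str], S: Dict[str, Set[str]],
                       size: int = 1,
                       redundancy: float = 1.0) -> List[Tuple[str, ...]]:
    """Enumerate combinations of up to `size` sets that cover every element of U."""
    universe = set(U)
    if not universe:
        return []
    solutions = []
    for k in range(1, size + 1):
        for combo in combinations(sorted(S), k):
            sets = [S[name] & universe for name in combo]
            if set().union(*sets) != universe:
                continue  # not a cover
            # Assumed redundancy measure: surplus hits as a fraction of |U|.
            extra = sum(len(s) for s in sets) - len(universe)
            if extra / len(universe) <= redundancy:
                solutions.append(combo)
    return solutions


U = {"t1", "t2", "t3", "t4"}
S = {
    "A": {"t1", "t2"},
    "B": {"t3", "t4"},
    "C": {"t1", "t2", "t3"},
    "D": {"t1", "t3", "t4"},
}
covers = brute_force_covers(U, S, size=2, redundancy=0.25)
```

The number of combinations grows quickly with size, which is why a cap on solution size is needed for this approach to stay tractable.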
- minorg.minimum_set.make_get_minimum_set(gRNA_hits, manual_check=True, exclude_seqs={}, targets=None, prioritise_nr=False, sc_algorithm='LAR', num_sets=1, tie_breaker=None, low_coverage_penalty=0.5, suppress_warning=False, impossible_set_message='If you used the full programme, consider adjusting --minlen, --minid, and --merge-within, and/or raising --check-reciprocal to restrict candidate target sequences to true biological replicates. Also consider using --domain to restrict the search.')[source]
Make function to generate minimum set of gRNA.
- Parameters
gRNA_hits (list) – gRNAHit objects
manual_check (bool) – manually approve each gRNA set
exclude_seqs (set/list) – optional, gRNA sequences (str) to exclude
targets (list) – optional, target IDs (str)
prioritise_nr (bool) – prioritise non-redundancy. If used, ‘sc_algorithm’ will be ignored. (default=False)
sc_algorithm (str) – set cover algorithm when not prioritising non-redundancy. Only used if prioritise_nr=False. (default=’LAR’)
num_sets (int) – number of sets
tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA
impossible_set_message (str) – message to print when gRNA cannot cover all targets
suppress_warning (bool) – suppress printing of warning when gRNA cannot cover all targets
- Returns
function that takes no arguments and returns list of minimum set of gRNA sequences (str)
- Return type
func
- minorg.minimum_set.make_set_cover_nr(gRNA_hits, num_sets=1, target_ids=[], low_coverage_penalty=0, num_lengths_to_track=None, prioritise_3prime=False, optimal_depth=5, suppress_warning=False)[source]
Create function that generates mutually exclusive gRNA sets with non-redundancy as a priority.
- Parameters
gRNA_hits (gRNAHits) – gRNAHits object
num_sets (int) – number of mutually exclusive gRNA sets to return (default=1)
target_ids (list) – list of target names/IDs (str) to be covered. If not provided, will be inferred from the set of target IDs covered by gRNA hits in ‘gRNA_hits’
manual (bool) – manually approve each gRNA set for inclusion through interactive terminal (default=False)
low_coverage_penalty (float) – multiplier for value calculated by minweight_sc(), which will then be multiplied by <number of remaining targets that are not covered by gRNA> and then added to the output of that function. Effectively penalises large sets of many small coverage gRNA. This might make the set less redundant, but will likely reduce set size.
prioritise_3prime (bool) – tie-break with proximity to 3’ end instead of 5’ (default=False)
- Returns
function that returns list (gRNA panel) of str (gRNA names)
- Return type
func
- minorg.minimum_set.make_set_cover_pos(gRNA_hits, num_sets=1, target_ids=[], algorithm='LAR', id_key=<function <lambda>>, tie_breaker=<function tie_break_first>, suppress_warning=False)[source]
Create function that generates mutually exclusive minimum gRNA sets with position as a priority.
- Parameters
gRNA_hits (list) – gRNAHit objects
target_ids (list) – list of target names/IDs (str) to be covered. If not provided, will be inferred from the set of target IDs covered by gRNA hits in ‘gRNA_hits’
algorithm (str) – set cover algorithm
exclude_seqs (set/list) – gRNA sequences (str) to exclude
id_key (func) – function to extract target ID from gRNAHit obj
tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA
suppress_warning (bool) – suppress printing of warning when gRNA cannot cover all targets
- Returns
minimum set of gRNA (gRNA)
- Return type
list
- minorg.minimum_set.manual_check_prompt(grnas, set_num=None)[source]
Prints prompt for manual check.
- minorg.minimum_set.set_cover_LAR(gRNA_coverage, target_ids, id_key=<function <lambda>>, tie_breaker=<function tie_break_first>)[source]
Set cover algorithm LAR.
Algorithm described in: Yang, Q., Nofsinger, A., Mcpeek, J., Phinney, J. and Knuesel, R. (2015). A Complete Solution to the Set Covering Problem. In International Conference on Scientific Computing (CSC) pp. 36–41
- Parameters
gRNA_coverage (dict) – {‘<gRNA seq>’: [<list of gRNAHit obj associated w/ that gRNA seq>]}
target_ids (list) – IDs of targets to cover
id_key (func) – function to extract target ID from gRNAHit obj
tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA
- Returns
minimum set of gRNA sequences (str)
- Return type
set
- minorg.minimum_set.set_cover_greedy(gRNA_coverage, target_ids, id_key=<function <lambda>>, tie_breaker=<function tie_break_first>)[source]
Greedy set cover algorithm.
- Parameters
gRNA_coverage (dict) – {‘<gRNA seq>’: [<list of gRNAHit obj associated w/ that gRNA seq>]}
target_ids (list) – IDs of targets to cover
id_key (func) – function to extract target ID from gRNAHit obj
tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA
- Returns
minimum set of gRNA sequences (str)
- Return type
set
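The general shape of a greedy set cover with a pluggable tie-breaker can be sketched as follows. This is a textbook greedy loop, not minorg's code: it uses target ID strings instead of gRNAHit objects (so no id_key is needed), and greedy_cover and first_candidate are hypothetical names. The tie-breaker takes the same three kinds of argument described above.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def first_candidate(tied: Dict[str, Set[str]], all_cov: Dict[str, List[str]],
                    covered: Set[str]) -> Tuple[str, Set[str]]:
    """Arbitrary tie-breaker: return the first tied candidate and its new targets."""
    return next(iter(tied.items()))


def greedy_cover(coverage: Dict[str, List[str]], target_ids: Iterable[str],
                 tie_breaker: Callable = first_candidate) -> Set[str]:
    """Repeatedly pick the sequence that covers the most uncovered targets."""
    uncovered = set(target_ids)
    covered: Set[str] = set()
    chosen: Set[str] = set()
    while uncovered:
        # Newly covered targets for each sequence that is still unchosen.
        gains = {seq: set(hits) & uncovered
                 for seq, hits in coverage.items() if seq not in chosen}
        best = max((len(g) for g in gains.values()), default=0)
        if best == 0:
            raise ValueError("remaining targets cannot be covered")
        tied = {seq: g for seq, g in gains.items() if len(g) == best}
        seq, gained = tie_breaker(tied, coverage, covered)
        chosen.add(seq)
        covered |= gained
        uncovered -= gained
    return chosen


coverage = {
    "AAA": ["t1", "t2", "t3"],
    "CCC": ["t3", "t4"],
    "GGG": ["t4", "t5"],
    "TTT": ["t5"],
}
result = greedy_cover(coverage, ["t1", "t2", "t3", "t4", "t5"])
```

Greedy selection is fast but not guaranteed to return the smallest possible cover; the tie-breaker only decides among candidates that gain the same number of targets in a given round.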
- minorg.minimum_set.tie_break_first(cov, all_cov, coverage)[source]
Arbitrarily returns the first gRNA sequence and its associated gRNAHit objects.
- Parameters
potential_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to targets NOT already covered; AND where ONLY as yet unchosen gRNASeq obj are included
all_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> contain hits to all targets REGARDLESS of whether they’ve already been covered; AND where all gRNA’s gRNASeq obj are included REGARDLESS of whether they’ve already been chosen
covered (set) – set of IDs of targets already covered
- Returns
str – Of gRNA sequence
list – Of gRNAHit objects (for as yet uncovered targets) associated with the above gRNA sequence