minorg.minimum_set module

class minorg.minimum_set.CollapsedgRNA(name, gRNA_objs)[source]

Bases: minorg.minweight_sc.Set

__init__(name, gRNA_objs)[source]

Effectively a set object that tracks name and weight.

Parameters
  • name (str) – set name

  • weight (float) – set weight

  • elements (iterable) – elements in set

add(gRNA_obj) → None[source]

Add a single gRNA object to self.

Parameters

gRNA_obj (gRNA) – gRNA object

copy()[source]

Returns a shallow copy of self (i.e. retains the same gRNA objects, but stores them in a new set).

Returns

CollapsedgRNA
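The shallow-copy behaviour described above can be illustrated with a plain Python set. This is only a sketch: FakeGRNA is a hypothetical stand-in class, not part of minorg, and the real copy() may differ in detail.

```python
import copy


class FakeGRNA:
    """Hypothetical stand-in for a gRNA object (not part of minorg)."""

    def __init__(self, name):
        self.name = name


original = {FakeGRNA("g1"), FakeGRNA("g2")}
shallow = copy.copy(original)  # new set container, same member objects

# Adding to the copy leaves the original container untouched...
shallow.add(FakeGRNA("g3"))

# ...but the members held by both sets are the very same objects.
shared = original & shallow
```

Because the members are shared, mutating a gRNA object reached through one set is visible through the other; only membership changes are independent.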

is_empty() → bool[source]

Returns True if self.num_grna() == 0 (i.e. no gRNA objects are stored)

least_5prime()[source]

Returns gRNA object furthest from 5’ (usually, but not necessarily, closest to 3’)

Returns

gRNA

most_5prime()[source]

Returns gRNA object closest to 5’

Returns

gRNA

num_grna() → int[source]

Returns number of gRNA objects stored

remove(gRNA_obj) → None[source]

Remove a single gRNA object from self.

Parameters

gRNA_obj (gRNA) – gRNA object

class minorg.minimum_set.SetOfCollapsedgRNA(*args, **kwargs)[source]

Bases: minorg.minweight_sc.SetOfSets

__init__(*args, **kwargs)[source]
Parameters

Sets (iter) – iter of Set objects

add_grna(*gRNAs) → None[source]

Add gRNA to a CollapsedgRNA in self with same coverage. If no CollapsedgRNA has same coverage as the gRNA to add, the gRNA will be skipped.

Parameters

gRNAs (gRNA) – gRNA object(s) to add
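The add-or-skip rule can be sketched with plain dictionaries. The sketch assumes that "same coverage" means covering an identical set of target IDs; the helper name and data layout are hypothetical and are not minorg's internals.

```python
def add_to_matching_group(groups, name, targets):
    """Add `name` to the group with identical coverage, or skip it.

    groups: dict mapping frozenset of target IDs -> list of gRNA names
    (a hypothetical stand-in for CollapsedgRNA objects).
    Returns True if the gRNA was added, False if it was skipped.
    """
    key = frozenset(targets)
    if key in groups:
        groups[key].append(name)
        return True
    return False  # no group with the same coverage: skipped


groups = {
    frozenset({"t1", "t2"}): ["gA"],
    frozenset({"t3"}): ["gB"],
}
added = add_to_matching_group(groups, "gC", ["t2", "t1"])
skipped = add_to_matching_group(groups, "gD", ["t1"])
```

Here gC joins the group of gA, while gD is skipped because no existing group covers exactly {t1}.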

all_not_empty() → bool[source]
Returns

True if none of the CollapsedgRNA objects in self is empty

copy()[source]

Returns a shallow copy of self (i.e. retains the same CollapsedgRNA objects, but stores them in a new set).

Returns

SetOfCollapsedgRNA

empty_collapsed_grna()[source]
Returns

empty gRNA objects in self

Return type

gRNA

generate_grna_set(prioritise_3prime=False, consume=False)[source]

Outputs list of 1 gRNA from each CollapsedgRNA in self. By default, the gRNA closest to the 5’ end (of the sense strand, if sense information exists) is selected from each CollapsedgRNA object.

Parameters
  • prioritise_3prime (bool) – select gRNA from CollapsedgRNA using proximity to 3’ end instead of 5’ end (default=False)

  • consume (bool) – remove output gRNA from self’s CollapsedgRNA objects permanently

Returns

list of gRNA objects

Return type

list
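A minimal sketch of the selection logic described above, assuming each CollapsedgRNA can be modelled as a dict of gRNA name to relative 5’ position (smaller means closer to 5’). The function name, the data layout, and the choice to skip empty groups are assumptions made for illustration.

```python
def pick_one_per_group(groups, prioritise_3prime=False, consume=False):
    """Pick one gRNA name from each group.

    groups: list of dicts mapping gRNA name -> relative 5' position
    (hypothetical stand-ins for CollapsedgRNA objects).
    """
    pick = max if prioritise_3prime else min
    selected = []
    for group in groups:
        if not group:
            continue  # assumption: empty groups contribute nothing
        name = pick(group, key=group.get)
        selected.append(name)
        if consume:
            del group[name]  # permanently remove the chosen gRNA
    return selected


groups = [{"g1": 0.1, "g2": 0.4}, {"g3": 0.7, "g4": 0.2}]
default_pick = pick_one_per_group(groups)
three_prime_pick = pick_one_per_group(groups, prioritise_3prime=True)
consumed_pick = pick_one_per_group(groups, consume=True)
```

With consume=True the chosen gRNA are removed, so a later call draws from what is left in each group.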

generate_grna_sets(prioritise_3prime=False, set_num=1, max_set_num=1, manual=True)[source]

Yields lists of 1 gRNA from each CollapsedgRNA in self.

Parameters
  • prioritise_3prime (bool) – tie-break with proximity to 3’ end instead of 5’ (default=False)

  • set_num (int) – number to print for manual check. Also used with ‘max_set_num’ to set limit on number of sets to generate. (default=1)

  • max_set_num (int) – maximum set number. The maximum number of sets generated will be ‘<max_set_num> - <set_num> + 1’ (default=1)

  • manual (bool) – enable manual screening of each gRNA set for approval at interactive terminal (default=True)

Returns

generator that yields a set of gRNA in format [<1 gRNA object from each CollapsedgRNA in self>]

Return type

generator
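The relationship between set_num and max_set_num can be sketched with a small generator. This is an illustration of the counting rule only; the function name and the list of candidate sets are hypothetical, and manual screening is left out.

```python
def numbered_sets(candidate_sets, set_num=1, max_set_num=1):
    """Yield (set number, gRNA set) pairs until max_set_num is passed."""
    for grna_set in candidate_sets:
        if set_num > max_set_num:
            return
        yield set_num, grna_set
        set_num += 1


candidates = [["g1"], ["g2"], ["g3"], ["g4"]]
out = list(numbered_sets(candidates, set_num=2, max_set_num=4))
```

Starting at set 2 with a maximum of 4 yields sets numbered 2, 3 and 4, i.e. 4 - 2 + 1 = 3 sets.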

property max_coverage: int[source]
property min_coverage: int[source]
remove_empty() → None[source]

Remove CollapsedgRNA from self if the CollapsedgRNA object is empty (i.e. no gRNA in it)

remove_grna(*gRNAs) → None[source]

Remove gRNA from any CollapsedgRNA in self.

Parameters

gRNAs (gRNA) – gRNA object(s) to remove

write(fout) → None[source]

Write collapsed gRNAs to file. Fields are:

  • coverage group: unique ID given to each collapsed gRNA group

  • coverage: number of targets covered by coverage group

  • gRNA id: gRNA ID

  • gRNA sequence: gRNA sequence

  • relative pos: relative position to 5’ end, only valid for comparison within a coverage group

Parameters

fout (str) – path to output file
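To make the field list concrete, the sketch below writes the five fields as tab-separated rows. The delimiter, the header row, and the example records are all assumptions (the data is made up); the actual layout produced by write() is not specified here. The sketch writes to an in-memory buffer rather than a path.

```python
import csv
import io

FIELDS = ["coverage group", "coverage", "gRNA id", "gRNA sequence", "relative pos"]

# Made-up example data: group ID -> (coverage, [(gRNA id, sequence, relative pos)])
groups = {
    1: (3, [("gRNA_001", "ACGTACGTACGTACGTACGT", 0.12),
            ("gRNA_002", "TTGCATTGCATTGCATTGCA", 0.40)]),
    2: (1, [("gRNA_003", "GGATCCGGATCCGGATCCGG", 0.05)]),
}


def write_groups(handle, groups):
    """Write one row per gRNA, repeating its group ID and coverage."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for group_id, (coverage, grnas) in groups.items():
        for grna_id, seq, rel_pos in grnas:
            writer.writerow([group_id, coverage, grna_id, seq, rel_pos])


buffer = io.StringIO()
write_groups(buffer, groups)
lines = buffer.getvalue().splitlines()
```

Each gRNA gets its own row, so a coverage group with several gRNA appears on several consecutive rows.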

minorg.minimum_set.all_best_nr(potential_coverage, all_coverage, covered)[source]

Get all gRNA with equivalent non-redundancy.

This function prioritises gRNA whose targets overlap least with already covered targets.

Parameters
  • potential_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to targets NOT already covered; AND where ONLY as yet unchosen gRNASeq obj are included

  • all_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to all targets REGARDLESS of whether they’ve already been covered; AND where all gRNA’s gRNASeq obj are included REGARDLESS of whether they’ve already been chosen

  • covered (set) – set of IDs of targets already covered

Returns

dictionary of {‘<gRNA seq (str)>’: [<list of gRNAHit obj>]} subset of ‘potential_coverage’ with equivalent non-redundancy

Return type

dict
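A minimal sketch of this tie-breaking rule, assuming target ID strings in place of gRNAHit objects. The function name and the exact overlap measure (number of a gRNA's targets that are already covered) are assumptions, not the library's implementation.

```python
def all_least_redundant(potential_coverage, all_coverage, covered):
    """Return the candidates whose full target list overlaps `covered` least."""
    if not potential_coverage:
        return {}
    overlap = {
        seq: len(set(all_coverage[seq]) & covered)
        for seq in potential_coverage
    }
    best = min(overlap.values())
    return {
        seq: hits
        for seq, hits in potential_coverage.items()
        if overlap[seq] == best
    }


covered = {"t1", "t2"}
all_coverage = {
    "AAA": ["t1", "t3"],
    "CCC": ["t1", "t2", "t4"],
    "GGG": ["t2", "t3"],
    "TTT": ["t1", "t2"],  # already chosen, so absent from potential_coverage
}
potential_coverage = {"AAA": ["t3"], "CCC": ["t4"], "GGG": ["t3"]}
best = all_least_redundant(potential_coverage, all_coverage, covered)
```

AAA and GGG each hit one already covered target, CCC hits two, so only AAA and GGG are returned.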

minorg.minimum_set.all_best_pos(potential_coverage, all_coverage, covered)[source]

Get all gRNA with equivalent closeness to 5’.

This function prioritises gRNA closest to 5’ end.

Parameters
  • potential_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to targets NOT already covered; AND where ONLY as yet unchosen gRNASeq obj are included

  • all_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to all targets REGARDLESS of whether they’ve already been covered; AND where all gRNA’s gRNASeq obj are included REGARDLESS of whether they’ve already been chosen

  • covered (set) – set of IDs of targets already covered

Returns

dictionary of {‘<gRNA seq (str)>’: [<list of gRNAHit obj>]} subset of ‘potential_coverage’ with equivalent closeness to 5’

Return type

dict

class minorg.minimum_set.gRNA(name, gRNASeq_obj)[source]

Bases: minorg.minweight_sc.Set

__init__(name, gRNASeq_obj)[source]

Effectively a set object that tracks name and weight.

Parameters
  • name (str) – set name

  • weight (float) – set weight

  • elements (iterable) – elements in set

closer_to_5prime(other) → bool[source]

Takes another gRNA object and returns True if self is closer to the 5’ end (of sense strand, if sense information exists), determined by comparing relative positions between shared targets. Returns True on a tie.

Returns

bool

Raises

Exception – If no common targets between the two gRNA objects (cannot compare across different targets)

common_targets(other)[source]
property coverage: int[source]
property hits: list[source]
property id: str[source]
property relative_5prime_pos: float[source]

Returns a value calculated from all hits that is ONLY VALID FOR COMPARISON BETWEEN gRNA with the SAME COVERAGE. The smaller the value, the closer to the 5’ end.

relative_distance_to_5prime(other) → float[source]

Takes another gRNA object and returns self._relative_5prime_pos(<hits in shared targets>) - other._relative_5prime_pos(<hits in shared targets>). If self is closer to the 5’ end, the value returned will be negative.

Returns

float

Raises

Exception – If no common targets between the two gRNA objects (cannot compare across different targets)
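The comparison over shared targets can be sketched as follows. The sketch assumes one position per target and uses the mean over shared targets as the summary value; how minorg actually computes the relative 5’ position is not stated here, so treat both the helper names and the mean as assumptions.

```python
from statistics import mean


def relative_distance(self_pos, other_pos):
    """Difference in mean position over shared targets (negative: self is closer to 5').

    self_pos / other_pos: dict mapping target ID -> position (hypothetical layout).
    """
    shared = set(self_pos) & set(other_pos)
    if not shared:
        raise ValueError("no common targets between the two gRNA")
    return mean(self_pos[t] for t in shared) - mean(other_pos[t] for t in shared)


def closer_to_5prime(self_pos, other_pos):
    """True if self is at least as close to 5' as other (ties return True)."""
    return relative_distance(self_pos, other_pos) <= 0


a = {"t1": 10, "t2": 20}
b = {"t2": 50, "t3": 5}
distance = relative_distance(a, b)  # only t2 is shared
a_is_closer = closer_to_5prime(a, b)
```

Only hits in shared targets enter the comparison, which is why two gRNA without a common target cannot be compared.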

property seq: str[source]
subset_hits_by_target(target_ids) → list[source]

Returns list of gRNAHit objects where the hit is to a target with the same name as at least one entry in ‘target_ids’.

property targets: list[source]
minorg.minimum_set.limited_minweight_SC(collapsed_grnas, num_sets, targets=None, num_lengths_to_track=None, low_coverage_penalty=0)[source]

Executes enum_approx_order_SC() for a capped number of iterations (max(20, 2*num_sets)), seeding each run with a different CollapsedgRNA in order from highest to lowest coverage. A CollapsedgRNA is removed from the candidate list once all CollapsedgRNA with equivalent coverage have been used as seed. The search stops when the next CollapsedgRNA to be seeded has a coverage of less than <total targets>/<num CollapsedgRNA in <num_lengths_to_track>th smallest set cover solution>.

Parameters
  • collapsed_grnas (CollapsedgRNA) – CollapsedgRNA object

  • num_sets (int) – desired number of sets. Used to inform maximum number of iterations.

  • targets (list or set or tuple) – IDs (str) of targets to be covered

  • num_lengths_to_track (int) – length of <num_lengths_to_track>th smallest set cover solution will be used to determine whether to terminate search (see function description)

  • low_coverage_penalty (float) – multiplier for value calculated by minweight_sc(), which will then be multiplied by <number of remaining targets that are not covered by gRNA> and then added to the output of that function. Effectively penalises large sets of many small coverage gRNA. This might make the set less redundant, but will likely reduce set size.

Returns

set cover solutions (CollapsedgRNA)

Return type

list
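The two numeric rules in the description (the iteration cap and the stopping threshold) can be written out as small helpers. The helper names are hypothetical and the sketch covers only the arithmetic, not the search itself.

```python
def iteration_cap(num_sets):
    """Maximum number of seeded runs: max(20, 2*num_sets)."""
    return max(20, 2 * num_sets)


def should_stop(next_seed_coverage, total_targets, tracked_solution_size):
    """True when the next seed covers fewer than total_targets / solution size.

    tracked_solution_size: number of CollapsedgRNA in the
    <num_lengths_to_track>th smallest set cover solution found so far.
    """
    return next_seed_coverage < total_targets / tracked_solution_size


cap_small = iteration_cap(3)
cap_large = iteration_cap(15)
stop = should_stop(next_seed_coverage=2, total_targets=12, tracked_solution_size=4)
keep_going = should_stop(next_seed_coverage=3, total_targets=12, tracked_solution_size=4)
```

With 12 targets and a tracked solution of 4 CollapsedgRNA, the threshold is 3, so a seed covering 2 targets ends the search while a seed covering 3 does not.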

minorg.minimum_set.limited_optimal_SC(U, S, size=1, redundancy=1)[source]

Attempts to find set cover solutions by brute force with a capped maximum set size and redundancy.

Parameters
  • U (set) – set of elements (targets) to cover

  • S (SetOfSets) – SetOfSets (or child class) object containing sets (gRNA coverage) for set cover

  • size (int) – maximum set cover solution size for optimal search

  • redundancy (float) – maximum allowable redundancy as fraction of total number of elements to be covered (U)

Returns

Of SetOfSets; set cover solutions

Return type

list
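A brute-force search with a size cap and a redundancy cap can be sketched with itertools.combinations. The sketch assumes S is a plain dict of name to set, that U is non-empty, and that redundancy is measured as (sum of covered-set sizes - |U|) / |U|; the library's own definition of redundancy may differ.

```python
from itertools import combinations


def brute_force_covers(U, S, size=1, redundancy=1):
    """Return every combination of up to `size` sets that covers U
    within the allowed redundancy (assumed definition, see lead-in)."""
    U = set(U)
    solutions = []
    for k in range(1, size + 1):
        for combo in combinations(sorted(S), k):
            members = [S[name] & U for name in combo]
            if set().union(*members) != U:
                continue  # does not cover every element
            extra = sum(len(m) for m in members) - len(U)
            if extra / len(U) <= redundancy:
                solutions.append(combo)
    return solutions


U = {"t1", "t2", "t3", "t4"}
S = {
    "gA": {"t1", "t2"},
    "gB": {"t3", "t4"},
    "gC": {"t2", "t3", "t4"},
    "gD": {"t1", "t2", "t3", "t4"},
}
covers = brute_force_covers(U, S, size=2, redundancy=0.25)
```

The number of combinations grows quickly with size, which is why the search is capped; this sketch also keeps covers that are not minimal as long as they respect both caps.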

minorg.minimum_set.make_get_minimum_set(gRNA_hits, manual_check=True, exclude_seqs={}, targets=None, prioritise_nr=False, sc_algorithm='LAR', num_sets=1, tie_breaker=None, low_coverage_penalty=0.5, suppress_warning=False, impossible_set_message='If you used the full programme, consider adjusting --minlen, --minid, and --merge-within, and/or raising --check-reciprocal to restrict candidate target sequences to true biological replicates. Also consider using --domain to restrict the search.')[source]

Make function to generate minimum set of gRNA.

Parameters
  • gRNA_hits (list) – gRNAHit objects

  • manual_check (bool) – manually approve each gRNA set

  • exclude_seqs (set/list) – optional, gRNA sequences (str) to exclude

  • targets (list) – optional, target IDs (str)

  • prioritise_nr (bool) – prioritise non-redundancy. If used, ‘sc_algorithm’ will be ignored. (default=False)

  • sc_algorithm (str) – set cover algorithm when not prioritising non-redundancy. Only used if prioritise_nr=False. (default=’LAR’)

  • num_sets (int) – number of sets

  • tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA

  • impossible_set_message (str) – message to print when gRNA cannot cover all targets

  • suppress_warning (bool) – suppress printing of warning when gRNA cannot cover all targets

Returns

function that takes no arguments and returns list of minimum set of gRNA sequences (str)

Return type

func

minorg.minimum_set.make_set_cover_nr(gRNA_hits, num_sets=1, target_ids=[], low_coverage_penalty=0, num_lengths_to_track=None, prioritise_3prime=False, optimal_depth=5, suppress_warning=False)[source]

Create function that generates mutually exclusive gRNA sets with non-redundancy as a priority.

Parameters
  • gRNA_hits (gRNAHits) – gRNAHits object

  • num_sets (int) – number of mutually exclusive gRNA sets to return (default=1)

  • target_ids (list) – list of target names/IDs (str) to be covered. If not provided, will be inferred from the set of target IDs covered by gRNA hits in ‘gRNA_hits’

  • manual (bool) – manually approve each gRNA set for inclusion through interactive terminal (default=False)

  • low_coverage_penalty (float) – multiplier for value calculated by minweight_sc(), which will then be multiplied by <number of remaining targets that are not covered by gRNA> and then added to the output of that function. Effectively penalises large sets of many small coverage gRNA. This might make the set less redundant, but will likely reduce set size.

  • prioritise_3prime (bool) – tie-break with proximity to 3’ end instead of 5’ (default=False)

Returns

function that returns list (gRNA panel) of str (gRNA names)

Return type

func
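The factory pattern described above (a function that returns a function, with successive calls giving mutually exclusive sets) can be sketched with a closure. Everything here is hypothetical: the names, the dict-of-sets input, the simple greedy solver standing in for the non-redundancy-first search, and the choice to return an empty list when the remaining gRNA cannot cover all targets.

```python
def make_exclusive_set_generator(coverage, targets):
    """coverage: dict of gRNA name -> set of target IDs it covers."""
    targets = set(targets)
    used = set()  # names handed out by earlier calls

    def next_set():
        available = {n: t & targets for n, t in coverage.items() if n not in used}
        chosen, uncovered = [], set(targets)
        while uncovered:
            # Pick the unused gRNA covering the most still-uncovered targets.
            best = max(sorted(available),
                       key=lambda n: len(available[n] & uncovered),
                       default=None)
            if best is None or not available[best] & uncovered:
                return []  # assumption: signal failure with an empty list
            chosen.append(best)
            uncovered -= available.pop(best)
        used.update(chosen)
        return chosen

    return next_set


coverage = {"g1": {"t1", "t2"}, "g2": {"t3"}, "g3": {"t1", "t2", "t3"}, "g4": {"t1"}}
get_set = make_exclusive_set_generator(coverage, ["t1", "t2", "t3"])
first, second, third = get_set(), get_set(), get_set()
```

The closure keeps track of gRNA already handed out, so no gRNA appears in more than one returned set.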

minorg.minimum_set.make_set_cover_pos(gRNA_hits, num_sets=1, target_ids=[], algorithm='LAR', id_key=<function <lambda>>, tie_breaker=<function tie_break_first>, suppress_warning=False)[source]

Create function that generates mutually exclusive minimum gRNA sets with position as a priority.

Parameters
  • gRNA_hits (list) – gRNAHit objects

  • target_ids (list) – list of target names/IDs (str) to be covered. If not provided, will be inferred from the set of target IDs covered by gRNA hits in ‘gRNA_hits’

  • algorithm (str) – set cover algorithm

  • exclude_seqs (set/list) – gRNA sequences (str) to exclude

  • id_key (func) – function to extract target ID from gRNAHit obj

  • tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA

  • suppress_warning (bool) – suppress printing of warning when gRNA cannot cover all targets

Returns

minimum set of gRNA (gRNA)

Return type

list

minorg.minimum_set.manual_check_prompt(grnas, set_num=None)[source]

Prints prompt for manual check.

Parameters

grnas (list) – list of gRNASeq or gRNA objects. Should already be sorted in printing order.

Returns

user input

Return type

str

minorg.minimum_set.set_cover_LAR(gRNA_coverage, target_ids, id_key=<function <lambda>>, tie_breaker=<function tie_break_first>)[source]

Set cover algorithm LAR.

Algorithm described in: Yang, Q., Nofsinger, A., Mcpeek, J., Phinney, J. and Knuesel, R. (2015). A Complete Solution to the Set Covering Problem. In International Conference on Scientific Computing (CSC) pp. 36–41

Parameters
  • gRNA_coverage (dict) – {‘<gRNA seq>’: [<list of gRNAHit obj associated w/ that gRNA seq>]}

  • target_ids (list) – IDs of targets to cover

  • id_key (func) – function to extract target ID from gRNAHit obj

  • tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA

Returns

minimum set of gRNA sequences (str)

Return type

set

minorg.minimum_set.set_cover_greedy(gRNA_coverage, target_ids, id_key=<function <lambda>>, tie_breaker=<function tie_break_first>)[source]

Greedy set cover algorithm.

Parameters
  • gRNA_coverage (dict) – {‘<gRNA seq>’: [<list of gRNAHit obj associated w/ that gRNA seq>]}

  • target_ids (list) – IDs of targets to cover

  • id_key (func) – function to extract target ID from gRNAHit obj

  • tie_breaker (func) – tie-breaker function. Takes (1) ‘gRNA_coverage’ filtered for unselected gRNA seq, (2) unmodified ‘gRNA_coverage’, (3) list of IDs of targets covered by already selected gRNA

Returns

minimum set of gRNA sequences (str)

Return type

set
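A minimal sketch of greedy set cover with a pluggable tie-breaker, using target ID strings in place of gRNAHit objects. The function names are hypothetical, and the tie-breaker simply mirrors the three-argument interface described above; this is not the library's code.

```python
def first_tie_breaker(potential_coverage, all_coverage, covered):
    """Hypothetical tie-breaker: return the first candidate and its hits."""
    seq = next(iter(potential_coverage))
    return seq, potential_coverage[seq]


def greedy_set_cover(coverage, target_ids, tie_breaker=first_tie_breaker):
    """coverage: dict of gRNA seq -> list of target IDs it hits."""
    uncovered = set(target_ids)
    covered = set()
    chosen = set()
    while uncovered:
        # Hits to still-uncovered targets, for unchosen sequences only.
        potential = {
            seq: [t for t in hits if t in uncovered]
            for seq, hits in coverage.items()
            if seq not in chosen
        }
        potential = {seq: hits for seq, hits in potential.items() if hits}
        if not potential:
            break  # the remaining targets cannot be covered
        most = max(len(hits) for hits in potential.values())
        tied = {seq: hits for seq, hits in potential.items() if len(hits) == most}
        seq, hits = tie_breaker(tied, coverage, covered)
        chosen.add(seq)
        covered.update(hits)
        uncovered.difference_update(hits)
    return chosen


coverage = {
    "AAA": ["t1", "t2"],
    "CCC": ["t2", "t3"],
    "GGG": ["t3", "t4"],
    "TTT": ["t4"],
}
solution = greedy_set_cover(coverage, ["t1", "t2", "t3", "t4"])
```

Greedy selection gives an approximate answer: it always takes the candidate covering the most uncovered targets, which does not guarantee the smallest possible set in general.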

minorg.minimum_set.tie_break_first(cov, all_cov, coverage)[source]

Arbitrarily returns the first gRNA sequence and its associated gRNAHit objects.

Parameters
  • potential_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to targets NOT already covered; AND where ONLY as yet unchosen gRNASeq obj are included

  • all_coverage (dict) – dictionary of {‘<gRNA seq>’: [<list of gRNAHit obj>]} where gRNAHits in <list of gRNAHit obj> only contain hits to all targets REGARDLESS of whether they’ve already been covered; AND where all gRNA’s gRNASeq obj are included REGARDLESS of whether they’ve already been chosen

  • covered (set) – set of IDs of targets already covered

Returns

  • str – Of gRNA sequence

  • list – Of gRNAHit objects (for as yet uncovered targets) associated with the above gRNA sequence