minorg.grna module

class minorg.grna.CheckObj(*check_names)[source]

Bases: object

Object with checks that can be set.

_checks[source]

stores check values (format: {‘<check name>’: <check value (True, False, or None)>})

Type

dict

__init__(*check_names)[source]
all_checks_passed() bool[source]

Whether self passes all checks.

Return True only if the values of all checks are True.

Returns

Return type

bool

all_valid_checks_passed() bool[source]

Whether self passes all valid checks.

Return True if the values of all checks are True OR None BUT NOT False.

Returns

Return type

bool
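
The difference between all_checks_passed() and all_valid_checks_passed() comes down to how unset (None) values are treated. Below is a minimal sketch of that rule on a plain dict shaped like _checks. The function names strict_pass and lenient_pass are hypothetical and are not part of minorg.

```python
def strict_pass(checks):
    """True only if every check value is True (None counts as not passed)."""
    return all(value is True for value in checks.values())


def lenient_pass(checks):
    """True if no check value is False (None is tolerated)."""
    return all(value is not False for value in checks.values())


# Made-up check values in the documented {name: True/False/None} format.
checks = {"background": True, "GC": None, "feature": True}
print(strict_pass(checks))   # False: "GC" is unset
print(lenient_pass(checks))  # True: nothing has failed
```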

check(check_name, mode='bool') Optional[Union[str, bool]][source]

Get check status.

Parameters
  • check_name (str) – check name

  • mode (str) – return type (valid values: “bool”, “str”)

Returns

  • str – If mode = 'str'. If check status is True, returns ‘pass’. If check status is False, returns ‘fail’. If check status is None, returns ‘NA’.

  • bool – If mode = 'bool' AND check status is not None

  • None – If mode = 'bool' AND check status is None
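
The return values listed above can be sketched as a small standalone function. This is an illustration of the documented mapping only, not the library's code; the name check_status is hypothetical.

```python
def check_status(value, mode="bool"):
    """Map a stored check value (True, False or None) to the documented output."""
    if mode == "str":
        if value is True:
            return "pass"
        if value is False:
            return "fail"
        return "NA"
    # mode="bool": the stored value is returned unchanged (True, False or None)
    return value


print(check_status(True, mode="str"))   # pass
print(check_status(None, mode="str"))   # NA
print(check_status(None, mode="bool"))  # None
```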

check_exists(check_name) bool[source]

Whether a given check exists.

Parameters

check_name (str) – check name

Returns

Whether check_name is a valid check name

Return type

bool

property check_names: list[source]

Get all check names

Returns

Of check names

Return type

list of str

clear_checks() None[source]

Set values of all checks to None.

set_check(check_name, status) None[source]

Set check value.

Parameters
  • check_name (str) – check name

  • status (str or bool or None) – check status

Check value will be set to:
  • True: If status = “pass” OR status = True

  • False: If status = “fail” OR status = False

  • None: If status is any other value
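
A minimal sketch of this normalisation, assuming exact matches on the documented values. The function name normalise_status is hypothetical, and how minorg treats near-misses such as 'PASS' or 1 is not stated here; this sketch maps them to None.

```python
def normalise_status(status):
    """Convert a user-supplied status to the stored True/False/None value."""
    if status is True or status == "pass":
        return True
    if status is False or status == "fail":
        return False
    # Anything else (including "NA" and None) is stored as unset.
    return None


print(normalise_status("pass"))  # True
print(normalise_status("fail"))  # False
print(normalise_status("NA"))    # None
```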

some_checks_passed(*check_names) bool[source]

Whether self passes all of the checks in a specified subset of checks.

Parameters

*check_names (str) – check names

Returns

Whether self passed all of the specified checks

Return type

bool

some_valid_checks_passed(*check_names)[source]

Whether self passes all of the valid checks in some combination of checks.

Only returns True if none of the values of the specified checks is False.

Parameters

*check_names (str) – check names

Returns

Whether self’s values for all specified checks are True OR None BUT NOT False

Return type

bool

class minorg.grna.Target(seq, id=None, strand=None)[source]

Bases: minorg.grna.CheckObj

gRNA target sequence.

_seq[source]

sequence, in uppercase

Type

str

_id[source]

sequence name

Type

str

_strand[source]

strand (possible values: None, ‘+’, ‘-’)

Type

str or None

_sense[source]

sense (possible values: None, ‘+’, ‘-’; where ‘+’ means sense and ‘-’ means antisense)

Type

str or None

__init__(seq, id=None, strand=None)[source]

Create a Target object.

Parameters
  • seq (str or Bio.Seq.Seq) – sequence

  • id (str) – sequence name

  • strand (str) – sequence strand (values: ‘+’, ‘-’)

property id: str[source]
parent_sense(mode='raw') Optional[str][source]

Return parent sequence’s sense status.

Parameters

mode (str) – return string format (valid values: “raw”, “str”)

Returns

  • str – If mode='str' (‘sense’, ‘antisense’, ‘NA’) OR parent sense has been set (‘+’, ‘-’)

  • None – If mode='raw' AND parent sense has not been set
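
The two modes can be illustrated with a standalone function. This is a sketch of the documented return values only; parent_sense_label is a hypothetical name, and the stored value is assumed to be ‘+’, ‘-’ or None.

```python
def parent_sense_label(sense, mode="raw"):
    """Return the raw stored sense, or a readable label when mode='str'."""
    if mode == "str":
        return {"+": "sense", "-": "antisense"}.get(sense, "NA")
    # mode="raw": '+', '-' or None, exactly as stored
    return sense


print(parent_sense_label("+", mode="str"))   # sense
print(parent_sense_label(None, mode="str"))  # NA
print(parent_sense_label("-"))               # -
```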

property sense: str[source]
property seq: str[source]
set_sense(sense) None[source]

Set sense.

Parameters

sense (str) – sense of sequence. Valid values: ‘+’, ‘sense’, ‘-’, ‘antisense’

set_sense_by_parent(parent_sense) None[source]

Set sense using parent sequence’s sense AND self’s strand.

Self is sense if either:
  • parent_sense='+' AND self’s strand is ‘+’

  • parent_sense='-' AND self’s strand is ‘-’

Otherwise, self is antisense.

Parameters

parent_sense (str) – sense of parent sequence. Valid values: ‘+’, ‘sense’, ‘-’, ‘antisense’
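
The rule above amounts to comparing the parent's sense with the target's strand. A minimal sketch follows, assuming strand is always ‘+’ or ‘-’ (behaviour for an unset strand is not described here). The function name sense_from_parent is hypothetical.

```python
def sense_from_parent(parent_sense, strand):
    """Return '+' (sense) if parent sense and strand agree, else '-' (antisense)."""
    # Accept the word forms as aliases for the symbols.
    aliases = {"sense": "+", "antisense": "-"}
    parent = aliases.get(parent_sense, parent_sense)
    return "+" if parent == strand else "-"


print(sense_from_parent("+", "+"))          # +
print(sense_from_parent("antisense", "-"))  # +
print(sense_from_parent("sense", "-"))      # -
```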

property strand: str[source]
valid_len() bool[source]

Whether length of sequence is equal to or greater than 1.

class minorg.grna.gRNAHit(target, start, end, strand, hit_id)[source]

Bases: minorg.grna.CheckObj

gRNA hit

_target[source]

Target object of sequence that gRNA targets

Type

Target

_range[source]

position of gRNA in target

Type

tuple

_strand[source]

strand of gRNA in target

Type

str

_hit_id[source]

unique hit identifier

__init__(target, start, end, strand, hit_id)[source]

Create a gRNAHit object.

Parameters
  • target (Target) – Target object of sequence that gRNA targets

  • start (int) – start position of gRNA in target (relative to parent sequence of target)

  • end (int) – end position of gRNA in target (relative to parent sequence of target)

  • strand (str) – strand of gRNA in target

  • hit_id – unique hit identifier

adj_range(mode='strand') tuple[source]

Return range based on strand, target, or gene.

  • mode='strand': return range relative to plus strand. Uses self.strand.

  • mode='target': return range relative to target sequence. Uses self.strand == self.target_strand.

  • mode='gene': return range relative to sense strand. Uses self.target.sense == self.strand.

Parameters

mode (str) – reference for range. Valid values: ‘strand’, ‘target’, ‘gene’

Returns

  • tuple – Of (start, end)

  • tuple – Of (float(‘nan’), float(‘nan’)) if the reference cannot be determined.

property end: int[source]
flank(length=100)[source]
property hit_id[source]
parent_sense(mode='raw')[source]
property range: tuple[source]
property reverse_range: tuple[source]
set_bg_check(status)[source]
set_exclude_check(status)[source]
set_feature_check(status)[source]
set_gc_check(status)[source]
set_parent_sense(strand) None[source]
property start: int[source]
property strand: str[source]
property target: minorg.grna.Target[source]
property target_id: str[source]
property target_len: int[source]
property target_strand: str[source]
class minorg.grna.gRNAHits(d=None, gRNA_seqs=None, gRNA_hits=None)[source]

Bases: object

Tracks multiple gRNASeq and gRNAHit objects.

_gRNAseqs[source]

stores gRNASeq objects by sequence (format: {‘<seq>’: <gRNASeq object>})

Type

dict

_hits[source]

stores gRNAHit objects by sequence (format: {‘<seq>’: [<gRNAHit objects>]})

Type

dict

_cached_coverage[source]

stores cached coverage info for all gRNA sequences (format: {‘<seq>’: {set of target IDs}})

Type

dict

__init__(d=None, gRNA_seqs=None, gRNA_hits=None)[source]

Create a gRNAHits object.

Parameters
  • d (dict) – dictionary of gRNAHit objects in same format as _hits. If provided, overrides gRNA_seqs and gRNA_hits.

  • gRNA_seqs (dict) – dictionary of gRNASeq objects in same format as _gRNAseqs

  • gRNA_hits (dict) – dictionary of gRNAHit objects in same format as _hits

add_hit(seq, gRNA_hit)[source]

Add gRNA hit.

If entry for the gRNA’s sequence does not exist in self._gRNAseqs, call add_seq() to add it too.

Parameters
  • seq (str or Bio.Seq.Seq) – gRNA sequence

  • gRNA_hit (gRNAHit) – gRNAHit object

add_seq(seq) None[source]

Add gRNA sequence if it isn’t already in self._gRNAseqs, and create an empty entry for it in self._hits.

Parameters

seq (str or Bio.Seq.Seq) – gRNA sequence

all_target_len_valid() bool[source]

Whether all targets of gRNA have valid length (i.e. greater than 0).

Returns

Return type

bool

assign_gRNAseq_id(fasta) str[source]

(Re)name gRNA according to FASTA file.

Parameters

fasta (str) – required, path to FASTA file

assign_seqid(prefix='gRNA_', zfill=3, assign_all=True) None[source]

Assign sequence ID to gRNA sequences using format <prefix><unique gRNA number>.

Parameters
  • prefix (str) – prefix for gRNA ID (default=’gRNA_’)

  • zfill (int) – number of leading zeroes (default=3)

  • assign_all (bool) – assign new sequence IDs to all gRNA sequences regardless of whether they already have a sequence ID
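
The ID format <prefix><unique gRNA number> can be sketched as follows. Two assumptions are made that are not confirmed here: numbering starts at 1, and zfill is used as the padded width in the sense of Python's str.zfill. The function name make_seqids is hypothetical.

```python
def make_seqids(seqs, prefix="gRNA_", zfill=3):
    """Build {sequence: ID} with zero-padded running numbers (assumed to start at 1)."""
    return {
        seq: f"{prefix}{str(number).zfill(zfill)}"
        for number, seq in enumerate(seqs, start=1)
    }


# Made-up sequences for illustration.
ids = make_seqids(["ACGTACGTAC", "TTGACCGGTA"])
print(ids["ACGTACGTAC"])  # gRNA_001
print(ids["TTGACCGGTA"])  # gRNA_002
```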

property check_names: list[source]

List of check names.

Type

list of str

property check_names_hits: list[source]

List of check names for gRNA hits (gRNAHit objects).

Type

list of str

property check_names_seqs: list[source]

List of check names for gRNA sequences (gRNASeq objects).

Type

list of str

clear_checks() None[source]

Clear checks for both gRNA sequences and hits.

collapse(ids=[], seqs=[])[source]

Group gRNA by identical coverage and create SetOfCollapsedgRNA from self. If neither ids nor seqs are provided, SetOfCollapsedgRNA will be generated from all gRNA.

Parameters
  • ids (list) – list of IDs (str) of gRNA to collapse

  • seqs (list) – list of sequences (str) of gRNA to collapse, overrides ids

Returns

SetOfCollapsedgRNA
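
Grouping by identical coverage can be pictured on a dict in the same format as _cached_coverage ({‘<seq>’: {set of target IDs}}). The sketch below only illustrates the grouping idea; it does not build a SetOfCollapsedgRNA, and group_by_coverage is a hypothetical name.

```python
from collections import defaultdict


def group_by_coverage(coverage):
    """Group gRNA sequences whose sets of covered target IDs are identical."""
    groups = defaultdict(list)
    for seq, target_ids in coverage.items():
        # frozenset is hashable, so equal sets map to the same group key
        groups[frozenset(target_ids)].append(seq)
    return dict(groups)


# Made-up coverage data.
coverage = {"AAA": {"t1", "t2"}, "CCC": {"t2", "t1"}, "GGG": {"t1"}}
groups = group_by_coverage(coverage)
print(groups[frozenset({"t1", "t2"})])  # ['AAA', 'CCC']
print(groups[frozenset({"t1"})])        # ['GGG']
```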

copy() minorg.grna.gRNAHits[source]

Deepcopy self to new gRNAHits object.

Returns

Return type

gRNAHits

filter_hits(*check_names, exclude_empty_seqs=True, accept_invalid=False, accept_invalid_field=True, all_checks=False, quiet=False, report_invalid_field=False)[source]

Filter gRNA hits by checks and return new gRNAHits object.

Parameters
  • *check_names (str) – check name(s) (not required if all_checks=True)

  • exclude_empty_seqs (bool) – exclude gRNA sequences from new gRNAHits object if none of their hits pass the filter(s)

  • accept_invalid (bool) – score unset checks as pass

  • accept_invalid_field (bool) – score unset checks as pass if a given check is not set for ALL hits

  • all_checks (bool) – filter using all checks

  • quiet (bool) – print only essential messages

Returns

Return type

gRNAHits
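
The interaction of accept_invalid and exclude_empty_seqs can be sketched on plain data. Here each hit is represented as a dict of check values rather than a gRNAHit object, accept_invalid_field is left out, and the name filter_hits_sketch is hypothetical, so treat this as an approximation of the documented behaviour rather than the library's logic.

```python
def filter_hits_sketch(hits, *check_names, exclude_empty_seqs=True,
                       accept_invalid=False):
    """Keep hits that pass the named checks; optionally drop sequences left with no hits."""

    def passes(hit_checks):
        for name in check_names:
            value = hit_checks.get(name)
            # False always fails; None (unset) fails unless accept_invalid is True
            if value is False or (value is None and not accept_invalid):
                return False
        return True

    filtered = {seq: [hit for hit in seq_hits if passes(hit)]
                for seq, seq_hits in hits.items()}
    if exclude_empty_seqs:
        filtered = {seq: kept for seq, kept in filtered.items() if kept}
    return filtered


# Made-up hits in the {'<seq>': [<hit>, ...]} layout of _hits.
hits = {"AAA": [{"GC": True}, {"GC": False}], "CCC": [{"GC": None}]}
print(filter_hits_sketch(hits, "GC"))
print(filter_hits_sketch(hits, "GC", accept_invalid=True))
```

With the default accept_invalid=False, the sequence whose only hit has an unset check is dropped; with accept_invalid=True it is kept.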

filter_hits_all_checks_passed(**kwargs)[source]

Wrapper for filter_hits(). Filter hits by ALL checks.

Parameters

**kwargs – other arguments passed to filter_hits()

filter_hits_some_checks_passed(*check_names, **kwargs)[source]

Wrapper for filter_hits(). Filter hits by some checks.

Parameters
  • *check_names (str) – required, check_names

  • **kwargs – other arguments passed to filter_hits()

filter_seqs(*check_names, accept_invalid=True, accept_invalid_field=True, all_checks=False, quiet=False, report_invalid_field=False)[source]

Filter gRNA sequences by checks and return new gRNAHits object.

Parameters
  • *check_names (str) – check name(s) (not required if all_checks=True)

  • accept_invalid (bool) – score unset checks as pass

  • accept_invalid_field (bool) – score unset checks as pass if a given check is not set for ALL hits

  • all_checks (bool) – filter using all checks

  • quiet (bool) – print only essential messages

Returns

Return type

gRNAHits

filter_seqs_all_checks_passed(**kwargs)[source]

Wrapper for filter_seqs(). Filter gRNA sequences by ALL checks.

Parameters

**kwargs – other arguments passed to filter_seqs()

filter_seqs_some_checks_passed(*check_names, **kwargs)[source]

Wrapper for filter_seqs(). Filter gRNA sequences by some checks.

Parameters
  • *check_names (str) – required, check_names

  • **kwargs – other arguments passed to filter_seqs()

flatten_gRNAseqs() list[source]

Get list of gRNASeq objects.

flatten_hits() list[source]

Get list of gRNAHit objects.

property gRNAseqs: dict[source]
get_gRNAseq_by_id(id) Optional[minorg.grna.gRNASeq][source]

Get gRNASeq object with given sequence ID.

Parameters

id (str) – gRNA sequence ID

Returns

  • gRNASeq – If exists

  • None – If doesn’t exist

get_gRNAseq_by_seq(seq) Optional[minorg.grna.gRNASeq][source]

Get gRNASeq object with given sequence.

Parameters

seq (str or Bio.Seq.Seq) – gRNA sequence

Returns

  • gRNASeq – If exists

  • None – If doesn’t exist

get_gRNAseqs_by_id(*ids) list[source]

Get multiple gRNASeq objects by sequence ID. Sequence IDs without associated gRNASeq objects will be skipped.

Parameters

*ids (str) – gRNA sequence ID(s)

Returns

Of gRNASeq objects

Return type

list

get_gRNAseqs_by_seq(*seqs) list[source]

Get multiple gRNASeq objects by sequence. Sequences without associated gRNASeq objects will be skipped.

Parameters

*seqs (str or Bio.Seq.Seq) – gRNA sequence(s)

Returns

Of gRNASeq objects

Return type

list

get_hits(seq) list[source]

Get hits of gRNA with given sequence.

Parameters

seq (str or Bio.Seq.Seq) – gRNA sequence

Returns

Of gRNAHit objects

Return type

list

hit_check_exists(check_name)[source]

Whether a given check exists for all gRNA hits.

Parameters

check_name (str) – check name

Returns

Return type

bool

property hits: dict[source]
parse_from_dict(d) None[source]

Read data from dictionary of gRNAHit objects and deep copy it to self.

Parameters

d (dict) – dictionary of gRNAHit objects in same format as _hits.

parse_from_mapping(fname, targets=None) None[source]

Read gRNA data from MINORg .map file.

Parameters
  • fname (str) – required, path to file

  • targets (str) – optional, path to file containing target sequences. Used to get target sequence length for tie breaking by favouring gRNA hits closer to 5’ end.

remove_seqs(*seqs) None[source]

Remove gRNA sequence and associated hits.

Parameters

*seqs (str or Bio.Seq.Seq) – gRNA sequence to remove

rename_seqs(fasta) None[source]

Rename gRNA according to FASTA file. Functionally identical to assign_gRNAseq_id() except it doesn’t check whether the FASTA file covers all gRNA sequences.

Parameters

fasta (str) – required, path to FASTA file

seq_check_exists(check_name)[source]

Whether a given check exists for all gRNA sequences.

Parameters

check_name (str) – check name

Returns

Return type

bool

property seqs: list[source]

List of gRNA sequences.

Type

list of str

set_all_seqs_check_by_function(check_name, func) None[source]

Set a given check for all gRNA sequences using a function that accepts str of gRNA sequence.

Parameters
  • check_name (str) – check name

  • func (func) – function that accepts str of gRNA sequence

set_seqs_check(check_name, status, seqs) None[source]

Set a given check to a given value for multiple gRNA sequences.

Parameters
  • check_name (str) – check name

  • status (str or bool or None) – check status

  • seqs (list of str) – list of gRNA sequences for which to set the check

set_seqs_check_by_function(check_name, func, seqs) None[source]

Set a given check for multiple gRNA sequences using a function that accepts str of gRNA sequence.

Parameters
  • check_name (str) – check name

  • func (func) – function that accepts str of gRNA sequence

  • seqs (list of str) – list of gRNA sequences for which to set the check

unused_hit_check(check_name) bool[source]

Whether a given check is NOT set for all gRNA hits.

Parameters

check_name (str) – check name

Returns

Return type

bool

unused_seq_check(check_name) bool[source]

Whether a given check is NOT set for all gRNA sequences.

Parameters

check_name (str) – check name

Returns

Return type

bool

update_records() None[source]

Remove gRNASeq objects from _gRNAseqs if their sequences are not also in _hits. Also remove gRNAHit objects from _hits if their sequences are not also in _gRNAseqs.
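
In effect, both dictionaries are reduced to the sequences they share. A minimal sketch on plain dicts follows. Unlike update_records(), which returns None and presumably modifies the object in place, this hypothetical sync_records function returns new dicts.

```python
def sync_records(gRNAseqs, hits):
    """Keep only the sequences present as keys in both dictionaries."""
    shared = gRNAseqs.keys() & hits.keys()
    kept_seqs = {seq: obj for seq, obj in gRNAseqs.items() if seq in shared}
    kept_hits = {seq: objs for seq, objs in hits.items() if seq in shared}
    return kept_seqs, kept_hits


# Made-up placeholder values stand in for gRNASeq and gRNAHit objects.
seqs = {"AAA": "seq_obj_1", "CCC": "seq_obj_2"}
hits = {"AAA": ["hit_obj_1"], "GGG": ["hit_obj_2"]}
print(sync_records(seqs, hits))  # ({'AAA': 'seq_obj_1'}, {'AAA': ['hit_obj_1']})
```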

valid_hit_check(check_name) bool[source]

Whether a given check has been set for at least one gRNA hit.

Parameters

check_name (str) – check name

Returns

Return type

bool

valid_seq_check(check_name) bool[source]

Whether a given check has been set for at least one gRNA sequence.

Parameters

check_name (str) – check name

Returns

Return type

bool

write_equivalents(fout, ids=[], seqs=[], write_all=False, fasta=None) None[source]

Group gRNA by equivalent coverage and write groups to file.

Parameters
  • fout (str) – required, path to output file

  • ids (list) – list of IDs (str) of gRNA to write (if fasta is provided, the ids here should match those in the fasta file)

  • seqs (list) – list of sequences (str) of gRNA to write, overrides ids

  • write_all (bool) – write all gRNA sequences, overrides seqs and ids

  • fasta (str) – optional, path to FASTA file, used for renaming gRNA

write_fasta(fout, ids=[], seqs=[], write_all=False, fasta=None) None[source]

Write gRNA sequences to FASTA file.

Parameters
  • fout (str) – required, path to output file

  • ids (list) – list of IDs (str) of gRNA to write

  • seqs (list) – list of sequences (str) of gRNA to write, overrides ids

  • write_all (bool) – write all gRNA sequences, overrides seqs and ids

  • fasta (str) – optional, path to FASTA file, used for renaming gRNA

write_mapping(fout, sets=[], write_all=False, write_checks=False, checks=['background', 'GC', 'feature'], index=1, start_incl=True, end_incl=True, version=5, fasta=None) None[source]

Write MINORg .map file for gRNA mapping.

A MINORg .map file is tab-delimited and includes a header line. In version 5, columns are:

  • gRNA id: gRNA sequence ID

  • gRNA sequence: gRNA sequence

  • target id: sequence ID of gRNA target

  • target length: length (bp) of gRNA target

  • target sense: sense of gRNA target (‘sense’ or ‘antisense’)

  • gRNA strand: strand of gRNA (relative to target) (‘+’ or ‘-’)

  • start: position of start of gRNA in target

  • end: position of end of gRNA in target

  • set: gRNA set number (set to 1 for all gRNA if write_all=True)

Any columns after ‘set’ are check statuses.

Parameters
  • fout (str) – required, path to output file

  • sets (list) – list of grouped gRNA sequences (e.g. [(‘<seq1>’, ‘<seq2>’), (‘<seq3>’,)]). If provided, only gRNA sequences included in sets will be written. All sets must be mutually exclusive. Used to assign set numbers in ‘set’ column. Overrides write_all.

  • write_all (bool) – write all gRNA sequences

  • write_checks (bool) – write check statuses

  • checks (list) – check names of checks to write (default=[‘background’, ‘GC’, ‘feature’])

  • index (int) – index of written gRNA range (default=1)

  • start_incl (bool) – whether written gRNA range is start-inclusive (default=True)

  • end_incl (bool) – whether written gRNA range is end-inclusive (default=True)

  • fasta (str) – optional, path to FASTA file, used for renaming gRNA

  • version (int) – .map file version
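
To make the layout concrete, the sketch below writes a tab-delimited header and one row using the version 5 column names listed above. It does not call write_mapping(). The exact header text MINORg writes is an assumption, the row values are made up, and format_map_rows is a hypothetical helper.

```python
import csv
import io

# Column names copied from the version 5 column list; the exact header text
# in a real .map file is an assumption.
COLUMNS = ["gRNA id", "gRNA sequence", "target id", "target length",
           "target sense", "gRNA strand", "start", "end", "set"]


def format_map_rows(rows):
    """Return tab-delimited text: one header line, then one line per row dict."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row[column] for column in COLUMNS])
    return buffer.getvalue()


# Made-up example row.
example = {
    "gRNA id": "gRNA_001",
    "gRNA sequence": "ACGTACGTACGTACGTACGT",
    "target id": "targetA",
    "target length": 1200,
    "target sense": "sense",
    "gRNA strand": "+",
    "start": 101,
    "end": 120,
    "set": 1,
}
print(format_map_rows([example]))
```

Check status columns, when written, would follow ‘set’; they are omitted here.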

class minorg.grna.gRNASeq(seq, grnahits=None)[source]

Bases: minorg.grna.CheckObj

gRNA sequence.

Tracks checks applicable to gRNA sequence: off-target (check name: background), GC content (check name: GC)

_seq[source]

sequence, in uppercase

Type

str

_id[source]

sequence name

Type

str

_grnahits[source]

parent gRNAHits object

Type

gRNAHits

__init__(seq, grnahits=None)[source]

Create a gRNASeq object.

Parameters

seq (str or Bio.Seq.Seq) – sequence

property coverage: set[source]
property grnahits: minorg.grna.gRNAHits[source]
property hits: list[source]
property id: str[source]
property seq: str[source]
set_bg_check(status) None[source]

Set status for check name ‘background’ (False if gRNA has off-target effects).

Parameters

status (str or bool or None) – status of check. Valid values: ‘pass’, ‘fail’, ‘NA’, True, False, None

set_exclude_check(status) None[source]

Set status for check name ‘exclude’ (False if user wishes to exclude gRNA with this sequence).

Parameters

status (str or bool or None) – status of check (valid values: ‘pass’, ‘fail’, ‘NA’, True, False, None)

set_gc_check(status=None, gc_min=0, gc_max=1) None[source]

Set status for check name ‘GC’ (False if gRNA GC content is not within desired range).

Parameters
  • status (str or bool or None) – status of check (valid values: ‘pass’, ‘fail’, ‘NA’, True, False, None). If status is provided, gc_min and gc_max will be ignored.

  • gc_min (float) – minimum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)

  • gc_max (float) – maximum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)
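
When status is not given, the check depends on the sequence's GC fraction. A minimal sketch of that calculation follows, assuming a non-empty sequence and bounds that are inclusive at both ends (inclusivity is not stated here). The function names gc_content and gc_in_range are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a non-empty sequence (0 = no GC, 1 = all GC)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_in_range(seq, gc_min=0, gc_max=1):
    """True if GC content lies within [gc_min, gc_max] (inclusive bounds assumed)."""
    return gc_min <= gc_content(seq) <= gc_max


print(gc_content("ATGC"))                             # 0.5
print(gc_in_range("ATGC", gc_min=0.3, gc_max=0.7))    # True
print(gc_in_range("AAAA", gc_min=0.3, gc_max=0.7))    # False
```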

property sorted_coverage: list[source]
property targets: set[source]