minorg.grna module
- class minorg.grna.CheckObj(*check_names)[source]
Bases:
object
Object with checks that can be set.
- _checks[source]
stores check values (format: {‘<check name>’: <check value (True, False, or None)>})
- Type
dict
- all_checks_passed() bool [source]
Whether self passes all checks.
Return True only if the values of all checks are True.
- Returns
- Return type
bool
- all_valid_checks_passed() bool [source]
Whether self passes all valid checks.
Return True if the values of all checks are True OR None BUT NOT False.
- Returns
- Return type
bool
- check(check_name, mode='bool') Optional[Union[str, bool]] [source]
Get check status.
- Parameters
check_name (str) – check name
mode (str) – return type (valid values: “bool”, “str”)
- Returns
str – If mode='str'. If check status is True, returns ‘pass’. If check status is False, returns ‘fail’. If check status is None, returns ‘NA’.
bool – If mode='bool' AND check status is not None
None – If mode='bool' AND check status is None
- check_exists(check_name) bool [source]
Whether a given check exists.
- Parameters
check_name (str) – check name
- Returns
Whether check_name is a valid check name
- Return type
bool
- property check_names: list[source]
Get all check names
- Returns
Of check names
- Return type
list of str
- set_check(check_name, status) None [source]
Set check value.
- Parameters
check_name (str) – check name
status (str or bool or None) – check status
- Check value will be set to:
True: If status = “pass” OR status = True
False: If status = “fail” OR status = False
None: If status is any other value
- some_checks_passed(*check_names) bool [source]
Whether self passes all of some combination of checks.
- Parameters
*check_names (str) – check names
- Returns
Whether self passed all of the specified checks
- Return type
bool
- some_valid_checks_passed(*check_names)[source]
Whether self passes all of the valid checks in some combination of checks.
Only returns True if none of the values of the specified checks is False.
- Parameters
*check_names (str) – check names
- Returns
Whether self’s values for all specified checks are True OR None BUT NOT False
- Return type
bool
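The check semantics described above (True, False, or None per check, with None treated as "not applicable") can be illustrated with a minimal stand-in. This is a sketch only: `MiniCheckObj` is a hypothetical class written for this example, not the MINORg implementation, and it assumes every check starts unset (None).

```python
class MiniCheckObj:
    """Hypothetical stand-in that mirrors the documented CheckObj behaviour."""

    def __init__(self, *check_names):
        # Assumption: every check starts unset (None).
        self._checks = {name: None for name in check_names}

    def set_check(self, check_name, status):
        # 'pass'/True -> True, 'fail'/False -> False, anything else -> None
        if status == "pass" or status is True:
            self._checks[check_name] = True
        elif status == "fail" or status is False:
            self._checks[check_name] = False
        else:
            self._checks[check_name] = None

    def check(self, check_name, mode="bool"):
        value = self._checks[check_name]
        if mode == "str":
            return {True: "pass", False: "fail", None: "NA"}[value]
        return value

    def all_checks_passed(self):
        # Strict: every check must be True.
        return all(v is True for v in self._checks.values())

    def all_valid_checks_passed(self):
        # Lenient: unset (None) checks are tolerated, False is not.
        return all(v is not False for v in self._checks.values())

    def some_checks_passed(self, *check_names):
        return all(self._checks[n] is True for n in check_names)

    def some_valid_checks_passed(self, *check_names):
        return all(self._checks[n] is not False for n in check_names)


obj = MiniCheckObj("background", "GC", "feature")
obj.set_check("background", "pass")
obj.set_check("GC", True)
obj.set_check("feature", "NA")  # any other value leaves the check unset
```

Here the strict test fails because ‘feature’ is unset, while the "valid" variants pass because no check is False.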
- class minorg.grna.Target(seq, id=None, strand=None)[source]
Bases:
minorg.grna.CheckObj
gRNA target sequence.
- _sense[source]
sense (possible values: None, ‘+’, ‘-’; where ‘+’ means sense and ‘-’ means antisense)
- Type
str or None
- __init__(seq, id=None, strand=None)[source]
Create a Target object.
- Parameters
seq (str or Bio.Seq.Seq) – sequence
id (str) – sequence name
strand (str) – sequence strand (values: ‘+’, ‘-’)
- parent_sense(mode='raw') Optional[str] [source]
Return parent sequence’s sense status.
- Parameters
mode (str) – return string format (valid values: “raw”, “str”)
- Returns
str – If mode='str' (‘sense’, ‘antisense’, ‘NA’) OR parent sense has been set (‘+’, ‘-’)
None – If mode='raw' AND parent sense has not been set
- set_sense(sense) None [source]
Set sense.
- Parameters
sense (str) – sense of sequence. Valid values: ‘+’, ‘sense’, ‘-’, ‘antisense’
- set_sense_by_parent(parent_sense) None [source]
Set sense using parent sequence’s sense AND self’s strand.
- Sense if parent_sense='+' AND self’s strand is ‘+’, OR parent_sense='-' AND self’s strand is ‘-’.
Else antisense.
- Parameters
parent_sense (str) – sense of parent sequence. Valid values: ‘+’, ‘sense’, ‘-’, ‘antisense’
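The rule for set_sense_by_parent() amounts to comparing the parent's sense with the target's own strand. A minimal sketch, assuming the result is stored as ‘+’ or ‘-’ as in `_sense`; the function name `sense_from_parent` is hypothetical and not part of MINORg:

```python
def sense_from_parent(parent_sense, strand):
    """Return '+' (sense) if the parent's sense matches the strand, else '-'."""
    # Accept both symbol and word forms, as the parameter description allows.
    normalised = {"+": "+", "sense": "+", "-": "-", "antisense": "-"}[parent_sense]
    return "+" if normalised == strand else "-"
```

For example, a target on the ‘-’ strand of an antisense parent comes out as sense (‘+’).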
- class minorg.grna.gRNAHit(target, start, end, strand, hit_id)[source]
Bases:
minorg.grna.CheckObj
gRNA hit
- __init__(target, start, end, strand, hit_id)[source]
Create a gRNAHit object.
- Parameters
target (Target) – Target object of sequence that gRNA targets
start (int) – start position of gRNA in target (relative to parent sequence of target)
end (int) – end position of gRNA in target (relative to parent sequence of target)
hit_id – unique hit identifier
- adj_range(mode='strand') tuple [source]
Return range based on strand, target, or gene.
mode='strand': Return range relative to plus strand. Uses self.strand.
mode='target': Return range relative to target sequence. Uses self.strand == self.target_strand.
mode='gene': Return range relative to sense strand. Uses self.target.sense == self.strand.
- Parameters
mode (str) – reference for range. Valid values: ‘strand’, ‘target’, ‘gene’
- Returns
tuple – Of (start, end)
tuple – Of (float(‘nan’), float(‘nan’)) if the reference cannot be determined.
- property target: minorg.grna.Target[source]
- class minorg.grna.gRNAHits(d=None, gRNA_seqs=None, gRNA_hits=None)[source]
Bases:
object
Tracks multiple gRNASeq and gRNAHit objects.
- _gRNAseqs[source]
stores gRNASeq objects by sequence (format: {‘<seq>’: <gRNASeq object>})
- Type
dict
- _hits[source]
stores gRNAHit objects by sequence (format: {‘<seq>’: [<gRNAHit objects>]})
- Type
dict
- _cached_coverage[source]
stores cached coverage info for all gRNA sequences (format: {‘<seq>’: {set of target IDs}})
- Type
dict
- add_hit(seq, gRNA_hit)[source]
Add gRNA hit.
If entry for the gRNA’s sequence does not exist in self._gRNAseqs, call add_seq() to add it too.
- Parameters
seq (str or Bio.Seq.Seq) – gRNA sequence
gRNA_hit (gRNAHit) – gRNAHit object
- add_seq(seq) None [source]
Add gRNA sequence if it isn’t already in self._gRNAseqs and create an empty entry for it in self._hits.
- Parameters
seq (str or Bio.Seq.Seq) – gRNA sequence
- all_target_len_valid() bool [source]
Whether all targets of gRNA have valid length (i.e. greater than 0).
- Returns
- Return type
bool
- assign_gRNAseq_id(fasta) str [source]
(Re)name gRNA according to FASTA file.
- Parameters
fasta (str) – required, path to FASTA file
- assign_seqid(prefix='gRNA_', zfill=3, assign_all=True) None [source]
Assign sequence ID to gRNA sequences using format <prefix><unique gRNA number>.
- Parameters
prefix (str) – prefix for gRNA ID (default=‘gRNA_’)
zfill (int) – number of leading zeroes (default=3)
assign_all (bool) – assign new sequence IDs to all gRNA sequences regardless of whether they already have a sequence ID
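The ID format <prefix><unique gRNA number> can be sketched as follows. This is an illustration under two assumptions that the text above does not confirm: numbering starts at 1, and `zfill` is applied with Python's `str.zfill`, which pads to a total width. The helper `make_seqids` is hypothetical.

```python
def make_seqids(seqs, prefix="gRNA_", zfill=3):
    """Map each sequence to an ID such as 'gRNA_001' (hypothetical helper)."""
    return {
        seq: f"{prefix}{str(number).zfill(zfill)}"
        for number, seq in enumerate(seqs, start=1)  # assumed to start at 1
    }


ids = make_seqids(["ACGTACGT", "TTGACCAA"])
```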
- property check_names_hits: list[source]
List of check names for gRNA hits (gRNAHit objects).
- Type
list of str
- property check_names_seqs: list[source]
List of check names for gRNA sequences (gRNASeq objects).
- Type
list of str
- collapse(ids=[], seqs=[])[source]
Group gRNA by identical coverage and create SetOfCollapsedgRNA from self. If neither ids nor seqs are provided, SetOfCollapsedgRNA will be generated from all gRNA.
- Parameters
ids (list) – list of IDs (str) of gRNA to include
seqs (list) – list of sequences (str) of gRNA to include, overrides ids
- Returns
SetOfCollapsedgRNA
- copy() minorg.grna.gRNAHits [source]
Deepcopy self to new gRNAHits object.
- Returns
- Return type
gRNAHits
- filter_hits(*check_names, exclude_empty_seqs=True, accept_invalid=False, accept_invalid_field=True, all_checks=False, quiet=False, report_invalid_field=False)[source]
Filter gRNA hits by checks and return new gRNAHits object.
- Parameters
*check_names (str) – check name(s) (not required if all_checks=True)
exclude_empty_seqs (bool) – exclude gRNA sequences from new gRNAHits object if none of their hits pass the filter(s)
accept_invalid (bool) – score unset checks as pass
accept_invalid_field (bool) – score unset checks as pass if a given check is not set for ALL hits
all_checks (bool) – filter using all checks
quiet (bool) – print only essential messages
- Returns
- Return type
gRNAHits
- filter_hits_all_checks_passed(**kwargs)[source]
Wrapper for filter_hits(). Filter hits by ALL checks.
- Parameters
**kwargs – other arguments passed to filter_hits()
- filter_hits_some_checks_passed(*check_names, **kwargs)[source]
Wrapper for filter_hits(). Filter hits by some checks.
- Parameters
*check_names (str) – required, check names
**kwargs – other arguments passed to filter_hits()
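How the accept_invalid and accept_invalid_field options of filter_hits() interact can be sketched with plain dictionaries. This is an interpretation of the parameter descriptions above, not the MINORg code: each hit is represented as a hypothetical {check name: True/False/None} dict, and `filter_hits_sketch` is an invented name.

```python
def filter_hits_sketch(hits, *check_names, accept_invalid=False,
                       accept_invalid_field=True):
    """Keep hits that pass every named check (illustrative only)."""
    unused = set()
    if accept_invalid_field:
        # A check that is unset for ALL hits is treated as a pass.
        unused = {name for name in check_names
                  if all(hit.get(name) is None for hit in hits)}

    def passes(hit):
        for name in check_names:
            value = hit.get(name)
            if value is True:
                continue
            if value is None and (accept_invalid or name in unused):
                continue
            return False
        return True

    return [hit for hit in hits if passes(hit)]


hits = [
    {"background": True, "GC": None},
    {"background": False, "GC": None},
    {"background": None, "GC": None},
]
kept = filter_hits_sketch(hits, "background", "GC")
```

In this example ‘GC’ is unset for every hit, so it is ignored by default, while the third hit is dropped because ‘background’ is set for other hits but not for it.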
- filter_seqs(*check_names, accept_invalid=True, accept_invalid_field=True, all_checks=False, quiet=False, report_invalid_field=False)[source]
Filter gRNA sequences by checks and return new gRNAHits object.
- Parameters
*check_names (str) – check name(s) (not required if all_checks=True)
accept_invalid (bool) – score unset checks as pass
accept_invalid_field (bool) – score unset checks as pass if a given check is not set for ALL hits
all_checks (bool) – filter using all checks
quiet (bool) – print only essential messages
- Returns
- Return type
gRNAHits
- filter_seqs_all_checks_passed(**kwargs)[source]
Wrapper for filter_seqs(). Filter gRNA sequences by ALL checks.
- Parameters
**kwargs – other arguments passed to filter_seqs()
- filter_seqs_some_checks_passed(*check_names, **kwargs)[source]
Wrapper for filter_seqs(). Filter gRNA sequences by some checks.
- Parameters
*check_names (str) – required, check names
**kwargs – other arguments passed to filter_seqs()
- get_gRNAseq_by_id(id) Optional[minorg.grna.gRNASeq] [source]
Get gRNASeq object with given sequence ID.
- Parameters
id (str) – gRNA sequence ID
- Returns
gRNASeq – If exists
None – If doesn’t exist
- get_gRNAseq_by_seq(seq) Optional[minorg.grna.gRNASeq] [source]
Get gRNASeq object with given sequence.
- Parameters
seq (str or Bio.Seq.Seq) – gRNA sequence
- Returns
gRNASeq – If exists
None – If doesn’t exist
- get_gRNAseqs_by_id(*ids) list [source]
Get multiple gRNASeq objects by sequence ID. Sequence IDs without associated gRNASeq objects will be skipped.
- Parameters
*ids (str) – gRNA sequence ID(s)
- Returns
Of gRNASeq objects
- Return type
list
- get_gRNAseqs_by_seq(*seqs) list [source]
Get multiple gRNASeq objects by sequence. Sequences without associated gRNASeq objects will be skipped.
- Parameters
*seqs (str or Bio.Seq.Seq) – gRNA sequence(s)
- Returns
Of gRNASeq objects
- Return type
list
- get_hits(seq) list [source]
Get hits of gRNA with given sequence.
- Parameters
seq (str or Bio.Seq.Seq) – gRNA sequence
- Returns
Of gRNAHit objects
- Return type
list
- hit_check_exists(check_name)[source]
Whether a given check exists for all gRNA hits.
- Parameters
check_name (str) – check name
- Returns
- Return type
bool
- parse_from_dict(d) None [source]
Read data from dictionary of gRNAHit objects and deep copy it to self.
- Parameters
d (dict) – dictionary of gRNAHit objects in same format as _hits.
- parse_from_mapping(fname, targets=None) None [source]
Read gRNA data from MINORg .map file.
- Parameters
fname (str) – required, path to file
targets (str) – optional, path to file containing target sequences. Used to get target sequence length for tie breaking by favouring gRNA hits closer to 5’ end.
- remove_seqs(*seqs) None [source]
Remove gRNA sequence and associated hits.
- Parameters
*seqs (str or Bio.Seq.Seq) – gRNA sequence to remove
- rename_seqs(fasta) None [source]
Rename gRNA according to FASTA file. Functionally identical to assign_gRNAseq_id() except it doesn’t check whether the FASTA file covers all gRNA sequences.
- Parameters
fasta (str) – required, path to FASTA file
- seq_check_exists(check_name)[source]
Whether a given check exists for all gRNA sequences.
- Parameters
check_name (str) – check name
- Returns
- Return type
bool
- set_all_seqs_check_by_function(check_name, func) None [source]
Set a given check for all gRNA sequences using a function that accepts str of gRNA sequence.
- Parameters
check_name (str) – check name
func (func) – function that accepts str of gRNA sequence
- set_seqs_check(check_name, status, seqs) None [source]
Set a given check to a given value for multiple gRNA sequences.
- Parameters
check_name (str) – check name
status (str or bool or None) – check status
seqs (list of str) – list of gRNA sequences for which to set the check
- set_seqs_check_by_function(check_name, func, seqs) None [source]
Set a given check for multiple gRNA sequences using a function that accepts str of gRNA sequence.
- Parameters
check_name (str) – check name
func (func) – function that accepts str of gRNA sequence
seqs (list of str) – list of gRNA sequences for which to set the check
- unused_hit_check(check_name) bool [source]
Whether a given check is NOT set for all gRNA hits.
- Parameters
check_name (str) – check name
- Returns
- Return type
bool
- unused_seq_check(check_name) bool [source]
Whether a given check is NOT set for all gRNA sequences.
- Parameters
check_name (str) – check name
- Returns
- Return type
bool
- update_records() None [source]
Remove gRNASeq objects from _gRNAseqs if their sequences are not also in _hits. Also remove gRNAHit objects from _hits if their sequences are not also in _gRNAseqs.
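In effect, update_records() keeps only the sequences present in both dictionaries. A minimal sketch of that idea, using a hypothetical helper `sync_records` that returns new dictionaries instead of modifying the object in place as the documented method does:

```python
def sync_records(grna_seqs, hits):
    """Keep only sequences that appear as keys in both dictionaries."""
    shared = grna_seqs.keys() & hits.keys()
    kept_seqs = {seq: obj for seq, obj in grna_seqs.items() if seq in shared}
    kept_hits = {seq: lst for seq, lst in hits.items() if seq in shared}
    return kept_seqs, kept_hits


# Placeholder strings stand in for gRNASeq and gRNAHit objects.
seqs_out, hits_out = sync_records(
    {"ACGT": "seq-object-1", "TTAA": "seq-object-2"},
    {"ACGT": ["hit-1", "hit-2"], "GGCC": ["hit-3"]},
)
```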
- valid_hit_check(check_name) bool [source]
Whether a given check has been set for at least one gRNA hit.
- Parameters
check_name (str) – check name
- Returns
- Return type
bool
- valid_seq_check(check_name) bool [source]
Whether a given check has been set for at least one gRNA sequence.
- Parameters
check_name (str) – check name
- Returns
- Return type
bool
- write_equivalents(fout, ids=[], seqs=[], write_all=False, fasta=None) None [source]
Group gRNA by equivalent coverage and write groups to file.
- Parameters
fout (str) – required, path to output file
ids (list) – list of IDs (str) of gRNA to write (if fasta is provided, the ids here should match those in the fasta file)
seqs (list) – list of sequences (str) of gRNA to write, overrides ids
write_all (bool) – write all gRNA sequences, overrides seqs and ids
fasta (str) – optional, path to FASTA file, used for renaming gRNA
- write_fasta(fout, ids=[], seqs=[], write_all=False, fasta=None) None [source]
Write gRNA sequences to FASTA file.
- Parameters
fout (str) – required, path to output file
ids (list) – list of IDs (str) of gRNA to write
seqs (list) – list of sequences (str) of gRNA to write, overrides ids
write_all (bool) – write all gRNA sequences, overrides seqs and ids
fasta (str) – optional, path to FASTA file, used for renaming gRNA
- write_mapping(fout, sets=[], write_all=False, write_checks=False, checks=['background', 'GC', 'feature'], index=1, start_incl=True, end_incl=True, version=5, fasta=None) None [source]
Write MINORg .map file for gRNA mapping.
A MINORg .map file is tab-delimited and includes a header line. In version 5, columns are:
gRNA id: gRNA sequence ID
gRNA sequence: gRNA sequence
target id: sequence ID of gRNA target
target length: length (bp) of gRNA target
target sense: sense of gRNA target (‘sense’ or ‘antisense’)
gRNA strand: strand of gRNA (relative to target) (‘+’ or ‘-’)
start: position of start of gRNA in target
end: position of end of gRNA in target
set: gRNA set number (set to 1 for all gRNA if write_all=True)
Any columns after ‘set’ are check statuses.
- Parameters
fout (str) – required, path to output file
sets (list) – list of grouped gRNA sequences (e.g. [(‘<seq1>’, ‘<seq2>’), (‘<seq3>’,)]). If provided, only gRNA sequences included in sets will be written. All sets must be mutually exclusive. Used to assign set numbers in ‘set’ column. Overrides write_all.
write_all (bool) – write all gRNA sequences
write_checks (bool) – write check statuses
checks (list) – check names of checks to write (default=[‘background’, ‘GC’, ‘feature’])
index (int) – index of written gRNA range (default=1)
start_incl (bool) – whether written gRNA range is start-inclusive (default=True)
end_incl (bool) – whether written gRNA range is end-inclusive (default=True)
fasta (str) – optional, path to FASTA file, used for renaming gRNA
version (int) – .map file version
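The version 5 column layout listed above can be illustrated by writing a tab-delimited table with Python's `csv` module. This is a sketch of the layout only: it assumes the header uses the column names exactly as listed, and `write_map_rows`, the row values, and the target ID are invented for the example; the real write_mapping() may format things differently.

```python
import csv
import io

# Column names as listed for version 5 (assumed to match the header text).
COLUMNS = ["gRNA id", "gRNA sequence", "target id", "target length",
           "target sense", "gRNA strand", "start", "end", "set"]


def write_map_rows(handle, rows, check_names=()):
    """Write a header plus one tab-delimited line per hit (hypothetical helper)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Columns after 'set' hold check statuses.
    writer.writerow(COLUMNS + list(check_names))
    for row in rows:
        writer.writerow([row[name] for name in COLUMNS]
                        + [row[name] for name in check_names])


buffer = io.StringIO()
write_map_rows(
    buffer,
    [{"gRNA id": "gRNA_001", "gRNA sequence": "ACGTACGTACGTACGTACGT",
      "target id": "targetA", "target length": 1200, "target sense": "sense",
      "gRNA strand": "+", "start": 101, "end": 120, "set": 1,
      "background": "pass", "GC": "NA"}],
    check_names=["background", "GC"],
)
lines = buffer.getvalue().splitlines()
```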
- class minorg.grna.gRNASeq(seq, grnahits=None)[source]
Bases:
minorg.grna.CheckObj
gRNA sequence.
Tracks checks applicable to gRNA sequence: off-target (check name: background), GC content (check name: GC).
- __init__(seq, grnahits=None)[source]
Create a gRNASeq object.
- Parameters
seq (str or Bio.Seq.Seq) – sequence
- property grnahits: minorg.grna.gRNAHits[source]
- set_bg_check(status) None [source]
Set status for check name ‘background’ (False if gRNA has off-target effects).
- Parameters
status (str or bool or None) – status of check. Valid values: ‘pass’, ‘fail’, ‘NA’, True, False, None
- set_exclude_check(status) None [source]
Set status for check name ‘exclude’ (False if user wishes to exclude gRNA with this sequence).
- Parameters
status (str or bool or None) – status of check (valid values: ‘pass’, ‘fail’, ‘NA’, True, False, None)
- set_gc_check(status=None, gc_min=0, gc_max=1) None [source]
Set status for check name ‘GC’ (False if gRNA GC content is not within desired range).
- Parameters
status (str or bool or None) – status of check (valid values: ‘pass’, ‘fail’, ‘NA’, True, False, None). If status is provided, gc_min and gc_max will be ignored.
gc_min (float) – minimum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)
gc_max (float) – maximum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)
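When no status is given, the GC check comes down to computing the fraction of G and C bases and comparing it with the range. A minimal sketch, assuming both bounds are inclusive (the text above does not say); `gc_check` is a hypothetical function, not the MINORg method:

```python
def gc_check(seq, gc_min=0.0, gc_max=1.0):
    """Return True if GC fraction lies within [gc_min, gc_max], None if empty."""
    seq = str(seq).upper()
    if not seq:
        return None  # GC content is undefined for an empty sequence
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    # Assumption: bounds are inclusive.
    return gc_min <= gc_fraction <= gc_max
```

With the default range of 0 to 1, every non-empty sequence passes, so a narrower range such as gc_min=0.3 and gc_max=0.8 is needed for the check to be informative.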