minorg.MINORg module

minorg.MINORg.parse_lookup(iterable, lookup, return_first=False)[source]: commas not allowed in values

minorg.MINORg.valid_readable_file(pathname)[source]

minorg.MINORg.valid_aliases(aliases, lookup: dict, raise_error: bool = True, message: Optional[str] = None, param: Optional[minorg.parse_config.Param] = None, none_value=None, all_value=None, clear_value=None, display_cmd=None, additional_message=None, return_mapping=False)[source]: Generates appropriate error + message if alias(es) is/are invalid. Throws the error generated if requested (i.e. raise_error = True). Requires alias(es) (str of single alias or iterable of multiple aliases) + set lookup dictionary

class minorg.MINORg.PathHandler(tmp=False, keep_tmp=False, directory=None)[source]

Bases: object

Tracks new files/directories and temporary files.

tmp[source]

whether temporary directory is used

Type: bool

keep_tmp[source]

whether to retain temporary files/directories

Type: bool

out_dir[source]

absolute path to final output directory

Type: str

new_dirs[source]

paths of newly created directories

Type: list of str

tmp_files[source]

paths of temporary files/directories

Type: list of str

tmp_dir[source]

path of temporary directory

Type: str

directory[source]

path of directory being written into. If tmp=True, this will point to a temporary directory. Else, this will be the output directory.

Type: str

__init__(tmp=False, keep_tmp=False, directory=None)[source]

Create a PathHandler object.

Parameters

tmp (bool) – write files to temporary directory (to be deleted or moved to final output directory using resolve)
keep_tmp (bool) – retain temporary files/directories
directory (str) – final output directory (will be directly written into if tmp=False)

property tracebacklimit[source]

mkdir(*directories, tmp=False)[source]

Create directory/directories.

Parameters

*directories (str) – path
tmp (bool) – mark directory as temporary (for deletion when self.rm_tmpfiles is called)

mkfname(*path, tmp=False)[source]

Generate path.

If path provided is not absolute, self.active_directory is assumed to be the root directory.

Parameters

*path (str) – required, path to output file (e.g. self.mkfname(‘tmp’, ‘tmp.fasta’) –> <self.active_directory>/tmp/tmp.fasta)
tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)

Returns

path

Return type

str
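
The relative-versus-absolute rule described for mkfname can be illustrated with a small standalone sketch. The helper name resolve_path and its active_directory argument are hypothetical stand-ins for illustration, not part of MINORg's API; only the standard library is used.

```python
import os

def resolve_path(active_directory, *path):
    """Join path components; root relative paths at active_directory.

    Hypothetical helper illustrating the documented rule, not MINORg code.
    """
    joined = os.path.join(*path)
    if os.path.isabs(joined):
        # Absolute paths are returned unchanged
        return joined
    # Relative paths are placed under the active directory
    return os.path.join(active_directory, joined)

relative = resolve_path("/my/output/directory", "tmp", "tmp.fasta")
absolute = resolve_path("/my/output/directory", "/elsewhere/tmp.fasta")
```

Here relative ends up under /my/output/directory, while absolute is left as provided.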

reserve_fname(*path, tmp=False, newfile=False, **kwargs)[source]

Generate new file.

Operates exactly as PathHandler.mkfname(), with the additional options of creating an empty file or clearing an existing file.

Parameters

*path (str) – path to output file
tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)
newfile (bool) – clear an existing file at destination if it already exists
**kwargs – other arguments to pass to self.mkfname

Returns

path

Return type

str

rm_tmpfiles(*fnames)[source]

Delete all files and directories marked as temporary (in self.tmp_files).

Directories will only be deleted if they are empty.

Parameters: *fnames (str) – optional, path. If provided, only the specified files/directories will be deleted if they are also marked as temporary. Otherwise, all temporary files and directories are deleted.

resolve()[source]: If self.tmp, move output files to final output directory.

class minorg.MINORg.MINORg(directory=None, config=None, prefix='minorg', thread=None, keep_tmp=False, cli=False, auto_update_files=True, **kwargs)[source]

Bases: minorg.MINORg.PathHandler

Tracks parameters, intermediate files, and output file for reuse.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/my/output/directory",
                       prefix = "test", tmp = False, keep_tmp = True, thread = 1)
>>> my_minorg.add_reference("/path/to/TAIR10_Chr.all.fasta", "/path/to/TAIR10_GFF3.genes.gff", alias = "TAIR10", replace = True)
>>> my_minorg.add_reference("/path/to/Alyrata_384_v1.fa", "/path/to/Alyrata_384_v2.1.gene.gff3", alias = "araly2")
>>> my_minorg.genes = ["AT5G66900", "AL8G44500.v2.1"]
>>> my_minorg.subset_annotation()
>>> [y.get_attr("ID") for y in my_minorg.reference["TAIR10"].annotation.get_subfeatures(*my_minorg.genes)]
[['AT5G66900.1']]
>>> [y.get_attr("ID") for y in my_minorg.reference["araly2"].annotation.get_subfeatures(*my_minorg.genes)]
[['AL8G44500.t1.v2.1']]
>>> my_minorg.query_reference = True
>>> my_minorg.seq()
>>> my_minorg.grna()
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
>>> my_minorg.grna_hits.filter_seqs("background")
gRNAHits(gRNA = 352)
>>> my_minorg.filter_gc()
>>> my_minorg.grna_hits.filter_seqs("GC")
gRNAHits(gRNA = 370)
>>> my_minorg.grna_hits.filter_seqs("background", "GC")
gRNAHits(gRNA = 321)
>>> my_minorg.filter_feature() ## by default, MINORg only retains gRNA in CDS
Max acceptable insertion length: 15
>>> my_minorg.grna_hits.filter_hits("feature")
gRNAHits(gRNA = 344)
>>> my_minorg.valid_grna()
gRNAHits(gRNA = 278)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

directory[source]

[general] final output directory

Type: str

auto_update_files[source]

[general] whether to update gRNA .map and FASTA files automatically after a subcommand is called

Type: bool

prefix[source]

[general] prefix for output directories and files

Type: str

thread[source]

[general] maximum number of threads for parallel processing

Type: int

blastn[source]

[executable] path to blastn executable

Type: str

rpsblast[source]

[executable] path to rpsblast or rpsblast+ executable

Type: str

mafft[source]

[executable] path to mafft executable

Type: str

bedtools[source]

[executable] path to directory containing BEDTools executables. Use ONLY IF BEDTools is not in your command-search path.

Type: str

db[source]

[RPS-BLAST option] path to local RPS-BLAST database

Type: str

remote_rps[source]

[RPS-BLAST option] use remote database instead of local database for RPS-BLAST

Type: bool

pssm_ids[source]

[seq] list of Pssm-Ids of domain(s) for homologue search. If multiple Pssm-Ids are provided, overlapping domains will be merged.

Type: list of str

domain_name[source]

human-readable domain name used in sequence and file names in place of Pssm-Ids

Type: str

genes[source]

[seq] list of target gene IDs

Type: list or str

query_reference[source]

[seq] include reference genes as targets

Type: bool

minlen[source]

[seq: homologue] minimum homologue length (bp)

Type: int

minid[source]

[seq: homologue] minimum hit % identity

Type: float

mincdslen[source]

[seq: homologue] minimum number of bases in homologue aligned with reference gene CDS

Type: int

merge_within[source]

[seq: homologue] maximum distance (bp) between hits for merging

Type: int

check_recip[source]

[seq: homologue] execute reciprocal check

Type: bool

relax_recip[source]

[seq: homologue] execute relaxed reciprocal check

Type: bool

check_id_before_merge[source]

[seq: homologue] filter out hits by % identity before merging into potential homologues

Type: bool

length[source]

[grna] gRNA length (bp)

Type: int

pam[source]

[grna] PAM pattern

Type: PAM

screen_reference[source]

[filter: background] include reference genome for screening

Type: bool

ot_pamless[source]

[filter: background] ignore absence of PAM when assessing off-target gRNA hits

Type: bool

offtarget[source]

[filter: background] function that accepts Biopython’s HSP and QueryResult objects and determines whether an off-target gRNA hit is problematic. If not set by user, ot_pattern will be used. If neither offtarget nor ot_pattern is set, ot_mismatch and ot_gap will be used.

Type: func

ot_pattern[source]

[filter: background] pattern specifying maximum number of gap(s) and/or mismatch(es) within given range(s) of a gRNA in off-target regions to disqualify the gRNA

Type: OffTargetExpression

ot_unaligned_as_mismatch[source]

[filter: background] treat unaligned positions as mismatches (used with ot_pattern)

Type: bool

ot_unaligned_as_gap[source]

[filter: background] treat unaligned positions as gaps (specifically as insertions; used with ot_pattern)

Type: bool

ot_mismatch[source]

[filter: background] minimum number of mismatches allowed for off-target gRNA hits

Type: int

ot_gap[source]

[filter: background] minimum number of gaps allowed for off-target gRNA hits

Type: int

mask[source]

[filter: background] FASTA file of additional sequences to mask in background

Type: list

gc_min[source]

[filter: GC] minimum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)

Type: float

gc_max[source]

[filter: GC] maximum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)

Type: float

feature[source]

[filter: feature] GFF3 feature within which gRNA are to be designed (default=”CDS”)

Type: str

max_insertion[source]

[filter: feature] maximum allowable insertion size in feature (bp)

Type: int

min_within_n[source]

[filter: feature] minimum number of reference genes within whose feature a gRNA must align

Type: int

min_within_fraction[source]

[filter: feature] minimum fraction of reference genes within whose feature a gRNA must align (between 0 and 1, where 0 is none and 1 is all; if 0, min_within_n will be set to 1)

Type: float

exclude[source]

[filter: exclude] path to FASTA file containing gRNA sequences to exclude

Type: str

sets[source]

[minimumset] number of sets to generate

Type: int

auto[source]

[minimumset] generate sets without requiring manual user confirmation for each set

Type: bool

accept_invalid[source]

[minimumset] score ‘NA’ as ‘pass’

Type: bool

accept_feature_unknown[source]

[minimumset] score ‘NA’ as ‘pass’ for feature check

Type: bool

accept_invalid_field[source]

[minimumset] score ‘NA’ as ‘pass’ if all entries for a check are ‘NA’

Type: bool

pass_map[source]

[minimumset] path to output .map file for gRNA that pass all valid checks (autogenerated by MINORg if not provided)

Type: str

pass_fasta[source]

[minimumset] path to output .fasta file for gRNA that pass all valid checks (autogenerated by MINORg if not provided)

Type: str

final_map[source]

[minimumset] path to output .map file for final gRNA set(s) (autogenerated by MINORg if not provided)

Type: str

final_fasta[source]

[minimumset] path to output .fasta file for final gRNA set(s) (autogenerated by MINORg if not provided)

Type: str

__init__(directory=None, config=None, prefix='minorg', thread=None, keep_tmp=False, cli=False, auto_update_files=True, **kwargs)[source]

Create a MINORg object.

Parameters

directory (str) – path to output directory
config (str) – path to config.ini file
prefix (str) – prefix for output directories and files (default=’minorg’)
thread (int) – maximum number of threads for parallel processing
keep_tmp (bool) – retain temporary files
cli (bool) – whether MINORg is called at command line. Not currently used, but may be used in future to determine what to print.
auto_update_files (bool) – whether to update gRNA .map and FASTA files automatically after a subcommand is called
**kwargs – other arguments supplied to parent class PathHandler

Returns

a MINORg object

Return type

MINORg

property query_map[source]

Mapping of alias to FASTA for queries.

Getter: Returns list of (<alias>, </path/to/FASTA>)
Type: list

property reference[source]

property assemblies[source]

Reference assemblies

Getter: Returns {<alias>: </path/to/FASTA>}
Type: dict

property annotations[source]

Reference annotations

Getter: Returns {<alias>: </path/to/GFF3>}
Type: dict

property attr_mods[source]

Attribute modifications

Getter: Returns {<reference alias>: <dict of attribute modifications>}
Type: dict

property features[source]

Target features

Getter: Returns list of target feature(s)
Type: list

property passed_grna[source]

gRNA that have passed background, GC, and feature checks

Relevant attributes: accept_invalid, accept_invalid_field

Getter: Returns gRNA that have passed standard checks
Type: gRNAHits

property genes[source]

Target gene IDs

Getter: Returns list of target gene IDs (str)
Type: list

property domain_name[source]

Domain name for use in sequence IDs and file names

Getter: Returns domain name if set, else ‘gene’ if not MINORg.pssm_ids, else dash-separated MINORg.pssm_ids
Setter: Sets domain name
Type: str

property offtarget[source]

Function to assess off-target goodness of alignment

Getter: Returns function that accepts HSP and QueryResult
Type: func

property pam[source]

PAM pattern

Getter: Returns expanded PAM pattern (includes explicit position of gRNA; e.g. ‘.NGG’)
Setter: Sets PAM pattern and parses it into expanded PAM pattern
Type: str

property length[source]

gRNA length

Getter: Returns gRNA length (bp)
Setter: Sets gRNA length and updates MINORg.pam
Type: int

property feature[source]

Target feature

Getter: Returns first feature if multiple have been provided
Setter: Sets target feature
Type: str

property check_recip[source]

Whether to execute reciprocal check

Getter: Returns True if MINORg.check_recip=True OR MINORg.relax_recip=True
Setter: Sets this check
Type: bool

property ot_pam[source]

Remove gRNA from candidates if off-target hit has PAM site nearby

Getter: Returns whether to consider presence of PAM site for off-target filtering
Type: bool

property ot_pattern[source]

Off-target mismatch/gap pattern

Getter: Returns OffTargetExpression object
Type: minorg.ot_regex.OffTargetExpression

property ot_unaligned_as_mismatch[source]

Treat unaligned positions as mismatch (used with ot_pattern)

Getter: Returns whether to count unaligned positions as mismatch
Type: bool

property ot_unaligned_as_gap[source]

Treat unaligned positions as gaps (specifically insertions; used with ot_pattern)

Getter: Returns whether to count unaligned positions as gaps
Type: bool

property prioritise_nr[source]

Prioritise non-redundancy for minimum set generation

Getter: Returns whether non-redundancy is prioritised
Type: bool

property prioritize_nr[source]

Prioritise non-redundancy for minimum set generation

Getter: Returns whether non-redundancy is prioritised
Type: bool

property prioritise_pos[source]

Prioritise position (i.e. 5’ gRNA) for minimum set generation

Getter: Returns whether position (i.e. 5’ gRNA) is prioritised
Type: bool

property prioritize_pos[source]

Prioritise position (i.e. 5’ gRNA) for minimum set generation

Getter: Returns whether position (i.e. 5’ gRNA) is prioritised
Type: bool

mkfname(*path, tmp=False, prefix=True)[source]

Generate new file name.

If path provided is not absolute, self.active_directory is assumed to be the root directory.

Parameters

*path (str) – required, path to output file (e.g. self.mkfname(‘tmp’, ‘tmp.fasta’))
tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)
prefix (bool) – prefix self.prefix to basename

results_fname(*path, reserve=False, newfile=False, **kwargs)[source]

Generate new file name in results directory (<self.active_directory>/<self.prefix>).

Operates exactly as MINORg.mkfname(), with the additional options of creating an empty file or clearing an existing file.

Parameters

*path (str) – path to output file
reserve (bool) – create empty file at destination if it does not exist
newfile (bool) – clear an existing file at destination if it already exists
**kwargs – other arguments supplied to MINORg.mkfname()

resolve() → None[source]: Wrapper for super().resolve() that also updates stored filenames. Removes logfile if not in CLI mode.

get_ref_seqid(seqid, attr)[source]

Extract attribute of sequence from sequence name (for sequences generated by self.target)

Parameters

seqid (str) – required, sequence name
attr (str) – required, attribute to return. Valid attributes: ‘gene’, ‘feature’, ‘source’.

Returns

str

valid_grna(*check_names, accept_invalid=None, accept_invalid_field=None)[source]

gRNA that have passed background, GC, and feature checks, as well as any user-set checks

Relevant attributes: accept_invalid, accept_invalid_field

Parameters

*check_names (str) –

optional, checks to include.

If not specified: all checks, both standard and non-standard, will be used.

If ‘standard’: only background, GC, and feature checks will be used. Can be used in combination with non-standard check names.

If ‘nonstandard’: only non-standard checks will be used. Can be used in combination with standard check names.

Returns

gRNAHits

check_names(seq=True, hit=True, standard=True, nonstandard=True) → list[source]

Get gRNA check names

Parameters

seq (bool) – get check names for gRNA sequences (gRNASeq objects)
hit (bool) – get check names for gRNA hits (gRNAHit objects)
standard (bool) – get standard check names
nonstandard (bool) – get nonstandard check names

Returns

List of check names (str)

Return type

list

add_query(fname, alias=None)[source]

Add file to be queried.

Parameters

fname (str) – required, path to file
alias (str) – optional, alias for query file

remove_query(alias)[source]

Remove file for querying.

Parameters: alias (str) – required, alias of query file to remove

add_background(fname, alias=None)[source]

Add file for background filter.

Parameters

fname (str) – required, path to file
alias (str) – optional, alias for background file. Used when writing mask report and for removing background.

remove_background(alias)[source]

Remove file for background filter.

Parameters: alias (str) – required, alias of background file to remove

parse_grna_map_from_file(fname)[source]

Read candidate gRNA from .map file output by MINORg.

Parameters: fname (str) – required, path to MINORg .map file

rename_grna(fasta)[source]

Rename candidate gRNA according to FASTA file.

Parameters: fasta (str) – required, path to FASTA file

write_all_grna_fasta(fout=None)[source]

Write FASTA file of all candidate gRNA.

Parameters: fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_all_grna_map(fout=None, write_checks=True)[source]

Write .map file of all candidate gRNA.

Parameters

fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
write_checks (bool) – write all check statuses

write_all_grna_eqv(fout=None)[source]

Write .eqv file (grouping gRNA by equivalent coverage) of all candidate gRNA.

Parameters: fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_fasta(fout=None)[source]

Write FASTA file of gRNA that pass all valid checks.

Parameters: fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_map(fout=None)[source]

Write .map file of gRNA that pass all valid checks.

Parameters: fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_eqv(fout=None)[source]

Write .eqv file (grouping gRNA by equivalent coverage) of gRNA that pass all valid checks.

Parameters: fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_files(fasta=None, map=None, eqv=None)[source]

add_reference(fasta, annotation, alias=None, genetic_code=1, attr_mod={}, memsave=True, replace=False)[source]

Add reference genome.

Parameters

fasta (str) – required, path to fasta file
annotation (str) – required, path to GFF3 file
alias (str) – optional, name of fasta-annotation combo
genetic_code (str or int) – NCBI genetic code (default=1)
attr_mod (dict) – mapping for non-standard attribute field names in GFF3 file (default={})
replace (bool) – replace any existing reference with same alias (default=False)

remove_reference(alias)[source]

Remove reference genome.

Parameters: alias (str) – required, alias of reference genome

clear_reference()[source]: Remove all reference genomes.

subset_annotation(quiet=True, preserve_order=True)[source]

Subset annotations of all reference genomes according to self.genes.

Reduces annotation lookup time.

Parameters

quiet (bool) – silence printing of non-essential messages
sort (bool) – sort subset data

extend_reference(ext_gene, ext_cds)[source]

Extend reference genomes from FASTA files of gene and CDS sequences.

Sequence IDs of gene sequences in ext_gene file(s) should not be repeated and should not contain ‘|’. Sequences in ext_cds file(s) should be named <sequence ID in ext_gene file(s)>.<int> to indicate their parent gene. Corresponding gene and CDS sequences will be aligned with mafft for inference of CDS range in genes.

Parameters

ext_gene (str or list) – required, path to file or list of paths to files of gene sequences
ext_cds (str or list) – required, path to file or list of paths to files of CDS sequences
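
The naming convention above (unique gene IDs without ‘|’, and CDS IDs of the form <gene ID>.<int>) can be checked before calling extend_reference. The following is a minimal sketch; group_cds_by_gene is a hypothetical helper, not part of MINORg, and it works on plain lists of sequence IDs rather than on FASTA files.

```python
from collections import defaultdict

def group_cds_by_gene(gene_ids, cds_ids):
    """Validate IDs and group CDS IDs by parent gene (illustrative only)."""
    genes = list(gene_ids)
    if len(set(genes)) != len(genes):
        raise ValueError("gene sequence IDs must not be repeated")
    if any("|" in gene for gene in genes):
        raise ValueError("gene sequence IDs must not contain '|'")
    known = set(genes)
    grouped = defaultdict(list)
    for cds_id in cds_ids:
        # Split on the last '.', so gene IDs may themselves contain dots
        parent, _, suffix = cds_id.rpartition(".")
        if parent not in known or not suffix.isdigit():
            raise ValueError(f"cannot assign CDS {cds_id!r} to a gene")
        grouped[parent].append(cds_id)
    return dict(grouped)

grouped = group_cds_by_gene(
    ["geneA", "geneB.v2"],
    ["geneA.1", "geneA.2", "geneB.v2.1"],
)
```

The gene and CDS names used here are made up for illustration.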

Get sequence(s) of reference genes or isoforms.

If self.pssm_ids is provided, sequences are restricted to the relevant domain(s).

Parameters

*features (str) – optional, GFF3 feature type(s) to retrieve. If not specified, sequence of the gene or isoform will be retrieved directly.
ref (str) – optional, alias of reference genome from which to extract sequences. If not specified, all reference genomes will be searched for genes.
seqid_template (str) –
optional, template for output sequence name. Template will be parsed by string.Template. The default template is “Reference|$source|$gene|$isoform|$feature|$n|$complete”.
- $source: reference genome alias
- $gene: gene/isoform ID
- $isoform: isoform ID if by_gene = False, else same as $gene
- $feature: GFF3 feature type
- $n: if multiple domains are present, they will be numbered according to proximity to 5’ of sense strand
- $complete: ‘complete’ if complete=True else ‘stitched’
adj_dir (bool) – output sense strand
by_gene (bool) – merge sequences from all isoforms of a gene
isoform_lvl (bool) – if GFF features ‘mRNA’ or ‘protein’ specified, sequences will be separated by mRNA/protein isoforms
complete (bool) – merge output range(s) together into single range. Output will still be a list of tuples. (e.g. [(<smallest start>, <largest end>)])
translate (bool) – translate sequence. Should be used with adj_dir.
fout (str) – optional, path to output file if features is not specified.
**fouts (str) – optional, path to output files for each feature if features is specified. Example: self.get_reference_seq(“CDS”, CDS = <path to file>) returns dict AND writes to file

Returns

sequence(s) of reference genes or isoforms (specified in self.genes), grouped by feature.: Format: {<feature>: {<seqid>: <seq>}}

Return type

dict
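
As an illustration of how a seqid_template is filled in, the default template can be substituted with Python's string.Template. This sketch runs independently of MINORg; the helper format_seqid and the field values are hypothetical examples.

```python
from string import Template

DEFAULT_SEQID_TEMPLATE = "Reference|$source|$gene|$isoform|$feature|$n|$complete"

def format_seqid(template, **fields):
    """Fill a sequence name template (illustrative helper, not MINORg code)."""
    # substitute() raises KeyError if a placeholder has no value
    return Template(template).substitute(**fields)

seqid = format_seqid(
    DEFAULT_SEQID_TEMPLATE,
    source="TAIR10",
    gene="AT5G66900",
    isoform="AT5G66900.1",
    feature="CDS",
    n=1,
    complete="stitched",
)
# seqid == "Reference|TAIR10|AT5G66900|AT5G66900.1|CDS|1|stitched"
```

A custom template only needs to use the same placeholder names.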

generate_ref_gene_cds(ref_dir='ref', quiet=True, domain_name=None)[source]

Get and store self.genes’ gene and CDS sequences, and GFF data for domains.

$source: reference genome alias

$domain: PSSM ID or domain name (‘gene’ if not specified)

$n: if multiple domains are present, they will be numbered according to proximity to 5’ of sense strand

$feature: GFF3 feature type

$complete: ‘complete’ if sequence includes intervening sequences not of feature type $feature. ‘stitched’ if sequence is concatenated from features of feature type $feature.

$gene: gene/isoform ID

$range: range of sequence in gene

seq(minlen=None, minid=None, mincdslen=None, quiet=True, check_recip=None, relax_recip=None, check_id_before_merge=None)[source]

Identify targets and generate self.target file based on self’s attributes.

All arguments are optional. If not provided, the corresponding value stored in self’s attributes will be used.

If self.pssm_ids has been provided, either 1) self.rps_hits OR 2) self.db AND self.rpsblast are required.

>>> my_minorg = MINORg(directory = "/my/output/directory",
                       prefix = "test", tmp = False, keep_tmp = True, thread = 1)
>>> my_minorg.add_reference("/path/to/TAIR10_Chr.all.fasta", "/path/to/TAIR10_GFF3.genes.gff", alias = "TAIR10", replace = True)
>>> my_minorg.genes = ["AT5G66900"]
>>> my_minorg.subset_annotation()
>>> my_minorg.pssm_ids = "366375" # NB-ARC domain
>>> my_minorg.query_reference = True
>>> my_minorg.seq()
Exception: If self.pssm_ids has been provided, self.db (path to RPS-BLAST database) AND self.rpsblast (path to rpsblast or rpsblast+ executable OR command name if available at command line) are required
>>> my_minorg.db = "/path/to/rpsblast/database"
>>> my_minorg.rpsblast = "rpsblast+"
>>> my_minorg.seq()
>>> my_minorg.target
'/my/output/directory/test/test_366375_targets.fasta'

Parameters

minlen (int) – optional. See attributes.
minid (float) – optional. See attributes.
mincdslen (int) – optional. See attributes.
check_recip (bool) – optional. See attributes.
relax_recip (bool) – optional. See attributes.
check_id_before_merge (bool) – optional. See attributes.

grna()[source]: Generate all possible gRNA from self.target file based on self’s attributes: pam, length.
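
What "all possible gRNA" means for a given pam and length can be sketched with a regular expression. This is not MINORg's implementation: it assumes a 3’ PAM directly following the gRNA (as in the expanded pattern ‘.NGG’), expands IUPAC ambiguity codes, and searches both strands. The function name find_grna is hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_grna(seq, pam="NGG", length=20):
    """Return sorted unique gRNA candidates followed by a 3' PAM."""
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # The lookahead lets overlapping candidates be reported
    pattern = re.compile(f"(?=([ACGT]{{{length}}}){pam_regex})")
    candidates = set()
    for strand in (seq.upper(), reverse_complement(seq.upper())):
        candidates.update(match.group(1) for match in pattern.finditer(strand))
    return sorted(candidates)

# Short length used only to keep the example readable
hits = find_grna("ACGTACGGTTT", pam="NGG", length=5)
# hits == ["ACGTA"]
```

In the example, ACGTA is followed by CGG on the forward strand; the reverse strand contains no GG, so it contributes nothing.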

write_mask_report(fout)[source]

Write mask report to file.

Details masked regions as well as which sequence each region is identical to.

Parameters: fout (str) – required, path to output file

is_offtarget_pos(hsp, ifasta)[source]

Check whether gRNA hit is outside masked target regions.

Parameters

hsp (Biopython HSP) – required
ifasta (IndexedFasta) – required, subject of BLAST search

Returns

whether a HSP hit is outside of a masked region

Return type

bool

is_offtarget_aln(hsp, query_result, **kwargs)[source]

Check whether (potentially off-target) gRNA hit aligns too well.

Used in combination with is_offtarget_pos to determine if an off-target gRNA hit could be problematic.

Parameters

hsp (Biopython HSP) – required
query_result (Biopython QueryResult) – required
**kwargs – other arguments passed to MINORg._is_offtarget_aln()

Returns

whether a HSP hit meets the threshold for problematic off-target gRNA hit

Return type

bool

is_offtarget_pam(hsp, query_result, ifasta)[source]

Check for presence of PAM in close proximity and correct orientation to gRNA hit.

Parameters

hsp (Biopython HSP) – required
query_result (Biopython QueryResult) – required
ifasta (IndexedFasta) – required

Returns

whether a HSP hit is in close proximity and correct orientation to a PAM site

Return type

bool

is_offtarget_hit(hsp, query_result, ifasta)[source]

Check whether a gRNA hit aligns too well outside of masked target regions.

Determined by is_offtarget_pos AND is_offtarget_aln AND is_offtarget_pam for a given HSP, QueryResult, and FASTA file combination.

Parameters

hsp (Biopython HSP) – required
query_result (Biopython QueryResult) – required
ifasta (IndexedFasta) – required

Returns

whether a gRNA hit is off-target and problematic

Return type

bool
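
The way the three checks combine can be written out as a truth-table sketch. HitAssessment and is_problematic are hypothetical names; the booleans stand in for the results of is_offtarget_pos, is_offtarget_aln, and is_offtarget_pam. How ot_pamless interacts with the PAM check is an assumption here: when it is True, the absence of a PAM is simply ignored.

```python
from dataclasses import dataclass

@dataclass
class HitAssessment:
    outside_masked_region: bool  # cf. is_offtarget_pos
    aligns_too_well: bool        # cf. is_offtarget_aln
    pam_nearby: bool             # cf. is_offtarget_pam

def is_problematic(hit, ot_pamless=False):
    """Combine the three checks with AND (illustrative, not MINORg code)."""
    # Assumption: with ot_pamless=True, a missing PAM does not rescue the hit
    pam_ok = True if ot_pamless else hit.pam_nearby
    return hit.outside_masked_region and hit.aligns_too_well and pam_ok

on_target = HitAssessment(outside_masked_region=False, aligns_too_well=True, pam_nearby=True)
no_pam = HitAssessment(outside_masked_region=True, aligns_too_well=True, pam_nearby=False)
```

Under these assumptions, on_target is never flagged, while no_pam is flagged only when ot_pamless=True.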

filter_background(*other_mask_fnames, keep_blast_output=None, mask_reference=True)[source]

Set background filter check for candidate gRNAs.

Masks target sequences in all FASTA files to be screened for off-target, BLASTs all candidate gRNAs to those FASTA files, and assesses each BLAST hit individually for whether it could potentially be problematic.

Relevant attributes: screen_reference, ot_pamless, ot_mismatch, ot_gap

Relevant methods: add_background, remove_background

Parameters

*other_mask_fnames (str) – optional, paths to other FASTA files not in self.background that are also to be screened for off-target
keep_blast_output (bool) – retain BLAST output file. If not provided, defaults to self.keep_tmp. False deletes it.
mask_reference (bool) – mask reference genes (default=True)

filter_gc(gc_min=None, gc_max=None)[source]

Set GC check for candidate gRNA.

Parameters

gc_min (float) – optional. See attributes.
gc_max (float) – optional. See attributes.
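
The GC check amounts to computing the GC fraction of each gRNA and comparing it with gc_min and gc_max. A minimal sketch follows; the function names are hypothetical, the default bounds shown are placeholders rather than MINORg's defaults, and treating both bounds as inclusive is an assumption.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0 = no GC, 1 = all GC)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def passes_gc(seq, gc_min=0.3, gc_max=0.7):
    """True if GC content lies within [gc_min, gc_max] (placeholder bounds)."""
    return gc_min <= gc_content(seq) <= gc_max

balanced = passes_gc("GGCCAATT")   # GC content 0.5
gc_rich = passes_gc("GGGGCCCC")    # GC content 1.0
```

With the placeholder bounds, the first sequence passes and the second fails.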

align_reference(fout)[source]

Align reference genes.

CDS are first aligned. Full genes are then added to the alignment.

Parameters: fout (str) – required, path to output FASTA file

align_reference_and_targets(domain_name=None, realign_only_if_updated=True)[source]

Align reference genes and targets. Path to FASTA file generated will be stored in self.alignment.

Parameters: domain_name (str) – optional, domain name to be used when naming output file

filter_feature(max_insertion=None, min_within_n=None, min_within_fraction=None, alignment_rvs_pattern='^_R_$seqid$$')[source]

Set feature check for candidate gRNAs.

Range(s) of the desired feature in inferred homologue targets discovered by MINORg will be inferred from their alignment with reference genes, subject to self’s attributes: max_insertion

Parameters

max_insertion (int) – optional. See attributes.
min_within_n (int) – optional. See attributes.
min_within_fraction (float) – optional. See attributes.

filter_exclude()[source]

Set exclude check for candidate gRNAs. Overwrites existing exclude check using self.exclude.

If self.exclude is set, gRNA whose sequences appear in the file at self.exclude will fail this check.
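
The exclude check can be pictured as a set-membership test against the sequences in a FASTA file. The sketch below is not MINORg's implementation; read_fasta_seqs and exclude_check are hypothetical helpers, FASTA content is passed in as lines of text, and case-insensitive comparison is an assumption.

```python
def read_fasta_seqs(lines):
    """Collect the sequences (not the names) from FASTA-formatted lines."""
    seqs, current = set(), []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; store the previous sequence, if any
            if current:
                seqs.add("".join(current).upper())
                current = []
        elif line:
            current.append(line)
    if current:
        seqs.add("".join(current).upper())
    return seqs

def exclude_check(grna_seqs, excluded):
    """Map each gRNA sequence to True (pass) or False (fail)."""
    return {seq: seq.upper() not in excluded for seq in grna_seqs}

excluded = read_fasta_seqs([">old_1", "ACGTACGT", ">old_2", "ttttcccc"])
status = exclude_check(["ACGTACGT", "GGGGAAAA", "TTTTCCCC"], excluded)
```

Only GGGGAAAA passes, because the other two sequences appear in the excluded set.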

filter(background_check=True, feature_check=True, gc_check=True)[source]

Execute self.filter_background, self.filter_feature, and self.filter_gc based on self’s attributes.

Parameters

background_check (bool) – filter gRNA by off-target (default=True)
feature_check (bool) – filter gRNA by within-feature (default=True)
gc_check (bool) – filter gRNA by GC content (default=True)

minimumset(sets=None, manual=None, fasta=None, fout_fasta=None, fout_map=None, fout_eqv=None, report_full_path=True, all_checks=True, nonstandard_checks=[], exclude_check=True, gc_check=True, background_check=True, feature_check=True)[source]

Generate minimum set(s) of gRNA required to cover all targets.

Parameters

sets (int) – optional. See attributes.
manual (bool) – manually approve all gRNA sets. Defaults to NOT self.auto if not provided.
fasta (str) – optional, path to FASTA file. Used for renaming gRNA.
fout_fasta (str) – optional, path to output FASTA file. Autogenerated using self.prefix and self.active_directory if not provided.
fout_map (str) – optional, path to output .map file. Autogenerated using self.prefix and self.active_directory if not provided.
fout_eqv (str) – optional, path to output .eqv file of passing gRNA. Autogenerated using self.prefix and self.active_directory if not provided.
report_full_path (bool) – print full path to output files upon successful writing
all_checks (bool) – include all checks for filtering (including any in user-added columns)
nonstandard_checks (list) – non-standard checks to include for filtering (subject to self.accept_invalid_field)
exclude_check (bool) – include ‘exclude’ field for filtering (subject to self.accept_invalid_field)
gc_check (bool) – include ‘GC’ field for filtering (subject to self.accept_invalid_field)
background_check (bool) – include ‘background’ field for filtering (subject to self.accept_invalid_field)
feature_check (bool) – include ‘feature’ field for filtering (subject to self.accept_invalid_field)
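
The idea of a set of gRNA that covers all targets can be illustrated with a greedy set-cover heuristic. This is a swapped-in textbook technique shown for intuition only: the documentation above does not state which algorithm minimumset uses, the greedy approach does not guarantee a true minimum, and prioritisation by non-redundancy or position is not modelled. The function name and the gRNA/target names are hypothetical.

```python
def greedy_cover(coverage):
    """Pick gRNA greedily until every target is covered.

    coverage: dict mapping gRNA id -> set of target ids it covers.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Take the gRNA covering the most uncovered targets;
        # ties are broken by sorted gRNA id for reproducibility
        best = max(sorted(coverage), key=lambda g: len(coverage[g] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

coverage = {
    "gRNA_1": {"t1", "t2"},
    "gRNA_2": {"t2", "t3", "t4"},
    "gRNA_3": {"t1"},
    "gRNA_4": {"t4"},
}
chosen = greedy_cover(coverage)
# chosen == ["gRNA_2", "gRNA_1"]
```

Here gRNA_2 is picked first because it covers three targets, and gRNA_1 then covers the remaining target t1.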

full(manual=None, background_check=True, feature_check=True, gc_check=True)[source]

Execute full MINORg programme using self’s attributes as parameter values.

Parameters

manual (bool) – manually approve all gRNA sets. Defaults to NOT self.auto if not provided.
background_check (bool) – filter gRNA by off-target (default=True)
feature_check (bool) – filter gRNA by within-feature (default=True)
gc_check (bool) – filter gRNA by GC content (default=True)