minorg.MINORg module

minorg.MINORg.parse_lookup(iterable, lookup, return_first=False)[source]

Commas are not allowed in values.

minorg.MINORg.valid_readable_file(pathname)[source]
minorg.MINORg.valid_aliases(aliases, lookup: dict, raise_error: bool = True, message: Optional[str] = None, param: Optional[minorg.parse_config.Param] = None, none_value=None, all_value=None, clear_value=None, display_cmd=None, additional_message=None, return_mapping=False)[source]

Generates an appropriate error and message if any alias is invalid, and raises that error if requested (i.e. raise_error=True). Requires the alias(es) (a str for a single alias or an iterable of multiple aliases) and a lookup dictionary.
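The check described above can be sketched roughly as follows. This is an illustration only, not MINORg's implementation: the helper name check_aliases and its error message are hypothetical, and the real function accepts many more options.

```python
def check_aliases(aliases, lookup, raise_error=True):
    """Return the aliases missing from lookup; optionally raise if any are missing."""
    # Accept a single alias (str) or an iterable of aliases.
    if isinstance(aliases, str):
        aliases = [aliases]
    unknown = [alias for alias in aliases if alias not in lookup]
    if unknown and raise_error:
        raise ValueError("Invalid alias(es): " + ", ".join(unknown))
    return unknown


lookup = {"TAIR10": "/path/to/TAIR10.fasta"}
print(check_aliases("TAIR10", lookup))
print(check_aliases(["TAIR10", "araly2"], lookup, raise_error=False))
```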

class minorg.MINORg.PathHandler(tmp=False, keep_tmp=False, directory=None)[source]

Bases: object

Tracks new files/directories and temporary files.

tmp[source]

whether temporary directory is used

Type

bool

keep_tmp[source]

whether to retain temporary files/directories

Type

bool

out_dir[source]

absolute path to final output directory

Type

str

new_dirs[source]

paths of newly created directories

Type

list of str

tmp_files[source]

paths of temporary files/directories

Type

list of str

tmp_dir[source]

path of temporary directory

Type

str

directory[source]

path of directory being written into. If tmp=True, this will point to a temporary directory. Else, this will be the output directory.

Type

str

__init__(tmp=False, keep_tmp=False, directory=None)[source]

Create a PathHandler object.

Parameters
  • tmp (bool) – write files to temporary directory (to be deleted or moved to final output directory using resolve)

  • keep_tmp (bool) – retain temporary files/directories

  • directory (str) – final output directory (will be directly written into if tmp=False)

property tracebacklimit[source]
mkdir(*directories, tmp=False)[source]

Create directory/directories.

Parameters
  • *directories (str) – path

  • tmp (bool) – mark directory as temporary (for deletion when self.rm_tmpfiles is called)

mkfname(*path, tmp=False)[source]

Generate path.

If path provided is not absolute, self.active_directory is assumed to be the root directory.

Parameters
  • *path (str) – required, path to output file (e.g. self.mkfname('tmp', 'tmp.fasta') --> <self.active_directory>/tmp/tmp.fasta)

  • tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)

Returns

path

Return type

str
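As a rough sketch of the path rule stated above (relative paths are rooted at the active directory, absolute paths are left alone), assuming a hypothetical helper make_path that is not part of MINORg:

```python
import os


def make_path(active_directory, *path):
    """Join path parts; relative results are rooted at active_directory."""
    joined = os.path.join(*path)
    if os.path.isabs(joined):
        # Absolute paths are returned unchanged.
        return joined
    return os.path.join(active_directory, joined)


print(make_path("/my/output", "tmp", "tmp.fasta"))
```

The sketch ignores the tmp flag, which in PathHandler additionally marks the path for later deletion.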

reserve_fname(*path, tmp=False, newfile=False, **kwargs)[source]

Generate new file.

Operates exactly as PathHandler.mkfname(), with the additional options of creating an empty file or clearing an existing file.

Parameters
  • *path (str) – path to output file

  • tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)

  • newfile (bool) – clear an existing file at destination if it already exists

  • **kwargs – other arguments to pass to self.mkfname

Returns

path

Return type

str

rm_tmpfiles(*fnames)[source]

Delete all files and directories marked as temporary (in self.tmp_files).

Directories will only be deleted if they are empty.

Parameters

*fnames (str) – optional, path. If provided, only the specified files/directories will be deleted if they are also marked as temporary. Otherwise, all temporary files and directories are deleted.
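The deletion rules above (only tracked paths are touched, and directories are removed only when empty) might look like the following sketch. The function remove_tracked is hypothetical and only approximates the documented behaviour using the standard library.

```python
import os
import tempfile


def remove_tracked(tmp_files, *fnames):
    """Delete tracked temporary paths; directories only if they are empty."""
    # If specific names are given, restrict to those that are tracked.
    targets = [f for f in fnames if f in tmp_files] if fnames else list(tmp_files)
    # Longest paths first, so files go before their parent directories.
    for path in sorted(targets, key=len, reverse=True):
        if os.path.isfile(path):
            os.remove(path)
        elif os.path.isdir(path) and not os.listdir(path):
            os.rmdir(path)
        # Stop tracking anything that no longer exists.
        if not os.path.exists(path):
            tmp_files.remove(path)


root = tempfile.mkdtemp()
tmp_file = os.path.join(root, "tmp.fasta")
open(tmp_file, "w").close()
tracked = [root, tmp_file]
remove_tracked(tracked)
print(tracked)
```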

resolve()[source]

If self.tmp, move output files to final output directory.

class minorg.MINORg.MINORg(directory=None, config=None, prefix='minorg', thread=None, keep_tmp=False, cli=False, auto_update_files=True, **kwargs)[source]

Bases: minorg.MINORg.PathHandler

Tracks parameters, intermediate files, and output file for reuse.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/my/output/directory",
...                    prefix = "test", tmp = False, keep_tmp = True, thread = 1)
>>> my_minorg.add_reference("/path/to/TAIR10_Chr.all.fasta", "/path/to/TAIR10_GFF3.genes.gff", alias = "TAIR10", replace = True)
>>> my_minorg.add_reference("/path/to/Alyrata_384_v1.fa", "/path/to/Alyrata_384_v2.1.gene.gff3", alias = "araly2")
>>> my_minorg.genes = ["AT5G66900", "AL8G44500.v2.1"]
>>> my_minorg.subset_annotation()
>>> [y.get_attr("ID") for y in my_minorg.reference["TAIR10"].annotation.get_subfeatures(*my_minorg.genes)]
[['AT5G66900.1']]
>>> [y.get_attr("ID") for y in my_minorg.reference["araly2"].annotation.get_subfeatures(*my_minorg.genes)]
[['AL8G44500.t1.v2.1']]
>>> my_minorg.query_reference = True
>>> my_minorg.seq()
>>> my_minorg.grna()
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
>>> my_minorg.grna_hits.filter_seqs("background")
gRNAHits(gRNA = 352)
>>> my_minorg.filter_gc()
>>> my_minorg.grna_hits.filter_seqs("GC")
gRNAHits(gRNA = 370)
>>> my_minorg.grna_hits.filter_seqs("background", "GC")
gRNAHits(gRNA = 321)
>>> my_minorg.filter_feature() ## by default, MINORg only retains gRNA in CDS
Max acceptable insertion length: 15
>>> my_minorg.grna_hits.filter_hits("feature")
gRNAHits(gRNA = 344)
>>> my_minorg.valid_grna()
gRNAHits(gRNA = 278)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()
directory[source]

[general] final output directory

Type

str

auto_update_files[source]

[general] whether to update gRNA .map and FASTA files automatically after a subcommand is called

Type

bool

prefix[source]

[general] prefix for output directories and files

Type

str

thread[source]

[general] maximum number of threads for parallel processing

Type

int

blastn[source]

[executable] path to blastn executable

Type

str

rpsblast[source]

[executable] path to rpsblast or rpsblast+ executable

Type

str

mafft[source]

[executable] path to mafft executable

Type

str

bedtools[source]

[executable] path to directory containing BEDTools executables. Use ONLY IF BEDTools is not in your command-search path.

Type

str

db[source]

[RPS-BLAST option] path to local RPS-BLAST database

Type

str

remote_rps[source]

[RPS-BLAST option] use remote database instead of local database for RPS-BLAST

Type

bool

pssm_ids[source]

[seq] list of Pssm-Ids of domain(s) for homologue search. If multiple Pssm-Ids are provided, overlapping domains will be merged.

Type

list of str

domain_name[source]

human-readable domain name used in sequence and file names in place of Pssm-Ids

Type

str

genes[source]

[seq] list of target gene IDs

Type

list or str

query_reference[source]

[seq] include reference genes as targets

Type

bool

minlen[source]

[seq: homologue] minimum homologue length (bp)

Type

int

minid[source]

[seq: homologue] minimum hit % identity

Type

float

mincdslen[source]

[seq: homologue] minimum number of bases in homologue aligned with reference gene CDS

Type

int

merge_within[source]

[seq: homologue] maximum distance (bp) between hits for merging

Type

int

check_recip[source]

[seq: homologue] execute reciprocal check

Type

bool

relax_recip[source]

[seq: homologue] execute relaxed reciprocal check

Type

bool

check_id_before_merge[source]

[seq: homologue] filter out hits by % identity before merging into potential homologues

Type

bool

length[source]

[grna] gRNA length (bp)

Type

int

pam[source]

[grna] PAM pattern

Type

PAM

screen_reference[source]

[filter: background] include reference genome for screening

Type

bool

ot_pamless[source]

[filter: background] ignore absence of PAM when assessing off-target gRNA hits

Type

bool

offtarget[source]

[filter: background] function that accepts Biopython’s HSP and QueryResult objects and determines whether an off-target gRNA hit is problematic. If not set by the user, ot_pattern will be used. If neither offtarget nor ot_pattern is set, ot_mismatch and ot_gap will be used.

Type

func

ot_pattern[source]

[filter: background] pattern specifying maximum number of gap(s) and/or mismatch(es) within given range(s) of a gRNA in off-target regions to disqualify the gRNA

Type

OffTargetExpression

ot_unaligned_as_mismatch[source]

[filter: background] treat unaligned positions as mismatches (used with ot_pattern)

Type

bool

ot_unaligned_as_gap[source]

[filter: background] treat unaligned positions as gaps (specifically as insertions; used with ot_pattern)

Type

bool

ot_mismatch[source]

[filter: background] minimum number of mismatches allowed for off-target gRNA hits

Type

int

ot_gap[source]

[filter: background] minimum number of gaps allowed for off-target gRNA hits

Type

int

mask[source]

[filter: background] FASTA file of additional sequences to mask in background

Type

list

gc_min[source]

[filter: GC] minimum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)

Type

float

gc_max[source]

[filter: GC] maximum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)

Type

float

feature[source]

[filter: feature] GFF3 feature within which gRNA are to be designed (default=”CDS”)

Type

str

max_insertion[source]

[filter: feature] maximum allowable insertion size in feature (bp)

Type

int

min_within_n[source]

[filter: feature] minimum number of reference genes in which a gRNA must align within the target feature

Type

int

min_within_fraction[source]

[filter: feature] minimum fraction of reference genes in which a gRNA must align within the target feature (between 0 and 1, where 0 is none and 1 is all; if 0, min_within_n will be set to 1)

Type

float

exclude[source]

[filter: exclude] path to FASTA file containing gRNA sequences to exclude

Type

str

sets[source]

[minimumset] number of sets to generate

Type

int

auto[source]

[minimumset] generate sets without requiring manual user confirmation for each set

Type

bool

accept_invalid[source]

[minimumset] score ‘NA’ as ‘pass’

Type

bool

accept_feature_unknown[source]

[minimumset] score ‘NA’ as ‘pass’ for feature check

Type

bool

accept_invalid_field[source]

[minimumset] score ‘NA’ as ‘pass’ if all entries for a check are ‘NA’

Type

bool

pass_map[source]

[minimumset] path to output .map file for gRNA that pass all valid checks (autogenerated by MINORg if not provided)

Type

str

pass_fasta[source]

[minimumset] path to output .fasta file for gRNA that pass all valid checks (autogenerated by MINORg if not provided)

Type

str

final_map[source]

[minimumset] path to output .map file for final gRNA set(s) (autogenerated by MINORg if not provided)

Type

str

final_fasta[source]

[minimumset] path to output .fasta file for final gRNA set(s) (autogenerated by MINORg if not provided)

Type

str

__init__(directory=None, config=None, prefix='minorg', thread=None, keep_tmp=False, cli=False, auto_update_files=True, **kwargs)[source]

Create a MINORg object.

Parameters
  • directory (str) – path to output directory

  • config (str) – path to config.ini file

  • prefix (str) – prefix for output directories and files (default=’minorg’)

  • thread (int) – maximum number of threads for parallel processing

  • keep_tmp (bool) – retain temporary files

  • cli (bool) – whether MINORg is called at command line. Not currently used, but may be used in future to determine what to print.

  • auto_update_files (bool) – whether to update gRNA .map and FASTA files automatically after a subcommand is called

  • **kwargs – other arguments supplied to parent class PathHandler

Returns

a MINORg object

Return type

MINORg

property query_map[source]

Mapping of alias to FASTA for queries.

Getter

Returns list of (<alias>, </path/to/FASTA>)

Type

list

property reference[source]
property assemblies[source]

Reference assemblies

Getter

Returns {<alias>: </path/to/FASTA>}

Type

dict

property annotations[source]

Reference annotations

Getter

Returns {<alias>: </path/to/GFF3>}

Type

dict

property attr_mods[source]

Attribute modifications

Getter

Returns {<reference alias>: <dict of attribute modifications>}

Type

dict

property features[source]

Target features

Getter

Returns list of target feature(s)

Type

list

property passed_grna[source]

gRNA that have passed background, GC, and feature checks

Relevant attributes: accept_invalid, accept_invalid_field

Getter

Returns gRNA that have passed standard checks

Type

gRNAHits

property genes[source]

Target gene IDs

Getter

Returns list of target gene IDs (str)

Type

list

property domain_name[source]

Domain name for use in sequence IDs and file names

Getter

Returns domain name if set, else ‘gene’ if not MINORg.pssm_ids, else dash-separated MINORg.pssm_ids

Setter

Sets domain name

Type

str
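The getter's fallback order described above can be sketched as a plain function. The name resolve_domain_name is hypothetical; only the three-step fallback comes from the description.

```python
def resolve_domain_name(domain_name=None, pssm_ids=None):
    """Mirror the documented fallback: name, else 'gene', else joined Pssm-Ids."""
    if domain_name:
        return domain_name
    if not pssm_ids:
        return "gene"
    return "-".join(str(pssm_id) for pssm_id in pssm_ids)


print(resolve_domain_name())
print(resolve_domain_name(pssm_ids=["366375", "1234"]))
print(resolve_domain_name("NB-ARC", ["366375"]))
```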

property offtarget[source]

Function to assess off-target goodness of alignment

Getter

Returns function that accepts HSP and QueryResult

Type

func

property pam[source]

PAM pattern

Getter

Returns expanded PAM pattern (includes explicit position of gRNA; e.g. ‘.NGG’)

Setter

Sets PAM pattern and parses it into expanded PAM pattern

Type

str

property length[source]

gRNA length

Getter

Returns gRNA length (bp)

Setter

Sets gRNA length and updates MINORg.pam

Type

int

property feature[source]

Target feature

Getter

Returns first feature if multiple have been provided

Setter

Sets target feature

Type

str

property check_recip[source]

Whether to execute reciprocal check

Getter

Returns True if MINORg.check_recip=True OR MINORg.relax_recip=True

Setter

Sets this check

Type

bool
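A minimal sketch of a property with this getter behaviour, assuming a hypothetical stand-in class RecipSettings (the private attribute name is also an assumption):

```python
class RecipSettings:
    """Toy stand-in showing how relax_recip implies check_recip."""

    def __init__(self, check_recip=False, relax_recip=False):
        self._check_recip = check_recip
        self.relax_recip = relax_recip

    @property
    def check_recip(self):
        # The reciprocal check runs if either flag is set.
        return self._check_recip or self.relax_recip

    @check_recip.setter
    def check_recip(self, value):
        self._check_recip = bool(value)


settings = RecipSettings()
settings.relax_recip = True
print(settings.check_recip)
```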

property ot_pam[source]

Remove gRNA from candidates if off-target hit has PAM site nearby

Getter

Returns whether to consider presence of PAM site for off-target filtering

Type

bool

property ot_pattern[source]

Off-target mismatch/gap pattern

Getter

Returns OffTargetExpression object

Type

minorg.ot_regex.OffTargetExpression

property ot_unaligned_as_mismatch[source]

Treat unaligned positions as mismatch (used with ot_pattern)

Getter

Returns whether to count unaligned positions as mismatch

Type

bool

property ot_unaligned_as_gap[source]

Treat unaligned positions as gaps (specifically insertions; used with ot_pattern)

Getter

Returns whether to count unaligned positions as gaps

Type

bool

property prioritise_nr[source]

Prioritise non-redundancy for minimum set generation

Getter

Returns whether non-redundancy is prioritised

Type

bool

property prioritize_nr[source]

Prioritise non-redundancy for minimum set generation

Getter

Returns whether non-redundancy is prioritised

Type

bool

property prioritise_pos[source]

Prioritise position (i.e. 5’ gRNA) for minimum set generation

Getter

Returns whether position (i.e. 5’ gRNA) is prioritised

Type

bool

property prioritize_pos[source]

Prioritise position (i.e. 5’ gRNA) for minimum set generation

Getter

Returns whether position (i.e. 5’ gRNA) is prioritised

Type

bool

mkfname(*path, tmp=False, prefix=True)[source]

Generate new file name.

If path provided is not absolute, self.active_directory is assumed to be the root directory.

Parameters
  • *path (str) – required, path to output file (e.g. self.mkfname('tmp', 'tmp.fasta'))

  • tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)

  • prefix (bool) – prefix self.prefix to basename

results_fname(*path, reserve=False, newfile=False, **kwargs)[source]

Generate new file name in results directory (<self.active_directory>/<self.prefix>).

Operates exactly as MINORg.mkfname(), with the additional options of creating an empty file or clearing an existing file.

Parameters
  • *path (str) – path to output file

  • reserve (bool) – create empty file at destination if it does not exist

  • newfile (bool) – clear an existing file at destination if it already exists

  • **kwargs – other arguments supplied to MINORg.mkfname()

resolve() → None[source]

Wrapper for super().resolve() that also updates stored filenames. Removes logfile if not in CLI mode.

get_ref_seqid(seqid, attr)[source]

Extract attribute of sequence from sequence name (for sequences generated by self.target)

Parameters
  • seqid (str) – required, sequence name

  • attr (str) – required, attribute to return. Valid attributes: ‘gene’, ‘feature’, ‘source’.

Returns

str

valid_grna(*check_names, accept_invalid=None, accept_invalid_field=None)[source]

gRNA that have passed background, GC, and feature checks, as well as any user-set checks

Relevant attributes: accept_invalid, accept_invalid_field

Parameters

*check_names (str) –

optional, checks to include.

If not specified: all checks, both standard and non-standard, will be used.

If ‘standard’: only background, GC, and feature checks will be used. Can be used in combination with non-standard check names.

If ‘nonstandard’: only non-standard checks will be used. Can be used in combination with standard check names.

Returns

gRNAHits

check_names(seq=True, hit=True, standard=True, nonstandard=True) → list[source]

Get gRNA check names

Parameters
  • seq (bool) – get check names for gRNA sequences (gRNASeq objects)

  • hit (bool) – get check names for gRNA hits (gRNAHit objects)

  • standard (bool) – get standard check names

  • nonstandard (bool) – get nonstandard check names

Returns

List of check names (str)

Return type

list

add_query(fname, alias=None)[source]

Add file to be queried.

Parameters
  • fname (str) – required, path to file

  • alias (str) – optional, alias for query file

remove_query(alias)[source]

Remove file for querying.

Parameters

alias (str) – required, alias of query file to remove

add_background(fname, alias=None)[source]

Add file for background filter.

Parameters
  • fname (str) – required, path to file

  • alias (str) – optional, alias for background file. Used when writing mask report and for removing background.

remove_background(alias)[source]

Remove file for background filter.

Parameters

alias (str) – required, alias of background file to remove

parse_grna_map_from_file(fname)[source]

Read candidate gRNA from .map file output by MINORg.

Parameters

fname (str) – required, path to MINORg .map file

rename_grna(fasta)[source]

Rename candidate gRNA according to FASTA file.

Parameters

fasta (str) – required, path to FASTA file

write_all_grna_fasta(fout=None)[source]

Write FASTA file of all candidate gRNA.

Parameters

fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_all_grna_map(fout=None, write_checks=True)[source]

Write .map file of all candidate gRNA.

Parameters
  • fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

  • write_checks (bool) – write all check statuses

write_all_grna_eqv(fout=None)[source]

Write .eqv file (grouping gRNA by equivalent coverage) of all candidate gRNA.

Parameters

fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_fasta(fout=None)[source]

Write FASTA file of gRNA that pass all valid checks.

Parameters

fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_map(fout=None)[source]

Write .map file of gRNA that pass all valid checks.

Parameters

fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_eqv(fout=None)[source]

Write .eqv file (grouping gRNA by equivalent coverage) of gRNA that pass all valid checks.

Parameters

fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.

write_pass_grna_files(fasta=None, map=None, eqv=None)[source]
add_reference(fasta, annotation, alias=None, genetic_code=1, attr_mod={}, memsave=True, replace=False)[source]

Add reference genome.

Parameters
  • fasta (str) – required, path to fasta file

  • annotation (str) – required, path to GFF3 file

  • alias (str) – optional, name of fasta-annotation combo

  • genetic_code (str or int) – NCBI genetic code (default=1)

  • attr_mod (dict) – mapping for non-standard attribute field names in GFF3 file (default={})

  • replace (bool) – replace any existing reference with same alias (default=False)

remove_reference(alias)[source]

Remove reference genome.

Parameters

alias (str) – required, alias of reference genome

clear_reference()[source]

Remove all reference genomes.

subset_annotation(quiet=True, preserve_order=True)[source]

Subset annotations of all reference genomes according to self.genes.

Reduces annotation lookup time.

Parameters
  • quiet (bool) – silence printing of non-essential messages

  • sort (bool) – sort subset data

extend_reference(ext_gene, ext_cds)[source]

Extend reference genomes from FASTA files of gene and CDS sequences.

Sequence IDs of gene sequences in ext_gene file(s) should not be repeated and should not contain ‘|’. Sequences in ext_cds file(s) should be named <sequence ID in ext_gene file(s)>.<int> to indicate their parent gene. Corresponding gene and CDS sequences will be aligned with mafft for inference of CDS range in genes.

Parameters
  • ext_gene (str or list) – required, path to file or list of paths to files of gene sequences

  • ext_cds (str or list) – required, path to file or list of paths to files of CDS sequences
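The naming rules above can be checked before calling extend_reference. The following is a sketch under stated assumptions: check_ext_ids is a hypothetical helper, it works on lists of sequence IDs already read from the FASTA files, and it is not part of MINORg.

```python
import re


def check_ext_ids(gene_ids, cds_ids):
    """Return a list of problems found in gene and CDS sequence IDs."""
    problems = []
    seen = set()
    for gene_id in gene_ids:
        if "|" in gene_id:
            problems.append(f"gene ID contains '|': {gene_id}")
        if gene_id in seen:
            problems.append(f"repeated gene ID: {gene_id}")
        seen.add(gene_id)
    for cds_id in cds_ids:
        # Expect <gene ID>.<int>; the greedy group splits on the final '.'.
        match = re.fullmatch(r"(.+)\.(\d+)", cds_id)
        if match is None or match.group(1) not in seen:
            problems.append(f"CDS ID lacks a parent gene: {cds_id}")
    return problems


print(check_ext_ids(["geneA", "geneB"], ["geneA.1", "geneA.2", "geneC.1"]))
```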

get_reference_seq(*features, adj_dir=True, by_gene=False, ref=None, seqid_template='Reference|$source|$gene|$isoform|$feature|$n|$complete', translate=False, isoform_lvl=None, fout=None, complete=False, **fouts)[source]

Get sequence(s) of reference genes or isoforms.

If self.pssm_ids is provided, sequences are restricted to the relevant domain(s).

Parameters
  • *features (str) – optional, GFF3 feature type(s) to retrieve. If not specified, sequence of the gene or isoform will be retrieved directly.

  • ref (str) – optional, alias of reference genome from which to extract sequences. If not specified, all reference genomes will be searched for genes.

  • seqid_template (str) –

    optional, template for output sequence name. Template will be parsed by string.Template. The default template is “Reference|$source|$gene|$isoform|$feature|$n|$complete”.

    • $source: reference genome alias

    • $gene: gene/isoform ID

    • $isoform: isoform ID if by_gene = False, else same as $gene

    • $feature: GFF3 feature type

    • $n: if multiple domains are present, they will be numbered according to proximity to 5’ of sense strand

    • $complete: ‘complete’ if complete=True else ‘stitched’

  • adj_dir (bool) – output sense strand

  • by_gene (bool) – merge sequences from all isoforms of a gene

  • isoform_lvl (bool) – if GFF features ‘mRNA’ or ‘protein’ specified, sequences will be separated by mRNA/protein isoforms

  • complete (bool) – merge output range(s) together into single range. Output will still be a list of tuples. (e.g. [(<smallest start>, <largest end>)])

  • translate (bool) – translate sequence. Should be used with adj_dir.

  • fout (str) – optional, path to output file if features is not specified.

  • **fouts (str) – optional, path to output files for each feature if features is specified. Example: self.get_reference_seq("CDS", CDS = <path to file>) returns dict AND writes to file

Returns

sequence(s) of reference genes or isoforms (specified in self.genes), grouped by feature.

Format: {<feature>: {<seqid>: <seq>}}

Return type

dict
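To show how the default seqid_template expands, here is a short sketch using Python's string.Template directly. The field values are illustrative placeholders, not output from MINORg.

```python
from string import Template

template = Template("Reference|$source|$gene|$isoform|$feature|$n|$complete")

# Placeholder values chosen for illustration only.
seqid = template.substitute(
    source="TAIR10",
    gene="AT5G66900",
    isoform="AT5G66900.1",
    feature="CDS",
    n=1,
    complete="stitched",
)
print(seqid)
```

A custom template passed as seqid_template would presumably be filled in the same way, using whichever of these placeholders it contains.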

generate_ref_gene_cds(ref_dir='ref', quiet=True, domain_name=None)[source]

Get and store self.genes’ gene and CDS sequences, and GFF data for domains.

The default sequence ID template is: “Reference|$source|$domain|$n|$feature|$complete|$gene|$range”.

  • $source: reference genome alias

  • $domain: PSSM ID or domain name (‘gene’ if not specified)

  • $n: if multiple domains are present, they will be numbered according to proximity to 5’ of sense strand

  • $feature: GFF3 feature type

  • $complete: ‘complete’ if sequence includes intervening sequences not of feature type $feature. ‘stitched’ if sequence is concatenated from features of feature type $feature.

  • $gene: gene/isoform ID

  • $range: range of sequence in gene

seq(minlen=None, minid=None, mincdslen=None, quiet=True, check_recip=None, relax_recip=None, check_id_before_merge=None)[source]

Identify targets and generate self.target file based on self’s attributes.

All arguments are optional. If not provided, the corresponding value stored in self’s attributes will be used.

If self.pssm_ids has been provided, either 1) self.rps_hits OR 2) self.db AND self.rpsblast are required.

>>> my_minorg = MINORg(directory = "/my/output/directory",
...                    prefix = "test", tmp = False, keep_tmp = True, thread = 1)
>>> my_minorg.add_reference("/path/to/TAIR10_Chr.all.fasta", "/path/to/TAIR10_GFF3.genes.gff", alias = "TAIR10", replace = True)
>>> my_minorg.genes = ["AT5G66900"]
>>> my_minorg.subset_annotation()
>>> my_minorg.pssm_ids = "366375" # NB-ARC domain
>>> my_minorg.query_reference = True
>>> my_minorg.seq()
Exception: If self.pssm_ids has been provided, self.db (path to RPS-BLAST database) AND self.rpsblast (path to rpsblast or rpsblast+ executable OR command name if available at command line) are required
>>> my_minorg.db = "/path/to/rpsblast/database"
>>> my_minorg.rpsblast = "rpsblast+"
>>> my_minorg.seq()
>>> my_minorg.target
'/my/output/directory/test/test_366375_targets.fasta'
Parameters
  • minlen (int) – optional. See attributes.

  • minid (float) – optional. See attributes.

  • mincdslen (int) – optional. See attributes.

  • check_recip (bool) – optional. See attributes.

  • relax_recip (bool) – optional. See attributes.

  • check_id_before_merge (bool) – optional. See attributes.

grna()[source]

Generate all possible gRNA from self.target file based on self’s attributes: pam, length.
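As a conceptual illustration of PAM-based gRNA discovery, the sketch below scans one strand for 20 bp sites immediately 5' of an NGG PAM. It is not MINORg's implementation: the function find_grna is hypothetical, the PAM is hard-coded, and the reverse strand is ignored.

```python
import re


def find_grna(seq, length=20):
    """Return (start, gRNA) for each site followed directly by an NGG PAM."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


target = "ACGTACGTACGTACGTACGTTGGAAA"
print(find_grna(target))
```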

write_mask_report(fout)[source]

Write mask report to file.

Details masked regions as well as which sequence each region is identical to.

Parameters

fout (str) – required, path to output file

is_offtarget_pos(hsp, ifasta)[source]

Check whether gRNA hit is outside masked target regions.

Parameters
  • hsp (Biopython HSP) – required

  • ifasta (IndexedFasta) – required, subject of BLAST search

Returns

whether a HSP hit is outside of a masked region

Return type

bool

is_offtarget_aln(hsp, query_result, **kwargs)[source]

Check whether (potentially off-target) gRNA hit aligns too well.

Used in combination with is_offtarget_pos to determine if an off-target gRNA hit could be problematic.

Parameters
  • hsp (Biopython HSP) – required

  • query_result (Biopython QueryResult) – required

  • **kwargs – other arguments passed to MINORg._is_offtarget_aln()

Returns

whether a HSP hit meets the threshold for problematic off-target gRNA hit

Return type

bool

is_offtarget_pam(hsp, query_result, ifasta)[source]

Check for presence of PAM in close proximity and correct orientation to gRNA hit.

Parameters
  • hsp (Biopython HSP) – required

  • query_result (Biopython QueryResult) – required

  • ifasta (IndexedFasta) – required

Returns

whether a HSP hit is in close proximity and correct orientation to a PAM site

Return type

bool

is_offtarget_hit(hsp, query_result, ifasta)[source]

Check whether a gRNA hit aligns too well outside of masked target regions.

Determined by is_offtarget_pos AND is_offtarget_aln AND is_offtarget_pam for a given HSP, QueryResult, and FASTA file combination.

Parameters
  • hsp (Biopython HSP) – required

  • query_result (Biopython QueryResult) – required

  • ifasta (IndexedFasta) – required

Returns

whether a gRNA hit is off-target and problematic

Return type

bool
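The AND combination described for is_offtarget_hit can be sketched with plain predicates. Everything here is a hypothetical stand-in: hits are dictionaries rather than Biopython HSP objects, and the mismatch cutoff is an arbitrary illustrative value.

```python
def is_problematic_hit(hit, checks):
    """True only if every check flags the hit as problematic."""
    return all(check(hit) for check in checks)


# Hypothetical stand-ins for the position, alignment, and PAM checks.
checks = (
    lambda hit: not hit["in_masked_region"],  # outside masked target regions
    lambda hit: hit["mismatches"] <= 1,       # aligns too well (illustrative cutoff)
    lambda hit: hit["pam_nearby"],            # PAM nearby in the correct orientation
)

hit = {"in_masked_region": False, "mismatches": 0, "pam_nearby": True}
print(is_problematic_hit(hit, checks))
```

Because the checks are combined with AND, a hit inside a masked region or without a nearby PAM is not flagged in this sketch, however well it aligns.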

filter_background(*other_mask_fnames, keep_blast_output=None, mask_reference=True)[source]

Set background filter check for candidate gRNAs.

Masks target sequences in all FASTA files to be screened for off-targets, BLASTs all candidate gRNAs against those FASTA files, and assesses each BLAST hit individually for whether it could be problematic.

Relevant attributes: screen_reference, ot_pamless, ot_mismatch, ot_gap

Relevant methods: add_background, remove_background

Parameters
  • *other_mask_fnames (str) – optional, paths to other FASTA files not in self.background that are also to be screened for off-target

  • keep_blast_output (bool) – retain BLAST output file. If not provided, defaults to self.keep_tmp. False deletes it.

  • mask_reference (bool) – mask reference genes (default=True)

filter_gc(gc_min=None, gc_max=None)[source]

Set GC check for candidate gRNA.

Parameters
  • gc_min (float) – optional. See attributes.

  • gc_max (float) – optional. See attributes.
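For reference, GC content as a fraction between 0 and 1 can be computed as in this sketch. The helper names and the 0.3 and 0.7 thresholds are illustrative assumptions, as is treating both bounds as inclusive.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc(seq, gc_min=0.3, gc_max=0.7):
    """True if GC content lies within [gc_min, gc_max] (bounds inclusive here)."""
    return gc_min <= gc_fraction(seq) <= gc_max


print(gc_fraction("ATGC"), passes_gc("ATGC"), passes_gc("AAAA"))
```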

align_reference(fout)[source]

Align reference genes.

CDS are first aligned. Full genes are then added to the alignment.

Parameters

fout (str) – required, path to output FASTA file

align_reference_and_targets(domain_name=None, realign_only_if_updated=True)[source]

Align reference genes and targets. Path to FASTA file generated will be stored in self.alignment.

Parameters

domain_name (str) – optional, domain name to be used when naming output file

filter_feature(max_insertion=None, min_within_n=None, min_within_fraction=None, alignment_rvs_pattern='^_R_$seqid$$')[source]

Set feature check for candidate gRNAs.

Range(s) of the desired feature in homologue targets discovered by MINORg will be inferred from their alignment with reference genes, subject to self’s attribute max_insertion.

Parameters
  • max_insertion (int) – optional. See attributes.

  • min_within_n (int) – optional. See attributes.

  • min_within_fraction (float) – optional. See attributes.

filter_exclude()[source]

Set exclude check for candidate gRNAs. Overwrites existing exclude check using self.exclude.

If self.exclude is set, gRNA whose sequences appear in the file at self.exclude will fail this check.

filter(background_check=True, feature_check=True, gc_check=True)[source]

Execute self.filter_background, self.filter_feature, and self.filter_gc based on self’s attributes.

Parameters
  • background_check (bool) – filter gRNA by off-target (default=True)

  • feature_check (bool) – filter gRNA by within-feature (default=True)

  • gc_check (bool) – filter gRNA by GC content (default=True)

minimumset(sets=None, manual=None, fasta=None, fout_fasta=None, fout_map=None, fout_eqv=None, report_full_path=True, all_checks=True, nonstandard_checks=[], exclude_check=True, gc_check=True, background_check=True, feature_check=True)[source]

Generate minimum set(s) of gRNA required to cover all targets.

Parameters
  • sets (int) – optional. See attributes.

  • manual (bool) – manually approve all gRNA sets. Defaults to NOT self.auto if not provided.

  • fasta (str) – optional, path to FASTA file. Used for renaming gRNA.

  • fout_fasta (str) – optional, path to output FASTA file. Autogenerated using self.prefix and self.active_directory if not provided.

  • fout_map (str) – optional, path to output .map file. Autogenerated using self.prefix and self.active_directory if not provided.

  • fout_eqv (str) – optional, path to output .eqv file of passing gRNA. Autogenerated using self.prefix and self.active_directory if not provided.

  • report_full_path (bool) – print full path to output files upon successful writing

  • all_checks (bool) – include all checks for filtering (including any in user-added columns)

  • nonstandard_checks (list) – non-standard checks to include for filtering (subject to self.accept_invalid_field)

  • exclude_check (bool) – include ‘exclude’ field for filtering (subject to self.accept_invalid_field)

  • gc_check (bool) – include ‘GC’ field for filtering (subject to self.accept_invalid_field)

  • background_check (bool) – include ‘background’ field for filtering (subject to self.accept_invalid_field)

  • feature_check (bool) – include ‘feature’ field for filtering (subject to self.accept_invalid_field)
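The idea of covering all targets with as few gRNA as possible is a set cover problem. The sketch below uses a simple greedy heuristic to illustrate it. It is not MINORg's algorithm, it does not guarantee a true minimum, and the function greedy_cover and its input format are hypothetical.

```python
def greedy_cover(coverage):
    """coverage: {gRNA name: set of targets}. Returns a list of chosen gRNA names."""
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Pick the gRNA covering the most still-uncovered targets
        # (ties go to the alphabetically first name for reproducibility).
        best = max(sorted(coverage), key=lambda g: len(coverage[g] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


coverage = {
    "gRNA_1": {"targetA", "targetB"},
    "gRNA_2": {"targetB", "targetC"},
    "gRNA_3": {"targetC"},
}
print(greedy_cover(coverage))
```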

full(manual=None, background_check=True, feature_check=True, gc_check=True)[source]

Execute full MINORg programme using self’s attributes as parameter values.

Parameters
  • manual (bool) – manually approve all gRNA sets. Defaults to NOT self.auto if not provided.

  • background_check (bool) – filter gRNA by off-target (default=True)

  • feature_check (bool) – filter gRNA by within-feature (default=True)

  • gc_check (bool) – filter gRNA by GC content (default=True)