minorg.MINORg module
- minorg.MINORg.parse_lookup(iterable, lookup, return_first=False)[source]
commas not allowed in values
- minorg.MINORg.valid_aliases(aliases, lookup: dict, raise_error: bool = True, message: Optional[str] = None, param: Optional[minorg.parse_config.Param] = None, none_value=None, all_value=None, clear_value=None, display_cmd=None, additional_message=None, return_mapping=False)[source]
Generates an appropriate error and message if any alias is invalid, and raises that error if requested (i.e. raise_error=True). Requires alias(es) (a str for a single alias or an iterable of multiple aliases) and a lookup dictionary.
- class minorg.MINORg.PathHandler(tmp=False, keep_tmp=False, directory=None)[source]
Bases: object
Tracks new files/directories and temporary files.
- directory[source]
path of directory being written into. If tmp=True, this will point to a temporary directory. Else, this will be the output directory.
- Type
str
- __init__(tmp=False, keep_tmp=False, directory=None)[source]
Create a PathHandler object.
- Parameters
tmp (bool) – write files to temporary directory (to be deleted or moved to final output directory using resolve)
keep_tmp (bool) – retain temporary files/directories
directory (str) – final output directory (will be directly written into if tmp=False)
- mkdir(*directories, tmp=False)[source]
Create directory/directories.
- Parameters
*directories (str) – path
tmp (bool) – mark directory as temporary (for deletion when self.rm_tmpfiles is called)
- mkfname(*path, tmp=False)[source]
Generate path.
If path provided is not absolute, self.active_directory is assumed to be the root directory.
- Parameters
*path (str) – required, path to output file (e.g. self.mkfname(‘tmp’, ‘tmp.fasta’) –> <self.active_directory>/tmp/tmp.fasta)
tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)
- Returns
path
- Return type
str
- reserve_fname(*path, tmp=False, newfile=False, **kwargs)[source]
Generate new file.
Operates exactly as PathHandler.mkfname(), with the additional options of creating an empty file or clearing an existing file.
- Parameters
*path (str) – path to output file
tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)
newfile (bool) – clear an existing file at destination if it already exists
**kwargs – other arguments to pass to self.mkfname
- Returns
path
- Return type
str
- rm_tmpfiles(*fnames)[source]
Delete all files and directories marked as temporary (in self.tmp_files).
Directories will only be deleted if they are empty.
- Parameters
*fnames (str) – optional, path. If provided, only the specified files/directories will be deleted if they are also marked as temporary. Otherwise, all temporary files and directories are deleted.
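The temporary-file bookkeeping described for PathHandler can be pictured with a minimal, self-contained sketch. `TinyPathTracker` and its internals are hypothetical and are not MINORg's implementation; the sketch only mirrors the documented behaviour that marked files are deleted and marked directories are removed only if they are empty.

```python
import os
import tempfile


class TinyPathTracker:
    """Hypothetical stand-in for illustration; not MINORg's PathHandler."""

    def __init__(self, directory):
        self.directory = directory
        self.tmp_files = set()  # paths marked as temporary

    def mkfname(self, *path, tmp=False):
        # Relative paths are resolved against the tracked directory;
        # os.path.join keeps an absolute path unchanged.
        fname = os.path.join(self.directory, *path)
        if tmp:
            self.tmp_files.add(fname)
        return fname

    def rm_tmpfiles(self, *fnames):
        # No arguments: every marked path is a candidate.
        # Otherwise: only requested paths that are also marked.
        targets = set(fnames) & self.tmp_files if fnames else set(self.tmp_files)
        # Longest paths first, so files go before their parent directories.
        for fname in sorted(targets, key=len, reverse=True):
            if os.path.isdir(fname):
                if os.listdir(fname):
                    continue  # non-empty directory: leave it in place
                os.rmdir(fname)
            elif os.path.exists(fname):
                os.remove(fname)
            self.tmp_files.discard(fname)


with tempfile.TemporaryDirectory() as root:
    tracker = TinyPathTracker(root)
    scratch = tracker.mkfname("scratch.fasta", tmp=True)
    keep = tracker.mkfname("final.fasta")
    for fname in (scratch, keep):
        open(fname, "w").close()
    tracker.rm_tmpfiles()
    remaining = sorted(os.listdir(root))
```

Only the file marked with tmp=True is removed; the unmarked file is left in the output directory.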
- class minorg.MINORg.MINORg(directory=None, config=None, prefix='minorg', thread=None, keep_tmp=False, cli=False, auto_update_files=True, **kwargs)[source]
Bases: minorg.MINORg.PathHandler
Tracks parameters, intermediate files, and output file for reuse.
>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/my/output/directory", prefix = "test", tmp = False, keep_tmp = True, thread = 1)
>>> my_minorg.add_reference("/path/to/TAIR10_Chr.all.fasta", "/path/to/TAIR10_GFF3.genes.gff", alias = "TAIR10", replace = True)
>>> my_minorg.add_reference("/path/to/Alyrata_384_v1.fa", "/path/to/Alyrata_384_v2.1.gene.gff3", alias = "araly2")
>>> my_minorg.genes = ["AT5G66900", "AL8G44500.v2.1"]
>>> my_minorg.subset_annotation()
>>> [y.get_attr("ID") for y in my_minorg.reference["TAIR10"].annotation.get_subfeatures(*my_minorg.genes)]
[['AT5G66900.1']]
>>> [y.get_attr("ID") for y in my_minorg.reference["araly2"].annotation.get_subfeatures(*my_minorg.genes)]
[['AL8G44500.t1.v2.1']]
>>> my_minorg.query_reference = True
>>> my_minorg.seq()
>>> my_minorg.grna()
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
>>> my_minorg.grna_hits.filter_seqs("background")
gRNAHits(gRNA = 352)
>>> my_minorg.filter_gc()
>>> my_minorg.grna_hits.filter_seqs("GC")
gRNAHits(gRNA = 370)
>>> my_minorg.grna_hits.filter_seqs("background", "GC")
gRNAHits(gRNA = 321)
>>> my_minorg.filter_feature() ## by default, MINORg only retains gRNA in CDS
Max acceptable insertion length: 15
>>> my_minorg.grna_hits.filter_hits("feature")
gRNAHits(gRNA = 344)
>>> my_minorg.valid_grna()
gRNAHits(gRNA = 278)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()
- auto_update_files[source]
[general] whether to update gRNA .map and FASTA files automatically after a subcommand is called
- Type
bool
- bedtools[source]
[executable] path to directory containing BEDTools executables. Use ONLY IF BEDTools is not in your command-search path.
- Type
str
- remote_rps[source]
[RPS-BLAST option] use remote database instead of local database for RPS-BLAST
- Type
bool
- pssm_ids[source]
[seq] list of Pssm-Ids of domain(s) for homologue search. If multiple Pssm-Ids are provided, overlapping domains will be merged.
- Type
list of str
- domain_name[source]
human-readable domain name used in sequence and file names in place of Pssm-Ids
- Type
str
- mincdslen[source]
[seq: homologue] minimum number of bases in homologue aligned with reference gene CDS
- Type
int
- check_id_before_merge[source]
[seq: homologue] filter out hits by % identity before merging into potential homologues
- Type
bool
- ot_pamless[source]
[filter: background] ignore absence of PAM when assessing off-target gRNA hits
- Type
bool
- offtarget[source]
[filter: background] function that accepts Biopython’s HSP and QueryResult objects and determines whether an off-target gRNA hit is problematic. If not set by user, ot_pattern will be used. If neither offtarget nor ot_pattern is set, ot_mismatch and ot_gap will be used.
- Type
func
- ot_pattern[source]
[filter: background] pattern specifying maximum number of gap(s) and/or mismatch(es) within given range(s) of a gRNA in off-target regions to disqualify the gRNA
- Type
OffTargetExpression
- ot_unaligned_as_mismatch[source]
[filter: background] treat unaligned positions as mismatches (used with ot_pattern)
- Type
bool
- ot_unaligned_as_gap[source]
[filter: background] treat unaligned positions as gaps (specifically as insertions; used with ot_pattern)
- Type
bool
- ot_mismatch[source]
[filter: background] minimum number of mismatches allowed for off-target gRNA hits
- Type
int
- ot_gap[source]
[filter: background] minimum number of gaps allowed for off-target gRNA hits
- Type
int
- mask[source]
[filter: background] FASTA file of additional sequences to mask in background
- Type
list
- gc_min[source]
[filter: GC] minimum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)
- Type
float
- gc_max[source]
[filter: GC] maximum GC content (between 0 and 1, where 0 is no GC content and 1 is all GC)
- Type
float
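As a rough illustration of how gc_min and gc_max bound a gRNA's GC content, the following sketch computes the GC fraction of a sequence and tests it against both limits. The function names, the inclusive comparison, and the threshold values are assumptions made for this example and do not describe MINORg's defaults or internals.

```python
def gc_content(seq):
    """Fraction of G/C bases in seq, between 0 and 1 (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc(seq, gc_min=0.3, gc_max=0.8):
    # Placeholder thresholds chosen for the example only; bounds treated as inclusive.
    return gc_min <= gc_content(seq) <= gc_max


all_gc = gc_content("GGCC")
half_gc = gc_content("ATGC")
at_only_ok = passes_gc("ATATATATAT")
mixed_ok = passes_gc("ATGCATGCAT")
```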
- feature[source]
[filter: feature] GFF3 feature within which gRNA are to be designed (default=”CDS”)
- Type
str
- min_within_n[source]
[filter: feature] minimum number of reference genes within whose feature a gRNA must align
- Type
int
- min_within_fraction[source]
[filter: feature] minimum fraction of reference genes within whose feature a gRNA must align (between 0 and 1, where 0 is none and 1 is all; if 0, min_within_n will be set to 1)
- Type
float
- auto[source]
[minimumset] generate sets without requiring manual user confirmation for each set
- Type
bool
- accept_invalid_field[source]
[minimumset] score ‘NA’ as ‘pass’ if all entries for a check are ‘NA’
- Type
bool
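The accept_invalid_field rule (an all-‘NA’ check is scored as ‘pass’) can be sketched as below. The function name and the 'pass' / 'fail' / 'NA' status strings are assumptions for illustration; the sketch does not reflect how MINORg stores check results internally.

```python
def effective_statuses(statuses, accept_invalid_field=True):
    """Return the statuses of one check across gRNA, after the all-'NA' rule.

    statuses: list of 'pass' / 'fail' / 'NA' (assumed labels).
    """
    all_na = bool(statuses) and all(s == "NA" for s in statuses)
    if accept_invalid_field and all_na:
        # The check was never meaningfully set, so it does not block any gRNA.
        return ["pass"] * len(statuses)
    return list(statuses)


unset_check = effective_statuses(["NA", "NA", "NA"])
partial_check = effective_statuses(["NA", "pass", "fail"])
strict_check = effective_statuses(["NA", "NA"], accept_invalid_field=False)
```

A check with a mix of 'NA' and real results is left untouched; only a check that is 'NA' for every entry is promoted.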
- pass_map[source]
[minimumset] path to output .map file for gRNA that pass all valid checks (autogenerated by MINORg if not provided)
- Type
str
- pass_fasta[source]
[minimumset] path to output .fasta file for gRNA that pass all valid checks (autogenerated by MINORg if not provided)
- Type
str
- final_map[source]
[minimumset] path to output .map file for final gRNA set(s) (autogenerated by MINORg if not provided)
- Type
str
- final_fasta[source]
[minimumset] path to output .fasta file for final gRNA set(s) (autogenerated by MINORg if not provided)
- Type
str
- __init__(directory=None, config=None, prefix='minorg', thread=None, keep_tmp=False, cli=False, auto_update_files=True, **kwargs)[source]
Create a MINORg object.
- Parameters
directory (str) – path to output directory
config (str) – path to config.ini file
prefix (str) – prefix for output directories and files (default=’minorg’)
thread (int) – maximum number of threads for parallel processing
keep_tmp (bool) – retain temporary files
cli (bool) – whether MINORg is called at command line. Not currently used, but may be used in future to determine what to print.
auto_update_files (bool) – whether to update gRNA .map and FASTA files automatically after a subcommand is called
**kwargs – other arguments supplied to parent class PathHandler
- Returns
a MINORg object
- Return type
minorg.MINORg.MINORg
- property query_map[source]
Mapping of alias to FASTA for queries.
- Getter
Returns list of (<alias>, </path/to/FASTA>)
- Type
list
- property assemblies[source]
Reference assemblies
- Getter
Returns {<alias>: </path/to/FASTA>}
- Type
dict
- property annotations[source]
Reference annotations
- Getter
Returns {<alias>: </path/to/GFF3>}
- Type
dict
- property attr_mods[source]
Attribute modifications
- Getter
Returns {<reference alias>: <dict of attribute modifications>}
- Type
dict
- property passed_grna[source]
gRNA that have passed background, GC, and feature checks
Relevant attributes: accept_invalid, accept_invalid_field
- Getter
Returns gRNA that have passed standard checks
- Type
- property domain_name[source]
Domain name for use in sequence IDs and file names
- Getter
Returns the domain name if set; otherwise ‘gene’ if MINORg.pssm_ids is not set; otherwise the dash-separated MINORg.pssm_ids
- Setter
Sets domain name
- Type
str
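The getter's fallback order for domain_name can be sketched as a small standalone function. `resolve_domain_name` is a hypothetical helper written for this illustration, and the second Pssm-Id below is a placeholder value, not a real domain.

```python
def resolve_domain_name(domain_name=None, pssm_ids=()):
    """Mirror the documented fallback: explicit name, then 'gene', then joined IDs."""
    if domain_name:
        return domain_name
    if not pssm_ids:
        return "gene"
    return "-".join(str(pssm_id) for pssm_id in pssm_ids)


explicit = resolve_domain_name("NB-ARC", ["366375"])
no_domain = resolve_domain_name()
# "123456" is a made-up placeholder ID used only to show the joining.
joined = resolve_domain_name(None, ["366375", "123456"])
```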
- property offtarget[source]
Function to assess off-target goodness of alignment
- Getter
Returns function that accepts HSP and QueryResult
- Type
func
- property pam[source]
PAM pattern
- Getter
Returns expanded PAM pattern (includes explicit position of gRNA; e.g. ‘.NGG’)
- Setter
Sets PAM pattern and parses it into expanded PAM pattern
- Type
str
- property length[source]
gRNA length
- Getter
Returns gRNA length (bp)
- Setter
Sets gRNA length and updates MINORg.pam
- Type
int
- property feature[source]
Target feature
- Getter
Returns first feature if multiple have been provided
- Setter
Sets target feature
- Type
str
- property check_recip[source]
Whether to execute reciprocal check
- Getter
Returns True if MINORg.check_recip=True OR MINORg.relax_recip=True
- Setter
Sets this check
- Type
bool
- property ot_pam[source]
Remove gRNA from candidates if off-target hit has PAM site nearby
- Getter
Returns whether to consider presence of PAM site for off-target filtering
- Type
bool
- property ot_pattern[source]
Off-target mismatch/gap pattern
- Getter
Returns OffTargetExpression object
- Type
minorg.ot_regex.OffTargetExpression
- property ot_unaligned_as_mismatch[source]
Treat unaligned positions as mismatch (used with ot_pattern)
- Getter
Returns whether to count unaligned positions as mismatch
- Type
bool
- property ot_unaligned_as_gap[source]
Treat unaligned positions as gaps (specifically insertions; used with ot_pattern)
- Getter
Returns whether to count unaligned positions as gaps
- Type
bool
- property prioritise_nr[source]
Prioritise non-redundancy for minimum set generation
- Getter
Returns whether non-redundancy is prioritised
- Type
bool
- property prioritize_nr[source]
Prioritise non-redundancy for minimum set generation
- Getter
Returns whether non-redundancy is prioritised
- Type
bool
- property prioritise_pos[source]
Prioritise position (i.e. 5’ gRNA) for minimum set generation
- Getter
Returns whether position (i.e. 5’ gRNA) is prioritised
- Type
bool
- property prioritize_pos[source]
Prioritise position (i.e. 5’ gRNA) for minimum set generation
- Getter
Returns whether position (i.e. 5’ gRNA) is prioritised
- Type
bool
- mkfname(*path, tmp=False, prefix=True)[source]
Generate new file name.
If path provided is not absolute, self.active_directory is assumed to be the root directory.
- Parameters
*path (str) – required, path to output file (e.g. self.mkfname(‘tmp’, ‘tmp.fasta’))
tmp (bool) – mark file as temporary (for deletion when self.rm_tmpfiles is called)
prefix (bool) – prefix self.prefix to basename
- results_fname(*path, reserve=False, newfile=False, **kwargs)[source]
Generate new file name in results directory (<self.active_directory>/<self.prefix>).
Operates exactly as MINORg.mkfname(), with the additional options of creating an empty file or clearing an existing file.
- Parameters
*path (str) – path to output file
reserve (bool) – create empty file at destination if it does not exist
newfile (bool) – clear an existing file at destination if it already exists
**kwargs – other arguments supplied to MINORg.mkfname()
- resolve() → None [source]
Wrapper for super().resolve() that also updates stored filenames. Removes logfile if not in CLI mode.
- get_ref_seqid(seqid, attr)[source]
Extract attribute of sequence from sequence name (for sequences generated by self.target)
- Parameters
seqid (str) – required, sequence name
attr (str) – required, attribute to return. Valid attributes: ‘gene’, ‘feature’, ‘source’.
- Returns
str
- valid_grna(*check_names, accept_invalid=None, accept_invalid_field=None)[source]
gRNA that have passed background, GC, and feature checks, as well as any user-set checks
Relevant attributes: accept_invalid, accept_invalid_field
- Parameters
*check_names (str) –
optional, checks to include.
If not specified: all checks, both standard and non-standard, will be used.
If ‘standard’: only background, GC, and feature checks will be used. Can be used in combination with non-standard check names.
If ‘nonstandard’: only non-standard checks will be used. Can be used in combination with standard check names.
- Returns
- check_names(seq=True, hit=True, standard=True, nonstandard=True) → list [source]
Get gRNA check names
- Parameters
seq (bool) – get check names for gRNA sequences (gRNASeq objects)
hit (bool) – get check names for gRNA hits (gRNAHit objects)
standard (bool) – get standard check names
nonstandard (bool) – get nonstandard check names
- Returns
List of check names (str)
- Return type
list
- add_query(fname, alias=None)[source]
Add file to be queried.
- Parameters
fname (str) – required, path to file
alias (str) – optional, alias for query file
- remove_query(alias)[source]
Remove file for querying.
- Parameters
alias (str) – required, alias of query file to remove
- add_background(fname, alias=None)[source]
Add file for background filter.
- Parameters
fname (str) – required, path to file
alias (str) – optional, alias for background file. Used when writing mask report and for removing background.
- remove_background(alias)[source]
Remove file for background filter.
- Parameters
alias (str) – required, alias of background file to remove
- parse_grna_map_from_file(fname)[source]
Read candidate gRNA from .map file output by MINORg.
- Parameters
fname (str) – required, path to MINORg .map file
- rename_grna(fasta)[source]
Rename candidate gRNA according to FASTA file.
- Parameters
fasta (str) – required, path to FASTA file
- write_all_grna_fasta(fout=None)[source]
Write FASTA file of all candidate gRNA.
- Parameters
fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
- write_all_grna_map(fout=None, write_checks=True)[source]
Write .map file of all candidate gRNA.
- Parameters
fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
write_checks (bool) – write all check statuses
- write_all_grna_eqv(fout=None)[source]
Write .eqv file (grouping gRNA by equivalent coverage) of all candidate gRNA.
- Parameters
fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
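The idea behind an .eqv file, grouping gRNA by equivalent coverage, can be illustrated with a short sketch that buckets gRNA by the exact set of targets they cover. The function name and the input shape (a dict of gRNA name to covered target IDs) are assumptions for this example; the actual .eqv file layout is not shown here.

```python
from collections import defaultdict


def group_by_coverage(coverage):
    """Group gRNA names that cover exactly the same set of targets.

    coverage: dict mapping gRNA name -> iterable of target IDs (assumed shape).
    """
    groups = defaultdict(list)
    for grna, targets in coverage.items():
        # frozenset makes the covered-target set hashable and order-independent.
        groups[frozenset(targets)].append(grna)
    return [sorted(names) for names in groups.values()]


toy_coverage = {
    "gRNA_001": ["targetA", "targetB"],
    "gRNA_002": ["targetB", "targetA"],
    "gRNA_003": ["targetA"],
}
equivalent_groups = group_by_coverage(toy_coverage)
```

gRNA in the same group are interchangeable from a coverage standpoint, which is what makes such a grouping useful when swapping one gRNA for another in a final set.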
- write_pass_grna_fasta(fout=None)[source]
Write FASTA file of gRNA that pass all valid checks.
- Parameters
fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
- write_pass_grna_map(fout=None)[source]
Write .map file of gRNA that pass all valid checks.
- Parameters
fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
- write_pass_grna_eqv(fout=None)[source]
Write .eqv file (grouping gRNA by equivalent coverage) of gRNA that pass all valid checks.
- Parameters
fout (str) – optional, absolute path to output file. Autogenerated using self.prefix and self.active_directory if not provided.
- add_reference(fasta, annotation, alias=None, genetic_code=1, attr_mod={}, memsave=True, replace=False)[source]
Add reference genome.
- Parameters
fasta (str) – required, path to fasta file
annotation (str) – required, path to GFF3 file
alias (str) – optional, name of fasta-annotation combo
genetic_code (str or int) – NCBI genetic code (default=1)
attr_mod (dict) – mapping for non-standard attribute field names in GFF3 file (default={})
replace (bool) – replace any existing reference with same alias (default=False)
- remove_reference(alias)[source]
Remove reference genome.
- Parameters
alias (str) – required, alias of reference genome
- subset_annotation(quiet=True, preserve_order=True)[source]
Subset annotations of all reference genomes according to self.genes.
Reduces annotation lookup time.
- Parameters
quiet (bool) – silence printing of non-essential messages
sort (bool) – sort subset data
- extend_reference(ext_gene, ext_cds)[source]
Extend reference genomes from FASTA files of gene and CDS sequences.
Sequence IDs of gene sequences in ext_gene file(s) should not be repeated and should not contain ‘|’. Sequences in ext_cds file(s) should be named <sequence ID in ext_gene file(s)>.<int> to indicate their parent gene. Corresponding gene and CDS sequences will be aligned with mafft for inference of CDS range in genes.
- Parameters
ext_gene (str or list) – required, path to file or list of paths to files of gene sequences
ext_cds (str or list) – required, path to file or list of paths to files of CDS sequences
- get_reference_seq(*features, adj_dir=True, by_gene=False, ref=None, seqid_template='Reference|$source|$gene|$isoform|$feature|$n|$complete', translate=False, isoform_lvl=None, fout=None, complete=False, **fouts)[source]
Get sequence(s) of reference genes or isoforms.
If self.pssm_ids is provided, sequences are restricted to the relevant domain(s).
- Parameters
*features (str) – optional, GFF3 feature type(s) to retrieve. If not specified, sequence of the gene or isoform will be retrieved directly.
ref (str) – optional, alias of reference genome from which to extract sequences. If not specified, all reference genomes will be searched for genes.
seqid_template (str) –
optional, template for output sequence name. Template will be parsed by string.Template. The default template is “Reference|$source|$gene|$isoform|$feature|$n|$complete”.
$source: reference genome alias
$gene: gene/isoform ID
$isoform: isoform ID if by_gene = False, else same as $gene
$feature: GFF3 feature type
$n: if multiple domains are present, they will be numbered according to proximity to 5’ of sense strand
$complete: ‘complete’ if complete=True else ‘stitched’
adj_dir (bool) – output sense strand
by_gene (bool) – merge sequences from all isoforms of a gene
isoform_lvl (bool) – if GFF features ‘mRNA’ or ‘protein’ specified, sequences will be separated by mRNA/protein isoforms
complete (bool) – merge output range(s) together into a single range. Output will still be a list of tuples (e.g. [(<smallest start>, <largest end>)]).
translate (bool) – translate sequence. Should be used with adj_dir.
fout (str) – optional, path to output file if features is not specified.
**fouts (str) – optional, path to output files for each feature if features is specified. Example: self.get_reference_seq(“CDS”, CDS = <path to file>) returns dict AND writes to file
- Returns
- sequence(s) of reference genes or isoforms (specified in self.genes), grouped by feature.
Format: {<feature>: {<seqid>: <seq>}}
- Return type
dict
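Because seqid_template uses the placeholder syntax of Python's standard string.Template, the way a sequence name is filled in can be previewed without MINORg. The substituted values below are illustrative only and are not output from a real run.

```python
from string import Template

# Default template as documented for get_reference_seq.
seqid_template = Template("Reference|$source|$gene|$isoform|$feature|$n|$complete")

# Example field values (illustrative; not taken from an actual MINORg run).
seqid = seqid_template.substitute(
    source="TAIR10",
    gene="AT5G66900",
    isoform="AT5G66900.1",
    feature="CDS",
    n=1,
    complete="stitched",
)
```

A custom template can drop or reorder fields, as long as every placeholder it uses is one of the documented names.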
- generate_ref_gene_cds(ref_dir='ref', quiet=True, domain_name=None)[source]
Get and store self.genes’ gene and CDS sequences, and GFF data for domains.
The default sequence ID template is: “Reference|$source|$domain|$n|$feature|$complete|$gene|$range”.
$source: reference genome alias
$domain: PSSM ID or domain name (‘gene’ if not specified)
$n: if multiple domains are present, they will be numbered according to proximity to 5’ of sense strand
$feature: GFF3 feature type
$complete: ‘complete’ if sequence includes intervening sequences not of feature type $feature. ‘stitched’ if sequence is concatenated from features of feature type $feature.
$gene: gene/isoform ID
$range: range of sequence in gene
- seq(minlen=None, minid=None, mincdslen=None, quiet=True, check_recip=None, relax_recip=None, check_id_before_merge=None)[source]
Identify targets and generate self.target file based on self’s attributes.
All arguments are optional. If not provided, the corresponding value stored in self’s attributes will be used.
If self.pssm_ids has been provided, either 1) self.rps_hits OR 2) self.db AND self.rpsblast are required.
>>> my_minorg = MINORg(directory = "/my/output/directory", prefix = "test", tmp = False, keep_tmp = True, thread = 1)
>>> my_minorg.add_reference("/path/to/TAIR10_Chr.all.fasta", "/path/to/TAIR10_GFF3.genes.gff", alias = "TAIR10", replace = True)
>>> my_minorg.genes = ["AT5G66900"]
>>> my_minorg.subset_annotation()
>>> my_minorg.pssm_ids = "366375" # NB-ARC domain
>>> my_minorg.query_reference = True
>>> my_minorg.seq()
Exception: If self.pssm_ids has been provided, self.db (path to RPS-BLAST database) AND self.rpsblast (path to rpsblast or rpsblast+ executable OR command name if available at command line) are required
>>> my_minorg.db = "/path/to/rpsblast/database"
>>> my_minorg.rpsblast = "rpsblast+"
>>> my_minorg.seq()
>>> my_minorg.target
'/my/output/directory/test/test_366375_targets.fasta'
- Parameters
minlen (int) – optional. See attributes.
minid (float) – optional. See attributes.
mincdslen (int) – optional. See attributes.
check_recip (bool) – optional. See attributes.
relax_recip (bool) – optional. See attributes.
check_id_before_merge (bool) – optional. See attributes.
- grna()[source]
Generate all possible gRNA from self.target file based on self’s attributes: pam, length.
- write_mask_report(fout)[source]
Write mask report to file.
Details masked regions as well as which sequence each region is identical to.
- Parameters
fout (str) – required, path to output file
- is_offtarget_pos(hsp, ifasta)[source]
Check whether gRNA hit is outside masked target regions.
- Parameters
hsp (Biopython HSP) – required
ifasta (IndexedFasta) – required, subject of BLAST search
- Returns
whether a HSP hit is outside of a masked region
- Return type
bool
- is_offtarget_aln(hsp, query_result, **kwargs)[source]
Check whether (potentially off-target) gRNA hit aligns too well.
Used in combination with is_offtarget_pos to determine if an off-target gRNA hit could be problematic.
- Parameters
hsp (Biopython HSP) – required
query_result (Biopython QueryResult) – required
**kwargs – other arguments passed to MINORg._is_offtarget_aln()
- Returns
whether a HSP hit meets the threshold for problematic off-target gRNA hit
- Return type
bool
- is_offtarget_pam(hsp, query_result, ifasta)[source]
Check for presence of PAM in close proximity and correct orientation to gRNA hit.
- Parameters
hsp (Biopython HSP) – required
query_result (Biopython QueryResult) – required
ifasta (IndexedFasta) – required
- Returns
whether a HSP hit is in close proximity and correct orientation to a PAM site
- Return type
bool
- is_offtarget_hit(hsp, query_result, ifasta)[source]
Check whether a gRNA hit aligns too well outside of masked target regions.
Determined by is_offtarget_pos AND is_offtarget_aln AND is_offtarget_pam for a given HSP, QueryResult, and FASTA file combination.
- Parameters
hsp (Biopython HSP) – required
query_result (Biopython QueryResult) – required
ifasta (IndexedFasta) – required
- Returns
whether a gRNA hit is off-target and problematic
- Return type
bool
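The way is_offtarget_hit combines the three checks (position AND alignment AND PAM) can be sketched with plain callables. Everything in this block is hypothetical: the dicts stand in for BLAST HSPs, and the lambda predicates and the mismatch threshold are placeholders, not MINORg's logic.

```python
def is_problematic_hit(hit, outside_mask, aligns_too_well, near_pam):
    """A hit is problematic only if all three independent checks hold."""
    return outside_mask(hit) and aligns_too_well(hit) and near_pam(hit)


# Toy hit records: plain dicts standing in for BLAST HSPs (illustrative only).
hits = [
    {"name": "hit1", "masked": False, "mismatches": 0, "pam": True},
    {"name": "hit2", "masked": True, "mismatches": 0, "pam": True},
    {"name": "hit3", "masked": False, "mismatches": 4, "pam": True},
]

problematic = [
    h["name"]
    for h in hits
    if is_problematic_hit(
        h,
        outside_mask=lambda x: not x["masked"],
        aligns_too_well=lambda x: x["mismatches"] <= 1,  # placeholder threshold
        near_pam=lambda x: x["pam"],
    )
]
```

Here hit2 is excused because it lies in a masked (on-target) region, and hit3 because it aligns poorly; only hit1 satisfies all three conditions.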
- filter_background(*other_mask_fnames, keep_blast_output=None, mask_reference=True)[source]
Set background filter check for candidate gRNAs.
Masks target sequences in all FASTA files to be screened for off-target, BLASTs all candidate gRNAs to those FASTA files, and assesses each BLAST hit individually for whether they could potentially be problematic.
Relevant attributes: screen_reference, ot_pamless, ot_mismatch, ot_gap
Relevant methods: add_background, remove_background
- Parameters
*other_mask_fnames (str) – optional, paths to other FASTA files not in self.background that are also to be screened for off-target
keep_blast_output (bool) – retain BLAST output file. If not provided, defaults to self.keep_tmp. False deletes it.
mask_reference (bool) – mask reference genes (default=True)
- filter_gc(gc_min=None, gc_max=None)[source]
Set GC check for candidate gRNA.
- Parameters
gc_min (float) – optional. See attributes.
gc_max (float) – optional. See attributes.
- align_reference(fout)[source]
Align reference genes.
CDS are first aligned. Full genes are then added to the alignment.
- Parameters
fout (str) – required, path to output FASTA file
- align_reference_and_targets(domain_name=None, realign_only_if_updated=True)[source]
Align reference genes and targets. Path to FASTA file generated will be stored in self.alignment.
- Parameters
domain_name (str) – optional, domain name to be used when naming output file
- filter_feature(max_insertion=None, min_within_n=None, min_within_fraction=None, alignment_rvs_pattern='^_R_$seqid$$')[source]
Set feature check for candidate gRNAs.
Range(s) of the desired feature in inferred homologue targets discovered by MINORg will be inferred from their alignment with reference genes, based on self’s attributes: max_insertion
- Parameters
max_insertion (int) – optional. See attributes.
min_within_n (int) – optional. See attributes.
min_within_fraction (float) – optional. See attributes.
- filter_exclude()[source]
Set exclude check for candidate gRNAs. Overwrites existing exclude check using self.exclude.
If self.exclude is set, gRNA whose sequences appear in the file at self.exclude will fail this check.
- filter(background_check=True, feature_check=True, gc_check=True)[source]
Execute self.filter_background, self.filter_feature, and self.filter_gc based on self’s attributes.
- Parameters
background_check (bool) – filter gRNA by off-target (default=True)
feature_check (bool) – filter gRNA by within-feature (default=True)
gc_check (bool) – filter gRNA by GC content (default=True)
- minimumset(sets=None, manual=None, fasta=None, fout_fasta=None, fout_map=None, fout_eqv=None, report_full_path=True, all_checks=True, nonstandard_checks=[], exclude_check=True, gc_check=True, background_check=True, feature_check=True)[source]
Generate minimum set(s) of gRNA required to cover all targets.
- Parameters
sets (int) – optional. See attributes.
manual (bool) – manually approve all gRNA sets. Defaults to NOT self.auto if not provided.
fasta (str) – optional, path to FASTA file. Used for renaming gRNA.
fout_fasta (str) – optional, path to output FASTA file. Autogenerated using self.prefix and self.active_directory if not provided.
fout_map (str) – optional, path to output .map file. Autogenerated using self.prefix and self.active_directory if not provided.
fout_eqv (str) – optional, path to output .eqv file of passing gRNA. Autogenerated using self.prefix and self.active_directory if not provided.
report_full_path (bool) – print full path to output files upon successful writing
all_checks (bool) – include all checks for filtering (including any in user-added columns)
nonstandard_checks (list) – non-standard checks to include for filtering (subject to self.accept_invalid_field)
exclude_check (bool) – include ‘exclude’ field for filtering (subject to self.accept_invalid_field)
gc_check (bool) – include ‘GC’ field for filtering (subject to self.accept_invalid_field)
background_check (bool) – include ‘background’ field for filtering (subject to self.accept_invalid_field)
feature_check (bool) – include ‘feature’ field for filtering (subject to self.accept_invalid_field)
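Selecting a small set of gRNA that together cover all targets is an instance of the set cover problem. The sketch below uses the textbook greedy approximation purely to illustrate the goal; it is not presented as the algorithm minimumset uses, and the function name and input shape are assumptions.

```python
def greedy_minimum_set(coverage, targets):
    """Greedy set cover: repeatedly pick the gRNA covering the most uncovered targets.

    coverage: dict mapping gRNA name -> iterable of target IDs (assumed shape).
    targets: iterable of all target IDs that must be covered.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical by gRNA name).
        best = max(
            sorted(coverage),
            key=lambda g: len(uncovered & set(coverage[g])),
            default=None,
        )
        gained = uncovered & set(coverage[best]) if best is not None else set()
        if not gained:
            raise ValueError(f"targets cannot be covered: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


toy_coverage = {
    "gRNA_001": {"t1", "t2"},
    "gRNA_002": {"t2", "t3"},
    "gRNA_003": {"t3"},
}
picked = greedy_minimum_set(toy_coverage, {"t1", "t2", "t3"})
```

The greedy choice is not guaranteed to be optimal in general, but it gives a compact picture of what "minimum set" means: each added gRNA must contribute coverage of at least one target that is still uncovered.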
- full(manual=None, background_check=True, feature_check=True, gc_check=True)[source]
Execute full MINORg programme using self’s attributes as parameter values.
- Parameters
manual (bool) – manually approve all gRNA sets. Defaults to NOT self.auto if not provided.
background_check (bool) – filter gRNA by off-target (default=True)
feature_check (bool) – filter gRNA by within-feature (default=True)
gc_check (bool) – filter gRNA by GC content (default=True)