6. Tutorial (Python)

Please download the files at https://github.com/rlrq/MINORg/tree/master/examples. In all the examples below, replace “/path/to” with the appropriate full path name, which is usually the directory containing these example files.

6.1. Setting up the tutorial

To ensure that the examples in this tutorial work, please replace ‘/path/to’ in the files ‘arabidopsis_genomes.txt’, ‘athaliana_genomes.txt’, and ‘subset_genome_mapping.txt’ with the full path to the directory containing the example files.
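If you would rather not edit the three files by hand, the placeholder can be substituted with a few lines of Python. This is a convenience sketch and not part of MINORg. It assumes that the three files sit in the directory you pass in and that every '/path/to' in them should become that directory. Run it only once per fresh copy of the files.

```python
from pathlib import Path

# Example files that contain the "/path/to" placeholder (names from this tutorial).
EXAMPLE_FILES = ("arabidopsis_genomes.txt", "athaliana_genomes.txt", "subset_genome_mapping.txt")


def fill_example_paths(example_dir, placeholder="/path/to"):
    """Replace `placeholder` in each example file with the absolute path of
    `example_dir`. Files that are not present are skipped.
    Returns the paths of the files that were rewritten."""
    example_dir = Path(example_dir).resolve()
    updated = []
    for name in EXAMPLE_FILES:
        path = example_dir / name
        if not path.is_file():
            continue
        text = path.read_text()
        path.write_text(text.replace(placeholder, str(example_dir)))
        updated.append(path)
    return updated
```

For example, fill_example_paths("MINORg/examples") rewrites the files inside that directory in place, using its absolute path.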

6.2. Getting started

To begin, import the MINORg class.

>>> from minorg.MINORg import MINORg

To create a MINORg object:

>>> my_minorg = MINORg(directory = "/path/to/output/directory", prefix = "prefix")

Both directory and prefix are optional. If not provided, they will default to the current directory and ‘minorg’ respectively. If the directory does not currently exist, it will be created.

If you wish to use the default values specified in a config file, use this instead:

>>> my_minorg = MINORg(config = "/path/to/config.ini", directory = "/path/to/output/directory", prefix = "prefix")

You may now set your parameters using the attributes of your MINORg object. For a table listing the equivalent CLI arguments and MINORg attributes, see CLI vs Python.

6.3. IMPT: Note on executables

See: Executables

You can specify executables as such:

>>> my_minorg.blastn = '/path/to/blastn/executable'
>>> my_minorg.rpsblast = '/path/to/rpsblast/executable'
>>> my_minorg.mafft = '/path/to/mafft/executable'
>>> my_minorg.bedtools = '/path/to/bedtools2/bin'

Note that BEDTools is unique in that if it is not in your command-search path, you should provide the path TO THE DIRECTORY CONTAINING ITS EXECUTABLES (i.e. there will not be a single ‘bedtools’ executable), and if it IS in your command-search path you SHOULD NOT be using the bedtools attribute. See Executables for more on executables.
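To decide which of these attributes you actually need to set, you can check which tools are already on your command-search path. The sketch below uses Python's standard shutil.which() and is not a MINORg feature. It assumes that a BEDTools installation on your PATH provides the 'bedtools' wrapper; if that name is missing, point my_minorg.bedtools at the directory containing the BEDTools executables instead.

```python
import shutil

# Tool names mentioned in this section; "bedtools" is the wrapper expected on PATH
# (an assumption about how BEDTools was installed).
TOOLS = ("blastn", "rpsblast", "mafft", "bedtools")


def missing_executables(tools=TOOLS):
    """Return the tools that cannot be found on the command-search path (PATH)."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Any name returned by missing_executables() is a candidate for an explicit my_minorg.<tool> assignment as shown above.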

6.4. Defining target sequences

6.4.1. User-provided targets

Let us begin with the simplest MINORg execution:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_200_target")
>>> my_minorg.target = "/path/to/sample_CDS.fasta"
>>> my_minorg.full()
>>> my_minorg.resolve()

The above combination of arguments tells MINORg to generate gRNA from targets in a user-provided FASTA file (my_minorg.target = "/path/to/sample_CDS.fasta") and to output files into the directory /path/to/example_200_target. By default, MINORg generates 20 bp gRNA using the NGG PAM. The full MINORg programme is executed by calling the full() method (my_minorg.full()). Don’t forget to call resolve() to remove any temporary files.

6.4.2. Reference gene(s) as targets

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_201_refgene")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050", "AT5G45060", "AT5G45200", "AT5G45210", "AT5G45220", "AT5G45230", "AT5G45240", "AT5G45250"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10") is used to specify information about a reference genome:

  • Positional argument 1: path to reference assembly (In this case "/path/to/subset_ref_TAIR10.fasta")

  • Positional argument 2: path to reference annotation (In this case "/path/to/subset_ref_TAIR10.gff")

  • Optional keyword argument 1 (alias): genome alias (in this case "TAIR10"); a unique name for the reference genome, used when referring to it in sequence names and output files. Autogenerated by MINORg if not provided.

  • See add_reference() and Non-standard reference for how to specify genetic code and non-standard attribute field names

my_minorg.genes = ["AT5G45050", "AT5G45060", "AT5G45200", "AT5G45210", "AT5G45220", "AT5G45230", "AT5G45240", "AT5G45250"] tells MINORg the target gene(s), and my_minorg.query_reference = True tells MINORg to generate gRNA for reference gene(s).

6.4.3. Non-reference gene(s) as targets

6.4.3.1. Extending the reference

See also: Extended genome

If you have both genomic and CDS-only sequences of your target genes but not a GFF3 annotation file, MINORg can infer coding regions (CDS) for your target genes using extend_reference(). See Extended genome for how to name your sequences to ensure proper mapping of CDS to genes.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_202_ext")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.resolve()

extend_reference() effectively adds new genes to the reference genome, so they can be used just like any reference gene. Therefore, they can also be used in combination with add_query().

6.4.3.2. Inferring homologues in unannotated genomes

See also: Non-reference homologue inference

If you would like MINORg to infer homologues in non-reference genomes, you can use add_query() to specify the FASTA files of those non-reference genomes.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_203_query")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654") and my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655") are used to specify information about query FASTA files.

  • The alias keyword argument is optional. If not provided, MINORg will generate a unique alias.

  • Query FASTA files are stored as a dictionary with the format {<alias>:<FASTA>} at the attribute query.

  • If you’d like to remove a query file that you’ve added, you can use:

    >>> my_minorg.remove_query("9654")
    
    • The remove_query() method takes a query alias. If you did not specify an alias when using add_query() and do not know the alias of the file you wish to remove, you may view the query-FASTA mapping using the query attribute.

      >>> my_minorg.query
      {"9654": "/path/to/subset_9654.fasta", "9655": "/path/to/subset_9655.fasta"}
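If you only know the FASTA path, a small helper can look up its alias in that mapping. This is a sketch that assumes query behaves like an ordinary {alias: FASTA} dictionary as shown above; paths are compared exactly as stored, so pass the same path string you gave to add_query(). The helper name aliases_for is hypothetical.

```python
def aliases_for(query, fasta):
    """Return the alias(es) mapped to `fasta` in a {alias: FASTA} dictionary,
    such as the one shown by the query attribute (hypothetical helper)."""
    return [alias for alias, path in query.items() if path == fasta]
```

Usage: for alias in aliases_for(my_minorg.query, "/path/to/subset_9654.fasta"): my_minorg.remove_query(alias)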
      

6.4.4. Domain as targets

MINORg allows users to specify the identifier of an RPS-BLAST position-specific scoring matrix (PSSM-Id) to further restrict the target sequence to a given domain associated with the PSSM-Id. This could be particularly useful when designing gRNA for genes that do not share conserved domain structures but do share a domain that you wish to knock out.

6.4.4.1. Local database

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_204_domain")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.rpsblast = "/path/to/rpsblast/executable"
>>> my_minorg.db = "/path/to/rpsblast/db"
>>> my_minorg.pssm_ids = ["214815"]
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, gRNA will be generated for the WRKY domain (PSSM-Id 214815 as of CDD database v3.18) of the gene AT5G45050. Users are responsible for providing the PSSM-Id of a domain that exists in the gene. If multiple PSSM-Ids are provided, overlapping domains will be combined and output WILL NOT distinguish between one PSSM-Id or another. Unlike other examples, the database (db) is not provided as part of the example files. If you are using the full Docker image pulled from rlrq/minorg, the database is bundled with the image. Otherwise, you will have to download it yourself. See RPS-BLAST local database for more information.
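To illustrate what "overlapping domains will be combined" means when several PSSM-Ids are given, here is a generic interval-merging sketch. It is not MINORg's code: it assumes each domain hit is a 1-based, inclusive (start, end) range on the same gene, and that two ranges overlap when they share at least one position. The coordinates in the usage example are made up.

```python
def merge_domains(ranges):
    """Merge overlapping (start, end) ranges (1-based, inclusive).
    The merged output no longer records which PSSM-Id each piece came from."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Shares at least one position with the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]
```

For example, merge_domains([(40, 80), (10, 50), (100, 120)]) gives [(10, 80), (100, 120)].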

6.4.4.2. Remote database

While it is in theory possible to use the remote CDD database & servers instead of local ones, the --remote option for the ‘rpsblast’/‘rpsblast+’ command from the BLAST+ package has never worked for me. In any case, if your local version of rpsblast is able to access the remote database, you can set remote_rps to True.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_204_domain")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.rpsblast = "/path/to/rpsblast/executable"
>>> my_minorg.db = "Cdd"
>>> my_minorg.remote_rps = True
>>> my_minorg.pssm_ids = ["214815"]
>>> my_minorg.full()
>>> my_minorg.resolve()

6.5. Defining gRNA

See also: PAM

By default, MINORg generates 20 bp gRNA using SpCas9’s NGG PAM. You may specify a different gRNA length using length and a different PAM using pam.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_205_grna")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.length = 23
>>> from minorg import pam
>>> my_minorg.pam = pam.Cas12a
>>> my_minorg.full()
>>> my_minorg.resolve()

In the example above, MINORg will generate 23 bp gRNA (my_minorg.length = 23) using Cas12a’s unusual 5’ PAM pattern (TTTV<gRNA>) (my_minorg.pam = pam.Cas12a). MINORg has several built-in PAMs (see Preset PAM patterns for options), and also supports customisable PAM patterns using ambiguous bases and regular expressions (see PAM for format). To use preset PAMs, as in the example above, you will first need to import MINORg’s minorg.pam module (from minorg import pam), then use pam.<preset pam alias> (such as pam.Cas12a) to refer to the desired PAM pattern.
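To make the PAM notation concrete, the sketch below turns a PAM written with '.' standing in for the gRNA (e.g. '.NGG' for a 3’ PAM or 'TTTV.' for a 5’ PAM) into a regular expression of the same shape as the patterns grna() prints. It only expands IUPAC ambiguity codes; MINORg’s own format also accepts regular expressions, which this sketch does not handle, and pam_to_regex is a hypothetical name.

```python
# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam, length=20):
    """Build a regex matching gRNA of `length` bp next to `pam`, where '.' in
    `pam` marks the gRNA position (plain IUPAC letters only)."""
    pam = pam.upper()
    if pam.count(".") != 1:
        raise ValueError("PAM must contain exactly one '.' marking the gRNA")
    five, three = pam.split(".")
    regex = f".{{{length}}}"  # the gRNA itself, e.g. ".{20}"
    # A 5' PAM becomes a lookbehind and a 3' PAM a lookahead, so only the gRNA is matched.
    if five:
        regex = "(?<=" + "".join(IUPAC[b] for b in five) + ")" + regex
    if three:
        regex = regex + "(?=" + "".join(IUPAC[b] for b in three) + ")"
    return regex
```

pam_to_regex("TTTV.") returns "(?<=TTT[ACG]).{20}", and pam_to_regex(".NGG") returns ".{20}(?=[ACGT]GG)", which is equivalent to the ".{20}(?=[GATC]GG)" printed by MINORg.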

6.6. Filtering gRNA

MINORg supports 3 different gRNA filtering options, all of which can be used together.

6.6.1. Filter by GC content

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_206_gc")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.gc_min = 0.2
>>> my_minorg.gc_max = 0.8
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, MINORg will exclude gRNA with less than 20% (my_minorg.gc_min = 0.2) or greater than 80% (my_minorg.gc_max = 0.8) GC content. By default, minimum GC content is 30% and maximum is 70%.
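The GC check amounts to computing the fraction of G and C bases in each gRNA and comparing it with gc_min and gc_max. The sketch below shows that arithmetic using the tutorial’s default bounds; treating both bounds as inclusive is an assumption, as are the function names.

```python
def gc_content(seq):
    """Fraction of G and C bases in `seq` (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_gc(seq, gc_min=0.3, gc_max=0.7):
    """True if GC content lies within [gc_min, gc_max] (inclusive bounds assumed)."""
    return gc_min <= gc_content(seq) <= gc_max
```

For instance, CTTCATCTTCTTCTCGAAAT has 7 G/C bases out of 20, i.e. 35% GC, so it passes both the default bounds and the 20% to 80% bounds used above.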

6.6.2. Filter by off-target

See: Off-target assessment

6.6.2.1. Using total mismatch/gap/unaligned

See: Total mismatch/gap/unaligned

Thresholds for total number of mismatches or gaps (and unaligned positions) required for an off-target gRNA hit to be considered non-problematic are controlled by ot_mismatch and ot_gap respectively. See Total mismatch/gap/unaligned for more.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_207_ot_ref")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_gap = 2
>>> my_minorg.ot_mismatch = 2
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, MINORg will screen gRNA for off-targets in:

  • The reference genome (my_minorg.screen_reference)

  • Four different FASTA files (my_minorg.add_background("<FASTA>", alias = "<alias>"))

    • The alias keyword argument is optional. If not provided, MINORg will generate a unique alias.

    • Note that any AT5G45050 homologues in these four FASTA files will NOT be masked. This means that only gRNA that do not target any AT5G45050 homologues in these four genomes will pass this off-target check.

      • To mask homologues in these genomes, you will need to provide a FASTA file containing the sequences of their homologues using my_minorg.mask = ["/path/to/to_mask_1.fasta", "/path/to/to_mask_2.fasta"]. You may use subcommand seq() (see Subcommands) to identify these homologues and retrieve their sequences.

ot_gap and ot_mismatch control the minimum number of gaps or mismatches off-target gRNA hits must have to be considered non-problematic; any gRNA with at least one problematic gRNA hit will be excluded. By default, both values are set to ‘1’. See Off-target assessment for more on the off-target assessment algorithm.

In the case above, my_minorg.screen_reference = True is actually redundant as the genome(s) from which targets are obtained (which, because of my_minorg.query_reference, is the reference genome) are automatically included for background check.

However, in the example below, where the targets are from non-reference genomes, the reference genome is not automatically included for off-target assessment, and thus screen_reference is NOT redundant. Additionally, note that the genes specified using genes are masked in the reference genome, such that any gRNA hits to them are NOT considered off-target and will NOT be excluded.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_208_ot_nonref")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_gap = 2
>>> my_minorg.ot_mismatch = 2
>>> my_minorg.full()
>>> my_minorg.resolve()
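The pass/fail logic of ot_mismatch and ot_gap can be sketched as follows. This is a simplified model, not MINORg’s implementation: it assumes a hit becomes non-problematic once it reaches EITHER threshold, ignores how unaligned positions are counted, and uses a hypothetical Hit record for unmasked off-target alignments.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One unmasked off-target alignment of a gRNA (hypothetical record)."""
    mismatches: int
    gaps: int


def is_problematic(hit, ot_mismatch=1, ot_gap=1):
    # Assumption: reaching either minimum makes the hit non-problematic.
    return hit.mismatches < ot_mismatch and hit.gaps < ot_gap


def passes_background(hits, ot_mismatch=1, ot_gap=1):
    """A gRNA is excluded if at least one of its off-target hits is problematic."""
    return not any(is_problematic(h, ot_mismatch, ot_gap) for h in hits)
```

With ot_mismatch = 2 and ot_gap = 2 as in the examples, a hit with a single mismatch and no gaps makes the gRNA fail, whereas it would pass under the defaults of 1.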

6.6.2.2. Using position-specific mismatch/gap/unaligned

See: Position-specific mismatch/gap/unaligned

Finer control of off-target definition can be achieved using ot_pattern, which allows users to provide a pattern that specifies different thresholds for different positions along a gRNA. Unlike ot_mismatch and ot_gap, which specify the LOWER bound of NON-problematic hits, ot_pattern specifies the UPPER bound of PROBLEMATIC hits. By default, unaligned positions will be treated as mismatches, but this behaviour can be altered by setting ot_unaligned_as_mismatch to False. See Off-target pattern for how to build an off-target pattern, and Position-specific mismatch/gap/unaligned for more on how unaligned positions can be counted.

When ot_pattern is specified, ot_mismatch and ot_gap will be ignored.

The following example is identical to the first in Using total mismatch/gap/unaligned, except ot_mismatch and ot_gap are replaced with ot_pattern.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_209_ot_ref_pattern")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_pattern = "0mg-10,1mg-11-"
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, my_minorg.ot_pattern = "0mg-10,1mg-11-" means that MINORg will discard any gRNA with at least one off-target hit where:

  • There are no mismatches or gaps between positions -10 and -1, and there are no more than 1 mismatch or gap from position -11 to the 5’ end.
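The reading of "0mg-10,1mg-11-" above can be written as a small predicate. This sketch hard-codes that one pattern rather than parsing the pattern syntax, and it assumes a hypothetical per-hit representation: a 5’-to-3’ list with one boolean per gRNA position, True where the hit has a mismatch or gap, with position -1 being the 3’-most base.

```python
def matches_example_pattern(errors):
    """Hard-coded check for ot_pattern "0mg-10,1mg-11-" on one off-target hit.

    `errors` lists, 5' to 3', whether each gRNA position has a mismatch or gap.
    Returns True if the hit matches the pattern, i.e. it is PROBLEMATIC."""
    seed = errors[-10:]    # positions -10 to -1
    distal = errors[:-10]  # position -11 to the 5' end
    return sum(seed) == 0 and sum(distal) <= 1
```

For a 20 bp gRNA, a hit with one mismatch at the 5’-most position matches the pattern (so the gRNA is discarded), while a hit with any mismatch or gap in the last 10 positions does not.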

See Off-target pattern for how to build and interpret an off-target pattern.

6.6.2.3. PAM-less off-target check

By default, MINORg does NOT check for the presence of PAM sites next to potential off-target hits. You may override this behaviour by setting ot_pamless to False. This tells MINORg to mark off-target hits that fail the ot_gap or ot_mismatch thresholds (or match ot_pattern) as problematic ONLY IF there is a PAM site nearby.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_210_ot_pamless")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_gap = 2
>>> my_minorg.ot_mismatch = 2
>>> my_minorg.ot_pamless = True
>>> my_minorg.full()
>>> my_minorg.valid_grna("background") ## gRNA that pass background filtering
gRNAHits(gRNA = 6)
>>> my_minorg.ot_pamless = False ## only remove gRNA from candidates if off-target hits have PAM site nearby
>>> my_minorg.full()
>>> my_minorg.valid_grna("background")
gRNAHits(gRNA = 12)
>>> my_minorg.resolve()
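Setting ot_pamless to False adds a PAM test to each off-target hit. The sketch below shows one way such a test could look for SpCas9’s 3’ NGG PAM; it is not MINORg’s code. It assumes 0-based, end-exclusive hit coordinates on the plus strand of the background sequence and requires the PAM to be immediately adjacent, whereas MINORg’s notion of "nearby" may be more permissive.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_3prime_ngg(background, start, end, strand="+"):
    """Check for an NGG PAM immediately 3' of an off-target hit.

    `start`/`end` locate the hit on the plus strand of `background`;
    `strand` is the strand the gRNA aligned to."""
    background = background.upper()
    if strand == "+":
        pam = background[end:end + 3]
    else:
        # On the minus strand the PAM lies just upstream on the plus strand (CCN),
        # so read it and reverse-complement it before comparing with NGG.
        pam = revcomp(background[max(start - 3, 0):start])
    return re.fullmatch(r"[ACGT]GG", pam) is not None
```

Under this model, a hit that fails the mismatch/gap thresholds but has no adjacent NGG would no longer cause its gRNA to be excluded, which is consistent with more gRNA passing when ot_pamless is False.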

6.6.2.4. Skip off-target check

To skip the off-target check entirely, use background_check = False when calling full().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_211_skipbgcheck")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.full(background_check = False)
(several warning messages about the background check being unset will pop up, but you can ignore them)
>>> my_minorg.resolve()

6.6.3. Filter by feature

See: Within-feature inference

By default, when genes is set, MINORg restricts gRNA to coding regions (CDS). For more on how MINORg does this for inferred, unannotated homologues, see Within-feature inference. You may change the feature type in which to design gRNA using the attribute feature. See column 3 of your GFF3 file for valid feature types (see https://en.wikipedia.org/wiki/General_feature_format for more on GFF file format).

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_212_withinfeature")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.feature = "three_prime_UTR"
>>> my_minorg.full(background_check = False)
>>> my_minorg.resolve()

6.7. Generating minimum gRNA set(s)

6.7.1. Number of sets

By default, MINORg outputs a single gRNA set covering all targets. You may request more (mutually exclusive) sets using the sets attribute.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_213_set")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.query_reference = True
>>> my_minorg.sets = 5
>>> my_minorg.full()
>>> my_minorg.resolve()

6.7.2. Prioritise non-redundancy

By default, MINORg selects gRNA for sets using these criteria in decreasing order of priority:

  1. Coverage (of as yet uncovered targets)

  2. Proximity to 5’ end

  3. Non-redundancy

Proximity is only assessed when there is a tie for coverage, and non-redundancy when there is a tie for both coverage and proximity. You may instead prioritise non-redundancy over proximity by setting prioritise_nr to True. MINORg will use a combination of approximate and optimal weighted set cover algorithms to output small sets with low redundancy. However, do note that the sets will in general be larger than when prioritise_nr is False.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_214_nr")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.query_reference = True
>>> my_minorg.prioritise_nr = True
>>> my_minorg.full()
>>> my_minorg.resolve()
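The default selection order (coverage, then proximity to the 5’ end, then non-redundancy) can be sketched as a greedy set cover, with further mutually exclusive sets obtained by removing gRNA that have already been used. This is an illustration, not MINORg’s algorithm; in particular it does not implement the weighted set cover used when prioritise_nr is True. The data layout (gRNA ID mapped to the set of targets it hits and a position, smaller meaning closer to the 5’ end) and the function names are hypothetical.

```python
def greedy_set(grna, targets):
    """Pick gRNA until all `targets` are covered; None if that is impossible.

    `grna` maps gRNA ID -> (set of target IDs hit, position)."""
    uncovered = set(targets)
    chosen = []
    while uncovered:
        candidates = [g for g in grna if grna[g][0] & uncovered]
        if not candidates:
            return None  # remaining targets cannot be covered
        best = min(
            candidates,
            key=lambda g: (
                -len(grna[g][0] & uncovered),  # 1. coverage of uncovered targets
                grna[g][1],                    # 2. proximity to the 5' end
                len(grna[g][0] - uncovered),   # 3. redundancy (hits already covered)
            ),
        )
        chosen.append(best)
        uncovered -= grna[best][0]
    return chosen


def mutually_exclusive_sets(grna, targets, sets=1):
    """Build up to `sets` gRNA sets that share no gRNA."""
    remaining = dict(grna)
    result = []
    for _ in range(sets):
        chosen = greedy_set(remaining, targets)
        if chosen is None:
            break
        result.append(chosen)
        for g in chosen:
            del remaining[g]
    return result
```

As with MINORg’s output ("N set(s) requested. M set(s) found."), fewer sets than requested may be returned once the remaining gRNA can no longer cover every target.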

6.7.3. Excluding gRNA

You may exclude specific gRNA sequences from all final gRNA sets by setting exclude to the path of a FASTA file containing the sequences to exclude.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_215_exclude")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.query_reference = True
>>> my_minorg.exclude = "/path/to/sample_exclude_RPS6.fasta"
>>> my_minorg.full()
>>> my_minorg.resolve()

The gRNA names in the file passed to exclude do not matter. Only the sequences are used when determining whether to exclude a gRNA.
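Because only sequences matter, exclusion is a set-membership test on sequences. The sketch below reads the exclude FASTA with a minimal parser and drops matching candidates; it is not MINORg’s code, the function names are hypothetical, and case-insensitive comparison is an assumption.

```python
def read_fasta(path):
    """Minimal FASTA reader returning {sequence ID: sequence}."""
    records, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def drop_excluded(candidates, exclude_fasta):
    """Remove candidates ({gRNA ID: sequence}) whose sequence is in `exclude_fasta`.
    IDs in the exclude file are ignored."""
    banned = {seq.upper() for seq in read_fasta(exclude_fasta).values()}
    return {gid: seq for gid, seq in candidates.items() if seq.upper() not in banned}
```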

6.7.4. Accepting unknown checks

Sometimes, not all filtering checks (GC, background, and feature) are set for all sequences. This is not an issue if you use the full programme (i.e. full()), but may be relevant if you are re-generating sets using the ‘minimumset’ subcommand (i.e. minimumset()) with a modified mapping file OR a mapping file from the ‘filter’ subcommand where not all filters have been applied.

Let us take a look at ‘sample_custom_check.map’, where we’ve added a custom check called ‘my_custom_check’ in the last column:

gRNA id       gRNA sequence   target id       target sense    gRNA strand     start   end     group   background      GC      feature my_custom_check
gRNA_001      CTTCATCTTCTTCTCGAAAT    targetA NA      +       8       27      1       pass    pass    NA      pass
gRNA_001      CTTCATCTTCTTCTCGAAAT    targetB NA      +       80      99      1       pass    pass    NA      pass
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetA NA      +       37      56      1       pass    pass    NA      NA
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetB NA      +       286     305     1       pass    pass    NA      pass
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetC NA      +       109     128     1       pass    pass    NA      fail
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetD NA      +       110     129     1       pass    pass    NA      fail
gRNA_003      ATGTTTTCTTGAGCTTCAGA    targetB NA      +       38      57      1       pass    pass    NA      NA
gRNA_003      ATGTTTTCTTGAGCTTCAGA    targetC NA      +       287     306     1       pass    pass    NA      pass
gRNA_003      ATGTTTTCTTGAGCTTCAGA    targetD NA      +       110     129     1       pass    pass    NA      pass

There are three possible values for check status: ‘pass’, ‘fail’, and ‘NA’.

An invalid/unset check is an ‘NA’. If a check is unset for all entries (as is the case with the check ‘feature’ here), it will be ignored (i.e. the check is treated as ‘pass’ for all entries). However, when a check has been set for some entries but not others (as is the case with the ‘my_custom_check’ check here), MINORg will treat invalid/unset checks as ‘fail’ by default. This is because there isn’t enough information on whether this constitutes a pass or fail for the check, and MINORg prefers to be conservative when outputting gRNA. You may override this behaviour by setting accept_invalid to True. By doing so, MINORg will treat ‘NA’ as ‘pass’ for all checks.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_216_acceptinvalid")
>>> my_minorg.parse_grna_map_from_file("/path/to/sample_custom_check.map")
>>> my_minorg.accept_invalid = True
>>> my_minorg.minimumset()
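The ‘NA’ rules described above can be expressed in a few lines. This sketch is not MINORg’s implementation; it assumes each map entry is a dictionary of check statuses and decides pass/fail per entry: a check that is ‘NA’ everywhere is ignored, and otherwise ‘NA’ counts as ‘fail’ unless accept_invalid is True.

```python
def resolve_checks(rows, checks, accept_invalid=False):
    """Return one boolean per row: True if every applicable check passes.

    `rows` are dicts holding 'pass', 'fail' or 'NA' under each name in `checks`."""
    # Checks that are 'NA' in every row are ignored entirely.
    applicable = [c for c in checks if any(r[c] != "NA" for r in rows)]
    ok = {"pass", "NA"} if accept_invalid else {"pass"}
    return [all(r[c] in ok for c in applicable) for r in rows]
```

Applied to ‘sample_custom_check.map’, the ‘feature’ column is ignored, and the two entries with ‘NA’ under ‘my_custom_check’ fail by default but pass with accept_invalid = True.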

6.7.5. Manually approve gRNA sets

You may opt to manually inspect each gRNA set before MINORg writes them to file by using manual = True when executing full() or the minimum set subcommand minimumset().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_217_manual")
>>> my_minorg.target = "/path/to/sample_CDS.fasta"
>>> my_minorg.full(manual = True)

        ID   sequence (Set 1)
        gRNA_001     GGAATACAAGAGATTATCGA
Hit 'x' to continue if you are satisfied with these sequences. Otherwise, enter the sequence ID or
sequence of an undesirable gRNA (case-sensitive) and hit the return key to update this list: x

Final gRNA sequence(s) have been written to minorg_gRNA_final.fasta
Final gRNA sequence ID(s), gRNA sequence(s), and target(s) have been written to minorg_gRNA_final.map

1 mutually exclusive gRNA set(s) requested. 1 set(s) found.
Output files have been generated in /path/to/example_217_manual

6.8. Subcommands

MINORg comprises four main steps:

  1. Target sequence identification

  2. Candidate gRNA generation

  3. gRNA filtering

  4. Minimum gRNA set generation

As users may only wish to execute a subset of these steps instead of the full programme (full()), MINORg also provides four subcommands (methods) corresponding to these four steps:

  1. seq()

  2. grna()

  3. filter(), which itself calls three other methods

  4. minimumset()

The subcommands may be useful if you already have a preferred off-target/on-target assessment software. In this case, you may execute subcommands seq() and grna(), submit the gRNA output by MINORg for off-target/on-target assessment, update the .map file output by MINORg with the status of each gRNA for that off-target/on-target assessment, and execute minimumset() to obtain a desired number of minimum gRNA sets. Note that if you do this, you should re-read the updated .map file into MINORg using parse_grna_map_from_file() so MINORg can replace the gRNA data stored in memory with your updated gRNA data.
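The "update the .map file" step can be done with Python’s csv module. This sketch assumes the .map file is tab-separated with a single header line, as in ‘sample_custom_check.map’; add_check_column and the status_for callback are hypothetical names, and the statuses you return should be ‘pass’, ‘fail’ or ‘NA’.

```python
import csv


def add_check_column(map_in, map_out, check_name, status_for):
    """Copy a tab-separated .map file, appending a column `check_name`.

    `status_for(row)` returns 'pass', 'fail' or 'NA' for each row, where `row`
    is a dict keyed by the header (e.g. row["gRNA sequence"])."""
    with open(map_in, newline="") as fin:
        reader = csv.DictReader(fin, delimiter="\t")
        fields = list(reader.fieldnames) + [check_name]
        rows = list(reader)
    with open(map_out, "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fields, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in rows:
            row[check_name] = status_for(row)
            writer.writerow(row)
```

The updated file can then be re-read with my_minorg.parse_grna_map_from_file(map_out) before calling minimumset().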

Each subcommand may require a different combination of attributes.

6.8.1. Subcommand seq()

The seq() subcommand identifies target sequences, whether by extracting them from a reference genome or inferring homologues in unannotated genomes. All parameters introduced in Defining target sequences (except attribute target) and Defining reference genomes apply. If you already have a FASTA file containing your target sequences, you may set target to the path of that FASTA file and skip this subcommand.

This step will output target sequences into a file ending with ‘_targets.fasta’. This filename will be stored at attribute target.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_218_subcmdseq")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.seq()
>>> my_minorg.target
'/path/to/example_218_subcmdseq/minorg/minorg_gene_targets.fasta'

6.8.2. Subcommand grna()

The grna() subcommand generates gRNA within target sequences from a target file. Unlike the command line version, it DOES NOT incorporate parts of the seq() and filter() subcommands. All parameters introduced in Defining gRNA apply.

By default, gRNA sequences will be written to both .map and FASTA files. You may override this behaviour by setting auto_update_files to False or using auto_update_files = False when instantiating a MINORg object (e.g. my_minorg = MINORg(directory = "/path/to/output/dir", auto_update_files = False)). In this case, only the FASTA file will be written. To write files manually, use the following methods. If you do not supply an output file path, it will be generated automatically:

  • write_all_grna_map(): write .map file containing all candidate gRNA (no checks will be set by grna() so all entries in check fields will be ‘NA’)

    • Path to output file will be stored at grna_map

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.map

  • write_all_grna_fasta(): write FASTA file containing all candidate gRNA

    • Path to output file will be stored at grna_fasta

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.fasta

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_219_subcmdgrna")
>>> my_minorg.target = "/path/to/sample_CDS.fasta"
>>> my_minorg.grna() ## default 3' NGG PAM
PAM pattern: .{20}(?=[GATC]GG)
>>> my_minorg.grna_hits
gRNAHits(gRNA = 201)
>>> from minorg import pam
>>> my_minorg.pam = pam.Cas12a ## 5' TTTV PAM
>>> my_minorg.grna() ## regenerate gRNA
PAM pattern: (?<=TTT[ACG]).{20}
>>> my_minorg.grna_hits
gRNAHits(gRNA = 95)
>>> my_minorg.pam = "ATV."
>>> my_minorg.grna() ## regenerate gRNA
PAM pattern: (?<=AT[ACG]).{20}
>>> my_minorg.grna_hits
gRNAHits(gRNA = 267)
>>> my_minorg.write_all_grna_fasta()
>>> my_minorg.grna_fasta
'/path/to/example_219_subcmdgrna/minorg/minorg_gRNA_all.fasta'
>>> my_minorg.write_all_grna_fasta("/path/to/another/location.fasta")
>>> my_minorg.grna_fasta
'/path/to/another/location.fasta'

gRNA data is stored at the attribute grna_hits, whose string representation shows the number of gRNA. In the above example, 201 different gRNA are generated from the target sequences in the target file “sample_CDS.fasta”. We then decided to generate gRNA for Cas12a instead, which has a 5’ TTTV PAM pattern; this yields 95 different gRNA. Finally, we tried a completely made-up 5’ ATV PAM pattern, netting 267 different gRNA. Satisfied, we wrote the sequences of these gRNA to file and printed the path of the file.
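Why the count changes with the PAM can be seen by scanning a sequence yourself with a pattern like the ones grna() prints. This sketch is not MINORg’s implementation: it assumes plain uppercase ACGT input, searches both strands, and treats "different gRNA" as distinct sequences. Wrapping the printed pattern in a lookahead makes every match zero-width, so overlapping gRNA are all found; find_grna is a hypothetical name.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_grna(seq, pam_regex):
    """Collect distinct gRNA matching `pam_regex` on both strands of `seq`.

    `pam_regex` is a pattern such as r".{20}(?=[GATC]GG)" or r"(?<=TTT[ACG]).{20}"."""
    finder = re.compile(f"(?=({pam_regex}))")  # zero-width, so matches may overlap
    seq = seq.upper()
    found = set()
    for strand in (seq, revcomp(seq)):
        found.update(m.group(1) for m in finder.finditer(strand))
    return found
```

For example, with seq = "AAA" + "C" * 20 + "AGG" and the NGG pattern, one gRNA is found on each strand; len(find_grna(seq, pattern)) gives a count comparable in spirit to gRNAHits(gRNA = ...).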

6.8.3. Subcommand filter()

The filter() subcommand takes in a compulsory MINORg .map file (which can be read using parse_grna_map_from_file()) and rewrites some/all checks. You can execute all filters (GC, off-target, and feature) using filter(), or execute checks separately using filter_gc(), filter_background(), and filter_feature().

By default, gRNA sequences and map files will be updated automatically whenever any of the filtering methods is called. You may override this behaviour by setting auto_update_files to False or using auto_update_files = False when instantiating a MINORg object (e.g. my_minorg = MINORg(directory = "/path/to/output/dir", auto_update_files = False)). To write files manually, use the following methods. If you do not supply an output file path, it will be generated automatically:

  • write_all_grna_map(): write .map file containing all candidate gRNA and checks

    • Path to output file will be stored at grna_map

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.map

  • write_all_grna_fasta(): write FASTA file containing all candidate gRNA

    • Path to output file will be stored at grna_fasta

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.fasta

    • This file will NOT be auto updated as it is not affected by filtering check status

  • write_pass_grna_map(): write .map file containing all passing gRNA

    • Path to output file will be stored at pass_map

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_pass.map

  • write_pass_grna_fasta(): write FASTA file containing all passing gRNA

    • Path to output file will be stored at pass_fasta

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_pass.fasta
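
The default paths listed above all follow the same pattern. As a small sketch (default_output_path is a hypothetical helper that only mirrors the documented naming convention; MINORg generates these paths itself), they can be reconstructed like this:

```python
from pathlib import Path


# Hypothetical helper mirroring the documented default naming convention.
def default_output_path(output_directory: str, prefix: str, suffix: str) -> str:
    """Return <output_directory>/<prefix>/<prefix>_<suffix>."""
    return str(Path(output_directory) / prefix / f"{prefix}_{suffix}")


for suffix in ("gRNA_all.map", "gRNA_all.fasta", "gRNA_pass.map", "gRNA_pass.fasta"):
    print(default_output_path("/path/to/output/dir", "minorg", suffix))
```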

In all cases, you may rename the gRNA using rename_grna(), which takes the path of a FASTA file in which each sequence is a gRNA you wish to rename and each sequence ID is the new name for that gRNA. This method should be used before you call any of the above methods to write gRNA to file.
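
As a minimal sketch, the hypothetical helper write_rename_fasta below (not part of MINORg) writes such a FASTA file from a dictionary mapping gRNA sequences to new names. The two sequences are the gRNA that appear in the Chaining subcommands example; the new names are purely illustrative. The commented-out call assumes an existing MINORg object named my_minorg that holds these gRNA.

```python
import os
import tempfile


# Hypothetical helper: sequence IDs are the new names, sequences are the gRNA to rename.
def write_rename_fasta(new_names: dict[str, str], path: str) -> str:
    with open(path, "w") as handle:
        for sequence, name in new_names.items():
            handle.write(f">{name}\n{sequence}\n")
    return path


rename_file = write_rename_fasta(
    {
        "GTCGTTTCCGGAGACTATGA": "my_gRNA_1",  # illustrative new names
        "TCAATCTCCATCATAGTCTC": "my_gRNA_2",
    },
    os.path.join(tempfile.mkdtemp(), "rename_grna.fasta"),
)
# my_minorg.rename_grna(rename_file)  # rename first, then write files as described above
```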

6.8.3.1. Subcommand filter_gc()

All parameters introduced in Filter by GC content apply.
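
As a hedged illustration of what gc_min and gc_max bound, the sketch below computes the fraction of G and C in a gRNA and keeps the gRNA only if that fraction lies within the range. Treating both bounds as inclusive, and counting over the gRNA sequence alone (without the PAM), are assumptions of this sketch; see Filter by GC content for MINORg’s exact behaviour.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc(seq: str, gc_min: float, gc_max: float) -> bool:
    """Assumed inclusive bounds: gc_min <= GC fraction <= gc_max."""
    return gc_min <= gc_content(seq) <= gc_max


for grna in ("GTCGTTTCCGGAGACTATGA", "TCAATCTCCATCATAGTCTC"):
    print(grna, gc_content(grna), passes_gc(grna, gc_min=0.4, gc_max=0.6))
```

Because the check is a simple range test, re-running it with different gc_min and gc_max values changes only which gRNA pass; the candidate gRNA themselves are unchanged.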

6.8.3.1.1. Filtering by GC content after calling full()

filter_gc() can be used on an active MINORg object even if you’ve already called full().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_220_subcmdfilter_gc")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050", "AT5G45060", "AT5G45200", "AT5G45210", "AT5G45220", "AT5G45230", "AT5G45240", "AT5G45250"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.grna_hits
gRNAHits(gRNA = 2141)
>>> my_minorg.valid_grna("GC") ## gRNA that pass GC filter
gRNAHits(gRNA = 1871)
>>> my_minorg.gc_min = 0.2
>>> my_minorg.gc_max = 0.8
>>> my_minorg.filter_gc() ## re-filter by GC content
>>> my_minorg.valid_grna("GC") ## gRNA that pass GC filter
gRNAHits(gRNA = 2097)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

6.8.3.1.2. Filtering GC content on output of another MINORg run

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_220_subcmdfilter_gc_pt2", auto_update_files = False)
>>> my_minorg.parse_grna_map_from_file("/path/to/sample_custom_check.map")
>>> my_minorg.valid_grna("GC")
gRNAHits(gRNA = 3)
>>> my_minorg.gc_min = 0.4
>>> my_minorg.gc_max = 0.6
>>> my_minorg.filter_gc()
>>> my_minorg.valid_grna("GC")
gRNAHits(gRNA = 1)
>>> my_minorg.write_pass_grna_fasta()
>>> my_minorg.resolve()

6.8.3.2. Subcommand filter_background()

All parameters introduced in Filter by off-target apply. Additionally, you should supply target sequences to target so that MINORg can mask them (this tells MINORg that any gRNA hits to them are on-target and NOT off-target). Any additional sequences to be masked may be provided to mask as a list of paths to FASTA files. If you have set screen_reference to True to include reference genome(s) in the off-target screen (see Multiple reference genomes for how to specify multiple reference genomes), you may also add to mask a FASTA file of the sequences of reference genes that should be masked. You can generate these sequences using the seq() subcommand, but MAKE SURE TO USE A DIFFERENT MINORg OBJECT AND DIRECTORY TO AVOID OVERWRITING ANY PREVIOUSLY GENERATED FILES.
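
Conceptually, masking means that hits falling in on-target (or otherwise masked) regions are not counted against a gRNA. The sketch below is a simplified, hypothetical model of that idea, not how MINORg implements it: a gRNA’s alignment hits are kept as potential off-targets only if they do not overlap any masked region on the same sequence. The sequence IDs and coordinates are made up, and treating any overlap as masked is an assumption of this sketch.

```python
# Illustrative model of masking (hypothetical; not MINORg's implementation).
# Coordinates are 0-based and half-open: (start, end).
Interval = tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def off_target_hits(hits, masked):
    """Keep only hits that do not overlap any masked (on-target) region.

    hits:   list of (sequence_id, start, end) alignment hits of one gRNA
    masked: dict mapping sequence_id -> list of (start, end) masked regions
    """
    return [
        (seq_id, start, end)
        for seq_id, start, end in hits
        if not any(overlaps((start, end), region) for region in masked.get(seq_id, []))
    ]


toy_hits = [("chr5", 100, 120), ("chr5", 900, 920), ("chr1", 100, 120)]
toy_masked = {"chr5": [(50, 500)]}  # e.g. the span of a target gene
print(off_target_hits(toy_hits, toy_masked))
```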

6.8.3.2.1. Filtering background after calling full()

Let us first execute MINORg.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_221_subcmdfilter_bg")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.sets = 5
>>> my_minorg.full(background_check = False)

In the code above, we skipped the off-target check by passing background_check = False to full(). But we’ve changed our mind and would now like to screen the reference genome and the non-reference genomes that these targets are from, AND we don’t want our gRNA to be able to target any genes in ‘subset_9944.fasta’ and ‘subset_9947.fasta’. We also want to tell MINORg that it’s okay if a gRNA has off-target effects in the homologous genes AT5G46260 and AT5G46270 in the reference genome. We can do this using the filter_background() subcommand, followed by the minimumset() subcommand to regenerate minimum sets.

In order to do all this, we will have to get the gene sequences of AT5G46260 and AT5G46270 in order to mask them in the reference genome. We can do this using the get_reference_seq() method.

>>> ot_minorg = MINORg(directory = "/path/to/example_221_subcmdfilter_bg_tomask") ## different directory
>>> ot_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> ot_minorg.genes = ["AT5G46260", "AT5G46270"]
>>> fout_to_mask = ot_minorg.mkfname("ref_to_mask.fasta") ## MINORg has a built-in method to generate file names within the output directory
>>> ot_minorg.get_reference_seq(fout = fout_to_mask) ## this method will return a dictionary of sequences, but will also write to file if 'fout' is used
>>> ot_minorg.resolve()

Now that we have the reference sequences to mask, we can append the file name to my_minorg’s mask attribute, add our background files using add_background(), set screen_reference to True, call filter_background() to update the off-target checks for all candidate gRNA, and execute minimumset() to regenerate our minimum gRNA sets. If auto_update_files has been set to False, you may also wish to call write_all_grna_map(), write_pass_grna_map(), and/or write_pass_grna_fasta() to update the gRNA FASTA and .map files.

>>> my_minorg.mask.append(fout_to_mask)
>>> my_minorg.add_background("/path/to/subset_9944.fasta", alias = "9944")
>>> my_minorg.add_background("/path/to/subset_9947.fasta", alias = "9947")
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

6.8.3.2.2. Filtering background on output of another MINORg run

Alternatively, if the original my_minorg object no longer exists, whether because you’ve closed the IDE session or deleted the object, you can read its .map file into a new MINORg object using parse_grna_map_from_file(), as shown below. In this case, you can pass the IDs of the additional genes to be masked together with the original genes to genes, and you don’t need to use get_reference_seq(). Since we’re no longer querying ‘subset_9654.fasta’ and ‘subset_9655.fasta’, we can use add_background() to tell MINORg to search them for off-target effects. And don’t forget to also provide the FASTA file of target sequences to target so MINORg can mask them!

>>> from minorg.MINORg import MINORg
>>> new_minorg = MINORg(directory = "/path/to/example_221_subcmdfilter_bg_new")
>>> new_minorg.parse_grna_map_from_file("/path/to/example_221_subcmdfilter_bg/minorg/minorg_gRNA_all.map")
>>> new_minorg.target = "/path/to/example_221_subcmdfilter_bg/minorg/minorg_gene_targets.fasta"
>>> new_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> new_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> new_minorg.add_background("/path/to/subset_9654.fasta", alias = "9654")
>>> new_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> new_minorg.add_background("/path/to/subset_9944.fasta", alias = "9944")
>>> new_minorg.add_background("/path/to/subset_9947.fasta", alias = "9947")
>>> new_minorg.screen_reference = True
>>> new_minorg.filter_background()
>>> new_minorg.minimumset()
>>> new_minorg.resolve()

6.8.3.3. Subcommand filter_feature()

All parameters introduced in Filter by feature apply. Additionally, you will need to provide a FASTA file of target sequences (attribute target), reference genome(s) (see Defining reference genomes), and genes (attribute genes). The specified reference gene(s) will be extracted from the reference genome(s) and aligned with target sequence(s) in order for MINORg to infer feature boundaries in target sequence(s). See Within-feature inference for the algorithm of how feature boundaries are inferred.
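
The core step, carrying feature coordinates from an aligned reference gene over to a target, can be sketched as follows. This is a simplified, hypothetical illustration rather than MINORg’s Within-feature inference algorithm: it takes a pairwise alignment as two equal-length gapped strings and returns the range of target positions aligned to a reference feature range (0-based, half-open). The toy alignment is made up.

```python
from typing import Optional


def map_feature_range(ref_aln: str, tgt_aln: str, start: int, end: int) -> Optional[tuple[int, int]]:
    """Map a reference range [start, end) onto the target through a gapped pairwise alignment.

    Positions are 0-based on the ungapped sequences. Returns the smallest target range
    spanning every target base aligned to a reference base in the range, or None.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned sequences must have the same length")
    ref_pos = tgt_pos = 0
    aligned_target_positions = []
    for ref_base, tgt_base in zip(ref_aln, tgt_aln):
        if ref_base != "-" and tgt_base != "-" and start <= ref_pos < end:
            aligned_target_positions.append(tgt_pos)
        if ref_base != "-":
            ref_pos += 1
        if tgt_base != "-":
            tgt_pos += 1
    if not aligned_target_positions:
        return None
    return min(aligned_target_positions), max(aligned_target_positions) + 1


# Toy alignment: the target lacks one reference base and carries a 2-nt insertion.
print(map_feature_range("ATGC--GATC", "ATG-AAGATC", 2, 6))
```

Because the returned range runs from the first to the last aligned target base, target-only insertions inside the feature are treated as part of it; that is a design choice of this sketch.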

6.8.3.3.1. Filtering feature after calling full()

Let us first execute MINORg.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_222_subcmdfilter_feature")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.full()
>>> my_minorg.valid_grna("feature") ## gRNA that pass within-feature filter
gRNAHits(gRNA = 368)

By default, MINORg sets the desired feature to ‘CDS’. You can re-assess and overwrite the ‘feature’ check in the .map file to only allow gRNA in other GFF features, such as the 3’ UTR, by updating feature and using filter_feature() to re-filter gRNA for the new feature.
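
Setting feature changes which GFF3 feature type (column 3 of the annotation) counts as acceptable. As a hypothetical sketch of the kind of check involved, the code below collects the ranges of one feature type from GFF3 text and tests whether a gRNA lies entirely within one of them. It works directly on reference coordinates (GFF3 columns 4 and 5 are 1-based and inclusive) and uses a made-up two-line annotation; MINORg itself must additionally infer the corresponding ranges in each target, as described above.

```python
def feature_ranges(gff3_text: str, feature_type: str) -> list[tuple[str, int, int]]:
    """Return (seqid, start, end) for every GFF3 line of the given type (1-based, inclusive)."""
    ranges = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 9 and fields[2] == feature_type:
            ranges.append((fields[0], int(fields[3]), int(fields[4])))
    return ranges


def within_feature(seqid: str, start: int, end: int, ranges) -> bool:
    """True if the 1-based inclusive span start..end lies entirely inside one range."""
    return any(s == seqid and lo <= start and end <= hi for s, lo, hi in ranges)


toy_gff3 = (
    "chr1\texample\tCDS\t101\t400\t.\t+\t0\tParent=gene1.1\n"
    "chr1\texample\tthree_prime_UTR\t401\t500\t.\t+\t.\tParent=gene1.1\n"
)
utr = feature_ranges(toy_gff3, "three_prime_UTR")
print(within_feature("chr1", 410, 432, utr))  # span lies inside the UTR
print(within_feature("chr1", 390, 412, utr))  # span straddles the CDS/UTR boundary
```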

>>> my_minorg.feature = "three_prime_UTR"
>>> my_minorg.filter_feature()
>>> my_minorg.valid_grna("feature") ## gRNA that pass within-feature filter
gRNAHits(gRNA = 5)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

6.8.3.3.2. Filtering feature on output of another MINORg run

As with Filtering background on output of another MINORg run, we can read in the output of a previous MINORg execution and filter that. This requires the .map file ending with ‘_all.map’ (parse using parse_grna_map_from_file()) as well as a FASTA file of target sequences (specify using target).

>>> from minorg.MINORg import MINORg
>>> new_minorg = MINORg(directory = "/path/to/example_222_subcmdfilter_feature_new")
>>> new_minorg.parse_grna_map_from_file("/path/to/example_222_subcmdfilter_feature/minorg/minorg_gRNA_all.map")
>>> new_minorg.target = "/path/to/example_222_subcmdfilter_feature/minorg/minorg_gene_targets.fasta"
>>> new_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> new_minorg.genes = ["AT5G45050"] ## MINORg needs to know which reference genes to align to targets to in order to infer feature ranges
>>> new_minorg.feature = "three_prime_UTR"
>>> new_minorg.filter_feature()
>>> new_minorg.minimumset()
>>> new_minorg.resolve()

6.8.4. Subcommand minimumset()

The minimumset() subcommand generates mutually exclusive minimum set(s) of gRNA, where each set is capable of covering all targets. It requires a MINORg .map file (the one that ends in ‘_gRNA_pass.map’ is sufficient, but ‘_gRNA_all.map’ would allow for filtering by a custom combination of fields). All parameters introduced in Generating minimum gRNA set(s) apply.
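
The idea of mutually exclusive minimum sets can be illustrated with a greedy set-cover heuristic. The sketch below is hypothetical and is not MINORg’s algorithm (see Generating minimum gRNA set(s) for how MINORg actually chooses gRNA): it repeatedly picks the gRNA covering the most still-uncovered targets until all targets are covered, then removes that set’s gRNA from the pool and repeats, so no gRNA appears in more than one set. The parameter name sets mirrors the MINORg attribute, and the coverage table is made up.

```python
from typing import Optional


def greedy_cover(coverage: dict[str, set[str]], targets: set[str]) -> Optional[list[str]]:
    """Pick gRNA greedily until every target is covered; None if that is impossible."""
    remaining = set(targets)
    chosen: list[str] = []
    while remaining:
        # Sorting IDs makes ties deterministic: max() keeps the first best candidate.
        best = max(sorted(coverage), key=lambda g: len(coverage[g] & remaining), default=None)
        if best is None or not coverage[best] & remaining:
            return None
        chosen.append(best)
        remaining -= coverage[best]
    return chosen


def mutually_exclusive_sets(coverage: dict[str, set[str]], targets: set[str], sets: int) -> list[list[str]]:
    """Build up to `sets` covers that share no gRNA."""
    pool = dict(coverage)
    result: list[list[str]] = []
    for _ in range(sets):
        cover = greedy_cover(pool, targets)
        if not cover:  # None (no cover possible) or empty (no targets)
            break
        result.append(cover)
        for grna in cover:
            del pool[grna]
    return result


toy_coverage = {
    "gRNA_1": {"geneA", "geneB"},
    "gRNA_2": {"geneC"},
    "gRNA_3": {"geneA", "geneB", "geneC"},
    "gRNA_4": {"geneA"},
    "gRNA_5": {"geneB", "geneC"},
}
print(mutually_exclusive_sets(toy_coverage, {"geneA", "geneB", "geneC"}, sets=3))
```

Greedy set cover is fast but not guaranteed to find the smallest possible set; it is used here only because it makes the "each set covers all targets, no gRNA is shared" idea easy to follow.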

This step will write final gRNA sequences into a file ending with ‘_gRNA_final.fasta’. A file ending with ‘_gRNA_final.map’ that maps gRNA to their targets will also be generated. You may optionally specify the location of the FASTA and .map output files using:

  • final_map: path of .map file containing gRNA in final set(s)

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_final.map

  • final_fasta: path of FASTA file containing gRNA in final set(s)

    • If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_final.fasta

6.8.4.1. Regenerating minimum sets after calling full()

minimumset() can also be used on an active MINORg object.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_223_subcmdminimumset_pt1")
...
<set up parameters>
...
>>> my_minorg.full()
>>> my_minorg.sets = 5
>>> my_minorg.minimumset() ## regenerate up to 5 gRNA sets

6.8.4.2. Generating minimum sets from output of another MINORg run

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_223_subcmdminimumset_pt2")
>>> my_minorg.parse_grna_map_from_file("/path/to/example_203_query/minorg/minorg_gRNA_all.map")
>>> my_minorg.target = "/path/to/example_203_query/minorg/minorg_gene_targets.fasta"
>>> my_minorg.prioritise_nr = True
>>> my_minorg.sets = 5
>>> my_minorg.minimumset(gc_check = False)
>>> my_minorg.resolve()

If a tie-breaker is necessary, MINORg considers a gRNA’s proximity to the 5’ end of a target (ideally, of the sense strand). To help it assess this accurately, we strongly recommend providing target sequences to target so that MINORg knows how long each target sequence is. This is especially important if the target sequences are antisense (you can check this in the .map file), as can happen when they are generated by MINORg’s inference of homologues in unannotated genomes. In the example above, we’ve also asked MINORg to ignore the GC content check when generating minimum sets (my_minorg.minimumset(gc_check = False)).
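
To see why the target length matters, consider a hypothetical helper (not MINORg’s code) that measures how far a gRNA sits from the sense-strand 5’ end of its target. When the target sequence is antisense, the distance has to be counted back from the end of the sequence, which is impossible without knowing its length.

```python
def distance_from_sense_5prime(grna_start: int, grna_end: int, target_length: int, antisense: bool) -> int:
    """Distance (nt) from the sense-strand 5' end of a target to the nearest edge of a gRNA.

    grna_start and grna_end are 0-based, half-open positions on the target sequence as
    provided. If that sequence is antisense, its 3' end corresponds to the sense 5' end.
    """
    if antisense:
        return target_length - grna_end
    return grna_start


print(distance_from_sense_5prime(10, 30, target_length=100, antisense=False))
print(distance_from_sense_5prime(10, 30, target_length=100, antisense=True))
```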

6.8.5. Chaining subcommands

You may use subcommands separately if you’d like to inspect the outcome of each step and/or repeat a step with different parameters before proceeding with the next. MINORg tracks the output of previous steps, so you do not need to read them into MINORg before executing the next step.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_224_subcmd", prefix = "test", thread = 1)
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10", replace = True)
>>> my_minorg.add_reference("/path/to/subset_ref_Araly2.fasta", "/path/to/subset_ref_Araly2.gff", alias = "araly2")
>>> my_minorg.genes = ["AT1G33560", "AL1G47950.v2.1"]
>>> my_minorg.query_reference = True
>>> my_minorg.seq() ## generate target sequences
>>> my_minorg.target ## print path to FASTA file containing target sequences
'/path/to/example_224_subcmd/test/test_gene_targets.fasta'
>>> my_minorg.grna()
PAM pattern: .{20}(?=[GATC]GG)
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
Masking on-targets
Finding off-targets
>>> my_minorg.valid_grna("background")
gRNAHits(gRNA = 395)
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha1") ## add background file
>>> my_minorg.filter_background() ## repeat background check with additional background file
Masking on-targets
Finding off-targets
>>> my_minorg.valid_grna("background") ## updated set of passing gRNA
gRNAHits(gRNA = 250)
>>> my_minorg.filter_gc()
>>> my_minorg.valid_grna("GC")
gRNAHits(gRNA = 355)
>>> my_minorg.valid_grna("background", "GC")
gRNAHits(gRNA = 223)
>>> my_minorg.valid_grna() ## gRNA filtered for all valid checks (at this point, background and GC)
/path/to/minorg/grna.py:823: MINORgWarning: The following hit checks have not been set: feature
gRNAHits(gRNA = 223)
>>> my_minorg.filter_feature() ## by default, MINORg only retains gRNA in CDS
>>> my_minorg.valid_grna("feature")
gRNAHits(gRNA = 324)
>>> my_minorg.valid_grna()
gRNAHits(gRNA = 181)
>>> my_minorg.minimumset(manual = True)

        ID   sequence (Set 1)
        gRNA_026     GTCGTTTCCGGAGACTATGA
Hit 'x' to continue if you are satisfied with these sequences. Otherwise, enter the sequence ID or
sequence of an undesirable gRNA (case-sensitive) and hit the return key to update this list: gRNA_026

        ID   sequence (Set 1)
        gRNA_223     TCAATCTCCATCATAGTCTC
Hit 'x' to continue if you are satisfied with these sequences. Otherwise, enter the sequence ID or
sequence of an undesirable gRNA (case-sensitive) and hit the return key to update this list: x

Final gRNA sequence(s) have been written to /path/to/example_224_subcmd/test/test_gRNA_final.fasta
Final gRNA sequence ID(s), gRNA sequence(s), and target(s) have been written to
/path/to/example_224_subcmd/test/test_gRNA_final.map

1 mutually exclusive gRNA set(s) requested. 1 set(s) found.
>>> my_minorg.write_all_grna_map() ## write .map file containing check information for all candidate gRNA
>>> my_minorg.write_all_grna_fasta() ## write FASTA file containing all candidate gRNA
>>> my_minorg.write_pass_grna_map() ## write .map file containing information for valid gRNA
>>> my_minorg.write_pass_grna_fasta() ## write FASTA file containing valid gRNA
>>> my_minorg.resolve() ## remove temporary files

It is highly recommended that you execute resolve() to remove any temporary files generated.
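
In the session above, valid_grna("background", "GC") returned fewer gRNA than either check alone, and valid_grna() ignored the unset feature check with a warning. That behaviour is consistent with taking the intersection of the per-check results. The sketch below is a hypothetical model of that logic on a made-up check table (passing_grna is not a MINORg method):

```python
import warnings

# Made-up check table: gRNA ID -> {check name: passed?}; None means "not set yet".
check_table = {
    "gRNA_001": {"background": True, "GC": True, "feature": None},
    "gRNA_002": {"background": False, "GC": True, "feature": None},
    "gRNA_003": {"background": True, "GC": False, "feature": None},
}


def passing_grna(table: dict, *checks: str) -> set[str]:
    """IDs of gRNA that pass every requested check (every known check if none are named)."""
    requested = list(checks) or sorted({name for row in table.values() for name in row})
    unset = [c for c in requested if all(row.get(c) is None for row in table.values())]
    if unset:
        warnings.warn(f"The following hit checks have not been set: {', '.join(unset)}")
    active = [c for c in requested if c not in unset]
    return {grna for grna, row in table.items() if all(row.get(c) is True for c in active)}


print(sorted(passing_grna(check_table, "background")))
print(sorted(passing_grna(check_table, "background", "GC")))
print(sorted(passing_grna(check_table)))  # warns that "feature" is unset, then ignores it
```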

6.9. Defining reference genomes

6.9.1. Single reference genome

See example in Reference gene(s) as targets.

6.9.2. Multiple reference genomes

See also: Reference

You may specify genes from multiple reference genomes so long as those reference genomes have also been added using add_reference().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_225_multiref")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.add_reference("/path/to/subset_ref_Araly2.fasta", "/path/to/subset_ref_Araly2.gff", alias = "Araly2")
>>> my_minorg.add_reference("/path/to/subset_ref_Araha1.fasta", "/path/to/subset_ref_Araha1.gff", alias = "Araha1")
>>> my_minorg.genes = ["AT1G33560", "AL1G47950.v2.1", "Araha.3012s0003.v1.1"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.resolve()

In the example above, MINORg will design gRNA for 3 highly conserved homologues in 3 different species. Take care that any gene IDs you use are either unique across all reference genomes OR shared only among your target genes. Otherwise, MINORg will also treat any undesired genes with the same gene IDs as targets.
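
One way to check for such clashes beforehand is to compare the gene IDs in each annotation. The sketch below assumes you have already collected the gene IDs of each reference into a set (how you parse them is up to you); the helper shared_gene_ids and the ID geneX are hypothetical.

```python
from collections import defaultdict


def shared_gene_ids(gene_ids_by_reference: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each gene ID found in more than one reference to the references containing it."""
    seen = defaultdict(list)
    for alias in sorted(gene_ids_by_reference):
        for gene_id in gene_ids_by_reference[alias]:
            seen[gene_id].append(alias)
    return {gene_id: aliases for gene_id, aliases in seen.items() if len(aliases) > 1}


print(shared_gene_ids({
    "TAIR10": {"AT1G33560", "geneX"},
    "Araly2": {"AL1G47950.v2.1", "geneX"},
    "Araha1": {"Araha.3012s0003.v1.1"},
}))
```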

6.9.3. Non-standard reference

6.9.3.1. Non-standard genetic code

When using pssm_ids, users should ensure that the correct genetic code has been specified for reference genomes using the genetic_code keyword argument when adding reference genomes using add_reference(), as MINORg has to first translate CDS into peptides for domain search using RPS-BLAST. The default genetic code is the Standard Code. Please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for genetic code numbers and names.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_226_geneticcode")
>>> my_minorg.add_reference("/path/to/subset_ref_yeast_mt.fasta", "/path/to/subset_ref_yeast_mt.gff", alias = "yeast_mt", genetic_code = 3) ## specify genetic code here
>>> my_minorg.genes = ["gene-Q0275"]
>>> my_minorg.query_reference = True
>>> my_minorg.rpsblast = "/path/to/rpsblast/executable"
>>> my_minorg.db = "/path/to/rpsblast/db"
>>> my_minorg.pssm_ids = ["366140"]
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, the gene ‘gene-Q0275’ is a yeast mitochondrial gene, and my_minorg.pssm_ids = ["366140"] specifies the PSSM-Id for the COX3 domain in the Cdd v3.18 RPS-BLAST database. The genetic code number for yeast mitochondrial code is ‘3’.

As a failsafe, MINORg does not terminate translated peptide sequences at the first stop codon. This ensures that any codons after an incorrectly translated premature stop codon will still be translated. Typically, a handful of mistranslated codons can still result in the correct RPS-BLAST domain hits, although hit scores may be slightly lower. Nevertheless, to ensure maximum accuracy, the correct genetic code is preferred.
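
To illustrate why the genetic code matters, the sketch below translates a short, made-up CDS under the Standard Code (table 1) and the Yeast Mitochondrial Code (table 3), using the 64-letter amino-acid strings from NCBI’s genetic code tables. Like the behaviour described above, it does not stop at the first stop codon. It is a stand-alone illustration, not MINORg’s translation code.

```python
# NCBI translation tables 1 (Standard) and 3 (Yeast Mitochondrial) as 64-letter
# strings in TCAG codon order; '*' marks a stop codon.
BASES = "TCAG"
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    3: "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def translate(cds: str, table: int = 1) -> str:
    """Translate every complete codon, continuing past stop codons ('*')."""
    amino_acids = TABLES[table]
    cds = cds.upper().replace("U", "T")
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if any(base not in BASES for base in codon):
            peptide.append("X")  # ambiguous codon
            continue
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        peptide.append(amino_acids[index])
    return "".join(peptide)


print(translate("ATGTGAATA", table=1))  # TGA read as a stop under the Standard Code
print(translate("ATGTGAATA", table=3))  # TGA read as tryptophan under table 3
```

Under table 1 the TGA codon becomes a '*' in the middle of the peptide, whereas under table 3 it encodes tryptophan; continuing past '*' keeps the rest of the peptide available for the domain search.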

6.9.3.2. Non-standard GFF3 attribute field names

See also: Attribute modification

MINORg requires standard attribute field names in GFF3 files in order to properly map subfeatures to their parent features (e.g. map CDS to mRNA, and mRNA to gene). Non-standard field names should be mapped to standard ones using the attr_mod (for ‘attribute modification’) keyword argument when adding reference genomes using add_reference().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_227_attrmod")
>>> my_minorg.add_reference("/path/to/subset_ref_irgsp.fasta", "/path/to/subset_ref_irgsp.gff", alias = "irgsp", attr_mod = {"mRNA": {"Parent": "Locus_id"}}) ## specify attribute modifications
>>> my_minorg.genes = ["Os01g0100100"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.resolve()

The IRGSP 1.0 reference genome for rice (Oryza sativa subsp. japonica cv. Nipponbare) uses a non-standard attribute field name for mRNA entries in its GFF3 file. Instead of ‘Parent’, which is the standard name of the field used to map a feature to its parent feature, mRNA entries in the IRGSP 1.0 annotation use ‘Locus_id’. See Attribute modification for more details on how to format the input to attr_mod.
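
As a hypothetical sketch of what this mapping expresses (not MINORg’s GFF3 parser), the code below reads an mRNA attribute column of the kind described and fills in the standard Parent field from Locus_id. The attribute string is illustrative, and the reading of attr_mod as {feature type: {standard name: non-standard name}} is inferred from the example above; see Attribute modification for the authoritative format.

```python
def parse_attributes(column9: str) -> dict[str, str]:
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def apply_attr_mod(feature_type: str, attributes: dict[str, str],
                   attr_mod: dict[str, dict[str, str]]) -> dict[str, str]:
    """Copy values from non-standard field names into the standard names given by attr_mod."""
    updated = dict(attributes)
    for standard, non_standard in attr_mod.get(feature_type, {}).items():
        if non_standard in updated and standard not in updated:
            updated[standard] = updated[non_standard]
    return updated


attr_mod = {"mRNA": {"Parent": "Locus_id"}}
mrna = parse_attributes("ID=Os01t0100100-01;Locus_id=Os01g0100100")  # illustrative entry
print(apply_attr_mod("mRNA", mrna, attr_mod)["Parent"])
```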

6.10. Multithreading

MINORg supports multithreading in order to process files in parallel. Any excess threads may also be used for BLAST. This is most useful when you are querying multiple genomes, or have multiple reference genomes or multiple background sequences.

NOTE for Docker users: Multithreading for parallel querying of multiple genomes and backgrounds is DISABLED for Docker distributions due to incompatibilities.

To run MINORg with parallel processing, set thread to the desired number of threads.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_228_thread")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.thread = 2
>>> my_minorg.full()
>>> my_minorg.resolve()
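
The benefit comes from work that can be done independently for each file, such as searching each query genome. As a generic, hypothetical sketch of that pattern using Python’s standard library (MINORg’s own scheduling is not shown here and may differ), independent per-genome tasks can be spread over a pool of threads:

```python
from concurrent.futures import ThreadPoolExecutor


def screen_genome(alias: str) -> str:
    """Hypothetical stand-in for independent per-genome work, such as one BLAST search."""
    return f"screened {alias}"


def screen_all(aliases: list[str], thread: int = 2) -> list[str]:
    # Each genome is handled independently, so the work can be spread over `thread` workers.
    with ThreadPoolExecutor(max_workers=thread) as pool:
        return list(pool.map(screen_genome, aliases))  # results keep the input order


print(screen_all(["9654", "9655"], thread=2))
```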

6.11. Differences between CLI and Python versions

Note that, unlike the command line, the Python package does not support aliases even if the config file has been set up appropriately for command line executions. Therefore, there are no true equivalents to --cluster, --indv, or --reference.

6.11.1. To specify cluster genes

Analogous to --cluster and --gene.

Correct:

>>> my_minorg.genes = ['AT5G46260','AT5G46270','AT5G46450','AT5G46470','AT5G46490','AT5G46510','AT5G46520']

Incorrect:

>>> my_minorg.cluster_set = '/path/to/subset_cluster_mapping.txt'
>>> my_minorg.cluster = 'RPS6'

Attributes ‘cluster_set’ and ‘cluster’ do not exist. Setting them does not throw an error immediately but will cause problems later.

6.11.2. To specify query FASTA files

Analogous to --indv and --query.

Correct:

>>> my_minorg.add_query('/path/to/subset_9654.fasta', alias = '9654')
>>> my_minorg.add_query('/path/to/subset_9655.fasta', alias = '9655')

Incorrect:

>>> my_minorg.genome_set = '/path/to/subset_genome_mapping.txt'
>>> my_minorg.indv = '9654,9655'

Attributes ‘genome_set’ and ‘indv’ do not exist. Setting them does not throw an error immediately but will cause problems later.

6.11.3. To specify reference genomes

Analogous to --reference, --assembly, --annotation, --attr-mod, and --genetic-code.

Correct:

>>> my_minorg.add_reference('/path/to/TAIR10.fasta', '/path/to/TAIR10.gff3', alias = 'TAIR10', genetic_code = 1, attr_mod = {})

Note that attr_mod and genetic_code are optional if the annotation uses standard attribute field names and the standard genetic code, which the example above does.

Incorrect:

>>> my_minorg.reference_set = '/path/to/arabidopsis_genomes.txt'
>>> my_minorg.reference = 'TAIR10'
AttributeError: can't set attribute

Attribute ‘reference_set’ does not exist, and ‘reference’ is a property that users are not allowed to modify directly.
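
Both behaviours are standard Python. Assigning to a name that an ordinary object does not define simply creates a new attribute, whereas assigning to a property without a setter raises AttributeError. The toy class below (hypothetical, not MINORg’s class) reproduces both; the exact error message varies between Python versions.

```python
class Example:
    """Toy stand-in for an object with a read-only property (hypothetical)."""

    def __init__(self) -> None:
        self._reference = {}

    @property
    def reference(self) -> dict:
        return self._reference  # no setter is defined, so assignment fails


obj = Example()
obj.cluster = "RPS6"  # silently creates a brand-new attribute that nothing reads

try:
    obj.reference = "TAIR10"
    reference_was_set = True
except AttributeError as err:
    reference_was_set = False
    print(f"AttributeError: {err}")
```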