6. Tutorial (Python)

Please download the files in https://github.com/rlrq/MINORg/tree/master/examples. In all the examples below, you should replace “/path/to” with the appropriate full path, which is usually the path to the directory containing these example files.

6.1. Setting up the tutorial

To ensure that the examples in this tutorial work, please replace ‘/path/to’ in the files ‘arabidopsis_genomes.txt’, ‘athaliana_genomes.txt’, and ‘subset_genome_mapping.txt’ with the full path to the directory containing the example files.
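
If you would rather not edit the three files by hand, the replacement can be scripted. The sketch below is only one possible approach: fill_in_paths is a hypothetical helper (it is not part of MINORg), and it assumes that all three files sit in the same directory as the other example files.

```python
from pathlib import Path

# Files named in this section that contain the '/path/to' placeholder.
MAPPING_FILES = (
    "arabidopsis_genomes.txt",
    "athaliana_genomes.txt",
    "subset_genome_mapping.txt",
)

def fill_in_paths(example_dir, placeholder="/path/to"):
    """Replace the placeholder with the absolute path of example_dir.

    Hypothetical helper, not part of MINORg. Returns the names of the
    files that were rewritten.
    """
    example_dir = Path(example_dir).resolve()
    rewritten = []
    for name in MAPPING_FILES:
        path = example_dir / name
        if not path.is_file():
            continue  # skip files that have not been downloaded
        text = path.read_text()
        if placeholder in text:
            path.write_text(text.replace(placeholder, str(example_dir)))
            rewritten.append(name)
    return rewritten
```

Calling fill_in_paths("/path/to") with the real location of the example directory rewrites the files in place, so keep a copy if you want to preserve the originals.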

6.2. Getting started

To begin, import the MINORg class.

>>> from minorg.MINORg import MINORg

To create a MINORg object:

>>> my_minorg = MINORg(directory = "/path/to/output/directory", prefix = "prefix")

Both directory and prefix are optional. If not provided, they will default to the current directory and ‘minorg’ respectively. If the directory does not currently exist, it will be created.

If you wish to use the default values specified in a config file, use this instead:

>>> my_minorg = MINORg(config = "/path/to/config.ini", directory = "/path/to/output/directory", prefix = "prefix")

You may now set your parameters using the attributes of your MINORg object. For a table listing the equivalent CLI arguments and MINORg attributes, see CLI vs Python.

6.3. IMPT: Note on executables

See: Executables

You can specify executables as such:

>>> my_minorg.blastn = '/path/to/blastn/executable'
>>> my_minorg.rpsblast = '/path/to/rpsblast/executable'
>>> my_minorg.mafft = '/path/to/mafft/executable'
>>> my_minorg.bedtools = '/path/to/bedtools2/bin'

Note that BEDTools is unique in that if it is not in your command-search path, you should provide the path TO THE DIRECTORY CONTAINING ITS EXECUTABLES (i.e. there will not be a single ‘bedtools’ executable), and if it IS in your command-search path you SHOULD NOT be using the bedtools attribute. See Executables for more on executables.
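
To decide which of these attributes you actually need to set, it can help to check which executables are already in your command-search path. The sketch below uses only the Python standard library; find_on_path is a hypothetical helper, and the executable names are assumed to match those on your system.

```python
import shutil

def find_on_path(names=("blastn", "rpsblast", "mafft", "bedtools")):
    """Map each executable name to its full path, or None if not found."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, location in find_on_path().items():
        # An executable that is not found needs its attribute set manually.
        status = location if location else "not found; set the attribute"
        print(f"{name}: {status}")
```

If ‘bedtools’ is reported as not found, remember that the bedtools attribute expects the directory containing the BEDTools executables rather than the path to a single executable.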

6.4. Defining target sequences

6.4.1. User-provided targets

Let us begin with the simplest MINORg execution:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_200_target")
>>> my_minorg.target = "/path/to/sample_CDS.fasta"
>>> my_minorg.full()
>>> my_minorg.resolve()

The above combination of arguments tells MINORg to generate gRNA from targets in a user-provided FASTA file (my_minorg.target = "/path/to/sample_CDS.fasta") and to output files into the directory /path/to/example_200_target. By default, MINORg generates 20 bp gRNA using an NGG PAM. The full MINORg programme is executed by calling the full() method (my_minorg.full()). Don’t forget to call resolve() to remove any temporary files.

6.4.2. Reference gene(s) as targets

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_201_refgene")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050", "AT5G45060", "AT5G45200", "AT5G45210", "AT5G45220", "AT5G45230", "AT5G45240", "AT5G45250"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10") is used to specify information about a reference genome:

Positional argument 1: path to reference assembly (In this case "/path/to/subset_ref_TAIR10.fasta")
Positional argument 2: path to reference annotation (In this case "/path/to/subset_ref_TAIR10.gff")
Optional keyword argument 1 (alias): genome alias (in this case "TAIR10"); a unique name for the reference genome, used when referring to it in sequence names and output files. Autogenerated by MINORg if not provided.
See add_reference() and Non-standard reference for how to specify genetic code and non-standard attribute field names

my_minorg.genes = ["AT5G45050", "AT5G45060", "AT5G45200", "AT5G45210", "AT5G45220", "AT5G45230", "AT5G45240", "AT5G45250"] tells MINORg the target gene(s), and my_minorg.query_reference = True tells MINORg to generate gRNA for reference gene(s).

6.4.3. Non-reference gene(s) as targets

6.4.3.1. Extending the reference

6.4.3.2. Inferring homologues in unannotated genomes

If you would like MINORg to infer homologues in non-reference genomes, you can use add_query() to specify the FASTA files of those non-reference genomes.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_203_query")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654") and my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655") are used to specify information about query FASTA files.

The alias keyword argument is optional. If not provided, MINORg will generate a unique alias.
Query FASTA files are stored as a dictionary with the format {<alias>:<FASTA>} at query.
If you’d like to remove a query file that you’ve added, you can use:
```
>>> my_minorg.remove_query("9654")
```
- The remove_query() method takes a query alias. If you did not specify an alias when using add_query() and do not know the alias of the file you wish to remove, you may view the query-FASTA mapping using the query attribute.
```
>>> my_minorg.query
{"9654": "/path/to/subset_9654.fasta", "9655": "/path/to/subset_9655.fasta"}
```

6.4.4. Domain as targets

MINORg allows users to specify the identifier of an RPS-BLAST position-specific scoring matrix (PSSM-Id) to further restrict the target sequence to a given domain associated with the PSSM-Id. This could be particularly useful when designing gRNA for genes that do not share conserved domain structures but do share a domain that you wish to knock out.

6.4.4.1. Local database

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_204_domain")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.rpsblast = "/path/to/rpsblast/executable"
>>> my_minorg.db = "/path/to/rpsblast/db"
>>> my_minorg.pssm_ids = ["214815"]
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, gRNA will be generated for the WRKY domain (PSSM-Id 214815 as of CDD database v3.18) of the gene AT5G45050. Users are responsible for providing the PSSM-Id of a domain that exists in the gene. If multiple PSSM-Ids are provided, overlapping domains will be combined and output WILL NOT distinguish between one PSSM-Id or another. Unlike other examples, the database (db) is not provided as part of the example files. If you are using the full Docker image pulled from rlrq/minorg, the database is bundled with the image. Otherwise, you will have to download it yourself. See RPS-BLAST local database for more information.
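
To illustrate what combining overlapping domains means for the output, here is a minimal sketch of merging overlapping ranges. It is not MINORg's implementation: merge_overlapping is a hypothetical function, the coordinates are made up, ranges are assumed to be end-exclusive, and keeping merely adjacent ranges separate is an assumption.

```python
def merge_overlapping(ranges):
    """Merge overlapping (start, end) ranges; end is exclusive.

    Ranges that merely touch, e.g. (10, 50) and (50, 80), are kept
    separate. That choice is an assumption made for this illustration.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            # Overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two overlapping domain hits (possibly from different PSSM-Ids) and one
# separate hit become two ranges, with no record of which PSSM-Id
# contributed which part.
print(merge_overlapping([(40, 80), (10, 50), (120, 150)]))
# → [(10, 80), (120, 150)]
```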

6.4.4.2. Remote database

While it is in theory possible to use the remote CDD database & servers instead of local ones, the -remote option for the ‘rpsblast’/‘rpsblast+’ command from the BLAST+ package has never worked for me. In any case, if your local version of rpsblast is able to access the remote database, you can use remote_rps.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_204_domain")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.rpsblast = "/path/to/rpsblast/executable"
>>> my_minorg.db = "Cdd"
>>> my_minorg.remote_rps = True
>>> my_minorg.pssm_ids = ["214815"]
>>> my_minorg.full()
>>> my_minorg.resolve()

6.5. Defining gRNA

6.6. Filtering gRNA

MINORg supports 3 different gRNA filtering options, all of which can be used together.

6.6.1. Filter by GC content

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_206_gc")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.gc_min = 0.2
>>> my_minorg.gc_max = 0.8
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, MINORg will exclude gRNA with less than 20% (my_minorg.gc_min = 0.2) or greater than 80% (my_minorg.gc_max = 0.8) GC content. By default, minimum GC content is 30% and maximum is 70%.
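
As a rough illustration of what a GC content filter computes, consider the sketch below. It is not MINORg's own code: gc_content and passes_gc are hypothetical functions, and treating both bounds as inclusive is inferred from the wording "less than" and "greater than" above.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def passes_gc(seq, gc_min=0.3, gc_max=0.7):
    """True if GC content lies within [gc_min, gc_max], bounds included."""
    return gc_min <= gc_content(seq) <= gc_max

# A 20 bp sequence with 7 G/C bases has 35% GC content.
print(passes_gc("CTTCATCTTCTTCTCGAAAT"))            # → True
print(passes_gc("CTTCATCTTCTTCTCGAAAT", 0.4, 0.6))  # → False
```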

6.6.2. Filter by off-target

See: Off-target assessment

6.6.2.1. Using total mismatch/gap/unaligned

See: Total mismatch/gap/unaligned

Thresholds for total number of mismatches or gaps (and unaligned positions) required for an off-target gRNA hit to be considered non-problematic are controlled by ot_mismatch and ot_gap respectively. See Total mismatch/gap/unaligned for more.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_207_ot_ref")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_gap = 2
>>> my_minorg.ot_mismatch = 2
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, MINORg will screen gRNA for off-targets in:

The reference genome (my_minorg.screen_reference)
Four different FASTA files (my_minorg.add_background("<FASTA>", alias = "<alias>"))
- The alias keyword argument is optional. If not provided, MINORg will generate a unique alias.
- Note that any AT5G45050 homologues in these four FASTA files will NOT be masked. This means that only gRNA that do not target any AT5G45050 homologues in these four genomes will pass this off-target check.
  - To mask homologues in these genomes, you will need to provide a FASTA file containing the sequences of their homologues using my_minorg.mask = ["/path/to/to_mask_1.fasta", "/path/to/to_mask_2.fasta"]. You may use subcommand seq() (see Subcommands) to identify these homologues and retrieve their sequences.

ot_gap and ot_mismatch control the minimum number of gaps or mismatches off-target gRNA hits must have to be considered non-problematic; any gRNA with at least one problematic gRNA hit will be excluded. By default, both values are set to ‘1’. See Off-target assessment for more on the off-target assessment algorithm.

In the case above, my_minorg.screen_reference = True is actually redundant as the genome(s) from which targets are obtained (which, because of my_minorg.query_reference, is the reference genome) are automatically included for background check.

However, in the example below, where the targets are from non-reference genomes, the reference genome is not automatically included for off-target assessment and thus screen_reference is NOT redundant. Additionally, do note that the genes specified using genes are masked in the reference genome, such that any gRNA hits to them are NOT considered off-target and will NOT be excluded.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_208_ot_nonref")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_gap = 2
>>> my_minorg.ot_mismatch = 2
>>> my_minorg.full()
>>> my_minorg.resolve()

6.6.2.2. Using position-specific mismatch/gap/unaligned

See: Position-specific mismatch/gap/unaligned

Finer control of off-target definition can be achieved using ot_pattern, which allows users to provide a pattern that specifies different thresholds for different positions along a gRNA. Unlike ot_mismatch and ot_gap, which specify the LOWER-bound of NON-problematic hits, ot_pattern specifies the UPPER-bound of PROBLEMATIC hits. By default, unaligned positions will be treated as mismatches, but this behaviour can be altered by setting ot_unaligned_as_mismatch to False. See Off-target pattern for how to build an off-target pattern, and Position-specific mismatch/gap/unaligned for more on how unaligned positions can be counted.

When ot_pattern is specified, ot_mismatch and ot_gap will be ignored.

The following example is identical to the first in Using total mismatch/gap/unaligned, except ot_mismatch and ot_gap are replaced with ot_pattern.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_209_ot_ref_pattern")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_pattern = "0mg-10,1mg-11-"
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, my_minorg.ot_pattern = "0mg-10,1mg-11-" means that MINORg will discard any gRNA with at least one off-target hit where:

There are no mismatches or gaps between positions -10 and -1, and there are no more than 1 mismatch or gap from position -11 to the 5’ end.

See Off-target pattern for how to build and interpret an off-target pattern.

6.6.2.3. PAM-less off-target check

By default, MINORg does NOT check for the presence of PAM sites next to potential off-target hits. You may override this behaviour by setting ot_pamless to False. This tells MINORg to mark off-target hits that fail the ot_gap or ot_mismatch thresholds (or match ot_pattern) as problematic ONLY IF there is a PAM site nearby.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_210_ot_pamless")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.screen_reference = True
>>> my_minorg.add_background("/path/to/subset_ref_Araly2.fasta", alias = "araly")
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha")
>>> my_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.ot_gap = 2
>>> my_minorg.ot_mismatch = 2
>>> my_minorg.ot_pamless = True
>>> my_minorg.full()
>>> my_minorg.valid_grna("background") ## gRNA that pass background filtering
gRNAHits(gRNA = 6)
>>> my_minorg.ot_pamless = False ## only remove gRNA from candidates if off-target hits have PAM site nearby
>>> my_minorg.full()
>>> my_minorg.valid_grna("background")
gRNAHits(gRNA = 12)
>>> my_minorg.resolve()

6.6.2.4. Skip off-target check

To skip off-target check entirely, use background_check = False when calling full().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_211_skipbgcheck")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.full(background_check = False)
(several warning messages about the background check being unset will pop up, but you can ignore them)
>>> my_minorg.resolve()

6.6.3. Filter by feature

See: Within-feature inference

By default, when genes is set, MINORg restricts gRNA to coding regions (CDS). For more on how MINORg does this for inferred, unannotated homologues, see Within-feature inference. You may change the feature type in which to design gRNA using the attribute feature. See column 3 of your GFF3 file for valid feature types (see https://en.wikipedia.org/wiki/General_feature_format for more on GFF file format).

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_212_withinfeature")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.query_reference = True
>>> my_minorg.feature = "three_prime_UTR"
>>> my_minorg.full(background_check = False)
>>> my_minorg.resolve()

6.7. Generating minimum gRNA set(s)

6.7.1. Number of sets

By default, MINORg outputs a single gRNA set covering all targets. You may request more (mutually exclusive) sets using the sets attribute.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_213_set")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.query_reference = True
>>> my_minorg.sets = 5
>>> my_minorg.full()
>>> my_minorg.resolve()

6.7.2. Prioritise non-redundancy

By default, MINORg selects gRNA for sets using these criteria in decreasing order of priority:

Coverage (of as yet uncovered targets)
Proximity to 5’ end
Non-redundancy

Proximity is only assessed when there is a tie for coverage, and non-redundancy when there is a tie for both coverage and proximity. You may instead prioritise non-redundancy over proximity by setting prioritise_nr to True. MINORg will use a combination of approximate and optimal weighted set cover algorithms to output small sets with low redundancy. However, do note that the sets will in general be larger than when prioritise_nr is False.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_214_nr")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.query_reference = True
>>> my_minorg.prioritise_nr = True
>>> my_minorg.full()
>>> my_minorg.resolve()
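
The effect of swapping the tie-breaking order can be sketched with a plain greedy set cover. This is only an illustration of the priorities listed in this section, not the combination of approximate and optimal weighted set cover algorithms that MINORg uses. The function greedy_set, the single distance-from-5’-end value per gRNA, and the toy candidates are all assumptions.

```python
def greedy_set(candidates, targets, prioritise_nr=False):
    """Pick gRNA greedily until all targets are covered.

    candidates maps a gRNA id to (set of covered targets, distance from
    the 5' end). Simplified illustration only.
    """
    uncovered = set(targets)
    covered = set()
    chosen = []

    def rank(grna_id):
        hits, distance = candidates[grna_id]
        new = len(hits & uncovered)      # more newly covered is better
        redundant = len(hits & covered)  # fewer re-covered is better
        if prioritise_nr:
            return (-new, redundant, distance)
        return (-new, distance, redundant)

    while uncovered:
        remaining = [g for g in candidates if g not in chosen]
        if not remaining:
            break
        best = min(remaining, key=rank)
        if not candidates[best][0] & uncovered:
            break  # no remaining gRNA covers an uncovered target
        chosen.append(best)
        covered |= candidates[best][0] & set(targets)
        uncovered -= candidates[best][0]
    return chosen

toy = {
    "g1": ({"A", "B"}, 5),
    "g2": ({"B", "C"}, 10),  # closer to the 5' end, but re-covers B
    "g3": ({"C"}, 30),       # further away, but not redundant
}
print(greedy_set(toy, {"A", "B", "C"}))                      # → ['g1', 'g2']
print(greedy_set(toy, {"A", "B", "C"}, prioritise_nr=True))  # → ['g1', 'g3']
```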

6.7.3. Excluding gRNA

You may specify gRNA sequences to exclude from any final gRNA set by setting the exclude attribute to the path of a FASTA file containing the sequences to exclude.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_215_exclude")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.query_reference = True
>>> my_minorg.exclude = "/path/to/sample_exclude_RPS6.fasta"
>>> my_minorg.full()
>>> my_minorg.resolve()

The gRNA names in the file passed to exclude do not matter. Only the sequences are used when determining whether to exclude a gRNA.
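
The idea of excluding by sequence rather than by name can be sketched as follows. This is not MINORg's code: read_fasta_sequences and drop_excluded are hypothetical helpers, and comparing sequences case-insensitively is an assumption.

```python
import io

def read_fasta_sequences(handle):
    """Return the set of sequences in a FASTA file, ignoring record names."""
    sequences, current = set(), []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current:
                sequences.add("".join(current).upper())
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.add("".join(current).upper())
    return sequences

def drop_excluded(candidates, excluded):
    """Keep only candidates (name -> sequence) whose sequence is not excluded."""
    return {name: seq for name, seq in candidates.items()
            if seq.upper() not in excluded}

# The record name 'anything' plays no role; only the sequence is compared.
excluded = read_fasta_sequences(io.StringIO(">anything\nCTTCATCTTCTTCTCGAAAT\n"))
candidates = {"gRNA_001": "CTTCATCTTCTTCTCGAAAT",
              "gRNA_002": "GATGTTTTCTTGAGCTTCAG"}
print(drop_excluded(candidates, excluded))
# → {'gRNA_002': 'GATGTTTTCTTGAGCTTCAG'}
```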

6.7.4. Accepting unknown checks

Sometimes, not all filtering checks (GC, background, and feature) are set for all sequences. This is not an issue if you use the full programme (i.e. full()), but may be relevant if you are re-generating sets using the ‘minimumset’ subcommand (i.e. minimumset()) with a modified mapping file OR a mapping file from the ‘filter’ subcommand where not all filters have been applied.

Let us take a look at ‘sample_custom_check.map’, where we’ve added a custom check called ‘my_custom_check’ in the last column:

gRNA id       gRNA sequence   target id       target sense    gRNA strand     start   end     group   background      GC      feature my_custom_check
gRNA_001      CTTCATCTTCTTCTCGAAAT    targetA NA      +       8       27      1       pass    pass    NA      pass
gRNA_001      CTTCATCTTCTTCTCGAAAT    targetB NA      +       80      99      1       pass    pass    NA      pass
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetA NA      +       37      56      1       pass    pass    NA      NA
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetB NA      +       286     305     1       pass    pass    NA      pass
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetC NA      +       109     128     1       pass    pass    NA      fail
gRNA_002      GATGTTTTCTTGAGCTTCAG    targetD NA      +       110     129     1       pass    pass    NA      fail
gRNA_003      ATGTTTTCTTGAGCTTCAGA    targetB NA      +       38      57      1       pass    pass    NA      NA
gRNA_003      ATGTTTTCTTGAGCTTCAGA    targetC NA      +       287     306     1       pass    pass    NA      pass
gRNA_003      ATGTTTTCTTGAGCTTCAGA    targetD NA      +       110     129     1       pass    pass    NA      pass

There are three possible values for check status: ‘pass’, ‘fail’, and ‘NA’.

An invalid/unset check is an ‘NA’. If a check is unset for all entries (as is the case with the check ‘feature’ here), it will be ignored (i.e. the check is treated as ‘pass’ for all entries). However, when a check has been set for some entries but not others (as is the case with the ‘my_custom_check’ check here), MINORg will treat invalid/unset checks as ‘fail’ by default. This is because there isn’t enough information on whether this constitutes a pass or fail for the check, and MINORg prefers to be conservative when outputting gRNA. You may override this behaviour by setting accept_invalid to True. By doing so, MINORg will treat ‘NA’ as ‘pass’ for all checks.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_216_acceptinvalid")
>>> my_minorg.parse_grna_map_from_file("/path/to/sample_custom_check.map")
>>> my_minorg.accept_invalid = True
>>> my_minorg.minimumset()
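
The rules for ‘NA’ described in this section can be summarised in a few lines. The sketch below is an illustration only (effective_statuses is a hypothetical function, not a MINORg method); it takes the statuses of one check across all entries of a mapping file.

```python
def effective_statuses(statuses, accept_invalid=False):
    """Resolve 'NA' for one check across all entries of a mapping file.

    - Unset ('NA') for every entry: the check is ignored, i.e. all 'pass'.
    - Otherwise 'NA' counts as 'fail', or as 'pass' if accept_invalid is True.
    """
    if all(status == "NA" for status in statuses):
        return ["pass"] * len(statuses)
    replacement = "pass" if accept_invalid else "fail"
    return [replacement if status == "NA" else status for status in statuses]

# Columns 'feature' and 'my_custom_check' from 'sample_custom_check.map'.
feature = ["NA"] * 9
custom = ["pass", "pass", "NA", "pass", "fail", "fail", "NA", "pass", "pass"]

print(effective_statuses(feature).count("pass"))                     # → 9
print(effective_statuses(custom).count("fail"))                      # → 4
print(effective_statuses(custom, accept_invalid=True).count("fail")) # → 2
```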

6.7.5. Manually approve gRNA sets

You may opt to manually inspect each gRNA set before MINORg writes them to file by using manual = True when executing full() or the minimum set subcommand minimumset().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_217_manual")
>>> my_minorg.target = "/path/to/sample_CDS.fasta"
>>> my_minorg.full(manual = True)

        ID   sequence (Set 1)
        gRNA_001     GGAATACAAGAGATTATCGA
Hit 'x' to continue if you are satisfied with these sequences. Otherwise, enter the sequence ID or
sequence of an undesirable gRNA (case-sensitive) and hit the return key to update this list: x

Final gRNA sequence(s) have been written to minorg_gRNA_final.fasta
Final gRNA sequence ID(s), gRNA sequence(s), and target(s) have been written to minorg_gRNA_final.map

1 mutually exclusive gRNA set(s) requested. 1 set(s) found.
Output files have been generated in /path/to/example_217_manual

6.8. Subcommands

MINORg comprises four main steps:

Target sequence identification
Candidate gRNA generation
gRNA filtering
Minimum gRNA set generation

As users may only wish to execute a subset of these steps instead of the full programme (full()), MINORg also provides four subcommands (methods) corresponding to these four steps:

seq()
grna()
filter(), which itself calls three other methods
minimumset()

The subcommands may be useful if you already have a preferred off-target/on-target assessment software. In this case, you may execute subcommands seq() and grna(), submit the gRNA output by MINORg for off-target/on-target assessment, update the .map file output by MINORg with the status of each gRNA for that off-target/on-target assessment, and execute minimumset() to obtain a desired number of minimum gRNA sets. Note that if you do this, you should re-read the updated .map file into MINORg using parse_grna_map_from_file() so MINORg can replace the gRNA data stored in memory with your updated gRNA data.

Each subcommand may require a different combination of attributes.
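
Updating a .map file with the results of an external assessment can be done with the standard csv module. The sketch below makes several assumptions: the .map file is tab-separated with a header row containing a ‘gRNA sequence’ column (as in ‘sample_custom_check.map’), and the external tool reports one status per gRNA sequence. add_check_column is a hypothetical helper, not a MINORg method.

```python
import csv
import io

def add_check_column(map_text, check_name, status_by_sequence):
    """Append a check column to the text of a tab-separated .map file.

    status_by_sequence maps a gRNA sequence to 'pass' or 'fail';
    sequences missing from it are recorded as 'NA'.
    """
    reader = csv.DictReader(io.StringIO(map_text), delimiter="\t")
    if check_name in reader.fieldnames:
        raise ValueError(f"column {check_name!r} already exists")
    fieldnames = list(reader.fieldnames) + [check_name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[check_name] = status_by_sequence.get(row["gRNA sequence"], "NA")
        writer.writerow(row)
    return out.getvalue()
```

The returned text can be written to a new file, which you can then read back into MINORg with parse_grna_map_from_file() before calling minimumset().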

6.8.1. Subcommand `seq()`

The seq() subcommand identifies target sequences, whether by extracting them from a reference genome or inferring homologues in unannotated genomes. All parameters introduced in Defining target sequences (except attribute target) and Defining reference genomes apply. If you already have a FASTA file containing your target sequences, you may set target to the path of that FASTA file and skip this subcommand.

This step will output target sequences into a file ending with ‘_targets.fasta’. This filename will be stored at attribute target.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_218_subcmdseq")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.seq()
>>> my_minorg.target
'/path/to/example_218_subcmdseq/minorg/minorg_gene_targets.fasta'

6.8.2. Subcommand `grna()`

The grna() subcommand generates gRNA within target sequences from a target file. Unlike the command line version, it DOES NOT incorporate parts of the seq() and filter() subcommands. All parameters introduced in Defining gRNA apply.

By default, the .map and FASTA files of gRNA sequences are written automatically. You may override this behaviour by setting auto_update_files to False, or by passing auto_update_files = False when instantiating a MINORg object (e.g. my_minorg = MINORg(directory = "/path/to/output/dir", auto_update_files = False)). In this case, only the FASTA file will be written. To write files manually, use the following methods. If you do not supply an output file path, one will be generated automatically:

write_all_grna_map(): write .map file containing all candidate gRNA (no checks will be set by grna() so all entries in check fields will be ‘NA’)
- Path to output file will be stored at grna_map
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.map
write_all_grna_fasta(): write FASTA file containing all candidate gRNA
- Path to output file will be stored at grna_fasta
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.fasta

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_219_subcmdgrna")
>>> my_minorg.target = "/path/to/sample_CDS.fasta"
>>> my_minorg.grna() ## default 3' NGG PAM
PAM pattern: .{20}(?=[GATC]GG)
>>> my_minorg.grna_hits
gRNAHits(gRNA = 201)
>>> from minorg import pam
>>> my_minorg.pam = pam.Cas12a ## 5' TTTV PAM
>>> my_minorg.grna() ## regenerate gRNA
PAM pattern: (?<=TTT[ACG]).{20}
>>> my_minorg.grna_hits
gRNAHits(gRNA = 95)
>>> my_minorg.pam = "ATV."
>>> my_minorg.grna() ## regenerate gRNA
PAM pattern: (?<=AT[ACG]).{20}
>>> my_minorg.grna_hits
gRNAHits(gRNA = 267)
>>> my_minorg.write_all_grna_fasta()
>>> my_minorg.grna_fasta
'/path/to/example_219_subcmdgrna/minorg/minorg_gRNA_all.fasta'
>>> my_minorg.write_all_grna_fasta("/path/to/another/location.fasta")
>>> my_minorg.grna_fasta
'/path/to/another/location.fasta'

gRNA data is stored in the attribute grna_hits, whose string representation shows the number of gRNA. In the above example, 201 different gRNA are generated from the target sequences in the target file “sample_CDS.fasta”. We then decide to generate gRNA for Cas12a instead, which has a 5’ TTTV PAM pattern. This yields 95 different gRNA. Finally, we try a completely made-up 5’ ATV PAM pattern, which yields 267 different gRNA. Satisfied, we write the sequences of these gRNA to file and print the path of the file.
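
The PAM patterns printed in the example above are ordinary regular expressions, so their behaviour can be explored with Python's re module. The sketch below is a simplified illustration, not MINORg's gRNA generation code: find_grna is a hypothetical function, it searches the forward strand only, and wrapping the pattern in a lookahead so that overlapping gRNA are reported is a choice made for this example.

```python
import re

SPCAS9 = r".{20}(?=[GATC]GG)"   # 3' NGG PAM, as printed in the example
CAS12A = r"(?<=TTT[ACG]).{20}"  # 5' TTTV PAM, as printed in the example

def find_grna(pattern, seq):
    """Return (start, gRNA) for every match, including overlapping ones."""
    seq = seq.upper()
    # A zero-width lookahead lets matches overlap; group 1 holds the gRNA.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq)]

# 'GGGG' after a 20 bp stretch provides two overlapping NGG sites.
print(find_grna(SPCAS9, "ACGT" * 5 + "GGGG"))
# → [(0, 'ACGTACGTACGTACGTACGT'), (1, 'CGTACGTACGTACGTACGTG')]

print(find_grna(CAS12A, "TTTA" + "C" * 20))
# → [(4, 'CCCCCCCCCCCCCCCCCCCC')]
```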

6.8.3. Subcommand `filter()`

The filter() subcommand takes in a compulsory MINORg .map file (which can be read using parse_grna_map_from_file()) and rewrites some/all checks. You can execute all filters (GC, off-target, and feature) using filter(), or execute checks separately using filter_gc(), filter_background(), and filter_feature().

By default, gRNA sequences and map files will be updated automatically whenever any of the filtering methods is called. You may override this behaviour by setting auto_update_files to False, or by passing auto_update_files = False when instantiating a MINORg object (e.g. my_minorg = MINORg(directory = "/path/to/output/dir", auto_update_files = False)). To write files manually, use the following methods. If you do not supply an output file path, one will be generated automatically:

write_all_grna_map(): write .map file containing all candidate gRNA and checks
- Path to output file will be stored at grna_map
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.map
write_all_grna_fasta(): write FASTA file containing all candidate gRNA
- Path to output file will be stored at grna_fasta
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_all.fasta
- This file will NOT be auto updated as it is not affected by filtering check status
write_pass_grna_map(): write .map file containing all passing gRNA
- Path to output file will be stored at pass_map
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_pass.map
write_pass_grna_fasta(): write FASTA file containing all passing gRNA
- Path to output file will be stored at pass_fasta
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_pass.fasta

In all cases, you may rename the gRNA using rename_grna(), which takes the path to a FASTA file in which each gRNA sequence you wish to rename has the new name as its sequence ID. This method should be used before you call any of the above methods to write gRNA to file.

6.8.3.1. Subcommand `filter_gc()`

All parameters introduced in Filter by GC content apply.

6.8.3.1.1. Filtering by GC content after calling `full()`

filter_gc() can be used on an active MINORg object even if you’ve already called full().

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_220_subcmdfilter_gc")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050", "AT5G45060", "AT5G45200", "AT5G45210", "AT5G45220", "AT5G45230", "AT5G45240", "AT5G45250"]
>>> my_minorg.query_reference = True
>>> my_minorg.full()
>>> my_minorg.grna_hits
gRNAHits(gRNA = 2141)
>>> my_minorg.valid_grna("GC") ## gRNA that pass GC filter
gRNAHits(gRNA = 1871)
>>> my_minorg.gc_min = 0.2
>>> my_minorg.gc_max = 0.8
>>> my_minorg.filter_gc() ## re-filter by GC content
>>> my_minorg.valid_grna("GC") ## gRNA that pass GC filter
gRNAHits(gRNA = 2097)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

6.8.3.1.2. Filtering GC content on output of another MINORg run

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_220_subcmdfilter_gc_pt2", auto_update_files = False)
>>> my_minorg.parse_grna_map_from_file("/path/to/sample_custom_check.map")
>>> my_minorg.valid_grna("GC")
gRNAHits(gRNA = 3)
>>> my_minorg.gc_min = 0.4
>>> my_minorg.gc_max = 0.6
>>> my_minorg.filter_gc()
>>> my_minorg.valid_grna("GC")
gRNAHits(gRNA = 1)
>>> my_minorg.write_pass_grna_fasta()
>>> my_minorg.resolve()

6.8.3.2. Subcommand `filter_background()`

All parameters introduced in Filter by off-target apply. Additionally, you should supply target sequences to target so that MINORg can mask them (this tells MINORg that any gRNA hits to them are in fact on-target and NOT off-target). Any additional sequences to be masked may be provided to mask as a list of paths to FASTA files. If you have set screen_reference to True to include reference genome(s) in the off-target screen (see Multiple reference genomes for how to specify multiple reference genomes), you may also add a FASTA file of the gene sequences to be masked to mask. You can generate these sequences using the seq() subcommand, but MAKE SURE TO USE A DIFFERENT MINORg OBJECT AND DIRECTORY TO AVOID OVERWRITING ANY PREVIOUSLY GENERATED FILES.
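
The idea behind masking can be sketched in a few lines: a gRNA hit in a background sequence only counts as off-target if it falls outside every masked region. The code below is a conceptual illustration with hypothetical names and made-up coordinates. MINORg itself works from FASTA files and alignments, so this is not its actual procedure.

```python
def is_masked(hit, masked_regions):
    """True if the hit lies entirely within a masked region on the same sequence."""
    seq_id, start, end = hit
    return any(seq_id == m_id and m_start <= start and end <= m_end
               for m_id, m_start, m_end in masked_regions)

def off_target_hits(hits, masked_regions):
    """Keep only the hits that are not covered by any masked region."""
    return [hit for hit in hits if not is_masked(hit, masked_regions)]

# Hypothetical coordinates: (sequence ID, start, end).
masked = [("Chr5", 1000, 5000)]
hits = [("Chr5", 1200, 1220), ("Chr5", 9000, 9020), ("Chr1", 1200, 1220)]
remaining = off_target_hits(hits, masked)
```

Here the first hit sits inside the masked region and is treated as on-target, while the other two would still count against the gRNA.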

6.8.3.2.1. Filtering background after calling `full()`

Let us first execute MINORg.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_221_subcmdfilter_bg")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.sets = 5
>>> my_minorg.full(background_check = False)

In the code above, we skipped the off-target check by passing background_check = False to full(). But we’ve changed our mind: we would now like to screen the reference genome and the non-reference genomes that these targets are from, AND we don’t want our gRNA to be able to target any genes in ‘subset_9944.fasta’ and ‘subset_9947.fasta’. We also want to tell MINORg that it’s okay if a gRNA has off-target effects in the homologous genes AT5G46260 and AT5G46270 in the reference genome. We can do that using the filter_background() subcommand, followed by the minimumset() subcommand to regenerate minimum sets.

In order to do all this, we will have to get the gene sequences of AT5G46260 and AT5G46270 in order to mask them in the reference genome. We can do this using the get_reference_seq() method.

>>> ot_minorg = MINORg(directory = "/path/to/example_221_subcmdfilter_bg_tomask") ## different directory
>>> ot_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> ot_minorg.genes = ["AT5G46260", "AT5G46270"]
>>> fout_to_mask = ot_minorg.mkfname("ref_to_mask.fasta") ## MINORg has a built-in method to generate file names within the output directory
>>> ot_minorg.get_reference_seq(fout = fout_to_mask) ## this method will return a dictionary of sequences, but will also write to file if 'fout' is used
>>> ot_minorg.resolve()

Now that we have the reference sequences to mask, we can pass the file name to my_minorg’s mask attribute, add our background files using add_background(), set screen_reference to True, call filter_background() to update off-target checks for all candidate gRNA, and execute minimumset() to regenerate our minimum gRNA sets. You may also wish to call write_all_grna_map(), write_pass_grna_map(), and/or write_pass_grna_fasta() to update the gRNA FASTA and .map files if auto_update_files has been set to False.

>>> my_minorg.mask.append(fout_to_mask)
>>> my_minorg.add_background("/path/to/subset_9944.fasta", alias = "9944")
>>> my_minorg.add_background("/path/to/subset_9947.fasta", alias = "9947")
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

6.8.3.2.2. Filtering background on output of another MINORg run

Alternatively, if the original my_minorg object no longer exists, whether because you’ve closed the IDE session or deleted the object, you can read its .map file into a new MINORg object using parse_grna_map_from_file() as shown below. In this case, you can pass the IDs of the additional genes to be masked together with the original genes to genes, so you don’t need to use get_reference_seq(). Since we’re no longer querying ‘subset_9654.fasta’ and ‘subset_9655.fasta’, we can use add_background() to tell MINORg to search for off-target effects in them. And don’t forget to also provide the FASTA file of target sequences to target so MINORg can mask them!

>>> from minorg.MINORg import MINORg
>>> new_minorg = MINORg(directory = "/path/to/example_221_subcmdfilter_bg_new")
>>> new_minorg.parse_grna_map_from_file("/path/to/example_221_subcmdfilter_bg/minorg/minorg_gRNA_all.map")
>>> new_minorg.target = "/path/to/example_221_subcmdfilter_bg/minorg/minorg_gene_targets.fasta"
>>> new_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> new_minorg.genes = ["AT5G46260", "AT5G46270", "AT5G46450", "AT5G46470", "AT5G46490", "AT5G46510", "AT5G46520"]
>>> new_minorg.add_background("/path/to/subset_9654.fasta", alias = "9654")
>>> new_minorg.add_background("/path/to/subset_9655.fasta", alias = "9655")
>>> new_minorg.add_background("/path/to/subset_9944.fasta", alias = "9944")
>>> new_minorg.add_background("/path/to/subset_9947.fasta", alias = "9947")
>>> new_minorg.screen_reference = True
>>> new_minorg.filter_background()
>>> new_minorg.minimumset()
>>> new_minorg.resolve()

6.8.3.3. Subcommand `filter_feature()`

All parameters introduced in Filter by feature apply. Additionally, you will need to provide a FASTA file of target sequences (attribute target), reference genome(s) (see Defining reference genomes), and genes (attribute genes). The specified reference gene(s) will be extracted from the reference genome(s) and aligned with target sequence(s) in order for MINORg to infer feature boundaries in target sequence(s). See Within-feature inference for the algorithm of how feature boundaries are inferred.

6.8.3.3.1. Filtering feature after calling `full()`

Let us first execute MINORg.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_222_subcmdfilter_feature")
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> my_minorg.genes = ["AT5G45050"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.full()
>>> my_minorg.valid_grna("feature") ## gRNA that pass within-feature filter
gRNAHits(gRNA = 368)

By default, MINORg sets the desired feature to ‘CDS’. You can re-assess and overwrite the ‘feature’ check in the .map file to only allow gRNA in other GFF features, such as the 3’ UTR, by updating feature and using filter_feature() to re-filter gRNA for the new feature.

>>> my_minorg.feature = "three_prime_UTR"
>>> my_minorg.filter_feature()
>>> my_minorg.valid_grna("feature") ## gRNA that pass within-feature filter
gRNAHits(gRNA = 5)
>>> my_minorg.minimumset()
>>> my_minorg.resolve()

6.8.3.3.2. Filtering feature on output of another MINORg run

As with Filtering background on output of another MINORg run, we can read in the output of a previous MINORg execution and filter that. This requires the .map file ending with ‘_all.map’ (parse using parse_grna_map_from_file()) as well as a FASTA file of target sequences (specify using target).

>>> from minorg.MINORg import MINORg
>>> new_minorg = MINORg(directory = "/path/to/example_222_subcmdfilter_feature_new")
>>> new_minorg.parse_grna_map_from_file("/path/to/example_222_subcmdfilter_feature/minorg/minorg_gRNA_all.map")
>>> new_minorg.target = "/path/to/example_222_subcmdfilter_feature/minorg/minorg_gene_targets.fasta"
>>> new_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10")
>>> new_minorg.genes = ["AT5G45050"] ## MINORg needs to know which reference genes to align with the targets in order to infer feature ranges
>>> new_minorg.feature = "three_prime_UTR"
>>> new_minorg.filter_feature()
>>> new_minorg.minimumset()
>>> new_minorg.resolve()

6.8.4. Subcommand `minimumset()`

The minimumset() subcommand generates mutually exclusive minimum set(s) of gRNA, where each set is capable of covering all targets. It requires a MINORg .map file (the one that ends in ‘_gRNA_pass.map’ is sufficient, but ‘_gRNA_all.map’ would allow for filtering by a custom combination of fields). All parameters introduced in Generating minimum gRNA set(s) apply.
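
To illustrate what "mutually exclusive minimum sets that each cover all targets" means, the sketch below uses a simple greedy set-cover heuristic. This is a swapped-in technique for illustration only; MINORg's own algorithm and tie-breaking rules may differ, and the function names and data are hypothetical.

```python
def greedy_cover(coverage, targets):
    """Greedily pick gRNA until every target is covered; return None if impossible."""
    pool = dict(coverage)            # copy so the caller's dict is untouched
    remaining, chosen = set(targets), []
    while remaining:
        # Prefer the gRNA covering the most uncovered targets; break ties by name.
        best = min(pool, key=lambda g: (-len(pool[g] & remaining), g), default=None)
        if best is None or not pool[best] & remaining:
            return None
        chosen.append(best)
        remaining -= pool.pop(best)
    return chosen

def mutually_exclusive_sets(coverage, targets, sets):
    """Build up to `sets` covering sets that share no gRNA with each other."""
    pool, result = dict(coverage), []
    for _ in range(sets):
        chosen = greedy_cover(pool, targets)
        if chosen is None:
            break
        result.append(chosen)
        for g in chosen:             # used gRNA cannot appear in later sets
            del pool[g]
    return result

# Hypothetical gRNA -> targets covered.
coverage = {"gRNA_001": {"t1", "t2"}, "gRNA_002": {"t1"},
            "gRNA_003": {"t2"}, "gRNA_004": {"t1"}}
found = mutually_exclusive_sets(coverage, {"t1", "t2"}, sets=3)
```

In this toy input three sets are requested but only two can be built, which mirrors the "N set(s) requested. M set(s) found." message MINORg prints.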

This step will write final gRNA sequences into a file ending with ‘_gRNA_final.fasta’. A file ending with ‘_gRNA_final.map’ that maps gRNA to their targets will also be generated. You may optionally specify the location of the FASTA and .map output files using:

final_map: path of .map file containing gRNA in final set(s)
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_final.map
final_fasta: path of FASTA file containing gRNA in final set(s)
- If output file is not specified, the output file will be <output_directory>/<prefix>/<prefix>_gRNA_final.fasta
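
The default locations described above follow the pattern <output_directory>/<prefix>/<prefix>_<suffix>. A minimal sketch of that naming rule, using a hypothetical helper (not a MINORg function), looks like this:

```python
from pathlib import PurePosixPath

def default_output(directory, prefix, suffix):
    """Compose <output_directory>/<prefix>/<prefix>_<suffix> as a POSIX-style path."""
    return PurePosixPath(directory) / prefix / f"{prefix}_{suffix}"

final_map = default_output("/path/to/output/directory", "minorg", "gRNA_final.map")
final_fasta = default_output("/path/to/output/directory", "minorg", "gRNA_final.fasta")
```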

6.8.4.1. Regenerating minimum sets after calling `full()`

minimumset() can also be used on an active MINORg object.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_223_subcmdminimumset_pt1")
...
<set up parameters>
...
>>> my_minorg.full()
>>> my_minorg.sets = 5
>>> my_minorg.minimumset() ## regenerate up to 5 gRNA sets

6.8.4.2. Generating minimum sets from output of another MINORg run

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_223_subcmdminimumset_pt2")
>>> my_minorg.parse_grna_map_from_file("/path/to/example_203_query/minorg/minorg_gRNA_all.map")
>>> my_minorg.target = "/path/to/example_203_query/minorg/minorg_gene_targets.fasta"
>>> my_minorg.prioritise_nr = True
>>> my_minorg.sets = 5
>>> my_minorg.minimumset(gc_check = False)
>>> my_minorg.resolve()

In order for MINORg to better assess a gRNA’s proximity to the 5’ end of a target (ideally of its sense strand) in the event a tie-breaker is necessary, it is strongly suggested that target sequences be provided to target so MINORg knows how long each target sequence is. This is especially important if the target sequences are antisense (you can check this using the .map file), as can happen when MINORg infers homologues in unannotated genomes. In the example above, we’ve asked MINORg to ignore the GC content check when generating minimum sets (my_minorg.minimumset(gc_check = False)).
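
The reason target length matters can be shown with a little arithmetic. For a sense target, a gRNA's distance from the 5’ end is simply its start position; for an antisense target, the sense 5’ end is at the far end of the sequence, so the distance depends on the target's length. The function below is a hypothetical sketch assuming 0-based, half-open coordinates, not MINORg's internal calculation.

```python
def distance_from_sense_5prime(start, end, target_length, target_is_sense=True):
    """Distance of a gRNA at [start, end) from the 5' end of the sense strand."""
    if target_is_sense:
        return start
    # On an antisense target, the sense 5' end corresponds to the last position.
    return target_length - end

# Two gRNA on a hypothetical 1000 bp antisense target.
near_end = distance_from_sense_5prime(900, 920, 1000, target_is_sense=False)
near_start = distance_from_sense_5prime(50, 70, 1000, target_is_sense=False)
```

Without the target length, the gRNA at positions 50 to 70 would wrongly appear to be the one closer to the 5’ end.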

6.8.5. Chaining subcommands

You may use subcommands separately if you’d like to inspect the outcome of each step and/or repeat a step with different parameters before proceeding with the next. MINORg tracks the output of previous steps, so you do not need to read them into MINORg before executing the next step.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_224_subcmd", prefix = "test", thread = 1)
>>> my_minorg.add_reference("/path/to/subset_ref_TAIR10.fasta", "/path/to/subset_ref_TAIR10.gff", alias = "TAIR10", replace = True)
>>> my_minorg.add_reference("/path/to/subset_ref_Araly2.fasta", "/path/to/subset_ref_Araly2.gff", alias = "araly2")
>>> my_minorg.genes = ["AT1G33560", "AL1G47950.v2.1"]
>>> my_minorg.query_reference = True
>>> my_minorg.seq() ## generate target sequences
>>> my_minorg.target ## print path to FASTA file containing target sequences
'/path/to/example_224_subcmd/test/test_gene_targets.fasta'
>>> my_minorg.grna()
PAM pattern: .{20}(?=[GATC]GG)
>>> my_minorg.screen_reference = True
>>> my_minorg.filter_background()
Masking on-targets
Finding off-targets
>>> my_minorg.valid_grna("background")
gRNAHits(gRNA = 395)
>>> my_minorg.add_background("/path/to/subset_ref_Araha1.fasta", alias = "araha1") ## add background file
>>> my_minorg.filter_background() ## repeat background check with additional background file
Masking on-targets
Finding off-targets
>>> my_minorg.valid_grna("background") ## updated set of passing gRNA
gRNAHits(gRNA = 250)
>>> my_minorg.filter_gc()
>>> my_minorg.valid_grna("GC")
gRNAHits(gRNA = 355)
>>> my_minorg.valid_grna("background", "GC")
gRNAHits(gRNA = 223)
>>> my_minorg.valid_grna() ## gRNA filtered for all valid checks (at this point, background and GC)
/path/to/minorg/grna.py:823: MINORgWarning: The following hit checks have not been set: feature
gRNAHits(gRNA = 223)
>>> my_minorg.filter_feature() ## by default, MINORg only retains gRNA in CDS
>>> my_minorg.valid_grna("feature")
gRNAHits(gRNA = 324)
>>> my_minorg.valid_grna()
gRNAHits(gRNA = 181)
>>> my_minorg.minimumset(manual = True)

        ID   sequence (Set 1)
        gRNA_026     GTCGTTTCCGGAGACTATGA
Hit 'x' to continue if you are satisfied with these sequences. Otherwise, enter the sequence ID or
sequence of an undesirable gRNA (case-sensitive) and hit the return key to update this list: gRNA_026

        ID   sequence (Set 1)
        gRNA_223     TCAATCTCCATCATAGTCTC
Hit 'x' to continue if you are satisfied with these sequences. Otherwise, enter the sequence ID or
sequence of an undesirable gRNA (case-sensitive) and hit the return key to update this list: x

Final gRNA sequence(s) have been written to /path/to/example_224_subcmd/test/test_gRNA_final.fasta
Final gRNA sequence ID(s), gRNA sequence(s), and target(s) have been written to
/path/to/example_224_subcmd/test/test_gRNA_final.map

1 mutually exclusive gRNA set(s) requested. 1 set(s) found.
>>> my_minorg.write_all_grna_map() ## write .map file containing check information for all candidate gRNA
>>> my_minorg.write_all_grna_fasta() ## write FASTA file containing all candidate gRNA
>>> my_minorg.write_pass_grna_map() ## write .map file containing information for valid gRNA
>>> my_minorg.write_pass_grna_fasta() ## write FASTA file containing valid gRNA
>>> my_minorg.resolve() ## remove temporary files

It is highly recommended that you execute resolve() to remove any temporary files generated.
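
The way valid_grna() combines checks in the session above can be mimicked with a small standalone model: a gRNA passes if it passes every requested check, and checks that have not been set yet are reported and skipped. The data structure and function below are hypothetical simplifications, not MINORg's gRNAHits class.

```python
import warnings

ALL_CHECKS = ("background", "GC", "feature")

def valid_grna(checks_by_grna, *names):
    """Return IDs of gRNA passing all requested checks (all known checks if none given)."""
    names = names or ALL_CHECKS
    # A check counts as unset if no gRNA has a value recorded for it.
    unset = [n for n in names
             if all(checks.get(n) is None for checks in checks_by_grna.values())]
    if unset:
        warnings.warn("The following hit checks have not been set: " + ", ".join(unset))
    active = [n for n in names if n not in unset]
    return sorted(g for g, checks in checks_by_grna.items()
                  if all(checks.get(n) is True for n in active))

# Hypothetical state after the background and GC filters, before the feature filter.
candidates = {
    "gRNA_001": {"background": True, "GC": True},
    "gRNA_002": {"background": True, "GC": False},
    "gRNA_003": {"background": False, "GC": True},
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    passing = valid_grna(candidates)
```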

6.9. Defining reference genomes

6.9.1. Single reference genome

See example in Reference gene(s) as targets.

6.9.2. Multiple reference genomes

6.9.3. Non-standard reference

6.9.3.1. Non-standard genetic code

When using pssm_ids, users should ensure that the correct genetic code has been specified for reference genomes using the genetic_code keyword argument when adding reference genomes using add_reference(), as MINORg has to first translate CDS into peptides for domain search using RPS-BLAST. The default genetic code is the Standard Code. Please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for genetic code numbers and names.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_226_geneticcode")
>>> my_minorg.add_reference("/path/to/subset_ref_yeast_mt.fasta", "/path/to/subset_ref_yeast_mt.gff", alias = "yeast_mt", genetic_code = 3) ## specify genetic code here
>>> my_minorg.genes = ["gene-Q0275"]
>>> my_minorg.query_reference = True
>>> my_minorg.rpsblast = "/path/to/rpsblast/executable"
>>> my_minorg.db = "/path/to/rpsblast/db"
>>> my_minorg.pssm_ids = ["366140"]
>>> my_minorg.full()
>>> my_minorg.resolve()

In the above example, the gene ‘gene-Q0275’ is a yeast mitochondrial gene, and my_minorg.pssm_ids = ["366140"] specifies the PSSM-Id for the COX3 domain in the Cdd v3.18 RPS-BLAST database. The genetic code number for yeast mitochondrial code is ‘3’.

As a failsafe, MINORg does not terminate translated peptide sequences at the first stop codon. This ensures that any codons after an incorrectly translated premature stop codon will still be translated. Typically, a handful of mistranslated codons can still result in the correct RPS-BLAST domain hits, although hit scores may be slightly lower. Nevertheless, to ensure maximum accuracy, the correct genetic code is preferred.
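
The following sketch shows why the genetic code matters and what translating past stop codons looks like. It builds NCBI translation tables 1 (Standard) and 3 (Yeast Mitochondrial) from their amino acid strings and translates every codon, writing '*' for stops instead of terminating. The function names and the toy CDS are hypothetical, and this is not MINORg's translation code.

```python
BASES = "TCAG"
# Amino acids in NCBI codon order (first base varies slowest; bases ordered T, C, A, G).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"    # table 1
YEAST_MITO = "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # table 3

def codon_table(aas):
    """Pair the 64 codons, in NCBI order, with the given amino acid string."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, aas))

TABLES = {1: codon_table(STANDARD), 3: codon_table(YEAST_MITO)}

def translate_full(cds, genetic_code=1):
    """Translate every complete codon; stops become '*' and translation continues."""
    table = TABLES[genetic_code]
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

cds = "ATGTGACTTTAA"                          # toy CDS: ATG TGA CTT TAA
standard = translate_full(cds, 1)
yeast = translate_full(cds, 3)
```

With the standard code, TGA is read as a premature stop and CTT as leucine; with the yeast mitochondrial code, the same codons encode tryptophan and threonine.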

6.9.3.2. Non-standard GFF3 attribute field names

6.10. Multithreading

MINORg supports multi-threading in order to process files in parallel. Any excess threads may also be used for BLAST. This is most useful when you are querying multiple genomes, have multiple reference genomes, or multiple background sequences.

NOTE for Docker users: Multithreading for parallel querying of multiple genomes and backgrounds is DISABLED for Docker distributions due to incompatibilities.

To run MINORg with parallel processing, set thread to the desired number of threads.

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg(directory = "/path/to/example_228_thread")
>>> my_minorg.extend_reference("/path/to/sample_gene.fasta", "/path/to/sample_CDS.fasta")
>>> my_minorg.genes = ["AT1G10920"]
>>> my_minorg.add_query("/path/to/subset_9654.fasta", alias = "9654")
>>> my_minorg.add_query("/path/to/subset_9655.fasta", alias = "9655")
>>> my_minorg.thread = 2
>>> my_minorg.full()
>>> my_minorg.resolve()
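
The general pattern of processing several inputs in parallel can be sketched with Python's standard concurrent.futures module. This is a generic illustration with made-up in-memory FASTA text; it does not reflect how MINORg schedules its own work internally.

```python
from concurrent.futures import ThreadPoolExecutor

def count_records(fasta_text):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))

# Hypothetical stand-ins for two query genomes, keyed by alias.
queries = {"9654": ">a\nACGT\n>b\nGGCC\n", "9655": ">c\nTTAA\n"}

# Each query is handled by its own worker; map() preserves input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = dict(zip(queries, pool.map(count_records, queries.values())))
```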

6.11. Differences between CLI and Python versions

Note that, unlike the command line, the Python package does not support aliases even if the config file has been set up appropriately for command line executions. Therefore, there are no true equivalents to --cluster, --indv, or --reference.

6.11.1. To specify cluster genes

Analogous to --cluster and --gene.

Correct:

>>> my_minorg.genes = ['AT5G46260','AT5G46270','AT5G46450','AT5G46470','AT5G46490','AT5G46510','AT5G46520']

Incorrect:

>>> my_minorg.cluster_set = '/path/to/subset_cluster_mapping.txt'
>>> my_minorg.cluster = 'RPS6'

Attributes ‘cluster_set’ and ‘cluster’ do not exist. Assigning to them does not raise an error at this point, but it will cause problems later.

6.11.2. To specify query FASTA files

Analogous to --indv and --query.

Correct:

>>> my_minorg.add_query('/path/to/subset_9654.fasta', alias = '9654')
>>> my_minorg.add_query('/path/to/subset_9655.fasta', alias = '9655')

Incorrect:

>>> my_minorg.genome_set = '/path/to/subset_genome_mapping.txt'
>>> my_minorg.indv = '9654,9655'

Attributes ‘genome_set’ and ‘indv’ do not exist. Assigning to them does not raise an error at this point, but it will cause problems later.

6.11.3. To specify reference genomes

Analogous to --reference, --assembly, --annotation, --attr-mod, and --genetic-code.

Correct:

>>> my_minorg.add_reference('/path/to/TAIR10.fasta', '/path/to/TAIR10.gff3', alias = 'TAIR10', genetic_code = 1, attr_mod = {})

Note that attr_mod and genetic_code are optional if the annotation uses standard attribute field names and the standard genetic code, which the example above does.

Incorrect:

>>> my_minorg.reference_set = '/path/to/arabidopsis_genomes.txt'
>>> my_minorg.reference = 'TAIR10'
AttributeError: can't set attribute

Attribute ‘reference_set’ does not exist, and ‘reference’ is a property that users are not allowed to modify directly.
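
The two behaviours described in this section are ordinary Python semantics, which the toy class below demonstrates. The class `Demo` is hypothetical and unrelated to MINORg's code: assigning to a name that does not exist silently creates a new attribute, whereas assigning to a read-only property raises AttributeError.

```python
class Demo:
    """Toy class with one read-only property."""
    def __init__(self):
        self._reference = {}

    @property
    def reference(self):
        return self._reference

obj = Demo()
obj.cluster = "RPS6"             # no error: Python just creates a new attribute

try:
    obj.reference = "TAIR10"     # property has no setter, so this fails
    blocked = False
except AttributeError:
    blocked = True

# A simple guard: check that the class defines a name before assigning to it.
known = hasattr(Demo, "reference")
unknown = hasattr(Demo, "cluster")
```

Checking hasattr() on the class (rather than the instance) before assignment is one way to catch misspelt or unsupported attribute names early.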
