7. Parameters

In addition to a brief overview of the equivalent parameters between CLI and Python versions of MINORg, this section provides additional information about the input format and utility of some of the more cryptic parameters.

7.1. CLI vs Python

Several CLI arguments have no equivalents in the Python module as they were intended to simplify the building of commands for users who have little to no experience with coding. Users of the Python package are assumed to be comfortable with generating their own preset parameter combinations.

The table below lists the major similarities and differences between CLI arguments and the Python package’s MINORg class attributes (note that some attributes are in fact properties, but with the exception of reference, setting them should be no different from setting attributes).

Note: Some parameters only apply to specific subcommands (in addition to, of course, the full programme). The relevant subcommands will be indicated in square brackets in the ‘Category’ column.

Python attributes in the table below indicated with an asterisk (*) should be set using a dedicated method.

| Category | CLI arguments | Python attributes | Description |
|---|---|---|---|
| General | --directory | directory | output directory |
| | --prefix | prefix | output file/directory prefix |
| | --thread | thread | number of threads |
| Executable (path to executable if not in command-search path) | --blastn | blastn | local BLAST’s blastn |
| | --rpsblast | rpsblast | local BLAST’s rpsblast/rpsblast+ |
| | --mafft | mafft | MAFFT |
| | --bedtools | bedtools | EXCEPTION: path to directory containing BEDTools executables |
| Reference genomes (CLI: seq, full; Python: seq, filter) | --reference | reference* | reference genome |
| | --assembly | | reference genome FASTA |
| | --annotation | | reference genome GFF |
| | --attr-mod | | mapping for non-standard GFF attribute field names |
| | --genetic-code | | NCBI genetic code number or name |
| | --extend-gene | | FASTA file of genes to add to reference genome |
| | --extend-cds | | FASTA file of CDS of genes to add to reference genome |
| [seq] target definition | --gene | genes | gene IDs |
| | --cluster | | cluster aliases |
| | --indv | | individuals to discover targets in |
| | --target | target | FASTA file of sequences to find gRNA in |
| | --query | query* | FASTA file(s) to discover targets in |
| | --domain <alias> | | aliases of domains to find gRNA in |
| | --domain <Pssm-Id> | pssm_ids | Pssm-Id(s) of domains to find gRNA in |
| | | domain_name | human-readable domain name used in sequence and file names in place of Pssm-Ids |
| [seq] inferring homologues from BLASTN hits | --minid | minid | minimum hit % identity |
| | --minlen | minlen | minimum merged hits length |
| | --mincdslen | mincdslen | minimum CDS length of merged hits |
| | --check-recip | check_recip | execute reciprocal check |
| | --relax-recip | relax_recip | execute relaxed reciprocal check |
| | --merge-within | merge_within | maximum distance between hits for merging |
| | --check-id-before-merge | check_id_before_merge | filter hits by % identity before merging |
| [seq] RPS-BLAST options | --db | db | path to local RPS-BLAST database |
| | --remote-rps | remote_rps | use remote RPS-BLAST database (currently non-functional) |
| [grna] | --pam | pam | PAM pattern |
| | --length | length | gRNA length |
| [filter] GC | --gc-min | gc_min | minimum GC content |
| | --gc-max | gc_max | maximum GC content |
| [filter] feature | --feature | feature | GFF3 feature type |
| | --max-insertion | max_insertion | maximum allowable insertion in feature |
| | --min-within-n | min_within_n | minimum number of reference genes whose features overlap with the gRNA range in the alignment |
| | --min-within-fraction | min_within_fraction | minimum fraction of reference genes whose features overlap with the gRNA range in the alignment |
| [filter] background | --background | background* | FASTA files in which to search for potential off-targets |
| | --screen-reference | screen_reference | include reference genomes in search for potential off-targets |
| | | mask | FASTA files of additional sequences to mask |
| | --unmask-ref | | unmask reference genes |
| | --mask-gene | | additional genes to mask |
| | --unmask-gene | | genes to unmask |
| | --mask-cluster | | additional clusters to mask |
| | --unmask-cluster | | clusters to unmask |
| | --ot-pamless | ot_pamless | ignore absence of PAM for potential off-targets |
| | --ot-mismatch | ot_mismatch | minimum acceptable mismatches for off-targets |
| | --ot-gap | ot_gap | minimum acceptable gaps for off-targets |
| | --ot-pattern | ot_pattern | pattern to define combination, number, and location of gap(s) and/or mismatch(es) for unacceptable off-target hits (i.e. gRNA with off-target hits that match the defined pattern will be excluded) |
| | --ot-unaligned-as-mismatch | ot_unaligned_as_mismatch | treat unaligned positions as mismatches (used with --ot-pattern/ot_pattern) |
| | --ot-unaligned-as-gap | ot_unaligned_as_gap | treat unaligned positions as gaps (used with --ot-pattern/ot_pattern) |
| | --skip-bg-check | | skip off-target check |
| [filter] exclude | --exclude | exclude | FASTA file of gRNA sequences to exclude |
| [minimumset] | --accept-invalid | accept_invalid | score ‘NA’ as ‘pass’ |
| | --accept-feature-unknown | accept_feature_unknown | score ‘NA’ as ‘pass’ for feature check |
| | | accept_invalid_field | score ‘NA’ as ‘pass’ if all entries for a check are ‘NA’ |
| | --sets | sets | number of gRNA sets |
| | --auto | auto | generate sets without requiring manual user confirmation for each set |
| | --prioritise-nr | prioritise_nr | prioritise non-redundancy (nr) over proximity to 5’ when selecting the next gRNA in a set |

7.2. Parameter types

7.2.1. Flags and arguments

7.2.1.1. Flag

Flags are parameters that do not take values.

CLI: --auto, --accept-invalid, --accept-feature-unknown, --prioritise-nr/--prioritise-pos, --ot-unaligned-as-gap/--ot-uag, --ot-unaligned-as-mismatch/--ot-uam

For example:

$ minorg <other arguments> --auto

Simply using --auto tells MINORg to automate set generation.

Python: auto, accept_invalid, accept_feature_unknown, accept_invalid_field, prioritise_nr/prioritise_pos, ot_unaligned_as_gap, ot_unaligned_as_mismatch

In Python, flags are raised or lowered by setting their attributes to True or False, respectively. For example:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.auto = True ## raise flag for parameter 'auto'

7.2.1.2. Argument

These parameters take values.

CLI: all parameters that are not flags

$ minorg <other arguments> --prefix my_minorg

--prefix my_minorg tells MINORg to use ‘my_minorg’ as a prefix for output files and directories.

Python: all parameters that are not flags

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.prefix = 'my_minorg' ## tells MINORg to use 'my_minorg' as prefix for output files and directories

7.2.2. Paths

CLI: As all paths will be resolved to absolute paths, relative paths are acceptable. Nevertheless, do be careful with relative paths and NEVER use them in the config file or in lookup files.
Python: Paths are NOT resolved (except directory and config file). Absolute paths are STRONGLY RECOMMENDED. Be careful with relative paths.

7.2.2.1. Executables

Default values for executables may be specified in the config file (see Configuration for more on the config file).

7.2.2.1.1. blastn, rpsblast/rpsblast+, MAFFT

If an executable is in the command-search path, specifying these parameters is optional, although you may, if you desire, specify the command itself (e.g. ‘blastn’ instead of ‘/usr/bin/blastn’). If not, the path to the executable is required.

To determine whether blastn and rpsblast (or rpsblast+, depending on your BLAST+ version) are in the command-search path, execute at the command line:

blastn -version

If it prints something like

blastn: 2.6.0+
 Package: blast 2.6.0, build Jan 15 2017 17:12:27

then ‘blastn’ IS in your command-search path. Repeat this with ‘rpsblast’ and/or ‘rpsblast+’.

To determine whether mafft is in your command-search path, execute at the command line:

mafft --version

If it prints something like

v7.427 (2019/Mar/24)

then it IS in your command-search path.
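If you would rather check from Python, the standard library function shutil.which() searches the same command-search path (PATH) that the shell uses. The minimal sketch below is not part of MINORg; the helper name locate_executables is hypothetical.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_executables(
    names: Iterable[str] = ("blastn", "rpsblast", "rpsblast+", "mafft"),
) -> Dict[str, Optional[str]]:
    """Map each command name to its full path, or None if it is not in the command-search path."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_executables().items():
        # None means the full path must be given to MINORg explicitly
        print(f"{name}: {path if path else 'NOT in command-search path'}")
```

Any command reported as missing needs its full path passed via the corresponding parameter.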

CLI: --blastn, --rpsblast, --mafft

$ minorg <other arguments> --blastn /usr/bin/blastn

Python: blastn, rpsblast, mafft

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.blastn = '/usr/bin/blastn' ## tells MINORg where the blastn executable is

7.2.2.1.2. BEDTools

If bedtools is in the command-search path, you should NOT use this parameter. If not, the path to the directory containing the BEDTools executables is required.

To determine if the BEDTools executables are in your command-search path, execute at the command line:

bedtools --version

If it prints something like

bedtools v2.26.0

then ‘bedtools’ is in your command-search path.

CLI: --bedtools

$ minorg <other arguments> --bedtools /path/to/bedtools2/bin/

Python: bedtools

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.bedtools = '/path/to/bedtools2/bin/' ## tells MINORg where the BEDTools executables are
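Before passing a directory to --bedtools/bedtools, it can help to confirm that the directory really contains a bedtools executable, and that the parameter is needed at all. The sketch below does both checks; the function name bedtools_argument is hypothetical (not part of MINORg), and it assumes the executable inside the directory is named exactly ‘bedtools’.

```python
import os
import shutil
from typing import Optional


def bedtools_argument(candidate_dir: Optional[str] = None) -> Optional[str]:
    """Return the directory to give --bedtools/bedtools, or None if bedtools is already in PATH."""
    if shutil.which("bedtools") is not None:
        return None  # in the command-search path: do NOT set the parameter
    if candidate_dir is None:
        raise FileNotFoundError("bedtools is not in the command-search path; supply its directory")
    executable = os.path.join(candidate_dir, "bedtools")
    # the directory must contain an executable file called 'bedtools'
    if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
        raise FileNotFoundError(f"no executable named 'bedtools' in {candidate_dir}")
    return os.path.abspath(candidate_dir)
```

Returning an absolute path also sidesteps the caveat that Python paths are not resolved by MINORg.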

7.2.3. Alias lookup

Note that aliases are case-sensitive.

7.2.3.1. 1-level lookup

See also: 1-level lookup

1-level lookup parameters have preset values mapped to aliases defined in a configuration file. Users may use either the alias(es) or provide raw values.

CLI: --assembly, --annotation, --db, --attr-mod, --domain
Python: Does not support aliases. Raw values only.

7.2.3.2. 2-level lookup

See also: 2-level lookup

2-level lookup parameters use a combination of 2 parameters. The first parameter (suffixed with ‘-set’) specifies a file containing alias mapping information for the second parameter (not suffixed). Aliases for the first parameter are defined in a configuration file, and it functions effectively the same way a 1-level lookup parameter does. The second parameter reads alias mapping information from the file specified by the first parameter. Unlike the first parameter, the second parameter accepts only alias(es); raw values are not allowed. To specify raw values, different parameters must be used (see Alternative parameters for which ones).

CLI: --reference-set with --reference, --cluster-set with --cluster, --genome-set with --indv
Python: Does not support aliases. Raw values only.

7.2.3.3. Predefined lookup

Predefined lookup parameters are built into the programme. Users may use either the alias(es) or raw values.

CLI: --pam
Python: pam

7.2.3.4. Raw values

All other parameters are raw values only.

7.2.4. Multiple values

7.2.4.1. Comma-separated (CLI)

CLI: --reference, --cluster, --gene, --indv

Comma-separated multiple value arguments accept multiple values for a single argument so long as the values are comma-separated. For example, multiple genes can be specified using --gene 'geneA,geneB,geneC'.

7.2.4.2. Multi-argument (CLI)

CLI: --reference, --cluster, --gene, --indv, --query, --feature, --ext-gene, --ext-cds, --mask-gene, --unmask-gene, --mask-cluster, --unmask-cluster, --ot-indv

Multi-argument parameters accept multiple values by re-using a parameter. For example, multiple genes can be specified using --gene geneA --gene geneB --gene geneC.

(Note that some parameters can be both comma-separated AND multi-argument, and that these features can be combined. For example, --gene geneA --gene geneB,geneC is valid.)
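The combined comma-separated and multi-argument behaviour can be reproduced with Python’s argparse by collecting repeated options with action="append" and then splitting each value on commas. This is a hypothetical sketch of the parsing logic only, not MINORg’s actual argument parser.

```python
import argparse
from typing import List, Sequence


def parse_genes(argv: Sequence[str]) -> List[str]:
    """Collect --gene values given either repeatedly or as comma-separated lists (illustrative only)."""
    parser = argparse.ArgumentParser(prog="minorg-sketch")
    # action="append" stores every occurrence of --gene in a list
    parser.add_argument("--gene", action="append", default=[])
    args = parser.parse_args(argv)
    # flatten: split each occurrence on commas and drop empty strings
    return [gene for value in args.gene for gene in value.split(",") if gene]


print(parse_genes(["--gene", "geneA", "--gene", "geneB,geneC"]))  # ['geneA', 'geneB', 'geneC']
```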

7.2.4.3. Multi-value list (Python)

Python: genes

Multiple values for a single parameter may be provided to MINORg in a list. For example:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.genes = ['geneA'] ## specify a single value
>>> my_minorg.genes = ['geneA', 'geneB', 'geneC'] ## specify multiple values

7.2.4.4. Multi-value dictionary (Python)

Python: query, background

Multiple values for a single parameter may be provided to MINORg in a dictionary. For example:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.query = {'queryA': '/path/to/query_file.fasta', 'queryB': '/path/to/another/query_file.fasta'}

7.3. Parameter descriptions

7.3.1. Reference

Type: Argument, 2-level lookup

CLI: --reference (used with --reference-set)
Python: set using add_reference(), get using reference
Config file:
set default: reference (section [data])
set default set: reference set (section [data])
assign aliases to sets: reference sets (section [lookup])

This parameter allows users to specify multiple reference genomes.

7.3.1.1. Reference (CLI)

See Multiple reference genomes for usage.

The primary difference between using --reference <alias(es)> --reference-set <reference lookup file> and --assembly <FASTA> --annotation <GFF3> is that the former allows you to specify multiple genomes. To do so, supply a reference lookup file (which maps a reference alias to a combination of <FASTA>-<GFF3>-<genetic code>-<GFF3 attribute modification>) using --reference-set (see reference for lookup file format), and the alias(es) of the reference genome(s) to use with --reference.

7.3.1.2. Reference (Python)

See Multiple reference genomes for an example of how to use the dedicated method add_reference() to specify reference genomes, and Non-standard reference for how to specify genetic code and GFF3 attribute modifications for non-standard genomes/annotations.

7.3.2. Attribute modification

Type: Argument, 1-level lookup

CLI: --attr-mod
Python: NA (see argument attr_mod of add_reference() instead)
Config file:
set default: gff attribute modification (section [data])
assign aliases: gff attribute modification presets (section [lookup])

This parameter tells MINORg how to map non-standard GFF3 field names to standard GFF3 field names. This feature was originally developed when I tried to retrieve sequences using the IRGSP-1.0 annotation for rice (Oryza sativa subsp. japonica cv. Nipponbare) and discovered that it uses ‘Locus_id’ instead of ‘Parent’ for mRNA annotations.

See http://gmod.org/wiki/GFF3 for standard attribute field names (see section titled ‘Column 9: “attributes”’).

7.3.2.1. Attribute modification format (CLI)

The input given to --attr-mod should follow this format (with quotes):

‘<feature type>:<standard>=<nonstandard>,<standard>=<nonstandard>;<feature type>:<standard>=<nonstandard>’

Examples:

--attr-mod 'mRNA:Parent=Locus_id,ID=transcript_id;CDS:Parent=transcript_id'

‘Locus_id’ and ‘transcript_id’ are non-standard field names for fields ‘Parent’ and ‘ID’ respectively for the feature type ‘mRNA’, and ‘transcript_id’ is the non-standard name for the field ‘Parent’ for the feature type ‘CDS’.

--attr-mod 'all:ID=id'

‘id’ is the non-standard field name for the field ‘ID’ for all feature types.

7.3.2.2. Attribute modification format (reference lookup file)

See Attribute modification format (CLI), except quotes are not required.

7.3.2.3. Attribute modification format (Python)

The input given to the attr_mod keyword argument of the add_reference() method should be a dictionary in the following format:

{'<feature type>': {'<standard>': '<nonstandard>', '<standard>': '<nonstandard>'},
 '<feature type>': {'<standard>': '<nonstandard>'}}

Examples:

{'mRNA': {'Parent': 'Locus_id', 'ID': 'transcript_id'}, 'CDS': {'Parent': 'transcript_id'}}

‘Locus_id’ and ‘transcript_id’ are non-standard field names for fields ‘Parent’ and ‘ID’ respectively for the feature type ‘mRNA’, and ‘transcript_id’ is the non-standard name for the field ‘Parent’ for the feature type ‘CDS’.

{'all': {'ID': 'id'}}

‘id’ is the non-standard field name for the field ‘ID’ for all feature types.
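Because the CLI string and the Python dictionary encode the same mapping, one can be converted into the other. Below, parse_attr_mod is a hypothetical helper (not part of MINORg) that turns a --attr-mod string into the dictionary format above, and standardise_attributes illustrates what the mapping means for one feature’s attributes. The latter assumes, as an unconfirmed guess, that entries for a specific feature type take precedence over ‘all’.

```python
from typing import Dict

AttrMod = Dict[str, Dict[str, str]]


def parse_attr_mod(text: str) -> AttrMod:
    """Convert '<feature>:<std>=<nonstd>,...;<feature>:...' into {feature: {std: nonstd}}."""
    mapping: AttrMod = {}
    for block in text.strip().strip("'\"").split(";"):  # tolerate surrounding quotes
        if not block:
            continue
        feature, _, pairs = block.partition(":")
        fields = mapping.setdefault(feature.strip(), {})
        for pair in pairs.split(","):
            if pair:
                standard, _, nonstandard = pair.partition("=")
                fields[standard.strip()] = nonstandard.strip()
    return mapping


def standardise_attributes(feature_type: str, attributes: Dict[str, str], mapping: AttrMod) -> Dict[str, str]:
    """Copy values stored under non-standard keys to their standard keys (hypothetical helper)."""
    # assumption: feature-specific rules override rules listed under 'all'
    rules = {**mapping.get("all", {}), **mapping.get(feature_type, {})}
    standardised = dict(attributes)
    for standard, nonstandard in rules.items():
        if nonstandard in attributes:
            standardised[standard] = attributes[nonstandard]
    return standardised


mod = parse_attr_mod("mRNA:Parent=Locus_id,ID=transcript_id;CDS:Parent=transcript_id")
print(mod)  # {'mRNA': {'Parent': 'Locus_id', 'ID': 'transcript_id'}, 'CDS': {'Parent': 'transcript_id'}}
```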

7.3.3. Extended genome

Type: Argument, Raw values, Multi-argument (CLI)

CLI: --extend-gene, --extend-cds
Python: use extend_reference()

These parameters accept FASTA files and allow MINORg to infer coding regions (CDS) from genomic (--extend-gene; first positional argument of extend_reference()) and CDS-only (--extend-cds; second positional argument of extend_reference()) sequences. They should be used when you do not have a GFF3 annotation file for your desired genes, but DO have the above mentioned sequences. MINORg will align gene and CDS-only sequences using MAFFT to generate a GFF3 annotation file with inferred intron-exon boundaries. These genes will then be added to the reference genome and you can use their gene IDs as you would reference gene IDs. You may provide multiple files to each parameter–MINORg will process them all simultaneously.

For MINORg to map the CDS-only sequences to the correct gene sequences, CDS-only sequences should be named according to the format: ‘<gene ID>.<CDS ID>’

For example, given the following CDS sequences:

>geneA.1
ATGATGATGATGATGATGATGATGTAA
>geneA.two
ATGATGATGATGATGATGATGTAA
>geneA.foo.bar
ATGATGATGATGATGATGTAA
>geneB.1
ATGAAAAAAAAAAAAAAAAAATAA

And the following gene sequences:

>geneA
ATGATGATGATGATGATGATGATGTAA
>geneA.foo
ATGATGATGATGATGATGATGATGTAA
>geneB
ATGAAAAAAAAAAAAAAAAAAAAAAAATAA

CDS sequences geneA.1 and geneA.two will be mapped to gene sequence geneA, geneA.foo.bar will be mapped to geneA.foo, and geneB.1 will be mapped to geneB. Note that geneA.1 and geneA.two will be treated as different isoforms of the gene geneA.
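The example above is consistent with taking everything before the last ‘.’ of a CDS name as its gene ID. A minimal sketch under that assumption follows; map_cds_to_genes is hypothetical, and raising an error for CDS names that match no gene is a choice made here, not documented MINORg behaviour.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def map_cds_to_genes(cds_ids: Iterable[str], gene_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group CDS IDs under the gene whose ID is everything before the CDS ID's last '.'."""
    genes = set(gene_ids)
    mapping: Dict[str, List[str]] = defaultdict(list)
    for cds_id in cds_ids:
        gene_id, sep, _ = cds_id.rpartition(".")
        if not sep or gene_id not in genes:
            raise ValueError(f"cannot map CDS {cds_id!r} to a gene sequence")
        mapping[gene_id].append(cds_id)  # each CDS is treated as a separate isoform
    return dict(mapping)


print(map_cds_to_genes(["geneA.1", "geneA.two", "geneA.foo.bar", "geneB.1"], ["geneA", "geneA.foo", "geneB"]))
# {'geneA': ['geneA.1', 'geneA.two'], 'geneA.foo': ['geneA.foo.bar'], 'geneB': ['geneB.1']}
```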

7.3.4. PAM

Type: Argument, Predefined lookup

CLI: --pam
Python: pam
Config file:
set default: pam (section [grna])
assign aliases: pam alias (section [lookup]) (not yet implemented)

By default, MINORg designs gRNA for SpCas9 systems (i.e. 3’ NGG PAM). You may specify other PAM patterns for non-SpCas9 systems using --pam. Any PAM pattern that contains special characters should be enclosed in quotes, as the shell may otherwise interpret those characters and cause unexpected behaviour.

Under the hood, MINORg uses regex to match PAM sites. Therefore, it is in theory possible to utilise the full suite of Python regex syntax to customise your PAM pattern. Note that PAM is NOT case-sensitive. However, do take care to avoid using . as a wildcard, as MINORg uses this character to determine where gRNA is relative to a PAM pattern.

7.3.4.1. Ambiguous bases and repeats

Unlike many gRNA designers, MINORg accepts ambiguous bases (see: https://genome.ucsc.edu/goldenPath/help/iupac.html for IUPAC codes) as well as a variable number of repeats.

Example: The pattern ‘R{1,2}T’ (where ‘R’ means ‘A’ or ‘G’, and {1,2} means 1 or 2 repetitions of the character right before it) will match ‘AT’, ‘GT’, ‘AAT’, ‘AGT’, ‘GAT’, and ‘GGT’.
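Because PAM sites are matched with regex, an ambiguous pattern like ‘R{1,2}T’ can be read as a regular expression once each IUPAC code is replaced by a character class. The sketch below shows one way to do that conversion; it is illustrative rather than MINORg’s own code, and it assumes the pattern contains only IUPAC letters and quantifiers (no backslash escapes or bracketed classes).

```python
import re

# IUPAC nucleotide codes expressed as regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Replace IUPAC letters with character classes; quantifiers such as {1,2} pass through unchanged."""
    return "".join(IUPAC.get(char.upper(), char) for char in pattern)


regex = iupac_to_regex("R{1,2}T")  # '[AG]{1,2}T'
for seq in ["AT", "GT", "AAT", "AGT", "GAT", "GGT", "CT", "AAAT"]:
    # PAM matching is case-insensitive, hence re.IGNORECASE
    print(seq, bool(re.fullmatch(regex, seq, flags=re.IGNORECASE)))
```

The first six sequences match; ‘CT’ and ‘AAAT’ do not.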

7.3.4.2. Spacers and 3’ or 5’ PAM

In the absence of ‘N’ in the PAM pattern, MINORg will assume 3’ PAM with 1 spacer base (such as in the 3’ ‘NGG’ of SpCas9). If a pattern includes an ‘N’ at either end, MINORg will assume that the gRNA is directly adjacent to the ‘N’ base of the pattern. To specify a 5’ PAM in the absence of ‘N’ in the PAM pattern, ‘.’ should be inserted where the gRNA is.

Example 1: --pam .NGG and --pam NGG and --pam GG are functionally identical. The latter two will be expanded to the most explicit pattern: .NGG.

Example 2: If a CRISPR system uses ‘GG’ PAM with NO spacer ‘N’ base, the PAM pattern has to be specified to MINORg as --pam .GG. Otherwise, MINORg will insert a spacer ‘N’ base, giving rise to the incorrect explicit pattern of .NGG instead.

Example 3: AacCas12b uses a 5’ PAM with the pattern ‘TTN’, which can be specified to MINORg as --pam TTN or --pam TTN., where . indicates where the gRNA is. The . is optional here because this PAM pattern (TTN) ends with ‘N’, from which MINORg will infer a 5’ PAM.

Example 4: Cas12a uses a 5’ PAM with the pattern ‘TTTV’, which can be specified to MINORg as --pam TTTV. or --pam 'T{3}V.', where . indicates where the gRNA is. As the PAM pattern does not include ‘N’, the gRNA position MUST be explicitly indicated using ‘.’. If --pam TTTV is (incorrectly) used, MINORg will default to a 3’ PAM AND add a spacer base, expanding it to the undesired explicit pattern .NTTTV.

For a PAM-less search, use: --pam . or --pam '.'.
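The expansion rules above, and the way ‘.’ marks the gRNA position, can be sketched as follows. expand_pam and find_grna are hypothetical names, not MINORg functions. Assumptions: when a pattern has ‘N’ at both ends the leading ‘N’ wins (this reproduces the SaCas9N preset .NGRRN), a pattern whose only ‘N’ is internal is treated like one with no ‘N’, and only the strand given is searched.

```python
import re
from typing import List

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    return "".join(IUPAC.get(char.upper(), char) for char in pattern)


def expand_pam(pam: str) -> str:
    """Expand a PAM pattern into its explicit form ('.' marks where the gRNA sits)."""
    if "." in pam:
        return pam              # gRNA position already explicit
    if pam[:1].upper() == "N":
        return "." + pam        # 3' PAM: gRNA directly 5' of the leading N
    if pam[-1:].upper() == "N":
        return pam + "."        # 5' PAM: gRNA directly 3' of the trailing N
    return ".N" + pam           # no terminal N: 3' PAM plus one spacer base


def find_grna(seq: str, pam: str, length: int) -> List[str]:
    """Return every gRNA of the given length next to a PAM, on the given strand only."""
    before, after = expand_pam(pam).split(".", 1)
    pattern = iupac_to_regex(before) + f"([ACGT]{{{length}}})" + iupac_to_regex(after)
    # a zero-width lookahead also reports overlapping candidates
    return [m.group(1) for m in re.finditer(f"(?={pattern})", seq.upper())]


print(expand_pam("GG"), expand_pam("TTN"), expand_pam("TTTV"))  # .NGG TTN. .NTTTV
print(find_grna("AAAACCCCAGG", "NGG", 4))                       # ['CCCC']
```

Searching the reverse complement would need a second pass over the complementary strand.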

7.3.4.3. Preset PAM patterns

MINORg comes with several preset PAM patterns for different CRISPR systems.

For example: --pam SpCas9 and --pam .NGG are functionally identical.

| Alias(es) | PAM sequence (explicit) | Notes |
|---|---|---|
| SpCas9 OR spcas9 | .NGG | default |
| SaCas9T OR sacas9t | .NGRRT | |
| SaCas9N OR sacas9n | .NGRRN | |
| NmeCas9 OR nmecas9 | .NNNNGATT | |
| CjCas9 OR cjcas9 | .NNNNRYAC | |
| StCas9 OR stcas9 | .NNAGAAW | |
| Cas12a OR cas12a | TTTV. | 5’ PAM |
| AacCas12b OR aaccas12b | TTN. | 5’ PAM |
| BhCas12b OR bhcas12b | DTTN. | 5’ PAM |
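A predefined lookup like this can be modelled as a dictionary from alias to explicit pattern, with anything that is not an alias passed through as a raw pattern. A minimal sketch using the presets in the table above (resolve_pam is a hypothetical name, not a MINORg function):

```python
# Presets from the table above; each alias is accepted as written or in all-lowercase
PAM_PRESETS = {
    "SpCas9": ".NGG",
    "SaCas9T": ".NGRRT",
    "SaCas9N": ".NGRRN",
    "NmeCas9": ".NNNNGATT",
    "CjCas9": ".NNNNRYAC",
    "StCas9": ".NNAGAAW",
    "Cas12a": "TTTV.",
    "AacCas12b": "TTN.",
    "BhCas12b": "DTTN.",
}
PAM_ALIASES = {**PAM_PRESETS, **{alias.lower(): pam for alias, pam in PAM_PRESETS.items()}}


def resolve_pam(value: str) -> str:
    """Return the explicit PAM for a preset alias, or the value unchanged if it is a raw pattern."""
    # aliases are case-sensitive: 'SPCAS9' is not recognised and is treated as a raw pattern
    return PAM_ALIASES.get(value, value)


print(resolve_pam("SpCas9"), resolve_pam("cas12a"), resolve_pam("NGG"))  # .NGG TTTV. NGG
```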

7.3.5. Off-target pattern

Type: Argument

CLI: --ot-pattern
Python: ot_pattern
Config file:
set default: off-target pattern (section [filter])

For greater flexibility, MINORg provides a method for defining position-specific tolerances for gaps/mismatches/unaligned positions.

(By default, MINORg uses --ot-mismatch/ot_mismatch and --ot-gap/ot_gap to determine whether an off-target hit disqualifies a gRNA. This default behaviour counts the total number of mismatches and/or gaps and/or unaligned positions in an off-target gRNA hit and discards or retains a gRNA based on the specified threshold values. See Total mismatch/gap/unaligned for this default algorithm. This behaviour will be overridden if --ot-pattern/ot_pattern is specified.)

7.3.5.1. Basic unit

The basic unit of an off-target pattern comprises 3 parts:

  • Maximum intolerable count (integer)

  • Type of non-match (mismatch, gap, insertion, and/or deletion)

    • m: mismatch

    • g: gap (should not be used with i and/or d)

    • i: insertion (base present in gRNA but not in the off-target sequence)

    • d: deletion (base not present in gRNA but present in the off-target sequence)

  • Range

    • All examples below will be based on a very short 8 bp gRNA of 5’-ATGCatgc-3’ (upper and lowercase for illustration purposes)

    • Position indices can be positive or negative, but not zero.

      • This allows flexibility regardless of gRNA length and whether PAM is 5’ or 3’.

      • If index > 0: positions are counted from the 5’ end (best for 5’ PAM)

        • Index 1 = A

        • Index 5 = a

        • Index 7 = g

      • If index < 0: positions are counted from the 3’ end (best for 3’ PAM)

        • Index -1 = c

        • Index -5 = C

        • Index -7 = T

    • If a single index is provided, the range is assumed to be:

      • <start> to <index>: if index > 0

        • 4: ATGC (positions 1 to 4)

        • 6: ATGCat (positions 1 to 6)

      • <index> to <end>: if index < 0

        • -4: atgc (positions -4 to -1)

        • -6: GCatgc (positions -6 to -1)

    • If a single index is provided AND followed by a ‘-’, the range is assumed to be:

      • <index> to <end>: if index > 0

        • 4-: Catgc (positions 4 to 8)

        • 6-: tgc (positions 6 to 8)

      • <start> to <index>: if index < 0

        • -4-: ATGCa (positions -8 to -4)

        • -6-: ATG (positions -8 to -6)

    • Otherwise, a range can be defined using 2 indices separated by ‘-’. Values must either both be positive or both be negative. For ranges defined by negative indices, the index with the smaller absolute value should come first.

      • Valid

        • 2-5: TGCa (positions 2 to 5)

        • -2--5: Catg (positions -5 to -2)

      • Invalid

        • 2--5: mixed signs

        • -2-5: mixed signs

        • -5--2: index with smaller absolute value should come first

7.3.5.1.1. Examples
  • 0mg5: gRNA hit with any mismatches or gaps (>0) from positions 1 to 5 will NOT be considered problematic.

  • 1i-5--20: gRNA hit with more than 1 (>1) insertions from positions -5 to -20 will NOT be considered problematic.
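To make the range rules and the ‘maximum intolerable count’ semantics concrete, here is a sketch that resolves a range against a gRNA length and evaluates one basic unit. It is not MINORg’s implementation: resolve_range and unit_is_problematic are hypothetical, a hit is simplified to a mapping from 1-based gRNA position to non-match type, and deletions are assumed to be assigned to a gRNA position.

```python
import re
from typing import Dict, Tuple


def resolve_range(spec: str, length: int) -> Tuple[int, int]:
    """Turn '5', '-4', '4-', '-4-', '2-5' or '-2--5' into 1-based inclusive (start, end) from the 5' end."""
    def to_positive(index: int) -> int:
        if index == 0:
            raise ValueError("position indices cannot be zero")
        return index if index > 0 else length + index + 1  # -1 is the 3'-most base

    single = re.fullmatch(r"(-?\d+)", spec)
    open_ended = re.fullmatch(r"(-?\d+)-", spec)
    closed = re.fullmatch(r"(-?\d+)-(-?\d+)", spec)
    if single:
        i = int(single.group(1))
        return (1, to_positive(i)) if i > 0 else (to_positive(i), length)
    if open_ended:
        i = int(open_ended.group(1))
        return (to_positive(i), length) if i > 0 else (1, to_positive(i))
    if closed:
        a, b = int(closed.group(1)), int(closed.group(2))
        if (a > 0) != (b > 0):
            raise ValueError(f"mixed signs in {spec!r}")
        if a < 0 and abs(a) > abs(b):
            raise ValueError(f"index with smaller absolute value must come first in {spec!r}")
        return (to_positive(a), to_positive(b)) if a > 0 else (to_positive(b), to_positive(a))
    raise ValueError(f"invalid range {spec!r}")


def unit_is_problematic(unit: str, hit: Dict[int, str], length: int) -> bool:
    """Evaluate a unit such as '0mg5' against a hit given as {position: 'm' | 'i' | 'd'}."""
    parsed = re.fullmatch(r"(\d+)([mgid]+)(.+)", unit)
    if not parsed:
        raise ValueError(f"invalid unit {unit!r}")
    max_count, types, spec = int(parsed.group(1)), set(parsed.group(2)), parsed.group(3)
    if "g" in types:
        types |= {"i", "d"}  # a gap is either an insertion or a deletion
    start, end = resolve_range(spec, length)
    observed = sum(1 for pos, kind in hit.items() if start <= pos <= end and kind in types)
    return observed <= max_count  # at or below the maximum intolerable count -> problematic


grna = "ATGCatgc"
start, end = resolve_range("-2--5", len(grna))
print(grna[start - 1:end])  # Catg
```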

7.3.5.2. Operators

Multiple units can be combined using , (AND) and | (OR).

Neither operator takes precedence over the other. You may specify order using parentheses ( and ). In the absence of parentheses, operations are evaluated left to right.

  • 0mg5,1mg6-|0mg6,1m7- will be evaluated as (((0mg5,1mg6-)|0mg6),1m7-)

    • To evaluate 0mg5,1mg6-|0mg6,1m7- as ‘0mg5,1mg6- OR 0mg6,1m7-’, use (0mg5,1mg6-)|(0mg6,1m7-)

NOTE: You can technically combine basic units with ranges that are negative and positive (e.g. 0mg5,0mg-5 is valid), but I’m not sure why you’d do that.

7.3.5.2.1. Examples
  • (0mg5,1mg6-)|(0mg6,1m7-): gRNA hit with <no gaps/mismatches from positions 1 to 5 and no more than 1 gap/mismatch from positions 6 to the end> OR <no gaps/mismatches from positions 1 to 6 and no more than 1 mismatch from positions 7 to the end REGARDLESS OF HOW MANY GAPS> will be considered problematic.
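Left-to-right evaluation with no operator precedence can be sketched as a small recursive-descent evaluator. In this hypothetical sketch (not MINORg’s parser), each basic unit is assumed to have already been evaluated to True (problematic) or False, and the evaluator only combines those results.

```python
import re
from typing import Dict


def evaluate(expression: str, unit_results: Dict[str, bool]) -> bool:
    """Evaluate ',' (AND) and '|' (OR) strictly left to right, honouring parentheses."""
    tokens = re.findall(r"[(),|]|[^(),|\s]+", expression)
    position = 0

    def operand() -> bool:
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            value = sequence()
            if position >= len(tokens) or tokens[position] != ")":
                raise ValueError("unbalanced parentheses")
            position += 1
            return value
        return unit_results[token]  # a basic unit such as '0mg5'

    def sequence() -> bool:
        nonlocal position
        value = operand()
        while position < len(tokens) and tokens[position] in (",", "|"):
            operator = tokens[position]
            position += 1
            right = operand()  # always consume the right-hand operand
            value = (value and right) if operator == "," else (value or right)
        return value

    result = sequence()
    if position != len(tokens):
        raise ValueError("unexpected token after expression")
    return result


units = {"0mg5": True, "1mg6-": True, "0mg6": False, "1m7-": False}
print(evaluate("0mg5,1mg6-|0mg6,1m7-", units))      # False, i.e. (((T,T)|F),F)
print(evaluate("(0mg5,1mg6-)|(0mg6,1m7-)", units))  # True, i.e. (T,T)|(F,F)
```

With these unit results the two readings disagree, which is why the parentheses matter.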

7.3.6. Prioritise non-redundancy

Type: Flag

CLI: --prioritise-nr/--prioritize-nr
Python: prioritise_nr/prioritize_nr
Config file:
set default: prioritise non-redundancy (section [filter])

By default, gRNA are selected for a set in the following order of priority:

  1. Coverage - Favour gRNA that cover a larger number of targets not covered by already selected gRNA

  2. Proximity to 5’ - Favour gRNA that are positioned closer to the 5’ end of a target

    • For reference genes, MINORg favours proximity to the 5’ end of the sense strand.

    • If reference genes have been specified, an alignment of targets and reference genes will have been generated, and the sense of each target will be inferred from this alignment. With this sense information, MINORg will favour proximity to the 5’ end of the sense strand.

  3. Non-redundancy - Favour gRNA whose coverage overlaps least with targets covered by already selected gRNA

If this flag is raised, ‘Non-redundancy’ will be prioritised above proximity to the 5’ end. This may be preferred if you wish to generate a large number of sets, as prioritising non-redundancy makes it less likely that extremely high-coverage gRNA will be added to a growing set, leaving these gRNA available to seed the next set.
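The priority order can be expressed as a sort key in a greedy selection loop. This is a hypothetical sketch: the names and the candidate format (the set of targets a gRNA covers plus its distance from the 5’ end) are made up for illustration, and MINORg’s real selection involves more checks. Raising prioritise_nr simply swaps the second and third keys.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical candidate format: gRNA name -> (targets covered, distance from the 5' end)
Candidates = Dict[str, Tuple[FrozenSet[str], int]]


def pick_next(candidates: Candidates, covered: Set[str], prioritise_nr: bool = False) -> Optional[str]:
    """Choose the next gRNA: coverage first, then proximity to 5' and non-redundancy."""
    def priority(name: str) -> Tuple[int, int, int]:
        targets, distance = candidates[name]
        new = len(targets - covered)      # coverage of not-yet-covered targets
        overlap = len(targets & covered)  # redundancy with already covered targets
        return (-new, overlap, distance) if prioritise_nr else (-new, distance, overlap)

    useful = [name for name, (targets, _) in candidates.items() if targets - covered]
    return min(useful, key=priority) if useful else None


def build_set(candidates: Candidates, prioritise_nr: bool = False) -> List[str]:
    """Greedily add gRNA until no candidate covers any new target."""
    covered: Set[str] = set()
    chosen: List[str] = []
    while True:
        name = pick_next(candidates, covered, prioritise_nr)
        if name is None:
            return chosen
        chosen.append(name)
        covered |= candidates[name][0]


grnas = {
    "gA": (frozenset({"T1", "T2", "T4"}), 10),
    "gB": (frozenset({"T1", "T3"}), 5),
    "gC": (frozenset({"T3"}), 50),
}
print(build_set(grnas))                      # ['gA', 'gB']
print(build_set(grnas, prioritise_nr=True))  # ['gA', 'gC']
```

In the second set, gB (which also covers T1) is left unused, so it remains available to seed another set.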

7.3.7. RPS-BLAST local database

Type: Argument, 1-level lookup

CLI: --db
Python: db
Config file:
set default: rps database (section [data])
assign aliases: rps database alias (section [lookup])

The latest CDD database may be downloaded at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cdd.targ.gz. As the CDD database is regularly updated, the PSSM-Id for a domain shown at the CDD website is subject to change. Thus, I also recommend downloading ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz, which contains information that maps PSSM-Ids to domain accession IDs as well as domain names of the database version at the point of downloading.
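If you do download cddid.tbl.gz, a few lines of Python are enough to look up a PSSM-Id. This sketch assumes each row is tab-separated, with the PSSM-Id, domain accession and short domain name as the first three columns (please check this against your download); read_cddid is a hypothetical helper, not part of MINORg.

```python
import gzip
from typing import Dict, Tuple


def read_cddid(path: str) -> Dict[str, Tuple[str, str]]:
    """Map PSSM-Id -> (accession, short name) from cddid.tbl or cddid.tbl.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    table: Dict[str, Tuple[str, str]] = {}
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:  # skip blank or malformed lines
                table[fields[0]] = (fields[1], fields[2])
    return table
```

The short name could, for instance, help when choosing a human-readable domain_name for a Pssm-Id.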

Note: As the local database itself consists of multiple files with different extensions, the path provided to this parameter is not to any single file. For example, given the following file structure:

/
+-- root/
    |-- other_files/
    +-- rps_db/
        |-- Cdd.aux
        |-- Cdd.freq
        |-- Cdd.loo
        |-- Cdd.phr
        |-- Cdd.pin
        |-- Cdd.psd
        |-- Cdd.psi
        |-- Cdd.psq
        +-- Cdd.rps

where the database is contained in the directory /root/rps_db/, the appropriate path to pass to this parameter is /root/rps_db/Cdd, where the trailing ‘Cdd’ is the prefix shared by all of the database’s files.
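The prefix can also be derived from the database directory programmatically. A minimal sketch, assuming the directory holds a single database with exactly one file ending in ‘.rps’ (as in the listing above); rps_db_prefix is a hypothetical helper, not part of MINORg.

```python
import glob
import os


def rps_db_prefix(directory: str, marker_extension: str = ".rps") -> str:
    """Return the path prefix to pass to --db/db for the database stored in `directory`."""
    hits = glob.glob(os.path.join(directory, "*" + marker_extension))
    if len(hits) != 1:
        raise ValueError(f"expected one '*{marker_extension}' file in {directory}, found {len(hits)}")
    # strip the extension: /root/rps_db/Cdd.rps -> /root/rps_db/Cdd
    return os.path.splitext(os.path.abspath(hits[0]))[0]
```

For the file structure above, rps_db_prefix('/root/rps_db') would return /root/rps_db/Cdd.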

7.3.8. RPS-BLAST remote database

Type: Flag

CLI: --remote-rps
Python: remote_rps
Config file:
set default: remote rps (section [data])

While it is in theory possible to use the remote CDD database & servers instead of local ones, the --remote option for the ‘rpsblast’/’rpsblast+’ command from the BLAST+ package has never worked for me. In any case, if your version of local rpsblast is able to access the remote database, you can use --remote-rps instead of --db /path/to/rpsblast/db.