7. Parameters

In addition to a brief overview of the equivalent parameters between CLI and Python versions of MINORg, this section provides additional information about the input format and utility of some of the more cryptic parameters.

7.1. CLI vs Python 

Several CLI arguments have no equivalents in the Python module as they were intended to simplify the building of commands for users who have little to no experience with coding. Users of the Python package are assumed to be comfortable with generating their own preset parameter combinations.

The table below lists the major similarities and differences between CLI arguments and the Python package’s MINORg class attributes (note that some attributes are in fact properties, but with the exception of reference, setting them should be no different from setting attributes).

Note: Some parameters only apply to specific subcommands (in addition to, of course, the full programme). The relevant subcommands will be indicated in square brackets in the ‘Category’ column.

Python attributes in the table below indicated with an asterisk (*) should be set using a dedicated method.

Category	CLI arguments	Python attributes	Description
General	--directory	directory	output directory
	--prefix	prefix	output file/directory prefix
	--thread	thread	threads
Executable (path to executable if not in command-search path)	--blastn	blastn	local BLAST’s blastn
	--rpsblast	rpsblast	local BLAST’s rpsblast/rpsblast+
	--mafft	mafft	MAFFT
	--bedtools	bedtools	EXCEPTION: path to directory containing BEDTools executables
Reference genomes (CLI: seq, full; Python: seq, filter)	--reference	reference*	reference genome
	--assembly		reference genome FASTA
	--annotation		reference genome GFF
	--attr-mod		mapping for non-standard GFF attribute field names
	--genetic-code		NCBI genetic code number or name
	--extend-gene		FASTA file of genes to add to reference genome
	--extend-cds		FASTA file of CDS of genes to add to reference genome
[seq] target definition	--gene	genes	gene IDs
	--cluster		cluster aliases
	--indv		individuals to discover targets in
	--target	target	FASTA file of sequences to find gRNA in
	--query	query*	FASTA file(s) to discover targets in
	--domain <alias>		aliases of domains to find gRNA in
	--domain <Pssm-Id>	pssm_ids	Pssm-Id(s) of domains to find gRNA in
		domain_name	human-readable domain name used in sequence and file names in place of Pssm-Ids
[seq] inferring homologues from BLASTN hits	--minid	minid	minimum hit % identity
	--minlen	minlen	minimum merged hits length
	--mincdslen	mincdslen	minimum CDS length of merged hits
	--check-recip	check_recip	execute reciprocal check
	--relax-recip	relax_recip	execute relaxed reciprocal check
	--merge-within	merge_within	maximum distance between hits for merging
	--check-id-before-merge	check_id_before_merge	filter hits by % identity before merging
[seq] RPS-BLAST options	--db	db	path to local RPS-BLAST database
[seq] RPS-BLAST options	--remote-rps	remote_rps	use remote RPS-BLAST database (currently non-functional)
[grna]	--pam	pam	PAM pattern
[grna]	--length	length	gRNA length
[filter] GC	--gc-min	gc_min	minimum GC content
[filter] GC	--gc-max	gc_max	maximum GC content
[filter] feature	--feature	feature	GFF3 feature type
	--max-insertion	max_insertion	maximum allowable insertion in feature
	--min-within-n	min_within_n	minimum number of reference genes whose features overlap with gRNA range in alignment
	--min-within-fraction	min_within_fraction	minimum fraction of reference genes whose features overlap with gRNA range in alignment
[filter] background	--background	background*	FASTA files in which to search for potential off-targets
	--screen-reference	screen_reference	include reference genomes in search for potential off-targets
		mask	FASTA files of additional sequences to mask
	--unmask-ref		unmask reference genes
	--mask-gene		additional genes to mask
	--unmask-gene		genes to unmask
	--mask-cluster		additional clusters to mask
	--unmask-cluster		clusters to unmask
	--ot-pamless	ot_pamless	ignore absence of PAM for potential off-targets
	--ot-mismatch	ot_mismatch	minimum acceptable mismatches for off-targets
	--ot-gap	ot_gap	minimum acceptable gaps for off-targets
	--ot-pattern	ot_pattern	pattern to define combination, number, and location of gap(s) and/or mismatch(es) for unacceptable off-target hits (i.e. gRNA with off-target hits that match the defined pattern will be excluded)
	--ot-unaligned-as-mismatch	ot_unaligned_as_mismatch	treat unaligned positions as mismatches (used with --ot-pattern/ot_pattern)
	--ot-unaligned-as-gap	ot_unaligned_as_gap	treat unaligned positions as gaps (used with --ot-pattern/ot_pattern)
	--skip-bg-check		skip off-target check
[filter] exclude	--exclude	exclude	FASTA file of gRNA sequences to exclude
[minimumset]	--accept-invalid	accept_invalid	score ‘NA’ as ‘pass’
	--accept-feature-unknown	accept_feature_unknown	score ‘NA’ as ‘pass’ for feature check
		accept_invalid_field	score ‘NA’ as ‘pass’ if all entries for a check are ‘NA’
	--sets	sets	number of gRNA sets
	--auto	auto	generate sets without requiring manual user confirmation for each set
	--prioritise-nr	prioritise_nr	prioritise non-redundancy (nr) over proximity to 5’ when selecting next gRNA in set

7.2. Parameter types 

7.2.1. Flags and arguments 

7.2.1.1. Flag 

Flags are parameters that do not take values.

CLI: --auto, --accept-invalid, --accept-feature-unknown, --prioritise-nr/--prioritise-pos, --ot-unaligned-as-gap/--ot-uag, --ot-unaligned-as-mismatch/--ot-uam

For example:

$ minorg <other arguments> --auto

Simply using --auto tells MINORg to automate set generation.

Python: auto, accept_invalid, accept_feature_unknown, accept_invalid_field, prioritise_nr/prioritise_pos, ot_unaligned_as_gap, ot_unaligned_as_mismatch

In Python, flags are raised by setting the value of their attributes to True (and lowered by setting them to False). For example:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.auto = True ## raise flag for parameter 'auto'

7.2.1.2. Argument 

These parameters take values.

CLI: all parameters that are not flags

$ minorg <other arguments> --prefix my_minorg

--prefix my_minorg tells MINORg to use ‘my_minorg’ as a prefix for output files and directories.

Python: all parameters that are not flags

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.prefix = 'my_minorg' ## tells MINORg to use 'my_minorg' as prefix for output files and directories

7.2.2. Paths 

CLI: As all paths will be resolved to absolute paths, relative paths are acceptable. Nevertheless, do be careful with relative paths and NEVER use them in the config file or in lookup files.

Python: Paths are NOT resolved (except directory and config file). Absolute paths are STRONGLY RECOMMENDED. Be careful with relative paths.
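
Because the Python package does not resolve most paths, one option is to convert them to absolute paths yourself before passing them to MINORg. The following is a minimal sketch using only the standard library; the helper name to_absolute and the example file name are hypothetical and not part of MINORg.

```python
from pathlib import Path

def to_absolute(path):
    """Return an absolute version of a (possibly relative) path as a string."""
    # expanduser() handles a leading '~'; resolve() makes the path absolute
    # relative to the current working directory.
    return str(Path(path).expanduser().resolve())

# Hypothetical relative path, for illustration only
query_path = to_absolute("data/query_file.fasta")
print(query_path)
```

The resulting string can then be used wherever MINORg expects a path, for example as a value in the query dictionary.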

7.2.2.1. Executables 

Default values for executables may be specified in the config file (see Configuration for more on the config file).

7.2.2.1.1. blastn, rpsblast/rpsblast+, MAFFT

If an executable is in the command-search path, specifying these parameters is optional, although you may, if you desire, specify the command itself (e.g. ‘blastn’ instead of ‘/usr/bin/blastn’). If not, the path to the executable is required.

To determine if blastn and rpsblast (or rpsblast+ depending on your BLAST+ version) are in the command-search path, execute at the command line:

blastn -version

If it prints something like

blastn: 2.6.0+
 Package: blast 2.6.0, build Jan 15 2017 17:12:27

then ‘blastn’ IS in your command-search path. Repeat this with ‘rpsblast’ and/or ‘rpsblast+’.

To determine if mafft is in your command-search path, execute at the command line:

mafft --version

If it prints something like

v7.427 (2019/Mar/24)

then it IS in your command-search path.

CLI: --blastn, --rpsblast, --mafft

$ minorg <other arguments> --blastn /usr/bin/blastn

Python: blastn, rpsblast, mafft

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.blastn = '/usr/bin/blastn' ## tells MINORg where the blastn executable is
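
The command-search path check described above can also be done from Python. This sketch relies on the standard library's shutil.which; the helper name locate is hypothetical and MINORg is not required to run it.

```python
import shutil

def locate(executable):
    """Return the full path of an executable on the command-search path, or None."""
    return shutil.which(executable)

for name in ("blastn", "rpsblast", "rpsblast+", "mafft"):
    path = locate(name)
    if path is None:
        print(f"{name}: not in the command-search path; give MINORg the full path")
    else:
        print(f"{name}: {path}")
```

Any path reported this way could be assigned to the corresponding attribute (blastn, rpsblast, or mafft).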

7.2.2.1.2. BEDTools

If bedtools is in the command-search path, you should NOT use this parameter. If not, the path to the directory containing the BEDTools executables is required.

To determine if the BEDTools executables are in your command-search path, execute at the command line:

bedtools --version

If it prints something like

bedtools v2.26.0

then ‘bedtools’ is in your command-search path.

CLI: --bedtools

$ minorg <other arguments> --bedtools /path/to/bedtools2/bin/

Python: bedtools

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.bedtools = '/path/to/bedtools2/bin/' ## tells MINORg where the BEDTools executables are

7.2.3. Alias lookup 

Note that aliases are case-sensitive.

7.2.3.1. 1-level lookup 

7.2.3.2. 2-level lookup 

7.2.3.3. Predefined lookup 

Predefined lookup parameters are built into the programme. Users may use either the alias(es) or raw values.

CLI: --pam

Python: pam

7.2.3.4. Raw values 

All other parameters are raw values only.

7.2.4. Multiple values 

7.2.4.1. Comma-separated (CLI)

CLI: --reference, --cluster, --gene, --indv

Comma-separated multiple value arguments accept multiple values for a single argument so long as the values are comma-separated. For example, multiple genes can be specified using --gene 'geneA,geneB,geneC'.

7.2.4.2. Multi-argument (CLI)

CLI: --reference, --cluster, --gene, --indv, --query, --feature, --ext-gene, --ext-cds, --mask-gene, --unmask-gene, --mask-cluster, --unmask-cluster, --ot-indv

Multi-argument parameters accept multiple values by re-using a parameter. For example, multiple genes can be specified using --gene geneA --gene geneB --gene geneC.

(Note that some parameters can be both comma-separated AND multi-argument, and that these features can be combined. For example, --gene geneA --gene geneB,geneC is valid.)
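
To illustrate how comma-separated and multi-argument values can be combined, here is a small sketch using Python's argparse. It is not MINORg's own argument parser; the function name parse_genes is hypothetical and only mimics the behaviour described above for --gene.

```python
import argparse

def parse_genes(argv):
    """Collect gene IDs from repeated and/or comma-separated --gene arguments."""
    parser = argparse.ArgumentParser()
    # action="append" gathers the value of every --gene occurrence into a list
    parser.add_argument("--gene", action="append", default=[])
    args = parser.parse_args(argv)
    # Split each collected value on commas and flatten into a single list
    return [gene for value in args.gene for gene in value.split(",") if gene]

print(parse_genes(["--gene", "geneA", "--gene", "geneB,geneC"]))
```

Under these assumptions, --gene geneA --gene geneB,geneC yields the same three gene IDs as --gene 'geneA,geneB,geneC'.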

7.2.4.3. Multi-value list (Python)

Python: genes

Multiple values for a single parameter may be provided to MINORg in a list. For example:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.genes = ['geneA'] ## specify a single value
>>> my_minorg.genes = ['geneA', 'geneB', 'geneC'] ## specify multiple values

7.2.4.4. Multi-value dictionary (Python)

Python: query, background

Multiple values for a single parameter may be provided to MINORg in a dictionary. For example:

>>> from minorg.MINORg import MINORg
>>> my_minorg = MINORg()
>>> my_minorg.query = {'queryA': '/path/to/query_file.fasta', 'queryB': '/path/to/another/query_file.fasta'}

7.3. Parameter descriptions 

7.3.1. Reference 

Type: Argument, 2-level lookup

CLI: --reference (used with --reference-set)
Python: set using add_reference(), get using reference
Config file:

set default: reference (section [data])

set default set: reference set (section [data])

assign aliases to sets: reference sets (section [lookup])

This parameter allows users to specify multiple reference genomes.

7.3.1.1. Reference (CLI)

See Multiple reference genomes for usage.

The primary difference between using --reference <alias(es)> --reference-set <reference lookup file> and --assembly <FASTA> --annotation <GFF3> is that the former allows you to specify multiple genomes. This is achieved by supplying a reference lookup file (which maps a reference alias to a combination of <FASTA>-<GFF3>-<genetic code>-<GFF3 attribute modification>) using --reference-set (see reference for lookup file format), as well as the alias(es) of the reference genome(s) to use using --reference.

7.3.1.2. Reference (Python)

See Multiple reference genomes for an example of how to use the dedicated method add_reference() to specify reference genomes, and Non-standard reference for how to specify genetic code and GFF3 attribute modifications for non-standard genomes/annotations.

7.3.2. Attribute modification 

Type: Argument, 1-level lookup

CLI: --attr-mod
Python: NA (see argument attr_mod of add_reference() instead)
Config file:

set default: gff attribute modification (section [data])

assign aliases: gff attribute modification presets (section [lookup])

This parameter tells MINORg how to map non-standard GFF3 field names to standard GFF3 field names. This feature was originally developed when I tried to retrieve sequences using the IRGSP-1.0 annotation for rice (Oryza sativa subsp. Nipponbare) and discovered that it uses ‘Locus_id’ instead of ‘Parent’ for mRNA annotations.

See http://gmod.org/wiki/GFF3 for standard attribute field names (see section titled ‘Column 9: “attributes”’).

7.3.2.1. Attribute modification format (CLI)

The input given to --attr-mod should follow this format (with quotes):

‘<feature type>:<standard>=<nonstandard>,<standard>=<nonstandard>;<feature type>:<standard>=<nonstandard>’

Examples:

--attr-mod 'mRNA:Parent=Locus_id,ID=transcript_id;CDS:Parent=transcript_id'
‘Locus_id’ and ‘transcript_id’ are non-standard field names for fields ‘Parent’ and ‘ID’ respectively for the feature type ‘mRNA’, and ‘transcript_id’ is the non-standard name for the field ‘Parent’ for the feature type ‘CDS’.

--attr-mod 'all:ID=id'
‘id’ is the non-standard field name for the field ‘ID’ for all feature types.

7.3.2.2. Attribute modification format (reference lookup file)

See Attribute modification format (CLI), except quotes are not required.

7.3.2.3. Attribute modification format (Python)

The input given to the attr_mod keyword argument of the add_reference() method should be a dictionary in the following format:

{'<feature type>': {'<standard>': '<nonstandard>', '<standard>': '<nonstandard>'},
 '<feature type>': {'<standard>': '<nonstandard>'}}

Examples:

{'mRNA': {'Parent': 'Locus_id', 'ID': 'transcript_id'}, 'CDS': {'Parent': 'transcript_id'}}
‘Locus_id’ and ‘transcript_id’ are non-standard field names for fields ‘Parent’ and ‘ID’ respectively for the feature type ‘mRNA’, and ‘transcript_id’ is the non-standard name for the field ‘Parent’ for the feature type ‘CDS’.

{'all': {'ID': 'id'}}
‘id’ is the non-standard field name for the field ‘ID’ for all feature types.
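
The CLI string format and the Python dictionary format carry the same information. As an illustration, the sketch below converts one into the other; parse_attr_mod is a hypothetical helper written for this example and is not part of the MINORg API.

```python
def parse_attr_mod(text):
    """Convert a CLI-style attribute modification string into a nested dict."""
    result = {}
    # Feature types are separated by ';'
    for feature_block in text.split(";"):
        if not feature_block.strip():
            continue
        feature, _, mappings = feature_block.partition(":")
        fields = {}
        # Within a feature type, '<standard>=<nonstandard>' pairs are separated by ','
        for mapping in mappings.split(","):
            standard, _, nonstandard = mapping.partition("=")
            fields[standard.strip()] = nonstandard.strip()
        result[feature.strip()] = fields
    return result

print(parse_attr_mod("mRNA:Parent=Locus_id,ID=transcript_id;CDS:Parent=transcript_id"))
```

The printed dictionary has the same structure as the first Python example above.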

7.3.3. Extended genome 

Type: Argument, Raw values, Multi-argument (CLI)

CLI: --extend-gene, --extend-cds

Python: use extend_reference()

These parameters accept FASTA files and allow MINORg to infer coding regions (CDS) from genomic (--extend-gene; first positional argument of extend_reference()) and CDS-only (--extend-cds; second positional argument of extend_reference()) sequences. They should be used when you do not have a GFF3 annotation file for your desired genes, but DO have the above-mentioned sequences. MINORg will align gene and CDS-only sequences using MAFFT to generate a GFF3 annotation file with inferred intron-exon boundaries. These genes will then be added to the reference genome and you can use their gene IDs as you would reference gene IDs. You may provide multiple files to each parameter; MINORg will process them all simultaneously.

For MINORg to map the CDS-only sequences to the correct gene sequences, CDS-only sequences should be named according to the format: ‘<gene ID>.<CDS ID>’

For example, given the following CDS sequences:

>geneA.1
ATGATGATGATGATGATGATGATGTAA
>geneA.two
ATGATGATGATGATGATGATGTAA
>geneA.foo.bar
ATGATGATGATGATGATGTAA
>geneB.1
ATGAAAAAAAAAAAAAAAAAATAA

And the following gene sequences:

>geneA
ATGATGATGATGATGATGATGATGTAA
>geneA.foo
ATGATGATGATGATGATGATGATGTAA
>geneB
ATGAAAAAAAAAAAAAAAAAAAAAAAATAA

CDS sequences geneA.1 and geneA.two will be mapped to gene sequence geneA, geneA.foo.bar will be mapped to geneA.foo, and geneB.1 will be mapped to geneB. Note that geneA.1 and geneA.two will be treated as different isoforms of the gene geneA.
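
The mapping in this example is consistent with treating everything before the last ‘.’ of a CDS name as the gene ID. The sketch below reproduces the example under that assumption; it is an illustration only, and MINORg's actual implementation may differ.

```python
def map_cds_to_gene(cds_names, gene_names):
    """Map each CDS name to a gene name by splitting at the last '.'."""
    genes = set(gene_names)
    mapping = {}
    for cds in cds_names:
        # Assumption: '<gene ID>.<CDS ID>', where the CDS ID contains no '.'
        gene_id, separator, _cds_id = cds.rpartition(".")
        if separator and gene_id in genes:
            mapping[cds] = gene_id
    return mapping

cds = ["geneA.1", "geneA.two", "geneA.foo.bar", "geneB.1"]
genes = ["geneA", "geneA.foo", "geneB"]
print(map_cds_to_gene(cds, genes))
```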

7.3.4. PAM 

Type: Argument, Predefined lookup

CLI: --pam
Python: pam
Config file:

set default: pam (section [grna])

assign aliases: pam alias (section [lookup]) (not yet implemented)

By default, MINORg designs gRNA for SpCas9 systems (i.e. 3’ NGG PAM). You may specify other PAM patterns for non-SpCas9 systems using --pam. It is recommended that any PAM pattern that uses special characters be enclosed in quotes, as it may lead to unexpected behaviour otherwise at the terminal.

Under the hood, MINORg uses regex to match PAM sites. Therefore, it is in theory possible to utilise the full suite of Python regex syntax to customise your PAM pattern. Note that PAM is NOT case-sensitive. However, do take care to avoid using . as a wildcard, as MINORg uses this character to determine where gRNA is relative to a PAM pattern.

7.3.4.1. Ambiguous bases and repeats 

Unlike many gRNA designers, MINORg accepts ambiguous bases (see: https://genome.ucsc.edu/goldenPath/help/iupac.html for IUPAC codes) as well as a variable number of repeats.

Example: The pattern ‘R{1,2}T’ (where ‘R’ means ‘A’ or ‘G’, and {1,2} means 1 to 2 repetitions of the character right before it) will match ‘AT’, ‘GT’, ‘AAT’, ‘AGT’, ‘GAT’, and ‘GGT’.
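
To see why ‘R{1,2}T’ matches exactly those six sequences, the sketch below expands IUPAC ambiguity codes into regex character classes and tests candidate sequences. It illustrates the idea only; the names IUPAC, iupac_to_regex, and matches are hypothetical and this is not MINORg's internal code.

```python
import re

# IUPAC ambiguity codes expressed as regex character classes
IUPAC = {
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Replace ambiguity codes with character classes; leave everything else as-is."""
    return "".join(IUPAC.get(char, char) for char in pattern.upper())

def matches(pattern, sequence):
    """Return True if the whole sequence matches the (case-insensitive) pattern."""
    return re.fullmatch(iupac_to_regex(pattern), sequence.upper()) is not None

for seq in ("AT", "GT", "AAT", "AGT", "GAT", "GGT", "CT", "AAAT"):
    print(seq, matches("R{1,2}T", seq))
```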

7.3.4.2. Spacers and 3’ or 5’ PAM 

In the absence of ‘N’ in the PAM pattern, MINORg will assume 3’ PAM with 1 spacer base (such as in the 3’ ‘NGG’ of SpCas9). If a pattern includes an ‘N’ at either end, MINORg will assume that the gRNA is directly adjacent to the ‘N’ base of the pattern. To specify a 5’ PAM in the absence of ‘N’ in the PAM pattern, ‘.’ should be inserted where the gRNA is.

Example 1: --pam .NGG and --pam NGG and --pam GG are functionally identical. The latter two will be expanded to the most explicit pattern: .NGG.

Example 2: If a CRISPR system uses ‘GG’ PAM with NO spacer ‘N’ base, the PAM pattern has to be specified to MINORg as --pam .GG. Otherwise, MINORg will insert a spacer ‘N’ base, giving rise to the incorrect explicit pattern of .NGG instead.

Example 3: AacCas12b uses a 5’ PAM with the pattern ‘TTN’, which can be specified to MINORg as --pam TTN or --pam TTN., where . indicates where the gRNA is. . is optional as this PAM pattern (TTN) includes ‘N’ at the end. Therefore, MINORg will infer a 5’ PAM.

Example 4: Cas12a uses a 5’ PAM with the pattern ‘TTTV’, which can be specified to MINORg as --pam TTTV. or --pam 'T{3}V.', where . indicates where the gRNA is. As the PAM pattern does not include ‘N’, the gRNA position MUST be explicitly indicated using ‘.’. If --pam TTTV is (incorrectly) used, MINORg will default to a 3’ PAM AND add a spacer base, expanding it to the undesired explicit pattern .NTTTV.

For a PAM-less search, use: --pam . or --pam '.'.
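
The expansion rules in the examples above can be summarised in a short sketch. The function expand_pam is hypothetical and only mirrors the behaviour described in this section; cases the text does not cover (such as ‘N’ at both ends, or ‘N’ only in the middle of a pattern) are handled arbitrarily here, and preset aliases are not resolved.

```python
def expand_pam(pam):
    """Expand a PAM pattern to its explicit form, where '.' marks the gRNA."""
    if "." in pam:
        # gRNA position already explicit, e.g. '.GG', 'TTTV.', or '.' (PAM-less)
        return pam
    if pam.startswith("N"):
        # 'N' at the 5' end: gRNA sits directly upstream (3' PAM)
        return "." + pam
    if pam.endswith("N"):
        # 'N' at the 3' end: gRNA sits directly downstream (5' PAM)
        return pam + "."
    # No usable 'N': assume a 3' PAM with one spacer base
    return ".N" + pam

for pattern in ("GG", "NGG", ".NGG", ".GG", "TTN", "TTTV", "TTTV.", "."):
    print(f"{pattern!r} -> {expand_pam(pattern)!r}")
```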

7.3.4.3. Preset PAM patterns 

MINORg comes with several preset PAM patterns for different CRISPR systems.

For example: --pam SpCas9 and --pam .NGG are functionally identical.

alias(es)	PAM sequence (explicit)	Notes
SpCas9 OR spcas9	.NGG	default
SaCas9T OR sacas9t	.NGRRT
SaCas9N OR sacas9n	.NGRRN
NmeCas9 OR nmecas9	.NNNNGATT
CjCas9 OR cjcas9	.NNNNRYAC
StCas9 OR stcas9	.NNAGAAW
Cas12a OR cas12a	TTTV.	5’ PAM
AacCas12b OR aaccas12b	TTN.	5’ PAM
BhCas12b OR bhcas12b	DTTN.	5’ PAM
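
Conceptually, a predefined lookup behaves like a dictionary that maps aliases to explicit patterns and passes any other value through unchanged. The sketch below encodes the table above in that way; PAM_PRESETS and resolve_pam are hypothetical names, not MINORg identifiers.

```python
# Explicit patterns taken from the table above, keyed by the first alias
_PATTERNS = {
    "SpCas9": ".NGG", "SaCas9T": ".NGRRT", "SaCas9N": ".NGRRN",
    "NmeCas9": ".NNNNGATT", "CjCas9": ".NNNNRYAC", "StCas9": ".NNAGAAW",
    "Cas12a": "TTTV.", "AacCas12b": "TTN.", "BhCas12b": "DTTN.",
}

# Each preset is listed under two aliases: mixed case and all lowercase
PAM_PRESETS = {}
for alias, pattern in _PATTERNS.items():
    PAM_PRESETS[alias] = pattern
    PAM_PRESETS[alias.lower()] = pattern

def resolve_pam(value):
    """Return the preset pattern for a known alias, else the raw value unchanged."""
    return PAM_PRESETS.get(value, value)

print(resolve_pam("SpCas9"), resolve_pam("cas12a"), resolve_pam("NGG"))
```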

7.3.5. Off-target pattern 

Type: Argument

CLI: --ot-pattern
Python: ot_pattern
Config file:

set default: off-target pattern (section [filter])

For greater flexibility, MINORg provides a method for defining position-specific tolerances for gaps/mismatches/unaligned positions.

(By default, MINORg uses --ot-mismatch/ot_mismatch and --ot-gap/ot_gap to determine whether an off-target hit disqualifies a gRNA. This default behaviour counts the total number of mismatches and/or gaps and/or unaligned positions in an off-target gRNA hit and discards or retains a gRNA based on the specified threshold values. See Total mismatch/gap/unaligned for this default algorithm. This behaviour will be overridden if --ot-pattern/ot_pattern is specified.)

7.3.5.1. Basic unit 

The basic unit of an off-target pattern comprises 3 parts:

Maximum intolerable count (integer)
Type of non-match (whether mismatch, gap, insertion, and/or deletion)
- m: mismatch
- g: gap (should not be used with i and/or d)
- i: insertion (base present in gRNA but not in the off-target sequence)
- d: deletion (base not present in gRNA but present in the off-target sequence)
Range
- All examples below will be based on a very short 8 bp gRNA of 5’-ATGCatgc-3’ (upper and lowercase for illustration purposes)
- Position indices can be positive or negative, but not zero.
  - This allows flexibility regardless of gRNA length and whether PAM is 5’ or 3’.
  - If index > 0: positions are counted from the 5’ end (best for 5’ PAM)
    - Index 1 = A
    - Index 5 = a
    - Index 7 = g
  - If index < 0: positions are counted from the 3’ end (best for 3’ PAM)
    - Index -1 = c
    - Index -5 = C
    - Index -7 = T
- If a single index is provided, the range is assumed to be:
  - <start> to <index>: if index > 0
    - 4: ATGC (positions 1 to 4)
    - 6: ATGCat (positions 1 to 6)
  - <index> to <end>: if index < 0
    - -4: atgc (positions -4 to -1)
    - -6: GCatgc (positions -6 to -1)
- If a single index is provided AND followed by a ‘-’, the range is assumed to be:
  - <index> to <end>: if index > 0
    - 4-: Catgc (positions 4 to 8)
    - 6-: tgc (positions 6 to 8)
  - <start> to <index>: if index < 0
    - -4-: ATGCa (positions -8 to -4)
    - -6-: ATG (positions -8 to -6)
- Otherwise, a range can be defined using 2 indices separated by ‘-’. Values must either both be positive or both be negative. For ranges defined by negative indices, the index with the smaller absolute value should come first.
  - Valid
    - 2-5: TGCa (positions 2 to 5)
    - -2--5: Catg (positions -5 to -2)
  - Invalid
    - 2--5: mixed signs
    - -2-5: mixed signs
    - -5--2: index with smaller absolute value should come first
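
The range rules above can be checked with a small sketch that resolves a range string against the 8 bp example gRNA. The function resolve_range is hypothetical and written only to reproduce the examples in this list; rejecting positive ranges such as ‘5-2’ is an assumption, as the text states the ordering rule for negative indices only.

```python
import re

# <index>, optionally followed by '-', optionally followed by a second <index>
RANGE_PATTERN = re.compile(r"(-?\d+)(?:(-)(-?\d+)?)?")

def resolve_range(text, grna):
    """Return the part of the gRNA covered by an off-target pattern range."""
    match = RANGE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse range: {text!r}")
    first = int(match.group(1))
    has_dash = match.group(2) is not None
    second = int(match.group(3)) if match.group(3) is not None else None
    if first == 0 or second == 0:
        raise ValueError("position indices cannot be zero")
    n = len(grna)

    def to_position(index):
        # Convert a signed index to a 1-based position counted from the 5' end
        return index if index > 0 else n + index + 1

    if second is None and not has_dash:    # single index, e.g. '4' or '-4'
        start, end = (1, first) if first > 0 else (to_position(first), n)
    elif second is None:                   # index followed by '-', e.g. '4-' or '-4-'
        start, end = (first, n) if first > 0 else (1, to_position(first))
    else:                                  # two indices, e.g. '2-5' or '-2--5'
        if (first > 0) != (second > 0):
            raise ValueError("mixed signs")
        if abs(first) > abs(second):
            # Stated in the text for negative ranges; assumed here for positive ones
            raise ValueError("index with smaller absolute value should come first")
        start, end = sorted((to_position(first), to_position(second)))
    # Clamp to the gRNA so that ranges longer than the gRNA do not misbehave
    start, end = max(start, 1), min(end, n)
    return grna[start - 1:end]

for example in ("4", "-4", "4-", "-4-", "2-5", "-2--5"):
    print(example, resolve_range(example, "ATGCatgc"))
```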

7.3.5.1.1. Examples

0mg5: gRNA hit with any mismatches or gaps (>0) from positions 1 to 5 will NOT be considered problematic.
1i-5--20: gRNA hit with more than 1 (>1) insertions from positions -5 to -20 will NOT be considered problematic.

7.3.5.2. Operators 

Multiple units can be combined using , (AND) and | (OR).

Neither operator is prioritised over the other. You may specify order using parentheses ( and ). In the absence of parentheses, operations are evaluated left to right.

0mg5,1mg6-|0mg6,1m7- will be evaluated as (((0mg5,1mg6-)|0mg6),1m7-)
- To evaluate 0mg5,1mg6-|0mg6,1m7- as ‘0mg5,1mg6- OR 0mg6,1m7-’, use (0mg5,1mg6-)|(0mg6,1m7-)

NOTE: You can technically combine basic units with ranges that are negative and positive (e.g. 0mg5,0mg-5 is valid), but I’m not sure why you’d do that.

7.3.5.2.1. Examples

(0mg5,1mg6-)|(0mg6,1m7-): gRNA hit with <no gaps/mismatches from positions 1 to 5 and no more than 1 gaps/mismatches from positions 6 to the end> OR <no gaps/mismatches from positions 1 to 6 and no more than 1 mismatch from positions 7 to the end REGARDLESS OF HOW MANY GAPS> will be considered problematic.
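
The effect of left-to-right evaluation versus parentheses can be demonstrated with a toy evaluator. In this sketch, each basic unit is looked up in a dictionary of made-up True/False values (True meaning that the hit satisfies that unit); the function evaluate is hypothetical and does not parse the units themselves.

```python
import re

def evaluate(expression, truth):
    """Evaluate units joined by ',' (AND) and '|' (OR), left to right, no precedence."""
    tokens = re.findall(r"[(),|]|[^(),|\s]+", expression)
    position = 0

    def parse_operand():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("unexpected end of pattern")
        token = tokens[position]
        position += 1
        if token == "(":
            value = parse_expression()
            if position >= len(tokens) or tokens[position] != ")":
                raise ValueError("missing closing parenthesis")
            position += 1
            return value
        if token in (")", ",", "|"):
            raise ValueError(f"unexpected token: {token}")
        return truth[token]

    def parse_expression():
        nonlocal position
        value = parse_operand()
        # Apply each operator as it is encountered: neither takes priority
        while position < len(tokens) and tokens[position] in (",", "|"):
            operator = tokens[position]
            position += 1
            right = parse_operand()
            value = (value and right) if operator == "," else (value or right)
        return value

    result = parse_expression()
    if position != len(tokens):
        raise ValueError("unbalanced parentheses")
    return result

# Made-up unit outcomes for a single off-target hit
truth = {"0mg5": True, "1mg6-": True, "0mg6": False, "1m7-": False}
print(evaluate("0mg5,1mg6-|0mg6,1m7-", truth))      # (((A,B)|C),D)
print(evaluate("(0mg5,1mg6-)|(0mg6,1m7-)", truth))  # (A,B)|(C,D)
```

With these made-up values the two expressions give different results, which is why parentheses matter when mixing operators.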

7.3.6. Prioritise non-redundancy 

Type: Flag

CLI: --prioritise-nr/--prioritize-nr
Python: prioritise_nr/prioritize_nr
Config file:

set default: prioritise non-redundancy (section [filter])

By default, gRNA are selected for a set in the following order of priority:

Coverage - Favour gRNA that cover a larger number of targets not covered by already selected gRNA
Proximity to 5’ - Favour gRNA that are positioned closer to the 5’ end of a target
- For reference genes, MINORg favours proximity to the 5’ end of the sense strand
- If reference genes have been specified, an alignment would have been generated with targets and reference genes, and sense will be inferred from this alignment. With sense information, MINORg will favour proximity to the 5’ end of the sense strand.
Non-redundancy - Favour gRNA whose coverage has the least overlap with targets covered by already selected gRNA

If this flag is raised, ‘Non-redundancy’ will be prioritised before ‘Proximity to 5’’. This may be preferred if you wish to generate a large number of sets, as prioritisation of non-redundancy makes it less likely that extremely high coverage gRNA will be added to a growing set, such that these gRNA can then be used to seed the next set.
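
One way to picture the order of priority is as a sort key in which the flag swaps the second and third criteria. The sketch below is a simplified illustration with made-up candidates; the field names (new_targets, distance_to_5prime, overlap) and the function rank_candidates are hypothetical and do not reflect MINORg's actual set-generation code.

```python
def rank_candidates(candidates, prioritise_nr=False):
    """Order candidate gRNA by coverage, then proximity and non-redundancy."""
    def key(candidate):
        coverage = -candidate["new_targets"]         # more newly covered targets first
        proximity = candidate["distance_to_5prime"]  # closer to the 5' end first
        redundancy = candidate["overlap"]            # less overlap with covered targets first
        if prioritise_nr:
            return (coverage, redundancy, proximity)
        return (coverage, proximity, redundancy)
    return sorted(candidates, key=key)

# Made-up candidates for illustration
candidates = [
    {"id": "g1", "new_targets": 3, "distance_to_5prime": 50, "overlap": 0},
    {"id": "g2", "new_targets": 3, "distance_to_5prime": 10, "overlap": 2},
    {"id": "g3", "new_targets": 5, "distance_to_5prime": 90, "overlap": 4},
]
print([c["id"] for c in rank_candidates(candidates)])
print([c["id"] for c in rank_candidates(candidates, prioritise_nr=True)])
```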

7.3.7. RPS-BLAST local database 

Type: Argument, 1-level lookup

CLI: --db
Python: db
Config file:

set default: rps database (section [data])

assign aliases: rps database alias (section [lookup])

The latest CDD database may be downloaded at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cdd.tar.gz. As the CDD database is regularly updated, the PSSM-Id for a domain shown at the CDD website is subject to change. Thus, I also recommend downloading ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz, which contains information that maps PSSM-Ids to domain accession IDs as well as domain names of the database version at the point of downloading.

Note: As the local database itself consists of multiple files with different extensions, the path provided to this parameter is not to any single file. For example, given the following file structure:

/
+-- root/
    |-- other_files/
    +-- rps_db/
        |-- Cdd.aux
        |-- Cdd.freq
        |-- Cdd.loo
        |-- Cdd.phr
        |-- Cdd.pin
        |-- Cdd.psd
        |-- Cdd.psi
        |-- Cdd.psq
        +-- Cdd.rps

where the database is contained in the directory /root/rps_db/, the appropriate path to pass to this parameter is: /root/rps_db/Cdd, where the trailing ‘Cdd’ is the prefix of all of the database’s files.
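
If you are unsure of the prefix, it can be derived from the directory contents. The following sketch assumes the directory holds exactly one database and that it includes a file ending in ‘.rps’, as in the listing above; rps_db_prefix is a hypothetical helper and not part of MINORg.

```python
from pathlib import Path

def rps_db_prefix(db_dir):
    """Return '<directory>/<prefix>' for the single RPS-BLAST database in db_dir."""
    rps_files = sorted(Path(db_dir).glob("*.rps"))
    if len(rps_files) != 1:
        raise ValueError(f"expected exactly one .rps file in {db_dir}, found {len(rps_files)}")
    # Strip the extension so that only the shared file prefix remains
    return str(rps_files[0].with_suffix(""))
```

For the file structure above, rps_db_prefix('/root/rps_db') would return '/root/rps_db/Cdd', which is the value to pass to --db or assign to db.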

7.3.8. RPS-BLAST remote database 

Type: Flag

CLI: --remote-rps
Python: remote_rps
Config file:

set default: remote rps (section [data])

While it is in theory possible to use the remote CDD database & servers instead of local ones, the --remote option for the ‘rpsblast’/‘rpsblast+’ command from the BLAST+ package has never worked for me. In any case, if your version of local rpsblast is able to access the remote database, you can use --remote-rps instead of --db /path/to/rpsblast/db.