Output
======

Directory structure
-------------------

As a general rule of thumb, using the default prefix 'minorg', MINORg generates the following files in the following directory structure::

  <output directory>/
  |-- minorg.log
  +-- minorg/
      |-- minorg_<domain>_mafft.fasta
      |-- minorg_<domain>_targets.fasta
      |-- minorg_gRNA_all.fasta
      |-- minorg_gRNA_all.map
      |-- minorg_gRNA_final.fasta
      |-- minorg_gRNA_final.map
      |-- minorg_gRNA_pass.fasta
      |-- minorg_gRNA_pass.map
      |-- minorg_gRNA_masked.txt
      +-- ref/
          |-- minorg_ref_<domain>_CDS.fasta
          |-- minorg_ref_<domain>_gene.fasta
          +-- minorg_ref_<domain>_pep.fasta

The exact combination of files generated depends on whether the full programme or a subcommand (and which subcommand) has been executed, as well as on the combination of parameters used.

If no domain is specified (using ``--domain`` (CLI) or ``pssm_ids`` and/or ``domain_name`` (Python)), ``<domain>`` defaults to 'gene'.

If the user has specified a custom prefix (using ``--prefix`` (CLI) or ``prefix`` (Python)), the prefix replaces 'minorg' in all file names.

When ``--cluster`` (CLI exclusive) is used, the general structure is maintained, except that 'minorg_<cluster>' replaces 'minorg' for all files except 'minorg.log', and each cluster gets its own directory::

  <output directory>/
  |-- minorg.log
  +-- minorg_clusterA/
  |   |-- minorg_clusterA_<domain>_mafft.fasta
  |   |-- minorg_clusterA_<domain>_targets.fasta
  |   |-- minorg_clusterA_gRNA_all.fasta
  |   |-- minorg_clusterA_gRNA_all.map
  |   |-- minorg_clusterA_gRNA_final.fasta
  |   |-- minorg_clusterA_gRNA_final.map
  |   |-- minorg_clusterA_gRNA_pass.fasta
  |   |-- minorg_clusterA_gRNA_pass.map
  |   |-- minorg_clusterA_gRNA_masked.txt
  |   +-- ref/
  |       |-- minorg_clusterA_ref_<domain>_CDS.fasta
  |       |-- minorg_clusterA_ref_<domain>_gene.fasta
  |       +-- minorg_clusterA_ref_<domain>_pep.fasta
  +-- minorg_clusterB/
      |-- minorg_clusterB_<domain>_mafft.fasta
      |-- minorg_clusterB_<domain>_targets.fasta
      |-- minorg_clusterB_gRNA_all.fasta
      |-- minorg_clusterB_gRNA_all.map
      |-- minorg_clusterB_gRNA_final.fasta
      |-- minorg_clusterB_gRNA_final.map
      |-- minorg_clusterB_gRNA_pass.fasta
      |-- minorg_clusterB_gRNA_pass.map
      |-- minorg_clusterB_gRNA_masked.txt
      +-- ref/
          |-- minorg_clusterB_ref_<domain>_CDS.fasta
          |-- minorg_clusterB_ref_<domain>_gene.fasta
          +-- minorg_clusterB_ref_<domain>_pep.fasta

Output files
------------

minorg.log
++++++++++

This file is currently only generated when the CLI version of MINORg is used.

The log file has the following format::

  ((raw args))
  minorg --directory ./example_07_domain --indv ref --gene AT5G45050 --assembly ./subset_ref_TAIR10.fasta --annotation ./subset_ref_TAIR10.gff --domain 214815

  ((expanded args))
  reference_set: /path/to/arabidopsis_genomes.txt
  genome_set: /path/to/subset_genome_mapping.txt
  cluster_set: /path/to/cluster_mapping.txt
  output_ver: 4 (default)
  prioritise_nr: False (default)
  auto: True (default)
  sets: 1 (default)
  accept_invalid: False (default)
  exclude: None (default)
  ...

Here, 'raw args' records the exact command and parameters used, and 'expanded args' lists all arguments, including those that the user did not explicitly specify in their command. Warning and/or error messages may be logged after the arguments.

XXX_mafft.fasta
+++++++++++++++

The file ending in '_mafft.fasta' contains a sequence alignment of targets to reference genes, generated using MAFFT.

XXX_targets.fasta
+++++++++++++++++

The file ending in '_targets.fasta' contains the target sequence(s).

Target sequences taken from reference genes are named using the following format::

  Reference|<reference alias>|<domain>|<n>|<feature type>|<stitched/complete>|<gene ID>|<range(s)>

* Reference alias: Unique alias given to each reference genome
* Domain: PSSM ID or domain name ('gene' if not specified)
* n: If multiple domains are present, they are numbered according to proximity to the 5' end of the sense strand
* Feature type: GFF3 feature type
* Stitched/complete: Whether sequences were concatenated

  * Stitched: Concatenated sequence, generated by stitching together regions of the requested GFF3 feature type. For example, individual CDS regions concatenated into a single translatable sequence are considered 'stitched'.
  * Complete: Sequence that includes intervening regions that may or may not also be of the requested GFF3 feature type. For example, a sequence that spans the first base of the first CDS block to the last base of the last CDS block would be a 'complete' CDS sequence.

* Gene ID: ID of the gene
* Range(s): Ranges of the gene that this sequence spans

  * For stitched sequences, there may be multiple feature ranges in the format '0-10,20-30'
  * For complete sequences, there will only be a single range spanning the first base of the first feature to the last base of the last feature of that feature type in the gene.

Target sequences discovered by homology inference are named using the following format::

  <query alias>|<molecule>|<i>|<range(s)>

* Query alias: Unique alias given to each query file
* Molecule: Sequence ID of the sequence the target is from

  * For example, if the query is a FASTA file with sequences named 'ChrA', 'ChrB', and 'scaffold_001', and the target is found on scaffold_001, the value of this field will be 'scaffold_001'

* i: Unique number given to each target sequence from the same query file
* Range(s): Range of the target in the molecule

gRNA FASTA files
++++++++++++++++

Files containing gRNA sequences are named using the following format: ``<prefix>_gRNA_<category>.fasta``

The categories are:

* all: all candidate gRNA, regardless of pass/fail status
* pass: candidate gRNA that have passed all valid checks
* final: final gRNA selected in minimum sets

gRNA .map files
+++++++++++++++

Files containing information for mapping gRNA to targets are named using the following format: ``<prefix>_gRNA_<category>.map``

As with gRNA FASTA files, the categories are:

* all: all candidate gRNA, regardless of pass/fail status
* pass: candidate gRNA that have passed all valid checks
* final: final gRNA selected in minimum sets

These files are tab-separated and look like this::

  gRNA id   gRNA sequence         target id                                                     target sense  gRNA strand  start  end  set  background  GC    feature  my_custom_check
  gRNA_001  CTATGGGTTTGGCGAAAGTA  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         +            4      23   1    pass        pass  fail     pass
  gRNA_002  TCAAAAGTTCTCCTTATCCA  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         +            38     57   1    pass        pass  fail     pass
  gRNA_003  AAATCTTTGATGTTTACTTA  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         +            79     98   1    pass        fail  fail     fail
  gRNA_004  GTCTTTGCTTTTTACTTCTC  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         +            111    130  1    pass        pass  fail     pass
  gRNA_005  TATAGATGTGCCAGCTCGAA  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         +            140    159  1    pass        pass  pass     fail
  gRNA_006  GCTCGAAAGGTTGTTTTGCT  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         +            153    172  1    pass        pass  pass     fail
  gRNA_007  TAAGTAATTACTGAAACATT  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         -            206    225  1    pass        fail  pass     pass
  gRNA_008  CTGAAACATTTGGATCAGTG  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         -            196    215  1    pass        pass  pass     pass
  gRNA_009  AGCAAAACAACCTTTCGAGC  Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382  sense         -            153    172  1    pass        pass  pass     pass

Column description:

#. gRNA id: Unique ID for each gRNA sequence, consistent with gRNA FASTA files
#. gRNA sequence: gRNA sequence (upper case)
#. target id: Sequence ID of target sequence, consistent with XXX_targets.fasta
#. target sense: Whether target sequence is sense or antisense

   * This is detected by alignment with reference genes.
   * If the user provided target sequences to the full programme or to the ``seq`` subcommand using ``--target``, all entries in this field will be 'NA'.

#. gRNA strand: Strand of gRNA relative to target sequence
#. start: Start position of gRNA in target sequence
#. end: End position of gRNA in target sequence
#. set: gRNA set number

   * Unless the file ends with '_final.map', all entries in this field will be set to 1.
   * If the file ends with '_final.map', this value corresponds to the set a gRNA is assigned to.

#. background: Status of background check (only in file ending with '_all.map')
#. GC: Status of GC content check (only in file ending with '_all.map')
#. feature: Status of within feature check (only in file ending with '_all.map')
#. Users may provide custom checks in additional columns

   * In the example above, the custom check is named 'my_custom_check'
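
As a rough illustration of how the tab-separated layout described above can be consumed, the following sketch reads a ``.map`` file with Python's standard ``csv`` module and lists the gRNA whose check columns all read 'pass'. The helpers ``read_map`` and ``passing_ids`` are hypothetical and not part of MINORg. The sketch assumes that every column other than the eight fixed ones is a check column and that a passed check is recorded as 'pass', as in the example.

.. code-block:: python

   import csv
   import io

   # The eight fixed columns from the column description. Treating every
   # other column as a check column is an assumption based on the example.
   FIXED_COLUMNS = ["gRNA id", "gRNA sequence", "target id", "target sense",
                    "gRNA strand", "start", "end", "set"]


   def read_map(handle):
       """Return (check_columns, rows) for an open tab-separated .map file."""
       reader = csv.DictReader(handle, delimiter="\t")
       rows = list(reader)
       fieldnames = reader.fieldnames or []
       check_columns = [c for c in fieldnames if c not in FIXED_COLUMNS]
       return check_columns, rows


   def passing_ids(handle):
       """Return the IDs of gRNA whose check columns all read 'pass'."""
       check_columns, rows = read_map(handle)
       return [row["gRNA id"] for row in rows
               if all(row[c] == "pass" for c in check_columns)]


   # Three rows copied from the example table, joined with real tabs.
   TARGET = "Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382"
   HEADER = FIXED_COLUMNS + ["background", "GC", "feature", "my_custom_check"]
   EXAMPLE_ROWS = [
       ["gRNA_001", "CTATGGGTTTGGCGAAAGTA", TARGET, "sense", "+",
        "4", "23", "1", "pass", "pass", "fail", "pass"],
       ["gRNA_005", "TATAGATGTGCCAGCTCGAA", TARGET, "sense", "+",
        "140", "159", "1", "pass", "pass", "pass", "fail"],
       ["gRNA_008", "CTGAAACATTTGGATCAGTG", TARGET, "sense", "-",
        "196", "215", "1", "pass", "pass", "pass", "pass"],
   ]
   EXAMPLE_MAP = "\n".join(
       "\t".join(fields) for fields in [HEADER] + EXAMPLE_ROWS) + "\n"

   print(passing_ids(io.StringIO(EXAMPLE_MAP)))  # ['gRNA_008']

For a file on disk, the same functions can be given a handle from ``open(path, newline="")`` in place of the ``io.StringIO`` object.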
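
The target id column reuses the sequence names from XXX_targets.fasta, and the range(s) field comes last in both naming formats, so the ranges can be recovered with plain string splitting. The sketch below is a minimal illustration. ``parse_ranges`` and ``target_ranges`` are hypothetical helper names, and the numbers are returned exactly as written, with no assumption about the coordinate convention.

.. code-block:: python

   def parse_ranges(text):
       """Convert a range string such as '0-10,20-30' into (start, end) tuples."""
       ranges = []
       for part in text.split(","):
           start, end = part.split("-")
           ranges.append((int(start), int(end)))
       return ranges


   def target_ranges(target_id):
       """Return the ranges held in the last '|'-separated field of a target ID."""
       return parse_ranges(target_id.rsplit("|", 1)[-1])


   print(parse_ranges("0-10,20-30"))  # [(0, 10), (20, 30)]
   print(target_ranges(
       "Reference|Reference|214815|gene|stitched|AT5G45050|4139-4382"))
   # [(4139, 4382)]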