minorg.index module
- class minorg.index.IndexedFasta(fasta, default_seq=None, key_function=<function IndexedFasta.<lambda>>, as_raw=False, strict_bounds=False, read_ahead=None, mutable=False, split_char=None, filt_function=<function IndexedFasta.<lambda>>, one_based_attributes=True, read_long_names=False, duplicate_action='stop', sequence_always_upper=False, rebuild=True, build_index=True)[source]
Bases: pyfaidx.Fasta
pyfaidx.Fasta class object wrapped to return Bio.Seq.Seq when sliced. Capable of indexing files in non-writable locations.
- __init__(fasta, default_seq=None, key_function=<function IndexedFasta.<lambda>>, as_raw=False, strict_bounds=False, read_ahead=None, mutable=False, split_char=None, filt_function=<function IndexedFasta.<lambda>>, one_based_attributes=True, read_long_names=False, duplicate_action='stop', sequence_always_upper=False, rebuild=True, build_index=True)[source]
An object that provides a pygr-compatible interface.
filename: name of fasta file
- class minorg.index.IndexedFile(filename, chunk_lines=10000, skip=<function IndexedFile.<lambda>>)[source]
Bases: object
File indexed by line.
Stores position of every chunk_lines-th line. During retrieval, jumps to largest chunk_lines-th line before or equal to desired line number and iterates from that line.
- __init__(filename, chunk_lines=10000, skip=<function IndexedFile.<lambda>>)[source]
Create IndexedFile object.
- Parameters
filename (str) – required, path to file to index
chunk_lines (int) – number of lines between each stored position. A larger value means fewer positions have to be stored in memory, but lookup is slower because more lines are iterated over per retrieval.
skip (func) – function that accepts line (str) from file and outputs whether to skip that line. Skipped lines are effectively hidden from self and do not contribute to line count during indexing or retrieval. (default=lambda x: False)
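The indexing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the minorg implementation: the class name `SparseLineIndex`, its `line` method, and the use of 0-based line numbers are all assumptions made for the example.

```python
class SparseLineIndex:
    """Hypothetical sketch of a file indexed by every chunk_lines-th line."""

    def __init__(self, filename, chunk_lines=10000, skip=lambda line: False):
        self.filename = filename
        self.chunk_lines = chunk_lines
        self.skip = skip
        self.offsets = []  # byte offset of every chunk_lines-th visible line
        visible = 0        # skipped lines do not contribute to this count
        with open(filename, "rb") as handle:
            while True:
                offset = handle.tell()
                raw = handle.readline()
                if not raw:
                    break
                if skip(raw.decode()):
                    continue
                if visible % chunk_lines == 0:
                    self.offsets.append(offset)
                visible += 1
        self.n_lines = visible

    def line(self, index):
        """Return visible line `index` (0-based here, which is an assumption)."""
        if not 0 <= index < self.n_lines:
            raise IndexError(index)
        # Jump to the nearest stored position at or before the target line,
        # then iterate forward over the remaining visible lines.
        chunk, remainder = divmod(index, self.chunk_lines)
        with open(self.filename, "rb") as handle:
            handle.seek(self.offsets[chunk])
            seen = 0
            for raw in iter(handle.readline, b""):
                text = raw.decode()
                if self.skip(text):
                    continue
                if seen == remainder:
                    return text
                seen += 1
        raise IndexError(index)
```

With this layout, memory use scales with the number of lines divided by `chunk_lines`, while each lookup reads at most `chunk_lines` visible lines after the seek, which is the trade-off the `chunk_lines` parameter controls.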
- get_line(*indices, strip_newline=False, output_fmt=None) → Union[str, list] [source]
Retrieve line content by line number(s).
- Parameters
*indices (int) – line numbers of lines to retrieve
strip_newline (bool) – remove newline from returned lines
output_fmt (type) – output format. Valid value: list. If output_fmt = list, returns list even if len(indices) == 1.
- Returns
str – If len(indices) == 1 and output_fmt != list
list – If len(indices) > 1 or output_fmt == list
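The return-type rule above can be expressed as a small helper. This is only a sketch of the documented behaviour: `format_lines` is a hypothetical name, and returning an empty list when no lines are given is an assumption, since the documentation does not cover that case.

```python
def format_lines(lines, output_fmt=None):
    """Hypothetical helper: collapse a single line to str unless list is requested."""
    if len(lines) == 1 and output_fmt is not list:
        return lines[0]
    # More than one line, or output_fmt == list: always return a list.
    return list(lines)
```

Callers that always want to iterate over the result can pass `output_fmt=list` and avoid checking whether a str or a list came back.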
- class minorg.index.rFaidx(*args, **kwargs)[source]
Bases: pyfaidx.Faidx
Faidx object that writes index into a temporary directory if file location is not writable
- __init__(*args, **kwargs)[source]
filename: name of fasta file
key_function: optional callback function which should return a unique key for the self.index dictionary when given rname.
as_raw: optional parameter to specify whether to return sequences as a Sequence() object or as a raw string. Default: False (i.e. return a Sequence() object).
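The fallback to a temporary directory can be pictured with the sketch below. It is not taken from the minorg source: `choose_index_path` is a hypothetical helper, and both the `.fai` suffix and the writability check via `os.access` are assumptions about how such a fallback might be decided.

```python
import os
import tempfile


def choose_index_path(fasta_path, suffix=".fai"):
    """Hypothetical: pick where to write an index for fasta_path."""
    directory = os.path.dirname(os.path.abspath(fasta_path))
    if os.access(directory, os.W_OK):
        # Writable location: keep the index next to the FASTA file.
        return fasta_path + suffix
    # Non-writable location: place the index in a fresh temporary directory.
    tmp_dir = tempfile.mkdtemp()
    return os.path.join(tmp_dir, os.path.basename(fasta_path) + suffix)
```

A temporary index of this kind does not persist alongside the FASTA file, so it would typically need to be rebuilt in a later session.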