minorg.index module

class minorg.index.IndexedFasta(fasta, default_seq=None, key_function=<function IndexedFasta.<lambda>>, as_raw=False, strict_bounds=False, read_ahead=None, mutable=False, split_char=None, filt_function=<function IndexedFasta.<lambda>>, one_based_attributes=True, read_long_names=False, duplicate_action='stop', sequence_always_upper=False, rebuild=True, build_index=True)

Bases: pyfaidx.Fasta

pyfaidx.Fasta class object wrapped to return Bio.Seq.Seq when sliced. Capable of indexing files in non-writable locations.

__init__(fasta, default_seq=None, key_function=<function IndexedFasta.<lambda>>, as_raw=False, strict_bounds=False, read_ahead=None, mutable=False, split_char=None, filt_function=<function IndexedFasta.<lambda>>, one_based_attributes=True, read_long_names=False, duplicate_action='stop', sequence_always_upper=False, rebuild=True, build_index=True)

An object that provides a pygr-compatible interface.

Parameters: filename – name of the FASTA file

get_seq(*args, **kwargs) → Bio.Seq.Seq

Return a sequence by record name and interval [start, end).

Coordinates are 1-based, end-exclusive. If rc is set, reverse complement will be returned.

get_spliced_seq(*args, **kwargs) → Bio.Seq.Seq

Return a sequence by record name and list of intervals.

Interval list is an iterable of [start, end]. Coordinates are 1-based, end-exclusive. If rc is set, reverse complement will be returned.

class minorg.index.IndexedFile(filename, chunk_lines=10000, skip=<function IndexedFile.<lambda>>)

Bases: object

File indexed by line.

Stores the file position of every chunk_lines-th line. During retrieval, jumps to the last indexed line at or before the desired line number and iterates forward from that line.
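The indexing scheme described above can be sketched in a few lines of standard-library Python. This is a minimal illustration only, not minorg's implementation: the class name `ChunkedLineIndex`, the `offsets` attribute, and the 0-based line numbering are all assumptions made for the example.

```python
class ChunkedLineIndex:
    """Hypothetical stand-in for the chunked-index idea (not minorg's code)."""

    def __init__(self, filename, chunk_lines=10000):
        self.filename = filename
        self.chunk_lines = chunk_lines
        # Byte offsets of lines 0, chunk_lines, 2 * chunk_lines, ...
        self.offsets = []
        with open(filename, "rb") as f:
            line_no = 0
            while True:
                pos = f.tell()
                if not f.readline():
                    break
                if line_no % chunk_lines == 0:
                    self.offsets.append(pos)
                line_no += 1

    def get_line(self, index):
        """Return line `index` (0-based here, an assumption), newline included."""
        if index < 0:
            raise IndexError(index)
        chunk, remainder = divmod(index, self.chunk_lines)
        if chunk >= len(self.offsets):
            raise IndexError(index)
        with open(self.filename, "rb") as f:
            # Jump to the nearest stored position at or before the target line,
            f.seek(self.offsets[chunk])
            # then read forward past the lines in between.
            for _ in range(remainder):
                f.readline()
            line = f.readline()
        if not line:
            raise IndexError(index)
        return line.decode()
```

In this sketch a lookup reads at most chunk_lines lines after the seek, which is where the trade-off between memory use and lookup speed comes from.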

__init__(filename, chunk_lines=10000, skip=<function IndexedFile.<lambda>>)

Create IndexedFile object.

Parameters

filename (str) – required, path to file to index
chunk_lines (int) – number of lines between each stored position. A larger value means fewer positions to store in memory but slower lookups.
skip (func) – function that accepts line (str) from file and outputs whether to skip that line. Skipped lines are effectively hidden from self and do not contribute to line count during indexing or retrieval. (default=lambda x: False)
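To illustrate how a skip function hides lines from the line count, here is a small sketch. The helper `visible_lines` and the sample data are hypothetical and only mimic the documented behaviour; 0-based numbering is assumed for the example.

```python
def visible_lines(lines, skip=lambda line: False):
    """Yield (visible_index, line) pairs, numbering only lines that are not skipped."""
    visible_index = 0
    for line in lines:
        if skip(line):
            # Skipped lines are hidden and do not advance the counter.
            continue
        yield visible_index, line
        visible_index += 1


raw = ["# header\n", "a\n", "# comment\n", "b\n"]
# Treat lines starting with '#' as comments to be skipped.
numbered = list(visible_lines(raw, skip=lambda line: line.startswith("#")))
# numbered is [(0, "a\n"), (1, "b\n")]
```

With the default skip function (which always returns False), every line is counted.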

get_line(*indices, strip_newline=False, output_fmt=None) → Union[str, list]

Retrieve line content by line number(s).

Parameters

*indices (int) – line numbers of lines to retrieve
strip_newline (bool) – remove newline from returned lines
output_fmt (type) – output format. Valid value: list. If output_fmt = list, returns list even if len(indices) == 1.

Returns

str – If len(indices) == 1 and output_fmt != list
list – If len(indices) > 1 or output_fmt == list
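The return rule above can be sketched as a small helper. The function `collect` is hypothetical, and stripping only a trailing "\n" is an assumption about what strip_newline does.

```python
def collect(lines, strip_newline=False, output_fmt=None):
    """Mimic the documented return rule for already-retrieved lines."""
    if strip_newline:
        # Assumption: only a trailing "\n" is removed.
        lines = [line.rstrip("\n") for line in lines]
    if len(lines) == 1 and output_fmt is not list:
        return lines[0]  # single line requested: bare str
    return list(lines)   # several lines, or a list explicitly requested
```

Passing output_fmt=list is useful when the caller wants a uniform return type regardless of how many line numbers were supplied.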

index(chunk_lines=None) → None

Index file.

Parameters: chunk_lines (int) – see __init__.

class minorg.index.rFaidx(*args, **kwargs)

Bases: pyfaidx.Faidx

Faidx object that writes the index into a temporary directory if the file location is not writable.
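One way to picture the fallback is the following sketch, which picks an index path next to the FASTA file when its directory is writable and in a fresh temporary directory otherwise. The function `choose_index_path` and its `writable` override are hypothetical and are not part of minorg or pyfaidx; only the ".fai" suffix follows the usual faidx naming convention.

```python
import os
import tempfile


def choose_index_path(fasta_path, writable=None):
    """Return where a .fai index could be written for `fasta_path`."""
    directory = os.path.dirname(os.path.abspath(fasta_path))
    if writable is None:
        # Check whether the current process may write to the FASTA's directory.
        writable = os.access(directory, os.W_OK)
    if writable:
        return fasta_path + ".fai"
    # Fall back to a newly created temporary directory.
    tmp_dir = tempfile.mkdtemp()
    return os.path.join(tmp_dir, os.path.basename(fasta_path) + ".fai")
```

The `writable` argument exists only so that both branches can be exercised without changing file permissions.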

__init__(*args, **kwargs)

Parameters

filename – name of the FASTA file
key_function – optional callback function that should return a unique key for the self.index dictionary when given rname
as_raw – optional parameter specifying whether to return sequences as a Sequence() object or as a raw string. Default: False (i.e. return a Sequence() object).

property indexname