minorg.annotation module
GFF3 class
- class minorg.annotation.Annotation(entry, gff, **kwargs)[source]
Bases:
object
Representation of GFF3 annotation/feature
- class minorg.annotation.Attributes(val, gff, entry, field_sep_inter=';', field_sep_intra=',', **for_dummy_gff)[source]
Bases:
object
Representation of GFF3 feature attributes
- class minorg.annotation.GFF(fname=None, data=None, string=None, attr_mod=None, genetic_code=1, fmt=None, quiet=False, memsave=False, chunk_lines=1000, **kwargs)[source]
Bases:
object
Representation of GFF3 file.
Large files can be indexed instead of being read into memory by using memsave=True.
>>> my_gff = GFF('/path/to/large_gff.gff', memsave=True)
- _data[source]
stores annotation data as list of minorg.annotation.Annotation objects. Not used if fname is not None but memsave=True.
- Type
list
- _kwargs[source]
stores additional arguments when parsing GFF3 entries to minorg.annotation.Annotation objects
- _chunk_lines[source]
number of lines between each stored position (used for indexing when memsave=True)
- Type
int
- __iter__()[source]
Read from data stored in memory if self._data is not empty, else read directly from file.
- Yields
- __init__(fname=None, data=None, string=None, attr_mod=None, genetic_code=1, fmt=None, quiet=False, memsave=False, chunk_lines=1000, **kwargs)[source]
Create a GFF object.
- Parameters
fname (str) – optional, path to GFF3 file or BED file generated using gff2bed
data (list) – optional, list of minorg.annotation.Annotation objects
string (str) – optional, string contents in the format of a GFF3 file or BED file generated using gff2bed
attr_mod (dict) – optional, dictionary of mapping for non-standard attribute field names
genetic_code (int or str) – NCBI genetic code name or number
fmt (str) – optional, valid values: BED, GFF, GFF3. If not provided and fname != None, will be inferred from fname extension.
quiet (bool) – print only essential messages
memsave (bool) – index file instead of reading data to memory
chunk_lines (int) – number of lines between each stored line index (default=1000)
**kwargs – additional arguments when parsing GFF3 entries to minorg.annotation.Annotation objects
- add_entry(gff_entry, duplicate_check=False) None [source]
Add minorg.annotation.Annotation object to self’s data at self._data.
- Parameters
gff_entry (Annotation) – required, Annotation object to add
duplicate_check (bool) – check for duplicates and only add gff_entry if not already in data
- empty_copy(other=None) Optional[minorg.annotation.GFF] [source]
Shallow copy self’s attributes (BUT NOT DATA) to another minorg.annotation.GFF object. If other=None, create a new minorg.annotation.GFF object, copy attributes to it, and return the new object.
- Parameters
other (minorg.annotation.GFF) – optional. If not provided, creates a new minorg.annotation.GFF object with copied attributes.
- Returns
The new minorg.annotation.GFF object, if other=None
- Return type
minorg.annotation.GFF
- get_features(*feature_types, index=False)[source]
Get entries of specific feature types.
- Parameters
*feature_types (str) – GFF3 feature types to retrieve
index (bool) – return line number (index) instead of Annotation objects
- Returns
list – Of minorg.annotation.Annotation objects if index=False
list – Of int line number of entries if index=True
- get_features_and_subfeatures(*feature_ids, index=False, full=True, preserve_order=True)[source]
Get features with the given feature_ids AND the subfeatures of those features. If full=True, get_subfeatures_full is used for subfeature discovery; otherwise get_subfeatures is used.
- Parameters
*feature_ids (str) – parent feature IDs
index (bool) – return line number of feature instead of minorg.annotation.Annotation objects
full (bool) – discover subfeatures with get_subfeatures_full instead of get_subfeatures
preserve_order (bool) – sort output by line number (i.e. preserve original order)
- Returns
list – Of minorg.annotation.Annotation objects if index=False
list – Of int line number of entries if index=True
- get_i(*indices, output_list=False, sort=True) Optional[Union[list, minorg.annotation.Annotation]] [source]
Get Annotation of GFF entry/entries by line index.
- Parameters
*indices (int) – line number(s) (indices) of entries to retrieve
output_list (bool) – return list even if only one line number is provided
sort (bool) – sort output by line number
- Returns
list – Of Annotation if output_list=True or multiple indices were requested
Annotation – If output_list=False and only one index was requested
None – If output_list=False and the specified line does not exist
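The return convention described above (a single object, a list, or None) can be sketched on a plain Python list. This is only an illustration of the documented behaviour, not minorg's code: select_by_index is a hypothetical helper, and silently skipping out-of-range indices is an assumption.

```python
def select_by_index(entries, *indices, output_list=False, sort=True):
    """Hypothetical helper mimicking the documented return convention of get_i."""
    # Optionally sort the requested line numbers.
    wanted = sorted(indices) if sort else list(indices)
    # Assumption: line numbers that do not exist are skipped.
    hits = [entries[i] for i in wanted if 0 <= i < len(entries)]
    # A list is returned when asked for, or when several indices were requested.
    if output_list or len(wanted) > 1:
        return hits
    # Otherwise return the single hit, or None if the line does not exist.
    return hits[0] if hits else None


entries = ["a", "b", "c"]
single = select_by_index(entries, 1)
as_list = select_by_index(entries, 1, output_list=True)
missing = select_by_index(entries, 9)
```

Passing output_list=True is the simpler choice when the caller always wants to iterate over the result.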
- get_i_raw(*indices, strip_newline=True, output_list=True, sort=True) Optional[Union[list, str]] [source]
Get raw string of entry/entries by line index.
- Parameters
*indices (int) – line numbers (indices) of entries to retrieve
strip_newline (bool) – remove newline from returned lines
output_list (bool) – return list even if only one line number is provided
sort (bool) – sort output by line number
- Returns
list – Of str of entries if output_list=True or multiple lines were requested
str – Of entry if output_list=False and only one line was requested
None – If output_list=False and the specified line does not exist
- get_id(*feature_ids, index=False, output_list=False, preserve_order=True)[source]
Get Annotation of GFF entry/entries by feature ID.
- Parameters
*feature_ids (str) – feature ID(s)
index (bool) – return line number(s) of feature(s) instead of minorg.annotation.Annotation object(s)
output_list (bool) – return list even if only one feature ID is provided
preserve_order (bool) – sort output by line number (preserve original order)
- Returns
list – If output_list=True or more than one feature ID was provided. List of minorg.annotation.Annotation objects if index=False. List of int if index=True.
Annotation – If output_list=False and only one feature ID was provided and index=False
int – If output_list=False and only one feature ID was provided and index=True
None – If output_list=False and no feature with the specified feature ID can be found
- get_subfeatures(*feature_ids, feature_types=[], index=False)[source]
Get all features that are subfeatures of user-provided feature_ids.
- Parameters
*feature_ids (str) – parent feature IDs
feature_types (list of str) – feature type(s) to retain
index (bool) – return line number of feature instead of minorg.annotation.Annotation objects
- Returns
list – Of minorg.annotation.Annotation objects if index=False
list – Of int line number of entries if index=True
- get_subfeatures_full(*feature_ids, feature_types=[], index=False, preserve_order=True)[source]
Get all features that are subfeatures of user-provided feature_ids AND subfeatures of those subfeatures, until there are no sub-sub…sub-features left.
- Parameters
*feature_ids (str) – parent feature IDs
feature_types (list of str) – GFF3 feature type(s) to retrieve
index (bool) – return line number of feature instead of minorg.annotation.Annotation objects
preserve_order (bool) – sort output by line number (preserve original order)
- Returns
list – Of minorg.annotation.Annotation objects if index=False
list – Of int line number of entries if index=True
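The "subfeatures of subfeatures" search can be pictured as a level-by-level walk over parent/child links. The sketch below is not minorg's implementation: subfeatures_full is a hypothetical standalone function, and each entry is simplified to an (ID, Parent) tuple with at most one parent, whereas GFF3 allows several.

```python
from collections import defaultdict


def subfeatures_full(entries, *feature_ids):
    """Return sorted line indices of all descendants of feature_ids.

    entries: list of (feature_id, parent_id_or_None) tuples in file order
    (a simplified stand-in for parsed GFF3 lines).
    """
    # Map each parent ID to the line indices of its direct children.
    children = defaultdict(list)
    for i, (_, parent) in enumerate(entries):
        if parent is not None:
            children[parent].append(i)
    found = set()
    frontier = list(feature_ids)
    # Descend one level at a time until no new subfeatures turn up.
    while frontier:
        next_frontier = []
        for fid in frontier:
            for i in children.get(fid, []):
                if i not in found:
                    found.add(i)
                    child_id = entries[i][0]
                    if child_id is not None:
                        next_frontier.append(child_id)
        frontier = next_frontier
    # Sorting by line index preserves the original file order.
    return sorted(found)


entries = [
    ("gene1", None),
    ("mRNA1", "gene1"),
    ("exon1", "mRNA1"),
    ("CDS1", "mRNA1"),
    ("gene2", None),
    ("mRNA2", "gene2"),
]
descendants = subfeatures_full(entries, "gene1")
```

Stopping after the first pass of the loop would give only direct children, which is the behaviour documented for get_subfeatures.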
- index(chunk_lines=None) None [source]
Index file.
- Parameters
chunk_lines (int) – optional, number of lines between each indexed line. If not provided, defaults to self._chunk_lines.
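One common way to index a file by storing a position every chunk_lines lines is to record byte offsets and seek to the nearest one on lookup. The sketch below shows that general technique only. How minorg stores its index is not stated here, and build_line_index and get_line are hypothetical names.

```python
import io


def build_line_index(handle, chunk_lines=1000):
    """Record the byte offset of every chunk_lines-th line of a binary handle."""
    offsets = []
    line_no = 0
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line_no % chunk_lines == 0:
            offsets.append(pos)
        line_no += 1
    return offsets


def get_line(handle, offsets, i, chunk_lines=1000):
    """Seek to the stored offset at or before line i, then read forward to it."""
    chunk = i // chunk_lines
    if i < 0 or chunk >= len(offsets):
        return None
    handle.seek(offsets[chunk])
    line = b""
    for _ in range(i % chunk_lines + 1):
        line = handle.readline()
        if not line:
            return None  # line i lies past the end of the file
    return line


# Ten short lines stand in for a large GFF3 file.
handle = io.BytesIO(b"".join(b"line%d\n" % n for n in range(10)))
offsets = build_line_index(handle, chunk_lines=3)
seventh = get_line(handle, offsets, 7, chunk_lines=3)
```

A smaller chunk_lines stores more offsets but reads fewer lines per lookup, while a larger value does the opposite.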
- invert_attr_fields() dict [source]
Generate {<feature>: {<NONSTANDARD attribute field name>: <STANDARD attribute field name>}} mapping from self._attr_fields (which is in the format {<feature>: {<STANDARD attribute field name>: <NONSTANDARD attribute field name>}})
- Returns
Of reversed attribute field name mapping
- Return type
dict
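The inversion described above amounts to swapping keys and values in each inner dictionary. A minimal sketch on a plain dict follows. It is a standalone function rather than the method itself, and the feature and field names in the example are made up for illustration.

```python
def invert_attr_fields(attr_fields):
    """Swap standard and non-standard field names within each feature's mapping."""
    return {
        feature: {nonstandard: standard for standard, nonstandard in mapping.items()}
        for feature, mapping in attr_fields.items()
    }


# Hypothetical mapping: {<feature>: {<STANDARD>: <NONSTANDARD>}}
attr_fields = {"mRNA": {"Parent": "Locus_id"}, "gene": {"ID": "gene_id"}}
inverted = invert_attr_fields(attr_fields)
```

If two standard names map to the same non-standard name within one feature, only one of them survives the inversion.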
- iter_fasta_raw() Generator[str, None, None] [source]
Yields FASTA entries (excludes ‘##FASTA’ header)
- iter_raw(include_fasta=False) Generator[str, None, None] [source]
Yields raw string read from file.
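In GFF3, the ##FASTA directive marks where annotation lines end and embedded sequences begin. The two generators below sketch how raw lines might be split at that directive. They are hypothetical helpers operating on any iterable of lines, not the methods documented here, and whether the real iter_raw also filters comment lines is not covered.

```python
def iter_annotation_lines(lines):
    """Yield lines up to, but not including, the ##FASTA directive."""
    for line in lines:
        if line.rstrip("\r\n") == "##FASTA":
            return
        yield line


def iter_fasta_lines(lines):
    """Yield only the lines that follow the ##FASTA directive."""
    in_fasta = False
    for line in lines:
        if in_fasta:
            yield line
        elif line.rstrip("\r\n") == "##FASTA":
            in_fasta = True


lines = [
    "##gff-version 3\n",
    "Chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">Chr1\n",
    "ACGTACGTA\n",
]
annotation_part = list(iter_annotation_lines(lines))
fasta_part = list(iter_fasta_lines(lines))
```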
- make_annotation_from_str_gen(strip_newline=True) Callable[str, minorg.annotation.Annotation] [source]
Create function to parse raw string entries into minorg.annotation.Annotation objects based on inferred data format (GFF3 or BED generated by gff2bed)
- Parameters
strip_newline (bool) – whether to strip the newline if it is present in a string entry when read from file
- Returns
- Return type
func
- parse_fasta() None [source]
Read the FASTA section of the file stored at self._fname, and write sequence entries to temporary file at self._fasta. If self._fasta already exists, it will be deleted and replaced.
- read_file(fname, fasta=False) None [source]
Read file to memory.
- Parameters
fname (str) – path to GFF3 file
fasta (bool) – parse FASTA section, if it exists, into IndexedFasta object stored at self.fasta
- sort() None [source]
Sort data stored in memory. Does NOT sort indexed files.
Sort key: seqid, start, -end, source, feature type, str of attributes with standardised field names
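The sort key above can be written as a tuple, with the end coordinate negated so that, among features starting at the same position, longer ones come first. The sketch uses a hypothetical Entry record rather than minorg's Annotation class, and a plain attribute string stands in for the standardised attributes.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a parsed GFF3 line."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    attributes: str


def sort_key(e):
    # seqid, start, -end, source, feature type, attribute string
    return (e.seqid, e.start, -e.end, e.source, e.type, e.attributes)


entries = [
    Entry("Chr1", "src", "exon", 100, 200, "ID=exon1"),
    Entry("Chr1", "src", "mRNA", 100, 500, "ID=mRNA1"),
    Entry("Chr1", "src", "gene", 100, 500, "ID=gene1"),
]
ordered = sorted(entries, key=sort_key)
```

In this example the gene and mRNA (both ending at 500) sort ahead of the shorter exon, and the gene precedes the mRNA because of the feature type comparison.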
- subset(feature_ids=None, feature_types=None, subfeatures=True, preserve_order=True)[source]
Subset data by feature ID (feature_ids) and/or feature type (feature_types) and generate new GFF object from them.
Arguments:
- update_attr_fields(attr_mod=None) None [source]
Update attribute fields with a user-provided attribute field modification dictionary. (Otherwise, use self._attr_fields)
- Parameters
attr_mod (dict) – optional, dictionary of attribute modifications