minorg.fasta module

minorg.fasta.collapse_identical_seqs(fasta, fout_fa, fout_seqid)
minorg.fasta.dict_to_SeqRecordList(d, description='', seq_id_func=<function <lambda>>, seq_type='DNA', gap_char='-', gapped=False)
minorg.fasta.dict_to_fasta(d, fout, seq_type='detect', gap_char='-', gapped=False)
minorg.fasta.extract_ranges(seq, ranges, strand='+')
minorg.fasta.fasta_to_dict(fname)

Returns a dictionary of sequences indexed by sequence name.
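
As an illustration of the name-to-sequence mapping described above, here is a minimal sketch in plain Python. It is not minorg's implementation: the helper name parse_fasta_lines is hypothetical, it takes an iterable of lines rather than a file path, and it assumes the sequence name is the first whitespace-delimited token of each header line.

```python
def parse_fasta_lines(lines):
    """Map each sequence name to its sequence string (illustrative sketch)."""
    seqs = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the name is the first whitespace-delimited
            # token of the header; the rest is treated as a description.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            seqs[name] = []
        elif name is not None:
            # Sequence lines may wrap; collect the pieces and join later.
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}
```

To read from disk, the same helper could be called as parse_fasta_lines(open(path)), where path is a FASTA file of your own.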

minorg.fasta.find_identical_in_fasta(query, subject, chunk=100000)

Searches for exact matches in a memory-saving way using a sliding window of size 2*(max(chunk, len(query_seq))) with overlap max(chunk, len(query_seq)). DOES NOT EXPAND AMBIGUOUS BASES (i.e. ‘N’ matches ONLY the ‘N’ character, not any base).

Parameters
  • query (dict) – {‘<seqid>’: ‘<str of seq or Bio.Seq.Seq obj>’}

  • subject (str) – path to FASTA file

  • chunk (int) – window overlap size (bp) for search

Returns

list of SearchIO QueryResult objects as if read from a blast-tab file with fields “qseqid sseqid sstart send”

Return type

list
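
The windowing scheme described for find_identical_in_fasta can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the hypothetical helper find_exact_matches searches an in-memory string instead of a FASTA file, handles a single query, and returns 0-based, half-open (start, end) tuples rather than SearchIO QueryResult objects. The coordinate convention of the real “sstart send” fields is not stated in the source.

```python
def find_exact_matches(query, subject, chunk=100000):
    """Find all exact occurrences of query in subject using overlapping windows."""
    if not query:
        return []
    # Step (and overlap) between consecutive windows; each window spans 2 * w.
    w = max(chunk, len(query))
    hits = []
    for offset in range(0, len(subject), w):
        window = subject[offset:offset + 2 * w]
        pos = window.find(query)
        # Only accept matches starting in the first half of the window.
        # Such a match always ends inside the window (len(query) <= w),
        # and matches starting in the second half are found by the next
        # window, so nothing is missed or reported twice.
        while pos != -1 and pos < w:
            hits.append((offset + pos, offset + pos + len(query)))
            pos = window.find(query, pos + 1)
    return hits
```

Because the comparison is plain string matching, an ‘N’ in the query matches only a literal ‘N’ in the subject, which mirrors the behaviour noted in the description.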