GeneticCode
- class GeneticCode(ID: int, name: str, ncbi_code_sequence: dataclasses.InitVar[str], ncbi_start_codon_map: dataclasses.InitVar[str], moltype: MolType = MolType(('T', 'C', 'A', 'G')))
Holds codon to amino acid mapping, and vice versa.
- Attributes:
- anticodons
- codons
- sense_codons
- start_codons
- stop_codons
Methods
- get_alphabet([include_gap, include_stop]): Returns a codon alphabet.
- is_stop(codon): Returns True if codon is a stop codon, False otherwise.
- sixframes(seq): Returns the translation of seq in each of the six reading frames.
- to_regex(seq): Returns a regex pattern with each amino acid expanded to its codon set.
- to_table(): Returns the amino acid to codon mapping as a cogent3 Table.
- translate(dna[, start, rc, incomplete_ok]): Translates DNA to protein.
Notes
We add additional states to the genetic code to represent gapped codons and missing data.
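The two NCBI strings passed to the constructor can be read as parallel 64-character arrays indexed by codon. The following is a minimal plain-Python sketch of that idea, not cogent3's implementation. It assumes codons are enumerated in the T, C, A, G order implied by the default moltype, uses the NCBI standard code (table 1) as sample data, and assumes a start-codon string in which "M" marks initiator codons. All variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"  # base order implied by the default moltype
# NCBI standard genetic code (table 1): one amino acid per codon, "*" = stop
CODE_SEQUENCE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Assumed start-codon map: "M" marks codons that can initiate translation
START_MAP = "---M" + "-" * 15 + "M" + "-" * 15 + "M" + "-" * 28

# Enumerate codons TTT, TTC, TTA, TTG, TCT, ... (last position varies fastest)
codons = ["".join(c) for c in product(BASES, repeat=3)]
codon_to_aa = dict(zip(codons, CODE_SEQUENCE))

# The reverse mapping: amino acid -> list of codons
aa_to_codons = {}
for codon, aa in codon_to_aa.items():
    aa_to_codons.setdefault(aa, []).append(codon)

stop_codons = {c for c, aa in codon_to_aa.items() if aa == "*"}
sense_codons = set(codons) - stop_codons
start_codons = {c for c, flag in zip(codons, START_MAP) if flag == "M"}

print(sorted(stop_codons))
print(sorted(start_codons))
print(aa_to_codons["M"])
```

Keeping both directions of the mapping in dictionaries makes lookups by codon and by amino acid equally cheap, which is the behaviour the class description suggests.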
- ID: int
- anticodons: tuple[str, ...] = None
- codons: KmerAlphabet = None
- get_alphabet(include_gap: bool = False, include_stop: bool = False) -> SenseCodonAlphabet
Returns a codon alphabet.
- Parameters:
- include_gap
if True, the alphabet includes the gap motif
- include_stop
if True, the alphabet includes the stop codons
- is_stop(codon)
Returns True if codon is a stop codon, False otherwise.
- name: str
- ncbi_code_sequence: dataclasses.InitVar[str]
- ncbi_start_codon_map: dataclasses.InitVar[str]
- property sense_codons: set[str]
- sixframes(seq: str) -> Iterable[tuple[str, int, str]]
Returns the translation of seq in each of the six reading frames.
- Returns:
- An iterable of (strand, start, translation) tuples, where strand is “+” or “-” and start is the frame offset.
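As an illustration of what a six-frame translation involves, here is a plain-Python sketch that follows the annotated return type, Iterable[tuple[str, int, str]]. It is not cogent3's code: the helper names are hypothetical, the standard genetic code is used as sample data, and it assumes the frame offset is applied after reverse complementing.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in T, C, A, G order
CODE_SEQUENCE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product(BASES, repeat=3)), CODE_SEQUENCE))
COMPLEMENT = str.maketrans("TCAG", "AGTC")


def translate(dna, start=0, rc=False):
    """Translate complete codons from offset `start`; trailing bases are dropped."""
    if rc:
        # Assumption: reverse complement first, then apply the frame offset
        dna = dna.translate(COMPLEMENT)[::-1]
    dna = dna[start:]
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


def sixframes(seq):
    """Yield (strand, start, translation) for three frames on each strand."""
    for strand, rc in (("+", False), ("-", True)):
        for start in range(3):
            yield strand, start, translate(seq, start=start, rc=rc)


frames = list(sixframes("ATGGCCTAA"))
for strand, start, protein in frames:
    print(strand, start, protein)
```

Yielding tuples rather than building a dictionary keeps the frames in a predictable order: the three "+" frames first, then the three "-" frames.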
- property start_codons: set[str]
- property stop_codons: set[str]
- to_regex(seq: str | Sequence) -> str
Returns a regex pattern with each amino acid expanded to its codon set.
- Parameters:
- seq
a Sequence or string of amino acids
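To make the idea of expanding an amino acid to its codon set concrete, the sketch below builds such a pattern in plain Python. The exact pattern syntax cogent3 emits is not stated here, so the one-group-per-residue alternation format is an assumption, and the function and variable names are hypothetical. The standard genetic code serves as sample data.

```python
import re
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in T, C, A, G order
CODE_SEQUENCE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Amino acid -> list of codons encoding it
AA_TO_CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), CODE_SEQUENCE):
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def to_regex(seq):
    """Expand each amino acid into an alternation over its codons (assumed format)."""
    return "".join("(" + "|".join(AA_TO_CODONS[aa]) + ")" for aa in str(seq))


pattern = to_regex("MW")
match = re.search(pattern, "CCATGTGGTT")
print(pattern)
print(match.group(0), match.start())
```

A pattern of this kind lets a protein motif be searched for directly in nucleotide sequence, without translating every frame first.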
- to_table()
Returns the amino acid to codon mapping as a cogent3 Table.
- translate(dna: str | ndarray, start: int = 0, rc: bool = False, incomplete_ok: bool = True) -> str
Translates DNA to protein.
- Parameters:
- dna
a string of nucleotides
- start
position to begin translation (used to implement frames)
- rc
if True, returns the translation of the reverse complement sequence
- incomplete_ok
if True, translates codons that are a mix of gaps and bases as a gap. If False, raises an AlphabetError on those incomplete cases.
- Returns:
- The amino acid sequence as a string.
Notes
Sequences are truncated to a multiple of 3. Codons containing one or more ambiguous nucleotides are translated as ‘X’. Codons containing a gap character are translated as ‘-’; if incomplete_ok is False, a codon that mixes gaps and bases raises an AlphabetError instead.
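The rules in these notes can be sketched in plain Python as follows. This is an illustration only, not cogent3's implementation: the standard genetic code is used as sample data, ValueError stands in for AlphabetError, and treating a codon that contains both a gap and an ambiguous base as a gap is an assumption.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in T, C, A, G order
CODE_SEQUENCE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product(BASES, repeat=3)), CODE_SEQUENCE))
COMPLEMENT = str.maketrans("TCAG", "AGTC")


def translate(dna, start=0, rc=False, incomplete_ok=True):
    """Translate `dna`, handling gaps and ambiguity as described in the notes."""
    if rc:
        dna = dna.translate(COMPLEMENT)[::-1]
    dna = dna[start:]
    usable = len(dna) - len(dna) % 3  # truncate to a multiple of 3
    protein = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        if codon in CODON_TO_AA:
            protein.append(CODON_TO_AA[codon])
        elif "-" in codon:
            if codon != "---" and not incomplete_ok:
                # Stand-in for the AlphabetError the real method raises
                raise ValueError(f"incomplete codon {codon!r}")
            protein.append("-")
        else:
            # Any other character is treated as an ambiguous nucleotide
            protein.append("X")
    return "".join(protein)


print(translate("ATGNNNTAA"))
print(translate("ATG---GCCT"))
print(translate("CATGGCC", start=1))
```

Passing incomplete_ok=False to this sketch with an input such as "ATGA--GCC" raises the error, since "A--" mixes a base with gaps.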