GeneticCode

class GeneticCode(ID: int, name: str, ncbi_code_sequence: dataclasses.InitVar[str], ncbi_start_codon_map: dataclasses.InitVar[str], moltype: MolType = MolType(('T', 'C', 'A', 'G')))

Holds codon to amino acid mapping, and vice versa.

Attributes:
anticodons
codons
sense_codons
start_codons
stop_codons

Methods

get_alphabet([include_gap, include_stop])

Returns a codon alphabet.

is_stop(codon)

Returns True if codon is a stop codon, False otherwise.

sixframes(seq)

Returns the translations of seq in all six reading frames.

to_regex(seq)

Returns a regex pattern with each amino acid expanded to its codon set.

to_table()

Returns the amino acid to codon mapping as a cogent3 Table.

translate(dna[, start, rc, incomplete_ok])

Translates DNA to protein.

Notes

We add additional states to the genetic code to represent gapped codons and missing data.
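
As a rough illustration of the kind of mapping this class holds, the sketch below builds a codon to amino acid dictionary and its inverse in plain Python. It does not use cogent3. The names BASES, NCBI_STANDARD, codon_to_aa and aa_to_codons are hypothetical, the amino acid string is the NCBI standard code (table 1) with bases ordered T, C, A, G, and representing the gapped codon as "---" is an assumption made only for this example.

```python
from itertools import product

# Base order used by NCBI genetic code strings (matches the default moltype order).
BASES = "TCAG"
# NCBI standard genetic code (table 1), one amino acid per codon, "*" marks a stop.
NCBI_STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last base fastest: TTT, TTC, TTA, TTG, TCT, ...
codons = ["".join(c) for c in product(BASES, repeat=3)]
codon_to_aa = dict(zip(codons, NCBI_STANDARD))

# Assumption: an extra state for a fully gapped codon.
codon_to_aa["---"] = "-"

# Invert the mapping: amino acid -> list of codons.
aa_to_codons = {}
for codon, aa in codon_to_aa.items():
    aa_to_codons.setdefault(aa, []).append(codon)

stop_codons = set(aa_to_codons["*"])
sense_codons = set(codons) - stop_codons

print(codon_to_aa["ATG"], sorted(stop_codons), len(sense_codons))
# M ['TAA', 'TAG', 'TGA'] 61
```

The real class derives its mapping from ncbi_code_sequence at construction. This sketch only shows why a 64-character string is enough to define a code once the base order is fixed.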

ID: int
anticodons: tuple[str, ...] = None
codons: KmerAlphabet = None
get_alphabet(include_gap: bool = False, include_stop: bool = False) → SenseCodonAlphabet

Returns a codon alphabet.

Parameters:
include_gap

alphabet includes the gap motif

include_stop

alphabet includes the stop codons
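
To make the effect of the two flags concrete, here is a minimal standalone sketch. It does not call cogent3 and returns a plain tuple rather than a SenseCodonAlphabet. The function name codon_alphabet, the use of the NCBI standard code, and "---" as the gap motif are all assumptions for illustration.

```python
from itertools import product

# NCBI standard genetic code (table 1) in T, C, A, G base order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), AAS))


def codon_alphabet(include_gap=False, include_stop=False):
    """Hypothetical stand-in: return the codon motifs as a tuple."""
    # Keep stop codons only when requested.
    motifs = [c for c, aa in CODON_TO_AA.items() if include_stop or aa != "*"]
    if include_gap:
        # Assumption: the gap motif is a fully gapped codon.
        motifs.append("---")
    return tuple(motifs)


print(len(codon_alphabet()))                   # 61 sense codons
print(len(codon_alphabet(include_stop=True)))  # 64 codons
print(len(codon_alphabet(include_gap=True)))   # 62 motifs
```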

is_stop(codon)

Returns True if codon is a stop codon, False otherwise.

moltype: MolType = MolType(('T', 'C', 'A', 'G'))
name: str
ncbi_code_sequence: dataclasses.InitVar[str]
ncbi_start_codon_map: dataclasses.InitVar[str]
property sense_codons: set[str]
sixframes(seq: str) → Iterable[tuple[str, int, str]]

Returns the translations of seq in all six reading frames.

Returns:
An iterable of (strand, start, translation) tuples, where strand is “+” or “-”.
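
The six frames are the three start offsets on the given strand plus the three on its reverse complement. The sketch below shows that enumeration in plain Python without cogent3. The helpers translate_frame and six_frames are hypothetical, the NCBI standard code is assumed, and treating start on the "-" strand as an offset into the reverse complement is an assumption about the convention.

```python
from itertools import product

# NCBI standard genetic code (table 1) in T, C, A, G base order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), AAS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(seq, start):
    """Translate seq from offset start, dropping any trailing partial codon."""
    seq = seq[start:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TO_AA.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frames(seq):
    """Yield (strand, start, translation) for all six reading frames."""
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand, template in (("+", seq), ("-", reverse_complement)):
        for start in range(3):
            yield strand, start, translate_frame(template, start)


frames = list(six_frames("ATGAAATAG"))
for frame in frames:
    print(frame)
# first frame: ('+', 0, 'MK*')
```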
property start_codons: set[str]
property stop_codons: set[str]
to_regex(seq: str | Sequence) → str

Returns a regex pattern with each amino acid expanded to its codon set.

Parameters:
seq

a Sequence or string of amino acids
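
The idea of expanding each amino acid to its codon set can be sketched as follows. This is a standalone illustration, not the library's implementation: aa_to_regex is a hypothetical helper, the NCBI standard code is assumed, and the use of non-capturing groups is a choice made here, so the exact pattern text produced by to_regex may differ.

```python
import re
from itertools import product

# NCBI standard genetic code (table 1) in T, C, A, G base order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Amino acid -> list of codons that encode it.
AA_TO_CODONS = {}
for codon, aa in zip(("".join(c) for c in product("TCAG", repeat=3)), AAS):
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def aa_to_regex(aa_seq):
    """Build a DNA regex in which each amino acid becomes an alternation of codons."""
    # An unknown amino acid raises KeyError rather than silently matching nothing.
    return "".join("(?:" + "|".join(AA_TO_CODONS[aa]) + ")" for aa in aa_seq)


pattern = aa_to_regex("MK")
print(pattern)  # (?:ATG)(?:AAA|AAG)

# Find a DNA segment that could encode the peptide "MK".
match = re.search(pattern, "CCATGAAGTT")
print(match.group(), match.start())  # ATGAAG 2
```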

to_table()

Returns the amino acid to codon mapping as a cogent3 Table.

translate(dna: str | ndarray, start: int = 0, rc: bool = False, incomplete_ok: bool = True) → str

Translates DNA to protein.

Parameters:
dna

a string of nucleotides

start

position to begin translation (used to implement frames)

rc

if True, returns the translation of the reverse complement sequence

incomplete_ok

if True, translates codons that are a mix of gaps and bases as a gap. If False, raises an AlphabetError on those incomplete cases.

Returns:
The amino acid sequence as a string.

Notes

Sequences are truncated to a multiple of 3. Codons containing ambiguous nucleotides, including a mix of different ambiguous nucleotides, are translated as ‘X’. Codons containing a gap character are translated as ‘-’; if incomplete_ok is False, a codon that mixes gaps and bases raises an AlphabetError instead.
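
The rules described above can be sketched in plain Python as follows. This is an illustrative approximation and not cogent3's code: the function below is a hypothetical stand-in, it assumes the NCBI standard code, it accepts only strings, and it raises ValueError where the library is documented to raise AlphabetError.

```python
from itertools import product

# NCBI standard genetic code (table 1) in T, C, A, G base order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), AAS))
# Simplification: only canonical bases are complemented, other characters pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, start=0, rc=False, incomplete_ok=True):
    """Hypothetical sketch of the documented translation rules."""
    if rc:
        dna = dna.translate(COMPLEMENT)[::-1]
    dna = dna[start:]
    usable = len(dna) - len(dna) % 3  # truncate to a multiple of 3
    protein = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        if codon in CODON_TO_AA:
            protein.append(CODON_TO_AA[codon])
        elif "-" in codon:
            # A codon mixing gaps and bases is "incomplete".
            if codon != "---" and not incomplete_ok:
                # Stand-in for the AlphabetError named in the documentation.
                raise ValueError(f"incomplete codon {codon!r}")
            protein.append("-")
        else:
            # Anything else is treated as ambiguous.
            protein.append("X")
    return "".join(protein)


print(translate("ATGAAATAGC"))  # MK*  (trailing C dropped)
print(translate("ATGNNNAAA"))   # MXK
print(translate("ATGA--AAA"))   # M-K
```

Passing incomplete_ok=False to the last call would raise in this sketch, because "A--" mixes a base with gaps.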