GeneticCode

class GeneticCode(code_sequence, ID=None, name=None, start_codon_sequence=None)

Holds codon to amino acid mapping, and vice versa.

Use the get_code() function to get one of the included code instances. These are created as follows.

>>> code_sequence = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
>>> sgc = GeneticCode(code_sequence)
>>> sgc['UUU'] == 'F'
>>> sgc['TTT'] == 'F'
>>> sgc['F'] == ['TTT', 'TTC']          # in arbitrary order
>>> sgc['*'] == ['TAA', 'TAG', 'TGA']   # in arbitrary order

code_sequence : 64-character string containing the NCBI genetic code translation
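As a rough illustration of how a 64-character string can encode the whole table, the sketch below pairs each character with a codon enumerated in the T, C, A, G base order used by the NCBI tables. It is plain Python and does not use the class itself; `codon_to_aa` and `aa_to_codons` are illustrative names, not part of the library.

```python
from itertools import product

# NCBI genetic code tables list codons with each base cycling through T, C, A, G.
BASES = "TCAG"
code_sequence = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# TTT, TTC, TTA, TTG, TCT, ... GGG: 64 codons in table order.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Forward mapping: codon -> one-letter amino acid ('*' marks a stop).
codon_to_aa = dict(zip(codons, code_sequence))

# Reverse mapping: amino acid -> list of codons that encode it.
aa_to_codons = {}
for codon, aa in codon_to_aa.items():
    aa_to_codons.setdefault(aa, []).append(codon)

print(codon_to_aa["TTT"])         # F
print(sorted(aa_to_codons["*"]))  # ['TAA', 'TAG', 'TGA']
```

The position of a character in the string therefore fully determines its codon, which is why a bare 64-character string is enough to define a code.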

GeneticCode is immutable once created.

Attributes:
blocks

Returns list of lists of codon blocks in the genetic code.

Methods

changes(other)

Returns dict of {codon:'XY'} for codons that differ.

get_stop_indices(dna[, start])

Returns the indexes of stop codons in the specified frame.

is_start(codon)

Returns True if codon is a start codon, False otherwise.

is_stop(codon)

Returns True if codon is a stop codon, False otherwise.

sixframes(dna)

Returns six-frame translation as dict containing {frame:translation}

to_regex(seq)

Returns a regex pattern with each amino acid expanded to its codon set.

to_table()

Returns the amino acid to codon mapping as a cogent3 Table.

translate(dna[, start])

Translates DNA to protein with current GeneticCode.

get_alphabet([include_stop])

property blocks

Returns list of lists of codon blocks in the genetic code.

A codon block can be:
  • a quartet, if all 4 XYn codons have the same amino acid.

  • a doublet, if XYt and XYc or XYa and XYg have the same aa.

  • a singlet, otherwise.

Returns a list of the quartets, doublets, and singlets in the order UUU -> GGG.

Note that a doublet cannot span the purine/pyrimidine boundary, and a quartet cannot span the boundary between two codon blocks whose first two bases differ.
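The quartet/doublet/singlet rules above can be sketched in plain Python. This is an illustration of the rules as stated here, not the library's implementation: `codon_blocks` is a hypothetical helper, and the sketch uses DNA letters (TTT to GGG) where the text writes UUU to GGG.

```python
from itertools import product

BASES = "TCAG"  # table order: T, C (pyrimidines) then A, G (purines)
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_blocks(code_sequence):
    """Split the 64 codons into quartets, doublets and singlets."""
    codons = ["".join(t) for t in product(BASES, repeat=3)]
    blocks = []
    # Each run of 4 codons shares its first two bases (XYn).
    for i in range(0, 64, 4):
        group, aas = codons[i:i + 4], code_sequence[i:i + 4]
        if len(set(aas)) == 1:
            blocks.append(group)  # quartet
            continue
        # Otherwise look at XYT/XYC and XYA/XYG separately, so a doublet
        # never spans the purine/pyrimidine boundary.
        for pair, pair_aas in ((group[:2], aas[:2]), (group[2:], aas[2:])):
            if pair_aas[0] == pair_aas[1]:
                blocks.append(pair)  # doublet
            else:
                blocks.extend([codon] for codon in pair)  # singlets
    return blocks


blocks = codon_blocks(CODE)
print(blocks[:3])
# [['TTT', 'TTC'], ['TTA', 'TTG'], ['TCT', 'TCC', 'TCA', 'TCG']]
```

In the standard code this leaves ATA, ATG, TGA and TGG as singlets, since each differs from the other purine-ending codon in its XYn group.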

changes(other)

Returns dict of {codon:'XY'} for codons that differ.

X is the string representation of the amino acid in self, Y is the string representation of the amino acid in other. Always returns a 2-character string.
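A minimal sketch of the comparison, assuming both codes are given as 64-character strings in NCBI order. `code_changes` is a hypothetical stand-in for the method, and the second string is NCBI translation table 2 (vertebrate mitochondrial).

```python
from itertools import product

STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def code_changes(self_code, other_code):
    """Map each differing codon to 'XY': X from self_code, Y from other_code."""
    codons = ("".join(t) for t in product("TCAG", repeat=3))
    return {
        codon: x + y
        for codon, x, y in zip(codons, self_code, other_code)
        if x != y
    }


print(code_changes(STANDARD, VERT_MITO))
# {'TGA': '*W', 'ATA': 'IM', 'AGA': 'R*', 'AGG': 'R*'}
```

Because both strings use one character per codon, every value is exactly two characters long, matching the description above.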

get_alphabet(include_stop=False)

get_stop_indices(dna, start=0)

Returns the indexes of stop codons in the specified frame.
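One way to picture the frame scan is the sketch below. It assumes the standard code's stop codons and assumes the result holds nucleotide offsets of each in-frame stop; this page does not say whether the library reports nucleotide or codon positions. `stop_indices` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stops of the standard code (assumption)


def stop_indices(dna, start=0):
    """Return offsets of stop codons read in steps of 3 from `start`."""
    dna = dna.upper().replace("U", "T")  # accept RNA input too
    return [
        i
        for i in range(start, len(dna) - 2, 3)
        if dna[i:i + 3] in STOP_CODONS
    ]


print(stop_indices("ATGTAAGGGTGA"))           # [3, 9]
print(stop_indices("ATGTAAGGGTGA", start=1))  # []
```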

is_start(codon)

Returns True if codon is a start codon, False otherwise.

is_stop(codon)

Returns True if codon is a stop codon, False otherwise.

sixframes(dna)

Returns six-frame translation as dict containing {frame:translation}
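A six-frame translation reads the sequence at offsets 0, 1 and 2 on the forward strand and again on its reverse complement. The sketch below shows the idea with the standard code. The frame keys (1, 2, 3, -1, -2, -3) and the names `six_frames` and `translate` are assumptions of this example, since the keys the library uses are not stated here.

```python
from itertools import product

CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(t) for t in product("TCAG", repeat=3)), CODE))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, start=0):
    """Translate complete codons from `start`; a trailing partial codon is skipped."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


def six_frames(dna):
    """Return {frame: translation} for three forward and three reverse frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna, offset)         # forward strand
        frames[-(offset + 1)] = translate(reverse, offset)  # reverse strand
    return frames


print(six_frames("ATGAAATAG"))
# {1: 'MK*', -1: 'LFH', 2: '*N', -2: 'YF', 3: 'EI', -3: 'IS'}
```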

to_regex(seq)

Returns a regex pattern with each amino acid expanded to its codon set.

Parameters:
seq

a Sequence or string of amino acids
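To illustrate the idea, the sketch below expands each amino acid into an alternation of its codons under the standard code. The exact pattern syntax the method produces is not shown on this page, so the non-capturing groups here are an assumption, and `protein_to_regex` is a hypothetical name.

```python
import re
from itertools import product

CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Amino acid -> codons, with codons enumerated in NCBI T, C, A, G order.
aa_to_codons = {}
for triplet, aa in zip(product("TCAG", repeat=3), CODE):
    aa_to_codons.setdefault(aa, []).append("".join(triplet))


def protein_to_regex(protein):
    """Build a pattern matching any DNA that encodes `protein`."""
    return "".join("(?:" + "|".join(aa_to_codons[aa]) + ")" for aa in protein)


pattern = protein_to_regex("MF")
print(pattern)                                # (?:ATG)(?:TTT|TTC)
print(bool(re.fullmatch(pattern, "ATGTTC")))  # True
print(bool(re.fullmatch(pattern, "ATGTTA")))  # False
```

A pattern like this can be used with `re.search` to locate every stretch of DNA that could encode a given peptide.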

to_table()

Returns the amino acid to codon mapping as a cogent3 Table.

translate(dna, start=0)

Translates DNA to protein with current GeneticCode.

Parameters:
dna: str

a string of nucleotides

start: int

position to begin translation (used to implement frames)

Returns:
String containing amino acid sequence. Translates the entire sequence.
It is the caller’s responsibility to find open reading frames.
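The behaviour described above, translating every complete codon from `start` without stopping at stop codons, can be sketched as follows. This is an illustration using the standard code, not the library's code: `translate_dna` is a hypothetical name, and mapping unrecognised codons to `X` is an assumption of this sketch.

```python
from itertools import product

CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(t) for t in product("TCAG", repeat=3)), CODE))


def translate_dna(dna, start=0):
    """Translate all complete codons from `start`, including past stops."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        # Assumption: codons with ambiguous or unknown bases become 'X'.
        protein.append(CODON_TO_AA.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate_dna("ATGTTTTAAGGG"))     # MF*G
print(translate_dna("ATGTTTTAA", 1))     # CF
```

Note that the first call keeps translating after the stop codon, so locating the open reading frame (for example by splitting on `*`) is left to the caller.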