Using genetic codes#

Note

These docs now use the new_type core objects via the following setting.

import os

# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"

Selecting codes in methods that support them#

When a cogent3 object method has a gc argument, you can pass the number from the “Code ID” column.

For example, the alignment below has a partial codon in "s1" (the two gap characters near the end of the sequence).

from cogent3 import make_aligned_seqs

data = {
    "s1": "GCTCATGCCAGCTCTTTACAGCATGAGAACA--AGT",
    "s2": "ACTCATGCCAACTCATTACAGCATGAGAACAGCAGT",
    "s3": "ACTCATGCCAGCTCATTACAGCATGAGAACAGCAGT",
    "s4": "ACTCATGCCAGCTCATTACAGCATGAGAACAGCAGT",
    "s5": "ACTCATGCCAGCTCAGTACAGCATGAGAACAGCAGT",
}

nt_seqs = make_aligned_seqs(data=data, moltype="dna")
nt_seqs
    0
s2  ACTCATGCCAACTCATTACAGCATGAGAACAGCAGT
s1  G.........G...T................--...
s3  ..........G.........................
s4  ..........G.........................
s5  ..........G....G....................

5 x 36 dna alignment

We specify the genetic code and allow incomplete codons. In this case, any codon that contains a gap is converted to ? in the translation.

nt_seqs.get_translation(gc=1, incomplete_ok=True)
    0
s2  THANSLQHENSS
s1  A..S......-.
s3  ...S........
s4  ...S........
s5  ...S.V......

5 x 12 protein alignment
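To see which codon is affected, the following minimal sketch splits the "s1" sequence into triplets and flags those containing a gap. It uses plain Python only; `split_codons` is a hypothetical helper written for illustration and is not part of cogent3.

```python
def split_codons(seq):
    """Split a sequence into consecutive, non-overlapping triplets."""
    return [seq[i : i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


s1 = "GCTCATGCCAGCTCTTTACAGCATGAGAACA--AGT"
codons = split_codons(s1)

# indices of codons that contain at least one gap character
incomplete = [i for i, codon in enumerate(codons) if "-" in codon]
print(incomplete, codons[incomplete[0]])
```

The single flagged index (counting from zero) corresponds to the eleventh column of the translated alignment, which is the only column where "s1" does not show a regular amino acid.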

Translate DNA sequences#

From a string

from cogent3 import get_code

standard_code = get_code(1)
standard_code.translate("TTTGCAAAC")
'FAN'

This can also be applied to a numpy array.

import numpy
from cogent3 import get_code

standard_code = get_code(1)

standard_code.translate(numpy.array([0, 0, 0, 3, 1, 2, 2, 2, 1], dtype=numpy.uint8))
'FAN'
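The integers in the array above are consistent with each nucleotide being encoded by its position in the order T, C, A, G. That ordering is inferred from this one example rather than stated on this page, so treat the following pure-Python sketch as an assumption; `encode` and `decode` are hypothetical helpers, not cogent3 functions.

```python
# Assumed index order, inferred from the example above: T=0, C=1, A=2, G=3
ORDER = "TCAG"


def encode(seq):
    """Convert a nucleotide string into a list of integer indices."""
    return [ORDER.index(base) for base in seq]


def decode(indices):
    """Convert a list of integer indices back into a nucleotide string."""
    return "".join(ORDER[i] for i in indices)


indices = encode("TTTGCAAAC")
print(indices)
print(decode(indices))
```

Under that assumption, encoding "TTTGCAAAC" reproduces the integers passed to translate() above.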

Conversion to a ProteinSequence from a DnaSequence is shown in Translate a sequence to protein.

Translate all six frames#

from cogent3 import get_code, make_seq

standard_code = get_code(1)
seq = make_seq("ATGCTAACATAAA", moltype="dna")
translations = standard_code.sixframes(seq)
print(translations)
<generator object GeneticCode._ at 0x7f610ea0b670>
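Printing the result only shows the generator object; it has to be iterated over (for example with `list(translations)`) to see the translations. To illustrate what the six frames are, here is a minimal pure-Python sketch that does not use cogent3. The helper names and the (strand, offset) keys are assumptions made for illustration, and the order and form of what sixframes() yields may differ. The amino acid string is the NCBI standard code with codons in T, C, A, G order.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i : i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations keyed by (strand, offset), a hypothetical layout."""
    reverse = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for strand, template in (("+", seq), ("-", reverse)):
        for offset in range(3):
            frames[(strand, offset)] = translate(template[offset:])
    return frames


for key, protein in six_frames("ATGCTAACATAAA").items():
    print(key, protein)
```

Stop codons appear as `*` in this sketch, and each frame simply drops any bases left over after the last complete codon.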

Translate a codon#

from cogent3 import get_code

standard_code = get_code(1)
standard_code["TTT"]
'F'

Or get the codons for a single amino acid:

standard_code["A"]
{'GCA', 'GCC', 'GCG', 'GCT'}

Look up the amino acid corresponding to a single codon#

from cogent3 import get_code

standard_code = get_code(1)
standard_code["TTT"]
'F'

Get all the codons for one amino acid#

from cogent3 import get_code

standard_code = get_code(1)
standard_code["A"]
{'GCA', 'GCC', 'GCG', 'GCT'}

Get all the codons for a group of amino acids#

from cogent3 import get_code

standard_code = get_code(1)
targets = ["A", "C"]
codons = [standard_code[aa] for aa in targets]
codons
[{'GCA', 'GCC', 'GCG', 'GCT'}, {'TGC', 'TGT'}]
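If a single collection is more convenient than one set per amino acid, the sets can be merged with a union. The sketch below hard-codes the two sets shown in the output above so that it runs without cogent3.

```python
# the per-amino-acid codon sets from the output above
codons = [{"GCA", "GCC", "GCG", "GCT"}, {"TGC", "TGT"}]

# merge into one set, then sort for a stable display order
all_codons = sorted(set().union(*codons))
print(all_codons)
```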

Getting the alphabet for the genetic code#

The default for the get_alphabet() method is to return an alphabet representing just the sense codons (a SenseCodonAlphabet instance).

from cogent3 import get_code

gc = get_code(1)
alphabet = gc.get_alphabet()
len(alphabet)
61
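The length of 61 follows from the 64 possible triplets minus the three stop codons of the standard code (TAA, TAG and TGA). A quick pure-Python check, independent of cogent3:

```python
from itertools import product

# stop codons of the standard genetic code
STOPS = {"TAA", "TAG", "TGA"}

all_codons = ["".join(codon) for codon in product("TCAG", repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOPS]
print(len(all_codons), len(sense_codons))
```

Other genetic codes assign stop codons differently, so the number of sense codons is not 61 in every case.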

Setting include_stop=True returns all codons.

from cogent3 import get_code

gc = get_code(1)
alphabet = gc.get_alphabet(include_stop=True)
type(alphabet)
cogent3.core.new_alphabet.SenseCodonAlphabet

You can also include “gap state” (i.e. "---") or “missing state” ("???") codons with the arguments include_gap and include_missing respectively.