Using genetic codes#
Note
These docs now use the new_type core objects via the following setting.
import os
# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"
Selecting codes in methods that support them#
When a cogent3 object method has a gc argument, you can pass the number listed under the “Code ID” column.
For example, the alignment below has a partial codon in "s1".
from cogent3 import make_aligned_seqs
data = {
"s1": "GCTCATGCCAGCTCTTTACAGCATGAGAACA--AGT",
"s2": "ACTCATGCCAACTCATTACAGCATGAGAACAGCAGT",
"s3": "ACTCATGCCAGCTCATTACAGCATGAGAACAGCAGT",
"s4": "ACTCATGCCAGCTCATTACAGCATGAGAACAGCAGT",
"s5": "ACTCATGCCAGCTCAGTACAGCATGAGAACAGCAGT",
}
nt_seqs = make_aligned_seqs(data=data, moltype="dna")
nt_seqs
0 | |
s2 | ACTCATGCCAACTCATTACAGCATGAGAACAGCAGT |
s1 | G.........G...T................--... |
s3 | ..........G......................... |
s4 | ..........G......................... |
s5 | ..........G....G.................... |
5 x 36 dna alignment
We specify the genetic code and allow incomplete codons. In this case, a codon that contains a gap cannot be translated to an amino acid and is represented by a single gap character (-) in the translation, as the output for "s1" shows.
nt_seqs.get_translation(gc=1, incomplete_ok=True)
0 | |
s2 | THANSLQHENSS |
s1 | A..S......-. |
s3 | ...S........ |
s4 | ...S........ |
s5 | ...S.V...... |
5 x 12 protein alignment
Translate DNA sequences#
From a string
from cogent3 import get_code
standard_code = get_code(1)
standard_code.translate("TTTGCAAAC")
'FAN'
This can also be applied to a numpy array.
import numpy
from cogent3 import get_code
standard_code = get_code(1)
standard_code.translate(numpy.array([0, 0, 0, 3, 1, 2, 2, 2, 1], dtype=numpy.uint8))
'FAN'
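To make the array form less opaque, here is a minimal pure-Python sketch of what translation does conceptually. It is not cogent3's implementation: the names BASES, AMINO_ACIDS, CODON_TABLE, indices_to_dna and translate are hypothetical, and the index order T=0, C=1, A=2, G=3 is an assumption inferred from the example above, where the array yields the same result as "TTTGCAAAC".

```python
# Standard genetic code (NCBI table 1), amino acids listed for codons
# enumerated with the bases in T, C, A, G order. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its amino acid using the position of each base.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def indices_to_dna(indices):
    """Convert integer base indices to a DNA string (assumed order: T, C, A, G)."""
    return "".join(BASES[i] for i in indices)


def translate(dna):
    """Translate complete codons in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i : i + 3]] for i in range(0, usable, 3))


dna = indices_to_dna([0, 0, 0, 3, 1, 2, 2, 2, 1])
print(dna)             # the string equivalent of the index array
print(translate(dna))  # the translated peptide
```

Under these assumptions the index array decodes to the same string used in the first example, which is why both calls give the same translation.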
Conversion to a ProteinSequence from a DnaSequence is shown in Translate a sequence to protein.
Translate all six frames#
from cogent3 import get_code, make_seq
standard_code = get_code(1)
seq = make_seq("ATGCTAACATAAA", moltype="dna")
translations = standard_code.sixframes(seq)
print(translations)
<generator object GeneticCode._ at 0x7f610ea0b670>
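The printed value is a generator, so the translations are only produced when it is iterated, for example by wrapping it in list(). The following pure-Python sketch illustrates the idea of six-frame translation on the same sequence. It is not cogent3's implementation: the function names, the (strand, offset, translation) tuple layout and the frame ordering are assumptions made for illustration.

```python
# Standard genetic code (NCBI table 1), bases enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i : i + 3]] for i in range(0, usable, 3))


def six_frames(dna):
    """Lazily yield (strand, offset, translation) for all six reading frames."""
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", reverse_complement)):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


# Like the generator above, nothing is computed until it is consumed.
frames = list(six_frames("ATGCTAACATAAA"))
for strand, offset, peptide in frames:
    print(strand, offset, peptide)
```

Returning a generator keeps the work lazy, which matters little for a 13-base sequence but avoids holding six full translations in memory for long sequences.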
Translate a codon#
from cogent3 import get_code
standard_code = get_code(1)
standard_code["TTT"]
'F'
Or get the codons for a single amino acid.
standard_code["A"]
{'GCA', 'GCC', 'GCG', 'GCT'}
Look up the amino acid corresponding to a single codon#
from cogent3 import get_code
standard_code = get_code(1)
standard_code["TTT"]
'F'
Get all the codons for one amino acid#
from cogent3 import get_code
standard_code = get_code(1)
standard_code["A"]
{'GCA', 'GCC', 'GCG', 'GCT'}
Get all the codons for a group of amino acids#
targets = ["A", "C"]
codons = [standard_code[aa] for aa in targets]
codons
[{'GCA', 'GCC', 'GCG', 'GCT'}, {'TGC', 'TGT'}]
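The result above is a list with one set per amino acid. If a single pool of codons is more convenient, the sets can be merged with a union. The sketch below shows this in pure Python with a hand-built reverse lookup; CODONS_FOR and the other names are hypothetical and stand in for the genetic code object, so this is an illustration rather than cogent3's own API.

```python
from collections import defaultdict

# Standard genetic code (NCBI table 1), bases enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Invert the table: amino acid -> set of codons that encode it.
CODONS_FOR = defaultdict(set)
for codon, amino_acid in CODON_TABLE.items():
    CODONS_FOR[amino_acid].add(codon)

targets = ["A", "C"]
per_amino_acid = [CODONS_FOR[aa] for aa in targets]

# Merge the per-amino-acid sets into one set of codons.
combined = set().union(*per_amino_acid)
print(sorted(combined))
```

The same set().union(*codons) call should work on the list of sets produced in the example above, since it only relies on standard Python set behaviour.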
Getting the alphabet for the genetic code#
The default for the get_alphabet() method is to return an alphabet representing just the sense codons (a SenseCodonAlphabet instance).
from cogent3 import get_code
gc = get_code(1)
alphabet = gc.get_alphabet()
len(alphabet)
61
Setting include_stop=True returns all codons.
from cogent3 import get_code
gc = get_code(1)
alphabet = gc.get_alphabet(include_stop=True)
type(alphabet)
cogent3.core.new_alphabet.SenseCodonAlphabet
You can also include the “gap state” (i.e. "---") or “missing state” ("???") codons with the arguments include_gap and include_missing respectively.
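As a rough check on the alphabet sizes, the sketch below counts codons for the standard code in pure Python. It does not call cogent3; the variable names are hypothetical, and where the gap and missing states are appended here says nothing about where cogent3 places them in its alphabet.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_to_aa = dict(zip(all_codons, AMINO_ACIDS))

# Sense codons encode an amino acid; stop codons are marked with "*".
sense_codons = [c for c in all_codons if codon_to_aa[c] != "*"]
stop_codons = [c for c in all_codons if codon_to_aa[c] == "*"]

# Illustrative extra states, appended at the end purely for counting.
with_gap = sense_codons + ["---"]
with_gap_and_missing = with_gap + ["???"]

print(len(sense_codons), len(all_codons), sorted(stop_codons))
print(len(with_gap), len(with_gap_and_missing))
```

The 61 sense codons match the length reported for the default alphabet above; including the three stop codons gives the full set of 64.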