Applying a time-reversible codon model

We first display the full set of codon models available in cogent3.

from cogent3 import available_models

available_models("codon")
Specify a model using 'Abbreviation' (case sensitive).
Model Type  Abbreviation  Description
codon       CNFGTR        Conditional nucleotide frequency codon substitution model, GTR variant (with params analagous to the nucleotide GTR model). Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734
codon       CNFHKY        Conditional nucleotide frequency codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734
codon       MG94HKY       Muse and Gaut 1994 codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24
codon       MG94GTR       Muse and Gaut 1994 codon substitution model, GTR variant (with params analagous to the nucleotide GTR model) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24
codon       GY94          Goldman and Yang 1994 codon substitution model. N Goldman and Z Yang, 1994, Mol Biol Evol, 11(5):725-36.
codon       Y98           Yang's 1998 substitution model, a derivative of the GY94. Z Yang, 1998, Mol Biol Evol, 15(5):568-73
codon       H04G          Huttley 2004 CpG substitution model. Includes a term for substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon       H04GK         Huttley 2004 CpG substitution model. Includes a term for transition substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon       H04GGK        Huttley 2004 CpG substitution model. Includes a general term for substitutions to or from CpG's and an adjustment for CpG transitions. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon       GNC           General Nucleotide Codon, a non-reversible codon model. Kaehler, Yap, Huttley, 2017, Gen Biol Evol 9(1): 134–49

10 rows x 3 columns

Using the conditional nucleotide form codon model

The CNFGTR model (Yap et al., 2010) is the most robust of the time-reversible codon models available (Kaehler et al., 2017). By default, this model does not optimise the codon frequencies; it uses the average estimated from the alignment. Here we set optimise_motif_probs=True so that the root motif probabilities (the codon frequencies) are optimised instead.
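To make "the average estimated from the alignment" concrete, the following is a minimal pure-Python sketch of estimating codon frequencies from aligned sequences. It is an illustration only, not cogent3's implementation: the function name observed_codon_freqs, the toy alignment, the pooling of counts across sequences, and the use of the standard genetic code's stop codons are all assumptions made for this example.

```python
from collections import Counter

# Assumption: standard genetic code, so these three codons are excluded.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def observed_codon_freqs(seqs):
    """Return codon frequencies pooled over all in-frame codons in seqs.

    Hypothetical helper for illustration; codons containing gaps or
    ambiguity characters are skipped, as are stop codons.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i : i + 3]
            if set(codon) <= set("ACGT") and codon not in STOP_CODONS:
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete sense codons found")
    return {codon: n / total for codon, n in sorted(counts.items())}


# Toy data, not the BRCA1 alignment used in this document.
toy_alignment = ["ATGAAAGGT", "ATGAAGGGT", "ATGAAA---"]
freqs = observed_codon_freqs(toy_alignment)
print(freqs)
```

Frequencies obtained this way are fixed quantities. In the cogent3 example that follows, the codon frequencies are instead treated as free parameters and optimised along with the rest of the model.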

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
model = get_app(
    "model",
    "CNFGTR",
    tree="data/primate_brca1.tree",
    optimise_motif_probs=True,
)
result = model(aln)
result
CNFGTR
key       lnL         nfp  DLC   unique_Q
'CNFGTR'  -6739.3067  77   True  True
result.lf

CNFGTR

log-likelihood = -6739.3067

number of free parameters = 77

Global params
A/C     A/G     A/T     C/G     C/T     omega
1.0656  3.9391  0.7851  1.9475  4.2265  0.7569
Edge params
edge        parent  length
Galago      root    0.5330
HowlerMon   root    0.1365
Rhesus      edge.3  0.0659
Orangutan   edge.2  0.0233
Gorilla     edge.1  0.0075
Human       edge.0  0.0182
Chimpanzee  edge.0  0.0085
edge.0      edge.1  0.0000
edge.1      edge.2  0.0101
edge.2      edge.3  0.0352
edge.3      root    0.0228
Motif params
AAA     AAC     AAG     AAT     ACA     ACC     ACG     ACT     AGA     AGC
0.0540  0.0242  0.0307  0.0543  0.0237  0.0063  0.0021  0.0297  0.0238  0.0280

AGG     AGT     ATA     ATC     ATG     ATT     CAA     CAC     CAG     CAT
0.0122  0.0405  0.0226  0.0071  0.0141  0.0203  0.0228  0.0063  0.0220  0.0237

CCA     CCC     CCG     CCT     CGA     CGC     CGG     CGT     CTA     CTC
0.0165  0.0043  0.0021  0.0239  0.0022  0.0012  0.0035  0.0058  0.0123  0.0065

CTG     CTT     GAA     GAC     GAG     GAT     GCA     GCC     GCG     GCT
0.0098  0.0105  0.0703  0.0112  0.0263  0.0310  0.0154  0.0083  0.0036  0.0145

GGA     GGC     GGG     GGT     GTA     GTC     GTG     GTT     TAC     TAT
0.0151  0.0072  0.0051  0.0139  0.0170  0.0077  0.0094  0.0210  0.0036  0.0171

TCA     TCC     TCG     TCT     TGC     TGG     TGT     TTA     TTC     TTG
0.0220  0.0083  0.0039  0.0214  0.0038  0.0033  0.0201  0.0222  0.0051  0.0107

TTT
0.0146
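
The 77 free parameters reported for this fit can be accounted for from the tables above. The short tally below is a sketch of that arithmetic, not output from cogent3. It assumes the 61 motif probabilities are constrained to sum to one, so that they contribute 60 free parameters; the variable names are chosen here for illustration.

```python
# Counts read directly from the fitted model's tables.
n_rate_terms = 5    # A/C, A/G, A/T, C/G, C/T
n_omega = 1         # the single global omega
n_edges = 11        # one length per row of the edge params table
n_codons = 61       # entries in the motif params table

# Assumption: the motif probabilities sum to one, so one of them is
# determined by the others and is not a free parameter.
n_free_motif_probs = n_codons - 1

n_free_params = n_rate_terms + n_omega + n_edges + n_free_motif_probs
print(n_free_params)
```

Under that assumption the tally matches the reported number of free parameters, and the optimised codon frequencies account for most of them.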