Applying GNC, a non-stationary codon model

See Kaehler et al. for the formal description of this model. Hypothesis testing using this model is demonstrated elsewhere.

We apply the model to a sample alignment.

from cogent3.app import io, evo

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

The model is specified using its abbreviation.

model = evo.model("GNC", tree="data/primate_brca1.tree")
result = model(aln)
result
GNC
key      lnL           nfp    DLC     unique_Q
'GNC'    -6707.1856    83     True    True
result.lf

GNC

log-likelihood = -6707.1856

number of free parameters = 83

Global params
A>C       A>G       A>T       C>A       C>G       C>T       G>A       G>C       G>T       T>A
0.8616    3.5380    0.9782    1.6704    2.2015    6.2612    7.8928    1.2211    0.7980    1.2834
continuation
T>C       omega
3.0608    0.8201
Edge params
edge          parent    length
Galago        root      0.5233
HowlerMon     root      0.1331
Rhesus        edge.3    0.0639
Orangutan     edge.2    0.0234
Gorilla       edge.1    0.0075
Human         edge.0    0.0182
Chimpanzee    edge.0    0.0085
edge.0        edge.1    0.0000
edge.1        edge.2    0.0100
edge.2        edge.3    0.0368
edge.3        root      0.0246
Motif params
AAA       AAC       AAG       AAT       ACA       ACC       ACG       ACT       AGA       AGC
0.0557    0.0228    0.0352    0.0548    0.0234    0.0032    0.0000    0.0320    0.0224    0.0285
continuation
AGG       AGT       ATA       ATC       ATG       ATT       CAA       CAC       CAG       CAT
0.0146    0.0379    0.0184    0.0074    0.0120    0.0181    0.0194    0.0053    0.0254    0.0236
continuation
CCA       CCC       CCG       CCT       CGA       CGC       CGG       CGT       CTA       CTC
0.0213    0.0065    0.0000    0.0280    0.0000    0.0011    0.0011    0.0021    0.0154    0.0073
continuation
CTG       CTT       GAA       GAC       GAG       GAT       GCA       GCC       GCG       GCT
0.0135    0.0107    0.0772    0.0088    0.0298    0.0318    0.0169    0.0107    0.0010    0.0130
continuation
GGA       GGC       GGG       GGT       GTA       GTC       GTG       GTT       TAC       TAT
0.0147    0.0099    0.0079    0.0112    0.0148    0.0064    0.0073    0.0207    0.0021    0.0086
continuation
TCA       TCC       TCG       TCT       TGC       TGG       TGT       TTA       TTC       TTG
0.0224    0.0074    0.0000    0.0275    0.0011    0.0043    0.0212    0.0198    0.0085    0.0102
continuation
TTT
0.0181
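
The reported 83 free parameters can be accounted for from the three parameter tables. The following is a minimal sketch in plain Python, not a cogent3 call. It assumes that the motif probabilities are counted as free parameters subject to a sum-to-one constraint, which is consistent with the reported total. The variable names are illustrative only.

```python
from itertools import product

# Stop codons of the standard genetic code; none of them appear in the
# "Motif params" table.
STOP_CODONS = {"TAA", "TAG", "TGA"}
sense_codons = [
    "".join(codon)
    for codon in product("ACGT", repeat=3)
    if "".join(codon) not in STOP_CODONS
]

# Nucleotide rate terms as listed under "Global params".
rate_terms = [
    "A>C", "A>G", "A>T", "C>A", "C>G", "C>T",
    "G>A", "G>C", "G>T", "T>A", "T>C",
]

n_global = len(rate_terms) + 1   # the rate terms plus omega
n_edges = 11                     # one length per row of "Edge params"
n_motif = len(sense_codons) - 1  # assumption: frequencies sum to one

total = n_global + n_edges + n_motif
print(n_global, n_edges, n_motif, total)
```

Of the 12 possible nucleotide substitutions, only 11 are listed as free rate terms; T>G does not appear in the table.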

We can obtain the tree with branch lengths as ENS, the expected number of substitutions.

If this tree is written to Newick format (using the write() method), the branch lengths will be ENS.

tree = result.tree
fig = tree.get_figure()
fig.scale_bar = "top right"
fig.show(width=500, height=500)
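
To illustrate what a Newick string carrying these ENS lengths looks like, the sketch below rebuilds the topology from the "Edge params" table in plain Python. It is not cogent3's own writer, it omits internal node names, and the order of children simply follows the table, so the string produced by write() may be laid out differently. The names edges and to_newick are hypothetical helpers introduced for this example.

```python
# Edge name -> (parent, ENS length), copied from the "Edge params" table.
edges = {
    "Galago": ("root", 0.5233),
    "HowlerMon": ("root", 0.1331),
    "Rhesus": ("edge.3", 0.0639),
    "Orangutan": ("edge.2", 0.0234),
    "Gorilla": ("edge.1", 0.0075),
    "Human": ("edge.0", 0.0182),
    "Chimpanzee": ("edge.0", 0.0085),
    "edge.0": ("edge.1", 0.0000),
    "edge.1": ("edge.2", 0.0100),
    "edge.2": ("edge.3", 0.0368),
    "edge.3": ("root", 0.0246),
}


def to_newick(node):
    """Return the Newick text for the subtree rooted at `node`."""
    children = [name for name, (parent, _) in edges.items() if parent == node]
    if children:
        # Internal node: bracket the children and drop the internal name.
        text = "(" + ",".join(to_newick(child) for child in children) + ")"
    else:
        # Tip: use the taxon name.
        text = node
    if node in edges:
        # Every node except the root has a branch length.
        text += f":{edges[node][1]:.4f}"
    return text


newick = to_newick("root") + ";"
total_length = sum(length for _, length in edges.values())

print(newick)
print(f"total tree length (ENS) = {total_length:.4f}")
```

Summing the branch lengths gives the total expected number of substitutions per codon site across the tree, which is a convenient single-number summary when comparing fits.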