Applying GNC, a non-stationary codon model

See Kaehler et al. for the formal description of this model. We demonstrate hypothesis testing using this model elsewhere.
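Codon models of this family act on the 61 sense codons and allow only single-nucleotide changes between them. GNC uses a general (non-reversible) nucleotide process, so each direction such as A>G and G>A gets its own rate, together with an omega term for non-synonymous changes. The sketch below builds a generic rate matrix in the Goldman and Yang style with directional nucleotide rates. It is an illustration under our own assumptions, not cogent3's implementation. In particular, weighting by target-codon frequency, defaulting unlisted directions to 1.0, and omitting rate scaling are choices made here. The helper `codon_rate_matrix` and the rate values in the usage line are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = sorted(c for c, aa in CODE.items() if aa != "*")  # 61 sense codons


def codon_rate_matrix(rates, omega, pi):
    """Hypothetical non-reversible codon rate matrix (not cogent3's code).

    rates: dict of directional nucleotide rates, e.g. {"A>G": 2.0};
           any direction not given defaults to 1.0 (assumed reference).
    omega: multiplier applied to non-synonymous changes.
    pi:    dict mapping each sense codon to its frequency.
    """
    Q = {a: {} for a in SENSE}
    for a in SENSE:
        for b in SENSE:
            if a == b:
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) != 1:
                continue  # only single-nucleotide changes are allowed
            x, y = diffs[0]
            rate = rates.get(f"{x}>{y}", 1.0) * pi[b]
            if CODE[a] != CODE[b]:
                rate *= omega  # non-synonymous change
            Q[a][b] = rate
        Q[a][a] = -sum(Q[a].values())  # rows of a rate matrix sum to zero
    return Q


pi = {c: 1 / len(SENSE) for c in SENSE}
Q = codon_rate_matrix({"A>G": 2.0, "C>T": 2.0}, omega=0.5, pi=pi)
print(len(SENSE), Q["AAA"]["AAG"], Q["AAA"]["AAC"])
```

Because A>G and G>A are separate entries in `rates`, the matrix is generally not time-reversible. This is the property that lets a model of this kind describe a composition that changes along the tree.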

We apply the model to a sample alignment of primate BRCA1 sequences.

from cogent3.app import evo, io

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

The model is specified using its abbreviation.

model = evo.model("GNC", tree="data/primate_brca1.tree")
result = model(aln)
result
GNC
key    lnL         nfp  DLC   unique_Q
'GNC'  -6713.2733  23   True  True
result.lf

GNC

log-likelihood = -6713.2733

number of free parameters = 23

Global params
A>C     A>G     A>T     C>A     C>G     C>T     G>A     G>C     G>T     T>A
0.8615  3.5374  0.9792  1.6668  2.2042  6.2567  7.9197  1.2253  0.8015  1.2911

T>C     omega
3.0724  0.8204

Edge params
edge        parent  length
Galago      root    0.5232
HowlerMon   root    0.1338
Rhesus      edge.3  0.0640
Orangutan   edge.2  0.0233
Gorilla     edge.1  0.0075
Human       edge.0  0.0182
Chimpanzee  edge.0  0.0085
edge.0      edge.1  0.0000
edge.1      edge.2  0.0100
edge.2      edge.3  0.0366
edge.3      root    0.0238

Motif params
AAA     AAC     AAG     AAT     ACA     ACC     ACG     ACT     AGA     AGC
0.0556  0.0235  0.0344  0.0556  0.0228  0.0046  0.0008  0.0289  0.0231  0.0286

AGG     AGT     ATA     ATC     ATG     ATT     CAA     CAC     CAG     CAT
0.0140  0.0381  0.0186  0.0070  0.0128  0.0192  0.0196  0.0052  0.0238  0.0221

CCA     CCC     CCG     CCT     CGA     CGC     CGG     CGT     CTA     CTC
0.0195  0.0062  0.0006  0.0263  0.0011  0.0009  0.0023  0.0032  0.0137  0.0078

CTG     CTT     GAA     GAC     GAG     GAT     GCA     GCC     GCG     GCT
0.0125  0.0105  0.0755  0.0105  0.0303  0.0315  0.0158  0.0096  0.0014  0.0137

GGA     GGC     GGG     GGT     GTA     GTC     GTG     GTT     TAC     TAT
0.0161  0.0090  0.0067  0.0133  0.0148  0.0070  0.0069  0.0213  0.0023  0.0101

TCA     TCC     TCG     TCT     TGC     TGG     TGT     TTA     TTC     TTG
0.0221  0.0082  0.0015  0.0251  0.0018  0.0040  0.0201  0.0212  0.0078  0.0108

TTT
0.0187
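As a sanity check, the parameter tables printed by result.lf can be reconciled with the reported 23 free parameters. There are 11 directional nucleotide rates, omega, and 11 branch lengths, which gives 11 + 1 + 11 = 23. Two inferences are needed for this count to work, and both come from the printed output rather than from the cogent3 source. First, T>G does not appear, which suggests it is the reference rate fixed at 1. Second, the 61 codon frequencies cannot be counted as free parameters, which suggests they are set from the data rather than optimised. The sketch also shows that the motif table lists the 64 triplets in alphabetical order minus the stop codons TAA, TAG and TGA. The copied frequencies sum to 1 within rounding.

```python
from itertools import product

# Directional nucleotide rates listed under "Global params".
rate_params = ["A>C", "A>G", "A>T", "C>A", "C>G", "C>T",
               "G>A", "G>C", "G>T", "T>A", "T>C"]
all_directions = [f"{x}>{y}" for x in "ACGT" for y in "ACGT" if x != y]
reference = [d for d in all_directions if d not in rate_params]  # assumed fixed

# Branches listed under "Edge params" (every node except the root).
edges = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human",
         "Chimpanzee", "edge.0", "edge.1", "edge.2", "edge.3"]

nfp = len(rate_params) + 1 + len(edges)  # the +1 is omega

# Codons under "Motif params": alphabetical triplets minus standard stop codons.
stops = {"TAA", "TAG", "TGA"}
sense = [c for c in ("".join(t) for t in product("ACGT", repeat=3)) if c not in stops]

# Frequencies copied from the "Motif params" table, in the same order.
freqs = [
    0.0556, 0.0235, 0.0344, 0.0556, 0.0228, 0.0046, 0.0008, 0.0289, 0.0231, 0.0286,
    0.0140, 0.0381, 0.0186, 0.0070, 0.0128, 0.0192, 0.0196, 0.0052, 0.0238, 0.0221,
    0.0195, 0.0062, 0.0006, 0.0263, 0.0011, 0.0009, 0.0023, 0.0032, 0.0137, 0.0078,
    0.0125, 0.0105, 0.0755, 0.0105, 0.0303, 0.0315, 0.0158, 0.0096, 0.0014, 0.0137,
    0.0161, 0.0090, 0.0067, 0.0133, 0.0148, 0.0070, 0.0069, 0.0213, 0.0023, 0.0101,
    0.0221, 0.0082, 0.0015, 0.0251, 0.0018, 0.0040, 0.0201, 0.0212, 0.0078, 0.0108,
    0.0187,
]
mprobs = dict(zip(sense, freqs))

print(reference, nfp, len(sense), round(sum(freqs), 4))
```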

We can obtain the tree with branch lengths expressed as ENS (expected numbers of substitutions).

If this tree is written to newick (using the write() method), the branch lengths will be in ENS.
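Under a stationary process, the expected number of substitutions on a branch is the constant substitution rate multiplied by the branch duration. Under a non-stationary process, the state distribution p(s) drifts along the branch, so the rate changes too. The expected count then has to be integrated: ENS = ∫₀ᵗ Σᵢ pᵢ(s)·(−Qᵢᵢ) ds, where p(s) evolves by dp/ds = p(s)Q. The minimal sketch below integrates this numerically for a small 4-state process. The helper `expected_substitutions` and the example rate matrices are hypothetical. The sketch illustrates the quantity only and does not reproduce how cogent3 computes ENS following Kaehler et al.

```python
def expected_substitutions(Q, p0, t, steps=1000):
    """Expected number of substitutions along a branch of duration t.

    Integrates dp/ds = p(s) Q jointly with dN/ds = sum_i p_i(s) * -Q[i][i]
    using fixed-step fourth-order Runge-Kutta. Returns (ENS, p(t)).
    """
    n = len(p0)

    def deriv(p):
        dp = [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]
        dn = sum(p[i] * -Q[i][i] for i in range(n))
        return dp, dn

    h = t / steps
    p = list(p0)
    ens = 0.0
    for _ in range(steps):
        k1, n1 = deriv(p)
        k2, n2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(n)])
        k3, n3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(n)])
        k4, n4 = deriv([p[i] + h * k3[i] for i in range(n)])
        p = [p[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]
        ens += h / 6 * (n1 + 2 * n2 + 2 * n3 + n4)
    return ens, p


# Stationary check: Jukes-Cantor-like Q with a uniform start, so ENS equals t.
jc = [[-1.0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]
ens_jc, _ = expected_substitutions(jc, [0.25] * 4, 0.5)

# Non-stationary example (hypothetical rates): state 0 is left quickly, so a
# branch starting entirely in state 0 has a substitution rate that decays.
q = [
    [-3.0, 1.0, 1.0, 1.0],
    [0.1, -0.3, 0.1, 0.1],
    [0.1, 0.1, -0.3, 0.1],
    [0.1, 0.1, 0.1, -0.3],
]
p0 = [1.0, 0.0, 0.0, 0.0]
ens_ns, p_end = expected_substitutions(q, p0, 1.0)
naive = -sum(p0[i] * q[i][i] for i in range(4)) * 1.0  # initial rate times t

print(round(ens_jc, 6), round(ens_ns, 4), naive)
```

In the non-stationary case, the integrated ENS is smaller than the naive "initial rate times duration" figure. This gap is why branch lengths from such a model need converting before they can be read as substitution counts.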

tree = result.tree  # tree whose branch lengths are ENS
fig = tree.get_figure()
fig.scale_bar = "top right"  # where to draw the branch-length scale bar
fig.show(width=500, height=500)
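The "Edge params" table printed by result.lf encodes the tree as child/parent pairs. The sketch below uses a hypothetical helper, `to_newick`, to rebuild a newick string from those pairs. The lengths it uses are the fitted `length` values from that table. The text above suggests these are not ENS for this non-stationary model, so result.tree remains the source for ENS branch lengths. Internal node names such as edge.0 are omitted from the output.

```python
# (child, parent, length) rows copied from the "Edge params" table.
edges = [
    ("Galago", "root", 0.5232),
    ("HowlerMon", "root", 0.1338),
    ("Rhesus", "edge.3", 0.0640),
    ("Orangutan", "edge.2", 0.0233),
    ("Gorilla", "edge.1", 0.0075),
    ("Human", "edge.0", 0.0182),
    ("Chimpanzee", "edge.0", 0.0085),
    ("edge.0", "edge.1", 0.0000),
    ("edge.1", "edge.2", 0.0100),
    ("edge.2", "edge.3", 0.0366),
    ("edge.3", "root", 0.0238),
]


def to_newick(edges, root="root"):
    """Hypothetical helper: build a newick string from child/parent rows."""
    children = {}
    lengths = {}
    for child, parent, length in edges:
        children.setdefault(parent, []).append(child)  # keeps table order
        lengths[child] = length

    def render(node):
        if node in children:
            label = "(" + ",".join(render(c) for c in children[node]) + ")"
        else:
            label = node  # a tip keeps its name
        if node == root:
            return label  # the root has no branch length
        return f"{label}:{lengths[node]:.4f}"

    return render(root) + ";"


print(to_newick(edges))
```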