Apply a non-stationary nucleotide model to an alignment with a tree

We analyse an alignment with sequences from seven primates.

from cogent3.app import io

reader = io.load_aligned(format="fasta", moltype="dna")
aln = reader("data/primate_brca1.fasta")
aln.names
['Chimpanzee',
 'Galago',
 'Gorilla',
 'HowlerMon',
 'Human',
 'Orangutan',
 'Rhesus']
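For readers without cogent3 at hand, here is a minimal stdlib-only sketch of what loading an aligned FASTA file involves: collect the sequences by name, then check that they all have the same length. This is not cogent3's implementation. The function name `read_aligned_fasta` and the toy input are hypothetical.

```python
from io import StringIO


def read_aligned_fasta(handle):
    """Parse FASTA text into {name: sequence} and require equal lengths.

    Hypothetical helper for illustration; not part of cogent3.
    """
    seqs = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # the label is the first whitespace-delimited token after ">"
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' label")
        else:
            seqs[name].append(line.upper())  # sequences may wrap over lines
    seqs = {n: "".join(parts) for n, parts in seqs.items()}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return seqs


toy = StringIO(">Human\nACGT-A\n>Chimpanzee\nACGTTA\n>Galago\nAC\nG-TA\n")
toy_aln = read_aligned_fasta(toy)
print(sorted(toy_aln))  # ['Chimpanzee', 'Galago', 'Human']
```

The equal-length check is what distinguishes an alignment from an arbitrary collection of sequences: every column must line up across all rows.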

Specify the tree via a tree instance

from cogent3 import load_tree
from cogent3.app import evo

tree = load_tree("data/primate_brca1.tree")
gn = evo.model("GN", tree=tree)
gn
model(type='model', sm='GN', tree='root', unique_trees=False, name=None,
sm_args=None, lf_args=None, time_het=None, param_rules=None, opt_args=None,
split_codons=False, show_progress=False, verbose=False)

Specify the tree via a path.

gn = evo.model("GN", tree="data/primate_brca1.tree")
gn
model(type='model', sm='GN', tree='data/primate_brca1.tree', unique_trees=False,
name=None, sm_args=None, lf_args=None, time_het=None, param_rules=None,
opt_args=None, split_codons=False, show_progress=False, verbose=False)
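Whichever way the tree is supplied, its tip names must match the sequence names in the alignment. Below is a minimal stdlib sketch of that check. The Newick string was rebuilt by hand from the edge table in the fitted likelihood function displayed further down, so it omits branch lengths and its ordering may differ from `data/primate_brca1.tree`. The helper `newick_tip_names` is hypothetical and handles only simple Newick (no quoted labels or comments).

```python
import re

# Hypothetical Newick rebuilt from the fitted edge table; the real
# data/primate_brca1.tree may differ in ordering and include lengths.
NEWICK = "(Galago,HowlerMon,((((Human,Chimpanzee),Gorilla),Orangutan),Rhesus));"

ALN_NAMES = ['Chimpanzee', 'Galago', 'Gorilla', 'HowlerMon', 'Human',
             'Orangutan', 'Rhesus']


def newick_tip_names(newick):
    """Return tip labels from a simple Newick string.

    A tip label directly follows "(" or ","; internal-node labels follow
    ")" and are therefore skipped. Labels stop at ":", so branch lengths
    are ignored.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


tips = newick_tip_names(NEWICK)
missing_from_tree = set(ALN_NAMES) - set(tips)
missing_from_aln = set(tips) - set(ALN_NAMES)
print(sorted(tips) == sorted(ALN_NAMES))  # True
```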

Apply the model to an alignment

fitted = gn(aln)
fitted
GN
key     lnL          nfp   DLC    unique_Q
'GN'    -6987.8834   25    True   True

If no value is shown for unique_Q, this can be due to numerical precision issues.
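As we understand it, the DLC column reports whether the substitution probability matrices P satisfy the "diagonal largest in column" condition: each diagonal entry P[j][j] exceeds every other entry in column j. This condition is associated with identifiability of the model. The following is a minimal stdlib sketch of the check on a single P matrix. The function name and the example matrices are hypothetical, and this is not cogent3's own code.

```python
def is_dlc(p):
    """Return True if every diagonal entry is strictly the largest in its column.

    `p` is a square matrix given as a list of rows (row i = from state i).
    Hypothetical helper for illustration only.
    """
    n = len(p)
    for j in range(n):
        diag = p[j][j]
        if any(p[i][j] >= diag for i in range(n) if i != j):
            return False
    return True


# Toy 4x4 transition matrices (each row sums to 1); the values are made up.
p_ok = [
    [0.91, 0.03, 0.04, 0.02],
    [0.02, 0.93, 0.02, 0.03],
    [0.05, 0.02, 0.90, 0.03],
    [0.02, 0.04, 0.02, 0.92],
]
p_bad = [
    [0.40, 0.20, 0.30, 0.10],
    [0.45, 0.35, 0.10, 0.10],  # 0.45 beats the diagonal 0.40 in column 0
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]
print(is_dlc(p_ok), is_dlc(p_bad))  # True False
```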

Note

In the display of the likelihood function (lf) below, the “length” parameter is not the expected number of substitutions (ENS). It is, instead, just a scalar.
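To see why a scalar length and the ENS can differ, consider the expected number of substitutions along an edge. It is the integral over the edge of the total rate of leaving the current state, weighted by the state distribution π(s) at each point s: ENS = ∫₀ᵗ Σᵢ πᵢ(s)(−Q_ii) ds, where π(s) = π(0)·exp(Qs). Under a non-stationary model, π(s) drifts along the edge. If, as the note implies, Q is also not rescaled to one substitution per unit length, then t alone is not an ENS.

The sketch below assumes P = exp(Q·t), with t the displayed length and Q and the starting distribution π(0) known. It integrates numerically with fourth-order Runge-Kutta. The names and toy values are hypothetical, and this is not cogent3's code.

```python
def expected_substitutions(q, pi0, length, steps=1000):
    """ENS along one edge for a possibly non-stationary Markov process.

    q      : rate matrix as a list of rows (row = from state); rows sum to 0
    pi0    : state distribution at the start of the edge
    length : scalar multiplying q (assumed to be the displayed "length")

    Integrates d(pi)/ds = pi q together with d(ens)/ds = sum_i pi_i * (-q_ii)
    over s in [0, length] using classical fourth-order Runge-Kutta.
    """
    n = len(q)
    out_rate = [-q[i][i] for i in range(n)]  # total rate of leaving state i

    def deriv(pi):
        dpi = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]
        dens = sum(pi[i] * out_rate[i] for i in range(n))
        return dpi, dens

    h = length / steps
    pi = list(pi0)
    ens = 0.0
    for _ in range(steps):
        k1, e1 = deriv(pi)
        k2, e2 = deriv([p + 0.5 * h * k for p, k in zip(pi, k1)])
        k3, e3 = deriv([p + 0.5 * h * k for p, k in zip(pi, k2)])
        k4, e4 = deriv([p + h * k for p, k in zip(pi, k3)])
        pi = [p + h / 6 * (a + 2 * b + 2 * c + d)
              for p, a, b, c, d in zip(pi, k1, k2, k3, k4)]
        ens += h / 6 * (e1 + 2 * e2 + 2 * e3 + e4)
    return ens


# Toy Jukes-Cantor-like Q: the uniform distribution is stationary, so the
# ENS equals length * sum(pi_i * -q_ii), which here is just the length.
jc = [[-1.0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]
print(round(expected_substitutions(jc, [0.25] * 4, 0.5), 6))  # 0.5

# Toy two-state Q started away from stationarity: the ENS is not
# length * (rate at the start), which would give 2.0 here.
q2 = [[-2.0, 2.0], [1.0, -1.0]]
print(round(expected_substitutions(q2, [1.0, 0.0], 1.0), 4))  # 1.5445
```

The second case has a closed form, 1 + 1/3 + (2/9)(1 − e⁻³) ≈ 1.5445. It shows that the same length gives different ENS values depending on how far the starting distribution is from stationarity.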

fitted.lf

GN

log-likelihood = -6987.8834

number of free parameters = 25

Global params
A>C     A>G     A>T     C>A     C>G     C>T     G>A     G>C     G>T     T>A     T>C
0.8700  3.6668  0.9110  1.5924  2.1264  6.0321  8.2175  1.2288  0.6294  1.2498  3.4134
Edge params
edge        parent  length
Galago      root    0.1735
HowlerMon   root    0.0450
Rhesus      edge.3  0.0215
Orangutan   edge.2  0.0078
Gorilla     edge.1  0.0025
Human       edge.0  0.0061
Chimpanzee  edge.0  0.0028
edge.0      edge.1  0.0000
edge.1      edge.2  0.0033
edge.2      edge.3  0.0121
edge.3      root    0.0077
Motif params
A       C       G       T
0.3756  0.1768  0.2078  0.2398
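The reported number of free parameters (25) can be reconciled with the tables above. There are 11 global rate parameters: a general 4×4 rate matrix has 12 off-diagonal rates, and T>G, absent from the table, presumably serves as the fixed reference. Each of the 11 edges of the unrooted 7-taxon tree has one length. Finally, there are 3 free motif probabilities, since the 4 must sum to 1. The sketch below does that bookkeeping, assuming a single rate matrix shared by all edges, as the "Global params" heading indicates. The function name is hypothetical.

```python
def gn_free_params(n_tips, n_states=4):
    """Count free parameters for a GN-style model on an unrooted binary tree.

    Assumes one general rate matrix shared by all edges with one
    off-diagonal rate fixed as a reference, one length per edge, and
    estimated state frequencies (which sum to 1).
    """
    rates = n_states * (n_states - 1) - 1  # 12 off-diagonal rates, 1 fixed
    edges = 2 * n_tips - 3                 # edges in an unrooted binary tree
    motifs = n_states - 1                  # last frequency is implied
    return {"rates": rates, "edges": edges, "motifs": motifs,
            "total": rates + edges + motifs}


print(gn_free_params(7))
# {'rates': 11, 'edges': 11, 'motifs': 3, 'total': 25}
```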