Applying a discrete-time, non-stationary nucleotide model#

We fit a discrete-time Markov nucleotide model. This corresponds to a Barry and Hartigan 1987 model.

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
model = get_app("model", "BH", tree="data/primate_brca1.tree")
result = model(aln)
result
BH
keylnLnfpDLCunique_Q
'BH'-6941.6028132TrueTrue

Note

DLC stands for diagonal largest in column and the value is a check on the identifiability of the model. unique_Q is another identifiability check, but it not applicable to a discrete-time model and so remains as None.

Looking at the likelihood function, we see these maximum likelihood estimated values

result.lf

BH

log-likelihood = -6941.6028

number of free parameters = 132

Edge motif motif2 params
edgemotifmotif2psubs
GalagoTT0.8750
GalagoTC0.0649
GalagoTA0.0409
GalagoTG0.0192
GalagoCT0.1125
............
edge.3AG0.0053
edge.3GT0.0000
edge.3GC0.0011
edge.3GA0.0041
edge.3GG0.9948
Motif params
ACGT
0.37570.17420.20950.2406

Get a tree with branch lengths as paralinear#

This is the only possible length metric for a discrete-time process.

tree = result.tree
fig = tree.get_figure()
fig.scale_bar = "top right"
fig.show(width=500, height=500)

Getting parameter estimates#

For a discrete-time model, aside from the root motif probabilities, everything is edge specific. But note that the tabular_result has different keys from the continuous-time case, as demonstrated below.

tabulator = get_app("tabulate_stats")
stats = tabulator(result)
stats
2x tabular_result('edge motif motif2 params': Table, 'motif params': Table)
stats["edge motif motif2 params"]
edge motif motif2 params
edgemotifmotif2psubs
GalagoTT0.8750
GalagoTC0.0649
GalagoTA0.0409
GalagoTG0.0192
GalagoCT0.1125
............
edge.3AG0.0053
edge.3GT0.0000
edge.3GC0.0011
edge.3GA0.0041
edge.3GG0.9948

Top 5 and bottom 5 rows from 176 rows x 4 columns