Applying a discrete-time, non-stationary nucleotide model

We fit a discrete-time Markov nucleotide model. This corresponds to a Barry and Hartigan 1987 model.

from cogent3.app import io, evo

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
model = evo.model("BH", tree="data/primate_brca1.tree")
result = model(aln)
result
BH
keylnLnfpDLCunique_Q
'BH'-6941.4684135TrueTrue

NOTE: DLC stands for diagonal largest in column and the value is a check on the identifiability of the model. unique_Q is not applicable to a discrete-time model and so remains as None.

Looking at the likelihood function, you will

result.lf

BH

log-likelihood = -6941.4684

number of free parameters = 135

Edge motif motif2 params
edgemotifmotif2psubs
GalagoTT0.8751
GalagoTC0.0649
GalagoTA0.0409
GalagoTG0.0192
GalagoCT0.1126
............
edge.3AG0.0055
edge.3GT0.0000
edge.3GC0.0011
edge.3GA0.0039
edge.3GG0.9950
Motif params
ACGT
0.37740.17630.20580.2404

Get a tree with branch lengths as paralinear

This is the only possible length metric for a discrete-time process.

tree = result.tree
fig = tree.get_figure()
fig.scale_bar = "top right"
fig.show(width=500, height=500)

Getting parameter estimates

For a discrete-time model, aside from the root motif probabilities, everything is edge specific. But note that the tabular_result has different keys from the continuous-time case, as demonstrated below.

tabulator = evo.tabulate_stats()
stats = tabulator(result)
stats
2x tabular_result('edge motif motif2 params': Table, 'motif params': Table)
stats["edge motif motif2 params"]
edge motif motif2 params
edgemotifmotif2psubs
GalagoTT0.8751
GalagoTC0.0649
GalagoTA0.0409
GalagoTG0.0192
GalagoCT0.1126
............
edge.3AG0.0055
edge.3GT0.0000
edge.3GC0.0011
edge.3GA0.0039
edge.3GG0.9950

Top 5 and bottom 5 rows from 176 rows x 4 columns