Applying a discrete-time, non-stationary nucleotide model#
We fit a discrete-time Markov nucleotide model. This corresponds to the model of Barry and Hartigan (1987), which is selected below by the name "BH".
from cogent3 import get_app

# load the aligned DNA sequences from a FASTA file
loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
# fit the BH model to the alignment on the supplied tree
model = get_app("model", "BH", tree="data/primate_brca1.tree")
result = model(aln)
result
key | lnL | nfp | DLC | unique_Q |
---|---|---|---|---|
'BH' | -6941.6028 | 132 | True | True |
Note
DLC stands for diagonal largest in column, and its value is a check on the identifiability of the model. unique_Q is another identifiability check, but it is not applicable to a discrete-time model and so remains as None.
The likelihood function displays the maximum likelihood estimates of the model parameters.
result.lf
BH
log-likelihood = -6941.6028
number of free parameters = 132
edge | motif | motif2 | psubs |
---|---|---|---|
Galago | T | T | 0.8750 |
Galago | T | C | 0.0649 |
Galago | T | A | 0.0409 |
Galago | T | G | 0.0192 |
Galago | C | T | 0.1125 |
... | ... | ... | ... |
edge.3 | A | G | 0.0053 |
edge.3 | G | T | 0.0000 |
edge.3 | G | C | 0.0011 |
edge.3 | G | A | 0.0041 |
edge.3 | G | G | 0.9948 |
A | C | G | T |
---|---|---|---|
0.3757 | 0.1742 | 0.2095 | 0.2406 |
Get a tree with branch lengths as paralinear#
This is the only possible length metric for a discrete-time process.
tree = result.tree
fig = tree.get_figure()
fig.scale_bar = "top right"
fig.show(width=500, height=500)
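To make the paralinear metric concrete, the following is a minimal sketch in plain Python of one common way to write it for a single edge, given that edge's substitution probability matrix (psubs) and the motif probabilities at its parent node. The function names `det` and `paralinear` are hypothetical helpers written for this illustration and are not the cogent3 implementation. The sketch assumes the matrix has a positive determinant and that all probabilities are non-zero.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        # pick the row with the largest absolute entry in this column
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def paralinear(psubs, pi_parent):
    """Paralinear length of one edge (hypothetical helper).

    psubs[i][j] is the probability of state i at the parent becoming
    state j at the child; pi_parent holds the parent state probabilities.
    Assumes det(psubs) > 0 and all probabilities > 0.
    """
    n = len(pi_parent)
    # state probabilities at the child node: pi_child = pi_parent . psubs
    pi_child = [
        sum(pi_parent[i] * psubs[i][j] for i in range(n)) for j in range(n)
    ]
    log_ratio = (
        math.log(det(psubs))
        + 0.5 * sum(math.log(p) for p in pi_parent)
        - 0.5 * sum(math.log(p) for p in pi_child)
    )
    return -log_ratio / n


# Sanity check on a Jukes-Cantor matrix for an edge of length t
t = 0.1
decay = math.exp(-4 * t / 3)
same, diff = 0.25 + 0.75 * decay, 0.25 - 0.25 * decay
jc = [[same if i == j else diff for j in range(4)] for i in range(4)]
length = paralinear(jc, [0.25] * 4)
```

For this stationary Jukes-Cantor case, the paralinear value recovers the edge length t used to build the matrix. An identity matrix (no change along the edge) gives a length of zero.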
Getting parameter estimates#
For a discrete-time model, aside from the root motif probabilities, everything is edge specific. But note that the tabular_result has different keys from the continuous-time case, as demonstrated below.
tabulator = get_app("tabulate_stats")
stats = tabulator(result)
stats
2x tabular_result('edge motif motif2 params': Table, 'motif params': Table)
stats["edge motif motif2 params"]
edge | motif | motif2 | psubs |
---|---|---|---|
Galago | T | T | 0.8750 |
Galago | T | C | 0.0649 |
Galago | T | A | 0.0409 |
Galago | T | G | 0.0192 |
Galago | C | T | 0.1125 |
... | ... | ... | ... |
edge.3 | A | G | 0.0053 |
edge.3 | G | T | 0.0000 |
edge.3 | G | C | 0.0011 |
edge.3 | G | A | 0.0041 |
edge.3 | G | G | 0.9948 |
Top 5 and bottom 5 rows from 176 rows x 4 columns
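The numbers in this table can be cross-checked with simple arithmetic. The sketch below uses only values shown above: the four Galago probabilities for source state T, and the 176-row table size. It assumes that each edge contributes a full 4 x 4 matrix to the table and that each matrix row has three free values because it must sum to 1. How cogent3 counts free parameters internally is not stated here, so treat the agreement with the reported 132 as a consistency check rather than a description of the library.

```python
# psubs values copied from the Galago rows above (source state T)
galago_from_t = {"T": 0.8750, "C": 0.0649, "A": 0.0409, "G": 0.0192}

# each row of a substitution probability matrix should sum to 1
row_total = sum(galago_from_t.values())

num_states = 4
table_rows = 176  # from the table caption above

# one table row per (edge, motif, motif2) combination
num_edges = table_rows // (num_states * num_states)

# each matrix row sums to 1, leaving num_states - 1 free values per row
free_per_edge = num_states * (num_states - 1)
total_free = num_edges * free_per_edge

print(f"row total = {row_total:.4f}")
print(f"edges = {num_edges}, free parameters = {total_free}")
```

Under these assumptions the table covers 11 edges, and 11 x 12 = 132 matches the number of free parameters reported for the fitted model. That match suggests the root motif probabilities are not among the counted parameters, although this is an inference from the numbers rather than something the output states.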