Testing a hypothesis – non-stationary or time-reversible#
We test the hypothesis that the time-reversible GTR model is sufficient for a data set, compared with the non-stationary general nucleotide model (GN).
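from cogent3 import get_app

# load the alignment as DNA sequences
loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
tree = "data/primate_brca1.tree"
# null is the time-reversible GTR, alternate is the non-stationary GN, both fitted on the same tree
null = get_app("model", "GTR", tree=tree, optimise_motif_probs=True)
alt = get_app("model", "GN", tree=tree, optimise_motif_probs=True)
# the hypothesis app fits both models and performs the likelihood ratio test
hyp = get_app("hypothesis", null, alt)
result = hyp(aln)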
type(result)
cogent3.app.result.hypothesis_result
result is a hypothesis_result object. The repr() displays the likelihood ratio test statistic, degrees of freedom and associated p-value.
result
LR | df | pvalue |
---|---|---|
9.3813 | 6 | 0.1532 |
hypothesis | key | lnL | nfp | DLC | unique_Q |
---|---|---|---|---|---|
null | 'GTR' | -6992.5769 | 19 | True | True |
alt | 'GN' | -6987.8862 | 25 | True | True |
In this case, we fail to reject the null since the p-value is > 0.05. We use this object to demonstrate the properties of a hypothesis_result.
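The three statistics in the first table can be recomputed by hand from the lnL and nfp columns of the second. The following is a minimal sketch, assuming the p-value comes from a chi-square distribution with df equal to the difference in nfp (the usual likelihood ratio test for nested models). The helper chi2_sf_even_df is a hypothetical name introduced here, not part of cogent3, and it only handles even df.

```python
import math

# Values copied from the hypothesis_result tables above.
lnl_null, nfp_null = -6992.5769, 19  # GTR
lnl_alt, nfp_alt = -6987.8862, 25  # GN


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability (hypothetical helper, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    # exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
    terms = (half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


lr = 2 * (lnl_alt - lnl_null)  # likelihood ratio statistic
df = nfp_alt - nfp_null  # extra free parameters in the alternate
pvalue = chi2_sf_even_df(lr, df)
print(f"LR={lr:.4f} df={df} pvalue={pvalue:.4f}")
```

Because the lnL values are taken from the display, which is rounded to four decimal places, the recomputed LR can differ from the reported one in the last digit.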
hypothesis_result has attributes and keys#
Accessing the test statistics#
result.LR, result.df, result.pvalue
(9.381296736692093, 6, np.float64(0.15324238178249514))
The null hypothesis#
This model is accessed via the null attribute.
result.null
key | lnL | nfp | DLC | unique_Q |
---|---|---|---|---|
'GTR' | -6992.5769 | 19 | True | True |
result.null.lf
GTR
log-likelihood = -6992.5769
number of free parameters = 19
A/C | A/G | A/T | C/G | C/T |
---|---|---|---|---|
1.2296 | 5.2478 | 0.9473 | 2.3389 | 5.9666 |
edge | parent | length |
---|---|---|
Galago | root | 0.1727 |
HowlerMon | root | 0.0448 |
Rhesus | edge.3 | 0.0215 |
Orangutan | edge.2 | 0.0077 |
Gorilla | edge.1 | 0.0025 |
Human | edge.0 | 0.0060 |
Chimpanzee | edge.0 | 0.0028 |
edge.0 | edge.1 | 0.0000 |
edge.1 | edge.2 | 0.0034 |
edge.2 | edge.3 | 0.0119 |
edge.3 | root | 0.0076 |
A | C | G | T |
---|---|---|---|
0.3792 | 0.1719 | 0.2066 | 0.2423 |
The alternate hypothesis#
result.alt.lf
GN
log-likelihood = -6987.8862
number of free parameters = 25
A>C | A>G | A>T | C>A | C>G | C>T | G>A | G>C | G>T | T>A | T>C |
---|---|---|---|---|---|---|---|---|---|---|
0.8700 | 3.6670 | 0.9111 | 1.5925 | 2.1264 | 6.0324 | 8.2178 | 1.2288 | 0.6294 | 1.2499 | 3.4136 |
edge | parent | length |
---|---|---|
Galago | root | 0.1735 |
HowlerMon | root | 0.0450 |
Rhesus | edge.3 | 0.0215 |
Orangutan | edge.2 | 0.0078 |
Gorilla | edge.1 | 0.0025 |
Human | edge.0 | 0.0061 |
Chimpanzee | edge.0 | 0.0028 |
edge.0 | edge.1 | 0.0000 |
edge.1 | edge.2 | 0.0033 |
edge.2 | edge.3 | 0.0121 |
edge.3 | root | 0.0077 |
A | C | G | T |
---|---|---|---|
0.3756 | 0.1768 | 0.2078 | 0.2398 |
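The nfp values of the two models, and hence the 6 degrees of freedom of the test, can be reconciled by counting what the displays above show. This is a sketch under stated assumptions: the grouping into rate terms, branch lengths and motif probabilities is inferred from the tables rather than taken from cogent3 itself, and each model is assumed to hold one rate term fixed as a reference (the one missing from its rate table).

```python
# Counts read off the tables above; the grouping into categories is an assumption.
n_edges = 11  # rows in the edge | parent | length table
n_free_motif = 4 - 1  # four nucleotide frequencies constrained to sum to 1

n_rate_gtr = 5  # A/C, A/G, A/T, C/G, C/T are shown; the sixth is the assumed reference
n_rate_gn = 11  # eleven directed terms are shown; the twelfth is the assumed reference

nfp_gtr = n_rate_gtr + n_edges + n_free_motif
nfp_gn = n_rate_gn + n_edges + n_free_motif
print(nfp_gtr, nfp_gn, nfp_gn - nfp_gtr)
```

Under this accounting the two models differ only in their rate terms, which is where the extra six parameters of GN come from.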
Saving hypothesis results#
You are advised to save these results as serialised data since this provides maximum flexibility for downstream analyses.
The following would write the result into a sqlitedb.
from cogent3 import get_app, open_data_store
output = open_data_store("path/to/myresults.sqlitedb", mode="w")
writer = get_app("write_db", data_store=output)
writer(result)