Testing a hypothesis – non-stationary or time-reversible#

We test the hypothesis that the GTR model is sufficient for a data set, compared with the GN (non-stationary general nucleotide model).

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

tree = "data/primate_brca1.tree"

null = get_app("model", "GTR", tree=tree, optimise_motif_probs=True)
alt = get_app("model", "GN", tree=tree, optimise_motif_probs=True)
hyp = get_app("hypothesis", null, alt)
result = hyp(aln)
type(result)
cogent3.app.result.hypothesis_result

result is a hypothesis_result object. The repr() displays the likelihood ratio test statistic, degrees of freedom and associated p-value>

result
Statistics
LRdfpvalue
9.381360.1532
hypothesiskeylnLnfpDLCunique_Q
null'GTR'-6992.576919TrueTrue
alt'GN'-6987.886225TrueTrue

In this case, we accept the null given the p-value is > 0.05. We use this object to demonstrate the properties of a hypothesis_result.

hypothesis_result has attributes and keys#

Accessing the test statistics#

result.LR, result.df, result.pvalue
(9.381296736692093, 6, np.float64(0.15324238178249514))

The null hypothesis#

This model is accessed via the null attribute.

result.null
GTR
keylnLnfpDLCunique_Q
'GTR'-6992.576919TrueTrue
result.null.lf

GTR

log-likelihood = -6992.5769

number of free parameters = 19

Global params
A/CA/GA/TC/GC/T
1.22965.24780.94732.33895.9666
Edge params
edgeparentlength
Galagoroot0.1727
HowlerMonroot0.0448
Rhesusedge.30.0215
Orangutanedge.20.0077
Gorillaedge.10.0025
Humanedge.00.0060
Chimpanzeeedge.00.0028
edge.0edge.10.0000
edge.1edge.20.0034
edge.2edge.30.0119
edge.3root0.0076
Motif params
ACGT
0.37920.17190.20660.2423

The alternate hypothesis#

result.alt.lf

GN

log-likelihood = -6987.8862

number of free parameters = 25

Global params
A>CA>GA>TC>AC>GC>TG>AG>CG>TT>A
0.87003.66700.91111.59252.12646.03248.21781.22880.62941.2499
continuation
T>C
3.4136
Edge params
edgeparentlength
Galagoroot0.1735
HowlerMonroot0.0450
Rhesusedge.30.0215
Orangutanedge.20.0078
Gorillaedge.10.0025
Humanedge.00.0061
Chimpanzeeedge.00.0028
edge.0edge.10.0000
edge.1edge.20.0033
edge.2edge.30.0121
edge.3root0.0077
Motif params
ACGT
0.37560.17680.20780.2398

Saving hypothesis results#

You are advised to save these results as serialised data since this provides maximum flexibility for downstream analyses.

The following would write the result into a sqlitedb.

from cogent3 import get_app, open_data_store

output = open_data_store("path/to/myresults.sqlitedb", mode="w")
writer = get_app("write_db", data_store=output)
writer(result)