Testing a hypothesis – non-stationary or time-reversible#

Note

These docs now use the new_type core objects via the following setting.

import os

# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"

We test the hypothesis that the time-reversible GTR model is sufficient for a data set, compared with GN, the non-stationary general nucleotide model.

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

tree = "data/primate_brca1.tree"

null = get_app("model", "GTR", tree=tree, optimise_motif_probs=True)
alt = get_app("model", "GN", tree=tree, optimise_motif_probs=True)
hyp = get_app("hypothesis", null, alt)
result = hyp(aln)
type(result)
cogent3.app.result.hypothesis_result

result is a hypothesis_result object. Its repr() displays the likelihood ratio test statistic, the degrees of freedom and the associated p-value.

result
Statistics
LR      df  pvalue
9.3813  6   0.1532

hypothesis  key    lnL         nfp  DLC   unique_Q
null        'GTR'  -6992.5769  19   True  True
alt         'GN'   -6987.8862  25   True  True

In this case, we fail to reject the null because the p-value is > 0.05, so GTR is treated as sufficient for these data. We use this object to demonstrate the properties of a hypothesis_result.

hypothesis_result has attributes and keys#

Accessing the test statistics#

result.LR, result.df, result.pvalue
(9.381296736692093, 6, np.float64(0.15324238178249514))
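
The three statistics can be cross-checked by hand from the log-likelihoods and free-parameter counts displayed for the two models. The following is a minimal sketch, assuming the p-value is the upper-tail probability of a chi-square distribution with df degrees of freedom. The helper chi2_sf_even_df is hypothetical (it is not part of cogent3) and uses a closed form that only holds for even df, which is the case here. The inputs are the rounded values from the display, so the result agrees with the reported numbers only to about four decimal places.

```python
import math

# Values copied from the displayed result (rounded to 4 decimal places).
lnL_null, nfp_null = -6992.5769, 19  # GTR
lnL_alt, nfp_alt = -6987.8862, 25  # GN


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df.

    Hypothetical helper for illustration, not a cogent3 function.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    terms = (half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Likelihood ratio statistic: twice the gain in log-likelihood.
LR = 2 * (lnL_alt - lnL_null)
# Degrees of freedom: the extra free parameters in the alternate model.
df = nfp_alt - nfp_null
pvalue = chi2_sf_even_df(LR, df)

print(f"LR={LR:.4f} df={df} pvalue={pvalue:.4f}")
```

For odd df, or as a general-purpose alternative, a library routine such as scipy.stats.chi2.sf would be the usual choice.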

The null hypothesis#

This model is accessed via the null attribute.

result.null
GTR
key    lnL         nfp  DLC   unique_Q
'GTR'  -6992.5769  19   True  True
result.null.lf

GTR

log-likelihood = -6992.5769

number of free parameters = 19

Global params
A/C   A/G   A/T   C/G   C/T
1.23  5.25  0.95  2.34  5.97

Edge params
edge        parent  length
Galago      root    0.17
HowlerMon   root    0.04
Rhesus      edge.3  0.02
Orangutan   edge.2  0.01
Gorilla     edge.1  0.00
Human       edge.0  0.01
Chimpanzee  edge.0  0.00
edge.0      edge.1  0.00
edge.1      edge.2  0.00
edge.2      edge.3  0.01
edge.3      root    0.01

Motif params
A     C     G     T
0.38  0.17  0.21  0.24

The alternate hypothesis#

result.alt.lf

GN

log-likelihood = -6987.8862

number of free parameters = 25

Global params
A>C   A>G   A>T   C>A   C>G   C>T   G>A   G>C   G>T   T>A   T>C
0.87  3.67  0.91  1.59  2.13  6.03  8.22  1.23  0.63  1.25  3.41

Edge params
edge        parent  length
Galago      root    0.17
HowlerMon   root    0.04
Rhesus      edge.3  0.02
Orangutan   edge.2  0.01
Gorilla     edge.1  0.00
Human       edge.0  0.01
Chimpanzee  edge.0  0.00
edge.0      edge.1  0.00
edge.1      edge.2  0.00
edge.2      edge.3  0.01
edge.3      root    0.01

Motif params
A     C     G     T
0.38  0.18  0.21  0.24
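
The free-parameter counts reported for the two models can be reconciled with the parameter tables shown for each likelihood function. The sketch below is an accounting exercise rather than a cogent3 API call. It assumes each model's count is the number of rate terms, plus one length per edge, plus the independent motif probabilities (three, since the four nucleotide frequencies sum to 1). The variable names are illustrative only.

```python
# Counts read off the displayed parameter tables.
n_edges = 11  # rows under "Edge params", one branch length each
n_free_motif = 4 - 1  # four motif probabilities constrained to sum to 1
n_rates = {"GTR": 5, "GN": 11}  # columns under "Global params"

# Assumed accounting: rate terms + branch lengths + free motif probabilities.
nfp = {name: rates + n_edges + n_free_motif for name, rates in n_rates.items()}

# The difference between the models is the df of the likelihood ratio test.
df = nfp["GN"] - nfp["GTR"]

print(nfp, df)
```

Under this accounting the two models differ only in their rate terms, which is where the 6 degrees of freedom come from.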

Saving hypothesis results#

You are advised to save these results as serialised data since this provides maximum flexibility for downstream analyses.

The following would write the result into a sqlitedb.

from cogent3 import get_app, open_data_store

output = open_data_store("path/to/myresults.sqlitedb", mode="w")
writer = get_app("write_db", data_store=output)
writer(result)