Performing a relative rate test#

Section author: Gavin Huttley

From cogent3 import all the components we need

from cogent3 import load_aligned_seqs, load_tree
from cogent3.evolve.models import get_model
from scipy.stats.distributions import chi2

Get your alignment and tree.

aln = load_aligned_seqs("data/long_testseqs.fasta")
t = load_tree(filename="data/test.tree")

Create a HKY85 model.

sm = get_model("HKY85")

Make the controller object and limit the display precision (to decrease the chance that small differences in estimates cause tests of the documentation to fail).

lf = sm.make_likelihood_function(t, digits=2, space=3)

Set the local clock for humans & Howler Monkey. This method is just a special interface to the more general set_param_rules() method.

lf.set_local_clock("Human", "HowlerMon")

Get the likelihood function object this object performs the actual likelihood calculation.

lf.set_alignment(aln)

Optimise the function capturing the return optimised lnL, and parameter value vector.

lf.optimise(show_progress=False)

View the resulting maximum-likelihood parameter values.

lf.set_name("clock")
lf

clock

log-likelihood = -8751.9425

number of free parameters = 7

Global params
kappa
4.10
Edge params
edgeparentlength
Humanedge.00.04
HowlerMonedge.00.04
edge.0edge.10.04
Mouseedge.10.28
edge.1root0.02
NineBanderoot0.09
DogFacedroot0.11
Motif params
ACGT
0.370.190.210.23

We extract the log-likelihood and number of free parameters for later use.

null_lnL = lf.get_log_likelihood()
null_nfp = lf.get_num_free_params()

Clear the local clock constraint, freeing up the branch lengths.

lf.set_param_rule("length", is_independent=True)

Run the optimiser capturing the return optimised lnL, and parameter value vector.

lf.optimise(show_progress=False)

View the resulting maximum-likelihood parameter values.

lf.set_name("non clock")
lf

non clock

log-likelihood = -8750.5889

number of free parameters = 8

Global params
kappa
4.10
Edge params
edgeparentlength
Humanedge.00.03
HowlerMonedge.00.04
edge.0edge.10.04
Mouseedge.10.28
edge.1root0.02
NineBanderoot0.09
DogFacedroot0.11
Motif params
ACGT
0.370.190.210.23

These two lnL’s are now used to calculate the likelihood ratio statistic it’s degrees-of-freedom and the probability of observing the LR.

LR = 2 * (lf.get_log_likelihood() - null_lnL)
df = lf.get_num_free_params() - null_nfp
P = chi2.sf(LR, df)

Print this and look up a \(\chi^2\) with number of edges - 1 degrees of freedom.

print("Likelihood ratio statistic = ", LR)
print("degrees-of-freedom = ", df)
print("probability = ", P)
Likelihood ratio statistic =  2.707164528703288
degrees-of-freedom =  1
probability =  0.09989841174711883