Likelihood analysis of multiple loci#
Section author: Gavin Huttley
We want to know whether an exchangeability parameter is different between alignments. We will specify a null model, under which each alignment get’s it’s own motif probabilities and all alignments share branch lengths and the exchangeability parameter kappa (the transition / transversion ratio). We’ll split the example alignment into two-pieces.
from cogent3 import load_aligned_seqs, make_table, make_tree
from cogent3.evolve.models import HKY85
from scipy.stats.distributions import chi2
from cogent3.recalculation.scope import ALL, EACH
aln = load_aligned_seqs("data/long_testseqs.fasta")
half = len(aln) // 2
aln1 = aln[:half]
aln2 = aln[half:]
We provide names for those alignments, then construct the tree, model instances.
loci_names = ["1st-half", "2nd-half"]
loci = [aln1, aln2]
tree = make_tree(tip_names=aln.names)
mod = HKY85()
To make a likelihood function with multiple alignments we provide the list of loci names. We can then specify a parameter (other than length) to be the same across the loci (using the imported ALL
) or different for each locus (using EACH
). We conduct a LR test as before.
lf = mod.make_likelihood_function(tree, loci=loci_names, digits=2, space=3)
lf.set_param_rule("length", is_independent=False)
lf.set_param_rule("kappa", loci=ALL)
lf.set_alignment(loci)
lf.optimise(show_progress=False)
lf
HKY85
log-likelihood = -9168.3331
number of free parameters = 2
kappa | length |
---|---|
3.98 | 0.13 |
locus | A | C | G | T |
---|---|---|---|---|
2nd-half | 0.35 | 0.19 | 0.22 | 0.24 |
1st-half | 0.38 | 0.18 | 0.21 | 0.22 |
all_lnL = lf.lnL
all_nfp = lf.nfp
lf.set_param_rule("kappa", loci=EACH)
lf.optimise(show_progress=False)
lf
HKY85
log-likelihood = -9167.5373
number of free parameters = 3
length |
---|
0.13 |
locus | kappa |
---|---|
1st-half | 4.33 |
2nd-half | 3.74 |
locus | A | C | G | T |
---|---|---|---|---|
2nd-half | 0.35 | 0.19 | 0.22 | 0.24 |
1st-half | 0.38 | 0.18 | 0.21 | 0.22 |
each_lnL = lf.lnL
each_nfp = lf.nfp
LR = 2 * (each_lnL - all_lnL)
df = each_nfp - all_nfp
Just to pretty up the result display, I’ll print(a table consisting of the test statistics created on the fly.)
make_table(
header=["LR", "df", "p"], rows=[[LR, df, chi2.sf(LR, df)]], digits=2, space=3,
)
LR | df | p |
---|---|---|
1.59 | 1 | 0.21 |
1 rows x 3 columns