natsel_sitehet
– a test of site heterogeneity#
Note
These docs now use the new_type
core objects via the following setting.
import os
# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"
This app evaluates evidence for whether sites differ in their mode of natural selection (Nielsen and Yang 1998).
from cogent3 import get_app
loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
sites_differ = get_app("natsel_sitehet",
"GNC", tree="data/primate_brca1.tree", optimise_motif_probs=False
)
result = sites_differ(aln)
result
LR | df | pvalue |
---|---|---|
1.4048 | 2 | 0.4954 |
hypothesis | key | lnL | nfp | DLC | unique_Q |
---|---|---|---|---|---|
null | 'GNC-null' | -6708.3129 | 24 | True | True |
alt | 'GNC-alt' | -6707.6104 | 26 | True | True |
The models have been constructed such that site-class bins have names indicating the mode of natural selection: -ve is purifying (oomega<1); neutral (omega=1); and +ve is positive natural selection (omega>1). The two parameters of interest relating to these are the bprobs
(the maximum likelihood estimate of the frequency of the site-class) and the corresponding value of omega.
result.alt.lf
GNC-alt
log-likelihood = -6707.6104
number of free parameters = 26
A>C | A>G | A>T | C>A | C>G | C>T | G>A | G>C | G>T | T>A | T>C |
---|---|---|---|---|---|---|---|---|---|---|
0.85 | 3.56 | 0.97 | 1.64 | 2.18 | 6.32 | 8.08 | 1.23 | 0.78 | 1.28 | 3.03 |
bin | bprobs | omega |
---|---|---|
-ve | 0.10 | 0.00 |
neutral | 0.81 | 1.00 |
+ve | 0.09 | 20.00 |
edge | parent | length |
---|---|---|
Galago | root | 0.55 |
HowlerMon | root | 0.14 |
Rhesus | edge.3 | 0.06 |
Orangutan | edge.2 | 0.02 |
Gorilla | edge.1 | 0.01 |
Human | edge.0 | 0.02 |
Chimpanzee | edge.0 | 0.01 |
edge.0 | edge.1 | 0.00 |
edge.1 | edge.2 | 0.01 |
edge.2 | edge.3 | 0.04 |
edge.3 | root | 0.02 |
AAA | AAC | AAG | AAT | ACA | ACC | ACG | ACT | AGA | AGC | AGG | AGT | ATA |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.06 | 0.02 | 0.03 | 0.06 | 0.02 | 0.00 | 0.00 | 0.03 | 0.02 | 0.03 | 0.01 | 0.04 | 0.02 |
ATC | ATG | ATT | CAA | CAC | CAG | CAT | CCA | CCC | CCG | CCT | CGA | CGC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.01 | 0.01 | 0.02 | 0.02 | 0.01 | 0.02 | 0.02 | 0.02 | 0.01 | 0.00 | 0.03 | 0.00 | 0.00 |
CGG | CGT | CTA | CTC | CTG | CTT | GAA | GAC | GAG | GAT | GCA | GCC | GCG |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.08 | 0.01 | 0.03 | 0.03 | 0.02 | 0.01 | 0.00 |
GCT | GGA | GGC | GGG | GGT | GTA | GTC | GTG | GTT | TAC | TAT | TCA | TCC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.00 | 0.01 | 0.02 | 0.01 |
TCG | TCT | TGC | TGG | TGT | TTA | TTC | TTG | TTT |
---|---|---|---|---|---|---|---|---|
0.00 | 0.03 | 0.00 | 0.00 | 0.02 | 0.02 | 0.01 | 0.01 | 0.02 |
Getting the individual site posterior probabilities#
I’m just displaying the posterior-probabilities from the first 20 positions only.
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-ve | 0.1491 | 0.0839 | 0.0000 | 0.1313 | 0.1146 | 0.1569 | 0.0843 | 0.1191 | 0.1020 | 0.0798 | 0.0760 | 0.1569 | 0.0937 | 0.0000 | 0.1563 | 0.5173 | 0.0798 | 0.0695 | 0.1146 | 0.1216 |
neutral | 0.7725 | 0.8127 | 0.8643 | 0.7851 | 0.7961 | 0.7668 | 0.8125 | 0.7927 | 0.8032 | 0.8141 | 0.8164 | 0.7668 | 0.8070 | 0.8655 | 0.7670 | 0.4824 | 0.8141 | 0.8197 | 0.7961 | 0.7909 |
+ve | 0.0784 | 0.1034 | 0.1357 | 0.0837 | 0.0893 | 0.0764 | 0.1032 | 0.0882 | 0.0948 | 0.1061 | 0.1076 | 0.0764 | 0.0993 | 0.1345 | 0.0766 | 0.0003 | 0.1061 | 0.1108 | 0.0893 | 0.0875 |