natsel_sitehet – a test of site heterogeneity
This app evaluates evidence for whether sites differ in their mode of natural selection (Nielsen and Yang 1998).
from cogent3 import get_app
loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
sites_differ = get_app(
    "natsel_sitehet",
    "GNC",
    tree="data/primate_brca1.tree",
    optimise_motif_probs=False,
)
result = sites_differ(aln)
result
LR | df | pvalue |
---|---|---|
1.4048 | 2 | 0.4954 |
hypothesis | key | lnL | nfp | DLC | unique_Q |
---|---|---|---|---|---|
null | 'GNC-null' | -6708.3129 | 24 | True | True |
alt | 'GNC-alt' | -6707.6104 | 26 | True | True |
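The first table can be checked by hand from the second. The following is a minimal sketch, assuming the p-value comes from a chi-square distribution with df equal to the difference in the number of free parameters, which is consistent with the numbers reported above. The variable names are illustrative only and the inputs are the rounded values copied from the tables, so the result agrees with the reported LR and p-value only to about three decimal places.

```python
import math

# Values copied from the hypothesis table above (rounded to 4 decimal places).
lnL_null, nfp_null = -6708.3129, 24
lnL_alt, nfp_alt = -6707.6104, 26

# Likelihood ratio statistic: twice the difference in log-likelihoods.
LR = 2 * (lnL_alt - lnL_null)

# Degrees of freedom: difference in the number of free parameters.
df = nfp_alt - nfp_null

# For df == 2 the chi-square survival function has the closed form exp(-x / 2),
# so no statistics library is needed for this particular check.
assert df == 2
pvalue = math.exp(-LR / 2)

print(f"LR={LR:.4f} df={df} pvalue={pvalue:.4f}")
```

With a p-value of roughly 0.5, the alternative model is not supported over the null for this alignment.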
The models are constructed so that the site-class bins have names indicating the mode of natural selection: -ve is purifying selection (omega<1), neutral is neutral evolution (omega=1), and +ve is positive natural selection (omega>1). The two parameters of interest for each bin are bprobs
(the maximum likelihood estimate of the frequency of the site-class) and the corresponding value of omega.
result.alt.lf
GNC-alt
log-likelihood = -6707.6104
number of free parameters = 26
A>C | A>G | A>T | C>A | C>G | C>T | G>A | G>C | G>T | T>A |
---|---|---|---|---|---|---|---|---|---|
0.8530 | 3.5644 | 0.9734 | 1.6404 | 2.1800 | 6.3216 | 8.0811 | 1.2346 | 0.7829 | 1.2797 |
T>C |
---|
3.0291 |
bin | bprobs | omega |
---|---|---|
-ve | 0.1043 | 1.0000073065984624e-06 |
neutral | 0.8052 | 1.0 |
+ve | 0.0905 | 19.999999980177606 |
edge | parent | length |
---|---|---|
Galago | root | 0.5463 |
HowlerMon | root | 0.1364 |
Rhesus | edge.3 | 0.0649 |
Orangutan | edge.2 | 0.0235 |
Gorilla | edge.1 | 0.0075 |
Human | edge.0 | 0.0182 |
Chimpanzee | edge.0 | 0.0085 |
edge.0 | edge.1 | 0.0000 |
edge.1 | edge.2 | 0.0099 |
edge.2 | edge.3 | 0.0364 |
edge.3 | root | 0.0233 |
AAA | AAC | AAG | AAT | ACA | ACC | ACG | ACT | AGA | AGC |
---|---|---|---|---|---|---|---|---|---|
0.0556 | 0.0235 | 0.0344 | 0.0556 | 0.0228 | 0.0046 | 0.0008 | 0.0289 | 0.0231 | 0.0286 |
AGG | AGT | ATA | ATC | ATG | ATT | CAA | CAC | CAG | CAT |
---|---|---|---|---|---|---|---|---|---|
0.0140 | 0.0381 | 0.0186 | 0.0070 | 0.0128 | 0.0192 | 0.0196 | 0.0052 | 0.0238 | 0.0221 |
CCA | CCC | CCG | CCT | CGA | CGC | CGG | CGT | CTA | CTC |
---|---|---|---|---|---|---|---|---|---|
0.0195 | 0.0062 | 0.0006 | 0.0263 | 0.0011 | 0.0009 | 0.0023 | 0.0032 | 0.0137 | 0.0078 |
CTG | CTT | GAA | GAC | GAG | GAT | GCA | GCC | GCG | GCT |
---|---|---|---|---|---|---|---|---|---|
0.0125 | 0.0105 | 0.0755 | 0.0105 | 0.0303 | 0.0315 | 0.0158 | 0.0096 | 0.0014 | 0.0137 |
GGA | GGC | GGG | GGT | GTA | GTC | GTG | GTT | TAC | TAT |
---|---|---|---|---|---|---|---|---|---|
0.0161 | 0.0090 | 0.0067 | 0.0133 | 0.0148 | 0.0070 | 0.0069 | 0.0213 | 0.0023 | 0.0101 |
TCA | TCC | TCG | TCT | TGC | TGG | TGT | TTA | TTC | TTG |
---|---|---|---|---|---|---|---|---|---|
0.0221 | 0.0082 | 0.0015 | 0.0251 | 0.0018 | 0.0040 | 0.0201 | 0.0212 | 0.0078 | 0.0108 |
TTT |
---|
0.0187 |
Getting the individual site posterior probabilities
Only the posterior probabilities for the first 20 positions are displayed here.
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-ve | 0.1491 | 0.0839 | 0.0000 | 0.1313 | 0.1146 | 0.1569 | 0.0843 | 0.1191 | 0.1020 | 0.0798 | 0.0760 | 0.1569 | 0.0937 | 0.0000 | 0.1563 | 0.5173 | 0.0798 | 0.0695 | 0.1146 | 0.1216 |
neutral | 0.7725 | 0.8127 | 0.8643 | 0.7850 | 0.7961 | 0.7667 | 0.8125 | 0.7926 | 0.8032 | 0.8141 | 0.8164 | 0.7667 | 0.8070 | 0.8655 | 0.7670 | 0.4823 | 0.8141 | 0.8196 | 0.7961 | 0.7909 |
+ve | 0.0784 | 0.1034 | 0.1357 | 0.0837 | 0.0893 | 0.0764 | 0.1032 | 0.0882 | 0.0948 | 0.1061 | 0.1076 | 0.0764 | 0.0993 | 0.1345 | 0.0766 | 0.0003 | 0.1061 | 0.1108 | 0.0893 | 0.0875 |
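One simple way to summarise these posterior probabilities is to assign each position to the bin with the largest value. The sketch below does this in plain Python for three positions, with the numbers copied by hand from the table above. It is an illustration only, not part of the app, and `most_probable_bin` is a hypothetical helper name.

```python
# Bin names in the same row order as the table above.
bins = ["-ve", "neutral", "+ve"]

# Posterior probabilities copied from the table for positions 0, 2 and 15,
# listed in the order -ve, neutral, +ve.
posteriors = {
    0: [0.1491, 0.7725, 0.0784],
    2: [0.0000, 0.8643, 0.1357],
    15: [0.5173, 0.4823, 0.0003],
}


def most_probable_bin(probs):
    """Return the name of the bin with the largest posterior probability."""
    best = max(range(len(bins)), key=lambda i: probs[i])
    return bins[best]


# Map each position to its most probable site class.
assignments = {pos: most_probable_bin(p) for pos, p in posteriors.items()}
print(assignments)
```

Of these three positions, only position 15 is assigned to the -ve class, and only narrowly (0.5173 versus 0.4823), so an assignment like this is best read alongside the probabilities themselves.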