natsel_sitehet – a test of site heterogeneity#
This app evaluates evidence for whether sites differ in their mode of natural selection (Nielsen and Yang 1998).
from cogent3 import get_app
loader = get_app("load_aligned", format_name="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
sites_differ = get_app("natsel_sitehet",
"GNC", tree="data/primate_brca1.tree", optimise_motif_probs=False
)
result = sites_differ(aln)
result
| LR | df | pvalue |
|---|---|---|
| 1.4048 | 2 | 0.4954 |
| hypothesis | key | lnL | nfp | DLC | unique_Q |
|---|---|---|---|---|---|
| null | 'GNC-null' | -6708.3129 | 24 | True | True |
| alt | 'GNC-alt' | -6707.6104 | 26 | True | True |
The models have been constructed such that site-class bins have names indicating the mode of natural selection: -ve is purifying (oomega<1); neutral (omega=1); and +ve is positive natural selection (omega>1). The two parameters of interest relating to these are the bprobs (the maximum likelihood estimate of the frequency of the site-class) and the corresponding value of omega.
result.alt.lf
GNC-alt
log-likelihood = -6707.6104
number of free parameters = 26
| A>C | A>G | A>T | C>A | C>G | C>T | G>A | G>C | G>T | T>A | T>C |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.85 | 3.56 | 0.97 | 1.64 | 2.18 | 6.32 | 8.08 | 1.23 | 0.78 | 1.28 | 3.03 |
| bin | bprobs | omega |
|---|---|---|
| -ve | 0.10 | 0.00 |
| neutral | 0.81 | 1.00 |
| +ve | 0.09 | 20.00 |
| edge | parent | length |
|---|---|---|
| Galago | root | 0.55 |
| HowlerMon | root | 0.14 |
| Rhesus | edge.3 | 0.06 |
| Orangutan | edge.2 | 0.02 |
| Gorilla | edge.1 | 0.01 |
| Human | edge.0 | 0.02 |
| Chimpanzee | edge.0 | 0.01 |
| edge.0 | edge.1 | 0.00 |
| edge.1 | edge.2 | 0.01 |
| edge.2 | edge.3 | 0.04 |
| edge.3 | root | 0.02 |
| AAA | AAC | AAG | AAT | ACA | ACC | ACG | ACT | AGA | AGC | AGG | AGT | ATA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.06 | 0.02 | 0.03 | 0.06 | 0.02 | 0.00 | 0.00 | 0.03 | 0.02 | 0.03 | 0.01 | 0.04 | 0.02 |
| ATC | ATG | ATT | CAA | CAC | CAG | CAT | CCA | CCC | CCG | CCT | CGA | CGC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.01 | 0.02 | 0.02 | 0.01 | 0.02 | 0.02 | 0.02 | 0.01 | 0.00 | 0.03 | 0.00 | 0.00 |
| CGG | CGT | CTA | CTC | CTG | CTT | GAA | GAC | GAG | GAT | GCA | GCC | GCG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.08 | 0.01 | 0.03 | 0.03 | 0.02 | 0.01 | 0.00 |
| GCT | GGA | GGC | GGG | GGT | GTA | GTC | GTG | GTT | TAC | TAT | TCA | TCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.00 | 0.01 | 0.02 | 0.01 |
| TCG | TCT | TGC | TGG | TGT | TTA | TTC | TTG | TTT |
|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.03 | 0.00 | 0.00 | 0.02 | 0.02 | 0.01 | 0.01 | 0.02 |
Getting the individual site posterior probabilities#
I’m just displaying the posterior-probabilities from the first 20 positions only.
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -ve | 0.1491 | 0.0839 | 0.0000 | 0.1313 | 0.1146 | 0.1569 | 0.0843 | 0.1191 | 0.1020 | 0.0798 | 0.0760 | 0.1569 | 0.0937 | 0.0000 | 0.1563 | 0.5173 | 0.0798 | 0.0695 | 0.1146 | 0.1216 |
| neutral | 0.7725 | 0.8127 | 0.8643 | 0.7851 | 0.7961 | 0.7668 | 0.8125 | 0.7927 | 0.8032 | 0.8141 | 0.8164 | 0.7668 | 0.8070 | 0.8655 | 0.7670 | 0.4824 | 0.8141 | 0.8197 | 0.7961 | 0.7909 |
| +ve | 0.0784 | 0.1034 | 0.1357 | 0.0837 | 0.0893 | 0.0764 | 0.1032 | 0.0882 | 0.0948 | 0.1061 | 0.1076 | 0.0764 | 0.0993 | 0.1345 | 0.0766 | 0.0003 | 0.1061 | 0.1108 | 0.0893 | 0.0875 |