natsel_sitehet – a test of site heterogeneity#

Note

These docs now use the new_type core objects via the following setting.

import os

# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"

This app evaluates evidence for whether sites differ in their mode of natural selection (Nielsen and Yang 1998).

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

sites_differ = get_app("natsel_sitehet",
    "GNC", tree="data/primate_brca1.tree", optimise_motif_probs=False
)

result = sites_differ(aln)
result
Statistics
LRdfpvalue
1.404820.4954
hypothesiskeylnLnfpDLCunique_Q
null'GNC-null'-6708.312924TrueTrue
alt'GNC-alt'-6707.610426TrueTrue

The models have been constructed such that site-class bins have names indicating the mode of natural selection: -ve is purifying (oomega<1); neutral (omega=1); and +ve is positive natural selection (omega>1). The two parameters of interest relating to these are the bprobs (the maximum likelihood estimate of the frequency of the site-class) and the corresponding value of omega.

result.alt.lf

GNC-alt

log-likelihood = -6707.6104

number of free parameters = 26

Global params
A>CA>GA>TC>AC>GC>TG>AG>CG>TT>AT>C
0.853.560.971.642.186.328.081.230.781.283.03
Bin params
binbprobsomega
-ve0.100.00
neutral0.811.00
+ve0.0920.00
Edge params
edgeparentlength
Galagoroot0.55
HowlerMonroot0.14
Rhesusedge.30.06
Orangutanedge.20.02
Gorillaedge.10.01
Humanedge.00.02
Chimpanzeeedge.00.01
edge.0edge.10.00
edge.1edge.20.01
edge.2edge.30.04
edge.3root0.02
Motif params
AAAAACAAGAATACAACCACGACTAGAAGCAGGAGTATA
0.060.020.030.060.020.000.000.030.020.030.010.040.02
continuation
ATCATGATTCAACACCAGCATCCACCCCCGCCTCGACGC
0.010.010.020.020.010.020.020.020.010.000.030.000.00
continuation
CGGCGTCTACTCCTGCTTGAAGACGAGGATGCAGCCGCG
0.000.000.010.010.010.010.080.010.030.030.020.010.00
continuation
GCTGGAGGCGGGGGTGTAGTCGTGGTTTACTATTCATCC
0.010.020.010.010.010.010.010.010.020.000.010.020.01
continuation
TCGTCTTGCTGGTGTTTATTCTTGTTT
0.000.030.000.000.020.020.010.010.02

Getting the individual site posterior probabilities#

I’m just displaying the posterior-probabilities from the first 20 positions only.

bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
012345678910111213141516171819
-ve0.14910.08390.00000.13130.11460.15690.08430.11910.10200.07980.07600.15690.09370.00000.15630.51730.07980.06950.11460.1216
neutral0.77250.81270.86430.78510.79610.76680.81250.79270.80320.81410.81640.76680.80700.86550.76700.48240.81410.81970.79610.7909
+ve0.07840.10340.13570.08370.08930.07640.10320.08820.09480.10610.10760.07640.09930.13450.07660.00030.10610.11080.08930.0875