natsel_sitehet – a test of site heterogeneity

This app evaluates evidence for whether sites differ in their mode of natural selection (Nielsen and Yang 1998).

from cogent3.app import io, evo

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

sites_differ = evo.natsel_sitehet(
    "GNC", tree="data/primate_brca1.tree", optimise_motif_probs=False
)

result = sites_differ(aln)
result
Statistics
LRdfpvalue
1.404820.4954
hypothesiskeylnLnfpDLCunique_Q
null'GNC-null'-6708.311924TrueTrue
alt'GNC-alt'-6707.609526TrueTrue

The models have been constructed such that site-class bins have names indicating the mode of natural selection: -ve is purifying (omega<1); neutral (omega=1); and +ve is positive natural selection (omega>1). The two parameters of interest relating to these are the bprobs (the maximum likelihood estimate of the frequency of the site-class) and the corresponding value of omega.

result.alt.lf

GNC-alt

log-likelihood = -6707.6095

number of free parameters = 26

Global params
A>CA>GA>TC>AC>GC>TG>AG>CG>TT>A
0.85303.56430.97341.64032.17996.32148.08081.23460.78291.2797
continuation
T>C
3.0290
Bin params
binbprobsomega
-ve0.10431.0000000830830555e-06
neutral0.80521.0
+ve0.090519.999999997429406
Edge params
edgeparentlength
Galagoroot0.5463
HowlerMonroot0.1364
Rhesusedge.30.0649
Orangutanedge.20.0235
Gorillaedge.10.0075
Humanedge.00.0182
Chimpanzeeedge.00.0085
edge.0edge.10.0000
edge.1edge.20.0099
edge.2edge.30.0364
edge.3root0.0233
Motif params
AAAAACAAGAATACAACCACGACTAGAAGC
0.05560.02350.03440.05560.02280.00460.00080.02890.02310.0286
continuation
AGGAGTATAATCATGATTCAACACCAGCAT
0.01400.03810.01860.00700.01280.01920.01960.00520.02380.0221
continuation
CCACCCCCGCCTCGACGCCGGCGTCTACTC
0.01950.00620.00060.02630.00110.00090.00230.00320.01370.0078
continuation
CTGCTTGAAGACGAGGATGCAGCCGCGGCT
0.01250.01050.07550.01050.03030.03150.01580.00960.00140.0137
continuation
GGAGGCGGGGGTGTAGTCGTGGTTTACTAT
0.01610.00900.00670.01330.01480.00700.00690.02130.00230.0101
continuation
TCATCCTCGTCTTGCTGGTGTTTATTCTTG
0.02210.00820.00150.02510.00180.00400.02010.02120.00780.0108
continuation
TTT
0.0187

Getting the individual site posterior probabilities

I’m just displaying the posterior-probabilities from the first 20 positions only.

bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
012345678910111213141516171819
-ve0.14910.08390.00000.13130.11460.15690.08430.11910.10200.07980.07600.15690.09370.00000.15630.51730.07980.06950.11460.1216
neutral0.77250.81270.86430.78510.79610.76670.81250.79270.80320.81410.81640.76670.80700.86550.76700.48230.81410.81970.79610.7909
+ve0.07840.10340.13570.08370.08930.07640.10320.08820.09480.10610.10760.07640.09930.13450.07660.00030.10610.11080.08930.0875