natsel_zhang – a branch-site test#

This is the hypothesis test presented in Zhang et al. It evaluates the hypothesis that a set of sites have undergone positive natural selection on a pre-specified set of lineages.

For this model class, there are groups of branches for which all positions are evolving neutrally but some proportion of those neutrally evolving sites change to adaptively evolving on so-called foreground edges. For the current example, we’ll define the Chimpanzee and Human branches as foreground and everything else as background. The following table defines the parameter scopes.

<IPython.core.display.HTML object>

Note

Our implementation is not as parametrically succinct as that of Zhang et al, we have 1 additional bin probability.

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

zhang_test = get_app("natsel_zhang",
    "GNC",
    tree="data/primate_brca1.tree",
    optimise_motif_probs=False,
    tip1="Human",
    tip2="Chimpanzee",
)

result = zhang_test(aln)
result
Statistics
LRdfpvalue
4.955230.1751
hypothesiskeylnLnfpDLCunique_Q
null'GNC-null'-6708.311924TrueTrue
alt'GNC-alt'-6705.834327TrueTrue
result.alt.lf

GNC-alt

log-likelihood = -6705.8343

number of free parameters = 27

Global params
A>CA>GA>TC>AC>GC>TG>AG>CG>TT>A
0.85573.53530.97471.65912.19436.26038.01251.24220.79441.2671
continuation
T>C
2.9654
Bin params
binbprobs
00.0508
10.0111
2a0.0427
2b0.8954
Edge params
edgeparentlength
Galagoroot0.5419
HowlerMonroot0.1359
Rhesusedge.30.0648
Orangutanedge.20.0235
Gorillaedge.10.0075
Humanedge.00.0182
Chimpanzeeedge.00.0085
edge.0edge.10.0000
edge.1edge.20.0099
edge.2edge.30.0365
edge.3root0.0234
Edge bin params
edgebinomega
Galago01.0000009833196434e-06
Galago11.0
Galago2a1.0000009833196434e-06
Galago2b1.0
HowlerMon01.0000009833196434e-06
HowlerMon11.0
HowlerMon2a1.0000009833196434e-06
HowlerMon2b1.0
Rhesus01.0000009833196434e-06
Rhesus11.0
Rhesus2a1.0000009833196434e-06
Rhesus2b1.0
Orangutan01.0000009833196434e-06
Orangutan11.0
Orangutan2a1.0000009833196434e-06
Orangutan2b1.0
Gorilla01.0000009833196434e-06
Gorilla11.0
Gorilla2a1.0000009833196434e-06
Gorilla2b1.0
Human01.0000009833196434e-06
Human11.0
Human2a3.7552270169714044
Human2b3.7552270169714044
Chimpanzee01.0000009833196434e-06
Chimpanzee11.0
Chimpanzee2a3.7552270169714044
Chimpanzee2b3.7552270169714044
edge.001.0000009833196434e-06
edge.011.0
edge.02a1.0000009833196434e-06
edge.02b1.0
edge.101.0000009833196434e-06
edge.111.0
edge.12a1.0000009833196434e-06
edge.12b1.0
edge.201.0000009833196434e-06
edge.211.0
edge.22a1.0000009833196434e-06
edge.22b1.0
edge.301.0000009833196434e-06
edge.311.0
edge.32a1.0000009833196434e-06
edge.32b1.0
Motif params
AAAAACAAGAATACAACCACGACTAGAAGC
0.05560.02350.03440.05560.02280.00460.00080.02890.02310.0286
continuation
AGGAGTATAATCATGATTCAACACCAGCAT
0.01400.03810.01860.00700.01280.01920.01960.00520.02380.0221
continuation
CCACCCCCGCCTCGACGCCGGCGTCTACTC
0.01950.00620.00060.02630.00110.00090.00230.00320.01370.0078
continuation
CTGCTTGAAGACGAGGATGCAGCCGCGGCT
0.01250.01050.07550.01050.03030.03150.01580.00960.00140.0137
continuation
GGAGGCGGGGGTGTAGTCGTGGTTTACTAT
0.01610.00900.00670.01330.01480.00700.00690.02130.00230.0101
continuation
TCATCCTCGTCTTGCTGGTGTTTATTCTTG
0.02210.00820.00150.02510.00180.00400.02010.02120.00780.0108
continuation
TTT
0.0187

Getting the posterior probabilities of site-class membership#

bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
012345678910111213141516171819
00.07240.04080.00000.06400.05590.07640.04100.05800.04950.03920.03740.07640.04580.00000.07610.24990.03920.03390.05590.0591
10.01060.01130.01220.01080.01100.01050.01130.01090.01110.01130.01130.01050.01120.01220.01060.00650.01130.01140.01100.0109
2a0.06020.03480.00000.05350.04700.06340.03500.04870.04190.03360.03210.06340.03890.00000.06310.21420.03360.02910.04700.0496
2b0.85670.91310.98780.87180.88610.84970.91270.88230.89750.91590.91920.84970.90410.98780.85030.52940.91590.92550.88610.8803

Getting all the statistics in tabular form#

tab = get_app("tabulate_stats")
stats = tab(result.alt)
stats
5x tabular_result('global params': Table, 'bin params': Table, 'edge params': Table, 'edge bin params': Table, 'motif params': Table)
stats["edge bin params"][:10]  # truncating the table
edge bin params
edgebinomega
Galago01.0000009833196434e-06
Galago11.0
Galago2a1.0000009833196434e-06
Galago2b1.0
HowlerMon01.0000009833196434e-06
HowlerMon11.0
HowlerMon2a1.0000009833196434e-06
HowlerMon2b1.0
Rhesus01.0000009833196434e-06
Rhesus11.0

10 rows x 3 columns