natsel_zhang – a branch-site test

Note

These docs now use the new_type core objects via the following setting.

import os

# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"

This is the hypothesis test presented in Zhang et al. It evaluates the hypothesis that a set of sites has undergone positive natural selection on a pre-specified set of lineages.

For this model class, sites are assigned to bins (site classes). On every edge, sites in bin 0 evolve under purifying selection (ω < 1) and sites in bin 1 evolve neutrally (ω = 1). Sites in bins 2a and 2b evolve like those in bins 0 and 1, respectively, on the background edges, but switch to adaptive evolution (ω > 1) on the so-called foreground edges. For the current example, we’ll define the Chimpanzee and Human branches as foreground and everything else as background. The following table defines the parameter scopes.

[Table: parameter scopes]
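To make the scopes concrete, here is a minimal sketch that maps each bin to its ω on background and foreground edges. It assumes the rounded estimates from the fitted alternative model shown later on this page (ω displayed as 0.00 for bin 0 and 3.76 for bins 2a and 2b on the foreground edges); `SCOPES` and `omega_for` are hypothetical names, not part of cogent3.

```python
# Hypothetical sketch of the branch-site parameter scopes (not cogent3 API).
# The omega values are the rounded estimates reported for the fitted alt model.
OMEGA0 = 0.00  # purifying selection, shared by all edges
OMEGA1 = 1.00  # neutral evolution, fixed at 1
OMEGA2 = 3.76  # adaptive evolution, foreground edges only

FOREGROUND = {"Human", "Chimpanzee"}

# bin -> (omega on background edges, omega on foreground edges)
SCOPES = {
    "0": (OMEGA0, OMEGA0),
    "1": (OMEGA1, OMEGA1),
    "2a": (OMEGA0, OMEGA2),
    "2b": (OMEGA1, OMEGA2),
}


def omega_for(edge: str, bin_name: str) -> float:
    """Return the omega that applies to a bin on the named edge."""
    background, foreground = SCOPES[bin_name]
    return foreground if edge in FOREGROUND else background


for edge in ("Galago", "Human"):
    print(edge, [omega_for(edge, b) for b in SCOPES])
```

For Galago this prints `[0.0, 1.0, 0.0, 1.0]` and for Human `[0.0, 1.0, 3.76, 3.76]`, which matches the pattern in the edge bin params table of the fitted model.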

Note

Our implementation is not as parametrically succinct as that of Zhang et al.: it has one additional bin probability.
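In the Zhang et al. formulation, the proportions of bins 2a and 2b are not free: they share the remaining mass (1 - p0 - p1) in the ratio p0 : p1, which leaves two free bin probabilities (p0 and p1). Estimating all four bin probabilities, subject only to their summing to 1, leaves three, which is one way to account for the extra parameter mentioned in the note (our interpretation). The sketch below applies the Zhang et al. constraint to the rounded estimates p0 = 0.05 and p1 = 0.01 reported for the fitted model later on this page; `zhang_bin_probs` is a hypothetical helper, not a cogent3 function.

```python
def zhang_bin_probs(p0: float, p1: float) -> dict:
    """Bin proportions under the Zhang et al. constraint (hypothetical helper).

    Bins 2a and 2b split the remaining mass 1 - p0 - p1 in the ratio p0 : p1.
    """
    remainder = 1.0 - p0 - p1
    return {
        "0": p0,
        "1": p1,
        "2a": remainder * p0 / (p0 + p1),
        "2b": remainder * p1 / (p0 + p1),
    }


# rounded estimates for bins 0 and 1 from the fitted alt model
constrained = zhang_bin_probs(0.05, 0.01)
for name, prob in constrained.items():
    print(f"{name:>2}  {prob:.4f}")
```

Under that constraint bin 2a would have to be five times larger than bin 2b (about 0.7833 versus 0.1567), whereas the unconstrained fit reports 0.04 and 0.89.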

from cogent3 import get_app

loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")

zhang_test = get_app("natsel_zhang",
    "GNC",
    tree="data/primate_brca1.tree",
    optimise_motif_probs=False,
    tip1="Human",
    tip2="Chimpanzee",
)

result = zhang_test(aln)
result
Statistics
LR      df  pvalue
4.9553  3   0.1751

hypothesis  key         lnL         nfp  DLC   unique_Q
null        'GNC-null'  -6708.3129  24   True  True
alt         'GNC-alt'   -6705.8352  27   True  True
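The statistics above are consistent with a standard likelihood ratio test: LR is twice the difference in log-likelihood between the alternative and null fits, and the p-value is the upper tail of a chi-squared distribution whose df is the difference in the number of free parameters (27 - 24 = 3). The following stdlib-only sketch reproduces these numbers from the rounded lnL values in the table; `chi2_sf_odd_df` is our own helper that uses the closed form of the chi-squared survival function for odd df.

```python
import math


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variate, odd df only (closed form)."""
    if df % 2 != 1:
        raise ValueError("this sketch only handles odd degrees of freedom")
    half_x = x / 2.0
    total = math.erfc(math.sqrt(half_x))
    # add exp(-x/2) * (x/2)**(k - 1/2) / Gamma(k + 1/2) for k = 1 .. (df - 1) / 2
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half_x) * half_x ** (k - 0.5) / math.gamma(k + 0.5)
    return total


# rounded values from the hypothesis table
lnL_null, nfp_null = -6708.3129, 24
lnL_alt, nfp_alt = -6705.8352, 27

LR = 2 * (lnL_alt - lnL_null)
df = nfp_alt - nfp_null
pvalue = chi2_sf_odd_df(LR, df)
print(f"LR={LR:.4f} df={df} pvalue={pvalue:.4f}")
```

Because the lnL values are rounded to four decimals, LR comes out as 4.9554 rather than 4.9553; the p-value agrees with the table to within that rounding.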
result.alt.lf

GNC-alt

log-likelihood = -6705.8352

number of free parameters = 27

Global params
A>C   A>G   A>T   C>A   C>G   C>T   G>A   G>C   G>T   T>A   T>C
0.86  3.53  0.97  1.66  2.19  6.26  8.01  1.24  0.79  1.27  2.96
Bin params
bin  bprobs
0    0.05
1    0.01
2a   0.04
2b   0.89
Edge params
edge        parent  length
Galago      root    0.54
HowlerMon   root    0.14
Rhesus      edge.3  0.06
Orangutan   edge.2  0.02
Gorilla     edge.1  0.01
Human       edge.0  0.02
Chimpanzee  edge.0  0.01
edge.0      edge.1  0.00
edge.1      edge.2  0.01
edge.2      edge.3  0.04
edge.3      root    0.02
Edge bin params
edge        bin  omega
Galago      0    0.00
Galago      1    1.00
Galago      2a   0.00
Galago      2b   1.00
HowlerMon   0    0.00
HowlerMon   1    1.00
HowlerMon   2a   0.00
HowlerMon   2b   1.00
Rhesus      0    0.00
Rhesus      1    1.00
Rhesus      2a   0.00
Rhesus      2b   1.00
Orangutan   0    0.00
Orangutan   1    1.00
Orangutan   2a   0.00
Orangutan   2b   1.00
Gorilla     0    0.00
Gorilla     1    1.00
Gorilla     2a   0.00
Gorilla     2b   1.00
Human       0    0.00
Human       1    1.00
Human       2a   3.76
Human       2b   3.76
Chimpanzee  0    0.00
Chimpanzee  1    1.00
Chimpanzee  2a   3.76
Chimpanzee  2b   3.76
edge.0      0    0.00
edge.0      1    1.00
edge.0      2a   0.00
edge.0      2b   1.00
edge.1      0    0.00
edge.1      1    1.00
edge.1      2a   0.00
edge.1      2b   1.00
edge.2      0    0.00
edge.2      1    1.00
edge.2      2a   0.00
edge.2      2b   1.00
edge.3      0    0.00
edge.3      1    1.00
edge.3      2a   0.00
edge.3      2b   1.00
Motif params
AAA   AAC   AAG   AAT   ACA   ACC   ACG   ACT   AGA   AGC   AGG   AGT   ATA
0.06  0.02  0.03  0.06  0.02  0.00  0.00  0.03  0.02  0.03  0.01  0.04  0.02
continuation
ATC   ATG   ATT   CAA   CAC   CAG   CAT   CCA   CCC   CCG   CCT   CGA   CGC
0.01  0.01  0.02  0.02  0.01  0.02  0.02  0.02  0.01  0.00  0.03  0.00  0.00
continuation
CGG   CGT   CTA   CTC   CTG   CTT   GAA   GAC   GAG   GAT   GCA   GCC   GCG
0.00  0.00  0.01  0.01  0.01  0.01  0.08  0.01  0.03  0.03  0.02  0.01  0.00
continuation
GCT   GGA   GGC   GGG   GGT   GTA   GTC   GTG   GTT   TAC   TAT   TCA   TCC
0.01  0.02  0.01  0.01  0.01  0.01  0.01  0.01  0.02  0.00  0.01  0.02  0.01
continuation
TCG   TCT   TGC   TGG   TGT   TTA   TTC   TTG   TTT
0.00  0.03  0.00  0.00  0.02  0.02  0.01  0.01  0.02

Getting the posterior probabilities of site-class membership

bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
bin  0       1       2       3       4       5       6       7       8       9
0    0.0723  0.0407  0.0000  0.0638  0.0558  0.0763  0.0409  0.0579  0.0494  0.0391
1    0.0117  0.0124  0.0135  0.0119  0.0121  0.0116  0.0124  0.0120  0.0122  0.0125
2a   0.0603  0.0349  0.0000  0.0536  0.0471  0.0635  0.0351  0.0488  0.0420  0.0337
2b   0.8556  0.9120  0.9865  0.8707  0.8850  0.8486  0.9116  0.8812  0.8964  0.9148

bin  10      11      12      13      14      15      16      17      18      19
0    0.0373  0.0763  0.0457  0.0000  0.0759  0.2494  0.0391  0.0338  0.0558  0.0590
1    0.0125  0.0116  0.0123  0.0135  0.0116  0.0072  0.0125  0.0126  0.0121  0.0120
2a   0.0321  0.0635  0.0390  0.0000  0.0632  0.2146  0.0337  0.0292  0.0471  0.0498
2b   0.9180  0.8486  0.9030  0.9865  0.8492  0.5287  0.9148  0.9244  0.8850  0.8792
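In this table each row is a bin and, we assume, each column is an alignment (codon) position, so every column sums to 1 up to rounding. Since bins 2a and 2b are the only ones with ω > 1 on the foreground edges, adding their rows gives, per position, the posterior probability of belonging to a class that is adaptive on the Human and Chimpanzee lineages. The sketch below does that in plain Python on the 20 positions displayed; the values are copied from the table rather than read from the `bprobs` object, and the 0.96 cutoff is an arbitrary choice for illustration.

```python
# Posterior bin probabilities for the first 20 positions, copied from the table.
posteriors = {
    "0": [0.0723, 0.0407, 0.0000, 0.0638, 0.0558, 0.0763, 0.0409, 0.0579, 0.0494, 0.0391,
          0.0373, 0.0763, 0.0457, 0.0000, 0.0759, 0.2494, 0.0391, 0.0338, 0.0558, 0.0590],
    "1": [0.0117, 0.0124, 0.0135, 0.0119, 0.0121, 0.0116, 0.0124, 0.0120, 0.0122, 0.0125,
          0.0125, 0.0116, 0.0123, 0.0135, 0.0116, 0.0072, 0.0125, 0.0126, 0.0121, 0.0120],
    "2a": [0.0603, 0.0349, 0.0000, 0.0536, 0.0471, 0.0635, 0.0351, 0.0488, 0.0420, 0.0337,
           0.0321, 0.0635, 0.0390, 0.0000, 0.0632, 0.2146, 0.0337, 0.0292, 0.0471, 0.0498],
    "2b": [0.8556, 0.9120, 0.9865, 0.8707, 0.8850, 0.8486, 0.9116, 0.8812, 0.8964, 0.9148,
           0.9180, 0.8486, 0.9030, 0.9865, 0.8492, 0.5287, 0.9148, 0.9244, 0.8850, 0.8792],
}
CUTOFF = 0.96  # arbitrary illustrative threshold

n_sites = len(posteriors["2b"])
# sanity check: each position's posteriors should sum to ~1 (values are rounded)
for site in range(n_sites):
    total = sum(row[site] for row in posteriors.values())
    if abs(total - 1.0) > 1e-3:
        raise ValueError(f"position {site} sums to {total}")

# probability that a position is in a bin with omega > 1 on the foreground edges
p_foreground_adaptive = [a + b for a, b in zip(posteriors["2a"], posteriors["2b"])]
selected = [site for site, p in enumerate(p_foreground_adaptive) if p >= CUTOFF]
print(selected)
```

With these values, positions 2 and 13 (0-based column indices) pass the cutoff.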

Getting all the statistics in tabular form

tab = get_app("tabulate_stats")
stats = tab(result.alt)
stats
5x tabular_result('global params': Table, 'bin params': Table, 'edge params': Table, 'edge bin params': Table, 'motif params': Table)
stats["edge bin params"][:10]  # truncating the table
edge bin params
edge       bin  omega
Galago     0    0.00
Galago     1    1.00
Galago     2a   0.00
Galago     2b   1.00
HowlerMon  0    0.00
HowlerMon  1    1.00
HowlerMon  2a   0.00
HowlerMon  2b   1.00
Rhesus     0    0.00
Rhesus     1    1.00

10 rows x 3 columns