Sample an alignment to a fixed length

Let’s load a protein alignment to use in the examples.

from cogent3 import get_app

loader = get_app("load_aligned", moltype="protein", format_name="phylip")
aln = loader("data/abglobin_aa.phylip")
aln
           0
goat-cow   VLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGE
human      ...P...T..........AH..E....................................K
rabbit     ...P...T.I.T..E.I.SHG.E.....V.....G............FT...E.I.A..K
rat        ....D..T.I.NC...I..HG.E..E...Q...AA........S.I.V.P......A..K
marsupial  ...D...TH...I......H....A....A.T.................P....IQ...K

5 x 285 (truncated to 5 x 60) protein alignment

How to sample the first n positions of an alignment

We can use the fixed_length app to sample an alignment to a fixed length. By default, it samples from the beginning of the alignment; the argument length=20 specifies how many positions to sample.

from cogent3 import get_app

first_20 = get_app("fixed_length", length=20)
first_20(aln)
           0
goat-cow   VLSAADKSNVKAAWGKVGGN
human      ...P...T..........AH
rabbit     ...P...T.I.T..E.I.SH
rat        ....D..T.I.NC...I..H
marsupial  ...D...TH...I......H

5 x 20 protein alignment

How to sample n positions from within an alignment

Passing start=x when creating the fixed_length app makes the sample begin x positions into the alignment. In the example below, start=10 skips the first 10 positions and length=20 takes the next 20.

from cogent3 import get_app

skip_10_take_20 = get_app("fixed_length", length=20, start=10)
skip_10_take_20(aln)
           0
goat-cow   KAAWGKVGGNAGAYGAEALE
human      ........AH..E.......
rabbit     .T..E.I.SHG.E.....V.
rat        .NC...I..HG.E..E...Q
marsupial  ..I......H....A....A

5 x 20 protein alignment
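
To make the indexing concrete, here is a minimal pure-Python sketch of the same windowing applied to a plain dict of strings. It is an illustration only, not how cogent3 implements fixed_length: take_window is a hypothetical helper, and the sequence is the first 60 goat-cow positions from the alignment display earlier on this page.

```python
def take_window(seqs, length, start=0):
    """Return columns [start, start + length) of every sequence (zero-based)."""
    return {name: seq[start : start + length] for name, seq in seqs.items()}


# First 60 positions of the goat-cow sequence, copied from the display above.
seqs = {
    "goat-cow": "VLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGE",
}

first_20 = take_window(seqs, length=20)
skip_10_take_20 = take_window(seqs, length=20, start=10)

print(first_20["goat-cow"])         # VLSAADKSNVKAAWGKVGGN
print(skip_10_take_20["goat-cow"])  # KAAWGKVGGNAGAYGAEALE
```

Both printed strings match the goat-cow rows in the two app outputs, which suggests start is counted from zero.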

How to sample n positions randomly from within an alignment

The start position can be selected at random with random=True. An optional seed can also be provided to make the chosen start position reproducible. The example below does not set one, so the sampled positions may differ between runs.

from cogent3 import get_app

random_20 = get_app("fixed_length", length=20, random=True)
random_20(aln)
           0
goat-cow   LKSKTSFVTLREAANGVAGA
human      ............P.......
rabbit     ..G..............S..
rat        ..A.A......D.....G..
marsupial  ...QS....M.GP.......

5 x 20 protein alignment
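
The effect of a seed can be illustrated with a short pure-Python sketch. This is an assumption about how a reproducible start position could be drawn, not a description of cogent3's internals; random_start is a hypothetical helper, and 285 is the alignment length reported when the data was loaded.

```python
import random


def random_start(aln_length, length, seed=None):
    """Pick a start so that a window of `length` fits inside `aln_length`."""
    rng = random.Random(seed)  # private generator; seed=None gives a fresh draw
    return rng.randint(0, aln_length - length)  # upper bound is inclusive


# 285 aligned positions, 20 to sample.
a = random_start(285, 20, seed=42)
b = random_start(285, 20, seed=42)

assert a == b          # same seed, same start position
assert 0 <= a <= 265   # the 20-position window always fits
```

Without a seed, repeated calls are free to return different start positions, which is why the output above is not guaranteed to be the same on every run.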