Sample an alignment to a fixed length
Let’s load a protein alignment of five mammalian sequences to use in the examples.
from cogent3 import get_app
loader = get_app("load_aligned", moltype="protein", format_name="phylip")
aln = loader("data/abglobin_aa.phylip")
aln
| 0 | |
| goat-cow | VLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGE |
| human | ...P...T..........AH..E....................................K |
| rabbit | ...P...T.I.T..E.I.SHG.E.....V.....G............FT...E.I.A..K |
| rat | ....D..T.I.NC...I..HG.E..E...Q...AA........S.I.V.P......A..K |
| marsupial | ...D...TH...I......H....A....A.T.................P....IQ...K |
5 x 285 (truncated to 5 x 60) protein alignment
How to sample the first n positions of an alignment
We can use the fixed_length app to sample an alignment to a fixed length. By default, it samples from the beginning of the alignment. The argument length=20 specifies how many positions to sample.
from cogent3 import get_app
first_20 = get_app("fixed_length", length=20)
first_20(aln)
| 0 | |
| goat-cow | VLSAADKSNVKAAWGKVGGN |
| human | ...P...T..........AH |
| rabbit | ...P...T.I.T..E.I.SH |
| rat | ....D..T.I.NC...I..H |
| marsupial | ...D...TH...I......H |
5 x 20 protein alignment
How to sample n positions from within an alignment
Creating the fixed_length app with the argument start=x makes the sample begin x positions into the alignment. With start=10 and length=20, the first 10 positions are skipped and the following 20 are returned.
from cogent3 import get_app
skip_10_take_20 = get_app("fixed_length", length=20, start=10)
skip_10_take_20(aln)
| 0 | |
| goat-cow | KAAWGKVGGNAGAYGAEALE |
| human | ........AH..E....... |
| rabbit | .T..E.I.SHG.E.....V. |
| rat | .NC...I..HG.E..E...Q |
| marsupial | ..I......H....A....A |
5 x 20 protein alignment
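The effect of length and start can be pictured as ordinary column slicing. The following is a minimal plain-Python sketch of that idea, not cogent3's implementation: the helper take_fixed_length is hypothetical, and only the goat-cow row shown in the first output above is used as input.

```python
# Illustration only: plain string slicing, not how cogent3 implements fixed_length.
# The sequence is the goat-cow row (first 60 positions) displayed above.
GOAT_COW = "VLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGE"


def take_fixed_length(seqs, length, start=0):
    """Return `length` columns of every sequence, beginning at column `start`.

    Hypothetical helper: `start` is zero-based, so start=10 skips 10 columns.
    """
    return {name: seq[start:start + length] for name, seq in seqs.items()}


seqs = {"goat-cow": GOAT_COW}
head = take_fixed_length(seqs, length=20)              # first 20 positions
window = take_fixed_length(seqs, length=20, start=10)  # skip 10, take 20

print(head["goat-cow"])
print(window["goat-cow"])
```

The two printed strings correspond to the goat-cow rows in the first_20 and skip_10_take_20 outputs shown above.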
How to sample n positions randomly from within an alignment
The start position can be selected at random with random=True. An optional seed can be provided to ensure the same start position is used when the app is called.
from cogent3 import get_app
random_20 = get_app("fixed_length", length=20, random=True)
random_20(aln)
| 0 | |
| goat-cow | AVAKYAFSEASLSDLQFKLA |
| human | ........KD..CE....V. |
| rabbit | .IT.....KE..CE....V. |
| rat | DIN...I.KD..C.....A. |
| marsupial | ........KD..C..T..C. |
5 x 20 protein alignment
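To see why a seed makes the random start reproducible, here is a small standard-library sketch. It is an assumption-laden illustration rather than cogent3's own code: the helper random_start is hypothetical, and the way cogent3 draws the start position internally is not described here. The alignment length of 285 is taken from the first output above.

```python
import random


def random_start(aln_length, length, seed=None):
    """Pick a zero-based start so that `length` columns fit inside the alignment.

    Hypothetical helper for illustration; seed=None leaves the generator unseeded.
    """
    if length > aln_length:
        raise ValueError("requested length exceeds the alignment length")
    rng = random.Random(seed)  # private generator, so global state is untouched
    # Valid starts are 0 .. aln_length - length inclusive.
    return rng.randrange(aln_length - length + 1)


# The same seed gives the same start position on every call.
a = random_start(285, 20, seed=42)
b = random_start(285, 20, seed=42)
print(a == b)
```

Restricting the start to at most aln_length - length is a design choice in this sketch so that the sampled window never runs past the end of the alignment.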