Select n sequences from a collection
Let’s load an alignment of primates to use in examples.
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln = loader("data/primate_brca1.fasta")
aln
0 | |
Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
Galago | .......A................................G................... |
HowlerMon | ...............................................G............ |
Rhesus | ...............................................G............ |
Orangutan | ............................................................ |
Gorilla | ............................................................ |
Human | ............................................................ |
7 x 2814 (truncated to 7 x 60) dna alignment
Select the first n sequences from an alignment
Initialising take_n_seqs with the argument number=3 creates an app that returns the first 3 sequences from an alignment.
Note: “first n” refers to the order in which sequences appear in the fasta file.
from cogent3 import get_app
first_3 = get_app("take_n_seqs", number=3)
first_3(aln)
0 | |
Galago | TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGTTTATTACTCACTAAA |
HowlerMon | .......C................................A......G............ |
Rhesus | .......C................................A......G............ |
3 x 2814 (truncated to 3 x 60) dna alignment
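The “first n” semantics can be made concrete with a minimal pure-Python sketch that does not use cogent3. It assumes a collection can be modelled as a plain dict mapping names to sequences, because Python dicts preserve insertion order and so stand in for file order. `take_first_n` and the toy data are hypothetical and are not part of the cogent3 API. The toy sequences are the first 8 columns of the alignment shown above.

```python
def take_first_n(seqs: dict, number: int) -> dict:
    """Hypothetical helper: return the first `number` entries of `seqs`.

    Assumes `seqs` is a dict of name -> sequence whose insertion order
    mirrors the order of records in the source file.
    """
    if number > len(seqs):
        # design choice of this sketch only; cogent3 may behave differently
        raise ValueError(f"requested {number} sequences, only {len(seqs)} available")
    names = list(seqs)[:number]  # dict iteration follows insertion order
    return {name: seqs[name] for name in names}


# toy data in "file order": first 8 alignment columns from the example above
toy = {
    "Galago": "TGTGGCAA",
    "HowlerMon": "TGTGGCAC",
    "Rhesus": "TGTGGCAC",
    "Orangutan": "TGTGGCAC",
}
print(list(take_first_n(toy, 3)))  # ['Galago', 'HowlerMon', 'Rhesus']
```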
Randomly selecting n sequences from an alignment
Using random=True and number=3 returns 3 randomly selected sequences. An optional seed argument can be provided so that the same sequences are returned each time the app is called.
from cogent3 import get_app
random_n = get_app("take_n_seqs", random=True, number=3, seed=1)
random_n(aln)
0 | |
Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
Rhesus | ...............................................G............ |
HowlerMon | ...............................................G............ |
3 x 2814 (truncated to 3 x 60) dna alignment
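Seeded selection can be sketched in plain Python with a private `random.Random` instance. This is an illustrative assumption, not cogent3's implementation. `take_random_n` is hypothetical, and the sequences are placeholders. cogent3 may use a different random number generator, so the names drawn for a given seed will generally not match the output above.

```python
import random


def take_random_n(seqs: dict, number: int, seed=None) -> dict:
    """Hypothetical helper: return `number` randomly chosen entries of `seqs`.

    A fresh random.Random is created on every call, so passing the same seed
    reproduces the same draw each time without touching the global RNG.
    """
    rng = random.Random(seed)
    names = rng.sample(list(seqs), number)  # sampling without replacement
    return {name: seqs[name] for name in names}


# placeholder data: names from the alignment above, sequences made up
toy = {
    name: "ACGT"
    for name in ["Galago", "HowlerMon", "Rhesus", "Orangutan",
                 "Gorilla", "Human", "Chimpanzee"]
}

first = take_random_n(toy, 3, seed=1)
second = take_random_n(toy, 3, seed=1)
print(list(first) == list(second))  # True: same seed, same draw
```

Re-creating the generator inside the function is what makes every call with the same seed return the same sequences. If the generator were created once and reused across calls, the successive draws would differ from call to call, but the whole series of draws would still be reproducible.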
Selecting the same sequences from multiple alignments
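Providing the argument fixed_choice=True ensures the same sequences are returned when (randomly) sampling sequences across several alignments. For this to work, the sampled names must be present in every alignment. In the example below, every name in aln1 also appears in aln2.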
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln1 = loader("data/primate_brca1.fasta")
aln2 = loader("data/brca1.fasta")
aln1.names
['Galago',
'HowlerMon',
'Rhesus',
'Orangutan',
'Gorilla',
'Human',
'Chimpanzee']
aln2.names
['FlyingFox',
'DogFaced',
'FreeTaile',
'LittleBro',
'TombBat',
'RoundEare',
'FalseVamp',
'LeafNose',
'Horse',
'Rhino',
'Pangolin',
'Cat',
'Dog',
'Llama',
'Pig',
'Cow',
'Hippo',
'SpermWhale',
'HumpbackW',
'Mole',
'Hedgehog',
'TreeShrew',
'FlyingLem',
'Galago',
'HowlerMon',
'Rhesus',
'Orangutan',
'Gorilla',
'Human',
'Chimpanzee',
'Jackrabbit',
'FlyingSqu',
'OldWorld',
'Mouse',
'Rat',
'NineBande',
'HairyArma',
'Anteater',
'Sloth',
'Dugong',
'Manatee',
'AfricanEl',
'AsianElep',
'RockHyrax',
'TreeHyrax',
'Aardvark',
'GoldenMol',
'Madagascar',
'Tenrec',
'LesserEle',
'GiantElep',
'Caenolest',
'Phascogale',
'Wombat',
'Bandicoot']
fixed_choice = get_app("take_n_seqs", number=2, random=True, fixed_choice=True)
result1 = fixed_choice(aln1).names
result2 = fixed_choice(aln2).names
result1 == result2
True
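The fixed_choice behaviour amounts to remembering the first random draw and reusing it. The sketch below is a hypothetical pure-Python stand-in: `FixedChoiceSampler` is not a cogent3 class. It again models each alignment as a dict from names to placeholder sequences.

```python
import random


class FixedChoiceSampler:
    """Hypothetical sampler: draw names at random once, then reuse them."""

    def __init__(self, number: int, seed=None):
        self.number = number
        self._rng = random.Random(seed)
        self._chosen = None  # filled in on the first call

    def __call__(self, seqs: dict) -> dict:
        if self._chosen is None:
            # first call: sample names from this collection and remember them
            self._chosen = self._rng.sample(list(seqs), self.number)
        missing = [name for name in self._chosen if name not in seqs]
        if missing:
            raise KeyError(f"sampled names absent from this collection: {missing}")
        return {name: seqs[name] for name in self._chosen}


primate_names = ["Galago", "HowlerMon", "Rhesus", "Orangutan",
                 "Gorilla", "Human", "Chimpanzee"]
# placeholder sequences; the second collection is a superset, like aln2
primates = {name: "ACGT" for name in primate_names}
mammals = {name: "ACGT" for name in ["FlyingFox", "Horse", "Cat"] + primate_names}

sampler = FixedChoiceSampler(number=2)
result1 = list(sampler(primates))
result2 = list(sampler(mammals))
print(result1 == result2)  # True
```

In this sketch the first collection seen determines the draw. Calling it on `mammals` first could therefore select a name that is absent from `primates` and raise `KeyError`. Sampling from the smaller, shared set first avoids that problem.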