Select n sequences from a collection
Let’s load an alignment of primate sequences to use in the examples.
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln = loader("data/primate_brca1.fasta")
aln
| 0 | |
| Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
| Galago | .......A................................G................... |
| HowlerMon | ...............................................G............ |
| Rhesus | ...............................................G............ |
| Orangutan | ............................................................ |
| Gorilla | ............................................................ |
| Human | ............................................................ |
7 x 2814 (truncated to 7 x 60) dna alignment
Select the first n sequences from an alignment
Initialising take_n_seqs with the argument number=3 creates an app that returns the first 3 sequences from an alignment.
Note
“First n” refers to the order of the sequences in the FASTA file.
from cogent3 import get_app
first_3 = get_app("take_n_seqs", number=3)
first_3(aln)
| 0 | |
| Galago | TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGTTTATTACTCACTAAA |
| Rhesus | .......C................................A......G............ |
| HowlerMon | .......C................................A......G............ |
3 x 2814 (truncated to 3 x 60) dna alignment
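The "first n" rule can be illustrated without cogent3 at all. The sketch below is only a plain-Python approximation of the selection logic, not the app's implementation: `take_first` is a hypothetical helper, and the name list is copied from the `aln1.names` output shown elsewhere in this document.

```python
# Names in file order (copied from the aln1.names output in this document).
names = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human", "Chimpanzee"]


def take_first(names, number):
    """Return the first `number` names, preserving input order.

    Hypothetical helper for illustration; not part of cogent3.
    """
    if number > len(names):
        raise ValueError("not enough sequences")
    return list(names[:number])


first_three = take_first(names, 3)
print(first_three)
```

These are the same three names that appear in the app output above.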
Randomly selecting n sequences from an alignment
Setting random=True with number=3 returns 3 randomly chosen sequences. Supplying the optional seed argument makes the selection reproducible, so the same sequences are returned each time the app is called.
from cogent3 import get_app
random_n = get_app("take_n_seqs", random=True, number=3, seed=1)
random_n(aln)
| 0 | |
| Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
| Rhesus | ...............................................G............ |
| HowlerMon | ...............................................G............ |
3 x 2814 (truncated to 3 x 60) dna alignment
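The effect of a seed can be sketched with Python's standard library. This is an illustration of seeded sampling in general, assuming nothing about the random number generator cogent3 uses internally, so the names it picks will not necessarily match the app's output. `sample_names` is a hypothetical helper.

```python
import random

# Names copied from the aln1.names output in this document.
names = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human", "Chimpanzee"]


def sample_names(names, number, seed=None):
    """Sample `number` distinct names; a fixed seed gives a repeatable result.

    Hypothetical helper for illustration; not part of cogent3.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.sample(list(names), number)


a = sample_names(names, 3, seed=1)
b = sample_names(names, 3, seed=1)
print(a == b)  # True: the same seed reproduces the same selection
```

Leaving `seed=None` seeds the generator from system entropy, so repeated calls will usually differ.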
Selecting the same sequences from multiple alignments
Setting fixed_choice=True ensures that the same sequences are returned when (randomly) sampling sequences from several alignments.
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln1 = loader("data/primate_brca1.fasta")
aln2 = loader("data/brca1.fasta")
aln1.names
('Galago',
'HowlerMon',
'Rhesus',
'Orangutan',
'Gorilla',
'Human',
'Chimpanzee')
aln2.names
('FlyingFox',
'DogFaced',
'FreeTaile',
'LittleBro',
'TombBat',
'RoundEare',
'FalseVamp',
'LeafNose',
'Horse',
'Rhino',
'Pangolin',
'Cat',
'Dog',
'Llama',
'Pig',
'Cow',
'Hippo',
'SpermWhale',
'HumpbackW',
'Mole',
'Hedgehog',
'TreeShrew',
'FlyingLem',
'Galago',
'HowlerMon',
'Rhesus',
'Orangutan',
'Gorilla',
'Human',
'Chimpanzee',
'Jackrabbit',
'FlyingSqu',
'OldWorld',
'Mouse',
'Rat',
'NineBande',
'HairyArma',
'Anteater',
'Sloth',
'Dugong',
'Manatee',
'AfricanEl',
'AsianElep',
'RockHyrax',
'TreeHyrax',
'Aardvark',
'GoldenMol',
'Madagascar',
'Tenrec',
'LesserEle',
'GiantElep',
'Caenolest',
'Phascogale',
'Wombat',
'Bandicoot')
fixed_choice = get_app("take_n_seqs", number=2, random=True, fixed_choice=True)
result1 = fixed_choice(aln1).names
result2 = fixed_choice(aln2).names
result1 == result2
True
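One way to picture this behaviour is a sampler that makes its random choice once and then reuses it. The sketch below is a plain-Python approximation under that assumption, not cogent3's implementation. `FixedChoiceSampler` and the shortened `mammals` list are hypothetical, introduced only for illustration.

```python
import random


class FixedChoiceSampler:
    """Choose names on the first call, then reuse that choice.

    Hypothetical stand-in for illustration; not part of cogent3.
    """

    def __init__(self, number, seed=None):
        self._number = number
        self._rng = random.Random(seed)
        self._chosen = None  # filled in on the first call

    def __call__(self, names):
        if self._chosen is None:
            self._chosen = self._rng.sample(list(names), self._number)
        # Later collections must contain every previously chosen name.
        missing = [n for n in self._chosen if n not in names]
        if missing:
            raise KeyError(f"names not present: {missing}")
        return list(self._chosen)


primates = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human", "Chimpanzee"]
mammals = primates + ["Mouse", "Rat", "Horse"]  # shortened, illustrative superset

sampler = FixedChoiceSampler(number=2, seed=0)
result1 = sampler(primates)
result2 = sampler(mammals)
print(result1 == result2)  # True: the first choice is reused
```

In this sketch the order of calls matters: the choice is drawn from the first collection, so any later collection has to include those names, as `aln2` includes every name in `aln1` in the example above.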