Concatenating alignments#
The concat app provides a mechanism to concatenate alignments.
from cogent3 import get_app
concat_alns_app = get_app("concat", moltype="dna")
Let’s create sample alignments with matching sequence names to use in the examples below.
from cogent3 import make_aligned_seqs
aln1 = make_aligned_seqs({"s1": "AAA", "s2": "CAA", "s3": "AAA"}, moltype="dna")
aln2 = make_aligned_seqs({"s1": "GCG", "s2": "GGG", "s3": "GGT"}, moltype="dna")
aln1
0 | |
s1 | AAA |
s2 | C.. |
s3 | ... |
3 x 3 dna alignment
aln2
0 | |
s1 | GCG |
s2 | .G. |
s3 | .GT |
3 x 3 dna alignment
How to concatenate alignments#
By default (intersect=True), only sequences whose names occur in every alignment are kept; sequences without a matching name in the other alignments are omitted.
result = concat_alns_app([aln1, aln2])
result
0 | |
s1 | AAAGCG |
s3 | ....GT |
s2 | C...G. |
3 x 6 dna alignment
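To make the name-matching rule concrete, here is a minimal plain-Python sketch of the same idea using dictionaries in place of alignments. It is not how cogent3 implements concat; the helper name concat_shared is hypothetical, and the output order simply follows the first input, which may differ from the order the app displays.

```python
def concat_shared(alignments):
    """Join sequences by name, keeping only names found in every alignment.

    `alignments` is a list of {name: sequence} dicts (a stand-in for
    cogent3 alignment objects, used here purely for illustration).
    """
    shared = set(alignments[0])
    for aln in alignments[1:]:
        shared &= set(aln)
    # Iterate over the first alignment so the output order is predictable.
    return {
        name: "".join(aln[name] for aln in alignments)
        for name in alignments[0]
        if name in shared
    }


aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln2 = {"s1": "GCG", "s2": "GGG", "s3": "GGT"}
result = concat_shared([aln1, aln2])
print(result)
```

The joined sequences correspond to the alignment displayed above, where "." marks a position identical to the first sequence.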
How to concatenate alignments with missing sequences#
With the argument intersect=False, the concat app also keeps sequences that are absent from some of the alignments. Wherever a sequence is missing from an alignment, that segment is filled with "?" characters.
from cogent3 import make_aligned_seqs, get_app
concat_missing = get_app("concat", moltype="dna", intersect=False)
aln3 = make_aligned_seqs({"s4": "GCG", "s5": "GGG"}, moltype="dna")
result = concat_missing([aln1, aln3])
result
0 | |
s1 | AAA??? |
s2 | C..... |
s4 | ???GCG |
s3 | ...... |
s5 | ???GGG |
5 x 6 dna alignment
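The padding behaviour can be sketched in plain Python as follows. This is an illustration of the rule described above, not cogent3's implementation; the helper name concat_union and its missing parameter are hypothetical, and it assumes every sequence within one alignment has the same length.

```python
def concat_union(alignments, missing="?"):
    """Join sequences by name, padding absent sequences with `missing`.

    `alignments` is a list of {name: sequence} dicts standing in for
    cogent3 alignments; all sequences in one dict share a length.
    """
    # Collect every name once, in order of first appearance.
    names = []
    for aln in alignments:
        for name in aln:
            if name not in names:
                names.append(name)

    result = {}
    for name in names:
        parts = []
        for aln in alignments:
            length = len(next(iter(aln.values())))
            # Use the real sequence if present, otherwise a run of `missing`.
            parts.append(aln.get(name, missing * length))
        result[name] = "".join(parts)
    return result


aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln3 = {"s4": "GCG", "s5": "GGG"}
padded = concat_union([aln1, aln3])
print(padded)
```

Each name ends up with a sequence of the full concatenated length, which is why the result above is reported as a 5 x 6 alignment.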
How to concatenate alignments with a delimiter "N"#
You can insert an "N" character between the concatenated sequences.
from cogent3 import get_app
concat_delim = get_app("concat", join_seq="N", moltype="dna")
result = concat_delim([aln1, aln2])
result
0 | |
s1 | AAANGCG |
s3 | .....GT |
s2 | C....G. |
3 x 7 dna alignment
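The extra column in the result comes from the delimiter: two alignments of length 3 joined by one "N" give 3 + 1 + 3 = 7 positions. A small plain-Python check of that arithmetic, using a hypothetical helper joined_length that is not part of cogent3:

```python
def joined_length(lengths, join_seq=""):
    """Length of the concatenation of segments with `join_seq` between them."""
    if not lengths:
        return 0
    # One delimiter goes between each adjacent pair of segments.
    return sum(lengths) + (len(lengths) - 1) * len(join_seq)


# Two alignments of length 3, as in aln1 and aln2 above.
without_delim = joined_length([3, 3])
with_delim = joined_length([3, 3], join_seq="N")

# The same thing on an actual pair of sequences for s1.
s1_joined = "N".join(["AAA", "GCG"])
print(without_delim, with_delim, s1_joined)
```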