Concatenating alignments

The concat app concatenates alignments, joining the sequences that share a name end to end.

from cogent3 import get_app

concat_alns_app = get_app("concat", moltype="dna")

Let’s create sample alignments with matching sequence names to use in the examples below.

from cogent3 import make_aligned_seqs

aln1 = make_aligned_seqs({"s1": "AAA", "s2": "CAA", "s3": "AAA"}, moltype="dna")
aln2 = make_aligned_seqs({"s1": "GCG", "s2": "GGG", "s3": "GGT"}, moltype="dna")
aln1
    0
s1  AAA
s2  C..
s3  ...

3 x 3 dna alignment

aln2
    0
s1  GCG
s2  .G.
s3  .GT

3 x 3 dna alignment

How to concatenate alignments

By default (intersect=True), only sequences whose names are present in every alignment are concatenated; sequences without a match are omitted.

result = concat_alns_app([aln1, aln2])
result
    0
s1  AAAGCG
s3  ....GT
s2  C...G.

3 x 6 dna alignment
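
To make the name-matching rule concrete, here is a minimal pure-Python sketch that reproduces the result above on plain dicts instead of cogent3 alignment objects. The helper name concat_shared is hypothetical, and this is an illustration of the intersect=True behaviour, not how cogent3 implements the app.

```python
def concat_shared(alns):
    """Concatenate dict-based alignments, keeping only names found in all of them.

    Hypothetical helper for illustration only; not part of cogent3.
    """
    # Collect the names that occur in every alignment.
    shared = set(alns[0])
    for aln in alns[1:]:
        shared &= set(aln)
    # Join each shared sequence across the alignments, in the order given.
    return {name: "".join(aln[name] for aln in alns) for name in sorted(shared)}


aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln2 = {"s1": "GCG", "s2": "GGG", "s3": "GGT"}
result = concat_shared([aln1, aln2])
print(result)
```

The rendered alignment shows "." wherever a sequence matches the first row, so s3 displayed as "....GT" corresponds to "AAAGGT" in the dict.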

How to concatenate alignments with missing sequences

With intersect=False, the concat app keeps every sequence name that occurs in any of the alignments. Where a name is absent from an alignment, that segment is filled with "?" characters.

from cogent3 import make_aligned_seqs, get_app

concat_missing = get_app("concat", moltype="dna", intersect=False)
aln3 = make_aligned_seqs({"s4": "GCG", "s5": "GGG"}, moltype="dna")
result = concat_missing([aln1, aln3])
result
    0
s1  AAA???
s2  C.....
s4  ???GCG
s3  ......
s5  ???GGG

5 x 6 dna alignment
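
The padding rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not cogent3's implementation: concat_union is a hypothetical helper, alignments are plain dicts, and every alignment is assumed to be non-empty with sequences of equal length.

```python
def concat_union(alns, missing="?"):
    """Concatenate dict-based alignments, padding absent names with `missing`.

    Hypothetical helper for illustration only; not part of cogent3.
    Assumes each alignment is non-empty and its sequences share one length.
    """
    # Gather every name seen in any alignment, preserving first-seen order.
    names = []
    for aln in alns:
        for name in aln:
            if name not in names:
                names.append(name)

    result = {name: "" for name in names}
    for aln in alns:
        # All sequences in one alignment have the same length.
        length = len(next(iter(aln.values())))
        for name in names:
            # Use the real sequence if present, else a run of the missing character.
            result[name] += aln.get(name, missing * length)
    return result


aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln3 = {"s4": "GCG", "s5": "GGG"}
padded = concat_union([aln1, aln3])
print(padded)
```

Every sequence in the output has the combined length of the inputs, which is why the result above is a 5 x 6 alignment.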

How to concatenate alignments with a delimiter "N"

Use the join_seq argument to insert a delimiter, here an "N" character, between the concatenated sequences.

from cogent3 import get_app

concat_delim = get_app("concat", join_seq="N", moltype="dna")
result = concat_delim([aln1, aln2])
result
    0
s1  AAANGCG
s3  .....GT
s2  C....G.

3 x 7 dna alignment
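
As a rough illustration of what the delimiter does, the sketch below joins dict-based alignments with a separator string. The helper concat_with_delimiter is hypothetical and only mimics the result above; its join_seq parameter is named after the app argument for readability.

```python
def concat_with_delimiter(alns, join_seq="N"):
    """Join sequences shared by all dict-based alignments, separated by join_seq.

    Hypothetical helper for illustration only; not part of cogent3.
    """
    # Keep only the names present in every alignment.
    shared = set(alns[0]).intersection(*alns[1:])
    # str.join places the delimiter between segments, never at the ends.
    return {name: join_seq.join(aln[name] for aln in alns) for name in sorted(shared)}


aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln2 = {"s1": "GCG", "s2": "GGG", "s3": "GGT"}
joined = concat_with_delimiter([aln1, aln2])
print(joined)
```

With two alignments of length 3 and a one-character delimiter, each joined sequence has length 7, matching the 3 x 7 alignment above.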