Removing highly gapped positions
Using the omit_gap_pos app, we can remove positions from an alignment that exceed a specified proportion of gaps.
Let’s create a sample alignment with gaps.
from cogent3 import make_aligned_seqs
aln = make_aligned_seqs({"s1": "ACGA-GA-CG", "s2": "GATGATG-AT"}, moltype="dna")
aln
0 | |
s2 | GATGATG-AT |
s1 | ACGA-GA.CG |
2 x 10 dna alignment
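To see which columns are candidates for removal, it helps to compute the fraction of sequences that have a gap in each column. The following is a minimal pure-Python sketch of that arithmetic for the sample alignment above. It is not cogent3 code, and the helper name `gap_fractions` is hypothetical.

```python
# Pure-Python sketch (not cogent3): gap fraction for each alignment column.
seqs = {"s1": "ACGA-GA-CG", "s2": "GATGATG-AT"}


def gap_fractions(seqs, gap="-"):
    """Hypothetical helper: fraction of sequences with a gap in each column."""
    rows = list(seqs.values())
    n = len(rows)
    return [sum(row[i] == gap for row in rows) / n for i in range(len(rows[0]))]


fracs = gap_fractions(seqs)
print(fracs)  # column 4 is half gapped, column 7 is fully gapped
```

Only columns 4 (one gap out of two sequences) and 7 (both gapped) have a non-zero gap fraction, so they are the only columns any threshold can remove.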
Removing highly gapped nucleotide positions
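By default, sites (alignment columns) with more than 99% gaps are excluded. For this two-sequence alignment, that removes only the column where both sequences have a gap.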
from cogent3 import get_app
omit_gap_pos_app = get_app("omit_gap_pos", moltype="dna")
result = omit_gap_pos_app(aln)
result
0 | |
s2 | GATGATGAT |
s1 | ACGA-GACG |
2 x 9 dna alignment
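The filtering rule can be sketched in plain Python as follows. This assumes, from the wording above, that a column is kept when its gap fraction is at most `allowed_frac`. The function `omit_gapped_columns` is a hypothetical illustration, not cogent3's implementation.

```python
# Pure-Python sketch (not cogent3): drop columns whose gap fraction exceeds allowed_frac.
seqs = {"s1": "ACGA-GA-CG", "s2": "GATGATG-AT"}


def omit_gapped_columns(seqs, allowed_frac=0.99, gap="-"):
    """Hypothetical helper: keep only columns with gap fraction <= allowed_frac."""
    rows = list(seqs.values())
    n = len(rows)
    keep = [
        i
        for i in range(len(rows[0]))
        if sum(row[i] == gap for row in rows) / n <= allowed_frac
    ]
    # Rebuild every sequence from the retained column indices only.
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


print(omit_gapped_columns(seqs))
# {'s1': 'ACGA-GACG', 's2': 'GATGATGAT'}
```

With the default threshold this reproduces the 2 x 9 result shown above: only the all-gap column is dropped.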
We can alter the threshold for the allowed fraction of gaps with the allowed_frac argument. Let’s create an app that excludes all aligned sites with over 49% gaps.
omit_gap_pos_app = get_app("omit_gap_pos", allowed_frac=0.49, moltype="dna")
result = omit_gap_pos_app(aln)
result
0 | |
s1 | ACGAGACG |
s2 | GATGTGAT |
2 x 8 dna alignment
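With only two sequences, a column's gap fraction can only be 0, 0.5 or 1. Any `allowed_frac` below 0.5 therefore also removes the half-gapped column. The sketch below illustrates this boundary. It assumes that a column with a gap fraction exactly equal to `allowed_frac` is kept, which we have not verified against cogent3's source. The `filter_columns` function is hypothetical.

```python
# Pure-Python sketch (not cogent3): how the threshold interacts with 2 sequences.
seqs = {"s1": "ACGA-GA-CG", "s2": "GATGATG-AT"}
names = list(seqs)
columns = list(zip(*seqs.values()))  # one tuple of characters per alignment column


def filter_columns(allowed_frac, gap="-"):
    """Hypothetical helper: keep columns whose gap fraction is <= allowed_frac."""
    kept = [col for col in columns if col.count(gap) / len(col) <= allowed_frac]
    # Column tuples are ordered like `names`, so index j selects one sequence.
    return {name: "".join(col[j] for col in kept) for j, name in enumerate(names)}


print(filter_columns(0.49))  # the 50%-gap column is dropped as well
print(filter_columns(0.5))   # exactly 50% gaps is still allowed (assumed <=)
```

The first call matches the 2 x 8 alignment above. The second retains the half-gapped column.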
Removing highly gapped codon positions
To eliminate every codon column (here a column is a triplet of nucleotides) that contains a gap character, we set the motif_length argument to 3 and allowed_frac to 0.
omit_gap_pos_app = get_app(
"omit_gap_pos", allowed_frac=0, motif_length=3, moltype="dna"
)
result = omit_gap_pos_app(aln)
result
0 | |
s1 | ACG |
s2 | GAT |
2 x 3 dna alignment
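The codon case can be sketched in the same spirit. Here a sequence counts as gapped at a codon if any of its three characters is a gap. This sketch also simply ignores a trailing partial codon, which matches the 2 x 3 result above. We have not checked how cogent3 handles either detail internally, and `omit_gapped_codons` is a hypothetical name.

```python
# Pure-Python sketch (not cogent3): drop codon columns that are too gapped.
seqs = {"s1": "ACGA-GA-CG", "s2": "GATGATG-AT"}


def omit_gapped_codons(seqs, allowed_frac=0.0, motif_length=3, gap="-"):
    """Hypothetical helper: keep codons whose gapped-sequence fraction <= allowed_frac."""
    rows = list(seqs.values())
    n = len(rows)
    # Only complete codons are considered; a trailing partial codon is ignored.
    n_codons = len(rows[0]) // motif_length
    keep = []
    for c in range(n_codons):
        start = c * motif_length
        # Count sequences with at least one gap character inside this codon.
        gapped = sum(gap in row[start:start + motif_length] for row in rows)
        if gapped / n <= allowed_frac:
            keep.append(start)
    return {
        name: "".join(seq[s:s + motif_length] for s in keep)
        for name, seq in seqs.items()
    }


print(omit_gapped_codons(seqs))
# {'s1': 'ACG', 's2': 'GAT'}
```

Only the first codon (`ACG`/`GAT`) is gap free. The second and third codons each contain a gap in at least one sequence, so `allowed_frac=0` removes them.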