Using a codon model

We load the unaligned sequences we will use in our examples.

from cogent3.app import io

reader = io.load_unaligned(format="fasta")
seqs = reader("data/SCA1-cds.fasta")

Codon alignment with default settings

By default, a guide tree is estimated from the percent identity between the sequences, and the “codon” model is MG94HKY.

from cogent3.app.align import progressive_align

codon_aligner = progressive_align("codon")
aligned = codon_aligner(seqs)
aligned
             0
Human        ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC
Mouse Lemur  ..................................................T.........
Rat          ........T.................T....................A..T.........
Mouse        ...............................................A..T.........
Chimp        ............................................................
Macaque      ............................................................

6 x 2478 (truncated to 6 x 60) dna alignment
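In the display above, one row (Human here) is written out in full and the other rows show "." wherever they match it. The following is a minimal sketch of how such a row can be expanded back into an explicit sequence. The helper name expand_row is hypothetical and not part of cogent3, and the two strings are just the first 12 columns copied from the output above.

```python
def expand_row(reference: str, row: str) -> str:
    """Replace each '.' in row with the character at the same column of reference."""
    if len(reference) != len(row):
        raise ValueError("rows must have the same length")
    return "".join(ref if char == "." else char for ref, char in zip(reference, row))


human = "ATGAAATCCAAC"  # first 12 columns of the Human row
rat = "........T..."  # first 12 columns of the Rat row

print(expand_row(human, rat))  # ATGAAATCTAAC
```

In this excerpt the Rat sequence differs from Human only at the ninth column, the third position of the third codon.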

The parameters used to construct the alignment, including the guide tree and substitution model, are recorded in the info attribute.

aligned.info
{'Refs': {},
 'source': 'data/SCA1-cds.fasta',
 'align_params': {'omega': 0.4,
  'kappa': 3,
  'indel_length': 0.1,
  'indel_rate': 1e-10,
  'guide_tree': '((Mouse_Lemur:0.015105797134349978,(Rat:0.00654034335485719,Mouse:0.00754218023336982):0.025249030951039707):0.009160143497857723,(Chimp:0.0010087075768492318,Human:0.0007811864063533017):0.0025329760417166156,Macaque:0.0031170774657144143);',
  'model': 'MG94HKY',
  'lnL': -6513.5254663912865}}
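
The recorded settings can be read back like an ordinary nested dictionary. The sketch below assumes that info supports dict-style key access, and it uses a plain dict copied from the printed output above rather than a live alignment, so it runs without the data file.

```python
# Copied from the printed output above; in a live session this would be aligned.info
info = {
    "Refs": {},
    "source": "data/SCA1-cds.fasta",
    "align_params": {
        "omega": 0.4,
        "kappa": 3,
        "indel_length": 0.1,
        "indel_rate": 1e-10,
        "guide_tree": "((Mouse_Lemur:0.015105797134349978,(Rat:0.00654034335485719,Mouse:0.00754218023336982):0.025249030951039707):0.009160143497857723,(Chimp:0.0010087075768492318,Human:0.0007811864063533017):0.0025329760417166156,Macaque:0.0031170774657144143);",
        "model": "MG94HKY",
        "lnL": -6513.5254663912865,
    },
}

params = info["align_params"]
estimated_tree = params["guide_tree"]  # Newick string with branch lengths

print(params["model"])  # MG94HKY
print(estimated_tree)
```

A Newick string retrieved this way could then be supplied through the guide_tree argument of a later progressive_align call, so that several alignments share the same tree.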

Note

You can also specify unique_guides=True, which means a guide tree will be estimated for every alignment.

Specify a different distance measure for estimating the guide tree

The distance measures available are the same as for the nucleotide case (percent, TN93 or paralinear).

Note

An estimated guide tree has its branch lengths scaled so they are consistent with usage in a codon model.

codon_aligner = progressive_align("codon", distance="paralinear")
aligned = codon_aligner(seqs)
aligned
             0
Human        ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC
Chimp        ............................................................
Macaque      ............................................................
Rat          ........T.................T....................A..T.........
Mouse        ...............................................A..T.........
Mouse Lemur  ..................................................T.........

6 x 2478 (truncated to 6 x 60) dna alignment

Providing a guide tree

Note

The guide tree needs to have branch lengths, otherwise a ValueError is raised.

tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
codon_aligner = progressive_align("codon", guide_tree=tree)
aligned = codon_aligner(seqs)
aligned
             0
Human        ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC
Chimp        ............................................................
Macaque      ............................................................
Rat          ........T.................T....................A..T.........
Mouse        ...............................................A..T.........
Mouse Lemur  ..................................................T.........

6 x 2478 (truncated to 6 x 60) dna alignment
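
Because a guide tree without branch lengths raises a ValueError, it can help to inspect a Newick string before building the aligner. The following is a rough pre-check only, not cogent3's own validation. The helper tips_without_lengths is hypothetical, and it inspects tip labels only, so it does not verify lengths on internal edges.

```python
import re


def tips_without_lengths(newick: str) -> list[str]:
    """Return tip names that are not immediately followed by ':' (a branch length)."""
    # A tip label starts after '(' or ',' and ends before ':', ',', ')' or ';'.
    pattern = re.compile(r"[(,]\s*([^(),:;\s]+)\s*(?=([:,);]))")
    return [name for name, following in pattern.findall(newick) if following != ":"]


tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"

print(tips_without_lengths(tree))  # []
print(tips_without_lengths("((Chimp,Human),Macaque)"))  # ['Chimp', 'Human', 'Macaque']
```

An empty list means every tip carries a length. Any names returned identify where lengths still need to be added to the tree string.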

Specifying the gap parameters

codon_aligner = progressive_align(
    "codon", guide_tree=tree, indel_rate=0.001, indel_length=0.01
)
aligned = codon_aligner(seqs)
aligned
             0
Human        ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC
Chimp        ............................................................
Macaque      ............................................................
Rat          ........T.................T....................A..T.........
Mouse        ...............................................A..T.........
Mouse Lemur  ..................................................T.........

6 x 2478 (truncated to 6 x 60) dna alignment

Specifying the substitution model and parameters

Any codon substitution model can be used. (See cogent3.available_models().) If you provide parameter values, those must be consistent with the model definition.

codon_aligner = progressive_align(
    "CNFHKY", guide_tree=tree, param_vals=dict(omega=0.1, kappa=3)
)
aligned = codon_aligner(seqs)
aligned
             0
Human        ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC
Chimp        ............................................................
Macaque      ............................................................
Rat          ........T.................T....................A..T.........
Mouse        ...............................................A..T.........
Mouse Lemur  ..................................................T.........

6 x 2478 (truncated to 6 x 60) dna alignment

Alignment settings and file provenance are recorded in the info attribute.

aligned.info
{'Refs': {},
 'source': 'data/SCA1-cds.fasta',
 'align_params': {'omega': 0.1,
  'kappa': 3,
  'indel_length': 0.1,
  'indel_rate': 1e-10,
  'guide_tree': '((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01);',
  'model': 'CNFHKY',
  'lnL': -6211.757122737677}}