StatAlign – Home

An Extendable Software Package for Joint Bayesian Estimation of Alignments and Evolutionary Trees

What is StatAlign?

StatAlign is an extendable software package for Bayesian analysis of protein, DNA and RNA sequences. Multiple alignments, phylogenetic trees and evolutionary parameters are co-estimated in a Markov chain Monte Carlo (MCMC) framework, allowing for reliable quantification of the uncertainty in these estimates.

This approach accounts for the interdependence between the estimation of alignments and phylogenies, avoiding the bias inherent in traditional methods that rely on a single alignment. In particular, it avoids artifacts such as those resulting from the choice of guide tree when constructing alignments.

The models behind the analysis permit the comparison of evolutionarily distant sequences. The TKF92 insertion-deletion model allows indel information to inform phylogeny estimation, and can be coupled with a wide variety of substitution models. A broad range of models for nucleotide and amino acid data is included in the package, and the plugin management system ensures that new models can be easily added.

StatAlign also includes model extension plugins capable of incorporating protein and RNA structural information, thereby increasing the reliability of the inference.

Although the increased computational cost of this approach limits its application to smaller datasets (we have tested it on up to 20-30 sequences), the probabilistic nature of the model allows for more reliable information to be obtained from smaller datasets, and enables the uncertainty in the results to be quantified more accurately. We are currently working on a parallelised version that will extend the capabilities of StatAlign to much larger datasets.

StatAlign is available from the Downloads page.

For more information, please see our papers on the References page.

Recent news

12 Apr 2020

StatAlign v3.4 is released (What's new in this version?)

Minor change: full PDB filename (excluding file extension) now used for naming of structures when no header information is provided in the PDB file.
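The naming rule can be illustrated with a short Python sketch. It is not taken from the StatAlign source: the function name `structure_name` is hypothetical, and "header information" is assumed here to mean the ID code in columns 63-66 of a PDB `HEADER` record.

```python
from pathlib import Path


def structure_name(pdb_path, pdb_lines):
    """Pick a display name for a structure (hypothetical helper).

    Assumption: "header information" means the ID code stored in
    columns 63-66 of the PDB HEADER record.
    """
    for line in pdb_lines:
        if line.startswith("HEADER"):
            id_code = line[62:66].strip()  # columns 63-66, zero-based slice
            if id_code:
                return id_code
            break  # HEADER present but without an ID code
    # Fallback: full filename without its final extension.
    return Path(pdb_path).stem
```

With this sketch, a file such as `globin_A.pdb` that has no `HEADER` record would be named `globin_A`.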

3 Jan 2019

StatAlign v3.3 is released (What's new in this version?)

MPI-based parallel version now available, using Metropolis-coupled MCMC to improve sampling efficiency.
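For readers unfamiliar with Metropolis-coupled MCMC, the sketch below shows the generic state-swap step between two chains run at different temperatures. It is a textbook illustration in Python, not StatAlign's implementation; the function names and the inverse-temperature parameters `betas` are assumptions made for the example.

```python
import math
import random


def swap_log_ratio(log_post_i, log_post_j, beta_i, beta_j):
    """Log acceptance ratio for exchanging the states of two tempered chains.

    Chain k targets posterior**beta_k, so swapping states x_i and x_j gives
    (beta_i - beta_j) * (log_post_j - log_post_i).
    """
    return (beta_i - beta_j) * (log_post_j - log_post_i)


def try_swap(states, log_posts, betas, i, j, rng=random):
    """Propose swapping the states of chains i and j; return True if accepted."""
    log_ratio = swap_log_ratio(log_posts[i], log_posts[j], betas[i], betas[j])
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        # Exchange states and their cached (untempered) log posteriors.
        states[i], states[j] = states[j], states[i]
        log_posts[i], log_posts[j] = log_posts[j], log_posts[i]
        return True
    return False
```

Heated chains (with `beta` below 1) see a flattened posterior and move between modes more freely; accepted swaps pass those states to the cold chain (`beta = 1`), which is the only chain whose samples are kept.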

27 Sept 2018

Book chapter in Computational Methods in Protein Evolution (Springer), describing analyses of the globins using the parallel version of StatAlign, with example code included:

Herman JL (2019) Enhancing statistical multiple sequence alignment and tree inference using structural information. In Computational Methods in Protein Evolution, Methods in Molecular Biology, T.X. Sikosek ed., Springer, New York, NY. (PubMed, Springer website)

10 Feb 2015

StatAlign v3.2 is released (What's new in this version?)

Joint sequence-structure analysis is now possible for cases where only some sequences have known structure, expanding the range of datasets that can be analysed using the StructAlign plugin.

4 Jun 2014

Paper describing StructAlign published in Molecular Biology and Evolution:

Herman JL, Challis CJ, Novák Á, Hein J and Schmidler SC (2014) Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Molecular Biology and Evolution, 31(9):2251-2266. (PubMed, MBE website)

24 Nov 2013

StatAlign v3.1 is out! (What's new in this version?)

Proposal variance for substitution parameter is now tuned automatically. The StructAlign plugin can now read from and write to PDB files, and the maximum likelihood structural superposition is written to a PDB file. Structural B-factors and pairwise RMSDs are used to annotate the current alignment sample in GUI mode.
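As an illustration of what automatic tuning of a proposal variance can look like, here is a minimal Python sketch of a random-walk Metropolis sampler that rescales its proposal standard deviation after each batch of moves. The rule, the target acceptance rate of 0.44 (a conventional choice for one-dimensional random-walk proposals) and all names are assumptions for the example; the scheme StatAlign actually uses may differ.

```python
import math
import random


def adapt_sd(sd, acceptance_rate, target=0.44, gain=1.0):
    """Widen the proposal if too many moves are accepted, narrow it otherwise."""
    return sd * math.exp(gain * (acceptance_rate - target))


def tuned_random_walk(log_density, x0, n_batches=200, batch_size=50, seed=0):
    """Random-walk Metropolis with batch-wise tuning of the proposal sd."""
    rng = random.Random(seed)
    x, lp, sd = x0, log_density(x0), 1.0
    for batch in range(1, n_batches + 1):
        accepted = 0
        for _ in range(batch_size):
            y = x + rng.gauss(0.0, sd)
            lp_y = log_density(y)
            # Standard Metropolis acceptance for a symmetric proposal.
            if lp_y >= lp or rng.random() < math.exp(lp_y - lp):
                x, lp = y, lp_y
                accepted += 1
        # Diminishing gain so that the adaptation settles down over time.
        sd = adapt_sd(sd, accepted / batch_size, gain=1.0 / math.sqrt(batch))
    return x, sd
```

In practice, tuning of this kind is usually confined to the burn-in phase, or uses a gain that shrinks over time as above, so that the samples retained afterwards come from a chain with a stable proposal.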

26 Oct 2013

StatAlign v3.0 released! (What's new in this version?)

New features include:

Option to keep tree fixed when phylogeny is known
StructAlign plugin for alignment of protein structures
Introduced framework for handling extensions to the evolutionary model
Significant improvements to mixing on tree topologies
Several new types of MCMC moves to improve mixing and convergence
Automatic tuning for MCMC moves
Restructured code to allow for easy addition of new MCMC move objects
Code now allows for prior distributions to be easily specified
Selection of standard priors added (Gaussian, Gamma, etc.)

24 Feb 2013

StatAlign v2.1 has been released! (What's new in this version?)

Since version 2.0, StatAlign has had built-in RNA-specific features, including RNA secondary structure prediction from multiple alignments using either a thermodynamic approach (RNAalifold) or a stochastic context-free grammar approach (PPfold). Because these predictions incorporate alignment uncertainty, they are more reliable than predictions based on a single alignment. See our paper on the References page and on PubMed.