An Extendable Software Package for Joint Bayesian Estimation of Alignments and Evolutionary Trees
StatAlign estimates alignments and phylogenies jointly. This accounts for the interdependence between the two and avoids the bias of traditional methods that rely on a single alignment, including artifacts such as those resulting from the choice of guide tree when constructing alignments.
The models behind the analysis permit the comparison of evolutionarily distant sequences. The TKF92 insertion-deletion model allows indel information to inform phylogeny estimation, and can be coupled with a wide variety of substitution models. A broad range of models for nucleotide and amino acid data is included in the package, and the plugin management system ensures that new models can be easily added.
StatAlign also includes model extension plugins capable of incorporating protein and RNA structural information, thereby increasing the reliability of the inference.
Although the increased computational cost of this approach limits its application to smaller datasets (we have tested up to 20-30 sequences), the probabilistic nature of the model allows more reliable information to be obtained from such datasets, and enables the uncertainty in the results to be quantified more accurately. We are currently working on a parallelised version that will extend the capabilities of StatAlign to much larger datasets.
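As a rough illustration of what quantifying alignment uncertainty can look like, the sketch below takes a set of sampled alignments and computes, for every pair of residues, the fraction of samples in which the two residues share a column. This is not StatAlign's own code or output format: the function names (aligned_pairs, pair_posteriors) and the representation of a sample as a list of gapped strings are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence
    (hypothetical input format). Each pair has the form
    ((seq_a, pos_a), (seq_b, pos_b)), with 0-based ungapped positions.
    """
    next_pos = [0] * len(alignment)  # next ungapped position in each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq, next_pos[seq]))
                next_pos[seq] += 1
        # Every two residues in the same column count as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def pair_posteriors(samples):
    """Estimate the posterior probability of each aligned residue pair
    as its frequency across the sampled alignments."""
    counts = Counter()
    for alignment in samples:
        counts.update(aligned_pairs(alignment))
    return {pair: n / len(samples) for pair, n in counts.items()}


if __name__ == "__main__":
    # Three toy samples for the sequences ACG and ACTG.
    samples = [
        ["AC-G", "ACTG"],
        ["A-CG", "ACTG"],
        ["AC-G", "ACTG"],
    ]
    for pair, prob in sorted(pair_posteriors(samples).items()):
        print(pair, round(prob, 3))
```

Pairs that appear in every sample receive a value of 1, while pairs whose placement varies between samples receive intermediate values. A single fixed alignment cannot provide this information.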
StatAlign is available from the Downloads page.
For more information, please see our papers on the References page.
StatAlign v3.4 is released (What's new in this version?)
Minor change: full PDB filename (excluding file extension) now used for naming of structures when no header information is provided in the PDB file.
3 Jan 2019
StatAlign v3.3 is released (What's new in this version?)
MPI-based parallel version now available, using Metropolis-coupled MCMC to improve sampling efficiency.
27 Sep 2018
Book chapter in Computational Methods in Protein Evolution (Springer), describing analyses of the globins using the parallel version of StatAlign, with example code included:
Herman JL (2019) Enhancing statistical multiple sequence alignment and tree inference using structural information. In Sikosek TX (ed.), Computational Methods in Protein Evolution, Methods in Molecular Biology, Springer, New York, NY. (PubMed; Springer website)
10 Feb 2015
StatAlign v3.2 is released (What's new in this version?)
Joint sequence-structure analysis is now possible for cases where only some sequences have known structure, expanding the range of datasets that can be analysed using the StructAlign plugin.
4 Jun 2014
Paper describing StructAlign published in Molecular Biology and Evolution:
Herman JL, Challis CJ, Novák Á, Hein J and Schmidler SC (2014) Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Molecular Biology and Evolution, 31(9):2251-2266. (PubMed; MBE website)
24 Nov 2013
StatAlign v3.1 is out! (What's new in this version?)
Proposal variance for substitution parameter is now tuned automatically. The StructAlign plugin can now read from and write to PDB files, and the maximum likelihood structural superposition is written to a PDB file. Structural B-factors and pairwise RMSDs are used to annotate the current alignment sample in GUI mode.
26 Oct 2013
StatAlign v3.0 released! (What's new in this version?)
New features include:
24 Feb 2013
StatAlign v2.1 has been released! (What's new in this version?)
Since version 2.0, StatAlign has built-in RNA-specific features, which include RNA
secondary structure prediction from multiple alignments using either a thermodynamic
approach (RNAalifold) or a Stochastic Context-Free Grammar approach (PPfold).
This methodology allows more reliable structure predictions by incorporating
alignment uncertainty. See our paper on the References
page and on PubMed.