statalign.base
Class Utils

java.lang.Object
  extended by statalign.base.Utils

public class Utils
extends java.lang.Object

This class contains multi-purpose static functions.

Author:
miklos, novak, herman

Field Summary
static boolean DEBUG
          Debugging mode (various consistency checks done if on)
static boolean DOWNWEIGHT_INDEL_LIKELIHOOD
          If true then we downweight the indel contribution to the overall likelihood.
static org.apache.commons.math3.random.RandomGenerator generator
          The random number generator used throughout the program.
static double LEAF_COUNT_POW
          Power determining how much we favour realigning the larger subtree first when doing a nearest-neighbour interchange move.
static double log0
          log(0) is set to Double.NEGATIVE_INFINITY.
static double LOW_COUNT_MULTIPLIER
           
static int LOW_COUNT_THRESHOLD
           
static double MAX_ACCEPTANCE
          During the burnin, the proposalWidthControlVariable for all continuous parameters is adjusted in order to ensure that the average acceptance rate is between MIN_ACCEPTANCE and MAX_ACCEPTANCE where possible.
static int MAX_SILENT_LENGTH
           
static double MIN_ACCEPTANCE
          During the burnin, the proposalWidthControlVariable for all McmcMove objects is adjusted (if McmcMove.autoTune=true) in order to ensure that the average acceptance rate is between MIN_ACCEPTANCE and MAX_ACCEPTANCE where possible.
static double MIN_EDGE_LENGTH
           
static double MIN_SAMPLES_FOR_ACC_ESTIMATE
          Number of samples during burnin used to get a rough estimate of the current acceptance rate, for the purposes of tuning the proposal variance control parameters.
static int MIN_SEQ_LENGTH
          Minimum length for internal node sequence.
static boolean SHAKE_IF_STUCK
          If true then during the first half of the burnin if a particular McmcMove has been below its minimum acceptance rate for at least (LOW_COUNT_THRESHOLD * MIN_SAMPLES_FOR_ACC_ESTIMATE) iterations, then for the purposes of computing the acceptance ratio, we multiply the new log likelihood by LOW_COUNT_MULTIPLIER raised to a power that increases with the number of iterations beyond the threshold.
static double SILENT_INSERT_PROB
           
static double SPAN_MULTIPLIER
          During the burnin, the proposalWidthControlVariable for all continuous parameters is adjusted in order to ensure that the average acceptance rate is between MIN_ACCEPTANCE and MAX_ACCEPTANCE where possible.
static boolean USE_FULL_WINDOWS
          If this is set to true then the alignment moves operate on the whole alignment rather than selecting subwindows.
static boolean USE_INDEL_CORRECTION_FACTOR
          If true then we divide out the stationary probability of the internal nodes from the indel likelihood, as per Redelings and Suchard (2005), using the TKF92 stationary distribution defined in Thorne et al. (1992).
static boolean USE_MODEXT_EM
          If true, then ModelExtensions are allowed to offer a contribution to the emission probability used to compute the dynamic programming matrices for alignment proposals.
static boolean USE_MODEXT_UPP
          If true, then ModelExtensions are allowed to offer an upper contribution to the emission probability used to compute the dynamic programming matrices for alignment proposals.
static boolean USE_UPPER
          Whether to use information from the upper parts of the tree in order to fill out the hmm2 and hmm3 matrices.
static boolean VERBOSE
           
static double WINDOW_MULTIPLIER
          Initial value for the alignment proposal window length multiplier.
 
Method Summary
static java.lang.String[] alignmentTransformation(java.lang.String[] s, java.lang.String[] names, java.lang.String type, InputData input)
          Transforms an alignment into the prescribed format
static double calcEmProb(double[] fel, double[] aaEquDist)
          Calculates emission probability from Felsenstein likelihoods
static int chooseOne(double prob, statalign.base.MuDouble selectLogLike)
          Behaves exactly like weightedChoose(new double[]{1-prob,prob}, selectLogLike), but faster
static java.util.List<java.lang.String> classesInPackage(java.lang.String packageName)
          Finds all classes in a given package and all of its subpackages by walking through class path.
static java.lang.String convertTime(long x)
          Takes a time in milliseconds and converts to a string to be printed.
static char[] copyOf(char[] array)
           
static double[] copyOf(double[] array)
           
static int[] copyOf(int[] array)
           
static <T> java.util.List<T> findPlugins(java.lang.Class<T> superClass)
          Locates all plugins that are descendants of the specified plugin superclass.
static boolean isValidHistory(boolean p, boolean g, boolean[] neighb)
          For a tree in which g (with parent gg) has children p and u, and p has children t and b, this function determines valid possible indel states for p and g given fixed states for the neighbouring nodes.
static boolean isValidHistory(boolean p, boolean g, boolean[] neighb, boolean gIsRoot)
          For a tree in which g (with parent gg, or with no parent if gIsRoot = true) has children p and u, and p has children t and b, this function determines valid possible indel states for p and g given fixed states for the neighbouring nodes.
static <T> java.lang.Iterable<T> iterate(java.util.Enumeration<T> en)
          Makes Enumeration iterable.
static java.lang.String joinStrings(java.lang.Object[] strs, java.lang.String separator)
          Joins strings using a separator string.
static java.lang.String joinStrings(java.lang.Object[] strs, java.lang.String prefix, java.lang.String separator)
          Joins strings using a prefix and a separator string.
static int linearizerWeight(int length, statalign.base.MuDouble selectLike, double expectedLength)
          This function selects a random integer with expected value given by expectedLength.
static double linearizerWeightProb(int length, int index, double expectedLength)
          This function returns the probability of choosing a particular index with linearizerWeight.
static double logAdd(double a, double b)
          Logarithmically add two numbers
static double logBetaDensity(double x, double alpha, double beta)
           
static double logGammaDensity(double x, double shape, double rate)
           
static int logWeightedChoose(double[] logWeights)
           
static int logWeightedChoose(double[] logWeights, statalign.base.MuDouble selectLogLike)
          Equivalent to weightedChoose(weights, selectLogLike) where logWeights[i] = Math.log(weights[i]), but avoids overflows that might result from exponentiation.
static int minMax(int value, int min, int max)
           
static java.lang.String repeatedString(java.lang.String s, int n)
           
static int weightedChoose(double[] weights)
           
static int weightedChoose(double[] weights, statalign.base.MuDouble selectLogLike)
          Similar to weightedChoose(weights), but the log-probability of the selection is subtracted from the mutable double object selectLogLike, because the proposal probability appears in the denominator of the acceptance ratio. MuDouble is used to allow for a second return value; in C++ a double pointer or reference could be used instead.
static int weightedChoose(int[] weights)
          This function returns a random index, weighted by the weights in the array 'weights'.
static int weightedChoose(java.util.List<java.lang.Double> weights, statalign.base.MuDouble selectLogLike)
           
static int weightedChoose(java.util.List<java.lang.Integer> weights)
          This function returns a random index, weighted by the weights in the list 'weights'.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEBUG

public static boolean DEBUG
Debugging mode (various consistency checks done if on)


USE_FULL_WINDOWS

public static boolean USE_FULL_WINDOWS
If this is set to true then the alignment moves operate on the whole alignment rather than selecting subwindows. This is usually much slower.


USE_MODEXT_EM

public static boolean USE_MODEXT_EM
If true, then ModelExtensions are allowed to offer a contribution to the emission probability used to compute the dynamic programming matrices for alignment proposals. NB this will be switched on automatically when a suitable ModelExtension is activated. Setting to true here will render this variable constitutively active, which is unlikely to be useful.


USE_MODEXT_UPP

public static boolean USE_MODEXT_UPP
If true, then ModelExtensions are allowed to offer an upper contribution to the emission probability used to compute the dynamic programming matrices for alignment proposals. The upper contribution involves information about all vertices outside of the current subtree. NB this will be switched on automatically when a suitable ModelExtension is activated. Setting to true here will render this variable constitutively active, which is unlikely to be useful.


USE_UPPER

public static boolean USE_UPPER
Whether to use information from the upper parts of the tree in order to fill out the hmm2 and hmm3 matrices.


LEAF_COUNT_POW

public static double LEAF_COUNT_POW
Power determining how much we favour realigning the larger subtree first when doing a nearest-neighbour interchange move.


generator

public static org.apache.commons.math3.random.RandomGenerator generator
The random number generator used throughout the program. A new generator is constructed at each MCMC run using the seed in the corresponding MCMCPars object.


SPAN_MULTIPLIER

public static final double SPAN_MULTIPLIER
During the burnin, the proposalWidthControlVariable for all continuous parameters is adjusted in order to ensure that the average acceptance rate is between MIN_ACCEPTANCE and MAX_ACCEPTANCE where possible. This is done by repeatedly multiplying the proposalWidthControlVariable by SPAN_MULTIPLIER until the acceptance falls within the desired range.

See Also:
Constant Field Values

MIN_ACCEPTANCE

public static final double MIN_ACCEPTANCE
During the burnin, the proposalWidthControlVariable for all McmcMove objects is adjusted (if McmcMove.autoTune=true) in order to ensure that the average acceptance rate is between MIN_ACCEPTANCE and MAX_ACCEPTANCE where possible. This is done by repeatedly multiplying the proposalWidthControlVariable by SPAN_MULTIPLIER until the acceptance falls within the desired range.

See Also:
Constant Field Values

MAX_ACCEPTANCE

public static final double MAX_ACCEPTANCE
During the burnin, the proposalWidthControlVariable for all continuous parameters is adjusted in order to ensure that the average acceptance rate is between MIN_ACCEPTANCE and MAX_ACCEPTANCE where possible. This is done by repeatedly multiplying the proposalWidthControlVariable by SPAN_MULTIPLIER until the acceptance falls within the desired range.

See Also:
Constant Field Values
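
The tuning rule described for SPAN_MULTIPLIER, MIN_ACCEPTANCE and MAX_ACCEPTANCE can be sketched roughly as follows. This is an illustration only, not the StatAlign code: the three constant values are placeholders (the real ones are listed under Constant Field Values), and the direction of each adjustment (shrinking the width when acceptance is too low, widening it when acceptance is too high) is an assumption.

```java
/** Illustrative sketch of acceptance-rate tuning; not the StatAlign implementation. */
class TuningSketch {
    // Placeholder values chosen for illustration only.
    static final double MIN_ACCEPTANCE = 0.2;
    static final double MAX_ACCEPTANCE = 0.4;
    static final double SPAN_MULTIPLIER = 0.7;

    /**
     * Returns the adjusted proposal width given an estimated acceptance rate.
     * Assumption: a multiplier below 1 shrinks the proposal when too few moves
     * are accepted, and its inverse widens it when too many are accepted.
     */
    static double adjust(double width, double acceptanceRate) {
        if (acceptanceRate < MIN_ACCEPTANCE) {
            return width * SPAN_MULTIPLIER;
        }
        if (acceptanceRate > MAX_ACCEPTANCE) {
            return width / SPAN_MULTIPLIER;
        }
        return width; // already inside the target range
    }

    public static void main(String[] args) {
        System.out.println(adjust(1.0, 0.1)); // rate too low: width shrinks
        System.out.println(adjust(1.0, 0.3)); // rate in range: width unchanged
        System.out.println(adjust(1.0, 0.9)); // rate too high: width grows
    }
}
```

Applying adjust once per batch of MIN_SAMPLES_FOR_ACC_ESTIMATE samples would give the repeated multiplication described above.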

WINDOW_MULTIPLIER

public static double WINDOW_MULTIPLIER
Initial value for the alignment proposal window length multiplier.


MIN_SAMPLES_FOR_ACC_ESTIMATE

public static final double MIN_SAMPLES_FOR_ACC_ESTIMATE
Number of samples during burnin used to get a rough estimate of the current acceptance rate, for the purposes of tuning the proposal variance control parameters.

See Also:
Constant Field Values

log0

public static final double log0
log(0) is set to Double.NEGATIVE_INFINITY. This is used in logarithmic adding. The logarithm of an empty sum is set to this value.

See Also:
Constant Field Values

MIN_EDGE_LENGTH

public static final double MIN_EDGE_LENGTH
See Also:
Constant Field Values

MIN_SEQ_LENGTH

public static final int MIN_SEQ_LENGTH
Minimum length for internal node sequence.

See Also:
Constant Field Values

DOWNWEIGHT_INDEL_LIKELIHOOD

public static final boolean DOWNWEIGHT_INDEL_LIKELIHOOD
If true then we downweight the indel contribution to the overall likelihood.

See Also:
Constant Field Values

USE_INDEL_CORRECTION_FACTOR

public static final boolean USE_INDEL_CORRECTION_FACTOR
If true then we divide out the stationary probability of the internal nodes from the indel likelihood, as per Redelings and Suchard (2005), using the TKF92 stationary distribution defined in Thorne et al. (1992).

See Also:
Constant Field Values

LOW_COUNT_THRESHOLD

public static final int LOW_COUNT_THRESHOLD
See Also:
Constant Field Values

LOW_COUNT_MULTIPLIER

public static final double LOW_COUNT_MULTIPLIER
See Also:
Constant Field Values

SHAKE_IF_STUCK

public static final boolean SHAKE_IF_STUCK
If true then during the first half of the burnin if a particular McmcMove has been below its minimum acceptance rate for at least (LOW_COUNT_THRESHOLD * MIN_SAMPLES_FOR_ACC_ESTIMATE) iterations, then for the purposes of computing the acceptance ratio, we multiply the new log likelihood by LOW_COUNT_MULTIPLIER raised to a power that increases with the number of iterations beyond the threshold. This gradually favours the state jumping, which may be useful to avoid getting stuck in local modes during the burnin. Ideally such a scheme should not be needed, however.

See Also:
Constant Field Values

MAX_SILENT_LENGTH

public static final int MAX_SILENT_LENGTH
See Also:
Constant Field Values

SILENT_INSERT_PROB

public static final double SILENT_INSERT_PROB
See Also:
Constant Field Values

VERBOSE

public static boolean VERBOSE

Method Detail

logGammaDensity

public static double logGammaDensity(double x,
                                     double shape,
                                     double rate)
Parameters:
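x - The point at which the density is evaluated
shape - The shape parameter of the Gamma distribution
rate - The rate parameter of the Gamma distribution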
Returns:
The unnormalised log density of Gamma(x | shape, rate)

logBetaDensity

public static double logBetaDensity(double x,
                                    double alpha,
                                    double beta)
Parameters:
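x - The point at which the density is evaluated
alpha - The first shape parameter of the Beta distribution
beta - The second shape parameter of the Beta distribution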
Returns:
The unnormalised log density of Beta(x | alpha, beta)
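
As a rough guide to what "unnormalised" means for logGammaDensity and logBetaDensity, the sketch below keeps only the terms that depend on x. It is an assumption that these are exactly the terms the library keeps; the class name and the handling of points outside the support are hypothetical.

```java
/** Illustrative sketch of unnormalised log densities; not the StatAlign implementation. */
class LogDensitySketch {
    /** log of x^(shape-1) * exp(-rate*x); the normalising constant is omitted. */
    static double logGammaDensity(double x, double shape, double rate) {
        if (x <= 0) {
            return Double.NEGATIVE_INFINITY; // sketch choice: treat x <= 0 as outside the support
        }
        return (shape - 1) * Math.log(x) - rate * x;
    }

    /** log of x^(alpha-1) * (1-x)^(beta-1); the normalising constant is omitted. */
    static double logBetaDensity(double x, double alpha, double beta) {
        if (x <= 0 || x >= 1) {
            return Double.NEGATIVE_INFINITY; // sketch choice: treat the endpoints as outside the support
        }
        return (alpha - 1) * Math.log(x) + (beta - 1) * Math.log1p(-x);
    }

    public static void main(String[] args) {
        System.out.println(logGammaDensity(1.0, 2.0, 3.0)); // prints -3.0
        System.out.println(logBetaDensity(0.5, 2.0, 2.0));
    }
}
```

Dropping the constant is harmless in a Metropolis-Hastings ratio when shape, rate, alpha and beta are held fixed, since it cancels between the current and proposed states.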

linearizerWeight

public static int linearizerWeight(int length,
                                   statalign.base.MuDouble selectLike,
                                   double expectedLength)
This function selects a random integer with expected value given by expectedLength. The probability of selecting that particular index is returned in selectLike. MuDouble is used to allow for a second return value; in C++ a double pointer or reference could be used instead.

Parameters:
length - The length of the array we need.
selectLike - A mutable double object to return the selection probability
expectedLength - The expected window length.
Returns:
A random integer as described above

linearizerWeightProb

public static double linearizerWeightProb(int length,
                                          int index,
                                          double expectedLength)
This function returns the probability of choosing a particular index with linearizerWeight. The value returned is equal to mu.value when linearizerWeight(length, mu, expectedLength) returns 'index'.

Parameters:
length - Distribution parameter as in linearizerWeight
index - Selected index
expectedLength - The expected window length, as in linearizerWeight
Returns:
Probability of the selection

weightedChoose

public static int weightedChoose(int[] weights)
This function returns a random index, weighted by the weights in the array 'weights'.


weightedChoose

public static int weightedChoose(java.util.List<java.lang.Integer> weights)
This function returns a random index, weighted by the weights in the list 'weights'.


weightedChoose

public static int weightedChoose(double[] weights,
                                 statalign.base.MuDouble selectLogLike)
Similar to weightedChoose(weights), but the log-probability of the selection is subtracted from the mutable double object selectLogLike, because the proposal probability appears in the denominator of the acceptance ratio. MuDouble is used to allow for a second return value; in C++ a double pointer or reference could be used instead.
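
A minimal sketch of this selection scheme is given below, assuming non-negative weights with a positive sum. It is not the StatAlign code: MutableDouble is a hypothetical stand-in for statalign.base.MuDouble, and java.util.Random is passed in explicitly instead of using the shared generator field.

```java
import java.util.Random;

/** Illustrative sketch of weighted index selection; not the StatAlign implementation. */
class WeightedChooseSketch {
    /** Hypothetical stand-in for statalign.base.MuDouble. */
    static class MutableDouble {
        double value;
    }

    /**
     * Picks index k with probability weights[k] / sum(weights) and subtracts
     * the log of that probability from selectLogLike.value.
     */
    static int weightedChoose(double[] weights, MutableDouble selectLogLike, Random rng) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        double r = rng.nextDouble() * sum;
        // Walk the cumulative sum until it exceeds the random threshold.
        int k = 0;
        double cumulative = weights[0];
        while (k < weights.length - 1 && r >= cumulative) {
            k++;
            cumulative += weights[k];
        }
        if (selectLogLike != null) {
            selectLogLike.value -= Math.log(weights[k] / sum);
        }
        return k;
    }

    public static void main(String[] args) {
        MutableDouble logProposal = new MutableDouble();
        int k = weightedChoose(new double[] {1.0, 3.0}, logProposal, new Random(42));
        System.out.println("chose " + k + ", accumulated -log p = " + logProposal.value);
    }
}
```

Because the value is accumulated rather than overwritten, one mutable object can collect the log proposal probability of a whole sequence of choices.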


weightedChoose

public static int weightedChoose(double[] weights)

weightedChoose

public static int weightedChoose(java.util.List<java.lang.Double> weights,
                                 statalign.base.MuDouble selectLogLike)

chooseOne

public static int chooseOne(double prob,
                            statalign.base.MuDouble selectLogLike)
Behaves exactly like weightedChoose(new double[]{1-prob,prob}, selectLogLike), but faster


logWeightedChoose

public static int logWeightedChoose(double[] logWeights,
                                    statalign.base.MuDouble selectLogLike)
Equivalent to weightedChoose(weights, selectLogLike) where logWeights[i] = Math.log(weights[i]), but avoids overflows that might result from exponentiation. MuDouble is used to allow for a second return value; in C++ a double pointer or reference could be used instead.
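
One common way to avoid the overflow mentioned above is to subtract the largest log-weight before exponentiating. The sketch below shows that technique; whether StatAlign uses exactly this approach is an assumption, and the explicit Random argument is a hypothetical simplification. It assumes at least one log-weight is finite.

```java
import java.util.Random;

/** Illustrative sketch of selection from log-weights; not the StatAlign implementation. */
class LogWeightedChooseSketch {
    static int logWeightedChoose(double[] logWeights, Random rng) {
        // Find the largest log-weight so that every shifted exponent is <= 0.
        double max = Double.NEGATIVE_INFINITY;
        for (double lw : logWeights) {
            max = Math.max(max, lw);
        }
        // exp(logWeights[i] - max) lies in [0, 1], so it cannot overflow.
        double[] shifted = new double[logWeights.length];
        double sum = 0.0;
        for (int i = 0; i < logWeights.length; i++) {
            shifted[i] = Math.exp(logWeights[i] - max);
            sum += shifted[i];
        }
        double r = rng.nextDouble() * sum;
        int k = 0;
        double cumulative = shifted[0];
        while (k < shifted.length - 1 && r >= cumulative) {
            k++;
            cumulative += shifted[k];
        }
        return k;
    }

    public static void main(String[] args) {
        // Math.exp(1000) overflows to infinity, but the shifted weights are {0, 1, 0}.
        double[] logWeights = {Double.NEGATIVE_INFINITY, 1000.0, Double.NEGATIVE_INFINITY};
        System.out.println(logWeightedChoose(logWeights, new Random(1))); // prints 1
    }
}
```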


logWeightedChoose

public static int logWeightedChoose(double[] logWeights)

isValidHistory

public static boolean isValidHistory(boolean p,
                                     boolean g,
                                     boolean[] neighb)
For a tree of the form:
       gg
       /
      g
     / \
    p   u
  /  \
 t    b
 
this function determines valid possible indel states for p and g given fixed states for the neighbouring nodes.

Parameters:
p - The presence/absence of node p.
g - The presence/absence of node g.
neighb - An array indicating the state of the neighbouring nodes, in the order {t,b,u,gg}.
Returns:
A boolean value indicating whether the specified values of p and g are compatible with the neighbouring states.

isValidHistory

public static boolean isValidHistory(boolean p,
                                     boolean g,
                                     boolean[] neighb,
                                     boolean gIsRoot)
For a tree of the form:
       gg
       /
      g
     / \
    p   u
  /  \
 t    b
 
or, if gIsRoot = true, then for a tree of the form
      g
     / \
    p   u
  /  \
 t    b
 
this function determines valid possible indel states for p and g given fixed states for the neighbouring nodes.

Parameters:
gIsRoot - This is true if g is the root of the tree.
p - The presence/absence of node p.
g - The presence/absence of node g.
neighb - An array indicating the state of the neighbouring nodes, in the order {t,b,u,gg} (if gIsRoot=false), or {t,b,u} (if gIsRoot=true).
Returns:
A boolean value indicating whether the specified values of p and g are compatible with the neighbouring states.

convertTime

public static java.lang.String convertTime(long x)
Takes a time in milliseconds and converts to a string to be printed.

Parameters:
x - The time to be formatted, in milliseconds (as a long).
Returns:
A string to be printed.

logAdd

public static double logAdd(double a,
                            double b)
Logarithmically add two numbers

Parameters:
a - log(x)
b - log(y)
Returns:
log(x+y)
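
Logarithmic addition is usually implemented by factoring out the larger argument, so that only a non-positive exponent is ever exponentiated. The following is a minimal sketch of that idea, not the StatAlign code; the explicit handling of negative infinity mirrors the role of log0 as the logarithm of an empty sum.

```java
/** Illustrative sketch of addition in log space; not the StatAlign implementation. */
class LogAddSketch {
    static final double LOG0 = Double.NEGATIVE_INFINITY;

    /** Given a = log(x) and b = log(y), returns log(x + y). */
    static double logAdd(double a, double b) {
        if (a == LOG0) {
            return b; // x = 0, so x + y = y
        }
        if (b == LOG0) {
            return a; // y = 0, so x + y = x
        }
        double hi = Math.max(a, b);
        double lo = Math.min(a, b);
        // log(x + y) = hi + log(1 + exp(lo - hi)), with lo - hi <= 0.
        return hi + Math.log1p(Math.exp(lo - hi));
    }

    public static void main(String[] args) {
        // Accumulate log(1 + 2 + 3), starting from the log of an empty sum.
        double total = LOG0;
        for (double term : new double[] {1.0, 2.0, 3.0}) {
            total = logAdd(total, Math.log(term));
        }
        System.out.println(Math.exp(total)); // approximately 6
    }
}
```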

calcEmProb

public static double calcEmProb(double[] fel,
                                double[] aaEquDist)
Calculates emission probability from Felsenstein likelihoods
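
The standard way to turn a vector of Felsenstein likelihoods into a single emission probability is to weight each character's likelihood by its equilibrium frequency and sum. The sketch below assumes that this is what calcEmProb computes and that fel and aaEquDist have the same length; it is an illustration, not the StatAlign code.

```java
/** Illustrative sketch of an emission probability; not the StatAlign implementation. */
class EmissionSketch {
    /** Sum over characters i of fel[i] * aaEquDist[i]. */
    static double calcEmProb(double[] fel, double[] aaEquDist) {
        double p = 0.0;
        for (int i = 0; i < fel.length; i++) {
            p += fel[i] * aaEquDist[i];
        }
        return p;
    }

    public static void main(String[] args) {
        double[] fel = {0.1, 0.2, 0.3, 0.4};          // hypothetical conditional likelihoods
        double[] equ = {0.25, 0.25, 0.25, 0.25};      // hypothetical uniform equilibrium distribution
        System.out.println(calcEmProb(fel, equ));      // approximately 0.25
    }
}
```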


repeatedString

public static java.lang.String repeatedString(java.lang.String s,
                                              int n)

iterate

public static <T> java.lang.Iterable<T> iterate(java.util.Enumeration<T> en)
Makes Enumeration iterable.

Type Parameters:
T - Enumeration element type
Parameters:
en - the Enumeration
Returns:
an Iterable that can iterate through the elements of the Enumeration
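
An adapter of this kind can be written in a few lines. The sketch below is one possible version, not necessarily the StatAlign code; note that the resulting Iterable can be traversed only once, because an Enumeration cannot be rewound.

```java
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.Vector;

/** Illustrative sketch of an Enumeration-to-Iterable adapter; not the StatAlign implementation. */
class IterateSketch {
    static <T> Iterable<T> iterate(final Enumeration<T> en) {
        // Every call to iterator() wraps the same Enumeration, so this is single-pass.
        return () -> new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return en.hasMoreElements();
            }

            @Override
            public T next() {
                return en.nextElement();
            }
        };
    }

    public static void main(String[] args) {
        Vector<String> names = new Vector<>(Arrays.asList("t", "b", "u"));
        for (String name : iterate(names.elements())) {
            System.out.println(name);
        }
    }
}
```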

joinStrings

public static java.lang.String joinStrings(java.lang.Object[] strs,
                                           java.lang.String separator)
Joins strings using a separator string. Accepts any Objects converting them to strings using their toString method.

Parameters:
strs - strings to join
separator - the separator string
Returns:
a string made up of the strings separated by the separator

joinStrings

public static java.lang.String joinStrings(java.lang.Object[] strs,
                                           java.lang.String prefix,
                                           java.lang.String separator)
Joins strings using a prefix and a separator string. Accepts any Objects converting them to strings using their toString method.

Parameters:
strs - strings to join
prefix - prefix for each string
separator - the separator string
Returns:
a string made up of the strings with the given prefix and separated by the separator
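
The behaviour described for the three-argument joinStrings can be sketched as below. This is an illustration under the assumption that the prefix is placed before every element and the separator only between elements; it is not the StatAlign code.

```java
/** Illustrative sketch of joining with a prefix and separator; not the StatAlign implementation. */
class JoinSketch {
    static String joinStrings(Object[] strs, String prefix, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < strs.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            // StringBuilder.append(Object) converts the element using String.valueOf.
            sb.append(prefix).append(strs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinStrings(new Object[] {"a", 1, 2.5}, "-", ", ")); // prints -a, -1, -2.5
    }
}
```

Passing an empty prefix gives the behaviour described for the two-argument overload.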

classesInPackage

public static java.util.List<java.lang.String> classesInPackage(java.lang.String packageName)
Finds all classes in a given package and all of its subpackages by walking through class path. Handles both directories and jar files.

Parameters:
packageName - the package in which the classes are searched for
Returns:
list of found class names (with full package prefixes)

findPlugins

public static <T> java.util.List<T> findPlugins(java.lang.Class<T> superClass)
Locates all plugins that are descendants of the specified plugin superclass. The plugins are expected to be in the package root.plugins where root refers to the package of the superclass.

Parameters:
superClass - the ancestral plugin class
Returns:
list of plugins found

alignmentTransformation

public static java.lang.String[] alignmentTransformation(java.lang.String[] s,
                                                         java.lang.String[] names,
                                                         java.lang.String type,
                                                         InputData input)
Transforms an alignment into the prescribed format

Parameters:
s - String array containing the alignment in StatAlign format
type - The name of the format, which may be "StatAlign", "Clustal", "Fasta", "Phylip" or "Nexus"
input - The input data. Required for the Nexus format, which needs the name of the alignment (set to input.title), the type of the alignment (either nucleotide or protein, read from input.model), and the list of characters in the substitution model (also read from input.model).
Returns:
String array containing the alignment in the prescribed format

copyOf

public static char[] copyOf(char[] array)

copyOf

public static int[] copyOf(int[] array)

copyOf

public static double[] copyOf(double[] array)

minMax

public static int minMax(int value,
                         int min,
                         int max)