Xml Input File
Assign Branch Lengths
As a prelude to this help document I recommend that you read Schultz and Churchill (1999) [link to journal] which is an excellent description of dealing with priors in morphological (standard) data analyses. In addition, since SIMMAP uses this approach, you should cite the authors.
SIMMAP uses one or two priors, depending on the number of states that exists for a character, when analyzing morphological or standard data types. For the rest of this document I will refer to characters as "standard" and this should be interpreted as including both, discrete morphological traits (e.g., polygamy vs. monogamy) and standard coded traits (e.g., forest vs. savannah). The two priors are: (1) a bias prior on two-state characters, and (2) an overall evolutionary rate prior. Each of these are dealt with below. First, I will discuss the overall rate prior, then deal with the bias prior for two-state characters. The laste section of this document describes a MCMC methods for selecting good parameters for the prior distribution.
Overall Rate Prior
SIMMAP uses a gamma prior on the overall substitution rate of a character. This method is similar to that suggested by Yang (1994) [link to journal] for molecular data but both parameters of the Γ (gamma) distribution can be defined. The definition of both parameters allows the distribution to have a mean that is not centered on a value of one. The basic approach is that the overall rate of standard characters are unknown (although the relative evolutionary divergences may be known using molecular divergence), that uncertainty can be satisfactorily accommodated by using a Bayesian prior. The Γ distribution is made discrete using k categories. α and β are the parameters of the Γ distribution. The expected value and standard deviation of the Γ prior is displayed dynamically as the parameter values are changed. You can select to break the distribution into 10, 20, 30, 40, 50, 60, and 70 categories.
What is the algorithm used by SIMMAP in applying a gamma rate prior?
- Re-scale the branch lengths of the tree in memory such that the overall length of the tree (sum of all branch lengths) is one (you can select to NOT re-scale the branch lengths - see below).
- The posterior probability of each gamma (Γ) category, defined by the prior, is calculated and a stochastic draw is made from this distribution.
- The rate value, for the sampled category, is used as multiplier of the branch lengths on the un-scaled or re-scaled tree.
The use of rescaling and placing a prior on the tree allows the user to explore the effects of a range of rates on the character histories while maintaining the branch length proportionality.
A variety of different gamma (Γ) priors are shown below to give a feel for the different shapes that the gamma can adopt, the expectation of the distribution, and the allowable values under the prior distribution.
Two-state Bais Parameter Prior
SIMMAP places a symmetrical Beta prior for morphological state frequencies of two-state characters (called the bias parameter and denoted by π). The Beta distribution is made discrete using k categories. α is the parameter of the Beta distribution. As α gets large the distribution becomes a narrow peak around 0.5 (see α=100.0 below). Therefore, if you wish to have equal prior probabilities for each state this value should be large. Alternatively, if you want an uninformative prior a Beta α parameter of 1 results in equal prior probabilities. You can select to break the distribution into 5, 7, 9, 11, 13, and 15 categories.
What is the algorithm used by SIMMAP in applying a beta bias prior?
- The posterior probability of each beta category, defined by the prior, is calculated and a stochastic draw is made from this distribution.
- The bias value, for the sampled category, is used.
Choosing Good Prior Parameters
Choosing parameters of the prior distributions for an analysis of morphological/standard characters is very challenging. In previous versions of SIMMAP you simply needed to guess or try a large number of different priors. What we would like is an approach that is objective, statistical, not ad hoc, and that is fairly fast.
To address this problem I have developed a two-step approach. In the first step SIMMAP 1.5 will perform an MCMC analysis to sample overall rate values (Gamma prior) and bias values (Beta prior for two-state characters). The next step uses the samples from the posterior distribution of these parameters and finds the best fitting Gamma and Beta distribution using the R Statistical Package. An example of this approach to choosing parameters for these prior distributions (see below).
I have written an R script (see the software distribution you downloaded) that will perform this analysis in R.
The output of this script will display the posterior distributions (and 95% HPD shown in red in the example below) of the parameters and the results of the best fits of the Gamma and Beta distributions. The parameter values shown at the bottom these later two plots (Figs 2 and 4 in the example shown below) indicates the prior parameter values needed to implement the best fit priors in a mutational mapping analysis.
- The R Statistical package is very easy to install: An installer package containing the base is available. Visit the R web site to download the package.
- The R script I have provided requires two packages: MASS (included in the base package) and TeachingDemos. These can be installed by starting the R application, selecting Packages & Data->Package Installer in the R main menu, select the Get List button, and select to install the required packages.
- To run the R script open the R program, change the working directory to the location of your output file from SIMMAP 1.5 by selecting Misc->Change Working Directory in the R main menu, open the R script (the file name is sumprmcmc.r) by selecting File->Source File... in the R main menu, and type prsummary("Your Output Filename") at the R command prompt.
- I am working on placing a full tutorial on the Tutorials pages of this site.
Common mistakes running the R script:
- The results file can not be found because it is not in the Current Working Directory. Without providing the full path to a file R expects the file in its current working directory (by default on the Mac OS X this your home directory). To change the current working directory type the following at the command prompt in R: setwd("/Users/bollback/Desktop/"). The quotes around the path are required. In this example the new current working directory would be my desktop. Simply change the path in the example to the location of the file on your computer. You can check the working directory by typing the follow at the R prompt: getwd(). Alternatively, you can change the working directory as described in Note 3 above.
- You get the following error when trying to run the scripts function: Error in read.table(filename, header = T) : object 'example.results' not found. This example assumes the file you are trying to load is called example.results. The problem here is that R is expecting an object called example.results in memory. When passing the filename or path to the function it needs to be in single or double quotes. E.g., prsummary("example.results").