
Character Covariation (Correlations)

Often we would like to know whether two characters, or particular character states, covary on the phylogeny. Examples include morphological studies addressing hypotheses of correlated evolution and ecological studies testing for an association between a character and a particular environment. Character histories can be used to address these questions. For a more in-depth description of the use of character histories for measuring character correlation, see Huelsenbeck et al. (2003).

Brief Overview of Method

Let us look briefly at how the method works for one of the covariation statistics discussed below. Imagine we have two characters, each with two states (binary characters). We can sample a realization of a character (mutational) history for each. The figure to the right shows two possible character histories for each site. The histories are very similar; the only difference is the reconstruction at the root for the lower (L) character. Two binary characters can have 4 different state configurations (associations between the two characters). For each state configuration we calculate the observed time spent in that configuration along the tree (see the Observed 2x2 association table to the right). These values reflect the observed association for the sampled character history. However, we would like to correct for the possibility that the observed configuration is due simply to a chance association on the phylogeny. Luckily, this can be done in a straightforward manner by taking the product of the time each character independently spends in a state, say state 0 for the U character and state 0 for the L character. The independent values for each character state are the row and column sums of the table. For example, the expected frequency of a 0-0 (U-L) state configuration is (0.36 + 0.0) x (0.36 + 0.14) = 0.18. Most of the statistics for measuring covariation in SIMMAP summarize histories in a similar way. This, of course, is not the only way. Using the options for saving the raw character histories, the user can develop a customized test of character correlation (see the help document on saving the raw character histories for more information).
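The expected-frequency calculation above can be sketched in a few lines of Python. This is only an illustration, not SIMMAP code: the function name expected_table is hypothetical, and only the values 0.36, 0.0 and 0.14 come from the worked example. The 1-1 cell (0.50) is an assumption chosen so that the four fractions sum to 1.

```python
# Observed fraction of time spent in each (U, L) state configuration.
# 0.36, 0.0 and 0.14 come from the worked example; the 1-1 cell (0.50)
# is ASSUMED so that the table sums to 1.
observed = [
    [0.36, 0.00],  # U = 0: L = 0, L = 1
    [0.14, 0.50],  # U = 1: L = 0, L = 1
]


def expected_table(obs):
    """Expected association under independence: row total x column total."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    return [[r * c for c in col_totals] for r in row_totals]


expected = expected_table(observed)
print(round(expected[0][0], 2))  # 0.18, matching (0.36 + 0.0) x (0.36 + 0.14)
```

The same function fills in the expected value for every cell of the table, not just the 0-0 configuration.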

Performing A Correlation Analysis

The first step in performing a character correlation analysis is to set the model of character evolution. This is covered elsewhere and will be skipped here (see the help documents on models and morphology priors). Once the evolutionary model has been configured, the next step is to define which characters (sites) are to be tested for correlation.
This can be accomplished through the Simulate Histories window [M]. Under the Sampling tab, select the Character association (correlation) radio button (shown highlighted in the red oval below). Next, select the two sites using the pop-up menus (Site 1: and Site 2:). Once the sites have been selected, choose the number of realizations (samples) to draw from the posterior of character histories. (Note: the default settings are not recommendations.) The remaining settings are configured in the same way as for other character history analyses. (See the help documents on simulating character histories.)

So far we have set up the correlation analysis. SIMMAP uses predictive distributions to test whether the observed level of correlation is greater than expected by chance (i.e., significant). Predictive distributions use parameter values and topologies sampled from the posterior to generate the predictive (null) character histories. (If you specify the maximum likelihood estimates, then the analysis becomes the parametric bootstrap.) The null hypothesis being tested is that "independent evolution could produce a correlation as extreme as that observed." To set up SIMMAP to perform a predictive test of character correlation, select the Predictive Tests... option from the Analysis item in the main menu. The window to the right will open, offering different test options. This window works in cooperation with the Simulate Histories window options. There are two kinds of tests that SIMMAP performs: positive selection and character correlation. (See the help documents on predictive distributions for more information.)
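The logic of comparing an observed statistic with a predictive (null) distribution can be sketched as below. This is a generic tail-proportion calculation under stated assumptions, not a description of SIMMAP's internals: the function name predictive_p_value is hypothetical, the numbers are made up for illustration, and how SIMMAP pools values across posterior samples is not covered here.

```python
def predictive_p_value(observed_stat, null_stats):
    """Fraction of null statistics at least as extreme as the observed one."""
    if not null_stats:
        raise ValueError("need at least one null realization")
    as_extreme = sum(1 for s in null_stats if s >= observed_stat)
    return as_extreme / len(null_stats)


# Made-up statistic values from ten null realizations (illustration only).
null_d = [0.05, 0.12, 0.31, 0.08, 0.02, 0.44, 0.10, 0.07, 0.15, 0.09]
p = predictive_p_value(0.30, null_d)
print(p)  # 0.2: two of the ten null values are >= 0.30
```

A small value means that independent evolution rarely produces a correlation as extreme as the one observed.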

The next step is to select the statistic that measures the correlation between characters. This is accomplished by selecting the Character Correlation check box and then selecting the desired statistic. Two statistics can be chosen: (1) Association Measure (D) and (2) Mutual Historical Information (M). The form of these statistics varies depending on whether the characters are molecular or morphological. (The statistics are described below.) SIMMAP does not currently allow correlations between different data types.

The last item to be configured is the Sampling Design Options. This determines how many realizations are performed during sampling from the predictive distribution, expressed as a multiple of the number of samples requested in the Simulate Histories window. For example, if an analysis has been configured to perform 10 realizations for each tree/parameter posterior sample and the Sampling Design Options text field has been set to 10, then the total number of realizations will be one hundred for each tree/parameter sample. If you wish the observed and null sampling to be the same, leave this value as 1.
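The arithmetic of the sampling design can be written out as a one-line calculation. The function and parameter names here are hypothetical and only restate the example in the text.

```python
def null_realizations_per_sample(observed_realizations, design_multiplier=1):
    """Null realizations drawn for each tree/parameter posterior sample."""
    return observed_realizations * design_multiplier


print(null_realizations_per_sample(10, 10))  # 100, as in the example
print(null_realizations_per_sample(10))      # 10: observed and null sampling match
```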


Correlation Statistics

A number of measures of correlation are implemented for nucleotide and morphological characters and their states.

Statistics for measuring morphological character and state associations, first described by Huelsenbeck et al. (2003), have been implemented in SIMMAP to test for state-by-state associations (Association Measures shown to the right). D is the overall association between characters i and j. d is a measure of the association between the individual states of each character. a is the observed (o) or expected (e) association between character states i and j (n is the number of states for character 1 and m is the number of states for character 2). The association between one state and another is the frequency with which the two states occur together on the phylogeny (see above).
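Because the formulas themselves appear only in the figure, the following Python sketch states its assumptions explicitly: it takes d for a pair of states to be the observed association minus the expected association, and D to be the sum of the absolute values of d over all n x m state pairs. The function name association_measures is hypothetical, and the example table reuses the values from the overview, with the 1-1 cell (0.50) assumed so that the table sums to 1.

```python
def association_measures(observed):
    """Return (d, D) for an n x m table of observed associations.

    ASSUMED forms: d[i][j] = observed - expected, D = sum of |d[i][j]|,
    with expected[i][j] = row total i x column total j.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    d = [
        [observed[i][j] - row_totals[i] * col_totals[j]
         for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]
    big_d = sum(abs(value) for row in d for value in row)
    return d, big_d


observed = [
    [0.36, 0.00],  # character 1 in state 0
    [0.14, 0.50],  # character 1 in state 1 (0.50 is assumed)
]
d, big_d = association_measures(observed)
print(round(d[0][0], 2))  # 0.18: states 0 and 0 co-occur more than expected
print(round(big_d, 2))    # 0.72
```

Positive values of d indicate state pairs that co-occur more often than independence predicts; negative values indicate pairs that co-occur less often.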

The association measure for molecular (nucleotide) characters is simply the chi-square statistic with a Yates' correction term (C = 0.5).
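For reference, the textbook form of the chi-square statistic with Yates' correction can be sketched as follows. The function name and the example counts are hypothetical; how SIMMAP builds the table for nucleotide characters is not described here, so this shows only the general shape of the statistic.

```python
def yates_chi_square(counts, c=0.5):
    """Chi-square with continuity correction: sum of (|O - E| - c)^2 / E."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows/columns (a choice of this sketch)
                chi2 += (abs(observed - expected) - c) ** 2 / expected
    return chi2


# Hypothetical 2 x 2 table of counts, for illustration only.
example = [[12, 5], [3, 10]]
print(round(yates_chi_square(example), 3))  # 4.887
```

Setting c=0 gives the uncorrected chi-square statistic.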

Alternatively, SIMMAP implements a statistic similar in form to the mutual information content statistic (MIC) for both nucleotide and morphological data types. As implemented in SIMMAP, this statistic evaluates the correlation between character histories along the phylogeny for two characters (M) and their states (m) where f is the fraction of time one state is associated with another in a character history.
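As a rough guide to what a statistic "similar in form" to mutual information looks like, the sketch below uses the standard mutual information formula applied to the fractions f. This is an assumption: the exact definition used by SIMMAP is not reproduced in this text, and the function name mutual_information is hypothetical.

```python
import math


def mutual_information(f):
    """Return (m, M) for a table f of joint time fractions.

    ASSUMED standard form: m[i][j] = f_ij * ln(f_ij / (f_i. * f_.j)),
    M = sum of m[i][j]; cells with f_ij = 0 contribute 0.
    """
    row_totals = [sum(row) for row in f]
    col_totals = [sum(col) for col in zip(*f)]
    m = [[0.0] * len(col_totals) for _ in row_totals]
    for i, row in enumerate(f):
        for j, f_ij in enumerate(row):
            if f_ij > 0:
                m[i][j] = f_ij * math.log(f_ij / (row_totals[i] * col_totals[j]))
    return m, sum(value for row in m for value in row)


# Independent histories carry no mutual information.
_, m_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
# Perfectly associated histories give ln(2) for two binary characters.
_, m_linked = mutual_information([[0.5, 0.0], [0.0, 0.5]])
print(m_independent)        # 0.0
print(round(m_linked, 4))   # 0.6931
```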
Page Last Updated: 6 August 2008