
Molecular Statistics

Statistics summarizing character histories are reported or saved as posterior expectations (summary values), or saved to file as individual values. SIMMAP collects summary statistics (e.g., the number of synonymous changes) for each sampled character history when simulating multiple histories using the Simulate Histories option under the Analysis main menu item. Values are not saved when simulating a single history using the Simulate History option, although a running history of the number of changes is stored and can be viewed when displaying the trees in the Tree View window. The statistics used for detecting character correlation are dealt with elsewhere in the help documents (see here). The rest of this document is split into three sections: Summary, Individual, and Positive Selection statistics. Each section describes how to view the results of a molecular analysis within SIMMAP, how to save values to file, and how to view the results of a test for positive selection. Setting up and running a positive selection analysis is dealt with in the help documents on predictive distributions.

IMPORTANT NOTE: Summary values accumulate from one experiment to the next, so the results shown are those stored since memory was last cleared. Clear the memory between runs unless you want the values to be averages over runs.
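The effect of this accumulation can be illustrated with a small sketch. The class and method names (SummaryAccumulator, add_history, clear) are hypothetical and are not SIMMAP's internals, and the numbers are made up. The sketch only assumes that a summary value is the mean of a statistic over all histories stored since memory was last cleared.

```python
class SummaryAccumulator:
    """Hypothetical running summary: mean of a statistic over stored histories."""

    def __init__(self):
        self.total = 0.0
        self.n = 0

    def add_history(self, value):
        # Store one sampled history's statistic (e.g., its number of changes).
        self.total += value
        self.n += 1

    def expectation(self):
        # Mean over everything stored since the last clear().
        return self.total / self.n if self.n else float("nan")

    def clear(self):
        self.total = 0.0
        self.n = 0


acc = SummaryAccumulator()
for changes in [2, 4, 3]:       # run 1 (made-up values)
    acc.add_history(changes)
run1 = acc.expectation()

for changes in [8, 10, 9]:      # run 2, memory NOT cleared
    acc.add_history(changes)
pooled = acc.expectation()      # average over both runs

acc.clear()
for changes in [8, 10, 9]:      # run 2 again, after clearing
    acc.add_history(changes)
run2 = acc.expectation()

print(run1, pooled, run2)
```

Here the pooled value lies between the two per-run values, which is why memory should be cleared unless an average over runs is wanted.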

Summary Values

A molecular character history can be simulated for an individual site (nucleotide) or for a codon. (For information on setting up the character coding for a codon analysis see the help documents here.) Once an analysis is complete, summary statistics can be viewed by selecting the Molecular Nucleotide Statistics... option in the Statistics main menu item. This opens the window shown on the right. Each site is displayed by its position in the data file. If character histories have been summarized for a site, a triangle is shown before the site. Selecting the triangle turns it down and displays the summary data.

The following statistics are collected for nucleotide data:
    Sample size - number of samples used to calculate expectation
    Homoplasy Index - Not implemented
    Transitions - expected number of transitions
    Transversions - expected number of transversions
    Substitutions
      State i => j - expected number of changes from one state to another
    Dwell times
      State i - expected time spent in a state as a fraction of the total time
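As a rough illustration of what these nucleotide statistics measure, the sketch below summarizes one made-up history along a single lineage. The representation (a list of (state, duration) segments) and the function names are assumptions for illustration only; SIMMAP maps histories onto whole trees and then averages such values over all sampled histories.

```python
from collections import Counter

PURINES = {"A", "G"}


def is_transition(a, b):
    # A<->G and C<->T are transitions; any other change is a transversion.
    # Assumes a != b (consecutive segments differ in state).
    return (a in PURINES) == (b in PURINES)


def summarize(history):
    """history: list of (state, duration) segments in time order."""
    dwell = Counter()
    for state, duration in history:
        dwell[state] += duration

    changes = Counter()
    for (a, _), (b, _) in zip(history, history[1:]):
        changes[(a, b)] += 1

    transitions = sum(n for (a, b), n in changes.items() if is_transition(a, b))
    transversions = sum(changes.values()) - transitions

    # Dwell time expressed as a fraction of the total time.
    total = sum(dwell.values())
    fractions = {state: t / total for state, t in dwell.items()}
    return transitions, transversions, dict(changes), fractions


history = [("A", 0.25), ("G", 0.5), ("T", 0.25)]  # A => G => T
ts, tv, changes, fractions = summarize(history)
print(ts, tv, changes, fractions)
```

Averaging each of these quantities over many sampled histories gives the expectations listed above.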


If the data being analyzed is protein coding the following summary statistics are collected:
    Sample size - number of samples
    Synonymous - expected number of synonymous changes
    Non-synonymous - expected number of non-synonymous changes
    Dwell times
      State i - expected time in state i
    Conservative - expected number of conservative amino acid replacements
    Radical - expected number of radical amino acid replacements
    Amino acid properties
      State i => j - expected number of changes from one amino acid to another
    Change in Hydropathy - expected change in amino acid hydropathy
    Average Hydropathy - expected amino acid hydropathy
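The distinction between synonymous and non-synonymous changes can be sketched as follows. The example assumes the standard genetic code and a made-up sequence of codon states; the function names are hypothetical, and how SIMMAP handles other genetic codes or stop codons is not covered here.

```python
# Standard genetic code in the usual compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify(codon_from, codon_to):
    # A change is synonymous when both codons encode the same amino acid.
    same = CODON_TABLE[codon_from] == CODON_TABLE[codon_to]
    return "synonymous" if same else "non-synonymous"


def count_changes(codon_path):
    """codon_path: successive codon states along one sampled history."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for a, b in zip(codon_path, codon_path[1:]):
        counts[classify(a, b)] += 1
    return counts


path = ["CTT", "CTC", "CCC"]  # Leu => Leu => Pro
counts = count_changes(path)
print(counts)
```

The expected numbers reported by SIMMAP correspond to such counts averaged over the sampled histories.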

Amino acid hydropathic values are derived from a paper by Kyte and Doolittle (1982) [full reference]. Briefly, each amino acid is given a score from -4.5 (hydrophilic) to 4.5 (hydrophobic). The following are the values used by SIMMAP from Kyte and Doolittle (1982):

Amino Acid           Hydropathy Value
Isoleucine (I)        4.5
Valine (V)            4.2
Leucine (L)           3.8
Phenylalanine (F)     2.8
Cysteine (C)          2.5
Methionine (M)        1.9
Alanine (A)           1.8
Glycine (G)          -0.4
Threonine (T)        -0.7
Serine (S)           -0.8
Tryptophan (W)       -0.9
Tyrosine (Y)         -1.3
Proline (P)          -1.6
Histidine (H)        -3.2
Glutamic acid (E)    -3.5
Glutamine (Q)        -3.5
Aspartic acid (D)    -3.5
Asparagine (N)       -3.5
Lysine (K)           -3.9
Arginine (R)         -4.5
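
To make the two hydropathy statistics concrete, here is a minimal sketch using the values in the table above. The function names are hypothetical, and treating the change as a signed difference (new value minus old value) is an assumption; the exact definition SIMMAP uses is not stated here.

```python
# Kyte and Doolittle (1982) hydropathy values, as listed in the table above.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_change(aa_from, aa_to):
    # Assumed definition: signed difference, new state minus old state.
    return KD[aa_to] - KD[aa_from]


def average_hydropathy(amino_acids):
    # Unweighted mean over a sequence of amino acid states.
    return sum(KD[aa] for aa in amino_acids) / len(amino_acids)


change = hydropathy_change("I", "R")  # most hydrophobic to most hydrophilic
average = average_hydropathy("IR")
print(change, average)
```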

Other values exist (e.g., the Interface and Octanol free energy scales developed by the Stephen White lab at UC Irvine) and I am working on incorporating custom tables into SIMMAP. This development is focusing on allowing text files of custom values to be used. If you are interested in these, or others, send me an email and I will expedite their inclusion.

In classifying amino acid changes as radical or conservative, the BLOSUM62 matrix is used, as described by Henikoff, S. and Henikoff, J.G. (1992) [full reference] and as used by Cargill et al. (1999) [full reference]. As with the hydropathy values, many other substitution matrices exist. If you wish to see different matrices send me an email and I will try to incorporate them into SIMMAP.
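
A matrix-based classification of this kind might look like the sketch below. Only a handful of BLOSUM62 scores are included, and the cutoff (a score of zero or more counts as conservative) is an assumption made for illustration; the threshold SIMMAP actually applies is not documented on this page. The function name and the threshold parameter are hypothetical.

```python
# A small excerpt of BLOSUM62 scores (the matrix is symmetric, so each
# unordered pair is stored once).
BLOSUM62_EXCERPT = {
    frozenset("IV"): 3,
    frozenset("DE"): 2,
    frozenset("KR"): 2,
    frozenset("ST"): 1,
    frozenset("GW"): -2,
    frozenset("DL"): -4,
}


def classify_replacement(aa_from, aa_to, threshold=0):
    # ASSUMPTION: scores at or above the threshold are "conservative",
    # scores below it are "radical".
    score = BLOSUM62_EXCERPT[frozenset((aa_from, aa_to))]
    return "conservative" if score >= threshold else "radical"


labels = [classify_replacement(a, b) for a, b in [("I", "V"), ("L", "D")]]
print(labels)
```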

Saving Summary Values

Summary values can be saved in three different ways: (1) using options in the Statistics main menu item; (2) using options available under the Miscellaneous tab in the Simulate Histories window; and (3) selecting the Save As button in the summary statistics windows (see above). Each of these is described briefly below. Options 1 and 2 differ from Option 3 in that saving occurs during the analysis. Selecting either of them does not noticeably slow down the analysis. The format and the data saved are described in the help documents on output files.

To save the results using Option 1, select the Set File to Save History Statistics to... option in the Statistics main menu item. This opens a window allowing you to define the name and location of the file to which the summary values are saved. This file is automatically appended to until it is redefined. The Save Individual/Summary Histories To... window allows you to activate, or deactivate, saving summary statistics. The default is NOT to save the data. To activate saving, select the appropriate options in the Save Individual/Summary Histories To... window. These options can also be turned on and off in the Simulate Histories window under the Miscellaneous tab.

To save the results using Option 2, open the Simulate Histories window [M] and select the Miscellaneous tab. Then select the Set button under the Statistics (both are saved to the same file): heading. As described above, this opens the Save Individual/Summary Histories To... window. Select the file, location, and options as previously described.

Finally, Option 3 is a post-analysis approach. You can save the values displayed in the Nuc Stats or the Codon Stats windows shown above. To save the summary values, simply select the Save As button in the window. This prompts you to select a name and location for the file.

Individual Values

Individual values are those from a single character history (e.g., the number of substitutions) rather than a summary over replicates. Unlike summary values, individual values can ONLY be saved during the character history analysis. If you neglect to configure an analysis to save the individual values, the analysis will need to be performed again. Although this option is not the default, I recommend its use in all final analyses. The values are those collected for calculating the summary values (posterior expectations). For more information on the data saved see the help documents on output files or the descriptions above. SIMMAP does not save values for both Nucleotide and Codons simultaneously; only values for the type of analysis selected are saved.

Saving Individual Values

Some data can be saved only for particular types of analysis, while other data are available for all analysis types. The following table describes the type of data saved and the analysis options (data types) in which it is available. Activating these options for molecular data analysis is described later.

Data saved                    Availability by data type
Posterior expectations        All analysis types (see summary values above for molecular specifics)
Tree histories (raw data)     All analysis types (see the saving raw data help docs)
Predictive null values        Character correlation and positive selection
Positive selection by node    Positive selection (WARNING: Files can become really large!)


Saving Individual and Other Data

To activate saving the different individual statistics, one of the options (shown as red numbers in the image to the right) needs to be activated by selecting the desired check box. Simply activating a switch will not lead to values being saved: a file also needs to be defined using the Set button to the right of each switch. Once a file is successfully defined, the filename is shown below the switch. Each option is described briefly below:
  1. This option activates saving the summary statistics at the end of an analysis. (If saving individual statistics [2.] is also selected, both are written to the same file.)

  2. This option activates saving the individual statistics from each realization of a history at the end of an analysis. (If saving summary statistics [1.] is also selected, both are written to the same file.)

  3. This option will activate saving the tree histories (raw data) to a file. For more information on this option see the help documents on saving raw character histories.

  4. This option will activate saving the null statistics from a predictive distribution test. See the help documents on predictive tests for more information.
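
The rule that a switch has no effect until a file is defined can be summarized with a small sketch. The SaveOption class, its fields, and the file contents are all hypothetical and do not reflect SIMMAP's implementation or output format; the sketch only mirrors the behaviour described above (switch plus file, appending, and options 1 and 2 sharing one file).

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SaveOption:
    enabled: bool = False        # state of the check box
    path: Optional[str] = None   # file chosen with the Set button

    def write(self, text):
        # Nothing is saved unless the switch is on AND a file is defined.
        if not (self.enabled and self.path):
            return False
        with open(self.path, "a") as fh:  # the file is appended to
            fh.write(text + "\n")
        return True


stats_file = os.path.join(tempfile.mkdtemp(), "stats.txt")
summary = SaveOption(enabled=True, path=stats_file)     # option 1
individual = SaveOption(enabled=True, path=stats_file)  # option 2, same file
nulls = SaveOption(enabled=True)                        # option 4, no file set

wrote = [
    summary.write("summary line"),
    individual.write("individual line"),
    nulls.write("null line"),
]
with open(stats_file) as fh:
    lines = fh.read().splitlines()
print(wrote, lines)
```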
Page Last Updated: 6 August 2008