
Nucleotide Models: Overview

SIMMAP implements the following nucleotide substitution models and their sub-variants:
  • General-time-reversible model (GTR)
  • Symmetric model (SYM)
  • Hasegawa-Kishino-Yano model (HKY85)
  • Kimura 2-parameter (K2P)
  • Felsenstein 1981 (F81)
  • Jukes-Cantor (JC69)
Among-site rate variation can be accommodated using:
  • Discrete gamma (Yang 1994)
  • Site-specific rates
  • Invariant sites
Model parameters can be set in the following ways:
  • Fixing values in the Model window
  • Values derived from a single line of a parameter file
  • Linking tree and parameter values (this is the approach to use if the results are to be averaged over samples from a distribution, e.g., a posterior distribution taken from the output of MrBayes).
Configuring the nucleotide substitution model

SIMMAP allows users to perform a molecular character history analysis using fixed model parameters (an empirical Bayesian analysis) or using samples taken from the posterior distribution (a hierarchical Bayesian analysis). To configure the model, select Set Model in the Models main menu. This opens the Model window, which has three views (each is shown below). The tabs configure different aspects of the model: substitution rate parameters, character frequencies, and rate variation.

Substitution parameters

Three substitution types (Nst) are available: 1, 2, and 6. This does not prevent you from configuring the model as any of the possible sub-variants of the basic GTR model (203 exist), as described below.

Setting Nst = 1 corresponds, depending on how the character frequencies are treated, to the Jukes-Cantor (1969) model [equal frequencies] or Felsenstein's (1981) model [unequal frequencies].

When Nst = 2 is selected, the user can opt to Fix Kappa (using the text field to the right), Use Kappa from file, or Estimate Kappa (from GTR). These settings correspond to Kimura's 2-parameter (K2P) model or to the Hasegawa-Kishino-Yano (HKY85) / Felsenstein 1984 (F84) model. Which one is implemented depends on how nucleotide frequencies are treated (equal frequencies = K2P; unequal frequencies = HKY85/F84).
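
The correspondence between the Nst setting, the frequency treatment, and the named model can be summarized as a small lookup. This is only an illustrative sketch: the function name `model_name` and its arguments are hypothetical and are not part of SIMMAP. The Nst = 6 rows rely on the standard definition of SYM as GTR with equal frequencies.

```python
def model_name(nst, equal_frequencies):
    """Return the named model implied by Nst and the frequency treatment.

    Hypothetical helper for illustration; SIMMAP itself is configured
    through the Model window, not through a function like this.
    """
    names = {
        (1, True): "JC69",    # one rate class, equal frequencies
        (1, False): "F81",    # one rate class, unequal frequencies
        (2, True): "K2P",     # transition/transversion ratio, equal frequencies
        (2, False): "HKY85",  # transition/transversion ratio, unequal frequencies
        (6, True): "SYM",     # six rates, equal frequencies
        (6, False): "GTR",    # six rates, unequal frequencies
    }
    key = (nst, bool(equal_frequencies))
    if key not in names:
        raise ValueError("Nst must be 1, 2, or 6")
    return names[key]


print(model_name(2, equal_frequencies=False))
```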

When Nst = 6 is selected, the user can opt for Fix GTR rates or GTR rates from file. If you opt to fix the rates, values can be entered in the appropriate text fields (e.g., AG; row A, column G). This allows the user to enter values, say the maximum likelihood estimates, directly without using a parameter file. This is equivalent to an empirical Bayesian analysis. If, on the other hand, the values are read from a file, the user can customize the model to any of the possible GTR sub-variant models. The parameter file requires a column for each of the rates, but specific rates that should be equal can each be set to a value of 1. If the values are samples from the posterior, this is equivalent to a hierarchical Bayesian analysis. For more information on how to use model parameters to perform a particular type of analysis, see the help documents on molecular model parameters.
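
To make the idea of GTR sub-variants concrete, the following minimal sketch builds a rate matrix from six exchangeability rates and four nucleotide frequencies, then shows that constraining the rates reproduces a simpler model (here HKY85). The function name, the dictionary layout, and the scaling to one expected substitution per unit time are assumptions made for illustration; the source does not describe how SIMMAP stores or normalizes the matrix internally.

```python
NUCLEOTIDES = "ACGT"


def gtr_rate_matrix(rates, freqs):
    """Build a GTR rate matrix as a 4x4 list of lists (rows/columns in ACGT order).

    rates: dict keyed by unordered pair ("AC", "AG", "AT", "CG", "CT", "GT")
    freqs: dict keyed by nucleotide, summing to 1
    Illustrative only; names and layout are not taken from SIMMAP.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            if i != j:
                pair = x + y if x < y else y + x
                q[i][j] = rates[pair] * freqs[y]
        # Diagonal makes each row sum to zero (it is still 0.0 when summed).
        q[i][i] = -sum(q[i])
    # Assumed convention: scale so the expected rate at equilibrium is 1.
    mu = -sum(freqs[x] * q[i][i] for i, x in enumerate(NUCLEOTIDES))
    return [[value / mu for value in row] for row in q]


freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
kappa = 4.0
# HKY85 as a GTR sub-variant: transitions (AG, CT) share one rate,
# all transversions are set to 1.
hky_rates = {"AC": 1.0, "AG": kappa, "AT": 1.0, "CG": 1.0, "CT": kappa, "GT": 1.0}
for row in gtr_rate_matrix(hky_rates, freqs):
    print(["%.4f" % value for value in row])
```

Setting all six rates to 1 in the same way gives F81 (or JC69 when the frequencies are all 0.25).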

Character frequencies

Nucleotide frequencies can be dealt with in four ways: Equal frequencies, Empirical frequencies, Estimated frequencies (if present), or Fix frequencies.

Equal frequencies sets each nucleotide frequency to the same value (0.25), as in the Jukes-Cantor (1969) or Kimura 2-parameter (K2P) models.

Empirical frequencies are calculated from the data matrix.
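
As a rough illustration of what calculating frequencies from the data matrix involves, the sketch below counts A, C, G, and T across all sequences and divides by the total. It skips gaps and ambiguity codes, which is an assumption: the source does not say how SIMMAP treats such characters. The function name is hypothetical.

```python
from collections import Counter


def empirical_frequencies(sequences):
    """Return observed A/C/G/T frequencies across an alignment.

    Gaps and ambiguity codes are ignored (an assumption for this sketch).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous nucleotides found")
    return {nuc: counts[nuc] / total for nuc in "ACGT"}


alignment = ["ACGT", "AAGT", "AC-N"]
print(empirical_frequencies(alignment))
```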

Estimated frequencies are read from the parameter file if they are present in the file. These can be samples from the posterior (e.g., from MrBayes), maximum likelihood estimates (e.g., from PAUP*), or values from any other program. For information on setting model parameters from a file and linking parameters and trees, see the help page on molecular substitution model parameters.

If using point estimates (e.g., maximum likelihood estimates from PAUP*), the values can be entered manually using the Fix frequencies option or read from a file containing the parameter values.



Rate variation

SIMMAP implements four different rate variation models: (1) a Discrete gamma distribution, (2) an Invariable sites model, (3) a Discrete gamma + Invariable sites model, and (4) a Site-specific rates model.

The interface for setting these options is shown to the right. This view can be accessed by selecting the pop-up button at the top of the Model window.

The Discrete gamma model has three options, whose availability depends on whether a parameter file has been opened.
  • The first option is simply to have no gamma rate variation and is the default in SIMMAP. The pop-up button selection for this option is None.

  • The second option is to use a fixed alpha value describing the gamma distribution. The pop-up button selection for this option is Fixed alpha and the value for alpha that is used can be entered in the available text box.

  • The third option is Alpha from file, which is available only if a parameter file has been loaded with values for the gamma parameter, alpha. Use of the gamma parameter is described in the help page on molecular substitution model parameters.
The Invariable sites model has three options, whose availability depends on whether a parameter file has been opened.
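
For readers who want to see how a single alpha value translates into rate categories, the sketch below computes the mean rate of each of k equal-probability categories of a gamma distribution with mean 1, following the general approach of Yang (1994). It uses only the standard library; the helper names are hypothetical, the number of categories SIMMAP uses is not stated in the source (k = 4 here is an assumption), and the series-based incomplete gamma function is adequate only for moderate alpha values.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), series expansion."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), i.e. mean 1, by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate within each of k equal-probability gamma categories."""
    cuts = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # The share of the overall mean lying below a cut point c is P(alpha + 1, alpha * c).
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


rates = discrete_gamma_rates(0.5, k=4)
print(rates, sum(rates) / len(rates))
```

Because each category has probability 1/k, the category rates average to 1, so the overall rate of the substitution model is unchanged. Small alpha values give strongly unequal rates; large values give rates close to 1.
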
  • The first option is simply to not accommodate invariable sites, and is the default in SIMMAP. The pop-up button selection for this option is None.

  • The second option is to use a fixed value of p (the parameter describing the proportion of invariable sites). The pop-up button selection for this option is Fixed p and the value for p that is used can be entered in the available text box.

  • The third option is P from file, which is available only if a parameter file has been loaded with values for the proportion of invariable sites. Use of the invariable sites parameter is described in the help page on molecular substitution model parameters.
The Discrete gamma + Invariable sites can be co-configured as just described. (Note: Currently, SIMMAP does not support combining the discrete gamma and/or invariable sites models with the site-specific models. This will change in future developments.)
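
In the standard formulation of a gamma + invariable sites model, the likelihood of a site is a mixture: with probability p the site cannot change, and otherwise its rate is drawn from one of the equally weighted gamma categories. The sketch below shows only that mixing step. It is an illustration of the general technique, not a description of SIMMAP's internals, and the function and argument names are hypothetical.

```python
def site_likelihood(category_likelihoods, p_invariable, invariable_likelihood):
    """Combine per-category likelihoods under a gamma + invariable sites mixture.

    category_likelihoods: likelihood of the site under each gamma rate category
    p_invariable: proportion of invariable sites (0 <= p < 1)
    invariable_likelihood: likelihood of the site if it cannot change
        (zero for a site that shows more than one state)
    """
    if not 0.0 <= p_invariable < 1.0:
        raise ValueError("p_invariable must be in [0, 1)")
    k = len(category_likelihoods)
    variable_part = sum(category_likelihoods) / k  # equal category weights
    return p_invariable * invariable_likelihood + (1.0 - p_invariable) * variable_part


print(site_likelihood([0.1, 0.2, 0.3, 0.4], p_invariable=0.2, invariable_likelihood=0.5))
```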

The Site-specific rates model has three options, whose availability depends on whether a parameter file has been opened. (Note: Currently, SIMMAP supports site-specific rates only for protein coding data, i.e., codon positions.)
  • The first option is simply to not accommodate site-specific rates, and is the default in SIMMAP. The pop-up button selection for this option is None.

  • The second option is to use a fixed set of site-specific rates. The pop-up button selection for this option is Fixed rates and the site-specific rate values are entered in the available text boxes.

  • The third option is Estimated rates, which is available only if a parameter file has been loaded with values for each site-specific rate. Use of the site-specific rate parameters is described in the help page on molecular substitution model parameters.
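
Since site-specific rates apply to codon positions, a fixed set of three rates can be thought of as being repeated along the alignment. The sketch below expands three position rates into one rate per site. It assumes the alignment is in frame and starts at codon position 1, which the source does not state; the function name is hypothetical.

```python
def codon_position_rates(n_sites, position_rates):
    """Assign a rate to every site from three codon-position rates.

    Assumes the first site is codon position 1 (an assumption of this sketch).
    """
    if len(position_rates) != 3:
        raise ValueError("expected one rate per codon position")
    return [position_rates[i % 3] for i in range(n_sites)]


print(codon_position_rates(7, (0.5, 0.2, 2.3)))
```
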
Morphology Models: Overview

SIMMAP implements the Mk class of models for morphological or standard characters (Lewis 2001). When k = 2, SIMMAP places a symmetrical Beta prior on the bias parameter when sampling character histories. If k > 2, the bias parameter is 1/k. The morphology model also includes a prior on the overall evolutionary rate, for which SIMMAP uses a gamma prior. Both the bias and the overall rate can be sampled from a distribution or set to fixed values. For more details on morphology priors and how to configure them, see the help documents on morphology priors.
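
For the unbiased case, in which every state has the same bias 1/k, the Mk model has a simple closed-form transition probability. The sketch below computes it for a branch of length t and an overall rate. It is a minimal illustration, not SIMMAP code: the function name is hypothetical, the rate is assumed to be scaled as expected changes per unit time, and the biased two-state case governed by the Beta prior is not covered.

```python
import math


def mk_transition_matrix(k, rate, t):
    """Transition probabilities for a symmetric Mk model.

    k: number of character states (k >= 2)
    rate: overall rate, taken here as expected changes per unit time (assumed scaling)
    t: branch length
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    q = rate / (k - 1)               # rate of change to each other state
    decay = math.exp(-k * q * t)
    same = 1.0 / k + (k - 1) / k * decay
    different = 1.0 / k - decay / k
    return [[same if i == j else different for j in range(k)] for i in range(k)]


for row in mk_transition_matrix(3, rate=1.0, t=0.5):
    print(["%.4f" % p for p in row])
```

As t grows, every entry approaches 1/k, which matches the equal bias of 1/k across states.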

Page Last Updated: 6 August 2008