Often we would like to determine whether a gene or specific codons in a gene are under positive selection (ω > 1). Numerous methods are available for this type of analysis. SIMMAP 1.5 implements one such method: the use of stochastic mutational mapping to determine the number of non-synonymous and synonymous changes, the rates of these changes, and the rate ratio ω. The information below gives a brief overview of the method and describes all of the options available for this type of analysis.
Brief Overview of Method
How do we go from a mutational map to inferences about positive selection?
To explain the method I use the example of a single branch in a tree (see Figure 1 below). From this it should become obvious how the method scales to a full tree and across sites.
The first step is to simulate a mutational map for the 1st, 2nd, and 3rd nucleotide positions in the codon. In Figure 1 we can see that position 1 is in state C along the branch, position 2 starts in state A and changes once to state T, and position 3 starts in state G and changes to state A.
Next, we rescale these intervals to the length of the third position and obtain the codon map shown in Figure 1, in which we observe the substitutions CAG -> CTG -> CTA.
Finally, from this map we can count the number and types of codon changes, the length of time spent in each codon state, and the number of non-synonymous and synonymous sites for each observed codon state. In the example, CAG -> CTG is a non-synonymous change (glutamine to leucine) and CTG -> CTA is a synonymous change (both codons encode leucine). Using these values we can calculate the relevant selection statistics (see here for specifics on the statistics).
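The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not SIMMAP's implementation: the representation of a map as a list of (state, duration) segments, the function names, and the durations used in the example are all assumptions made here, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rescale(segments, target):
    """Stretch a per-position map so that its total length equals `target`."""
    total = sum(d for _, d in segments)
    return [(s, d * target / total) for s, d in segments]


def state_at(segments, t):
    """Return the nucleotide occupied at time t along the branch."""
    elapsed = 0.0
    for state, dur in segments:
        elapsed += dur
        if t < elapsed:
            return state
    return segments[-1][0]


def codon_map(pos1, pos2, pos3):
    """Merge three nucleotide maps into one codon map (hypothetical helper)."""
    target = float(sum(d for _, d in pos3))  # rescale to the third position
    maps = [rescale(p, target) for p in (pos1, pos2, pos3)]
    cuts = {0.0, target}
    for m in maps:
        t = 0.0
        for _, d in m[:-1]:  # every internal change point is a breakpoint
            t += d
            cuts.add(t)
    cuts = sorted(cuts)
    out = []
    for a, b in zip(cuts, cuts[1:]):
        codon = "".join(state_at(m, (a + b) / 2) for m in maps)
        if out and out[-1][0] == codon:
            out[-1] = (codon, out[-1][1] + (b - a))
        else:
            out.append((codon, b - a))
    return out


def count_changes(cmap):
    """Count (synonymous, non-synonymous) changes between successive codons."""
    syn = nonsyn = 0
    for (a, _), (b, _) in zip(cmap, cmap[1:]):
        if CODE[a] == CODE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# The branch from Figure 1, with made-up durations in arbitrary units.
example = codon_map([("C", 10)], [("A", 4), ("T", 6)], [("G", 7), ("A", 3)])
print([codon for codon, _ in example])  # codon states visited along the branch
print(count_changes(example))           # (synonymous, non-synonymous)
```

The dwell time in each codon state is the second element of each pair in the returned map, which is the quantity needed for the rate calculations.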
Configuring the Substitution Model
For the positive selection analysis, select the Nucleotide tab (if it is not already selected) to configure the substitution model.
SIMMAP 1.5 offers two options for configuring the substitution model used during a positive selection analysis: Use model definitions in configuration file and Manual configuration (see Figure 2).
- Use model definitions in configuration file: If the input configuration file defines model parameters, select this option to use them in the analysis.
- Manual configuration: Alternatively, you can opt to manually configure the model. All possible sub-models of the GTR are supported.
- Number of substitution rates: select the number of substitution rate types (Nst).
- Kappa: select the transition/transversion rate ratio (Nst=2).
- Instantaneous rates: select the rates between nucleotides (Nst=6).
- Nucleotide frequencies: select Equal, Empirical, or Fixed. If the values are fixed you will need to enter the values for F(A), F(C), and F(G).
- Rate variation: select Equal, Fixed, Gamma, Inv, or Gamma+Inv. For Fixed values simply enter the rate multiplier in the Rate text field. For Gamma, Inv, and Gamma+Inv simply enter the parameter values for these distributions in the desired text fields.
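To make the relationship between these options concrete, the following sketch builds a GTR-family rate matrix from the settings described above. It is a minimal illustration under stated assumptions, not SIMMAP's internal code: the function names are hypothetical, the order of the six rates (AC, AG, AT, CG, CT, GT) is a common convention assumed here, and F(T) is taken to be whatever remains after F(A), F(C), and F(G).

```python
NUCS = "ACGT"
PAIRS = [("A", "C"), ("A", "G"), ("A", "T"), ("C", "G"), ("C", "T"), ("G", "T")]
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def exchangeabilities(nst, kappa=1.0, rates=None):
    """Relative rates between nucleotide pairs for Nst = 1, 2, or 6."""
    if nst == 1:  # a single rate for every change
        return {frozenset(p): 1.0 for p in PAIRS}
    if nst == 2:  # transitions occur kappa times as fast as transversions
        return {
            frozenset(p): (kappa if frozenset(p) in TRANSITIONS else 1.0)
            for p in PAIRS
        }
    if nst == 6:  # six instantaneous rates, assumed order AC, AG, AT, CG, CT, GT
        if rates is None or len(rates) != 6:
            raise ValueError("Nst=6 requires six rates")
        return {frozenset(p): float(r) for p, r in zip(PAIRS, rates)}
    raise ValueError("Nst must be 1, 2, or 6")


def rate_matrix(ex, f_a=0.25, f_c=0.25, f_g=0.25):
    """Build a rate matrix Q scaled to one expected substitution per unit time."""
    f_t = 1.0 - f_a - f_c - f_g  # F(T) follows from the other three
    if min(f_a, f_c, f_g, f_t) < 0:
        raise ValueError("frequencies must be non-negative and sum to 1")
    freqs = dict(zip(NUCS, (f_a, f_c, f_g, f_t)))
    q = {
        i: {j: ex[frozenset((i, j))] * freqs[j] for j in NUCS if j != i}
        for i in NUCS
    }
    mu = sum(freqs[i] * sum(q[i].values()) for i in NUCS)
    for i in NUCS:
        for j in q[i]:
            q[i][j] /= mu
        q[i][i] = -sum(q[i].values())  # each row sums to zero
    return q, freqs


# Nst=2 with kappa = 2 and equal nucleotide frequencies.
q, freqs = rate_matrix(exchangeabilities(2, kappa=2.0))
print(q["A"]["G"] / q["A"]["C"])  # ratio of a transition to a transversion rate
```

With equal frequencies and Nst=1 this reduces to the Jukes-Cantor model, which is one way to see why every sub-model of the GTR can be reached from the same handful of settings.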
Configuring the Analysis
- Initial random seed value text field: An initial seed value for the random number generator is entered here (see bottom of the window shown in Figure 3).
(A) General Tab -
- Positive selection button: This button must be selected in the Analysis Type box to perform an analysis of positive selection.
- Excluded/Included character tables: These tables and the associated buttons, Include and Exclude, are used to include or exclude characters from the analysis. The number of characters in the Included table must be divisible by 3 and the program assumes that the first character in the list is the nucleotide at the first position of the first codon, second is the second position of the first codon, etc.
- Codons switch: This switch must be activated for an analysis of positive selection. If the switch is left as Singleton the analysis will report an error.
- Save individual statistics to file switch: Activating this switch and using the Set button to define an output file causes the selection statistics for each replicate to be written to that file. See here for more details on the statistics.
- Save summary statistics to file switch: Activating this switch and using the Set button to define an output file causes a summary of the selection statistics, averaged across replicates, to be written to that file. See here for more details on the statistics.
- Save mutational maps to file switch: By activating this switch and using the Set button to define a file to write the output, each mutational codon map will be written to the file.
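The rule described above for the Excluded/Included character tables (the number of included characters must be divisible by 3, and characters are read in order as codon positions 1, 2, 3) can be sketched as follows. The function name and the use of plain character numbers are assumptions for illustration only.

```python
def group_into_codons(included):
    """Split an ordered list of included characters into codon triplets."""
    if len(included) % 3 != 0:
        raise ValueError(
            f"{len(included)} included characters: count must be divisible by 3"
        )
    # Characters are taken in list order: 1st, 2nd, 3rd position of each codon.
    return [tuple(included[i:i + 3]) for i in range(0, len(included), 3)]


# Characters 4-6 were excluded, so the second codon is formed from 7, 8, and 9.
print(group_into_codons([1, 2, 3, 7, 8, 9]))
```

Note that excluding a number of characters that is not a multiple of 3 from the middle of an alignment shifts the reading frame of everything after it, so exclusions are best made in whole codons.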
(B) Sampling Tab -
- Number of samples text field: Defines the number of mutational maps sampled for each tree and parameter combination.
- Perform predictive sampling switch: Selecting this switch activates the predictive distribution analysis of significance for the ω statistic. See here for a description of predictive densities/distributions.
- Number of predictive samples text field: Defines the number of predictive samples to perform. Generally, this should be equal to or greater than the value in the Number of samples text field, or equal to the reciprocal of the desired P-value precision (e.g., 10 samples for a precision of 1/10, 100 samples for 1/100, etc.).
- Save predictive maps to file switch: By activating this switch and using the Set button to define a file to write the output, each predictive mutational codon map will be written to the file.
- Save predictive statistics to file switch: By activating this switch and using the Set button to define a file to write the output, each of the posterior expectations of statistics is saved to the output file. These values can be used to test for significance of different aspects of selection captured by the different statistics that are not calculated by the program.
- Trees box: This box allows the user to define which trees in the input configuration file to use. Two options exist. First, Use all trees. Second, Use trees numbered ... to ... allows the user to define a subset of trees from the input configuration file.
- Model parameters box: The options in this box allow the user to define which parameters to use and how they are paired with the trees defined in the Trees box. First, you can Use all model parameters. Second, you can Use parameters ... to ..., which allows the user to define a subset of the parameters from the input configuration file. Lastly, if the number of defined parameters equals the number of trees, you can select Link parameter order to tree order; if the numbers differ, this option will return an error. If the link option is not selected, the simulations will evaluate each parameter against each tree. Please be aware that this may result in very lengthy analyses.
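The difference between linking parameter order to tree order and evaluating every parameter against every tree is easiest to see by counting the resulting combinations. The sketch below is illustrative only; the function names are hypothetical, and it assumes that the Number of samples value applies once per tree and parameter combination, as described for that text field.

```python
from itertools import product


def plan_runs(n_trees, n_params, link=False):
    """Return the (tree, parameter set) pairs that would be simulated."""
    trees = range(1, n_trees + 1)
    params = range(1, n_params + 1)
    if link:
        if n_trees != n_params:
            raise ValueError("linking requires equal numbers of trees and parameters")
        return list(zip(trees, params))  # tree i is paired with parameter set i
    return list(product(trees, params))  # every parameter set against every tree


def total_maps(pairs, n_samples):
    """Total number of mutational maps sampled across all combinations."""
    return len(pairs) * n_samples


linked = plan_runs(100, 100, link=True)
crossed = plan_runs(100, 100)
print(total_maps(linked, 10), total_maps(crossed, 10))
```

For 100 trees and 100 parameter sets the unlinked design requires 100 times as many combinations as the linked one, which is why unlinked analyses can become very lengthy.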
Running the Analysis
To run the analysis select Analysis->Run Analysis... (Cmd-R). At this point you should observe the progress indicator letting you know how long before the run will be finished. You may want to get a cup of coffee while it runs.
Reviewing the Results
The results are displayed in the Positive Selection Statistics window (this can be opened by selecting Statistics->Codons->Positive Selection Statistics... in the main menu) or in the output files selected during the analysis (see above).
For a more detailed description of the statistics written to the output files and to the Positive Selection Statistics window see here and here for information from the tutorial.