Tutorial 4: Positive Selection
In this tutorial we will use the sample file called dna.xml. The sample files are contained in the Sample Files folder included with the software distribution. Red dots with lines in the images indicate the areas referred to.
A. OPEN THE FILE: (Figure 1) Open the file by selecting File->Open..., navigating to the Sample Files folder, and opening the dna.xml file. Once the file has been read you should see the following window open (Figure 1):
Close the window (Cmd-W). In this tutorial we will be examining all the characters except the last three, which encode the stop codon.
B. EXCLUDE STOP CODONS: (Figure 2) The next step is to exclude the last three nucleotides, which encode the stop codon. SIMMAP 1.5 does not support the analysis of stop codons (please see information on the genetic code here). We will be using the Universal (default) genetic code in this tutorial. Characters can be excluded from the analysis in two ways. The first is to select Data->Include-Exclude Characters.... The following window should appear (Figure 2):
Select characters 769, 770, and 771 in the Included table and select the Exclude button. Close the window. Using this approach you have excluded the characters from the data matrix, and when you quit you will be asked whether you wish to save the modified data matrix.
Alternatively, if you wish to leave the data matrix alone you can exclude the characters only from this analysis in the Analysis window. This approach will be discussed below.
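To see why characters 769, 770, and 771 are the ones excluded, here is a minimal Python sketch that reports the site numbers of a terminal stop codon under the Universal genetic code. It is an illustration only and not part of SIMMAP; the function name trailing_stop_sites and the example sequence are hypothetical, and the sequence is assumed to be in frame starting at site 1.

```python
# Stop codons in the Universal (standard) genetic code.
UNIVERSAL_STOPS = {"TAA", "TAG", "TGA"}


def trailing_stop_sites(sequence):
    """Return the 1-based site numbers of a terminal stop codon, or [] if none.

    Assumes the sequence is in frame, i.e. site 1 is the first position
    of the first codon (hypothetical helper, not a SIMMAP function).
    """
    seq = sequence.upper().replace("U", "T")
    if len(seq) < 3 or len(seq) % 3 != 0:
        return []
    if seq[-3:] in UNIVERSAL_STOPS:
        n = len(seq)
        return [n - 2, n - 1, n]
    return []


# Made-up 771-nucleotide sequence: 256 sense codons followed by TAA.
example = "ATG" * 256 + "TAA"
print(trailing_stop_sites(example))
```

For a 771-nucleotide coding sequence that ends in a stop codon, the sites returned are the last three characters of the matrix, which are the ones excluded in the step above.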
C. OPEN CONFIGURE MODELS: (Figure 3) Open the Models window by selecting Analysis->Configure Model... (Cmd-1).
(1) Select the Nucleotide tab if not already selected (see Figure 3).
(2) In this tutorial we will be using the model parameters in the input file. To use these, select the Use model definitions in the control file radio button. Alternatively, you could define a fixed model by selecting the Manual configuration radio button and then selecting the parameterization of the model.
D. CLOSE THE MODELS WINDOW: At this point we are finished configuring the models for the characters so go ahead and close the window.
E. CONFIGURE ANALYSIS - GENERAL TAB: (Figure 4) Open the Analysis window by selecting Analysis->Configure Analysis... (Cmd-2).
(1) Select the General tab if not already selected (see Figure 4).
(2) Since we are interested in an analysis of positive selection select the Positive selection radio button (see Figure 4).
(3) SIMMAP 1.5 by default analyses molecular data as nucleotides. In order to perform an analysis of positive selection the data needs to be defined as Codons (see Figure 4).
(4) For the sake of time we will be analyzing only three codons in this tutorial. In this step exclude all of the sites except 4-12 (these correspond to codons 2, 3, and 4). Select all of the sites in the Included table using Cmd-A and then click on sites 4-12 using the mouse with the Cmd key held down, so that only those sites are deselected. Select the Exclude button. Notes on coding in SIMMAP 1.5: 1) the number of characters included in the analysis must be divisible by three, 2) the included characters must not contain stop codons, and 3) the program assumes the first nucleotide in the included list is the first position of the first codon.
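The relationship between included sites and codons can be sketched in Python as follows. This is an illustration of the coding notes above and not SIMMAP code; the function names are hypothetical, and the frame check assumes the original data matrix begins at the first position of codon 1.

```python
def codon_number(site):
    """Codon index (1-based) containing a 1-based nucleotide site."""
    return (site - 1) // 3 + 1


def sites_to_codons(included_sites):
    """Group included sites into codon triplets after two basic checks.

    Hypothetical helper. Stop codons are not checked here because that
    requires the sequence data itself.
    """
    sites = sorted(included_sites)
    if len(sites) % 3 != 0:
        raise ValueError("number of included sites must be divisible by three")
    if (sites[0] - 1) % 3 != 0:
        # The first included site is treated as codon position 1, so
        # starting elsewhere would shift the reading frame.
        raise ValueError("first included site is not a first codon position")
    return [tuple(sites[i:i + 3]) for i in range(0, len(sites), 3)]


included = range(4, 13)  # sites 4-12, as in this tutorial
codons = sites_to_codons(included)
print(codons)
print([codon_number(c[0]) for c in codons])
```

With sites 4-12 included, the triplets are (4, 5, 6), (7, 8, 9), and (10, 11, 12), which are codons 2, 3, and 4.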
F. CONFIGURE ANALYSIS - SAMPLING TAB: (Figure 5)
(1) Select the Sampling tab (see Figure 5).
(2) Next we need to set the number of samples. Change the value in the Number of samples text box to 10 (see Figure 5).
(3) Since we would like to know whether the coding sequence and codons within the sequence are under positive selection we need to determine whether the statistics inferred could have arisen by chance (i.e., a neutral or purifying selective process), so we will be performing a predictive analysis. To perform a predictive analysis select the Perform predictive sampling radio button (see Figure 5).
(4) Since we are going to perform a predictive analysis we need to determine the number of predictive samples. Change the value in the Number of predictive samples text box to 10 (see Figure 5).
(5) To perform a fully hierarchical Bayesian analysis (using trees and parameters from a program such as MrBayes) the parameters and trees should be linked, as they are draws from the joint posterior distribution (see Figure 5).
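The idea of linking can be illustrated with a small Python sketch: the i-th tree is always paired with the i-th parameter sample, because both came from the same draw of the joint posterior. The function name, tree labels, and parameter values below are hypothetical placeholders, and the sketch says nothing about how SIMMAP stores or reads these samples internally.

```python
def linked_draws(trees, parameters):
    """Pair each tree with the parameter sample from the same posterior draw."""
    if len(trees) != len(parameters):
        raise ValueError("linked samples need one parameter set per tree")
    return list(zip(trees, parameters))


# Hypothetical posterior samples, listed in the order they were drawn.
trees = ["tree_1", "tree_2", "tree_3"]
parameters = [{"kappa": 2.1}, {"kappa": 1.8}, {"kappa": 2.4}]

for tree, params in linked_draws(trees, parameters):
    print(tree, params)
```

Pairing trees and parameters independently would instead treat them as draws from separate distributions and discard any correlation between them.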
A number of other options are available. Please see the documents here describing saving data during this type of analysis and all the other options that are available.
G. CLOSE THE ANALYSIS WINDOW: At this point we are finished configuring the analysis so go ahead and close the window.
H. START THE ANALYSIS: We are now ready to run the analysis. This can be done by selecting Analysis->Run Analysis... (Cmd-R). At this point you should observe the progress indicator letting you know how long before the run will be finished. You may want to get a cup of coffee while it runs.
I. REVIEW THE RESULTS: (Figure 6) Now that the analysis is complete let's look at the results by opening the Positive Selection Statistics window by selecting Statistics->Codons->Positive Selection Statistics....
The following statistics are reported for each codon (identified by the three nucleotide site numbers) from left-to-right:
(1) N: The sample size, i.e., the number of mutational maps sampled for each codon.
(2) Ka: The posterior expectation of non-synonymous changes.
(3) Ks: The posterior expectation of synonymous changes.
(4) dn: The posterior expectation of the rate of non-synonymous changes.
(5) ds: The posterior expectation of the rate of synonymous changes.
(6) ω: The posterior expectation of the rate ratio of non-synonymous/synonymous changes.
(7) P-value: The predictive probability that the observed value of ω could have arisen simply as the result of the mutational process.
At the bottom of the window the gene-wide estimates for these statistics are reported.
These results can be saved by selecting Save Results... and then choosing a filename and location.
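One common way to compute a predictive P-value of this kind is as the fraction of predictive samples whose statistic is at least as large as the observed one. The Python sketch below uses that definition as an assumption; it is not taken from the SIMMAP source, and the function name and the ω values are made up for illustration.

```python
def predictive_p_value(observed_omega, predictive_omegas):
    """Fraction of predictive samples with omega >= the observed omega.

    Assumed definition for illustration; SIMMAP's exact calculation
    may differ.
    """
    if not predictive_omegas:
        raise ValueError("at least one predictive sample is required")
    exceed = sum(1 for w in predictive_omegas if w >= observed_omega)
    return exceed / len(predictive_omegas)


# Made-up values: one observed omega and 10 predictive samples.
observed = 1.8
predictive = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.9, 2.5]
print(predictive_p_value(observed, predictive))
```

Under this definition, the 10 predictive samples used in this tutorial can only produce P-values in steps of 0.1, so a real analysis would normally use many more samples than this time-saving setting.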
More information on the positive selection statistics can be found here.