
Input Files

The full hierarchical Bayesian treatment requires three files for the analysis of molecular data and two files for morphological data: a data file, a trees file, and (for molecular data only) a parameter file. SIMMAP can also be used with only a data file and a tree file (or a data file containing a trees block after the data block) to explore character evolution. For example, an empirical analysis might use the maximum-likelihood tree and MLE model parameter estimates. The data file should contain a DNA or morphological data matrix. The trees file should contain a trees block with trees in the Newick format, preceded by a translate block (taxon names embedded within the Newick tree representation are not allowed).
Once a file has been opened, more information about it can be viewed by selecting Show File Inspector from the Window menu, or by clicking the More Info buttons in the Data View and Tree View windows. The File Inspector window (shown to the right) displays four tabs (Data, Trees, Histories, and Parameter) with information describing each of the different files read by SIMMAP.

The next section describes the specifics of each type of file.

Specifics on each file type:

Data File
A data file can be opened by selecting Open Data... in the File menu. The data file should be a text file that contains the matrix of characters to analyze (SIMMAP uses the file extensions .nex and .mcmc to identify data files). The file should conform to the basics of the Nexus format standard. The following is an example of the minimum format requirements of a data file.

#NEXUS

begin data;
   dimensions ntax=4 nchar=5;
   format datatype=dna interleave=no;
      matrix
      cow   ACGTT
      horse ACGTA
      Human AGGAA
      whale ACGCC;
end;

Currently, the data matrix cannot be interleaved. However, the file can contain spaces interspersed within the sequence. In addition, the symbols '?', '-', and 'N' are treated as uncertain, and all possible states have equal probability when calculating conditional likelihoods. The '.' character can be used as an abbreviation for the last state fully defined in the column.

Some of the standards of the Nexus data block format are NOT compatible with SIMMAP. As noted above, the data file cannot be interleaved. SIMMAP does not accommodate the use of symbols when using standard or morphological characters. The acceptable data type tags are: dna, rna, and standard. Currently, SIMMAP does not model amino acid characters, although this is slated for future development.
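The data file constraints above can be illustrated with a small reader. This is only a sketch of the minimal, non-interleaved format shown in the example, not SIMMAP's own parser. The function name `read_matrix` is hypothetical, and the sketch assumes one taxon per line and no Nexus comments inside the matrix.

```python
import re

def read_matrix(nexus_text):
    """Return {taxon: sequence} from a minimal, non-interleaved data block.

    Hypothetical helper for illustration; assumes one taxon per line.
    """
    body = re.search(r"\bmatrix\b\s+(.*?);", nexus_text,
                     re.IGNORECASE | re.DOTALL)
    if body is None:
        raise ValueError("no matrix command found")
    matrix = {}
    for line in body.group(1).splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token is the taxon name. The remaining tokens are chunks
        # of the sequence, so spaces interspersed in it are joined away.
        matrix[parts[0]] = "".join(parts[1:])
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length")
    return matrix
```

Called on the example file above, this would return a dictionary with four entries, one per taxon, each holding a five-character sequence.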


Trees File
A tree file can be opened by selecting Open Trees... in the File menu, or by including a trees block in the data file. The trees block must follow the data block. The tree file is a text file in standard Nexus format with a translate command followed by a listing of trees. Currently, SIMMAP requires that the file bear one of the following extensions: *.t, *.tree, *.trees, *.tr, and *.tre (this constraint will be relaxed in some future version of SIMMAP). Taxon names in the Newick tree representation are NOT permitted and the translate command must use integers to identify taxa in the trees. The following is an example of the format requirements of the trees file.

#NEXUS

begin trees;
   translate
      1 cow,
      2 horse,
      3 Human,
      4 whale;
   tree 1 = ((1:0.1,2:0.1):0.1,(3:0.1,4:0.1));
   ...
   ... [More Trees]
   ...
   tree 10000 = (((1:0.1,2:0.1):0.1,4:0.1):0.1,3:0.1);
end;
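The requirement that trees identify taxa by integers from the translate command can be checked before a file is opened. The following is a minimal sketch under stated assumptions: `check_tree_labels` is a hypothetical helper, the translate table is passed in as a Python dictionary, and the Newick string contains no quoted labels or comments.

```python
import re

def check_tree_labels(newick, translate):
    """Return tip labels that are not integer keys of `translate`.

    Hypothetical helper; an empty list means every tip is a known integer.
    """
    # A tip label is the token that directly follows "(" or ",".
    # Internal branch lengths follow ")" and are therefore not captured.
    labels = re.findall(r"[(,]\s*([^():,;\s]+)", newick)
    return [lab for lab in labels
            if not (lab.isdigit() and int(lab) in translate)]

translate = {1: "cow", 2: "horse", 3: "Human", 4: "whale"}
print(check_tree_labels("((1:0.1,2:0.1):0.1,(3:0.1,4:0.1));", translate))
print(check_tree_labels("((cow:0.1,2:0.1):0.1,(3:0.1,4:0.1));", translate))
```

The first call reports no problems, while the second flags the taxon name `cow` that appears directly in the tree.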

A full Bayesian analysis is one in which a particular character history is not conditioned on a fixed topology or set of parameter values but is averaged across them in proportion to their probability. The tree file should then contain a sampling approximation of the posterior distribution of trees and branch lengths. This can be obtained using a program such as MrBayes, BEAST, or BAMBE, which use MCMC to sample the posterior distribution.

Often, not all of the trees in a file are desired. For example, if the file contains the output of an MCMC analysis, it may be desirable to exclude a number of the initial trees as burnin. This can be accomplished when opening a tree file (read more below) or at the time of analysis (see the help documents on simulating histories). Reading trees that will not be used consumes memory and should be avoided on machines with limited memory capacity.

To exclude an initial series of trees (the burnin sample) from the file, the following options are available in the Open Trees File window.

  1. All Trees
  2. Read Trees [Enter Number] through [Enter Number] (the second value can be left blank, indicating that the user wishes to read all trees from the start value to the end of the file)
  3. Use branch lengths (if present) check box
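The "Read Trees ... through ..." option can be mimicked outside SIMMAP when preparing a file. The sketch below is an illustration only, with hypothetical function names. It assumes each tree occupies a single line beginning with the word tree, and it uses 1-based, inclusive tree numbers with a blank end value represented by `None`. Trees outside the range are skipped as they are read, so they are never stored.

```python
from itertools import islice

def iter_tree_lines(lines):
    """Yield only the lines that hold a tree statement."""
    for line in lines:
        if line.strip().lower().startswith("tree "):
            yield line.strip()

def read_tree_range(lines, first=1, last=None):
    """Keep trees numbered `first` through `last` (1-based, inclusive).

    Leaving `last` as None keeps every tree through the end of the file.
    """
    if first < 1:
        raise ValueError("first must be 1 or greater")
    return list(islice(iter_tree_lines(lines), first - 1, last))
```

For example, `read_tree_range(open_file, first=2501)` would discard the first 2500 trees as burnin, where `open_file` is any iterable of lines.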

If branch lengths are not present in the file, or if the user selected to ignore them, a window will be displayed (shown to the right below).
The following options are available:
  1. Fixed lengths - a fixed length, v, is assigned to each branch.
  2. Fixed tree height - a fixed height, T, is assigned to each tree in which it defines the distance from each tip node to the root. The resulting tree is ultrametric.
  3. Random uniform - branch length values are assigned by being randomly drawn from a uniform distribution on an interval, U[x,y], determined by the user (the interval must be positive). The user can select to sort the values and assign the values by forcing small or large values to the tips.
  4. Random exponential - branch length values are assigned by being randomly drawn from an exponential distribution defined by the parameter, λ, determined by the user (the parameter must be positive). The user can select to sort the values and assign the values by forcing small or large values to the tips.
  5. Birth-death (Rannala & Yang, 1996) - branch length values are assigned by sampling from the birth-death method described by Rannala and Yang (1996). The birth-death distribution is described by four parameters: the speciation rate, λ, the extinction rate, μ, the taxon sampling rate, ρ, and the tree depth or height, T. The speciation rate, taxon sampling rate, and the tree depth must be >0.
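The fixed, random uniform, and random exponential options above can be sketched as follows. This is an illustration and not SIMMAP's implementation: `draw_branch_lengths` and its argument names are hypothetical, λ is assumed to be the rate of the exponential distribution (mean 1/λ), and "the interval must be positive" is read as 0 <= x < y. The fixed tree height and birth-death options, and the sorting of values toward the tips, are not covered.

```python
import random

def draw_branch_lengths(n, method, rng=None, **params):
    """Draw `n` branch lengths with one of three simple schemes."""
    rng = rng or random.Random()
    if method == "fixed":
        # The same length v is given to every branch.
        return [params["v"]] * n
    if method == "uniform":
        x, y = params["x"], params["y"]
        if not 0 <= x < y:  # assumed reading of "positive interval"
            raise ValueError("need 0 <= x < y")
        return [rng.uniform(x, y) for _ in range(n)]
    if method == "exponential":
        lam = params["lam"]
        if lam <= 0:
            raise ValueError("lambda must be positive")
        # expovariate takes the rate, so the mean length is 1 / lam.
        return [rng.expovariate(lam) for _ in range(n)]
    raise ValueError("unknown method: " + method)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.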

Histories File
A raw histories file generated by SIMMAP can be read into SIMMAP for visual exploration. This topic is dealt with in more depth on the saving histories help documents. Briefly, the file is a basic Nexus file containing a list of Newick formatted trees. The following is an example of a tree histories file.

#NEXUS

begin smptrees;
   translate
      1 cow,
      2 horse,
      3 Human,
      4 whale;
   tree ctree[1] = ((1:{A,0.1},2:{A,0.1}):{A,0.05:C,0.05},(3:{C,0.1},4:{C,0.1}));
   ...
   ... [More Trees]
   ...
   tree ctree[1] = ((1:{A,0.1},2:{A,0.1}):{A,0.05:C,0.05},(3:{C,0.1},4:{C,0.1}));
end;

The primary differences from a trees file are the ctree[] tag following the tree tag and the splitting of the branch length into a character history enclosed in curly braces, { }. First, the ctree[] tag indicates that the tree to follow includes character history information instead of branch length values. The values in brackets, [ ] (see above), indicate the character number that was mapped if the data are molecular. If the data are morphological, they indicate the character number, tree number, number of states, bias rate parameter (if there are more than two character states this value is 1/n states), and gamma rate parameter. Second, where the branch length is normally represented, the character history is enclosed in curly braces; e.g., {A,0.05:C,0.05} represents a top-down (left-right) coding of the character history along the branch. For additional information see the help pages on saving character histories.
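The curly-brace coding of a single branch can be unpacked with a few lines of code. The sketch below is based only on the example shown above, in which segments are separated by colons and each segment is a state followed by a comma and a length. The function name `parse_branch_history` is hypothetical.

```python
def parse_branch_history(text):
    """Split one branch history into (state, length) segments.

    Segments are returned in the order written, i.e. top-down
    along the branch.
    """
    inner = text.strip().strip("{}")
    segments = []
    for piece in inner.split(":"):
        state, length = piece.split(",")
        segments.append((state.strip(), float(length)))
    return segments

history = parse_branch_history("{A,0.05:C,0.05}")
total_length = sum(length for _, length in history)
n_changes = len(history) - 1
```

Here `history` holds two segments (state A followed by state C), `total_length` recovers the ordinary branch length, and `n_changes` counts the state changes along the branch.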

Parameters File
A parameter file can be opened by selecting Open Parameters... in the File menu. This file is a whitespace-delimited (space, tab, etc.) list of parameters in text format. The first line must be a header row (also whitespace delimited). The header is not case sensitive; pi(A) is equivalent to PI(A). Every subsequent row should contain a value for each header value. The file cannot contain values for multiple models. A comment block is allowed as long as it is contained within an opening square bracket and a closing square bracket (e.g., [comment here]) and occupies the first line only.

You can include a header for each parameter of the GTR and then set some of the values to 1.0 to create a sub-model of the GTR. (While all values from the file are read into memory for each row the actual model used can be configured such that only certain column values are used while others are fixed.)

The default header values in SIMMAP are those from MrBayes v3.0 (shown below). If these are not the desired headers each parameter can be assigned a unique string by selecting Parameters Headers... in the Models menu. For additional information see the help documents on molecular parameters.

Default header/parameter association:

Parameter          Header
Freq A             pi(A)
Freq C             pi(C)
Freq G             pi(G)
Freq T             pi(T)
Kappa              kappa
Rate A to C        r(A<->C)
Rate A to G        r(A<->G)
Rate A to T        r(A<->T)
Rate C to G        r(C<->G)
Rate C to T        r(C<->T)
Rate G to T        r(G<->T)
Alpha              alpha
P-Inv Sites        p-inv
Site Specific 1    m{1}
Site Specific 2    m{2}
Site Specific 3    m{3}
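The parameter file layout described in this section (an optional bracketed comment on the first line, a case-insensitive header row, then one whitespace-delimited row of values per sample) can be read with a short routine. This is a sketch under those stated rules rather than SIMMAP's own reader. The function name `read_parameters` is hypothetical, and every value is assumed to be numeric.

```python
def read_parameters(text):
    """Return one {header: value} dictionary per data row."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # An optional comment is assumed to occupy the whole first line.
    if lines and lines[0].startswith("[") and lines[0].endswith("]"):
        lines = lines[1:]
    if not lines:
        raise ValueError("no header row found")
    # Headers are not case sensitive, so store them in lower case.
    header = [h.lower() for h in lines[0].split()]
    rows = []
    for ln in lines[1:]:
        values = [float(v) for v in ln.split()]
        if len(values) != len(header):
            raise ValueError("row does not match header: " + ln)
        rows.append(dict(zip(header, values)))
    return rows
```

With this sketch a column headed PI(A) and one headed pi(A) both end up under the key `pi(a)`.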
Page Last Updated: 6 August 2008