Detailed Example: Searching for the ML tree (i.e., "normal" searches)
Doing the searches (and knowing when to stop)
The usual goal of ML phylogenetic searches is to find the tree topology with the best likelihood. But, the only way to have any confidence that you've found it is by doing multiple search replicates and finding the same best tree multiple times (or at least multiple trees with very similar scores). If you want to know which is the best scoring tree across multiple runs, the easiest thing to do is to look at the log-likelihood scores that appear at the end of each of the .screen.log file and compare them across searches.
Running the program multiple times will often be necessary (and running multiple search replicates is ALWAYS necessary, either within a single program execution or across multiple). Running the program multiple times simultaneously can be an efficient way of using a computer with multiple cores or processors (see discussion here). The following describes the entire procedure of doing a set of searches and examining the results, and it assumes that the program was run multiple times. If the program was only run once with a number of search replicates, then much of the following is simplified.
Let's follow a series of example analyses:
Dataset : 150 taxon, 3188 column rRNA alignment
I started with a two-replicate search using the default program settings. Once it completed (this took about 70 minutes total on my laptop), the information that I want to look at is at the very end of the .screen.log file:
Completed 2 replicate runs (of 2). Results: Replicate 1 : -69675.8748 Replicate 2 : -69649.0250 (best) Final result of the best scoring rep (#2) stored in arb150.default1.best.tre Final results of all reps stored in arb150.default1.best.all.tre
The rather different scores and the fact that the output doesn't say "same topology" indicate that a different final tree was found by each of the two replicates. This is not particularly surprising for a dataset of this size, and doesn't necessarily indicate that anything is wrong with the program or its settings. This most likely means that the two replicates ended up at different tree "islands" or local optima, and that once it has reached either of them it cannot leave by simple branch swapping. This is a fact of life with any heuristic search algorithm, and can just as easily happen with other optimality criteria such as parsimony.
Even if the scores of both of the search replicates were the same, we would really need to do more searches to be confident that we've found the best trees. Clearly in this case we need to do more searches since the results are not the same, so I ran the program again with another two search replicates. The end of the .screen.log for run #2:
Completed 2 replicate runs (of 2). Results: Replicate 1 : -69649.0259 Replicate 2 : -69649.0245 (best) (same topology as 1) Final result of the best scoring rep (#2) stored in arb150.default2.best.tre Final results of all reps stored in arb150.default2.best.all.tre
This notes that the same final topology was found in both searches. The log-likelihood scores are very similar, which should always be the case when the same tree is found in multiple replicates.
A few things to note:
- identical trees will always give very similar scores (actually, exactly identical scores when branch lengths and parameters are fully optimized)
- but, very similar scores do not necessarily indicate that two trees are identical.
In this case the second replicate shouldn't be considered better in any important sense, despite the very slightly better score. The trees are the same, which is all that matters. Also note that the scores of both replicates in the second run are very similar to that of the second replicate of the first run. This doesn't necessarily mean that these three searches gave the same tree, but it is fairly probable. We'll see how to verify whether they really are the same tree later. We've gotten the same score in 3 out of 4 searches, which does give us some confidence that that is the score of the best tree or trees.
Still, we'll do another two replicates. Run #3 output:
Completed 2 replicate runs (of 2). Results: Replicate 1 : -69652.3843 Replicate 2 : -69649.0261 (best) Final result of the best scoring rep (#2) stored in arb150.default3.best.tre Final results of all reps stored in arb150.default3.best.all.tre
Again two different scores, but one of them is very similar to our previous best scores, so that is good. Another two searches:
Completed 2 replicate runs (of 2). Results: Replicate 1 : -69652.3825 Replicate 2 : -69649.0275 (best) Final result of the best scoring rep (#2) stored in arb150.default4.best.tre Final results of all reps stored in arb150.default4.best.all.tre
These look very similar to what we got in run #3. We got one more count of the best score that we know of.
A summary of the results of our eight searches across four program executions:
- 5 searches gave a score of about -69649.03. This is the best score that we know of, and based on the output of run #2 we know that at least two of the five are identical in topology.
- 2 searches gave a score of about -69652.38. This is probably a local optimum that searches can be trapped in.
- 1 search gave a score of -69675.8748. Another local optimum.
I would feel confident that we've done a good job of searching for the best tree for this dataset at this point. This is entirely subjective. Returning the same best score four or five times is good, regardless of how many replicates it takes. If computational time is not a limiting resource, then more search replicates will never hurt. See below for a discussion of what to do with problematic datasets.
The above example dataset is fairly large, and we saw evidence of local optima in treespace. However, in a fairly small number of searches we were able to find trees with very similar scores several times (and as we will see below, they also have identical topologies). However, some datasets may not perform so well. Repeatability of results across search replicates is the primary indicator of whether a heuristic method is doing well, and determines how many search replicates is "enough"
A few rules of thumb and things to keep in mind:
- Some (usually very large) datasets are very poorly behaved, and eight search replicates might result in eight different likelihood scores. In this case there are a few options, listed from best to worst. Which you choose partly depends on how much time and/or computational resources you have.
- Keep doing search replicates until the best score that you've seen across replicates is found at least twice. Ideally the topologies should be identical as well.
- Keep doing search replicates until you haven't found a new better scoring tree in some number of replicates. For example, if you've done 20 replicates and the best score you've seen is X, run another 5 replicates. If you don't find a score better than X, stop. If you do, update X to the new best value and run another 5 replicates. Etc.
- Search repeatability is correlated with the number of sequences in a dataset, but there are many examples in which datasets of 60 sequences turn out to be much harder than others four or five times their size. The number of searches that was adequate on one dataset may be insufficient for another. Decisions should only be made on the basis of actual search results.
- Features of datasets that seem to worsen search repeatability are:
- The presence of many very similar (or identical) sequences
- Low phylogenetic signal / low sequence divergence
- There isn't much that can be done about these, although it is best to condense identical sequences to a single exemplar.
- In some cases the results of different search replicates will be essentially identical in score (+/- 0.05 lnL) but different in topology. This is usually due to branches of zero length (GARLI's minimum is actually 1.0e-8), which itself is often due to very similar or identical sequences. The trees will look identical by eye because the zero length branches are indistinguishable from a polytomy. For the purposes of knowing when you've done enough searches, these differences across replicates can be ignored.
Following the series of eight search replicates across four executions of the program discussed above, we now have results strewn across a number of files. We would like to collect them for further investigation. Depending on what you want to do, there are many ways to go about this. Here I discuss several things that one might want to do with the results and the ways that I would go about them. Note that much of this would be simpler had we run the program once and specified eight search replicates, but we didn't really know in advance how many replicates we would need.
Note that you don't HAVE to do any of the following. You can just figure out which is the best scoring replicate and across runs/searches and if you have gotten it multiple times you can take it as your Maximum Likelihood tree.
First we probably want to compare the trees returned by each of the separate runs and get them into a single tree file. The final trees were stored in two files per run:
- The best tree across all replicates of a run is saved to the .best.tre file (i.e., only one tree appears regardless of the number of search replicates)
- The best tree found by every individual search replicate is saved to the .best.all.tre file
Post-processing with PAUP*
I usually use PAUP* to collect the trees and compare them. Note that I use the command line version and don't have a graphical version to even try this on, so you'll have to figure out how these commands translate to the menus if you use a graphical version. Even if you are using a graphical version, you can always type these commands into it. I would do the following, all in PAUP*
Loading the trees
- Start PAUP and execute your NEXUS datafile
- Execute one of the .best.all.tre files:
- Note that there is a PAUP block in the .best.tre files that does a number of handy things when it is executed, including setting and fixing GARLI's parameter estimates and telling PAUP to store and fix the branch lengths estimated by GARLI. This also loads the trees that are contained in that file. Note that the parameter values that are loaded are only from one of the GARLI search replicates, but the values should be very similar for all of them.
- Now use the gettrees command to load trees from each of the other .best.all.tre files. i.e.:
gettrees file= arb150.default2.best.all.tre mode=7;
- Note that you need to specify the "mode=7" to tell PAUP to keep the trees that are already in memory when loading more. If you mess up, you can use the "clear" command in PAUP to remove all trees from memory and start again.
Comparing the trees
- (OPTIONAL) Type "lscore". This tells PAUP to score the trees in memory, and it has already been told above to use GARLI's estimates of all of the parameters. PAUP will very quickly display log-likelihood scores that very nearly match those output by GARLI for each of the search replicates.
- (OPTIONAL) Type :
- This tells PAUP to score the trees again, this time estimating branch lengths itself. The scores you see will usually be slightly better than those output by GARLI, but typically only by 0.01 lnL or less.
- Now to compare the trees. I would use the Symmetric Difference (Robinson-Foulds) tree distance metric, which you get in PAUP with
- This displays the pairwise distances between trees. I won't go into exactly what this distance metric means, but a distance of zero indicates that trees are identical and larger values mean that the trees are less similar. The maximum distance for a dataset of N sequences is 2 * (N - 3). For our above example the important part of the output is this:
Symmetric-difference distances between trees 1 2 3 4 5 6 7 8 1 - 2 46 - 3 46 0 - 4 46 0 0 - 5 24 28 28 28 - 6 46 0 0 0 28 - 7 24 28 28 28 2 28 - 8 46 0 0 0 28 0 28 -
- After starting at that for a bit, we can see that trees 2, 3, 4, 6 and 8 are identical. Those correspond to exactly the same five replicates that gave the best score of about -69649.0261, so that is good news.
- If we just wanted to compare the best tree from each run, instead of from each replicate, we could go through the above procedure except with the .best.tre files instead of the .best.all.tre files. In that case the tree distance output would be:
1 2 3 4 1 - 2 0 - 3 0 0 - 4 0 0 0 -
- indicating that the best of the two replicates of each of the four runs gave exactly the same tree.
- If the program was only run once, then the output will already indicate which search replicates resulted in identical trees. However, it does not give a quantitative of how similar the trees are.
Saving the trees
If you've done the above to load the trees into PAUP, now we might want to save the best tree(s) per run, per replicate or across all runs/replicates to a single file. Note that if you only did one run with multiple search replicates, you don't need to do this since the best overall tree is already in the .best.tre file, and the best trees from each replicate are in the .best.all.tre file.
- First we might want to specify an outgroup so that PAUP orients the trees appropriately when saving them (even if they are oriented correctly in the file that PAUP loads them from, they will be oriented with the default outgroup of taxon 1 when saved). Type something like:
outgroup 1-3 6
- The outgroup could also be specified in a PAUP block in the datafile.
- Now we save the trees.
savetree file= arb.alltrees.tre brlens=user
- CAREFUL HERE! By default PAUP will save the tree with parsimony branch lengths instead of maintaining GARLI's estimates. Setting brlens=user tells it to maintain GARLI's lengths when saving. Alternatively if you want PAUP to estimate the branch lenghts when saving the trees you'll need to set the criterion to likelihood first (set crit=like) and ensure that the model parameters are set appropriately.
- If you want to only save one of the trees (i.e., the very best tree), you can do that too. If tree 3 was the one you wanted:
savetree file= arb.veryBest.tre blens=user from=3 to=3
Other techniques for collecting results
There are many other ways that you might want to explore or collect your results.
Manually combining treefiles
Instead of using PAUP to load the results of multiple runs and to then save them to a single file, you can simply manually combine them. If you open one of the .best.tre files in a text editor, you will see something like the following:
#nexus begin trees; translate 1 EscCo143, 2 EscCo145, 3 SleTyp15, 4 SlePara4, . . . 150 Pd2Occul; tree bestREP2 = [&U][!GarliScore -69649.024503][!GarliModel r 0.953187 2.396777 1.229445 0.997765 3.566751 e 0.236189 0.233233 0.279949 0.250629 a 0.718326 p 0.098152 ](1:0.00000001, (((48:0.06527213,49:0.03476198):0.44628288, ... <etc>; end; begin paup; clear; gett file=arb150.default.best.tre storebr; lset userbr nst=6 rmat=(0.953187 2.396777 1.229445 0.997765 3.566751) base=( 0.236189 0.233233 0.279949) rates=gamma shape=0.718326 ncat=4 pinv=0.098152; end;
This looks ugly, but it is not that complicated. The translate command shows the correspondence between taxon names and the numbers in the tree descriptions. The section that starts with begin paup; sets the parameter values in PAUP if this file is executed. The line that begins with tree and ends with a ; (semicolon) is the actual tree description itself. If you open each of the tree files that you want to combine, you can simply copy and paste the lines that begin with tree ... from each file one after another into a trees block in a single file. That file with whatever set of trees you chose can then be read by programs that read Nexus tree files, such as FigTree, Mesquite, PAUP, etc. The fact that multiple trees may have the same name (the bestREP2 in the example above) will not cause problems in any software that I am aware of.
If you have done many runs or ran the MPI version of the program you may have trees scattered across many files. Manually copying and pasting the trees into one file may not be a very inviting option.
If you have access to a Unix style command line (including OS X), you can very easily grab all of the trees from multiple files and copy them into a single file with one command:
grep -h ^tree *.best.tre > destFilename.tre
(or replace .best.tre with .best.all.tre if you want the results of every search replicate). Now each of the tree descriptions appear one after another in the file destFilename.tre. Just add the first part of any of the other tree files (through the end of the translate block) to the start of that file and an "end;" at the end of the file and you'll have a fully functional tree file.
Alternatively you can easily make a PAUP block that loads all of the trees. At the command line, type
for i in *.best.tre <return> do <return> echo "gett file = $i mode=7;" <return> done > myPaupBlock.nex <return>
This will make a file named myPaupBlock with commands that load all of the tree files in the current directory with a name that ends with .best.tre into PAUP. Simply execute your dataset in PAUP and then execute this file. Although it isn't strictly necessary, to be technically correct you should edit this file and add
#nexus begin paup;
at the start of the file, and
at the end.
Detailed Example: A bootstrap analysis
Doing the searches
Setting up a bootstrap analysis is not very different from doing a normal GARLI analysis. Simply set bootstrapreps to whatever number of replicates you want. Bootstrap replicates are slightly faster than normal searches, but doing 100 bootstrap replicates will still take almost 100 times longer than a single search. Therefore, it makes sense to "optimize" your search settings somewhat so that each search is intensive enough to find a good tree but takes as little time as possible.
(I'm working on a discusion of search tuning currently)
Making the bootstrap consensus
Once a number of bootstrap replicates have been completed, you'll want to calculate the bootstrap consensus tree. Unfortunately, GARLI does not do this itself. It must be done by another program such as SumTrees, PAUP*, CONSENSE of the PHYLIP package or Phyutility. Other options exist that I don't have direct experience with.
SumTrees is a nice command-line program by Jeet Sukumaran that is extremely useful for making bootstrap consensus trees from GARLI output. It requires an installation of Python 2.4 or newer (the newest Python version is 2.6, and OS X 10.5 Leopard comes with Python 2.5). Note that SumTrees replaces Jeet's previous program BootScore, which included more or less the same features with regards to bootstrap consensuses. Instructions for obtaining and installing it are available here: http://www.jeetworks.org/programs/sumtrees
The best feature of SumTrees is that it allows visualization of bootstrap values on a given tree. In other words, it allows you to place the values on a fully resolved tree obtained elsewhere (best ML tree, parsimony tree, Bayesian consensus tree, etc.), rather than visualizing them on a consensus tree itself. This is excellent, since placing the bootstrap values on a tree that you already have is often exactly what you want to do! The other nice thing that SumTrees allows is calculating a majority rule consensus that also includes branch length information.
- For a normal majority rule bootstrap consensus with SumTrees you simply provide the name of the GARLI bootstrap tree file and specify an output file:
sumtrees.py mybootrun.boot.tre --output=myconsensus.tre
- To place the bootstrap values on a given tree, for example the best tree found by GARLI during non-bootstrap searches:
sumtrees.py mybootrun.boot.tre --target=mysearch.best.tre --output=supportOnBest.tre
If you have bootstrap trees in multiple files (because you did multiple runs or used the MPI version), SumTrees can also easily handle this. Simply list the multiple files like this:
sumtrees.py mybootrun1.boot.tre mybootrun2.boot.tre <etc> --output=myconsensus.tre
Once your support values have been calculated and the output tree file has been written, you can open it in a program such as FigTree or TreeView to see the values. SumTrees has many other useful options and ways of formatting the output (support as percentages vs proportions, number of decimal places in support values, etc). See the sumtrees website or enter
for more information.
If you are familiar with PAUP* doing the consensus is easy. Simply execute your dataset, and then load the trees with the gettrees command:
gettrees file= myfilename.boot.tre; gettrees file= myfilename2.boot.tre mode=7; etc.
If you've only done a single bootstrap run, then only the first line is necessary. The "mode=7" tells PAUP to keep the trees that are already in memory when loading more from file. Otherwise it replaces them with the ones from file. Then, to do a majority rule consensus:
contree /strict=no majrule=yes treefile=mybootcon.tre
This will show the tree on the screen with all bootstrap values over 50%, but without branch length information. It will also save the consensus tree to the file "mybootcon.tre". Other things that you can add to the end of the contree command are "LE50", which tells it to include as many groupings as possible in the consensus, including those with support < 50%. Going in the opposite direction, you can increase the support cutoff for groups to appear by adding "percent=80", which would only show groups with > 80% bootstrap support.
IMPORTANT NOTE: PAUP* cannot save the tree with the support values included (unless it did the whole bootstrap itself). It also cannot save or display the consensus with branch length information included. These are the downsides of using PAUP* to calculate the consensus. If you want to do either of these things you'll need to use Sumtrees or another program.