Sorry for an uncharacteristically technical post. But I’ve produced an excellent example of a problem that’s been plaguing the widely used phylogenetics program MrBayes, and I thought it might be of interest to the handful of systematists who read this blog.

I’ve been running analyses on the Azteca y’all sent after my desperate plea last month and noticed something odd. I set MrBayes to do two runs of 4 chains each. After 10 million generations they produced post-stationarity consensus trees that were topologically identical (that is, the average standard deviation of split frequencies had fallen to 0.005 and the two trees looked the same). But the scale bars showed the two runs off by an order of magnitude in estimating how fast the DNA sequences had changed. Look:

A portion of Run 1: note the scale bars at right; each tick represents 0.01 changes/site

The corresponding portion of Run 2: an order of magnitude more evolution

In fact, the two runs agreed on nearly all parameters except for overall likelihood (Run 1, the slow one, was slightly more likely) and the rate multipliers:

PSRF (the potential scale reduction factor) should approach 1 for all parameters as analyses converge; 3 of my partitions had rate multipliers that were well off
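For anyone who wants to see what that diagnostic measures, here is a minimal sketch of the standard Gelman-Rubin calculation for a single parameter sampled in several independent runs. The function name `psrf` and the input layout are my own for illustration, and MrBayes may differ in the details of how it computes the value it reports.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    `chains` is a list of equal-length lists of post-burn-in samples,
    one list per independent run (hypothetical layout for illustration).
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    # W: average of the within-run sample variances
    within = mean([variance(c) for c in chains])
    # B: n times the sample variance of the run means
    between = n * variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


# Two runs whose rate multipliers sit an order of magnitude apart
run1 = [0.9, 1.0, 1.1, 1.0]
run2 = [9.0, 10.0, 11.0, 10.0]
print(psrf([run1, run2]))
```

When the runs sample the same distribution, the between-run term is small and the ratio sits near 1. When the runs settle on different values, as my rate multipliers did, the between-run term dominates and the PSRF climbs well above 1.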

What’s going on?

Lucky for me, this anomaly has surfaced before.   A timely paper in Systematic Biology dissects the problem:

…inaccurate branch-length estimates result from either 1) poor mixing of MCMC chains or 2) posterior distributions with excessive weight at long tree lengths. Both effects are caused by a rapid increase in the volume of branch-length space as branches become longer.
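One way to picture that "volume" effect is to look at what a branch-length prior implies about total tree length. This is my own back-of-the-envelope illustration, not a derivation from the paper. I'm assuming each of the $n$ branches gets an independent exponential prior with rate $\lambda$, and that an unrooted tree of $N$ taxa has $n = 2N - 3$ branches. The tree length $T$ is then a sum of $n$ exponentials, which follows a gamma distribution:

```latex
\[
  T = \sum_{i=1}^{n} b_i, \qquad b_i \sim \mathrm{Exp}(\lambda)
  \quad\Longrightarrow\quad
  p(T) = \frac{\lambda^{n}\, T^{\,n-1} e^{-\lambda T}}{(n-1)!},
  \qquad
  \mathbb{E}[T] = \frac{n}{\lambda}.
\]
```

The factor $T^{\,n-1}$ is the part that grows with the volume of branch-length space: the more branches there are, the more ways there are to build a long tree. As a worked example under those assumptions, a 50-taxon tree has $n = 97$ branches, so with $\lambda = 10$ the prior mean tree length is $97 / 10 = 9.7$ changes/site, even though each individual branch has a prior mean of only 0.1.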

I’ve gotten results back before from MrBayes where the rates seem oddly fast.  Since the independent runs normally converge to the same (wrong?) answer, though, I didn’t think much of it.   This Azteca example is the first I’ve seen where the runs fall out both ways.

I hope the next version (purportedly called “RevBayes”, and due out in 2010) resolves the problem.  In the meantime, it’s probably a good idea to check MrBayes rate estimates against estimates from likelihood analyses to make sure they are comparable before publishing.
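Here is a rough sketch of that sanity check: sum the branch lengths of the Bayesian consensus tree and of a likelihood tree, then compare the totals. I'm assuming both trees have been exported as plain Newick strings. The helper names `tree_length` and `length_ratio` are hypothetical, and NEXUS files with bracketed annotations would need cleaning up first.

```python
import re

# Matches the number after each ":" in a Newick string, e.g. ":0.0123" or ":1e-3"
_BRANCH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")


def tree_length(newick):
    """Total tree length: the sum of all branch lengths in a Newick string."""
    return sum(float(x) for x in _BRANCH.findall(newick))


def length_ratio(bayes_newick, ml_newick):
    """Ratio of Bayesian to likelihood tree length; values far from 1 deserve a look."""
    return tree_length(bayes_newick) / tree_length(ml_newick)


# Toy trees with identical topology but branch lengths ten times apart
ml_tree = "((A:0.1,B:0.2):0.05,C:0.3);"
bayes_tree = "((A:1,B:2):0.5,C:3);"
print(length_ratio(bayes_tree, ml_tree))
```

A ratio near 1 suggests the two methods roughly agree on the amount of change. A ratio near 10, like the gap between my two runs, is a sign that something is off with the branch-length estimates.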

source: Brown, J. M., Hedtke, S. M., Lemmon, A. R., Lemmon, E. M. 2009. When Trees Grow Too Long: Investigating the Causes of Highly Inaccurate Bayesian Branch-Length Estimates. Systematic Biology Advance Access published on December 10, 2009, DOI 10.1093/sysbio/syp081.