An example of why DNA Barcoding is still impractical for putting names to unknown samples

Check out this phylogeny from a recent paper on using DNA to identify Brazilian Pheidole:

Mitochondrial sequences taken from Pheidole in the Brazilian Atlantic Forest (blue) placed in a phylogeny with identified Pheidole from GenBank (gray). Modified from Figure 4 in Ng’endo et al. (2013).

The researchers took a large sample of Pheidole- a common, diverse ant genus found in warmer regions worldwide- from a Brazilian forest reserve, sequenced the mitochondrial COI gene, compared morphological identifications with identifications suggested by the genetics, and then compared the genetics against known, identified Pheidole DNA. Not a single unknown sample came within 4% of matching an identified reference. That’s actually kind of amazing. Ants are tremendously diverse, and a great many species remain to be discovered.

While the notion we can infer species identity from a snippet of DNA seems simple enough, in practice DNA barcoding remains impractical. This is especially true in large understudied groups. Without a well-curated reference database, most mystery samples will fall between the cracks.
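
To make the matching step concrete, here is a minimal Python sketch of what "coming within 4% of a reference" could mean in practice. It assumes the sequences are already aligned to the same length and uses a plain uncorrected distance (the fraction of sites that differ). The sequences, the species labels, and the function names `p_distance` and `best_match` are all invented for illustration, and the 4% cutoff is simply borrowed from the result described above, not taken from the paper's methods.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of comparable aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip gaps and ambiguous bases; compare only sites with A, C, G or T in both.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def best_match(query: str, references: dict, threshold: float = 0.04):
    """Return (name, distance) for the closest reference.

    The name is None when even the closest reference is farther away
    than the threshold, i.e. the sample falls between the cracks.
    """
    name, dist = min(
        ((ref_name, p_distance(query, seq)) for ref_name, seq in references.items()),
        key=lambda item: item[1],
    )
    return (name if dist <= threshold else None), dist


# Toy 30-base "barcodes" (hypothetical; real COI barcodes are far longer).
references = {
    "Pheidole sp. A": "ACGTACGTACGTACGTACGTACGTACGTAC",
    "Pheidole sp. B": "ACGTTCGTACGAACGTACCTACGTACGTAC",
}

close_sample = "ACGTACGTACGTACGTACGTACGTACGTAG"  # 1 of 30 sites differs from sp. A
far_sample = "GCGTACGTCCGTACGTACGTTCGTACGTAC"    # 3 of 30 sites differ from sp. A

for label, seq in [("close_sample", close_sample), ("far_sample", far_sample)]:
    name, dist = best_match(seq, references)
    print(f"{label}: nearest distance {dist:.3f}, assigned to {name}")
```

The second sample is the situation the Brazilian Pheidole were in: the lookup still finds a nearest neighbor, but the neighbor is too distant for the name to mean anything. A larger, better-identified reference set is the only thing that changes that outcome.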

I am not making an argument against barcoding. The technique shows great promise when wedded to a strong taxonomic foundation. But that’s just it. The taxonomic foundation- knowing what species look like, where they are found, and which Latin names apply to them- is a prerequisite for all the rest, and it’s still woefully inadequate.

Pheidole vafra from Paraná, Brazil

source: Ng’endo RN, Osiemo ZB, Brandl R. 2013. DNA barcodes for species identification in the hyperdiverse ant genus Pheidole (Formicidae: Myrmicinae). Journal of Insect Science 13:27.

11 thoughts on “An example of why DNA Barcoding is still impractical for putting names to unknown samples”

  1. What do you think about using it to analyze stomach contents for Amazonian birds that forage in the leaf litter?

    1. I think that’d depend on the level of resolution needed, Jeff. You can probably use barcoding successfully to make a subfamilial or generic ID, but considering the general state of taxonomy in tropical invertebrates you’re unlikely to get far at the species level.

      1. Now the descriptions of diet are “spiders, orthopterans, etc”… so even to genus would be a step forward. Thanks for the reply. Something to think about.

    1. I work with a group with high generic diversity but mostly few species per genus. Barcodes do cluster the morphologically defined species based on genetic similarity, but do not show any evidence for a fixed percentage difference between species. Barcodes are useful for catching misidentifications and mislabelling (and pseudogenes) and sometimes for suggesting cryptic species in ‘morphologically variable species’, but not always.

      For most of the species where I have only a few barcodes from a few sites it looks good. But for the bisexual species for which I have the most barcodes (e.g. 1-10 individuals from a dozen sites), the amount of genetic divergence is often very high – and the more I sample the more divergent haplotypes show up. This may be saying something interesting about recolonization by long separated populations after glaciation, but I don’t think it is telling me that I have clusters of ‘cryptic species’. They look alike and inhabit the same sites.

      The clonal lineages also are fairly conservative morphologically, but for some (probably very ancient lineages) their barcode regions are all over the place. It would be silly to split them into species that cannot be differentiated morphologically just because one mitochondrial gene has a lot of variability.

      I’m not sure how the hypothesis that 658 bp of one mitochondrial gene would be the solution to the species problem ever gained any traction – it seems more like magical thinking than science.

  2. Julie Stahlhut

    In this case, barcoding can assign the unknowns to clusters, providing morphological taxonomists with a pre-sorted set of subjects accompanied by testable hypotheses. Unmatched barcodes can be an important clue to cryptic or undescribed species diversity, but barcodes alone aren’t sufficient to put a name on a cryptic species (nor to tell whether the distinct clusters really delimit species that have never before been described).

    1. Geneticists, I would add, need to sequence the “significant” segments of the various types of DNA if they cannot sequence all of it.

      For a group like ants where chemical production is so important, these code segments, for instance, would be better targets than the seemingly (to ignorant me) random selection of mtDNA employed here. Instead of looking at DNA employed in who-knows-what (and possibly “junk DNA”), surely it is time to at least try to target sequences likely used by the animal to separate species. Has no one looked to see where in the larger DNA repertoire the variability associated with species differences lies?

      BTW, isn’t the use of the term “barcode” kinda stupid ?? You can’t make this stuff up. Whenever I scan a barcode I always get an ant sequence rather than the item name & price at my local Walmart !!

  3. Alejandro Merchan

    I agree that barcoding in itself is just a tool, and it should be used as such, coupled with taxonomic, ecological and behavioral work. However, faced with daunting numbers of tropical insect samples, barcoding can give you a quick indication of possible species, can show you genetic divergence when the morphology doesn’t suggest there is any, and so on. We should emphasize a holistic approach and teach that just focusing on COI, for example, won’t solve all of the puzzles. Great post!

  4. Karl Magnacca

    I’m not clear on what point is being made here. The authors apparently never tried to attach the names of previously-described taxa, they only sorted them to morphospecies. And those lined up fairly well with the sequences (two pairs of morphospecies with identical sequences, and one pair of potential cryptic species or high mtDNA variation). Given what you say about the diversity and large number of undescribed species, it would seem unsurprising that no sequences match with any that has been sequenced (the one that they were able to put a name on, P. lucretii, isn’t included in the reference sequences).

    Frankly, I think the biggest problem is getting the names right on the reference sequences in the first place. Many times in Genbank you can find examples of wildly different sequences listed under the same species name, and investigation shows that one was misidentified.

  6. I think the important thing to take away is that using one gene isn’t practical for barcoding and is really just bad science. If you look at any barcoding-based phylogeny, any good one at least, several genes are employed, not just one.
