Environmental Microbiology 101: 16S rRNA Sequencing and the Microbial Species Definition
- akargol007
- Oct 3
Environmental microorganisms are as varied as the environments in which they live. Classification systems are not perfect, but they help us understand how microbes are related to each other. Standardized systems and methods for classifying microbes also help environmental microbiologists predict the functions carried out in an ecosystem based on the microbes they find there.
Classifying organisms: the three domains and beyond
Life on Earth is split into three extremely broad groups called domains (sometimes loosely called kingdoms): Bacteria, Archaea, and Eukarya, the eukaryotes. Members of a domain share only a few key features. Bacteria and archaea look very similar, and in fact were not recognized as separate groups until genetic classification methods, specifically comparisons of ribosomal RNA sequences, were applied in the 1970s. Archaea were originally named (as "archaebacteria") because they were thought to be older than bacteria, but the two likely emerged on Earth around the same time.
Both bacteria and archaea are single-celled organisms that lack a nucleus and rarely have any visible internal cell structures. Most have a single chromosome that contains the genes needed to build the cell. Recent research suggests that the small, circular chromosomes of archaea might not be as simple as they appear: they may be organized into structures called chromosomal domains, mediated by specific proteins, which may help archaea adapt to shifts in environmental conditions (Takemata and Bell 2020). Bacterial chromosomes are also organized into chromosomal domains, but the arrangement and the proteins involved appear to differ between the two groups.
Other major differences between bacteria and archaea are seen at the level of cellular processes. For example, the molecules involved in protein production differ, including the tRNA molecule that starts the process and the precise set of enzymes that catalyze the reactions.
The eukaryote domain includes everything else: animals, plants, and fungi, as well as many single-celled species. This group is defined by the presence of cellular structures that the other two groups lack. Eukaryotic cells have a nucleus, a membrane-bound compartment that holds their DNA, and organelles, cell components that perform specific functions such as energy generation and protein processing.
For bacteria and archaea, the next taxonomic rank below domain, and the one at which classification begins to provide more substantial information, is phylum. Members of a phylum share additional key features, though they may still have little in common. From there, the ranks become progressively more specific: class, order, family, genus, and species. Finer distinctions below the species level, such as subspecies and strains, along with sequence-defined groups called phylotypes, are also gaining popularity as we refine our understanding of both macro- and microorganisms.
In macroorganisms, the difference between species can often be determined visually. Species-level distinctions are what separate humans, for example, from extinct close relatives such as Neanderthals. A common definition of species for macroorganisms is a group of organisms that can breed with each other and produce viable, fertile offspring, although there are exceptions to this rule. We also use visual characteristics and behaviors to make the distinction.
Even with all these tools, it is not always easy to distinguish between species of plants and animals. The process becomes far harder with organisms that can't be seen with the naked eye and lack clear characteristics and behaviors to examine. Below, I describe how we obtain information about the identity of microbes and how those data are used to apply the concept of "species" to organisms that do not fit neatly into our macro-scale perceptions.
Understanding microorganisms through standardized genetic analysis - the 16S gene
As I briefly mentioned in my first post in the Environmental Microbiology 101 series, most microorganisms cannot be cultured under laboratory conditions. Restricting ourselves to culturable organisms limits our understanding of microbial communities, because a culture-based view of a community can look completely different from a genetic one. In soils, for example, the bacteria most commonly recovered in culture-based studies are Streptomyces and Bacillus, but culture-independent methods reveal that these organisms make up only a small fraction of the total community.
As a result, modern microbial classification is almost always done through a genetic lens. The target gene for genetic analysis also differs between these broad groups: in bacteria and archaea we use the 16S rRNA gene for classification, while in eukaryotes an analogous gene called 18S is used. Both genes encode the RNA component of the ribosome's small subunit (the small-subunit ribosomal RNA, or rRNA).
Ribosomes are the structures that produce proteins, which makes the 16S/18S gene one of the most important in the genome. Because the gene is so vital, most of its sequence has changed very little over the billions of years since life first evolved; it accumulates the gradual genetic shifts seen in most other genes far more slowly (see my post on 6PPD-quinone and microbial pollutant degradation for more details on how genes can change in sequence and function over time). The exception is a set of short sections of the 16S gene, called hypervariable regions (nine in total, labeled V1 through V9), which can tolerate change without destroying the ribosome's function.
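To make "conserved" versus "variable" concrete, here is a minimal Python sketch that scores how variable each position in a set of aligned sequences is, using Shannon entropy per alignment column (an illustrative choice of measure, not a claim about how hypervariable regions are formally defined). It assumes the sequences are already aligned to equal length; the toy sequences are made up.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the bases in one alignment column; 0 means fully conserved."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def variability_profile(aligned_seqs):
    """Entropy at each position of equal-length, pre-aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([s[i] for s in aligned_seqs]) for i in range(length)]


# Made-up toy alignment: conserved flanks around a short variable stretch.
toy_alignment = [
    "ACGTACGTAAGGTTACGT",
    "ACGTACGTCAGCTTACGT",
    "ACGTACGTGATGTTACGT",
    "ACGTACGTTACATTACGT",
]
profile = variability_profile(toy_alignment)
variable_positions = [i for i, h in enumerate(profile) if h > 0]
print(variable_positions)
```

Positions with entropy 0 are identical in every sequence (conserved); higher values mark positions where the sequences disagree, which is the kind of pattern that distinguishes hypervariable stretches from the rest of the gene.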
The hypervariable regions accumulate changes gradually over time, and these changes broadly reflect the evolutionary divergence of the rest of an organism's genome. By determining the sequence of one or more hypervariable regions and comparing it to those of known organisms, we can work out where a particular microbe sits within the grand tree of life.
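Comparing a sequence to known organisms usually comes down to percent identity. Here is a minimal Python sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps; producing the alignment is a separate step not shown here, and real tools differ in exactly how they count gaps.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two sequences already aligned to equal length ('-' = gap).

    Columns where both sequences have a gap are skipped; a gap opposite a base
    counts as a mismatch. Identity = matches / remaining columns * 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


print(percent_identity("ACGTACGTAA", "ACGTACGTAT"))  # 9 of 10 columns match
```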
Obtaining and processing microbial sequence data - OTUs to ASVs
A detailed discussion of DNA sequencing mechanics is beyond the scope of this post, although I may explore it in more depth in the future. Briefly, microbiologists first use specialized physical and chemical processes to separate DNA from other cellular components. In an environmental sample, DNA from hundreds of different microbial species is extracted from cells and purified. Short, single-stranded DNA sequences called primers, which bind to conserved stretches on either side of the target region, are then used with a DNA polymerase enzyme to make many copies of part of the 16S rRNA gene through the polymerase chain reaction (PCR), so that this gene, and not the rest of the genome, is what gets sequenced. Finally, we use enzymes and computers to determine the sequence, defining which nucleotide base (A, T, C, or G) is present at each position. Each sequence obtained from a sample is called a read.
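Primers often contain "degenerate" positions, written with IUPAC ambiguity codes, so that one primer can bind slightly different versions of the same conserved stretch. Here is a minimal Python sketch of checking a read for a primer and trimming it off. The 515F sequence below is a widely used V4 forward primer as it is commonly published (an assumption here; check it against your own protocol), the read is made up, and the helper names are hypothetical. Dedicated trimming tools also tolerate mismatches, which this sketch does not.

```python
# IUPAC nucleotide codes: each symbol stands for a set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches_at(read, primer, start):
    """True if every primer position is compatible with the read base at that offset."""
    window = read[start:start + len(primer)].upper()
    if len(window) < len(primer):
        return False
    return all(base in IUPAC[p] for base, p in zip(window, primer))


def find_primer(read, primer, max_start=10):
    """First start index (<= max_start) where the primer matches, else None."""
    for start in range(max_start + 1):
        if primer_matches_at(read, primer, start):
            return start
    return None


def trim_primer(read, primer, max_start=10):
    """Drop everything up to and including the primer; None if it is not found."""
    start = find_primer(read, primer, max_start)
    if start is None:
        return None
    return read[start + len(primer):]


# Assumption: 515F forward primer (V4 region) as commonly published; M = A or C.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"

read = "TTGTGCCAGCAGCCGCGGTAATACGGAGGG"  # made-up read
print(find_primer(read, PRIMER_515F), trim_primer(read, PRIMER_515F))
```

Reads in which no primer is found are typically discarded, since they may not come from the targeted gene at all.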
16S data have historically been processed by clustering reads into operational taxonomic units (OTUs) and then assigning taxonomy. Sequences are clustered at a specified similarity threshold, usually 97%, although 99% may also be used (Blaxter et al. 2005). This means that sequences sharing 97% or more of their sequenced 16S region are grouped together. A consensus sequence is defined for each cluster by selecting the most common nucleotide at each position to produce an "average." Consensus sequences are then compared against a database of known organisms and assigned the identity of their closest match.
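Here is a minimal Python sketch of that workflow, assuming the reads have already been trimmed and aligned to equal length. It uses greedy, abundance-ordered clustering, which is one common strategy chosen here for illustration; real OTU tools differ in the details, and some report the cluster's seed (centroid) sequence instead of a consensus. All names and sequences are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Percent identity of two equal-length (pre-aligned) sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=97.0):
    """Greedy OTU clustering.

    Unique sequences are visited from most to least abundant. Each joins the first
    cluster whose seed sequence it matches at >= threshold percent identity;
    otherwise it seeds a new cluster. Returns a list of (seed, member_reads).
    """
    clusters = []
    for seq, count in Counter(reads).most_common():
        for seed, members in clusters:
            if identity(seq, seed) >= threshold:
                members.extend([seq] * count)
                break
        else:  # no existing cluster was close enough
            clusters.append((seq, [seq] * count))
    return clusters


def consensus(members):
    """Most common base at each position across a cluster's reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*members))


# Made-up 40-bp sequences: `variant` differs from `base` at a single position.
base = "ACGT" * 10
variant = base[:5] + "T" + base[6:]  # 39/40 = 97.5% identical to base
other = "TTGCA" * 8
reads = [base] * 5 + [variant] * 2 + [other] * 3

otus = greedy_otus(reads)
for seed, members in otus:
    print(len(members), consensus(members))
```

The one-base variant falls within 97% of `base`, so it joins that OTU, and the majority base wins at the differing position in the consensus.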
A newer approach, introduced in the past few years, uses amplicon sequence variants (ASVs) instead of OTUs. An amplicon is the stretch of DNA copied by PCR, and each read is the sequence of one amplicon molecule. ASVs are more precise and stringent than OTUs. Sequences are first corrected using one of several error models to determine the most likely "true" sequence behind each read. Corrected sequences are then grouped only if they are identical, meaning that if even one nucleotide in the 16S sequence differs, the sequences are treated as two distinct variants (and are often interpreted as different organisms).
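The final grouping step is simple once error correction is done. Here is a minimal Python sketch that assumes the reads have already been denoised (the error model itself is not shown) and counts each distinct sequence as one ASV. Labeling variants by an MD5 hash of the sequence is an illustrative choice made here, not a requirement; the reads are made up.

```python
import hashlib
from collections import Counter


def exact_variants(denoised_reads):
    """Count identical sequences; each distinct sequence is one ASV.

    Assumes the reads were already error-corrected by a denoising step,
    which is not shown here.
    """
    return Counter(denoised_reads)


def variant_id(seq):
    """Stable label for a variant: MD5 hex digest of the sequence (illustrative choice)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT", "ACGTACGTAA"]
asv_table = exact_variants(reads)
for seq, count in asv_table.most_common():
    print(variant_id(seq)[:8], seq, count)
```

The two sequences differ at a single position, yet they become two separate ASVs; under 97% OTU clustering of a realistic-length region they would normally fall into the same OTU.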
Comparisons of OTU and ASV clustering have shown substantial differences in results depending on the method chosen. One study found that the choice had a stronger effect on community diversity measures than rarefaction or the OTU identity threshold, which could lead to different conclusions being drawn (Chiarello et al. 2022). Another study found that the choice of clustering method did not change the broad interpretation of the data, but it observed small, potentially important differences that might affect our conclusions about an ecosystem (Jeske and Gallert 2022).
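To see why the clustering method can shift diversity results, here is a minimal Python sketch computing two common diversity measures, richness (the number of units observed) and the Shannon index, H' = -Σ p_i ln p_i, where p_i is the fraction of reads in unit i. The counts are invented to illustrate the mechanism and are not taken from the studies cited above.

```python
import math


def richness(counts):
    """Number of units (OTUs or ASVs) with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over units with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical: the same 100 reads summarized two ways.
otu_counts = [60, 40]          # two 97% OTUs
asv_counts = [50, 10, 30, 10]  # the same reads split into four exact variants

print(richness(otu_counts), round(shannon(otu_counts), 3))
print(richness(asv_counts), round(shannon(asv_counts), 3))
```

Splitting the same reads into more, smaller units raises both richness and Shannon diversity, even though the underlying sample has not changed.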
In environmental microbiology, we are still working through some issues surrounding the OTU-to-ASV transition. Environmental samples are complex and often contain inhibitor compounds that can interfere with purifying, amplifying, and sequencing the DNA, resulting in small but important errors. Even with error-correction (denoising) algorithms such as DADA2, environmental samples can produce thousands of unique ASVs, each counted as a distinct organism (Jeske and Gallert 2022). Environmental communities are diverse, but probably not that diverse, which suggests sequencing errors that correction algorithms cannot yet fully address.
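Here is a minimal Python simulation of that effect, assuming a single hypothetical organism and a share of reads that each carry one uncorrected substitution error. Real denoisers are designed to remove many such errors; the sketch only shows what happens to errors that slip through. Because errors land at random positions, the exact number of distinct sequences varies with the seed.

```python
import random


def mutate_one_base(seq, rng):
    """Return seq with one random substitution, mimicking a single uncorrected error."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in "ACGT" if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def identity(a, b):
    """Percent identity of two equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
true_seq = "".join(rng.choice("ACGT") for _ in range(250))  # one hypothetical organism
reads = [true_seq] * 900 + [mutate_one_base(true_seq, rng) for _ in range(100)]

distinct_sequences = set(reads)  # what exact-match grouping would count
within_97 = all(identity(r, true_seq) >= 97.0 for r in reads)  # all fit one 97% OTU
print(len(distinct_sequences), within_97)
```

Every read stays above 99% identity to the true sequence, so 97% clustering keeps one unit, while exact-match grouping reports dozens of "organisms."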
Keeping this in mind, my current preference is to use traditional OTU clustering for unknown environmental samples and ASVs for simple or well-defined communities. As the algorithms for error correction improve, and as those changes become reflected in the databases used for cluster identification, I am hopeful that we can move toward ASVs for environmental community studies as well.
Applying taxonomy to 16S clustering results
Let's return to our discussion of taxonomy as it applies to microorganisms. Because most microbes reproduce asexually and are very difficult to assess visually, genetics is the main source of our species definitions. By a widely used convention, two microbes are considered to be the same species if they share 97% or more of their 16S sequence. Moving up through the taxonomic classification system, the approximate thresholds are:
Organisms of the same genus share roughly 95% of their 16S sequence
Families share roughly 90%
Orders share roughly 84%
Classes share roughly 78%
And at the phylum level, members still share roughly 75% of their 16S sequence
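The thresholds listed above can be read as a lookup from percent identity to the most specific rank two organisms are likely to share. Here is a minimal Python sketch using those approximate cutoffs (species at the 97% convention). Real taxonomic classifiers compare reads against curated reference databases rather than relying on cutoffs like these alone, so treat this as an illustration only.

```python
# Approximate 16S similarity thresholds (percent identity), most specific first.
RANK_THRESHOLDS = [
    ("species", 97.0),
    ("genus", 95.0),
    ("family", 90.0),
    ("order", 84.0),
    ("class", 78.0),
    ("phylum", 75.0),
]


def deepest_shared_rank(percent_identity):
    """Most specific rank whose threshold the 16S identity meets, or None if below all."""
    for rank, cutoff in RANK_THRESHOLDS:
        if percent_identity >= cutoff:
            return rank
    return None


for pid in (98.5, 96.0, 85.0, 76.0, 70.0):
    print(pid, deepest_shared_rank(pid))
```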
These numbers do not provide a complete picture, but they give a rough indication of how long ago two organisms diverged on the evolutionary time scale. Microbiologists combine this information with known metabolic trends at the family and genus level to understand the members of a microbial community and how they might work together.
Challenges associated with 16S-based approaches
The major problem with basing species definitions solely on 16S classification is that the 16S gene is just one gene. And while that gene can tell us a lot about how a microbe is related to other known organisms, it provides no direct information about the microbe's metabolism or other features.
Other molecular biology techniques can overcome this limitation by expanding which genes are studied. Whole-genome sequencing (called shotgun metagenomics when applied to an entire community) goes far beyond 16S to capture, in principle, every gene in a sample. For a single organism or a simple community, this information can be very helpful. It can also be valuable for environmental samples, but with so many unique organisms the amount of data can become overwhelming, and it may be difficult to separate out the information that actually answers your questions about the ecosystem.
Sequencing results are also database-dependent, which means they can change as reference databases and data-processing methods change. A set of sequences can be processed against one database to produce one list of microbes, and then reprocessed months or years later with different results. Error-correction algorithms also change rapidly and can affect the final determination of microbes. This problem is currently addressed by reporting in scientific publications when the data processing was done (month and year), ideally along with the database and software versions used, and by maintaining legacy versions of databases and algorithms that scientists can use to reproduce old results.
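One way to keep those processing details attached to the results is a small machine-readable record saved next to the taxonomy table. Here is a minimal Python sketch; the field names and placeholder values are hypothetical, not a standard schema, and you would fill in the actual tools, versions, and date you used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProcessingRecord:
    """Hypothetical provenance record saved alongside a taxonomy table."""
    processed_on: str        # month and year of processing, e.g. "YYYY-MM"
    denoiser: str            # error-correction tool used (placeholder below)
    denoiser_version: str
    reference_database: str  # database used for taxonomy assignment (placeholder below)
    database_version: str
    clustering: str          # e.g. "OTU 97%" or "ASV (exact)"


record = ProcessingRecord(
    processed_on="YYYY-MM",
    denoiser="example-denoiser",
    denoiser_version="x.y.z",
    reference_database="example-16S-reference",
    database_version="release-N",
    clustering="OTU 97%",
)
text = json.dumps(asdict(record), indent=2)
print(text)
```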
Conclusion
With the help of 16S rRNA sequencing, we can understand how microbes are related to each other and to other members of the universal tree of life. But there is so much more to microbes beyond their taxonomy. Microbiologists are interested in understanding not just identity, but microbial function and how organisms in the environment compete with each other, or work together, to create a balanced and harmonious ecosystem. My next post in the Environmental Microbiology 101 series will delve into some key microbial groups in soil and aquatic ecosystems, including their metabolic characteristics and common roles.
Want to learn more about methods for identifying environmental microorganisms in your terrestrial or aquatic ecosystem of choice? Reach out to AppliedMicrobio today!
References
Blaxter, M., J. Mann, T. Chapman, F. Thomas, C. Whitton, R. Floyd and E. Abebe. 2005. "Defining operational taxonomic units using DNA barcode data." Philos Trans R Soc Lond B Biol Sci 360(1462): 1935-1943. 10.1098/rstb.2005.1725.
Chiarello, M., M. McCauley, S. Villeger and C. R. Jackson. 2022. "Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold." PLoS One 17(2): e0264443. 10.1371/journal.pone.0264443.
Jeske, J. T. and C. Gallert. 2022. "Microbiome Analysis via OTU and ASV-Based Pipelines-A Comparative Interpretation of Ecological Data in WWTP Systems." Bioengineering (Basel) 9(4). 10.3390/bioengineering9040146.
Kirchman, D. L. 2012. Processes in Microbial Ecology. Oxford, United Kingdom: Oxford University Press.
Madigan, M. T., K. S. Bender, D. H. Buckley, W. M. Sattley and D. A. Stahl. 2017. Brock Biology of Microorganisms. New York, USA: Pearson.
Takemata, N. and S. D. Bell. 2020. "Emerging views of genome organization in Archaea." J Cell Sci 133(10). 10.1242/jcs.243782.

