Metagenomics can also help to address whether rare species play an important functional role in soils

This strategy is being employed by members of a European initiative, Meta Explore, who are screening fosmid clones from a variety of environmental samples to access enzymes of interest to industry, including chitinases and dehalogenases. Another approach is to sequence subsets of the metagenome, such as collections of ribosomal RNA signature sequences. For example, based on 16S rRNA gene sequence data, we have developed a relatively good understanding of the species diversity and distribution of specific bacterial and archaeal phyla in different soils. Further, based on work from Noah Fierer, Rob Knight, and their colleagues at the University of Colorado, Boulder, we know that pH and salinity are major drivers of microbial biogeography. From these and other studies, we also know that soils contain high abundances of Acidobacteria, whose 26 subgroups vary in abundance from one soil type to another. Also, some phyla are more prevalent in a given soil type than in others. More generally, databases of 16S sequences are yielding insights into how chemical and physical parameters correlate with microbial distributions in soils.

Here, I use the term metagenome to refer to sequencing of total community DNA, including both phylogenetic and functional genes, while taking a shotgun-sequencing approach. Although few shotgun soil metagenome studies are published, more are anticipated during the next year as investigators take advantage of recent advances in sequencing instruments, for example, using 454 pyrosequencing and Illumina technologies. These second-generation sequencing approaches generate megabases and gigabases of sequence data, respectively, in single runs, with relatively short read lengths of approximately 400 and 100 bp, respectively.

Other recently developed sequencing technologies, including the Pacific Biosciences platform for sequencing single molecules of DNA, hold promise for generating longer sequencing read lengths.

In a project involving my group at the Lawrence Berkeley National Laboratory, James Tiedje and his colleagues at Michigan State University, and the Joint Genome Institute (JGI), we are using a combination of second-generation platforms to sequence DNA from microbes in soil samples from the Great Prairie of the United States, including native prairie and adjacent cultivated soils from Wisconsin, Iowa, and Kansas. This project aims to determine the impact of land management on soil microbial communities and their functions, including cycling of carbon and nitrogen. One of the sites, Kansas native prairie, is also the focus of another project that is specifically addressing the impact of altered rainfall patterns due to climate change on carbon cycling processes in the Great Prairie. The Kansas prairie metagenome that was sequenced at JGI currently has the largest amount of sequence data of any soil metagenome to date, approaching 400 Gb of Illumina sequence, and will serve as a resource for this project. Also in collaboration with JGI, we sequenced DNA extracted from Alaskan permafrost soil samples collected by Mark Waldrop from the U.S. Geological Survey. The aim is to use metagenomics to gain an understanding of the impact of climate warming-induced thaw on the microbial degradation of carbon reserves that have been trapped in permafrost for thousands of years and that have the potential to contribute large amounts of greenhouse gases to the atmosphere. Other ongoing soil metagenome sequencing projects include several that focus on field sites for which there is substantial temporal environmental and climate data. For example, the UK Rothamsted Field Station is one of the longest running field stations in the world and has served as the site for several metagenome sequencing projects.

One of these projects, “DeepSoil,” is sequencing DNA from a long-term grassland and an adjacent fallow site at Rothamsted. The overarching goal of this sequencing effort is to establish the long-term impact of plants on the soil microbiota. Another project at Rothamsted is a French metagenome sequencing project, Metasoil, coordinated by Tim Vogel and Pascal Simonet of the Ecole Centrale de Lyon, France. The Metasoil project is sequencing DNA from the Park Grass site at Rothamsted that was established in 1856. Their strategy relies on constructing and sequencing a fosmid library in addition to shotgun metagenome sequencing. Cheryl Kuske and coworkers at the Department of Energy (DOE) Los Alamos National Laboratory, in collaboration with JGI, are sequencing soils from selected free-air carbon dioxide enrichment (FACE) sites in the United States. These sites were established to determine the influence of increases in atmospheric CO2 levels due to climate change on terrestrial ecosystems. In addition, Folker Meyer and coworkers at the DOE Argonne National Laboratory are sequencing metagenomes from several different U.S. soils that were collected across a range of habitats to determine which microorganisms and functional processes predominate in different soil ecosystems. Together these soil metagenomics projects will be a tremendous resource to the scientific community and will provide a much greater understanding of microbial diversity and functions in soil.

Although the sequencing of DNA is no longer a bottleneck, the large amounts of sequence data generated from analyzing highly diverse soil communities are proving a challenge to accommodate. This issue is exacerbated by the need to cope with short reads (for example, 75–125 bp) that arise from analyses using the Illumina instrument. Thus, better algorithms, new bioinformatics tools, and “terabytes” of computer storage are required.
Increased access to supercomputers, such as the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory, can help.

For instance, we used NERSC to perform BLASTX of our permafrost metagenome data. This analysis took approximately 800,000 core hours, or the equivalent of more than 85 computer years, and ran for 2 weeks on the NERSC supercomputer and nodes at JGI. Cloud computing will further help to reduce this bottleneck. Another challenge is the large number of errors that different sequencing platforms generate. How can we differentiate sequencing errors from microheterogeneity within DNA samples from soil microbial communities? Also, there can be difficulties with different steps in sample processing. For example, each DNA extraction procedure can introduce its own bias with respect to sample loss or preferential lysis of some members of the microbial community over others. The most commonly used extraction procedures rely on beating with microscopic beads to lyse cells, although pressure lysis is another attractive option. Ideally, different laboratories should each use the same extraction protocol. However, despite the availability of commercial kits, laboratories typically follow their own favorite DNA extraction methods. Another problem lies with soil samples that have low biomass or high levels of contaminants such as humic acids that result in low DNA yields. For example, permafrost soils yield relatively little DNA in our experience. However, amplifying DNA before preparing a library might help. Two DNA-amplifying methods are used: multiple displacement amplification (MDA) and emulsion PCR (emPCR). Of the two, the MDA approach is subject to considerable bias, whereas emPCR should be less biased because each template is separately amplified. However, to my knowledge, no one has directly compared the two methods. Sometimes the volume of data falls short for conducting a metagenome analysis.
For instance, when Susannah Tringe and coworkers at JGI first assembled soil metagenome data, their efforts failed because the 100 Mbp of sequence data that they collected proved insufficient. They estimated that they would need 2–5 Gbp to obtain draft genome assemblies of the most dominant organisms in soil, and current estimates from analysis of the Great Prairie metagenome data suggest that probably closer to 2 Tbp of data are needed! However, even a relatively low level of coverage was sufficient for some initial comparisons of the soil metagenome from a Minnesota farm to other available metagenome sequence datasets. Recently Etienne Yergeau and colleagues at the National Research Council of Canada produced 1 Gbp of sequence data from permafrost soil after amplifying their sample via MDA, which introduced considerable bias. Nevertheless, when these data were compared to other metagenome data, those from Minnesota farm soil, but not those from marine or other habitats, proved to be most closely related to the permafrost sample.

Although we are learning a great deal about dominant bacteria and archaea in soils based on 16S rRNA gene sequence data, many of the dominant operational taxonomic units that we detect in soil have no close representatives in culture collections. Researchers are addressing these deficiencies through initiatives such as the Genomic Encyclopedia for Bacteria and Archaea project that Jonathan Eisen of the University of California, Davis, and JGI coordinates. The long-term goal is to fill in the phylogenetic tree of life by sequencing genomes from underrepresented phyla. Another project, “Microbial Earth,” being coordinated by Nikos Kyrpides at JGI, calls for sequencing microbial type strains in culture collections. Meanwhile, the Earth Microbiome Project (EMP) is an initiative that aims to sequence what some call the “dark matter” of biology, the full microbial diversity on Earth.
The EMP will begin sequencing 10,000 metagenomes from various collections and habitats, and eventually will cover hundreds of thousands of such samples, pending dedicated support.
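
As a rough check on the computing figures quoted above for the permafrost BLASTX run, the short Python sketch below converts core hours into single-core years and into the average number of cores that would have to run concurrently to finish in 2 weeks. The function names are hypothetical, and the calculation assumes a 365-day year and perfectly even use of the cores, which real supercomputer jobs rarely achieve.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def core_hours_to_years(core_hours: float) -> float:
    """Years a single core would need to deliver the given core hours."""
    return core_hours / HOURS_PER_YEAR


def cores_needed(core_hours: float, wall_clock_days: float) -> float:
    """Average number of cores busy at once to finish in the given wall-clock time."""
    return core_hours / (wall_clock_days * 24)


blastx_core_hours = 800_000  # approximate figure quoted in the text
single_core_years = core_hours_to_years(blastx_core_hours)
average_cores = cores_needed(blastx_core_hours, wall_clock_days=14)

print(f"{single_core_years:.1f} single-core years")        # 91.3 single-core years
print(f"about {average_cores:.0f} cores busy for 14 days")  # about 2381 cores busy for 14 days
```

Under these assumptions, 800,000 core hours corresponds to roughly 91 single-core years, consistent with the "more than 85 computer years" cited, and implies on the order of 2,400 cores running around the clock for the 2-week period.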

The soil microbial ecology research community has established an international consortium, the International TerraGenome Consortium. The consortium recognizes the high complexity of the soil environment and is focused on determining “the soil metagenome.” TerraGenome is a clearinghouse for information about funding for soil metagenomics research, for development and provision of bioinformatics tools, for metadata standards, and for workshops and meetings on these topics. For example, TerraGenome set forth criteria for metadata obtained from analyzing soil samples that researchers must meet before their sequence data may be deposited into centrally held databases. This effort to set the minimum information about an environmental marker sequence was coordinated through the Genomic Standards Consortium.

Table 1. Examples of ongoing soil metagenome sequencing projects

| Sequencing project | Strategy | Status | Principal Investigators | Funding |
|---|---|---|---|---|
| “Metasoil”: Rothamsted Park Grass permanent grassland, started 1856 | 454-FLX and Titanium | 24 Gigabases | Pascal Simonet & Tim Vogel | France |
| “DeepSoil”: Rothamsted Highfield permanent grassland and permanent bare-fallow plots, started 1959 | Illumina paired end sequencing | 80 Gigabases | Dirk Evers, Tim Vogel, Janet Jansson, James Tiedje | Illumina |
| Great Prairie Grand Challenge pilot study. Native prairie and adjacent cultivated corn sites in Wisconsin, Iowa and Kansas | 454 Titanium and Illumina paired end sequencing | 1.6 Terabase | James Tiedje; Janet Jansson, Susannah Tringe & Eddy Rubin | DOE-JGI |
| Alaskan permafrost, thermokarst bog and active layer samples | Illumina paired end sequencing | 80 Gigabases | Janet Jansson, Eddy Rubin, Rachel Mackelprang & Jenni Hultman, Mark Waldrop | DOE-JGI |
| 24 sites across continental US | 454-FLX & Titanium | 1.2 Gigabases | Folker Meyer | DOE-Argonne laboratory |
| DOE FACE sites, impact of elevated CO2, 5 sites | 454-FLX & Titanium | 5 Gigabases | Cheryl Kuske | DOE-JGI |

Microbe / Volume 6, Number 7, 2011

Through soil metagenomics research, we can address fundamental questions about soil microbial ecology. For example, is there functional microbial redundancy in soil? Soil microbial community compositions differ in different soils in terms of dominant populations, according to 16S rRNA gene surveys. Although soil pH is a key driver of soil community composition, biogeography also plays a role. As an illustration, we can compare soil microbial diversity to the diversity of microbial communities in the human gut. The gut microbiota from one individual to another differs at the 16S rRNA gene level, but at the broad functional level the communities are rather homogeneous in healthy individuals. This pattern suggests that several different bacterial species can carry out the same functional roles in the human intestine. The situation in soil might be similar, but we have yet to explore and compare many soil metagenomes in depth to determine whether that possibility holds. Metagenomics can help us determine whether microorganisms in soils embody a specialized cache of gene functions. Available metagenome sequence datasets are already providing clues as to what functions are predominant in soils. For example, genes for cellobiose phosphorylase, an enzyme that degrades plant carbohydrates, were identified in a Minnesota farm soil metagenome, but not in one from the Sargasso Sea. When we screened permafrost for other functional genes specifically involved in cycling carbon and nitrogen, the samples included several genes that were more or less prevalent after thaw.
For example, although methanogens may not be numerically dominant in permafrost, they play a key role in producing methane, which is 21 times more potent as a greenhouse gas than carbon dioxide. With deep sequencing, it should be possible to obtain genomes of some of the dominant species in soil and even some species of relatively low abundance, provided that they do not show extensive strain heterogeneity.
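
To illustrate why sequencing depth matters so much for recovering genomes from soil, the Python sketch below applies a simple back-of-envelope relation: expected coverage of one genome is roughly the total bases sequenced, multiplied by that organism's share of the community DNA, divided by its genome size. This is only an illustrative simplification, not the method behind the estimates cited in this article. The 5-Mbp genome size, the 20-fold coverage target, and the abundance values are hypothetical, and the model ignores extraction bias, sequencing errors, and strain heterogeneity.

```python
def expected_coverage(total_bases: float, relative_abundance: float,
                      genome_size: float) -> float:
    """Approximate fold-coverage of one genome in a shotgun metagenome.

    relative_abundance is the fraction (0 to 1) of community DNA that
    belongs to the organism; all sizes are in base pairs.
    """
    return total_bases * relative_abundance / genome_size


def bases_needed(target_coverage: float, relative_abundance: float,
                 genome_size: float) -> float:
    """Total bases to sequence to reach the target coverage of one genome."""
    return target_coverage * genome_size / relative_abundance


GENOME_SIZE = 5e6      # hypothetical 5-Mbp bacterial genome
TARGET_COVERAGE = 20   # hypothetical coverage target for a draft assembly

# Coverage of a genome making up 1% of the community in a 100-Mbp dataset.
shallow = expected_coverage(100e6, 0.01, GENOME_SIZE)
print(f"100 Mbp dataset, 1% abundance: {shallow:.1f}x coverage")

# Sequencing effort for a dominant (1%) and a rare (0.01%) community member.
for abundance in (0.01, 0.0001):
    gbp = bases_needed(TARGET_COVERAGE, abundance, GENOME_SIZE) / 1e9
    print(f"abundance {abundance:.2%}: about {gbp:,.0f} Gbp needed")
```

With these hypothetical inputs, a 100-Mbp dataset covers a 1%-abundance genome only about 0.2-fold, whereas reaching 20-fold coverage takes roughly 10 Gbp for that organism and roughly 1,000 Gbp for one that is 100 times rarer.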