The National Ecological Observatory Network's soil metagenomes: assembly and basic analysis

F1000Res. 2021 Apr 19;10:299. doi: 10.12688/f1000research.51494.2. eCollection 2021.

Abstract

The largest dataset of soil metagenomes has recently been released by the National Ecological Observatory Network (NEON), which performs annual shotgun sequencing of soils at 47 sites across the United States. NEON serves as a valuable educational resource, thanks to its open data and programming tutorials, but there is currently no introductory tutorial for accessing and analyzing the soil shotgun metagenomic dataset. Here, we describe methods for processing raw soil metagenome sequencing reads using a bioinformatics pipeline tailored to the high complexity and diversity of the soil microbiome. We describe the rationale, necessary resources, and implementation of steps such as cleaning raw reads, taxonomic classification, assembly into contigs or genomes, annotation of predicted genes using custom protein databases, and exporting data for downstream analysis. The workflow presented here aims to increase the accessibility of NEON's shotgun metagenome data, which can provide important clues about soil microbial communities and their ecological roles.
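As an illustration of the first step named above, cleaning raw reads, the following is a minimal sketch of a quality and length filter for FASTQ records. It is not the pipeline described in the article, which is not reproduced here. The function names (`parse_fastq`, `mean_phred`, `clean_reads`) and the thresholds (`min_mean_q=20`, `min_len=50`) are assumptions chosen for illustration, and the code assumes Phred+33 quality encoding with well-formed four-line records.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ text lines.

    Assumes four-line records (header, sequence, '+', quality);
    blank lines are ignored and a trailing partial record is dropped.
    """
    stripped = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return mean(ord(c) - offset for c in qual)


def clean_reads(lines, min_mean_q=20, min_len=50):
    """Keep reads that meet illustrative length and mean-quality thresholds."""
    return [
        (header, seq, qual)
        for header, seq, qual in parse_fastq(lines)
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q
    ]
```

In practice this step is usually handled by a dedicated trimming tool that also removes adapters and low-quality read ends; the sketch only shows the underlying idea of discarding reads that are too short or too error-prone before classification and assembly.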

Keywords: metagenomics; microbial ecology; soil microbiome; tutorial; workflow.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods
  • Metagenome*
  • Metagenomics / methods
  • Neon
  • Soil*

Substances

  • Soil
  • Neon

Grants and funding

ZRW is funded by the National Science Foundation (NSF) Graduate Research Fellowship Program. ZRW, MCD, and JMB are funded by the NSF Macrosystems Biology Program (Award #1638577). BH and JLN are funded by the BU Bioinformatics Research and Interdisciplinary Training Experience (BRITE) NSF-REU program (Award #1949968).