
MG-RAST is an open-source web application server that provides automatic phylogenetic and functional analysis of metagenomes. It is also one of the largest repositories of metagenomic data. The name is an abbreviation of Metagenomic Rapid Annotations using Subsystems Technology. The pipeline automatically assigns functions to the sequences in a metagenome by comparing them against databases at both the nucleotide and amino-acid levels. The application supplies phylogenetic and functional assignments for the metagenome being analysed, as well as tools for comparing different metagenomes. It also provides a RESTful API for programmatic access. The server was created and maintained by Argonne National Laboratory from the University of Chicago. As of December 29, 2016, the system had analyzed 60 terabase-pairs of data from more than 150,000 data sets, of which more than 23,000 are available to the public. Computational resources are currently provided by the DOE Magellan cloud at Argonne National Laboratory, Amazon EC2 Web services, and a number of traditional clusters.
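
As a rough illustration of programmatic access, the sketch below builds a request URL for a single metagenome record and decodes the JSON response. The base URL `https://api.mg-rast.org`, the `/metagenome/<id>` path, the `verbosity` parameter, and the example identifier are assumptions not given in this entry; check the current API documentation before relying on them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: public API base URL (not stated in this entry).
API_BASE = "https://api.mg-rast.org"


def metagenome_url(metagenome_id, verbosity="minimal"):
    """Build the URL for one metagenome record (path and parameter are assumed)."""
    query = urlencode({"verbosity": verbosity})
    return f"{API_BASE}/metagenome/{quote(metagenome_id)}?{query}"


def fetch_metagenome(metagenome_id, verbosity="minimal", timeout=30):
    """Fetch and decode the JSON record. Requires network access."""
    with urlopen(metagenome_url(metagenome_id, verbosity), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Example identifier in the assumed "mgm<number>.<version>" style.
    print(metagenome_url("mgm4447943.3", verbosity="metadata"))
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before anything is sent.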

Keywords: functional analysis; computational resources; web application
Entry Collection: HandWiki
Update Date: 10 Oct 2022

    1. Background

    MG-RAST was developed to provide a free, public resource for the analysis and storage of metagenome sequence data. The service removes one of the primary bottlenecks in metagenome analysis: the availability of high-performance computing for annotating data.[1]

    Metagenomic and metatranscriptomic studies involve processing large datasets and can therefore require computationally expensive analysis. Scientists can now generate such volumes of data because sequencing costs have fallen dramatically in recent years. This has shifted the limiting factor to computing costs: for instance, a study from the University of Maryland estimated a cost of more than $5 million per terabase using their CloVR metagenome analysis pipeline.[2] As the size and number of sequence datasets continue to increase, the costs of analyzing them will continue to rise.
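
    To make the scale of that estimate concrete, the short calculation below applies the quoted lower bound of $5 million per terabase to a dataset of arbitrary size. It is illustrative arithmetic only; the function name and the linear scaling are assumptions, and real costs depend on the pipeline and hardware.

```python
# Lower bound quoted above for the CloVR pipeline, in US dollars per terabase.
COST_PER_TERABASE_USD = 5_000_000
BASES_PER_TERABASE = 10**12


def estimated_cost_usd(base_pairs):
    """Scale the per-terabase figure linearly to a dataset of `base_pairs` bases."""
    return base_pairs * COST_PER_TERABASE_USD / BASES_PER_TERABASE


if __name__ == "__main__":
    # Hypothetical 50-gigabase study.
    print(f"${estimated_cost_usd(50 * 10**9):,.0f}")
```

    Under these assumptions, a hypothetical 50-gigabase dataset would cost about $250,000 to analyze.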

    In addition, MG-RAST serves as a repository for metagenomic data. Metadata collection and interpretation are vital for genomic and metagenomic studies, and challenges in this regard include the exchange, curation, and distribution of this information. The MG-RAST system was an early adopter of the minimal checklist standards and the expanded biome-specific environmental packages devised by the Genomic Standards Consortium, and it provides an easy-to-use uploader for capturing metadata at the time of data submission.[3]

    2. Pipeline for Metagenomic Data Analysis

    The MG-RAST application offers automated quality control, annotation, comparative analysis, and archiving of metagenomic and amplicon sequences using a combination of several bioinformatics tools. The application was built to analyze metagenomic data, but it also supports the processing of amplicon (16S, 18S, and ITS) and metatranscriptome (RNA-seq) sequences. At present, MG-RAST cannot predict coding regions from eukaryotes and is therefore of limited use for the analysis of eukaryotic metagenomes.[4]

    The pipeline of MG-RAST can be divided into five stages:

    2.1. Data Hygiene

    This stage covers quality control and artifact removal. First, low-quality regions are trimmed using SolexaQA, and reads of inappropriate length are removed. A dereplication step is included when processing metagenome and metatranscriptome datasets. Next, DRISEE (Duplicate Read Inferred Sequencing Error Estimation) assesses the sequencing error of the sample by measuring Artificial Duplicate Reads (ADRs). Finally, the pipeline can optionally screen the reads with the Bowtie aligner and remove those that closely match the genomes of model organisms (including fly, mouse, cow, and human).
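
    The length filter and dereplication steps can be sketched in a few lines. This is a simplified illustration, not the MG-RAST implementation: the cutoff of two standard deviations and the use of a fixed-length read prefix as the duplicate key are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def filter_by_length(reads, max_deviations=2.0):
    """Drop reads whose length lies more than `max_deviations` standard
    deviations from the mean read length. `reads` maps read id -> sequence."""
    if not reads:
        return {}
    lengths = [len(seq) for seq in reads.values()]
    mu, sigma = mean(lengths), pstdev(lengths)
    return {
        rid: seq
        for rid, seq in reads.items()
        if abs(len(seq) - mu) <= max_deviations * sigma
    }


def dereplicate(reads, prefix_length=50):
    """Keep the first read seen for each distinct prefix of `prefix_length`
    bases; later reads sharing that prefix are treated as duplicates."""
    seen_prefixes = set()
    kept = {}
    for rid, seq in reads.items():
        key = seq[:prefix_length]
        if key not in seen_prefixes:
            seen_prefixes.add(key)
            kept[rid] = seq
    return kept
```

    Keying duplicates on a prefix rather than the whole read tolerates sequencing errors toward the 3' end, at the price of occasionally merging reads that are genuinely distinct.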

    2.2. Feature Extraction

    MG-RAST identifies gene sequences using a machine learning approach, FragGeneScan. Ribosomal RNA sequences are identified through an initial BLAT search against a reduced version of the SILVA database.

    2.3. Feature Annotation

    To identify the putative functions and annotations of the genes, MG-RAST builds clusters of proteins at the 90% identity level using the UCLUST implementation in QIIME. The longest sequence in each cluster is selected for similarity analysis, which is computed with sBLAT (a version of the BLAT algorithm parallelized using OpenMP). The search is run against a protein database derived from the M5nr, which provides a nonredundant integration of sequences from the GenBank, SEED, IMG, UniProt, KEGG, and eggNOG databases.[5]

    The reads associated with rRNA sequences are clustered at 97% identity. The longest sequence in each cluster is picked as the representative and used for a BLAT search against the M5rna database, which integrates SILVA, Greengenes, and RDP.
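
    The idea of clustering at an identity threshold and keeping the longest member as the representative can be sketched as a greedy procedure. This is not UCLUST: the `identity` function below uses Python's `difflib` ratio as a stand-in for alignment-based identity, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Stand-in similarity in [0, 1]. This is NOT the alignment identity
    that UCLUST computes; it only illustrates the thresholding."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold):
    """Cluster `sequences` (id -> sequence) greedily.

    Sequences are visited from longest to shortest, so the first member of
    each cluster, which serves as its representative, is also its longest.
    Returns a dict mapping representative id -> list of member ids.
    """
    clusters = {}
    ordered = sorted(sequences.items(), key=lambda kv: len(kv[1]), reverse=True)
    for seq_id, seq in ordered:
        for rep_id in clusters:
            if identity(sequences[rep_id], seq) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters
```

    Calling `greedy_cluster(proteins, 0.90)` or `greedy_cluster(rrna_reads, 0.97)` mirrors the two thresholds described above; only the representatives would then be passed on to the similarity search, which is what keeps the search tractable.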

    2.4. Profile Generation

    The data are integrated into a number of data products. The most important are the abundance profiles, which are a pivoted and aggregated version of the similarity files.
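
    A minimal sketch of what "pivoted and aggregated" means in practice: similarity hits, one row per matched sequence, are collapsed into counts per annotation. The row layout (dicts with `function`, `organism`, and `abundance` fields) is a hypothetical stand-in; the actual similarity file format is not described here.

```python
from collections import Counter


def abundance_profile(hits, key):
    """Aggregate similarity hits into an abundance profile.

    `hits` is an iterable of dicts (hypothetical layout); `key` names the
    annotation to pivot on, e.g. "function" or "organism". Each hit
    contributes its "abundance" (default 1) to the total for that annotation.
    """
    profile = Counter()
    for hit in hits:
        profile[hit[key]] += hit.get("abundance", 1)
    return dict(profile)


if __name__ == "__main__":
    hits = [
        {"function": "DNA polymerase", "organism": "E. coli", "abundance": 3},
        {"function": "DNA polymerase", "organism": "B. subtilis", "abundance": 1},
        {"function": "ABC transporter", "organism": "E. coli"},
    ]
    print(abundance_profile(hits, "function"))
    print(abundance_profile(hits, "organism"))
```

    The same hits yield different profiles depending on the pivot key, which is why one set of similarity results can feed both functional and taxonomic summaries.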

    2.5. Data Loading

    Finally, the obtained abundance profiles are loaded into the respective databases.

    2.6. Detailed Steps of the MG-RAST Pipeline

    MG-RAST Pipeline              Description
    qc_stats                      Generate quality control statistics
    preprocess                    Preprocessing, to trim low-quality regions from FASTQ data
    dereplication                 Dereplication for shotgun metagenome data using a k-mer approach
    screen                        Remove reads that are near-exact matches to the genomes of model organisms (fly, mouse, cow and human)
    rna detection                 BLAT search against a reduced RNA database to identify ribosomal RNA
    rna clustering                rRNA-similar reads are clustered at 97% identity
    rna sims blat                 BLAT similarity search for the longest cluster representative against the M5rna database
    genecalling                   A machine learning approach, FragGeneScan, to predict coding regions in DNA sequences
    aa filtering                  Filter proteins
    aa clustering                 Cluster proteins at 90% identity level using uclust
    aa sims blat                  BLAT similarity analysis to identify proteins
    aa sims annotation            Sequence similarity against protein database from the M5nr
    rna sims annotation           Sequence similarity against RNA database from the M5rna
    index sim seq                 Index sequence similarity to data sources
    md5 annotation summary        Generate summary reports for md5, function, organism, LCA, ontology and source annotations
    function annotation summary   Generate summary reports for md5, function, organism, LCA, ontology and source annotations
    organism annotation summary   Generate summary reports for md5, function, organism, LCA, ontology and source annotations
    lca annotation summary        Generate summary reports for md5, function, organism, LCA, ontology and source annotations
    ontology annotation summary   Generate summary reports for md5, function, organism, LCA, ontology and source annotations
    source annotation summary     Generate summary reports for md5, function, organism, LCA, ontology and source annotations
    md5 summary load              Load summary report to the project
    function summary load         Load summary report to the project
    organism summary load         Load summary report to the project
    lca summary load              Load summary report to the project
    ontology summary load         Load summary report to the project
    done stage
    notify job completion         Send notification to user via email
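
    The table lists an LCA annotation summary without describing it. Assuming LCA stands for lowest common ancestor, as is usual in taxonomic assignment, the idea can be sketched as taking the deepest taxonomic rank shared by all of a sequence's hits. The lineage representation (root-first lists of taxon names) and the function name are assumptions for illustration; this is not the MG-RAST code.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of the given taxonomic lineages.

    Each lineage is a root-first list of taxon names, e.g.
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"].
    An empty input yields an empty lineage.
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so the prefix never overruns one.
    for ranks in zip(*lineages):
        if all(taxon == ranks[0] for taxon in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


if __name__ == "__main__":
    hits = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"],
        ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
    ]
    print(lowest_common_ancestor(hits))
```

    Assigning a read to the shared prefix rather than to its single best hit gives a more conservative taxonomic call when hits disagree at lower ranks.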

    3. MG-RAST Utilities

    Besides metagenome analysis, MG-RAST can also be used for data discovery. Metagenome profiles and data sets can be visualized and compared in a wide variety of modes: the web interface allows users to select data by criteria such as composition, sequence quality, functionality, or sample type, and offers several ways to compute statistical inferences and ecological analyses. Metagenome profiles can be visualized and compared using bar charts, trees, spreadsheet-like tables, heatmaps, PCoA, rarefaction plots, circular recruitment plots, and KEGG maps.
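
    As an example of the kind of ecological analysis behind a rarefaction plot, the sketch below computes the expected number of distinct taxa in a random subsample of a given size, using the standard analytical (hypergeometric) rarefaction formula. How MG-RAST itself computes its rarefaction curves is not stated here, so treat this as a generic illustration; the function name is hypothetical.

```python
from math import comb


def expected_richness(counts, sample_size):
    """Expected number of distinct taxa in a random subsample drawn without
    replacement.

    `counts` holds the abundance of each taxon; `sample_size` is the number
    of sequences drawn. A taxon with abundance c is absent from the subsample
    with probability C(N - c, n) / C(N, n), where N is the total abundance.
    """
    total = sum(counts)
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and the total count")
    denominator = comb(total, sample_size)
    return sum(
        1 - comb(total - c, sample_size) / denominator
        for c in counts
        if c > 0
    )


def rarefaction_curve(counts, step=1):
    """Return (sample_size, expected_richness) points for plotting."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]
```

    Plotting the points from `rarefaction_curve` shows whether the curve flattens, which suggests that further sequencing would reveal few additional taxa.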

    References

    1. Meyer, F.; Paarmann, D.; D'Souza, M.; Olson, R.; Glass, EM; Kubal, M.; Paczian, T.; Rodriguez, A. et al. (2008-01-01). "The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes". BMC Bioinformatics 9: 386. doi:10.1186/1471-2105-9-386. ISSN 1471-2105. PMID 18803844.  http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2563014
    2. Angiuoli, Samuel V.; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R.; Arze, Cesar; White, James R. et al. (2011-01-01). "CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing". BMC Bioinformatics 12: 356. doi:10.1186/1471-2105-12-356. ISSN 1471-2105. PMID 21878105.  http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3228541
    3. Field, Dawn; Amaral-Zettler, Linda; Cochrane, Guy; Cole, James R.; Dawyndt, Peter; Garrity, George M.; Gilbert, Jack; Glöckner, Frank Oliver et al. (2011-06-21). "The Genomic Standards Consortium". PLOS Biology 9 (6): e1001088. doi:10.1371/journal.pbio.1001088. ISSN 1545-7885. PMID 21713030.  http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3119656
    4. Keegan, Kevin P.; Glass, Elizabeth M.; Meyer, Folker (2016-01-01). "MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function". 1399. 207–233. doi:10.1007/978-1-4939-3369-3_13. ISBN 978-1-4939-3367-9.  https://dx.doi.org/10.1007%2F978-1-4939-3369-3_13
    5. Wilke, Andreas; Harrison, Travis; Wilkening, Jared; Field, Dawn; Glass, Elizabeth M.; Kyrpides, Nikos; Mavrommatis, Konstantinos; Meyer, Folker (2012-01-01). "The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools". BMC Bioinformatics 13: 141. doi:10.1186/1471-2105-13-141. ISSN 1471-2105. PMID 22720753.  http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3410781