ASaiM

ASaiM is a bioinformatics environment for optimised processing and analysis of massive microbiota data, particularly gut microbiota data.

Context

Large amounts of intestinal microbiota data are available in public data repositories such as ENA, NCBI and DDBJ. For example, in the ENA public data repository, 721 studies contain the word “intestin”, “gut” or “feac” in their description and “meta” in their name (09/26/2015). However, these data are not easy to identify (many of these 721 studies are not relevant) or to query. Moreover, the datasets underwent different analyses, so the results of different projects cannot be compared directly. Datasets from public data repositories need to be formatted to make them informative and standardized, so that information can then be extracted, such as which organisms are present or which functions are performed in a specific gut microbiota sample.
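
The keyword search described above can be sketched as a simple filter. This is only an illustration, not the query actually run against ENA: the `Study` record, its field names and the sample entries are hypothetical, and the keywords are copied from the text.

```python
from dataclasses import dataclass

# Keywords copied from the text above; matching is case-insensitive.
DESCRIPTION_KEYWORDS = ("intestin", "gut", "feac")
NAME_KEYWORD = "meta"


@dataclass
class Study:
    """Hypothetical minimal study record (not the ENA schema)."""
    accession: str
    name: str
    description: str


def is_candidate(study: Study) -> bool:
    """Return True if the name contains "meta" and the description
    contains at least one of the description keywords."""
    name = study.name.lower()
    description = study.description.lower()
    return NAME_KEYWORD in name and any(
        keyword in description for keyword in DESCRIPTION_KEYWORDS
    )


def select_candidates(studies):
    """Keep only the studies matching the keyword criteria."""
    return [study for study in studies if is_candidate(study)]


# Made-up records for illustration only.
EXAMPLE_STUDIES = [
    Study("S1", "Human metagenome", "Gut microbiota of healthy adults"),
    Study("S2", "Soil metagenome", "Microbial communities in forest soil"),
    Study("S3", "Mouse genome", "Intestinal tissue sequencing"),
    Study("S4", "Metatranscriptome survey", "Gutter runoff water samples"),
]

if __name__ == "__main__":
    for study in select_candidates(EXAMPLE_STUDIES):
        print(study.accession, study.name)
```

Substring matching of this kind also returns irrelevant hits: in the made-up records, S4 is selected only because "gutter" contains "gut". This is consistent with the observation that many of the retrieved studies are not relevant.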

For that, data from public data repositories must be re-analyzed using a full analytical workflow with several steps, such as the standard defined by [12]:

  1. Quality control
  2. (Assembly of sequences)
  3. Sorting of pertinent sequences
  4. Functional annotation
  5. Taxonomic analysis
  6. Comparative analysis
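
The order of these steps can be sketched as a small planning function in which the assembly step (in parentheses above, taken here to mean optional) is skipped unless requested. The names `STEPS` and `plan_workflow` are hypothetical and only illustrate the sequence; each real step would call dedicated tools.

```python
from typing import List, Tuple

# Each entry is (step name, is_optional), in the order listed above.
STEPS: List[Tuple[str, bool]] = [
    ("Quality control", False),
    ("Assembly of sequences", True),  # assumed optional (parenthesized above)
    ("Sorting of pertinent sequences", False),
    ("Functional annotation", False),
    ("Taxonomic analysis", False),
    ("Comparative analysis", False),
]


def plan_workflow(with_assembly: bool = False) -> List[str]:
    """Return the ordered names of the steps to run.

    Optional steps are included only when with_assembly is True.
    """
    return [name for name, optional in STEPS if with_assembly or not optional]


if __name__ == "__main__":
    for position, name in enumerate(plan_workflow(with_assembly=True), start=1):
        print(position, name)
```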

Several solutions could be used [12]:

  • QIIME [4], Mothur [18], ...

    These tools are useful to analyze microbiota data. Most of them are available through the command line, which is more convenient for analyzing large and numerous datasets. However, command-line execution is also a limitation for many users. In addition, these tools cover only one or a few steps of the data analysis process and are mostly dedicated to the analysis of 16S rDNA datasets.

  • MEGAN [10]

    This tool targets metagenomic data but lacks critical tasks such as taxonomic and functional analysis

  • CAMERA [19]

    This pipeline has no longer been supported since 1 July 2014.

  • IMG/M [15][16]

    IMG/M is an experimental metagenome data management and analysis system. It provides a genome database from bacterial, archaeal and selected eukaryotic organisms and a suite of tools for data exploration and comparative data analysis. This tool is designed for assembled metagenomes and does not provide any tool for quality control or for the other tasks up to assembly. Moreover, it is not modular and can be used only via the web-based interface.

  • MG-RAST [17][3]

    This tool uses an automated pipeline to analyze the data and standardize the outputs. Numerous tools are also available, but none for quality control or assembly of the sequences. In addition, this tool is only available through a web interface, to which datasets have to be uploaded. Data analysis is then performed remotely: users have no control over it, and it can take time. This solution is not convenient for large and numerous datasets, such as the ones in public data repositories.

  • EBI metagenomics [9]

    The EBI metagenomics tool is similar to MG-RAST or IMG/M, with an automated pipeline available via a web-based interface. Its advantage over other solutions is that it uses EBI expertise in sequence data archiving and analysis. However, this solution, like MG-RAST or IMG/M, is not convenient for analyzing large and numerous datasets.

  • CloVR-metagenomics [1]

    CloVR-metagenomics is a desktop application which can be complemented by a cloud-based instance. It provides a defined pipeline for automated sequence analysis. However, this solution lacks some tools, such as quality control, assembly or gene detection.

  • SmashCommunity [2]

    This solution provides an automated workflow from sequence assembly to comparative analysis, with numerous tools. However, all these tools have to be installed locally before any execution, and SmashCommunity can only be run from the command line: no user-friendly interface is available.

  • RAMMCAP [14]

    RAMMCAP is a metagenomic platform with a workflow which enables a complete metagenomic analysis. The strength of this platform lies in minimizing the computational cost of the various processing tasks. However, it does not provide a user-friendly interface, and each of the required programs has to be compiled and installed separately. This is a weakness for an inexperienced user.

  • MetAMOS [20]

    This tool is an open-source, modular and customizable framework for metagenomic assembly and analysis to produce genomic scaffolds, open reading frames and taxonomic or functional annotations. This tool is mainly focused on metagenome assembly and is presented as the assembly-centric counterpart to QIIME and Mothur. It also provides several interesting tools for analysis. This tool is guided by two principles: modularity and robustness. It encourages users to tailor the tool to the biological questions they want to answer, not the opposite. However, this tool does not provide many useful software tools, is managed only through the command line (useful for numerous analyses but not for a single specific analysis), and its pipeline definition lacks visual and documented information.

  • Galaxy [8][7]

    Galaxy is an open, web-based platform for performing accessible, reproducible, and transparent genomic science. This platform offers numerous tools and several workflows to analyze metagenomic datasets, such as the Galaxy metagenomic pipeline [11], Orione [5], BioMaS [6] and the Huttenhower Lab Galaxy instance. It combines a user-friendly web-based interface with an API for command-line use. However, the available metagenomic workflows have to be adapted to process gut microbiota data with specific databases such as the catalog of reference genes in the human gut microbiome [13].

None of these solutions meets all of the following requirements:

  • Complete analytical workflow, such as the one proposed by [12], with gut microbiota-specific databases
  • User-friendly interface and command-line use to automate analysis of numerous datasets
  • Data management capabilities

Solution

New sequencing platforms produce huge amounts of short reads. However, inappropriate use of sequence analysis procedures may result in numerous errors and misinterpretations. This is particularly true for the exploration of metagenomic and metatranscriptomic data from the complex microorganism communities that colonize all environments. As these communities are intensively studied, there is an urgent need for modular, accessible, shareable and user-friendly tools.

ASaiM is an open-source, opinionated framework dedicated to microbiota sequence analyses. With a selected collection of tools, workflows and databases, ASaiM helps to exploit taxonomic and metabolic information from raw microbiota sequences, using a custom Galaxy instance.

[Figure: screenshot of the custom Galaxy instance of the ASaiM framework]

This framework is developed to:

  • be easy to use for everyone, from beginners to experts. Check for yourself how to construct and execute a workflow, or follow the tutorial with the available toy dataset.
  • incorporate numerous but carefully selected tools. Check out all the available tools.
  • help generate modular workflows. Look at the available workflows.
  • improve transparency and reproducibility of microbiota studies

ASaiM therefore provides a powerful framework to exploit microbiota data easily and rapidly in a reproducible and transparent environment.

References

[1]Samuel V. Angiuoli, Malcolm Matalka, Aaron Gussman, Kevin Galens, Mahesh Vangala, David R. Riley, Cesar Arze, James R. White, Owen White, and W. Florian Fricke. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics, 12(1):356, August 2011. URL: http://www.biomedcentral.com/1471-2105/12/356/abstract, doi:10.1186/1471-2105-12-356.
[2]Manimozhiyan Arumugam, Eoghan D. Harrington, Konrad U. Foerstner, Jeroen Raes, and Peer Bork. SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics, 26(23):2977–2978, December 2010. URL: http://bioinformatics.oxfordjournals.org/content/26/23/2977, doi:10.1093/bioinformatics/btq536.
[3]Ramy K. Aziz, Daniela Bartels, Aaron A. Best, Matthew DeJongh, Terrence Disz, Robert A. Edwards, Kevin Formsma, Svetlana Gerdes, Elizabeth M. Glass, Michael Kubal, Folker Meyer, Gary J. Olsen, Robert Olson, Andrei L. Osterman, Ross A. Overbeek, Leslie K. McNeil, Daniel Paarmann, Tobias Paczian, Bruce Parrello, Gordon D. Pusch, Claudia Reich, Rick Stevens, Olga Vassieva, Veronika Vonstein, Andreas Wilke, and Olga Zagnitko. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics, 9(1):75, February 2008. URL: http://www.biomedcentral.com/1471-2164/9/75/abstract, doi:10.1186/1471-2164-9-75.
[4]J. Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, Antonio Gonzalez Peña, Julia K. Goodrich, Jeffrey I. Gordon, Gavin A. Huttley, Scott T. Kelley, Dan Knights, Jeremy E. Koenig, Ruth E. Ley, Catherine A. Lozupone, Daniel McDonald, Brian D. Muegge, Meg Pirrung, Jens Reeder, Joel R. Sevinsky, Peter J. Turnbaugh, William A. Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse Zaneveld, and Rob Knight. QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5):335–336, May 2010. URL: http://www.nature.com/nmeth/journal/v7/n5/full/nmeth.f.303.html, doi:10.1038/nmeth.f.303.
[5]Gianmauro Cuccuru, Massimiliano Orsini, Andrea Pinna, Andrea Sbardellati, Nicola Soranzo, Antonella Travaglione, Paolo Uva, Gianluigi Zanetti, and Giorgio Fotia. Orione, a web-based framework for NGS analysis in microbiology. Bioinformatics, 30(13):1928–1929, July 2014. URL: http://bioinformatics.oxfordjournals.org/content/30/13/1928, doi:10.1093/bioinformatics/btu135.
[6]Bruno Fosso, Monica Santamaria, Marinella Marzano, Daniel Alonso-Alemany, Gabriel Valiente, Giacinto Donvito, Alfonso Monaco, Pasquale Notarangelo, and Graziano Pesole. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS. BMC Bioinformatics, 16(1):203, July 2015. URL: http://www.biomedcentral.com/1471-2105/16/203/abstract, doi:10.1186/s12859-015-0595-z.
[7]Belinda Giardine, Cathy Riemer, Ross C. Hardison, Richard Burhans, Laura Elnitski, Prachi Shah, Yi Zhang, Daniel Blankenberg, Istvan Albert, James Taylor, Webb Miller, W. James Kent, and Anton Nekrutenko. Galaxy: A platform for interactive large-scale genome analysis. Genome Res., 15(10):1451–1455, October 2005. URL: http://genome.cshlp.org/content/15/10/1451, doi:10.1101/gr.4086505.
[8]Jeremy Goecks, Anton Nekrutenko, James Taylor, and The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86, August 2010. URL: http://genomebiology.com/2010/11/8/R86/abstract, doi:10.1186/gb-2010-11-8-r86.
[9]Sarah Hunter, Matthew Corbett, Hubert Denise, Matthew Fraser, Alejandra Gonzalez-Beltran, Christopher Hunter, Philip Jones, Rasko Leinonen, Craig McAnulla, Eamonn Maguire, John Maslen, Alex Mitchell, Gift Nuka, Arnaud Oisel, Sebastien Pesseat, Rajesh Radhakrishnan, Philippe Rocca-Serra, Maxim Scheremetjew, Peter Sterk, Daniel Vaughan, Guy Cochrane, Dawn Field, and Susanna-Assunta Sansone. EBI metagenomics—a new resource for the analysis and archiving of metagenomic data. Nucl. Acids Res., 42(D1):D600–D606, January 2014. URL: http://nar.oxfordjournals.org/content/42/D1/D600, doi:10.1093/nar/gkt961.
[10]Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, and Stephan C. Schuster. Integrative analysis of environmental sequences using MEGAN4. Genome Res., 21(9):1552–1560, September 2011. URL: http://genome.cshlp.org/content/21/9/1552, doi:10.1101/gr.120618.111.
[11]Sergei Kosakovsky Pond, Samir Wadhawan, Francesca Chiaromonte, Guruprasad Ananda, Wen-Yu Chung, James Taylor, Anton Nekrutenko, and The Galaxy Team. Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Res., 19(11):2144–2153, November 2009. URL: http://genome.cshlp.org/content/19/11/2144, doi:10.1101/gr.094508.109.
[12]Efthymios Ladoukakis, Fragiskos N. Kolisis, and Aristotelis A. Chatziioannou. Integrative workflows for metagenomic analysis. Front Cell Dev Biol, November 2014. URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4237130/, doi:10.3389/fcell.2014.00070.
[13]Junhua Li, Huijue Jia, Xianghang Cai, Huanzi Zhong, Qiang Feng, Shinichi Sunagawa, Manimozhiyan Arumugam, Jens Roat Kultima, Edi Prifti, Trine Nielsen, Agnieszka Sierakowska Juncker, Chaysavanh Manichanh, Bing Chen, Wenwei Zhang, Florence Levenez, Juan Wang, Xun Xu, Liang Xiao, Suisha Liang, Dongya Zhang, Zhaoxi Zhang, Weineng Chen, Hailong Zhao, Jumana Yousuf Al-Aama, Sherif Edris, Huanming Yang, Jian Wang, Torben Hansen, Henrik Bjørn Nielsen, Søren Brunak, Karsten Kristiansen, Francisco Guarner, Oluf Pedersen, Joel Doré, S. Dusko Ehrlich, MetaHIT Consortium, Peer Bork, and Jun Wang. An integrated catalog of reference genes in the human gut microbiome. Nat Biotech, 32(8):834–841, August 2014. URL: http://www.nature.com/nbt/journal/v32/n8/full/nbt.2942.html, doi:10.1038/nbt.2942.
[14]Weizhong Li. Analysis and comparison of very large metagenomes with fast clustering and functional annotation. BMC Bioinformatics, 10(1):359, October 2009. URL: http://www.biomedcentral.com/1471-2105/10/359/abstract, doi:10.1186/1471-2105-10-359.
[15]Victor M. Markowitz, I.-Min A. Chen, Ken Chu, Ernest Szeto, Krishna Palaniappan, Manoj Pillay, Anna Ratner, Jinghua Huang, Ioanna Pagani, Susannah Tringe, Marcel Huntemann, Konstantinos Billis, Neha Varghese, Kristin Tennessen, Konstantinos Mavromatis, Amrita Pati, Natalia N. Ivanova, and Nikos C. Kyrpides. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucl. Acids Res., 42(D1):D568–D573, January 2014. URL: http://nar.oxfordjournals.org/content/42/D1/D568, doi:10.1093/nar/gkt919.
[16]Victor M. Markowitz, Natalia N. Ivanova, Ernest Szeto, Krishna Palaniappan, Ken Chu, Daniel Dalevi, I.-Min A. Chen, Yuri Grechkin, Inna Dubchak, Iain Anderson, Athanasios Lykidis, Konstantinos Mavromatis, Philip Hugenholtz, and Nikos C. Kyrpides. IMG/M: a data management and analysis system for metagenomes. Nucl. Acids Res., 36(suppl 1):D534–D538, January 2008. URL: http://nar.oxfordjournals.org/content/36/suppl_1/D534, doi:10.1093/nar/gkm869.
[17]F. Meyer, D. Paarmann, M. D’Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics, 9(1):386, September 2008. URL: http://www.biomedcentral.com/1471-2105/9/386/abstract, doi:10.1186/1471-2105-9-386.
[18]Patrick D. Schloss, Sarah L. Westcott, Thomas Ryabin, Justine R. Hall, Martin Hartmann, Emily B. Hollister, Ryan A. Lesniewski, Brian B. Oakley, Donovan H. Parks, Courtney J. Robinson, Jason W. Sahl, Blaz Stres, Gerhard G. Thallinger, David J. Van Horn, and Carolyn F. Weber. Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl. Environ. Microbiol., 75(23):7537–7541, December 2009. URL: http://aem.asm.org/content/75/23/7537, doi:10.1128/AEM.01541-09.
[19]Rekha Seshadri, Saul A Kravitz, Larry Smarr, Paul Gilna, and Marvin Frazier. CAMERA: A Community Resource for Metagenomics. PLoS Biol, 5(3):e75, March 2007. URL: http://dx.doi.org/10.1371/journal.pbio.0050075, doi:10.1371/journal.pbio.0050075.
[20]Todd J. Treangen, Sergey Koren, Daniel D. Sommer, Bo Liu, Irina Astrovskaya, Brian Ondov, Aaron E. Darling, Adam M. Phillippy, and Mihai Pop. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol., 14(1):R2, 2013. doi:10.1186/gb-2013-14-1-r2.