Taxonomic analysis

To identify the micro-organisms populating a sample and their proportions, we use taxonomic and phylogenetic approaches, in which each read or sequence is assigned to the most plausible microbial lineage.

Most tools used to estimate sample composition use 16S rRNA genes as markers for bacteria and archaea, and 18S rRNA genes for eukaryotes. Other tools, such as MetaPhlAn [2][3], propose alternatives based on more general clade-specific marker genes.

We use such an approach to analyze the taxonomy of the sequences with MetaPhlAn2 [3]. This tool infers the presence and read coverage of clade-specific markers to detect taxonomic clades and estimate their relative abundances.
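To make "read coverage of clade-specific markers" more concrete, here is a deliberately simplified sketch of how per-marker coverage can be turned into relative abundances. It is not MetaPhlAn2's implementation: the input structure (`marker_hits`, mapping each clade to `(read_count, marker_length)` pairs), the function names and the 10% truncation fraction are assumptions for illustration. It only borrows the general idea of a robust (truncated) average over a clade's markers, followed by normalization to 100%.

```python
from statistics import mean


def truncated_mean(values, q=0.1):
    """Mean after discarding the lowest and highest fraction q of values (q < 0.5)."""
    ordered = sorted(values)
    k = int(len(ordered) * q)  # number of values trimmed at each end
    return mean(ordered[k:len(ordered) - k])


def relative_abundances(marker_hits, q=0.1):
    """Hypothetical helper: {clade: [(reads, marker_length), ...]} -> {clade: percent}.

    Simplification (assumption): all clades are at the same taxonomic level and
    every clade has at least one marker.
    """
    coverage = {}
    for clade, markers in marker_hits.items():
        # Per-marker coverage: reads per base of marker sequence.
        per_marker = [reads / length for reads, length in markers]
        # Robust clade coverage: trimming limits the effect of outlier markers.
        coverage[clade] = truncated_mean(per_marker, q)
    total = sum(coverage.values())
    if total == 0:
        return {clade: 0.0 for clade in coverage}
    # Normalize so the abundances at this level sum to 100%.
    return {clade: 100 * cov / total for clade, cov in coverage.items()}
```

For example, a clade whose two markers receive 100 and 200 reads (1,000 bp each) and a clade whose markers receive 50 reads each end up at roughly 75% and 25%. The trimming matters when a clade has many markers and a few of them attract reads from elsewhere.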

Taxonomic assignation

This tool is available in the left panel, under Assign taxonomy on non rRNA sequences (STRUCTURAL AND FUNCTIONAL ANALYSIS TOOLS). In this tutorial, we run it on all sequences, before the SortMeRNA step.

../../_images/metaphlan_2.png

We obtain a text file with 59 lines, each corresponding to a taxonomic assignment (at one of several taxonomic levels) together with its relative abundance.

../../_images/metaphlan_2_output.png
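The layout of this file can be read programmatically. A minimal parsing sketch, assuming the usual MetaPhlAn2 text layout: header lines start with `#`, and each data line holds a clade name whose levels are joined by `|` (with rank prefixes such as `k__`, `p__`, ..., `s__`), a tab, and the relative abundance in percent. The function name `parse_metaphlan` is hypothetical.

```python
def parse_metaphlan(lines):
    """Hypothetical helper: return (lineage, abundance) tuples from MetaPhlAn2 text output.

    lineage is a list of ranked names, e.g. ["k__Bacteria", "p__Firmicutes"].
    """
    profile = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        clade, abundance = line.split("\t")[:2]
        profile.append((clade.split("|"), float(abundance)))
    return profile
```

Usage: `parse_metaphlan(open("metaphlan2_output.txt"))` (the file name is illustrative) yields one tuple per line of the file, so the length of the lineage list tells you the taxonomic level of each entry.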

Formatting

The plain-text output is not easy to interpret, because all taxonomic levels are mixed together.
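One way to untangle the levels is to group rows by the rank of their deepest clade. The sketch below assumes the MetaPhlAn2 rank prefixes `k__` to `s__`, plus `t__` for strains; the helper name `split_by_level` is hypothetical and is not one of the Galaxy tools used in this tutorial.

```python
from collections import defaultdict

# Assumed MetaPhlAn2 rank prefixes.
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species", "t": "strain"}


def split_by_level(lines):
    """Hypothetical helper: group MetaPhlAn2 rows by the rank of their deepest clade."""
    levels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and header lines
        clade, abundance = line.split("\t")[:2]
        deepest = clade.split("|")[-1]          # e.g. "p__Firmicutes"
        prefix, name = deepest.split("__", 1)   # -> "p", "Firmicutes"
        levels[RANKS.get(prefix, prefix)][name] = float(abundance)
    return dict(levels)
```

Each resulting table (for example `split_by_level(lines)["phylum"]`) then lists the clades of a single level, which is much easier to read than the mixed listing.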

Visualization

To extract information, we can also use visualization tools that produce graphical representations of the MetaPhlAn2 output. Two solutions are available: GraPhlAn and Krona.

Interactive visualization

Krona [1] is a visualization tool for intuitive exploration of the relative abundances of taxonomic classifications. Krona requires its input in a specific format.

The MetaPhlAn2 output therefore has to be formatted with Format MetaPhlAn2 output for Krona (in Assign taxonomy on non rRNA sequences, Taxonomic assignation):

../../_images/format_metaphlan2.png
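Krona's generic text import (`ktImportText`) reads tab-separated lines made of a magnitude followed by the lineage, one level per column. The sketch below shows one plausible conversion into that layout. It is an assumption, not the code of the Galaxy formatting tool: it keeps only the deepest rows, because Krona sums child magnitudes into their parents and keeping internal rows as well would count them twice. The helper name is hypothetical.

```python
def metaphlan_to_krona(lines):
    """Hypothetical helper: MetaPhlAn2 rows -> Krona text-import rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            clade, abundance = line.split("\t")[:2]
            rows.append((clade, float(abundance)))
    clades = {clade for clade, _ in rows}
    out = []
    for clade, abundance in rows:
        # Skip internal nodes: Krona adds up child magnitudes itself.
        if any(other.startswith(clade + "|") for other in clades):
            continue
        # Strip rank prefixes such as "k__" so Krona shows plain names.
        names = [rank.split("__", 1)[-1] for rank in clade.split("|")]
        out.append("\t".join([f"{abundance:g}"] + names))
    return out
```

Writing the returned lines to a file gives one leaf clade per line, for example `60`, `Bacteria`, `Firmicutes` separated by tabs.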

Krona can then be run (Krona pie chart from taxonomic profile, in Visualize data, Post-treatments):

../../_images/krona_call.png

Krona produces an interactive HTML file. Its content can be viewed inside Galaxy by clicking View data at the top right of the Krona output in the right panel.

../../_images/krona_output.png

This visualization is similar to the one used on EBI Metagenomics.

Static, easy to export visualization

Alternatively, GraPhlAn produces circular, easily exportable representations of taxonomic analyses. It requires two files: a tree and an annotation file.

However, MetaPhlAn produces only a text file, so we need a tool to extract the tree and the annotations from its output. We use export2graphlan, available in the Visualize data section (Post-treatments). Numerous parameters control the information written to the annotation file. For our dataset, we set:

  • Levels to annotate in the tree: 5
  • Levels to annotate in the external legend: 6,7
  • Title font size: 15
  • Default size for clades not found as biomarkers: 10
  • Minimum value of biomarker clades: 0
  • Maximum value of biomarker clades: 250
  • Font size: 10
  • Minimum font size: 8
  • Maximum font size: 12
  • Font size for the annotation legend: 11
  • Minimum abundance value for a clade to be annotated: 0
  • Number of clades to highlight: 100
  • Row number containing the names of the features: 0
  • Row number containing the names of the samples: 0
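Several of these parameters come in minimum/maximum pairs (clade sizes 0 to 250, font sizes 8 to 12). A minimal sketch of how such a pair can map abundances onto a display range, assuming a simple linear rescaling; export2graphlan may use a different mapping, and `scale_sizes` is a hypothetical name.

```python
def scale_sizes(abundances, min_size=0.0, max_size=250.0):
    """Hypothetical helper: linearly map {clade: abundance} onto [min_size, max_size].

    The linear mapping is an assumption made for illustration.
    """
    lo, hi = min(abundances.values()), max(abundances.values())
    if hi == lo:
        # All clades equally abundant: give them all the maximum size.
        return {clade: max_size for clade in abundances}
    span = max_size - min_size
    return {clade: min_size + span * (value - lo) / (hi - lo)
            for clade, value in abundances.items()}
```

With the values above, `scale_sizes({"A": 10, "B": 60, "C": 35})` places A, B and C at 0, 250 and 125, and calling it with `min_size=8, max_size=12` gives font sizes 8, 12 and 10.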

We chose to display up to 100 clades. To display more or fewer, change the number of clades to highlight; to change the displayed annotations, change the levels to annotate.
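The combined effect of "Number of clades to highlight" and "Minimum abundance value for a clade to be annotated" can be pictured as a filter followed by a top-N selection. This is a sketch under assumptions: the hypothetical helper below uses an inclusive threshold (`>=`), which may differ from export2graphlan's exact behavior.

```python
def clades_to_highlight(abundances, n=100, min_abundance=0.0):
    """Hypothetical helper: up to n clades at or above min_abundance, most abundant first."""
    kept = [(clade, value) for clade, value in abundances.items()
            if value >= min_abundance]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [clade for clade, _ in kept[:n]]
```

With a minimum of 0 and n = 100, as in our settings, every clade passes the filter and at most 100 of them are highlighted.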

This tool generates two outputs (a tree file and an annotation file). They have to be combined with the first GraPhlAn script (Modify an input tree for GraPhlAn, in Visualize data):

../../_images/graphlan_annotate_parameters.png

This tool generates a PhyloXML file, which is the input for GraPhlAn.
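To picture what the two export2graphlan outputs contain before they are merged, here is a rough sketch that builds comparable files directly from MetaPhlAn2 lines. It assumes GraPhlAn's plain-text tree layout (one clade per line, levels joined by `.`) and its tab-separated annotation layout (`clade`, option, value), using the `clade_marker_size` option with a size proportional to abundance. The helper name and the percent-to-size mapping are assumptions; check the GraPhlAn documentation before relying on them.

```python
def graphlan_inputs(lines, max_size=250.0):
    """Hypothetical helper: build GraPhlAn-style tree lines and annotation lines."""
    tree, annotations = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and header lines
        clade, abundance = line.split("\t")[:2]
        # Drop rank prefixes and join levels with "." (plain-text tree layout).
        node = ".".join(rank.split("__", 1)[-1] for rank in clade.split("|"))
        tree.append(node)
        # Marker size proportional to abundance: 100% maps to max_size (assumed).
        size = max_size * float(abundance) / 100
        annotations.append(f"{node}\tclade_marker_size\t{size:g}")
    return tree, annotations
```

For a phylum at 60%, this yields the tree line `Bacteria.Firmicutes` and an annotation giving it a marker size of 150.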

GraPhlAn is available in the Visualize data section (Post-treatments). It generates an image with the circular representation of the MetaPhlAn output. The available parameters control the output file format, size, etc.

../../_images/graphlan_parameters.png

With our dataset, we obtain a clear graphical representation of the taxonomic diversity in our sample, in which each circle's radius is proportional to the relative abundance of the corresponding clade.

After these taxonomic analyses, we can then run functional analyses.

[1] Brian D. Ondov, Nicholas H. Bergman, and Adam M. Phillippy. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 12(1):385, September 2011. URL: http://www.biomedcentral.com/1471-2105/12/385/abstract, doi:10.1186/1471-2105-12-385.
[2] Nicola Segata, Levi Waldron, Annalisa Ballarini, Vagheesh Narasimhan, Olivier Jousson, and Curtis Huttenhower. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Meth, 9(8):811–814, August 2012. URL: http://www.nature.com/nmeth/journal/v9/n8/full/nmeth.2066.html, doi:10.1038/nmeth.2066.
[3] Duy Tin Truong, Eric A. Franzosa, Timothy L. Tickle, Matthias Scholz, George Weingart, Edoardo Pasolli, Adrian Tett, Curtis Huttenhower, and Nicola Segata. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Meth, 12(10):902–903, October 2015. URL: http://www.nature.com/nmeth/journal/v12/n10/full/nmeth.3589.html, doi:10.1038/nmeth.3589.