Choice of assembly software has a critical impact on virome characterisation

Thomas D.S. Sutton, Adam G. Clooney, Feargal Ryan, R. Paul Ross, Colin Hill

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Background: The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology with reference databases. As a result, investigation of the viral metagenome (virome) relies heavily on de novo assembly of short sequencing reads to recover compositional and functional information. Metagenomic assembly is particularly challenging for virome data, often resulting in fragmented assemblies and poor recovery of viral community members. Despite the essential role of assembly in virome analysis and the difficulties posed by these data, existing assembly comparisons have been limited to subsections of virome studies or to bacterial datasets.

Design: This study presents the most comprehensive virome assembly comparison to date, covering 16 metagenomic assembly approaches that have been used in human virome studies. Assemblers were assessed using four independent virome datasets, namely simulated reads, two mock communities, viromes spiked with a known phage and human gut viromes.

Results: Assembly performance varied significantly across all test datasets, with SPAdes (meta) performing consistently well. The performance of MIRA and VICUNA varied between datasets, highlighting the importance of using a range of datasets when comparing assembly programs. While some assemblers addressed the challenges of virome data better than others, all assemblers had limitations. Low read coverage and genomic repeats resulted in assemblies with poor genome recovery, high degrees of fragmentation and low-accuracy contigs across all assemblers. These limitations must be considered when setting thresholds for downstream analysis and when drawing conclusions from virome data.
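To make the link between fragmentation and downstream thresholds concrete, the following is a minimal Python sketch, not the study's method. It assumes contig lengths are already available as a list of integers, and it uses N50 (a generic contiguity statistic that the abstract does not name) to show how a minimum contig length cutoff changes an assembly summary. The function names `n50` and `filter_contigs`, the 1,000 bp cutoff and the toy contig lengths are all hypothetical.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted longest first and their lengths summed; the N50 is the
    length of the contig at which this running total first reaches at least
    half of all assembled bases. Returns 0 for an empty assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0


def filter_contigs(lengths: List[int], min_length: int) -> List[int]:
    """Keep contigs at or above a hypothetical minimum length threshold."""
    return [length for length in lengths if length >= min_length]


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) from a fragmented virome assembly.
    contigs = [10000, 4000, 3000, 2000, 900, 800, 700, 600, 500, 500]
    kept = filter_contigs(contigs, min_length=1000)
    print("all contigs:", len(contigs), "N50:", n50(contigs))
    print(">= 1000 bp:", len(kept), "N50:", n50(kept))
```

With these toy lengths, discarding contigs shorter than 1,000 bp leaves 4 of 10 contigs and raises N50 from 4,000 to 10,000 bp even though no additional sequence was recovered, so a contiguity summary is only meaningful alongside the threshold used to produce it.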

Language: English
Article number: 12
Journal: Microbiome
Volume: 7
Issue number: 1
DOI: 10.1186/s40168-019-0626-5
Publication status: Published, 28 Jan 2019

Keywords

  • Assembly
  • Bacteriophage
  • Benchmark
  • Comparison
  • Metagenome
  • Phage
  • Viral
  • Virome

ASJC Scopus subject areas

  • Microbiology
  • Microbiology (medical)

Cite this

Sutton, Thomas D.S.; Clooney, Adam G.; Ryan, Feargal; Ross, R. Paul; Hill, Colin. / Choice of assembly software has a critical impact on virome characterisation. In: Microbiome. 2019; Vol. 7, No. 1.

@article{3f1b6a4a2f194042a57149774e2be9b9,
title = "Choice of assembly software has a critical impact on virome characterisation",
keywords = "Assembly, Bacteriophage, Benchmark, Comparison, Metagenome, Phage, Viral, Virome",
author = "Sutton, {Thomas D.S.} and Clooney, {Adam G.} and Feargal Ryan and Ross, {R. Paul} and Colin Hill",
year = "2019",
month = "1",
day = "28",
doi = "10.1186/s40168-019-0626-5",
language = "English",
volume = "7",
journal = "Microbiome",
issn = "2049-2618",
number = "1"
}

Choice of assembly software has a critical impact on virome characterisation. / Sutton, Thomas D.S.; Clooney, Adam G.; Ryan, Feargal; Ross, R. Paul; Hill, Colin. In: Microbiome, Vol. 7, No. 1, 12, 28.01.2019.


TY  - JOUR
T1  - Choice of assembly software has a critical impact on virome characterisation
AU  - Sutton, Thomas D.S.
AU  - Clooney, Adam G.
AU  - Ryan, Feargal
AU  - Ross, R. Paul
AU  - Hill, Colin
PY  - 2019/1/28
Y1  - 2019/1/28
KW  - Assembly
KW  - Bacteriophage
KW  - Benchmark
KW  - Comparison
KW  - Metagenome
KW  - Phage
KW  - Viral
KW  - Virome
UR  - http://www.scopus.com/inward/record.url?scp=85060649058&partnerID=8YFLogxK
U2  - 10.1186/s40168-019-0626-5
DO  - 10.1186/s40168-019-0626-5
M3  - Article
VL  - 7
JO  - Microbiome
T2  - Microbiome
JF  - Microbiome
SN  - 2049-2618
IS  - 1
M1  - 12
ER  - 