Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing
Authors:
Andrei Prodan aff001; Valentina Tremaroli aff002; Harald Brolin aff002; Aeilko H. Zwinderman aff003; Max Nieuwdorp aff001; Evgeni Levin aff001
Author affiliations:
aff001: Department of Experimental Vascular Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
aff002: Wallenberg Laboratory for Cardiovascular and Metabolic Research, Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
aff003: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centers, Amsterdam, The Netherlands
aff004: Horaizon BV, Delft, The Netherlands
Published in:
PLoS ONE 15(1)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0227434
Summary
Microbial amplicon sequencing studies are an important tool in biological and biomedical research. Widespread 16S rRNA gene microbial surveys have shed light on the structure of many ecosystems inhabited by bacteria, including the human body. However, specialized software and algorithms are needed to convert raw sequencing data into biologically meaningful information (i.e., tables of bacterial counts). While different bioinformatic pipelines are available in a rapidly changing and improving field, users are often unaware of the limitations and biases associated with individual pipelines, and there is a lack of agreement regarding best practices. Here, we compared six bioinformatic pipelines for the analysis of amplicon sequence data: three OTU-level workflows (QIIME-uclust, MOTHUR, and USEARCH-UPARSE) and three ASV-level workflows (DADA2, Qiime2-Deblur, and USEARCH-UNOISE3). We tested workflows with different quality control options, clustering algorithms, and cutoff parameters on a mock community as well as on a large (N = 2170), recently published fecal sample dataset from the multi-ethnic HELIUS study. We assessed the sensitivity, specificity, and degree of consensus of the different outputs. DADA2 offered the best sensitivity, at the expense of decreased specificity compared to USEARCH-UNOISE3 and Qiime2-Deblur. USEARCH-UNOISE3 showed the best balance between resolution and specificity. OTU-level USEARCH-UPARSE and MOTHUR performed well, but with lower specificity than the ASV-level pipelines. QIIME-uclust produced a large number of spurious OTUs as well as inflated alpha-diversity measures and should be avoided in future studies. This study provides guidance for researchers using amplicon sequencing to gain biological insights.
Keywords:
Bacteria – Bioinformatics – Clustering algorithms – DNA sequencing – Quality control – Ribosomal RNA – Sequence alignment – Sequence databases
Article published in the journal
PLOS One
2020, Issue 1