

Benchmarking of 16S rRNA gene databases using known strain sequences


Kunal Dixit1, Dimple Davray1, Diptaraj Chaudhari2, Pratik Kadam2, Rudresh Kshirsagar2, Yogesh Shouche2, Dhiraj Dhotre3,*, Sunil D. Saroj1,*



1Symbiosis School of Biological Sciences (SSBS), Symbiosis International (Deemed University), Pune, India; 2National Center for Microbial Resource (NCMR), National Center for Cell Science (NCCS), Pune, India; 3Reliance Life Sciences Pvt Ltd, Rabale, Mumbai, India; *Corresponding author



Dhiraj Dhotre - Dhiraj.Dhotre@relbio.com, Sunil D. Saroj - sunil.saroj@ssbs.edu.in


Article Type

Research Article



Received January 27, 2021; Revised March 10, 2021; Accepted March 10, 2021; Published March 31, 2021



16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains can have deleterious effects, because all downstream analyses rely heavily on accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of results has been suggested. However, the mock communities used in most studies represent only a small fraction of taxa and serve mainly to validate the sequencing run by estimating sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of 16S rRNA gene sequences makes it challenging to select the method best suited to a particular dataset. In the present study, we used authentic, validly published 16S rRNA gene sequences of type strains (full length and V3-V4 region) and analyzed them with the widely used QIIME pipeline, varying the OTU clustering parameters and the QIIME-compatible reference databases. The resulting Data Analysis Measures (DAMs) showed high discrepancy in recovering the expected taxonomy at different taxonomic levels. Beta diversity analysis showed clear segregation of the different DAMs. Only limited differences were observed between analyses of the reference dataset using partial (V3-V4) and full-length 16S rRNA gene sequences, which supports the reliability of partial 16S rRNA gene sequences in microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels across methods and databases.
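The rank-wise discrepancy described above can be illustrated with a minimal sketch. Because the input sequences come from type strains, their expected lineage is known, and each DAM's assignment can be scored rank by rank. The sketch assumes QIIME-style semicolon-delimited lineage strings with rank prefixes such as `k__` or `g__`, where an empty label (e.g. `s__`) means "unassigned". The function names `parse_lineage` and `rank_agreement`, and the two toy lineages, are hypothetical illustrations, not the authors' code or data.

```python
from __future__ import annotations

# Hypothetical sketch: per-rank agreement between the expected (type strain)
# taxonomy and the taxonomy assigned by one data analysis measure (DAM).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage: str) -> list[str]:
    """Split a string like 'k__Bacteria; p__Firmicutes; ...' into names.

    The rank prefix (anything up to '__') is dropped; empty labels such as
    's__' become '' so they count as unassigned. Short lineages are padded.
    """
    parts = [p.strip() for p in lineage.split(";") if p.strip()]
    names = [p.split("__", 1)[1] if "__" in p else p for p in parts]
    return (names + [""] * len(RANKS))[: len(RANKS)]


def rank_agreement(expected: dict[str, str], assigned: dict[str, str]) -> dict[str, float]:
    """Fraction of sequences whose assigned name equals the expected one, per rank."""
    hits = {rank: 0 for rank in RANKS}
    for seq_id, exp_lineage in expected.items():
        exp_names = parse_lineage(exp_lineage)
        # A sequence missing from the assignment is treated as fully unassigned.
        got_names = parse_lineage(assigned.get(seq_id, ""))
        for rank, exp, got in zip(RANKS, exp_names, got_names):
            if exp and exp == got:
                hits[rank] += 1
    return {rank: hits[rank] / len(expected) for rank in RANKS}


# Toy example (invented lineages, for illustration only).
expected = {
    "s1": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
          "f__Lactobacillaceae; g__Lactobacillus; s__plantarum",
    "s2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
          "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; s__coli",
}
assigned = {
    "s1": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
          "f__Lactobacillaceae; g__Lactobacillus; s__",
    "s2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
          "o__Enterobacteriales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__",
}
print(rank_agreement(expected, assigned))
# {'kingdom': 1.0, 'phylum': 1.0, 'class': 1.0, 'order': 0.5, 'family': 1.0, 'genus': 0.5, 'species': 0.0}
```

Running the same scoring for every DAM against one expected table gives a matrix of per-rank agreement values, which makes it easy to see at which taxonomic level each database or clustering setting starts to diverge. Exact string matching is a deliberate simplification: synonyms or renamed taxa (as in the toy order names) are counted as discrepancies.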



16S rRNA gene; Genomic Databases; Taxonomic Discrepancy; QIIME.



Dixit et al. Bioinformation 17(3): 377-391 (2021)


Edited by

P Kangueane






Biomedical Informatics



This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.