Title

Fast-HBR: Fast hash based duplicate read remover

Authors

Sami Altayyar* & Abdel Monim Artoli

 

Affiliation

Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia; *Corresponding author

 

Email

E-mail: 436107303@student.ksu.edu.sa, aartoli@ksu.edu.sa

 

Article Type

Research Article

 

Date

Received November 13, 2021; Revised November 29, 2021; Accepted November 29, 2021; Published January 31, 2022

 

Abstract

Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of downstream analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables and represents each read as an integer value, which lowers memory usage and speeds up manipulation. Compared with state-of-the-art de novo duplicate-removal tools, Fast-HBR is faster and has a smaller memory footprint. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
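The general idea of hash-based, reference-free duplicate removal can be illustrated with a minimal Python sketch. This is not the Fast-HBR implementation. The function names `read_key` and `remove_duplicate_reads`, the choice of a 64-bit BLAKE2b digest, and the case-insensitive comparison are all assumptions made for illustration. The sketch keeps only an integer key per read in memory and retains the first occurrence of each exact sequence.

```python
import hashlib


def read_key(sequence: str) -> int:
    """Map a read sequence to a 64-bit integer (hypothetical helper).

    Assumption: reads are compared case-insensitively and only exact
    sequence matches count as duplicates.
    """
    digest = hashlib.blake2b(sequence.upper().encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def remove_duplicate_reads(reads):
    """Return the (header, sequence) pairs whose sequence has not been seen before.

    Only the integer keys are stored in the lookup set, not the sequences.
    Two different reads could in principle collide on the same key; a
    production tool would need to decide how to handle that case.
    """
    seen = set()
    unique = []
    for header, sequence in reads:
        key = read_key(sequence)
        if key not in seen:
            seen.add(key)
            unique.append((header, sequence))
    return unique


# Illustrative input: r3 and r4 repeat the sequence of r1.
reads = [("r1", "ACGT"), ("r2", "TTGA"), ("r3", "ACGT"), ("r4", "acgt")]
unique_reads = remove_duplicate_reads(reads)
```

Storing a fixed-size integer rather than the full read string is what keeps the lookup table small as read length grows; the trade-off is a small, nonzero chance of treating two distinct reads as duplicates when their keys collide.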

 

Keywords

Fast-HBR, duplicate read remover, Fast hash

 

Citation

Altayyar & Monim Artoli, Bioinformation 18(1): 36-40 (2022)

 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.