Analyzing the performance of de novo metagenome assemblers for long-read HiFi data
Open Access
Author:
Murali, Mayank
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
March 14, 2022
Committee Members:
Chitaranjan Das, Program Head/Chair
Mingfu Shao, Thesis Advisor/Co-Advisor
David Koslicki, Committee Member
Keywords:
Metagenome assembly long-reads
Abstract:
Advances in DNA sequencing technology have produced whole-genome sequences for a growing number of species. Short reads are cheaper than long reads and have a comparatively lower error rate. Genome assembly tools that use short-read data have successfully assembled metagenomic data, but a large share of the resulting assemblies are contaminated and incomplete. The introduction of HiFi long reads by PacBio has enabled researchers to develop new metagenome assembly algorithms that exploit the full potential of long-read technology, including the ability to capture full-length 16S rRNA genes. In this thesis, we develop a pipeline to benchmark and analyze state-of-the-art de novo metagenome assemblers on both synthetic and real datasets consisting of gut microbiome metagenomic data from various species. We evaluate the steps in the pipeline using tools such as metaQUAST, BUSCO, and CheckM. The benchmark uncovers differences in the assemblers' performance and methods, as well as the difficulties of running assessments on the results.