Advances in next-generation sequencing (NGS) have significantly reduced the cost and improved the efficiency of obtaining single nucleotide polymorphism (SNP) markers, particularly through restriction site-associated DNA sequencing (RAD-seq). Meanwhile, progress in whole-genome sequencing has made a growing number of reference genomes available for SNP calling. This study used RAD-seq data from 242 individuals of Engelhardia roxburghiana, a tropical tree of the walnut family (Juglandaceae), with SNP calling conducted using the STACKS pipeline. We compared two reference-based approaches, namely using the genome of a closely related species as the reference versus using the genome of the species itself, to evaluate their respective merits and limitations. The number of SNPs obtained differed substantially between the two approaches: the closely related reference yielded approximately an order of magnitude fewer SNPs than the conspecific reference. In contrast, the missing rates of individuals and of sites in the final SNP sets showed no significant difference between the two scenarios. These results suggest that the reference genome of the species itself should be prioritized in RAD-seq studies. When no such genome is available, the genome of a closely related species is a feasible alternative, given the wide applicability of this approach and the low missing rate it yields. This study enriches the understanding of how the choice of reference genome affects SNP acquisition.
Keywords: RAD-seq; Reference-based approach; SNP calling; STACKS.
Copyright © 2024 Elsevier B.V. All rights reserved.