The Taiwan Biobank (TWB) aims to build a nationwide research database that integrates genomic/epigenomic profiles, lifestyle patterns, dietary habits, environmental exposure history and long-term health outcomes of 300,000 residents of Taiwan. We describe here an investigation of the population structure of Han Chinese on this Pacific island using genotype data of 591,048 SNPs in an initial freeze of 10,801 unrelated TWB participants. In addition to the North-South cline reported in other Han Chinese populations, we find that the Taiwanese Han Chinese cluster into three cline groups: 5% were of northern Han Chinese ancestry, 79.9% were of southern Han Chinese ancestry, and 14.5% belonged to a third (T) group. We also find that this T group is genetically distinct from neighbouring Southeast Asians and Austronesian tribes but similar to other southern Han Chinese. Interestingly, a high degree of linkage disequilibrium (LD) between the HLA haplotype A*33:03-B*58:01, an MHC haplotype of pathological relevance, and SNPs across the MHC region was observed in subjects of T origin, but not in other Han Chinese. This suggests that the T group individuals may have experienced evolutionary events independent of the other southern Han Chinese. Based on the newly discovered population structure, we detect different susceptibility loci for type 2 diabetes in individuals with southern and northern Han Chinese ancestries. Finally, as one of the largest datasets currently available for the Chinese population, genome-wide statistics for the 10,810 subjects are made publicly accessible through Taiwan View (https://taiwanview.twbiobank.org.tw/index; date last accessed October 14, 2016) to encourage future genetic research and collaborations with the island of Taiwan.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.