Predicting the origin of stains from next generation sequencing mRNA data

Forensic Sci Int Genet. 2018 May:34:37-48. doi: 10.1016/j.fsigen.2018.01.001. Epub 2018 Jan 6.

Abstract

We used our previously published NGS mRNA approach for body fluid identification to analyse 183 body fluids/tissues, including mock casework samples. The resulting data set was used to build a probabilistic model that predicts the origin of a stain. Our approach uses partial least squares followed by linear discriminant analysis to classify samples into six commonly occurring forensic body fluids. Unlike previously suggested models, it incorporates quantitative information (NGS read counts) rather than only the presence or absence of markers. The suggested approach also allows for visualisation of important markers and their correlation with the different body fluids. We compared our model to previously published methods to show that the inclusion of read count information improves the prediction. Finally, we applied the model to mixed body fluid samples to test its ability to identify the individual components in a mixture.
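The general idea of partial least squares followed by linear discriminant analysis on read counts can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the simulated Poisson read counts, the 12 hypothetical markers (two per class), the `log1p` transform, the choice of five PLS components, and the helper names `pls_fit`, `lda_fit` and `lda_posterior`. The six classes are numbered 0 to 5 rather than named, since the abstract does not list them.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """PLS2 with deflation; returns the column means and a rotation R
    such that the scores are T = (X - x_mean) @ R."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - Y.mean(axis=0)
    W, P = [], []
    for _component in range(n_components):
        # Weight vector: dominant left singular vector of X'Y
        u, _s, _vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xc @ w                      # scores for this component
        p = Xc.T @ t / (t @ t)          # X loadings
        q = Yc.T @ t / (t @ t)          # Y loadings
        Xc = Xc - np.outer(t, p)        # deflate X
        Yc = Yc - np.outer(t, q)        # deflate Y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return x_mean, W @ np.linalg.inv(P.T @ W)

def lda_fit(T, y, n_classes):
    """Class means, inverse pooled within-class covariance, and priors."""
    means = np.array([T[y == k].mean(axis=0) for k in range(n_classes)])
    resid = T - means[y]
    cov = resid.T @ resid / (len(T) - n_classes)
    priors = np.bincount(y, minlength=n_classes) / len(y)
    return means, np.linalg.inv(cov), priors

def lda_posterior(T, means, cov_inv, priors):
    """Posterior class probabilities under a shared-covariance Gaussian model."""
    diff = T[:, None, :] - means[None, :, :]
    maha = np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)
    log_post = -0.5 * maha + np.log(priors)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Simulated data (assumption): 6 classes, 20 samples each, 12 markers,
# with two markers highly expressed per class.
rng = np.random.default_rng(0)
n_classes, per_class, n_markers = 6, 20, 12
y = np.repeat(np.arange(n_classes), per_class)
rates = np.full((n_classes, n_markers), 5.0)
for k in range(n_classes):
    rates[k, 2 * k:2 * k + 2] = 200.0
counts = rng.poisson(rates[y])          # quantitative read counts
X = np.log1p(counts)                    # variance-stabilising transform (assumption)
Y = np.eye(n_classes)[y]                # one-hot class indicator matrix

x_mean, R = pls_fit(X, Y, n_components=5)
T = (X - x_mean) @ R                    # PLS scores
means, cov_inv, priors = lda_fit(T, y, n_classes)
posterior = lda_posterior(T, means, cov_inv, priors)
accuracy = (posterior.argmax(axis=1) == y).mean()
print("training accuracy:", accuracy)
```

Each row of `posterior` gives the six class probabilities for one sample, which is the sense in which such a model is probabilistic. The accuracy printed here is measured on the training data, so it only checks that the sketch runs and is not evidence of predictive performance.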

Keywords: Body fluid identification; Forensic science; Linear discriminant analysis (LDA); Massive parallel sequencing (MPS); Partial least squares (PLS); Prediction model; mRNA.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Blood Chemical Analysis
  • Cervix Mucus / chemistry
  • Discriminant Analysis
  • Female
  • Forensic Genetics / methods
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Least-Squares Analysis
  • Male
  • Menstruation
  • Models, Statistical
  • Probability
  • RNA, Messenger / genetics*
  • Saliva / chemistry
  • Semen / chemistry
  • Sequence Analysis, RNA*
  • Skin / chemistry

Substances

  • RNA, Messenger