Machine learning (ML) already accelerates discoveries in many scientific fields and drives several new products. Recently, growing sample sizes have enabled the use of ML approaches in larger omics studies. This work provides a guide through a typical ML analysis of an omics dataset. As an example, this chapter demonstrates how to build a model that predicts Drug-Induced Liver Injury (DILI) from transcriptomics data contained in the LINCS L1000 dataset. Each section covers best practices and pitfalls, from data exploration through model training, including hyperparameter search, to validation and analysis of the final model. The code to reproduce the results is available at https://github.com/Evotec-Bioinformatics/ml-from-omics.
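The workflow outlined above (training, hyperparameter search, validation on held-out data) can be sketched as follows. This is a minimal illustration and not the chapter's actual code: it assumes scikit-learn, uses synthetic data in place of the LINCS L1000 expression matrix and DILI labels, and the parameter grid, split sizes, and scoring metric are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes) with binary
# labels; the real analysis would load LINCS L1000 profiles and DILI labels.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=0
)

# Hold out a test set that is never touched during hyperparameter search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is re-fitted on each training fold,
# which avoids leaking information from the validation folds.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Illustrative grid only (assumption); a real search would be tuned to the data.
param_grid = {"svm__C": [0.1, 1.0, 10.0], "svm__kernel": ["linear", "rbf"]}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="balanced_accuracy",
)
search.fit(X_train, y_train)

# Final check of the selected model on the untouched held-out set.
test_score = search.score(X_test, y_test)
print(search.best_params_, round(test_score, 3))
```

Keeping preprocessing inside the cross-validated pipeline and reporting performance only on data excluded from the search are two common ways to avoid overly optimistic estimates.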
Keywords: Artificial intelligence; DILI; Drug discovery; Drug-Induced Liver Injury; Machine learning; SVM; Support vector machine; Transcriptomics.
© 2022. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.