The rise of antibiotic-resistant Mycobacterium tuberculosis (Mtb) has reduced the number of medications available for tuberculosis therapy, increasing morbidity and mortality globally. Tuberculosis can spread from the lungs to other parts of the body, including the brain and spine. Developing a single drug can take decades, which makes drug discovery costly and time-consuming. Machine learning algorithms such as support vector machines (SVM), k-nearest neighbors (k-NN), random forest (RF) and Gaussian naive Bayes (GNB) are fast and effective, and are commonly used in drug discovery. These algorithms are well suited to the virtual screening of large compound libraries, where molecules are classified as active or inactive. To train the models, a dataset of 307 compounds tested against thymidylate kinase was downloaded from BindingDB. Of these 307 compounds, 85 were labeled active (IC50 below 58 mM) and 222 were labeled inactive. Model training yielded a classification accuracy of 87.2%. The trained models were then used to screen an external ZINC dataset of 136,564 compounds. Furthermore, we performed 100-ns molecular dynamics simulations and post-trajectory analysis of the compounds that showed good interactions and scores in molecular docking. Compared with the standard reference compound, the top three hits showed greater stability and compactness. In conclusion, our predicted hits may inhibit thymidylate kinase overexpression to combat Mycobacterium tuberculosis.
Communicated by Ramaswamy H. Sarma.
Keywords: MD simulation; Machine learning; docking; thymidylate kinase; virtual screening.
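The labeling and classification step described in the abstract can be illustrated with a minimal sketch. It assumes compounds are represented as sets of fingerprint on-bits and compared by Tanimoto similarity; the abstract does not state which descriptors or similarity measure were used, so this representation, the toy data, and the helper names (`label_compound`, `tanimoto`, `knn_predict`) are hypothetical. Only the activity cutoff of 58 and the use of k-NN as one of the classifiers come from the abstract.

```python
from collections import Counter

# Activity cutoff taken from the abstract, in the units reported there.
IC50_THRESHOLD = 58.0


def label_compound(ic50, threshold=IC50_THRESHOLD):
    """Return 'active' if the IC50 is below the cutoff, else 'inactive'."""
    return "active" if ic50 < threshold else "inactive"


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def knn_predict(query_fp, training_set, k=3):
    """Majority vote among the k training compounds most similar to the query."""
    ranked = sorted(
        training_set,
        key=lambda item: tanimoto(query_fp, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumption): (fingerprint on-bits, IC50) pairs invented for illustration.
toy_compounds = [
    ({1, 2, 3, 4}, 5.0),
    ({1, 2, 3, 5}, 12.0),
    ({2, 3, 4, 6}, 40.0),
    ({7, 8, 9, 10}, 300.0),
    ({7, 8, 9, 11}, 950.0),
    ({8, 9, 10, 12}, 120.0),
]

# Label each training compound by the IC50 cutoff, as in the abstract.
training_set = [(fp, label_compound(ic50)) for fp, ic50 in toy_compounds]

# Classify two unseen toy fingerprints, as a screening step would.
prediction_a = knn_predict({1, 2, 3, 6}, training_set)
prediction_b = knn_predict({7, 8, 10, 12}, training_set)
print(prediction_a, prediction_b)
```

In a real screening workflow the same pattern would be applied to fingerprints computed from the BindingDB training compounds and then to each ZINC library compound, with the other classifiers (SVM, RF, GNB) trained on the same labels for comparison.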