A fact based analysis of decision trees for improving reliability in cloud computing

Muhammad Asim Shahid; Muhammad Mansoor Alam; Mazliham Mohd Su'ud

doi:10.1371/journal.pone.0311089

PLoS One. 2024 Dec 3;19(12):e0311089. doi: 10.1371/journal.pone.0311089. eCollection 2024.

Authors

Muhammad Asim Shahid¹ ², Muhammad Mansoor Alam¹ ³ ⁴ ⁵, Mazliham Mohd Su'ud⁵

Affiliations

¹ Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur, Malaysia.
² School of Computing and Information Sciences, Sohail University, Karachi, Pakistan.
³ Faculty of Computing, Riphah International University, Islamabad, Pakistan.
⁴ School of Computer Science, University of Technology Sydney, Ultimo, NSW, Australia.
⁵ Persiaran Multimedia, Multimedia University, Cyberjaya, Malaysia.

Abstract

The popularity of cloud computing (CC) has increased significantly in recent years due to its cost-effectiveness and simplified resource allocation. Owing to this rapid growth over the past decade, many corporations and businesses have moved to the cloud to ensure accessibility, scalability, and transparency. This research compares the accuracy and fault prediction of five machine learning algorithms: AdaBoostM1, Bagging, Decision Tree (J48), Deep Learning (Dl4jMLP), and Naive Bayes Tree (NB Tree). The results from the secondary data analysis indicate that, for the Central Processing Unit (CPU)-Mem Multi classifier, the Decision Tree (J48) classifier has the highest accuracy and the fewest fault prediction errors, with an accuracy rate of 89.71% for the 80/20 split, 90.28% for the 70/30 split, and 92.82% for 10-fold cross-validation. Additionally, the Hard Disk Drive (HDD)-Mono classifier has an accuracy rate of 90.35% for 80/20, 92.35% for 70/30, and 90.49% for 10-fold cross-validation. The AdaBoostM1 classifier has the highest accuracy and the fewest fault prediction errors for the HDD Multi classifier, with an accuracy rate of 93.63% for 80/20, 90.09% for 70/30, and 88.92% for 10-fold cross-validation. Finally, the CPU-Mem Mono classifier has an accuracy rate of 77.87% for 80/20, 77.01% for 70/30, and 77.06% for 10-fold cross-validation. For the primary data, the Naive Bayes Tree (NB Tree) classifier has the highest accuracy and the fewest fault prediction errors, with 97.05% for 80/20, 96.09% for 70/30, and 96.78% for 10-fold cross-validation. However, its computational cost is comparatively high, at 1.01 seconds. The Decision Tree (J48) has the second-highest accuracy, with 96.78%, 95.95%, and 96.78% for 80/20, 70/30, and 10-fold cross-validation, respectively. J48 also produces few fault prediction errors, at a much lower computational cost of 0.11 seconds.
The difference in accuracy and fault prediction between NB Tree and J48 is only 0.9%, whereas the difference in run time is about 0.9 seconds (1.01 seconds versus 0.11 seconds). Based on these results, we propose a modified Decision Tree (J48) algorithm. The proposed method offers the highest accuracy and the fewest fault prediction errors, with 97.05% accuracy for the 80/20 split, 96.42% for the 70/30 split, and 97.07% for 10-fold cross-validation.
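
The three evaluation settings named in the abstract (80/20 split, 70/30 split, and 10-fold cross-validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the helper names (`holdout_split`, `k_fold_splits`, `fit_stump`) are hypothetical, and a one-threshold decision stump stands in for the classifiers studied in the paper. It does not reproduce the authors' J48 implementation, data, or results.

```python
import random

def holdout_split(n_samples, train_fraction, seed=0):
    """Return (train_idx, test_idx) for a shuffled holdout split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def fit_stump(xs, ys):
    """Pick the threshold t that maximizes training accuracy of 'x > t -> 1'."""
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def evaluate(xs, ys, train_idx, test_idx):
    """Fit on the training indices and report accuracy on the test indices."""
    t = fit_stump([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    preds = [int(xs[i] > t) for i in test_idx]
    return accuracy([ys[i] for i in test_idx], preds)

# Synthetic stand-in data (assumption): one load-like feature in [0, 1),
# labeled 1 ("fault") above 0.8, with roughly 5% of labels flipped as noise.
rng = random.Random(42)
xs = [rng.random() for _ in range(200)]
ys = [int(x > 0.8) if rng.random() > 0.05 else int(x <= 0.8) for x in xs]

results = {
    "80/20": evaluate(xs, ys, *holdout_split(len(xs), 0.8)),
    "70/30": evaluate(xs, ys, *holdout_split(len(xs), 0.7)),
}
fold_scores = [evaluate(xs, ys, tr, te) for tr, te in k_fold_splits(len(xs), 10)]
results["10-fold CV"] = sum(fold_scores) / len(fold_scores)

for name, acc in results.items():
    print(f"{name}: {acc:.2f}%")
```

The holdout splits each produce a single accuracy estimate from one train/test partition, while 10-fold cross-validation averages ten estimates so that every sample is used for testing exactly once. This is one reason the three figures reported for a given classifier can differ.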

Copyright: © 2024 Asim Shahid et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms*
Bayes Theorem*
Cloud Computing*
Decision Trees*
Deep Learning
Machine Learning*
Reproducibility of Results

Grants and funding

The author(s) received no specific funding for this work.