Detecting anomalies in large networks is a major challenge. Nowadays, many studies rely on machine learning techniques to solve this problem. However, much of this research depends on synthetic or limited datasets and tends to use specialized machine learning methods to achieve good detection results. This study focuses on analyzing firewall logs from a large industrial control network and presents a novel method for generating anomalies that simulate real attacker actions within the network without the need for a dedicated testbed or installed security controls. To demonstrate that the proposed method is feasible and that the constructed logs behave as one would expect real-world logs to behave, different supervised and unsupervised learning models were compared using different feature subsets, feature construction methods, scaling methods, and aggregation levels. The experimental results show that unsupervised learning methods have difficulty in detecting the injected anomalies, suggesting that they can be seamlessly integrated into existing firewall logs. Conversely, the use of supervised learning methods showed significantly better performance compared to unsupervised approaches and a better suitability for use in real systems.
Keywords: anomaly detection; artificially generated attacks; cybersecurity; datasets; firewall logs; machine learning; security logs.