In this paper, we considered one of the problems that arise in drilling automation: automated lithology identification from images of drill cuttings. This work is usually performed by experienced geologists, and it is a tedious and subjective process. Drill cuttings are the cheapest source of rock formation samples; therefore, reliable lithology prediction from them can greatly reduce the cost of analysis during drilling. To predict lithology from images of cuttings samples, we used a convolutional neural network (CNN). To train a model with acceptable generalization ability, we applied dataset-cleaning techniques that help reveal bad samples as well as samples with uncertain labels. We showed that a model trained on the cleaned dataset achieves higher accuracy than one trained on the uncleaned dataset. Data cleaning was performed using cross-validation and a cluster analysis of embeddings. The cluster analysis makes it possible to identify both clusters with distinctive visual characteristics and clusters in which visually similar rock samples were assigned different lithologies during labeling.
Keywords: drill cuttings; lithology prediction; machine learning; noisy labels.
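As an illustration of the cross-validation step of data cleaning, the following minimal sketch flags samples whose held-out prediction disagrees with their assigned label. It is not the pipeline used in this work: a nearest-centroid classifier on toy two-dimensional vectors stands in for the CNN and its embeddings, and the function names, the lithology labels, and the data are all hypothetical.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Compute the mean feature vector of every class (stand-in for CNN training)."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], x))


def flag_suspect_labels(X, y, n_folds=5, seed=0):
    """Return sorted indices whose out-of-fold prediction differs from the given label."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into n_folds disjoint held-out sets.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    suspects = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            if predict(centroids, X[i]) != y[i]:
                suspects.append(i)
    return sorted(suspects)


# Hypothetical toy "embeddings": two well-separated groups of ten samples each.
features = [(0.1 * i, -0.1 * i) for i in range(10)]
features += [(10 + 0.1 * i, 10 - 0.1 * i) for i in range(10)]
labels = ["sandstone"] * 10 + ["shale"] * 10
labels[3] = "shale"  # deliberately mislabeled sample

suspects = flag_suspect_labels(features, labels)
print(suspects)  # [3]
```

Flagged samples are candidates for manual review rather than automatic removal, since a disagreement may reflect either a labeling error or a genuinely ambiguous sample.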