The presence of occlusion in human activity recognition (HAR) tasks hinders the performance of recognition algorithms, as it is responsible for the loss of crucial motion data. Although it is intuitive that it may occur in almost any real-life environment, it is often underestimated in most research works, which tend to rely on datasets that have been collected under ideal conditions, i.e., without any occlusion. In this work, we present an approach that aimed to deal with occlusion in an HAR task. We relied on previous work on HAR and artificially created occluded data samples, assuming that occlusion may prevent the recognition of one or two body parts. The HAR approach we used is based on a Convolutional Neural Network (CNN) that has been trained using 2D representations of 3D skeletal motion. We considered cases in which the network was trained with and without occluded samples and evaluated our approach in single-view, cross-view, and cross-subject cases and using two large scale human motion datasets. Our experimental results indicate that the proposed training strategy is able to provide a significant boost of performance in the presence of occlusion.
Keywords: convolutional neural networks; deep learning; human activity recognition; occlusion.