Recently, soft attention mechanisms have been used successfully in a wide variety of applications, such as image caption generation and text translation. These mechanisms attempt to mimic the visual cortex of the human brain, which does not analyze every object in a scene equally but instead looks for clues (salient features) that give a more compact representation of the environment. This allows the brain to process information more quickly and without being overloaded. Building on this observation, in this paper we try to carry the idea over from visual tasks to audio scene classification, specifically the classification of heart sound signals. To do so, we propose a novel approach that combines soft attention mechanisms with recurrent neural networks. With the proposed methodology, the algorithm automatically learns which audio segments are significant when detecting and classifying abnormal heart sound signals. This both improves the classification results and provides a simple justification for them.
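The general idea of soft attention over a recurrent network can be illustrated with a short sketch. This is a generic attention-pooling layer, not the authors' architecture. It assumes the recurrent network has already produced one hidden state per audio segment. A learned scoring vector `w` and bias `b` (hypothetical parameters here) rate each segment. A softmax turns the scores into weights that sum to 1, and the weighted sum of hidden states becomes the summary used for classification. The weights themselves show which segments the model relied on, which is where the "justification" comes from. The logistic output head (`abnormal_probability`, with hypothetical parameters `v`, `c`) is likewise an assumption for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_pool(
    hidden_states: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Collapse T recurrent hidden states (each of size D) into one context vector.

    w, b: hypothetical learned scoring parameters; score_t = tanh(w . h_t + b).
    Returns (context, weights). The weights sum to 1 and indicate how much
    each time step (audio segment) contributed to the context vector.
    """
    scores = [
        math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of hidden states, computed one dimension at a time.
    context = [
        sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


def abnormal_probability(context: Sequence[float], v: Sequence[float], c: float) -> float:
    """Hypothetical logistic head: probability that the recording is abnormal."""
    z = sum(vi * xi for vi, xi in zip(v, context)) + c
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with hidden states `[[0.1, 0.0], [0.9, 0.8], [0.0, 0.1]]` and `w = [2.0, 2.0]`, the second segment aligns best with `w` and receives the largest weight, while the first and third receive equal, smaller weights. In a trained model, inspecting these weights over a heart sound recording would highlight the segments that drove the decision.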