This paper studies the use of deep convolutional neural networks to segment heart sounds into their main components. The proposed methods adopt a deep convolutional neural network architecture inspired by similar approaches used for image segmentation. Different temporal modeling schemes are applied to the output of this network to constrain the output state sequence to follow the natural order of states within a heart sound signal (S1, systole, S2, diastole). In particular, the convolutional neural networks are used to infer the emission distributions of underlying hidden Markov models and hidden semi-Markov models. The proposed approaches are tested on heart sound signals from the publicly available PhysioNet dataset, where they outperform current state-of-the-art segmentation methods, achieving an average sensitivity of 93.9% and an average positive predictive value of 94% in detecting S1 and S2 sounds.
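As a minimal sketch of the temporal modeling idea (not the paper's implementation), the following Python decodes a cyclic four-state hidden Markov model from per-frame network outputs with the Viterbi algorithm. The function name `viterbi_cyclic`, the self-transition probability `p_stay`, the uniform state priors, the toy `FRAMES` values, and the conversion of network posteriors into emission scores by dividing by the state priors (the usual hybrid neural-network/HMM "scaled likelihood" approximation) are all assumptions for illustration. A hidden semi-Markov model would replace the geometric state durations implied by `p_stay` with explicit duration distributions.

```python
import math
from typing import List, Sequence

# Cyclic state order of a heart sound signal.
STATES = ["S1", "systole", "S2", "diastole"]


def viterbi_cyclic(posteriors: Sequence[Sequence[float]],
                   p_stay: float = 0.5,
                   priors: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> List[int]:
    """Most likely state path under a cyclic left-to-right HMM (sketch).

    posteriors[t][k] is a hypothetical network output P(state k | frame t).
    Assumption: emission scores use the hybrid scaled-likelihood trick,
    log p(x_t | k) ~ log P(k | x_t) - log P(k).
    Each state may only stay in place or advance to the next state in the cycle.
    """
    if not posteriors:
        raise ValueError("need at least one frame")
    n = len(STATES)
    eps = 1e-12  # avoid log(0)
    log_stay = math.log(p_stay)
    log_move = math.log(1.0 - p_stay)

    def emit(t: int, k: int) -> float:
        return math.log(posteriors[t][k] + eps) - math.log(priors[k])

    # Uniform initial state distribution.
    delta = [math.log(1.0 / n) + emit(0, k) for k in range(n)]
    backptr: List[List[int]] = []
    for t in range(1, len(posteriors)):
        new_delta, ptr = [], []
        for k in range(n):
            prev = (k - 1) % n  # the only allowed predecessor besides k itself
            stay = delta[k] + log_stay
            move = delta[prev] + log_move
            best, arg = (stay, k) if stay >= move else (move, prev)
            new_delta.append(best + emit(t, k))
            ptr.append(arg)
        delta = new_delta
        backptr.append(ptr)

    # Backtrack from the best final state.
    state = max(range(n), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy posteriors (hypothetical). The third frame looks most like S2,
# although S2 cannot directly follow S1.
FRAMES = [
    [0.90, 0.05, 0.03, 0.02],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.20, 0.60, 0.10],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.10, 0.80, 0.05],
    [0.05, 0.05, 0.10, 0.80],
]

if __name__ == "__main__":
    path = viterbi_cyclic(FRAMES, p_stay=0.5)
    print([STATES[k] for k in path])
```

With `p_stay=0.5` staying and advancing cost the same, so the only thing the decoder enforces is the S1, systole, S2, diastole order: the frame-wise argmax would label the third frame S2, while the decoded path relabels it as systole and yields S1, S1, systole, systole, S2, diastole.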