Whole-cell cryo-electron tomography is emerging as an important component of structural systems biology approaches. It allows macromolecular complexes to be localized and structurally characterized in near-living conditions. However, the method is hampered by low resolution, missing data, and a low signal-to-noise ratio (SNR). To overcome some of these difficulties, one can align and average a large set of subtomograms. Existing alignment methods are mostly based on exhaustive scanning over a discrete sampling of the relative rotations and translations of one subtomogram with respect to the other. In this paper, we propose a gradient-guided alignment method based on two subtomogram similarity measures. We also propose a stochastic parallel optimization that significantly increases the efficiency of simultaneously refining a set of alignment candidates. Results on simulated data of model complexes and on experimental structures of protein complexes show that, even for highly distorted subtomograms and with only a small number of very sparsely distributed initial alignment seeds, our method recovers the true transformations with significantly higher precision than scanning-based alignment methods.
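To make the contrast with exhaustive scanning concrete, the following is a minimal illustrative sketch of refining a few sparse alignment seeds by gradient ascent on a similarity score. It is not the method of this paper: it is a 2D toy in which each "subtomogram" is a set of Gaussian blobs, the similarity is the closed-form overlap of the two blob densities, the gradient is taken by finite differences, and each seed is refined independently rather than with the stochastic parallel optimization. All names (`transform`, `similarity`, `refine`, `align`) and all parameter values are assumptions introduced for illustration.

```python
import numpy as np

def transform(points, params):
    """Apply a 2D rigid transform (angle theta, shifts tx, ty) to Nx2 points."""
    theta, tx, ty = params
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T + np.array([tx, ty])

def similarity(params, moving, fixed, sigma=1.0):
    """Overlap of two Gaussian-blob densities, up to a constant factor."""
    diff = transform(moving, params)[:, None, :] - fixed[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return float(np.exp(-sq_dist / (4.0 * sigma ** 2)).sum())

def numeric_gradient(f, params, eps=1e-5):
    """Central finite-difference gradient of f at params."""
    grad = np.zeros_like(params)
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps
        grad[k] = (f(params + step) - f(params - step)) / (2.0 * eps)
    return grad

def refine(seed, moving, fixed, iters=200):
    """Gradient ascent with backtracking; only improving steps are accepted."""
    f = lambda p: similarity(p, moving, fixed)
    params = np.asarray(seed, dtype=float)
    score = f(params)
    for _ in range(iters):
        grad = numeric_gradient(f, params)
        step = 1.0
        while step > 1e-8:
            trial = params + step * grad
            trial_score = f(trial)
            if trial_score > score:
                params, score = trial, trial_score
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as a local optimum
    return params, score

def align(moving, fixed, seeds):
    """Refine every seed and keep the candidate with the highest similarity."""
    results = [refine(seed, moving, fixed) for seed in seeds]
    return max(results, key=lambda r: r[1])
```

A usage example under the same assumptions: build `fixed = transform(moving, true_params)` from a small point set, pass a handful of coarse seeds such as `[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (-1.5, 0.0, 0.0)]` to `align`, and compare the returned parameters with `true_params`. Because only improving steps are accepted, each refined candidate scores at least as well as its seed, but a seed that starts far from the true transformation may still converge to a local optimum, which is why several seeds are refined.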