Computational techniques for accurate and efficient prediction of protein-protein complex structures are widely used to elucidate protein-protein interactions, which play important roles in biological systems. It has recently been reported that a structure similar to the native structure can be selected from among generated candidate structures (decoys) by calculating the binding free energies of the decoys using all-atom molecular dynamics (MD) simulations with explicit solvent and the solution theory in the energy representation; this method is called evERdock. A recent version of evERdock achieves more accurate decoy selection by introducing MD relaxation and multiple MD simulations/energy calculations, but at a huge computational cost. In this paper, we propose an efficient decoy selection method that combines evERdock with the best arm identification (BAI) framework, a technique from reinforcement learning. The BAI framework makes selection efficient by suppressing calculations for nonpromising decoys and preferentially allocating calculations to promising ones. We evaluate the performance of the proposed method on decoy selection problems for three protein-protein complex systems. The results show that the computational cost is reduced by a factor of up to 4.05 compared with a standard decoy selection approach, without sacrificing accuracy.
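The idea of suppressing calculations for nonpromising decoys can be illustrated with a generic BAI routine. The following is a minimal sketch of successive elimination, one textbook BAI strategy; it is not necessarily the algorithm used in this work. It assumes that each "arm" is a decoy, that one "pull" stands in for one MD simulation plus energy calculation, that a lower score is better, and that scores are noisy with a known scale `sigma`. The function name, the confidence-radius constant, and the synthetic Gaussian scores are all illustrative assumptions, not evERdock outputs.

```python
import math
import random


def successive_elimination(sample, n_arms, delta=0.05, sigma=1.0, max_rounds=10000):
    """Return (best_arm, pulls_per_arm) for the arm with the LOWEST mean score.

    sample(i) draws one noisy score for arm i. Here it is a hypothetical
    stand-in for one MD run plus energy calculation on decoy i.
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    pulls = [0] * n_arms
    for t in range(1, max_rounds + 1):
        # Evaluate every decoy that is still a candidate once more.
        for i in active:
            sums[i] += sample(i)
            pulls[i] += 1
        # Confidence radius shared by all active arms (each has t pulls).
        # The constant inside the log is an assumed, commonly used choice.
        radius = sigma * math.sqrt(2.0 * math.log(4.0 * n_arms * t * t / delta) / t)
        means = {i: sums[i] / pulls[i] for i in active}
        best = min(means.values())
        # Keep an arm only if its interval still overlaps the leader's.
        active = [i for i in active if means[i] - radius <= best + radius]
        if len(active) == 1:
            break
    winner = min(active, key=lambda i: sums[i] / pulls[i])
    return winner, pulls


# Synthetic example: five hypothetical decoys with made-up mean scores.
true_means = [-10.0, -4.0, -3.0, 0.0, 2.0]
rng = random.Random(0)
winner, pulls = successive_elimination(
    lambda i: rng.gauss(true_means[i], 1.0), len(true_means)
)
print("selected decoy:", winner, "evaluations per decoy:", pulls)
```

Decoys that are eliminated early stop consuming evaluations, so the total number of pulls can be smaller than under a uniform allocation that evaluates every decoy equally often. How large the saving is depends on the score gaps and noise level of the actual system.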