In this paper, we present a model for the generation of grid cells and the emergence of place cells from multimodal input to the entorhinal cortex (EC). In this model, grid cell activity in the dorsocaudal medial entorhinal cortex (dMEC) [28] results from the operation of a long-distance path integration system located outside the hippocampal formation, presumably in retrosplenial and/or parietal cortex. If the connections between these structures and dMEC are organized as a modulo N operator, the resulting activity of dMEC neurons forms a grid cell pattern. Furthermore, a robust high-resolution positional code can be built from a small set of different grid cells if their modulo factors are relatively prime. In parallel, broad visual place cell activity in the MEC can result from the integration of visual information, depending on the field of view of the visual input. Merging entorhinal visual place cell information with grid cell information in the EC and/or in the dentate gyrus (DG) allows precise and robust "place cells" to be built (e.g., cells whose activity is maintained if light is suppressed for a short duration). Our model supports our previous proposition that hippocampal "place cell" activity codes transitions between two successive states ("transition cells") rather than mere current locations. Furthermore, we discuss the possibility that the hippocampal loop participates in the emergence of grid cell activity but is not sufficient by itself. Finally, path integration at a short time scale (which is reset from one place to the next) would be merged in the subiculum with CA3/CA1 "transition cells" [22] to provide robust feedback about the current action to the deep layer of the entorhinal cortex in order to predict the recognition of the animal's new location.
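The role of relatively prime modulo factors can be illustrated with a minimal one-dimensional sketch. It is not the model itself: the integer position, the function names, and the moduli (3, 4, 5) and (2, 4, 6) are assumptions chosen only for illustration. Each hypothetical grid cell population reports the path-integrated position modulo its own factor N, and the tuple of residues serves as the positional code.

```python
from itertools import combinations
from math import gcd, prod


def grid_code(position, moduli):
    """Residue of an integer position under each modulo factor (illustrative)."""
    return tuple(position % n for n in moduli)


def pairwise_coprime(moduli):
    """True if every pair of modulo factors is relatively prime."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def count_distinct_codes(moduli, span):
    """Number of different residue tuples seen over positions 0 .. span-1."""
    return len({grid_code(x, moduli) for x in range(span)})


# Assumed example factors; not taken from the model.
coprime_moduli = (3, 4, 5)
shared_factor_moduli = (2, 4, 6)
span = prod(coprime_moduli)  # 60 positions

print(grid_code(7, coprime_moduli))                      # (1, 3, 2)
print(count_distinct_codes(coprime_moduli, span))        # 60
print(count_distinct_codes(shared_factor_moduli, span))  # 12
```

Under these assumptions, three small periodic codes with relatively prime factors distinguish all 60 positions, whereas factors that share common divisors repeat after only 12 positions.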