Quilty-Dunn et al. argue that deep convolutional neural networks (DCNNs) optimized for image classification exemplify structural disanalogies to human vision. A different kind of artificial vision - found in reinforcement-learning agents navigating artificial three-dimensional environments - can be expected to be more human-like. Recent work suggests that language-like representations substantially improves these agents' performance, lending some indirect support to the language-of-thought hypothesis (LoTH).